1 C-Tagger Development Status Update
Kittikul Kovitanggoon* (CU), Gerrit Van Onsem (VUB), Dinko Ferencek (RU)
CMS-POG BTAG-WG Meeting, June 18, 2015
2 Overview
● CMSSW-based c-tagger, built on the CMSSW-based CSV b-tag algorithm (using the JetTagMVAExtractor code, the so-called ‘VariableExtractor’, for the tree production)
● TMVA-based c-tagger(s)
– “v1” using Clemens' setup (using ‘VariableExtractor’ for the tree production)
– “v2” using Dinko and Stevens' setup (using ‘BTagAnalyzer’, PAT-based, for the tree production)
● C-tag twiki with work plan, link to presentations
● We aim to use BTagAnalyzer as the future tree production for at least the TMVA-based b/c taggers, because BTagAnalyzer is in general more flexible and better maintained by the BTV group (involving more expertise) than the current VariableExtractor.
● In this talk, we present the status and the problems we are encountering.
3 Introduction of TMVA-based v2
● TMVA-based c-tagger (i.e. training outside CMSSW) based on the b-tagging setup of Dinko Ferencek (and others) [1][2]
● BTagAnalyzer is used to extract an ntuple containing the variables for b-tag training
● The current setup uses CSV, not IVF
● Studying BTagAnalyzer and TagVarExtractor in order to modify them for c-tagging:
– Including IVF on top of CSV
– Optimizing the selections based on charm quark kinematics
– Adding ATLAS variables
4 Setting Up BTagAnalyzer
● Set up using the BTagAnalyzer 'lite' version, which includes the same IVF as VariableExtractor, with CMSSW_5_3_20
● Adding selections to distinguish charm jets from b jets. D mesons have on average about half the lifetime of B mesons, so the vertex significance cuts are halved:
- process.inclusiveVertexFinder.vertexMinDLen2DSig = cms.double(1.25) # 2.5 sigma for the b tagger
- process.inclusiveVertexFinder.vertexMinDLenSig = cms.double(0.25) # 0.5 sigma for the b tagger
- process.inclusiveSecondaryVertexFinderTagInfosAODPFlow.vertexCuts.distSig2dMin = 1.5 # default value 2.0; loosened to relax the cut on flight distance, again motivated by the shorter D-meson lifetime
● Adding variables to BTagAnalyzer to match VariableExtractor, including the ATLAS variables
● Dataset: /TTJets_MassiveBinDECAY_TuneZ2star_8TeV-madgraph-tauola/Summer12_DR53X-PU_S10_START53_V7A-v1/AODSIM
● Running on the same ROOT file 0244AEA1-7CE1-E B D4C3C.root for the exact same 100 events (event numbers checked)
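The halving rule behind the two inclusiveVertexFinder cuts can be written down as a minimal sketch in plain Python (no CMSSW required). The dictionary and function names are hypothetical; the b-tagger reference values are the ones quoted in the comments on this slide. The distSig2dMin change (2.0 to 1.5) is set by hand and is not covered by this scaling.

```python
# Hypothetical helper: derive charm-oriented vertex cuts by scaling
# the b-tagger significance thresholds quoted on the slide.
B_TAG_CUTS = {
    "vertexMinDLen2DSig": 2.5,  # 2D flight-distance significance, b tagger
    "vertexMinDLenSig": 0.5,    # 3D flight-distance significance, b tagger
}


def scale_cuts(cuts, factor=0.5):
    """Return a new dict with every threshold multiplied by `factor`.

    factor=0.5 encodes the assumption that D mesons live about half
    as long as B mesons on average.
    """
    return {name: value * factor for name, value in cuts.items()}


C_TAG_CUTS = scale_cuts(B_TAG_CUTS)

for name, value in C_TAG_CUTS.items():
    print(f"{name} = {value}")
```

In an actual configuration these numbers would be assigned to the corresponding process.inclusiveVertexFinder parameters, as listed on the slide.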
5 Workflows
● BTagAnalyzer (PAT Based): selectedPATAK5PFJets with PF2PAT -> Hadron-based jet flavour -> inclusiveSVFinder TagInfosAODPFlow -> IPTagInfos AODPFlow -> CombinedSVComputerV2 -> Ntuple
● VariableExtractor (CMSSW Based): PF CHS selectedAK5PFJets -> Hadron-based jet flavour -> inclusiveSVFinder TagInfos -> IPTagInfos -> CombinedSVComputerV2 -> Ntuple
6 Status of BTagAnalyzer for C-tag
● At first, there were several discrepancies between the trees of BTagAnalyzer and VariableExtractor. Most of the differences have been identified and resolved:
➢ Changed the track weight in CombinedSVComputerV2
➢ Turned off PF2PAT
➢ Turned off PF CHS
➢ Removed the filter in BTagAnalyzer
➢ Used raw jet pT
➢ Used only the default sorting by sip2dSig
➢ Same GT for both VariableExtractor and BTagAnalyzer: “START53_V27”
➢ Raw jet pT > 30 GeV and |eta| < 2.4
● All jet kinematic and SV variables now agree between the two frameworks. More comparisons are in the Back Up.
● The track variables of the first track (“_0”) agree well.
● However, the shapes of the second (“_1”) and third (“_2”) track variables do not agree. Some selections might still differ between the two frameworks.
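The common jet selection used for the comparison (raw jet pT > 30 GeV and |eta| < 2.4) can be sketched as follows. This is an illustration in plain Python, not the code of either framework; the Jet class, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    raw_pt: float  # uncorrected jet pT in GeV (hypothetical field name)
    eta: float     # pseudorapidity


def passes_selection(jet, pt_min=30.0, abs_eta_max=2.4):
    """Apply the cuts quoted on the slide: raw pT > 30 GeV and |eta| < 2.4."""
    return jet.raw_pt > pt_min and abs(jet.eta) < abs_eta_max


# Made-up jets, only to exercise the selection.
jets = [Jet(45.2, 1.1), Jet(28.0, 0.3), Jet(60.0, -2.5), Jet(31.0, -2.39)]
selected = [j for j in jets if passes_selection(j)]
print(len(selected), "of", len(jets), "jets selected")
```

Applying the cut to the raw pT in both frameworks avoids differences that would otherwise come from jet energy corrections.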
7 VariableExtractor VS BTagAnalyzer
9 Investigating by Printing out Variables
● We also added “cout” statements to BTagAnalyzer, CombinedSVComputerV2, and TagVarExtractor:
1. BTagAnalyzer gives the same values as CombinedSVComputerV2 for each jet.
2. If flightDistance2dVal does not exist in CombinedSVComputerV2, BTagAnalyzer fills it with the default value.
3. The track variables are sorted by trackSip2dSig.
4. In some events, BTagAnalyzer gives more jets than CombinedSVComputerV2. This is because the computer does not save jets with fewer than one track.
● Gerrit provided the “cout” output from VariableExtractor. It was checked: we have the same jets and the same track variable values.
● BTagAnalyzer uses TagVarExtractor to produce the flat tree, while VariableExtractor uses a local script to produce the flat tree. We suspect that this might be the cause of the discrepancies we saw in the second and third track variables.
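To make the flattening step concrete, here is a minimal sketch of how per-track values could be sorted and written into fixed “_0”, “_1”, “_2” slots, with a default for missing tracks. It is not the TagVarExtractor or the local script; the variable name, the descending sort order, and both default values are assumptions chosen for illustration.

```python
def flatten_tracks(tracks, n_slots=3, default=-99.0):
    """Sort tracks by sip2dSig (descending, an assumption) and fill fixed slots.

    Slots beyond the number of available tracks get `default`
    (a hypothetical placeholder value).
    """
    ordered = sorted(tracks, key=lambda t: t["sip2dSig"], reverse=True)
    flat = {}
    for i in range(n_slots):
        value = ordered[i]["sip2dSig"] if i < len(ordered) else default
        flat[f"trackSip2dSig_{i}"] = value
    return flat


def compare(flat_a, flat_b, tol=1e-6):
    """Return the names of the flat variables that differ by more than tol."""
    return [k for k in flat_a if abs(flat_a[k] - flat_b[k]) > tol]


# A jet with a single track, flattened with two different default values.
tracks = [{"sip2dSig": 3.1}]
flat_a = flatten_tracks(tracks, default=-99.0)
flat_b = flatten_tracks(tracks, default=0.0)
mismatches = compare(flat_a, flat_b)
print(mismatches)
```

In this toy case the first slot agrees and only the later slots differ. It shows one way a flattening convention can affect “_1” and “_2” without touching “_0”; it is not a diagnosis of the discrepancy reported here.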
10 Conclusions
● All variables agree well between BTagAnalyzer and VariableExtractor, except the second and third track variables.
● Need to check the local script that produces the flat tree for VariableExtractor.
● Our current priority is to implement the c-tagger in CMSSW for Run II, so this study will have lower priority.
11 Back Up
12-30 VariableExtractor VS BTagAnalyzer (comparison plots)