M. Wolter 1 05.11.2008 Optimization of tau identification in ATLAS using multivariate tools On behalf of the ATLAS Collaboration Marcin Wolter Institute.

Slides:



Advertisements
Similar presentations
Tau leptons as an exciting probe for New Physics in ATLAS
Advertisements

Current limits (95% C.L.): LEP direct searches m H > GeV Global fit to precision EW data (excludes direct search results) m H < 157 GeV Latest Tevatron.
Matthew Schwartz Harvard University March 24, Boost 2011.
INTRODUCTION TO e/ ɣ IN ATLAS In order to acquire the full physics potential of the LHC, the ATLAS electromagnetic calorimeter must be able to identify.
Implementation of e-ID based on BDT in Athena EgammaRec Hai-Jun Yang University of Michigan, Ann Arbor (with T. Dai, X. Li, A. Wilson, B. Zhou) US-ATLAS.
14 Sept 2004 D.Dedovich Tau041 Measurement of Tau hadronic branching ratios in DELPHI experiment at LEP Dima Dedovich (Dubna) DELPHI Collaboration E.Phys.J.
Summary of Results and Projected Sensitivity The Lonesome Top Quark Aran Garcia-Bellido, University of Washington Single Top Quark Production By observing.
Searching for Single Top Using Decision Trees G. Watts (UW) For the DØ Collaboration 5/13/2005 – APSNW Particles I.
Top Turns Ten March 2 nd, Measurement of the Top Quark Mass The Low Bias Template Method using Lepton + jets events Kevin Black, Meenakshi Narain.
Particle Identification in the NA48 Experiment Using Neural Networks L. Litov University of Sofia.
Kevin Black Meenakshi Narain Boston University
Top Physics at the Tevatron Mike Arov (Louisiana Tech University) for D0 and CDF Collaborations 1.
Optimization of Signal Significance by Bagging Decision Trees Ilya Narsky, Caltech presented by Harrison Prosper.
Introduction to Single-Top Single-Top Cross Section Measurements at ATLAS Patrick Ryan (Michigan State University) The measurement.
Higgs Detection Sensitivity from GGF H  WW Hai-Jun Yang University of Michigan, Ann Arbor ATLAS Higgs Meeting October 3, 2008.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
1 Hadronic In-Situ Calibration of the ATLAS Detector N. Davidson The University of Melbourne.
WW  e ν 14 April 2007 APS April Meeting WW/WZ production in electron-neutrino plus dijet final state at CDFAPS April Meeting April 2007 Jacksonville,
Vector Boson Scattering At High Mass
Tau Jet Identification in Charged Higgs Search Monoranjan Guchait TIFR, Mumbai India-CMS collaboration meeting th March,2009 University of Delhi.
G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem 2Random variables and.
B-tagging Performance based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with X. Li and B. Zhou) ATLAS B-tagging Meeting February 9,
W+jets and Z+jets studies at CMS Christopher S. Rogan, California Institute of Technology - HCP Evian-les-Bains Analysis Strategy Analysis Overview:
MiniBooNE Event Reconstruction and Particle Identification Hai-Jun Yang University of Michigan, Ann Arbor (for the MiniBooNE Collaboration) DNP06, Nashville,
August 30, 2006 CAT physics meeting Calibration of b-tagging at Tevatron 1. A Secondary Vertex Tagger 2. Primary and secondary vertex reconstruction 3.
25 sep Reconstruction and Identification of Hadronic Decays of Taus using the CMS Detector Michele Pioppi – CERN On behalf.
Measurement of the branching ratios for Standard Model Higgs decays into muon pairs and into Z boson pairs at 1.4 TeV CLIC Gordana Milutinovic-Dumbelovic,
B-Tagging Algorithms for CMS Physics
FIMCMS, 26 May, 2008 S. Lehti HIP Charged Higgs Project Preparative Analysis for Background Measurements with Data R.Kinnunen, M. Kortelainen, S. Lehti,
Higgs Reach Through VBF with ATLAS Bruce Mellado University of Wisconsin-Madison Recontres de Moriond 2004 QCD and High Energy Hadronic Interactions.
Tth study Lepton ID with BDT 2015/02/27 Yuji Sudo Kyushu University 1.
Tau06, 9 th Workshop on Tau Lepton Physics Pisa, Italy September 2006 Reconstruction and Identification of hadronic  -decays in ATLAS Fabien Tarrade.
CALOR April Algorithms for the DØ Calorimeter Sophie Trincaz-Duvoid LPNHE – PARIS VI for the DØ collaboration  Calorimeter short description.
Software offline tutorial, CERN, Dec 7 th Electrons and photons in ATHENA Frédéric DERUE – LPNHE Paris ATLAS offline software tutorial Detectors.
Study on search of a SM Higgs (120GeV) produced via VBF and decaying in two hadronic taus V.Cavasinni, F.Sarri, I.Vivarelli.
Analysis of H  WW  l l Based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with T.S. Dai, X.F. Li, B. Zhou) ATLAS Higgs Meeting September.
B-tagging based on Boosted Decision Trees
B-Tagging Algorithms at the CMS Experiment Gavril Giurgiu (for the CMS Collaboration) Johns Hopkins University DPF-APS Meeting, August 10, 2011 Brown University,
Mark OwenManchester Christmas Meeting Jan Search for h ->  with Muons at D  Mark Owen Manchester HEP Group Meeting January 2006 Outline: –Introduction.
Guglielmo De Nardo for the BABAR collaboration Napoli University and INFN ICHEP 2010, Paris, 23 July 2010.
 reconstruction and identification in CMS A.Nikitenko, Imperial College. LHC Days in Split 1.
Low Mass Standard Model Higgs Boson Searches at the Tevatron Andrew Mehta Physics at LHC, Split, Croatia, September 29th 2008 On behalf of the CDF and.
Study of Diboson Physics with the ATLAS Detector at LHC Hai-Jun Yang University of Michigan (for the ATLAS Collaboration) APS April Meeting St. Louis,
1 Donatella Lucchesi July 22, 2010 Standard Model High Mass Higgs Searches at CDF Donatella Lucchesi For the CDF Collaboration University and INFN of Padova.
Data Analysis Stefan Wayand 09. June 2016
ATLAS results on inclusive top quark pair
Status of the Higgs to tau tau
First Evidence for Electroweak Single Top Quark Production
Jet Production Measurements with ATLAS
Top Tagging at CLIC 1.4TeV Using Jet Substructure
Venkat Kaushik, Jae Yu University of Texas at Arlington
Search for WHlnbb at the Tevatron DPF 2009
Higgs → t+t- in Vector Boson Fusion
Multi-dimensional likelihood
陶军全 中科院高能所 Junquan Tao (IHEP/CAS, Beijing)
NIKHEF / Universiteit van Amsterdam
Overview of CMS H→ analysis and the IHEP/IPNL contributions
ISTEP 2016 Final Project— Project on
Electron Identification Based on Boosted Decision Trees
Update of Electron Identification Performance Based on BDTs
Implementation of e-ID based on BDT in Athena EgammaRec
Missing Energy and Tau-Lepton Reconstruction in ATLAS
Plans for checking hadronic energy
Jessica Leonard Oct. 23, 2006 Physics 835
Searches at LHC for Physics Beyond the Standard Model
Computing and Statistical Data Analysis Stat 5: Multivariate Methods
Performance of BDTs for Electron Identification
Status of CEPC HCAL Optimization Study in Simulation LIU Bing On behalf the CEPC Calorimeter working group.
Measurement of the Single Top Production Cross Section at CDF
Northern Illinois University / NICADD
Presentation transcript:

M. Wolter Optimization of tau identification in ATLAS using multivariate tools On behalf of the ATLAS Collaboration Marcin Wolter Institute of Nuclear Physics PAN, Kraków ACAT 2008, Erice, 3-7 November 2008

M. Wolter Higgs Processes : Exotic Processes : Standard Model : Z → ττ W→ τν τ Standard Model Higgs (VBF,ttH) qqH → qqττ, ttH → ttττ MSSM Higgs (h/A/H, H + ) h/A/H → ττ, H + → τν SUSY signature with τ ’s in final state Extra dimensions … new theories (?) important for first physics data analysis Physics processes with tau leptons ATLAS preliminary L = 100 pb -1 ATLAS preliminary E CM =14 TeV

M. Wolter Hadronic tau decays: mostly 1-prong or 3-prong decays (1 or 3 charged tracks): Observed tau jets Characteristics well-collimated calorimeter cluster Small number of associated charged tracks displaced secondary vertex Main background QCD jets – wider,higher track multiplicity, different characteristics of the track system and shapes of the calorimetric showers Tau decays τ decay π0π0 π+π+ π-π- π+π+ τ jet

M. Wolter Tau reconstruction algorithm Track-seed and calo-seed: good quality tracks (p T >6 GeV) as initial seed candidates with 1-8 quality tracks (p T >1GeV) in ΔR<0.2 from the seed the η,φ using p T weighting of tracks, check charge consistency (|Q| ≤ 2) matching cone 0.4 TopoJets (>10GeV, ΔR<0.2) as calo-seed E T (calorimetric) using H1-style calibration on cells from calo-seed E T eflow (tracking) with energy-flow method (EM calo - separating neutral/charged sources of energy) Reconstruction of π 0 subclusters Calo seed only: cone 0.4 TopoJets (>10 GeV) as calo- seed η,φ defined using calo-seed (η corrected for z vertex) looser track-quality selection, track p T >1GeV E T (calorimetric) using H1-style calibration on cells from calo-seed Track-seed only: small fraction of all candidates (few percent) Both-seeds candidates (track and calo): signal reconstruction efficiency - 64% (1-prong) and 60% (3-prong) fakes from QCD background events - 4% (1-prong) and 8% (3-prong) Significant background suppression on the reconstruction level.

M. Wolter Discriminating variables None of the variables alone gives sufficient signal/background separation: smart and efficient (multivariate) identification method needed. Analysis performed on ATLAS MC events: Signal: W->τν(had) Background: QCD jet events Calorimeter-seeded 8 identification variables. Example variables: 1)emRadius: radius of the cluster in the EM calorimeter, 2)isolationFraction: isolation fraction of transverse energy between 0.1<ΔR<0.2 around the cluster barycenter Track-seeded 3 Prong – 11 id. variables 1 Prong -- 9 id. variables Example variables: 1)m: invariant mass, 2)etisolFrac: ratio of transverse energy in 0.2 < ΔR < 0.4 to total transverse energy ATLAS preliminary ATLAS preliminary ATLAS preliminary ATLAS preliminary

M. Wolter Tau identification algorithms Cut analysis - baseline method for track-seeded candidates (fast, robust, transparent) Projected Likelihood – implemented for both track and calo-seeded candidates PDE-RS – Probability Density Estimator with Range Searches (track- seeded candidates) Neural Network – implemented for track-seeded candidates, BDT - Boosted Decision Tree algorithm, relatively new in HEP (track- seeded and calo-seeded candidates) All algorithms implemented in the TauDiscriminant package, which is a part of the ATLAS reconstruction software.

M. Wolter Cut analysis Human-optimized baseline cut method Automatically optimized - TMVA package Decorrelation → scan in signal efficiency [0  1] and maximize background rejection random sampling: robust but slow Widely used because it is transparent and robust Cut analysis applied to track-seeded candidates Signal efficienc y Background rejection ATLAS preliminary ATLAS preliminary ATLAS preliminary ATLAS preliminary

M. Wolter ● Combine probability for an event to be signal / background from individual variables to ● Method assumes uncorrelated input variables: In that case it is the optimal MVA approach, since it contains all the information usually it is not true  inferior to other more sophisticated MV methods (BDT, Neural Network, etc...) Commonly used in High Energy Physics Projected Likelihood Estimator Probability PDFs

M. Wolter Projected Likelihood Estimator Applied to both track- seeded and calo-seeded candidates Base-line method for calo- seeded candidates Good performance, low memory and CPU consumption Transparent and insensitive to small changes in the training sample. 3-prong candidates ATLAS preliminary Efficienc y

M. Wolter PDE_RS method Parzen estimation (1960s) – approximation of the unknown probability as a sum of kernel functions placed at the points x n of the training sample. Make it faster - count signal (n s ) and background (n b ) events in N-dim hypercube around the event classified – only few events from the training sample needed (PDE_RS). Hypercube dimensions are free parameters to be tuned. Discriminator D(x) given by signal and background event densities: Events stored in the binary tree – easy and fast finding of neighbor events. Adaptive version – size of the hypercube adapts to the event densities, a kernel function instead of sharp hypercube boundaries.

M. Wolter PDE_RS method Standard Adaptive Advantages:Good performance, takes correlations into account Disadvantages: High memory and CPU consumption, quite big training data samples needed Use of adaptive PDERS increases background rejection by about 5% compared to standard PDE-RS. Applied to track-seeded candidates ATLAS preliminary ATLAS preliminary

M. Wolter Neural Network Uses track-based variables Impact parameter significance added (1-prong candidates) 8 neural networks: 1,2,3-prong candidates, with/without impact parameter (1-prong candidates), with/without π 0 cluster. Good performance (uses correlations between variables, impact parameter information added). Stuttgart Neural Network Simulator (SNNS) used, feed-forward NN with two hidden layers. Trained network is converted to the C code – very fast, low memory consumption. ATLAS preliminary Efficienc y

M. Wolter Sequential application of “cuts”, the final nodes have an associated classifier value based on their purity Training: growing a decision tree: Start with Root node Split training sample according to cut on best variable at this node Splitting criterion: e.g., maximum “Gini-index” G = P  (1– P) Purity P = ∑ W s /(∑ W s + ∑ W B ) Continue splitting until min. number of events or max. purity reached Decision Trees Advantages: fast and easy to train, easy to interpret. useless/weak variables are ignored Disadvantages: small changes in training sample can give large changes in tree structure

M. Wolter Boosted Decision Trees Boosting: a general technique for improving the performance of any weak classifier. Boosted Decision Trees (1996, R.Shapire & Y.Freund): combining several decision trees (forest) derived from one training sample via the application of event weights into ONE multivariate event classifier by performing “majority vote” AdaBoost: wrongly classified training events are given a larger weight. Final classifier is a combination of weak classifiers.

M. Wolter BDT Likelihood ATLAS preliminary Boosted Decision Tree D0 experiment BDT software used for training 10 trees for boosting (AdaBoost) Similar set of variables as used by Likelihood (few additional) Use of correlations between variables – BDT has a potential to improve the performance reached by Likelihood ATLAS preliminary Efficiency

M. Wolter Summary Tau identification significantly improved by using multivariate analysis tools. All of the presented classification methods are performing well: Cuts – fast, robust, transparent for users. Projected Likelihood – a popular and well performing tool PDE_RS – robust and efficient, but large samples of reference candidates needed. Neural network – fast classification while converted to the C function after training, BDT - fast and simple training, insensitive to outliers, good performance. Relatively new in HEP Multivariate analysis is necessary, if it is important to extract as much information from the data as possible. For classification problems no single “best” method exists. What matters - is also simplicity and speed of learning and fast (and robust) classification.

M. Wolter Future plans We should prepare for real data With Monte Carlo: Variable ranking and reduction – optimal set of variables Focus on flexibility and robustness PARADIGM approach should help... With real data: Comparison of Monte Carlo with data (validation) Choice of trusted variables Systematics, systematics...

M. Wolter Backup slides

M. Wolter Comparison of various methods

M. Wolter Tau decay branching ratios