Cluster Classification Studies with MVA Techniques

Motivation:
The current EMFracClassification tool uses 75 TProfile2D plots as “lookup tables”.
Why not apply simple cuts, or maybe more sophisticated MVA discrimination techniques (Likelihood, ANN, ...)?
Why use only the two cluster moments <ϱ> and λ_clus?
Goal: improve the efficiency and purity of the classification.
This study is also used as a test analysis for developing a toolkit for multivariate analyses, TMVA (see http://tmva.sf.net).
The TMVA integration into ROOT is about to be finished this week.
Data set

The basis of the cluster classification studies are the post-Rome single pions with calibration hits:
http://menke.home.cern.ch/menke/cgi-bin/hec/postrome.sh
The same data sets were created with electrons/positrons, using the same software and scripts
(they would be on Castor already, but my grid certificate expired).
Number of events per generated single-particle energy:
Energy distribution for all clusters:
Which clusters are from the electron or pion?

In an empty calorimeter one expects up to 12 clusters from noise, in addition to the clusters from the generated single particle.
Take only clusters which contain energy from calibration hits (true G4).
Clusters in pion sample:
Clusters in electron sample:
Definition of the classification samples

Strategy: “Try to find the EM clusters first, and apply weights to the rest.”
EM clusters are the “signal”.
Definition of “EM clusters”: EM_frac (from calibration hits) > 0.9 (not tuned yet).
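A minimal sketch of how the per-cluster sample definition could look in code, combining the noise rejection from the previous slide with the EM_frac cut; the names engCalibTot and emFracCalib are placeholders for the per-cluster calibration-hit quantities, not actual branch names:

// Hedged sketch: 0 = noise cluster (dropped), 1 = "signal" (EM cluster),
// 2 = "background" (the rest, to be weighted later).
// engCalibTot / emFracCalib are placeholder names for the per-cluster
// total calibration-hit energy and calibration-hit EM fraction.
int classifyForTraining( double engCalibTot, double emFracCalib )
{
   if ( engCalibTot <= 0. ) return 0;   // no true G4 energy: pure noise cluster
   if ( emFracCalib > 0.9 ) return 1;   // EM_frac > 0.9: signal ("EM cluster")
   return 2;                            // everything else: background
}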
Cluster moments 1/3 (2.0 < |eta| < 2.2; 4 < E_clus < 16 GeV)
Cluster moments 2/3
Cluster moments 3/3

There are many cluster moments already calculated by default.
Some look pretty promising!
Try to find the “best variable” or “best variable set” using an automatic cut optimisation technique.
Method of Cut Optimisation

“Optimal cuts” maximise the signal efficiency at a given background efficiency.
The result is (in this case) a set of 100 cuts corresponding to signal efficiencies from 0 to 1.
Each cut set has a corresponding background rejection efficiency.
For the application afterwards one has to choose one working point.
Technically, the optimisation is achieved in TMVA by Monte Carlo generation, using uniform priors for the lower cut value and the cut width, thrown within the variable ranges.
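The following is a minimal single-variable sketch of this Monte Carlo procedure, only to illustrate the idea; it is not the actual TMVA implementation, and the function and variable names are made up for this example:

#include <TRandom3.h>
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Fraction of events of a sample falling inside the cut window [lo, lo + width).
double efficiency( const std::vector<double>& sample, double lo, double width )
{
   int pass = 0;
   for (std::size_t i = 0; i < sample.size(); ++i)
      if (sample[i] >= lo && sample[i] < lo + width) ++pass;
   return sample.empty() ? 0.0 : double(pass) / sample.size();
}

void mcCutOptimisation( const std::vector<double>& sig,   // variable values, signal
                        const std::vector<double>& bkg,   // variable values, background
                        double xmin, double xmax, int nTrials = 500000 )
{
   TRandom3 rnd(0);
   // best (smallest) background efficiency found in 100 bins of signal efficiency
   std::vector<double> bestBkgEff(100, 1.0), bestLo(100, 0.0), bestWidth(100, 0.0);

   for (int i = 0; i < nTrials; ++i) {
      // throw the lower cut value and the cut width with uniform priors
      // within the variable range
      double lo    = rnd.Uniform(xmin, xmax);
      double width = rnd.Uniform(0.0, xmax - lo);

      double effS = efficiency(sig, lo, width);
      double effB = efficiency(bkg, lo, width);

      int bin = std::min(int(effS * 100), 99);   // signal-efficiency bin of this cut
      if (effB < bestBkgEff[bin]) {              // keep the cut with the best rejection
         bestBkgEff[bin] = effB;
         bestLo[bin]     = lo;
         bestWidth[bin]  = width;
      }
   }
   // one optimal cut per signal-efficiency bin; a working point is chosen later, e.g.
   printf("at eff_S ~ 0.90: eff_B = %.3f, window [%.3g, %.3g]\n",
          bestBkgEff[90], bestLo[90], bestLo[90] + bestWidth[90]);
}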
Example for Cut Optimisation

Take the two variables from the EMFracTool: <ϱ> and λ_clus.
Run the cut optimisation:
The EMFracTool would be just one point in this plot.
Finding the best set of variables

Strategy: run the cut optimisation for all combinations of 2 (3, 4, 5) moments out of the 16.
Compare the resulting signal efficiencies at a background rejection of 99% (high purity).
This is done for more than 1000 combinations in two bins:
0.2 < |eta| < 0.4, 4 < E_clus < 16 GeV
2.0 < |eta| < 2.2, 4 < E_clus < 16 GeV
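A rough sketch of how such a combination scan can be driven; the subset enumeration is standard, while runCutOptimisation() is only a placeholder for the per-set training shown on the code-example slides:

#include <TString.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Loop over all n-choose-k subsets of the cluster-moment names in allVars
// (k <= n assumed) and hand each subset to the (placeholder) cut optimisation.
void scanCombinations( const std::vector<TString>& allVars, std::size_t k )
{
   const std::size_t n = allVars.size();         // here: the 16 cluster moments
   std::vector<bool> mask(n, false);
   std::fill(mask.begin(), mask.begin() + k, true);

   do {                                           // each mask = one combination
      std::vector<TString> inputVars;
      for (std::size_t i = 0; i < n; ++i)
         if (mask[i]) inputVars.push_back(allVars[i]);

      // placeholder: book "MethodCuts" with these input variables, train, and
      // record the signal efficiency at 99% background rejection
      // runCutOptimisation( inputVars );
   } while (std::prev_permutation(mask.begin(), mask.end()));
}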
“optimal” Set of Variables (i)

0.2 < |eta| < 0.4, 4 < E_clus < 16 GeV:

--- MVA                Signal efficiency:
--- Methods:       @B=0.01  @B=0.10  @B=0.30
--- Cuts_278 :      0.681    0.940    0.986
--- Cuts_279 :      0.671    0.939    0.987
--- Cuts_27  :      0.671    0.939    0.986
--- Cuts_27c :      0.671    0.938    0.986
--- Cuts_8c  :      0.668    0.915    0.985
--- Cuts_289 :      0.667    0.936    0.987
--- Cuts_28a :      0.666    0.936    0.986
--- Cuts_27a :      0.663    0.938    0.986
--- Cuts_27b :      0.661    0.939    0.986
--- Cuts_270 :      0.654    0.941    0.987
--- Cuts_8a  :      0.651    0.936    0.985
--- Cuts_28c :      0.644    0.935    0.986
--- Cuts_280 :      0.644    0.929    0.986

The name “Cuts_xyz” is a shortcut for cutting on the variable set (x, y, z):
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"
“optimal” Set of Variables (ii)

The name “Cuts_xyz” is a shortcut for cutting on the variable set (x, y, z):
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"

2.0 < |eta| < 2.2, 4 < E_clus < 16 GeV:

--- MVA                Signal efficiency:
--- Methods:       @B=0.01  @B=0.10  @B=0.30
--- Cuts_25c :      0.568    0.891    0.980
--- Cuts_258 :      0.566    0.892    0.979
--- Cuts_25b :      0.556    0.890    0.980
--- Cuts_256 :      0.554    0.891    0.980
--- Cuts_5bc :      0.553    0.893    0.980
--- Cuts_25  :      0.539    0.896    0.979
--- Cuts_257 :      0.539    0.894    0.980
--- Cuts_25a :      0.539    0.894    0.980
--- Cuts_25c :      0.538    0.892    0.980
--- Cuts_5b  :      0.534    0.896    0.980
--- Cuts_7b  :      0.533    0.930    0.985
--- Cuts_278 :      0.533    0.930    0.985
--- Cuts_27  :      0.533    0.929    0.985
--- Cuts_279 :      0.533    0.928    0.984
“optimal” Set of Variables (iii)

The optimal set of variables seems to be eta (energy?) dependent.
The most prominent variables are: center_lambda, m2_dens, longitudinal, frac_em.
Needs further investigation.
Let TMVA use these four variables and try some other discrimination techniques:

--- TMVA_Factory: Evaluation results ranked by best 'signal eff @B=0.01'
---------------------------------------------------------------------------
--- MVA                Signal efficiency:         Signifi-  Sepa-    mu-Trans-
--- Methods:       @B=0.01  @B=0.10  @B=0.30      cance:    ration:  form:
--- TMlpANN    :    0.604    0.934    0.988       2.331     0.770    0.841
--- Cuts       :    0.554    0.924    0.983       0.000     0.000    0.000
--- Likelihood :    0.472    0.893    0.990       1.670     0.693    0.938
--- BDTGini    :    0.393    0.914    0.981       2.115     0.719    0.898
--- PDERS      :    0.345    0.858    0.976       1.998     0.685    0.780
--- Fisher     :    0.194    0.790    0.981       1.355     0.538    0.798

--- TMVA_MethodFisher: ranked output (top variable is best ranked)
----------------------------------------------------------------
--- Variable:                  Coefficient:   Discr. power:
--- cl_m1_dens_topo :            +2.877         0.4517
--- cl_center_lambda_topo :      -2.796         0.3710
--- cl_eng_frac_em_topo :        -0.039         0.3436
--- cl_longitudinal_topo :       -1.206         0.2722
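A rough sketch of how the additional classifiers could be booked with the same factory interface used on the training slides; the method-name strings other than "MethodCuts" and the evaluation call simply follow the "Method..." / "...AllMethods" naming pattern seen there and should be read as assumptions, and the option strings are placeholders, not the configuration used for the table above:

// book several discrimination methods on the same four input variables
tmva_factory->BookMethod( "MethodCuts",       "V:MC:500000:AllFSmart" );
tmva_factory->BookMethod( "MethodLikelihood", "<likelihood options>" );   // assumed name
tmva_factory->BookMethod( "MethodFisher",     "<fisher options>" );       // assumed name
tmva_factory->BookMethod( "MethodTMlpANN",    "<ann options>" );          // assumed name

tmva_factory->TrainAllMethods();     // train every booked method
tmva_factory->TestAllMethods();      // run them on the test sample
tmva_factory->EvaluateAllMethods();  // assumed step producing the ranked table above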
Summary & Outlook

The rectangular cut method is a really competitive method for cluster classification.
Optimal cuts are calculated for each signal efficiency / background rejection -> one still needs to choose a working point.
Optimal sets of cuts for all bins of E and eta are currently being calculated.
-> Then decide which variables to use in the end.
For the application, use the TMVA_Reader (ROOT class).
The other methods are not yet fully tuned:
since at least one variable is perfectly discriminating, one has to remove this variable and do a training on the remaining variables on top of it.
Code Example: Do the training in 72 bins!

// load data sets
// (tmva_factory and inputVars are created beforehand)
TString datFileS = "data/e.dat";
TString datFileB = "data/pi.dat";
tmva_factory->SetInputTrees( datFileS, datFileB );

// which variables are used for discrimination
inputVars->push_back( "cl_m2_r_topo" );
inputVars->push_back( "cl_m2_lambda_topo" );
inputVars->push_back( "cl_delta_phi_topo" );
tmva_factory->SetInputVariables( inputVars );
// split data set and do training for EACH bin!
Double_t etaBins[25]   = { 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
                           2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2,
                           4.4, 4.6, 4.8 };
Double_t energyBins[6] = { 0.0000, 4000, 16000, 64000, 200000, 40000000 };
tmva_factory->BookMultipleMVAs( "cl_e_topo",       5, &energyBins[0] );
tmva_factory->BookMultipleMVAs( "cl_m1_eta_topo", 24, &etaBins[0] );

// choose method
tmva_factory->BookMethod( "MethodCuts", "V:MC:500000:AllFSmart" );

tmva_factory->TrainAllMethods();
tmva_factory->TestAllMethods();
Code Example: Apply Classification in Athena

// create TMVA_Reader object
TMVA_Reader *tmva = new TMVA_Reader( inputVars );
tmva->BookMultipleMVAs( "cl_e_topo",       5, &energyBins[0] );
tmva->BookMultipleMVAs( "cl_m1_eta_topo", 24, &etaBins[0] );
tmva->BookMVA( TMVA_Reader::Likelihood, "myweightfile" );

double mvaLKD = tmva->EvaluateMVA( varValues, multicutValues, TMVA_Reader::LikelihoodD );
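A short per-cluster usage sketch under stated assumptions: the container types of varValues and multicutValues, their filling order, and the 0.5 threshold are placeholders, not the tool's actual interface or working point:

// Hedged usage sketch (names, types, and threshold are assumptions):
// varValues holds the discriminating cluster moments in the same order as inputVars;
// multicutValues holds the bin variables (cl_e_topo, cl_m1_eta_topo) of this cluster,
// so that the reader picks the MVA trained for the matching E/eta bin.
std::vector<double> varValues( inputVars->size() );
std::vector<double> multicutValues( 2 );
// ... fill varValues and multicutValues from the current cluster ...
double mvaLKD = tmva->EvaluateMVA( varValues, multicutValues, TMVA_Reader::LikelihoodD );
bool isEMCluster = ( mvaLKD > 0.5 );   // placeholder working point, to be chosen per bin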