Cluster Classification Studies with MVA techniques

Motivation:
The current EMFracClassification tool uses 75 TProfile2D plots as “lookup tables”.
Why not apply simple cuts, or perhaps more sophisticated MVA discrimination techniques (Likelihood, ANN, ...)?
Why use the two cluster moments <ϱ> and _clus?
Goal: improve the efficiency and purity of the classification.
The study also serves as a test analysis for developing TMVA, a toolkit for multivariate analysis.
The TMVA integration in ROOT is about to be finished this week.

Data set
The cluster classification studies are based on the postrome single pions with calibration hits:
The same data sets were created for electrons/positrons, using the same software and scripts:
(They would be on castor already, but my grid certificate expired.)
Number of events per generated single-particle energy:
Energy distribution for all clusters:

Which clusters are from the electron or pion?
In an empty calorimeter one expects up to 12 clusters from noise, in addition to the clusters from the generated single particle.
Take only clusters which contain energy from calibration hits (true G4).
Clusters in pion sample:
Clusters in electron sample:

Definition of the classification samples
Strategy: “Try to find the EM clusters first, and apply weights to the rest”
EM clusters are the “signal”.
Definition of “EM clusters”: EM_frac (from calibration hits) > (not tuned yet)
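The sample definition above can be sketched as a small labelling function. This is only an illustration: the struct `ClusterTruth`, its field names and the function `labelCluster` are hypothetical and not part of the tool, and the EM_frac threshold is left as a free parameter because the slides say it is not tuned yet. Clusters without any calibration-hit energy are treated as noise and kept out of both samples.

```cpp
#include <cassert>

// Hypothetical per-cluster truth record; the field names are illustrative only.
struct ClusterTruth {
    double calibHitEnergy;  // energy from calibration hits (true G4) inside the cluster
    double emFraction;      // EM_frac: electromagnetic share of that energy, in [0, 1]
};

enum class ClusterLabel { Noise, Signal, Background };

// Label one cluster for the classification samples.
// emFracThreshold is an assumption supplied by the caller; no value is implied here.
ClusterLabel labelCluster(const ClusterTruth& c, double emFracThreshold) {
    assert(emFracThreshold >= 0.0 && emFracThreshold <= 1.0);
    if (c.calibHitEnergy <= 0.0) {
        return ClusterLabel::Noise;  // no true energy: not from the generated particle
    }
    return (c.emFraction > emFracThreshold) ? ClusterLabel::Signal       // "EM cluster"
                                            : ClusterLabel::Background;  // the rest
}
```

With this split, the signal and background samples can be filled in one pass over the clusters, and changing the threshold later only requires relabelling.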

Cluster moments (2 < eta < 2.2; 4 < E < 16 GeV)

Cluster moments 2/3

Cluster moments 3/3
There are many cluster moments already calculated by default.
Some look pretty promising!
Try to find the “best variable” or “best variable set” using an automatic cut optimisation technique.

Method of Cut Optimisation
“Optimal cuts” maximise the signal efficiency at a given background efficiency.
The result is (in this case) a set of 100 cut sets, corresponding to signal efficiencies from 0 to 1.
Each cut set has a corresponding background rejection.
For the application afterwards one has to choose one working point.
Technically, the optimisation is achieved in TMVA by Monte Carlo generation, using uniform priors for the lower cut value and the cut width, thrown within the variable ranges.
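The Monte Carlo procedure described above can be sketched in a few lines of plain C++. This is not the TMVA implementation: the names `Sample`, `WorkingPoint`, `efficiency` and `sampleCuts` are hypothetical, and the bookkeeping (keep, for each signal-efficiency bin, the sampled cut set with the lowest background efficiency) is an assumption based on the description on this slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Event  = std::vector<double>;  // one value per discriminating variable
using Sample = std::vector<Event>;

struct WorkingPoint {
    bool filled = false;               // false if no sampled cut set fell into this bin
    double sigEff = 0.0;
    double bkgEff = 1.0;
    std::vector<double> lower, upper;  // cut window per variable
};

// Fraction of events with lower[v] <= x[v] <= upper[v] for every variable v.
double efficiency(const Sample& s, const std::vector<double>& lower,
                  const std::vector<double>& upper) {
    assert(!s.empty());
    std::size_t pass = 0;
    for (const Event& x : s) {
        bool ok = true;
        for (std::size_t v = 0; v < lower.size() && ok; ++v)
            ok = (x[v] >= lower[v] && x[v] <= upper[v]);
        if (ok) ++pass;
    }
    return static_cast<double>(pass) / static_cast<double>(s.size());
}

// Throw nTrials random cut sets and keep the best one per signal-efficiency bin.
std::vector<WorkingPoint> sampleCuts(const Sample& sig, const Sample& bkg,
                                     const std::vector<double>& varMin,
                                     const std::vector<double>& varMax,
                                     int nBins, int nTrials, unsigned seed) {
    assert(nBins > 0 && varMin.size() == varMax.size());
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::vector<WorkingPoint> best(nBins);
    const std::size_t nVar = varMin.size();
    std::vector<double> lower(nVar), upper(nVar);
    for (int t = 0; t < nTrials; ++t) {
        for (std::size_t v = 0; v < nVar; ++v) {
            const double range = varMax[v] - varMin[v];
            lower[v] = varMin[v] + unit(rng) * range;  // uniform prior on the lower cut
            upper[v] = lower[v] + unit(rng) * range;   // uniform prior on the cut width
        }
        const double effS = efficiency(sig, lower, upper);
        const double effB = efficiency(bkg, lower, upper);
        const int bin = std::min(static_cast<int>(effS * nBins), nBins - 1);
        WorkingPoint& wp = best[bin];
        if (!wp.filled || effB < wp.bkgEff) {  // lower background efficiency wins
            wp.filled = true;
            wp.sigEff = effS;
            wp.bkgEff = effB;
            wp.lower = lower;
            wp.upper = upper;
        }
    }
    return best;
}
```

Calling `sampleCuts` with `nBins = 100` would give one candidate cut set per 1% step in signal efficiency; choosing the working point then amounts to picking one entry of the returned vector.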

Example for Cut Optimisation
Take the two variables from EMFracTool: <ϱ> and _clus
Run the cut optimisation:
EMFracTool would be just one point in this plot.

Finding the best set of variables
Strategy: Run the cut optimisation for all combinations of 2 (3, 4, 5) moments out of the 16.
Compare the resulting signal efficiencies at a background rejection of 99% (high purity).
This is done for more than 1000 combinations in two bins:
0.2 < |eta| < < E_clus < 16 GeV
2.0 < |eta| < < E_clus < 16 GeV
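Scanning all variable combinations needs an enumeration of the k-element subsets of the moment indices. A minimal sketch, assuming the moments are simply numbered 0 to n-1 (the function name `combinations` is hypothetical), could look like this:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// All k-element subsets of {0, ..., n-1}, each as ascending indices,
// generated in lexicographic order.
std::vector<std::vector<int>> combinations(int n, int k) {
    assert(k >= 0 && k <= n);
    std::vector<std::vector<int>> out;
    std::vector<int> idx(k);
    std::iota(idx.begin(), idx.end(), 0);  // first subset: 0, 1, ..., k-1
    while (true) {
        out.push_back(idx);
        int i = k - 1;
        // Find the rightmost index that can still be increased.
        while (i >= 0 && idx[i] == n - k + i) --i;
        if (i < 0) break;                  // last subset reached
        ++idx[i];
        for (int j = i + 1; j < k; ++j) idx[j] = idx[j - 1] + 1;
    }
    return out;
}
```

For n = 16 this gives 120 pairs and 560 triples; each returned index set would then be handed to one cut optimisation run.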

“optimal” Set of Variables (i)
0.2 < |eta| < < E_clus < 16 GeV:
--- MVA        Signal efficiency:
--- Methods:   @B=0.01   @B=0.10   @B=0.30
--- Cuts_278 :
--- Cuts_279 :
--- Cuts_27 :
--- Cuts_27c :
--- Cuts_8c :
--- Cuts_289 :
--- Cuts_28a :
--- Cuts_27a :
--- Cuts_27b :
--- Cuts_270 :
--- Cuts_8a :
--- Cuts_28c :
--- Cuts_280 :
The name “Cuts_xyz” is shorthand for cutting on the variables (x,y,z):
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"
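The naming scheme can be decoded mechanically. The sketch below is an illustration only: `momentNames` and `decodeCutLabel` are hypothetical helpers, and the table is copied from the legend on this slide as it stands.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Legend copied from the slide (keys 3 and 6 carry the same name there).
const std::vector<std::string>& momentNames() {
    static const std::vector<std::string> names = {
        "cl_m2_r_topo", "cl_m2_lambda_topo", "cl_center_lambda_topo",
        "cl_lateral_topo", "cl_center_x_topo", "cl_longitudinal_topo",
        "cl_lateral_topo", "cl_m1_dens_topo", "cl_m2_dens_topo",
        "cl_center_Y_topo", "cl_delta_theta_topo", "cl_center_z_topo",
        "cl_eng_frac_max_topo"};
    return names;
}

// Translate a label such as "Cuts_27c" into the list of variable names.
// Keys '0'-'9' map to indices 0-9 and 'a'-'c' to indices 10-12.
std::vector<std::string> decodeCutLabel(const std::string& label) {
    assert(momentNames().size() == 13);
    const std::string prefix = "Cuts_";
    if (label.compare(0, prefix.size(), prefix) != 0)
        throw std::invalid_argument("label must start with Cuts_");
    std::vector<std::string> vars;
    for (std::size_t i = prefix.size(); i < label.size(); ++i) {
        const char ch = label[i];
        int idx = -1;
        if (ch >= '0' && ch <= '9') idx = ch - '0';
        else if (ch >= 'a' && ch <= 'c') idx = 10 + (ch - 'a');
        if (idx < 0) throw std::invalid_argument("unknown variable key in label");
        vars.push_back(momentNames()[static_cast<std::size_t>(idx)]);
    }
    return vars;
}
```

For example, `decodeCutLabel("Cuts_27c")` yields cl_center_lambda_topo, cl_m1_dens_topo and cl_eng_frac_max_topo.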

“optimal” Set of Variables (ii)
The name “Cuts_xyz” is shorthand for cutting on the variables (x,y,z):
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"
0.2 < |eta| < < E_clus < 16 GeV:
--- MVA        Signal efficiency:
--- Methods:   @B=0.01   @B=0.10   @B=0.30
--- Cuts_25c :
--- Cuts_258 :
--- Cuts_25b :
--- Cuts_256 :
--- Cuts_5bc :
--- Cuts_25 :
--- Cuts_257 :
--- Cuts_25a :
--- Cuts_25c :
--- Cuts_5b :
--- Cuts_7b :
--- Cuts_278 :
--- Cuts_27 :
--- Cuts_279 :

“optimal” Set of Variables (iii)
The optimal set of variables seems to be eta (energy?) dependent.
The most prominent variables are: center_lambda, m2_dens, longitudinal, frac_em
Needs further investigation.
Let TMVA use these four variables and try some other discrimination techniques:
--- TMVA_Factory: Evaluation results ranked by best 'signal
--- MVA        Signal efficiency:              Signifi-  Sepa-    mu-Trans-
--- Methods:   @B=0.01   @B=0.10   @B=0.30     cance:    ration:  form:
--- TMlpANN :
--- Cuts :
--- Likelihood :
--- BDTGini :
--- PDERS :
--- Fisher :
--- TMVA_MethodFisher: ranked output (top variable is best ranked)
--- Variable : Coefficient: Discr. power:
--- cl_m1_dens_topo:
--- cl_center_lambda_topo:
--- cl_eng_frac_em_topo:
--- cl_longitudinal_topo:

Summary & Outlook
The rectangular cut method is a really competitive method for cluster classification.
Optimal cuts are calculated for each efficiency/background -> need to choose a working point.
Optimal sets of cuts for all bins of E and eta are currently being calculated. -> Then decide which variables to use finally.
Use TMVA_Reader (ROOT class).
Other methods are not yet fully tuned...
Since at least one variable is perfectly discriminating, one has to remove this variable and do a training on the remaining variables on top of it.

Code Example: Do the training in 72 bins!
// load data sets
TString datFileS = "data/e.dat";
TString datFileB = "data/pi.dat";
tmva_factory->SetInputTrees( datFileS, datFileB );
// which variables are used for discrimination
inputVars->push_back("cl_m2_r_topo");
inputVars->push_back("cl_m2_lambda_topo");
inputVars->push_back("cl_delta_phi_topo");
tmva_factory->SetInputVariables( inputVars );

// split data set and do training for EACH bin!
Double_t etaBins[25] = { 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8 };
Double_t energyBins[6] = { , 4000, 16000, 64000, , };
tmva_factory->BookMultipleMVAs("cl_e_topo", 5, &energyBins[0] );
tmva_factory->BookMultipleMVAs("cl_m1_eta_topo", 24, &etaBins[0] );
// choose method
inputVars->push_back("cl_m2_r_topo");
tmva_factory->BookMethod( "MethodCuts", "V:MC:500000:AllFSmart" );
tmva_factory->TrainAllMethods();
tmva_factory->TestAllMethods();
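Training one classifier per bin implies that every cluster has to be mapped to its (energy, eta) bin. How BookMultipleMVAs does this internally is not shown on the slides; the following is only a sketch of one possible lookup using the standard library. The functions `findBin` and `classifierIndex`, and the flattening of the two bin indices into one, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Index of the half-open bin [edges[i], edges[i+1]) that contains value,
// or -1 if value lies outside the binned range.
int findBin(const std::vector<double>& edges, double value) {
    assert(edges.size() >= 2 && std::is_sorted(edges.begin(), edges.end()));
    if (value < edges.front() || value >= edges.back()) return -1;
    auto it = std::upper_bound(edges.begin(), edges.end(), value);
    return static_cast<int>(it - edges.begin()) - 1;
}

// Flatten (energy bin, eta bin) into a single classifier index,
// or return -1 if either value is out of range.
int classifierIndex(const std::vector<double>& energyEdges,
                    const std::vector<double>& etaEdges,
                    double energy, double absEta) {
    const int iE = findBin(energyEdges, energy);
    const int iEta = findBin(etaEdges, absEta);
    if (iE < 0 || iEta < 0) return -1;
    const int nEta = static_cast<int>(etaEdges.size()) - 1;
    return iE * nEta + iEta;
}
```

With the 25 eta edges listed above, a cluster at |eta| = 2.1 falls into eta bin 10 (the 2.0 to 2.2 bin).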

Code Example: Apply Classification in Athena
// create TMVA_Reader object
TMVA_Reader *tmva = new TMVA_Reader( inputVars );
tmva->BookMultipleMVAs("cl_e_topo", 5, &energyBins[0] );
tmva->BookMultipleMVAs("cl_m1_eta_topo", 24, &etaBins[0] );
tmva->BookMVA( TMVA_Reader::LikeLiHood, "myweightfile" );
double mvaLKD = tmva->EvaluateMVA( varValues, multicutValues, TMVA_Reader::LikelihoodD );
