
1 Machine Learning with TMVA: A ROOT-based Tool for Multivariate Data Analysis
PANDA Computing Workshop, Groningen, 23.1.2008
The TMVA developer team: Andreas Höcker, Peter Speckmayer, Jörg Stelzer, Helge Voss

2 General Event Classification Problem
Event described by k variables (that are found to be discriminating): $(x_i) \in \mathbb{R}^k$
Events can be classified into n categories: $H_1 \dots H_n$
General classifier: $f: \mathbb{R}^k \to \mathbb{N}$, $(x_i) \mapsto \{1,\dots,n\}$
  TMVA: only n=2, commonly the case in HEP (signal/background)
Most classification methods: $f: \mathbb{R}^k \to \mathbb{R}^d$, $(x_i) \mapsto (y_i)$
  Further: $\mathbb{R}^d \to \mathbb{N}$, $(y_i) \mapsto \{1,\dots,n\}$
  TMVA: d=1, with $y \ge y_{sep}$: signal, $y < y_{sep}$: background
Example: k=2, n=3
Slide footer: DESY, Hamburg 14.1.2008, Multivariate Analysis with TMVA - Jörg Stelzer
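The two-class case with a scalar response (d=1) can be sketched in a few lines. This is only an illustration of the cut on $y_{sep}$: the names `EventClass` and `classify` are hypothetical and not part of TMVA, and the response y is assumed to have been computed already by some classifier.

```cpp
// Two-class decision from a scalar classifier response (d = 1).
// EventClass and classify are illustrative names, not TMVA identifiers.
enum class EventClass { Background, Signal };

// y >= ySep is called signal, y < ySep is called background.
EventClass classify(double y, double ySep) {
    return (y >= ySep) ? EventClass::Signal : EventClass::Background;
}
```

Moving `ySep` trades signal efficiency against background rejection, which is why the choice of working point is treated separately from the training.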

3 Outline
Introduction to the event classification problem
Classifiers in TMVA
Usage of TMVA

4 General Event Classification Problem
Example: k=2 variables $x_{1,2}$, n=3 categories $H_1$, $H_2$, $H_3$
The problem: how to draw the boundaries between $H_1$, $H_2$, and $H_3$ such that f(x) returns the true nature of x with maximum correctness
[Figure: three panels in the ($x_1$, $x_2$) plane showing $H_1$, $H_2$, $H_3$ separated by rectangular cuts, linear boundaries, and non-linear boundaries]
Simple example → I can do it by hand.

5 General Event Classification Problem
Large input variable space, complex correlations: manual optimization is very difficult
1. What is the optimal boundary f(x) to separate the categories?
2. More pragmatic: which classifier is best at finding this optimal boundary (or estimating it most closely)? → Machine Learning
Two general ways to build f(x):
  Supervised learning: in an event sample the category of each event is known. The machine adapts to give the smallest misclassification error on the training sample.
  Unsupervised learning: the correct category of each event is unknown. The machinery tries to discover structures in the dataset.
All classifiers in TMVA are supervised learning methods

6 Classification Problems in HEP
In HEP mostly two-class problems: signal (S) and background (B)
  Event level (Higgs searches, …)
  Cone level (tau-vs-jet reconstruction, …)
  Track level (particle identification, …)
  Lifetime and flavour tagging (b-tagging, …)
  ...
Input information
  Kinematic variables (masses, momenta, decay angles, …)
  Event properties (jet/lepton multiplicity, sum of charges, …)
  Event shape (sphericity, Fox-Wolfram moments, …)
  Detector response (silicon hits, dE/dx, Cherenkov angle, shower profiles, muon hits, …)
  …

7 Classifiers in TMVA

8 Conventional Linear Classifiers
Cut Based
  Widely used because transparent
  Machine optimization is challenging:
    MINUIT fails for large n due to sparse population of the input parameter space
    Alternatives are Monte Carlo sampling, Genetic Algorithms, Simulated Annealing
Projective Likelihood Estimator
  Probability density estimators for each variable, combined into one
  Much liked in HEP: returns the likelihood of a sample belonging to a class
  Projection ignores correlations between variables
    Significant performance loss for correlated variables
Linear Fisher Discriminant
  Axis in parameter space on which samples are projected, chosen such that signal and background are pushed far away from each other
  Optimal classifier for linearly correlated Gaussian-distributed variables
  Means of signal and background must be different
  R.A. Fisher, Annals Eugenics 7, 179 (1936).
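The projective likelihood estimator can be sketched as follows. The likelihood ratio $y = L_S / (L_S + L_B)$, with each $L$ the product of one-dimensional PDFs, is the usual form of this estimator. Everything else here is an assumption made for illustration: the per-variable PDFs are taken to be Gaussians (a real analysis would estimate them from training histograms), and the names `Gauss1D`, `projectedLikelihood` and `likelihoodRatio` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-dimensional Gaussian PDF, a stand-in for the per-variable PDFs
// (assumption for this sketch; real PDFs come from training data).
struct Gauss1D {
    double mean;
    double sigma;
    double operator()(double x) const {
        const double kPi = 3.14159265358979323846;
        const double z = (x - mean) / sigma;
        return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
    }
};

// Product of the per-variable PDFs. Correlations between the variables
// are ignored by construction, which is the weakness noted on the slide.
double projectedLikelihood(const std::vector<Gauss1D>& pdfs,
                           const std::vector<double>& x) {
    double likelihood = 1.0;
    for (std::size_t k = 0; k < pdfs.size(); ++k) likelihood *= pdfs[k](x[k]);
    return likelihood;
}

// Likelihood ratio y = L_S / (L_S + L_B); values near 1 are signal-like.
// Assumes at least one of the two likelihoods is non-zero.
double likelihoodRatio(const std::vector<Gauss1D>& sig,
                       const std::vector<Gauss1D>& bkg,
                       const std::vector<double>& x) {
    const double ls = projectedLikelihood(sig, x);
    const double lb = projectedLikelihood(bkg, x);
    return ls / (ls + lb);
}
```

For two variables with signal centred at +1 and background at -1, an event at the origin gets y = 0.5, and events closer to the signal means get values above 0.5.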

9 Common Non-linear Classifiers
Neural Network
  Feed-forward multilayer perceptron
  Non-linear activation function for each neuron
  Weierstrass theorem: can approximate any continuous function to arbitrary precision with a single hidden layer and an infinite number of neurons
PDE Range-Search, k Nearest Neighbours
  n-dimensional signal and background PDF; probability obtained by counting the number of signal and background events in the vicinity of the test event
  Range search: vicinity is a predefined volume
  k nearest neighbours: adaptive (k events in volume)
  T. Carli and B. Koblitz, Nucl. Instrum. Meth. A501, 576 (2003) [hep-ex/0211019]
Function Discriminant Analysis
  User-provided separation function fitted to the training data
  Simple, transparent discriminator for non-linear problems
  In-between solution (better than Fisher, but not good for complex examples)

10 Classifiers Recent in HEP: Boosted Decision Trees
Decision tree: a series of cuts that split the sample set into ever smaller sets; leaves are assigned either signal or background status
  Each split tries to maximize the gain in separation (Gini index)
Bottom-up pruning of a decision tree
  Protects from overtraining (*) by removing statistically insignificant nodes
A single decision tree is easy to understand but not powerful
Boosting
  Increase the weight of incorrectly identified events and build a new decision tree
  Final classifier: 'forest' of decision trees, linearly combined
    Large coefficient for a tree with small misclassification
  Improved performance and stability
  Little tuning required for good performance
Example: D0 single top
(*) Overtraining: performance on the training sample is statistically better than on an independent test sample
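The two quantities that drive a boosted decision tree can be written down compactly. The sketch below assumes the Gini index in the form p(1 - p) with purity p = s/(s + b), and the standard AdaBoost prescription for the event reweighting and tree coefficient. The slide does not state which boosting variant is meant, so treat the AdaBoost part as an assumption. All function names are hypothetical.

```cpp
#include <cmath>

// Gini index of a node holding s signal and b background (weighted) events:
// p * (1 - p) with purity p = s / (s + b). It is zero for a pure node.
double gini(double s, double b) {
    const double n = s + b;
    if (n <= 0.0) return 0.0;
    const double p = s / n;
    return p * (1.0 - p);
}

// Separation gain of a cut that splits a parent node into a left and a
// right child, each index weighted by the number of events in the node.
double separationGain(double sL, double bL, double sR, double bR) {
    const double nL = sL + bL;
    const double nR = sR + bR;
    return (nL + nR) * gini(sL + sR, bL + bR)
           - nL * gini(sL, bL) - nR * gini(sR, bR);
}

// AdaBoost (assumed variant): events misclassified by a tree with
// misclassification rate err (0 < err < 0.5) get their weight multiplied
// by this factor before the next tree is grown.
double boostWeight(double err) { return (1.0 - err) / err; }

// Coefficient of the tree in the final linear combination: large for a
// tree with small misclassification rate.
double treeCoefficient(double err) { return std::log(boostWeight(err)); }
```

A cut that separates 10 signal from 10 background events perfectly has gain 20 × 0.25 = 5, while a cut that leaves both children with the parent's purity has gain 0.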

11 Classifiers Recent in HEP
Learning via Rule Ensembles
  A rule is a set of cuts, defining regions in the input parameter space
  Rules are extracted from a forest of decision trees (either from a BDT, or from a random forest generator)
  Linear combinations of rules, coefficients fitted by minimizing the risk of misclassification
  Good performance
  J. Friedman and B.E. Popescu, “Predictive Learning via Rule Ensembles”, Technical Report, Statistics Department, Stanford University, 2004.
Support Vector Machines
  Optimal hyperplane between linearly separable data (1962)
  Wrongly classified events add an extra term to the cost function, which is minimized
  Non-separable data becomes linearly separable in higher dimensions, via a mapping $\Phi$ from $\mathbb{R}^n$ into a higher-dimensional space
  Kernel trick (suggested 1964, applied to SVM 1992)
    Cost function depends only on $\Phi(x) \cdot \Phi(y) = K(x,y)$; no explicit knowledge of $\Phi$ required
  C. Cortes and V. Vapnik, “Support vector networks”, Machine Learning, 20, 273 (1995).
[Figure: separable data in the ($x_1$, $x_2$) plane with the optimal hyperplane, the margin, and the support vectors]
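The kernel trick can be illustrated with a Gaussian kernel, for which the mapping into the higher-dimensional space never has to be written out. This is a sketch of the response evaluation only. It assumes the support vectors, their coefficients and the offset b have already been determined by a training step that is not shown, and the names `gaussKernel`, `SupportVector` and `svmResponse` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian kernel K(x, y) = exp(-|x - y|^2 / (2 sigma^2)). It plays the role
// of the scalar product in the higher-dimensional space.
double gaussKernel(const std::vector<double>& x, const std::vector<double>& y,
                   double sigma) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) d2 += (x[i] - y[i]) * (x[i] - y[i]);
    return std::exp(-d2 / (2.0 * sigma * sigma));
}

struct SupportVector {
    std::vector<double> x;  // position of the support vector
    double alphaY;          // Lagrange multiplier times label (+1 signal, -1 background)
};

// Response f(x) = sum_i alpha_i y_i K(x_i, x) + b; positive values are
// signal-like, negative values background-like.
double svmResponse(const std::vector<SupportVector>& svs,
                   const std::vector<double>& x, double b, double sigma) {
    double f = b;
    for (const SupportVector& sv : svs) f += sv.alphaY * gaussKernel(sv.x, x, sigma);
    return f;
}
```

Only kernel evaluations between the test event and the support vectors enter the response, so the cost of applying the classifier scales with the number of support vectors rather than with the dimension of the implicit space.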

12 Data Preprocessing: Decorrelation
Various classifiers perform sub-optimally in the presence of correlations between input variables (Cuts, Projective LH); others are slower (BDT, RuleFit)
Removal of linear correlations by rotating input variables
  Determine the square root $C'$ of the covariance matrix $C$, i.e., $C = C'C'$
  Transform the original $(x_i)$ into the decorrelated variable space $(x'_i)$ by: $x' = C'^{-1} x$
  Also implemented: Principal Component Analysis (PCA)
Note that decorrelation is only complete if
  Correlations are linear
  Input variables are Gaussian distributed
  Not a very accurate conjecture in general
[Figure: scatter plots of the input variables: original, SQRT decorrelated, PCA decorrelated]
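For two input variables the square-root decorrelation can be written in closed form, which makes the transformation easy to check by hand. The sketch below assumes a symmetric positive-definite 2×2 covariance matrix and uses the identity $C' = (C + sI)/t$ with $s = \sqrt{\det C}$ and $t = \sqrt{\operatorname{tr} C + 2s}$. A general implementation would diagonalize $C$ instead. The types `Sym2` and `Vec2` and the function names are hypothetical.

```cpp
#include <cmath>

struct Sym2 { double a, b, d; };  // symmetric 2x2 matrix [[a, b], [b, d]]
struct Vec2 { double x1, x2; };

// Square root C' of a symmetric positive-definite 2x2 matrix C, so that
// C'C' = C. Closed form: C' = (C + s I) / t with s = sqrt(det C) and
// t = sqrt(tr C + 2 s).
Sym2 sqrtSym2(const Sym2& c) {
    const double s = std::sqrt(c.a * c.d - c.b * c.b);
    const double t = std::sqrt(c.a + c.d + 2.0 * s);
    return {(c.a + s) / t, c.b / t, (c.d + s) / t};
}

// Decorrelated variables x' = C'^{-1} x, using the explicit 2x2 inverse.
Vec2 decorrelate(const Sym2& c, const Vec2& x) {
    const Sym2 r = sqrtSym2(c);
    const double det = r.a * r.d - r.b * r.b;
    return {( r.d * x.x1 - r.b * x.x2) / det,
            (-r.b * x.x1 + r.a * x.x2) / det};
}
```

For an uncorrelated covariance matrix with variances 4 and 9 the square root is diag(2, 3), so the transformation simply divides each variable by its standard deviation.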

13 Is there a best Classifier?
Performance
  In the presence/absence of linear/nonlinear correlations
Speed
  Training / evaluation time
Robustness, stability
  Sensitivity to overtraining, weak input variables
  Size of training sample
Dimensional scalability
  Do performance, speed, and robustness deteriorate with large dimensions?
Clarity
  Can the learning procedure/result be easily understood/visualized?

14 No Single Best
[Table: classifiers rated against criteria; the rating symbols are not preserved in this transcript]
Classifiers (columns): Cuts, Likelihood, PDERS/k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit, SVM
Criteria (rows):
  Performance: no / linear correlations; nonlinear correlations
  Speed: training; response
  Robustness: overtraining; weak input variables
  Curse of dimensionality
  Clarity

15 What is TMVA
Motivation:
  Classifiers perform very differently depending on the data → all should be tested on a given problem
  Situation for many years: usually only a small number of classifiers were investigated by analysts
  Needed: a tool that enables the analyst to simultaneously evaluate the performance of a large number of classifiers on his/her dataset
Design criteria: performance and convenience (a good tool does not have to be difficult to use)
  Training, testing, and evaluation of many classifiers in parallel
  Preprocessing of input data: decorrelation (PCA, Gaussianization)
  Illustrative tools to compare the performance of all classifiers (ranking of classifiers, ranking of input variables, choice of working point)
  Actively protect against overtraining
  Straightforward application to test data
Special needs of high energy physics addressed
  Two classes, event weights, familiar terminology

16 Using TMVA
A typical TMVA analysis consists of two main steps:
1. Training phase: training, testing and evaluation of classifiers using data samples with known signal and background composition
2. Application phase: using selected trained classifiers to classify unknown data samples

17 Technical Aspects
TMVA is open source, written in C++, and based on and part of ROOT
  Development on SourceForge, where all the information is available: http://sf.tmva.net
  Bundled with ROOT since 5.11-03
  Training requires a ROOT environment; resulting classifiers are also available as standalone C++ code
Four core developers, many contributors
  > 2200 downloads since Mar 2006 (not counting ROOT users)
  Mailing list for reporting problems
Users Guide at http://sf.tmva.net: 97 pages, classifier descriptions, code examples
  arXiv physics/0703039

18 Quick Start
    # set up the ROOT 5.18.00 environment
    cd /afs/cern.ch/sw/lcg/external/root/5.18.00/slc4_ia32_gcc34/root
    . bin/thisroot.sh
    # copy the TMVA examples; the directory needs to be called macros
    cd ~
    cp -r $ROOTSYS/tmva/test macros; cd macros
    # train and evaluate the selected classifiers
    root -l TMVAnalysis.C\(\"MLP,BDT,SVM_Gauss\"\)

19 Training with TMVA
User usually starts with the template TMVAnalysis.C
  Choose training variables
  Choose input data
  Select classifiers (adjust training options, which are described in the manual and printed by specifying option ‘H’)
Template TMVAnalysis.C (also .py) available at $TMVA/macros/ and $ROOTSYS/tmva/test/
[Figure: TMVA GUI]

20 Evaluation Output
Evaluation results ranked by best signal efficiency and purity (area); better classifiers are listed first:
    MVA           Signal efficiency at bkg eff. (error):    |  Sepa-    Signifi-
    Methods:      @B=0.01    @B=0.10    @B=0.30    Area     |  ration:  cance:
    ------------------------------------------------------------------------------
    Fisher      : 0.268(03)  0.653(03)  0.873(02)  0.882    |  0.444    1.189
    MLP         : 0.266(03)  0.656(03)  0.873(02)  0.882    |  0.444    1.260
    LikelihoodD : 0.259(03)  0.649(03)  0.871(02)  0.880    |  0.441    1.251
    PDERS       : 0.223(03)  0.628(03)  0.861(02)  0.870    |  0.417    1.192
    RuleFit     : 0.196(03)  0.607(03)  0.845(02)  0.859    |  0.390    1.092
    HMatrix     : 0.058(01)  0.622(03)  0.868(02)  0.855    |  0.410    1.093
    BDT         : 0.154(02)  0.594(04)  0.838(03)  0.852    |  0.380    1.099
    CutsGA      : 0.109(02)  1.000(00)  0.717(03)  0.784    |  0.000    0.000
    Likelihood  : 0.086(02)  0.387(03)  0.677(03)  0.757    |  0.199    0.682
    ------------------------------------------------------------------------------
Testing efficiency compared to training efficiency (overtraining check):
    MVA           Signal efficiency: from test sample (from training sample)
    Methods:      @B=0.01          @B=0.10          @B=0.30
    ------------------------------------------------------------------------------
    Fisher      : 0.268 (0.275)    0.653 (0.658)    0.873 (0.873)
    MLP         : 0.266 (0.278)    0.656 (0.658)    0.873 (0.873)
    LikelihoodD : 0.259 (0.273)    0.649 (0.657)    0.871 (0.872)
    PDERS       : 0.223 (0.389)    0.628 (0.691)    0.861 (0.881)
    RuleFit     : 0.196 (0.198)    0.607 (0.616)    0.845 (0.848)
    HMatrix     : 0.058 (0.060)    0.622 (0.623)    0.868 (0.868)
    BDT         : 0.154 (0.268)    0.594 (0.736)    0.838 (0.911)
    CutsGA      : 0.109 (0.123)    1.000 (0.424)    0.717 (0.715)
    Likelihood  : 0.086 (0.092)    0.387 (0.379)    0.677 (0.677)
    ------------------------------------------------------------------------------
Remark on overtraining
  Occurs when classifier training becomes sensitive to the events of the particular training sample, rather than just to the generic features
  Sensitivity to overtraining depends on the classifier: e.g., Fisher insensitive, BDT very sensitive
  Detect overtraining: compare performance between training and test sample
  Counteract overtraining: e.g., smooth likelihood PDFs, prune decision trees, …

21 More Evaluation Output
Input variable ranking: how useful is a variable? (better variables are listed first)
    --- Fisher : Ranking result (top variable is best ranked)
    --- Fisher : ----------------------------------------
    --- Fisher : Rank : Variable : Discr. power
    --- Fisher : ----------------------------------------
    --- Fisher :    1 : var4     : 2.175e-01
    --- Fisher :    2 : var3     : 1.718e-01
    --- Fisher :    3 : var1     : 9.549e-02
    --- Fisher :    4 : var2     : 2.841e-02
    --- Fisher : ----------------------------------------
Classifier correlation and overlap: do classifiers perform the same separation into signal and background?
    --- Factory : Inter-MVA overlap matrix (signal):
    --- Factory : ------------------------------
    --- Factory :             Likelihood  Fisher
    --- Factory : Likelihood:     +1.000  +0.667
    --- Factory :     Fisher:     +0.667  +1.000
    --- Factory : ------------------------------
If two classifiers have similar performance, but significant non-overlapping classifications → check if you can combine them!

22 Graphical Evaluation
Classifier output distributions for independent test sample:

23 Graphical Evaluation
There is no unique way to express the performance of a classifier → several benchmark quantities are computed by TMVA
  Signal efficiency at various background efficiencies (= 1 – rejection) when cutting on the classifier output
  The separation
  “Rarity” implemented (background flat):
    Comparison of signal shapes between different classifiers
    Quick check: background on data should be flat

24 Visualization Using the GUI
Projective likelihood PDFs, MLP training, BDTs, …
[Figure: GUI plots; the BDT panel reports the average no. of nodes before/after pruning: 4193 / 968]

25 Choosing a Working Point
Depending on the problem the user might want to
  Achieve a certain signal purity, signal efficiency, or background reduction, or
  Find the selection that results in the highest signal significance (depending on the expected signal and background statistics)
Using the TMVA graphical output one can determine at which classifier output value to cut in order to separate signal from background
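Finding the cut with the highest signal significance can be sketched as a simple scan. The sketch assumes the significance measure $S/\sqrt{S+B}$, which is one common choice; the slide does not fix the definition. The names `WorkingPoint` and `bestCut`, and the representation of the test samples as plain vectors of classifier outputs, are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct WorkingPoint {
    double cut;           // classifier output value to cut at
    double significance;  // S / sqrt(S + B) at that cut, -1 if no cut was usable
};

// Fraction of entries with y >= cut.
double efficiency(const std::vector<double>& y, double cut) {
    if (y.empty()) return 0.0;
    std::size_t nPass = 0;
    for (double v : y) nPass += (v >= cut) ? 1 : 0;
    return static_cast<double>(nPass) / static_cast<double>(y.size());
}

// Scan the candidate cuts and return the one with the highest significance.
// sigY and bkgY are classifier outputs of test events; nSigExpected and
// nBkgExpected are the expected yields before the cut.
WorkingPoint bestCut(const std::vector<double>& sigY,
                     const std::vector<double>& bkgY,
                     double nSigExpected, double nBkgExpected,
                     const std::vector<double>& cuts) {
    WorkingPoint best{0.0, -1.0};
    for (double cut : cuts) {
        const double s = nSigExpected * efficiency(sigY, cut);
        const double b = nBkgExpected * efficiency(bkgY, cut);
        if (s + b <= 0.0) continue;  // nothing passes: significance undefined
        const double z = s / std::sqrt(s + b);
        if (z > best.significance) best = {cut, z};
    }
    return best;
}
```

Because the result depends on the expected yields, the same trained classifier can have different optimal working points in analyses with different signal-to-background ratios.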

26 Applying the Trained Classifier
Use the TMVA::Reader class, example in TMVApplication.C:
  Set input variables
  Book classifier with the weight file (contains all information)
  Compute classifier response inside the event loop → use it
Template TMVApplication.C available at $TMVA/macros/ and $ROOTSYS/tmva/test/

27 Applying the Trained Classifier (II)
Also standalone C++ class without ROOT dependence
Can be put into an executable (ideal for GRID jobs)
Example from ClassApplication.C:
    // names of the input variables, in the order used for training
    std::vector<std::string> inputVars;
    …
    classifier = new ReadMLP ( inputVars );
    for (int i=0; i<nEv; i++) {
       // input variable values of the current event
       std::vector<double>* inputVec = …;
       double retval = classifier->GetMvaValue( *inputVec );
    }

28 Extending TMVA
A user might have his or her own implementation of a multivariate classifier, or might want to use an external one
With ROOT 5.18.00 (16.Jan.08) users can seamlessly evaluate and compare their own classifier within TMVA:
1. Requirement: the user's class must be derived from TMVA::MethodBase and must implement the TMVA::IMethod interface
2. The class must be added to the factory via ROOT’s plugin mechanism
3. Training, testing, evaluation, and comparison can then be done as usual
Example in TMVAnalysis.C

29 Concluding Remarks
Multivariate classifiers are no black boxes, we just need to understand them
  Cuts and Likelihood are transparent → if they perform, use them
  In the presence of correlations other classifiers are better
    Difficult to understand at any rate
Enormous growth in acceptance in HEP in the recent decade
TMVA provides the means to train, evaluate, compare, and apply different classifiers
TMVA also tries, through visualization, to improve the understanding of the internals of each classifier
Acknowledgments: The fast development of TMVA would not have been possible without the contribution and feedback from many developers and users to whom we are indebted. We thank in particular the CERN Summer students Matt Jachowski (Stanford) for the implementation of TMVA's new MLP neural network, Yair Mahalalel (Tel Aviv) for a significant improvement of PDERS, and Or Cohen for the development of the general classifier boosting, the Krakow student Andrzej Zemla and his supervisor Marcin Wolter for programming a powerful Support Vector Machine, as well as Rustem Ospanov for the development of a fast k-NN algorithm. We are grateful to Doug Applegate, Kregg Arms, René Brun and the ROOT team, Tancredi Carli, Zhiyi Liu, Elzbieta Richter-Was, Vincent Tisserand and Alexei Volk for helpful conversations.

30 Outlook
Primary development from this summer: generalized classifiers
1. Be able to boost or bag any classifier
2. Combine any classifier with any other classifier using any combination of input variables in any phase space region
Item 1 is ready and now in testing mode. To be deployed after the upcoming ROOT release.

31 Additional Information

32 A Word on Treatment of Systematics?
Some things could be done. Example: var4 may in reality have a shifted central value and hence a worse discrimination power
One can ignore the systematic in the training:
  var4 appears stronger in training than it might be → suboptimal performance (bad training, not wrong)
  Classifier response will strongly depend on “var4”, and hence will have a larger systematic uncertainty
Better: train with shifted (weakened) var4
  Then evaluate the systematic error on the classifier output
There is no difference in principle in systematics evaluation between single discriminating variables and MV classifiers
  Control sample to estimate the uncertainty on the classifier output (not necessarily for each input variable)
  Advantage: correlations are automatically taken into account

33 Checker Board Example
Performance achieved without parameter tuning: PDERS and BDT are the best “out of the box” classifiers
After specific tuning, SVM and MLP also perform well
[Figure: classifier performance on the checker board example, compared with the theoretical maximum]

34 Linear-, Cross-, Circular Correlations
Illustrate the behavior of linear and nonlinear classifiers
[Figure: three toy datasets]
  Linear correlations (same for signal and background)
  Linear correlations (opposite for signal and background)
  Circular correlations (same for signal and background)

35 Linear-, Cross-, Circular Correlations
Plot test events weighted by classifier output (red: signal-like, blue: background-like)
[Figure: one row of panels per dataset, one column per classifier: Likelihood, Likelihood-D, PDERS, Fisher, MLP, BDT]
  Linear correlations (same for signal and background)
  Cross-linear correlations (opposite for signal and background)
  Circular correlations (same for signal and background)

36 Final Performance
Background rejection versus signal efficiency curve:
[Figure: three panels: Linear Example, Cross Example, Circular Example]

37 Stability with Respect to Irrelevant Variables
Toy example with 2 discriminating and 4 non-discriminating variables:
[Figure: two panels: "use only two discriminant variables in classifiers" and "use all discriminant variables in classifiers"]

38 TMVAnalysis.C Script for Training
    void TMVAnalysis( )
    {
       // create Factory
       TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );
       TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

       // give training/test trees
       TFile *input = TFile::Open("tmva_example.root");
       factory->AddSignalTree    ( (TTree*)input->Get("TreeS"), 1.0 );
       factory->AddBackgroundTree( (TTree*)input->Get("TreeB"), 1.0 );

       // register input variables
       factory->AddVariable("var1+var2", 'F');
       factory->AddVariable("var1-var2", 'F');
       factory->AddVariable("var3", 'F');
       factory->AddVariable("var4", 'F');
       factory->PrepareTrainingAndTestTree("",
          "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

       // select MVA methods
       factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
          "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
       factory->BookMethod( TMVA::Types::kMLP, "MLP",
          "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

       // train, test and evaluate
       factory->TrainAllMethods();
       factory->TestAllMethods();
       factory->EvaluateAllMethods();

       outputFile->Close();
       delete factory;
    }

39 TMVApplication.C Script for Application
    void TMVApplication( )
    {
       // create Reader
       TMVA::Reader *reader = new TMVA::Reader("!Color");

       // register the variables
       Float_t var1, var2, var3, var4;
       reader->AddVariable( "var1+var2", &var1 );
       reader->AddVariable( "var1-var2", &var2 );
       reader->AddVariable( "var3", &var3 );
       reader->AddVariable( "var4", &var4 );

       // book classifier(s)
       reader->BookMVA( "MLP classifier", "weights/MVAnalysis_MLP.weights.txt" );

       // prepare event loop
       TFile *input = TFile::Open("tmva_example.root");
       TTree* theTree = (TTree*)input->Get("TreeS");
       // … set branch addresses for user TTree

       for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
          theTree->GetEntry(ievt);

          // compute input variables
          var1 = userVar1 + userVar2;
          var2 = userVar1 - userVar2;
          var3 = userVar3;
          var4 = userVar4;

          // calculate classifier output
          Double_t out = reader->EvaluateMVA( "MLP classifier" );
          // do something with it …
       }
       delete reader;
    }

