Slide 1: TMVA, Toolkit for Multivariate Data Analysis with ROOT
Helge Voss, MPI-K Heidelberg, on behalf of Andreas Höcker, Fredrik Tegenfeld and Joerg Stelzer
ACAT 2007, Nikhef, 23rd-27th April 2007
http://tmva.sourceforge.net/ | arXiv: physics/0703039
TMVA supplies an environment to easily:
- apply different sophisticated data-selection algorithms
- have them all trained, tested and evaluated
- find the best one for your selection problem
Further contributors: A. Christov, S. Henrot-Versillé, M. Jachowski, A. Krasznahorkay Jr., Y. Mahalalel, X. Prudent, P. Speckmayer, M. Wolter, A. Zemla
Slide 2: Motivation / Outline
ROOT is the analysis framework used by most HEP physicists. The idea behind TMVA: rather than just implementing new MVA techniques and making them somehow available in ROOT (as, e.g., TMultiLayerPerceptron does), provide one common platform and interface for all MVA classifiers, making it easy to use and compare different classifiers, all trained and tested on the same data sample and evaluated consistently.
Outline: introduction; the MVA classifiers available in TMVA; demonstration with toy examples; summary.
Slide 3: Multivariate Event Classification
All multivariate classifiers condense (correlated) multi-variable input information into a single scalar output variable, i.e. a mapping R^n -> R with, e.g., y(background) near 0 and y(signal) near 1. This leaves one variable to base your decision on.
Slide 4: What is in TMVA
TMVA currently includes:
- rectangular cut optimisation
- projective and multi-dimensional likelihood estimators
- Fisher discriminant and H-Matrix (chi-squared estimator)
- artificial neural networks (3 different implementations)
- boosted/bagged decision trees
- rule fitting
- support vector machines
The TMVA package provides training, testing and evaluation of the classifiers; each classifier provides a ranking of the input variables; classifiers produce weight files that are read by a Reader class for MVA application. All classifiers are highly customizable. Common pre-processing of the input is available (de-correlation, principal component analysis), as is support for arbitrary pre-selections and individual event weights. TMVA is integrated in ROOT (since release 5.11/03) and very easy to use!
Slide 5: Preprocessing the Input Variables: Decorrelation
Linear correlations can be removed by rotating the input variables, either using the square root of the correlation matrix or using Principal Component Analysis; this is realised commonly for all methods in TMVA (centrally in the DataSet class). The plots show the original, SQRT-decorrelated and PCA-decorrelated distributions. Note that this "de-correlation" is only complete if the input variables are Gaussian and the correlations are linear; in practice the gain from de-correlation is often rather modest, or even harmful.
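As a rough sketch of the square-root variant: an eigen-decomposition C = S D S^T of the covariance (or correlation) matrix gives C^(-1/2) = S D^(-1/2) S^T, which is then applied to each input vector. The sketch below assumes ROOT's linear-algebra classes; the function name and layout are illustrative, not TMVA internals.

#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include "TMath.h"

// Transform x -> C^(-1/2) x so that the components of x become uncorrelated.
TVectorD Decorrelate(const TMatrixDSym& cov, const TVectorD& x)
{
   TMatrixDSymEigen eigen(cov);                    // C = S D S^T
   const TMatrixD& S = eigen.GetEigenVectors();
   const TVectorD& d = eigen.GetEigenValues();

   TMatrixD dInvSqrt(d.GetNrows(), d.GetNrows());  // diagonal D^(-1/2)
   for (Int_t i = 0; i < d.GetNrows(); ++i)
      dInvSqrt(i, i) = 1.0 / TMath::Sqrt(d(i));

   TMatrixD invSqrt = S * dInvSqrt * TMatrixD(TMatrixD::kTransposed, S);
   return invSqrt * x;                             // decorrelated inputs
}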
Slide 6: Cut Optimisation
Simplest method: cut in a rectangular volume, scanning the signal efficiency over [0,1] and maximising the background rejection; from this scan, the optimal working point in terms of signal and background numbers can be derived. The cuts can be applied in the normal variable space or in the de-correlated variable space. The technical problem is how to perform the optimisation: TMVA uses random sampling, Simulated Annealing or a Genetic Algorithm. For speed, the volume search sorts the training events into Binary Search Trees.
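A minimal sketch of the random-sampling strategy, assuming events whose variables are normalised to [0,1] and using a plain linear scan over both classes instead of TMVA's binary search trees (the struct and function names are illustrative):

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; };

// Sample random rectangular boxes and keep the best background rejection found
// near the requested signal efficiency (one point of the efficiency scan).
// Assumes the sample contains both signal and background events.
double BestBoxRejection(const std::vector<Event>& events, std::size_t nVar,
                        double targetEff, int nTrials = 10000)
{
   std::mt19937 rng(42);
   std::uniform_real_distribution<double> u(0.0, 1.0);
   double best = 0.0;
   for (int t = 0; t < nTrials; ++t) {
      std::vector<double> lo(nVar), hi(nVar);
      for (std::size_t k = 0; k < nVar; ++k) {    // draw a random box
         double a = u(rng), b = u(rng);
         lo[k] = std::min(a, b);
         hi[k] = std::max(a, b);
      }
      double nS = 0, nB = 0, passS = 0, passB = 0;
      for (const Event& ev : events) {
         (ev.isSignal ? nS : nB) += 1;
         bool pass = true;
         for (std::size_t k = 0; k < nVar && pass; ++k)
            pass = ev.x[k] > lo[k] && ev.x[k] < hi[k];
         if (pass) (ev.isSignal ? passS : passB) += 1;
      }
      double effS = passS / nS, rejB = 1.0 - passB / nB;
      if (std::fabs(effS - targetEff) < 0.02)      // near the working point
         best = std::max(best, rejB);
   }
   return best;
}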
Slide 7: Projective Likelihood Estimator (PDE Approach)
Combine the probabilities from the different discriminating variables for an event to be signal- or background-like. The likelihood ratio for event i is y(i) = L_S(i) / (L_S(i) + L_B(i)), where each species likelihood L_U(i), for U = signal, background, is the product of the per-variable PDFs evaluated at that event's variable values. This is optimal if there are no correlations and the PDFs are correct (known); usually that is not true, hence the development of the other methods.
The technical problem is how to implement the reference PDFs. Three ways: counting (automatic and unbiased, but suboptimal), function fitting (difficult to automate), and parametric fitting with splines or kernel estimators (easy to automate, but can create artefacts). TMVA uses splines of order 0-5 and kernel estimators.
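A minimal sketch of the resulting estimator, assuming the per-variable reference PDFs are already available as callables (the PDF alias and function name are illustrative):

#include <cstddef>
#include <functional>
#include <vector>

using PDF = std::function<double(double)>;

// y = L_S / (L_S + L_B), with L_U the product over variables of pdf_U(x_k).
double LikelihoodRatio(const std::vector<double>& x,
                       const std::vector<PDF>& sigPdf,
                       const std::vector<PDF>& bkgPdf)
{
   double ls = 1.0, lb = 1.0;
   for (std::size_t k = 0; k < x.size(); ++k) {
      ls *= sigPdf[k](x[k]);   // signal PDF of variable k
      lb *= bkgPdf[k](x[k]);   // background PDF of variable k
   }
   return ls / (ls + lb);
}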
Slide 8: Multidimensional Likelihood Estimator (PDERS)
Generalisation of the 1D PDE approach to N_var dimensions: count the number of signal and background events in the "vicinity" of a test event. The volume can be of fixed size or adaptive (the latter gives kNN-type classifiers) and rectangular or spherical; multi-dimensional kernels (Gaussian, triangular, ...) can be used to weight the events within the volume. This would be the optimal method, in theory, if the "true N-dim PDF" were known; the practical challenge is deriving the N-dim PDF from the training sample. TMVA's implementation is the range search PDERS (Carli-Koblitz, NIM A501, 576 (2003)); the range search is sped up by sorting the training events into binary trees.
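A minimal sketch of a fixed-size, rectangular range search, using a linear scan instead of the binary trees and without kernel weighting (the event container and names are illustrative):

#include <cmath>
#include <cstddef>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; };

// Count signal/background training events inside a box of half-width h around
// the test point and return nS / (nS + nB).
double PDERS(const std::vector<Event>& training,
             const std::vector<double>& test, double h)
{
   double nS = 0, nB = 0;
   for (const Event& ev : training) {
      bool inside = true;
      for (std::size_t k = 0; k < test.size() && inside; ++k)
         inside = std::fabs(ev.x[k] - test[k]) < h;
      if (!inside) continue;
      (ev.isSignal ? nS : nB) += 1.0;
   }
   return (nS + nB > 0) ? nS / (nS + nB) : 0.5;   // no neighbours: undecided
}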
Slide 9: Fisher Discriminant (and H-Matrix)
Well-known, simple and elegant classifier: determine the linear variable transformation in which linear correlations are removed and the mean values of signal and background are pushed as far apart as possible. The computation of the Fisher response is then very simple: a linear combination of the event variables with the "Fisher coefficients".
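A minimal sketch, assuming ROOT's linear algebra: with W the within-class covariance matrix and mu_S, mu_B the class means, the Fisher coefficients are F = W^(-1) (mu_S - mu_B), and the response is their dot product with the event variables (function names are illustrative):

#include "TMatrixDSym.h"
#include "TVectorD.h"

// Fisher coefficients F = W^(-1) (mu_S - mu_B); W is passed by value so the
// in-place inversion does not touch the caller's matrix.
TVectorD FisherCoefficients(TMatrixDSym W, const TVectorD& muS, const TVectorD& muB)
{
   W.Invert();
   return W * (muS - muB);
}

// Fisher response for one event (an overall offset is irrelevant for ranking).
Double_t FisherResponse(const TVectorD& F, const TVectorD& x)
{
   return F * x;   // dot product
}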
Slide 10: Artificial Neural Network (ANN)
Feed-forward multilayer perceptron: N_var discriminating input variables enter an input layer of N nodes, followed by k hidden layers of M_1 ... M_k nodes and one output layer (two output classes: signal and background). Nodes (neurons) are arranged in series; a non-linear classifier response is obtained by feeding linear combinations of the input variables into nodes with a non-linear "activation" function. Training: adjust the weights using events of known type such that signal and background are best separated. TMVA provides 3 different implementations of feed-forward multilayer perceptrons.
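A minimal sketch of the forward pass with a tanh activation; the weight layout (one bias plus one weight per input for each node) is an illustrative assumption, not the TMVA data structure:

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Layer = std::vector<std::vector<double>>;   // [node][weights], index 0 = bias

// Propagate the input variables through all layers; the single output node's
// activation is the classifier response.
double Forward(std::vector<double> a, const std::vector<Layer>& layers)
{
   for (const Layer& layer : layers) {
      std::vector<double> next;
      for (const std::vector<double>& w : layer) {
         double s = w[0];                                    // bias term
         for (std::size_t i = 0; i < a.size(); ++i)
            s += w[i + 1] * a[i];                            // linear combination
         next.push_back(std::tanh(s));                       // activation function
      }
      a = std::move(next);
   }
   return a[0];
}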
Slide 11: Decision Trees
A decision tree is a sequential application of "cuts" that splits the data into nodes; the final nodes (leaves) classify an event as signal or background.
Training (growing a decision tree): start with the root node; split the training sample according to a cut on the best variable at this node; the splitting criterion is, e.g., the maximum "Gini index", purity * (1 - purity); continue splitting until a minimum number of events or a maximum purity is reached. Each leaf node is classified according to the majority of its events (or given a weight), and unknown test events are classified accordingly. Bottom-up pruning then removes statistically insignificant nodes to avoid overtraining (plots: decision tree before and after pruning).
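A minimal sketch of the Gini-based splitting criterion: each node's index purity * (1 - purity) is scaled by its event count, so that parent minus children measures the gain of a candidate split (names are illustrative, event weights omitted):

#include <vector>

struct Counts { double nS = 0, nB = 0; };   // signal/background in a node

// Gini index of a node, purity * (1 - purity), scaled by the node size.
double Gini(const Counts& c)
{
   double n = c.nS + c.nB;
   if (n == 0) return 0;
   double p = c.nS / n;                      // node purity
   return n * p * (1 - p);
}

// Improvement from splitting the parent into two children; larger is better.
double GiniGain(const Counts& parent, const Counts& left, const Counts& right)
{
   return Gini(parent) - Gini(left) - Gini(right);
}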
Slide 12: Boosted Decision Trees
Decision trees have been well known for a long time but were hardly used in HEP, although they are very similar to simple cuts. Their disadvantage is instability: small changes in the training sample can give large changes in the tree structure. Boosted decision trees (1996) combine several decision trees into a "forest": the classifier output is the (weighted) majority vote of the individual trees, which are derived from the same training sample with different event weights. In AdaBoost, for example, wrongly classified training events are given a larger weight; in bagging (re-sampling with replacement), the weights are random. Remark: bagging/boosting creates a basis of classifiers, and the final classifier is a linear combination of these base classifiers.
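A minimal sketch of one AdaBoost iteration; TrainTree and Classify are hypothetical stand-ins for growing a tree on the weighted sample and evaluating it, as the point here is only the weight update:

#include <cmath>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; double weight; };
struct Tree;                                         // hypothetical tree type
Tree* TrainTree(const std::vector<Event>& sample);   // hypothetical trainer
bool  Classify(const Tree* tree, const std::vector<double>& x);

// Grow one tree, compute its weighted error, boost the weight of every
// misclassified event, and return the tree's vote weight in the forest.
double AdaBoostStep(std::vector<Event>& sample)
{
   Tree* tree = TrainTree(sample);
   double err = 0, sumW = 0;
   for (const Event& ev : sample) {
      sumW += ev.weight;
      if (Classify(tree, ev.x) != ev.isSignal) err += ev.weight;
   }
   err /= sumW;                                      // weighted error rate
   double boost = (1.0 - err) / err;                 // > 1 for err < 0.5
   for (Event& ev : sample)
      if (Classify(tree, ev.x) != ev.isSignal) ev.weight *= boost;
   return std::log(boost);                           // this tree's vote weight
}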
Slide 13: Rule Fitting (Predictive Learning via Rule Ensembles)
Following RuleFit by Friedman-Popescu (Tech. Rep., Statistics Dept., Stanford U., 2003): the classifier is a linear combination of simple base classifiers called rules, which here are sequences of cuts (a rule r_m = 1 if all its cuts are satisfied, = 0 otherwise). The RuleFit classifier is thus a sum of rules plus a linear Fisher term in the normalised discriminating event variables. The procedure: 1. create the rule ensemble from a set of decision trees; 2. fit the coefficients using "gradient directed regularization" (Friedman et al.).
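A minimal sketch of evaluating such an ensemble, y(x) = a0 + sum_m a_m r_m(x) + sum_k b_k x_k; the rule representation is an illustrative assumption:

#include <cstddef>
#include <utility>
#include <vector>

struct Cut { int var; double lo, hi; };   // one cut: lo < x[var] < hi
using Rule = std::vector<Cut>;            // a rule is a sequence of cuts

double RuleFitResponse(const std::vector<double>& x,
                       double a0,
                       const std::vector<std::pair<double, Rule>>& rules, // (a_m, r_m)
                       const std::vector<double>& linear)                 // b_k
{
   double y = a0;
   for (const auto& [am, rule] : rules) {
      bool pass = true;                   // r_m(x) = 1 iff all cuts are satisfied
      for (const Cut& c : rule)
         pass = pass && (x[c.var] > c.lo && x[c.var] < c.hi);
      if (pass) y += am;
   }
   for (std::size_t k = 0; k < linear.size(); ++k)
      y += linear[k] * x[k];              // linear Fisher term
   return y;
}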
Slide 14: Support Vector Machines
Find the hyperplane that best separates signal from background, where "best" means the maximum distance between the hyperplane and the closest events (the support vectors); this gives a linear decision boundary. For non-linear cases, transform the variables into a higher-dimensional feature space where a linear boundary (hyperplane) can separate the data. The transformation is done implicitly using kernel functions, which effectively introduce a metric for the distance measure that "mimics" the transformation; one chooses a kernel and fits the hyperplane. Available kernels: Gaussian, polynomial, sigmoid.
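A minimal sketch of the decision function of a trained SVM with a Gaussian kernel, f(x) = sum_i alpha_i y_i K(s_i, x) + b, where the support vectors s_i and coefficients are assumed to come from a previously trained model (names are illustrative):

#include <cmath>
#include <cstddef>
#include <vector>

double GaussianKernel(const std::vector<double>& a,
                      const std::vector<double>& b, double sigma)
{
   double d2 = 0;
   for (std::size_t k = 0; k < a.size(); ++k)
      d2 += (a[k] - b[k]) * (a[k] - b[k]);
   return std::exp(-d2 / (2 * sigma * sigma));
}

double SvmDecision(const std::vector<std::vector<double>>& supportVectors,
                   const std::vector<double>& alphaY,   // alpha_i * y_i
                   double bias, double sigma,
                   const std::vector<double>& x)
{
   double f = bias;
   for (std::size_t i = 0; i < supportVectors.size(); ++i)
      f += alphaY[i] * GaussianKernel(supportVectors[i], x, sigma);
   return f;                                            // sign(f) = predicted class
}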
Slide 15: A Complete Example Analysis
Create the Factory, hand it the training and test trees and the input variables (the example uses expressions such as "var1+var2" that are not directly available as tree branches), select the MVA methods, then train, test and evaluate:

void TMVAnalysis()
{
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   // create the Factory
   TMVA::Factory* factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   // give it the training/test trees
   TFile* input = TFile::Open( "tmva_example.root" );
   TTree* signal     = (TTree*)input->Get( "TreeS" );
   TTree* background = (TTree*)input->Get( "TreeB" );
   factory->AddSignalTree    ( signal,     1. );
   factory->AddBackgroundTree( background, 1. );

   // declare the input variables (expressions of branches are allowed)
   factory->AddVariable( "var1+var2", 'F' );
   factory->AddVariable( "var1-var2", 'F' );
   factory->AddVariable( "var3",      'F' );
   factory->AddVariable( "var4",      'F' );

   factory->PrepareTrainingAndTestTree( "",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   // select the MVA methods
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   // train, test and evaluate
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
Slide 16: Example Application
Create the Reader, declare the same variables, book the trained MVA method from its weight file, then loop over the events and compute the response (the first two variables are computed, i.e. not directly available in the tree):

void TMVApplication()
{
   // create the Reader
   TMVA::Reader* reader = new TMVA::Reader( "!Color" );

   // tell it about the variables (same definitions as used in training)
   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3",      &var3 );
   reader->AddVariable( "var4",      &var4 );

   // book the selected MVA method from its weight file
   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   TFile* input = TFile::Open( "tmva_example.root" );
   TTree* theTree = (TTree*)input->Get( "TreeS" );

   // set the tree branch addresses
   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   // event loop: calculate the MVA response
   for (Long64_t ievt = 3000; ievt < theTree->GetEntries(); ievt++) {
      theTree->GetEntry( ievt );
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      std::cout << reader->EvaluateMVA( "MLP method" ) << std::endl;
   }
   delete reader;
}
Slide 17: A Purely Academic Toy Example
Use a data set with 4 linearly correlated, Gaussian-distributed variables. Ranking of the input variables by separation:

---------------------------------------
 Rank : Variable  : Separation
---------------------------------------
    1 : var3      : 3.834e+02
    2 : var2      : 3.062e+02
    3 : var1      : 1.097e+02
    4 : var0      : 5.818e+01
---------------------------------------
Slide 18: Validating the Classifier Training
The TMVA GUI provides plots for validating the training: projective likelihood PDFs, MLP training convergence, BDTs, etc. Example from the BDT validation: average number of nodes before/after pruning: 4193 / 968.
Slide 19: The Classifier Output
TMVA output distributions for Likelihood, PDERS, Fisher, Neural Network, Boosted Decision Trees and Rule Fitting (the panels contrast the shape due to correlations with the shape after the correlations are removed).
Slide 20: The Evaluation Output
TMVA output distributions for Fisher, Likelihood, BDT and MLP. For this case the Fisher discriminant provides the theoretically best possible method and coincides with the de-correlated Likelihood; Cuts and Likelihood without de-correlation are inferior. Note: nearly all realistic use cases are much more difficult than this one.
Slide 21: Evaluation Output (taken from the TMVA printout)
Evaluation results ranked by best signal efficiency and purity (area); a larger area means a better classifier:

------------------------------------------------------------------------------
MVA           Signal efficiency at bkg eff. (error):      | Sepa-    Signifi-
Methods:      @B=0.01     @B=0.10     @B=0.30     Area    | ration:  cance:
------------------------------------------------------------------------------
Fisher      : 0.268(03)   0.653(03)   0.873(02)   0.882   | 0.444    1.189
MLP         : 0.266(03)   0.656(03)   0.873(02)   0.882   | 0.444    1.260
LikelihoodD : 0.259(03)   0.649(03)   0.871(02)   0.880   | 0.441    1.251
PDERS       : 0.223(03)   0.628(03)   0.861(02)   0.870   | 0.417    1.192
RuleFit     : 0.196(03)   0.607(03)   0.845(02)   0.859   | 0.390    1.092
HMatrix     : 0.058(01)   0.622(03)   0.868(02)   0.855   | 0.410    1.093
BDT         : 0.154(02)   0.594(04)   0.838(03)   0.852   | 0.380    1.099
CutsGA      : 0.109(02)   1.000(00)   0.717(03)   0.784   | 0.000    0.000
Likelihood  : 0.086(02)   0.387(03)   0.677(03)   0.757   | 0.199    0.682
------------------------------------------------------------------------------

Testing efficiency compared to training efficiency (overtraining check):

------------------------------------------------------------------------------
MVA           Signal efficiency: from test sample (from training sample)
Methods:      @B=0.01          @B=0.10          @B=0.30
------------------------------------------------------------------------------
Fisher      : 0.268 (0.275)    0.653 (0.658)    0.873 (0.873)
MLP         : 0.266 (0.278)    0.656 (0.658)    0.873 (0.873)
LikelihoodD : 0.259 (0.273)    0.649 (0.657)    0.871 (0.872)
PDERS       : 0.223 (0.389)    0.628 (0.691)    0.861 (0.881)
RuleFit     : 0.196 (0.198)    0.607 (0.616)    0.845 (0.848)
HMatrix     : 0.058 (0.060)    0.622 (0.623)    0.868 (0.868)
BDT         : 0.154 (0.268)    0.594 (0.736)    0.838 (0.911)
CutsGA      : 0.109 (0.123)    1.000 (0.424)    0.717 (0.715)
Likelihood  : 0.086 (0.092)    0.387 (0.379)    0.677 (0.677)
------------------------------------------------------------------------------

A large difference between test and training efficiency (e.g. BDT or PDERS here) signals overtraining.
Slide 22: More Toys: Circular Correlations
Illustrate the behaviour of linear and nonlinear classifiers on a toy with circular correlations (same for signal and background).
Slide 23: Weight Variables by Classifier Performance
How do the classifiers deal with the correlation patterns? Illustration: events weighted by the MVA response, for the linear classifiers (Fisher, Likelihood, de-correlated Likelihood) and the nonlinear classifiers (PDERS, decision trees).
Slide 24: Final Classifier Performance (Circular Example)
Background rejection versus signal efficiency curves for the circular-correlations example.
Slide 25: More Toys: "Schachbrett" (Chess Board)
The event distribution is a checkerboard pattern of signal and background; the plots show the events weighted by the SVM response next to the theoretical maximum. Performance achieved without parameter adjustments: PDERS and BDT are the best "out of the box"; after some parameter tuning, SVM and ANN (MLP) also perform well.
Slide 26: We (finally) have a Users Guide!
The TMVA Users Guide (78 pp., including code examples) is available from tmva.sf.net and as arXiv: physics/0703039.
Slide 27: Summary
TMVA unifies highly customizable and well-performing multivariate classification algorithms in a single user-friendly framework; this ensures objective classifier comparisons and simplifies their use. TMVA is available from tmva.sf.net and in ROOT (release 5.11/03 and later). A typical TMVA analysis requires user interaction with a Factory (for classifier training) and a Reader (for classifier application); a set of ROOT macros displays the evaluation results. We will continue to improve flexibility and to add new classifiers: Bayesian classifiers, a "committee method" combining different MVA techniques, and C-code output for trained classifiers (for selected methods).
Slide 28: More Toys: Linear, Cross, Circular Correlations
Illustrate the behaviour of linear and nonlinear classifiers on three toys: linear correlations (same for signal and background), linear correlations (opposite for signal and background), and circular correlations (same for signal and background).
Slide 29: Weight Variables by Classifier Performance
How well do the classifiers resolve the various correlation patterns? Illustration: events weighted by the MVA response for linear correlations (same for signal and background), linear correlations (opposite for signal and background), and circular correlations (same for signal and background).
Slide 30: Final Classifier Performance
Background rejection versus signal efficiency curves for the linear, cross and circular examples.
Slide 31: Stability with Respect to Irrelevant Variables
Toy example with 2 discriminating and 4 non-discriminating variables: compare the classifiers when they use only the two discriminating variables with the same classifiers using all variables.
Slide 32: Using TMVA in Training and Application
Training and application can be driven from ROOT scripts, C++ executables or Python scripts (via PyROOT), or from any other high-level language that interfaces with ROOT.
Slide 33: Introduction: Event Classification
How should the decision boundary be placed: rectangular cuts? a linear boundary? a nonlinear one? Different techniques use different ways of trying to exploit (all) the features; compare them and choose. Let the machine learn the boundary from training events.