1 Toolkit for Multivariate Data Analysis
TMVA: Toolkit for Multivariate Data Analysis with ROOT
Helge Voss, MPI-K Heidelberg, on behalf of: Andreas Höcker, Fredrik Tegenfeld, Joerg Stelzer*
and contributors: A. Christov, S. Henrot-Versillé, M. Jachowski, A. Krasznahorkay Jr., Y. Mahalalel, X. Prudent, P. Speckmayer, M. Wolter, A. Zemla
TMVA supplies an environment to easily:
- apply different sophisticated data-selection algorithms
- have them all trained, tested and evaluated
- find the best one for your selection problem
arXiv: physics/
ACAT 2007, Nikhef, 23rd - 27th April 2007

2 Motivation/Outline
ROOT is the analysis framework used by most (HEP) physicists.
Idea: rather than just implementing new MVA techniques and making them somehow available in ROOT (as, e.g., TMultiLayerPerceptron does), provide one common platform/interface for all MVA classifiers:
- easy to use and to compare different MVA classifiers
- train/test on the same data sample and evaluate consistently
Outline:
- introduction
- the MVA classifiers available in TMVA
- demonstration with toy examples
- summary

3 Multivariate Event Classification
All multivariate classifiers condense the (correlated) multi-variable input information into a single scalar output variable: a mapping R^n -> R with y(background) -> 0 and y(signal) -> 1 (written out below). This gives one variable to base your decision on.
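Written out, every classifier implements a map

$$ y : \mathbb{R}^{n} \to \mathbb{R}, \qquad y \to 0 \ \text{for background}, \quad y \to 1 \ \text{for signal}, $$

and the selection reduces to a one-dimensional cut $y > y_{\text{cut}}$, whose position fixes the working point.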

4 What is in TMVA
TMVA currently includes:
- Rectangular cut optimisation
- Projective and multi-dimensional likelihood estimators
- Fisher discriminant and H-Matrix (χ² estimator)
- Artificial Neural Networks (3 different implementations)
- Boosted/bagged Decision Trees
- Rule Fitting
- Support Vector Machines
All classifiers are highly customizable. Common pre-processing of the input is provided: de-correlation and principal component analysis. Arbitrary pre-selections and individual event weights are supported. The TMVA package provides training, testing and evaluation of the classifiers, and each classifier provides a ranking of the input variables. Classifiers produce weight files that are read by a Reader class for MVA application. TMVA is integrated in ROOT (since release 5.11/03) and very easy to use!

5 Preprocessing the Input Variables: Decorrelation
Commonly realised for all methods in TMVA (centrally in the DataSet class). Linear correlations are removed by rotating the variables:
- using the square root of the correlation matrix (a matrix sketch follows below)
- using Principal Component Analysis
(plots: original vs. SQRT-decorrelated vs. PCA-decorrelated distributions)
Note that this "de-correlation" is only complete if the input variables are Gaussian and the correlations are purely linear. In practice the gain from de-correlation is often rather modest, and it can even be harmful.
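As an illustration of the square-root method, here is a minimal sketch using ROOT's matrix classes that computes C^(-1/2) of a covariance/correlation matrix via its eigen-decomposition. The function name sqrtInverse is ours for illustration; TMVA performs the equivalent operation internally.

#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include "TMath.h"

// Compute C^{-1/2}; applying it to the input vectors, x' = C^{-1/2} x,
// removes their linear correlations.
TMatrixD sqrtInverse(const TMatrixDSym& cov)
{
   TMatrixDSymEigen eigen(cov);
   TVectorD eigenVal = eigen.GetEigenValues();
   TMatrixD eigenVec = eigen.GetEigenVectors();
   TMatrixD diag(eigenVal.GetNrows(), eigenVal.GetNrows());
   for (Int_t i = 0; i < eigenVal.GetNrows(); ++i)
      diag(i, i) = 1.0 / TMath::Sqrt(eigenVal(i)); // assumes positive eigenvalues
   TMatrixD eigenVecT(TMatrixD::kTransposed, eigenVec);
   return eigenVec * diag * eigenVecT;
}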

6 Cut Optimisation
Simplest method: cut in a rectangular volume. Scan the signal efficiency over [0,1] and maximise the background rejection; from this scan, the optimal working point in terms of S and B numbers can be derived.
Technical problem: how to perform the optimisation. TMVA uses random sampling, Simulated Annealing or a Genetic Algorithm (a booking example follows below). Speed improvement in the volume search: training events are sorted in Binary Search Trees. The optimisation can be done in the normal variable space or in the de-correlated variable space.
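Booking this classifier follows the same Factory pattern as the complete example later in the talk. The option string below is indicative only (taken from the style of the TMVA example macros; exact option names can vary between versions):

   // book rectangular cut optimisation, fitted with the Genetic Algorithm
   factory->BookMethod( TMVA::Types::kCuts, "CutsGA",
                        "!V:FitMethod=GA:EffSel" );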

7 Projected Likelihood Estimator (PDE Approach)
Combine the probabilities from the different discriminating variables for an event to be signal- or background-like into a likelihood ratio (written out below). This is optimal if there are no correlations and the PDFs are correct (known); usually that is not true, hence the development of the other methods.
Technical problem: how to implement the reference PDFs. Three ways:
- counting: automatic and unbiased, but suboptimal
- function fitting: difficult to automate
- parametric fitting (splines, kernel estimators): easy to automate, but can create artefacts
TMVA uses splines of order 0-5 and kernel estimators.
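The likelihood ratio for event $i$ combines one reference PDF $p_{S/B,k}$ per discriminating variable $k$ and per species (signal, background):

$$ y_{\mathcal{L}}(i) = \frac{\mathcal{L}_S(i)}{\mathcal{L}_S(i)+\mathcal{L}_B(i)}, \qquad \mathcal{L}_{S/B}(i) = \prod_{k=1}^{n_{\text{var}}} p_{S/B,k}\big(x_k(i)\big). $$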

8 Multidimensional Likelihood Estimator
Generalisation of the 1D PDE approach to Nvar dimensions. In theory this is the optimal method, if the "true N-dimensional PDF" were known. The practical challenge is to derive the N-dimensional PDF from the training sample.
TMVA implementation: range search (PDERS). Count the number of signal and background events in the "vicinity" of a data event, using a volume of fixed or adaptive size (the latter corresponds to kNN-type classifiers); a minimal counting sketch follows below.
- volumes can be rectangular or spherical
- multi-dimensional kernels (Gaussian, triangular, ...) can be used to weight events within a volume
- the range search is sped up by sorting the training events in binary trees (Carli-Koblitz, NIM A501, 576 (2003))
(figure: signal and background training events in the (x1, x2) plane, with a test event and its search volume)
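The core counting idea fits in a few lines. This is a self-contained sketch of a fixed-size rectangular range count, not TMVA's binary-tree implementation (which is much faster); all names are ours for illustration.

#include <cmath>
#include <cstddef>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; };

// Classifier response: fraction of signal events inside a rectangular box
// of half-width `halfWidth` around the test point (y near 1: signal-like).
double rangeSearchResponse(const std::vector<Event>& training,
                           const std::vector<double>& test, double halfWidth)
{
   double nS = 0, nB = 0;
   for (const Event& ev : training) {
      bool inside = true;
      for (std::size_t k = 0; k < test.size(); ++k)
         if (std::fabs(ev.x[k] - test[k]) > halfWidth) { inside = false; break; }
      if (inside) (ev.isSignal ? nS : nB) += 1.0;
   }
   return (nS + nB > 0) ? nS / (nS + nB) : 0.5; // no neighbours: undecided
}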

9 Fisher Discriminant (and H-Matrix)
A well-known, simple and elegant classifier: determine the linear variable transformation in which linear correlations are removed and the mean values of signal and background are "pushed" as far apart as possible. The computation of the Fisher response is very simple: a linear combination of the event variables with the Fisher coefficients (see the formulas below).
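In formulas (standard Fisher construction, with $W$ the within-class covariance matrix and $\bar{\mathbf{x}}_{S/B}$ the class means):

$$ y_{\text{Fi}}(\mathbf{x}) = F_0 + \sum_{k=1}^{n_{\text{var}}} F_k\, x_k, \qquad \mathbf{F} \propto W^{-1}\,(\bar{\mathbf{x}}_S - \bar{\mathbf{x}}_B). $$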

10 Artificial Neural Network (ANN)
Obtain a non-linear classifier response by feeding linear combinations of the input variables into nodes with a non-linear activation function. The nodes (or neurons) are arranged in layers: feed-forward multilayer perceptrons (3 different implementations in TMVA); the node formula is written out below.
(figure: feed-forward multilayer perceptron with one input layer for the Nvar discriminating input variables, k hidden layers of M1 ... Mk nodes, and one output layer for the 2 output classes, signal and background)
Training: adjust the weights using known training events such that signal and background are best separated.
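A single node $j$ computes a weighted sum of its inputs passed through the activation function $A$; the sigmoid shown is one common choice (generic MLP notation, not specific to any of the three TMVA implementations):

$$ y_j = A\Big(\sum_i w_{ij}\, x_i\Big), \qquad A(t) = \frac{1}{1+e^{-t}}. $$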

11 Decision Trees
Sequential application of "cuts" splits the data into nodes; the final nodes (leaves) classify an event as signal or background.
Training (growing a decision tree):
- start with the root node
- split the training sample according to a cut on the best variable at this node; splitting criterion: e.g., maximum Gini index, purity * (1 - purity) (see the sketch below)
- continue splitting until a minimum number of events or a maximum purity is reached
- classify each leaf node according to the majority of its events, or assign a weight; unknown test events are classified accordingly
Bottom-up pruning: remove statistically insignificant nodes to avoid overtraining.
(figures: the same decision tree before and after pruning)
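A minimal sketch of the Gini splitting criterion (helper functions are ours, not TMVA code): a cut is chosen to maximise the decrease of the index from the parent node to its daughters.

// Gini index of a node: purity * (1 - purity), maximal for a 50/50 mixture.
double giniIndex(double nSig, double nBkg)
{
   const double n = nSig + nBkg;
   if (n <= 0) return 0.0;
   const double purity = nSig / n;
   return purity * (1.0 - purity);
}

// Quality of a candidate split into left/right daughter nodes:
// parent Gini minus the event-weighted Gini of the daughters.
double separationGain(double nSigL, double nBkgL, double nSigR, double nBkgR)
{
   const double nL = nSigL + nBkgL, nR = nSigR + nBkgR, n = nL + nR;
   return giniIndex(nSigL + nSigR, nBkgL + nBkgR)
          - (nL / n) * giniIndex(nSigL, nBkgL)
          - (nR / n) * giniIndex(nSigR, nBkgR);
}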

12 Boosted Decision Trees
Decision trees have been well known for a long time but were hardly used in HEP (although they are very similar to "simple cuts"). Disadvantage: instability; small changes in the training sample can give large changes in the tree structure.
Boosted Decision Trees (1996) combine several decision trees into a forest: the classifier output is the (weighted) majority vote of the individual trees, which are derived from the same training sample with different event weights.
- e.g. AdaBoost: wrongly classified training events are given a larger weight
- bagging: re-sampling with replacement, i.e. random weights
Remark: bagging/boosting create a basis of classifiers; the final classifier is a linear combination of the base classifiers (see the formulas below).
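For reference, the textbook AdaBoost prescription (TMVA's exact parametrisation may differ in detail): if tree $h_m$ has weighted misclassification rate $\epsilon_m$, then

$$ \alpha_m = \ln\frac{1-\epsilon_m}{\epsilon_m}, \qquad w_i \to w_i\, e^{\alpha_m} \ \text{for misclassified events}, \qquad y(\mathbf{x}) = \sum_m \alpha_m\, h_m(\mathbf{x}). $$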

13 Rule Fitting (Predictive Learning via Rule Ensembles)
Following RuleFit from Friedman-Popescu (Friedman-Popescu, Tech. Rep., Statistics Dept., Stanford U., 2003). The classifier is a linear combination of simple base classifiers called rules, here sequences of cuts: a rule r_m = 1 if all its cuts are satisfied and 0 otherwise. The RuleFit classifier is the sum of the rules plus a linear Fisher term in the normalised discriminating event variables (see the formula below).
The procedure is:
- create the rule ensemble from a set of decision trees
- fit the coefficients by "gradient directed regularization" (Friedman et al.)
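Written out, the RuleFit response described above is ($a_0$, $a_m$, $b_k$ are the fitted coefficients):

$$ y_{\text{RF}}(\mathbf{x}) = a_0 + \sum_{m=1}^{M_R} a_m\, r_m(\mathbf{x}) + \sum_{k=1}^{n_{\text{var}}} b_k\, x_k, \qquad r_m(\mathbf{x}) \in \{0,1\}. $$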

14 Support Vector Machines
Find the hyperplane that best separates signal from background; best separation means maximum distance between the closest events (the support vectors) and the hyperplane. This gives a linear decision boundary.
Non-linear cases: transform the variables into a higher-dimensional feature space where a linear boundary (hyperplane) can separate the data. The transformation is done implicitly using kernel functions, which effectively introduce a metric for the distance measures that "mimics" the transformation. Choose a kernel and fit the hyperplane.
Available kernels: Gaussian, polynomial, sigmoid (written out below).
(figures: maximum-margin linear boundary in the (x1, x2) plane; a non-separable problem mapped into a higher-dimensional feature space where it becomes separable)
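The three kernels listed have their usual textbook forms ($\sigma$, $c$, $d$, $\kappa$, $\theta$ are tunable parameters; this is generic SVM notation, not TMVA-specific):

$$ K_{\text{Gauss}}(\mathbf{x},\mathbf{y}) = \exp\!\left(-\frac{\|\mathbf{x}-\mathbf{y}\|^2}{2\sigma^2}\right), \quad K_{\text{poly}}(\mathbf{x},\mathbf{y}) = (\mathbf{x}\cdot\mathbf{y}+c)^d, \quad K_{\text{sigm}}(\mathbf{x},\mathbf{y}) = \tanh(\kappa\,\mathbf{x}\cdot\mathbf{y}+\theta). $$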

15 A Complete Example Analysis

void TMVAnalysis( )
{
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   // create the Factory
   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   // give it the training/test trees
   TFile *input = TFile::Open("tmva_example.root");
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
   factory->AddSignalTree    ( signal,     1. ); // global event weight (blank on the slide; 1. as for the background tree)
   factory->AddBackgroundTree( background, 1. );

   // tell it which variables to use; formulas of tree branches are allowed,
   // i.e. "var1+var2" is not directly available in the tree
   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3", 'F');
   factory->AddVariable("var4", 'F');

   factory->PrepareTrainingAndTestTree("",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   // select the MVA methods
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   // train, test and evaluate
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
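Assuming the macro above is saved as TMVAnalysis.C (the file name is our choice, not fixed by TMVA), it runs like any other ROOT macro:

   root -l -q TMVAnalysis.C

The Factory then writes the weight files under weights/ that the Reader in the next example picks up.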

16 Example Application

void TMVApplication( )
{
   // create the Reader
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   // tell it about the variables
   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3", &var3 );
   reader->AddVariable( "var4", &var4 );

   // book the selected MVA method from its weight file
   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   // set the tree branch addresses; the example uses variables
   // ("var1+var2", "var1-var2") not directly available in the tree
   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");
   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   // event loop (starting at 3000 skips the training events);
   // calculate the MVA response for each event
   for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA( "MLP method" ) << endl;
   }

   delete reader;
}

17 A purely academic Toy example
Use a data set with 4 linearly correlated, Gaussian-distributed variables:

Rank : Variable : Separation
   1 : var3     : 3.834e+02
   2 : var2     : 3.062e+02
   3 : var1     : 1.097e+02
   4 : var0     : 5.818e+01

18 Validating the Classifiers
The TMVA GUI provides plots to validate the classifier training: projective likelihood PDFs, MLP training convergence, BDTs, ...
For the BDTs, the average number of nodes before/after pruning: 4193 / 968.

19 Classifier Output
TMVA output distributions for Likelihood, PDERS, Fisher, Neural Network, Boosted Decision Trees and Rule Fitting. (Annotations on the slide: the structure in the Likelihood output is due to correlations; with de-correlation applied, the correlations are removed.)

20 Evaluation Output
TMVA output distributions for Fisher, Likelihood, BDT and MLP. For this case the Fisher discriminant provides the theoretically best possible method and coincides with the de-correlated Likelihood; Cuts and Likelihood without de-correlation are inferior. Note: almost all realistic use cases are much more difficult than this one.

21 Evaluation Output (taken from TMVA printout)
Evaluation results ranked by best signal efficiency and purity (area).

Signal efficiency at bkg eff. (error):
MVA Methods:   @B=0.01     @B=0.10  @B=0.30  Area | Separation: Significance:
Fisher       : 0.268(03)   (03)     (02)          |
MLP          : 0.266(03)   (03)     (02)          |
LikelihoodD  : 0.259(03)   (03)     (02)          |
PDERS        : 0.223(03)   (03)     (02)          |
RuleFit      : 0.196(03)   (03)     (02)          |
HMatrix      : 0.058(01)   (03)     (02)          |
BDT          : 0.154(02)   (04)     (03)          |
CutsGA       : 0.109(02)   (00)     (03)          |
Likelihood   : 0.086(02)   (03)     (03)          |

Testing efficiency compared to training efficiency (overtraining check)
Signal efficiency: from test sample (from training sample)
MVA Methods:   @B=0.01     @B=0.10   @B=0.30
Fisher       : (0.275)     (0.658)   (0.873)
MLP          : (0.278)     (0.658)   (0.873)
LikelihoodD  : (0.273)     (0.657)   (0.872)
PDERS        : (0.389)     (0.691)   (0.881)
RuleFit      : (0.198)     (0.616)   (0.848)
HMatrix      : (0.060)     (0.623)   (0.868)
BDT          : (0.268)     (0.736)   (0.911)
CutsGA       : (0.123)     (0.424)   (0.715)
Likelihood   : (0.092)     (0.379)   (0.677)

(slide annotations: a larger area marks the better classifier; comparing test and training efficiencies checks for over-training)

22 More Toys: Circular Correlations
Illustrate the behaviour of linear and non-linear classifiers: circular correlations (same for signal and background).

23 Illustration: Events weighted by MVA response
How do the classifiers deal with the correlation patterns? Example plots with the events weighted by the classifier response:
- linear classifiers: Likelihood, de-correlated Likelihood, Fisher
- non-linear classifiers: Decision Trees, PDERS

24 Final Classifier Performance
Background rejection versus signal efficiency curve: Circular Example

25 More Toys: "Schachbrett" (chess board)
Event distribution: signal and background populate alternating squares of a chess-board pattern. Performance achieved without parameter adjustments: PDERS and BDT are the best "out of the box". After some parameter tuning, SVM and ANN (MLP) also perform well.
(figures: event distribution; performance curves including the theoretical maximum; events weighted by the SVM response)

26 We (finally) have a Users Guide!
The TMVA Users Guide is available from tmva.sf.net: 78 pages, including code examples. arXiv: physics/

27 Summary
TMVA unifies highly customizable and well-performing multivariate classification algorithms in a single user-friendly framework. This ensures objective classifier comparisons and simplifies their use. TMVA is available from tmva.sf.net and in ROOT (>= 5.11/03). A typical TMVA analysis requires user interaction with a Factory (for classifier training) and a Reader (for classifier application); a set of ROOT macros displays the evaluation results.
We will continue to improve flexibility and add new classifiers:
- Bayesian classifiers
- "committee method": combination of different MVA techniques
- C-code output for trained classifiers (for selected methods)

28 More Toys: Linear, Cross, Circular Correlations
Illustrate the behaviour of linear and non-linear classifiers on three patterns:
- linear correlations (same for signal and background)
- linear correlations (opposite for signal and background)
- circular correlations (same for signal and background)

29 Illustration: Events weighted by MVA response
How well do the classifiers resolve the various correlation patterns?
- linear correlations (same for signal and background)
- linear correlations (opposite for signal and background)
- circular correlations (same for signal and background)

30 Final Classifier Performance
Background rejection versus signal efficiency curves: Circular, Cross and Linear Examples

31 Stability with respect to irrelevant variables
Toy example with 2 discriminating and 4 non-discriminating variables. Compare:
- using all variables in the classifiers
- using only the two discriminating variables in the classifiers

32 Using TMVA in Training and Application
Both the training and the application stage can be driven by ROOT scripts, C++ executables or Python scripts (via PyROOT), or any other high-level language that interfaces with ROOT.

33 Introduction: Event Classification
Different techniques exploit the features of the data in different ways; compare them and choose. Rectangular cuts? A linear boundary? A non-linear one? How to place the decision boundary? Let the machine learn it from training events.
(figures: signal and background regions in the (x1, x2) plane separated by rectangular cuts, a linear boundary, and a non-linear boundary)

