TMVA – Toolkit for Multivariate Data Analysis
Andreas Höcker (CERN)
ROOT meeting, CERN, Oct 6, 2006
Slide 2 – MVA Experience

Many (and more and more) HEP data analyses use multivariate techniques:
- most accepted: cuts
- also widely understood and accepted: likelihood (probability density estimators – PDE)
- often disliked, but more and more used (LEP, BABAR): Artificial Neural Networks
- much used in BABAR and Belle: Fisher discriminants
- introduced by D0 (for electron ID): H-Matrix
- used by MiniBooNE and recently by BABAR: Boosted Decision Trees

Often analysts use custom tools, without much comparison. MVAs are tedious to implement and debug, hence few comparisons between methods!

All interesting methods … but how to dispel the widespread scepticism?
- black boxes!
- what if the training samples incorrectly describe the data?
- how can one evaluate systematics?
Slide 3 – What is TMVA

The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated environment for the parallel processing and evaluation of MVA techniques to discriminate signal from background samples.
- Note that TMVA does not seek to promote a particular method!
- The TMVA analysis provides training, testing and evaluation of the MVAs (see the sketch below).
- Each MVA method provides a ranking of the input variables.
- The MVAs produce weight files that are read by a Reader class for the actual MVA analysis.

TMVA presently includes (ranked by complexity):
- Rectangular cut optimisation
- Likelihood estimator (PDE approach)
- Multi-dimensional likelihood estimator (PDE range-search approach)
- Fisher discriminant
- H-Matrix approach (χ² estimator)
- Multilayer Perceptron Artificial Neural Network (three different implementations)
- Boosted Decision Trees
- RuleFit
- upcoming: Support Vector Machines, "Committee" methods

Common preprocessing of the input data: decorrelation, principal component analysis.
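For orientation, a minimal sketch of a Factory-driven training job, written against the documented TMVA API as later distributed with ROOT (the file and tree names are hypothetical, and the 2006 interface may differ in detail from the calls shown):

    #include "TFile.h"
    #include "TTree.h"
    #include "TCut.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void trainToy() {
       TFile* input  = TFile::Open("toy.root");              // hypothetical input file
       TTree* sig    = (TTree*)input->Get("TreeS");          // hypothetical signal tree
       TTree* bkg    = (TTree*)input->Get("TreeB");          // hypothetical background tree
       TFile* output = TFile::Open("TMVAOutput.root", "RECREATE");

       TMVA::Factory factory("TMVAAnalysis", output, "");    // job name tags the weight files

       factory.AddSignalTree(sig, 1.0);                      // trees with global event weights
       factory.AddBackgroundTree(bkg, 1.0);
       factory.AddVariable("var1", 'F');                     // register the discriminating
       factory.AddVariable("var2", 'F');                     // variables, as floats
       factory.AddVariable("var3", 'F');
       factory.AddVariable("var4", 'F');

       factory.PrepareTrainingAndTestTree(TCut(""), "");     // default training/test split
       factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "");  // per-method option
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "");  // strings go here

       factory.TrainAllMethods();     // writes one weight file per booked method
       factory.TestAllMethods();
       factory.EvaluateAllMethods();
       output->Close();
    }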
Slide 4 – TMVA Technicalities

TMVA is a SourceForge (SF) package, to accommodate world-wide access; the code can be downloaded as a tar file, or via anonymous or developer CVS access.
- home page: http://tmva.sourceforge.net/
- SF project page: http://sourceforge.net/projects/tmva
- view CVS: http://tmva.cvs.sourceforge.net/tmva/TMVA/
- mailing lists: http://sourceforge.net/mail/?group_id=152074
- ROOT::TMVA class index: http://root.cern.ch/root/htmldoc/TMVA_Index.html

TMVA is written in C++ and is built upon ROOT; since ROOT 5.11/03 it is integrated and distributed within ROOT.

TMVA is modular:
- training, testing and evaluation: a Factory iterates over all registered methods
- each method has specific options that can be conveniently set by the user
- ROOT scripts are provided for the most important performance assessments

We enthusiastically (!) welcome new users and developers – at present: 18 SF-registered TMVA developers (http://sourceforge.net/project/memberlist.php?group_id=152074)
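On the analysis side, a matching sketch of how the weight files written during training are read back by the Reader class (again assuming the later documented API; the weight-file path is illustrative, and early TMVA releases used plain-text rather than XML weight files):

    #include "TMVA/Reader.h"

    void applyFisher() {
       TMVA::Reader reader("");
       Float_t var1, var2, var3, var4;
       reader.AddVariable("var1", &var1);   // same variables, same order as in training
       reader.AddVariable("var2", &var2);
       reader.AddVariable("var3", &var3);
       reader.AddVariable("var4", &var4);
       reader.BookMVA("Fisher", "weights/TMVAAnalysis_Fisher.weights.xml");  // illustrative path

       // ... inside the event loop: fill var1..var4 for the current event, then
       double mvaValue = reader.EvaluateMVA("Fisher");   // the Fisher discriminant value
       (void)mvaValue;
    }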
Slide 5 – TMVA Methods
Slide 6 – Cut Optimisation

The simplest method: a cut in a rectangular volume using the N_var input variables.

Usually the training files in TMVA do not contain realistic signal and background abundances, so one cannot optimise for the best significance; instead, scan the signal efficiency in [0, 1] and maximise the background rejection at each point.

Technical problem: how to perform the maximisation
- a MINUIT fit (SIMPLEX) was found to be not reliable enough
- robust, but suboptimal: random sampling (see the sketch below)
- newly available techniques: Genetic Algorithm and Simulated Annealing (SA not yet optimal)

Huge speed improvement by sorting the training events into N_var-dimensional binary trees: for 4 variables, 41 times faster than a simple volume cut.
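A minimal sketch of the "robust, but suboptimal" random-sampling strategy in plain C++ (purely illustrative, not the TMVA implementation; the 1% efficiency tolerance and the assumption of inputs in [0, 1] are made up for the example):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    using Sample = std::vector<std::vector<double>>;   // events x variables, inputs in [0,1]

    // fraction of events inside the rectangular volume [lo, hi] in every variable
    double efficiency(const Sample& s, const std::vector<double>& lo,
                      const std::vector<double>& hi) {
       std::size_t pass = 0;
       for (const auto& ev : s) {
          bool in = true;
          for (std::size_t k = 0; k < ev.size(); ++k)
             if (ev[k] < lo[k] || ev[k] > hi[k]) { in = false; break; }
          pass += in;
       }
       return double(pass) / double(s.size());
    }

    // best background rejection found at (approximately) the requested signal efficiency
    double bestRejection(const Sample& sig, const Sample& bkg,
                         double targetEffS, int nTrials = 10000) {
       std::mt19937 rng(42);
       std::uniform_real_distribution<double> u(0.0, 1.0);
       const std::size_t nVar = sig.front().size();
       double best = 0.0;
       for (int t = 0; t < nTrials; ++t) {
          std::vector<double> lo(nVar), hi(nVar);
          for (std::size_t k = 0; k < nVar; ++k) {     // draw a random rectangular volume
             double a = u(rng), b = u(rng);
             lo[k] = std::min(a, b);
             hi[k] = std::max(a, b);
          }
          if (std::fabs(efficiency(sig, lo, hi) - targetEffS) > 0.01) continue;
          best = std::max(best, 1.0 - efficiency(bkg, lo, hi));   // rejection = 1 - eff_B
       }
       return best;
    }

Repeating this scan over the full range of target signal efficiencies traces out the efficiency-versus-rejection curve that the optimisation aims at.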
Slide 7 – Projected Likelihood Estimator (PDE Approach)

Combine the probability density distributions of the N_var discriminating variables into a likelihood estimator for the two species (types): signal and background. The likelihood ratio for event i is

   y_L(i) = L_S(i) / ( L_S(i) + L_B(i) ),   with   L_{S,B}(i) = ∏_k p_k^{S,B}( x_k(i) ),

where the p_k are the per-variable reference (event) PDFs; a numerical sketch follows below.

This assumes uncorrelated input variables:
- the optimal MVA approach if this is true, since it then contains all the information
- performance is reduced if it is not true – the reason for the development of the other methods!

Technical problem: how to implement the reference PDFs. Three ways:
- function fitting – difficult to automate
- parametric fitting (splines, kernel estimators) – easy to automate, but can create artefacts; currently in TMVA: Splines 0–5, Gaussian KDEs
- counting – automatic and unbiased, but suboptimal
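A minimal numerical sketch of the estimator (not the TMVA code; the PDF type is a stand-in for whichever reference-PDF implementation is chosen above):

    #include <cstddef>
    #include <functional>
    #include <vector>

    using PDF = std::function<double(double)>;   // stand-in for a 1-D reference PDF

    double likelihoodRatio(const std::vector<double>& x,
                           const std::vector<PDF>& sigPDF,
                           const std::vector<PDF>& bkgPDF) {
       double lS = 1.0, lB = 1.0;
       for (std::size_t k = 0; k < x.size(); ++k) {
          lS *= sigPDF[k](x[k]);   // signal PDF of variable k, evaluated at x_k(i)
          lB *= bkgPDF[k](x[k]);   // background PDF of variable k
       }
       return lS / (lS + lB);      // y_L in [0,1]; close to 1 for signal-like events
    }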
Slide 8 – Preprocessing the Input Variables: Decorrelation

Linear correlations can be removed by diagonalising the input variable space. There are various ways to choose the diagonalisation (also implemented: principal component analysis). The square-root method:
- determine the square root C′ of the correlation matrix C, i.e. C = C′ C′
- compute C′ by diagonalising C: C′ = S √D Sᵀ, with D the diagonal matrix of eigenvalues of C and S the matrix of its eigenvectors (see the matrix sketch below)
- transform from the original variable space x to the decorrelated space x′ via x′ = (C′)⁻¹ x

Note that this "decorrelation" is only complete if:
- the input variables are Gaussian
- the correlations are linear only

In practice, the gain from decorrelation is often rather modest – or even harmful!

Decorrelation is realised commonly for all methods in TMVA (centrally in the DataSet class).
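A sketch of the square-root decorrelation with ROOT's matrix classes (illustrative – TMVA performs this internally; the eigen-decomposition route shown is one standard way to obtain C′):

    #include <cmath>
    #include "TMatrixD.h"
    #include "TMatrixDSym.h"
    #include "TMatrixDSymEigen.h"
    #include "TVectorD.h"

    // returns (C')^-1, with C' the square root of the symmetric matrix C (C = C' C')
    TMatrixD decorrelationMatrix(const TMatrixDSym& C) {
       TMatrixDSymEigen eigen(C);
       TVectorD d = eigen.GetEigenValues();     // eigenvalues of C
       TMatrixD s = eigen.GetEigenVectors();    // eigenvectors as columns
       TMatrixD sqrtD(d.GetNrows(), d.GetNrows());
       sqrtD.Zero();
       for (int k = 0; k < d.GetNrows(); ++k) sqrtD(k, k) = std::sqrt(d(k));
       TMatrixD sT(TMatrixD::kTransposed, s);
       TMatrixD cPrime = s * sqrtD * sT;        // C' = S sqrt(D) S^T
       return cPrime.Invert();                  // apply to events as x' = (C')^-1 x
    }

The returned matrix is computed once from the training sample and then applied to every event's variable vector.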
Slide 9 – Multidimensional Likelihood Estimator

Reference: Carli–Koblitz, NIM A501, 576 (2003).

Generalisation of the 1D PDE approach to N_var dimensions. Optimal method – in theory – since the full information is used. Practical challenges:
- the parameterisation of the multi-dimensional phase space needs huge training samples
- implementation of the N_var-dim. reference PDF with kernel estimates or counting
- for kernel estimates it is difficult to control the fidelity of the parameterisation

The TMVA implementation follows the range-search method (a brute-force sketch follows below):
- count the number of signal and background events in the "vicinity" of the data event
- the "vicinity" is defined by a fixed or adaptive N_var-dim. volume size
- volumes can be rectangular or elliptical
- multi-D kernels (Gaussian, triangular, …) can be used to weight the events within the volume
- the range search is sped up by sorting the training events into binary trees
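A brute-force sketch of the range-search counting with a fixed rectangular volume and a unit kernel (TMVA accelerates the same query by sorting the training events into binary search trees):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Event { std::vector<double> x; bool isSignal; };

    double rangeSearchEstimator(const std::vector<Event>& training,
                                const std::vector<double>& point,
                                double halfWidth) {          // fixed box half-width per axis
       double nS = 0, nB = 0;
       for (const auto& ev : training) {
          bool inside = true;
          for (std::size_t k = 0; k < point.size(); ++k)
             if (std::fabs(ev.x[k] - point[k]) > halfWidth) { inside = false; break; }
          if (!inside) continue;
          (ev.isSignal ? nS : nB) += 1.0;   // unit kernel: each event counts with weight 1
       }
       return (nS + nB > 0) ? nS / (nS + nB) : 0.5;   // signal fraction in the volume
    }

Replacing the unit weight by a kernel value that falls off with distance gives the Gaussian or triangular weighting mentioned above.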
Slide 10 – Fisher Discriminant (and H-Matrix)

A well-known, simple and elegant MVA method: the event selection is performed in a transformed variable space with zero linear correlations, by distinguishing the mean values of the signal and background distributions.

Instead of equations, words: an axis is determined in the (correlated) hyperspace of the input variables such that, when the output classes (signal and background) are projected upon this axis, they are pushed as far apart from each other as possible, while events of the same class are confined to a close vicinity. The linearity property of this method is reflected in the metric with which "far apart" and "close vicinity" are determined: the covariance matrix of the discriminant variable space.

- optimal for linearly correlated Gaussians with equal RMS values and different means
- no separation if the classes have equal means and different RMS values (shapes)

The computation of the Fisher MVA couldn't be simpler: the event value is a linear combination of the inputs, y(i) = Σ_k F_k x_k(i), where the "Fisher coefficients" F_k are obtained from the inverted covariance matrix and the difference of the class mean vectors (see the sketch below).

The H-Matrix estimator (a correlated χ²) is the poor man's variation of the Fisher discriminant.
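A sketch of the Fisher coefficients as (inverse covariance matrix) × (difference of class means), using ROOT's matrix classes; the overall normalisation is conventional, and this is not TMVA's implementation:

    #include "TMatrixD.h"
    #include "TMatrixDSym.h"
    #include "TVectorD.h"

    // F ∝ C^-1 (mean_S - mean_B); the event value is then y(i) = sum_k F_k x_k(i)
    TVectorD fisherCoefficients(const TMatrixDSym& C,
                                const TVectorD& meanSig, const TVectorD& meanBkg) {
       TVectorD diff = meanSig;
       diff -= meanBkg;              // difference of the class mean vectors
       TMatrixD cInv(C);             // copy the covariance matrix ...
       cInv.Invert();                // ... and invert it
       return cInv * diff;
    }

Evaluating the discriminant for an event is then just the scalar product of F with the event's variable vector.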
Slide 11 – Artificial Neural Network (ANN)

[Diagram: a multilayer perceptron – the N_var discriminating input variables feed 1 input layer (nodes 1 … N), k hidden layers (with M_1 … M_k nodes), and 1 output layer separating the 2 output classes (signal and background); each node applies a sigmoid "activation" function to the weighted sum of its inputs.]

TMVA has three different ANN implementations – all are multilayer perceptrons:
1. Clermont-Ferrand ANN: used for the ALEPH Higgs analysis; translated from FORTRAN
2. TMultiLayerPerceptron interface: the ANN implemented in ROOT
3. MLP ANN (new): similar to TMultiLayerPerceptron, but faster and more flexible; a recent comparison on tau ID in ATLAS with the Stuttgart NN confirmed its excellent performance!

ANNs are non-linear discriminants: a Fisher discriminant is an ANN without a hidden layer. ANNs seem to be better adapted to realistic use cases than Fisher and Likelihood (see the forward-pass sketch below).
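To make the activation picture concrete, a sketch of a single forward pass through one hidden layer (illustrative, not one of the three TMVA implementations; with the hidden layer removed, the response reduces to a sigmoid of a linear, Fisher-like combination of the inputs):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }   // "activation" function

    double mlpResponse(const std::vector<double>& x,                      // N_var inputs
                       const std::vector<std::vector<double>>& wHidden,   // [node][input] weights
                       const std::vector<double>& wOutput) {              // output-layer weights
       std::vector<double> hidden;
       for (const auto& w : wHidden) {
          double a = 0.0;
          for (std::size_t k = 0; k < x.size(); ++k) a += w[k] * x[k];
          hidden.push_back(sigmoid(a));   // node response = activation of the weighted sum
       }
       double out = 0.0;
       for (std::size_t j = 0; j < hidden.size(); ++j) out += wOutput[j] * hidden[j];
       return sigmoid(out);               // close to 1 for signal-like events
    }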
Slide 12 – Decision Trees

[Diagram: a decision tree grown from a root node holding N_signal and N_bkg events; at each depth (1, 2, 3, …) a cut splits a node into two daughter nodes, until the leaf nodes are classified as signal (N_signal > N_bkg) or background (N_signal < N_bkg).]

A decision tree is a sequential application of "cuts" which splits the data into nodes; the final nodes (leaves) classify an event as signal or background (a traversal sketch follows below).

Pros:
- a robust, out-of-the-box method
- immune against the addition of weak variables

Cons:
- instability: small changes in the training sample can give large changes in the tree structure
- overtraining: requires huge input statistics

Boosting (bagging, …) and pruning:
- boosting combines several decision trees into a forest, where each tree is optimised for different event weights; overall weights can also be given to the trees in the forest
- pruning removes statistically insignificant nodes
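A sketch of classifying one event by walking a tree of cut nodes (the node structure is made up for illustration; TMVA's decision-tree classes differ):

    #include <vector>

    struct TreeNode {
       int cutVariable = -1;        // index of the variable cut on; -1 marks a leaf
       double cutValue = 0.0;       // events with x[cutVariable] > cutValue go right
       bool leafIsSignal = false;   // leaf classification: N_signal > N_bkg ?
       const TreeNode* left = nullptr;
       const TreeNode* right = nullptr;
    };

    bool classify(const TreeNode* node, const std::vector<double>& x) {
       while (node->cutVariable >= 0)   // descend, applying one cut per node, until a leaf
          node = (x[node->cutVariable] > node->cutValue) ? node->right : node->left;
       return node->leafIsSignal;
    }

In a boosted forest, the same traversal runs once per tree and the per-tree votes are combined with the tree weights.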
Slide 13 – RuleFit

Reference: Friedman–Popescu, http://www-stat.stanford.edu/~jhf/R-RuleFit.html

The model is a linear combination of rules, where a rule is a sequence of cuts; the score assigned to the data is

   F(x) = a_0 + Σ_m a_m r_m(x),

with rules r_m(x) ∈ {0, 1} and coefficients a_m (see the sketch below).

The problem to solve is:
1. create the rule ensemble from a set of decision trees
2. fit the coefficients using "gradient directed regularization" (Friedman et al.)

Fast, robust and good performance.

Methods tested on 4k simulated SUSY + top events with 4 discriminating variables:
1. Multilayer Perceptron (MLP)
2. Fisher
3. RuleFit
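A sketch of evaluating a RuleFit score, with each rule a conjunction of rectangular cuts that either fires (1) or not (0) (illustrative data structures, not the Friedman–Popescu code):

    #include <vector>

    struct Cut  { int var; double lo, hi; };                      // fires if lo <= x[var] < hi
    struct Rule { std::vector<Cut> cuts; double coefficient; };   // a_m and its rule r_m

    double ruleFitScore(const std::vector<double>& x,
                        const std::vector<Rule>& rules, double offset) {
       double score = offset;                               // a_0
       for (const auto& rule : rules) {
          bool fires = true;                                // r_m(x): all cuts must hold
          for (const auto& c : rule.cuts)
             if (x[c.var] < c.lo || x[c.var] >= c.hi) { fires = false; break; }
          if (fires) score += rule.coefficient;             // add a_m when r_m(x) = 1
       }
       return score;
    }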
Slide 14 – TMVA: an Example
Slide 15 – A Truly Academic Example

A simple toy to illustrate the strength of the decorrelation technique: 4 linearly correlated Gaussians, with equal RMS and shifted means between signal and background (a generation sketch follows below).

TMVA output:

   --- TMVA_Factory: correlation matrix (signal):
   ----------------------------------------------------------------
   ---            var1     var2     var3     var4
   --- var1:    +1.000   +0.336   +0.393   +0.447
   --- var2:    +0.336   +1.000   +0.613   +0.668
   --- var3:    +0.393   +0.613   +1.000   +0.907
   --- var4:    +0.447   +0.668   +0.907   +1.000

   --- TMVA_MethodFisher: ranked output (top variable is best ranked)
   ----------------------------------------------------------------
   --- Variable :  Coefficient:  Discr. power:
   ----------------------------------------------------------------
   --- var4     :       +8.077         0.3888
   --- var3     :        3.417         0.2629
   --- var2     :        0.982         0.1394
   --- var1     :        0.812         0.0391
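A sketch of how such a toy could be generated: draw independent unit Gaussians and mix them linearly to introduce positive correlations, shifting the means for signal. The cumulative mixing scheme and the 0.5 shift are made up for illustration and do not reproduce the correlation matrix above:

    #include <cmath>
    #include <vector>
    #include "TRandom3.h"

    std::vector<double> toyEvent(TRandom3& rng, bool signal) {
       const int nVar = 4;
       double g[nVar];
       for (int k = 0; k < nVar; ++k) g[k] = rng.Gaus(0.0, 1.0);   // independent Gaussians
       std::vector<double> x(nVar);
       double sum = 0.0;
       for (int k = 0; k < nVar; ++k) {
          sum += g[k];                          // cumulative mixing creates positive,
          x[k] = sum / std::sqrt(k + 1.0);      // growing linear correlations; unit RMS kept
          if (signal) x[k] += 0.5;              // shifted mean for signal (made-up value)
       }
       return x;
    }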
Slide 16 – Quality Assurance

[Plots: the likelihood reference PDFs, the MLP training convergence, and the MLP network architecture.]
Slide 17 – The Output

TMVA output distributions for Fisher, Likelihood, BDT and MLP: Fisher, the decorrelated likelihood (LikelihoodD) and the ANNs are optimal here. Note: almost all realistic use cases are much more difficult than this one.
Slide 18 – Concluding Remarks

TMVA is still a young project! TMVA is open source: we want to gather as much development help as possible (from all areas, including real experts from statistics faculties – there is no one so far!) to make it a reference tool in HEP.

New MVA methods are important, but the emphasis lies on the consolidation of the currently implemented methods:
- implement fully independent, phase-space-dependent MVAs
- implement the use of event weights throughout all methods (missing: Cuts, Fisher, H-Matrix, RuleFit)
- improve the kernel estimators
- improve the decorrelation techniques
- expect a performance gain from combining MVAs in Committees

Although the number of users is rising, we still need more real-life experience! And we need to write a manual!