Machine Learning with TMVA: A ROOT-based Tool for Multivariate Data Analysis
DESY Computing Seminar, Hamburg, 14.1.2008
The TMVA developer team: Andreas Höcker, Peter Speckmeyer, Jörg Stelzer, Helge Voss

General Event Classification Problem
An event is described by k variables (found to be discriminating): (x_i) ∈ ℝ^k. Events can be classified into n categories H1 … Hn.
General classifier: f: ℝ^k → {1,…,n}. TMVA treats only n = 2, which is commonly the case in HEP (signal/background).
Most classification methods actually map f: ℝ^k → ℝ^d, (x_i) ↦ (y_i), followed by ℝ^d → {1,…,n}. TMVA uses d = 1: y ≥ y_sep is classified as signal, y < y_sep as background.
[Figure: example with k = 2 variables x1, x2 and n = 3 categories H1, H2, H3]

General Event Classification Problem
Example: k = 2 variables x1, x2 and n = 3 categories H1, H2, H3.
The problem: how to draw the boundaries between H1, H2, and H3 such that f(x) returns the true category of x with maximum correctness.
Candidate boundary shapes: rectangular cuts? linear boundaries? non-linear boundaries?
For a simple example like this one can do it by hand; in general one cannot.

General Event Classification Problem
With a large input variable space and complex correlations, manual optimization is very difficult. There are two general ways to build f(x):
- Supervised learning: the category of each event in the training sample is known. The machine adapts to give the smallest misclassification error on the training sample.
- Unsupervised learning: the correct category of each event is unknown. The machinery tries to discover structures in the dataset.
All classifiers in TMVA are supervised learning methods.
The question "what is the optimal boundary f(x) to separate the categories?" becomes, more pragmatically: "which classifier is best at finding this optimal boundary (or estimating it most closely)?" This is machine learning.

Classification Problems in HEP
In HEP mostly two-class problems: signal (S) and background (B), at many levels:
- Event level (Higgs searches, …)
- Cone level (tau-vs-jet reconstruction, …)
- Track level (particle identification, …)
- Lifetime and flavour tagging (b-tagging, …)
Input information:
- Kinematic variables (masses, momenta, decay angles, …)
- Event properties (jet/lepton multiplicity, sum of charges, …)
- Event shape (sphericity, Fox-Wolfram moments, …)
- Detector response (silicon hits, dE/dx, Cherenkov angle, shower profiles, muon hits, …)

Classifiers in TMVA

Rectangular Cut Optimization
Intuitive and simple: rectangular volumes in variable space. The technical challenge is the cut optimization:
- MINUIT fit (simplex) was found not to be reliable.
- Monte Carlo sampling: random scanning of the parameter space; inefficient for a large number of input variables.
- Genetic algorithm: the preferred method. Samples of cut-sets (a population) are evaluated, and the fittest individuals are cross-bred (including mutation) to create a new generation. The genetic algorithm can also be used as a standalone optimizer outside the TMVA framework.
- Simulated annealing: its performance still needs optimization. It simulates the slow cooling of metal, introducing a temperature-dependent perturbation probability to recover from local minima.
Cuts usually benefit from prior decorrelation of the cut variables. A booking sketch follows below.
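A minimal, hedged sketch of booking this classifier through the factory, using the same pattern as the TMVAnalysis.C script shown at the end of this talk; the exact option string is an assumption, the full list is in the Users Guide:

   // book rectangular cuts, optimized with the genetic algorithm (FitMethod=GA)
   factory->BookMethod( TMVA::Types::kCuts, "CutsGA",
                        "!H:!V:FitMethod=GA:EffMethod=EffSel" );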

Projective Likelihood Estimator (PDE)
Probability density estimators for each input variable are combined into a likelihood estimator (see the formula below).
This is the optimal MVA approach if the variables are uncorrelated; in practice this is rarely the case. Solution: decorrelate the input or use a different method.
The reference PDFs are automatically generated from the training data: histograms (counting), splines (order 2, 3, 5), or an unbinned kernel estimator.
The output of the likelihood estimator is often strongly peaked at 0 and 1. To ease output parameterization, TMVA applies the inverse Fermi transformation.
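The combination referred to above can be written compactly: with p_{S/B,k} the signal/background reference PDF of input variable k, the likelihood ratio for event i is (the standard form, matching the TMVA Users Guide):

   y_{\mathcal{L}}(i) = \frac{\mathcal{L}_S(i)}{\mathcal{L}_S(i) + \mathcal{L}_B(i)},
   \qquad
   \mathcal{L}_{S/B}(i) = \prod_{k=1}^{n_\mathrm{var}} p_{S/B,k}\big(x_k(i)\big)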

Estimating PDF Kernels
Technical challenge: how to estimate the PDF shapes. There are three ways:
- Parametric fitting (function): difficult to automate for arbitrary PDFs
- Nonparametric fitting: easy to automate, but can create artefacts or suppress information
- Event counting: automatic and unbiased, but suboptimal
TMVA implements nonparametric fitting: binned shape interpolation using spline functions (orders 1, 2, 3, 5) and unbinned kernel density estimation (KDE) with Gaussian smearing. TMVA performs automatic validation of the goodness-of-fit.
[Figure: comparison of the three approaches; the original distribution is Gaussian]
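For the unbinned kernel density estimation with Gaussian smearing mentioned above, a minimal statement of the textbook KDE estimator (the bandwidth h is a tuning parameter):

   \hat{p}(x) = \frac{1}{N h} \sum_{i=1}^{N} \frac{1}{\sqrt{2\pi}}
                \exp\!\left( -\frac{(x - x_i)^2}{2 h^2} \right)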

Multidimensional PDE (PDERS)
Extension of the one-dimensional PDE approach to n dimensions: count the signal and background reference events (from the training sample) in the vicinity V of the test event.
Volume V definition:
- Size: fixed (defined by the data: a percentage of max-min or of the RMS) or adaptive (defined by the number of events in the search volume)
- Shape: box or ellipsoid
The y_PDERS estimate within V can be improved by using various n-dimensional kernel estimators (functions of the normalized distance between test and reference events).
Practical challenges: a very large training sample is needed (the curse of dimensionality of kernel-based methods), and there is no training but slow evaluation. Search speed is improved with kd-tree event sorting (Carli-Koblitz, NIM A501, 576 (2003)).
[Figure: test event and search volume in the (x1, x2) plane with categories H0, H1]
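In its simplest counting form (ignoring kernel weighting and relative sample normalization, an assumption consistent with the description above), the PDERS discriminant for a test event with search volume V is:

   y_\mathrm{PDERS}(x) = \frac{n_S(V)}{n_S(V) + n_B(V)}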

Fisher's Linear Discriminant Analysis
A well-known, simple and elegant MVA method: the Fisher analysis determines an axis (F1,…,Fn) in the input-variable hyperspace such that the projection of events onto this axis separates signal and background as much as possible.
Optimal for linearly correlated Gaussian variables with different S and B means. A variable v with the same S and B sample means gets Fisher coefficient F_v = 0.
The projection uses W, the sum of the S and B covariance matrices, to compute the Fisher coefficients of the classifier (see below). A very transparent discriminator.
Function discriminant analysis (FDA), new in TMVA: fit any user-defined function of the input variables, requiring that signal events return 1 and background events 0. Parameter fitting: genetic algorithm, MINUIT, MC, and combinations. It easily reproduces the Fisher result but can add nonlinearities.
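The Fisher coefficients referred to above take the standard form (quoted up to an overall normalization), with W the sum of the signal and background covariance matrices and x̄_{S/B,l} the class means of variable l:

   F_k \propto \sum_{l=1}^{n_\mathrm{var}} W^{-1}_{kl}\,\big(\bar{x}_{S,l} - \bar{x}_{B,l}\big),
   \qquad
   y_\mathrm{Fi}(x) = F_0 + \sum_{k=1}^{n_\mathrm{var}} F_k\, x_k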

Artificial Neural Network (ANN)
Multilayer perceptron (MLP): fully connected, feed-forward, k hidden layers. ANNs are non-linear discriminants; the non-linearity comes from the activation function (Fisher is an ANN with a linear activation function).
Training: the back-propagation method. Signal and background events are randomly fed to the MLP, and the desired output {0,1} is compared with the received output (0,1): ε = d − r. The weights are then corrected depending on ε and the learning rate η.
Weierstrass theorem: an MLP can approximate every continuous function to arbitrary precision with just one hidden layer and an infinite number of nodes.
[Figure: network layout with N_var discriminating input variables, 1 input layer, k hidden layers, and 1 output layer with one output variable; typical activation function A]
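The weight correction sketched above is plain gradient descent on the training error; a minimal statement, assuming a quadratic error E between desired output d and received output r:

   w_{ij} \;\to\; w_{ij} - \eta\, \frac{\partial E}{\partial w_{ij}},
   \qquad
   E = \tfrac{1}{2} \sum_\mathrm{events} (d - r)^2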

Boosted Decision Trees (BDT)
A decision tree (DT) is a series of cuts that split the sample into ever smaller sets; leaves are assigned either S or B status. An event is classified by following the sequence of cuts matching its variable content until it ends in an S or B leaf.
Growing: each split tries to maximize the gain in separation (Gini index). A DT is dimensionally robust and easy to understand, but not powerful on its own.
1. Pruning: bottom-up pruning of the decision tree protects against overtraining by removing statistically insignificant nodes.
2. Boosting (AdaBoost): increase the weight of incorrectly classified events and build a new DT. The final classifier is a "forest" of DTs, linearly combined with large coefficients for DTs with small misclassification. This improves performance and stability.
A BDT requires only little tuning to achieve good performance.
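In its textbook form (TMVA's exact normalization may differ), the AdaBoost step above reweights misclassified events and combines the forest with misclassification-dependent coefficients; with err_m the weighted misclassification rate of tree h_m:

   \alpha_m = \ln\frac{1-\mathrm{err}_m}{\mathrm{err}_m},
   \qquad
   w_i \to w_i\, e^{\alpha_m}\ \text{(misclassified events)},
   \qquad
   y(x) = \sum_m \alpha_m\, h_m(x)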

Predictive Learning via Rule Ensembles (RuleFit)
Following the RuleFit approach by Friedman-Popescu (Tech Rep, Stat. Dpt, Stanford U., 2003).
The model is a linear combination of rules, where a rule is a sequence of cuts defining a region in the input parameter space (r_m = 1 if all cuts are satisfied, 0 otherwise). The RuleFit classifier is the sum of rules plus a linear Fisher term in the normalized discriminating event variables (see below).
The problem to solve:
- Create the rule ensemble: use a forest of decision trees, either from a BDT or from a random-forest generator (TMVA).
- Fit the coefficients a_m, b_k, minimizing the risk of misclassification (Friedman et al.).
- Pruning removes topologically equal rules (same variables in the cut sequence).
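Written out, the model described above (sum of rules plus linear Fisher term) reads:

   y_\mathrm{RF}(x) = a_0 + \sum_{m=1}^{M_R} a_m\, r_m(x)
                    + \sum_{k=1}^{n_\mathrm{var}} b_k\, x_k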

Support Vector Machine (SVM)
Find the hyperplane between linearly separable signal and background that gives the best separation: maximum distance (margin) between the closest events (support vectors) and the hyperplane (1962). Wrongly classified events add an extra term to the cost function that is minimized.
Non-linear cases: transform the variables into a higher-dimensional space where a linear boundary (hyperplane) can again separate the data (developed only in the mid-'90s). The explicit transformation is not required: the cost function depends only on scalar products between events, so kernel functions are used to approximate the scalar products between the transformed vectors in the higher-dimensional space. Choose a kernel and fit the hyperplane using the linear techniques developed above.
Available kernels: Gaussian, polynomial, sigmoid.
[Figure: separable and non-separable data, optimal hyperplane, margin, and support vectors in the (x1, x2) plane]
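As an example of the kernel substitution described above, the Gaussian kernel replaces the scalar product of the transformed vectors by:

   K(\vec{x}, \vec{y}) = \exp\!\left( -\frac{|\vec{x} - \vec{y}|^2}{2\sigma^2} \right)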

Data Preprocessing: Decorrelation
Various classifiers perform sub-optimally in the presence of correlations between input variables (Cuts, projective likelihood); others become slower (BDT, RuleFit).
Linear correlations are removed by rotating the input variables: determine the square root C' of the covariance matrix C, i.e. C = C'C', and transform the original variables x into the decorrelated variable space via x' = C'^(-1) x.
Principal Component Analysis (PCA) is also implemented.
Note that the decorrelation is only complete if the correlations are linear and the input variables are Gaussian distributed, which in general is not a very accurate conjecture.
[Figure: variable correlations before (original) and after square-root and PCA decorrelation]

Is There a Best Classifier?
Criteria to judge by:
- Performance: in the presence/absence of linear/nonlinear correlations
- Speed: training and evaluation time
- Robustness and stability: sensitivity to overtraining and to weak input variables; required size of the training sample
- Dimensional scalability: do performance, speed, and robustness deteriorate with large dimensions?
- Clarity: can the learning procedure/result be easily understood/visualized?

No Single Best
[Table: qualitative ratings of the classifiers (Cuts, Likelihood, PDERS/k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit, SVM) against the criteria of the previous slide: performance with no/linear and with nonlinear correlations, training and response speed, robustness against overtraining, weak input variables and the curse of dimensionality, and clarity; the rating symbols are not recoverable from the transcript]

What is TMVA
Motivation: classifiers perform very differently depending on the data, so all should be tested on a given problem. For many years the typical situation was that analysts investigated only a small number of classifiers. A tool was needed that lets the analyst evaluate the performance of a large number of classifiers on his/her dataset simultaneously.
Design criteria: performance and convenience (a good tool does not have to be difficult to use):
- Training, testing, and evaluation of many classifiers in parallel
- Preprocessing of input data: decorrelation (PCA, Gaussianization)
- Illustrative tools to compare the performance of all classifiers (ranking of classifiers, ranking of input variables, choice of working point)
- Active protection against overtraining
- Straightforward application to test data
The special needs of high-energy physics are addressed: two classes, event weights, familiar terminology.

Using TMVA
A typical TMVA analysis consists of two main steps:
- Training phase: training, testing, and evaluation of classifiers using data samples with known signal and background composition
- Application phase: using selected trained classifiers to classify unknown data samples

Technical Aspects
TMVA is open source, written in C++, and based on and part of ROOT. Development takes place on SourceForge, where all information can be found: http://sf.tmva.net. Bundled with ROOT since release 5.11-03.
Training requires the ROOT environment; the resulting classifiers are also available as standalone C++ code.
Six core developers, many contributors; more than 1400 downloads since March 2006 (not counting ROOT users). A mailing list exists for reporting problems.
Users Guide at http://sf.tmva.net: 97 pages with classifier descriptions and code examples; also available as arXiv physics/0703039.

Training with TMVA
The user usually starts with the template TMVAnalysis.C (also available as .py), found at $TMVA/macros/ and $ROOTSYS/tmva/test/:
- Choose the training variables
- Choose the input data
- Select the classifiers and adjust their training options (described in the manual; specifying option 'H' prints help)
The training results can then be inspected with the TMVA GUI.

Evaluation Output
Evaluation results ranked by best signal efficiency and purity (area):

------------------------------------------------------------------------------
MVA           Signal efficiency at bkg eff. (error):       | Sepa-   Signifi-
Methods:      @B=0.01    @B=0.10    @B=0.30    Area        | ration: cance:
Fisher      : 0.268(03)  0.653(03)  0.873(02)  0.882       | 0.444   1.189
MLP         : 0.266(03)  0.656(03)  0.873(02)  0.882       | 0.444   1.260
LikelihoodD : 0.259(03)  0.649(03)  0.871(02)  0.880       | 0.441   1.251
PDERS       : 0.223(03)  0.628(03)  0.861(02)  0.870       | 0.417   1.192
RuleFit     : 0.196(03)  0.607(03)  0.845(02)  0.859       | 0.390   1.092
HMatrix     : 0.058(01)  0.622(03)  0.868(02)  0.855       | 0.410   1.093
BDT         : 0.154(02)  0.594(04)  0.838(03)  0.852       | 0.380   1.099
CutsGA      : 0.109(02)  1.000(00)  0.717(03)  0.784       | 0.000   0.000
Likelihood  : 0.086(02)  0.387(03)  0.677(03)  0.757       | 0.199   0.682
------------------------------------------------------------------------------

Testing efficiency compared to training efficiency (overtraining check):

MVA           Signal efficiency: from test sample (from training sample)
Methods:      @B=0.01         @B=0.10         @B=0.30
Fisher      : 0.268 (0.275)   0.653 (0.658)   0.873 (0.873)
MLP         : 0.266 (0.278)   0.656 (0.658)   0.873 (0.873)
LikelihoodD : 0.259 (0.273)   0.649 (0.657)   0.871 (0.872)
PDERS       : 0.223 (0.389)   0.628 (0.691)   0.861 (0.881)
RuleFit     : 0.196 (0.198)   0.607 (0.616)   0.845 (0.848)
HMatrix     : 0.058 (0.060)   0.622 (0.623)   0.868 (0.868)
BDT         : 0.154 (0.268)   0.594 (0.736)   0.838 (0.911)
CutsGA      : 0.109 (0.123)   1.000 (0.424)   0.717 (0.715)
Likelihood  : 0.086 (0.092)   0.387 (0.379)   0.677 (0.677)

Remark on overtraining: it occurs when the classifier training becomes sensitive to the events of the particular training sample rather than just to its generic features. Sensitivity to overtraining depends on the classifier: e.g., Fisher is insensitive, a BDT very sensitive. Detect overtraining by comparing the performance between training and test samples; counteract it by, e.g., smoothing likelihood PDFs or pruning decision trees.

More Evaluation Output
Input variable ranking (how useful is a variable?):

--- Fisher : Ranking result (top variable is best ranked)
--- Fisher : ------------------------------------------------
--- Fisher : Rank : Variable : Discr. power
--- Fisher :    1 : var4     : 2.175e-01
--- Fisher :    2 : var3     : 1.718e-01
--- Fisher :    3 : var1     : 9.549e-02
--- Fisher :    4 : var2     : 2.841e-02

Classifier correlation and overlap (do the classifiers perform the same separation into signal and background?):

--- Factory : Inter-MVA overlap matrix (signal):
--- Factory : ------------------------------
--- Factory :             Likelihood  Fisher
--- Factory : Likelihood:     +1.000  +0.667
--- Factory : Fisher:         +0.667  +1.000

If two classifiers have similar performance but significantly non-overlapping classifications, check whether you can combine them!

Graphical Evaluation
Classifier output distributions for an independent test sample:
[Figure: signal and background output distributions of the trained classifiers]

Graphical Evaluation
There is no unique way to express the performance of a classifier, so several benchmark quantities are computed by TMVA:
- Signal efficiency at various background efficiencies (= 1 − rejection) when cutting on the classifier output
- The separation
- The "Rarity" (defined below, such that the background is flat): allows comparing signal shapes between different classifiers; a quick check is that the background in data should come out flat.
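The Rarity mentioned above is the cumulative background distribution of the classifier output y, flat for background by construction (the convention of the TMVA Users Guide):

   R(y) = \int_{-\infty}^{y} \hat{p}_B(y')\, dy'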

Visualization Using the GUI
Projective likelihood PDFs, MLP training progress, BDTs, …
[Figure: GUI example plots; for the BDTs, the average number of nodes before/after pruning was 4193 / 968]

Choosing a Working Point
Depending on the problem, the user might want to achieve a certain signal purity, signal efficiency, or background reduction, or find the selection that yields the highest signal significance (depending on the expected signal and background statistics).
Using the TMVA graphical output, one can determine at which classifier output value to cut to separate signal from background.
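As an illustration of the significance optimization mentioned above (a common figure of merit, not prescribed by TMVA): with N_S and N_B the expected signal and background yields after the cut, one maximizes

   \frac{N_S}{\sqrt{N_S + N_B}}

over the classifier output cut value.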

Applying the Trained Classifier
Use the TMVA::Reader class; an example is in TMVApplication.C (templates TMVApplication.C and ClassApplication.C are available at $TMVA/macros/ and $ROOTSYS/tmva/test/):
- Set the input variables
- Book the classifier with the weight file (it contains all necessary information)
- Compute the classifier response inside the event loop and use it
A standalone C++ class without ROOT dependence is also generated. From ClassApplication.C:

   std::vector<std::string> inputVars = …;          // names of the input variables
   ReadMLP* classifier = new ReadMLP( inputVars );  // standalone class generated by TMVA
   for (int i = 0; i < nEv; i++) {
      std::vector<double> inputVec = …;             // this event's variable values
      double retval = classifier->GetMvaValue( inputVec );
   }

Extending TMVA
A user might have their own implementation of a multivariate classifier, or might want to use an external one. Since ROOT 5.18.00 (16 Jan 2008), the user can seamlessly evaluate and compare their own classifier within TMVA:
- Requirement: the class must be derived from TMVA::MethodBase and must implement the TMVA::IMethod interface
- The class must be added to the factory via ROOT's plugin mechanism (see the sketch below)
Training, testing, evaluation, and comparison can then be done as usual; an example is in TMVAnalysis.C.
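A minimal sketch of the plugin registration mentioned above, using ROOT's TPluginManager; the handler base string, class and library names, and constructor prototype are illustrative assumptions, the working example is in TMVAnalysis.C:

   // register a hypothetical user classifier TMVA::MethodMyClass (assumed names)
   gROOT->GetPluginManager()->AddHandler(
       "TMVA@@MethodBase",           // plugin base the TMVA factory looks up
       "MyClass",                    // method name used when booking
       "TMVA::MethodMyClass",        // user class derived from TMVA::MethodBase
       "MyClassLib",                 // shared library providing it
       "MethodMyClass(TString,TString,DataSet&,TString)" ); // assumed ctor prototype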

A Word on the Treatment of Systematics
There is no difference in principle between systematics evaluation for single discriminating variables and for MV classifiers: use a control sample to estimate the uncertainty on the classifier output (not necessarily for each input variable). Advantage: correlations are automatically taken into account.
Some things can be done. Example: var4 may in reality have a shifted central value and hence a worse discrimination power. One can ignore this systematic in the training; var4 then appears stronger in training than it might really be, giving suboptimal performance (a bad training, not a wrong one): the classifier response will depend strongly on var4 and hence carry a larger systematic uncertainty.
Better: train with a shifted (weakened) var4, then evaluate the systematic error on the classifier output.

Conclusion Remarks
Multivariate classifiers are no black boxes; we just need to understand them. Cuts and likelihood are transparent: if they perform, use them. In the presence of correlations other classifiers are better, though harder to understand in any case. Their acceptance in HEP has grown enormously over the recent decade.
TMVA provides the means to train, evaluate, compare, and apply different classifiers, and tries, through visualization, to improve the understanding of the internals of each classifier.
Acknowledgments: The fast development of TMVA would not have been possible without the contribution and feedback from many developers and users, to whom we are indebted. We thank in particular the CERN summer students Matt Jachowski (Stanford) for the implementation of TMVA's new MLP neural network, Yair Mahalalel (Tel Aviv) for a significant improvement of PDERS, and Or Cohen for the development of the general classifier boosting; the Krakow student Andrzej Zemla and his supervisor Marcin Wolter for programming a powerful Support Vector Machine; and Rustem Ospanov for the development of a fast k-NN algorithm. We are grateful to Doug Applegate, Kregg Arms, René Brun and the ROOT team, Tancredi Carli, Zhiyi Liu, Elzbieta Richter-Was, Vincent Tisserand and Alexei Volk for helpful conversations.

Outlook
Primary development from this summer: generalized classifiers.
1. Be able to boost or bag any classifier
2. Combine any classifier with any other classifier, using any combination of input variables in any phase-space region
Item 1 is ready and now in testing mode; it will be deployed after the upcoming ROOT release.

A Few Toy Examples

Checker Board Example
Performance achieved without parameter tuning: PDERS and BDT are the best "out of the box" classifiers. After specific tuning, SVM and MLP also perform well and approach the theoretical maximum.
[Figure: checkerboard toy distribution and resulting classifier performance]

Linear, Cross, and Circular Correlations
Three toy datasets illustrate the behavior of linear and nonlinear classifiers:
- Linear correlations (same for signal and background)
- Linear correlations (opposite for signal and background)
- Circular correlations (same for signal and background)
[Figure: the three toy distributions]

Linear, Cross, and Circular Correlations
Test events plotted weighted by classifier output (red: signal-like, blue: background-like) for linear correlations (same for signal and background), cross-linear correlations (opposite for signal and background), and circular correlations (same for signal and background).
[Figure: weighted test events for Fisher, MLP, BDT, PDERS, Likelihood, and Likelihood-D]

Final Performance
Background rejection versus signal efficiency curves for the three toy examples.
[Figure: ROC curves for the linear, cross, and circular examples]

Additional Information

Stability with Respect to Irrelevant Variables
Toy example with 2 discriminating and 4 non-discriminating variables, comparing the case where all variables are used in the classifiers with the case where only the two discriminant variables are used.
[Figure: performance comparison between the two setups]

TMVAnalysis.C Script for Training

void TMVAnalysis( )
{
   // create the Factory with an output file for the evaluation histograms
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );
   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   // give training/test trees
   TFile *input = TFile::Open("tmva_example.root");
   factory->AddSignalTree    ( (TTree*)input->Get("TreeS"), 1.0 );
   factory->AddBackgroundTree( (TTree*)input->Get("TreeB"), 1.0 );

   // register input variables
   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3", 'F');
   factory->AddVariable("var4", 'F');

   factory->PrepareTrainingAndTestTree("",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   // select MVA methods
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   // train, test and evaluate
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}

TMVApplication.C Script for Application

void TMVApplication( )
{
   // create the Reader
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   // register the variables
   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3", &var3 );
   reader->AddVariable( "var4", &var4 );

   // book classifier(s) with the training weight file
   reader->BookMVA( "MLP classifier", "weights/MVAnalysis_MLP.weights.txt" );

   // prepare the event loop
   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");
   // … set branch addresses for user TTree

   for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);

      // compute the input variables
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      var3 = userVar3;
      var4 = userVar4;

      // calculate the classifier output
      Double_t out = reader->EvaluateMVA( "MLP classifier" );
      // do something with it …
   }
   delete reader;
}