TMVA – Toolkit for Multivariate Analysis

Presentation transcript:

TMVA – Toolkit for Multivariate Analysis
Jörg Stelzer(*) – DESY, Hamburg
ACAT 2008, 3rd-7th Nov, Erice, Sicily
(*) For the TMVA developer team: A. Höcker, P. Speckmayer, J. Stelzer, H. Voss, and many contributors

MVA in HEP
In a time of ever larger datasets with ever smaller signal fractions, it becomes increasingly important to use all the features of the signal and background data that are present. That means that not only the individual variable distributions of the signal must be evaluated against the background, but in particular the information hidden in the variable correlations must be explored. The possible complexities of data distributed in a high-dimensional space are difficult to disentangle manually and without the help of automated tools. Automated multivariate analysis is therefore needed for data analysis in High Energy Physics.

Outline of the Presentation
- The basics of multivariate analysis
- A short introduction to the idea of TMVA and its history
- A quick survey of the available classifiers, and how to judge their performance
- An outlook to TMVA 4

Event Classification
Suppose a data sample with two types of events, H0 and H1, for which we have found discriminating input variables x1, x2, … What decision boundary should be used to separate events of type H0 from those of type H1: rectangular cuts, a linear boundary, or a nonlinear one? How can we decide this in an optimal way? Let the machine do it, using multivariate classifiers!
[Figure: H0 and H1 events in the (x1, x2) plane with examples of cut-based, linear and nonlinear decision boundaries.]

Multivariate (Binary) Classifiers
All multivariate binary classifiers have in common that they condense the (correlated) multi-variable input information into a single scalar output variable. This is a regression problem from the N-dimensional feature space onto a one-dimensional real number, y: R^N → R. The classification is then a mapping from R to the class (binary: signal or background) via the definition of a "cut": signal if y >= Xcut, background if y < Xcut. The choice of the cut is driven by the type of analysis. The cut classifier is an exception: it maps directly from R^N to {Signal, Background}.
[Figure: the classifier output distributions y(H0) and y(H1) as functions of y(x), separated by the cut value Xcut.]
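For illustration only, a minimal sketch of how such a scalar output y(x) is evaluated and cut on in practice with the TMVA::Reader class; the variable names, method label, weight-file path and cut value Xcut below are placeholders rather than anything specified in the talk.

```cpp
// Minimal sketch (placeholder names throughout): evaluate a trained classifier's
// scalar output y(x) and apply the cut y >= Xcut to classify an event.
#include "TMVA/Reader.h"

void applyClassifier() {
   float var1 = 0, var2 = 0;                   // the same input variables as used in training
   TMVA::Reader reader("!Color:Silent");
   reader.AddVariable("var1", &var1);
   reader.AddVariable("var2", &var2);
   reader.BookMVA("BDT", "weights/TMVAClassification_BDT.weights.txt");  // placeholder weight file

   const double Xcut = 0.0;                    // analysis-dependent working point
   // ... inside the event loop: fill var1 and var2 from the current event, then
   double y = reader.EvaluateMVA("BDT");       // the scalar output y(x): R^N -> R
   bool isSignal = (y >= Xcut);                // signal if y >= Xcut, background otherwise
   (void)isSignal;
}
```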

TMVA – A Simple Idea
A large number of classifiers exist in different places and languages: neural-net libraries, BDT implementations, likelihood fitting packages. The TMVA approach: rather than just re-implementing MVA techniques in ROOT,
- have one common platform / interface for all MVA classifiers,
- have common data pre-processing capabilities,
- provide a common data input and analysis framework (ROOT scripts),
- train and test all classifiers on the same data sample and evaluate them consistently,
- support classifier application with and without ROOT, through macros, C++ executables or python.
TMVA is hosted on SourceForge, http://tmva.sf.net/ (mailing list), and has been integrated in ROOT since ROOT v5.11/03. Currently there are 4 core developers and many active contributors. Users Guide: arXiv physics/0703039.
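As an illustration of this common interface, here is a minimal sketch of what a Factory-based training macro typically looks like (in the style of the TMVA example macros); the file and tree names, variables and method option strings are placeholders, not values from the talk.

```cpp
// Minimal sketch of a Factory-based TMVA training macro (ROOT/C++).
// File names, tree names, variables and option strings are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void trainTMVA() {
   TFile* input   = TFile::Open("data.root");         // placeholder input file
   TTree* sigTree = (TTree*)input->Get("TreeS");      // signal tree (placeholder name)
   TTree* bkgTree = (TTree*)input->Get("TreeB");      // background tree (placeholder name)

   TFile* outFile = TFile::Open("TMVA.root", "RECREATE");
   TMVA::Factory factory("TMVAClassification", outFile, "!V:!Silent");

   // One common data sample and variable set for all booked classifiers
   factory.AddSignalTree(sigTree, 1.0);
   factory.AddBackgroundTree(bkgTree, 1.0);
   factory.AddVariable("var1", 'F');
   factory.AddVariable("var2", 'F');
   factory.PrepareTrainingAndTestTree("", "SplitMode=Random:!V");

   // Book several classifiers through the same interface
   factory.BookMethod(TMVA::Types::kFisher, "Fisher", "!V");
   factory.BookMethod(TMVA::Types::kMLP,    "MLP",    "!V:NCycles=500");
   factory.BookMethod(TMVA::Types::kBDT,    "BDT",    "!V:NTrees=400");

   // Train, test and evaluate all booked methods consistently on the same samples
   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();

   outFile->Close();
}
```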

Features of the TMVA Data Handling
- Data input format: ROOT TTree or ASCII
- Supports selection of any subset, combination or function of the available variables and arrays (var1="sin(x)+3*y", var2="…"), just like ROOT's TTree::Draw()
- Supports application of pre-selection cuts (possibly independent for signal and background)
- Supports global event weights for the signal or background input files
- Supports the use of any input variable as an individual event weight
- Supports various methods for splitting the data into training and test samples: block-wise, randomly, periodically, or user-defined training and test trees
- Preprocessing of the input variables (e.g., decorrelation, PCA) improves the performance of the projective likelihood
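Continuing the sketch above, the data-handling features listed here map roughly onto Factory calls like the following; every name, weight and cut is hypothetical.

```cpp
// Continuation of the training-macro sketch above: data-handling options
// (all variable names, weights and cuts are hypothetical placeholders).
factory.AddVariable("sin(x)+3*y", 'F');             // formula of tree branches, as in TTree::Draw()
factory.AddVariable("var2", 'F');

factory.AddSignalTree(sigTree, 2.0);                // global event weight for the signal sample
factory.AddBackgroundTree(bkgTree, 1.0);            // global event weight for the background sample

factory.SetSignalWeightExpression("evtWeight");     // per-event weight taken from a branch/expression
factory.SetBackgroundWeightExpression("evtWeight");

// Pre-selection cuts (may differ for signal and background) and the training/test split
TCut sigCut = "pt > 20";                            // hypothetical pre-selection cuts
TCut bkgCut = "pt > 20 && !isSignalLike";
factory.PrepareTrainingAndTestTree(sigCut, bkgCut, "SplitMode=Random:!V");  // or Block / Alternate
```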

Quick Overview of the Methods
Conventional linear classifiers (still popular, but should only be used if non-linear correlations are absent):
- Cut based (still widely used since transparent)
- Projective likelihood estimator (optimal if there are no correlations)
- Linear Fisher discriminant (robust and fast)
Common non-linear classifiers:
- Neural network (very powerful, but the training often ends in local minima)
- PDE: Range Search, kNN, Foam (multi-dimensional likelihood → optimal classification)
- Function discriminant analysis
Modern classifiers, recent in HEP:
- Boosted decision trees (brute force, not much tuning necessary)
- Support vector machine (one global minimum, but careful tuning necessary)
- Learning via rule ensembles
A detailed description of all classifiers, with references, can be found in the Users Guide.

Evaluation
For the training of the classifiers, see the Users Guide, the presentations (http://tmva.sf.net/talks.shtml) or the tutorial (https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome). A new web page lists the classifier tuning parameters: http://tmva.sf.net/optionRef.html
The evaluation is based on the classifier output variable for the test sample and on the receiver operating characteristic (ROC) curve: a scan over the classifier output variable creates a set of (εsig, εbkg) points.
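A generic sketch (plain C++, not TMVA code) of such a scan: for a series of cut values on the classifier output, count the fraction of signal and of background test events passing the cut to obtain the (εsig, εbkg) points.

```cpp
// Generic sketch (not TMVA code): scan cut values on the classifier output of
// signal and background test events to obtain (eps_sig, eps_bkg) points of the ROC curve.
#include <algorithm>
#include <vector>

struct RocPoint { double epsSig, epsBkg; };   // signal efficiency, background efficiency

std::vector<RocPoint> rocScan(const std::vector<double>& ySig,
                              const std::vector<double>& yBkg,
                              int nSteps = 100) {
   double lo = std::min(*std::min_element(ySig.begin(), ySig.end()),
                        *std::min_element(yBkg.begin(), yBkg.end()));
   double hi = std::max(*std::max_element(ySig.begin(), ySig.end()),
                        *std::max_element(yBkg.begin(), yBkg.end()));
   std::vector<RocPoint> roc;
   for (int i = 0; i <= nSteps; ++i) {
      double cut = lo + (hi - lo) * i / nSteps;
      auto passFrac = [cut](const std::vector<double>& y) {   // fraction of events with y >= cut
         long pass = std::count_if(y.begin(), y.end(), [cut](double v) { return v >= cut; });
         return double(pass) / double(y.size());
      };
      roc.push_back({passFrac(ySig), passFrac(yBkg)});        // true-positive and false-positive fractions
   }
   return roc;
}
```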

The ROC Curve
The ROC (receiver operating characteristics) curve describes the performance of a binary classifier by plotting the false-positive against the true-positive fraction.
- False positive (type 1 error) = εbkg: classifying a background event as signal → loss of purity
- False negative (type 2 error) = 1 - εsig: failing to identify a signal event as such → loss of efficiency
- True positive: εsig
Likelihood ratio test: the likelihood ratio used as "selection criterion" y(x) gives, for each selection efficiency, the best possible background rejection (Neyman-Pearson); it maximizes the area under the ROC curve (approximated by the PDE classifiers).
[Figure: ROC curve, 1 - εbkg versus εsig; the diagonal corresponds to random guessing, curves further towards the upper right to better classification, and the limit of the ROC curve is given by the likelihood ratio.]

Optimal Cut for Each Classifier
Working point: the optimal cut on a classifier output (i.e. the optimal point on the ROC curve) depends on the problem:
- Cross-section measurement: maximum of S/√(S+B)
- Search: maximum of S/√B
- Precision measurement: high purity
- Trigger selection: high efficiency
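As an example, the working point for a cross-section measurement can be picked by scanning the ROC points with the expected signal and background yields; a generic sketch with hypothetical yields, reusing the RocPoint type from the previous sketch.

```cpp
// Generic sketch: pick the working point maximizing S/sqrt(S+B), given the ROC
// scan above and hypothetical expected total yields nSigTot and nBkgTot.
#include <cmath>
#include <vector>

struct RocPoint { double epsSig, epsBkg; };   // as in the ROC-scan sketch

int bestWorkingPoint(const std::vector<RocPoint>& roc, double nSigTot, double nBkgTot) {
   int best = -1;
   double bestSignificance = -1.0;
   for (std::size_t i = 0; i < roc.size(); ++i) {
      double S = roc[i].epsSig * nSigTot;       // expected selected signal
      double B = roc[i].epsBkg * nBkgTot;       // expected selected background
      if (S + B <= 0.0) continue;
      double significance = S / std::sqrt(S + B);
      if (significance > bestSignificance) { bestSignificance = significance; best = int(i); }
   }
   return best;   // index of the ROC point (and hence cut value) maximizing S/sqrt(S+B)
}
```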

No Single Best Classifier
[Table: the classifiers Cuts, Likelihood, PDERS/k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit and SVM are rated by graphical symbols against the following criteria: performance with no/linear correlations and with nonlinear correlations; training speed and response speed; robustness against overtraining and against weak input variables; curse of dimensionality; transparency.]

TMVA 4 – Preview on New Developments
- Data regression
- Categorization: multi-class classification
- Automated classifier tuning using the cross-validation method
- Generic boosting or bagging of any classifier
- Composite classifiers (parallel training in different phase-space regions)
- Input data handling: arbitrary combinations of dataset transformations possible
Status: the TMVA framework for handling datasets and classifiers has been changed, regression training has been implemented for most classifiers, and the user interface has been extended.
New method PDE-Foam, based on the ROOT TFoam by S. Jadach (eprint physics/0203033); see the next talk.

Multivariate Regression
Here the "classifiers" try to describe a functional dependence; an example is predicting the energy correction of jet clusters. Classification: R^N → {0, 1, …, N}; regression: R^N → R. For the training, instead of specifying signal/background, one provides a regression target; a multi-dimensional target space is possible. This does not work for all methods!
[Figures: Δtarget vs. target on the test sample for different "classifiers" (linear discriminant, SVM, functional discriminant, PDE-Foam, MLP, PDE Range Search); example of a target as a function of two variables.]

Automated Classifier Tuning
Many classifiers have parameters that, when tuned, improve the performance. Overtraining means that the performance on the test sample is worse than on the training sample because the classifier follows the particular features of the training sample too closely; protect against it by choosing the right MLP training cycles, BDT splitting criteria and pruning strength, PDE range size, neural-network structure, etc.
Method for automated parameter tuning: cross-validation (a.k.a. rotation estimation), in particular K-fold cross-validation:
- Divide the data sample into K sub-sets.
- For a set of parameters α, train K classifiers Ci(α), i = 1..K, each time omitting the i-th subset from the training and using it as the test sample.
- Calculate the test error for each Ci and average.
- Choose the tuning-parameter set α for which the averaged test error is minimal and train the final classifier using all data.
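A generic sketch of the K-fold procedure (not the TMVA implementation), illustrated with a toy one-dimensional k-nearest-neighbour classifier whose tuning parameter is the number of neighbours k.

```cpp
// Generic sketch of K-fold cross-validation (not the TMVA implementation), using
// a toy 1D k-nearest-neighbour classifier whose tuning parameter is k.
#include <algorithm>
#include <cmath>
#include <vector>

struct Event { double x; int label; };   // one input variable; label 0 (background) or 1 (signal)

// Toy classifier: majority vote among the k nearest training events in x.
int knnClassify(const std::vector<Event>& train, double x, int k) {
   std::vector<std::pair<double, int>> d;                 // (distance, label)
   for (const Event& e : train) d.push_back({std::fabs(e.x - x), e.label});
   int kk = std::min(k, int(d.size()));
   std::partial_sort(d.begin(), d.begin() + kk, d.end());
   int votes = 0;
   for (int i = 0; i < kk; ++i) votes += d[i].second;     // count signal votes among the kk neighbours
   return (2 * votes > kk) ? 1 : 0;
}

// Misclassification rate of the toy classifier on an independent test sample.
double testError(const std::vector<Event>& train, const std::vector<Event>& test, int k) {
   int wrong = 0;
   for (const Event& e : test)
      if (knnClassify(train, e.x, k) != e.label) ++wrong;
   return test.empty() ? 0.0 : double(wrong) / double(test.size());
}

// K-fold cross-validation: average the test error over the K folds for tuning parameter k.
double crossValidationError(const std::vector<Event>& sample, int k, int K) {
   double sum = 0.0;
   for (int fold = 0; fold < K; ++fold) {
      std::vector<Event> train, test;
      for (std::size_t n = 0; n < sample.size(); ++n)
         (int(n % K) == fold ? test : train).push_back(sample[n]);   // hold the fold out as test sample
      sum += testError(train, test, k);
   }
   return sum / K;   // choose the k that minimizes this, then train the final classifier on all data
}
```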

Multiclass Classification
Some classifiers support this naturally: MLP and all likelihood-based approaches. Other classifiers need special treatment: train each class against the rest and make the class decision based on likelihood ratios built from the training outputs.
[Diagram: classes 1 to 6, each in turn taken as signal against the remaining classes as background.]
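A generic sketch of the resulting class decision (not TMVA code): with one binary "class i against the rest" output per class, the predicted class is simply the one with the largest output (or likelihood ratio).

```cpp
// Generic sketch (not TMVA code) of a one-vs-rest class decision: given the
// outputs of N binary classifiers, each trained "class i against the rest",
// assign the class with the largest output (or likelihood ratio).
#include <cstddef>
#include <vector>

int decideClass(const std::vector<double>& oneVsRestOutputs) {
   int best = 0;
   for (std::size_t i = 1; i < oneVsRestOutputs.size(); ++i)
      if (oneVsRestOutputs[i] > oneVsRestOutputs[best]) best = int(i);
   return best;   // index of the predicted class
}
```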

Generic Classifier Boosting
Principle: multiple training cycles in which wrongly classified events get a higher event weight each time. One training sample and N weight sets are used to train N classifiers: starting from the original event weights, Classifier 1 is trained; the wrongly classified events are then re-weighted by the factor E = N/Nwrong - 1 and Classifier 2 is trained on the new weights, and so on up to Classifier N. The response is the weighted sum of the individual classifier responses.
Tests should be performed with Cuts, MLP and SVM; likelihood-based classifiers and Fisher cannot be boosted.
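A generic sketch of this boosting loop (AdaBoost-style re-weighting with the factor E = N/Nwrong - 1 quoted above, here in its event-weighted form), illustrated with a toy one-dimensional cut as the weak classifier; this is not the TMVA implementation.

```cpp
// Generic sketch of the boosting loop (AdaBoost-style re-weighting with
// E = N/Nwrong - 1, here in its event-weighted form), using a toy 1D cut
// ("decision stump") as weak classifier. Not the TMVA implementation.
#include <cmath>
#include <vector>

struct Event { double x; int label; double weight; };  // label: +1 signal, -1 background

struct Stump { double cut; int sign; double alpha; };  // predict 'sign' if x >= cut, else -sign

int stumpClassify(const Stump& s, double x) { return (x >= s.cut) ? s.sign : -s.sign; }

// Train one stump: choose the cut position (tried at every event) and orientation
// that minimize the weighted misclassification rate.
Stump trainStump(const std::vector<Event>& ev) {
   Stump best{0.0, +1, 0.0};
   double bestErr = 1e30;
   for (const Event& cand : ev) {
      for (int si = 0; si < 2; ++si) {
         int sign = (si == 0) ? +1 : -1;
         double err = 0.0, sum = 0.0;
         for (const Event& e : ev) {
            sum += e.weight;
            int pred = (e.x >= cand.x) ? sign : -sign;
            if (pred != e.label) err += e.weight;
         }
         if (sum > 0.0 && err / sum < bestErr) { bestErr = err / sum; best = {cand.x, sign, 0.0}; }
      }
   }
   return best;
}

// Boosting loop: train nClassifiers stumps, re-weighting the wrongly classified
// events after each cycle (the event sample is copied, so the caller's weights
// stay untouched).
std::vector<Stump> boost(std::vector<Event> ev, int nClassifiers) {
   std::vector<Stump> ensemble;
   for (int c = 0; c < nClassifiers; ++c) {
      Stump s = trainStump(ev);
      double wWrong = 0.0, wAll = 0.0;
      for (const Event& e : ev) {
         wAll += e.weight;
         if (stumpClassify(s, e.x) != e.label) wWrong += e.weight;
      }
      if (wWrong <= 0.0 || wWrong >= wAll) break;             // nothing sensible left to boost
      double E = wAll / wWrong - 1.0;                         // boost factor, E = N/Nwrong - 1
      for (Event& e : ev)
         if (stumpClassify(s, e.x) != e.label) e.weight *= E; // re-weight the wrong events
      s.alpha = std::log(E);                                  // weight of this classifier in the sum
      ensemble.push_back(s);
   }
   return ensemble;
}

// Final response: weighted sum of the individual classifier responses.
double response(const std::vector<Stump>& ensemble, double x) {
   double y = 0.0;
   for (const Stump& s : ensemble) y += s.alpha * stumpClassify(s, x);
   return y;
}
```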

Further Plans
- TMVA is to be made multi-threaded and to run on multi-core architectures: automatic tuning is time consuming, and many users have multi-core machines that could take advantage of parallel computing.
- Composite classifiers: say you want to train a NN for tau selection differently in the detector barrel and endcap regions; the composite classifier can set this up and train automatically, you only need to specify the variable on which the region selection is based (η). Or assume an N-class problem in which you want to use N Fisher classifiers, one per class, to separate it from the rest, with a neural net on top to obtain a multi-class classifier. This is just for convenience and can already be done manually.
- The current stable TMVA version is 3.9.5 for ROOT 5.22 (middle of December); afterwards development moves to TMVA 4. Not everything will come at once: first regression and generic classifier boosting, then multi-class classification and automatic classifier tuning, and multi-core support some time along the way.

Summary
Event selection in a multi-dimensional parameter space should be done using multivariate techniques; the complex correlations between the input variables need "machine treatment". Optimal event selection is based on the likelihood ratio. PDE-RS, kNN and Foam estimate the probability density in N dimensions but suffer in high dimensions. Other classifiers with similar performance can be used in many cases:
- Fisher's linear discriminant → simple and robust
- Neural networks → very powerful but difficult to train (multiple local minima)
- Support vector machines → one global minimum but careful tuning needed
- Boosted decision trees → a "brute force method" with very good performance
Interesting new features are to come in TMVA 4: regression will be very useful for a large variety of applications, and the automatic tuning of parameters will improve the quality of the results, with no need to be an expert.