Multivariate Data Analysis in HEP: Successes, challenges and future outlook. "The presentation you are about to watch might look like (bad) parody. Its content is not fact checked. Its reporter is not a journalist. And his opinions might not be fully thought through." freely adapted from "John Stuart". Helge Voss (MPIK-Heidelberg), ACAT 2014, Prague
Outline: a personal view of MVA history in HEP; some highlights; challenges; what about the future?
A Short History Of MVA. Already in the very beginning, intelligent "Multivariate Pattern Recognition" was used to identify particles…. But later it became a bit 'out of fashion' with the advent of computers… although I guess some Fisher discriminants (linear decision boundary) were used here and there. If I remember correctly, my PhD supervisor mentioned such things being used back in MARKIII.
A Short History Of MVA. TAUPID (my first encounter; again, that might well not be the first or the most important): tau particle identification in ALEPH (LEP), and later OPAL, with a "Likelihood" classifier (naïve Bayesian). Gigi Rolandi.
A Short History Of MVA. … and of course elsewhere… although people have always felt that progress was somewhat slow…
A Short History Of MVA. … and MVA usage in 'particle searches' was 'taboo', until the LEP2 Higgs search set out to break that, sometimes facing fierce resistance, which was replied to like this: "If you are out looking for a spouse and apply 'hard cuts', you're also not going to get anywhere" (Wayne ?, OPAL). NOTE: by the mid '90s, ANNs were 'already' out of fashion in the machine learning community. Why didn't we pick up on SVMs or Boosted Decision Trees (1996)??
Sophisticated Multivariate Techniques Pay! "The searches for the Standard Model Higgs boson carried out by the four LEP experiments extended the sensitive range well beyond that anticipated at the beginning of the LEP programme [26]. This is due to the higher energy achieved and to more sophisticated detectors and analysis techniques." The LEP Working Group for Higgs Boson Searches, Physics Letters B 565 (2003) 61–75. Well… sure, the 'sophistication' is NOT ONLY multivariate techniques, but they surely play their part.
Sophisticated Multivariate Techniques Pay! And obviously… the other side of the Atlantic was at least as active…
..and even ventured to Boosted Decision Trees! The D0 single-top discovery and MiniBooNE (B. Roe et al., NIM A 543 (2005)) made BDTs popular in HEP.
CMS Higgs Discovery (such a nice example for MVA usage): MVA regression for energy calibration.
LHCb B_s → μμ
MVA Techniques are Fantastic, and can be successfully employed in numerous places: detector signal reconstruction (e.g. cluster energy in a calorimeter), particle reconstruction/classification, event classification, automatic fault detection (at the machine? at the detector: online histograms…). Almost everything we 'look at' or 'study' depends on multiple variables.
Issues to be concerned about: which variables to choose; which classifier (modelling flexibility); testing the generalisation properties of the fitted model; issues with limited training/testing/validation sample sizes; and of course, the never-ending story of systematic uncertainties.
MVA in the presence of systematic uncertainties: minimize 'systematic' uncertainties (robustness). With 'classical cuts': do not cut near steep edges, or in regions of large systematic uncertainty. This is hard to translate to MVAs: instead, artificially degrade the discriminative power (by shifting/smearing) of systematically 'uncertain' observables IN THE TRAINING. Remove/smooth the 'edges' and the MVA does not try to exploit them (see the sketch below).
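A minimal sketch of this smearing idea, assuming numpy feature matrices and a hypothetical column 0 that carries a ~5% scale systematic; none of these names or numbers come from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)

def smear_uncertain_column(X, col=0, rel_sigma=0.05):
    """Degrade the discriminative power of one observable IN THE TRAINING
    sample by Gaussian scale smearing, so the classifier cannot exploit
    sharp, systematically uncertain edges of that variable."""
    X = X.copy()
    X[:, col] *= rng.normal(1.0, rel_sigma, size=len(X))
    return X

# X_train is a hypothetical (n_events, n_vars) feature matrix built elsewhere:
# X_train_robust = smear_uncertain_column(X_train, col=0, rel_sigma=0.05)
# ... then train any classifier on X_train_robust instead of X_train.
```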
MVA in the presence of systematic uncertainties: MVA decision boundaries. A looser MVA cut simply gives wider boundaries in BOTH variables, while you actually want a boundary like THIS: tight boundaries in var1, loose boundaries in var0. Training on smeared inputs gives exactly that: YES, it works! Sure, it is of course similar to 'mixing different Monte Carlos', as proposed earlier by others…
What are MVAs?
Multivariate Event Classification. Each event, whether signal or background, has D measured variables living in a D-dimensional 'feature space'. Find a mapping y(x): R^D → R from the D-dimensional input/observable/'feature' space to a one-dimensional output, with class labels y(B) → 0 and y(S) → 1. y(x) is a 'test statistic' in the D-dimensional space of input variables: the distributions of y(x), PDF_S(y) and PDF_B(y), overlap, and their overlap determines the achievable separation power and purity. y(x) = const defines a surface, the decision boundary: y(x) > cut: signal; y(x) = cut: decision boundary; y(x) < cut: background. Efficiency and purity are used to set the selection cut!
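As a concrete illustration, a minimal numpy sketch of the simplest such y(x): a Fisher (linear) discriminant on toy Gaussian signal/background samples. All names, shapes and the cut value are assumptions for the example, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2
sig = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(1000, D))   # toy signal
bkg = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(1000, D)) # toy background

# Fisher weights: w ∝ S_W^-1 (mean_S - mean_B), S_W = within-class covariance
S_W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
w = np.linalg.solve(S_W, sig.mean(axis=0) - bkg.mean(axis=0))

y_sig, y_bkg = sig @ w, bkg @ w   # the test statistic y(x) for each event
cut = 0.0                          # y(x) > cut  ->  call the event 'signal'
eff = (y_sig > cut).mean()                                     # efficiency
pur = (y_sig > cut).sum() / ((y_sig > cut).sum() + (y_bkg > cut).sum())
print(f"signal efficiency = {eff:.2f}, purity = {pur:.2f}")
```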
What is y(x)?? A neural network, a decision tree, a boosted decision tree forest, a multidimensional/projective likelihood… The same stuff out of which the 'dreams' of AI are born, or better: which comes from AI or general 'pattern recognition' research. Stuff that powers today's 'BIG DATA' business, search engines and social network studies for targeted advertisement… Stuff that is evolving extremely fast!!
Where do we go from here? We in HEP like to think that we are 'state of the art'. In MVA techniques we've certainly always lagged behind… and the gap is, in my opinion, growing rather than closing!
Deep Networks. Yes… but look at the date: 2014! Deep networks became 'mainstream' after 2006 (Hinton's deep-belief-network breakthrough), and by 2012 'Google learned to find cats'. They have since revolutionised the fields of speech and image recognition.
2012: GOOGLE finds the Cat
What is "DEEP LEARNING"? A fairly 'standard' neural network architecture with MANY hidden layers. Conventional training (backpropagation from random starting weights) proves ineffective; instead, individual layers are pre-trained as 'auto-encoders' or 'Restricted Boltzmann Machines' (networks that are trained unsupervised and can 'learn' a probability density).
2006? While the world embraced 'DEEP learning', we celebrated MiniBooNE's first venture into BDTs, TMVA was shipped as a ROOT 'add-on package', and the majority of HEP physicists started to like MVAs. The easy accessibility of TMVA via ROOT and its ease of use are key ingredients of its success over 'competitors' like StatPatternRecognition or other non-HEP packages.
2006? Endless discussions about possible 'problems' with MVAs (systematics etc.) show: we (physicists) are smart and try to really understand what we are doing. But it also shows: sophisticated techniques are of course more difficult to grasp than 'cuts'. To tap into the extremely valuable resource of pattern recognition techniques, the 'physicist way' of everyone re-inventing the wheel does not seem promising to me here.
Where do we go from here? But is THIS the answer??
Summary. MVAs are great. MVAs are widely used in HEP. MVAs are even "widelier" used outside HEP. MVAs are complicated to understand and to code! MVAs, and work thereon, are still not 'funded' by HEP the way, for example, 'detector/accelerator development' is. And I think we should have a much larger concentrated effort to bring HEP to the 'state of the art' in pattern recognition than the one paid TMVA position I was unsuccessfully asking for!
Backup
Neural Network "history"
Deep Networks == networks with many hidden layers. That's apparently 'all' it means… although 'deep' somehow in statistical terms would apparently mean:
Training deep networks. The new trick is: pre-training + final backpropagation to 'fine-tune'. Initialise the weights not 'randomly' but 'sensibly', by unsupervised training of each individual layer, one at a time, as an auto-encoder (definite patterns) or a restricted Boltzmann machine (probabilistic patterns).
Auto-Encoder. A network that 'reproduces' its input. With hidden layer < input layer, the hidden layer performs a 'dimensionality reduction': it needs to 'focus on/learn' the important features that make up the input (see the sketch below).
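A minimal numpy sketch of such an auto-encoder, assuming sigmoid units, tied weights (W for encoding, W^T for decoding), a squared-error reconstruction loss and toy data; none of these choices come from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 8, 3, 0.1              # hidden layer < input layer
W = rng.normal(0, 0.1, (n_vis, n_hid))
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

X = rng.random((500, n_vis))              # toy 'training events' in [0, 1]

for epoch in range(200):
    h = sigmoid(X @ W + b_h)              # encode: compress to n_hid features
    x_rec = sigmoid(h @ W.T + b_v)        # decode: try to reproduce the input
    # backpropagation of the squared reconstruction error
    d_rec = (x_rec - X) * x_rec * (1 - x_rec)
    d_h = (d_rec @ W) * h * (1 - h)
    W -= lr * (X.T @ d_h + d_rec.T @ h) / len(X)   # tied-weight gradient
    b_h -= lr * d_h.mean(axis=0)
    b_v -= lr * d_rec.mean(axis=0)

print("reconstruction MSE:", ((x_rec - X) ** 2).mean())
```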
Restricted Boltzmann Machine. A network with 'symmetric' weights, i.e. not 'feed-forward'; the visible layer is the input. If you 'train' it (i.e. determine the weights) it can 'learn' a probability distribution of the input (training events).
Restricted Boltzmann Machine. … aeeh.. what the hell does THAT mean?? Each network configuration (state) is 'associated' with an 'energy'; states are populated with probability ∝ exp(−energy), and training makes this match the probability density given in the training data (hmm… given that I understood it 'correctly'). A sketch follows below.
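A minimal numpy sketch of a binary RBM, writing the energy out explicitly and using one step of contrastive divergence (CD-1) for the weight update; the binary units, toy data and hyperparameters are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 8, 4, 0.05
W = rng.normal(0, 0.1, (n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)

def energy(v, h):
    """E(v, h) = -a·v - b·h - v^T W h, with p(v, h) ∝ exp(-E(v, h))."""
    return -v @ a - h @ b - v @ W @ h

V = (rng.random((500, n_vis)) > 0.5).astype(float)   # toy binary 'events'

for epoch in range(100):
    # positive phase: hidden activations driven by the data
    ph = sigmoid(V @ W + b)
    h = (rng.random(ph.shape) < ph).astype(float)
    # negative phase: one Gibbs step, the model's own 'dream' of the data
    pv = sigmoid(h @ W.T + a)
    ph_neg = sigmoid(pv @ W + b)
    # CD-1 update: <v h>_data - <v h>_model
    W += lr * (V.T @ ph - pv.T @ ph_neg) / len(V)
    a += lr * (V - pv).mean(axis=0)
    b += lr * (ph - ph_neg).mean(axis=0)
```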
Deep Network training. The 'output' of the first layer (the hidden layer of the first RBM trained) is used as input for training the second RBM, etc. (sketched below).
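A minimal sketch of this greedy layer-wise pre-training, reusing the CD-1 update from the sketch above inside a small helper; the layer sizes and toy data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def rbm_train(V, n_hid, lr=0.05, epochs=100):
    """One RBM trained with CD-1; returns weights, biases, hidden probs."""
    W = rng.normal(0, 0.1, (V.shape[1], n_hid))
    a, b = np.zeros(V.shape[1]), np.zeros(n_hid)
    for _ in range(epochs):
        ph = sigmoid(V @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + a)
        ph_neg = sigmoid(pv @ W + b)
        W += lr * (V.T @ ph - pv.T @ ph_neg) / len(V)
        a += lr * (V - pv).mean(axis=0)
        b += lr * (ph - ph_neg).mean(axis=0)
    return W, b, sigmoid(V @ W + b)

X = (rng.random((500, 20)) > 0.5).astype(float)  # toy binary 'events'
stack, data = [], X
for n_hid in (10, 5, 2):                  # shrinking hidden layers (assumption)
    W, b, data = rbm_train(data, n_hid)   # hidden activations feed the next RBM
    stack.append((W, b))                  # these weights initialise the deep net
# ... followed by supervised backpropagation to 'fine-tune' the whole stack.
```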
What does it do? Could this be: 4-vectors → invariant masses → decays?
Something else we missed? Plenty! (Somewhat embarrassingly; but as I said many times, our 'pattern recognition' tasks in HEP are very simple compared to "Artificial Intelligence" or even just "image recognition".) Dropout! (a new regularizer of weights against overfitting etc.; bagging done implicitly in ONE single neural network; see the sketch below). 'Strange' activation functions: digital rather than analog output. What are 'convolutional networks'? What about more 'complicated' structures, built out of many building blocks like this:
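A minimal numpy sketch of dropout, assuming the common recipe (drop each hidden unit with probability p during training, scale activations by 1 − p at test time); the toy shapes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def dropout(h, p=0.5, training=True):
    """Apply dropout to a layer of hidden activations h."""
    if training:
        mask = (rng.random(h.shape) >= p).astype(float)
        return h * mask          # a random 'thinned' sub-network per pass
    return h * (1.0 - p)         # test time: average over all sub-networks

h = rng.random((3, 5))           # toy hidden-layer activations
print(dropout(h, p=0.5, training=True))
print(dropout(h, p=0.5, training=False))
```

Training a different randomly thinned sub-network on every pass, and averaging them at test time, is what makes dropout 'bagging done implicitly in ONE single neural network'.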