Recent progress in multivariate methods for particle physics

Slides:



Advertisements
Similar presentations
Bruce Kennedy, RAL PPD Particle Physics 2 Bruce Kennedy RAL PPD.
Advertisements

Computing and Statistical Data Analysis / Stat 4
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 2: Multivariate Methods (I) 清华大学高能物理研究中心 2010 年 4 月 12—16.
Topics in statistical data analysis for particle physics
KET-BSM meeting Aachen, April 2006 View from the Schauinsland in Freiburg a couple of weeks ago View from the Schauinsland in Freiburg a couple of weeks.
8. Statistical tests 8.1 Hypotheses K. Desch – Statistical methods of data analysis SS10 Frequent problem: Decision making based on statistical information.
Recent Electroweak Results from the Tevatron Weak Interactions and Neutrinos Workshop Delphi, Greece, 6-11 June, 2005 Dhiman Chakraborty Northern Illinois.
Top Physics at the Tevatron Mike Arov (Louisiana Tech University) for D0 and CDF Collaborations 1.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
Radial Basis Function (RBF) Networks
LHC’s Second Run Hyunseok Lee 1. 2 ■ Discovery of the Higgs particle.
G. Cowan SUSSP65, St Andrews, August 2009 / Statistical Methods 3 page 1 Statistical Methods in Particle Physics Lecture 3: Multivariate Methods.
July 11, 2001Daniel Whiteson Support Vector Machines: Get more Higgs out of your data Daniel Whiteson UC Berkeley.
Sung-Won Lee 1 Study of Jets Production Association with a Z boson in pp Collision at 7 and 8 TeV with the CMS Detector Kittikul Kovitanggoon Ph. D. Thesis.
G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem 2Random variables and.
Heavy charged gauge boson, W’, search at Hadron Colliders YuChul Yang (Kyungpook National University) (PPP9, NCU, Taiwan, June 04, 2011) June04, 2011,
August 22, 2002UCI Quarknet The Higgs Particle Sarah D. Johnson University of La Verne August 22, 2002.
Irakli Chakaberia Final Examination April 28, 2014.
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 3: Multivariate Methods (II) 清华大学高能物理研究中心 2010 年 4 月 12—16.
B-tagging Performance based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with X. Li and B. Zhou) ATLAS B-tagging Meeting February 9,
MiniBooNE Event Reconstruction and Particle Identification Hai-Jun Yang University of Michigan, Ann Arbor (for the MiniBooNE Collaboration) DNP06, Nashville,
G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 21 Topics in Statistical Data Analysis for HEP Lecture 2: Statistical Tests CERN.
Search for a Z′ boson in the dimuon channel in p-p collisions at √s = 7TeV with CMS experiment at the Large Hadron Collider Search for a Z′ boson in the.
Study of the to Dilepton Channel with the Total Transverse Energy Kinematic Variable Athens, April 17 th 2003 Victoria Giakoumopoulou University of Athens,
Computational Intelligence: Methods and Applications Lecture 23 Logistic discrimination and support vectors Włodzisław Duch Dept. of Informatics, UMK Google:
1 Glen Cowan Multivariate Statistical Methods in Particle Physics Machine Learning and Multivariate Statistical Methods in Particle Physics Glen Cowan.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Measurements of Top Quark Properties at Run II of the Tevatron Erich W.Varnes University of Arizona for the CDF and DØ Collaborations International Workshop.
Higgs Reach Through VBF with ATLAS Bruce Mellado University of Wisconsin-Madison Recontres de Moriond 2004 QCD and High Energy Hadronic Interactions.
Searching for New Matter with the D0 Experiment Todd Adams Department of Physics Florida State University September 19, 2004.
Single Top Quark Production Mark Palenik Physics 564, Fall 2007.
G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools 1 Statistical Analysis Tools for Particle Physics IDPASC.
Analysis of H  WW  l l Based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with T.S. Dai, X.F. Li, B. Zhou) ATLAS Higgs Meeting September.
1 Introduction to Statistics − Day 2 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
Kinematics of Top Decays in the Dilepton and the Lepton + Jets channels: Probing the Top Mass University of Athens - Physics Department Section of Nuclear.
La Thuile, March, 15 th, 2003 f Makoto Tomoto ( FNAL ) Prospects for Higgs Searches at DØ Makoto Tomoto Fermi National Accelerator Laboratory (For the.
Search for H  WW*  l l Based on Boosted Decision Trees Hai-Jun Yang University of Michigan LHC Physics Signature Workshop January 5-11, 2008.
G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 21 Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate.
Backup slides Z 0 Z 0 production Once  s > 2M Z ~ GeV ÞPair production of Z 0 Z 0 via t-channel electron exchange. e+e+ e-e- e Z0Z0 Z0Z0 Other.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Higgs in the Large Hadron Collider Joe Mitchell Advisor: Dr. Chung Kao.
Viktor Veszpremi Purdue University, CDF Collaboration Tev4LHC Workshop, Oct , Fermilab ZH->vvbb results from CDF.
Study of Diboson Physics with the ATLAS Detector at LHC Hai-Jun Yang University of Michigan (for the ATLAS Collaboration) APS April Meeting St. Louis,
Helge VossAdvanced Scientific Computing Workshop ETH Multivariate Methods of data analysis Helge Voss Advanced Scientific Computing Workshop ETH.
1 Donatella Lucchesi July 22, 2010 Standard Model High Mass Higgs Searches at CDF Donatella Lucchesi For the CDF Collaboration University and INFN of Padova.
Multivariate Methods of
Electroweak Physics Lecture 6
iSTEP 2016 Tsinghua University, Beijing July 10-20, 2016
First Evidence for Electroweak Single Top Quark Production
Tutorial on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
Venkat Kaushik, Jae Yu University of Texas at Arlington
Comment on Event Quality Variables for Multivariate Analyses
ISTEP 2016 Final Project— Project on
Lectures on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
Jessica Leonard Oct. 23, 2006 Physics 835
Searches at LHC for Physics Beyond the Standard Model
Computing and Statistical Data Analysis Stat 5: Multivariate Methods
TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics Department
Computing and Statistical Data Analysis / Stat 6
TRISEP 2016 / Statistics Lecture 2
Statistical Analysis Tools for Particle Physics
SUSSP65, St Andrews, August 2009 / Statistical Methods 3
SUSY SEARCHES WITH ATLAS
Machine Learning and Multivariate Statistical Methods in Particle Physics Glen Cowan RHUL Physics RHUL Computer Science Seminar.
B. Mellado University of the Witwatersrand
Linear Discrimination
Measurement of the Single Top Production Cross Section at CDF
Measurement of b-jet Shapes at CDF
Presentation transcript:

Recent progress in multivariate methods for particle physics 17 January, 2010 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAA

Weizmann Institute, 17 Jan 10 / Multivariate Methods Outline Multivariate methods for HEP Event selection as a statistical test Neyman-Pearson lemma and likelihood ratio test Some multivariate classifiers: “Naive” (cuts, linear) Neural networks Boosted Decision Trees Support Vector Machines G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Data analysis in particle physics Particle physics experiments are expensive e.g. LHC, ~ $1010 (accelerator and experiments) the competition is intense (ATLAS vs. CMS) vs. Tevatron and the stakes are high: 4 sigma effect 5 sigma effect Hence the increased interest in advanced statistical methods. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

The Standard Model of particle physics Matter... + gauge bosons... photon (g), W±, Z, gluon (g) + relativity + quantum mechanics + symmetries... = Standard Model 25 free parameters (masses, coupling strengths,...). Includes Higgs boson (not yet seen). Almost certainly incomplete (e.g. no gravity). Agrees with all experimental observations so far. Many candidate extensions to SM (supersymmetry, extra dimensions,...) G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

The Large Hadron Collider Counter-rotating proton beams in 27 km circumference ring pp centre-of-mass energy 14 TeV Detectors at 4 pp collision points: ATLAS CMS LHCb (b physics) ALICE (heavy ion physics) general purpose G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods The ATLAS detector 2100 physicists 37 countries 167 universities/labs 25 m diameter 46 m length 7000 tonnes ~108 electronic channels G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

LHC event production rates most events (boring) mildly interesting interesting very interesting (~1 out of every 1011) G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods LHC data At LHC, ~109 pp collision events per second, mostly uninteresting do quick sifting, record ~200 events/sec single event ~ 1 Mbyte 1 “year”  107 s, 1016 pp collisions / year 2  109 events recorded / year (~2 Pbyte / year) For new/rare processes, rates at LHC can be vanishingly small e.g. Higgs bosons detectable per year could be ~103 → 'needle in a haystack' For Standard Model and (many) non-SM processes we can generate simulated data with Monte Carlo programs (including simulation of the detector). G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

A simulated SUSY event in ATLAS high pT jets of hadrons high pT muons p p missing transverse energy G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Background events This event from Standard Model ttbar production also has high pT jets and muons, and some missing transverse energy. → can easily mimic a SUSY event. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods A simulated event PYTHIA Monte Carlo pp → gluino-gluino . G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Event selection as a statistical test For each event we measure a set of numbers: x1 = jet pT x2 = missing energy x3 = particle i.d. measure, ... follows some n-dimensional joint probability density, which depends on the type of event produced, i.e., was it E.g. hypotheses H0, H1, ... Often simply “signal”, “background” G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Finding an optimal decision boundary H0 In particle physics usually start by making simple “cuts”: xi < ci xj < cj H1 Maybe later try some other type of decision boundary: H0 H0 H1 H1 G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Two distinct event selection problems In some cases, the event types in question are both known to exist. Example: separation of different particle types (electron vs muon) Use the selected sample for further study. In other cases, the null hypothesis H0 means "Standard Model" events, and the alternative H1 means "events of a type whose existence is not yet established" (to do so is the goal of the analysis). Many subtle issues here, mainly related to the heavy burden of proof required to establish presence of a new phenomenon. Typically require p-value of background-only hypothesis below ~ 10-7 (a 5 sigma effect) to claim discovery of "New Physics". G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Using classifier output for discovery signal search region y f(y) y N(y) background background excess? ycut Normalized to unity Normalized to expected number of events Discovery = number of events found in search region incompatible with background-only hypothesis. p-value of background-only hypothesis can depend crucially distribution f(y|b) in the "search region". G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Example of a "cut-based" study In the 1990s, the CDF experiment at Fermilab (Chicago) measured the number of hadron jets produced in proton-antiproton collisions as a function of their momentum perpendicular to the beam direction: "jet" of particles Prediction low relative to data for very high transverse momentum. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

High pT jets = quark substructure? Although the data agree remarkably well with the Standard Model (QCD) prediction overall, the excess at high pT appears significant: The fact that the variable is "understandable" leads directly to a plausible explanation for the discrepancy, namely, that quarks could possess an internal substructure. Would not have been the case if the variable plotted was a complicated combination of many inputs. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

High pT jets from parton model uncertainty Furthermore the physical understanding of the variable led one to a more plausible explanation, namely, an uncertain modeling of the quark (and gluon) momentum distributions inside the proton. When model adjusted, discrepancy largely disappears: Can be regarded as a "success" of the cut-based approach. Physical understanding of output variable led to solution of apparent discrepancy. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Neural networks in particle physics For many years, the only "advanced" classifier used in particle physics. Usually use single hidden layer, logistic sigmoid activation function: G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Neural network example from LEP II Signal: e+e- → W+W- (often 4 well separated hadron jets) Background: e+e- → qqgg (4 less well separated hadron jets) ← input variables based on jet structure, event shape, ... none by itself gives much separation. Neural network output: (Garrido, Juste and Martinez, ALEPH 96-144) G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Some issues with neural networks In the example with WW events, goal was to select these events so as to study properties of the W boson. Needed to avoid using input variables correlated to the properties we eventually wanted to study (not trivial). In principle a single hidden layer with an sufficiently large number of nodes can approximate arbitrarily well the optimal test variable (likelihood ratio). Usually start with relatively small number of nodes and increase until misclassification rate on validation data sample ceases to decrease. Often MC training data is cheap -- problems with getting stuck in local minima, overtraining, etc., less important than concerns of systematic differences between the training data and Nature, and concerns about the ease of interpretation of the output. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Overtraining If decision boundary is too flexible it will conform too closely to the training points → overtraining. Monitor by applying classifier to independent test sample. training sample independent test sample G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Particle i.d. in MiniBooNE Detector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes: Search for nm to ne oscillations required particle i.d. using information from the PMTs. H.J. Yang, MiniBooNE PID, DNP06 G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Decision trees Out of all the input variables, find the one for which with a single cut gives best improvement in signal purity: where wi. is the weight of the ith event. Resulting nodes classified as either signal/background. Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node. The set of cuts defines the decision boundary. Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577 G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

BDT example from MiniBooNE ~200 input variables for each event (n interaction producing e, m or p). Each individual tree is relatively weak, with a misclassification error rate ~ 0.4 – 0.45 B. Roe et al., NIM 543 (2005) 577 G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Monitoring overtraining From MiniBooNE example: Performance stable after a few hundred trees. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Comparison of boosting algorithms A number of boosting algorithms on the market; differ in the update rule for the weights. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Single top quark production (CDF/D0) Top quark discovered in pairs, but SM predicts single top production. Use many inputs based on jet properties, particle i.d., ... Pair-produced tops are now a background process. signal (blue + green) G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Different classifiers for single top Also Naive Bayes and various approximations to likelihood ratio,.... Final combined result is statistically significant (>5s level) but not easy to understand classifier outputs. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Support Vector Machines Map input variables into high dimensional feature space: x → f Maximize distance between separating hyperplanes (margin) subject to constraints allowing for some misclassification. Final classifier only depends on scalar products of f(x): So only need kernel Bishop ch 7 G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Using an SVM To use an SVM the user must as a minimum choose a kernel function (e.g. Gaussian) any free parameters in the kernel (e.g. the s of the Gaussian) a cost parameter C (plays role of regularization parameter) The training is relatively straightforward because, in contrast to neural networks, the function to be minimized has a single global minimum. Furthermore evaluating the classifier only requires that one retain and sum over the support vectors, a relatively small number of points. The advantages/disadvantages and rationale behind the choices above is not always clear to the particle physicist -- help needed here. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

SVM in particle physics SVMs are very popular in the Machine Learning community but have yet to find wide application in HEP. Here is an early example from a CDF top quark anlaysis (A. Vaiciulis, contribution to PHYSTAT02). signal eff. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Summary on multivariate methods Particle physics has used several multivariate methods for many years: linear (Fisher) discriminant neural networks naive Bayes and has in the last several years started to use a few more k-nearest neighbour boosted decision trees support vector machines The emphasis is often on controlling systematic uncertainties between the modeled training data and Nature to avoid false discovery. Although many classifier outputs are "black boxes", a discovery at 5s significance with a sophisticated (opaque) method will win the competition if backed up by, say, 4s evidence from a cut-based method. G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Quotes I like “Keep it simple. As simple as possible. Not any simpler.” – A. Einstein “If you believe in something you don't understand, you suffer,...” – Stevie Wonder G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods Extra slides G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods

Weizmann Institute, 17 Jan 10 / Multivariate Methods G. Cowan Weizmann Institute, 17 Jan 10 / Multivariate Methods