Presentation transcript:

Higgs Boson
Elizabeth R McMahon
14 April 2017

Table of Contents
- Introduction: Discovery of the Higgs Boson; Importance of Higgs; ML Challenge
- Data Description: Introduction to Variables; Important Ideas
- Data Exploration
- Models: Decision Tree; Conditional Inference Decision Tree; Random Forest; Logistic Regression; Naïve Bayes
- Results/Discussion
- Conclusions
- References

Introduction

Discovery of the Higgs Boson
- Announced July 4th, 2012
- LHC-CERN, Switzerland
- ATLAS and CMS experiments

Importance of Higgs
Interactions with the Higgs field give other particles mass.

Method of Discovery

CERN physicists and data scientists simulated a data set mimicking ATLAS results.
GOAL: optimize the classification and characterization of Higgs events using ML techniques.

Data Description

- Training set: 250,000 collisions
- Test set: 500,000 collisions
- Computational problems! Reduced the data set to 5,000 collisions (random sample), as sketched below.
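A minimal sketch of such a downsample in Python with pandas (an illustration only: the file name training.csv and its layout are taken from the public HiggsML challenge release, and the original analysis may well have used different tooling):

    import pandas as pd

    # Full simulated training set: 250,000 collisions.
    train = pd.read_csv("training.csv")

    # Draw a reproducible random sample of 5,000 collisions so models
    # can be fit without the computational problems noted above.
    train_small = train.sample(n=5000, random_state=42)
    print(train_small.shape)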

Variables I

Variables II
Feature engineering is difficult for this data without a particle physics background, so the CERN physicists did the engineering for us: the DER variables.
*DER: derived value; PRI: primitive (raw)

Important Ideas
Measured quantities include:
- Angles
- Velocities
- Masses
- Energies
- Momentum
- Number of jets
- Distances

Data Exploration

Analysis of Raw Data
Simple functions were run on the training set to find the ratio of signal vs. background events (see the sketch below).
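For instance (a sketch, assuming, as in the public challenge files, a Label column that marks each collision 's' for signal or 'b' for background):

    import pandas as pd

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file

    # Count signal vs. background events and report their ratio.
    counts = train["Label"].value_counts()
    print(counts)
    print("signal : background =", counts["s"] / counts["b"])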

Missing Data
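In the public HiggsML files, variables that are undefined for a given event (for example, jet variables when a collision produced no jets) are encoded with the sentinel value -999.0 rather than left blank. A minimal pandas sketch for surfacing them as proper missing values:

    import numpy as np
    import pandas as pd

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file

    # Replace the -999.0 sentinel with NaN so pandas treats it as missing.
    train = train.replace(-999.0, np.nan)

    # Fraction of missing entries per variable, highest first.
    print(train.isna().mean().sort_values(ascending=False).head(10))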

Models

Decision Tree
- Every variable is checked at every level; the split that best separates the classes (the "biggest split") is chosen.
- Tell the tree when to stop growing by varying the complexity parameter (see the sketch below).
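The phrase "complexity parameter" suggests R's rpart package; a rough scikit-learn analogue (a sketch under that assumption, not the author's actual code) limits tree growth through cost-complexity pruning:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file
    X = train.drop(columns=["EventId", "Weight", "Label"], errors="ignore")
    y = train["Label"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Larger ccp_alpha -> more pruning -> a smaller tree, much like raising
    # rpart's complexity parameter cp to stop growth earlier.
    tree = DecisionTreeClassifier(ccp_alpha=0.001, random_state=0)
    tree.fit(X_tr, y_tr)
    print("test accuracy:", tree.score(X_te, y_te))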

Example of Decision Tree in Use: Event 290849

DER_mass_MMC = 124.586
DER_mass_transverse_met_lep = 0.010
DER_mass_vis = 49.545
DER_pt_h = 200.535
DER_mass_jet_jet = 70.110
DER_prodeta_jet_jet = 0.961
DER_deltar_tau_lep = 1.849
DER_pt_tot = 59.745
DER_sum_pt = 200.867
DER_pt_ratio_lep_tau = 1.815
DER_met_phi_centrality = 1.000
DER_lep_eta_centrality = 0.999
PRI_tau_pt = 20.984
PRI_tau_eta = 0.097
PRI_tau_phi = -0.127
PRI_lep_pt = 38.088
PRI_lep_eta = 1.109
PRI_lep_phi = -1.674
PRI_met = 160.847
PRI_met_phi = —
PRI_met_sumet = 256.460
PRI_jet_num = 2
PRI_jet_leading_pt = 94.695
PRI_jet_subleading_pt = 47.100
PRI_jet_all_pt = 141.795
Label = s

Conditional Inference Tree
Splits are chosen by tests of statistical significance rather than raw impurity, which avoids bias toward variables with many possible split points.

Random Forest
- An array (ensemble) of decision trees, each grown on a random sample of the data
- Averaging their votes reduces error, as sketched below
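A corresponding scikit-learn sketch (same assumed data layout as the decision-tree example above):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file
    X = train.drop(columns=["EventId", "Weight", "Label"], errors="ignore")
    y = train["Label"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Each of the 500 trees is grown on a bootstrap sample with a random
    # subset of features per split; averaging their votes reduces variance.
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(X_tr, y_tr)
    print("test accuracy:", forest.score(X_te, y_te))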

Logistic Regression
Discrete classification: the model estimates the probability that an event is signal, then thresholds that probability (see the sketch below).
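A sketch of the same idea (scikit-learn assumed; features are standardized first, which logistic regression generally benefits from):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file
    X = train.drop(columns=["EventId", "Weight", "Label"], errors="ignore")
    y = train["Label"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # The model estimates P(signal | x); thresholding that probability at
    # 0.5 turns it into a discrete signal/background classification.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    print("test accuracy:", model.score(X_te, y_te))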

Naïve Bayes

Pet counts (125 pets total):

Pet     Barks Y/N    Fluffy Y/N    Energetic Y/N    Total
Cat      1 / 39       35 /  5        25 / 15          40
Dog     50 /  5       55 /  0        40 / 15          55
Fish     0 / 30        0 / 30         3 / 27          30

Prior probabilities (base rates):
P(cat) = 40/125 = 0.32    P(dog) = 55/125 = 0.44    P(fish) = 30/125 = 0.24

Evidence probabilities:
P(no bark) = 74/125 = 0.59    P(fluffy) = 90/125 = 0.72    P(energetic) = 68/125 = 0.54

Likelihood probabilities (each conditioned on the class, i.e. divided by that pet's total):
P(no bark|cat) = 39/40 = 0.98     P(fluffy|cat) = 35/40 = 0.88    P(energetic|cat) = 25/40 = 0.63
P(no bark|dog) =  5/55 = 0.09     P(fluffy|dog) = 55/55 = 1.00    P(energetic|dog) = 40/55 = 0.73
P(no bark|fish) = 30/30 = 1.00    P(fluffy|fish) = 0/30 = 0       P(energetic|fish) = 3/30 = 0.10

Properties of an unknown pet: Barks? NO. Fluffy? YES. Energetic? YES.

P(dog|fluffy, energetic, no bark)
  = P(dog) · P(fluffy|dog) · P(energetic|dog) · P(no bark|dog) / (P(fluffy) · P(energetic) · P(no bark))
  = (0.44)(1.00)(0.73)(0.09) / ((0.72)(0.54)(0.59))
  ≈ 0.13
P(cat|…) ≈ 0.74
P(fish|…) = 0 (no fluffy fish in the table)

So the unknown pet is most likely a cat.
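A small self-contained check of the pet arithmetic above (plain Python, using only the counts from the table):

    # Counts from the pet table; totals: 40 cats, 55 dogs, 30 fish (125 pets).
    counts = {
        "cat":  {"barks": 1,  "fluffy": 35, "energetic": 25, "total": 40},
        "dog":  {"barks": 50, "fluffy": 55, "energetic": 40, "total": 55},
        "fish": {"barks": 0,  "fluffy": 0,  "energetic": 3,  "total": 30},
    }
    N = 125
    evidence = (74 / N) * (90 / N) * (68 / N)  # P(no bark) * P(fluffy) * P(energetic)

    def posterior(pet):
        c = counts[pet]
        prior = c["total"] / N
        # Likelihoods are conditioned on the class, i.e. divided by the class count.
        likelihood = ((c["total"] - c["barks"]) / c["total"]  # P(no bark | pet)
                      * c["fluffy"] / c["total"]              # P(fluffy | pet)
                      * c["energetic"] / c["total"])          # P(energetic | pet)
        return prior * likelihood / evidence

    for pet in counts:
        print(pet, round(posterior(pet), 2))  # cat 0.74, dog 0.13, fish 0.0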

Assumes the variables are conditionally independent given the class (and, for continuous features, normally distributed).

Results/Discussion

Accuracies

Rank    Model                   Accuracy (%)
1       Logistic Regression     82.27
2       Random Forest           81.88
3       Decision Tree           80.40
4       CI Decision Tree        76.20
5       Naïve Bayes             74.90

Accuracy: % of "right" answers
PROS: simple to calculate
CONS: a poor judge of performance when classes are imbalanced
Ex. FIREFIGHTING robots:
- Good at predicting true negatives (TN): house not on fire
- Bad at predicting true positives (TP): house on fire
98% of houses are not on fire, so by never acting the robots are 98% accurate. But 2% of houses are on fire.

Confusion matrices
Precision (P) and Recall (R), each in [0, 1]:
P = TP / (TP + FP), R = TP / (TP + FN)
F1 Score = 2PR / (P + R)

Robot Firefighter Example

                     Actual fire    Actual no fire
Predicted fire           25              15
Predicted no fire        10              75

Accuracy  = (25 + 75) / 125    = 0.80
Precision = 25 / (25 + 15)     = 0.63
Recall    = 25 / (25 + 10)     = 0.71
F1 Score  = 2(0.63 × 0.71) / (0.63 + 0.71) = 0.67
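The same numbers, checked in a few lines of Python:

    # Confusion-matrix cells for the robot firefighter: TP=25, FP=15, FN=10, TN=75.
    TP, FP, FN, TN = 25, 15, 10, 75

    accuracy = (TP + TN) / (TP + FP + FN + TN)          # 0.80
    precision = TP / (TP + FP)                          # ~0.63
    recall = TP / (TP + FN)                             # ~0.71
    f1 = 2 * precision * recall / (precision + recall)  # ~0.67

    print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))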

F1 Scores

Model                   Precision    Recall      F1 Score
Logistic Regression     0.975136     0.664725    0.790551
Random Forest           0.775886     0.663753    0.715452
Decision Tree           0.692308     0.724390    0.707986
CI Decision Tree        0.650370     0.666667    0.658417
Naïve Bayes             0.639612     0.615385    0.627265

Variable Importance
Most model packages have a built-in variable-importance function, which makes it possible to determine how each model ranked the influence/importance of the variables (see the sketch below).
*CI Decision Tree excluded, as variable importance is not built in
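For example, the random-forest ranking could be pulled out like this (a scikit-learn sketch under the same data-layout assumptions as earlier; R packages such as randomForest expose an equivalent importance() function):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    train = pd.read_csv("training.csv")  # assumed public HiggsML training file
    X = train.drop(columns=["EventId", "Weight", "Label"], errors="ignore")
    y = train["Label"]

    forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    # Impurity-based importance scores, one per variable, highest first.
    importance = pd.Series(forest.feature_importances_, index=X.columns)
    print(importance.sort_values(ascending=False).head(10))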

Variable Importance

Mean importance rank across the models:

DER_mass_transverse_met_lep     1.67
DER_mass_MMC                    6.25
DER_met_phi_centrality          3.50
DER_mass_vis                    5.25
DER_pt_ratio_lep_tau            5.33
PRI_tau_pt                      9.33
DER_deltar_tau_lep              7.75
DER_pt_h                        9.50
DER_sum_pt                     10.50
PRI_jet_num                    11.50
PRI_met_sumet                  10.33
DER_mass_jet_jet               12.50
PRI_jet_leading_pt                 —
PRI_jet_all_pt                 10.00
DER_lep_eta_centrality         11.75
PRI_met                        13.75
PRI_lep_eta                    14.50

Conclusion/Future Work

Predict phenomena in my field; use ML as a tool to better understand chemistry.
Talk to Dr. Chen/Dr. Vidden!
ML is cool!

References
Thank you to: Dr. Vidden, Dr. Ragan, Dr. Lesher, Dr. Chen

Questions?

CI Tree (detail)
Root split: DER_mass_transverse_met_lep (p < 0.001), ≤ 46.776 vs. > 46.776
Further splits: DER_met_phi_centrality at 0.374, and PRI_tau_pt at 34.753

Schematic decision tree: collisions pass through successive cuts (V1 < X1, V2 < X2, V3 < X3, V4 < X4, V5 < X5) to isolate Higgs (s) events.