Finding τ → μ−μ−μ+ Decays at LHCb with Data Mining Algorithms


Finding τ → μ−μ−μ+ Decays at LHCb with Data Mining Algorithms
Yinghua Zhang, Huangxun Chen

Outline: Background, Proposed Methods, Evaluation, Learning from Winning Solutions, Conclusion

Background
- Imperfections of the standard model of particle physics
  - Matter-antimatter asymmetry in the Universe
  - The existence of dark matter
- Indication of new physics: the τ → μ−μ−μ+ decay, which is forbidden in the standard model
- LHCb experiment: search for the τ → μ−μ−μ+ decay
- A data mining challenge on Kaggle
  - Data sets: from the largest particle accelerator in the world
  - Goal: a classifier to predict whether a τ → μ−μ−μ+ decay happened, given a list of collision events and their properties

Data Description
- Training set
  - Labelled dataset (attribute 'signal': 1 for signal events)
  - Signal events are simulated; background events are real data
  - 67,553 training samples, 49 attributes
- Test set
  - Unlabelled dataset (i.e., without the attribute 'signal')
  - 855,819 test samples, 46 attributes (all attributes in the training set except 'mass', 'production' and 'minANNmuon')
- Check agreement data set
- Check correlation data set
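As a rough illustration of how such data might be loaded, the following pandas sketch assumes CSV files named training.csv and test.csv and an 'id' column; the excluded attributes follow the list on the slide.

    import pandas as pd

    train = pd.read_csv("training.csv")
    test = pd.read_csv("test.csv")

    # Attributes present only in the training set (per the slide) are excluded,
    # so that the same feature columns exist for both train and test.
    non_features = ["id", "signal", "mass", "production", "minANNmuon"]
    features = [c for c in train.columns if c not in non_features]

    X_train, y_train = train[features], train["signal"]
    X_test = test[features]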

Exploratory Data Analysis Feature importance: xgboost
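A hedged sketch of how such an xgboost feature ranking could be produced; the hyperparameters are illustrative, and X_train, y_train and features are the variables from the loading sketch above.

    import pandas as pd
    import xgboost as xgb

    model = xgb.XGBClassifier(n_estimators=200, max_depth=5, learning_rate=0.1)
    model.fit(X_train, y_train)

    # Rank features by the importance scores the fitted booster reports.
    importance = pd.Series(model.feature_importances_, index=features)
    importance = importance.sort_values(ascending=False)
    print(importance.head(3))  # most important features
    print(importance.tail(3))  # least important features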

Three most important features: p0_track_Chi2Dof, IPSig, p2_track_Chi2Dof (box plots and histograms shown on the slide)

Three least important features: dira, isolatione, isolationf (box plots and histograms shown on the slide)

Outline: Background, Proposed Methods, Evaluation, Learning from Winning Solutions, Conclusion

Proposed Methods: baseline models (Logistic Regression, Random Forest, Boosted Decision Tree + Logistic Regression) and ensemble methods (voting, stacked generalization)

Logistic Regression
- A linear classifier
- Problem: given a binary output variable Y, model the conditional probability Pr(Y = 1 | X = x) as a function of x
- Logistic regression model: log(p(x) / (1 − p(x))) = β₀ + x·β, so p(x) = 1 / (1 + e^(−(β₀ + x·β)))
- Predict Y = 1 when p ≥ 0.5 and Y = 0 when p < 0.5; the decision boundary is the solution of β₀ + x·β = 0
- Likelihood function: L(β₀, β) = ∏ᵢ p(xᵢ)^yᵢ (1 − p(xᵢ))^(1−yᵢ), i = 1..n
- The unknown parameters are estimated by maximum likelihood
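A minimal scikit-learn sketch of this baseline; the standard-scaling step and the max_iter setting are our assumptions rather than details from the slides.

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    logreg.fit(X_train, y_train)

    # predict_proba returns p(x) = 1 / (1 + exp(-(beta_0 + x . beta))) for the signal class.
    lr_proba = logreg.predict_proba(X_test)[:, 1]
    lr_pred = (lr_proba >= 0.5).astype(int)  # decision boundary at p = 0.5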

Random Forest
- An ensemble learning method (Leo Breiman, Adele Cutler)
- Training: construct a multitude of decision trees; each tree is grown as follows
  - If the number of cases in the training set is N, sample N cases at random with replacement from the original data to form the training set for growing the tree
  - If there are M input variables, at each node m (m << M) variables are selected at random out of the M, and the best split on these m is used to split the node; m is held constant while the forest is grown
  - Each tree is grown to the largest extent possible; there is no pruning
- Test: output the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees
- Advantage: prevents decision trees' habit of overfitting to the training set
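A corresponding scikit-learn sketch of the random-forest baseline; the number of trees and the max_features choice are illustrative, not values reported in the slides.

    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(
        n_estimators=300,      # number of bootstrapped trees
        max_features="sqrt",   # m << M variables tried at each split
        n_jobs=-1,
        random_state=0,
    )
    rf.fit(X_train, y_train)                   # each tree sees a bootstrap sample of size N
    rf_proba = rf.predict_proba(X_test)[:, 1]  # averaged over the individual trees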

Proposed Methods: baseline models (Logistic Regression, Random Forest, Boosted Decision Tree + Logistic Regression) and ensemble methods (voting, stacked generalization)

GBDT + LR
- The most important thing is to have the right features
- Widely used in industry (Facebook, Tencent)
- He, X., Pan, J., Jin, O., Xu, T., Liu, B., Xu, T., ... & Candela, J. Q. (2014, August). Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1-9). ACM.
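The idea from the cited paper is to let a boosted-tree model transform each event into the indices of the leaves it falls into, and then feed those indices, one-hot encoded, to a logistic regression. A hedged sketch, using scikit-learn's GradientBoostingClassifier as a stand-in for whatever GBDT implementation the project actually used, with illustrative tree counts and depths:

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import OneHotEncoder

    gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3)
    gbdt.fit(X_train, y_train)

    # apply() gives the index of the leaf each sample falls into, per tree.
    train_leaves = gbdt.apply(X_train)[:, :, 0]
    test_leaves = gbdt.apply(X_test)[:, :, 0]

    encoder = OneHotEncoder(handle_unknown="ignore")
    lr = LogisticRegression(max_iter=1000)
    lr.fit(encoder.fit_transform(train_leaves), y_train)
    gbdt_lr_proba = lr.predict_proba(encoder.transform(test_leaves))[:, 1]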

Proposed Methods: baseline models (Logistic Regression, Random Forest, Boosted Decision Tree + Logistic Regression) and ensemble methods (voting, stacked generalization)

Voting
- Ensembles existing model predictions
- Properties
  - No need to retrain a model
  - Works better when the ensembled model predictions are weakly correlated
  - Usually improves when more ensemble members are added
- Types of voting
  - Majority vote
  - Weighted majority vote: give a better model more weight in the vote
  - Averaging (bagging): take the mean of the individual model predictions
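A small numpy sketch of vote-style ensembling of existing predictions (lr_proba, rf_proba, gbdt_lr_proba from the sketches above); the weights are illustrative and would normally be chosen on a validation set.

    import numpy as np

    preds = np.column_stack([lr_proba, rf_proba, gbdt_lr_proba])

    avg_vote = preds.mean(axis=1)                 # averaging of probabilities
    weights = np.array([0.2, 0.4, 0.4])
    weighted_vote = preds @ weights               # weighted (soft) majority vote
    hard_vote = (preds >= 0.5).sum(axis=1) >= 2   # plain majority vote on hard labels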

Stacked Generalization
- Introduced by Wolpert in 1992
- Basic idea: use a pool of base classifiers, then use another classifier to combine their predictions
- A stacker model gets more information on the problem space by using the first-stage predictions as features
- Goal: reduce the generalization error
- 2-fold stacking
  - Split the training set into two parts: train_a and train_b
  - Fit a first-stage model on train_a and create predictions for train_b
  - Fit the same model on train_b and create predictions for train_a
  - Finally, fit the model on the entire training set and create predictions for the test set
  - Now train a second-stage stacker model on the probabilities from the first-stage model(s)
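A minimal sketch of the 2-fold stacking procedure described above; the choice of a random forest as the first-stage model and a logistic regression as the stacker is illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    idx_a, idx_b = train_test_split(np.arange(len(X_train)), test_size=0.5, random_state=0)
    X_a, y_a = X_train.iloc[idx_a], y_train.iloc[idx_a]
    X_b, y_b = X_train.iloc[idx_b], y_train.iloc[idx_b]

    first_stage = RandomForestClassifier(n_estimators=300, random_state=0)

    # Out-of-fold first-stage predictions for the whole training set.
    oof = np.empty(len(X_train))
    first_stage.fit(X_a, y_a)
    oof[idx_b] = first_stage.predict_proba(X_b)[:, 1]
    first_stage.fit(X_b, y_b)
    oof[idx_a] = first_stage.predict_proba(X_a)[:, 1]

    # Refit on the full training set to produce first-stage predictions for the test set.
    first_stage.fit(X_train, y_train)
    test_meta = first_stage.predict_proba(X_test)[:, 1]

    # Second-stage stacker trained on the first-stage probabilities.
    stacker = LogisticRegression()
    stacker.fit(oof.reshape(-1, 1), y_train)
    stacked_proba = stacker.predict_proba(test_meta.reshape(-1, 1))[:, 1]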

Outline: Background, Proposed Methods, Evaluation, Learning from Winning Solutions, Conclusion

Metric: Weighted AUC

Parameter Tuning

Ensemble

Outline: Background, Proposed Methods, Evaluation, Learning from Winning Solutions (AUC = 1.0000, public repository), Conclusion

Feature Engineering
- The significant improvement: discovering a mass calculation
- Projection of the momentum onto the z-axis for each small particle:
  p0_pz = √(p0_p² − p0_pt²), p1_pz = √(p1_p² − p1_pt²), p2_pz = √(p2_p² − p2_pt²)
- Sum them: pz = p0_pz + p1_pz + p2_pz
- Find the full momentum: p = √(pz² + pt²)
- Calculate the velocity: speed = FlightDistance / LifeTime
- Calculate the mass: new_mass = p / speed
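A pandas sketch of this mass reconstruction; the column names (p0_p, p0_pt, pt, FlightDistance, LifeTime) are assumed from the dataset naming used elsewhere in the slides, and the square roots follow from the relation p² = pz² + pt².

    import numpy as np

    def add_new_mass(df):
        # z-projection of each particle's momentum: pz = sqrt(p^2 - pt^2)
        pz = sum(
            np.sqrt(df[f"p{i}_p"] ** 2 - df[f"p{i}_pt"] ** 2) for i in range(3)
        )
        p_full = np.sqrt(pz ** 2 + df["pt"] ** 2)       # full momentum
        speed = df["FlightDistance"] / df["LifeTime"]   # velocity proxy
        df["new_mass"] = p_full / speed
        return df

    train = add_new_mass(train)
    test = add_new_mass(test)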

Classifiers
- XGB1: small binary-logistic xgboost (five trees); good AUC, medium KS error, not-so-bad Cramér-von Mises error
- XGB2: satisfactory AUC, medium KS error, very low Cramér-von Mises error (uses the geometric mean of the models)
- XGB3: simple XGBoost with 700 trees and bagging; good AUC, high KS error, very low Cramér-von Mises error
- XGB4: small forest (three trees); bagging and cutting-edge parameters
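As an example, XGB1 ("small binary-logistic xgboost, five trees") might be configured roughly as below; every hyperparameter other than the tree count is an assumption, and the engineered new_mass column from the previous sketch is assumed to be present.

    import xgboost as xgb

    cols = features + ["new_mass"]
    xgb1 = xgb.XGBClassifier(
        objective="binary:logistic",
        n_estimators=5,       # "five trees"
        max_depth=6,          # illustrative
        learning_rate=0.3,    # illustrative
    )
    xgb1.fit(train[cols], train["signal"])
    xgb1_proba = xgb1.predict_proba(test[cols])[:, 1]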

Corrected mass and new features
- Problem with 'new_mass': it correlates poorly with the real mass and generates both false-positive and false-negative errors near the signal/background border
- Predict the new-mass error: XGBoost with almost three thousand trees and all features
- Calculate two new features: new_mass_delta = new_mass − new_mass2, new_mass_ratio = new_mass / new_mass2
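A short sketch of the two derived features; new_mass2 here stands for the second mass estimate coming out of the error-predicting XGBoost model, however it is actually computed.

    def add_mass_correction_features(df):
        # Assumes 'new_mass' and the corrected estimate 'new_mass2' are already columns.
        df["new_mass_delta"] = df["new_mass"] - df["new_mass2"]
        df["new_mass_ratio"] = df["new_mass"] / df["new_mass2"]
        return df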

More classifiers
- XGB5: heavy XGBoost with 1500 trees, using all the new features (new_mass2, new_mass_delta, new_mass_ratio) and bagging; very high AUC, high KS error, high Cramér-von Mises error
- Neural network: one DenseLayer with 8 neurons
Final combination:
  First = XGB1^0.5 + XGB2^2 · 0.5
  Second = (XGB1 · XGB^0.85 · XGB3^0.01 + XGB2 · XGB4^900 · 0.85 + XGB3^1000 · 2) / 3.85
  Final = 0.5 · Second^3.9 + 0.2 · First^0.6 + 0.0001 · Second^0.2 · First^0.01
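Purely as a transcription aid, the blend can be written as the function below; the constants and exponents are copied from the slide as transcribed (including one factor written only as "XGB^0.85", whose model index is not given), so treat them as indicative rather than verified.

    def blend(xgb1, xgb2, xgb3, xgb4, xgb_unlabelled):
        # xgb_unlabelled is a placeholder for the factor the slide writes only as "XGB^0.85".
        first = xgb1 ** 0.5 + (xgb2 ** 2) * 0.5
        second = (
            xgb1 * xgb_unlabelled ** 0.85 * xgb3 ** 0.01
            + xgb2 * xgb4 ** 900 * 0.85
            + xgb3 ** 1000 * 2
        ) / 3.85
        return (
            0.5 * second ** 3.9
            + 0.2 * first ** 0.6
            + 0.0001 * second ** 0.2 * first ** 0.01
        )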

Outline: Background, Proposed Methods, Evaluation, Learning from Winning Solutions, Conclusion

Conclusion
- Learned data mining tools
  - Data visualization: ggplot2 in R
  - Machine learning packages: sklearn, pandas, xgboost
- Gained a better understanding of classification algorithms
  - Logistic Regression, Random Forest, Gradient Boosted Decision Trees, and ensemble methods
- Learned from the winning solution
  - Domain knowledge and feature engineering

Thanks for your attention! Check out our code here