iSTEP 2016 Final Project — Project on H→ττ and multivariate methods


iSTEP 2016 Final Project — Project on H→ττ and multivariate methods
Wenkai Fan / Hang Zhou / Cheng Chen / Hang Yang / Yongkun Li
Tutor: Glen Cowan — 2018/11/15
iSTEP 2016, Beijing, Tsinghua

PART ONE: Motivation

"Discovery of the long-awaited Higgs boson was announced on 4 July 2012 and confirmed six months later. The year 2013 saw a number of prestigious awards given for the discovery, including a Nobel Prize. But for physicists, the discovery of a new particle means the beginning of a long and difficult quest to measure its characteristics and determine if it fits the current model of nature."

"The ATLAS experiment has recently observed a signal of the Higgs boson decaying into two tau particles, but this decay is a small signal buried in background noise."

Motivation: to explore the potential of advanced machine-learning methods to improve the discovery significance of the experiment.

In principle we know the expected outcome both with and without the signal, but any real measurement contains statistical fluctuations. We therefore test whether the data are compatible with the presence of a signal, that is, whether the data can be explained by the background alone.

PART TWO: Method—Multivariate analysis

Some of these variables are first used in a real-time multi-stage cascade classifier (called the trigger) to discard most of the uninteresting events (called the background). The selected events (roughly four hundred per second) are then written to disk by a large CPU farm, producing petabytes of data per year. The saved events still, in large majority, represent known processes: they are mostly produced by the decay of particles which are exotic in everyday terms, but known, having been discovered in previous generations of experiments.

The goal of the offline analysis is to find a (not necessarily connected) region in feature space in which there is a significant excess of events (called signal) compared to what known background processes can explain. Once the region has been fixed, a statistical (counting) test is applied to determine the significance of the excess. If the probability that the excess was produced by background processes alone falls below a threshold, a new particle is deemed to be discovered:

p = 2.87 × 10⁻⁷,  Z = Φ⁻¹(1 − p) = 5σ → Discovery!
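The conversion between a p-value and a Gaussian significance Z = Φ⁻¹(1 − p) can be sketched in Python, assuming SciPy is available (the function names here are illustrative, not part of the analysis code):

```python
# Convert between a one-sided p-value and a Gaussian significance Z.
# Minimal sketch; assumes scipy is installed.
from scipy.stats import norm

def p_to_z(p):
    """Significance Z = Phi^-1(1 - p); isf is numerically stable for small p."""
    return norm.isf(p)

def z_to_p(z):
    """Inverse mapping: one-sided p-value for a given significance z."""
    return norm.sf(z)

z = p_to_z(2.87e-7)  # the conventional "5 sigma" discovery threshold
```

Using the survival function and its inverse (rather than `1 - cdf`) avoids loss of precision when p is tiny.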

PART TWO: Method—Multivariate analysis

Each event is described by (xᵢ, yᵢ, wᵢ): the input variables, the label yᵢ ∈ {s, b}, and the weight. The weights are normalized so that Σ_{i∈s} wᵢ = N_s and Σ_{i∈b} wᵢ = N_b, where N_s and N_b are the expected numbers of signal and background events (with luminosity and cross-section information already folded in).

There are about 30 variables in xᵢ, and a classifier maps them onto a single variable t. We apply a simple cut on the t axis, retaining only events whose t value is greater than some t_cut.

PART TWO: A Digression

The weights have been re-weighted for a better training effect, but the actual signal events are very scarce compared with the background. (Plot: expected signal and background yields; the signal is much smaller due to its small cross-section.)

PART TWO: Method—Multivariate analysis

The numbers of signal and background events passing the cut are

s = Σ_{i∈s, tᵢ > t_cut} wᵢ,   b = Σ_{i∈b, tᵢ > t_cut} wᵢ.

We can compute the expected discovery significance using the "Asimov" formula

Z = √( 2 [ (s + b) ln(1 + s/b) − s ] ),

which follows from assuming a Poisson distribution for the number of events detected.
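The cut-and-count significance above can be sketched as follows (NumPy assumed; the array names t, w and is_signal are illustrative stand-ins for the classifier output, event weights and truth labels):

```python
import numpy as np

def asimov_z(s, b):
    """Asimov discovery significance Z = sqrt(2[(s+b) ln(1 + s/b) - s])."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))

def significance_after_cut(t, w, is_signal, t_cut):
    """Sum event weights above t_cut separately for signal and background,
    then evaluate the Asimov significance. A minimal sketch."""
    sel = t > t_cut
    s = w[sel & is_signal].sum()
    b = w[sel & ~is_signal].sum()
    return asimov_z(s, b)
```

Note that for s ≪ b the formula reduces to the familiar Z ≈ s/√b, which is a quick sanity check on any implementation.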

PART TWO: Results for different classifying methods

Fisher discriminant: maximal significance 2.11. (Plots: significance and classifier output distributions vs. t_cut.)

PART TWO: Results for different classifying methods

Multi-layer perceptron (neural network): maximal significance 1.97. (Plots: significance and classifier output distributions vs. t_cut.)

PART TWO: Results for different classifying methods

Boosted decision tree: maximal significance 3.25. (Plots: significance and classifier output distributions vs. t_cut.)

PART TWO: Receiver Operating Characteristic curve

The ROC curve shows background rejection (1 − ε_b) versus signal efficiency (ε_s) as t_cut varies. The point (0, 1) corresponds to rejecting all events; (1, 0) corresponds to retaining all signal and background events. The closer the curve gets to the top-right corner, the better the classifier. (Plot: ROC curves for Fisher, MLP and BDT.)
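A minimal sketch of how the weighted ROC points could be computed as t_cut varies (NumPy assumed; all names are illustrative):

```python
import numpy as np

def roc_points(t, w, is_signal, cuts):
    """For each candidate t_cut return (signal efficiency, background rejection),
    both computed from event weights. Hypothetical helper, not the TMVA output."""
    s_tot = w[is_signal].sum()
    b_tot = w[~is_signal].sum()
    pts = []
    for c in cuts:
        sel = t > c
        eff_s = w[sel & is_signal].sum() / s_tot        # epsilon_s
        rej_b = 1.0 - w[sel & ~is_signal].sum() / b_tot  # 1 - epsilon_b
        pts.append((eff_s, rej_b))
    return pts
```

Sweeping `cuts` over the full range of t traces the curve from (1, 0) (retain everything) to (0, 1) (reject everything).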

PART THREE: Method—Multivariate analysis

Multi-bin analysis: slice the t axis into several columns, called bins. Within each bin i we calculate the expected signal and background yields:

sᵢ = Σ_{j∈s, t_j ∈ bin i} w_j,   bᵢ = Σ_{j∈b, t_j ∈ bin i} w_j.
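The per-bin yields can be sketched with weighted histograms (NumPy assumed; the bin count and t range here are illustrative choices, not the ones used in the analysis):

```python
import numpy as np

def binned_yields(t, w, is_signal, n_bins=25, t_range=(-1.0, 1.0)):
    """Expected signal and background yields per bin of the classifier output t,
    using weighted histograms over a common set of bin edges."""
    edges = np.linspace(t_range[0], t_range[1], n_bins + 1)
    s_i, _ = np.histogram(t[is_signal], bins=edges, weights=w[is_signal])
    b_i, _ = np.histogram(t[~is_signal], bins=edges, weights=w[~is_signal])
    return s_i, b_i
```

Using one shared set of edges for both histograms keeps sᵢ and bᵢ aligned bin by bin, which the likelihood below requires.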

PART THREE: Method—Multivariate analysis

Likelihood function:

L(μ) = Π_{i=1..N} [ (μ sᵢ + bᵢ)^{nᵢ} / nᵢ! ] e^{−(μ sᵢ + bᵢ)}.

The estimator μ̂ maximizes L(μ), i.e. it solves

∂ ln L / ∂μ = Σ_{i=1..N} [ nᵢ sᵢ / (μ sᵢ + bᵢ) − sᵢ ] = 0.

The left-hand side is monotonic in μ, and on the Asimov dataset nᵢ = sᵢ + bᵢ the solution is μ̂ = 1. (Numerical optimization indeed gives a solution within the range 1 ± 0.01.)

To test the no-signal hypothesis (μ = 0) we use

q₀ = −2 ln[ L(0) / L(μ̂) ] if μ̂ ≥ 0, and 0 otherwise.

Significance: Z = √q₀.
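Evaluated on the Asimov dataset nᵢ = sᵢ + bᵢ (so that μ̂ = 1), the test statistic reduces to q₀ = 2 Σᵢ [ nᵢ ln(1 + sᵢ/bᵢ) − sᵢ ]. A minimal sketch of the binned significance (NumPy assumed):

```python
import numpy as np

def binned_asimov_z(s_i, b_i):
    """Median discovery significance from the binned likelihood ratio on the
    Asimov dataset n_i = s_i + b_i, where the MLE is mu_hat = 1:
    q0 = 2 * sum_i [(s_i + b_i) ln(1 + s_i/b_i) - s_i],  Z = sqrt(q0)."""
    s_i = np.asarray(s_i, dtype=float)
    b_i = np.asarray(b_i, dtype=float)
    q0 = 2.0 * np.sum((s_i + b_i) * np.log1p(s_i / b_i) - s_i)
    return np.sqrt(q0)
```

With a single bin this reduces to the cut-and-count Asimov formula; splitting events into bins of unequal s/b can only increase the expected significance, which is why the multi-bin analysis helps.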

PART THREE: Analysis of the likelihood function

If we write Lᵢ(μ) = [ (μ sᵢ + bᵢ)^{nᵢ} / nᵢ! ] e^{−(μ sᵢ + bᵢ)}, then L = Π Lᵢ and

q₀ = 2 Σ_{i=1..N} [ ln Lᵢ(μ̂) − ln Lᵢ(0) ].

We want to check how each term ln Lᵢ(μ̂) − ln Lᵢ(0) depends on sᵢ and bᵢ. The more background there is, the more likely an excess in the data is a statistical fluctuation, and the less significance we get: larger s/b gives larger Z.

PART THREE: Analysis of the likelihood function

If we take μ̂ = 1, then for a single bin with n = s + b,

ln L(μ̂) − ln L(0) = (s + b) ln(1 + s/b) − s.

If we assume s ≪ b, a Taylor expansion about s = 0 gives

ln L(μ̂) − ln L(0) ≈ s²/(2b) − s³/(6b²) + s⁴/(12b³) + O(s⁵).

Further assuming b = const and writing t = s/b gives

ln L(μ̂) − ln L(0) = b ( t²/2 − t³/6 + t⁴/12 + ... ),

which is monotonically increasing on the range (0, 1). This confirms what we saw numerically on the previous slide: a larger s/b gives a larger Z (to leading order, q₀ ≈ s²/b, i.e. Z ≈ s/√b).
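The expansion can be checked numerically; a sketch under the same assumptions (NumPy assumed, function names illustrative):

```python
import numpy as np

def exact_delta_lnl(s, b):
    """Exact ln L(mu_hat=1) - ln L(0) for one bin with Asimov data n = s + b."""
    return (s + b) * np.log1p(s / b) - s

def taylor_delta_lnl(s, b):
    """Leading terms of the s << b expansion:
    s^2/(2b) - s^3/(6 b^2) + s^4/(12 b^3)."""
    return s**2 / (2 * b) - s**3 / (6 * b**2) + s**4 / (12 * b**3)
```

For s = 1, b = 100 the two agree to better than one part in 10⁷, and the exact expression is indeed increasing in s at fixed b.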

PART THREE: Multi-bin analysis—Result

(Plots: significance vs. t_cut and number of bins for Fisher, MLP and BDT, with the maximum marked in each case.)

With the BDT method and 25 bins, we obtained an optimized significance of about 4.16.

PART FOUR: Summary & Outlook

1. A simple cut gives a significance around 2-3; the BDT algorithm gives the best performance.
2. Multi-bin analysis can improve the result, but the gain depends on the specific shape of the histogram: too few or too many bins may deteriorate the performance.
3. The optimized significance is about 4.16 (a one-sided p-value of roughly 1.6 × 10⁻⁵), so we can boldly say that we have "found" the Higgs boson!
4. To get greater significance: classifiers need to separate the signal from the background as much as possible, and physicists need to know in which variables the signal and background differ most, either a priori or after examining the data. If the signal is much smaller than the background, then at fixed background the significance is determined by the ratio s/b and increases monotonically with it.
5. Further optimization is possible, e.g. using other classifiers (NN, LD, SVM) or tuning the width of each bin. (NN, SVM, CutsGA, PDERS etc. were tried but failed at the training stage.)

THANK YOU
Wenkai Fan / Hang Zhou / Cheng Chen / Hang Yang / Yongkun Li