1
Project on H → ττ and multivariate methods
Hu Wenhua, Lu Meng, Luo Xuan, Xuzhong Yukun, Gan Xucheng. iSTEP (July 2016)
2
1. Background: An Important Discovery (5σ!) How did we find it?
The Higgs boson was discovered in 2012 [1,2]. That was an important discovery. How did we find it, and why are we so sure? The 5σ significance is a strong reason to believe it. This discovery is the background of our project.
1. ATLAS Collaboration (2012). Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Physics Letters B, 716, 1-29.
2. CMS Collaboration (2012). Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC. Physics Letters B, 716(1), 30-61.
3
2. A simplified model: Two candidate signatures
Signal 1 (H → ττ) rather than signal 2 (H → bb̄) is chosen, because a search for the decay to a bottom quark and an anti-bottom quark is more difficult: the strong interaction between b and b̄ produces a much higher background.
4
2. A simplified model: Signal and background
Signal: H → ττ (yellow balls). Background: Z → ττ and ttbar (red balls). We are like the boy with his eyes covered: we can touch the balls, but we cannot tell them apart. So how can we estimate the probability of signal if all we know is that there are balls?
5
3. TMVA Methods: Train → Analyze → Result
6
*What is TMVA? *Why do we try it?
The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated machine-learning environment for the processing and parallel evaluation of multivariate classification and regression techniques. TMVA maps a multi-variable distribution onto a single-variable distribution, which simplifies the situation. It offers several methods, such as Fisher, BDT, and MLP. (Left: multiple input variables; right: the single classifier response.)
7
*Train the classifier
Usually we have data collected by detectors, and we do not know whether there is any signal in it. How can we separate signal from background if some signal is actually present in the data? If we have events whose true nature (signal or background) is known exactly, we can use them to train a classifier, giving it the ability to distinguish signal from background.
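As a concrete illustration, here is a minimal numpy sketch of training a Fisher (linear) discriminant on labeled toy events. The two input variables, the Gaussian shapes, and all numbers are assumptions for illustration only, not the project's actual inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled training events with two input variables each
# (assumed Gaussian shapes, for illustration only).
signal = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(1000, 2))
background = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(1000, 2))

def train_fisher(sig, bkg):
    """Fisher weight vector w = S_W^{-1} (mean_s - mean_b)."""
    mean_s, mean_b = sig.mean(axis=0), bkg.mean(axis=0)
    within = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(within, mean_s - mean_b)

w = train_fisher(signal, background)

# The classifier response maps each event onto a single number:
t_sig = signal @ w
t_bkg = background @ w
```

Once trained, the classifier collapses the multi-variable distribution onto a one-dimensional response t, on which a cut can then be placed.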
8
*How to select events?
We want the background contamination to be small, so we apply a "cut": an event is kept only if its test statistic is larger than the cut value tCut. After the cut, we obtain the expected numbers of signal and background events in the accepted region. We then use the Asimov formula, as Professor Cowan showed in the previous class, to quantify the significance of the cut.
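The Asimov significance formula referred to here can be written in a few lines; s and b are the expected signal and background counts surviving the cut (the example yields below are made up for illustration).

```python
import math

def asimov_significance(s, b):
    """Median discovery significance Z = sqrt(2[(s+b) ln(1+s/b) - s])."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Example with illustrative yields: for s << b this approaches s/sqrt(b).
z = asimov_significance(10.0, 100.0)   # ~0.98, vs. s/sqrt(b) = 1.0
```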
9
*Best Cut
As you can see, different cuts give different significances, so we can scan the entire range of cut values for the maximum significance. Note that even when the same cut value is chosen, different classifier methods yield different significances.
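The scan for the best cut can be sketched as follows; the response shapes and total yields here are invented for illustration, not the project's numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy classifier responses (assumed Gaussians; signal peaks higher).
t_sig = rng.normal(1.0, 1.0, size=1000)
t_bkg = rng.normal(-1.0, 1.0, size=20000)
S_TOT, B_TOT = 100.0, 2000.0   # assumed expected yields before any cut

def z_asimov(s, b):
    """Asimov significance for expected yields s and b."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))

# Scan cut values and keep the one with the highest significance.
best_z, best_cut = 0.0, None
for cut in np.linspace(-3.0, 3.0, 121):
    s = S_TOT * (t_sig > cut).mean()   # expected yield surviving the cut
    b = B_TOT * (t_bkg > cut).mean()
    if s <= 0.0 or b <= 0.0:
        continue
    z = z_asimov(s, b)
    if z > best_z:
        best_z, best_cut = z, cut
```

The best cut trades signal efficiency against background rejection; it always beats the no-cut significance.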
10
Multi-bin analysis: More bins, more information. Are we done yet? No.
In the single-cut analysis, every event in the accepted region is given the same weight. But in fact the events are not equally informative. If we divide the accepted region into several bins, we obviously retain more information, which makes the test more powerful and the significance higher.
11
Multi-bin analysis: What if we divide the response region into bins, e.g. 20 bins, and treat every single bin separately? The likelihood function for the strength parameter µ is L(µ) = ∏_i Pois(n_i; µs_i + b_i), and the statistic for a test of µ = 0 is q0 = −2 ln[L(0)/L(µ̂)] if µ̂ ≥ 0, and q0 = 0 otherwise. If we generate samples (n1, ..., nN) from the Poisson distribution under µ = 0, each sample gives a q0; repeating this a large number of times yields the distribution of q0 under µ = 0. Can we do better?
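The toy procedure described above can be sketched with numpy. The 20-bin expectations s_i and b_i below are invented for illustration, and the maximization over µ is done with a simple grid scan rather than a proper fit.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed per-bin expectations for 20 bins (illustrative numbers only):
# signal concentrates at high classifier response, background at low.
s = np.linspace(0.2, 3.0, 20)
b = np.linspace(30.0, 2.0, 20)

def nll(mu, n):
    """Poisson negative log-likelihood, dropping mu-independent constants."""
    lam = mu * s + b
    return np.sum(lam - n * np.log(lam))

def q0(n):
    """q0 = -2 ln[L(0)/L(mu_hat)], with mu_hat >= 0 from a grid scan."""
    grid = np.linspace(0.0, 5.0, 501)
    mu_hat = grid[int(np.argmin([nll(mu, n) for mu in grid]))]
    return 2.0 * (nll(0.0, n) - nll(mu_hat, n))

# Background-only toys: draw (n1, ..., nN) from Pois(b_i) and record q0.
toys = [q0(rng.poisson(b)) for _ in range(200)]
```

The histogram of `toys` is the background-only distribution of q0 against which an observed value can be compared.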
12
Background-only distribution of q0
These are two histograms of the distribution of q0 from the 20-bin division, one for Fisher and one for BDT. We can see that the Monte Carlo distributions agree well with the asymptotic formula. So how do we calculate the significance?
13
Asimov data set: A simple method to obtain the median significance is to use a so-called "Asimov data set". In this case, we simply set n_i = s_i + b_i and evaluate the statistic q0 defined previously; the median significance is then Z0 = sqrt(q0). With the bins divided equally, this gives: median significance of Fisher: 2.38; median significance of BDT: 4.08.
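With the Asimov data set, no toys are needed at all: set each bin to its expectation, n_i = s_i + b_i, and take Z0 = sqrt(q0). A self-contained sketch, using the same invented 20-bin expectations as before (not the project's numbers):

```python
import numpy as np

# Assumed per-bin expectations for 20 bins (illustrative numbers only).
s = np.linspace(0.2, 3.0, 20)
b = np.linspace(30.0, 2.0, 20)

def nll(mu, n):
    """Poisson negative log-likelihood, dropping mu-independent constants."""
    lam = mu * s + b
    return np.sum(lam - n * np.log(lam))

def q0(n):
    """q0 = -2 ln[L(0)/L(mu_hat)], with mu_hat >= 0 from a grid scan."""
    grid = np.linspace(0.0, 5.0, 2001)
    mu_hat = grid[int(np.argmin([nll(mu, n) for mu in grid]))]
    return 2.0 * (nll(0.0, n) - nll(mu_hat, n))

# Asimov data set: every bin is set to its mu = 1 expectation.
n_asimov = s + b
z0 = np.sqrt(q0(n_asimov))   # median discovery significance
```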
14
Bins Matter
We also tried dividing the bins randomly to see what happens: median significance of Fisher: 2.35; median significance of BDT: 4.04. This shows that the bin division matters. It confirms the advantage of multiple bins: the bins are not equally important. It also indicates that a good arrangement of the bin division might raise the significance further.
15
Another Method for Significance
By definition, we can calculate the significance from the p-value. The p-value equals the fraction of the background-only q0 distribution at or above the observed value, i.e. the shaded area divided by the entire area. The results agree well with the Asimov results: Z0 = 4.11 (4.04) for BDT and Z0 = 2.34 (2.38) for Fisher.
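Converting a p-value into a significance uses the inverse of the standard normal cumulative distribution; the Python standard library's NormalDist provides this directly.

```python
from statistics import NormalDist

def z_from_pvalue(p):
    """Significance Z = Phi^{-1}(1 - p): the standard-normal quantile
    whose upper-tail probability equals the p-value."""
    return NormalDist().inv_cdf(1.0 - p)

# The conventional 5-sigma discovery threshold corresponds to
# a p-value of about 2.87e-7:
z = z_from_pvalue(2.87e-7)   # ~5.0
```

In a toy study, p would be estimated as the fraction of background-only toys with q0 at or above the observed value, then converted with this function.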
16
Summary
Done:
- Tried several TMVA classifiers (Fisher, BDT)
- Found the maximum single-cut significance
- Obtained the multi-bin significance using both MC and the Asimov data set (highest: 4.11σ)
Can be improved:
- Test more classifier methods (SVM, MLP)
- Search for a multi-bin division that reaches 5σ