Download presentation
Presentation is loading. Please wait.
Published byCamilla Kennedy Modified over 9 years ago
1
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 1 Input from Statistics Forum for Exotics ATLAS Exotics Meeting CERN/phone, 22 January, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan Input from: Eilam Gross, Samir Ferrag
2
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 2 Intro Contributions to Statistics Forum from Exotics group over last year have raised questions in several areas: methods for setting limits, establishing discovery, methods for incorporating systematic uncertainties, approval of software, methods,… Purpose of this talk is to address some of these issues as part of an ongoing discussion (not yet definitive answers). Some pointers to info -- StatForum Webpage: twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/StatisticsTools including notes in Statistics FAQ and also 1 st half of the Higgs Combination chapter of CSC Book (p 1480).
3
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 3 Statistics Forum Website: FAQ Some general items: PDG Chapters, Pedestrian's guide, Glossary,...
4
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 4 Statistics Forum FAQ Notes This is a living document
5
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 5 Statistics Forum FAQ Notes The “FAQ” consists of a collection of notes on specific questions use cases, examples,... Bayesian methods for ATLAS Higgs search (GC) Comparison of significance from profile and integrated likelihoods (GC, EG) Discovery significance with statistical uncertainty in the background estimate (EG, OV, GC) Error analysis for efficiency (GC) How to measure efficiency (DC) MC statistical errors in ML fits (GC) Covariance matrix for histogram made using seed events (GC) If you have a note which you think should be included here, or if you are interested to write such a note or comment on a note or request a note on a specific subject please let us know.
6
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 6 Some statistics issues in searches (1)Define appropriate test variable(s). Cut-based Multivariate method (Fisher, NN, BDT, SVM,...) (2) Determine its (their) distribution(s) under hypothesis of: background only, background + (parametrized) signal,... Data-driven or MC, parametric or histogram,... Quantify systematic uncertainties. (3) Measure the distribution in data; quantify level of agreement between data and predictions (results in limits, discovery significance). Exclusion limits (Neyman, CLs, Bayesian) Discovery significance (frequentist, Bayesian)
7
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 7 Multivariate methods – brief comment Most searches in the CSC book use physically motivated cut-based selection: analysis easy to understand and easy to spot anomalous behaviour. But by a nonlinear decision boundary between signal and background leads in general to higher sensitivity. Many new tools on market (see e.g. TMVA manual): Boosted Decision Trees, K-Nearest Neighbour/Kernel-based Density Estimation, Support Vector Machines,.. Multivariate analysis suffers some loss of transparency but... from MVA plus e.g. from cuts could win the race.
8
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 8 Search formalism Define a test variable whose distribution is sensitive to whether hypothesis is background-only or signal + background. E.g. count n events in signal region: events found expected signal expected background strength parameter = s / s,nominal
9
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 9 Search formalism with multiple bins (channels) Bin i of a given channel has n i events, expectation value is Expected signal and background are: is global strength parameter, common to all channels. = 0 means background only, = 1 is nominal signal hypothesis. b tot, s, b are nuisance parameters
10
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 10 Subsidiary measurements for background One may have a subsidiary measurement to constrain the background based on a control region where one expects no signal. In bin i of control histogram find m i events; expectation value is where the u i can be found from MC and includes parameters related to the background (mainly rate, sometimes also shape). In some measurements there may be no explicit subsidiary measurement but the sidebands around a signal peak effectively play the same role in constraining the background.
11
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 11 Likelihood function For an individual search channel, n i ~ Poisson( s i +b i ), m i ~ Poisson(u i ). The likelihood is: Parameter of interest Here represents all nuisance parameters For multiple independent channels there is a likelihood L i ( , i ) for each. The full likelihood function is
12
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 12 Systematics "built in" as long as some point in -space = "truth"
13
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 13 p-values Quantify level of agreement between data and hypothesis H with: p-value = Prob(data with ≤ compatibility with H when compared to the data we got | H ) = probability, under assumption of H, to obtain data as bizarre as the data we got (or more so) ≠ probability that H is true (!!!)
14
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 14 Significance from p-value Define significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value. TMath::Prob TMath::NormQuantile
15
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 15 When to publish HEP folklore is to claim discovery when p = 2.9 × 10 -7, corresponding to a significance Z = 5. This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g., phenomenon reasonable p-value for discovery D 0 D 0 mixing Higgs (?) Life on Mars Astrology Note some groups have defined 5 to refer to a two-sided fluctuation, i.e., p = 5.7 × 10 -7
16
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 16 Distribution of q So to find the p-value we need f(q | ). Method 1: generate toy MC experiments with hypothesis , obtain at distribution of q . OK for e.g. ~10 3 or 10 4 experiments, 95% CL limits. But for discovery usually want 5 , p-value = , so need to generate ~10 8 toy experiments (for every point in param. space). Method 2: Wilk's theorem says that for large enough sample, f(q | ) ~ chi-square(1 dof) This is the approach used in the Higgs Combination exercise; not yet validated to 5 level. If/when we are fortunate enough to see a signal, then focus MC resources on that point in parameter space.
17
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 17 Significance from q If we take f(q | ) ~ 2 for 1dof, then the significance is (see Higgs combo note): For n ~ Poisson ( s+b) with b known, testing =0 gives To quantify sensitivity give e.g. expected Z under s+b hypothesis
18
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 18 Likelihood ratio L s+b /L b Fast Fourier Transform method to find distribution; derives n-event distribution from that of single event with FFT. Hu and Nielson, physics/9906010 Solves "5-sigma problem". Used at LEP -- systematics treated by averaging the likelihoods by sampling new values of nuisance parameters for each simulated experiment (integrated rather than profile likelihood). An alternative (in simple cases equivalent) test variable is
19
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 19 Determining distributions: systematics E.g. M ll distribution from Z'→dilepton search (CSC Book p 1709), uses 4-parameter function for signal. Sidebands provide estimate of background. So nothing in real analysis from MC, but... Still should consider some systematic due to fact that assumed parametric functions not perfect. General approach: include more parameters making the model more flexible, so that for some point in the enlarged parameter space, model = Nature (or difference negligible).
20
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 20 A general strategy (see attached note) Suppose one needs to know the shape of a distribution. Initial model (e.g. MC) is available, but known to be imperfect. Q: How can one incorporate the systematic error arising from use of the incorrect model? A: Improve the model. That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth. Then use profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors in the parameters of interest. Difficulty is deciding how to introduce the additional parameters.
21
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 21 A simple example The naive model (a) could have been e.g. from MC (here statistical errors suppressed; point is to illustrate how to incorporate systematics.) 0th order model True model (Nature) Data
22
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 22 Comparison with the 0th order model The 0th order model gives q = 258.8, p ×
23
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 23 Enlarging the model Here try to enlarge the model by multiplying the 0th order distribution by a function s: where s(x) is a linear superposition of Bernstein basis polynomials of order m:
24
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 24 Bernstein basis polynomials
25
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 25 Enlarging the parameter space Using increasingly high order for the basis polynomials gives an increasingly flexible function. At each stage compare the p-value to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter. Now iterate this procedure, and stop when the data do not require addition of further parameters based on the likelihood ratio test. Once the enlarged model has been found, simply include it in any further statistical procedures, and the statistical errors from the additional parameters will account for the systematic uncertainty in the original model.
26
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 26 Fits using increasing numbers of parameters Stop here
27
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 27 Setting limits Method outlined in the CSC Higgs Combo = "CL s+b method", i.e., for the hypothesized (e.g. 1) compute the p-value: is excluded at CL=0.95 if p < = 0.05, and if =1 is excluded, the corresponding point in parameter space for the signal model is excluded. E.g. present expected limit on vs mass parameter.
28
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 28 Setting limits: CL s Alternative method (from Alex Read at LEP); exclude if where This cures the problematic case where the one excludes parameter point where one has no sensitivity (e.g. large mass scale) because of a downwards fluctuation of the background. But there are perhaps other ways to get around this problem, e.g., only exclude if both observed and expected p-value .
29
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 29 Comment on validation procedures for methods Ongoing discussions on methodology Ideal is to use several methods (profile likelihood, Bayesian, CLs,...) for each result. Formal procedures still evolving, but if you are going to use a novel statistical technique, please come give a talk about it at the Statistics Forum.
30
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 30 Comment on software tools Summer 08: agree to develop RooStats as common framework. Keep eye on ability to carry out independent validation. Key players: Kyle Cranmer (ATLAS) Gregory Schott (CMS) Wouter Verkerke (RooFit) Lorenzo Moneta (Root) Work currently very active (and help needed).
31
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 31 Summary Current areas of activity include: Development of profile likelihood, CLs, Bayesian methods for searches (including systematics); Combination tools (e.g. Higgs combination); RooStats software effort, Multivariate methods,... Statistics forum wants to increase active dialogue with the physics groups. If you are using a novel procedure or want to discuss a statistical method, please contact us.
32
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 32 Extra slides
33
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 33 Physics Group / StatForum interaction Eilam Gross, 8.12.08
34
G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 34 Questions from Luis Flores, 24 September, 2008
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.