G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 1 Input from Statistics Forum for Exotics ATLAS Exotics Meeting CERN/phone, 22 January,

Presentation transcript:

G. Cowan RHUL Physics Input from Statistics Forum for Exotics page 1 Input from Statistics Forum for Exotics ATLAS Exotics Meeting CERN/phone, 22 January, 2009 Glen Cowan Physics Department Royal Holloway, University of London Input from: Eilam Gross, Samir Ferrag

Intro
Contributions to the Statistics Forum from the Exotics group over the last year have raised questions in several areas: methods for setting limits and establishing discovery, methods for incorporating systematic uncertainties, approval of software and methods, ...
The purpose of this talk is to address some of these issues as part of an ongoing discussion (not yet definitive answers).
Some pointers to info: the StatForum webpage, twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/StatisticsTools, including the notes in the Statistics FAQ, and also the first half of the Higgs Combination chapter of the CSC Book (p 1480).

Statistics Forum Website: FAQ
Some general items: PDG Chapters, Pedestrian's guide, Glossary, ...

Statistics Forum FAQ Notes
This is a living document.

Statistics Forum FAQ Notes
The "FAQ" consists of a collection of notes on specific questions, use cases, examples, ...
- Bayesian methods for ATLAS Higgs search (GC)
- Comparison of significance from profile and integrated likelihoods (GC, EG)
- Discovery significance with statistical uncertainty in the background estimate (EG, OV, GC)
- Error analysis for efficiency (GC)
- How to measure efficiency (DC)
- MC statistical errors in ML fits (GC)
- Covariance matrix for histogram made using seed events (GC)
If you have a note which you think should be included here, or if you would like to write such a note, comment on a note, or request a note on a specific subject, please let us know.

Some statistics issues in searches
(1) Define appropriate test variable(s): cut-based, or multivariate method (Fisher, NN, BDT, SVM, ...).
(2) Determine its (their) distribution(s) under the hypothesis of background only, background + (parametrized) signal, ... Data-driven or MC, parametric or histogram, ... Quantify systematic uncertainties.
(3) Measure the distribution in data; quantify the level of agreement between data and predictions (results in limits, discovery significance). Exclusion limits (Neyman, CLs, Bayesian); discovery significance (frequentist, Bayesian).

Multivariate methods: brief comment
Most searches in the CSC book use a physically motivated cut-based selection: the analysis is easy to understand and it is easy to spot anomalous behaviour.
But a nonlinear decision boundary between signal and background leads in general to higher sensitivity.
Many new tools are on the market (see e.g. the TMVA manual): Boosted Decision Trees, K-Nearest Neighbour/Kernel-based Density Estimation, Support Vector Machines, ...
Multivariate analysis suffers some loss of transparency, but [value missing]$\sigma$ from an MVA plus e.g. [value missing]$\sigma$ from cuts could win the race.

Search formalism
Define a test variable whose distribution is sensitive to whether the hypothesis is background-only or signal + background. E.g. count n events in the signal region:
$E[n] = \mu s + b$
where n is the number of events found, s is the expected signal, b is the expected background, and $\mu = \sigma_s / \sigma_{s,\mathrm{nominal}}$ is the strength parameter.

Search formalism with multiple bins (channels)
Bin i of a given channel has $n_i$ events; the expectation value is
$E[n_i] = \mu s_i + b_i$
Expected signal and background are:
$s_i = s_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_s(x; \theta_s)\,dx$, $b_i = b_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_b(x; \theta_b)\,dx$
$\mu$ is the global strength parameter, common to all channels. $\mu = 0$ means background only, $\mu = 1$ is the nominal signal hypothesis. $b_{\mathrm{tot}}$, $\theta_s$, $\theta_b$ are nuisance parameters.

Subsidiary measurements for background
One may have a subsidiary measurement to constrain the background, based on a control region where one expects no signal. In bin i of the control histogram one finds $m_i$ events; the expectation value is
$E[m_i] = u_i(\theta)$
where the $u_i$ can be found from MC and $\theta$ includes parameters related to the background (mainly rate, sometimes also shape).
In some measurements there may be no explicit subsidiary measurement, but the sidebands around a signal peak effectively play the same role in constraining the background.

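Likelihood function
For an individual search channel, $n_i \sim \mathrm{Poisson}(\mu s_i + b_i)$ and $m_i \sim \mathrm{Poisson}(u_i)$. The likelihood is:
$L(\mu, \theta) = \prod_{i} \frac{(\mu s_i + b_i)^{n_i}}{n_i!} e^{-(\mu s_i + b_i)} \prod_{i} \frac{u_i^{m_i}}{m_i!} e^{-u_i}$
Here $\mu$ is the parameter of interest and $\theta$ represents all nuisance parameters.
For multiple independent channels there is a likelihood $L_i(\mu, \theta_i)$ for each (i now labels the channel). The full likelihood function is
$L(\mu, \theta) = \prod_{i} L_i(\mu, \theta_i)$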
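
As a rough illustration of the likelihood described on this slide, the sketch below evaluates the log-likelihood of one channel as a sum of Poisson terms for the signal-region bins and the control-region bins, and sums channels for the combination. The function names and argument layout are assumptions made for this example and are not taken from the talk or from any ATLAS tool.

```python
import math

def log_poisson(n, nu):
    """Log of the Poisson probability P(n; nu) for integer n >= 0 and nu > 0."""
    return n * math.log(nu) - nu - math.lgamma(n + 1)

def log_likelihood(mu, s, b, n, u=(), m=()):
    """ln L(mu, theta) for one channel (hypothetical helper).

    s, b : expected signal and background per signal-region bin
    n    : observed counts per signal-region bin
    u, m : expected and observed counts per control-region bin (optional)
    """
    ll = sum(log_poisson(ni, mu * si + bi) for ni, si, bi in zip(n, s, b))
    ll += sum(log_poisson(mk, uk) for mk, uk in zip(m, u))
    return ll

def log_likelihood_combined(mu, channels):
    """Independent channels multiply, so their log-likelihoods add."""
    return sum(log_likelihood(mu, **ch) for ch in channels)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    channel = dict(s=[3.0, 1.0], b=[10.0, 4.0], n=[14, 5], u=[20.0], m=[18])
    for mu in (0.0, 1.0):
        print(mu, log_likelihood_combined(mu, [channel, channel]))
```

In a real analysis s, b and u would themselves depend on the nuisance parameters; here they are passed in as fixed numbers to keep the sketch short.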

Systematics are "built in" as long as some point in $\theta$-space = "truth".
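
The statistic $q_\mu$ used on the following slides is not defined in the surviving transcript text. Assuming it is the usual profile likelihood ratio statistic (the talk points to the Higgs Combination chapter for details), the definition would read as follows; the hat notation is an assumption of this note.

```latex
\lambda(\mu) = \frac{L\bigl(\mu, \hat{\hat{\theta}}(\mu)\bigr)}{L(\hat{\mu}, \hat{\theta})},
\qquad
q_\mu = -2 \ln \lambda(\mu)
```

Here $\hat{\mu}$ and $\hat{\theta}$ maximize $L$ globally, while $\hat{\hat{\theta}}(\mu)$ maximizes $L$ for the specified value of $\mu$. Larger $q_\mu$ corresponds to worse agreement between the data and the hypothesized $\mu$.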

p-values
Quantify the level of agreement between the data and a hypothesis H with the p-value:
p-value = Prob(data with ≤ compatibility with H when compared to the data we got | H)
= probability, under assumption of H, to obtain data as bizarre as the data we got (or more so)
≠ probability that H is true (!!!)

Significance from p-value
Define the significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value:
$p = \int_Z^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = 1 - \Phi(Z)$, $Z = \Phi^{-1}(1 - p)$
In ROOT: TMath::Prob, TMath::NormQuantile.
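
The conversion between p-value and significance can be sketched with the Python standard library alone. This is an illustrative stand-in for the ROOT functions named on the slide, not a wrapper around them; the helper names `p_to_z` and `z_to_p` are made up for this example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, sigma 1

def p_to_z(p):
    """One-sided significance Z = Phi^-1(1 - p), computed as -Phi^-1(p)."""
    return -_STD_NORMAL.inv_cdf(p)

def z_to_p(z):
    """One-sided tail probability p = 1 - Phi(Z), via erfc for accuracy in the far tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    for z in (1.64, 3.0, 5.0):
        p = z_to_p(z)
        print(f"Z = {z}: p = {p:.3g}, back to Z = {p_to_z(p):.3f}")
```

Using `-inv_cdf(p)` rather than `inv_cdf(1 - p)` avoids losing precision when p is very small, which is the regime of interest for discovery.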

When to publish
HEP folklore is to claim discovery when $p = 2.9 \times 10^{-7}$, corresponding to a significance Z = 5.
This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,
phenomenon               reasonable p-value for discovery
$D^0\bar{D}^0$ mixing    [value missing]
Higgs                    [value missing] (?)
Life on Mars             [value missing]
Astrology                [value missing]
Note some groups have defined $5\sigma$ to refer to a two-sided fluctuation, i.e., $p = 5.7 \times 10^{-7}$.

Distribution of $q_\mu$
So to find the p-value we need $f(q_\mu|\mu)$.
Method 1: generate toy MC experiments with hypothesis $\mu$ and obtain the distribution of $q_\mu$. This is OK for e.g. ~$10^3$ or $10^4$ experiments, as needed for 95% CL limits. But for discovery one usually wants $5\sigma$, p-value = $2.9 \times 10^{-7}$, so one needs to generate ~$10^8$ toy experiments (for every point in parameter space).
Method 2: Wilks' theorem says that for a large enough sample, $f(q_\mu|\mu)$ ~ chi-square (1 dof). This is the approach used in the Higgs Combination exercise; not yet validated to the $5\sigma$ level.
If/when we are fortunate enough to see a signal, then focus MC resources on that point in parameter space.
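
The two methods can be compared in a toy setting. The sketch below assumes the simplest possible case, a single counting experiment with known background b and a test of $\mu = 0$; the statistic `q0`, the Poisson sampler, and all numbers are choices made for this illustration and are not the setup of the Higgs combination exercise.

```python
import math
import random

def q0(n, b):
    """-2 ln [L(mu=0) / L(mu_hat)] for n ~ Poisson(mu*s + b) with b known.

    Set to 0 for n <= b so that only upward fluctuations count against
    the background-only hypothesis (an assumption of this sketch).
    """
    if n <= b:
        return 0.0
    return 2.0 * (n * math.log(n / b) + b - n)

def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def toy_p_value(q_obs, b, n_toys, seed=1):
    """Method 1: fraction of background-only toys with q0 >= q_obs."""
    rng = random.Random(seed)
    n_above = sum(q0(sample_poisson(rng, b), b) >= q_obs for _ in range(n_toys))
    return n_above / n_toys

def asymptotic_p_value(q_obs):
    """Method 2: one-sided p = 1 - Phi(sqrt(q_obs)) from the chi-square(1 dof) form."""
    return 0.5 * math.erfc(math.sqrt(q_obs / 2.0))

if __name__ == "__main__":
    b, n_obs = 50.0, 65  # made-up numbers
    q_obs = q0(n_obs, b)
    print("toys:      ", toy_p_value(q_obs, b, n_toys=20000))
    print("asymptotic:", asymptotic_p_value(q_obs))
```

At a p-value of a few percent, 20000 toys are plenty. The same loop at the $5\sigma$ level would need of order $10^8$ toys, which is the practical problem the slide describes.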

Significance from $q_\mu$
If we take $f(q_\mu|\mu) \sim \chi^2$ for 1 dof, then the significance is (see Higgs combo note):
$Z = \sqrt{q_\mu}$
For $n \sim \mathrm{Poisson}(\mu s + b)$ with b known, testing $\mu = 0$ gives
$Z = \sqrt{2\left(n \ln\frac{n}{b} + b - n\right)}$
To quantify sensitivity, give e.g. the expected Z under the s+b hypothesis.
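
For the counting experiment with known b, the significance follows directly from $q_0 = -2\ln[L(0)/L(\hat{\mu})]$ with $\hat{\mu} = (n - b)/s$. The sketch below evaluates it, and also evaluates the same expression at n = s + b as a simple stand-in for the expected Z under the s+b hypothesis. That substitution is an assumption of this example, not something stated in the talk.

```python
import math

def z_observed(n, b):
    """Z = sqrt(q0) for n observed events and known background b (0 if n <= b)."""
    if n <= b:
        return 0.0
    return math.sqrt(2.0 * (n * math.log(n / b) + b - n))

def z_expected(s, b):
    """Same formula evaluated at n = s + b (assumed proxy for the expected Z)."""
    q = 2.0 * ((s + b) * math.log1p(s / b) - s)
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding errors

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(z_observed(65, 50.0))
    print(z_expected(15.0, 50.0), 15.0 / math.sqrt(50.0))
```

For s much smaller than b the result approaches the familiar $s/\sqrt{b}$; for larger s/b the two differ noticeably, which the second printed line shows.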

Likelihood ratio $L_{s+b}/L_b$
An alternative (in simple cases equivalent) test variable is the likelihood ratio $L_{s+b}/L_b$.
Fast Fourier Transform method to find its distribution: derives the n-event distribution from that of a single event with FFT (Hu and Nielsen, physics/). Solves the "5-sigma problem".
Used at LEP, with systematics treated by averaging the likelihoods, sampling new values of the nuisance parameters for each simulated experiment (integrated rather than profile likelihood).

Determining distributions: systematics
E.g. the $M_{ll}$ distribution from the Z'→dilepton search (CSC Book p 1709) uses a 4-parameter function for the signal. Sidebands provide an estimate of the background. So nothing in the real analysis comes from MC, but one should still consider some systematic uncertainty due to the fact that the assumed parametric functions are not perfect.
General approach: include more parameters, making the model more flexible, so that for some point in the enlarged parameter space, model = Nature (or the difference is negligible).

A general strategy (see attached note)
Suppose one needs to know the shape of a distribution. An initial model (e.g. MC) is available, but known to be imperfect.
Q: How can one incorporate the systematic error arising from use of the incorrect model?
A: Improve the model. That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth. Then profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors in the parameters of interest.
The difficulty is deciding how to introduce the additional parameters.

A simple example
The naive model (a) could have been e.g. from MC (here statistical errors are suppressed; the point is to illustrate how to incorporate systematics).
[Figure: data compared with the 0th order model and the true model (Nature).]

Comparison with the 0th order model
The 0th order model gives q = 258.8, p = [value missing].

Enlarging the model
Here try to enlarge the model by multiplying the 0th order distribution $f_0(x)$ by a function s:
$f(x) = s(x)\, f_0(x)$
where s(x) is a linear superposition of Bernstein basis polynomials of order m:
$s(x) = \sum_{k=0}^{m} \beta_k\, b_{k,m}(x)$

Bernstein basis polynomials
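
The formula and plot on this slide did not survive in the transcript. The standard definition of the Bernstein basis polynomials on $0 \le x \le 1$ is $b_{k,m}(x) = \binom{m}{k} x^k (1-x)^{m-k}$ for $k = 0, \ldots, m$. The sketch below evaluates them together with a linear superposition of the kind described on the previous slide. The coefficient name `beta` and the assumption that x has already been mapped to [0, 1] are choices made for this example.

```python
import math

def bernstein(k, m, x):
    """Bernstein basis polynomial b_{k,m}(x) = C(m,k) x^k (1-x)^(m-k), 0 <= x <= 1."""
    return math.comb(m, k) * x**k * (1.0 - x)**(m - k)

def shape_correction(x, beta):
    """s(x) = sum_k beta_k b_{k,m}(x), with order m = len(beta) - 1."""
    m = len(beta) - 1
    return sum(bk * bernstein(k, m, x) for k, bk in enumerate(beta))

if __name__ == "__main__":
    # With all coefficients equal to 1 the basis sums to 1, so s(x) = 1
    # and the 0th order model is recovered unchanged.
    print(shape_correction(0.3, [1.0, 1.0, 1.0, 1.0]))
    # Made-up coefficients giving a smooth distortion of the 0th order shape.
    print([round(shape_correction(x / 4, [0.8, 1.0, 1.3]), 3) for x in range(5)])
```

Because the basis functions are non-negative and sum to one, keeping all coefficients positive keeps s(x) positive, which is convenient when s multiplies a probability density.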

Enlarging the parameter space
Using increasingly high order for the basis polynomials gives an increasingly flexible function.
At each stage, compare the p-value of the likelihood ratio test to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter.
Iterate this procedure, and stop when the data do not require addition of further parameters based on the likelihood ratio test.
Once the enlarged model has been found, simply include it in any further statistical procedures, and the statistical errors from the additional parameters will account for the systematic uncertainty in the original model.
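
The stopping rule can be sketched as follows, assuming that each added parameter is tested with a likelihood ratio whose distribution is taken to be chi-square with 1 degree of freedom. The input list of minimized $-2\ln L$ values is hypothetical; in practice each entry would come from a fit with one more Bernstein coefficient than the previous one.

```python
import math

def lr_test_p_value(delta_q):
    """P(chi2 with 1 dof > delta_q) = erfc(sqrt(delta_q / 2))."""
    return math.erfc(math.sqrt(max(delta_q, 0.0) / 2.0))

def choose_order(min_neg2lnl, threshold=0.1):
    """Return the index of the first model that the data do not require enlarging.

    min_neg2lnl[m] is the minimized -2 ln L of the model with m extra parameters.
    """
    for m in range(len(min_neg2lnl) - 1):
        delta_q = min_neg2lnl[m] - min_neg2lnl[m + 1]
        if lr_test_p_value(delta_q) > threshold:
            return m  # the improvement from parameter m+1 is not significant
    return len(min_neg2lnl) - 1

if __name__ == "__main__":
    # Hypothetical fit results, for illustration only.
    fits = [300.0, 120.0, 60.0, 58.9, 58.5]
    print("stop at", choose_order(fits), "extra parameters")
```

A looser threshold such as 0.2 makes the procedure more willing to add parameters, trading some statistical precision for protection against an overly rigid model.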

Fits using increasing numbers of parameters
[Figure: fits with increasing numbers of parameters; one panel is marked "Stop here".]

Setting limits
The method outlined in the CSC Higgs Combo is the "$CL_{s+b}$ method", i.e., for the hypothesized $\mu$ (e.g. 1) compute the p-value:
$p_\mu = \int_{q_{\mu,\mathrm{obs}}}^{\infty} f(q_\mu|\mu)\,dq_\mu$
$\mu$ is excluded at CL = 0.95 if $p_\mu < \alpha = 0.05$, and if $\mu = 1$ is excluded, the corresponding point in parameter space for the signal model is excluded.
E.g. present the expected limit on $\mu$ vs. the mass parameter.

Setting limits: $CL_s$
Alternative method (from Alex Read at LEP): exclude $\mu$ if $CL_s < \alpha$, where
$CL_s = \frac{CL_{s+b}}{CL_b}$
This cures the problematic case where one excludes a parameter point to which one has no sensitivity (e.g. large mass scale) because of a downwards fluctuation of the background.
But there are perhaps other ways to get around this problem, e.g., only exclude if both the observed and the expected p-value are $< \alpha$.
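
The difference between the two prescriptions shows up clearly in a single-bin counting experiment. The sketch below is a simplification: it uses the observed count n itself as the test statistic rather than $q_\mu$, takes b as known exactly, and finds the upper limit on the expected signal s by bisection. The function names and numbers are illustrative assumptions.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for n ~ Poisson(mean)."""
    term = total = math.exp(-mean)
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cl_sb(s, b, n_obs):
    """p-value of the s+b hypothesis: probability of n_obs or fewer events."""
    return poisson_cdf(n_obs, s + b)

def cl_s(s, b, n_obs):
    """CL_s = CL_{s+b} / CL_b."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(p_func, b, n_obs, alpha=0.05, s_max=100.0):
    """Smallest s >= 0 with p_func(s, b, n_obs) <= alpha (p decreases with s)."""
    if p_func(0.0, b, n_obs) <= alpha:
        return 0.0  # every s >= 0 is excluded, even s = 0
    lo, hi = 0.0, s_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_func(mid, b, n_obs) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    # Downward background fluctuation: expect b = 3, observe 0 (made-up numbers).
    print("CLs+b limit:", upper_limit(cl_sb, 3.0, 0))
    print("CLs limit:  ", upper_limit(cl_s, 3.0, 0))
```

With b = 3 and no events observed, the $CL_{s+b}$ prescription excludes every signal value including s = 0, while $CL_s$ returns the same limit as in the background-free case, about 3 events.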

Comment on validation procedures for methods
Discussions on methodology are ongoing. The ideal is to use several methods (profile likelihood, Bayesian, CLs, ...) for each result.
Formal procedures are still evolving, but if you are going to use a novel statistical technique, please come and give a talk about it at the Statistics Forum.

Comment on software tools
Summer 08: agreement to develop RooStats as a common framework. Keep an eye on the ability to carry out independent validation.
Key players: Kyle Cranmer (ATLAS), Gregory Schott (CMS), Wouter Verkerke (RooFit), Lorenzo Moneta (ROOT).
Work is currently very active (and help is needed).

Summary
Current areas of activity include:
- development of profile likelihood, CLs, and Bayesian methods for searches (including systematics);
- combination tools (e.g. Higgs combination);
- the RooStats software effort, multivariate methods, ...
The Statistics Forum wants to increase active dialogue with the physics groups. If you are using a novel procedure or want to discuss a statistical method, please contact us.

Extra slides

Physics Group / StatForum interaction Eilam Gross,

Questions from Luis Flores, 24 September, 2008