1 Perspective of SCMA IV from Particle Physics Louis Lyons Particle Physics, Oxford (CDF experiment, Fermilab) SCMA IV Penn State.

Presentation transcript:

1 Perspective of SCMA IV from Particle Physics Louis Lyons Particle Physics, Oxford (CDF experiment, Fermilab) SCMA IV Penn State 15 th June 2006

2 Topics
Basic Particle Physics analyses
Similarities between Particle and Astrophysics issues
Differences
What Astrophysicists do particularly well
What Particle Physicists have learnt
Conclusions

3 Particle Physics
What it is
Typical experiments
Typical data
Typical analysis

5 Typical Experiments
Experiment     Energy      Beams       # events     Result
LEP            200 GeV     e+ e−       Z            Nν = 2.987 ± …
BaBar/Belle    10 GeV      e+ e−       B anti-B     CP violation
Tevatron       2000 GeV    p anti-p    “10^14”      SUSY?
LHC            … GeV       p p         (2007…)      Higgs?
K2K            ~3 GeV      νμ          100          ν oscillations

7 K2K -from KEK to Kamioka- Long Baseline Neutrino Oscillation Experiment

8 CDF at Fermilab

9 Typical Analysis
Parameter determination: dn/dt = (1/τ) exp(−t/τ)
Worry about backgrounds, t resolution, t-dependent efficiency.
1) Reconstruct tracks
2) Select real events
3) Select wanted events
4) Extract t from L and v
5) Model signal and background
6) Likelihood fit for lifetime and statistical error
7) Estimate systematic error
Result: τ ± στ (stat) ± στ (syst)
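As a toy illustration of step 6 (not the talk's analysis code), here is a minimal unbinned maximum-likelihood lifetime fit, assuming a pure exponential with no background or resolution effects; for that special case the likelihood maximum and its statistical error have the closed forms used below:

```python
import math, random

random.seed(1)

def fit_lifetime(times):
    """Unbinned ML fit of dn/dt = (1/tau) exp(-t/tau).
    For a pure exponential the MLE is the sample mean,
    with statistical error tau_hat / sqrt(N)."""
    n = len(times)
    tau_hat = sum(times) / n              # analytic maximum of the likelihood
    sigma_stat = tau_hat / math.sqrt(n)
    return tau_hat, sigma_stat

# toy "selected events": true lifetime 1.5 (arbitrary units)
true_tau = 1.5
data = [random.expovariate(1.0 / true_tau) for _ in range(10000)]
tau, sigma = fit_lifetime(data)
print(tau, sigma)
```

With backgrounds and a t-dependent efficiency, the likelihood no longer maximises analytically and a numerical fit is needed; this sketch only shows the statistical-error part of the quoted result.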

10 Typical Analysis Hypothesis testing: Peak or statistical fluctuation?

11 Similarities
Large data sets {ATLAS: event = Mbyte; total = 10 Pbytes}
Experimental resolution
Systematics
Separating signal from background
Parameter estimation
Testing models (versus alternative?)
Search for signals: setting limits or discovery
SCMA and PHYSTAT

12 Differences
Bayes or Frequentism?
Background
Specific Astrophysics issues:
Time dependence
Spatial structures
Correlations
Non-parametric methods
Visualisation
Cosmic variance
Blind analyses

13 Bayesian versus Frequentism

                         Bayesian                          Frequentist
Basis of method          Bayes’ Theorem → posterior        Uses pdf for data,
                         probability distribution          for fixed parameters
Meaning of probability   Degree of belief                  Frequentist definition
Prob of parameters?      Yes                               Anathema
Needs prior?             Yes                               No
Choice of interval?      Yes                               Yes (except F+C)
Data considered          Only data you have                … + other possible data
Likelihood principle?    Yes                               No

14 Bayesian versus Frequentism

                         Bayesian                          Frequentist
Ensemble of experiments  No                                Yes (but often not explicit)
Final statement          Posterior probability             Parameter values for which
                         distribution                      data is likely
Unphysical/empty ranges  Excluded by prior                 Can occur
Systematics              Integrate over prior              Extend dimensionality of
                                                           frequentist construction
Coverage                 Unimportant                       Built-in
Decision making          Yes (uses cost function)          Not useful

15 Bayesianism versus Frequentism “Bayesians address the question everyone is interested in, by using assumptions no-one believes” “Frequentists use impeccable logic to deal with an issue of no interest to anyone”

16 Differences
Bayes or Frequentism?
Background
Specific Astrophysics issues:
Time dependence
Spatial structures
Correlations
Non-parametric methods
Visualisation
Cosmic variance
Blind analyses

17 What Astrophysicists do well
Glorious pictures
Sharing data: making data publicly available
Dealing with large data sets
Visualisation
Funding for Astrostatistics
Statistical software

18 [Images: the Whirlpool Galaxy; the width of the Z0, showing 3 light neutrino species]

19 What Astrophysicists do well
Glorious pictures
Sharing data: making data publicly available
Dealing with large data sets
Visualisation
Funding for Astrostatistics
Statistical software

20 What Particle Physicists now know
Δ(ln L) = 0.5 rule
Unbinned Lmax and goodness of fit
Prob(data | hypothesis) ≠ Prob(hypothesis | data)
Comparing 2 hypotheses: Δ(χ²) is not distributed as χ²
Bounded parameters: Feldman and Cousins
Use the correct L (Punzi effect)
Blind analyses

21 Δ ln L = −1/2 rule
If L(μ) is Gaussian, the following definitions of σ are equivalent:
1) RMS of L(μ)
2) 1/√(−d² ln L/dμ²)
3) ln L(μ ± σ) = ln L(μ0) − 1/2
If L(μ) is non-Gaussian, these are no longer the same.
“Procedure 3) above still gives an interval that contains the true value of the parameter μ with 68% probability”
Heinrich: CDF note 6438 (see the CDF Statistics Committee web page)
Barlow: Phystat05
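A minimal numerical sketch of procedure 3), assuming a single Poisson measurement with n observed events, so ln L(μ) = n ln μ − μ up to a constant (the function and bracketing values here are illustrative, not from the talk):

```python
import math

def lnL(mu, n):
    """Poisson log-likelihood ln L(mu) = n ln mu - mu (constants dropped)."""
    return n * math.log(mu) - mu

def interval_68(n):
    """Delta ln L = 1/2 interval for a Poisson mean, given n observed events."""
    target = lnL(n, n) - 0.5            # ln L at the maximum (mu = n), minus 1/2
    a, b = 1e-9, float(n)               # left crossing: ln L rises towards mu = n
    for _ in range(100):
        m = 0.5 * (a + b)
        if lnL(m, n) < target:
            a = m
        else:
            b = m
    mu_lo = 0.5 * (a + b)
    a, b = float(n), 3.0 * n + 10.0     # right crossing: ln L falls beyond mu = n
    for _ in range(100):
        m = 0.5 * (a + b)
        if lnL(m, n) < target:
            b = m
        else:
            a = m
    return mu_lo, 0.5 * (a + b)

lo, hi = interval_68(9)
print(lo, hi)   # asymmetric, unlike the Gaussian 9 - 3, 9 + 3
```

The asymmetry of the resulting interval is exactly the sense in which the non-Gaussian likelihood makes definitions 1)–3) differ.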

22 COVERAGE
How often does the quoted range for a parameter include the parameter’s true value?
N.B. Coverage is a property of the METHOD, not of a particular experimental result.
Coverage can vary with μ.
Study the coverage of different methods for a Poisson parameter μ, from observation of the number of events n.
Hope for: coverage equal to the nominal value for all μ. [Plot: C(μ) vs μ, flat at the nominal level, between 0 and 100%]

23 COVERAGE
If P = nominal for all μ: “correct coverage”
P < nominal for some μ: “undercoverage” (this is serious!)
P > nominal for some μ: “overcoverage” — conservative, with loss of rejection power

24 Coverage: L approach (not frequentist)
P(n; μ) = e^−μ μ^n / n!   (Joel Heinrich, CDF note 6438)
Interval from −2 ln λ ≤ 1, with λ = P(n; μ) / P(n; μbest)
UNDERCOVERS
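The undercoverage can be checked directly: for a Poisson mean the coverage is an exact sum over n, with no Monte Carlo needed. The sketch below (illustrative, not Heinrich's code) computes the coverage of the Δ ln L = 1/2 interval at one value of μ:

```python
import math

def covers(n, mu):
    """Does the Delta ln L = 1/2 interval from observing n contain mu?
    Condition: ln L(mu) >= ln L(mu_best) - 1/2, with mu_best = n."""
    if n == 0:
        return mu <= 0.5                 # -2 ln lambda = 2 mu <= 1
    return n * math.log(mu) - mu >= n * math.log(n) - n - 0.5

def coverage(mu, nmax=200):
    """Exact coverage: sum the Poisson probabilities of all n whose
    interval contains the true mu."""
    total, p = 0.0, math.exp(-mu)        # p = P(0; mu), updated recursively
    for n in range(nmax + 1):
        if covers(n, mu):
            total += p
        p *= mu / (n + 1)
    return total

print(coverage(3.0))   # noticeably below the nominal 0.683
```

At μ = 3 only n = 2, 3, 4 yield covering intervals, and their Poisson probabilities sum to about 0.62: the method undercovers there.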

25 Frequentist central intervals, NEVER undercover (Conservative at both ends)

26 χ² = (n − µ)²/µ, with Δχ² = 1: % coverage?
NOT frequentist: coverage varies from 0% to 100%

27 Unbinned Lmax and Goodness of Fit?
Find parameters by maximising L.
So is a larger L better than a smaller L? So does Lmax give goodness of fit??
[Plot: Monte Carlo distribution of unbinned Lmax (frequency vs Lmax), with regions labelled Bad? / Good? / Great?]

28 Unbinned Lmax and Goodness of Fit?
Not necessarily. Contrast:
pdf(data, params): params fixed, data vary
L(data, params): data fixed, params vary
e.g. p(t|λ) = λ e^−λt
As a pdf in t: maximum at t = 0. As a likelihood in λ: maximum at λ = 1/t.

29 Example 1: Lmax and Goodness of Fit?
Fit an exponential to times t1, t2, t3 ……. [Joel Heinrich, CDF 5639]
L = Π (1/τ) exp(−ti/τ)
ln Lmax = −N(1 + ln tav)
i.e. it depends only on the AVERAGE t, but is INDEPENDENT OF THE DISTRIBUTION OF t (except for……..)
(The average t is a sufficient statistic.)
The variation of Lmax in Monte Carlo is due to variations in the samples’ average t, but NOT TO BETTER OR WORSE FIT.
[Sketch: two quite different pdfs of t with the same average t give the same Lmax]
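This is easy to verify numerically; in the sketch below (toy data, chosen only for illustration) a roughly exponential-looking sample and a sample that looks nothing like an exponential have the same average t, and hence identical ln Lmax:

```python
import math

def lnL_max(times):
    """For an exponential fit, ln L_max = -N (1 + ln t_av): the data enter
    only through the average time (a sufficient statistic)."""
    n = len(times)
    t_av = sum(times) / n
    return -n * (1.0 + math.log(t_av))

good_fit = [0.1, 0.3, 0.7, 1.2, 2.7]       # roughly exponential-looking, mean 1.0
terrible_fit = [1.0, 1.0, 1.0, 1.0, 1.0]   # nothing like an exponential, mean 1.0
print(lnL_max(good_fit), lnL_max(terrible_fit))   # identical
```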

30 Example 2: Lmax and Goodness of Fit?
pdf ∝ 1 + α cos²θ
The pdf (and the likelihood) depend only on cos²θi, and are insensitive to the sign of cosθi.
So the data can be in very bad agreement with the expected distribution (e.g. all data with cosθ < 0), and Lmax does not know about it.
Example of a general principle.
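A tiny check of the sign-blindness, with an illustrative normalisation of the pdf over cosθ ∈ [−1, 1] and made-up data: a symmetric sample and a grossly one-sided sample with the same |cosθ| values give identical log-likelihoods for every α:

```python
import math

def lnL(alpha, costhetas):
    """ln L for pdf(c) = (1 + alpha c^2) / (2 + 2 alpha / 3) on [-1, 1]."""
    norm = 2.0 + 2.0 * alpha / 3.0
    return sum(math.log((1 + alpha * c * c) / norm) for c in costhetas)

symmetric = [-0.9, -0.5, -0.1, 0.1, 0.5, 0.9]
one_sided = [-0.9, -0.5, -0.1, -0.1, -0.5, -0.9]   # all cos(theta) < 0
print(lnL(0.7, symmetric), lnL(0.7, one_sided))    # identical
```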

31 Example 3: Lmax and Goodness of Fit?
Fit a Gaussian with variable μ, fixed σ:
ln Lmax = N(−0.5 ln 2π − ln σ) − 0.5 Σ(xi − xav)²/σ²
The first term is constant; the second is ~variance(x).
i.e. Lmax depends only on variance(x), which is not relevant for fitting μ (μest = xav).
A smaller-than-expected variance(x) results in a larger Lmax.
[Sketch: a sample narrower than the fitted Gaussian — worse fit, larger Lmax; a broader sample — better fit, lower Lmax]

32 Transformation properties of pdf and L
Lifetime example: dn/dt = λ e^−λt
Change observable from t to y = √t:
dn/dy = (dn/dt)(dt/dy) = 2yλ exp(−λy²)
So (a) the pdf CHANGES, but (b) ∫ (dn/dt) dt over [t1, t2] = ∫ (dn/dy) dy over [√t1, √t2],
i.e. corresponding integrals of the pdf are INVARIANT.
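The invariance in (b) can be confirmed numerically; this sketch uses an illustrative λ and range, and a simple midpoint rule (the helper `integrate` is not from the talk):

```python
import math

def integrate(f, a, b, steps=100000):
    """Midpoint rule, accurate enough to illustrate the invariance."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

lam = 0.8
pdf_t = lambda t: lam * math.exp(-lam * t)               # dn/dt
pdf_y = lambda y: 2 * y * lam * math.exp(-lam * y * y)   # dn/dy after y = sqrt(t)

p_t = integrate(pdf_t, 1.0, 4.0)
p_y = integrate(pdf_y, 1.0, 2.0)   # same events: y runs from sqrt(1) to sqrt(4)
print(p_t, p_y)                    # equal probabilities
```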

33 Now for Likelihood
When the parameter changes from λ to τ = 1/λ:
(a′) L does not change: dn/dt = (1/τ) exp(−t/τ), and L(τ; t) = L(λ = 1/τ; t), because identical numbers occur in evaluations of the two L’s.
BUT
(b′) ∫ L(λ) dλ ≠ ∫ L(τ) dτ over corresponding ranges.
So it is NOT meaningful to integrate L. (However, ………)
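A numerical sketch of (b′), with an illustrative single observation t = 1 and arbitrary integration cutoffs (the cutoffs are needed because the τ integral diverges, itself a symptom of the problem): naively integrating L to get the "probability" that λ < 1, i.e. τ > 1, gives wildly different answers in the two metrics.

```python
import math

def integrate(f, a, b, steps=100000):
    """Midpoint rule; precision is not the issue here."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

t_obs = 1.0
L_lam = lambda lam: lam * math.exp(-lam * t_obs)            # L(lambda; t)
L_tau = lambda tau: (1.0 / tau) * math.exp(-t_obs / tau)    # same L, tau = 1/lambda

lo, hi = 0.01, 100.0   # arbitrary cutoffs
frac_lam = integrate(L_lam, lo, 1.0) / integrate(L_lam, lo, hi)   # "P(lambda < 1)"
frac_tau = integrate(L_tau, 1.0, hi) / integrate(L_tau, lo, hi)   # "P(tau > 1)"
print(frac_lam, frac_tau)   # same question, very different answers
```

Multiplying by a prior density before integrating (the Bayesian route of the next slide) is what restores metric consistency.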

34 CONCLUSION: integrating L is NOT a recognised statistical procedure.
[Metric dependent: a τ range can agree with τpred while the corresponding λ range is inconsistent with 1/τpred]
BUT
1) Could regard it as a “black box”
2) Make it respectable via Bayes’ posterior: Posterior(λ) ~ L(λ) * Prior(λ) [and Prior(λ) can be constant]

35

                      pdf(t; λ)                            L(λ; t)
Value of function     Changes when the observable          INVARIANT wrt transformation
                      is transformed                       of the parameter
Integral of function  INVARIANT wrt transformation         Changes when the parameter
                      of the observable                    is transformed
Conclusion            Max prob density not very sensible   Integrating L not very sensible

36 P (Data; Theory) ≠ P (Theory; Data)
Theory = male or female; Data = pregnant or not pregnant
P (pregnant; female) ~ 3%

37 P (Data; Theory) ≠ P (Theory; Data)
Theory = male or female; Data = pregnant or not pregnant
P (pregnant; female) ~ 3%
but P (female; pregnant) >>> 3%

38 P (Data; Theory) ≠ P (Theory; Data)
HIGGS SEARCH at CERN: is the data consistent with the Standard Model, or with the Standard Model + Higgs?
End of Sept 2000: data not very consistent with the S.M.
Prob (Data; S.M.) < 1% — a valid frequentist statement.
Turned by the press into: Prob (S.M.; Data) < 1%, and therefore Prob (Higgs; Data) > 99%,
i.e. “It is almost certain that the Higgs has been seen”

39 p-value ≠ Prob of hypothesis being correct
Given data and the null hypothesis H0, construct a statistic T (e.g. χ²).
p-value = probability {T ≥ tobserved}, assuming H0 is true.
If p = 10⁻³, what is the prob that H0 is true?
e.g. try to identify a μ in a beam (H0: particle = μ) with π contamination.
Prob (H0) depends on a) the similarity of the μ and π masses, and b) the relative populations of μ and π:
If N(π) ~ N(μ), prob(H0) ≈ 0.5
If N(π) << N(μ), prob(H0) ~ 1
If N(π) ~ 10 N(μ), prob(H0) ~ 0.1
i.e. prob(H0) varies with the p-value, but is not equal to it.
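The population dependence is just Bayes' theorem; in the sketch below the likelihood values are illustrative (chosen so the measurement is equally probable under μ and π, mimicking similar masses), and only the beam composition changes between the three cases:

```python
def prob_H0(L_mu, L_pi, n_mu, n_pi):
    """Posterior probability that the particle is a muon, from the likelihood
    of the observation under each hypothesis and the beam populations."""
    return (L_mu * n_mu) / (L_mu * n_mu + L_pi * n_pi)

# same measurement (same likelihoods, hence same p-value), different beams:
a = prob_H0(0.05, 0.05, 1.0, 1.0)     # N(pi) ~  N(mu)    -> ~0.5
b = prob_H0(0.05, 0.05, 1.0, 10.0)    # N(pi) ~ 10 N(mu)  -> ~0.1
c = prob_H0(0.05, 0.05, 100.0, 1.0)   # N(pi) << N(mu)    -> ~1
print(a, b, c)
```

The p-value is fixed by the measurement alone; prob(H0) is not.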

40 p-value ≠ Prob of hypothesis being correct After Conference Banquet speech: “Of those results that have been quoted as significant at the 99% level, about half have turned out to be wrong!” Supposed to be funny, but in fact is perfectly OK

41 PARADOX
Histogram with 100 bins; fit 1 parameter.
Smin: χ² with NDF = 99 (expected χ² = 99 ± 14).
For our data, Smin(p0) = 90.
Is p1 acceptable if S(p1) = 115?
1) YES: very acceptable χ² probability.
2) NO: σp from S(p0 + σp) = Smin + 1 = 91.
But S(p1) − S(p0) = 25, so p1 is 5σ away from the best value.


46 Comparing data with different hypotheses

47 χ² with ν degrees of freedom?
1) ν = # data points − # free parameters? Only asymptotically (apart from Poisson → Gaussian), e.g.:
a) Fit a flattish histogram with y = N {cos(x − x0)}, x0 a free parameter
b) Neutrino oscillations: almost degenerate parameters
y ~ 1 − A sin²(1.27 Δm² L/E)   — 2 parameters
For small Δm²: y ~ 1 − A (1.27 Δm² L/E)²   — 1 parameter

48 χ² with ν degrees of freedom?
2) Is the difference in χ² distributed as χ²?
H0 is true; also fit with H1, which has k extra parameters.
e.g. look for a Gaussian peak on top of a smooth background:
y = C(x) + A exp{−0.5 ((x − x0)/σ)²}
Is χ²(H0) − χ²(H1) distributed as χ² with ν = k = 3?
Relevant for assessing whether an enhancement in the data is just a statistical fluctuation, or something more interesting.
N.B. Under H0 (y = C(x)): A = 0 (boundary of the physical region), and x0 and σ are undefined.

49 Is the difference in χ² distributed as χ²?
Demortier: H0 = quadratic background; H1 = the same + a Gaussian of fixed width
Protassov, van Dyk, Connors, ….: H0 = continuum; (a) H1 = narrow emission line; (b) H1 = wider emission line; (c) H1 = absorption line
Nominal significance level = 5%

50 Is the difference in χ² distributed as χ²?
So we need to determine the Δχ² distribution by Monte Carlo.
N.B.
1) Determining Δχ² for hypothesis H1 when the data are generated according to H0 is not trivial, because there will be lots of local minima.
2) If we are interested in a 5σ significance level, this needs lots of MC simulations.
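A toy version of such a study (all numbers illustrative): data are a flat background of 50 per bin, H1 adds a Gaussian peak, and the many-local-minima problem is side-stepped by scanning a grid of (x0, σ) templates, with (C, A) fitted linearly for each. The tail fraction beyond the 95% point of χ²(3) can then be compared with the nominal 5%; with only 200 toys this probes the 2σ region, nothing like 5σ.

```python
import math, random

random.seed(42)
NBINS, B = 20, 50.0                      # flat background, 50 expected per bin
xs = [i + 0.5 for i in range(NBINS)]

# Pre-computed Gaussian-peak templates on an illustrative (x0, sigma) grid;
# scanning the grid approximates the global fit over the nuisance parameters.
templates = []
for x0 in range(2, 18):
    for sigma in (0.8, 1.5, 2.5):
        g = [math.exp(-0.5 * ((x - x0) / sigma) ** 2) for x in xs]
        templates.append((g, sum(g), sum(gi * gi for gi in g)))

def chisq_h0(y):
    """H0: flat background only (one free parameter, the level C)."""
    c = sum(y) / NBINS
    return sum((yi - c) ** 2 / B for yi in y)

def chisq_h1(y):
    """H1: background + peak; (C, A) by linear least squares per template,
    then keep the best template."""
    sy, best = sum(y), float("inf")
    for g, sg, sgg in templates:
        sgy = sum(gi * yi for gi, yi in zip(g, y))
        det = NBINS * sgg - sg * sg
        C = (sgg * sy - sg * sgy) / det
        A = (NBINS * sgy - sg * sy) / det
        chi2 = sum((yi - C - A * gi) ** 2 / B for yi, gi in zip(y, g))
        best = min(best, chi2)
    return best

deltas = []
for _ in range(200):                     # toys generated under H0 (Gaussian approx)
    y = [random.gauss(B, math.sqrt(B)) for _ in range(NBINS)]
    deltas.append(chisq_h0(y) - chisq_h1(y))

frac = sum(d > 7.81 for d in deltas) / len(deltas)   # 7.81 = 95% point of chi2(3)
print(frac)   # compare with the nominal 0.05
```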


56 [Plot: with Xobs = −2, the method now gives an upper limit]


59 Getting L wrong: the Punzi effect
Giovanni Punzi, PHYSTAT2003: “Comments on L fits with variable resolution”
Separating two close signals when the resolution σ varies event by event, and is different for the two signals, e.g.:
1) Signal 1: 1 + cos²θ; Signal 2: isotropic; and different parts of the detector give different σ
2) M (or τ): different numbers of tracks → different σM (or στ)

60 Punzi Effect
Events are characterised by xi and σi.
A events centred on x = 0; B events centred on x = 1.
L(f)wrong = Π [f G(xi, 0, σi) + (1 − f) G(xi, 1, σi)]
L(f)right = Π [f p(xi, σi; A) + (1 − f) p(xi, σi; B)]
p(S, T) = p(S|T) * p(T)
p(xi, σi | A) = p(xi | σi, A) * p(σi | A) = G(xi, 0, σi) * p(σi | A)
So L(f)right = Π [f G(xi, 0, σi) p(σi|A) + (1 − f) G(xi, 1, σi) p(σi|B)]
If p(σ|A) = p(σ|B), Lright = Lwrong, but NOT otherwise.
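A toy reproduction of the effect (illustrative numbers, not Punzi's actual study): take the extreme case where σ tags the class exactly, σ = 1 for A and σ = 2 for B, with true fA = 1/3, and maximise each likelihood over a grid in f. Lwrong, which ignores the σ pdfs, comes out strongly biassed upward; Lright recovers fA:

```python
import math, random

random.seed(7)

def G(x, mu, s):
    """Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# A events: x ~ G(0, sigma=1); B events: x ~ G(1, sigma=2); f_A = 1/3
data = [(random.gauss(0, 1), 1.0) for _ in range(500)] + \
       [(random.gauss(1, 2), 2.0) for _ in range(1000)]

def fit(term):
    """Maximise the summed per-event log-likelihood over a grid in f."""
    return max((i / 200.0 for i in range(1, 200)),
               key=lambda f: sum(term(f, x, s) for x, s in data))

# L_wrong ignores that p(sigma|A) != p(sigma|B)
wrong = lambda f, x, s: math.log(f * G(x, 0, s) + (1 - f) * G(x, 1, s))
# L_right includes p(sigma|A), p(sigma|B); here sigma = 1 only for A, 2 only for B
right = lambda f, x, s: math.log(f * G(x, 0, s) * (s == 1.0)
                                 + (1 - f) * G(x, 1, s) * (s == 2.0))

f_wrong, f_right = fit(wrong), fit(right)
print(f_wrong, f_right)   # f_right ~ 1/3; f_wrong biassed well above it
```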

61 Punzi Effect
Punzi’s Monte Carlo: A events G(x, 0, σA), B events G(x, 1, σB), fA = 1/3.
[Table of fitted fA ± σf for Lwrong and Lright at various (σA, σB): for σA = σB the two agree; otherwise they differ.]
1) Lwrong is OK for p(σ|A) = p(σ|B), but is otherwise BIASSED.
2) Lright is unbiassed, but Lwrong is biassed (enormously)!
3) Lright gives a smaller σf than Lwrong.

62 Explanation of the Punzi bias
σA = 1 (A events with σ = 1), σB = 2 (B events with σ = 2).
[Sketch: actual distribution vs fitting function in x, for A and for B events]
[NA/NB is variable in the fit, but the same value is used for A and B events.]
The fit gives an upward bias for NA/NB because (i) that is much better for the A events; and (ii) it does not hurt too much for the B events.

63 Another scenario for the Punzi problem: PID
[Plot: M(TOF) spectrum with π and K peaks, A vs B → π vs K]
Originally: positions of the peaks constant; σi variable, with (σi)A ≠ (σi)B.
Here: the K peak → the π peak at large momentum; σi ~ constant, but pK ≠ pπ.
COMMON FEATURE: Separation / Error ≠ constant.
Where else??
MORAL: beware of event-by-event variables whose pdfs do not appear in L.

64 Avoiding the Punzi bias
Include p(σ|A) and p(σ|B) in the fit (but then, for example, particle identification may be determined more by the momentum distribution than by the PID variable)
OR
Fit each range of σi separately and add: Σ(NA)i → (NA)total, and similarly for B.
The incorrect method using Lwrong takes a weighted average of the (fA)j, assumed to be independent of j.
Talk by Catastini at PHYSTAT05

65 BLIND ANALYSES
Why blind analysis?
Methods of blinding:
Study the procedure with simulation only
Add a random number to the result *
Look at only a first fraction of the data
Keep the signal box closed
Keep MC parameters hidden
Keep the fraction visible for each bin hidden
After the analysis is unblinded, ……..
* Luis Alvarez’s suggestion re the “discovery” of free quarks

66 Conclusions
Common problems, with scope for learning from each other: large data sets; separating signal; testing models; signal / discovery.
Targetted workshops, e.g. SAMSI: Jan–May 2006; Banff: July 2006 (limits with nuisance params; significance tests; signal–bgd separation)
Summer schools: Spain, FNAL, ……
Thanks to Steffen Lauritzen, Lance Miller, Subir Sarkar, Roberto Trotta, ………

67 Excellent Conference: Very big THANK YOU to Jogesh and Eric !