1 Perspective of SCMA IV from Particle Physics
Louis Lyons, Particle Physics, Oxford (CDF experiment, Fermilab), l.lyons@physics.ox.ac.uk
SCMA IV, Penn State, 15th June 2006
2 Topics
Basic Particle Physics analyses
Similarities between Particle Physics and Astrophysics issues
Differences
What Astrophysicists do particularly well
What Particle Physicists have learnt
Conclusions
3 Particle Physics
What it is
Typical experiments
Typical data
Typical analysis
5 Typical Experiments

Experiment    Energy      Beams      # events        Result
LEP           200 GeV     e+ e-      10^7 Z          N_ν = 2.987 ± 0.008
BaBar/Belle   10 GeV      e+ e-      10^8 B anti-B   CP violation
Tevatron      2000 GeV    p anti-p   "10^14"         SUSY?
LHC           14000 GeV   p p        (2007...)       Higgs?
K2K           ~3 GeV      ν_μ        100             ν oscillations
7 K2K -from KEK to Kamioka- Long Baseline Neutrino Oscillation Experiment
8 CDF at Fermilab
9 Typical Analysis
Parameter determination: dn/dt = (1/τ) exp(−t/τ)
Worry about backgrounds, t resolution, t-dependent efficiency.
1) Reconstruct tracks
2) Select real events
3) Select wanted events
4) Extract t from L and v
5) Model signal and background
6) Likelihood fit for the lifetime and its statistical error
7) Estimate the systematic error
Result: τ ± σ_τ (stat) ± σ_τ (syst)
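As an illustration of step 6, here is a minimal Python sketch (not the actual CDF code) of an unbinned likelihood fit of dn/dt = (1/τ) exp(−t/τ); the toy data, seed and the τ/√N error estimate are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
t = rng.exponential(scale=1.5, size=1000)       # toy decay times, true tau = 1.5 (assumed)

def nll(tau):
    # negative log-likelihood for dn/dt = (1/tau) exp(-t/tau)
    return np.sum(np.log(tau) + t / tau)

fit = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
tau_hat = fit.x
# for an exponential the MLE is the sample mean, with sigma(tau) ~ tau/sqrt(N)
print(f"tau = {tau_hat:.3f} +/- {tau_hat / np.sqrt(len(t)):.3f} (stat)")
```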
10 Typical Analysis Hypothesis testing: Peak or statistical fluctuation?
11 Similarities
Large data sets {ATLAS: event = Mbyte; total = 10 Pbytes}
Experimental resolution
Systematics
Separating signal from background
Parameter estimation
Testing models (versus alternative?)
Search for signals: setting limits or discovery
SCMA and PHYSTAT
12 Differences
Bayes or Frequentism?
Background
Specific Astrophysics issues:
  Time dependence
  Spatial structures
  Correlations
  Non-parametric methods
  Visualisation
  Cosmic variance
Blind analyses
13 Bayesian versus Frequentism

                          Bayesian                              Frequentist
Basis of method           Bayes Theorem: posterior              Uses pdf for data,
                          probability distribution              for fixed parameters
Meaning of probability    Degree of belief                      Frequentist definition
Prob of parameters?       Yes                                   Anathema
Needs prior?              Yes                                   No
Choice of interval?       Yes                                   Yes (except F+C)
Data considered           Only data you have                    ... + other possible data
Likelihood principle?     Yes                                   No
14 Bayesian versus Frequentism

                          Bayesian                              Frequentist
Ensemble of experiments   No                                    Yes (but often not explicit)
Final statement           Posterior probability distribution    Parameter values for which
                                                                the data are likely
Unphysical/empty ranges   Excluded by prior                     Can occur
Systematics               Integrate over prior                  Extend dimensionality of
                                                                frequentist construction
Coverage                  Unimportant                           Built-in
Decision making           Yes (uses cost function)              Not useful
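The contrast in the tables above can be made concrete with a toy Poisson example: a sketch, assuming a flat prior for the Bayesian case and the standard central (Garwood) construction for the frequentist case, with an arbitrary observed count.

```python
import numpy as np
from scipy.stats import gamma, chi2

n = 5          # observed Poisson count (assumed example value)
cl = 0.68

# Bayesian: a flat prior on mu gives a Gamma(n+1, 1) posterior; central credible interval
bayes = gamma.ppf([(1 - cl) / 2, (1 + cl) / 2], n + 1)

# Frequentist: central (Garwood) confidence interval for a Poisson mean
freq = (0.5 * chi2.ppf((1 - cl) / 2, 2 * n),
        0.5 * chi2.ppf((1 + cl) / 2, 2 * n + 2))

print("Bayesian credible interval  :", bayes)
print("Frequentist central interval:", freq)
```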
15 Bayesianism versus Frequentism “Bayesians address the question everyone is interested in, by using assumptions no-one believes” “Frequentists use impeccable logic to deal with an issue of no interest to anyone”
16 Differences
Bayes or Frequentism?
Background
Specific Astrophysics issues:
  Time dependence
  Spatial structures
  Correlations
  Non-parametric methods
  Visualisation
  Cosmic variance
Blind analyses
17 What Astrophysicists do well
Glorious pictures
Sharing data: making data publicly available
Dealing with large data sets
Visualisation
Funding for Astrostatistics
Statistical software
18 [Figures: the Whirlpool Galaxy; the width of the Z0 → 3 light neutrinos]
19 What Astrophysicists do well
Glorious pictures
Sharing data: making data publicly available
Dealing with large data sets
Visualisation
Funding for Astrostatistics
Statistical software
20 What Particle Physicists now know
Δ(ln L) = 0.5 rule
Unbinned L_max and goodness of fit
Prob(data | hypothesis) ≠ Prob(hypothesis | data)
Comparing 2 hypotheses: Δ(χ²) ≠ χ²
Bounded parameters: Feldman and Cousins
Use the correct L (Punzi effect)
Blind analyses
21 Δ(ln L) = −1/2 rule
If L(μ) is Gaussian, the following definitions of σ are equivalent:
1) RMS of L(μ)
2) 1/√(−d² ln L / dμ²)
3) ln L(μ0 ± σ) = ln L(μ0) − 1/2
If L(μ) is non-Gaussian, these are no longer the same.
"Procedure 3) above still gives an interval that contains the true value of the parameter μ with 68% probability"
Heinrich: CDF note 6438 (see the CDF Statistics Committee web page)
Barlow: PHYSTAT05
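A minimal numerical illustration of procedure 3): scan ln L for toy exponential data and keep the parameter values within 0.5 of the maximum. All numbers (sample size, true lifetime, scan range) are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.exponential(scale=2.0, size=200)        # toy lifetimes (assumed)

def lnL(tau):
    return -len(t) * np.log(tau) - np.sum(t) / tau

taus = np.linspace(1.0, 4.0, 2001)
logl = np.array([lnL(x) for x in taus])
best = taus[np.argmax(logl)]
# interval where ln L lies within 0.5 of its maximum (the Delta ln L = 1/2 rule)
inside = taus[logl >= logl.max() - 0.5]
print(f"tau_hat = {best:.3f}, 68% interval ~ [{inside.min():.3f}, {inside.max():.3f}]")
```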
22 COVERAGE
How often does the quoted range for a parameter include the parameter's true value?
N.B. Coverage is a property of the METHOD, not of a particular experimental result.
Coverage can vary with μ.
Study the coverage of different methods for a Poisson parameter μ, from the observation of the number of events n.
Hope for: coverage equal to the nominal value for all μ.
23 COVERAGE
If P = α for all μ: "correct coverage"
If P < α for some μ: "undercoverage" (this is serious!)
If P > α for some μ: "overcoverage", i.e. conservative, with a loss of rejection power
24 Coverage: likelihood approach (NOT frequentist)
P(n; μ) = e^(−μ) μ^n / n!   (Joel Heinrich, CDF note 6438)
Accept μ if −2 ln λ < 1, where λ = P(n; μ) / P(n; μ_best).
This UNDERCOVERS.
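The undercoverage can be checked directly: a sketch that, for an assumed set of μ values, sums the Poisson probabilities of all n accepted by −2 ln λ < 1 and compares the result with the nominal 68.3%.

```python
import numpy as np
from scipy.stats import poisson

def neg2lnlam(n, mu):
    # -2 ln lambda for a Poisson observation, with mu_best = n
    return 2.0 * ((mu - n) + (n * np.log(n / mu) if n > 0 else 0.0))

def coverage(mu, nmax=200):
    ns = np.arange(nmax)
    accept = np.array([neg2lnlam(n, mu) < 1.0 for n in ns])
    return poisson.pmf(ns[accept], mu).sum()

for mu in (0.6, 1.0, 3.0, 10.0):
    print(f"mu = {mu:5.1f}   coverage = {coverage(mu):.3f}")   # typically below the nominal 0.683
```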
25 Frequentist central intervals, NEVER undercover (Conservative at both ends)
26 Coverage: χ² approach
χ² = (n − μ)² / μ, with Δχ² = 0.1: 24.8% coverage?
NOT frequentist: the coverage varies between 0% and 100%.
27 Unbinned L_max and goodness of fit?
Find the parameters by maximising L.
So a larger L is better than a smaller L.
So L_max gives goodness of fit??
[Figure: Monte Carlo distribution of the unbinned L_max (frequency vs L_max), with regions labelled Great? / Good? / Bad]
28 Unbinned L_max and goodness of fit?
Not necessarily: in the pdf p(data; params) the parameters are fixed and the data vary, whereas in the likelihood L(params; data) the data are fixed and the parameters vary.
e.g. p(t|λ) = λ e^(−λt): as a pdf in t its maximum is at t = 0; as a likelihood in λ its maximum is at λ = 1/t.
29 L_max and goodness of fit? Example 1
Fit an exponential to times t_1, t_2, t_3, ....... [Joel Heinrich, CDF 5639]
L = Π (1/τ) exp(−t_i/τ), so ln L_max = −N(1 + ln t_av)
i.e. it depends only on the AVERAGE t, but is INDEPENDENT OF THE DISTRIBUTION OF t (except for ........).
(The average t is a sufficient statistic.)
The variation of L_max in Monte Carlo is due to variations in the samples' average t, but NOT TO A BETTER OR WORSE FIT.
[Figure: two different pdfs with the same average t give the same L_max]
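A quick way to see this: the sketch below (toy numbers assumed) evaluates ln L_max = −N(1 + ln t_av) for a genuine exponential sample and for an absurd sample in which every time equals the mean; the two values coincide.

```python
import numpy as np

def lnL_max(t):
    # maximised exponential log-likelihood: ln L_max = -N (1 + ln t_av)
    return -t.size * (1.0 + np.log(t.mean()))

good = np.random.default_rng(3).exponential(2.0, 1000)    # genuinely exponential sample
bad = np.full(1000, good.mean())                          # absurd sample: every t equal to the mean
print(lnL_max(good), lnL_max(bad))                        # identical, despite the terrible "fit"
```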
30 L_max and goodness of fit? Example 2
pdf ∝ 1 + α cos²θ, so L(α) = Π (1 + α cos²θ_i) / (2 + 2α/3)
The pdf (and the likelihood) depend only on cos²θ_i, and are insensitive to the sign of cosθ_i.
So the data can be in very bad agreement with the expected distribution (e.g. all the data having cosθ < 0), and L_max does not know about it.
This is an example of a general principle.
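A small sketch of the same point, with an assumed α and toy cos θ values: flipping every event to cos θ < 0 leaves ln L unchanged.

```python
import numpy as np

alpha = 0.5
rng = np.random.default_rng(4)
c = rng.uniform(-1, 1, 5000)              # toy cos(theta) values (assumed flat for simplicity)

def lnL(cos_t, a):
    # pdf proportional to 1 + a cos^2(theta), normalised over cos(theta) in [-1, 1]
    return np.sum(np.log((1 + a * cos_t**2) / (2 + 2 * a / 3)))

print(lnL(c, alpha), lnL(-np.abs(c), alpha))   # identical: L never sees the sign of cos(theta)
```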
31 L_max and goodness of fit? Example 3
Fit a Gaussian with variable μ and fixed σ:
ln L_max = N(−0.5 ln 2π − ln σ) − 0.5 Σ(x_i − x_av)² / σ²
The first term is a constant and the sum is ~ variance(x), i.e. L_max depends only on variance(x), which is not relevant for fitting μ (μ_est = x_av).
A smaller-than-expected variance(x) gives a larger L_max.
[Figure: two data sets in x: worse fit, larger L_max; better fit, lower L_max]
32 Transformation properties of pdf and L
Lifetime example: dn/dt = λ e^(−λt).
Change observable from t to y = √t:
dn/dy = (dn/dt)(dt/dy) = 2yλ exp(−λy²)
So (a) the pdf CHANGES, but (b) ∫ (dn/dt) dt over [t1, t2] = ∫ (dn/dy) dy over [√t1, √t2],
i.e. corresponding integrals of the pdf are INVARIANT.
33 Now for the Likelihood
When the parameter changes from λ to τ = 1/λ:
(a') L does not change: dn/dt = (1/τ) exp(−t/τ), and so L(τ; t) = L(λ = 1/τ; t), because identical numbers occur in the evaluations of the two L's.
BUT (b') ∫ L(λ) dλ over a range of λ does NOT equal ∫ L(τ) dτ over the corresponding range of τ.
So it is NOT meaningful to integrate L. (However, .........)
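Both statements, (b) for the pdf and (b') for the likelihood, can be verified numerically; the sketch below uses assumed values of λ, the integration ranges and the single observed time.

```python
import numpy as np
from scipy.integrate import quad

lam, t1, t2 = 0.7, 0.5, 2.0

# (a)/(b): the pdf changes form under t -> y = sqrt(t), but corresponding integrals agree
p_t = quad(lambda t: lam * np.exp(-lam * t), t1, t2)[0]
p_y = quad(lambda y: 2 * y * lam * np.exp(-lam * y**2), np.sqrt(t1), np.sqrt(t2))[0]
print(p_t, p_y)                       # equal

# (b'): integrals of L over corresponding parameter ranges do NOT agree
t_obs = 1.3
L_lam = quad(lambda l: l * np.exp(-l * t_obs), 0.5, 1.0)[0]          # lambda in [0.5, 1.0]
L_tau = quad(lambda tau: np.exp(-t_obs / tau) / tau, 1.0, 2.0)[0]    # tau = 1/lambda in [1.0, 2.0]
print(L_lam, L_tau)                   # different: integrating L is metric dependent
```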
34 CONCLUSION: integrating L is NOT a recognised statistical procedure.
[Metric dependent: the τ range agrees with τ_pred, while the λ range is inconsistent with 1/τ_pred]
BUT
1) Could regard it as a "black box"
2) Make it respectable by converting L into a Bayes posterior: Posterior(λ) ~ L(λ) * Prior(λ) [and Prior(λ) can be constant]
35

                       pdf(t; λ)                                L(λ; t)
Value of function      Changes when the observable              INVARIANT wrt transformation
                       is transformed                           of the parameter
Integral of function   INVARIANT wrt transformation             Changes when the parameter
                       of the observable                        is transformed
Conclusion             Max prob density not very sensible       Integrating L not very sensible
36 P(Data; Theory) ≠ P(Theory; Data)
Theory = male or female
Data = pregnant or not pregnant
P(pregnant; female) ~ 3%
37 P(Data; Theory) ≠ P(Theory; Data)
Theory = male or female
Data = pregnant or not pregnant
P(pregnant; female) ~ 3%
but P(female; pregnant) >>> 3%
38 P(Data; Theory) ≠ P(Theory; Data)
HIGGS SEARCH at CERN: is the data consistent with the Standard Model, or with the Standard Model + Higgs?
End of Sept 2000: the data were not very consistent with the S.M.
Prob(Data; S.M.) < 1%, which is a valid frequentist statement.
Turned by the press into: Prob(S.M.; Data) < 1%, and therefore Prob(Higgs; Data) > 99%, i.e. "It is almost certain that the Higgs has been seen".
39 p-value ≠ Prob of hypothesis being correct
Given data and H0 = null hypothesis, construct a statistic T (e.g. χ²).
p-value = probability{T ≥ t_observed}, assuming H0 is true (so 0 ≤ p ≤ 1).
If p = 10^−3, what is the prob that H0 is true?
e.g. try to identify a μ in a beam (H0: particle = μ) with π contamination. Prob(H0) depends on
a) the similarity of the μ and π masses
b) the relative populations of μ and π:
   If N(π) ~ N(μ), prob(H0) ~ 0.5
   If N(π) << N(μ), prob(H0) ~ 1
   If N(π) ~ 10 N(μ), prob(H0) ~ 0.1
i.e. prob(H0) varies with the p-value, but is not equal to it.
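A toy version of the μ/π example, with invented masses, resolution and measured value: the same observation gives very different values of prob(H0) as the relative populations change.

```python
from scipy.stats import norm

# Toy particle-ID setup (all numbers assumed): mass resolution 10 MeV,
# true masses m_mu = 106 MeV and m_pi = 140 MeV, candidate measured at 110 MeV.
m_mu, m_pi, sigma, m_obs = 106.0, 140.0, 10.0, 110.0

L_mu = norm.pdf(m_obs, m_mu, sigma)
L_pi = norm.pdf(m_obs, m_pi, sigma)

for n_pi_over_n_mu in (1.0, 10.0, 0.1):
    prior_mu = 1.0 / (1.0 + n_pi_over_n_mu)
    post_mu = L_mu * prior_mu / (L_mu * prior_mu + L_pi * (1 - prior_mu))
    print(f"N(pi)/N(mu) = {n_pi_over_n_mu:5.1f}  ->  prob(H0: particle is a mu) = {post_mu:.2f}")
```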
40 p-value ≠ Prob of hypothesis being correct
After-dinner speech at a conference banquet: "Of those results that have been quoted as significant at the 99% level, about half have turned out to be wrong!"
Supposed to be funny, but in fact it is perfectly OK.
41 PARADOX
Histogram with 100 bins, fit with 1 parameter.
S_min: χ² with NDF = 99 (expected χ² = 99 ± 14).
For our data, S_min(p0) = 90.
Is p1 acceptable if S(p1) = 115?
1) YES: a very acceptable χ² probability.
2) NO: σ_p from S(p0 + σ_p) = S_min + 1 = 91, but S(p1) − S(p0) = 25, so p1 is 5σ away from the best value.
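The paradox is easy to reproduce numerically; a two-line check with the numbers quoted above:

```python
from scipy.stats import chi2

ndf, s_min, s_p1 = 99, 90.0, 115.0
print("chi2 probability of S(p1) :", chi2.sf(s_p1, ndf))              # ~0.13, an acceptable fit
print("distance from best value  :", (s_p1 - s_min) ** 0.5, "sigma")  # 5 sigma, so p1 is excluded
```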
46 Comparing data with different hypotheses
47 χ² with ν degrees of freedom?
1) ν = (number of data points) − (number of free parameters)?
This holds only asymptotically (apart from the Poisson → Gaussian approximation). Why?
a) Fit a flattish histogram with y = N {1 + 10^−6 cos(x − x0)}, x0 = free parameter
b) Neutrino oscillations: almost degenerate parameters
   y ~ 1 − A sin²(1.27 Δm² L/E)   (2 parameters)
   → 1 − A (1.27 Δm² L/E)²   (1 parameter) for small Δm²
48 χ² with ν degrees of freedom?
2) Is the difference in χ² distributed as χ²?
H0 is true. Also fit with H1, which has k extra parameters.
e.g. look for a Gaussian peak on top of a smooth background: y = C(x) + A exp{−0.5 ((x − x0)/σ)²}
Is χ²(H0) − χ²(H1) distributed as χ² with ν = k = 3?
Relevant for assessing whether an enhancement in the data is just a statistical fluctuation, or something more interesting.
N.B. Under H0 (y = C(x)): A = 0 (boundary of the physical region), and x0 and σ are undefined.
49 Is the difference in χ² distributed as χ²?
Demortier: H0 = quadratic background; H1 = ......... + Gaussian of fixed width
Protassov, van Dyk, Connors, ....: H0 = continuum
  (a) H1 = narrow emission line
  (b) H1 = wider emission line
  (c) H1 = absorption line
Nominal significance level = 5%
50 Is the difference in χ² distributed as χ²?
So the Δχ² distribution needs to be determined by Monte Carlo. N.B.
1) Determining Δχ² for hypothesis H1 when the data are generated according to H0 is not trivial, because there will be lots of local minima.
2) If we are interested in a 5σ significance level, this needs lots of MC simulations.
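A toy version of such a Monte Carlo (all settings assumed: a flat Poisson background, a fixed-width Gaussian scanned over the peak position, crude per-bin weights, and only 500 toys, so it probes modest significances rather than 5σ). It lets one compare the empirical Δχ² tail with the χ²(1) and χ²(3) expectations.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
edges = np.linspace(0, 100, 101)
x = 0.5 * (edges[:-1] + edges[1:])
mu_bgd, width, n_toys = 20.0, 3.0, 500     # assumed toy settings

def delta_chi2(n):
    w = 1.0 / np.maximum(n, 1.0)                      # per-bin weights (crude)
    # H0: flat background, best constant by weighted least squares
    c0 = np.sum(w * n) / np.sum(w)
    chi2_h0 = np.sum(w * (n - c0) ** 2)
    # H1: flat background + Gaussian peak of fixed width, scanning the peak position
    best = chi2_h0
    for x0 in np.linspace(5, 95, 46):
        g = np.exp(-0.5 * ((x - x0) / width) ** 2)
        A = np.vstack([np.ones_like(x), g]).T * np.sqrt(w)[:, None]
        coef, *_ = np.linalg.lstsq(A, n * np.sqrt(w), rcond=None)
        model = coef[0] + coef[1] * g
        best = min(best, np.sum(w * (n - model) ** 2))
    return chi2_h0 - best

d = np.array([delta_chi2(rng.poisson(mu_bgd, x.size)) for _ in range(n_toys)])
for cut in (2.7, 6.0, 9.0):
    print(f"P(Delta chi2 > {cut}): MC = {np.mean(d > cut):.3f}, "
          f"chi2(1) = {chi2.sf(cut, 1):.3f}, chi2(3) = {chi2.sf(cut, 3):.3f}")
```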
56 [Figure: x_obs = −2 now gives an upper limit]
59 Getting L wrong: the Punzi effect
Giovanni Punzi @ PHYSTAT2003: "Comments on L fits with variable resolution"
Separate two close signals when the resolution σ varies event by event, and is different for the 2 signals. e.g.
1) Signal 1 ~ 1 + cos²θ, Signal 2 isotropic; different parts of the detector give different σ
2) Mass M (or lifetime τ): different numbers of tracks give different σ_M (or σ_τ)
60 Punzi effect
Events are characterised by x_i and σ_i:
A events centred on x = 0, B events centred on x = 1.
L(f)_wrong = Π [f G(x_i, 0, σ_i) + (1 − f) G(x_i, 1, σ_i)]
L(f)_right = Π [f p(x_i, σ_i; A) + (1 − f) p(x_i, σ_i; B)]
Using p(S, T) = p(S|T) p(T):
p(x_i, σ_i | A) = p(x_i | σ_i, A) p(σ_i | A) = G(x_i, 0, σ_i) p(σ_i | A)
So L(f)_right = Π [f G(x_i, 0, σ_i) p(σ_i|A) + (1 − f) G(x_i, 1, σ_i) p(σ_i|B)]
If p(σ|A) = p(σ|B), then L_right = L_wrong, but NOT otherwise.
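A toy reproduction in the spirit of Punzi's Monte Carlo (the σ_A = 1, σ_B = 2 case, with the σ distributions taken as delta functions; sample size and seed assumed): L_wrong returns a strongly biased f_A, while L_right recovers the true value.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(6)
n, f_true = 20000, 1 / 3
is_a = rng.random(n) < f_true
x = np.where(is_a, rng.normal(0.0, 1.0, n), rng.normal(1.0, 2.0, n))
sig = np.where(is_a, 1.0, 2.0)            # per-event resolution: 1 for A, 2 for B

def nll(f, use_sigma_pdf):
    pa = norm.pdf(x, 0.0, sig)
    pb = norm.pdf(x, 1.0, sig)
    if use_sigma_pdf:
        # "right" L: include p(sigma|A) and p(sigma|B); here they are delta functions at 1 and 2
        pa = pa * (sig == 1.0)
        pb = pb * (sig == 2.0)
    return -np.sum(np.log(f * pa + (1 - f) * pb + 1e-300))

for label, flag in (("L_wrong", False), ("L_right", True)):
    fit = minimize_scalar(lambda f: nll(f, flag), bounds=(1e-3, 1 - 1e-3), method="bounded")
    print(f"{label}: f_A = {fit.x:.3f}  (true {f_true:.3f})")
```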
61 Punzi's Monte Carlo, for A: G(x, 0, σ_A), B: G(x, 1, σ_B), f_A = 1/3

                        L_wrong                 L_right
σ_A       σ_B           f_A         σ_f         f_A         σ_f
1.0       1.0           0.336(3)    0.08        Same as L_wrong
1.0       1.1           0.374(4)    0.08        0.333(0)    0
1.0       2.0           0.645(6)    0.12        0.333(0)    0
1 or 2    1.5 or 3      0.514(7)    0.14        0.335(2)    0.03
1.0       1 or 2        0.482(9)    0.09        0.333(0)    0

1) L_wrong is OK when p(σ|A) = p(σ|B), but otherwise it is BIASSED
2) L_right is unbiassed, but L_wrong is biassed (enormously)!
3) L_right gives a smaller σ_f than L_wrong
62 Explanation of the Punzi bias (σ_A = 1, σ_B = 2)
[Figure: actual distributions of the A events (σ = 1) and B events (σ = 2) in x, compared with the fitting function; N_A/N_B variable, but same for A and B events]
The fit gives an upward bias for N_A/N_B because (i) that is much better for the A events, and (ii) it does not hurt too much for the B events.
63 Another scenario for the Punzi problem: particle identification (PID)
Separate A = π from B = K via the mass M measured from time of flight (TOF).
Originally: positions of the peaks = constant; σ_i variable, with (σ_i)_A ≠ (σ_i)_B.
Here: the K-peak approaches the π-peak at large momentum; σ_i ~ constant, but p_K ≠ p_π.
COMMON FEATURE: Separation / Error ≠ constant
Where else??
MORAL: Beware of event-by-event variables whose pdf's do not appear in L.
64 Avoiding the Punzi bias
Include p(σ|A) and p(σ|B) in the fit (but then, for example, particle identification may be determined more by the momentum distribution than by the PID), OR
fit each range of σ_i separately and add up: (N_A)_i → (N_A)_total, and similarly for B.
The incorrect method using L_wrong uses a weighted average of the (f_A)_j, assumed to be independent of j.
Talk by Catastini at PHYSTAT05
65 BLIND ANALYSES
Why blind analysis?
Methods of blinding:
  Study the procedure with simulation only
  Add a random number to the result *
  Look at only a first fraction of the data
  Keep the signal box closed
  Keep the MC parameters hidden
  Keep the fraction visible for each bin hidden
After the analysis is unblinded, ........
* Luis Alvarez suggestion re the "discovery" of free quarks
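A sketch of the second method in the list above, the hidden-offset approach; the seed, offset scale and toy result are all invented for illustration.

```python
import numpy as np

# Hidden-offset blinding ("add a random number to the result"): the analyser sees only
# the blinded value until the procedure is frozen.
secret = np.random.default_rng(20060615)          # seed kept by someone else in practice
offset = secret.normal(0.0, 5.0)                  # hidden offset, scale chosen >> expected effect

measured_lifetime = 1.537                          # toy result of the analysis (assumed)
blinded = measured_lifetime + offset
print("blinded value seen during the analysis:", blinded)
# only after all cuts and systematics are fixed:
print("unblinded result:", blinded - offset)
```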
66 Conclusions
Common problems, with scope for learning from each other:
  Large data sets
  Separating signal
  Testing models
  Signal / Discovery
Targetted workshops, e.g. SAMSI: Jan-May 2006; Banff: July 2006 (limits with nuisance params; significance tests; signal-background separation)
Summer schools: Spain, FNAL, ......
Thanks to Steffen Lauritzen, Lance Miller, Subir Sarkar, Roberto Trotta, .........
67 Excellent Conference: Very big THANK YOU to Jogesh and Eric !