Slide 1: Some comments on "What (should) sensitivity estimates mean?"
Jan Conrad, Royal Institute of Technology (KTH), Stockholm
NuFACT 06, 25 August 2006
Slide 2: Outline
- Definitions of "sensitivity"
- Confidence intervals / p-values with systematic uncertainties
  - Averaging
  - Profiling
- An illustration
- Remarks on the ensemble
- Summary / recommendations

The aim of this talk is to confuse as much as necessary, but as little as possible.
Slide 3: Definition of "sensitivity" - I
- 1 (well-known HEP statistics expert from Oxford): the median upper limit obtained from repeated experiments with no signal; in a two-dimensional problem, keep one parameter fixed.
- 2 (fairly well-known HEP statistics expert from Germany): the mean result of whatever quantity we want to measure, for example 90% confidence intervals, the mean being taken over identical replicas of the experiment.
- 3 (less well-known HEP statistics expert from Italy): "Look at that paper I wrote in arXiv:physics. Nobody has used it, but it is the best definition..."
Slide 4: Definition of sensitivity - II
- Definition using p-values (hypothesis test): the experiment is said to be sensitive to a given value of the parameter Θ13 = Θ13^sens at significance level α if the mean p-value obtained given Θ13^sens is smaller than α.
- The p-value is (by definition) calculated under the null hypothesis Θ13 = 0; the test statistic T could for example be a χ². The p-value is the probability, under the null hypothesis, of obtaining a value of T at least as extreme as the actually observed value of the test statistic.
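As a concrete illustration (not from the talk; the null distribution and numbers here are hypothetical), a p-value can be computed as the fraction of null-hypothesis toy experiments whose test statistic exceeds the observed value:

```python
import random

def p_value(t_obs, simulate_null, n_toys=10000, seed=1):
    """Fraction of null-hypothesis toys with T >= t_obs."""
    rng = random.Random(seed)
    exceed = sum(simulate_null(rng) >= t_obs for _ in range(n_toys))
    return exceed / n_toys

# Hypothetical null: T distributed as chi2 with 1 dof,
# generated as the square of a standard normal variate.
def null_t(rng):
    return rng.gauss(0.0, 1.0) ** 2

p = p_value(2.71, null_t)  # close to 0.10 for chi2(1) at T = 2.71
```

The same function works with any toy generator for T, which is exactly what the "don't think, compute" recommendation at the end of the talk amounts to.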
Slide 5: Definition of sensitivity - III (what NuFact people most often use?)
- Definition using confidence intervals (CIs) [1]: the experiment is said to be sensitive to a given value of the parameter Θ13 = Θ13^sens at significance level α if the mean [2] 1-α CI obtained, given Θ13^sens, does not contain Θ13 = 0.

[1] This means using confidence intervals for hypothesis testing. I think I convinced myself that the approaches are equivalent, but...
[2] Some people prefer the median (because the median is invariant under parameter transformations).
Slide 6: So what?
Once we have decided on the definition of sensitivity, two problems need to be addressed:
- What method should be used to calculate the CI or the p-value?
- Since the experiment does not exist yet, what is the ensemble of experiments we use to calculate the mean (or other quantities)?
Slide 7: P-values and the Neyman-Pearson lemma
- By the Neyman-Pearson lemma, the likelihood ratio is the uniformly most powerful test statistic.
- To calculate p-values, we need to know the null distribution of T. It therefore comes in handy that, asymptotically, -2 ln of the likelihood ratio follows a χ² distribution.
Slide 8: Example: practical calculation using p-values
- Simulate an observation where Θ13 > 0. Fit a model with Θ13 = 0 and a model with Θ13 > 0; the difference δχ² between the two fits is (under certain circumstances) χ²-distributed.
- For problems with this approach, see Luc Demortier: "P-Values: What They Are and How to Use Them", draft report presented at the BIRS workshop on statistical inference problems in high energy physics and astronomy, July 15-20, 2006.
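In the simplest one-bin Gaussian case the δχ² recipe reduces to a few lines (the numbers below are hypothetical, purely for illustration), and its square root is the Gaussian significance:

```python
sigma = 10.0  # assumed Gaussian uncertainty on the counted events (hypothetical)
n = 25.0      # hypothetical observed excess over the Theta13 = 0 prediction

chi2_null = (n / sigma) ** 2  # fit with Theta13 = 0: predicted excess is 0
chi2_min = 0.0                # the free fit can match n exactly in one bin
delta_chi2 = chi2_null - chi2_min
# Under H0 (and Wilks' regularity conditions) delta_chi2 follows a chi2(1)
# distribution, so the Gaussian significance is its square root:
significance = delta_chi2 ** 0.5  # 2.5 for these numbers
```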
Slide 9: Some methods for p-value calculation
- Conditioning
- Prior-predictive
- Posterior-predictive
- Plug-in
- Likelihood ratio
- Confidence interval
- Generalized frequentist

I will not talk about these any more.
Slide 10: Some methods for confidence interval calculation (the Banff list)
- Bayesian
- Feldman & Cousins with Bayesian treatment of nuisance parameters
- Profile likelihood (I will talk a little bit about this one)
- Modified likelihood
- Feldman & Cousins with profile likelihood
- Fully frequentist
- Empirical Bayes
Slide 11: Properties I: Coverage
- A method is said to have coverage (1-α) if, in infinitely many repeated experiments, the resulting CIs include (cover) the true value in a fraction (1-α) of all cases, irrespective of what the true value is.
[Figure: coverage 1-α as a function of the true s, illustrating over-coverage and under-coverage around the nominal 0.9.]
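Coverage is easy to check by brute force. A minimal sketch (unit-Gaussian measurement, hypothetical true value) that estimates the coverage of a 90% central interval from repeated pseudo-experiments:

```python
import random

def coverage(mu_true, n_exp=20000, z=1.645, seed=3):
    """Fraction of pseudo-experiments whose 90% central interval covers mu_true."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_exp):
        x = rng.gauss(mu_true, 1.0)    # unit-Gaussian measurement of mu_true
        if x - z <= mu_true <= x + z:  # central interval [x - z, x + z]
            covered += 1
    return covered / n_exp

cov = coverage(5.0)  # comes out close to the nominal 0.90
```

Scanning `mu_true` over a grid reproduces the kind of coverage plot sketched on this slide.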
Slide 12: Properties II: Type I error, type II error and power
- Type I error: reject H0 although it is true. Prob(type I error) = α (corresponds to coverage for hypothesis tests).
- Type II error: accept H0 although it is false. Power: β = 1 - Prob(type II error).
- Given H1, what is the probability that we will reject H0 at a given significance α?
Slide 13: Nuisance parameters
- Nuisance parameters are parameters which enter the data model but which are not of prime interest. Example: the background.
- You don't want to quote CIs (or p-values) that depend on nuisance parameters, so you need a way to get rid of them.
Slide 14: How to treat nuisance parameters?
- There is a wealth of approaches to dealing with nuisance parameters. Two are particularly common:
- Averaging (Bayesian). No time to discuss this; see:
  - J.C. et al., Phys. Rev. D67:012002, 2003
  - J.C. & F. Tegenfeldt, Proceedings PhyStat 05, physics/0511055
  - F. Tegenfeldt & J.C., Nucl. Instr. Meth. A539:407-413, 2005
- Profiling. The example I will present here: profile likelihood / MINUIT (which is similar to what many of you have been doing).
Slide 15: Profile likelihood intervals
- Given the measurements (n_meas, b_meas), the profile likelihood ratio for s is formed by replacing the nuisance parameter b by its MLE given s in the numerator, and by the joint MLE of b and s given the observations in the denominator.
- To extract limits: the lower and upper limit are the values of s at which -2 ln of the profile likelihood ratio increases by 2.706 above its minimum (for a 90% CL interval).
Slide 16: From the MINUIT manual
- See F. James, MINUIT Reference Manual, CERN Program Library Long Writeup D506, p. 5: "The MINOS error for a given parameter is defined as the change in the value of the parameter that causes the F' to increase by the amount UP, where F' is the minimum w.r.t. all other free parameters."
- This is exactly a profile likelihood confidence interval: Δχ² = 2.71 (90%), Δχ² = 1.07 (70%).
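A numerical sketch of this MINOS-style recipe for a single counting channel (all numbers hypothetical; simple grid scans stand in for MINUIT's minimiser): profile the background nuisance parameter out of a Poisson-times-Gaussian likelihood, then read off the 90% upper limit where -2 ln L rises by 2.706:

```python
import math

n_obs, b_meas, sigma_b = 10, 4.0, 1.0  # hypothetical counts and background estimate

def nll(s, b):
    """-2 ln L up to constants: Poisson(n_obs | s+b) x Gauss(b_meas | b, sigma_b)."""
    mu = s + b
    return 2.0 * (mu - n_obs * math.log(mu)) + ((b - b_meas) / sigma_b) ** 2

b_grid = [0.05 * i for i in range(1, 401)]  # nuisance scan 0.05 .. 20

def profiled(s):
    return min(nll(s, b) for b in b_grid)   # minimise w.r.t. the nuisance b

s_grid = [0.05 * i for i in range(0, 401)]  # signal scan 0 .. 20
prof = {s: profiled(s) for s in s_grid}
s_best = min(prof, key=prof.get)            # MLE: here s_best = n_obs - b_meas = 6
nll_min = prof[s_best]
# 90% CL upper limit: largest s whose profile -2 ln L is within 2.706 of the minimum
upper_limit = max(s for s in s_grid if prof[s] - nll_min <= 2.706)
```

For these inputs the limit comes out a bit above 12; MINUIT finds the same curve by numerical minimisation rather than a scan.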
Slide 17: Coverage of the profile likelihood
- Background: Poisson (uncertainty ~20%-40%); efficiency: binomial (uncertainty ~12%).
- W. Rolke, A. Lopez, J.C., Nucl. Instr. Meth. A 551 (2005) 493-503.
[Figure: Monte Carlo coverage (1-α) as a function of the true s, for the Rolke et al. and MINUIT implementations.]
Slide 18: Confidence intervals for new particle searches at LHC?
- Basic idea: calculate the 5σ confidence interval and claim discovery if s = 0 is not included.
- Straw-man model: counts observed in the signal region and in a background sideband of relative size τ.
- Bayesian under-covers badly (one has to add 16 events to get the correct significance); the profile likelihood is the only method considered here which gives coverage (except for the full construction).
- K. S. Cranmer, Proceedings PhyStat 2005.
Slide 19: The profile likelihood and the χ²
- The most common method in neutrino physics seems to be minimizing a χ².
- Assume a Gaussian likelihood function; omitting terms not dependent on the parameters, -2 ln L is exactly the χ².
- A χ² fit is therefore equivalent to the profile likelihood if you minimize w.r.t. the nuisance parameters: exactly for Gaussian processes, asymptotically otherwise.
Slide 20: A simple example calculation
- Model generating the data: a Poisson count n with mean s + b, and a Gaussian background measurement b_meas with mean b and width σ_b.
- This means: in each experiment you measure n and b_meas, given s and b; σ_b is assumed to be known.
- In what follows I use the χ² to calculate a p-value (not a confidence interval).
Slide 21: Two approaches using the χ²
- Adding the uncertainty in quadrature, i.e. widening the χ² denominator by σ_b²... seems to be quite common...
- Allowing for a nuisance parameter (the background normalisation) and minimizing the χ² with respect to it. Similar to what is used in, for example, Burguet-Castell et al., Nucl. Phys. B725:306-326, 2005 (beta-beams at the CERN SPS).
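For purely Gaussian processes the two recipes agree exactly, which is easy to verify numerically (hypothetical one-bin numbers; the profiling is done here by a brute-force scan):

```python
n, b_meas = 120.0, 50.0   # hypothetical observed count and background estimate
sig_n, sig_b = 10.0, 5.0  # assumed Gaussian uncertainties

def chi2_quad(s):
    """Background uncertainty added in quadrature."""
    return (n - s - b_meas) ** 2 / (sig_n ** 2 + sig_b ** 2)

b_grid = [0.1 * i for i in range(0, 1001)]

def chi2_profiled(s):
    """Background normalisation as a nuisance parameter, minimised by scan."""
    return min((n - s - b) ** 2 / sig_n ** 2 + (b - b_meas) ** 2 / sig_b ** 2
               for b in b_grid)

# In the Gaussian case the two chi2 curves coincide (up to scan resolution):
diff = max(abs(chi2_quad(s) - chi2_profiled(s)) for s in (40.0, 70.0, 90.0))
```

With non-Gaussian ingredients (Poisson counts, as in the model of slide 20) the two approaches start to differ, which is what the coverage comparison on the next slides probes.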
Slide 22: Coverage (type I error)
- Nominal χ²: what you assume is the correct null distribution.
- Ignore / profile / quadrature addition etc.: the "real" null distributions of what you call a χ².
- Empirical: the "true" χ² distribution... to the extent you trust ROOT...
Slide 23: What if we have only Gaussian processes?
Slide 24: Which method is more sensitive to a signal? Power.
Slide 25: Power and sensitivity?
- In most cases I saw, an average result is presented. This tells you very little about the probability that a given signal will yield a significant observation (the power).
- A shot at "what should sensitivity mean?": an experiment is sensitive to a finite value Θ of a parameter if the probability of obtaining an observation n which rejects Θ = 0 with at least significance α is at least β.
Slide 26: What is the ensemble...
...of repeated experiments which
- I should use to calculate the "mean" (or the probability β) in the sensitivity calculation?
- I should use to calculate the coverage?
Slide 27: My answer...
- ...both ensembles should be the same...
- Each pseudo-experiment has fixed true values of the prime parameter and the nuisance parameters, and yields a prime measurement (e.g. the number of observed events) as well as one estimate for each nuisance parameter (e.g. the background) [1].

[1] This estimate might come from auxiliary measurements in the same or other detectors, or from theory. In the former case, care has to be taken that the measurement procedure is replicated as in the real experiment. In the case of theoretical uncertainties there is no real "measurement process"; I would argue that even theoretical uncertainties should be treated as if there were a true value and an estimate, which we pretend is a random variable. (Shape and size of the uncertainties known beforehand? Otherwise generalize...)
Slide 28: Update: "what should sensitivity mean?"
- An experiment is sensitive to a finite value Θ of a parameter if the probability of obtaining an observation n which rejects Θ = 0 with at least significance α is at least β.
- The probability is hereby evaluated using replicas of the experiment with fixed true parameter Θ and fixed nuisance parameters. The random variables in each experiment are thus the observation n and the estimates of the nuisance parameters.
- The significance of the observation n is hereby evaluated using replicas of the experiment with fixed true parameter Θ = 0 and fixed nuisance parameters (assuming a p-value procedure; otherwise by CI).
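This definition can be computed directly for a simple counting experiment (hypothetical numbers; Poisson p-values, no nuisance parameters): fix the rejection threshold from the Θ = 0 ensemble, then evaluate the power over the ensemble with the signal present:

```python
import math

def pois_sf(k, mu):
    """P(N >= k) for a Poisson mean mu."""
    cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

b = 3.0       # hypothetical expected background (the Theta = 0 ensemble)
alpha = 0.05  # required significance

# Smallest observation n whose background-only p-value is below alpha:
n_crit = next(n for n in range(100) if pois_sf(n, b) < alpha)

def power(s):
    """Probability to reject Theta = 0 at significance alpha, given signal s."""
    return pois_sf(n_crit, s + b)

# Sensitivity at beta = 0.5: smallest signal (on a coarse grid) with power >= 0.5
s_sens = next(0.1 * i for i in range(1, 500) if power(0.1 * i) >= 0.5)
```

With nuisance parameters in the model, the two analytic Poisson tails would each be replaced by a toy MC over the corresponding ensemble of pseudo-experiments, exactly as described on this slide.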
Slide 29: How do things look in the NuFact community?
- Unscientific study of 12 papers dealing with sensitivities to oscillation parameters:
- 0 papers seem to worry about the ensemble w.r.t. which the "mean" is calculated.
- 0 papers check the statistical validity of the χ² used.
- 3 papers treat systematics and write down explicitly which χ² is used (or give enough information to reconstruct it in principle).
- 6 papers ignore the systematics or don't say how they are included in the fit.
- 2 of the papers don't say how significances/CIs are calculated.
- 1 paper doesn't even tell me the significance level.
- No paper is doing what I would like best; ¼ of the papers are in my opinion acceptable with some goodwill; ¾ of the papers I would reject. (Binomial errors on these figures are neglected.)
Slide 30: Summary / recommendations
- More is more! Include systematics in your calculation (or discuss why you neglect them), rather than just neglecting them. Report under which assumptions the data are generated. Report the test statistic you are using explicitly.
- What does "mean" mean? I did not encounter any discussion of either the power of the sensitivity analysis or of the ensemble of experiments used for the "average".
Slide 31: Summary cont'd
- And the winner is... Most of the papers have been using a χ² fit. If you include nuisance parameters in it and minimize w.r.t. them, this is equivalent to a profile likelihood approach for strictly Gaussian processes, and asymptotically equivalent otherwise. This approach seems to provide coverage in many, even unexpected, cases.
- Don't think, compute... Given the computer power available (and since the stakes are high), I think that for sensitivity studies comparing different experimental configurations there is no reason to stick slavishly to the nominal χ² distribution instead of doing a toy MC to construct the distribution of the test statistic yourself. The thinking part is choosing the ensemble of experiments to simulate.
Slide 32: And a last one... for the customer
- Best is not necessarily best. The intuitive (and effective) result of including systematics (or doing a careful statistical analysis instead of a crude one) is to worsen the calculated sensitivity. If I were to spend XY M$ on an experiment, I would insist on understanding in detail how the sensitivity is calculated. Otherwise, if anything, I would give the XY M$ to the group with the worse sensitivity but the more adequate calculation.
Slide 33: List of relevant references
- G. Feldman & R. Cousins, Phys. Rev. D57:3873-3889, 1998. THE method for confidence interval calculation.
- J.C. et al., Phys. Rev. D67:012002, 2003. Combining FC with a Bayesian treatment of systematics.
- J.C. & F. Tegenfeldt, Proceedings PhyStat 05, Oxford, 2005, physics/0511055. Combined experiments; power calculations for CIs with Bayesian treatment of systematics.
- F. Tegenfeldt & J.C., Nucl. Instr. Meth. A539:407-413, 2005. Coverage of CIs.
- L. Demortier, presented at the BIRS workshop on statistical inference problems in high energy physics and astronomy, July 15-20, 2006. All you want to know about p-values but don't dare to ask.
- W. Rolke, A. Lopez & J.C., Nucl. Instr. Meth. A 551 (2005) 493-503. The profile likelihood and its coverage.
- K. S. Cranmer, Proceedings PhyStat 05. Significance calculation for the LHC.
- F. James, Computer Phys. Comm. 20 (1980) 29-35. The profile likelihood, without calling it that.
- G. Punzi, Proceedings of PhyStat 2003, SLAC, Stanford (2003). A definition of sensitivity including power.
- S. Baker & R. Cousins. Likelihood and χ² in fits to histograms.
- J. Burguet-Castell et al., Nucl. Phys. B725:306-326, 2005. Example of a rather reasonable sensitivity calculation in neutrino physics (random pick; there are certainly others, maybe even better ones).
- R. Barlow, ..., J.C. et al., "The Banff comparison of methods to calculate confidence intervals". Systematic comparison of confidence interval methods; to be published beginning of 2007.
Slide 34: Backups
Slide 35: What if we have 20% uncertainty?
Slide 36: Added uncertainty in efficiency.
Slide 37: Requirements for the χ²
- Gaussian distribution: N(s, s).
- Hypothesis linear in the parameters (so, for example, "χ²" = (n - s²)/s doesn't work).
- The functional form of the hypothesis is correct.