
1 G. Cowan Report from the Statistics Forum / CERN, 23 June 2011
Report from the Statistics Forum
ATLAS Week Physics Plenary, CERN, 23 June 2011
Glen Cowan, Eilam Gross, Kyle Cranmer*
* on behalf of the ATLAS Statistics Forum

2 Outline
● Recent issues concerning upper limits
● Update to recommendation for PCL
● New software for low background analyses
● Interactions with the CMS Statistics Group
● Provisional agreement for summer conferences
● Longer-term issues

3 Setting Limits
There are several methods one may use for setting limits:
● One-sided (frequentist), e.g., PCL, CLs
● Unified intervals (Feldman-Cousins)
● Bayesian
In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also reporting CLs limits to allow for comparison with CMS. This recommendation was adopted by Physics Coordination after the Statistics Workshop held on 15 April 2011, and will be revisited at the upcoming PC meeting on 27 June 2011.

4 PCL Quick Review (see arXiv:1105.3166)
Consider a parameter μ proportional to the rate of signal (μ ≥ 0).
“Naive” upper limits can exclude parameter values to which one has little or no sensitivity (for s << b, exclusion probability ~ 5%).
CLs solves this by effectively penalizing the test of each parameter value by an amount that varies continuously with the sensitivity; the result is a limit with coverage probability > 95%.
PCL addresses the problem by regarding μ as excluded if:
(a) It is excluded by a statistical test at 95% CL.
(b) One has sufficient sensitivity to μ.
Here sensitivity is measured by the power M_0(μ) of a test of μ with respect to the background-only alternative, i.e., require M_0(μ) ≥ M_min.
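Conditions (a) and (b) can be illustrated with a minimal sketch for the toy model x ~ Gauss(μ, σ) that appears later in the talk. The function names and default values are illustrative assumptions, not part of the ATLAS tools; in this model the power with respect to μ = 0 has the closed form Φ(μ/σ − z), with z the 95% quantile of the standard Gaussian.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, provides cdf and inv_cdf


def power_vs_background_only(mu, sigma=1.0, cl=0.95):
    """M_0(mu): probability to reject mu when the true value is mu = 0.

    Toy model (assumption): x ~ Gauss(mu, sigma); mu is rejected at
    confidence level cl when x < mu - sigma * z.
    """
    z = STD_NORMAL.inv_cdf(cl)
    return STD_NORMAL.cdf(mu / sigma - z)


def pcl_excluded(mu, x_obs, sigma=1.0, cl=0.95, m_min=0.16):
    """Apply conditions (a) and (b) to a single value of mu."""
    z = STD_NORMAL.inv_cdf(cl)
    rejected_by_test = x_obs < mu - sigma * z                       # (a)
    sensitive = power_vs_background_only(mu, sigma, cl) >= m_min    # (b)
    return rejected_by_test and sensitive


# Hypothetical observation that fluctuated low by two standard deviations
for mu in (0.3, 2.0):
    print(mu, power_vs_background_only(mu), pcl_excluded(mu, x_obs=-2.0))
```

At μ = 0 the power equals 1 − CL = 5%, which is the "naive" exclusion probability quoted above for parameter values with no sensitivity.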

5 PCL in practice
[Figure: PCL with M_min = 0.16. Labels: observed limit; median limit (unconstrained); ±1σ band of the limit distribution assuming μ = 0; region where the power is below threshold (do not exclude).]
Important to report both the constrained and unconstrained limits.

6 Choice of minimum power
The choice of M_min is a convention. Formally it should be large relative to 1 – CL (5%).
Earlier we proposed M_min = 0.16, because in the case of x ~ Gauss(μ,σ) this means that one applies the power constraint if the observed limit fluctuates down by one standard deviation or more. For the Gaussian example, this gives μ_min = 0.64σ, i.e., the lowest limit is similar to the intrinsic resolution of the measurement (σ).
We have recently revisited this point and now propose moving the minimum power to M_min = 0.5, i.e., PCL never goes below the median limit under the assumption of background only.
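The Gaussian numbers can be checked with a short sketch. It assumes the toy model x ~ Gauss(μ, σ) with power M_0(μ) = Φ(μ/σ − z), so that the smallest μ with M_0(μ) ≥ M_min is μ_min = σ (Φ⁻¹(M_min) + Φ⁻¹(CL)); the function name `mu_min` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mu_min(m_min, sigma=1.0, cl=0.95):
    """Smallest mu with power M_0(mu) >= m_min for x ~ Gauss(mu, sigma)."""
    return sigma * (STD_NORMAL.inv_cdf(m_min) + STD_NORMAL.inv_cdf(cl))


# M_min equal to the probability of a downward fluctuation of 1 sigma or more
m_one_sigma = STD_NORMAL.cdf(-1.0)   # about 0.16

for label, m in [("Phi(-1)", m_one_sigma), ("0.5", 0.5)]:
    print(f"M_min = {label}: mu_min = {mu_min(m):.2f} sigma")
```

With M_min = Φ(−1) this gives about 0.64σ, and with M_min = 0.5 it gives the median limit, about 1.64σ.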

7 Aggressive conservatism
It could be that, owing to practical constraints, certain systematic uncertainties are over-estimated in an analysis; this could be justified by wanting to be conservative.
The consequence is that the ±1σ bands of the unconstrained limit are broader than they otherwise would be.
If the unconstrained limit fluctuates low, the PCL limit, constrained at the −1σ band, could be lower than it would be had the systematics been estimated correctly. Being conservative could thus be more aggressive.
If the power constraint M_min is at 0.5, then inflating the systematics is expected to move the median of the unconstrained limit less, and in any case upwards, i.e., it will lead to a weaker limit (as one would expect from “conservatism”).

8 Upper limits for Gaussian problem
[Figure: upper limits for the Gaussian problem; axes: measurement, (unknown) true value.]
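Limits of the kinds discussed in this talk can be tabulated as a function of the measurement with a small sketch. It assumes x ~ Gauss(μ, σ) with μ ≥ 0 and a 95% CL; the closed form used for CLs follows from solving Φ((x − μ)/σ) / Φ(x/σ) = α for μ. Which curves the original figure actually showed is not recoverable from the transcript, so the selection here is an assumption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z = STD_NORMAL.inv_cdf(1.0 - ALPHA)


def phi(z):
    """Standard Gaussian CDF via erfc, accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def unconstrained_limit(x, sigma=1.0):
    return x + sigma * Z


def pcl_limit(x, sigma=1.0, m_min=0.5):
    """Unconstrained limit, but never below mu_min set by the power constraint."""
    mu_min = sigma * (STD_NORMAL.inv_cdf(m_min) + Z)
    return max(unconstrained_limit(x, sigma), mu_min)


def cls_limit(x, sigma=1.0):
    """Solve Phi((x - mu)/sigma) / Phi(x/sigma) = ALPHA for mu."""
    return x - sigma * STD_NORMAL.inv_cdf(ALPHA * phi(x / sigma))


print("    x  unconstr.  PCL(0.5)     CLs")
for x in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"{x:5.1f} {unconstrained_limit(x):9.2f} "
          f"{pcl_limit(x):9.2f} {cls_limit(x):8.2f}")
```

For measurements well above zero the three limits converge; for low measurements the unconstrained limit can become negative, while PCL stays at μ_min and CLs approaches zero smoothly.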

9 Coverage probability for Gaussian problem
[Figure: coverage probability for the Gaussian problem; axes: (unknown) true value, P(μ ≤ μ_up | μ).]
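The coverage probability P(μ ≤ μ_up | μ) can be computed directly for the toy model x ~ Gauss(μ, 1). The sketch below is an illustration under that assumption, not the code behind the figure: for PCL the result is 100% below μ_min and 95% above it, and for CLs the boundary of the covering region is found by bisection.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z = STD_NORMAL.inv_cdf(1.0 - ALPHA)


def phi(z):
    """Standard Gaussian CDF via erfc, accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def coverage_pcl(mu, m_min=0.5):
    """P(mu <= mu_up | mu) for the PCL limit, sigma = 1."""
    mu_min = STD_NORMAL.inv_cdf(m_min) + Z
    if mu <= mu_min:
        return 1.0          # the limit never goes below mu_min
    return 1.0 - ALPHA      # x + Z >= mu with probability 1 - alpha


def coverage_cls(mu, lo=-30.0, hi=30.0):
    """P(mu <= mu_up | mu) for the CLs limit, sigma = 1.

    mu is covered when CLs(mu) = Phi(x - mu) / Phi(x) >= ALPHA; the ratio
    grows with x, so the boundary x_c is found by bisection.
    """
    def cls(x):
        return phi(x - mu) / phi(x)

    if cls(lo) >= ALPHA:
        return 1.0          # boundary lies below lo; P(x < lo) is negligible
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cls(mid) < ALPHA:
            lo = mid
        else:
            hi = mid
    return 1.0 - phi(hi - mu)   # P(x >= x_c | mu)


for mu in (0.0, 0.5, 1.0, 2.0, 4.0, 6.0):
    print(f"mu = {mu:3.1f}: PCL {coverage_pcl(mu):.3f}, CLs {coverage_cls(mu):.3f}")
```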

10 PCL summary of recent developments
● Proposal to move minimum power from 16% to 50%. Power constraint applied at the median limit.
● Improvement of approximations used for low-count analyses.
● New code available (see Statistics Forum twiki): https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools. Substantial improvement in speed.
● Substantial progress on documentation, including background on method and implementation details (see twiki).

11 New frequentist limit document
https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf

12 New usage details on twiki
Example from twiki of how to determine whether asymptotic formulae are valid.
The new scripts implement the appropriate procedures for different regimes, e.g., asymptotic, b 10.
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation

13 Interactions with the CMS Statistics Group
Interaction between the ATLAS and CMS statistics groups began several years ago in the context of the Higgs combination; this effort continues in the separate LHC Higgs Combination Group.
In addition, meetings between the ATLAS and CMS Statistics Groups have increased this year, with the goal of agreeing on statistical tools and practice to facilitate comparison and eventual combination of results.
ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray
CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo

14 Discussion on Limits with CMS
Within CMS it has been recommended to use at least one of the three methods mentioned in the PDG Statistics Review:
● Bayesian
● CLs
● Feldman-Cousins
In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also reporting CLs limits to allow for comparison with CMS.
In recent meetings with CMS we have listed the mathematical properties of the various limits and on these we essentially agree. There is some disagreement on the importance that one should attach to different properties.

15 Properties of Frequentist Limits (1)
One-sided (PCL, CLs) versus unified (Feldman-Cousins)
● Exclude parameter values because the predicted rate is higher than the data, or because prediction ≠ data on other grounds (e.g., likelihood ratio with respect to a two-sided alternative).
Coverage
● Substantial over-coverage for CLs and for the upper edge of F-C.
● Exact for the full interval of F-C.
● Exact for PCL in the region of sensitivity; 100% otherwise.
Flip-flopping
● Violation of coverage if the decision to report a limit or a two-sided interval is based on the data.
● Not a problem for F-C; for one-sided limits it can be avoided if for every search one always reports the upper limit and the discovery significance (p-value of the background-only hypothesis, p_0).

16 Properties of Frequentist Limits (2)
Avoiding exclusion in cases with little/no sensitivity
● PCL: Discontinuous separation of (in)sensitive regions.
● CLs: Ratio of p-values → penalty against low sensitivity.
● F-C: Counts probability of upwards fluctuation for upper limit.
Power (related to median limit under background-only hypothesis)
● PCL: Most powerful for region with sensitivity; zero otherwise.
● CLs: Less powerful than PCL.
● F-C: Upper edge as limit is less powerful than PCL and CLs, but the full interval also has power relative to higher values of μ.
Correspondence with Bayesian result for some prior
● CLs yes; F-C yes (approx.); PCL no.

17 Properties of Frequentist Limits (3)
Negatively biased relevant subsets (NBRS)
● Related to the conditional coverage probability given that the outcome is observed in some identifiable subset of data space.
● PCL, CLs, F-C do not have NBRS. If also condition on m, all methods have (adapted) NBRS.
Familiarity in HEP community
● CLs widely used.
● F-C used for many problems but not often as a replacement for upper limits.
● PCL is new, but the core concepts are textbook statistics and the documentation is now greatly improved: arXiv:1105.3166 and info on method and implementation on https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools.

18 Areas where ATLAS and CMS agree
● Both collaborations support RooStats as the software tool for combinations. See, e.g., K. Cranmer talk at PHYSTAT 2011: https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=107747
● Within both collaborations there are many who support the Bayesian approach, especially for limits (see, e.g., talks by A. Harel and D. Casadei at PHYSTAT 2011). Recent effort in ATLAS to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).
● Within both ATLAS and CMS there exist different views on unfolding, with a strong tendency away from use of bin-by-bin factors (see, e.g., talks by G. Choudalakis and M. Weber from PHYSTAT 2011).

19 Discussions on Discovery with CMS
The two collaborations broadly agree on how to report the significance of a discovery. The test statistic recommended in ATLAS coincides with the Feldman-Cousins approach for testing the background-only model.
There is also support in both collaborations for an approximate correction for the Look-Elsewhere Effect (LEE) using the approach of Gross and Vitells (EPJC 70 (2010) 525, arXiv:1005.1891; arXiv:1105.4355). There is no controversy if analyses correct for the LEE exactly (e.g., floating-mass Higgs search), as long as the uncorrected (e.g., fixed-mass) discovery significance is also reported.
Both collaborations have made some progress in studying Bayesian Model Selection using Bayes Factors (ongoing).

20 Summary and conclusions (1)
PCL solves the problem of “spurious exclusion” by separating the parameter space into regions in which one has or has not sufficient sensitivity, as given by the probability to reject μ if the background-only model is true.
Recommendations for ATLAS:
● Report unconstrained limit.
● Report power-constrained limit (with power M_0(μ) ≥ 0.5).
● Report p-value of background-only hypothesis.
● Also report CLs.
In problems with low background, there is a recent improvement to the software implementation related to the treatment of nuisance parameters.
ATLAS also has an ongoing effort to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).

21 Summary and conclusions (2)
Discussions with the CMS Statistics Group are ongoing. The goal is to agree on statistical tools and practice to facilitate comparison and eventual combination of results.
Broad agreement in a number of areas but still non-trivial issues concerning limits:
● one-sided vs. unified
● PCL vs. CLs
We essentially agree on the mathematical properties of the approaches; the debate is on the relative importance of various properties.
Provisional agreement to use CLs as basis for comparison; in the longer term a Bayesian limit may play this role.

22 Extra slides

23 Some reasons to consider increasing M_min
M_min is supposed to be “substantially” greater than α (5%). So M_min = 16% is fine for 1 – α = 95%, but if we ever want 1 – α = 90%, then 16% is not “large” compared to 10%; μ_min = 0.28σ starts to look small relative to the intrinsic resolution of the measurement. Not an issue if we stick to 95% CL.
PCL with M_min = 16% is often substantially lower than CLs. This is because of the conservatism of CLs (see coverage). But the goal is not to get a lower limit per se, rather
● to use a test with higher power in those regions where one feels there is enough sensitivity to justify exclusion and
● to allow for easy communication of coverage (95% for μ ≥ μ_min; 100% otherwise).

24 A few further considerations
Obtaining PCL requires the distribution of unconstrained limits, from which one finds the M_min (16%, 50%) percentile. In some analyses this can entail calculational issues that are expected to be less problematic for M_min = 50% than for 16%.
Analysts produce the median limit anyway, even in the absence of the error bands, so with M_min = 50% the burden on the analyst is reduced somewhat (but one would still want the error bands).
We therefore recently proposed moving M_min to 50%.
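The percentile step can be sketched with background-only toys. This is a minimal illustration assuming the Gaussian toy model x ~ Gauss(μ, σ), a fixed random seed and a hypothetical observation; it is not the procedure implemented in the ATLAS scripts. Since M_0(μ) is the probability that the unconstrained limit falls below μ when μ = 0, the condition M_0(μ) ≥ M_min holds for μ at or above the M_min quantile of the toy limits.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)


def unconstrained_limit(x, sigma=1.0):
    return x + sigma * Z95


def limit_quantile(toy_limits, m_min):
    """Empirical m_min quantile of the background-only limit distribution."""
    ordered = sorted(toy_limits)
    index = min(int(m_min * len(ordered)), len(ordered) - 1)
    return ordered[index]


rng = random.Random(12345)     # fixed seed so the sketch is reproducible
sigma = 1.0
toys = [unconstrained_limit(rng.gauss(0.0, sigma), sigma) for _ in range(20000)]

x_obs = -1.5                   # hypothetical observation that fluctuated low
observed = unconstrained_limit(x_obs, sigma)
for m_min in (0.16, 0.50):
    floor = limit_quantile(toys, m_min)
    pcl = max(observed, floor)  # power constraint: never go below the quantile
    print(f"M_min = {m_min:.2f}: floor = {floor:.2f}, PCL = {pcl:.2f}")
```

The median is estimated near the centre of the toy distribution, where the sample is densest, which is one way to see why M_min = 50% tends to be numerically less demanding than a quantile further out in the tail.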

25 Treatment of nuisance parameters
In most problems, the data distribution is not uniquely specified by μ but contains nuisance parameters θ. This makes it more difficult to construct an (unconstrained) interval with correct coverage probability for all values of θ, so sometimes approximate methods are used (“profile construction”).
More importantly for PCL, the power M_0(μ) can depend on θ. So which value of θ should be used to define the power?
Since the power represents the probability to reject μ if the true value is μ = 0, to find the distribution of μ_up we take the values of θ that best agree with the data for μ = 0, i.e., the conditional maximum-likelihood estimate of θ given μ = 0.
This may seem counterintuitive, since the measure of sensitivity now depends on the data. We are simply using the data to choose the most appropriate value of θ where we quote the power.
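As an illustration of "the values of θ that best agree with the data for μ = 0", consider an on/off counting model, which is an assumption made here for concreteness and is not spelled out on the slide: n ~ Poisson(μs + b) in the signal region and m ~ Poisson(τb) in a control region, with the background b as the nuisance parameter. Setting μ = 0 and maximising the likelihood in b gives b = (n + m)/(1 + τ).

```python
import math


def log_likelihood_bkg_only(b, n, m, tau):
    """log L(mu = 0, b) up to constants for n ~ Poisson(b), m ~ Poisson(tau * b)."""
    return n * math.log(b) - b + m * math.log(tau * b) - tau * b


def conditional_mle_background(n, m, tau):
    """Background value that maximises L(mu = 0, b) in the assumed on/off model."""
    return (n + m) / (1.0 + tau)


n_obs, m_obs, tau = 12, 40, 4.0      # hypothetical counts and scale factor
b_hat_hat = conditional_mle_background(n_obs, m_obs, tau)
print(f"b used to generate background-only toys: {b_hat_hat:.2f}")

# Cross-check: the log-likelihood is lower on either side of the estimate
for shift in (-0.1, 0.1):
    assert (log_likelihood_bkg_only(b_hat_hat + shift, n_obs, m_obs, tau)
            < log_likelihood_bkg_only(b_hat_hat, n_obs, m_obs, tau))
```

Background-only pseudo-experiments generated with this value of b would then give the distribution of μ_up from which the power constraint is read off.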

26 ATLAS/CMS discussions on one-sided limits
Some prefer to report one-sided frequentist upper limits (CLs, PCL); others prefer unified (Feldman-Cousins) limits, where the lower edge may or may not exclude zero.
The prevailing view in the ATLAS Statistics Forum has been that in searches for new phenomena, one wants to know whether a cross section is excluded on the basis that its predicted rate is too high relative to the observation, not excluded on some other grounds (e.g., a mixture of too high or too low).
Among statisticians there is support for both approaches.

27 Discussions concerning flip-flopping
One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e., violation of coverage probability if one decides, based on the data, whether to report an upper limit or a measurement with error bars (two-sided interval).
This can be avoided by “always” reporting:
(1) An upper limit based on a one-sided test.
(2) The discovery significance (equivalent to p-value of background-only hypothesis).
In practice, “always” can mean “for every analysis carried out as a search”, i.e., until the existence of the process is well established (e.g., 5σ). That is, we only require that what is done in practice maps approximately onto the idealized infinite ensemble.
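The loss of coverage from flip-flopping can be made concrete for x ~ Gauss(μ, 1). The policy in this sketch is hypothetical and chosen only for illustration: quote a one-sided 95% upper limit if x is below a threshold (5σ here), and a central 95% interval otherwise. The coverage follows from the two branches analytically.

```python
from statistics import NormalDist

STD = NormalDist()


def flip_flop_coverage(mu, threshold=5.0, cl=0.95):
    """Coverage when the choice of interval type depends on the data (sigma = 1).

    Hypothetical policy: if x < threshold quote the one-sided upper limit
    x + z1, otherwise quote the central interval x +/- z2.
    """
    z1 = STD.inv_cdf(cl)                 # one-sided quantile
    z2 = STD.inv_cdf(0.5 + cl / 2.0)     # two-sided quantile
    cdf = NormalDist(mu, 1.0).cdf

    # Upper-limit branch covers mu when mu - z1 <= x < threshold
    p_limit = max(cdf(threshold) - cdf(mu - z1), 0.0)

    # Interval branch covers mu when max(threshold, mu - z2) <= x <= mu + z2
    p_interval = max(cdf(mu + z2) - cdf(max(threshold, mu - z2)), 0.0)
    return p_limit + p_interval


def always_upper_limit_coverage(mu, cl=0.95):
    """Coverage when the one-sided upper limit is reported for every outcome."""
    return 1.0 - NormalDist(mu, 1.0).cdf(mu - STD.inv_cdf(cl))


for mu in (0.0, 4.0, 5.0, 6.0, 10.0):
    print(f"mu = {mu:4.1f}: flip-flop {flip_flop_coverage(mu):.3f}, "
          f"always upper limit {always_upper_limit_coverage(mu):.3f}")
```

Near the threshold the data-dependent policy covers only about 92.5% of the time, while always reporting the upper limit keeps the nominal 95%.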

28 Discussions on CLs and F-C
CLs has been criticized as a method for preventing spurious exclusion, as it leads to significant overcoverage that is in practice not communicated to the reader. This was the motivation behind PCL.
We have also not supported using the upper edge of a Feldman-Cousins interval as a substitute for a one-sided upper limit, since when used in this way F-C has lower power.
Furthermore, F-C unified intervals protect against small (or null) intervals by counting the probability of upward data fluctuations, which are not relevant if the goal is to establish an upper limit.

29 Discussions concerning PCL
PCL has been criticized as it does not obviously map onto a Bayesian result for some choice of prior (CLs = Bayesian for special cases, e.g., x ~ Gauss(μ, σ), constant prior for μ ≥ 0).
We are not convinced of the need for this. The frequentist properties of PCL are well defined, and as with all frequentist limits one should not interpret them as representing Bayesian credible intervals.
Further criticism of PCL is related to an unconstrained limit that could exclude all values of μ. A remnant of this problem could survive after application of the power constraint (cf. “negatively biased relevant subsets”).
PCL does not have negatively biased relevant subsets (nor does our unconstrained limit, as it never excludes μ = 0).
On both points, debate is still ongoing.
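The special case mentioned above, CLs = Bayesian for x ~ Gauss(μ, σ) with a constant prior on μ ≥ 0, can be checked numerically. The sketch below is an illustration under exactly those assumptions: it computes the CLs limit in closed form and then integrates the posterior with a midpoint rule to confirm that 95% of the posterior probability lies below that limit. Function names and the integration settings are illustrative choices.

```python
import math
from statistics import NormalDist

ALPHA = 0.05


def phi(z):
    """Standard Gaussian CDF via erfc, accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def cls_limit(x, sigma=1.0):
    """Solve Phi((x - mu)/sigma) / Phi(x/sigma) = ALPHA for mu."""
    return x - sigma * NormalDist().inv_cdf(ALPHA * phi(x / sigma))


def posterior_mass_below(mu_up, x, sigma=1.0, steps=20000, mu_max_sigmas=12.0):
    """Posterior probability of 0 <= mu <= mu_up for a constant prior on mu >= 0,
    integrating the Gaussian likelihood with the midpoint rule."""
    def likelihood(mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

    def integrate(a, b):
        h = (b - a) / steps
        return h * sum(likelihood(a + (i + 0.5) * h) for i in range(steps))

    upper = max(x, 0.0) + mu_max_sigmas * sigma   # stands in for infinity
    return integrate(0.0, mu_up) / integrate(0.0, upper)


for x in (-2.0, 0.0, 1.5):
    limit = cls_limit(x)
    print(f"x = {x:4.1f}: CLs limit = {limit:.3f}, "
          f"posterior mass below = {posterior_mass_below(limit, x):.4f}")
```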

