Statistical Methods for Particle Physics
CERN-FNAL Hadron Collider Physics Summer School, CERN, 6-15 June 2007
Glen Cowan, Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk, www.pp.rhul.ac.uk/~cowan
Outline
1. Brief overview: probability (frequentist vs. subjective/Bayesian); statistics (parameter estimation, hypothesis tests).
2. Statistical tests for particle physics: multivariate methods for event selection; goodness-of-fit tests for discovery.
3. Systematic errors: treatment of nuisance parameters; Bayesian methods for systematics, MCMC.
A definition of probability
Consider a set S with subsets A, B, .... The Kolmogorov axioms (1933):
P(A) ≥ 0 for all A ⊆ S;
P(S) = 1;
P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅.
Also define the conditional probability of A given B (with P(B) ≠ 0):
P(A|B) = P(A ∩ B) / P(B).
E.g. rolling a die: P(n < 3 | n even) = P(n = 2) / P(n even) = (1/6) / (1/2) = 1/3.
Interpretation of probability
I. Relative frequency: A, B, ... are outcomes of a repeatable experiment; cf. quantum mechanics, particle scattering, radioactive decay, ...
II. Subjective probability: A, B, ... are hypotheses (statements that are true or false).
Both interpretations are consistent with the Kolmogorov axioms. In particle physics the frequency interpretation is often the most useful, but subjective probability can provide a more natural treatment of non-repeatable phenomena: systematic uncertainties, the probability that the Higgs boson exists, ...
Bayes' theorem
From the definition of conditional probability we have
P(A|B) = P(A ∩ B) / P(B) and P(B|A) = P(B ∩ A) / P(A),
but P(A ∩ B) = P(B ∩ A), so we find Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B).
The denominator P(B) serves to ensure the probabilities sum to unity; it is often more convenient to use the law of total probability, P(B) = Σ_i P(B|A_i) P(A_i), for disjoint A_i whose union is S.
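Not part of the original slides: a minimal numeric sketch of Bayes' theorem in Python, applied to a two-hypothesis (signal vs. background) classification of one event. All the prior and likelihood values are invented for illustration.

```python
# Minimal sketch of Bayes' theorem for two hypotheses (signal s, background b).
# All numbers here are made up for illustration.

p_s = 0.01          # prior P(s): assumed signal fraction
p_b = 1.0 - p_s     # prior P(b)
p_x_given_s = 0.80  # assumed likelihood P(x|s) for some observed event x
p_x_given_b = 0.05  # assumed likelihood P(x|b)

# Denominator: law of total probability, P(x) = sum_H P(x|H) P(H)
p_x = p_x_given_s * p_s + p_x_given_b * p_b

# Bayes' theorem: posterior P(s|x) = P(x|s) P(s) / P(x)
p_s_given_x = p_x_given_s * p_s / p_x
print(f"P(s|x) = {p_s_given_x:.3f}")  # ~0.139: still mostly background
```

Note how the small prior dominates: even a fairly signal-like event remains probably background.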
Frequentist Statistics − general philosophy
In frequentist statistics, probabilities are associated only with the data, i.e., with outcomes of repeatable observations:
Probability = limiting frequency.
Probabilities such as P(Higgs boson exists), P(0.117 < α_s < 0.121), etc. are either 0 or 1, but we don't know which. The tools of frequentist statistics tell us what to expect, under the assumption of certain probabilities, about hypothetical repeated observations. The preferred theories (models, hypotheses, ...) are those for which our observations would be considered 'usual'.
Bayesian Statistics − general philosophy
In Bayesian statistics, the interpretation of probability is extended to degree of belief (subjective probability). Use this for hypotheses:
P(H|x) = P(x|H) π(H) / ∫ P(x|H′) π(H′) dH′,
where P(H|x) is the posterior probability (i.e., after seeing the data x), π(H) is the prior probability (i.e., before seeing the data), P(x|H) is the probability of the data assuming hypothesis H (the likelihood), and the normalization in the denominator involves a sum/integral over all possible hypotheses.
Bayesian methods can provide a more natural treatment of non-repeatable phenomena: systematic uncertainties, the probability that the Higgs boson exists, ...
But there is no golden rule for priors ("if-then" character of Bayes' theorem).
Likelihood
We have data x (could be a vector, discrete or continuous) and a probability model P(x; θ) (θ could be a vector of parameters). Now evaluate the probability function using the data that we observed and treat it as a function of the parameters. This is the likelihood function:
L(θ) = P(x; θ) (here x is constant).
For example, if we have n independent observations of a random variable x, where x ~ f(x; θ), then
L(θ) = ∏_{i=1}^{n} f(x_i; θ).
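To make the definition concrete (this sketch is not from the slides), here is the log-likelihood for n independent observations under an exponential model f(x; τ) = (1/τ) exp(−x/τ), evaluated at a few trial values of τ; the data values are made up.

```python
import numpy as np

# Sketch: likelihood for n independent observations x_i ~ f(x; tau),
# using an exponential pdf f(x; tau) = (1/tau) exp(-x/tau) as a stand-in model.
x = np.array([1.2, 0.4, 3.1, 0.9, 2.2])  # hypothetical observed data

def log_likelihood(tau, x):
    """ln L(tau) = sum_i ln f(x_i; tau) for the exponential model."""
    return np.sum(-np.log(tau) - x / tau)

for tau in (0.5, 1.0, 1.56, 3.0):
    print(f"tau = {tau:4.2f}  ln L = {log_likelihood(tau, x):8.3f}")
```

The largest ln L occurs near τ = 1.56, the sample mean, which previews the next slide.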
Maximum Likelihood
The likelihood function plays an important role in both frequentist and Bayesian statistics. E.g., to estimate a parameter θ, the method of maximum likelihood (ML) says to take the value that maximizes L(θ). ML and other parameter-estimation methods would be a large part of a longer course on statistics; for now we need to move on.
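A minimal numerical ML fit for the exponential example above (again an illustration, not from the slides): minimize −ln L with SciPy and compare with the analytic ML estimator, which for this model is the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: maximum-likelihood estimate of tau for the exponential model,
# found by minimizing -ln L numerically. Same made-up data as before.
x = np.array([1.2, 0.4, 3.1, 0.9, 2.2])

def neg_log_likelihood(tau):
    return -np.sum(-np.log(tau) - x / tau)

res = minimize_scalar(neg_log_likelihood, bounds=(0.01, 10.0), method="bounded")
print(f"tau_hat = {res.x:.3f}")         # numerical maximum of L(tau)
print(f"sample mean = {x.mean():.3f}")  # analytic ML estimator for this model
```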
Statistical tests (in a particle physics context)
Suppose the result of a measurement for an individual event is a collection of numbers:
x1 = number of muons,
x2 = mean pT of the jets,
x3 = missing energy, ...
Then x = (x1, ..., xn) follows some n-dimensional joint pdf, which depends on the type of event produced, i.e., which reaction took place. For each reaction we consider we will have a hypothesis for the pdf of x, e.g., f(x|H0), f(x|H1), .... Often H0 is the Standard Model (the background hypothesis) and H1, H2, ... are signal hypotheses we are searching for.
Selecting events
Suppose we have a data sample with two kinds of events, corresponding to hypotheses H0 and H1, and we want to select those of type H0. Each event is a point in x-space. What decision boundary should we use to accept/reject events as belonging to event type H0?
[Figure: H0 and H1 event populations in the (x_i, x_j) plane with a rectangular 'accept H0' region.]
Probably start with cuts, e.g. x_i < c_i, x_j < c_j.
Other ways to select events
Or maybe use some other sort of decision boundary:
[Figures: the same event populations with a linear (left) or nonlinear (right) boundary around the 'accept H0' region.]
How can we do this in an 'optimal' way?
Test statistics
Construct a 'test statistic' of lower dimension (e.g. a scalar), t(x1, ..., xn). The goal is to compactify the data without losing the ability to discriminate between hypotheses. We can work out the pdfs f(t|H0), f(t|H1). The decision boundary is now a single cut on t, which effectively divides the sample space into two regions, where we either accept H0 (acceptance region) or reject it (critical region).
Significance level and power of a test
Taking the critical region to be t > t_cut:
Probability to reject H0 if it is true (error of the first kind):
α = ∫_{t > t_cut} f(t|H0) dt (the significance level).
Probability to accept H0 if H1 is true (error of the second kind):
β = ∫_{t < t_cut} f(t|H1) dt (1 − β = power).
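A numerical illustration (not in the slides): compute α and β for a cut t > t_cut, assuming for concreteness that f(t|H0) and f(t|H1) are unit Gaussians centred at 0 and 2; both pdfs and the cut value are invented.

```python
from scipy.stats import norm

# Sketch: alpha and power for a cut t > t_cut, with stand-in pdfs
# f(t|H0) = N(0,1) and f(t|H1) = N(2,1).
t_cut = 1.5
alpha = norm.sf(t_cut, loc=0.0, scale=1.0)   # P(t > t_cut | H0): reject H0 when true
beta  = norm.cdf(t_cut, loc=2.0, scale=1.0)  # P(t < t_cut | H1): accept H0 when H1 true
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```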
Efficiency, purity, etc.
Signal efficiency ε_s = probability for a signal event to be accepted; background efficiency ε_b defined analogously.
Expected number of signal events: s = ε_s σ_s L.
Expected number of background events: b = ε_b σ_b L (σ = cross section, L = integrated luminosity).
The prior probabilities π_s, π_b are proportional to the cross sections, so, e.g., the signal purity is
p = s / (s + b) = ε_s π_s / (ε_s π_s + ε_b π_b).
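A short worked example of these formulas (all efficiencies, cross sections, and the luminosity are invented numbers, not from the slides):

```python
# Sketch: expected event counts and purity from efficiencies and cross sections.
eps_s, eps_b = 0.60, 0.01       # signal/background efficiencies of the selection
sigma_s, sigma_b = 0.5, 100.0   # cross sections in pb (made up)
L = 1000.0                      # integrated luminosity in pb^-1 (made up)

s = eps_s * sigma_s * L         # expected signal events: s = eps_s * sigma_s * L
b = eps_b * sigma_b * L         # expected background events: b = eps_b * sigma_b * L
purity = s / (s + b)
print(f"s = {s:.0f}, b = {b:.0f}, purity = {purity:.2f}")
```

Even with a tiny background efficiency, the much larger background cross section keeps the purity modest.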
Constructing a test statistic
How can we select events in an 'optimal' way? The Neyman-Pearson lemma states: to get the lowest ε_b for a given ε_s (highest power for a given significance level), choose the acceptance region such that
f(x|s) / f(x|b) > c,
where c is a constant that determines ε_s. Equivalently, the optimal scalar test statistic is the likelihood ratio
t(x) = f(x|s) / f(x|b).
N.B. any monotonic function of this is just as good.
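A one-dimensional illustration of the likelihood-ratio statistic (not from the slides; the two Gaussian pdfs are stand-ins):

```python
import numpy as np
from scipy.stats import norm

# Sketch of the Neyman-Pearson statistic t(x) = f(x|s)/f(x|b) in one dimension,
# taking f(x|s) = N(1,1) and f(x|b) = N(-1,1) as stand-in pdfs.
def t(x):
    return norm.pdf(x, loc=1.0) / norm.pdf(x, loc=-1.0)

x = np.linspace(-3, 3, 7)
for xi, ti in zip(x, t(x)):
    print(f"x = {xi:+.1f}  t(x) = {ti:8.3f}")
# For equal-width Gaussians, ln t(x) = 2x is monotonic in x,
# so cutting on x itself is equivalent to cutting on t.
```

This also demonstrates the N.B. above: any monotonic function of t (here, x itself) gives the same test.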
Purity vs. efficiency — optimal trade-off
Consider selecting n events: expected numbers s from signal, b from background → n ~ Poisson(s + b). Suppose b is known and the goal is to estimate s with minimum relative statistical error. Take as estimator ŝ = n − b. The variance of a Poisson variable equals its mean, therefore
V[ŝ] = V[n] = s + b → σ_ŝ / s = √(s + b) / s.
So we should maximize s / √(s + b), which is equivalent to maximizing the product of signal efficiency × purity.
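A small sketch (not from the slides) of choosing the cut that maximizes s/√(s+b), assuming invented Gaussian test-statistic pdfs and invented expected totals before any cut:

```python
import numpy as np
from scipy.stats import norm

# Sketch: scan a cut t > t_cut to maximize s/sqrt(s+b), with stand-in
# pdfs f(t|s) = N(2,1), f(t|b) = N(0,1) and made-up totals S, B.
S, B = 100.0, 10000.0   # hypothetical expected signal and background events

t_cut = np.linspace(-2, 5, 141)
s = S * norm.sf(t_cut, loc=2.0, scale=1.0)   # signal surviving the cut
b = B * norm.sf(t_cut, loc=0.0, scale=1.0)   # background surviving the cut
significance = s / np.sqrt(s + b)

best = np.argmax(significance)
print(f"best cut: t > {t_cut[best]:.2f}, s/sqrt(s+b) = {significance[best]:.2f}")
```

A very loose cut drowns the signal in background; a very tight one throws away too much signal. The optimum sits in between.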
Why Neyman-Pearson doesn't always help
The problem is that we usually don't have explicit formulae for the pdfs f(x|s), f(x|b). Instead we may have Monte Carlo models for the signal and background processes, so we can produce simulated data and enter each event into an n-dimensional histogram. Use e.g. M bins for each of the n dimensions, for a total of M^n cells. But n is potentially large → a prohibitively large number of cells to populate with Monte Carlo data. Compromise: make an Ansatz for the form of the test statistic with fewer parameters; determine them (e.g. using MC) to give the best discrimination between signal and background.
Multivariate methods
Many new (and some old) methods:
Fisher discriminant
Neural networks
Kernel density methods
Support Vector Machines
Decision trees
Boosting
Bagging
New software for HEP, e.g.:
TMVA: Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/0703039
StatPatternRecognition: I. Narsky, physics/0507143
Fisher discriminant
Assume a linear test statistic,
t(x) = Σ_i a_i x_i,
which corresponds to a linear decision boundary, and maximize the 'separation' between the two classes, e.g.
J(a) = (τ_0 − τ_1)² / (Σ_0² + Σ_1²),
where τ_k and Σ_k² are the mean and variance of t under hypothesis H_k.
[Figure: linear boundary in the (x_i, x_j) plane with an 'accept H0' region.]
Equivalent to Neyman-Pearson if the signal and background pdfs are multivariate Gaussian with equal covariances; otherwise not optimal, but still often a simple, practical solution. Sometimes one first transforms the data to better approximate Gaussians.
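A minimal Fisher-discriminant sketch (not from the slides): the maximizing coefficients are a ∝ W⁻¹(μ₁ − μ₀), with W the within-class covariance; the toy Gaussian samples and their parameters are invented.

```python
import numpy as np

# Sketch of the Fisher discriminant: a ~ W^{-1} (mu1 - mu0), where W is the
# sum of the class covariance matrices. Toy Gaussian samples, invented parameters.
rng = np.random.default_rng(1)
x0 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=5000)  # H0
x1 = rng.multivariate_normal([2, 1], [[1.0, 0.5], [0.5, 1.0]], size=5000)  # H1

mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
W = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)  # within-class covariance
a = np.linalg.solve(W, mu1 - mu0)    # Fisher coefficients (up to normalization)

t0, t1 = x0 @ a, x1 @ a              # linear test statistic t = a.x per event
print("coefficients a =", a)
print(f"mean t: H0 = {t0.mean():.2f}, H1 = {t1.mean():.2f}")
```

Because the two toy classes share the same covariance, this is exactly the Neyman-Pearson optimal case described above.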
Nonlinear test statistics
The optimal decision boundary may not be a hyperplane → nonlinear test statistic.
[Figure: nonlinear boundary enclosing the 'accept H0' region.]
Multivariate statistical methods are a Big Industry: HEP can benefit from progress in Machine Learning. See the recent (2003 and 2005) PHYSTAT proceedings, e.g., papers by J. Friedman, K. Cranmer, ...
Neural networks: the multi-layer perceptron
Use e.g. the logistic sigmoid activation function, h(u) = 1 / (1 + e^(−u)).
Define values for the 'hidden nodes',
φ_i(x) = h(w_{i0} + Σ_j w_{ij} x_j),
and the network output is then given by
t(x) = h(a_0 + Σ_i a_i φ_i(x)).
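A forward-pass sketch of the formulas above (not from the slides). The weights here are random placeholders; in a real analysis they would be determined by training on signal and background samples.

```python
import numpy as np

# Sketch of a single-hidden-layer perceptron forward pass with logistic
# sigmoid activations; weights are random stand-ins, not trained values.
def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W = rng.normal(size=(n_hidden, n_in))   # hidden-layer weights w_ij
w0 = rng.normal(size=n_hidden)          # hidden-layer thresholds w_i0
a = rng.normal(size=n_hidden)           # output weights a_i
a0 = rng.normal()                       # output threshold a_0

def network_output(x):
    phi = sigmoid(w0 + W @ x)           # hidden-node values phi_i(x)
    return sigmoid(a0 + a @ phi)        # scalar output t(x) in (0,1)

x = np.array([0.5, -1.2, 2.0])          # one hypothetical event
print(f"t(x) = {network_output(x):.3f}")
```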
Neural network example from LEP II
Signal: e+e− → W+W− (often four well-separated hadron jets).
Background: e+e− → qq̄gg (four less well-separated hadron jets).
Input variables based on jet structure, event shape, ...; none by itself gives much separation. The neural network output does better. (Garrido, Juste and Martinez, ALEPH 96-144)
Probability Density Estimation (PDE) techniques
Construct non-parametric estimators of the pdfs f(x|s), f(x|b) and use these to construct the likelihood ratio (an n-dimensional histogram is a brute-force example of this). More clever estimation techniques can get this to work for (somewhat) higher dimension.
See e.g. K. Cranmer, Kernel Estimation in High Energy Physics, CPC 136 (2001) 198, hep-ex/0011057; T. Carli and B. Koblitz, A multi-variate discrimination technique based on range-searching, NIM A 501 (2003) 576, hep-ex/0211019.
Kernel-based PDE (KDE, Parzen window)
Consider d dimensions and N training events x_1, ..., x_N; estimate f(x) with
f̂(x) = (1/N) Σ_{i=1}^{N} K(x − x_i),
using e.g. a Gaussian kernel, K(u) = (2π h²)^(−d/2) exp(−|u|² / 2h²), where h is the kernel bandwidth (smoothing parameter). One must sum N terms to evaluate the function (slow); faster algorithms only count events in the vicinity of x (k-nearest neighbor, range search).
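A one-dimensional KDE sketch of the formula above (not from the slides; the training sample and bandwidth are invented):

```python
import numpy as np

# Sketch of a 1-d Gaussian kernel density estimate:
# f_hat(x) = (1/N) sum_i K(x - x_i), K a Gaussian of width h.
rng = np.random.default_rng(2)
train = rng.normal(loc=0.0, scale=1.0, size=1000)  # N training events (toy)
h = 0.2                                            # kernel bandwidth (made up)

def f_hat(x, data, h):
    """Sum one Gaussian kernel per training event (O(N) per evaluation)."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

x = np.linspace(-3, 3, 7)
print(np.round(f_hat(x, train, h), 3))  # should roughly track the N(0,1) pdf
```

The O(N)-per-point cost visible in `f_hat` is exactly why the slide mentions faster neighborhood-based algorithms.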
Product of one-dimensional pdfs
First rotate to uncorrelated variables, i.e., find a matrix A such that for y = Ax we have cov[y_i, y_j] = δ_ij. Then estimate the d-dimensional joint pdf as the product of 1-d pdfs (here for the decorrelated variables):
f̂(y) = ∏_{i=1}^{d} f̂_i(y_i).
This does not exploit nonlinear features of the joint pdf, but it is simple and may be a good approximation in practical examples.
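One way to construct such a matrix A (a sketch, not from the slides) is to whiten via the eigen-decomposition of the sample covariance; the toy correlated Gaussian data are invented.

```python
import numpy as np

# Sketch: find A such that y = A x has cov[y_i, y_j] = delta_ij, via the
# eigen-decomposition V = U diag(lam) U^T of the sample covariance.
rng = np.random.default_rng(3)
x = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=10000)

V = np.cov(x, rowvar=False)
lam, U = np.linalg.eigh(V)
A = np.diag(1.0 / np.sqrt(lam)) @ U.T        # one valid choice of A (not unique)

y = x @ A.T                                  # decorrelated variables
print(np.round(np.cov(y, rowvar=False), 3))  # ~ identity matrix
```

Any A with A V Aᵀ = I works; this eigenvector choice is just one convenient option.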
Decision trees
A training sample of signal and background data is repeatedly split by successive cuts on its input variables. The order in which the variables are used is based on the best separation between signal and background. Iterate until a stop criterion is reached, based e.g. on purity or a minimum number of events in a node. The resulting set of cuts is a 'decision tree'. It tends to be sensitive to fluctuations in the training sample.
Example from MiniBooNE: B. Roe et al., NIM A 543 (2005) 577.
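A minimal decision-tree sketch on toy Gaussian signal/background samples (not from the slides; scikit-learn stands in for the HEP tools cited, and all sample parameters are invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Sketch of a decision tree on toy Gaussian signal/background samples.
rng = np.random.default_rng(4)
bkg = rng.multivariate_normal([0, 0], np.eye(2), size=5000)      # H0-like events
sig = rng.multivariate_normal([1.5, 1.0], np.eye(2), size=5000)  # H1-like events
X = np.vstack([bkg, sig])
y = np.concatenate([np.zeros(5000), np.ones(5000)])

# min_samples_leaf acts as a stop criterion (minimum events in a node)
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50).fit(X, y)
print(f"training accuracy = {tree.score(X, y):.3f}")
```

Limiting the depth and the leaf size is what keeps a single tree from chasing fluctuations in the training sample.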
Boosted decision trees
Boosting combines a number of classifiers into a stronger one; it improves stability with respect to fluctuations in the input data. To use it with decision trees, increase the weights of misclassified events and reconstruct the tree. Iterate → a forest of trees (perhaps > 1000). For the mth tree, define a score α_m based on the error rate of that tree. The boosted tree is then a weighted sum of the trees:
T(x) = Σ_m α_m T_m(x).
Algorithms: AdaBoost (Freund and Schapire), ε-boost (Friedman).
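A boosted-tree sketch on the same kind of toy sample (not from the slides; scikit-learn's AdaBoost stands in for the HEP implementations):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Sketch of AdaBoost over shallow decision trees on toy Gaussian samples.
# (In scikit-learn < 1.2 the keyword below is base_estimator, not estimator.)
rng = np.random.default_rng(5)
bkg = rng.multivariate_normal([0, 0], np.eye(2), size=5000)      # H0-like events
sig = rng.multivariate_normal([1.5, 1.0], np.eye(2), size=5000)  # H1-like events
X = np.vstack([bkg, sig])
y = np.concatenate([np.zeros(5000), np.ones(5000)])

# Each round reweights misclassified events and fits a new tree; the final
# classifier is the weighted vote T(x) = sum_m alpha_m T_m(x) from the slide.
bdt = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),
    n_estimators=200,
).fit(X, y)
print(f"training accuracy = {bdt.score(X, y):.3f}")
```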
Multivariate analysis discussion
For all methods, need to check:
sensitivity to statistically unimportant variables (best to drop those that don't provide discrimination);
level of smoothness in the decision boundary (sensitivity to over-training).
Given the test variable, the next step is, e.g., to select n events and estimate a signal cross section:
σ̂_s = (n − b) / (ε_s L).
Now we need to estimate the systematic error. If e.g. the training (MC) data ≠ Nature, the test variable is not optimal, but not necessarily biased. But our estimates of the background b and the efficiencies would then be biased if based on MC. (True also for 'simple cuts'.)
Multivariate analysis discussion (2)
But in a cut-based analysis it may be easier to avoid regions where untested features of the MC are strongly influencing the decision boundary. Look at control samples to test the joint distributions of the inputs; try to estimate backgrounds directly from the data (sidebands).
The purpose of the statistical test is often to select objects for further study and then measure their properties. One needs to avoid input variables that are correlated with the properties of the selected objects that one wants to study. (Not always easy; the correlations may be poorly known.)
Comparing multivariate methods (TMVA)
[Figure: TMVA comparison of classifiers, e.g. background rejection vs. signal efficiency.]
Choose the best one!
Some multivariate analysis references
Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer (2001)
Webb, Statistical Pattern Recognition, Wiley (2002)
Kuncheva, Combining Pattern Classifiers, Wiley (2004)
Specifically on neural networks:
L. Lönnblad et al., Comp. Phys. Comm. 70 (1992) 167
C. Peterson et al., Comp. Phys. Comm. 81 (1994) 185
C.M. Bishop, Neural Networks for Pattern Recognition, OUP (1995)
J. Hertz et al., Introduction to the Theory of Neural Computation, Addison-Wesley, New York (1991)
Wrapping up lecture 1
We have now defined a way to select e.g. SUSY, Higgs, ... events. Next, apply the method to the real data and, depending on what we see, claim (or not) a new discovery.
Next lecture: How do we decide when to make this claim? How do we incorporate systematic uncertainties?
Extra slides
Some statistics books, papers, etc.
R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989; see also hepwww.ph.man.ac.uk/~roger/book.html
G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998; see also www.pp.rhul.ac.uk/~cowan/sda
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986
W. Eadie et al., Statistical and Computational Methods in Experimental Physics, North-Holland, 1971 (new 2nd ed. 2007)
S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998 (with program library on CD)
W.-M. Yao et al. (Particle Data Group), Review of Particle Physics, Journal of Physics G 33 (2006) 1; see also pdg.lbl.gov, sections on probability, statistics, Monte Carlo