Introduction to Statistics − Day 2

Slides:



Advertisements
Similar presentations
Computing and Statistical Data Analysis / Stat 4
Advertisements

Statistics review of basic probability and statistics.
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #21.
Review of Basic Probability and Statistics
Probability Densities
G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Statistical Data Analysis: Lecture 2 1Probability, Bayes’ theorem 2Random variables and.
Statistics.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
K. Desch – Statistical methods of data analysis SS10
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
Introduction to Statistics − Day 2
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
Lecture II-2: Probability Review
Chapter 21 Random Variables Discrete: Bernoulli, Binomial, Geometric, Poisson Continuous: Uniform, Exponential, Gamma, Normal Expectation & Variance, Joint.
Standard Statistical Distributions Most elementary statistical books provide a survey of commonly used statistical distributions. The reason we study these.
1 Probability and Statistics  What is probability?  What is statistics?
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis YETI IPPP Durham Young Experimentalists and.
Uniovi1 Some distributions Distribution/pdfExample use in HEP BinomialBranching ratio MultinomialHistogram with fixed N PoissonNumber of events found UniformMonte.
Statistics for Engineer Week II and Week III: Random Variables and Probability Distribution.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis Lecture 3 page 1 Lecture 3 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
880.P20 Winter 2006 Richard Kass Binomial Probability Distribution For the binomial distribution P is the probability of m successes out of N trials. Here.
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 1: Introduction 清华大学高能物理研究中心 2010 年 4 月 12—16 日 Glen Cowan.
G. Cowan Computing and Statistical Data Analysis / Stat 2 1 Computing and Statistical Data Analysis Stat 2: Catalogue of pdfs London Postgraduate Lectures.
1 Introduction to Statistical Methods for High Energy Physics Glen Cowan 2006 CERN Summer Student Lectures CERN Summer Student Lectures on Statistics Glen.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics;
Stats Probability Theory Summary. The sample Space, S The sample space, S, for a random phenomena is the set of all possible outcomes.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis RWTH Aachen Graduiertenkolleg February, 2007.
Computer simulation Sep. 9, QUIZ 2 Determine whether the following experiments have discrete or continuous out comes A fair die is tossed and the.
G. Cowan Lectures on Statistical Data Analysis Lecture 4 page 1 Statistical Data Analysis: Lecture 4 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
1 Introduction to Statistics − Day 2 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Statistical Data Analysis: Lecture 5 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 11 Statistical Methods for Particle Physics Lecture 1: probability, random variables, MC.
1 Introduction to Statistical Methods for High Energy Physics Glen Cowan 2005 CERN Summer Student Lectures CERN Summer Student Lectures on Statistics Glen.
Sampling and Sampling Distributions
Probability and Statistics for Particle Physics
Ex1: Event Generation (Binomial Distribution)
Appendix A: Probability Theory
Lectures on Statistical Data Analysis
Outline Introduction Signal, random variable, random process and spectra Analog modulation Analog to digital conversion Digital transmission through baseband.
Lecture 4 1 Probability (90 min.)
Goodness-of-Fit Tests
Unfolding Problem: A Machine Learning Approach
Chapter 7 Random Number Generation
Chapter 7 Random-Number Generation
Graduierten-Kolleg RWTH Aachen February 2014 Glen Cowan
Lecture 2 – Monte Carlo method in finance
Computing and Statistical Data Analysis / Stat 8
Computing and Statistical Data Analysis Stat 3: The Monte Carlo Method
Lecture 3 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for Particle Physics Lecture 1: Fundamentals
Terascale Statistics School DESY, February, 2018
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for Particle Physics Lecture 3: Parameter Estimation
Computing and Statistical Data Analysis / Stat 6
Computing and Statistical Data Analysis / Stat 7
Introduction to Statistics − Day 4
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Unfolding with system identification
Discrete Random Variables: Basics
Discrete Random Variables: Basics
Discrete Random Variables: Basics
Presentation transcript:

Introduction to Statistics − Day 2 Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability densities The Monte Carlo method. Lecture 3 Statistical tests Fisher discriminants, neural networks, etc Goodness-of-fit tests Lecture 4 Parameter estimation Maximum likelihood and least squares Interval estimation (setting limits) → Glen Cowan CERN Summer Student Lectures on Statistics

Some distributions Distribution/pdf Example use in HEP Binomial Branching ratio Multinomial Histogram with fixed N Poisson Number of events found Uniform Monte Carlo method Exponential Decay time Gaussian Measurement error Chi-square Goodness-of-fit Cauchy Mass of resonance Landau Ionization energy loss Glen Cowan CERN Summer Student Lectures on Statistics

Binomial distribution Consider N independent experiments (Bernoulli trials): outcome of each is ‘success’ or ‘failure’, probability of success on any given trial is p. Define discrete r.v. n = number of successes (0 ≤ n ≤ N). Probability of a specific outcome (in order), e.g. ‘ssfsf’ is But order not important; there are ways (permutations) to get n successes in N trials, total probability for n is sum of probabilities for each permutation. Glen Cowan CERN Summer Student Lectures on Statistics

Binomial distribution (2) The binomial distribution is therefore random variable parameters For the expectation value and variance we find: Glen Cowan CERN Summer Student Lectures on Statistics

Binomial distribution (3) Binomial distribution for several values of the parameters: Example: observe N decays of W±, the number n of which are W→mn is a binomial r.v., p = branching ratio. Glen Cowan CERN Summer Student Lectures on Statistics

Multinomial distribution Like binomial but now m outcomes instead of two, probabilities are For N trials we want the probability to obtain: n1 of outcome 1, n2 of outcome 2,  nm of outcome m. This is the multinomial distribution for Glen Cowan CERN Summer Student Lectures on Statistics

Multinomial distribution (2) Now consider outcome i as ‘success’, all others as ‘failure’. → all ni individually binomial with parameters N, pi for all i One can also find the covariance to be Example: represents a histogram with m bins, N total entries, all entries independent. Glen Cowan CERN Summer Student Lectures on Statistics

Poisson distribution Consider binomial n in the limit → n follows the Poisson distribution: Example: number of scattering events n with cross section s found for a fixed integrated luminosity, with Glen Cowan CERN Summer Student Lectures on Statistics

Uniform distribution Consider a continuous r.v. x with -∞ < x < ∞ . Uniform pdf is: N.B. For any r.v. x with cumulative distribution F(x), y = F(x) is uniform in [0,1]. Example: for p0 → gg, Eg is uniform in [Emin, Emax], with Glen Cowan CERN Summer Student Lectures on Statistics

Exponential distribution The exponential pdf for the continuous r.v. x is defined by: Example: proper decay time t of an unstable particle (t = mean lifetime) Lack of memory (unique to exponential): Glen Cowan CERN Summer Student Lectures on Statistics

Gaussian distribution The Gaussian (normal) pdf for a continuous r.v. x is defined by: (N.B. often m, s2 denote mean, variance of any r.v., not only Gaussian.) Special case: m = 0, s2 = 1 (‘standard Gaussian’): If y ~ Gaussian with m, s2, then x = (y - m) /s follows  (x). Glen Cowan CERN Summer Student Lectures on Statistics

Gaussian pdf and the Central Limit Theorem The Gaussian pdf is so useful because almost any random variable that is a sum of a large number of small contributions follows it. This follows from the Central Limit Theorem: For n independent r.v.s xi with finite variances si2, otherwise arbitrary pdfs, consider the sum In the limit n → ∞, y is a Gaussian r.v. with Measurement errors are often the sum of many contributions, so frequently measured values can be treated as Gaussian r.v.s. Glen Cowan CERN Summer Student Lectures on Statistics

Central Limit Theorem (2) The CLT can be proved using characteristic functions (Fourier transforms), see, e.g., SDA Chapter 10. For finite n, the theorem is approximately valid to the extent that the fluctuation of the sum is not dominated by one (or few) terms. Beware of measurement errors with non-Gaussian tails. Good example: velocity component vx of air molecules. OK example: total deflection due to multiple Coulomb scattering. (Rare large angle deflections give non-Gaussian tail.) Bad example: energy loss of charged particle traversing thin gas layer. (Rare collisions make up large fraction of energy loss, cf. Landau pdf.) Glen Cowan CERN Summer Student Lectures on Statistics

Multivariate Gaussian distribution Multivariate Gaussian pdf for the vector are column vectors, are transpose (row) vectors, For n = 2 this is where r = cov[x1, x2]/(s1s2) is the correlation coefficient. Glen Cowan CERN Summer Student Lectures on Statistics

Chi-square (c2) distribution The chi-square pdf for the continuous r.v. z (z ≥ 0) is defined by n = 1, 2, ... = number of ‘degrees of freedom’ (dof) For independent Gaussian xi, i = 1, ..., n, means mi, variances si2, follows c2 pdf with n dof. Example: goodness-of-fit test variable especially in conjunction with method of least squares. Glen Cowan CERN Summer Student Lectures on Statistics

Cauchy (Breit-Wigner) distribution The Breit-Wigner pdf for the continuous r.v. x is defined by (G = 2, x0 = 0 is the Cauchy pdf.) E[x] not well defined, V[x] →∞. x0 = mode (most probable value) G = full width at half maximum Example: mass of resonance particle, e.g. r, K*, f0, ... G = decay rate (inverse of mean lifetime) Glen Cowan CERN Summer Student Lectures on Statistics

Landau distribution For a charged particle with b = v /c traversing a layer of matter of thickness d, the energy loss D follows the Landau pdf: D b + - + - - + - + d L. Landau, J. Phys. USSR 8 (1944) 201; see also W. Allison and J. Cobb, Ann. Rev. Nucl. Part. Sci. 30 (1980) 253. Glen Cowan CERN Summer Student Lectures on Statistics

Landau distribution (2) Long ‘Landau tail’ → all moments ∞ Mode (most probable value) sensitive to b , → particle i.d. Glen Cowan CERN Summer Student Lectures on Statistics

The Monte Carlo method What it is: a numerical technique for calculating probabilities and related quantities using sequences of random numbers. The usual steps: (1) Generate sequence r1, r2, ..., rm uniform in [0, 1]. (2) Use this to produce another sequence x1, x2, ..., xn distributed according to some pdf f (x) in which we’re interested (x can be a vector). (3) Use the x values to estimate some property of f (x), e.g., fraction of x values with a < x < b gives → MC calculation = integration (at least formally) MC generated values = ‘simulated data’ → use for testing statistical procedures Glen Cowan CERN Summer Student Lectures on Statistics

Random number generators Goal: generate uniformly distributed values in [0, 1]. Toss coin for e.g. 32 bit number... (too tiring). → ‘random number generator’ = computer algorithm to generate r1, r2, ..., rn. Example: multiplicative linear congruential generator (MLCG) ni+1 = (a ni) mod m , where ni = integer a = multiplier m = modulus n0 = seed (initial value) N.B. mod = modulus (remainder), e.g. 27 mod 5 = 2. This rule produces a sequence of numbers n0, n1, ... Glen Cowan CERN Summer Student Lectures on Statistics

Random number generators (2) The sequence is (unfortunately) periodic! Example (see Brandt Ch 4): a = 3, m = 7, n0 = 1 ← sequence repeats Choose a, m to obtain long period (maximum = m - 1); m usually close to the largest integer that can represented in the computer. Only use a subset of a single period of the sequence. Glen Cowan CERN Summer Student Lectures on Statistics

Random number generators (3) are in [0, 1] but are they ‘random’? Choose a, m so that the ri pass various tests of randomness: uniform distribution in [0, 1], all values independent (no correlations between pairs), e.g. L’Ecuyer, Commun. ACM 31 (1988) 742 suggests a = 40692 m = 2147483399 Far better algorithms available, e.g. RANMAR, period See F. James, Comp. Phys. Comm. 60 (1990) 111; Brandt Ch. 4 Glen Cowan CERN Summer Student Lectures on Statistics

The transformation method Given r1, r2,..., rn uniform in [0, 1], find x1, x2,..., xn that follow f (x) by finding a suitable transformation x (r). Require: i.e. That is, set and solve for x (r). Glen Cowan CERN Summer Student Lectures on Statistics

Example of the transformation method Exponential pdf: Set and solve for x (r). → works too.) Glen Cowan CERN Summer Student Lectures on Statistics

The acceptance-rejection method Enclose the pdf in a box: (1) Generate a random number x, uniform in [xmin, xmax], i.e. r1 is uniform in [0,1]. (2) Generate a 2nd independent random number u uniformly distributed between 0 and fmax, i.e. (3) If u < f (x), then accept x. If not, reject x and repeat. Glen Cowan CERN Summer Student Lectures on Statistics

Example with acceptance-rejection method If dot below curve, use x value in histogram. Glen Cowan CERN Summer Student Lectures on Statistics

Monte Carlo event generators Simple example: e+e- → m+m- Generate cosq and f: Less simple: ‘event generators’ for a variety of reactions: e+e- → m+m-, hadrons, ... pp → hadrons, D-Y, SUSY,... e.g. PYTHIA, HERWIG, ISAJET... Output = ‘events’, i.e., for each event we get a list of generated particles and their momentum vectors, types, etc. Glen Cowan CERN Summer Student Lectures on Statistics

Monte Carlo detector simulation Takes as input the particle list and momenta from generator. Simulates detector response: multiple Coulomb scattering (generate scattering angle), particle decays (generate lifetime), ionization energy loss (generate D), electromagnetic, hadronic showers, production of signals, electronics response, ... Output = simulated raw data → input to reconstruction software: track finding, fitting, etc. Predict what you should see at ‘detector level’ given a certain hypothesis for ‘generator level’. Compare with the real data. Estimate ‘efficiencies’ = #events found / # events generated. Programming package: GEANT Glen Cowan CERN Summer Student Lectures on Statistics

Wrapping up lecture 2 We’ve looked at a number of important distributions: Binomial, Multinomial, Poisson, Uniform, Exponential Gaussian, Chi-square, Cauchy, Landau, and we’ve seen the Monte Carlo method: calculations based on sequences of random numbers, used to simulate particle collisions, detector response. So far, we’ve mainly been talking about probability. But suppose now we are faced with experimental data. We want to infer something about the (probabilistic) processes that produced the data. This is statistics, the main subject of the next two lectures. Glen Cowan CERN Summer Student Lectures on Statistics