Practical Statistics for Physicists


Practical Statistics for Physicists Louis Lyons Imperial College and Oxford CMS expt at LHC l.lyons@physics.ox.ac.uk TAE School Benasque, Sept 2016

Books: Statistics for Nuclear and Particle Physicists, Cambridge University Press, 1986. Available from CUP. Errata in these lectures.

Topics 1) Introduction 2) Bayes and Frequentism 3) Likelihoods 4) Search for New Physics (e.g. Higgs boson) Time for discussion

Introductory remarks: What is Statistics? Probability and Statistics. Why uncertainties? Random and systematic uncertainties. Combining uncertainties. Combining experiments. Binomial, Poisson and Gaussian distributions.

What do we do with Statistics? Parameter Determination (best value and range), e.g. Mass of Higgs = 80 ± 2. Goodness of Fit: does the data agree with our theory? Hypothesis Testing: does the data prefer Theory 1 to Theory 2? (Decision Making: what experiment shall I do next?) Why bother? HEP is expensive and time-consuming, so it is worth investing effort in statistical analysis → better information from data.

Probability and Statistics. Example: Dice. THEORY → DATA (Probability): Given P(5) = 1/6, what is P(20 5's in 100 trials)? If unbiassed, what is P(n evens in 100 trials)? DATA → THEORY (Statistics): Given 20 5's in 100 trials, what is P(5)? And its uncertainty? Given 60 evens in 100 trials, is it unbiassed? Or is P(evens) = 2/3?

Probability and Statistics. Example: Dice. THEORY → DATA (Probability): Given P(5) = 1/6, what is P(20 5's in 100 trials)? If unbiassed, what is P(n evens in 100 trials)? DATA → THEORY (Statistics): Given 20 5's in 100 trials, what is P(5)? And its uncertainty? → Parameter Determination. Given 60 evens in 100 trials, is it unbiassed, or is P(evens) = 2/3? → Goodness of Fit / Hypothesis Testing. N.B. Parameter values are not sensible if the goodness of fit is poor.
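A minimal Python sketch (my addition, not from the slides) of the two directions for the dice example, assuming scipy is available; the printed values are approximate.

```python
from scipy.stats import binom

# Probability (THEORY -> DATA): given P(5) = 1/6, how likely are exactly 20 fives in 100 throws?
p_theory = 1 / 6
print(binom.pmf(20, 100, p_theory))          # ~0.07

# Statistics (DATA -> THEORY): given 20 fives in 100 throws, estimate P(5) and its uncertainty
n, s = 100, 20
p_hat = s / n
sigma_p = (p_hat * (1 - p_hat) / n) ** 0.5   # binomial uncertainty on the estimate
print(f"p = {p_hat:.2f} +/- {sigma_p:.2f}")  # 0.20 +/- 0.04
```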

Why do we need uncertainties? They affect the conclusion about our result, e.g. Result/Theory = 0.970: if 0.970 ± 0.050, the data are compatible with the theory; if 0.970 ± 0.005, the data are incompatible with the theory; if 0.970 ± 0.7, we need a better experiment. (Historical example: an experiment at Harwell testing General Relativity.)

Random + Systematic Uncertainties. Random/statistical: limited accuracy, Poisson counts; the spread of answers on repetition (a method of estimating them). Systematics: may cause a shift, but not a spread. e.g. Pendulum: g = 4π²L/τ², with τ = T/n. Statistical uncertainties: T, L. Systematics: T, L. Calibrate: Systematic → Statistical. More systematics: the formula is for an undamped, small-amplitude, rigid, simple pendulum. Might want to correct to g at sea level: different correction formulae. Ratio of g at different locations: possible systematics might cancel; correlations relevant.
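A short sketch (my addition) of propagating the statistical uncertainties on T and L into g = 4π²L/τ²; the numerical values for L, T, n and their uncertainties are made up for illustration.

```python
import math

L, sigma_L = 1.000, 0.002   # pendulum length and its uncertainty [m] (hypothetical numbers)
T, sigma_T = 200.0, 0.2     # total time for n swings [s] (hypothetical numbers)
n = 100
tau, sigma_tau = T / n, sigma_T / n

g = 4 * math.pi**2 * L / tau**2
# product/power rule: fractional errors combine in quadrature, and tau enters squared
frac = math.hypot(sigma_L / L, 2 * sigma_tau / tau)
print(f"g = {g:.3f} +/- {g * frac:.3f} m/s^2")
```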

Presenting the result: quote it as g ± σstat ± σsyst, or combine the uncertainties in quadrature → g ± σ. Other extreme: show all the systematic contributions separately. Useful for assessing correlations with other measurements. Needed for: using improved outside information, combining results, and using the measurements to calculate something else.

Combining uncertainties: z = x − y, so δz = δx − δy [1]. Why then σ_z² = σ_x² + σ_y² ? [2]

Combining uncertainties: z = x − y, so δz = δx − δy [1]. Why then σ_z² = σ_x² + σ_y² ? [2] 1) [1] holds only for specific δx, δy; could it perhaps hold on average? N.B. a mnemonic, not a proof. 2) σ_z² = ⟨δz²⟩ = ⟨δx²⟩ + ⟨δy²⟩ − 2⟨δx δy⟩ = σ_x² + σ_y², provided ⟨δx δy⟩ = 0 (x and y uncorrelated).
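A quick numerical check (my addition, assuming numpy) that for independent x and y the spread of z = x − y combines in quadrature, and that the cross term averages to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_x, sigma_y = 3.0, 4.0
x = rng.normal(10.0, sigma_x, 1_000_000)
y = rng.normal(2.0, sigma_y, 1_000_000)

z = x - y
print(z.std())                                    # ~5.0
print(np.sqrt(sigma_x**2 + sigma_y**2))           # 5.0 = sqrt(3^2 + 4^2)
print(np.mean((x - x.mean()) * (y - y.mean())))   # cross term <dx dy> ~ 0 for uncorrelated x, y
```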

3) Averaging is good for you: for N measurements x_i ± σ, is the result [1] x̄ ± σ or [2] x̄ ± σ/√N ? 4) Tossing a coin: score 0 for tails, 2 for heads (1 ± 1 per toss). After 100 tosses, is the total [1] 100 ± 100 or [2] 100 ± 10 ? Prob(total = 0 or 200) = (1/2)⁹⁹ ~ 10⁻³⁰; compare the age of the Universe ~ 10¹⁸ seconds. (Prob ~ 1% for 100 people doing it 1/μsec since the Big Bang.)
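A small simulation (my addition, assuming numpy) of the coin-toss question: the total of 100 tosses scoring 0 or 2 spreads by √100 × 1 = 10, not by 100.

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 000 repetitions of 100 tosses, each toss scoring 0 (tails) or 2 (heads)
totals = 2 * rng.integers(0, 2, size=(100_000, 100)).sum(axis=1)
print(totals.mean(), totals.std())   # ~100 and ~10, i.e. 100 +/- 10
```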

Rules for different functions. 1) Linear: z = k₁x₁ + k₂x₂ + …, σ_z = k₁σ₁ ⊕ k₂σ₂ ⊕ …, where ⊕ means "combine in quadrature". N.B. fractional errors are NOT relevant here, e.g. z = x − y with z = your height, x = position of head with respect to the moon, y = position of feet with respect to the moon: x and y measured to 0.1%, yet z could come out as −50 km.

Rules for different functions. 2) Products and quotients: z = x^α y^β …, σ_z/z = α σ_x/x ⊕ β σ_y/y. Useful for x², xy, x/√y, …

3) Anything else: z = z(x₁, x₂, …), σ_z = (∂z/∂x₁)σ₁ ⊕ (∂z/∂x₂)σ₂ ⊕ … OR numerically: z₀ = z(x₁, x₂, x₃, …), z₁ = z(x₁+σ₁, x₂, x₃, …), z₂ = z(x₁, x₂+σ₂, x₃, …), then σ_z = (z₁−z₀) ⊕ (z₂−z₀) ⊕ … N.B. All these formulae are approximate (except the linear case 1)): they assume small uncertainties.
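A minimal sketch of the numerical recipe above (the helper name and example numbers are my own): shift each input by +1σ in turn and combine the resulting shifts in quadrature.

```python
import math

def propagate(f, x, sigma):
    """Approximate sigma_z for z = f(x1, x2, ...) with small, uncorrelated uncertainties."""
    z0 = f(*x)
    shifts = []
    for i, s in enumerate(sigma):
        xi = list(x)
        xi[i] += s                     # shift one input by +1 sigma
        shifts.append(f(*xi) - z0)     # z_i - z_0 plays the role of (dz/dx_i) * sigma_i
    return z0, math.sqrt(sum(d * d for d in shifts))

# example: z = x/sqrt(y) with x = 10 +/- 0.3 and y = 4 +/- 0.2
z, sz = propagate(lambda x, y: x / math.sqrt(y), [10.0, 4.0], [0.3, 0.2])
print(z, sz)   # 5.0 +/- ~0.19, consistent with the fractional-error rule for x/sqrt(y)
```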

N.B. Better to combine the data! BEWARE: given 100 ± 10 and 1 ± 1, is the combined result 2 ± 1 (the naive weighted average) or 50.5 ± 5 (from combining the underlying data)?

Difference between averaging and adding. An isolated island with conservative inhabitants: how many married people are there? Number of married men = 100 ± 5 K; number of married women = 80 ± 30 K. Adding: Total = 180 ± 30 K. CONTRAST: the weighted average is 99 ± 5 K, giving Total = 198 ± 10 K. GENERAL POINT: adding (uncontroversial) theoretical input can improve the precision of the answer. Compare "kinematic fitting".
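The island numbers worked through in a short sketch (my addition, plain Python): the "theoretical input" is that the number of married men equals the number of married women.

```python
import math

men, s_men = 100.0, 5.0       # married men  [thousands]
women, s_women = 80.0, 30.0   # married women [thousands]

# (a) simply add the two counts
total = men + women
print(f"add:     {total:.0f} +/- {math.hypot(s_men, s_women):.0f} K")   # 180 +/- 30 K

# (b) impose #married men = #married women: weighted average, then double it
w1, w2 = 1 / s_men**2, 1 / s_women**2
avg = (w1 * men + w2 * women) / (w1 + w2)
s_avg = (w1 + w2) ** -0.5
print(f"average: {avg:.0f} +/- {s_avg:.0f} K")                          # 99 +/- 5 K
print(f"total:   {2 * avg:.0f} +/- {2 * s_avg:.0f} K")                  # ~199 +/- 10 K (slide rounds to 198 +/- 10 K)
```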

Binomial Distribution. Fixed N independent trials, each with the same probability of success p: what is the probability of s successes? e.g. Throw a die 100 times, with success = '6': what is the probability of 0, 1, …, 49, 50, 51, …, 99, 100 successes? Efficiency of track reconstruction = 98%: for 500 tracks, the probability that 490, 491, …, 499, 500 are reconstructed. Angular distribution is 1 + 0.7 cosθ? Probability of 52/70 events with cosθ > 0? (The more interesting question is the statistics one.)

P_s = [N!/((N−s)! s!)] p^s (1−p)^(N−s), as is obvious. Expected number of successes = Σ s P_s = Np, as is obvious. Variance of the number of successes = Np(1−p): variance ≈ Np for p ≈ 0 and ≈ N(1−p) for p ≈ 1, so it is NOT Np in general, and NOT s ± √s. e.g. 100 trials with 99 successes is NOT 99 ± 10.
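A sketch (my addition, assuming scipy) putting numbers to the track-reconstruction example above and to the 99-successes warning.

```python
from scipy.stats import binom

# 500 tracks at 98% reconstruction efficiency: P(exactly 490 reconstructed)
print(binom.pmf(490, 500, 0.98))           # ~0.13

# 100 trials with p = 0.99: mean Np = 99, variance Np(1-p) = 0.99 (so NOT 99 +/- 10)
N, p = 100, 0.99
print(binom.mean(N, p), binom.var(N, p))   # 99.0  0.99
```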

Statistics: estimate p and σ_p from s (and N): p = s/N, σ_p² = (1/N)(s/N)(1 − s/N). If s = 0, is p = 0 ± 0 ? If s = N, is p = 1.0 ± 0 ? Limiting cases: ● p = const, N → ∞: Binomial → Gaussian with μ = Np, σ² = Np(1−p). ● N → ∞, p → 0, Np = const: Binomial → Poisson with μ = Np, σ² = Np. {N.B. the Gaussian is continuous and extends to −∞.}
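A numerical illustration (my addition, assuming scipy) of the two limiting cases; the values of N, p and k are arbitrary choices.

```python
from scipy.stats import binom, norm, poisson

# p fixed, N large: Binomial -> Gaussian with mu = Np, sigma^2 = Np(1-p)
N, p, k = 1000, 0.3, 310
print(binom.pmf(k, N, p), norm.pdf(k, N * p, (N * p * (1 - p)) ** 0.5))   # both ~0.022

# N large, p small, Np fixed: Binomial -> Poisson with mu = Np
N, p, k = 10_000, 0.0005, 7
print(binom.pmf(k, N, p), poisson.pmf(k, N * p))                          # both ~0.10
```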

Binomial Distributions

Poisson Distribution. Probability of n independent events occurring in time t when the rate r is constant, e.g. events in a bin of a histogram; NOT radioactive decay for t ~ τ. Limit of the Binomial (N → ∞, p → 0, Np → μ). P_n = e^(−rt)(rt)^n/n! = e^(−μ)μ^n/n! (μ = rt). ⟨n⟩ = rt = μ (no surprise!), σ_n² = μ, hence "n ± √n". BEWARE 0 ± 0 ? μ → ∞: Poisson → Gaussian, with mean = μ and variance = μ. Important for χ².
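A brief sketch (my addition, assuming scipy and numpy; μ = 3.2 is an arbitrary example rate) evaluating the Poisson probabilities and checking that the mean and variance are both μ.

```python
import numpy as np
from scipy.stats import poisson

mu = 3.2                                   # hypothetical expected count r*t for a histogram bin
n = np.arange(10)
print(poisson.pmf(n, mu))                  # P(0), P(1), ... = exp(-mu) * mu**n / n!
print(poisson.mean(mu), poisson.var(mu))   # both equal mu, hence the "n +/- sqrt(n)" rule of thumb
```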

For your thought. Poisson: P_n = e^(−μ)μ^n/n!, so P₀ = e^(−μ), P₁ = μe^(−μ), P₂ = (μ²/2)e^(−μ). For small μ, P₁ ~ μ and P₂ ~ μ²/2. If the probability of 1 rare event is ~ μ, why isn't the probability of 2 events ~ μ² ?

Poisson Distributions Approximately Gaussian

Gaussian or Normal. Significance of σ: i) the RMS of the Gaussian = σ (hence the factor of 2 in the definition of the Gaussian); ii) at x = μ ± σ, y = y_max/√e ≈ 0.606 y_max (i.e. σ = half-width at 'half'-height); iii) the fractional area within μ ± σ = 68%; iv) the height at the maximum = 1/(σ√(2π)).
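A quick numerical check of these properties (my addition, assuming scipy and numpy).

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0
ymax = norm.pdf(mu, mu, sigma)
print(ymax, 1 / (sigma * np.sqrt(2 * np.pi)))   # height at the maximum
print(norm.pdf(mu + sigma, mu, sigma) / ymax)   # ~0.607 = 1/sqrt(e) at x = mu +/- sigma
print(norm.cdf(1) - norm.cdf(-1))               # ~0.683: fractional area within +/- 1 sigma
```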

Area in tail(s) of Gaussian (figure of tail areas shown on the slide, e.g. 0.002)
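The tail areas themselves are easy to reproduce; a sketch of my own, assuming scipy.

```python
from scipy.stats import norm

for k in (1, 2, 3, 4, 5):
    two_sided = 2 * norm.sf(k)   # sf = 1 - cdf, the one-sided tail area beyond k sigma
    print(f"beyond +/-{k} sigma: {two_sided:.2e}")
# ~3.2e-01, 4.6e-02, 2.7e-03, 6.3e-05, 5.7e-07
```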

Relevant for Goodness of Fit