Statistics (3) Errors. Roger Barlow, Manchester University. IDPASC school, Sesimbra, 15th December 2010.

Summary
Central Limit Theorem
Gaussian and non-Gaussian
Errors from numerical ML
Many parameters
Limitations
Correlations
Systematic Errors
Checks

The Central Limit Theorem
Suppose a random variable x is the sum of several independent, identically (or similarly) distributed variables x1, x2, x3 ... xN. Then:
(1) The mean of the distribution for x is the sum of the means: μ = μ1 + μ2 + μ3 + ... + μN.
(2) The variance of the distribution for x is the sum of the variances: V = V1 + V2 + V3 + ... + VN.
(3) The distribution for x becomes Gaussian for large N.

CLT - why?
The proof is fun but a bit long (see the book). Illustration: take the uniform distribution in the range 0 to 1. One on its own is rectangular. Two added give a triangular distribution in the range 0 to 2. Three added start with a concave parabola, switch to a convex one (flat at the top), then back down. Twelve added give something so Gaussian that it is used to generate Gaussian random numbers.
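The illustration above is easy to check numerically. The sketch below (a toy simulation, not from the lecture) sums uniform variables and verifies all three parts of the CLT: the means add, the variances add, and the 12-fold sum minus 6 behaves like a standard Gaussian.

```python
import random
import statistics

random.seed(42)

def uniform_sum(n_terms, n_samples=100_000):
    """Sum of n_terms independent U(0,1) variables, sampled n_samples times."""
    return [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

# CLT parts (1) and (2): means and variances add.
# Each U(0,1) has mean 1/2 and variance 1/12.
s12 = uniform_sum(12)
mean12 = statistics.fmean(s12)      # expect 12 * 1/2 = 6
var12 = statistics.variance(s12)    # expect 12 * 1/12 = 1

# Part (3): subtracting 6 gives an approximately standard Gaussian --
# the classic cheap normal-number generator mentioned above.
z = [x - 6 for x in s12]
```

About 68.3% of the z values fall within one unit of zero, as a Gaussian would predict.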

Thoughts on the CLT, parts 1, 2, 3
(1) is obvious. (2) is simple, and explains 'adding errors in quadrature'. (1) and (2) do not depend on the form of the distribution. (3) explains why Gaussians are 'normal'. If you find a distribution which is not Gaussian, there must be a reason: probably one contribution dominates.

Application of the CLT
If a variable has a non-Gaussian pdf you can still apply parts (1) and (2): adding variances, using combination of errors, etc. The only thing you can't do is equate deviations with confidence regions (68% within one sigma, etc.). However, your variable is probably intermediate, and will be a contribution to some final result which is Gaussian by (3). So carry on: non-Gaussian distributions hold no terrors.

Errors from Likelihood
Estimate a model parameter M by maximising the likelihood. In the large N limit:
i) this is unbiased and efficient
ii) the error is given by σ² = -1 / (d²ln L/dM²)
iii) ln L is a parabola
iv) we can approximate ln L by that parabola
v) read off σ from Δln L = -½
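The Δln L = -½ recipe can be tried out on a toy lifetime measurement (hypothetical data, generated here for illustration). The sketch scans the exponential-decay log-likelihood for the two crossings half a unit below the maximum, and compares the resulting interval with the parabolic error σ = τ̂/√N from the curvature.

```python
import math
import random

random.seed(1)
true_tau = 1.0
N = 100
data = [random.expovariate(1.0 / true_tau) for _ in range(N)]

def lnL(tau):
    """Exponential-decay log-likelihood (constant terms dropped)."""
    return -N * math.log(tau) - sum(data) / tau

tau_hat = sum(data) / N            # the analytic ML estimate
lnL_max = lnL(tau_hat)

def crossing(lo, hi, target):
    """Bisect for lnL(tau) = target between lo and hi."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (lnL(mid) - target) * (lnL(lo) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

target = lnL_max - 0.5
tau_lo = crossing(0.1 * tau_hat, tau_hat, target)   # lower 68% limit
tau_hi = crossing(tau_hat, 3.0 * tau_hat, target)   # upper 68% limit
sigma_parab = tau_hat / math.sqrt(N)                # parabolic error for comparison
```

For N = 100 the two half-widths are close to σ but not equal: the likelihood is slightly asymmetric, which is exactly where asymmetric errors come from.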

Neat way to find Confidence Intervals
Take Δln L = -½ for 68% CL (1σ), Δln L = -2 for 95.4% CL (2σ), or whatever you choose; 2-sided or 1-sided.

For finite N
None of the above are true. Never mind! We could transform from M to some M' in which ln L is parabolic, find the limits, and transform back. This would give Δln L = -½ for 68% CL etc. as before, and hence asymmetric errors. Everybody does this.

Is it valid?
Try and see with a toy model (a lifetime measurement) where we can do the Neyman construction, for various numbers of measurements N, normalised to unit lifetime. There are some quite severe differences!

More dimensions
Suppose 2 uncorrelated parameters, a and b. For fixed b, Δln L = -½ will give a 68% CL region for a; and likewise, fixing a, for b. The confidence level for the square is 0.68 x 0.68 = 46%. The confidence level for the ellipse (the ln L contour) is 39%. Jointly, Δln L = -½ gives a 39% CL region; for 68% you need Δln L = -1.15.

More dimensions, other limits
Generalisation to correlated Gaussians is straightforward, and so is generalisation to more variables: you need a larger Δln L for a given CL. It is useful to write -2Δln L = Δχ². Given a multidimensional Gaussian, ln L = -χ²/2, hence you can also use Δχ² = 1 for errors. Typical thresholds (Δχ² such that the upper-tail probability P(χ²; Ndim) = 1 - CL):
1 parameter: Δχ² = 1.00 (68.3%), 4.00 (95.4%), 9.00 (99.73%)
2 parameters: Δχ² = 2.30 (68.3%), 6.18 (95.4%), 11.83 (99.73%)
Careful! -2Δln L obeys a χ² distribution only in the large N limit.
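The Δχ² thresholds for multi-parameter contours can be computed without tables. A minimal sketch using only the standard library: for one parameter the threshold is the square of the two-sided z value, and for two parameters the χ² CDF is 1 - exp(-x/2), which inverts in closed form.

```python
import math
from statistics import NormalDist

def delta_chi2_1d(cl):
    """Delta chi^2 for one parameter: square of the two-sided z value."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + cl))
    return z * z

def delta_chi2_2d(cl):
    """For two parameters the chi^2 CDF is 1 - exp(-x/2): invert directly."""
    return -2.0 * math.log(1.0 - cl)

# Delta lnL = -Delta chi^2 / 2, so the 2D 68% contour is at Delta lnL = -1.15
for cl in (0.683, 0.954):
    print(cl, delta_chi2_1d(cl), delta_chi2_2d(cl))
```

This reproduces the numbers quoted on the slide: Δχ² = 1 (so Δln L = -½) for one parameter at 68%, and Δχ² = 2.30 (Δln L = -1.15) for two.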

Small N, non-Gaussian measurements
No longer ellipses/ellipsoids. Use Δln L to define confidence regions, mapping out the contours. Probably not totally accurate, but universal.

What's the alternative? Toy Monte Carlo
You have a dataset. Take a point M in parameter space: is it inside or outside the 68% (or whatever) contour? Find T = ln L(M̂) - ln L(M); clearly small T is 'good'. Generate many MC datasets using M. How often is T_MC > T_data? If more than 32%, M is in the 68% contour. We are ordering the points by their value of T (the likelihood ratio): almost ln L contours, but not quite.
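The toy Monte Carlo procedure can be sketched for the lifetime example (hypothetical data generated in the code; the dataset size and tested points are illustrative). M is inside the 68% region when more than 1 - CL = 32% of the toys fit worse than the data, since a well-fitting M makes most toys look worse than the data.

```python
import math
import random

random.seed(7)

def lnL(tau, data):
    """Exponential-decay log-likelihood (constant terms dropped)."""
    return -len(data) * math.log(tau) - sum(data) / tau

def T(M, data):
    """Likelihood-ratio statistic: small T means M fits the data well."""
    tau_hat = sum(data) / len(data)
    return lnL(tau_hat, data) - lnL(M, data)

# 'Data': 20 decay times with true lifetime 1.0 (a toy, not real data)
data = [random.expovariate(1.0) for _ in range(20)]

def in_contour(M, data, cl=0.68, n_toys=2000):
    """Is parameter point M inside the cl contour, by toy Monte Carlo?"""
    t_data = T(M, data)
    exceed = 0
    for _ in range(n_toys):
        toy = [random.expovariate(1.0 / M) for _ in range(len(data))]
        if T(M, toy) > t_data:
            exceed += 1
    return exceed / n_toys > 1.0 - cl   # more than 32% of toys worse -> inside
```

The best-fit point itself (T_data = 0) is always inside, and points far from it drop out, without ever assuming the large-N χ² behaviour.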

Correlated Errors
Given some f(x,y):
σ_f² = (∂f/∂x)² σ_x² + (∂f/∂y)² σ_y² + 2 (∂f/∂x)(∂f/∂y) Cov(x,y)
Also in matrix form, with G the vector of derivatives: V_f = G V G^T.
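Combination of errors with a covariance term is easy to check by simulation. A minimal sketch (hypothetical variances and covariance): for a linear function the formula is exact, so a toy Monte Carlo with correlated Gaussians should reproduce it.

```python
import random
import statistics

random.seed(3)

# Assumed covariance matrix of (x, y): variances 1.0 and 4.0, covariance 1.2
Vx, Vy, Vxy = 1.0, 4.0, 1.2

# Linear function f(x, y) = x + 2y, so the gradient is G = (1, 2) and
# sigma_f^2 = G V G^T = Vx + 4*Vy + 2*(1)*(2)*Vxy
var_f_formula = Vx + 4 * Vy + 4 * Vxy

# Monte Carlo check: build correlated (x, y) from two independent Gaussians
a = Vxy / Vx                              # regression coefficient of y on x
resid_sd = (Vy - a * a * Vx) ** 0.5       # sd of the uncorrelated residual
samples = []
for _ in range(200_000):
    x = random.gauss(0, Vx ** 0.5)
    y = a * x + random.gauss(0, resid_sd)
    samples.append(x + 2 * y)
var_f_mc = statistics.variance(samples)
```

Dropping the covariance term here would underestimate the variance by 4*Vxy = 4.8 out of 21.8, which is why the correlation cannot simply be ignored.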

Example
Collect N_T events: N_F forwards and N_B backwards. Want the error on R = N_F / N_T. Everything is Poisson; F and B are uncorrelated, but F and T are correlated:
Cov(N_F, N_T) = <N_F N_T> - <N_F><N_T> = V(N_F) = N_F, using N_T = N_F + N_B,
hence ρ = √(N_F / N_T).

Continued...
Using R = N_F / N_T with the covariance term, or using R = N_F / (N_F + N_B) with independent N_F and N_B, both routes give the same answer:
σ_R² = R(1 - R) / N_T
i.e. the binomial error on the fraction.
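For Poisson forward/backward counts, both propagation routes give σ_R = √(R(1-R)/N_T), the binomial error on the fraction. A toy Monte Carlo (hypothetical expected counts of 40 forward and 60 backward) confirms this directly.

```python
import math
import random
import statistics

random.seed(11)
mu_F, mu_B = 40.0, 60.0            # hypothetical expected forward/backward counts

def poisson(mu):
    """Poisson sample by Knuth's multiplication method (fine for moderate mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

ratios = []
for _ in range(20_000):
    F, B = poisson(mu_F), poisson(mu_B)
    ratios.append(F / (F + B))
sd_toy = statistics.stdev(ratios)

R = mu_F / (mu_F + mu_B)
sd_binomial = math.sqrt(R * (1 - R) / (mu_F + mu_B))   # sqrt(R(1-R)/N_T)
```

Naively adding the errors on N_F and N_T in quadrature, ignoring their correlation, would overestimate the spread; the toy matches the binomial formula.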

Systematic Errors
"Systematic errors arise from neglected effects such as incorrectly calibrated equipment." Agree or disagree?

Systematic Errors
"Systematic errors arise from neglected effects such as incorrectly calibrated equipment." FALSE! A neglected effect is a MISTAKE, and a MISTAKE is not an ERROR (as we tell the undergraduates on day 1). So what are they?

Systematic Errors
Analysis of your results involves a whole set of numerical factors: efficiencies, magnetic fields, dimensions, calibrations... Occasionally these are implicit, and those are especially dangerous. All these numbers have an associated uncertainty. These uncertainties are the systematic errors. They obey all the usual error laws, but they affect all your measurements.

Examples
The magnetic field in p = 0.3 B R. Calorimeter energy calibration. The 'jet energy scale'. Detector efficiency... If you can't think of the ones for your experiment, ask a colleague with a talent for destructive criticism. (There are plenty around.)

How it works...
Consider the effect of an uncertainty in B on the error matrix for two momentum measurements. The errors on p1 and p2 are given by simple combination of errors, but there is also a covariance/correlation term: an error in B affects both momentum measurements the same way.
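A minimal sketch of that error matrix, with hypothetical numbers: since p = 0.3 B R, we have ∂p/∂B = p/B, so a fractional field uncertainty fB contributes p_i p_j fB² to every matrix element, on top of the independent resolution errors on the diagonal.

```python
import math

# Two momentum measurements with independent resolution errors plus a shared
# field-scale uncertainty (all numbers hypothetical, for illustration only).
p = [10.0, 50.0]        # GeV: the two measured momenta
s_res = [0.2, 1.5]      # independent resolution errors on each
fB = 0.01               # assumed 1% fractional uncertainty on B

# V_ij = p_i * p_j * fB^2  (common field term)  +  s_i^2 on the diagonal
V = [[p[i] * p[j] * fB * fB + (s_res[i] ** 2 if i == j else 0.0)
      for j in range(2)] for i in range(2)]

rho = V[0][1] / math.sqrt(V[0][0] * V[1][1])   # induced correlation
```

The off-diagonal term is positive: when B is off, both momenta shift together, and any quantity built from the two measurements must use the full matrix, not just the diagonal.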

More complicated...
Many properties of the reconstruction don't work through simple algebra. Example: the background to your signal is simulated by Monte Carlo containing several (?) adjustable parameters. Work numerically: run the standard MC, then adjust a parameter by +σ and repeat, then by -σ and repeat. Read off the error from the shift in the result. If you can convince yourself that the 3 points are a straight line, then do so.
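The shift-and-repeat recipe in code, with a hypothetical smooth stand-in for the full analysis chain (in practice result() would be an expensive re-run of the Monte Carlo, not a formula):

```python
# Stand-in for the full analysis as a function of one MC parameter m
# (a hypothetical dependence, for illustration only)
def result(m):
    return 5.0 + 0.8 * m + 0.05 * m * m

m0, sigma_m = 2.0, 0.5          # central value and uncertainty of the parameter
r0 = result(m0)
r_up = result(m0 + sigma_m)     # re-run with the parameter shifted up
r_dn = result(m0 - sigma_m)     # ... and down

# Symmetrised systematic error: half the up-down spread
syst = 0.5 * abs(r_up - r_dn)

# Linearity check: if the three points lie on a straight line this is zero
curvature = r_up + r_dn - 2.0 * r0
```

If the curvature is comparable to the spread, the straight-line assumption fails and the up and down shifts should be quoted as asymmetric errors instead.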

Important guide
Don't sweat the small stuff! Errors add in quadrature, so go for the biggest. Reducing the small errors still further is a waste of your energy:
√(10² + 2²) = 10.20
√(10² + 1²) = 10.05

Checks
Check your result by altering features which should make no (significant) difference; this adds to its credibility. Run on subsets of the data (time periods, etc.). Change the cuts on quality and kinematic quantities. Check that a full-blown analysis on simulated data returns the physics you put in. Repeat until you (and your supervisor and the review committee) really believe it.

'Significant'
Repeating with some difference in technique will give a different result, and you have to decide whether the difference is significant. "Within errors" may be over-generous, as the results share the same data (or some of it). Subtraction in quadrature is one way. Basic result: 12.3 ± 0.4. Check: 11.7 ± 0.5. Compare the difference 0.6 against √(0.5² - 0.4²) = 0.3.

Results of checks
If the analysis passes the check with a small difference: tick the box and move on. Do not fold that small difference into the systematic error. If the analysis fails the check:
1) Check the check
2) Check the analysis and find the problem
3) Maybe convince yourself that this 'harmless' change could cause a systematic shift, and devise an appropriate error
Do not fold the difference into the systematic error.

Source of confusion
Two tables, similar yet different.
Vary by 1 sigma, and include the results in the systematic errors:
- Energy scale
- Magnetic field
- Trigger efficiency
- MC parameters
- ...
Vary by an arbitrary amount, but do not include the results in the systematic errors:
- Energy cut
- Lepton quality
- Isolation
- ...

Summary
Central Limit Theorem
Gaussian and non-Gaussian
Errors from numerical ML
Many parameters
Limitations
Correlations
Systematic Errors
Checks

Final thought
Statistics is a science, not an art. There is a reason for everything: understand what you are doing and why. Cheap computing is opening up many new ways of doing things. Use it! There is a lot of bad practice out there. Do not take the advice of your supervisor/senior colleague/professor as infallible.