Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets Advanced Statistical Techniques in Particle Physics Grey College, Durham 18 -


Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets Advanced Statistical Techniques in Particle Physics Grey College, Durham March 2002 M. J. Wang Institute of Physics Academia Sinica

Preface Motivation and gratitude – Learned quite a lot at the workshop on confidence limits at Fermilab in 2000 – Thanks for hosting this conference Main title: Detect Unknown Systematic Effect – More suitable to the aim of this conference – Important for experimentalists – Might be able to detect it in a global fit Sub-title: Diagnose bad fit to multiple data sets – Global fit is not internally consistent – Don't know which part is wrong – Need to diagnose the data sample

Outline Introduction Global fit and its goodness of fit Parameter fitting criterion Diagnose bad fit to multiple data sets Conclusion

Introduction Knowledge of parton distribution functions is essential for hadron collider research Global fit is used to obtain parton distribution functions Uncertainties of parton distribution function parameters – Precision hadron collider results require estimates of the uncertainties of parton distribution function parameters – Important for Fermilab Run II and LHC physics analyses

Introduction Knowledge of parton distribution functions is essential for hadron collider research – Interpretation of data with the SM – Precision measurement of SM parameters – Search for signals beyond the SM Global fit is used to obtain parton distribution functions – Non-perturbative parton distribution functions cannot be determined by perturbative QCD (PQCD) – Therefore, they are determined by global fit

Global fit and goodness of fit Reliable parton distribution function parameter and uncertainty estimates require passing a goodness-of-fit criterion – Total chi-square is used for goodness of fit – N +/- sqrt(2N) is used as the accepted range for N data points Is total chi-square good enough for goodness of fit? – Total chi-square is insensitive to a small subset of data with a bad fit Is there any way to get a more stringent criterion? – Need a new idea
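The insensitivity of the total chi-square can be illustrated with expected values alone. The sketch below is not taken from the talk: the sample sizes (1000 well-described points, 20 badly fitted ones) and the one-sigma shift are hypothetical, and the number of fit parameters is ignored so that N is simply the number of points.

```python
import math

def accepted_range(n_points):
    """Accepted chi-square range N +/- sqrt(2N) for n_points data points."""
    half_width = math.sqrt(2.0 * n_points)
    return n_points - half_width, n_points + half_width

def expected_chi2(n_points, shift_in_sigma=0.0):
    """Expected chi-square of n_points Gaussian points displaced by a
    common shift (in units of sigma): each point contributes 1 + shift^2."""
    return n_points * (1.0 + shift_in_sigma ** 2)

def within_range(chi2, n_points):
    low, high = accepted_range(n_points)
    return low <= chi2 <= high

# Hypothetical numbers, chosen only for illustration.
n_good, n_bad, shift = 1000, 20, 1.0

chi2_subset = expected_chi2(n_bad, shift)
chi2_total = expected_chi2(n_good) + chi2_subset

total_ok = within_range(chi2_total, n_good + n_bad)   # total passes
subset_ok = within_range(chi2_subset, n_bad)          # subset alone fails

print(f"total : chi2 = {chi2_total:.0f}, accepted = {total_ok}")
print(f"subset: chi2 = {chi2_subset:.0f}, accepted = {subset_ok}")
```

Under these assumptions the total chi-square sits inside its accepted range while the 20-point subset, judged on its own, lies well outside its range.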

Parameter fitting criterion Idea motivated by Louis Lyons's goodness-of-fit paradox at ACAT 2000 J.C. Collins and J. Pumplin applied this idea to the goodness of fit for global fit – Hypothesis-testing vs parameter-fitting criteria – Subset chi-square against total chi-square – Found inconsistent data sets among the CTEQ5 data sets Still don't know which part is correct and which is wrong

Parameter fitting criterion – Hypothesis-testing vs parameter-fitting criteria ( cited from J.C. Collins, J. Pumplin, hep-ph/ , p. 3 )

Parameter fitting criterion – Subset chi-square against total chi-square ( cited from J.C. Collins, J. Pumplin, hep-ph/ , p. 10 )

Parameter fitting criterion – Found inconsistent data sets in CTEQ5 data ( cited from J.C. Collins, J. Pumplin, hep-ph/ , p. 13 )

Diagnose bad fit to multiple data sets Importance of studying bad fit – Is the inconsistent data set free of unknown systematic effects? – Is the theoretical prediction adequate? – Is there any hint of new physics? Is there any statistic for diagnostic purposes? – Pull can be used to identify an inconsistent experiment or data point ( thanks to F. James's "Statistical methods in experimental physics" ) – But for real data, there is no measured pull distribution for each data point – What should we do with pull?

Diagnose bad fit to multiple data sets Pull definition for each data point: Mi = Ti + ( random error ); Ri = Ti - Mi = -( random error ); Pi = Ri / sigma( Ri ) Pull properties – Gaussian shape – Centered at zero – Unit variance – Independence among pulls of different data points
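The pull definition and its properties can be checked with a small toy Monte Carlo. This is a minimal sketch, not the code behind the talk: it assumes Ti is known exactly, so that sigma( Ri ) equals the measurement error, and the values 5.0 and 0.5 are arbitrary.

```python
import numpy as np

def pulls(theory, measured, sigma):
    """P_i = R_i / sigma(R_i) with R_i = T_i - M_i.

    Assumption: the theory value is exact, so sigma(R_i) is just the
    measurement error sigma_i.
    """
    residual = theory - measured
    return residual / sigma

rng = np.random.default_rng(0)
n_entries = 10_000

theory = np.full(n_entries, 5.0)            # arbitrary T_i
sigma = np.full(n_entries, 0.5)             # arbitrary random error size
measured = theory + rng.normal(0.0, sigma)  # M_i = T_i + (random error)

p = pulls(theory, measured, sigma)
print(f"mean of pulls = {p.mean():+.3f}, std of pulls = {p.std():.3f}")
```

With purely random errors the pull sample should be compatible with zero mean and unit width, which is the baseline against which a systematic effect is later judged.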

Diagnose bad fit to multiple data sets Systematic effects introduce correlation among pulls – Constant shift on all data points – Correlated shift on all data points

Diagnose bad fit to multiple data sets Correlation among pulls is the key for detecting unknown systematic effects Pull correlation study – Pull distribution consists of all data points in one experiment( experiment pull distribution ) – Pull as a function of measurement variable X
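A pull profile as a function of X can be built by averaging the pulls in bins of X. The sketch below is illustrative only: the curve T(x) = 1 + x^2, the constant error 0.1, and the horizontal offset 0.05 are hypothetical choices, and `pull_profile` is a helper name introduced here.

```python
import numpy as np

def pull_profile(x, pulls, n_bins=10):
    """Mean pull in equal-width bins of x.

    Returns bin centers, mean pull per bin, and the expected error on the
    mean (1/sqrt(n) for unit-variance pulls). Empty bins give nan / inf.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    sums, _ = np.histogram(x, bins=edges, weights=pulls)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts
        err = 1.0 / np.sqrt(counts)
    return centers, mean, err

def true_curve(x):
    return 1.0 + x ** 2   # hypothetical curve, for illustration only

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 10_000)
sigma, dx = 0.1, 0.05     # hypothetical error size and horizontal offset

# The data follow the curve at x + dx, but are compared with the curve at x.
measured = true_curve(x + dx) + rng.normal(0.0, sigma, size=x.size)
p = (true_curve(x) - measured) / sigma

centers, mean, err = pull_profile(x, p)
for c, m, e in zip(centers, mean, err):
    print(f"x = {c:4.2f}  mean pull = {m:+.2f} +/- {e:.2f}")
```

In this toy the mean pull drifts with X because the effect of a horizontal offset scales with the local slope of the curve, so the profile shows a trend that a single overall pull histogram would only show as a broadening.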

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties: Mi = Ti + ( random error ) + Si ( or S ); Ri = Ti - Mi = -( random error ) - Si ( or S ); Pi = Ri / sigma( Ri ), where Si ( or S ) is the unknown systematic shift
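For a constant shift S, every pull is displaced by the same amount -S / sigma, so the mean pull of one experiment is a simple test statistic. The following sketch assumes an exactly known Ti and unit random errors; the shift of one sigma and the 100 entries are hypothetical, and the function names are introduced here for illustration.

```python
import numpy as np

def simulate_pulls(n_points, shift, sigma=1.0, seed=0):
    """Pulls for M_i = T_i + (random error) + S with a constant shift S."""
    rng = np.random.default_rng(seed)
    theory = np.zeros(n_points)
    measured = theory + rng.normal(0.0, sigma, size=n_points) + shift
    return (theory - measured) / sigma

def mean_pull_significance(pulls):
    """Mean pull in units of its expected error, 1/sqrt(n), which holds
    for independent unit-variance pulls."""
    return pulls.mean() * np.sqrt(pulls.size)

unshifted = simulate_pulls(100, shift=0.0)
shifted = simulate_pulls(100, shift=1.0)   # hypothetical one-sigma shift

print(f"no shift  : {mean_pull_significance(unshifted):+.1f} sigma")
print(f"with shift: {mean_pull_significance(shifted):+.1f} sigma")
```

The significance grows like sqrt(n), so a shift that is invisible in a single data point can become clear once all points of an experiment are combined.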

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( MC data vs true curve )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( residual dis. of first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( 10% uncertainty on error estimate of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( pull dis. of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( effect of error estimate uncertainties 0%,10%,20% on pull dis. with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift ( experiment residual and pull dis. with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift ( experiment residual and pull profiles as function of X with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( experiment residual and pull dis. with 100 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 1. Constant horizontal shift( experiment residual and pull profile as function of X with 100 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( MC data vs true curve )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( residual dis. of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift ( pull dis. of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( experiment residual and pull dis. with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( experiment residual and pull profile as function of X with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( experiment residual and pull dis. as function of X with 100 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 2. Constant vertical shift( experiment residual and pull profiles as function of X with 100 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( MC data vs true curve )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( residual dis. of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( pull dis. of the first 6 channels with 10,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( experiment residual and pull dis. with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( experiment residual and pull profiles as function of X with 100,000 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( experiment residual and pull dis. as function of X with 100 entries )

Diagnose bad fit to multiple data sets Naive case without known systematic uncertainties – Representative systematic shifts 3. Combined horizontal and vertical shift ( experiment residual and pull profiles as function of X with 100 entries )

Diagnose bad fit to multiple data sets Real case with known systematic uncertainties: Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S ); Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S ); Pi = Ri / sigma( Ri )

Diagnose bad fit to multiple data sets Real case with known systematic uncertainties – Need to take out known systematic uncertainty term in order to restore the independence property – Need to fit the residual systematic effect with the aid of global fit – Regain the naive case results
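One common way to take out a known correlated systematic term is to fit a nuisance parameter for it and subtract the fitted shift before forming pulls. The sketch below is an assumption about how this step could look, not the procedure used in the talk: it models a single known systematic as lambda * beta_i with a unit Gaussian penalty on lambda, and all numbers and function names are hypothetical.

```python
import numpy as np

def fit_known_shift(theory, measured, sigma, beta):
    """Analytic minimum in lambda of
        chi2 = sum_i ((M_i - T_i - lambda*beta_i) / sigma_i)^2 + lambda^2,
    i.e. one known correlated systematic of size beta_i per point."""
    diff = measured - theory
    a = np.sum(beta ** 2 / sigma ** 2)
    b = np.sum(beta * diff / sigma ** 2)
    return b / (1.0 + a)

def corrected_pulls(theory, measured, sigma, beta):
    """Pulls after removing the fitted known systematic shift."""
    lam = fit_known_shift(theory, measured, sigma, beta)
    return (theory - (measured - lam * beta)) / sigma

rng = np.random.default_rng(3)
n_points = 50
theory = np.zeros(n_points)
sigma = np.ones(n_points)
beta = np.linspace(0.5, 1.5, n_points)   # hypothetical systematic sizes
lam_true = 1.5                           # hypothetical true shift parameter

measured = theory + rng.normal(0.0, sigma) + lam_true * beta

lam_hat = fit_known_shift(theory, measured, sigma, beta)
raw = (theory - measured) / sigma
fixed = corrected_pulls(theory, measured, sigma, beta)

print(f"fitted lambda = {lam_hat:.2f}")
print(f"mean pull before = {raw.mean():+.2f}, after = {fixed.mean():+.2f}")
```

After the subtraction the pulls are again dominated by the random errors, so any remaining structure can be examined as in the naive case.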

Conclusion Global fit is important in determining parton distribution function parameters and their uncertainties Inconsistent data samples are found by the parameter-fitting criterion Studying correlations among pulls could be a technique for detecting unknown systematic effects Will implement this technique and apply it to global fit