1
Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets
Advanced Statistical Techniques in Particle Physics
Grey College, Durham, 18 - 22 March 2002
M. J. Wang
Institute of Physics, Academia Sinica
2
Preface
Motivation and gratitude
– Learned quite a lot at the workshop on confidence limits at Fermilab in 2000
– Thanks for hosting this conference
Main title: Detect Unknown Systematic Effect
– More suited to the aim of this conference
– Important for experimentalists
– Might be detectable in a global fit
Sub-title: Diagnose bad fit to multiple data sets
– The global fit is not internally consistent
– Which part is wrong is not known
– Need to diagnose the data samples
3
Outline
Introduction
Global fit and its goodness of fit
Parameter fitting criterion
Diagnose bad fit to multiple data sets
Conclusion
4
Introduction
Knowledge of parton distribution functions is essential for hadron collider research
Global fit is used to obtain parton distribution functions
Uncertainties of parton distribution function parameters
– Precision hadron collider results require estimates of the uncertainties of parton distribution function parameters
– Important for Fermilab Run II and LHC physics analyses
5
Introduction
Knowledge of parton distribution functions is essential for hadron collider research
– Interpretation of data within the SM
– Precision measurement of SM parameters
– Search for signals beyond the SM
Global fit is used to obtain parton distribution functions
– Non-perturbative parton distribution functions cannot be calculated in perturbative QCD (PQCD)
– Therefore, they are determined by a global fit
6
Global fit and goodness of fit
Reliable parton distribution function parameter and uncertainty estimates require passing a goodness-of-fit criterion
– Total chi-square is used for goodness of fit
– A deviation within +/- sqrt(2N) of the expected value N is used as the accepted range
Is total chi-square good enough for goodness of fit?
– Total chi-square is insensitive to a small subset of data with a bad fit
Is there a more stringent criterion?
– Need a new idea
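A minimal numerical sketch of why the total chi-square can hide a badly fitted subset. The point counts and chi-square values are hypothetical, chosen only for illustration, and the number of fitted parameters is ignored (N is taken as the number of data points).

```python
import math

def chi2_deviation(chi2, n_points):
    """Distance of chi2 from its expected value n_points, in units of sqrt(2 * n_points)."""
    return (chi2 - n_points) / math.sqrt(2.0 * n_points)

# Hypothetical numbers, not taken from any real global fit.
n_good, chi2_good = 990, 990.0   # bulk of the data, typical fit quality
n_bad, chi2_bad = 10, 40.0       # small subset that is fitted badly

total_dev = chi2_deviation(chi2_good + chi2_bad, n_good + n_bad)
subset_dev = chi2_deviation(chi2_bad, n_bad)

print(f"total : {total_dev:+.2f} x sqrt(2N)")   # inside the accepted range
print(f"subset: {subset_dev:+.2f} x sqrt(2N)")  # far outside its own range
```

The same excess of 30 units of chi-square is small on the scale of the full data set but large on the scale of the 10-point subset.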
7
Parameter fitting criterion
Idea motivated by Louis Lyons’s goodness of fit paradox at ACAT 2000
J.C. Collins and J. Pumplin applied this idea to the goodness of fit for the global fit
– Hypothesis-testing vs parameter-fitting criteria
– Subset chi-square against total chi-square
– Found inconsistent data sets among the CTEQ5 data sets
Still don't know which part is correct and which is wrong
8
Parameter fitting criterion
– Hypothesis-testing vs parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)
9
Parameter fitting criterion
– Subset chi-square against total chi-square (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 10)
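A toy illustration of the contrast between the two criteria, not a reproduction of the Collins and Pumplin analysis. It assumes two hypothetical data sets that measure one constant parameter with unit errors, and it takes a Delta chi-square of order 1 as the parameter-fitting scale and sqrt(2N) as the hypothesis-testing scale.

```python
import math

def chi2(data, a, sigma=1.0):
    """Chi-square of a constant model `a` against `data` with equal errors."""
    return sum(((m - a) / sigma) ** 2 for m in data)

def best_constant(data):
    """With equal errors, the least-squares constant is the plain mean."""
    return sum(data) / len(data)

def toy_set(center, n):
    """Hypothetical data: points alternate one sigma above and below `center`."""
    return [center + (1.0 if i % 2 == 0 else -1.0) for i in range(n)]

set_a = toy_set(0.0, 100)
set_b = toy_set(0.5, 100)   # offset by half a sigma: the built-in inconsistency

a_global = best_constant(set_a + set_b)
n_total = len(set_a) + len(set_b)
chi2_total = chi2(set_a, a_global) + chi2(set_b, a_global)

# Hypothesis-testing view: total chi-square in units of sqrt(2N).
total_dev = (chi2_total - n_total) / math.sqrt(2.0 * n_total)

# Parameter-fitting view: how far each subset is pushed above its own minimum.
delta_a = chi2(set_a, a_global) - chi2(set_a, best_constant(set_a))
delta_b = chi2(set_b, a_global) - chi2(set_b, best_constant(set_b))

print(f"total deviation   : {total_dev:.3f} x sqrt(2N)")
print(f"subset Delta chi2 : A = {delta_a:.2f}, B = {delta_b:.2f}")
```

In this toy the total chi-square sits well inside the sqrt(2N) range, while each subset lies several units of chi-square above its own minimum, which is the kind of tension the subset-versus-total comparison is meant to expose.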
10
Parameter fitting criterion
– Found inconsistent data sets in CTEQ5 data (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 13)
11
Diagnose bad fit to multiple data sets
Importance of studying a bad fit
– Is the inconsistent data set free of unknown systematic effects?
– Is the theoretical prediction adequate?
– Is there any hint of new physics?
Any statistic for diagnostic purposes?
– The pull can be used to identify an inconsistent experiment or data point (thanks to F. James’s “Statistical methods in experimental physics”)
– But for real data, there is no measured pull distribution for each data point
– What should we do with the pull?
12
Diagnose bad fit to multiple data sets
Pull definition for each data point
Mi = Ti + ( random error )
Ri = Ti - Mi = -( random error )
Pi = Ri / sigma( Ri )
Pull properties
– Gaussian shape
– Centered at zero
– Unit variance
– Pulls of different data points are independent
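The pull properties listed above can be checked with a small Monte Carlo sketch. It reads Mi as the measured value and Ti as the true value, which the slide does not spell out, and it assumes Gaussian random errors and sigma(Ri) equal to the random error of the point; the true values, errors and number of trials are hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))

rng = random.Random(2002)
truth = [10.0, 8.0]   # hypothetical true values T_i for two data points
sigma = [1.0, 0.5]    # hypothetical random errors of the two points
n_trials = 10_000

pulls = [[], []]
for _ in range(n_trials):
    for i in range(2):
        measured = truth[i] + rng.gauss(0.0, sigma[i])  # M_i = T_i + random error
        residual = truth[i] - measured                  # R_i = T_i - M_i
        pulls[i].append(residual / sigma[i])            # P_i = R_i / sigma(R_i)

pull_mean = mean(pulls[0])
pull_var = variance(pulls[0])
pull_corr = correlation(pulls[0], pulls[1])

print(f"mean = {pull_mean:+.3f}, variance = {pull_var:.3f}, "
      f"correlation between points = {pull_corr:+.3f}")
```

Within statistical fluctuations the mean and the correlation should come out near zero and the variance near one.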
13
Diagnose bad fit to multiple data sets
Systematic effects introduce correlation among pulls
– Constant shift on all data points
– Correlated shift on all data points
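A sketch of how a shift common to all data points correlates their pulls. It assumes, for illustration only, that the unknown shift varies from one pseudo-experiment to the next with a Gaussian width equal to the random error, and that the pulls are normalised by the random error alone (set to 1 here).

```python
import math
import random

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_pulls(shift_width, n_trials=20_000, seed=2002):
    """Pulls of two data points that share one systematic shift per trial."""
    rng = random.Random(seed)
    p1, p2 = [], []
    for _ in range(n_trials):
        shift = rng.gauss(0.0, shift_width) if shift_width > 0 else 0.0
        # R_i = -(random error) - S, with sigma = 1 so the pull equals R_i.
        p1.append(-(rng.gauss(0.0, 1.0) + shift))
        p2.append(-(rng.gauss(0.0, 1.0) + shift))
    return p1, p2

corr_no_shift = correlation(*simulate_pulls(0.0))
corr_shift = correlation(*simulate_pulls(1.0))

print(f"without common shift: {corr_no_shift:+.3f}")
print(f"with common shift   : {corr_shift:+.3f}")
```

With equal widths for the shift and the random error, the expected correlation is 1 / (1 + 1) = 0.5; without the shift it is zero.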
14
Diagnose bad fit to multiple data sets
Correlation among pulls is the key to detecting unknown systematic effects
Pull correlation study
– Pull distribution of all data points in one experiment (experiment pull distribution)
– Pull as a function of the measurement variable X
15
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
Mi = Ti + ( random error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - Si ( or S )
Pi = Ri / sigma( Ri )
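A sketch of the naive case for a single experiment with one fixed, unknown shift S. The number of points, the random error and the size of S are hypothetical. Because every residual carries the same -S, the experiment pull distribution is displaced from zero by -S / sigma, and the displacement can be compared with the standard error 1 / sqrt(n) of the mean pull.

```python
import math
import random

rng = random.Random(2002)
n_points = 100
sigma = 1.0
shift = 1.0   # hypothetical unknown systematic shift S

pulls = []
for _ in range(n_points):
    random_error = rng.gauss(0.0, sigma)
    residual = -random_error - shift   # R_i = -(random error) - S
    pulls.append(residual / sigma)     # P_i = R_i / sigma(R_i)

pull_mean = sum(pulls) / n_points
# For independent unit-variance pulls, the mean has standard error 1/sqrt(n).
significance = pull_mean * math.sqrt(n_points)

print(f"mean pull = {pull_mean:+.3f}, "
      f"significance = {significance:+.1f} standard errors")
```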
16
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (MC data vs true curve)
17
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (residual distributions of the first 6 channels with 10,000 entries)
18
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (10% uncertainty on the error estimate of the first 6 channels with 10,000 entries)
19
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (pull distributions of the first 6 channels with 10,000 entries)
20
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (effect of error-estimate uncertainties of 0%, 10%, 20% on the pull distribution with 10,000 entries)
21
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (experiment residual and pull distributions with 100,000 entries)
22
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (experiment residual and pull profiles as a function of X with 100,000 entries)
23
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (experiment residual and pull distributions with 100 entries)
24
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
1. Constant horizontal shift (experiment residual and pull profiles as a function of X with 100 entries)
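A sketch of a pull profile as a function of X for a constant horizontal shift. The true curve T(x) = x^2, the shift dx, the channel positions and the number of entries are all hypothetical and are not the ones used for the plots in the talk. The measurement is generated at x + dx but compared with the true curve at x, so the mean pull in each channel follows approximately -T'(x) * dx / sigma.

```python
import random

def true_curve(x):
    """Hypothetical true curve T(x); any smooth, non-constant function would do."""
    return x * x

rng = random.Random(2002)
x_channels = [float(k) for k in range(10)]
sigma = 1.0
dx = 0.1           # hypothetical horizontal shift
n_entries = 1_000  # pseudo-measurements per channel

profile = []
for x in x_channels:
    pulls = []
    for _ in range(n_entries):
        measured = true_curve(x + dx) + rng.gauss(0.0, sigma)  # shifted in X
        pulls.append((true_curve(x) - measured) / sigma)       # P = (T - M) / sigma
    profile.append(sum(pulls) / n_entries)                     # mean pull per channel

for x, p in zip(x_channels, profile):
    print(f"X = {x:3.0f}  mean pull = {p:+.3f}")
```

For this curve the expected mean pull is -(0.2 x + 0.01), so the profile slopes with X instead of staying flat at zero; a constant vertical shift would instead displace every channel by the same amount.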
25
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (MC data vs true curve)
26
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (residual distributions of the first 6 channels with 10,000 entries)
27
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (pull distributions of the first 6 channels with 10,000 entries)
28
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (experiment residual and pull distributions with 100,000 entries)
29
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (experiment residual and pull profiles as a function of X with 100,000 entries)
30
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (experiment residual and pull distributions as a function of X with 100 entries)
31
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
2. Constant vertical shift (experiment residual and pull profiles as a function of X with 100 entries)
32
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (MC data vs true curve)
33
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (residual distributions of the first 6 channels with 10,000 entries)
34
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (pull distributions of the first 6 channels with 10,000 entries)
35
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (experiment residual and pull distributions with 100,000 entries)
36
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (experiment residual and pull profiles as a function of X with 100,000 entries)
37
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (experiment residual and pull distributions as a function of X with 100 entries)
38
Diagnose bad fit to multiple data sets
Naive case without known systematic uncertainties
– Representative systematic shifts
3. Combined horizontal and vertical shift (experiment residual and pull profiles as a function of X with 100 entries)
39
Diagnose bad fit to multiple data sets
Real case with known systematic uncertainties
Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S )
Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S )
Pi = Ri / sigma( Ri )
40
Diagnose bad fit to multiple data sets
Real case with known systematic uncertainties
– Need to take out the known systematic uncertainty term in order to restore the independence property
– Need to fit the residual systematic effect with the aid of the global fit
– Regain the naive case results
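The slides do not say how the known systematic term is taken out. One common choice, assumed here purely for illustration, is to describe a fully correlated known systematic by a nuisance parameter lam with unit Gaussian width and a known per-point shift beta_i, fit lam with a penalty term, and subtract the fitted shift from the residuals. The function names, beta_i and the demo numbers are hypothetical.

```python
def fit_known_shift(residuals, sigmas, betas):
    """Best-fit nuisance parameter for one known, fully correlated systematic.

    Minimises sum(((R_i + lam * beta_i) / sigma_i)**2) + lam**2 over lam,
    where R_i = T_i - M_i and beta_i is the known one-sigma shift of point i.
    """
    num = sum(b * r / s ** 2 for r, s, b in zip(residuals, sigmas, betas))
    den = 1.0 + sum((b / s) ** 2 for s, b in zip(sigmas, betas))
    return -num / den

def corrected_pulls(residuals, sigmas, betas):
    """Pulls after removing the fitted known shift.

    Dividing by sigma_i alone is an approximation: it ignores the small
    reduction in variance caused by fitting lam from the same data.
    """
    lam = fit_known_shift(residuals, sigmas, betas)
    return [(r + lam * b) / s for r, s, b in zip(residuals, sigmas, betas)]

# Noiseless demo: the residuals contain only the known systematic with lam = 1.
n = 99
betas = [1.0] * n
sigmas = [1.0] * n
residuals = [-1.0 * b for b in betas]   # R_i = -(systematic error)

lam_hat = fit_known_shift(residuals, sigmas, betas)
pulls = corrected_pulls(residuals, sigmas, betas)

print(f"fitted lam = {lam_hat:.2f}, largest remaining pull = {max(map(abs, pulls)):.3f}")
```

After the subtraction the pulls are almost free of the common term, so any remaining pattern among them can be examined as in the naive case.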
41
Conclusion
Global fit is important in determining parton distribution function parameters and their uncertainties
Inconsistent data samples are found by the parameter fitting criterion
Correlations among pulls could be used as a technique for detecting unknown systematic effects
Will implement this technique and apply it to the global fit