Sept , 2005 M. Block, Phystat 05, Oxford

PHYSTAT 05 - Oxford, 12th - 15th September 2005
Statistical problems in Particle Physics, Astrophysics and Cosmology

“Sifting data in the real world”
Martin Block, Northwestern University
Reference: “Sifting Data in the Real World”, M. Block, arXiv:physics/ (2005).

“Fishing” for Data
Generalization of the Maximum Likelihood Function
Hence, minimize $\sum_i \rho(z_i)$, or equivalently, minimize $\sum_i \rho(\Delta\chi_i^2)$, where $\Delta\chi_i^2 \equiv z_i^2$.
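To make the link to ordinary fitting explicit, here is a short worked derivation. It assumes the standard definitions $z_i = (y_i - y(x_i;\boldsymbol\alpha))/\sigma_i$ and $\rho(z) \equiv -\ln P(z)$, which the slide's equations are not shown to state. A Gaussian $P(z)$ recovers the usual $\chi^2$ fit, while a Lorentzian $P(z)$ gives the per-point term $\ln(1 + \gamma\,\Delta\chi_i^2)$ of $\Lambda_0^2$.

```latex
% Assumed definitions: z_i = (y_i - y(x_i;\alpha))/\sigma_i and \rho(z) = -\ln P(z)
\Delta\chi_i^2 \equiv z_i^2
  = \left(\frac{y_i - y(x_i;\boldsymbol{\alpha})}{\sigma_i}\right)^2,
\qquad \rho(z) \equiv -\ln P(z)

% Gaussian: maximizing the likelihood is the conventional chi^2 fit
P(z) \propto e^{-z^2/2}
  \;\Longrightarrow\; \rho(z) = \tfrac{1}{2} z^2 + \text{const}
  \;\Longrightarrow\; \sum_i \rho(z_i) = \tfrac{1}{2} \sum_i \Delta\chi_i^2 + \text{const}

% Lorentzian: the same recipe gives the Lambda_0^2 objective
P(z) \propto \frac{1}{1 + \gamma z^2}
  \;\Longrightarrow\; \sum_i \rho(z_i) = \sum_i \ln\!\left(1 + \gamma\,\Delta\chi_i^2\right) + \text{const}
```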
Problem with Gaussian Fit when there are Outliers
Robust Feature: $w(z) \propto 1/\Delta\chi_i^2$ for large $\Delta\chi_i^2$
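The robust feature can be checked numerically. This is a minimal sketch that assumes the usual weight definition $w(z) = \rho'(z)/z$ and the Lorentzian $\rho(z) = \ln(1 + \gamma z^2)$ with $\gamma = 0.179$. The function names are illustrative only. The Gaussian weight stays at 1 however far a point lies from the fit, while the Lorentzian weight falls off like $2/\Delta\chi_i^2$.

```python
import math

GAMMA = 0.179  # tuning constant of the Lorentzian Lambda_0^2


def rho_lorentzian(z):
    """rho(z) = ln(1 + GAMMA*z^2), the per-point term of Lambda_0^2."""
    return math.log1p(GAMMA * z * z)


def w_gaussian(z):
    """rho = z^2/2 gives w(z) = rho'(z)/z = 1: every point keeps full weight."""
    return 1.0


def w_lorentzian(z):
    """rho'(z) = 2*GAMMA*z/(1 + GAMMA*z^2), so w(z) = 2*GAMMA/(1 + GAMMA*z^2)."""
    return 2.0 * GAMMA / (1.0 + GAMMA * z * z)


for z in (1.0, 3.0, 10.0, 100.0):
    dchi2 = z * z
    # w_lor * dchi2 approaches 2 for large dchi2, i.e. w is proportional to 1/dchi2
    print(f"dchi2={dchi2:>8.0f}  w_gauss={w_gaussian(z):.3f}  "
          f"w_lor={w_lorentzian(z):.5f}  w_lor*dchi2={w_lorentzian(z) * dchi2:.3f}")
```

A single far outlier therefore pulls a Lorentzian fit with a force that shrinks as it moves further away, whereas in a Gaussian fit its pull keeps growing.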
Lorentzian Fit used in “Sieve” Algorithm
Why choose the normalization constant $\gamma = 0.179$ in the Lorentzian $\Lambda_0^2$?

Computer simulations show that the choice $\gamma = 0.179$ tunes the Lorentzian so that minimizing $\Lambda_0^2$ on Gaussian-distributed data gives the same central values, and approximately the same parameter errors, as a conventional $\chi^2$ fit to the same data. If there are no outliers, it gives the same answers as a $\chi^2$ fit. Hence, using the tuned Lorentzian $\Lambda_0^2$, much like taking the Hippocratic oath, does “no harm”.
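The “no harm” claim can be illustrated for the simplest model, a constant. This sketch assumes $\Lambda_0^2 = \sum_i \ln(1 + 0.179\,\Delta\chi_i^2)$, equal errors $\sigma$, and outlier-free Gaussian data. It minimizes $\Lambda_0^2$ by iteratively reweighted means, a standard technique substituted here for illustration and not necessarily the minimizer used in the talk. The data, the seed, and the function names are hypothetical.

```python
import random

GAMMA = 0.179  # tuning constant in Lambda_0^2 = sum_i ln(1 + GAMMA * dchi2_i)


def chi2_fit_constant(ys):
    """Conventional chi^2 fit of a constant with equal errors: the sample mean."""
    return sum(ys) / len(ys)


def lorentzian_fit_constant(ys, sigma, iters=500, tol=1e-12):
    """Minimize Lambda_0^2 for a constant model by iteratively reweighted means.

    Setting d(Lambda_0^2)/dc = 0 gives sum_i w_i (y_i - c) = 0 with
    w_i = 1 / (1 + GAMMA * ((y_i - c) / sigma)^2), so c is a weighted mean.
    """
    c = sorted(ys)[len(ys) // 2]  # start from the median, which resists outliers
    for _ in range(iters):
        w = [1.0 / (1.0 + GAMMA * ((y - c) / sigma) ** 2) for y in ys]
        c_new = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c


random.seed(1)
sigma = 1.0
ys = [random.gauss(10.0, sigma) for _ in range(1000)]  # Gaussian data, no outliers
c_chi2 = chi2_fit_constant(ys)
c_lor = lorentzian_fit_constant(ys, sigma)
print(f"chi2 fit: {c_chi2:.4f}   Lorentzian fit: {c_lor:.4f}")
```

Because $\ln(1 + \gamma t)$ is concave in $t = \Delta\chi_i^2$, each reweighting step cannot increase $\Lambda_0^2$. On clean data the two central values typically agree to well within the statistical error of either fit.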
“Sieve” Algorithm: SUMMARY
All cross-section data for $E_{\rm cms} > 6$ GeV, pp and $\bar pp$, from the Particle Data Group
All $\rho$ data ($\rho$ = Re/Im of the forward scattering amplitude) for $E_{\rm cms} > 6$ GeV, pp and $\bar pp$, from the Particle Data Group
Fitting the “Sieved” pp and $\bar pp$ data with analytic amplitudes

We use real analytic amplitudes that saturate the Froissart bound with the term $\ln^2(\nu/m)$, where $\nu$ is the laboratory energy and $m$ is the proton (pion) mass. We simultaneously fit the cross section $\sigma$ and $\rho$, the ratio of the real to the imaginary part of the forward scattering amplitude.
Only 3 Free Parameters

However, only 2, $c_1$ and $c_2$, are needed in cross-section fits!
Cross-section model fits for $E_{\rm cms} > 6$ GeV, anchored at 4 GeV, pp and $\bar pp$, after applying the “Sieve” algorithm to real-world data
$\rho$-value fits for $E_{\rm cms} > 6$ GeV, anchored at 4 GeV, pp and $\bar pp$, after applying the “Sieve” algorithm
What the “Sieve” algorithm accomplished for the pp and $\bar pp$ data

Before imposing the “Sieve” algorithm: $\chi^2$/d.f. = 5.7 for 209 degrees of freedom; total $\chi^2$ =

After imposing the “Sieve” algorithm: renormalized $\chi^2$/d.f. = 1.09 for 184 degrees of freedom, for the $\Delta\chi_i^2 > 6$ cut; total $\chi^2$ = ; probability of fit ~0.2.

The 25 rejected points contributed 981 to the total $\chi^2$, an average $\Delta\chi_i^2$ of ~39 per point.

Similar results were found when fitting $\pi^+p$ and $\pi^-p$ data from the Particle Data Group (not shown due to lack of time!)
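The bookkeeping on this slide can be cross-checked with a few lines of arithmetic. This sketch uses only numbers quoted above. The fit probability is estimated with the Wilson-Hilferty normal approximation to the $\chi^2$ distribution, a substitution made here for convenience rather than the exact method behind the quoted value. The helper name is hypothetical.

```python
import math


def chi2_upper_tail_wh(chi2, dof):
    """Approximate P(X > chi2) for X ~ chi-square(dof) via Wilson-Hilferty."""
    k = float(dof)
    z = ((chi2 / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_rejected = 25
dof_before, dof_after = 209, 184
mean_rejected = 981 / n_rejected                    # average dchi2_i of rejected points
p_fit = chi2_upper_tail_wh(1.09 * dof_after, dof_after)

print(f"d.f. removed: {dof_before - dof_after}")    # equals the 25 rejected points
print(f"mean dchi2_i of rejected points: {mean_rejected:.2f}")
print(f"approx. fit probability: {p_fit:.2f}")
```

This gives an average of about 39 per rejected point and a fit probability of about 0.19, consistent with the slide's "~39" and "~0.2".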
Cross-section and $\rho$-value predictions for pp and $\bar pp$

The errors are due to the statistical uncertainties in the fitted parameters.

Panels: LHC prediction; cosmic-ray prediction
100 data points, Gaussian-distributed about the straight line $y = 1 - 2x$; 20 noise points, randomly distributed, with $\Delta\chi_i^2 > 6$. After the $\Delta\chi_i^2 > 6$ cut: the best fit is $y = \ldots + \ldots\,x$, with renormalized $R^{-1}\chi^2_{\min}/\nu = 1.01$; the fit to all data has $\chi^2_{\min}/\nu = 4.8$.
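A toy version of the straight-line study can be sketched as follows. It assumes a common error $\sigma = 0.2$, $x$ uniform on $[0, 2]$, and noise points drawn uniformly in $y$ on $[-4, 2]$ and kept only if $\Delta\chi_i^2 > 6$ about the true line. None of these settings, the seed, or the function names come from the talk. The robust step minimizes $\sum_i \ln(1 + 0.179\,\Delta\chi_i^2)$ by iteratively reweighted least squares, a standard technique chosen here for illustration.

```python
import random

GAMMA = 0.179  # tuning constant in Lambda_0^2


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    s = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = s * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (s * sxy - sx * sy) / det


def lorentzian_line_fit(xs, ys, sigma, iters=1000, tol=1e-12):
    """Minimize sum_i ln(1 + GAMMA*dchi2_i) by iteratively reweighted least squares."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))  # ordinary least-squares start
    for _ in range(iters):
        ws = [1.0 / (1.0 + GAMMA * ((y - a - b * x) / sigma) ** 2) for x, y in zip(xs, ys)]
        a_new, b_new = weighted_line_fit(xs, ys, ws)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b


def sieve_line(xs, ys, sigma, cut=6.0):
    """Robust fit, dchi2_i cut, then a conventional chi^2 refit of the survivors."""
    a, b = lorentzian_line_fit(xs, ys, sigma)
    pts = [(x, y) for x, y in zip(xs, ys) if ((y - a - b * x) / sigma) ** 2 <= cut]
    kx = [x for x, _ in pts]
    ky = [y for _, y in pts]
    a, b = weighted_line_fit(kx, ky, [1.0] * len(kx))
    chi2 = sum(((y - a - b * x) / sigma) ** 2 for x, y in pts)
    return a, b, len(pts), chi2 / (len(pts) - 2)  # two fitted parameters


random.seed(3)
sigma = 0.2
xs = [random.uniform(0.0, 2.0) for _ in range(100)]
ys = [1.0 - 2.0 * x + random.gauss(0.0, sigma) for x in xs]
while len(xs) < 120:  # 20 hypothetical noise points with dchi2_i > 6
    x, y = random.uniform(0.0, 2.0), random.uniform(-4.0, 2.0)
    if ((y - (1.0 - 2.0 * x)) / sigma) ** 2 > 6.0:
        xs.append(x)
        ys.append(y)

a, b, n_kept, chi2_per_dof = sieve_line(xs, ys, sigma)
print(f"best fit y = {a:.3f} + ({b:.3f}) x, kept {n_kept} of 120, chi2/nu = {chi2_per_dof:.2f}")
```

The $\chi^2/\nu$ printed here is the raw value from the sieved points, before any renormalization is applied.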
100 data points, Gaussian-distributed about the constant $y = 10$; 40 noise points, randomly distributed, with $\Delta\chi_i^2 > 4$. After the $\Delta\chi_i^2 > 4$ cut: the best fit is $y = 9.98$, with renormalized $R^{-1}\chi^2_{\min}/\nu = 1.09$; the fit to all data has $\chi^2_{\min}/\nu = 4.39$.
Lessons learned from computer studies of a straight-line and a constant model

(where $\sigma$ is the parameter error found in the $\chi^2$ fit)
$\chi^2_{\rm renorm} = \chi^2_{\rm obs}/R = R^{-1}\chi^2_{\rm obs}$ and $\sigma_{\rm renorm} = r_{\chi^2}\,\sigma_{\rm obs}$, where $\sigma$ is the parameter error
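A simple model for the factor $R$ assumes that the surviving residuals are pure Gaussian and ignores the small effect of the fitted parameters. Under that assumption, the expected $\Delta\chi_i^2$ of a point that survives a cut $\Delta\chi_i^2 \le c$ is the ratio of the $\chi^2$ CDFs with 3 and 1 degrees of freedom. This is a derivation supplied for illustration, not necessarily how $R$ was obtained in the talk. The error factor $r_{\chi^2}$ is not modeled here, since the slides attribute it to computer studies.

```python
import math


def chi2_cdf_1dof(c):
    """P(dchi2_i < c) for one Gaussian point: chi-square CDF with 1 d.o.f."""
    return math.erf(math.sqrt(c / 2.0))


def chi2_cdf_3dof(c):
    """Chi-square CDF with 3 d.o.f.; it equals E[dchi2_i; dchi2_i < c] for 1 d.o.f."""
    return chi2_cdf_1dof(c) - math.sqrt(2.0 * c / math.pi) * math.exp(-c / 2.0)


def renorm_R(cut):
    """Hypothetical model of R: mean dchi2_i of Gaussian points with dchi2_i <= cut."""
    return chi2_cdf_3dof(cut) / chi2_cdf_1dof(cut)


for cut in (9.0, 6.0, 4.0, 2.0):
    print(f"cut {cut:>4}: R = {renorm_R(cut):.4f}, R^-1 = {1.0 / renorm_R(cut):.3f}")
```

With this model, a cut at $\Delta\chi_i^2 = 6$ gives $R^{-1} \approx 1.11$. A tighter cut removes more of the genuine tail and so needs a larger correction.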
What happens when we try to separate two similar distributions?

100 data points, Gaussian-distributed about the parabola $y = 1 + 2x + 0.5x^2$; 35 noise points, randomly distributed about the nearby parabola $y = 12 + 2x + 0.2x^2$. We have 13 “inliers”. After the $\Delta\chi_i^2 > 6$ cut, 113 points are kept; the best fit is $y = \ldots + \ldots\,x + 0.48x^2$.

BONUS: the sieve also seems to work reasonably well at separating two similar distributions!
$\log^2(\nu/m_p)$ fit compared to $\log(\nu/m_p)$ fit: all known n-n data
p $\log^2(\nu/m)$ fit, compared to the p even-amplitude fit (M. Block and F. Halzen, Phys. Rev. D 70, , (2004))
All cross-section data for $E_{\rm cms} > 6$ GeV, $\pi^+p$ and $\pi^-p$, from the Particle Data Group
All $\rho$ data ($\rho$ = Re/Im of the forward scattering amplitude) for $E_{\rm cms} > 6$ GeV, $\pi^+p$ and $\pi^-p$, from the Particle Data Group
Cross-section model fits for $E_{\rm cms} > 6$ GeV, anchored at 2.6 GeV, $\pi^+p$ and $\pi^-p$, after applying the “Sieve” algorithm to real-world data
$\rho$-value fits for $E_{\rm cms} > 6$ GeV, anchored at 2.6 GeV, $\pi^+p$ and $\pi^-p$, after applying the “Sieve” algorithm