Reduction of Variables in Parameter Inference

Presentation transcript:

Reduction of Variables in Parameter Inference
Günter Zech, Universität Siegen (PHYSTAT 2005, Oxford)
Motivation: Parameter fitting from multidimensional histograms often suffers from statistical difficulties due to the low number of events per bin. (This is relevant when the data have to be compared to a Monte Carlo simulation, so that a simple likelihood fit is not possible.)
Goal: Reduce the dimensionality of the data without loss of information.

Historical example
Determination of the V/A coupling in τ decay at PETRA. The decay distribution depends on 1 parameter and 6 variables; with about 30 events and 3 bins per variable one ends up with roughly 2 events per bin. (A simple likelihood fit was not applicable because the acceptance corrections required a Monte Carlo simulation.) Some groups nevertheless fitted this distribution.

Simple case: 2 random variables, 1 linear parameter
Define new variables u and v. One finds that the only variable relevant for the parameter θ is u; the analytic expression of g(u,v|θ) is not required. The generalization to more than 2 variables is trivial.
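The defining equations did not survive the transcript; the following is a minimal LaTeX reconstruction of the standard construction for a pdf that is linear in the parameter (the explicit form of f0, f1 and the choice of v are assumptions, not taken from the slide):

```latex
% pdf linear in the parameter theta:
\[
  f(x,y\mid\theta) = f_0(x,y) + \theta\, f_1(x,y),
  \qquad
  u = \frac{f_1(x,y)}{f_0(x,y)}, \quad v \ \text{arbitrary (e.g. } v = x\text{)} .
\]
% The log-likelihood of a sample (x_i, y_i) depends on theta only through u_i:
\[
  \ln L(\theta) = \sum_i \ln f_0(x_i,y_i) + \sum_i \ln\bigl(1 + \theta\, u_i\bigr),
\]
% so u carries the full information on theta and the pair (x,y) can be
% replaced by the single variable u without loss of information.
```

The companion variable v merely completes the transformation; it drops out of the θ-dependence, which is why the distribution g(u,v|θ) never needs to be known analytically.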

Example
Experimental data: x_i, y_i, z_i → u_i. Monte Carlo: generate x, y, z → u. Perform a likelihood fit to a superposition of the two MC distributions of u.
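As an illustration of this workflow, here is a minimal Python sketch (the densities, sample sizes and binning are invented stand-ins, not taken from the talk): the data and both MC samples are reduced to u, and the mixture fraction is then obtained from a binned likelihood fit to the two MC templates of u.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Stand-in component densities; in the talk these are the two terms of the
# pdf that is linear in the parameter. They are chosen here so that u = x.
def f0(x, y, z):
    return np.exp(-x - y - z)

def f1(x, y, z):
    return x * np.exp(-x - y - z)

def reduce_to_u(x, y, z):
    """Collapse each 3-dimensional event onto the single relevant variable u."""
    return f1(x, y, z) / f0(x, y, z)

rng = np.random.default_rng(1)

# Toy "experimental" data: a mixture of the two components with fraction 0.3
n_data = 300
n1 = rng.binomial(n_data, 0.3)
x_d = np.concatenate([rng.gamma(2.0, 1.0, n1), rng.exponential(1.0, n_data - n1)])
y_d, z_d = rng.exponential(1.0, (2, n_data))
u_data = reduce_to_u(x_d, y_d, z_d)

# Two Monte Carlo samples, one per component, reduced to u as well
u_mc0 = reduce_to_u(rng.exponential(1.0, 10**5), *rng.exponential(1.0, (2, 10**5)))
u_mc1 = reduce_to_u(rng.gamma(2.0, 1.0, 10**5), *rng.exponential(1.0, (2, 10**5)))

# Binned likelihood fit of the mixture fraction theta, using the two MC
# distributions of u as templates
edges = np.linspace(0.0, 8.0, 25)
d, _ = np.histogram(u_data, edges)
t0, _ = np.histogram(u_mc0, edges)
t1, _ = np.histogram(u_mc1, edges)
t0 = t0 / t0.sum()
t1 = t1 / t1.sum()
n_tot = d.sum()

def neg_log_like(theta):
    mu = n_tot * ((1.0 - theta) * t0 + theta * t1) + 1e-12
    return -poisson.logpmf(d, mu).sum()

best = minimize_scalar(neg_log_like, bounds=(0.0, 1.0), method="bounded")
print(f"fitted mixture fraction: {best.x:.3f}")
```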

Nonlinear parameter dependence
Linearize: approximate the pdf by a Taylor expansion at a first estimate θ0 of θ and fit Δθ.
Several parameters: we need one variable per parameter (this only makes sense if the initial number of variables exceeds the number of parameters).
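A hedged LaTeX sketch of the linearization step (the notation is assumed, not taken from the slide):

```latex
% Taylor expansion of a one-parameter density around a first estimate theta_0:
\[
  f(x\mid\theta) \approx f(x\mid\theta_0)
  + \Delta\theta \,\left.\frac{\partial f(x\mid\theta)}{\partial\theta}\right|_{\theta_0},
  \qquad \Delta\theta = \theta - \theta_0 ,
\]
% which is again linear in the (small) parameter Delta-theta, so the reduction
% of variables applies with
\[
  u(x) = \frac{1}{f(x\mid\theta_0)}
         \left.\frac{\partial f(x\mid\theta)}{\partial\theta}\right|_{\theta_0}.
\]
% For several parameters one such variable is formed per parameter,
% u_k = (partial f / partial theta_k) / f, evaluated at the first estimate.
```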

Can we do any better? Approximate a sufficient statistic
Example: distorted lifetime distribution (exponential). The mean value of the experimental data is still approximately sufficient. Compute the relation between the observed and the true value by Monte Carlo simulation. [Full detector simulation for τ0 → τ0′; reweight the MC events → τ(τ′).]
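A minimal Python sketch of this calibration, assuming a toy distortion (Gaussian smearing plus an acceptance cut) in place of the full detector simulation; all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def observed_mean(tau_true, n_mc=200_000, sigma=0.3, t_max=5.0):
    """Mean of the distorted lifetime distribution for a given true tau.

    The 'detector' here is a stand-in: Gaussian smearing plus an acceptance
    cut at t_max. In a real analysis this is the full detector simulation.
    """
    t_true = rng.exponential(tau_true, n_mc)
    t_obs = t_true + rng.normal(0.0, sigma, n_mc)
    t_obs = t_obs[(t_obs > 0.0) & (t_obs < t_max)]
    return t_obs.mean()

# Calibration curve: true tau -> mean of the observed (distorted) distribution
tau_grid = np.linspace(0.5, 3.0, 26)
mean_grid = np.array([observed_mean(t) for t in tau_grid])

# Invert the curve: from the observed mean of the data back to an estimate of tau
def estimate_tau(mean_observed):
    return np.interp(mean_observed, mean_grid, tau_grid)

# Toy "data" generated at tau = 1.5 and passed through the same distortion
data = rng.exponential(1.5, 400) + rng.normal(0.0, 0.3, 400)
data = data[(data > 0.0) & (data < 5.0)]
print(f"observed mean = {data.mean():.3f} -> estimated tau = {estimate_tau(data.mean()):.3f}")
```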

[Figure: Monte Carlo → calibration curve; data → τ_observed ± error → estimated τ ± error.]

Approximate likelihood estimate
pdf: f(x|θ) (x and θ may be multidimensional). Ignore acceptance and resolution effects and determine the parameters and their errors from a likelihood fit to the observed data. Then generate Monte Carlo events and, in a loop over θ, re-weight the events and repeat the likelihood fit → correction of the experimental value.
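A Python sketch of this re-weighting loop for the exponential-lifetime case, again with an invented toy distortion; the naive ("undistorted") likelihood estimate of an exponential lifetime is simply the (weighted) sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# One Monte Carlo sample generated at a reference lifetime tau_gen and passed
# through a toy distortion (smearing + acceptance cut); these are stand-ins,
# not the talk's detector simulation.
tau_gen, sigma, t_max = 1.0, 0.3, 5.0
t_true = rng.exponential(tau_gen, 500_000)
t_obs = t_true + rng.normal(0.0, sigma, t_true.size)
keep = (t_obs > 0.0) & (t_obs < t_max)
t_true, t_obs = t_true[keep], t_obs[keep]

def naive_fit(t, w=None):
    """Uncorrected ML estimate for an exponential: the (weighted) mean."""
    return np.average(t, weights=w)

def reweight(tau):
    """Event weights f(t_true | tau) / f(t_true | tau_gen)."""
    return (tau_gen / tau) * np.exp(t_true / tau_gen - t_true / tau)

# Loop over trial values of tau: reweight the MC and redo the naive fit,
# giving the relation tau -> tau_observed.
tau_grid = np.linspace(0.6, 2.5, 20)
tau_obs_grid = np.array([naive_fit(t_obs, reweight(tau)) for tau in tau_grid])

# Apply the naive fit to the data and correct it by inverting that relation.
data = rng.exponential(1.4, 1000) + rng.normal(0.0, sigma, 1000)
data = data[(data > 0.0) & (data < t_max)]
tau_naive = naive_fit(data)
tau_corrected = np.interp(tau_naive, tau_obs_grid, tau_grid)
print(f"naive estimate = {tau_naive:.3f}, corrected estimate = {tau_corrected:.3f}")
```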

Remarks:
The fit of the experimental data to the uncorrected pdf provides an approximate estimate of the parameters.
Other sufficient statistics may be used which do not require a likelihood fit.
In some cases where the resolution is poor, the pdf may be undefined for some experimental values of x; shifting or scaling the data helps.
For more than 2 parameters it is tedious to determine the relation between the true and the observed parameter values.
If acceptance and resolution effects are very large, we may have to take them into account. How?

Acceptance effects
Acceptance effects do not necessarily spoil the method. Example: the mean value of the lifetimes remains a sufficient statistic when the exponential is truncated at large times.
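A quick check of this statement in LaTeX (my notation): for an exponential truncated at t = T the likelihood depends on the data only through the sum of the observed times, hence through their mean:

```latex
% Exponential truncated at T (acceptance = 1 for t < T, 0 otherwise):
\[
  f(t\mid\tau) = \frac{e^{-t/\tau}}{\tau\,\bigl(1 - e^{-T/\tau}\bigr)},
  \qquad 0 < t < T .
\]
% Likelihood of a sample t_1, ..., t_n:
\[
  \ln L(\tau) = -\frac{1}{\tau}\sum_{i=1}^{n} t_i
                - n\ln\tau - n\ln\bigl(1 - e^{-T/\tau}\bigr),
\]
% which depends on the data only through \sum_i t_i, i.e. through the sample
% mean: the mean remains a sufficient statistic for tau.
```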

General case (only losses, no resolution effects)
a(x) = acceptance. Likelihood: see the sketch below; the last term is a constant and can be discarded. The integrated acceptance A(θ) has to be estimated by a Monte Carlo simulation (as a table or approximated by an analytic expression). The acceptance estimate may be crude: approximations reduce the precision but do not bias the result, since the simulation of θ(θ_observed) takes care of everything.
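The likelihood formula itself is missing from the transcript; a reconstruction under the assumptions stated on the slide (losses only, event acceptance a(x), integrated acceptance A(θ)) would read:

```latex
% Observed density for losses only: f(x|theta) a(x) / A(theta), with
%   A(theta) = \int f(x\mid\theta)\, a(x)\, dx   (integrated acceptance).
\[
  \ln L(\theta) = \sum_i \ln f(x_i\mid\theta)
                - n \ln A(\theta)
                + \sum_i \ln a(x_i) .
\]
% The last term does not depend on theta and can be discarded; A(theta) is
% taken from the Monte Carlo simulation (table or analytic approximation).
```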

Resolution effects
Can normally be neglected (remember: approximations do not bias the result). When non-negligible: perform a binning-free unfolding (see my SLAC contribution), do a likelihood fit with the unfolded data, and simulate the complete procedure with MC (may require some CPU power).

Approximate estimators for linear and quadratic pdfs
(in case acceptance and resolution effects are small)
p.d.f.: f(x|a,b). Assume a = a0 + Δa, b = b0 + Δb, f ≈ f0(x) = f(x|a0,b0), with Δa, Δb small. Neglect quadratic terms in Δa, Δb (very fast, could be used online).
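The explicit estimator is not reproduced in the transcript; the following LaTeX sketch shows one way such a linearized, non-iterative estimator can be obtained (my notation and derivation, not necessarily the slide's), assuming the expansion below around the first estimate (a0, b0):

```latex
% Expansion of the pdf around the first estimate (a_0, b_0):
\[
  f(x\mid a,b) \approx f_0(x)\,\bigl[\,1 + \Delta a\,u(x) + \Delta b\,v(x)\,\bigr],
  \qquad
  u = \frac{1}{f_0}\frac{\partial f}{\partial a}\Big|_{a_0,b_0},
  \quad
  v = \frac{1}{f_0}\frac{\partial f}{\partial b}\Big|_{a_0,b_0}.
\]
% Setting the derivatives of ln L to zero and dropping terms quadratic in
% (Delta a, Delta b) gives a linear system with an explicit solution:
\[
  \begin{pmatrix} \sum_i u_i^2 & \sum_i u_i v_i \\[2pt]
                  \sum_i u_i v_i & \sum_i v_i^2 \end{pmatrix}
  \begin{pmatrix} \Delta a \\ \Delta b \end{pmatrix}
  =
  \begin{pmatrix} \sum_i u_i \\ \sum_i v_i \end{pmatrix},
\]
% which requires no iteration and is therefore fast enough for online use.
```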

Summary
Method 1: Reduction of variables
The number of variables can be reduced to the number of parameters. This simplifies likelihood inference of the parameters whenever the number of parameters is smaller than the number of variables. Goodness-of-fit tests can be applied to the new variable(s), which simplifies g.o.f. testing. Acceptance and resolution effects can be taken into account in a similar way as in the second method (not demonstrated here).

Method 2: Use of an approximately sufficient statistic or likelihood estimate
No large resolution and acceptance effects: perform the fit with uncorrected data and the undistorted likelihood function.
Acceptance losses but small distortions: compute the global acceptance by MC and include it in the likelihood function.
Strong resolution effects: perform a crude unfolding.
All approximations are corrected by the Monte Carlo simulation; the loss in precision introduced by the approximations is usually completely negligible.