1
2.3 Estimating PDFs and PDF Parameters
estimating means - discrete and continuous
estimating variance using a known mean
estimating variance with an estimated mean
estimating a discrete pdf
estimating a continuous pdf
estimating a pdf with a known functional form
2.3 : 1/11
2
Estimating a Mean with Finite Data
Consider the experiment where two dice are rolled and the blue value is subtracted from the red. The experiment is repeated 10 times, yielding the following data: {2, 0, 1, -4, -3, 0, -3, 3, -2, -1}.
Determine the frequency, f(x), of observing each possible outcome.
f(-5) = 0, f(-4) = 1, f(-3) = 2, f(-2) = 1, f(-1) = 1, f(0) = 2, f(1) = 1, f(2) = 1, f(3) = 1, f(4) = 0, f(5) = 0
Write the estimated mean, $\mu'$, using an estimated probability, $p'(x) = f(x)/N$, and the expectation value formalism.
$$\mu' = \sum_{x=-5}^{5} x\,p'(x) = \frac{1}{N}\sum_{x=-5}^{5} x\,f(x) = \frac{(-4)(1)+(-3)(2)+(-2)(1)+(-1)(1)+(0)(2)+(1)(1)+(2)(1)+(3)(1)}{10} = -0.7$$
The true mean, $\mu$, is the value of $\mu'$ taken in the limit $N \to \infty$.
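The frequency table and the expectation-value sum can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the names `freq`, `p_est`, and `mu_est` are illustrative, and exact fractions are used only to avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction

# Data from the slide: red die minus blue die, 10 repetitions.
data = [2, 0, 1, -4, -3, 0, -3, 3, -2, -1]
N = len(data)

# f(x): number of times each possible outcome (-5..5) was observed.
counts = Counter(data)
freq = {x: counts.get(x, 0) for x in range(-5, 6)}

# Estimated probability p'(x) = f(x)/N, kept exact with Fraction.
p_est = {x: Fraction(f, N) for x, f in freq.items()}

# Expectation-value formalism: mu' = sum over possible outcomes of x * p'(x).
mu_est = sum(x * p for x, p in p_est.items())

print(freq)
print(float(mu_est))  # -0.7
```

Outcomes that were never observed (here -5, 4, and 5) contribute zero to the sum, which is why the sum can run over all possible outcomes rather than only the observed ones.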
3
The Arithmetic Average
Start with the fraction from the previous page and convert each multiplication into a sum, e.g. (-3)(2) = (-3) + (-3).
$$\mu' = \frac{(-4)+(-3)+(-3)+(-2)+(-1)+0+0+1+2+3}{10}$$
Note that this expression is exactly the same as that obtained from an arithmetic average (data listed in the order measured).
$$\mu' = \frac{2+0+1+(-4)+(-3)+0+(-3)+3+(-2)+(-1)}{10}$$
With the arithmetic average, the probability of each measured value is estimated as p = 1/N. Taking the average in the limit as $N \to \infty$ is mathematically identical to computing the expectation value of the random variable!
$$\mu = \lim_{N\to\infty}\sum_{x} x\,\frac{f(x)}{N} = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} x_i$$
Note that the first sum is over the possible outcomes, while the second sum is over the data set.
4
Example for Rolling Two Dice
How well does the measured average recover the true mean of the pdf? Two dice are rolled with the blue value subtracted from the red value. What is the pdf for the average when different numbers of rolls are used in the computation? (n = 10, 100, 1000; N is 10,000). The uncertainty in the estimation of $\mu$ is given by the width of the pdf. As the number of replicates (rolls) used to compute the average increases, the width of the pdf decreases. Theory states that the width scales as $1/\sqrt{n}$, so it should decrease by a factor of $\sqrt{1000/10} = 10$ going from 10 rolls to 1000 rolls. This expectation is substantiated by the graphs.
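The narrowing of the pdf can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the code behind the slide's graphs: it uses only the standard library, a fixed seed, and 1,000 averages per value of n (fewer than the slide's 10,000, to keep the run short). The function names are hypothetical.

```python
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]

def average_of_rolls(n, rng):
    """Average of (red - blue) over n rolls of two dice."""
    red = rng.choices(FACES, k=n)
    blue = rng.choices(FACES, k=n)
    return sum(r - b for r, b in zip(red, blue)) / n

def width_of_average(n, n_averages, rng):
    """Standard deviation of n_averages independent n-roll averages."""
    averages = [average_of_rolls(n, rng) for _ in range(n_averages)]
    return statistics.pstdev(averages)

rng = random.Random(0)   # fixed seed so the run is repeatable
N_AVERAGES = 1000        # assumption: reduced from the slide's 10,000
widths = {n: width_of_average(n, N_AVERAGES, rng) for n in (10, 100, 1000)}

for n, w in widths.items():
    print(f"n = {n:4d} rolls: width of pdf of the average = {w:.3f}")

ratio = widths[10] / widths[1000]
print(f"width ratio, 10 rolls vs 1000 rolls: {ratio:.1f}")
```

The printed ratio should come out close to 10, with some scatter because each width is itself estimated from a finite number of averages.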
5
Estimating Variance
It is tempting to employ the same strategy with variance that worked in the limit with the mean.
Case 1: the mean is known
$$\sigma'^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$$
This approximation works quite well in the limit.
Case 2: the mean is estimated by the arithmetic average
$$\sigma'^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu')^2$$
This does not work. The result is biased because of the uncertainty in the average. The bias is eliminated by multiplying by N/(N-1).*
$$\frac{N}{N-1}\,\sigma'^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu')^2$$
*We will prove this later.
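The bias in Case 2 is easy to see numerically when N is small. The following sketch is an illustration, not the slide's own computation: it draws many samples of 5 dice differences and averages the three estimators. The sample size, the number of repeats, and all names are assumptions chosen for the demonstration; the true variance of the difference of two fair dice is 35/6, about 5.83.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
FACES = [1, 2, 3, 4, 5, 6]
TRUE_MEAN = 0.0          # red minus blue has a true mean of 0
TRUE_VAR = 35 / 6        # true variance of the difference, about 5.83

def sample(n):
    """n independent values of (red - blue)."""
    return [rng.choice(FACES) - rng.choice(FACES) for _ in range(n)]

n, repeats = 5, 20000    # assumption: small n makes the bias visible
known, biased, unbiased = [], [], []
for _ in range(repeats):
    x = sample(n)
    avg = sum(x) / n
    ss = sum((xi - avg) ** 2 for xi in x)   # squares about the average
    known.append(sum((xi - TRUE_MEAN) ** 2 for xi in x) / n)  # Case 1
    biased.append(ss / n)                   # Case 2, divide by N
    unbiased.append(ss / (n - 1))           # Case 2 times N/(N-1)

mean_known = statistics.fmean(known)
mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)

print(f"true variance            : {TRUE_VAR:.3f}")
print(f"known mean, divide by N  : {mean_known:.3f}")
print(f"average, divide by N     : {mean_biased:.3f}")
print(f"average, divide by N - 1 : {mean_unbiased:.3f}")
```

With N = 5 the divide-by-N estimate using the average should fall near (N-1)/N times the true variance, while the other two should fall near the true variance itself.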
6
Example for Rolling Two Dice
What is the pdf for the biased and unbiased variance when different numbers of rolls are used in the computation? Note that $\sigma^2 = 5.83$.
[Graphs: pdfs of the biased and unbiased variance estimates for different numbers of rolls; horizontal axes labeled "variance".]
When the number of rolls is large, the two variance estimates have a similar pdf, i.e. $N - 1 \approx N$.
7
Estimating a Discrete PDF (1)
A pdf with an unknown functional form can be estimated by performing a large number of measurements and estimating the probability of each expected outcome. How many measurements are necessary?
start by choosing the minimum probability that needs to be estimated and the desired precision of that estimation
treat the observation of that outcome as a binomial pdf, where p is the probability of observing the outcome and q is the probability of observing all other outcomes (the binomial parameter, n, will be 1)
use the fact that
$$\sigma_p = \frac{\sigma}{\sqrt{N}} = \sqrt{\frac{pq}{N}} \approx \sqrt{\frac{p}{N}}$$
where $\sigma = \sqrt{pq}$ is the standard deviation for one trial of the binomial pdf, and $\sigma_p$ is the standard deviation after averaging N trials (note that $q \approx 1$)
8
Estimating a Discrete PDF (2)
An initial guess at the required number of trials might be m(1/p), where m is an integer and p is the minimum probability to be estimated. For the example of rolling two dice, -5 and +5 had the smallest probability, 1/36. Use the equation on the previous page and let N = m(1/p). The standard deviation of the estimated probability, $\sigma_p$, then becomes
$$\sigma_p \approx \sqrt{\frac{p}{N}} = \frac{p}{\sqrt{m}}$$
With m = 1 the standard deviation is equal to p (which is too large an uncertainty). Larger values of m will improve the estimate. For the die roll example choose m = 100. This means that 100 × 36 = 3,600 trials need to be made. The graph at the right shows this result. Blue is theory and red is the average of 3,600 trials.
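The arithmetic behind the trial count can be sketched as follows. The function names are hypothetical, and m = 100 is assumed here because it reproduces the 3,600 trials quoted for the graph. The exact binomial expression is shown next to the approximation that treats q as 1.

```python
import math

def trials_needed(p_min, m):
    """Initial guess for the number of trials, N = m * (1 / p_min)."""
    return round(m / p_min)

def sigma_exact(p, n_trials):
    """Standard deviation of the estimated probability, sqrt(p*q/N)."""
    return math.sqrt(p * (1.0 - p) / n_trials)

def sigma_approx(p, n_trials):
    """Same quantity with q taken as 1, i.e. sqrt(p/N)."""
    return math.sqrt(p / n_trials)

p_min = 1 / 36  # probability of a difference of -5 or +5
for m in (1, 100):
    n_trials = trials_needed(p_min, m)
    rel_exact = sigma_exact(p_min, n_trials) / p_min
    rel_approx = sigma_approx(p_min, n_trials) / p_min
    print(f"m = {m:3d}: N = {n_trials:4d}, "
          f"sigma/p = {rel_approx:.3f} (q = 1), {rel_exact:.3f} (exact)")
```

Under the approximation the relative uncertainty is 1/sqrt(m), so m = 100 corresponds to roughly 10% relative precision on the rarest outcome.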
9
Estimating a Continuous PDF (1)
Estimating the shape of a continuous, unknown pdf requires that the data be binned. To demonstrate this, the graph at the right contains data from an exponential pdf that are not binned. The primary difficulty with a continuous random variable is estimating the bin size. To do this, compute the minimum and maximum observed values; their difference divided by the number of bins gives the bin width. For small numbers of events, an initial estimate for the number of bins might be one tenth the number of observations.
10
Estimating a Continuous PDF (2)
Bin width will determine the resolution of the estimated pdf. For large numbers of replicates the resolution, $\Delta t$, and precision of the probability, $\sigma_p$, are traded against each other. The following graphs show 1,000 events examined with two bin widths. In the left graph the counts are known to good precision (RSD = 6%), but the resolution is poor, $\Delta t$ = 1.8 ns. In the right graph the resolution of the pdf is higher, $\Delta t$ = 0.37 ns, but the precision is worse, RSD = 12%.
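The trade-off between resolution and precision can be explored with simulated data. The sketch below makes several assumptions that are not stated on the slide: the events follow an exponential pdf with a 5-ns decay time, the RSD is taken as 1/sqrt(count) for the first bin (Poisson counting statistics), and the seed and helper names are arbitrary.

```python
import math
import random

TAU_NS = 5.0      # assumption: decay time is not stated on this slide
N_EVENTS = 1000   # number of events, as on the slide

rng = random.Random(2)  # fixed seed so the run is repeatable
times = [rng.expovariate(1.0 / TAU_NS) for _ in range(N_EVENTS)]

def histogram(times, bin_width):
    """Count events per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    n_bins = int(max(times) // bin_width) + 1
    counts = [0] * n_bins
    for t in times:
        counts[int(t // bin_width)] += 1
    return counts

def first_bin_rsd(counts):
    """Relative standard deviation of the first bin, assuming Poisson counts."""
    return 1.0 / math.sqrt(counts[0])

results = {}
for width in (1.8, 0.37):  # the two bin widths from the slide, in ns
    counts = histogram(times, width)
    results[width] = first_bin_rsd(counts)
    print(f"bin width {width} ns: {len(counts)} bins, "
          f"first bin = {counts[0]} counts, RSD = {results[width]:.1%}")
```

Narrower bins hold fewer counts each, so the relative uncertainty per bin grows as the time resolution improves.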
11
Estimation of a Known Function
When the data come from a random process with a known pdf, the shape of the pdf can be estimated using moments. As an example, suppose 20 photons arise from an exponential decay. The average of the observation time for the set of photons can be used as an estimate of $\tau$. The following graph shows a 5-ns decay along with the estimated function using two random sets of 20 photons. The red line is the true pdf, while the blue and green lines are computed from the mean. Even for 20 photons the pdf can be estimated much better with moments than using a histogram. This approach requires knowing the functional form of the pdf!
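A moment-based estimate of this kind can be sketched in a few lines. The example assumes the exponential pdf is written as p(t) = (1/τ) exp(-t/τ), whose mean is τ, so the sample mean serves directly as the estimate. The seed and function names are illustrative and not taken from the slides.

```python
import math
import random

TRUE_TAU_NS = 5.0   # the 5-ns decay from the slide
N_PHOTONS = 20      # photons per data set, as on the slide

def exponential_pdf(t, tau):
    """p(t) = (1/tau) * exp(-t/tau) for t >= 0."""
    return math.exp(-t / tau) / tau

def estimate_tau(times):
    """First-moment estimate: the mean of an exponential pdf equals tau."""
    return sum(times) / len(times)

rng = random.Random(3)  # fixed seed so the run is repeatable
estimates = []
for _ in range(2):      # two random sets of 20 photons
    times = [rng.expovariate(1.0 / TRUE_TAU_NS) for _ in range(N_PHOTONS)]
    estimates.append(estimate_tau(times))

for tau_est in estimates:
    print(f"estimated tau = {tau_est:.2f} ns, "
          f"estimated p(0) = {exponential_pdf(0.0, tau_est):.3f} per ns "
          f"(true p(0) = {exponential_pdf(0.0, TRUE_TAU_NS):.3f})")
```

A single number, the estimated τ, fixes the whole curve, which is why 20 photons go much further here than they would in a histogram with many bins.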