3.1 Sums of Random Variables

probability of z = x + y
mean and variance of z = x + y
skewness and kurtosis of z = x + y
standard deviation of the average
bias of the experimental variance
discrete pdf for z = x + y
convolution of discrete pdfs
extension to continuous pdfs
central limit theorem

2 Probability of Sums
Consider the function z = x + y, where x and y are statistically independent measurements. For generality, assume that all three random variables have different pdfs: f(x), g(y), and h(z). Since x and y are statistically independent, the probability of observing any specific pair of values x' and y' is given by the product $f(x')\,dx \cdot g(y')\,dy$. This product cannot be equated to p(z') or to h(z'), because multiple combinations of x and y can yield the same z'. However, integrating the pdfs over their full ranges does result in an equality, $\int h(z)\,dz = \iint f(x)\,g(y)\,dx\,dy$. This last expression does not yield the functional form of h(z), but it does permit the determination of the moments of h(z).

3 Mean and Variance of Sums
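The mean of z is determined by writing $\mu_z = \int z\,h(z)\,dz$ and then replacing z with (x + y) and h(z)dz with f(x)dx·g(y)dy. The result is $\mu_z = \mu_x + \mu_y$. The variance is determined by an identical approach, giving $\sigma_z^2 = \sigma_x^2 + \sigma_y^2$.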
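The substitution described above can be written out step by step. The following is a sketch of the standard derivation, assuming only that x and y are independent and that f(x) and g(y) are normalized; the symbols $\mu$ and $\sigma^2$ for the means and variances are notation chosen here.

```latex
\begin{align*}
\mu_z &= \int z\,h(z)\,dz
       = \iint (x + y)\,f(x)\,g(y)\,dx\,dy \\
      &= \int x\,f(x)\,dx \int g(y)\,dy \;+\; \int f(x)\,dx \int y\,g(y)\,dy
       = \mu_x + \mu_y \\[4pt]
\sigma_z^2 &= \int (z - \mu_z)^2\,h(z)\,dz
            = \iint \bigl[(x - \mu_x) + (y - \mu_y)\bigr]^2 f(x)\,g(y)\,dx\,dy \\
           &= \sigma_x^2
              + 2 \int (x - \mu_x)\,f(x)\,dx \int (y - \mu_y)\,g(y)\,dy
              + \sigma_y^2
            = \sigma_x^2 + \sigma_y^2
\end{align*}
```

The cross term vanishes because each of its two integrals is the mean deviation about the mean, which is zero.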

4 Skewness and Kurtosis of Sums
Using the same procedure as that for variance, it can be shown that the skewness (the third central moment) adds, while the coefficient of skewness does not add. Instead, the coefficient approaches zero as the number of summed random variables increases. Using the same procedure again, it can be seen that the kurtosis (the fourth central moment) does not add. In this case the coefficient of kurtosis approaches three as the number of summed random variables increases. These two facts indicate that the functional form of h(z) does not have a simple relationship to the functional forms of f(x) and g(y).
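The two statements above can be checked by expanding the third and fourth powers of a sum of deviations. This is a sketch assuming independent x and y, with $\delta_x = x - \mu_x$ and $\delta_y = y - \mu_y$ (so that $E[\delta_x] = E[\delta_y] = 0$), $m_3$ and $m_4$ as assumed symbols for the third and fourth central moments, and $\alpha_3$ and $\alpha_4$ for the coefficients of skewness and kurtosis. The last two lines apply the result to a sum of N draws from one pdf.

```latex
\begin{align*}
m_3(z) &= E\bigl[(\delta_x + \delta_y)^3\bigr]
        = m_3(x) + 3\,\sigma_x^2\,E[\delta_y] + 3\,E[\delta_x]\,\sigma_y^2 + m_3(y)
        = m_3(x) + m_3(y) \\
m_4(z) &= E\bigl[(\delta_x + \delta_y)^4\bigr]
        = m_4(x) + 6\,\sigma_x^2\,\sigma_y^2 + m_4(y) \\[4pt]
\alpha_3(\text{sum of } N) &= \frac{N\,m_3(x)}{(N\sigma_x^2)^{3/2}}
                            = \frac{\alpha_3(x)}{N^{1/2}} \\
\alpha_4(\text{sum of } N) &= \frac{N\,m_4(x) + 3N(N-1)\,\sigma_x^4}{(N\sigma_x^2)^{2}}
                            = \frac{\alpha_4(x) - 3}{N} + 3
\end{align*}
```

The extra $6\,\sigma_x^2\sigma_y^2$ term is what prevents the fourth central moment from adding; the terms containing a single $E[\delta]$ factor drop out.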

5 Standard Deviation of the Average
The variance of the experimental average can be computed by propagating variance. The average involves the summation of N random variables, all coming from the same pdf. Let this pdf have a standard deviation of $\sigma$ (see slide 3.0-2). The sum then has a variance of $N\sigma^2$, and dividing the sum by N gives the average a variance of $\sigma^2/N$. That is, the standard deviation of the average is $\sigma/N^{1/2}$. This is true for any pdf that has a finite variance (thus it is not true for the Lorentzian density). This process of reducing noise is called signal averaging.
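The $N^{1/2}$ reduction can be seen in a small simulation. This is a minimal sketch, assuming normally distributed noise with a standard deviation of 1; the function name `sd_of_average`, the seed, and the number of repeats are illustrative choices.

```python
import math
import random
import statistics


def sd_of_average(n_values, n_repeats=5000, sigma=1.0, seed=1):
    """Estimate the standard deviation of the average of n_values normal draws."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_repeats):
        draws = [rng.gauss(0.0, sigma) for _ in range(n_values)]
        averages.append(statistics.fmean(draws))
    return statistics.pstdev(averages)


# Compare the simulated value with sigma / sqrt(N) for several N.
for n in (1, 4, 16, 64):
    print(n, round(sd_of_average(n), 3), round(1.0 / math.sqrt(n), 3))
```

Each fourfold increase in N should roughly halve the simulated standard deviation, within sampling error.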

6 Bias of the Experimental Variance
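The variance can be written as a constant, 1/N, times a sum of random $x^2$ terms, minus a random term, the square of the experimental mean: $s^2 = \frac{1}{N}\sum_i x_i^2 - \bar{x}^2$. This follows from expanding the square in $s^2 = \frac{1}{N}\sum_i (x_i - \bar{x})^2$.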
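The expansion can be written out as follows. This is a sketch of the usual algebra, assuming the variance is defined with the 1/N prefactor and that $\bar{x} = \frac{1}{N}\sum_i x_i$ is the experimental mean.

```latex
\begin{align*}
s^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2
     = \frac{1}{N}\sum_{i=1}^{N} x_i^2
       - 2\,\bar{x}\,\frac{1}{N}\sum_{i=1}^{N} x_i
       + \bar{x}^2 \\
    &= \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\,\bar{x}^2 + \bar{x}^2
     = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2
\end{align*}
```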

7 Bias of the Experimental Variance
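Variance computed this way is a dependent random variable. If the computation were unbiased, the mean of $s^2$ would be $\sigma^2$. Take the mean of the preceding expression for $s^2$, remembering that the expectation value of $y = x^2$ is given by $\mu_x^2 + \sigma_x^2$. Applying the same rule to the experimental mean, which has a mean of $\mu_x$ and a variance of $\sigma_x^2/N$, gives a mean for $s^2$ of $\sigma_x^2 (N-1)/N$. To remove the bias, the equation for $s^2$ needs to be multiplied by the constant N/(N-1). The result is the well-known equation for the experimental variance, $s^2 = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$.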
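The size of the bias can be confirmed exactly for a small discrete case by enumerating every possible sample. This is a minimal sketch, assuming a fair six-sided die as the parent distribution (a hypothetical example chosen here); `mean_biased_variance` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product


def mean_biased_variance(values, n):
    """Exact mean of s^2 = (1/n) * sum((x - xbar)^2) over all equally likely samples of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample) / n
        count += 1
    return total / count


die = range(1, 7)
mu = Fraction(sum(die), 6)
true_var = sum((x - mu) ** 2 for x in die) / 6  # variance about the true mean

for n in (2, 3, 4):
    biased = mean_biased_variance(die, n)
    # Multiplying by n / (n - 1) should recover the true variance exactly.
    print(n, biased, biased * Fraction(n, n - 1), true_var)
```

Exact fractions are used so that the factor (N-1)/N appears without rounding error.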

8 Bias of the Experimental Variance
The origin of this bias can be understood qualitatively from the use of the experimental mean in the evaluation of the experimental standard deviation. Every time the experiment is repeated and the experimental mean is determined, that mean is shifted “closer” to the particular set of x values obtained, so the calculated variance is smaller than the value evaluated about the true mean. On average, the true variance evaluated about the true mean is greater than the experimental variance evaluated locally about the experimental mean by a factor of N/(N-1).

9 PDF of Discrete Sums
Consider the sum of two discrete random variables, z = x + y, having the associated probabilities p(x), q(y), and r(z). It is tempting to write the probability of z as a product, r(z) = p(x)q(y). This simple product is not correct because it does not take into account all the different values of x and y that add up to one particular z. A suitable expression would involve a sum of products, $r(z) = \sum_x \sum_y p(x)\,q(y)$, in which only the terms with x + y = z should be included. The problem with this equation is the lack of an explicit way to combine only those terms summing to one particular z. This goal is easier to reach by summing over only one random variable. Do this by recognizing that y = z - x, so that $r(z) = \sum_x p(x)\,q(z - x)$. The sum is evaluated for each value of the random variable z. This process is called convolution.
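The single-variable sum described above translates directly into code. This is a minimal sketch, assuming each set of probabilities is stored as a dictionary keyed by integer values; the function name `convolve_pmf` and the two-dice example are illustrative and not taken from the slides.

```python
from fractions import Fraction


def convolve_pmf(p, q):
    """Return r(z) = sum over x of p(x) * q(z - x) for integer-valued x and y."""
    z_values = range(min(p) + min(q), max(p) + max(q) + 1)
    # q.get(z - x, 0) drops any term whose y = z - x lies outside the range of y.
    return {z: sum(px * q.get(z - x, 0) for x, px in p.items()) for z in z_values}


# Hypothetical example: the sum of two fair dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
r = convolve_pmf(die, die)
for z, prob in r.items():
    print(z, prob)
```

Looking up q at z - x and treating out-of-range values as zero is one way to handle the summation limits automatically.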

10 Discrete Sum Example (1)
Consider an experiment where a coin is tossed until a head appears. The number of required tosses is the random variable. Two such experiments are done and the numbers of required tosses are added, where $1 \le x < \infty$, $1 \le y < \infty$, and $2 \le z < \infty$. What is the pdf of the sum, z = x + y, given that $p(x) = 0.5^x$ and $q(y) = 0.5^y$? Start with the convolution summation, $r(z) = \sum_x p(x)\,q(z - x)$. The hardest part is determining the upper and lower limits of the summation. The lower limit is determined by p(x), while the upper limit is determined by q(z-x). In neither case may the equation variables extend beyond the range of the random variables. Since z = x + y, the upper limit of x for a given value of z arises when y = 1. In this case a lower limit of 1 and an upper limit of z-1 for x keep the functions within their ranges.
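With the limits established, the summation can be evaluated in closed form. This is a worked sketch using the pdfs and limits stated above, with r(z) denoting the pdf of the sum.

```latex
\begin{align*}
r(z) &= \sum_{x=1}^{z-1} p(x)\,q(z - x)
      = \sum_{x=1}^{z-1} 0.5^{x}\,0.5^{\,z-x}
      = \sum_{x=1}^{z-1} 0.5^{z}
      = (z - 1)\,0.5^{z}, \qquad z \ge 2
\end{align*}
```

Every term in the sum has the same value, $0.5^z$, so the result is that value times the number of terms, z - 1.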

11 Discrete Sum Example (2)
The two graphs at the right are "experimental" results for tossing a coin until a head appears. The dependent random variable, z, is the sum of two such experiments, z = x + y. The histogram for 10,000 random x values is shown along with the pdf, $f(x) = 0.5^x$. The histogram for y is indistinguishable from that for x. The histogram for z is also shown along with the pdf, $h(z) = (z - 1)\,0.5^z$. Moments: from the pdfs, $\mu_x = \mu_y = 2$ and $\sigma_x^2 = \sigma_y^2 = 2$, so that $\mu_z = 4$ and $\sigma_z^2 = 4$.
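An "experiment" of this kind can be reproduced with a random number generator. This is a minimal sketch, assuming a fair coin and 10,000 repetitions; the seed and the helper name `tosses_until_head` are illustrative, and the printed numbers will vary with the seed.

```python
import random
import statistics
from collections import Counter


def tosses_until_head(rng):
    """Count fair-coin tosses up to and including the first head."""
    n = 1
    while rng.random() < 0.5:  # a value below 0.5 is treated as a tail
        n += 1
    return n


rng = random.Random(2024)
z_values = [tosses_until_head(rng) + tosses_until_head(rng) for _ in range(10_000)]
counts = Counter(z_values)

# Compare the observed fraction at each z with (z - 1) * 0.5**z.
for z in range(2, 10):
    observed = counts[z] / len(z_values)
    predicted = (z - 1) * 0.5 ** z
    print(z, round(observed, 4), round(predicted, 4))

print("mean", statistics.fmean(z_values))
print("variance", statistics.pvariance(z_values))
```

The sample mean and variance should both fall near 4, the sums of the single-experiment values.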

12 PDF of Continuous Sums
The pdf for the sum of two continuous random variables, z = x + y, can be written by analogy to the discrete case: $h(z) = \int f(x)\,g(z - x)\,dx$. This is called the convolution integral. Again, it is often difficult to work out the limits of integration that will keep the arguments of both f(x) and g(y) within the range of the random variables. Convolution is found throughout analytical chemistry: in chromatography, the detector width is convolved with the peak; in electronic instruments, the RC time constant is convolved with the measured voltage; in absorption spectroscopy, the monochromator slit widths are convolved with the spectrum. Graphical convolution (demonstrated later) can often be used to solve the convolution integral without resorting to calculus.
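When the integral is awkward to solve by hand, it can be approximated numerically at any chosen z. This is a minimal sketch using the midpoint rule, assuming both pdfs are exp(-t) on t ≥ 0 (a hypothetical test case chosen because the exact answer, z·exp(-z), is known); `convolve_at` and `unit_exponential` are illustrative names.

```python
import math


def convolve_at(f, g, z, lower, upper, steps=2000):
    """Approximate h(z) = integral of f(x) * g(z - x) dx from lower to upper (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += f(x) * g(z - x)
    return total * width


def unit_exponential(t):
    """pdf exp(-t) for t >= 0, and zero outside that range."""
    return math.exp(-t) if t >= 0 else 0.0


z = 1.5
# Limits 0 to z keep both x and z - x inside the range t >= 0.
h = convolve_at(unit_exponential, unit_exponential, z, lower=0.0, upper=z)
print(h, z * math.exp(-z))
```

The limits passed to `convolve_at` play the same role as the limits of integration: they must keep the arguments of both pdfs within range.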

13 Continuous Sum Example (1)
Consider a chemical reaction composed of three sequential first-order reactions: A → B → C → D. The time at which each molecule of D is produced is determined by three exponential random times, $t_{AB}$, $t_{BC}$, and $t_{CD}$. So that the calculus can be easily followed in class, let the rate constant for each reaction be 1. Thus, $f_{AB}(t) = f_{BC}(t) = f_{CD}(t) = e^{-t}$, and $0 \le t < \infty$. Determine the pdf for the sum of the first two times, t', using the convolution integral and $t_{BC} = t' - t_{AB}$. The result is a gamma density, $h(t') = t'\,e^{-t'}$. Finally, determine the sum, $T = t' + t_{CD}$. Solve for h(T) using a second convolution integral. The result is another gamma density, $h(T) = \tfrac{1}{2}\,T^2 e^{-T}$!
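The two convolution integrals can be worked as follows. This is a sketch under the stated assumption that every rate constant is 1; the limits 0 to t' and 0 to T are the ones that keep each exponential's argument non-negative.

```latex
\begin{align*}
h(t') &= \int_{0}^{t'} e^{-t_{AB}}\; e^{-(t' - t_{AB})}\,dt_{AB}
       = e^{-t'} \int_{0}^{t'} dt_{AB}
       = t'\,e^{-t'} \\[4pt]
h(T)  &= \int_{0}^{T} t'\,e^{-t'}\; e^{-(T - t')}\,dt'
       = e^{-T} \int_{0}^{T} t'\,dt'
       = \tfrac{1}{2}\,T^{2}\,e^{-T}
\end{align*}
```

In both integrals the exponentials combine into a factor that does not depend on the integration variable, which is why the calculus stays simple when the rate constants are equal.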

14 Continuous Sum Example (2)
The two graphs at the right are "experimental" results. The top graph is the time required for A to convert to B, $t_{AB}$, while the bottom graph is the overall time required for A to convert to D, $T = t_{AB} + t_{BC} + t_{CD}$. The histogram for 10,000 random $t_{AB}$ values is shown along with the pdf, $f(t_{AB}) = \exp(-t_{AB})$. The histogram for T is also shown along with the pdf, $h(T) = 0.5\,T^2 \exp(-T)$. Moments: from the pdfs, each step has a mean of 1 and a variance of 1, so that $\mu_T = 3$ and $\sigma_T^2 = 3$.
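The same kind of "experiment" can be generated numerically. This is a minimal sketch, assuming unit rate constants and 10,000 molecules; the seed, the bin width, and the name `gamma_pdf` are illustrative choices, and the printed values will vary with the seed.

```python
import math
import random
import statistics

rng = random.Random(7)
n_molecules = 10_000

# Each step time is exponential with rate constant 1; T is the sum of three steps.
total_times = [sum(rng.expovariate(1.0) for _ in range(3)) for _ in range(n_molecules)]


def gamma_pdf(t):
    """h(T) = 0.5 * T**2 * exp(-T), the pdf of the overall time."""
    return 0.5 * t ** 2 * math.exp(-t)


# Compare a normalized histogram with the pdf evaluated at each bin center.
bin_width = 0.5
for k in range(12):
    low, high = k * bin_width, (k + 1) * bin_width
    in_bin = sum(1 for t in total_times if low <= t < high)
    density = in_bin / (n_molecules * bin_width)
    print(round(low + bin_width / 2, 2), round(density, 3), round(gamma_pdf(low + bin_width / 2), 3))

print("mean", statistics.fmean(total_times))
print("variance", statistics.pvariance(total_times))
```

The sample mean and variance should both fall near 3, since means and variances add for the three independent steps.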

15 The Central Limit Theorem
The Central Limit Theorem states that the sum of random variables, each having a finite variance, will have a pdf that approaches a normal distribution. The theorem is taken in the limit that the number of terms in the sum approaches infinity. This simplified proof relies on moments asymptotically approaching normal values. For the following, let N be the number of terms in the sum y, with every term drawn independently from the same pdf of x.
$\mu_y = N\mu_x$
$\sigma_y = N^{1/2}\sigma_x$
$\mathrm{RSD}_y = \mathrm{RSD}_x / N^{1/2}$
As the sum grows, the mean increases and the pdf becomes relatively narrower. Skewness and kurtosis control the shape.
$\alpha_3(y) = \alpha_3(x)/N^{1/2}$, so $\alpha_3(y) \to 0$ as $N \to \infty$
$\alpha_4(y) = [(\alpha_4(x) - 3)/N] + 3$, so $\alpha_4(y) \to 3$ as $N \to \infty$
The shape of the sum pdf approaches normal as $N \to \infty$.
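These scaling relations are easy to tabulate. This is a minimal sketch, assuming N independent draws from a single pdf; the function name `sum_moments` is illustrative. The kurtosis coefficient is taken as [(α4(x) - 3)/N] + 3, the form that tends to 3 and that reproduces the values 2.88 and 3.6 quoted for the sums of 10 uniform and 10 exponential values.

```python
import math


def sum_moments(mean_x, sd_x, skew_x, kurt_x, n):
    """Mean, standard deviation, and shape coefficients for a sum of n draws from one pdf."""
    return {
        "mean": n * mean_x,
        "sd": math.sqrt(n) * sd_x,
        "skew": skew_x / math.sqrt(n),      # coefficient of skewness
        "kurt": (kurt_x - 3.0) / n + 3.0,   # coefficient of kurtosis
    }


# Exponential pdf with unit rate: mean 1, sd 1, skewness coefficient 2, kurtosis coefficient 9.
for n in (1, 10, 100, 1000):
    m = sum_moments(1.0, 1.0, 2.0, 9.0, n)
    print(n, m["mean"], round(m["sd"], 3), round(m["skew"], 3), round(m["kurt"], 3))
```

The skewness coefficient falls off as $N^{-1/2}$ while the excess kurtosis falls off as $N^{-1}$, so asymmetry is the slower of the two to disappear.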

16 Central Limit Theorem Examples
A) Sum of 10 normal values: $\mu_x = 0$, $\sigma_x = 1$, $\alpha_3(x) = 0$, $\alpha_4(x) = 3$. Sum: $\mu_{sum} = 0$, $\sigma_{sum} = 10^{1/2} = 3.16$, $\alpha_3(sum) = 0$, $\alpha_4(sum) = 3$.
B) Sum of 10 uniform values: $\mu_x = 0.5$, $\sigma_x = (1/12)^{1/2}$, $\alpha_3(x) = 0$, $\alpha_4(x) = 9/5$. Sum: $\mu_{sum} = 5$, $\sigma_{sum} = (10/12)^{1/2} = 0.913$, $\alpha_3(sum) = 0$, $\alpha_4(sum) = 2.88$.
C) Sum of 10 exponential values: $\mu_x = 1$, $\sigma_x = 1$, $\alpha_3(x) = 2$, $\alpha_4(x) = 9$. Sum: $\mu_{sum} = 10$, $\sigma_{sum} = 10^{1/2} = 3.16$, $\alpha_3(sum) = 0.632$, $\alpha_4(sum) = 3.6$.
In each graph (A, B, C), the solid line is a normal pdf using $\mu_{sum}$ and $\sigma_{sum}$.

