Topic 5 - Joint distributions and the CLT Joint distributions - pages 145 - 156 Central Limit Theorem - pages 183 - 185
Joint distributions Often, we are interested in more than one random variable at a time. For example, what is the probability that a car will have at least one engine problem and at least one blowout during the same week? Let X = # of engine problems in a week and Y = # of blowouts in a week. We are looking for P(X ≥ 1, Y ≥ 1). To understand these sorts of probabilities, we need to develop joint distributions.
Discrete distributions A discrete joint probability mass function is given by f(x,y) = P(X = x, Y = y), where f(x,y) ≥ 0 for all (x,y) and Σ_x Σ_y f(x,y) = 1.
Return to the car example Consider the following joint pmf for X and Y P(X ≥ 1, Y ≥ 1) = P(X ≥ 1) = E(X + Y) = X\Y 1 2 3 4 1/2 1/16 1/32
Joint to marginals The probability mass functions for X and Y individually (called marginals) are given by fX(x) = Σ_y f(x,y) and fY(y) = Σ_x f(x,y). Returning to the car example: fX(x) = fY(y) = E(X) = E(Y) =
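The marginal sums can be sketched in a few lines of Python. The joint pmf below is a made-up table used only for illustration (it is NOT the car example's table, which is not reproduced here), and the names `joint`, `marginal`, `f_X`, `f_Y`, `E_X` and `p_both` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf (NOT the table from the slide): keys are (x, y) pairs.
joint = {
    (0, 0): F(1, 2), (0, 1): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 16), (2, 1): F(1, 16),
}
assert sum(joint.values()) == 1  # a valid pmf must sum to 1


def marginal(joint, axis):
    """Sum the joint pmf over the other variable (axis 0 -> X, axis 1 -> Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out


f_X = marginal(joint, 0)  # marginal pmf of X
f_Y = marginal(joint, 1)  # marginal pmf of Y
E_X = sum(x * p for x, p in f_X.items())  # E(X) from the marginal
p_both = sum(p for (x, y), p in joint.items() if x >= 1 and y >= 1)  # P(X >= 1, Y >= 1)
```

Using exact fractions keeps the arithmetic comparable to a hand calculation; the same functions would apply to the car example once its table is typed in.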
Continuous distributions A joint probability density function for two continuous random variables, (X,Y), has the following four properties:
Continuous example Consider the following joint pdf: Show condition 2 holds on your own. Show P(0 < X < 1, ¼ < Y < ½) = 23/512
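The formula for the joint pdf is not reproduced in these notes. As an assumption only, the sketch below uses f(x,y) = x(1 + 3y²)/4 on 0 < x < 2, 0 < y < 1, which is one density that integrates to 1 and gives 23/512 for the stated probability; the slide's actual pdf may differ. The check uses a plain midpoint rule, and `midpoint_integral` is a hypothetical helper.

```python
def f(x, y):
    """Assumed joint pdf (hypothetical; the slide's formula did not survive)."""
    if 0 < x < 2 and 0 < y < 1:
        return x * (1 + 3 * y**2) / 4
    return 0.0


def midpoint_integral(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Approximate the double integral of g over a rectangle with the midpoint rule."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


total_mass = midpoint_integral(f, 0, 2, 0, 1)   # should be close to 1
prob = midpoint_integral(f, 0, 1, 0.25, 0.5)    # should be close to 23/512
```

Under the assumed density, `prob` should agree with 23/512 ≈ 0.0449 up to the numerical error of the midpoint rule.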
Joint to marginals The marginal pdfs for X and Y can be found by fX(x) = ∫ f(x,y) dy and fY(y) = ∫ f(x,y) dx, integrating over all values of the other variable. For the previous example, find fX(x) and fY(y).
Independence of X and Y The random variables X and Y are independent if f(x,y) = fX(x) fY(y) for all pairs (x,y). For the discrete clunker car example, are X and Y independent? For the continuous example, are X and Y independent?
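As an illustration of the factorization check in the continuous case, suppose (purely as an assumption, since the example's pdf is not reproduced in these notes) that f(x,y) = x(1 + 3y²)/4 on 0 < x < 2, 0 < y < 1. Integrating out each variable gives the marginals, and their product can be compared with the joint pdf:

```latex
\begin{align*}
f_X(x) &= \int_0^1 \frac{x(1+3y^2)}{4}\,dy
        = \frac{x}{4}\Big[\,y + y^3\Big]_0^1 = \frac{x}{2}, && 0 < x < 2,\\
f_Y(y) &= \int_0^2 \frac{x(1+3y^2)}{4}\,dx
        = \frac{1+3y^2}{4}\Big[\tfrac{x^2}{2}\Big]_0^2 = \frac{1+3y^2}{2}, && 0 < y < 1,\\
f_X(x)\,f_Y(y) &= \frac{x}{2}\cdot\frac{1+3y^2}{2} = \frac{x(1+3y^2)}{4} = f(x,y).
\end{align*}
```

Because the product of the marginals equals the joint pdf at every (x,y), X and Y would be independent under this assumed density.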
Sampling distributions We assume that each data value we collect represents a random selection from a common population distribution. The collection of these independent random variables is called a random sample from the distribution. A statistic is a function of these random variables that is used to estimate some characteristic of the population distribution. The distribution of a statistic is called a sampling distribution. The sampling distribution is a key component in making inferences about the population.
StatCrunch example StatCrunch subscriptions are sold for 6 months ($5) or 12 months ($8). From past data, I can tell you that roughly 80% of subscriptions are $5 and 20% are $8. Let X represent the amount in $ of a purchase. E(X) = Var(X) =
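The two blanks can be worked from the stated probabilities (0.8 for $5, 0.2 for $8). A short worked calculation, using the shortcut Var(X) = E(X²) − [E(X)]²:

```latex
\begin{align*}
E(X)   &= 5(0.8) + 8(0.2) = 5.6,\\
E(X^2) &= 5^2(0.8) + 8^2(0.2) = 20 + 12.8 = 32.8,\\
\operatorname{Var}(X) &= E(X^2) - [E(X)]^2 = 32.8 - 31.36 = 1.44 .
\end{align*}
```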
StatCrunch example continued Now consider the amounts of a random sample of two purchases, X1, X2. A natural statistic of interest is X1 + X2, the total amount of the purchases. Outcomes X1 + X2 Probability X1 + X2 Probability
StatCrunch example continued E(X1 + X2) = E([X1 + X2]²) = Var(X1 + X2) =
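A small sketch that enumerates the outcomes of two purchases and fills in these quantities. It assumes the two purchases are independent (as in a random sample) and uses the 80%/20% split stated earlier; the variable names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

pmf = {5: F(4, 5), 8: F(1, 5)}  # price -> probability (80% are $5, 20% are $8)

# Distribution of X1 + X2, assuming the two purchases are independent.
total_pmf = {}
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    total_pmf[x1 + x2] = total_pmf.get(x1 + x2, 0) + p1 * p2

mean = sum(t * p for t, p in total_pmf.items())               # E(X1 + X2)
second_moment = sum(t**2 * p for t, p in total_pmf.items())   # E([X1 + X2]^2)
variance = second_moment - mean**2                            # Var(X1 + X2)
```

The totals 10, 13 and 16 get probabilities 0.64, 0.32 and 0.04, and the mean and variance come out to 11.2 and 2.88, which are twice the single-purchase values.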
StatCrunch example continued If I have n purchases in a day, what are my expected earnings? What is the variance of my earnings? What is the shape of my earnings distribution for large n? Let’s experiment by simulating 1000 days with 100 purchases per day in StatCrunch.
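The same experiment can be approximated outside StatCrunch. The sketch below is not the StatCrunch procedure; it is a plain Python simulation assuming independent purchases with P($5) = 0.8 and P($8) = 0.2, and the seed and the name `day_total` are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable


def day_total(n_purchases=100):
    """Total earnings for one simulated day of independent purchases."""
    return sum(random.choices([5, 8], weights=[0.8, 0.2], k=n_purchases))


totals = [day_total() for _ in range(1000)]  # 1000 simulated days
sim_mean = statistics.mean(totals)           # compare with 100 * 5.6 = 560
sim_var = statistics.variance(totals)        # compare with 100 * 1.44 = 144
```

A histogram of `totals` should look roughly bell-shaped and centered near 560, even though a single purchase takes only two values.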
Central Limit Theorem We have just illustrated one of the most important theorems in statistics. As the sample size, n, becomes large, the sum of a random sample from a distribution with mean μ and variance σ² is approximately Normal with mean nμ and variance nσ². As a rule of thumb, a sample size of at least 30 is typically required to use the CLT. The amazing part of this theorem is that it is true regardless of the form of the underlying distribution, as long as its variance is finite.
Airplane example Suppose the weight of an airline passenger has a mean of 150 lbs. and a standard deviation of 25 lbs. What is the probability the combined weight of 100 passengers will exceed the maximum allowable weight of 15,500 lbs? How many passengers should be allowed on the plane if we want this probability to be at most 0.01?
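One way to work both questions is with the normal approximation from the CLT, treating the total weight of n passengers as approximately Normal with mean 150n and standard deviation 25√n. The sketch below assumes passenger weights are independent; `prob_overweight` is a hypothetical helper built on the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, LIMIT = 150, 25, 15_500  # per-passenger mean and sd (lbs), weight limit (lbs)


def prob_overweight(n):
    """CLT approximation to P(total weight of n passengers > LIMIT)."""
    total = NormalDist(mu=n * MU, sigma=SIGMA * sqrt(n))
    return 1 - total.cdf(LIMIT)


p_100 = prob_overweight(100)  # P(Z > 2), about 0.0228

# Largest n whose overweight probability is still at most 0.01.
max_n = 1
while prob_overweight(max_n + 1) <= 0.01:
    max_n += 1
```

With 100 passengers the total has mean 15,000 and standard deviation 250, so the limit is 2 standard deviations above the mean. The search stops at 99 passengers, since the probability at 100 exceeds 0.01.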
The sample mean For constant c, E(cY) = cE(Y) and Var(cY) = c²Var(Y). The CLT says that for large samples, the sample mean X̄ is approximately normal with a mean of μ and a variance of σ²/n. So, the variance of the sample mean decreases with n.
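The step from the sum to the sample mean can be written out by applying the two constant rules with c = 1/n. Here μ and σ² denote the population mean and variance, and X₁, …, Xₙ is the random sample:

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i,\\
E(\bar{X}) &= \frac{1}{n}\,E\!\Big(\sum_{i=1}^{n} X_i\Big) = \frac{1}{n}\,(n\mu) = \mu,\\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\,\operatorname{Var}\!\Big(\sum_{i=1}^{n} X_i\Big)
  = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}.
\end{align*}
```

The variance of the sum is nσ² because the observations in a random sample are independent.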
Sampling applet