Lecture 15: Statistics and Their Distributions, Central Limit Theorem


1 Lecture 15: Statistics and Their Distributions, Central Limit Theorem
Devore, Ch

2 Topics
The Concept of a “Statistic”
Independent, Identically Distributed (iid) Samples
Deriving the Sampling Distribution of a Statistic
  By Probability Rules
  By Simulation
  Application – Tolerances
Distribution of the Sample Mean / Total
Central Limit Theorem
Distribution of a Linear Combination

3 I. Concepts of a “Statistic”
Consider taking two samples of size n from the same population distribution.
A: 30.7, 29.4,  Mean 30.4
B: 28.8, 30.0,  Mean 29.97
Which group has the larger mean?
Propositions
Because the individual values xi drawn from a population distribution are uncertain, each observation is a random variable (r.v.).
It follows that any statistic calculated from a sample also varies from sample to sample.

4 Example: Minitab
Suppose X ~ Weibull (shape = 2, scale = 5), so that E(X) = 4.431 and V(X) = 5.365.
Using Minitab, generate samples of size 10 and observe how the sample mean and variance differ from sample to sample. Results shown are from Devore, p
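The same experiment can be sketched without Minitab. The following is a minimal Python illustration, assuming the standard library's `random.weibullvariate(scale, shape)` as the generator; the seed, the number of samples (6), and the helper name `draw_sample` are arbitrary choices, not part of the lecture.

```python
import math
import random
import statistics

SHAPE, SCALE = 2.0, 5.0  # Weibull parameters from the slide

# Theoretical moments of a Weibull(shape, scale) distribution.
theory_mean = SCALE * math.gamma(1 + 1 / SHAPE)
theory_var = SCALE**2 * (math.gamma(1 + 2 / SHAPE) - math.gamma(1 + 1 / SHAPE) ** 2)

def draw_sample(n, rng):
    """Draw n iid Weibull values; weibullvariate takes (scale, shape)."""
    return [rng.weibullvariate(SCALE, SHAPE) for _ in range(n)]

rng = random.Random(15)  # arbitrary seed, chosen only for reproducibility
samples = [draw_sample(10, rng) for _ in range(6)]
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: mean={statistics.mean(s):.3f}  var={statistics.variance(s):.3f}")
print(f"theory:   mean={theory_mean:.3f}  var={theory_var:.3f}")
```

Each sample of 10 gives a different mean and variance, scattered around the theoretical values.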

5 Point Estimates / Sampling Distributions
Point Estimate – the value of a sample statistic computed from a particular sample.
Statistic – an rv whose value may be calculated from a sample of data. An uppercase letter denotes the statistic; the lowercase letter denotes its calculated or observed value (S → s).
Sampling Distribution – the probability distribution of a statistic.

6 II. iid Random Samples
The sampling distribution depends on several items:
  Population distribution (parameters)
  Sample size, n
  Method of sampling (with or without replacement)
The rv’s X1, X2, .., Xn form a random sample of size n if:
  The Xi’s are independent rv’s (independent)
  Every Xi has the same probability distribution (identically distributed)
If both conditions are satisfied, we say the Xi’s are iid.
  Sampling with replacement, or from an infinite population → iid
  Sampling w/o replacement requires the sample size n to be much smaller than the population size N to assume iid (rule: n/N <= 0.05).

7 III. Deriving Sampling Distribution of a Statistic
By Probability Rules
  used for simple cases with a few Xi’s
  used for cases where the derivation has already been done
By Simulation (more common!)
  typically used when derivation via probability rules is complicated, or if the underlying distribution of interest is unknown (assumed)
  We use, we don’t derive!

8 Deriving Via Simple Probability
Example: Suppose you sell two brands of DVD players: A for $150 and B for $200. Sales records indicate that A accounts for 60% of sales and B for 40%. Suppose you take samples of n = 2 sales, and let X1 be the revenue from the first sale and X2 the revenue from the second. List the possible outcomes (x1, x2), their probabilities p(x1, x2), and the sample mean and variance of each.

9 DVD Example: Sampling Distribution
Compute E(X-bar) and V(X-bar). How do they relate to the mean and variance of the original population distribution?

10 DVD Example n=3
Now, what is the relationship between the mean and variance of the original distribution of X versus those of X-bar?
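The DVD example can be worked by brute-force enumeration. This is a minimal Python sketch, assuming each sale in the sample is independently $150 with probability 0.6 or $200 with probability 0.4; the function names are illustrative only.

```python
from itertools import product

PRICES = {150: 0.6, 200: 0.4}  # revenue per sale -> probability, from the slide

def sampling_distribution_of_mean(n):
    """Enumerate every ordered sample of size n and collect P(X-bar = value)."""
    dist = {}
    for outcome in product(PRICES, repeat=n):
        prob = 1.0
        for x in outcome:
            prob *= PRICES[x]
        xbar = sum(outcome) / n
        dist[xbar] = dist.get(xbar, 0.0) + prob
    return dist

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

pop_mu, pop_var = mean_and_variance(PRICES)
for n in (2, 3):
    mu, var = mean_and_variance(sampling_distribution_of_mean(n))
    print(f"n={n}: E(X-bar)={mu:.2f}, V(X-bar)={var:.2f}, sigma^2/n={pop_var / n:.2f}")
```

For both n = 2 and n = 3, E(X-bar) equals the population mean (170) and V(X-bar) equals the population variance (600) divided by n.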

11 Deriving Sampling Distributions for Continuous Variables
Similar to discrete distributions, we can also derive the sampling distributions of continuous variables.

12 Example: Two Exponential
Exponential Distribution: f(x; λ) = λ e^(−λx), x ≥ 0; E(X) = 1/λ; V(X) = 1/λ². Suppose you have two independent rv’s, each following an exponential distribution, and you are interested in the sum of the two rv’s (n = 2). It can be shown that:
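A standard result consistent with this setup, assuming the two variables share the same rate λ (the slide's own equation may have been stated differently), is that the total T_o = X_1 + X_2 has a gamma density with shape 2. A sketch of the convolution:

```latex
f_{T_o}(t) = \int_0^t \lambda e^{-\lambda x}\,\lambda e^{-\lambda (t - x)}\,dx
           = \lambda^2 t\, e^{-\lambda t}, \qquad t \ge 0,
\qquad E(T_o) = \frac{2}{\lambda}, \quad V(T_o) = \frac{2}{\lambda^2}.
```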

13 Practical Applications
For many well-known distributions, the sampling distributions of their primary statistics (mean, variance) have already been determined. In those cases where the sampling distribution is unknown or complicated, a very useful alternative is simulation.

14 Simulation Experiment
To perform a simulation, you need:
  statistic of interest (e.g., X-bar, S, median, ..)
  population distribution (e.g., normal, uniform, ..)
  sample size n (e.g., n=10, n=100)
  number of replications k (e.g., k=500)
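The four ingredients map directly onto the arguments of a small helper. This is a minimal Python sketch; the function name `simulate_statistic` and the choice of the median of uniform(0, 1) values as the example are assumptions made for illustration.

```python
import random
import statistics

def simulate_statistic(statistic, draw, n, k, seed=0):
    """Return k simulated values of `statistic`, each computed from n draws."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(k)]

# Example: sampling distribution of the median of n=10 uniform(0, 1) values.
medians = simulate_statistic(statistics.median, lambda rng: rng.random(), n=10, k=500)
print(f"mean of medians = {statistics.mean(medians):.3f}")
print(f"sd of medians   = {statistics.stdev(medians):.3f}")
```

A histogram of the k values returned approximates the sampling distribution of the chosen statistic.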

15 Simulation #1: Range Vs. S
Conduct an experiment to determine the relationship between Range and S for n=2, n=5, and n=100. Assume X ~ N(0, 1²)
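One way to run this experiment is to track the average ratio Range / S at each sample size. This is a minimal Python sketch, assuming standard normal data and k = 500 replications; the summary by ratio and the function name are illustrative choices, not the lecture's prescribed method.

```python
import random
import statistics

def range_to_s_ratio(n, k=500, seed=1):
    """Average ratio Range / S over k samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(k):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ratios.append((max(sample) - min(sample)) / statistics.stdev(sample))
    return statistics.mean(ratios)

ratios = {n: range_to_s_ratio(n) for n in (2, 5, 100)}
for n, r in ratios.items():
    print(f"n={n:>3}: mean(Range / S) = {r:.3f}")
```

For n = 2 the ratio is exactly sqrt(2), since S = |x1 - x2| / sqrt(2); the ratio grows as n increases, so Range and S are not interchangeable across sample sizes.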

16 Simulation #2: Jointly Distributed Variables with Tolerance Stack-Up
Develop tolerances of mean ± 4σ for the volume of an engine cylinder whose bore ~ N(81 mm, mm) and stroke ~ N(83.5 mm, mm). What is the volume equation? 25.4 mm = 1 in, 1 L = 10^6 mm^3
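A simulation sketch for the tolerance stack-up, assuming the cylinder volume is (π/4) · bore² · stroke and that bore and stroke are independent. The standard deviations below (0.05 mm each) are HYPOTHETICAL placeholders, since the slide's values are not available; substitute the real ones before drawing any conclusion.

```python
import math
import random
import statistics

MM3_PER_LITER = 1e6

def cylinder_volume_liters(bore_mm, stroke_mm):
    """Swept volume of one cylinder: (pi / 4) * bore^2 * stroke, in liters."""
    return math.pi / 4 * bore_mm**2 * stroke_mm / MM3_PER_LITER

def simulate_volume(sd_bore, sd_stroke, k=5000, seed=2):
    """Simulate k volumes with independent normal bore and stroke."""
    rng = random.Random(seed)
    return [
        cylinder_volume_liters(rng.gauss(81.0, sd_bore), rng.gauss(83.5, sd_stroke))
        for _ in range(k)
    ]

# HYPOTHETICAL standard deviations (mm); the slide's values are not available.
volumes = simulate_volume(sd_bore=0.05, sd_stroke=0.05)
mean_v, sd_v = statistics.mean(volumes), statistics.stdev(volumes)
print(f"nominal volume = {cylinder_volume_liters(81.0, 83.5):.4f} L")
print(f"tolerance (mean +/- 4 sd) = {mean_v - 4 * sd_v:.4f} to {mean_v + 4 * sd_v:.4f} L")
```

Simulation is convenient here because the volume is a nonlinear function of the bore, so the linear-combination variance rules do not apply directly.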

17 IV. Distribution of Sample Mean/ Total
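Proposition - Let X1, X2, .., Xn be a random sample from a distribution with mean value μ and standard deviation σ. Then:
  E(X-bar) = μ
  V(X-bar) = σ²/n, so σ_X-bar = σ/√n
Let the total be To = X1 + X2 + ... + Xn. Then:
  E(To) = nμ
  V(To) = nσ², so σ_To = √n σ
Note the difference between averaging and summing rv’s.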

18 Sample Problem: Using the Avg or Sum of rv’s
Let Y = # parking tickets issued on any given weekday. Suppose Y has a Poisson distribution with λ = 50. Assuming you may approximate with a normal distribution:
a) What are the mean and variance of the avg # tickets per 5-day week?
b) What are the mean and variance of the sum of tickets per 5-day week?
c) What is the probability that the average # tickets per 5-day week is less than 48?
d) What is the probability that the total # tickets per 5-day week is between 225 and 275?
Solution
a) Daily mean = 50, daily variance = 50. E(X-bar) = 50; V(X-bar) = 50/5 = 10; σ_X-bar = 3.162
b) E(To) = 5*50 = 250; V(To) = 5*50 = 250; σ_To = sqrt(250) = 15.81
c) X-bar < 48 is the same event as To <= 239. With a continuity correction, P(X-bar < 48) ≈ Φ((239.5 - 250)/15.81) = Φ(-0.66) ≈ 0.25
d) With a continuity correction, Zu = (275.5 - 250)/15.81 = 1.61 and Zl = (224.5 - 250)/15.81 = -1.61, so P(225 <= To <= 275) ≈ Φ(1.61) - Φ(-1.61) ≈ 0.89
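The parking-ticket calculations can be checked numerically. This is a minimal Python sketch, assuming the five daily counts are independent, using E(X-bar) = μ, V(X-bar) = σ²/n, E(To) = nμ, V(To) = nσ², and applying a 0.5 continuity correction on the total; `statistics.NormalDist` supplies the normal CDF.

```python
from math import sqrt
from statistics import NormalDist

LAM, N_DAYS = 50, 5  # Poisson mean per weekday and days per week, from the slide

mean_avg, var_avg = LAM, LAM / N_DAYS            # E(X-bar) = mu, V(X-bar) = sigma^2 / n
mean_tot, var_tot = N_DAYS * LAM, N_DAYS * LAM   # E(To) = n*mu, V(To) = n*sigma^2

total = NormalDist(mean_tot, sqrt(var_tot))

# X-bar < 48 is the same event as To <= 239; apply a 0.5 continuity correction.
p_avg_below_48 = total.cdf(239.5)
# 225 <= To <= 275, again with the continuity correction on both ends.
p_total_225_275 = total.cdf(275.5) - total.cdf(224.5)

print(f"V(X-bar) = {var_avg}, sd = {sqrt(var_avg):.3f}")
print(f"V(To) = {var_tot}, sd = {sqrt(var_tot):.3f}")
print(f"P(X-bar < 48) ~ {p_avg_below_48:.3f}")
print(f"P(225 <= To <= 275) ~ {p_total_225_275:.3f}")
```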

19 V. Central Limit Theorem (CLT)
Let X1, X2, .., Xn be a random sample from a distribution with mean value μ and variance σ². If n is sufficiently large, then X-bar is approximately N(μ, σ²/n) and To is approximately N(nμ, nσ²).
Rule of Thumb: n > 30. But a much smaller n can be enough!

20 Understanding the CLT
Using Minitab, let us generate 100 groups of service times (4 samples per group) from an exponential distribution with mean = 20 min. Describe what is happening to the distribution.
[Figures: histogram of individual times; histogram of group averages]

21 Increasing sample size
What is happening to the distribution of the sample averages? (Note: underlying distribution - exponential)
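The effect of increasing the sample size can be sketched in Python instead of Minitab. The code below assumes 100 groups of exponential service times with mean 20 min, and uses sample skewness as a rough numeric stand-in for "how bell-shaped" the histogram looks; the group sizes 1, 4, 16, 64 and the function names are illustrative choices.

```python
import random
import statistics

def group_averages(n, groups=100, mean=20.0, seed=3):
    """Average n exponential service times per group; expovariate takes rate = 1/mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.expovariate(1.0 / mean) for _ in range(n))
        for _ in range(groups)
    ]

def skewness(values):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s**3)

for n in (1, 4, 16, 64):
    avgs = group_averages(n)
    print(f"n={n:>2}: sd of averages={statistics.stdev(avgs):6.2f} "
          f"(theory {20 / n ** 0.5:5.2f}), skewness={skewness(avgs):5.2f}")
```

As n grows, the spread of the averages shrinks roughly like 20/√n, and the skewness tends toward 0, the value for a normal distribution.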

22 Average Multiple Distributions
Suppose you have samples from 3 different distributions (e.g., exponential, Weibull, and uniform). Minitab results from exponential (λ = 20), Weibull (shape = 2, scale = 12), and uniform (20, 80).
[Figures: all 300 observations; sample averages]

23 Summarizing the CLT
Regardless of the underlying distribution, averaging produces a distribution that is more bell-shaped than the original.
Usefulness of CLT
  If n is sufficiently large and we wish to compute a probability for the sample mean, we may approximate its distribution with a normal.
  CLT provides analytical robustness!
  How robust depends on n and the underlying distribution: the more closely the underlying distribution resembles a normal (bell shape), the smaller the n that is needed.

24 Other Applications Bernoulli Trials (Binomial Distribution)
Let a sample of size n consist of Bernoulli trials Xi (where each trial equals 0 for failure, 1 for success). As n (# of trials) becomes large, with both np > 10 and nq > 10, the number of successes (mean np) and the sample proportion become approximately normally distributed. Consider the following example: given 10K Bernoulli trials grouped into samples of size 100, what will be the distribution of the group results?
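The grouping experiment can be sketched as follows. The success probability p = 0.3 is a HYPOTHETICAL value chosen for illustration (the slide does not state one); with n = 100 it satisfies both np > 10 and nq > 10.

```python
import random
import statistics

def group_proportions(p, trials=10_000, group_size=100, seed=4):
    """Split `trials` Bernoulli(p) outcomes into groups and return each group's proportion."""
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < p else 0 for _ in range(trials)]
    return [
        sum(outcomes[i:i + group_size]) / group_size
        for i in range(0, trials, group_size)
    ]

P = 0.3  # HYPOTHETICAL success probability; the slide does not state one
props = group_proportions(P)
print(f"groups = {len(props)}")
print(f"mean of proportions = {statistics.mean(props):.3f} (theory {P})")
print(f"sd of proportions   = {statistics.stdev(props):.4f} "
      f"(theory {(P * (1 - P) / 100) ** 0.5:.4f})")
```

The 100 group proportions should cluster around p with standard deviation close to sqrt(pq/n), even though each individual trial takes only the values 0 and 1.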

25 Bernoulli Trial Example
What does this experiment show about the importance of sample size, particularly for binary attributes?

26 Rules of Thumb with CLT
How large a sample size do you need to invoke the CLT?
  Uniform: n >= 4
  Symmetric triangular: n >= 3
  Normal: n >= 1
  Unimodal with extreme skew (e.g., exponential): n >= 30
Discrete - apply normal approximation rules
  Binomial: np >= 10 for p < 0.5 (or, np >= 10 and nq >= 10)
  Poisson: λ >= 15

27 VI. Distribution of Linear Combination (Independent Xi’s)
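Let X1, X2, .., Xn be a collection of random variables and a1, a2, .., an be constants. Then the linear combination is
  Y = a1X1 + a2X2 + ... + anXn
with E(Y) = a1E(X1) + a2E(X2) + ... + anE(Xn).
If X1, X2, .., Xn are independent:
  V(Y) = a1²V(X1) + a2²V(X2) + ... + an²V(Xn)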

28 Differences Between Variables
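If Y = a1X1 - a2X2 with X1 and X2 independent:
  E(a1X1 - a2X2) = a1E(X1) - a2E(X2)
  V(a1X1 - a2X2) = a1²V(X1) + a2²V(X2)
In particular, E(X1 - X2) = E(X1) - E(X2) and V(X1 - X2) = V(X1) + V(X2).
Regardless of whether the Xi are added or subtracted, the variances are additive!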

29 Linear Combination: Tolerance
Suppose you need to slide tube A into tube B. What is the distribution of the assembly clearance, B - A, if tube A is N(24.8, 0.05²) and tube B is N(25, 0.05²)? Assume the tube measurements are independent.
  E(B) - E(A) = 25 - 24.8 = 0.2
  σ_clear = sqrt(0.05² + 0.05²) = 0.07

30 Weighted Linear Combination: Tolerance
Suppose you are welding two pieces of metal together: a thick piece and a thin piece. Let Xthin be the position of the thin piece and Xthick the position of the thick piece. From experience, you find the final position is based on the following:
  Yassembly = 0.2 Xthin + 0.8 Xthick
What is the expected variance of the assembly if the standard deviation of the thin piece is 0.4 mm and the standard deviation of the thick piece is 0.15 mm? (Assume the measurements of the pieces are independent.)
  V(Yasm) = .2^2*.4^2 + .8^2*.15^2 = 0.0208
  σ_asm = 0.144
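Both tolerance examples follow the same rule for independent variables, E(Y) = Σ ai E(Xi) and V(Y) = Σ ai² V(Xi). This is a minimal Python sketch; the helper name `linear_combination` is illustrative, and the 0.8 weight on the thick piece is taken from the variance calculation in the weld example.

```python
from math import sqrt

def linear_combination(coeffs, means, sds):
    """Mean and standard deviation of Y = sum(a_i * X_i) for independent X_i."""
    mean_y = sum(a * m for a, m in zip(coeffs, means))
    var_y = sum(a**2 * s**2 for a, s in zip(coeffs, sds))
    return mean_y, sqrt(var_y)

# Tube clearance: Y = B - A with B ~ N(25, 0.05^2) and A ~ N(24.8, 0.05^2).
clear_mean, clear_sd = linear_combination((1, -1), (25.0, 24.8), (0.05, 0.05))

# Weld position: Y = 0.2 * X_thin + 0.8 * X_thick. The means are not given in
# the example, so zeros are used as placeholders; only the sd is meaningful.
_, asm_sd = linear_combination((0.2, 0.8), (0.0, 0.0), (0.4, 0.15))

print(f"clearance: mean={clear_mean:.2f}, sd={clear_sd:.4f}")
print(f"assembly:  sd={asm_sd:.4f}")
```

Note that the coefficient -1 in the clearance example is squared, which is why subtracting a variable still adds its variance.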

