1
Sample Means & Proportions
Week 7 Sample Means & Proportions
2
Variability of Summary Statistics
Variability in shape of distn of sample. Variability in summary statistics: mean, median, st devn, upper quartile, … Summary statistics have distributions.
3
Parameters and statistics
Parameter: describes underlying population; constant; Greek letter (e.g. μ, σ, π, …); unknown value in practice.
Summary statistic: random; Roman letter (e.g. m, s, p, …).
We hope statistic will tell us about corresponding parameter.
4
Distn of sample vs Sampling distn of statistic
Values in a single random sample have a distribution. Single sample --> single value for statistic. Sample-to-sample variability of statistic is its sampling distribution.
5
Means
Unknown population mean, μ. Sample mean, X̄, has a distribution: its sampling distribution. Usually x̄ ≠ μ. A single sample mean, x̄, gives us information about μ.
6
Sampling distribution of mean
If sample size, n, increases: Spread of distn of sample is (approx) same. Spread of sampling distn of mean gets smaller. x̄ is likely to be closer to μ. x̄ becomes a better estimate of μ.
7
Sampling distribution of mean
Population with mean μ, st devn σ. Random sample (n independent values). Sample mean, X̄, has sampling distn with: mean = μ, st devn = σ/√n. (We will deal later with the problem that μ and σ are unknown in practice.)
8
Weight loss
Estimate mean weight loss for those attending clinic for 10 weeks. Random sample of n = 25 people. Sample mean, x̄. How accurate? Let’s see, if the population distn of weight loss is: [figure: population distn of weight loss, not reproduced]
9
Some samples
Four random samples of n = 25 people:
Mean = 8.32 pounds, st devn = 4.74 pounds
Mean = 8.48 pounds, st devn = 5.27 pounds
Mean = 7.16 pounds, st devn = 5.93 pounds
N.B. In all samples, x̄ ≠ μ
10
Sampling distribution
Means from simulation of 400 samples. Theory: mean = μ = 8 lb, s.d.(X̄) = σ/√n = 1 lb. (How does this compare to simulation? To popn distn?)
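A minimal sketch of this kind of simulation. The slides show the population only as a figure, so a normal population with mean 8 lb and st devn 5 lb is assumed here purely for illustration; the function name and seed are also hypothetical.

```python
import random
import statistics

# ASSUMPTION: normal population, mean 8 lb, st devn 5 lb.
# The population shape on the slide is a figure, so this is a guess.
POP_MEAN = 8.0
POP_SD = 5.0


def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples random samples of size n; return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(n_samples)
    ]


means = simulate_sample_means(n=25, n_samples=400)

# Compare the simulated sampling distn with theory: mean mu, st devn sigma/sqrt(n).
print("mean of sample means   :", round(statistics.fmean(means), 2))
print("st devn of sample means:", round(statistics.stdev(means), 2))
print("theory                 :", POP_MEAN, POP_SD / 25 ** 0.5)
```

The simulated mean and st devn should land near the theoretical 8 lb and 1 lb, and well inside the spread of the population itself.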
11
Errors in estimation
Population: [figure not reproduced]. Sampling distribution of mean: mean = μ = 8 lb, s.d.(X̄) = 1 lb. From 70-95-100 rule, x̄ will be almost certainly within 8 ± 3 lb. x̄ is unlikely to be more than 3 lb in error, even if we didn’t know μ.
12
Increasing sample size, n
If we sample n = 100 people instead of 25: s.d.(X̄) = σ/√n = 0.5 lb. Larger samples give more accurate estimates.
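The effect of sample size on the st devn of the sample mean can be tabulated with a short sketch. The population st devn of 5 lb is an assumption (it is the value consistent with the 8 ± 3 lb statement for n = 25), and the helper name is hypothetical.

```python
import math

SIGMA = 5.0  # ASSUMPTION: population st devn in lb, inferred rather than stated


def sd_of_sample_mean(sigma, n):
    """St devn of the sampling distn of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Quadrupling n halves the st devn of the sample mean.
for n in (25, 100, 400):
    print(f"n = {n:3d}: s.d. of sample mean = {sd_of_sample_mean(SIGMA, n)} lb")
```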
13
Central Limit Theorem
If population is normal (μ, σ): X̄ is normal (μ, σ/√n).
If popn is non-normal with (μ, σ) but n is large: X̄ is approx normal (μ, σ/√n).
Guideline: n > 30 even if very non-normal
14
Other summary statistics
E.g. lower quartile, proportion, correlation. Usually not normal distns. Formula for standard devn of sampling distn sometimes available. Sampling distn usually close to normal if n is large.
15
Lottery problem Pennsylvania Cash 5 lottery
5 numbers selected from 1-39. Pick birthdays of family members (none 32-39). P(highest selected is 32 or over)? Statistic: H = highest of 5 random numbers (without replacement).
16
Lottery simulation Theory? Fairly hard.
Simulation: Generated 5 numbers (without replacement) 1560 times Highest number > 31 in about 72% of repetitions
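A sketch of how such a simulation might be run. The function names and the seed are illustrative, not taken from the slides. The full distribution of H is harder to derive, but this one probability also has a short counting check (all five numbers fall in 1-31 with probability C(31,5)/C(39,5)), which is included for comparison.

```python
import math
import random


def exact_prob_highest_over(threshold, pool=39, picks=5):
    """P(highest of `picks` numbers drawn without replacement from 1..pool exceeds threshold)."""
    # Complement: every picked number lies in 1..threshold.
    return 1 - math.comb(threshold, picks) / math.comb(pool, picks)


def simulated_prob_highest_over(threshold, reps, pool=39, picks=5, seed=0):
    """Estimate the same probability by repeated random draws."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.sample(range(1, pool + 1), picks)) > threshold
        for _ in range(reps)
    )
    return hits / reps


exact = exact_prob_highest_over(31)
simulated = simulated_prob_highest_over(31, reps=1560)
print(f"exact     : {exact:.4f}")
print(f"simulated : {simulated:.4f}")
```

With only 1560 repetitions the simulated proportion varies by a percentage point or two from run to run, so a result such as 72% is compatible with the counting answer of roughly 70%.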
17
Normal distributions Family of distributions (populations)
Shape depends only on parameters μ (mean) & σ (st devn). All have same symmetric ‘bell shape’. μ = 65 inches, σ = 2.7 inches
18
Importance of normal distn
A reasonable model for many data sets Transformed data often approx normal Sample means (and many other statistics) are approx normal.
19
Standard normal distribution
Z ~ Normal (μ = 0, σ = 1)
[Figure: standard normal density, axis marked from -3 to 3; shaded area = Prob (Z < z*)]
20
Probabilities for normal (0, 1)
Check from tables:
P(Z ≤ -3.00) = 0.0013
P(Z ≤ -2.59) = 0.0048
P(Z ≤ 1.31) = 0.9049
P(Z ≤ 2.00) = 0.9772
P(Z ≤ -4.75) = approx 0.000001
21
Probability Z > 1.31
P(Z > 1.31) = 1 – P(Z ≤ 1.31) = 1 – .9049 = .0951
22
Prob (Z between –2.59 and 1.31)
P(-2.59 ≤ Z ≤ 1.31) = P(Z ≤ 1.31) – P(Z ≤ -2.59) = .9049 – .0048 = .9001
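The table look-ups on the last three slides can be checked in software. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# Cumulative probabilities P(Z <= z), as read from a normal table.
for z in (-3.00, -2.59, 1.31, 2.00, -4.75):
    print(f"P(Z <= {z:5.2f}) = {Z.cdf(z):.6f}")

# Upper-tail probability: 1 minus the cumulative probability.
p_above = 1 - Z.cdf(1.31)

# Probability between two values: difference of cumulative probabilities.
p_between = Z.cdf(1.31) - Z.cdf(-2.59)

print(f"P(Z > 1.31)           = {p_above:.4f}")
print(f"P(-2.59 <= Z <= 1.31) = {p_between:.4f}")
```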
23
Standard devns from mean
Normal (μ, σ). Heights of students: μ = 65 inches, σ = 2.7 inches
24
Probability and area X ~ normal (μ = 65, σ = 2.7)
P (X ≤ 67.7) = area
25
Probability and area (cont.)
Normal (μ, σ). Exactly the 70-95-100 rule:
P(X within σ of μ) = approx 70%
P(X within 2σ of μ) = approx 95%
P(X within 3σ of μ) = approx 100%
26
Finding approx probabilities
Ht of college woman, X ~ normal (μ = 65, σ = 2.7). Prob (X ≤ 62)? Sketch normal density. Estimate area. P (X ≤ 62) = area. About 1/8
27
Translate question from X to Z
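X ~ Normal (μ, σ). Find P(X ≤ x*). Translate to z-score: z* = (x* – μ)/σ, where Z ~ Normal (μ = 0, σ = 1).
[Figure: normal density for X with x* marked, and standard normal density with z* marked on an axis from -3 to 3]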
28
Finding probabilities
Prob (height of randomly selected college woman ≤ 62)? z* = (62 – 65)/2.7 = –1.11, and P(Z ≤ –1.11) = 0.1335. About 13%.
29
Prob (X > value)
Ht of college woman, X ~ normal (μ = 65, σ = 2.7). Prob (X > 68 inches)?
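A sketch of the X-to-Z translation for the height questions, again using the standard library's `NormalDist`. The helper `z_score` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of st devns that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


MU, SIGMA = 65.0, 2.7          # heights of college women, inches
Z = NormalDist(0, 1)           # standard normal

# P(X <= 62): translate to a z-score, then look up the cumulative probability.
z_low = z_score(62, MU, SIGMA)
p_below_62 = Z.cdf(z_low)

# P(X > 68): upper tail, so subtract the cumulative probability from 1.
z_high = z_score(68, MU, SIGMA)
p_above_68 = 1 - Z.cdf(z_high)

print(f"z = {z_low:.2f},  P(X <= 62) = {p_below_62:.3f}")
print(f"z = {z_high:.2f},  P(X > 68)  = {p_above_68:.3f}")
```

Because 62 and 68 are the same distance from the mean of 65, the two probabilities are equal by symmetry.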
30
Finding upper quartile
Blood pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile? Step 1: Solve for z-score. Closest z* with area of 0.75 (tables): z* = 0.67. Step 2: Calculate x = z*σ + μ. x = (0.67)(10) + 120 = 126.7, or about 127.
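The two steps can be sketched with `NormalDist.inv_cdf`, which plays the role of reading the table backwards. Variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 120.0, 10.0

# Step 1: z-score with cumulative area 0.75 under the standard normal curve.
z_star = NormalDist(0, 1).inv_cdf(0.75)

# Step 2: convert the z-score back to the blood pressure scale.
x_75 = z_star * SIGMA + MU

print(f"z* = {z_star:.2f}")
print(f"75th percentile = {x_75:.1f}")
```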
31
Probabilities about means
Blood pressure ~ normal (μ = 120, σ = 10). 8 people given drug. If drug does not affect blood pressure, find P(average blood pressure > 130)
32
P (X̄ > 130)?
X̄ ~ normal (mean = 120, st devn = σ/√n = 10/√8 = 3.54), n = 8. prob = 0.0023
Very little chance!
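A sketch of the calculation behind this probability: the mean of 8 independent readings is normal with the same mean but st devn σ/√n. Variable names are illustrative.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 120.0, 10.0, 8

# Sampling distn of the mean of N independent values.
sd_mean = SIGMA / math.sqrt(N)
mean_dist = NormalDist(MU, sd_mean)

# Upper-tail probability that the average exceeds 130.
p_mean_over_130 = 1 - mean_dist.cdf(130)

print(f"st devn of sample mean = {sd_mean:.2f}")
print(f"P(mean > 130) = {p_mean_over_130:.4f}")
```

For comparison, a single person's reading exceeds 130 with probability of about 0.16, so averaging over 8 people makes such a high value far less likely.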
33
Distribution of sum
X ~ distn with (μ, σ). aX ~ distn with (aμ, aσ), e.g. miles to kilometers. Sum of n independent values ~ distn with (nμ, √n σ). Central Limit Theorem implies approx normal.
34
Probabilities about sum
Profit in 1 day ~ normal (μ = $300, σ = $200). Prob(total profit in week < $1,000)? Total = Prob = Assumes independence
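A sketch of how the total could be handled. The slide does not say how many trading days make up the week, so 7 days is an assumption here; change `N_DAYS` if the business trades on fewer days. Independence between days is assumed, as the slide notes.

```python
import math
from statistics import NormalDist

MU_DAY, SIGMA_DAY = 300.0, 200.0
N_DAYS = 7  # ASSUMPTION: a 7-day week; not stated on the slide

# Sum of N_DAYS independent values: means add, variances add.
total_mean = N_DAYS * MU_DAY
total_sd = math.sqrt(N_DAYS) * SIGMA_DAY

p_total_under_1000 = NormalDist(total_mean, total_sd).cdf(1000)

print(f"total ~ normal(mean = {total_mean:.0f}, st devn = {total_sd:.0f})")
print(f"P(total < 1000) = {p_total_under_1000:.4f}")
```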
35
Categorical data
Most important parameter is π = Prob (success). Corresponding summary statistic is p = Proportion (success). N.B. Textbook uses p and p̂.
36
Number of successes
Easiest to deal with count of successes before proportion. If…
1. n “trials” (fixed beforehand).
2. Only “success” or “failure” possible for each trial.
3. Outcomes are independent.
4. Prob (success) remains same for all trials, π. Prob (failure) is 1 – π.
X = number of successes ~ binomial (n, π)
37
Examples
38
Binomial Probabilities
P(X = k) = [n! / (k! (n – k)!)] π^k (1 – π)^(n – k) for k = 0, 1, 2, …, n. You won’t need to use this!! Prob (win game) = 0.2. Plays of game are independent. What is Prob (wins 2 out of 3 games)? What is P(X = 2)?
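For the game example, the binomial formula can be evaluated directly. This is a minimal sketch; `binomial_pmf` is a hypothetical helper name.

```python
import math


def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)


# Wins 2 out of 3 independent games, each with Prob(win) = 0.2.
p_two_wins = binomial_pmf(2, n=3, pi=0.2)
print(f"P(X = 2) = {p_two_wins:.3f}")

# The probabilities over k = 0, 1, 2, 3 sum to 1.
print(sum(binomial_pmf(k, 3, 0.2) for k in range(4)))
```

There are 3 ways to choose which game is lost, each with probability 0.2 × 0.2 × 0.8, giving 3 × 0.032 = 0.096.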
39
Mean & st devn of Binomial
For a binomial (n, π): mean = nπ, st devn = √(nπ(1 – π))
40
Extraterrestrial Life?
50% of large population would say “yes” if asked, “Do you believe there is extraterrestrial life?” Sample of n = 100. X = # “yes” ~ binomial (n = 100, π = 0.5)
41
Extraterrestrial Life?
Sample of n = 100. X = # “yes” ~ binomial (n = 100, π = 0.5). 70-95-100 rule of thumb for # “yes”: about 95% chance of between 40 & 60; almost certainly between 35 & 65.
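The intervals come from the binomial mean and st devn. The sketch below computes them and, as a check that goes beyond the slide, sums exact binomial probabilities over 40 to 60. Helper names are illustrative.

```python
import math


def binomial_mean_sd(n, pi):
    """Mean n*pi and st devn sqrt(n*pi*(1 - pi)) of a binomial(n, pi) count."""
    return n * pi, math.sqrt(n * pi * (1 - pi))


def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)


mean, sd = binomial_mean_sd(100, 0.5)
within_2sd = (mean - 2 * sd, mean + 2 * sd)
within_3sd = (mean - 3 * sd, mean + 3 * sd)

# Exact chance of between 40 and 60 "yes" answers, for comparison with "about 95%".
p_40_to_60 = sum(binomial_pmf(k, 100, 0.5) for k in range(40, 61))

print("mean, st devn:", mean, sd)
print("mean +/- 2 sd:", within_2sd)
print("mean +/- 3 sd:", within_3sd)
print(f"exact P(40 <= X <= 60) = {p_40_to_60:.3f}")
```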
42
Normal approx to binomial
If X is binomial (n, π), and n is large, then X is also approximately normal, with mean nπ and st devn √(nπ(1 – π)). Conditions: Both nπ and n(1 – π) are at least 10. (Justified by Central Limit Theorem)
43
Number of H in 30 Flips
X = # heads in n = 30 flips of fair coin. X ~ binomial (n = 30, π = 0.5). Bell-shaped & approx normal.
44
Opinion poll n = 500 adults; 240 agreed with statement
If π = 0.5 of all adults agree, what is P(X ≤ 240)? X is approx normal with mean nπ = 250 and st devn √(nπ(1 – π)) ≈ 11.2. Not unlikely to see 48% or less, even if 50% in population agree.
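A sketch of the normal approximation for this poll, with no continuity correction (the slides do not mention one, so none is applied here). Variable names are illustrative.

```python
import math
from statistics import NormalDist

N, PI = 500, 0.5

# Conditions for the normal approximation: both expected counts at least 10.
assert N * PI >= 10 and N * (1 - PI) >= 10

approx_mean = N * PI
approx_sd = math.sqrt(N * PI * (1 - PI))

# P(X <= 240) under the approximating normal distn.
p_240_or_fewer = NormalDist(approx_mean, approx_sd).cdf(240)

print(f"mean = {approx_mean:.0f}, st devn = {approx_sd:.2f}")
print(f"P(X <= 240) = {p_240_or_fewer:.3f}")
```

A probability of this size (a bit under 0.2) is what the slide means by "not unlikely".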
45
Sample Proportion
Suppose (unknown to us) 40% of a population carry the gene for a disease (π = 0.40). Random sample of 25 people; X = # with gene. X ~ binomial (n = 25, π = 0.4). p = proportion with gene
46
Distn of sample proportion
X ~ binomial (n, π). p = X/n. Large n: p is approx normal with mean π and st devn √(π(1 – π)/n) (nπ ≥ 10 & n(1 – π) ≥ 10)
47
Examples Election Polls: to estimate proportion who favor a candidate; units = all voters. Television Ratings: to estimate proportion of households watching TV program; units = all households with TV. Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers. Testing ESP: to estimate probability a person can successfully guess which of 5 symbols on a hidden card; repeatable situation = a guess.
48
Public opinion poll Suppose 40% of all voters favor Candidate A.
Pollsters sample n = 2400 voters. Propn voting for A is approx normal with mean 0.4 and st devn √(0.4 × 0.6/2400) = 0.01. Simulation 400 times & theory.
49
Probability from normal approx
If 40% of voters favor Candidate A, and n = 2400 sampled: sample proportion, p, is almost certain to be between 0.37 and 0.43. Prob 0.95 of p being between 0.38 and 0.42.
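These intervals follow from the st devn of the sample proportion, √(π(1 – π)/n). A minimal sketch, with a hypothetical helper name:

```python
import math


def sd_of_sample_proportion(pi, n):
    """St devn of the sampling distn of a sample proportion."""
    return math.sqrt(pi * (1 - pi) / n)


PI, N = 0.40, 2400
sd_p = sd_of_sample_proportion(PI, N)

# 70-95-100 rule: within 2 st devns about 95% of the time, within 3 almost always.
interval_95 = (round(PI - 2 * sd_p, 2), round(PI + 2 * sd_p, 2))
interval_almost_all = (round(PI - 3 * sd_p, 2), round(PI + 3 * sd_p, 2))

print(f"st devn of p = {sd_p:.4f}")
print("about 95%     :", interval_95)
print("almost certain:", interval_almost_all)
```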