Chapter 3 Generalization: How broadly do the results apply?


Generalization So far we’ve studied significance and estimation. Once we draw a conclusion from a test of significance or construct a confidence interval, how broadly do the results apply? To what population can we generalize them? This generalization is the topic for this section.

Generalization Sometimes this generalization is difficult and sometimes it is not. Generalizing to a larger population is valid only when the sample is representative. Unfortunately, biased sampling methods are common.

Section 3.1 Introduction to sampling from a finite population

Notation Check
Statistics: x̄ (x-bar) Sample Average or Mean; p̂ (p-hat) Sample Proportion
Parameters: μ (mu) Population Average or Mean; π (pi) Population Proportion
Statistics summarize a sample and parameters summarize a population.

Sampling Hope College students Suppose we want to know the proportion of Hope students that watched the Super Bowl. Or the average number of traffic tickets Hope students have received. The population of interest is all Hope students. A census will get this information from all Hope students. What if you don’t have time/money to interview all students?

Sampling We can take a sample of Hope students and find the proportion of those in our sample that watched the Super Bowl or mean number of traffic tickets they have received. Using these statistics we can make inferences to the parameters. How well will these statistics represent our parameters of interest? The key to this question is how the sample is selected from the population.

Random Sampling Getting a random sample is key to making a good inference. This can be tough; we don’t live in a random world. For example, the people you see on a daily basis can be very different from the people others near to you see on a daily basis. When samples are not random or representative their results can be misleading.

Biased Sampling

ESPN Top 10: What is college basketball's fiercest rivalry?
Connecticut vs. Tennessee (Women)
Duke vs. North Carolina
Hope vs. Calvin
Illinois vs. Missouri
Indiana vs. Purdue
Louisville vs. Kentucky
Penn vs. Princeton
Philadelphia's Big 5
Oklahoma vs. Oklahoma State
Xavier vs. Cincinnati
http://proxy.espn.go.com/chat/sportsnation/polling?event_id=1194

ESPN Top 10: What is college basketball's fiercest rivalry?
75.1% Hope vs. Calvin
9.3% Duke vs. North Carolina
5.4% Indiana vs. Purdue
5.2% Philadelphia's Big 5
1.7% Penn vs. Princeton
1.5% Oklahoma vs. Oklahoma State
0.7% Louisville vs. Kentucky
0.6% Connecticut vs. Tennessee (Women)
0.3% Illinois vs. Missouri
0.3% Xavier vs. Cincinnati
Total Votes: 46,084

2012 State ACT Results New York ranked 6th with an average of 23.3. Michigan ranked 45th with an average of 20.1.

2011 State SAT Results New York ranked 45th with an average of 1466. Michigan ranked 6th with an average of 1762. ???
       MI     NY
ACT    100%   29%
SAT    4%     90%

Random Sample To have a random sample, you can’t let people select themselves into the sample. (Basketball poll) You can’t choose a convenience sample that is clearly not representative of the population. (ACT vs. SAT)

Random Sample A simple random sample is the easiest way to ensure that your sampling method is unbiased. A sampling method is biased if statistics from samples consistently overestimate or underestimate the population parameter.

Simple Random Sample A simple random sample is like drawing names out of a hat. Technically, a simple random sample is a way of randomly selecting members of a population so that every sample of a certain size from a population has the same chance of being chosen.
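The "names out of a hat" idea can be sketched in a few lines of Python. This is only an illustration: the roster below is a made-up stand-in for a population list, and the helper name `simple_random_sample` is hypothetical.

```python
import random

# Hypothetical roster standing in for the population (not real data).
population = [f"student_{i}" for i in range(1, 101)]

def simple_random_sample(pop, n, seed=None):
    """Draw n distinct members so that every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(pop, n)  # sampling without replacement

sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no student can appear twice, which matches drawing slips of paper from a hat.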

Sampling Every simple random sample gives us different values for the statistics. There is variability from sample to sample (sampling variability). If we take repeated simple random samples of Hope students, each sample will consist of different students. We will get different means or proportions each time we do this. However …

Sampling The sample means or proportions will center around the population mean or proportion if the sampling method is unbiased (like a simple random sample). Sampling variability decreases as the sample size increases.
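A small simulation can make both claims concrete. The sketch below assumes a made-up population of traffic-ticket counts (the values and the population size of 3,000 are illustrative only) and draws repeated simple random samples at three sample sizes.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: number of traffic tickets for 3,000 students.
population = [rng.choice([0, 0, 0, 0, 1, 1, 2, 3]) for _ in range(3000)]
pop_mean = statistics.fmean(population)

def sample_means(pop, n, reps=2000):
    """Means of `reps` simple random samples, each of size n."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]

results = {}
for n in (10, 40, 160):
    means = sample_means(population, n)
    # Store the center and the spread of the simulated sample means.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}  center={results[n][0]:.3f}  spread={results[n][1]:.3f}")

print(f"population mean = {pop_mean:.3f}")
```

In a run of this sketch, the centers should all sit close to the population mean, while the spread should shrink as n grows.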

Exploration 3.1A: Sampling Words We need to sample from a population of interest if it is very large or if it is difficult to measure every single member of the population. If we were interested in High School GPA for Hope students we would not need to sample. The registrar’s office has all that information. If we were interested in something that has not already been collected, we might want to sample.

Exploration 3.1A: Sampling Words That being said, in this activity we will be using the words in the Gettysburg Address as our population. There are fewer than 300 words in this speech and we could easily look at the entire speech to find out average word length, proportion of words that contain an e, etc. We will be sampling from this speech not to get information about the population, but to help us learn some things about sampling.

Only picture of Lincoln at Gettysburg (Edward Everett spoke for over two hours. Lincoln followed with his two-minute speech.)

Exploration 3.1A Select what you think is a representative sample of 10 words from the Gettysburg Address (pg 3-10). Record your words in the table in question 2. Make dotplots of both average length and proportion containing e on the board. Only work through question 22. HW: Exercises 3.1.3 and 3.1.4

Review of Section 3.1 A sampling method is biased if statistics from samples consistently overestimate or underestimate the population parameter. A simple random sample is the easiest way to ensure that your sampling method is unbiased. Therefore, if we have a simple random sample, we can generalize our results to the population from which it was drawn.
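The difference between a biased and an unbiased method can be sketched with a simulation. Two assumptions are made here for illustration: the population is only the opening sentence of the Gettysburg Address (not the full speech), and the biased method is modeled as picking words with probability proportional to their length, which is just one plausible way "choosing by eye" could go wrong.

```python
import random
import statistics

rng = random.Random(7)

# Population: only the opening sentence of the Gettysburg Address,
# not the full speech used in the exploration.
text = ("Four score and seven years ago our fathers brought forth on this "
        "continent, a new nation, conceived in Liberty, and dedicated to the "
        "proposition that all men are created equal.")
words = [w.strip(",.") for w in text.split()]
lengths = [len(w) for w in words]
pop_mean = statistics.fmean(lengths)

def srs_mean(n=10):
    """Average word length in a simple random sample of n words."""
    return statistics.fmean(len(w) for w in rng.sample(words, n))

def eyeball_mean(n=10):
    """Assumed biased method: longer words are more likely to be picked."""
    picked = rng.choices(words, weights=lengths, k=n)  # with replacement
    return statistics.fmean(len(w) for w in picked)

REPS = 2000
srs_center = statistics.fmean(srs_mean() for _ in range(REPS))
biased_center = statistics.fmean(eyeball_mean() for _ in range(REPS))

print(f"population mean length:   {pop_mean:.2f}")
print(f"center of SRS means:      {srs_center:.2f}")
print(f"center of 'eyeball' means: {biased_center:.2f}")
```

Under these assumptions the simple random samples should center on the population mean, while the length-weighted samples should consistently overestimate it.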

Review of Section 3.1 We saw biased and unbiased sampling in the Gettysburg Address exploration. We also saw that: When we increase sample size, the variability of our sampling distribution decreases. This variability can be predicted. Changing the population size has no effect on variability, provided the population is much larger than the sample.
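The claim about population size can be checked with a rough simulation. The sketch below builds two hypothetical populations of word lengths that differ only in size (2,000 versus 200,000 values, both invented for illustration) and compares the spread of sample means for samples of size 20.

```python
import random
import statistics

rng = random.Random(3)

def make_population(size):
    """Hypothetical word lengths, generated the same way for any size."""
    return [rng.randint(1, 11) for _ in range(size)]

def spread_of_sample_means(pop, n=20, reps=2000):
    """Standard deviation of the means of `reps` simple random samples."""
    means = [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]
    return statistics.stdev(means)

small = make_population(2_000)
large = make_population(200_000)

spread_small = spread_of_sample_means(small)
spread_large = spread_of_sample_means(large)

print(f"spread with population of   2,000: {spread_small:.3f}")
print(f"spread with population of 200,000: {spread_large:.3f}")
```

The two spreads should be nearly equal even though one population is 100 times larger, because both are far larger than the sample of 20.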

Figures: Population distribution of word lengths; distribution of average word length from samples of size 20.

Section 3.2: Inference for a Single Quantitative Variable Using methods similar to what we did in the last section, we will see how a null distribution for a single quantitative variable can be obtained and even predicted.

Example 3.2: Estimating Elapsed Time

Estimating Time Does it ever seem that time drags or flies by? Students in a stats class (for their project 2) collected data on students’ perception of time. Subjects were told that they’d listen to music and be asked questions when it was over. The researchers played 10 seconds of the Jackson 5’s “ABC” and asked subjects how long they thought it lasted. Can students accurately estimate the length?

Hypotheses Null Hypothesis: People will accurately estimate the length of a 10-second song snippet, on average. (μ = 10 seconds) Alternative Hypothesis: People will not accurately estimate the length of a 10-second song snippet, on average. (μ ≠ 10 seconds)

Estimating Time A convenience sample of 48 students on campus served as subjects, and their song length estimates were recorded. The average estimate was 13.71 sec and the standard deviation was 6.50 sec.

Skewed, mean, median The distribution obtained is not symmetric, but is right skewed. When data are skewed right, the mean gets pulled out to the right while the median is more resistant to this.

Mean v Median The mean is 13.71 and the median is 12. How would these numbers change if one of the people who gave an answer of 30 seconds had actually said 300 seconds? The standard deviation is 6.5 sec. Is it resistant to outliers?
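The question about resistance can be explored with a toy data set. The ten estimates below are hypothetical and are not the study's data; they only show what happens when a single value of 30 becomes 300.

```python
import statistics

# Hypothetical estimates in seconds (not the study's data).
estimates = [5, 8, 10, 10, 12, 12, 13, 15, 20, 30]
# Same data, but the answer of 30 is recorded as 300.
with_outlier = [300 if x == 30 else x for x in estimates]

def summarize(data):
    """Return (mean, median, standard deviation) of the data."""
    return (statistics.mean(data),
            statistics.median(data),
            statistics.stdev(data))

before = summarize(estimates)
after = summarize(with_outlier)

print("before: mean=%.2f median=%.1f sd=%.2f" % before)
print("after:  mean=%.2f median=%.1f sd=%.2f" % after)
```

In this toy example the median stays the same, while both the mean and the standard deviation jump, which suggests that the standard deviation, like the mean, is not resistant to outliers.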

Population? One way to develop a null distribution is to draw samples from a hypothetical population that looks the way we think the population of time estimates would look if the null hypothesis were true. Under the null the mean is 10 sec. We might assume the population is skewed and has a standard deviation similar to what we found.

Simulation-based Inference We have a possible population data set similar to what we need. Let’s go and get that data. Then go to the One Mean applet and develop a null distribution. Find out where our actual mean of 13.71 sec is located. And finally see how a t-distribution could predict all this.
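For readers without the applet, the same idea can be roughly sketched in Python. This is not the applet's method or its data set: the right-skewed population is assumed here to be a gamma distribution chosen to have mean 10 and standard deviation 6.5, and the number of repetitions is arbitrary.

```python
import random
import statistics

rng = random.Random(42)

NULL_MEAN = 10.0       # mean under the null hypothesis (from the example)
ASSUMED_SD = 6.5       # sample SD, used as the assumed population SD
N = 48                 # sample size from the example
OBSERVED_MEAN = 13.71  # observed sample mean from the example

# Assumption: a gamma distribution stands in for the skewed population.
shape = (NULL_MEAN / ASSUMED_SD) ** 2  # shape * scale = 10 (the mean)
scale = ASSUMED_SD ** 2 / NULL_MEAN    # sqrt(shape) * scale = 6.5 (the SD)

def one_sample_mean():
    """Mean of one simulated sample of N estimates under the null."""
    return statistics.fmean(rng.gammavariate(shape, scale) for _ in range(N))

null_means = [one_sample_mean() for _ in range(10_000)]

# Two-sided p-value: how often is a simulated mean at least as far
# from 10 as the observed mean of 13.71?
cutoff = abs(OBSERVED_MEAN - NULL_MEAN)
extreme = sum(abs(m - NULL_MEAN) >= cutoff for m in null_means)
p_value = extreme / len(null_means)

print(f"center of null distribution: {statistics.fmean(null_means):.2f}")
print(f"simulated two-sided p-value: {p_value:.4f}")
```

Under these assumptions the observed mean should fall far out in the tail of the simulated null distribution, giving a very small p-value.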

T-distribution The t-distribution is very similar to a normal distribution, but with slightly “heavier” tails. The t-statistic is the standardized statistic we use with a single quantitative variable and can be found using the formula: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
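As a quick check of the formula, the t-statistic for the song-estimate example can be computed from the summary statistics reported earlier (mean 13.71, standard deviation 6.50, n = 48, hypothesized mean 10). The function name is only illustrative.

```python
import math

def t_statistic(sample_mean, null_mean, sample_sd, n):
    """Standardized statistic: (x-bar - mu) / (s / sqrt(n))."""
    return (sample_mean - null_mean) / (sample_sd / math.sqrt(n))

t = t_statistic(sample_mean=13.71, null_mean=10, sample_sd=6.50, n=48)
print(round(t, 2))  # about 3.95
```

A value near 4 means the observed mean sits roughly four standard errors above the hypothesized mean of 10 seconds.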

Validity Conditions The theory-based test for a single mean requires one of the following: either the sample size is at least 20, or, if the sample size is less than 20, the sample distribution is not skewed. Let’s use the theory-based applet to run this test and find a confidence interval. (We first need to get the data.)

Estimating Time Formulate Conclusions. Based on our small p-value, we can conclude that people don’t accurately estimate the length of a 10-second song snippet and in fact they overestimate it. To what larger population can we make our inference?

Estimating Time We are 95% confident that the average estimate of the length of a 10-second song snippet is between 11.823 and 15.597 seconds.
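The interval can be reproduced approximately from the summary statistics. In this sketch the multiplier 2.0117 is an assumed, hard-coded value for the 97.5th percentile of a t-distribution with 47 degrees of freedom; with SciPy installed, `scipy.stats.t.ppf(0.975, 47)` would be one way to compute it instead.

```python
import math

sample_mean = 13.71  # from the example
sample_sd = 6.50     # from the example
n = 48               # from the example

# Assumption: hard-coded t multiplier for 95% confidence, df = n - 1 = 47.
T_MULTIPLIER = 2.0117

standard_error = sample_sd / math.sqrt(n)
margin = T_MULTIPLIER * standard_error

lower = sample_mean - margin
upper = sample_mean + margin
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The result agrees with the interval on the slide to three decimal places, and the whole interval lies above 10 seconds, which is consistent with the conclusion that people overestimate the length.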

Exploration 3.2: Sleepless Nights? Page 3-32