Chapter 12 Sample Surveys: Producing Valid Data

Presentation transcript:

Chapter 12 Sample Surveys Producing Valid Data “If you don’t believe in random sampling, the next time you have a blood test tell the doctor to take it all.”

The election of 1948: the predictions (Crossley, Gallup, Roper) versus the results for the candidates (Truman, Dewey).

Beyond the Data at Hand to the World at Large We have learned ways to display, describe, and summarize data, but have been limited to examining the particular collection of data we have. We’d like (and often need) to stretch beyond the data at hand to the world at large. Let’s investigate three major ideas that will allow us to make this stretch…

3 Key Ideas That Enable Us to Make the Stretch

Idea 1: Examine a Part of the Whole The first idea is to draw a sample. –We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. –We settle for examining a smaller group of individuals (a sample) selected from the population.

Examples 1. Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole. 2. Opinion polls are examples of sample surveys, designed to ask questions of a small group of people in the hope of learning something about the entire population.

Sampling methods Convenience sampling: Just ask whoever is around. –Example: “Man on the street” survey (cheap, convenient, often quite opinionated or emotional, and so now very popular with TV “journalism”). Which people, and on which street? –Ask about gun control or legalizing marijuana “on the street” in Berkeley or in some small town in Idaho and you would probably get totally different answers. –Even within an area, answers would probably differ if you did the survey outside a high school or a country western bar. Bias: Opinions limited to individuals present.

Voluntary Response Sampling: Individuals choose to be involved. These samples are very susceptible to being biased because different people are motivated to respond or not. Often called “public opinion polls.” These are not considered valid or scientific. Bias: Sample design systematically favors a particular outcome. Example: Ann Landers summarizing responses of readers. 70% of the (10,000) parents who wrote in said that having kids was not worth it; if they had to do it over again, they wouldn’t. Bias: Most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.

CNN on-line surveys: Bias: People have to care enough about an issue to bother replying. This sample is probably a combination of people who hate “wasting the taxpayers’ money” and “animal lovers.”

Bias Bias is the bane of sampling—the one thing above all to avoid. There is usually no way to fix a biased sample and no way to salvage useful information from it. The best way to avoid bias is to select individuals for the sample at random. The value of deliberately introducing randomness is one of the great insights of Statistics – Idea 2

Idea 2: Randomize Randomization can protect you against factors that you know are in the data. –It can also help protect against factors you are not even aware of. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. –Randomizing makes sure that on the average the sample looks like the rest of the population

Idea 2: Randomize (cont.) Individuals are randomly selected. No one group should be over-represented. Sampling randomly gets rid of bias. Random samples rely on the absolute objectivity of random numbers. There are tables and books of random digits available for random sampling. Statistical software can generate random digits (e.g., Excel “=RAND()”, the Ran# button on a calculator).

Idea 2: Randomize (cont.) Not only does randomizing protect us from bias, it actually makes it possible for us to draw inferences about the population when we see only a sample.

Example: selecting a random sample Listed in the table are the names of the 20 pharmacists on the hospital staff. Use the random numbers listed below to select three of them to be in the sample.

Idea 3: It’s the Sample Size!! How large a random sample do we need for the sample to be reasonably representative of the population? It’s the size of the sample, not the size of the population, that makes the difference in sampling. –Exception: If the population is small enough and the sample is more than 10% of the whole population, the population size can matter. The fraction of the population that you’ve sampled doesn’t matter. It’s the sample size itself that’s important.

Example i) In the city of Chicago, Illinois, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Chicago mayoral race. ii) In the state of Illinois, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Illinois governor's race. iii) In the United States, 1,000 likely voters are randomly selected and asked who they are going to vote for in the presidential election. Which survey has more accuracy? All the surveys have about the same accuracy, because each uses a random sample of 1,000 voters and each population is far larger than the sample.
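One rough way to check this claim is to compute the standard error of a sample proportion, including the finite population correction. The sketch below assumes a simple random sample drawn without replacement and a true proportion of 0.5; the population sizes are round-number assumptions chosen for illustration, not figures from the slides.

```python
import math

def standard_error(n, N, p=0.5):
    """Standard error of a sample proportion from an SRS of size n
    drawn without replacement from a population of size N."""
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    return math.sqrt(p * (1 - p) / n) * fpc

# Hypothetical round-number population sizes (assumptions for illustration).
populations = {"city": 1_000_000, "state": 10_000_000, "country": 100_000_000}

for name, N in populations.items():
    print(f"{name:8s} N={N:>11,d}  SE={standard_error(1000, N):.5f}")

# A small population where the sample is 50% of the whole: now N matters.
print(f"small    N={2000:>11,d}  SE={standard_error(1000, 2000):.5f}")
```

With n fixed at 1,000 the three large populations give nearly identical standard errors, while the small population, where the sample is well over 10% of the whole, gives a noticeably smaller one.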

Idea 3: It’s the Sample Size!! Examples: chicken soup; blood samples.

Does a Census Make Sense? Why bother worrying about the sample size? Wouldn’t it be better to just include everyone and “sample” the entire population? –Such a special sample is called a census.

Does a Census Make Sense? (cont.) There are problems with taking a census: –Practicality: It can be difficult to complete a census— there always seem to be some individuals who are hard to locate or hard to measure. –Timeliness: populations rarely stand still. Even if you could take a census, the population changes while you work, so it’s never possible to get a perfect measure. –Expense: taking a census may be more complex than sampling. –Accuracy: a census may not be as accurate as a good sample due to data entry error, inaccurate (made-up?) data, tedium.

Population versus sample Population: The entire group of individuals in which we are interested but can’t usually assess directly. Example: All humans, all working-age people in California, all crickets. A parameter is a number describing a characteristic of the population. Sample: The part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design. A statistic is a number describing a characteristic of a sample.

Sample Statistics Estimate Parameters Values of population parameters are unknown; in addition, they are unknowable. Example: The distribution of heights of adult females (at least 18 yrs of age) in the United States is approximately symmetric and mound-shaped with mean µ. µ is a population parameter whose value is unknown and unknowable. The heights of 1500 females are obtained from a sample of government records. The sample mean x̄ of the 1500 heights is calculated to be 64.5 inches. The sample mean x̄ is a sample statistic that we use to estimate the unknown population parameter µ.

We typically use Greek letters to denote parameters and Latin letters to denote statistics.

Simple Random Sample A simple random sample (SRS) of size n consists of n units from the population chosen in such a way that every set of n units has an equal chance to be the sample actually selected.

Simple Random Samples (cont.) To select a sample at random, we first need to define where the sample will come from. –The sampling frame is a list of individuals from which the sample is drawn. –E.g., to select a random sample of students from a college, we might obtain a list of all registered full-time students. –When defining the sampling frame, we must deal with the details of defining the population: are part-time students included? How about current study-abroad students? Once we have our sampling frame, the easiest way to choose an SRS is with random numbers.

Warning! If some members of the population are not included in the sampling frame, they cannot be part of the sample!! (e.g., using a telephone book as the sampling frame) Population: Wal-Mart shoppers. Sampling frame?

Example: simple random sample Academic dept wishes to randomly choose a 3-member committee from the 28 members of the dept:

00 Abbott      07 Goodwin      14 Pillotte    21 Theobald
01 Cicirelli   08 Haglund      15 Raman       22 Vader
02 Crane       09 Johnson      16 Reimann     23 Wang
03 Dunsmore    10 Keegan       17 Rodriguez   24 Wieczoreck
04 Engle       11 Lechtenb’g   18 Rowe        25 Williams
05 Fitzpat’k   12 Martinez     19 Sommers     26 Wilson
06 Garcia      13 Nguyen       20 Stone       27 Zink

Solution Use a random number table; read 2-digit pairs, ignoring pairs outside 00 to 27 and any repeats, until you have chosen 3 committee members. For example, a row of a random number table might yield Rodriguez (17), Lechtenberg (11), Engle (04). Your calculator generates random numbers; you can also generate random numbers using Excel.
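The digit-pair procedure for the committee example can be sketched in a few lines. The digit row below is made up for illustration (the row shown on the original slide was not preserved); it was chosen so that it reproduces labels 17, 11, 04. The helper name `pick_from_digit_pairs` is likewise hypothetical.

```python
import random

members = [
    "Abbott", "Cicirelli", "Crane", "Dunsmore", "Engle", "Fitzpat'k", "Garcia",
    "Goodwin", "Haglund", "Johnson", "Keegan", "Lechtenb'g", "Martinez", "Nguyen",
    "Pillotte", "Raman", "Reimann", "Rodriguez", "Rowe", "Sommers", "Stone",
    "Theobald", "Vader", "Wang", "Wieczoreck", "Williams", "Wilson", "Zink",
]

def pick_from_digit_pairs(digits, k=3, size=28):
    """Read 2-digit pairs left to right; skip labels >= size and repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if label < size and label not in chosen:
            chosen.append(label)
        if len(chosen) == k:
            break
    return chosen

# A made-up digit row (an assumption; not the row from the slide).
row = "9317551104"
labels = pick_from_digit_pairs(row)   # pairs 93 and 55 are skipped
print([members[i] for i in labels])

# The same idea using the standard library directly.
rng = random.Random(12)  # arbitrary seed so the draw is repeatable
print(rng.sample(members, 3))
```

`random.sample` draws without replacement, so every set of 3 members is equally likely, which is what the SRS definition asks for.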

Sampling Variability Suppose we had used another row of the table. Our sample might then have been 19 Sommers, 03 Dunsmore, 04 Engle.

Sampling Variability Samples drawn at random generally differ from one another. Each draw of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability. Variability is OK; bias is bad!!
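Sampling variability is easy to see in a small simulation. The sketch below builds a hypothetical population of heights (simulated values, not real data; the mean of 64.5 and spread of 2.5 inches are assumptions), draws several simple random samples, and prints each sample mean.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed for repeatability

# Hypothetical population of 10,000 heights in inches (simulated, not real data).
population = [rng.gauss(64.5, 2.5) for _ in range(10_000)]
mu = statistics.mean(population)

def sample_means(pop, n, draws, rng):
    """Mean of each of `draws` independent SRSs of size n."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(draws)]

means = sample_means(population, n=100, draws=5, rng=rng)
print("population mean:", round(mu, 2))
print("sample means:   ", [round(m, 2) for m in means])
```

The sample means differ from one draw to the next but cluster around the population mean: variability without bias.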

Stratified Random Sampling This sampling procedure separates the population into mutually exclusive sets (strata), and then selects simple random samples from each stratum. Example strata: sex (male, female); age (under …); occupation (professional, clerical, blue-collar).

Stratified Random Sampling With this procedure we can acquire information about –the whole population –each stratum –the relationships among strata.

There are several ways to build the stratified sample. For example, keep the proportion of each stratum in the population. A sample of size 1,000 is to be drawn:

Stratum  Income            Population proportion  Stratum size
1        under $15,000     25%                    250
2        $15,000-29,999    40%                    400
3        $30,000-50,000    30%                    300
4        over $50,000      5%                     50
Total                                             1,000
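Proportional allocation is just the population share of each stratum times the total sample size. A minimal sketch, using the four proportions from the example (25%, 40%, 30%, 5%); the function name `proportional_allocation` is a hypothetical helper.

```python
# Population share of each income stratum, keyed by stratum number.
strata = {1: 0.25, 2: 0.40, 3: 0.30, 4: 0.05}

def proportional_allocation(proportions, total):
    """Sample size per stratum, proportional to its population share."""
    sizes = {name: round(share * total) for name, share in proportions.items()}
    # Rounding can leave the sizes off by one or two; fail loudly if so.
    if sum(sizes.values()) != total:
        raise ValueError("rounded sizes do not add up to the total")
    return sizes

allocation = proportional_allocation(strata, 1000)
for stratum, size in allocation.items():
    print(f"stratum {stratum}: {size:4d}")
```

A simple random sample of the computed size would then be drawn separately within each stratum.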

Cluster Sampling Sometimes stratifying isn’t practical and simple random sampling is difficult. Splitting the population into similar parts or clusters can make sampling more practical. Then we could select one or a few clusters at random and perform a census within each cluster. This sampling design is called cluster sampling. If each cluster fairly represents the full population, cluster sampling will give us an unbiased sample.

Cluster Sampling Useful When… –it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure), e.g., all items sold in a grocery store; –the population members are widely dispersed geographically, e.g., all Toyota dealerships in North Carolina.

Mean length of sentences in our course text We would like to assess the reading level of our course text based on the length of the sentences. Simple random sampling would be awkward: number each sentence in the book? Better way: choose a few pages at random (the pages are the clusters, and it's reasonable to assume that each page is representative of the entire text), then measure the length of the sentences on those pages.
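The page-as-cluster idea can be sketched as follows. The "book" here is simulated (page numbers mapped to made-up sentence lengths in words), so the numbers mean nothing; only the sampling steps matter. The function name `cluster_sample` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed for repeatability

# Hypothetical book: page number -> word counts of the sentences on that page.
book = {
    page: [rng.randint(8, 30) for _ in range(rng.randint(10, 20))]
    for page in range(1, 301)
}

def cluster_sample(book, n_pages, rng):
    """Choose pages at random, then take every sentence on them (a census)."""
    pages = rng.sample(sorted(book), n_pages)
    lengths = [length for page in pages for length in book[page]]
    return pages, lengths

pages, lengths = cluster_sample(book, n_pages=5, rng=rng)
print("pages sampled:", sorted(pages))
print("sentences measured:", len(lengths))
print("mean sentence length:", round(statistics.mean(lengths), 1))
```

Only the page numbers need to be listed in advance, which is what makes this cheaper than numbering every sentence in the book.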

Cluster sampling - not the same as stratified sampling!! We stratify to ensure that our sample represents different groups in the population, and sample randomly within each stratum. Strata are homogeneous (e.g., male, female) but differ from one another. Clusters are more or less alike, each heterogeneous and resembling the overall population. –We select clusters to make sampling more practical or affordable. –We conduct a census on or select an SRS from each selected cluster.

Multistage Sampling Sometimes we use a variety of sampling methods together. Sampling schemes that combine several methods are called multistage samples. Most surveys conducted by professional polling organizations and government agencies use some combination of stratified and cluster sampling as well as simple random sampling.

Example: The American Community Survey –The American Community Survey (ACS) is an ongoing survey. –… information from the survey generates data that help determine how more than $400 billion in federal and state funds are distributed each year. –… combined into statistics that are used to help decide everything from school lunch programs to new hospitals.

Mean length of sentences in our course text, cont. In attempting to assess the reading level of our course text, we might worry that it starts out easy and gets harder as the concepts become more difficult, so we want to avoid samples that select too heavily from early or from late chapters. Suppose our course text has 5 sections, with several chapters in each section.

Mean length of sentences in our course text, cont. We could: i) randomly select 1 chapter from each section ii) randomly select a few pages from each of the selected chapters iii) if altogether this makes too many sentences, we could randomly select a few sentences from each page. So what is our sampling strategy? i) we stratify by section of the book ii) we randomly choose a chapter to represent each stratum (section) iii) within each chapter we randomly choose pages as clusters iv) finally, we choose an SRS of sentences within each cluster
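The four-step strategy can be sketched as nested random selections. The text below is simulated (5 sections, with an assumed 4 chapters per section, 20 pages per chapter, and 12 sentences per page), and `multistage_sample` is a hypothetical helper name.

```python
import random

rng = random.Random(3)  # arbitrary seed for repeatability

# Hypothetical text: 5 sections x 4 chapters x 20 pages x 12 sentences,
# each sentence stored as its length in words (simulated).
text = {
    s: {c: {p: [rng.randint(8, 30) for _ in range(12)] for p in range(1, 21)}
        for c in range(1, 5)}
    for s in range(1, 6)
}

def multistage_sample(text, pages_per_chapter, sentences_per_page, rng):
    """Stratify by section, pick 1 chapter per section, then pages
    (clusters) within the chapter, then an SRS of sentences per page."""
    sample = []
    for section, chapters in text.items():          # every stratum is used
        chapter = rng.choice(sorted(chapters))      # 1 chapter per section
        pages = rng.sample(sorted(chapters[chapter]), pages_per_chapter)
        for page in pages:                          # pages act as clusters
            sentences = chapters[chapter][page]
            sample.extend(rng.sample(sentences, sentences_per_page))
    return sample

lengths = multistage_sample(text, pages_per_chapter=3, sentences_per_page=4, rng=rng)
print(len(lengths), "sentences, mean length", sum(lengths) / len(lengths))
```

Because every section contributes exactly one chapter, no sample can lean too heavily on early or late parts of the book.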

Systematic Sampling Sometimes we draw a sample by selecting individuals systematically. –For example, you might survey every 10th person on an alphabetical list of students. To make it random, you must still start the systematic selection from a randomly selected individual. When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample. Systematic sampling can be much less expensive than true random sampling. When you use a systematic sample, you need to justify the assumption that the systematic method is not associated with any of the measured variables.

Systematic Sampling-example You want to select a sample of 50 students from a college dormitory that houses 500 students. On a list of all students living in the dorm, number the students from 001 to 500. Generate a random number between 001 and 010, and start with that student. Every 10th student in the list becomes part of your sample. Questions: 1) does each student have an equal chance to be in the sample? 2) what is the chance that a student is included in the sample? 3) is this an SRS?
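One way to explore the three questions is to enumerate every sample this procedure can produce. The sketch below assumes students are labeled 1 to 500 and the start is uniform on 1 to 10; `systematic_sample` is a hypothetical helper name.

```python
import math
import random

def systematic_sample(population_size, step, rng):
    """Random start in 1..step, then every step-th label after it."""
    start = rng.randint(1, step)
    return list(range(start, population_size + 1, step))

rng = random.Random(500)  # arbitrary seed for repeatability
sample = systematic_sample(500, 10, rng)
print(len(sample), "students, starting at", sample[0])

# Every student belongs to exactly one of the 10 possible samples,
# so each has a 1-in-10 chance of being selected.
possible = [list(range(start, 501, 10)) for start in range(1, 11)]
covered = sorted(label for s in possible for label in s)
print(covered == list(range(1, 501)))

# An SRS must allow every set of 50 students; only 10 sets can occur here.
print(len(possible), "possible samples versus", math.comb(500, 50), "for an SRS")
```

Under these assumptions each student has the same 1-in-10 chance of inclusion, yet the design is not an SRS, because most sets of 50 students (for example, any set containing two neighbors on the list) can never be chosen.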

End of Chapter 12