Honors Statistics Chapter 12 Part 1: Sample Surveys
Learning Goals: (1) Population versus Sample; (2) Sample Surveys; (3) Generalizing Results; (4) Sampling Frame & Sampling Design; (5) Simple Random Sample (SRS); (6) Convenience Samples; (7) Other Random Sampling Designs; (8) Types of Bias; (9) The Valid Survey
Learning Goal #1 Population versus Sample
Learning Goal #1: Background We have learned ways to display, describe, and summarize data, but have been limited to examining the particular batch of data we have. To make decisions, we need to go beyond the data at hand and to the world at large. Let’s investigate three major ideas that will allow us to make this stretch: (1) Examine a part of the whole. (2) Randomize. (3) It’s the sample size.
Learning Goal #1: Idea 1: Examine a Part of the Whole The first idea is to draw a sample. We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. We settle for examining a smaller group of individuals—a sample—selected from the population. Sampling is a natural thing to do. Think about sampling something you are cooking—you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.
Learning Goal #1: Population and Sample Population: The collection of all individuals or items under consideration in a statistical study. The population is determined by what we want to know. Sample: That part of the population from which information is obtained. The sample is determined by what is practical and should be representative of the population.
Learning Goal #1: Population and Sample Population: all the subjects of interest. We use statistics to learn about the population, the entire group of interest. Sample: a subset of the population. Data are collected for the sample because we typically cannot measure all subjects in the population.
Learning Goal #1: Example: Population vs. Sample If we have data on all the individuals who have climbed Mt. Everest, then we have population data. On the other hand, if our data come from some of the climbers, we have sample data.
Learning Goal #1: Data Collection The quality of the results obtained from any statistical method is only as good as the data used. The reliability and accuracy of the data affect the validity of the results of a statistical analysis, and the reliability and accuracy of the data depend on the method of collection. Conclusion: “Garbage in, garbage out.” Sampling methods that, by their nature, tend to over- or underemphasize some characteristics of the population are said to be biased.
Learning Goal #1: Idea 2: Randomize Randomization can protect you against factors that you know are in the data. It can also help protect against factors you are not even aware of. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. Randomizing makes sure that on the average the sample looks like the rest of the population.
Learning Goal #1: Randomizing Not only does randomizing protect us from bias, it actually makes it possible for us to draw inferences about the population when we see only a sample. Such inferences are among the most powerful things we can do with Statistics. But remember, it’s all made possible because we deliberately choose things randomly.
Learning Goal #1: Idea 3: It’s the Sample Size How large a random sample do we need for the sample to be reasonably representative of the population? It’s the size of the sample, not the size of the population, that makes the difference in sampling. The fraction of the population that you’ve sampled doesn’t matter; it’s the sample size itself that’s important. Exception: if the population is small enough that the sample is more than 10% of the whole population, the population size can matter.
Learning Goal #1: Sample Size The sample size is the number of individuals selected from our population. The size of the population does not dictate the size of the sample. A sample of size 100 may work equally well for a population of 1000 or 10,000, as long as it is a random sample of the population of interest. Example: A ladle of soup gives us the same information about the seasoning regardless of the size of the pot it is taken from, as long as the pot is well stirred (a random sample). The general rule is that the sample size should be no more than 10% of the population size.
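The soup-ladle idea can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: a hypothetical population in which exactly half the individuals are coded 1, a sample size of 100, and 2,000 repeated simple random samples. The function name `spread_of_sample_proportions` and the seed are made up for this example.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, sample_size=100, reps=2000, seed=1):
    """Standard deviation of the sample proportion over many simple random samples."""
    rng = random.Random(seed)
    # Hypothetical population: half the individuals are coded 1, the rest 0.
    population = [1] * (pop_size // 2) + [0] * (pop_size - pop_size // 2)
    proportions = [
        sum(rng.sample(population, sample_size)) / sample_size  # sampling without replacement
        for _ in range(reps)
    ]
    return statistics.pstdev(proportions)

# Same sample size (100), two very different population sizes.
spread_small_pot = spread_of_sample_proportions(1_000)
spread_large_pot = spread_of_sample_proportions(10_000)
print(f"population  1,000: spread of sample proportions = {spread_small_pot:.3f}")
print(f"population 10,000: spread of sample proportions = {spread_large_pot:.3f}")
```

Both spreads should come out at roughly 0.05, which suggests that the precision of the sample depends mainly on the sample size and hardly at all on how large the "pot" is.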
Learning Goal #2 Sample Surveys
Learning Goal #2: Sample Surveys Opinion polls are examples of sample surveys, designed to ask questions of a small group of people in the hope of learning something about the entire population. Professional pollsters work quite hard to ensure that the sample they take is representative of the population. If not, the sample can give misleading information about the population.
Learning Goal #2: Sample Surveys Sample surveys solicit information from individuals. Types of sample surveys include opinion polls conducted by personal interview, telephone interview, or questionnaire (by mail or internet).
Learning Goal #2: Sample Survey A sample survey selects a sample of people from a population and interviews them to collect data. A census is a survey that attempts to count the number of people in the population and to measure certain characteristics about them.
Learning Goal #2: Census Why bother determining the right sample size? Wouldn’t it be better to just include everyone and “sample” the entire population? Such a special sample is called a census. Definition: A sample that consists of the entire population (it tries to count every individual). A census often includes a collection of related demographic information (age, race, gender, occupation, income, etc.). Example: US census, an official, periodic (every 10 years) inventory of the entire population of the US.
Learning Goal #2: Census Problems with taking a census: (1) It can be difficult to complete a census. There always seem to be some individuals who are hard (or expensive) to locate or hard to measure, or a census may simply be impractical (for example, with food, where examining an item means using it up). (2) Populations rarely stand still. Even if you could take a census, the population changes while you work, so it’s never possible to get a perfect measure. (3) Taking a census may be more complex than sampling.
Learning Goal #2: Populations and Parameters Models use mathematics to represent reality. Parameters are the key numbers in those models. A parameter that is part of a model for a population is called a population parameter. We rarely know the true value of a population parameter, so we estimate it from sampled data. Any summary found from the data (sample) is a statistic. The statistics that estimate population parameters are called sample statistics.
Learning Goal #2: Population versus Sample Population: The entire group of individuals in which we are interested but can’t usually assess directly. Example: All humans, all working-age people in California, all crickets. A parameter is a number describing a characteristic of the population. Sample: The part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design. A statistic is a number describing a characteristic of a sample.
Learning Goal #2: Populations and Parameters
Learning Goal #2: Notation We typically use Greek letters to denote parameters and Latin letters to denote statistics.
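As a quick reference, the table below lists some widely used parameter/statistic symbol pairs. It is a sketch of common textbook conventions, not necessarily the exact table from this course; in particular, the use of \bar{x} for the sample mean is an assumption (some texts use \bar{y}), and the proportion pair is a common exception to the Greek-letter rule.

```latex
\begin{array}{lcc}
\text{Quantity} & \text{Parameter (population)} & \text{Statistic (sample)} \\ \hline
\text{Mean} & \mu & \bar{x} \\
\text{Standard deviation} & \sigma & s \\
\text{Correlation} & \rho & r \\
\text{Proportion} & p & \hat{p}
\end{array}
```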
Learning Goal #3 Generalizing Results
Learning Goal #3: Generalizing Results Recall that the goal of sampling is to ask questions of a small group of people, the sample, in the hope of learning something about the entire population. However, care should be taken to generalize the results of a sample only to the population that is represented by the sample.
Learning Goal #4 Sampling Frame & Sampling Design
Learning Goal #4: Sampling Frame The sampling frame is a list of individuals from which the sample is drawn. If the sampling frame is not equal to the population of interest and is different from the population in some way that may affect the response variable, the sample will be biased. Example: If we are interested in obtaining information about H.S. students in Florida but obtain our sample of students from a list of private schools, then our sampling frame is not reflective of the population of interest nor is our sample.
Learning Goal #4: Sampling Frame & Sampling Design The sampling frame is the list of subjects in the population from which the sample is taken; ideally, it lists the entire population of interest. The sampling design determines how the sample is selected. Ideally, it should give each subject an equal chance of being selected to be in the sample.
Learning Goal #4: Sampling Design The sampling design is the method used to choose the sample. All statistical sampling designs incorporate the idea that chance (randomness), rather than choice, is used to select the sample. The value of deliberately introducing randomness is one of the great insights of Statistics. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. It does that by making sure that on the average the sample looks like the rest of the population.
Learning Goal #5 Simple Random Sample (SRS)
Learning Goal #5: Simple Random Sample (SRS) We draw samples because we can’t work with the entire population. We need to be sure that the statistics we compute from the sample reflect the corresponding parameters accurately. A sample that does this is said to be representative.
Learning Goal #5: Simple Random Sample (SRS) We will insist that every possible sample of the size we plan to draw has an equal chance to be selected. And such samples must also guarantee that each individual has an equal chance of being selected. A sample drawn in this way is called a Simple Random Sample (SRS). An SRS is the standard against which we measure other sampling methods, and the sampling method on which the theory of working with sampled data is based.
Learning Goal #5: Simple Random Sample (SRS) Requirements for a Simple Random Sample (SRS): every sample of size n from the population has an equal chance of being selected, and every member of the population has an equal chance of being included in the sample. The SRS is the preferred method: the sample is more likely to be representative of the population than with any other sampling method, so it carries the least chance of sample bias.
Learning Goal #5: Simple Random Sample (SRS) Random Sampling is the best way of obtaining a sample that is representative of the population.
Learning Goal #5: SRS Example Two club officers are to be chosen for a New Orleans trip. There are 5 officers: President, Vice-President, Secretary, Treasurer, and Activity Coordinator. The 10 possible samples are: (P,V) (P,S) (P,T) (P,A) (V,S) (V,T) (V,A) (S,T) (S,A) (T,A). For an SRS, each of the ten possible samples has an equal chance of being selected. Thus, each sample has a 1 in 10 chance of being selected. Each officer appears in 4 of the 10 samples, so each officer has a 4 in 10 (2 in 5) chance of being selected.
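The counting in the officer example can be verified by listing every possible pair. This is a minimal sketch using Python's standard `itertools.combinations`; the single-letter codes are the ones used in the example, and the variable names are made up for illustration.

```python
from itertools import combinations

officers = ["P", "V", "S", "T", "A"]  # President, Vice-President, Secretary, Treasurer, Activity Coordinator

# Every possible sample of 2 officers, ignoring order.
samples = list(combinations(officers, 2))

# In an SRS each possible sample is equally likely.
chance_per_sample = 1 / len(samples)

# Count how many of the equally likely samples include each officer.
appearances = {o: sum(o in s for s in samples) for o in officers}
chance_per_officer = {o: appearances[o] / len(samples) for o in officers}

print(len(samples), "possible samples")
print("chance for each sample:", chance_per_sample)
print("chance for each officer:", chance_per_officer)
```

Because every officer shows up in the same number of samples, every officer has the same chance of going on the trip, which is the second SRS requirement.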
Learning Goal #5: Methods of SRS (1) Place names (population) in a hat and draw out a handful (sample). (2) Computer/TI-84 software. (3) Table of random digits: a long string of the digits 0,1,2,3,4,5,6,7,8,9 with these two properties. First, each entry in the table is equally likely to be any of the ten digits 0 through 9. Second, the entries are independent of each other; that is, knowledge of one part of the table gives no information about any other part.
Learning Goal #5: Using Random Numbers to Select a SRS To select a simple random sample: (1) Number the subjects in the sampling frame using numbers of the same length (number of digits). (2) Select numbers of that length from a table of random numbers or using a random number generator. (3) Include in the sample those subjects having numbers equal to the random numbers selected.
Learning Goal #5: Example – Choosing a SRS We need to select a random sample of 5 from a class of 20 students. List and number all members of the population, which is the class of 20. The number 20 is two digits long, so parse the list of random digits into numbers that are two digits long. Here we chose to start with line 103, for no particular reason.
Random digits, parsed into two-digit numbers:
22 36 84 65 73 25 59 58 53 93 30 99 58 91 98 27 98 25 34 02
24 13 04 83 60 22 52 79 72 65 76 39 36 48 09 15 17 92 48 30
Class list: 01 Alison, 02 Amy, 03 Brigitte, 04 Darwin, 05 Emily, 06 Fernando, 07 George, 08 Harry, 09 Henry, 10 John, 11 Kate, 12 Max, 13 Moe, 14 Nancy, 15 Ned, 16 Paul, 17 Ramon, 18 Rupert, 19 Tom, 20 Victoria
Choose a random sample of size 5 by reading through the list of two-digit random numbers in order. The first five random numbers matching numbers assigned to people make the SRS. The first individual selected is Amy, number 02; that is the only match in the first row of digits. Moving on to the next row gives Moe (13), Darwin (04), Henry (09), and Ned (15). Remember that 1 is 01, 2 is 02, etc. If you were to hit 09 again before getting five people, don’t sample Henry twice; just keep going.
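The selection procedure in this example can be sketched in a few lines of Python. The function name `select_srs` and the variable names are made up for illustration; the roster and the two-digit random numbers are the ones from the example.

```python
def select_srs(random_numbers, roster, sample_size):
    """Walk through the random numbers, keeping the first distinct labels found in the roster."""
    chosen = []
    for number in random_numbers:
        # Skip numbers with no matching person and numbers already chosen.
        if number in roster and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return [roster[label] for label in chosen]

names = ["Alison", "Amy", "Brigitte", "Darwin", "Emily", "Fernando", "George",
         "Harry", "Henry", "John", "Kate", "Max", "Moe", "Nancy", "Ned",
         "Paul", "Ramon", "Rupert", "Tom", "Victoria"]
roster = {label: name for label, name in enumerate(names, start=1)}  # labels 01 to 20

digits = ("22 36 84 65 73 25 59 58 53 93 30 99 58 91 98 27 98 25 34 02 "
          "24 13 04 83 60 22 52 79 72 65 76 39 36 48 09 15 17 92 48 30")
random_numbers = [int(group) for group in digits.split()]

srs = select_srs(random_numbers, roster, sample_size=5)
print(srs)
```

Checking `number not in chosen` is what keeps a repeated random number from putting the same student in the sample twice.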
Learning Goal #5: SRS Example Use a random digit table to pick a random sample of 30 cars from a population of 500 cars. Label - Assign each car a different number from 001 to 500 (3 digit group). Table – Enter Table B on line 108 (can begin anywhere) and regroup the digits in groups of 3 (because our labels have 3 digits). Then select the sample.
Learning Goal #5: SRS Example
Line 108: 60940 72024 17868 24943 61790 90656 87964 18883
Line 109: 36009 19365 15412 39638 85453 46816 83485 41979
Regrouped into groups of 3: 609 407 202 417 868 249 436 179 090 656 879 641 888 336 009 193 651 541 239 638 854 534 681 683
Select the first 30 three-digit groups that fall within the range of your labels (001 to 500), skipping any repeats, to make up the SRS. SRS – 407, 202, 417, 249, 436, 179, 090, 336, 009, 193, 239, etc.
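The regrouping step is easy to get wrong by hand, because the three-digit groups run across the five-digit blocks and across the end of line 108. The sketch below shows one way to do it; `regroup` and `labels_in_range` are hypothetical helper names, and only the two table lines quoted in the example are used, so it yields fewer than the 30 cars the full table would supply.

```python
def regroup(table_lines, width):
    """Join the table lines into one digit string and cut it into groups of `width` digits."""
    digits = "".join("".join(line.split()) for line in table_lines)
    usable = len(digits) - len(digits) % width  # drop an incomplete trailing group
    return [digits[i:i + width] for i in range(0, usable, width)]

def labels_in_range(groups, low, high):
    """Keep groups whose value is a valid label, in order, skipping repeats."""
    picked = []
    for group in groups:
        label = int(group)
        if low <= label <= high and label not in picked:
            picked.append(label)
    return picked

line_108 = "60940 72024 17868 24943 61790 90656 87964 18883"
line_109 = "36009 19365 15412 39638 85453 46816 83485 41979"

groups = regroup([line_108, line_109], width=3)
cars = labels_in_range(groups, low=1, high=500)
print(groups)
print(cars)
```

Labels print as integers here, so car 090 appears as 90 and car 009 as 9.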
Learning Goal #5: Your Turn Suppose 80 students are taking an AP Statistics course and the teacher wants to randomly pick out a sample of 10 students to try out a practice exam. Select a SRS of 10 students. Solve – use the following Random Digit Table beginning at line 108. 108 60940 72024 17868 24943 61790 90656 87964 109 18883 36009 19365 15412 39638 85453 46816
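Learning Goal #5: TI-84 Random Digits Use the RANDINT function (MATH/PRB/5:RANDINT). RANDINT(lower limit, upper limit, number of integers): the optional third argument is how many integers to generate, not a number of digits. RANDINT(0,9,5) generates 5 random integers between 0 and 9. RANDINT(1,6) generates one random integer between 1 and 6, simulating the roll of a die. RANDINT(0,99) generates an integer from 0 to 99 each time the ENTER key is pressed; read single-digit results as 00 to 09 when two-digit labels are needed. Storing 124 into RAND (124 STO→ RAND) seeds the generator, so every TI-84 seeded this way produces the same sequence of random digits.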
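For readers without a calculator at hand, the same kinds of draws can be sketched with Python's standard `random` module. This is an analogue, not an emulation: Python's generator is different from the TI-84's, so the seed 124 will not reproduce the calculator's digits. Like the calculator function, `random.randint(a, b)` includes both endpoints.

```python
import random

random.seed(124)  # fix the seed so the same sequence is produced on every run

five_digits = [random.randint(0, 9) for _ in range(5)]  # five random digits, 0 through 9
die_roll = random.randint(1, 6)                         # one roll of a six-sided die
two_digit = f"{random.randint(0, 99):02d}"              # a two-digit label, padded: 00 through 99

# Re-seeding with the same value replays the same sequence.
random.seed(124)
repeat = [random.randint(0, 9) for _ in range(5)]

print(five_digits, die_roll, two_digit)
print("same digits after re-seeding:", repeat == five_digits)
```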
Learning Goal #5: Simple Random Samples Samples drawn at random generally differ from one another. Each draw of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability.
Learning Goal #5: Sampling Variability Sampling variability is the natural tendency of randomly drawn samples to differ, one from another. Sampling variability is not an error, just the natural result of random sampling. Statistics attempts to minimize, control, and understand variability so that informed decisions can be drawn from the data despite their variation. Although samples vary, when we use chance to select them, they do not vary haphazardly but rather according to the laws of probability.
Learning Goal #5: Example: Sample Variability Each of four major news organizations surveys likely voters and separately reports that the percentage favoring the incumbent candidate is 53.5%, 54.1%, 52%, and 54.2%, respectively. What is the correct percentage? Did three or more of the news organizations make a mistake?
Learning Goal #5: Solution There is no way of knowing the correct population percentage from the information given. The four surveys led to four statistics, each an estimate of the population parameter. No one made a mistake unless there was a bad survey. Sampling variation is natural.
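A short simulation shows how four honest polls of the same population can report four different percentages. Everything numeric here is an assumption made for illustration: the true share of 53.5%, the poll size of 1,000, and the seed are hypothetical, and each voter is modeled as an independent draw from a very large population.

```python
import random

def poll(true_share, n, rng):
    """Simulate one poll: the fraction of n randomly chosen voters who favor the incumbent."""
    return sum(rng.random() < true_share for _ in range(n)) / n

rng = random.Random(2024)
TRUE_SHARE = 0.535  # hypothetical population parameter, unknown to the pollsters

# Four separate polls of 1,000 likely voters each.
estimates = [poll(TRUE_SHARE, 1000, rng) for _ in range(4)]
for i, estimate in enumerate(estimates, start=1):
    print(f"poll {i}: {estimate:.1%}")
```

Each printed value is a statistic estimating the same parameter; the differences among them come from sampling variability alone, not from anyone's mistake.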
Learning Goal #6 Convenience Samples
Learning Goal #6: Convenience Samples: Poor Ways to Sample Convenience Sample: a type of survey sample that is easy to obtain, exactly as its name suggests, by sampling individuals who are conveniently available. It is unlikely to be representative of the population, and severe biases often result from such a sample. Results apply ONLY to the observed subjects. The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.
Learning Goal #6: Convenience Samples: Poor Ways to Sample Internet convenience samples are worthless: they have no well-defined sampling frame and thus report no useful information.
Learning Goal #6: Convenience Samples: Poor Ways to Sample Just ask whoever is around. Example: “Man on the street” survey (cheap, convenient, often quite opinionated or emotional, and now very popular with TV “journalism”). Which men, and on which street? Ask about gun control or legalizing marijuana “on the street” in Berkeley, CA and in some small town in Idaho and you would probably get totally different answers. Even within an area, answers would probably differ if you did the survey outside a high school or a country-western bar. Bias: Opinions are limited to the individuals present.
Learning Goal #6: Voluntary Response Sample Voluntary Response Sample: the most common form of a convenience sample. Subjects volunteer for the sample: a large group of individuals is invited to respond, and all who do respond are counted. The sample is not representative, because volunteers tend not to be representative of the entire population. The result is voluntary response bias (discussed later), which invalidates the survey.
Learning Goal #6: Voluntary Response Sample Individuals choose to be involved. These samples are very susceptible to being biased because different people are motivated to respond or not. They are often called “public opinion polls” and are not considered valid or scientific. Bias: Sample design systematically favors a particular outcome. Ann Landers summarizing responses of readers: Seventy percent of the 10,000 parents who wrote in said that having kids was not worth it; if they had to do it over again, they wouldn’t. Bias: Most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.
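A little arithmetic shows how a voluntary response sample can turn a small minority into an apparent majority. The sketch below takes the random-sample figure above (91% would have kids again, so 9% would not) as the population share; the two response rates are purely hypothetical numbers chosen for illustration, as is the function name `share_among_responders`.

```python
def share_among_responders(p_no, rate_no, rate_yes):
    """Expected fraction of letters saying 'not worth it', given each group's chance of writing in."""
    no_letters = p_no * rate_no            # unhappy parents who write in
    yes_letters = (1 - p_no) * rate_yes    # satisfied parents who write in
    return no_letters / (no_letters + yes_letters)

P_NO = 0.09  # population share who would not have kids again (from the random sample)

# Hypothetical response rates: unhappy parents are about 24 times as likely to write.
share = share_among_responders(P_NO, rate_no=0.0236, rate_yes=0.001)
print(f"'not worth it' share among letters: {share:.0%}")

# If both groups were equally likely to respond, the letters would mirror the population.
unbiased = share_among_responders(P_NO, rate_no=0.01, rate_yes=0.01)
print(f"with equal response rates: {unbiased:.0%}")
```

Under these assumed rates, a 9% minority accounts for about 70% of the letters, so the gap between the write-in result and the random sample can be explained by who chooses to respond.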