Sample Surveys Idea 1: Examine a part of the whole. Population – all items of interest. The whole idea behind a sample survey is to examine a few items from a population (a sample) and from that sample try to say something about the population as a whole. Sample – a few items from the population.
Properties of a Sample The sample should be representative of the population. This may not be possible, but at least we would like a sample that is not biased. In order for this to work, the sample should be representative of the population. There is no way to guarantee that a sample will represent the population so what we settle for is a sample that is not biased (it does not favor one portion of the population over another portion).
Sample Surveys Idea 2: Randomize Selecting items for the sample should be done at random so as to reduce the chance of getting a biased sample. In order to avoid bias, selection of the few items that will make up the sample should be done using a random mechanism so that every item in the population is treated fairly, e.g. has an equal chance of being chosen for the sample.
Sample Surveys Idea 3: It’s the sample size! What fraction of the population is sampled is not important. The size of the sample is the important thing. The size of the sample is the determining factor in terms of how much information you get. A sample of size 1000 from a population of size 1 million does just as well as a sample size of 1000 from a population of 100 million.
What about a census? Would a census of the population be a better way to go? Difficult to do. Populations are often dynamic. Can be more complex. A census is a complete enumeration of every item in the population. This is very difficult to do. The U.S. government attempts to do this every 10 years to simply count the number of people resident in the U.S. Populations are dynamic. As you are enumerating the population can change, people die others are born. Sampling is often easier to accomplish but at a cost, uncertainty is introduced when we rely on samples.
Parameters & Statistics A parameter is a summary of a model for the population (population parameters). A statistic is a summary of sample data (sample statistics). We have already mentioned the difference between population parameters and sample statistics. We will continue to remind you of the important distinction between these two ideas.
Example Population: All students at ISU. Question: Have you posted a video on You Tube? Population parameter: Proportion of all ISU students who would answer yes. Let’s consider an example. Suppose we wish to know what proportion of ISU students use a social networking utility like Facebook. The population proportion (if we were able to ask all 26,000 or so ISU students) is a parameter because it summarizes the entire population.
Example Sample: 400 ISU students. Sample statistic: the proportion of the 400 students in the sample who say yes. A sample would be only a few students, say 400. If we contact 400 ISU students (this is very doable) we could calculate the proportion of the 400 students who use a social networking utility like Facebook. This would be a sample statistic.
How to select the 400? Put an ad in the ISU Daily with the question and ask students to drop off their answers. Stand in front of the library and ask the first 400 students who come by. How do we get the 400 students. We could put an ad in the Daily and wait until 400 students respond. We could stand in front of the library and ask the first 400 students who come by. What is wrong with either of these?
Simple Random Sample Want a representative sample but will settle for one that is not biased. SRS – Each combination of 400 ISU students has the same chance of being the sample selected. Remember we want a representative sample, or at least a sample that is not biased. If we do something that can exclude a portion of the population (those that don’t read the Daily, those that are not on central campus) then our sample will be biased. A simple random sample is a fair way to select a sample because every group of 400 students is just as likely to be the sample chosen.
Simple Random Sample Sampling Frame A list of all students at ISU (the Registrar has such a list) Use random numbers to select 400 students at random from this list. How do you actually obtain a Simple Random Sample. You can start with a list of all the students at ISU (the Registrar has such a list). Identify each person on the list with a unique ID number. Use a random number generator (like the random number table or computer generated random numbers) to select 400 students from that list. You did something similar on Homework 2 when you were asked to select a random sample of 50 from the over 1400 responses to an intro stats survey.
Simple Random Sample If one were to do this more than once Different random numbers will give different samples of 400 students. We have introduced variability by sampling! Sampling introduces uncertainty because you never know who will be in the sample you choose. The sample you choose at random will be different from the sample I choose at random. Therefore, you and I can get different sample statistics and we have to have some way of dealing with the variation introduced by sampling.
Other Sampling Plans Stratified Divide population into strata (subpopulations) and select a SRS from each strata. Divide ISU students into colleges and select a SRS from each college. There are other ways to get random samples. In a stratified random sample, items in the population are stratified based on a common characteristic, e.g. everyone in the same college. Then select SRS’s from each of the strata.
Other Sampling Plans Cluster and multistage Divide population into clusters, each cluster being somewhat representative of the population, and select a cluster as your sample. Another plan involves dividing people into clusters, this time the cluster is a mixed bag that has elements from all of the population. A dormitory might be a cluster as there are students from all colleges, majors, genders, in state/out of state, in a dormitory.
Other Sampling Plans Systematic Select in a systematic way from the sampling frame. Select every 60th student on the list from the Registrar. Caution the order of the list must be random or else a systematic sample can be biased. If the list we have does not have any correspondence between order on the list and the response we are interested in, then a systematic sample is easier to do and will have properties similar to a SRS.
What can go wrong? Relying on volunteers – Ad in the Daily. Convenience – The first 400 students to come by the library. Bad frame – using the ISU directory of phone numbers. We have already seen that certain ways of getting a sample are sure to introduce bias and therefore should be avoided. Relying on volunteers is almost always a bad idea because those who volunteer tend to be those with the strongest opinions. Convenience samples are convenient but are almost always biased. Sometime we can have a list but that list is not complete e.g. directory of phone numbers for ISU students excludes those students without phones or those who choose not to have their number listed.
What can go wrong? Undercoverage 1,000,000 products sold. 100,000 warranty cards returned. 1,000 people selected from those who returned warranty cards. Often times when you buy a product there is a warranty card that you are supposed to fill out and send in. The return rate is fairly low. If we select 1000 people at random from those who returned warranty cards, we can say something about all the people who bought the product and returned the warranty card. We cannot say much about all the people who bought the product.
Other problems Non response Question bias/Response bias Would you favor or oppose a law that would take away your constitutional right to own guns? Even when we select a random sample, some people refuse to respond. If this refusal eliminates a certain group of people then there is non-response bias. Bias can be introduced in the wording of questions. Loaded questions can lead to loaded (biased) answers.