Chapter 2: Collecting Data Sensibly. Note: Correct usage of the vocabulary in this chapter is VERY important!
Consider the following headlines, which appeared on September 25, 2009: “Spanking lowers a child’s IQ” (Los Angeles Times); “Do you spank? Studies indicate it could lower your kids’ IQ.” (SciGuy, Houston Chronicle); “Spanking can lower IQ” (NBC4i, Columbus, Ohio); “Smacking hits kids’ IQ” (newscientist.com). In this study, two groups of children were followed for 4 years: 806 children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured at the beginning of the study and again four years later. Researchers found that the average IQ of children ages 2 to 4 who were not spanked was 5 points higher than that of those who were spanked; for children ages 5 to 9, the difference was 2.8 points. These headlines imply that spanking is the CAUSE of the observed difference in IQ. Is this conclusion reasonable?
Observation versus Experimentation. Look at the following two examples: (1) A social scientist studying a rural community wants to determine whether gender and attitudes toward abortion are related. Using a telephone survey, 100 residents are contacted at random and their gender and attitude toward abortion are recorded. (2) A professor might wonder what would happen to final test scores if the required lab time for a chemistry course is increased from 3 hours to 6 hours. For 100 chemistry students, half were randomly assigned to the 3-hour lab and half to the 6-hour lab. The rest of the course remained the same for the two groups. The difference in their final test scores will be examined. How do these two examples differ? Think about: How were the groups determined? Were any variables controlled? What did the researcher do? Which is the experiment and which is the observational study? See pages 32-33 for more information.
Definitions: Observational study – a study in which the researcher observes characteristics of a sample selected from one or more populations. Experiment - a study in which the researcher observes how a response variable behaves when one or more explanatory variables (factors) are manipulated. A well-designed experiment can result in data that provides evidence for a cause-effect relationship.
Let’s return to the study on spanking and IQ. In this study, two groups of children were followed for 4 years: 806 children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured at the beginning of the study and again four years later. Researchers found that the average IQ of children ages 2 to 4 who were not spanked was 5 points higher than that of those who were spanked; for children ages 5 to 9, the difference was 2.8 points. Does spanking “CAUSE” a decrease in IQ? Why or why not? Are there other variables connected to both the response (decreased IQ) and the groups of children? These are called confounding variables. Possible examples include socio-economic status, education level of the parents, and home or school environments.
Definition: Confounding variable – a variable that is related to both group membership and the response variable of interest in a research study. Because observational studies may contain confounding variables, their results can NOT be used to show cause-effect relationships.
Observational studies CAN be generalized to the population if the sample is randomly selected from the population of interest, but CANNOT show cause-effect relationships. Well-designed experiments CAN show cause-effect relationships, but CANNOT be generalized to the population if the subjects are volunteers or are otherwise not randomly selected from the population of interest.
Sampling Section 2.2
Census versus Sample. Obtaining information about the entire population is called a census. Why might we prefer to select a sample rather than perform a census? Measurements that require destroying the item (measuring how long batteries last, safety ratings of cars); difficulty finding the entire population (length of fish in a lake); limited resources (time and money), which is the most common reason to use a sample.
Methods of selecting random samples: Simple Random Sample (SRS). A sample of size n is selected from the population in a way that ensures that every different possible sample of the desired size has the same chance of being selected. Suppose a local school has 2000 students. We want to survey 100 students about the current cell phone policy. A sample of students can be selected by putting each student’s name on individual (but identical) slips of paper and placing them in a large container. After mixing well, randomly select 100 names from the container, one at a time. This is an example of a simple random sample. It has to be possible for all 100 students in the sample to be seniors, or any other combination of students! A simple random sample does NOT guarantee that the sample is representative of the population.
Methods of selecting random samples: Simple Random Sample (SRS) continued. Sampling frame – list of all the objects or individuals in the population. Another way to select a simple random sample is to create a list of all the students in the school (called a sampling frame). Number each student with a unique number from 1 to 2000. Use a random digit table or random number generator (a calculator or computer software) to select the 100 students for the sample.
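The random number generator approach can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the `seed` argument, and the use of Python's standard `random` module are assumptions, not part of the textbook procedure.

```python
import random

# Assumed setup from the example: students numbered 1 to 2000, sample of 100.
POPULATION_SIZE = 2000
SAMPLE_SIZE = 100

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct student numbers, chosen so that every
    possible set of that size has the same chance of being selected."""
    rng = random.Random(seed)               # seed only makes the run repeatable
    frame = range(1, population_size + 1)   # the numbered sampling frame
    return rng.sample(frame, sample_size)   # sampling without replacement

sample = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=1)
print(sorted(sample)[:10])  # first few selected student numbers
```

Because `random.sample` never returns the same element twice, this matches drawing slips of paper from a container without putting them back.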
How to use a random digit table. The following is part of the random digit table found in the back of your textbook: Row 6 9 3 8 7 5 2 4 1. Since our students are numbered 1-2000, we will select 4-digit numbers from the table. If the number is not within 1-2000, we will ignore it. We would continue in this fashion until we had selected 100 numbers. It would be faster to use a random number generator.
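The table-reading procedure can be mimicked in code. In this sketch the digit string is generated by the computer as a stand-in for the printed table, the function name `select_from_digit_stream` is hypothetical, and skipping repeated numbers is an assumption (it corresponds to sampling without replacement).

```python
import random

def select_from_digit_stream(digits, population_size, sample_size):
    """Read consecutive 4-digit groups from a string of digits.
    Keep numbers from 1 to population_size; ignore everything else.
    Assumption: a number that was already chosen is skipped."""
    chosen = []
    for i in range(0, len(digits) - 3, 4):
        number = int(digits[i:i + 4])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Tiny hand-checkable example: the groups are 0001, 2500, 0001, 1999, 0001.
small = select_from_digit_stream("00012500000119990001", 2000, 3)
print(small)  # [1, 1999]

# Stand-in for a printed random digit table (not the textbook's table).
rng = random.Random(2)
digits = "".join(str(rng.randrange(10)) for _ in range(40000))
selected = select_from_digit_stream(digits, 2000, 100)
```

Only about one 4-digit group in five falls between 1 and 2000, which is why reading a table by hand is slow compared with a random number generator.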
Methods of selecting random samples: Simple Random Sample (SRS) continued. Most often sampling is done without replacement. That is, once an individual or object is selected, it is not replaced and cannot be selected again. Sampling with replacement allows an object or individual to be selected more than once for a sample. Although sampling with and without replacement are different, they can be treated as the same when the sample size n is relatively small compared to the population size (no more than 10% of the population).
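The difference between the two schemes is easy to see in a short sketch. It assumes the 2000-student school from the earlier example and uses Python's standard `random` module; the variable names are illustrative.

```python
import random

rng = random.Random(3)
frame = range(1, 2001)  # students numbered 1 to 2000 (assumed example)

# Without replacement: a selected student cannot be selected again.
without_replacement = rng.sample(frame, 100)

# With replacement: the same student may be selected more than once.
with_replacement = rng.choices(frame, k=100)

print(len(set(without_replacement)))  # always 100 distinct students
print(len(set(with_replacement)))     # 100 or fewer distinct students
```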
Methods of selecting random samples: Stratified Random Sample. The population is divided into non-overlapping subgroups called strata. Strata are groups that are similar (homogeneous) based upon some characteristic of the group members. Simple random samples are selected from each stratum. Stratified sampling is sometimes easier to implement and more cost effective than simple random sampling, and it sometimes allows more accurate inferences about a population than simple random sampling. Instead of a simple random sample to answer our survey about the cell phone policy at school, suppose we were to take a simple random sample of size 25 from each of the four grade levels: freshman, sophomore, junior, and senior. This would be an example of a stratified random sample.
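A minimal sketch of the grade-level example follows. The roster is made up for illustration (500 students per grade is an assumption, as are the function name `stratified_sample` and the ID format).

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate simple random sample of per_stratum members
    from each stratum. strata maps a stratum name to its members."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 500 students in each of the four grade levels.
grades = ["freshman", "sophomore", "junior", "senior"]
strata = {g: [f"{g}-{i}" for i in range(1, 501)] for g in grades}

stratified = stratified_sample(strata, 25, seed=4)
for grade, students in stratified.items():
    print(grade, len(students))  # 25 students from every grade
```

Unlike a simple random sample, this design cannot produce a sample made up entirely of seniors.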
Methods of selecting random samples: Cluster Sampling. The population is divided into non-overlapping subgroups called clusters. Clusters are randomly selected, and then all the individuals in the selected clusters are included in the sample. Clusters are often based upon location. It is best if the clusters are heterogeneous subgroups from the population. Cluster sampling is often easier to perform and more cost effective. Let’s look at another way to select a sample of students to answer our survey on the current cell phone policy at our school. One way to do this would be to randomly select 5 classrooms during 2nd period. Survey all the students in those rooms! This is an example of a cluster sample.
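The classroom example might be sketched as below. The number of classrooms (80) and students per room (25) are assumptions made only so the code runs; `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole clusters and return the chosen
    cluster names together with every individual in those clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Hypothetical school: 80 second-period classrooms with 25 students each.
classrooms = {
    f"room-{r:02d}": [f"room-{r:02d}-student-{s}" for s in range(1, 26)]
    for r in range(1, 81)
}

rooms, surveyed = cluster_sample(classrooms, 5, seed=5)
print(rooms)          # the 5 randomly selected classrooms
print(len(surveyed))  # 125: every student in those rooms is surveyed
```

The randomness is applied to the classrooms, not to individual students.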
Methods of selecting random samples: Systematic Sampling. A value k is specified (for example k = 50 or k = 200). One of the first k individuals is selected at random. Then every kth individual in the sequence is included in the sample. This method works reasonably well as long as there are no repeating patterns in the population list. Suppose we randomly select a number between 1 and 20. Using an alphabetical list of students at our school, select the student whose name is at that number in the list. Then choose every 20th student from there. This is an example of a systematic random sample.
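As a rough sketch of the k = 20 example, assuming an alphabetical roster of 2000 students (the roster contents and the function name `systematic_sample` are made up for illustration):

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Pick one of the first k individuals at random, then take every
    kth individual after that one."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # 0-based position among the first k
    return ordered_list[start::k]

# Hypothetical alphabetical roster of 2000 students.
roster = [f"student-{i:04d}" for i in range(1, 2001)]

systematic = systematic_sample(roster, 20, seed=6)
print(len(systematic))  # 100 students, spaced 20 apart in the list
```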
Identify the sampling design. 1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, medium public, medium private, large public, and large private). Then they randomly selected 3 colleges from each group. Answer: Stratified random sample
Identify the sampling design. 2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks. Answer: Cluster sampling
Identify the sampling design. 3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave. Answer: Systematic sampling
Consider the following example: In 1936, Franklin Delano Roosevelt had been President for one term. The magazine The Literary Digest predicted that Alf Landon would beat FDR in that year's election by 57 to 43 percent. The Digest mailed over 10 million questionnaires to names drawn from lists of automobile and telephone owners, and over 2.3 million people responded - a huge sample. At the same time, a young man named George Gallup sampled only 50,000 people and predicted that Roosevelt would win. Gallup's prediction was ridiculed as naive. After all, the Digest had predicted the winner in every election since 1916, and had based its predictions on the largest response to any poll in history. But Roosevelt won with 62% of the vote. The size of the Digest's error is staggering. Bias is the tendency for samples to differ from the corresponding population in some systematic way. This is a classic example of how bias affects the results of a sample!
Sources of bias: Selection bias. Occurs when the way the sample is selected systematically excludes some part of the population of interest; this is called undercoverage. It may also occur if only volunteers or self-selected individuals are used in a study. Suppose you take a sample by randomly selecting names from the phone book. Some groups will not have the opportunity of being selected: people with unlisted phone numbers (usually high-income families), people without phone numbers (usually low-income families), and people with ONLY cell phones (usually young adults).
Sources of bias: Convenience sampling. Using an easily available or convenient group to form a sample. The group may not be representative of the population of interest, so results should not be generalized to the population. Suppose we decide to survey only the students in our statistics class. Why might that cause bias in a survey? Bias can also occur when samples rely entirely on volunteers to be part of the sample; this is called voluntary response. An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc. Remember, the respondents select themselves to participate in the survey!
Sources of bias: Measurement or Response bias. Occurs when the method of observation tends to produce values that systematically differ from the true value in some way. Examples: an improperly calibrated scale is used to weigh items; people tend not to be completely honest when asked about illegal behavior or unpopular beliefs; the appearance or behavior of the person asking the questions influences the answers; questions on a survey are worded in a way that tends to influence the response. Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample. Would we get honest answers? Or suppose people are asked if they can trust men with mustaches, and the interviewer is a man with a mustache. A Gallup survey sponsored by the American Paper Institute (Wall Street Journal, May 17, 1994) included the following question: “It is estimated that disposable diapers account for less than 2% of the trash in today’s landfills. In contrast, beverage containers, third-class mail and yard waste are estimated to account for about 21% of trash in landfills. Given this, in your opinion, would it be fair to tax or ban disposable diapers?”
Sources of bias: Nonresponse. Occurs when responses are not obtained from all individuals selected for inclusion in the sample. The phone rings and you answer. “Hello,” the person says, “do you have time for a survey about radio stations?” You hang up! People are chosen by the researchers, BUT refuse to participate. They are NOT self-selected! This is often confused with voluntary response. To minimize nonresponse bias, it is critical that a serious effort be made to follow up with individuals who did not respond to the initial request for information. How might this follow-up be done?
Identify a potential source of bias. 1) Before the presidential election of 1936, in which FDR ran against Republican Alf Landon, the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 2.3 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest’s survey came from magazine subscribers, car owners, telephone directories, etc. Answer: Undercoverage – since the Digest’s survey came from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (other answers are possible)
Identify a potential source of bias. 2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at a local college. You collect register receipts for students as they leave the bookstore during lunch one day. Answer: Convenience sampling – easy way to collect data; or Undercoverage – students who buy books from on-line bookstores are excluded.
Identify a potential source of bias. 3) To find the average value of a home in Plano, one averages the price of homes that are listed for sale with a realtor. Answer: Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors. (other answers are possible)
Comparative Experiments Sections 2.3 & 2.4
Suppose we are interested in determining the effect of room temperature on the performance on a first-semester calculus exam. So we decide to perform an experiment. What variable will we “measure”? The performance on a calculus exam. This is called the response variable. Response variable – a variable that is not controlled by the experimenter and that is measured as part of the experiment. What variable will “explain” the results on the calculus exam? The room temperature. This is called the explanatory variable. Explanatory variables – those variables that have values that are controlled by the experimenter (also called factors).
Room temperature experiment continued . . . We decide to use two temperature settings, 65° and 75°. How many treatments would our experiment have? Two: the 2 treatments are the 2 temperature settings. Experimental condition – any particular combination of the explanatory variables (also called treatments).
Room temperature experiment continued . . . Suppose we have 10 sections of first-semester calculus that have agreed to participate in our study. On who or what will we impose the treatments? The 10 sections of calculus. These are our subjects or experimental units. Experimental units – the smallest unit to which a treatment is applied. How would we determine which sections would be in rooms with the temperature set at 65° and which sections in rooms set at 75°? We need to randomly assign them to the treatments. Random assignment of subjects to treatments or treatments to trials ensures that the experiment does not systematically favor one treatment over another.
Room temperature experiment continued . . . To randomly assign the 10 sections of first-semester calculus to the 2 treatment groups, we would first number the classes 1-10. Place the numbers 1-10 on identical slips of paper and put them in a hat. Mix well. Randomly select 5 numbers from the hat. Those will be the sections that have the room temperature set at 65°. The remaining sections will have the room temperature set at 75°. For example, if the numbers drawn are 5, 8, 7, 3, and 9, the sections are assigned as follows. Treatment 1 (65°): sections 9, 7, 5, 8, 3. Treatment 2 (75°): sections 1, 2, 4, 6, 10.
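The slips-in-a-hat procedure can be imitated in a few lines. This is a sketch under the assumption that Python's `random` module stands in for the hat; the function name `assign_two_treatments` is hypothetical, and the sections drawn will generally differ from the ones shown on the slide.

```python
import random

def assign_two_treatments(units, seed=None):
    """Randomly pick half of the units for the first treatment;
    the remaining units receive the second treatment."""
    rng = random.Random(seed)
    first = sorted(rng.sample(units, len(units) // 2))
    second = sorted(u for u in units if u not in first)
    return first, second

sections = list(range(1, 11))  # the 10 calculus sections, numbered 1-10
group_65, group_75 = assign_two_treatments(sections, seed=7)
print("65 degrees:", group_65)
print("75 degrees:", group_75)
```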
Room temperature experiment continued . . . Sections assigned: Treatment 1 (65°): 9, 7, 5, 8, 3. Treatment 2 (75°): 1, 2, 4, 6, 10. Notice that there are five sections assigned to each treatment. This is called replication. Why is replication an important trait of a well-designed experiment? Replication ensures that we have multiple observations for each treatment. It is important so that we can account for the natural variation that occurs within the experimental units.
Room temperature experiment continued . . . Remember – the explanatory variable is the room temperature setting, 65° and 75°. The response variable is the grade on the calculus exam. Are there other variables that could affect the response? Instructor? Textbook? Time of day? Ability level of students? These other variables are called extraneous variables. An extraneous variable is a variable that is NOT one of the explanatory variables (factors) but is thought to affect the response. In an experiment, these extraneous variables need to be “controlled”. Can the experimenter control these extraneous variables? If so, how? Direct control is holding the extraneous variables constant so that their effects are not confounded with those of the experimental conditions (treatments). Remember - two variables are confounded if their effects on the response cannot be distinguished from each other. What about the variables that the experimenter can’t directly control? What can be done to avoid confounding results?
Room temperature experiment continued . . . Suppose that there were five instructors who taught first-semester calculus. We do not have direct control of this variable; however, we could have each instructor teach 2 sections. Then we could randomly assign which one of the 2 sections would have a temperature setting of 65°; the other would have a temperature setting of 75°. This is an example of blocking. Blocking is a process by which an extraneous variable’s effects are filtered out. Similar groups, called blocks, are created. All treatments are tried in each block.
Room temperature experiment continued . . . What about extraneous variables that we cannot control directly or that we cannot block for or that we don’t even think about? Random assignment should evenly spread all extraneous variables, that are not controlled directly or that are not blocked, into all treatment groups. We expect these variables to affect all the experimental groups in the same way; therefore, their effects are not confounding.
Room temperature experiment continued . . . Would the students in each section of calculus know to which treatment group, 65° or 75°, they were assigned? If the students knew about the experiment, they would probably know which treatment group they were in. So this experiment is probably NOT blinded. An experiment in which the subjects do not know which treatment they received is called a single-blind experiment. A double-blind experiment is one in which neither the subjects nor the individuals who measure the response know which treatment was received.
In the room temperature experiment, we only have 2 treatment groups, 65° and 75°. We do NOT have a control group. A control group is an experimental group that does NOT receive any treatment. The use of a control group allows the experimenter to assess how the response variable behaves when the treatment is not used. This provides a baseline against which the treatment groups can be compared to determine whether the treatment had an effect.
Consider Anna, a waitress. She decides to perform an experiment to determine if writing “Thank you” on the receipt increases her tip percentage. She plans on having two groups. For one group she will write “Thank you” on the receipt, and for the other group she will not write “Thank you” on the receipt. Which of these is the control group?
Suppose we want to test an herbal supplement to determine if it aids in weight loss. Why would it not be beneficial to have two groups in the experiment: one that takes the supplement and a control group that takes nothing? What could be done to remedy this problem? Give one group the supplement and give the other group a pill that is the same size, color, taste, smell, etc. as the supplement, but contains no active ingredient. This is called a placebo. A placebo is something that is identical in appearance to the treatment received by the treatment group but contains no active ingredient. See pages 66-67 for additional information.
Let’s recap some ideas- Random assignment removes the potential for confounding variables. Blocking uses extraneous variables to create groups (blocks) that are similar. All treatments are then tried in each block. Direct control holds extraneous variables constant so their effects are not confounded with the treatments.
Experimental Designs. 1. Completely randomized design – experimental units are assigned at random to treatments or treatments are assigned at random to trials. [Diagram: Experimental units -> random assignment -> Treatment A (measure response for A) or Treatment B (measure response for B) -> compare treatments] Let’s look at two examples of completely randomized experiments.
Example 1: A farm-product manufacturer wants to determine if the yield of a crop is different when the soil is treated with three different types of fertilizers. Fifteen similar plots of land are planted with the same type of seed but are fertilized differently. At the end of the growing season, the mean yield from the sample plots is compared. Experimental units? Plots of land. Factors? Type of fertilizer. Response variable? Yield of crop. How many treatments? 3.
Fertilizer experiment continued: Why is the same type of seed used on all 15 plots? To control the type of seed; it is part of the controls in the experiment. What are other potential extraneous variables? Type of soil, amount of water, sunlight, etc. Does this experiment have a placebo? Explain. NO – a placebo is not needed in this experiment, because one would compare the three types of fertilizers.
Example 2: A consumer group wants to test cake pans to see which works the best (bakes evenly). It will test aluminum, glass, and plastic pans in both gas and electric ovens. There are 30 boxes of cake mix to use for this experiment. Experimental units? Cake mixes. Factors? Two factors: type of pan (aluminum, glass, and plastic) and type of oven (electric and gas). Response variable? How evenly the cake bakes. Name the treatments. Aluminum pan in electric oven, aluminum pan in gas oven, glass pan in electric oven, glass pan in gas oven, plastic pan in electric oven, and plastic pan in gas oven.
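Since a treatment is any particular combination of the explanatory variables, the six treatments can be listed by pairing every pan type with every oven type. A small sketch (the variable names are illustrative):

```python
from itertools import product

pans = ["aluminum", "glass", "plastic"]   # factor 1: type of pan
ovens = ["electric", "gas"]               # factor 2: type of oven

# Every combination of one pan type with one oven type is a treatment.
treatments = [f"{pan} pan in {oven} oven" for pan, oven in product(pans, ovens)]

for t in treatments:
    print(t)
print(len(treatments))  # 3 pans x 2 ovens = 6 treatments
```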
Cake experiment continued: Describe how to randomly assign the cake mixes to the treatments so that there is an equal number in each treatment. With 30 boxes and 6 treatments, each treatment gets 5 boxes. Number the boxes of cake mix from 1 to 30. Write the numbers 1 to 30 on identical slips of paper and place them into a hat. Mix well. Randomly select 5 numbers from the hat and assign those boxes to the treatment of aluminum pan in electric oven. Randomly select 5 more numbers and assign those boxes to the treatment aluminum pan in gas oven. Continue this process, randomly assigning 5 boxes to each of the treatments glass pan in electric oven, glass pan in gas oven, and plastic pan in electric oven. The remaining 5 are assigned to plastic pan in gas oven. This is just one way that you can perform this randomization. Could we roll a die for each box? If we roll a 1, assign the box to the first treatment (aluminum pan in electric oven); if we roll a 2, assign the box to the 2nd treatment, and so on. Note that rolling a die for each box does not by itself guarantee an equal number of boxes in each treatment. Explain why the rolling of a die would ONLY work if the boxes were put in random order first!
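One way to carry out this randomization in software is to shuffle the box numbers and deal them out in equal groups; 30 boxes and 6 treatments give 5 boxes per treatment. This is a sketch, not the only valid method, and the function name `assign_equal_groups` is hypothetical.

```python
import random
from itertools import product

def assign_equal_groups(units, treatments, seed=None):
    """Shuffle the units, then deal them into equal-sized groups,
    one group per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                      # plays the role of mixing the hat
    size = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

treatments = [f"{pan} pan in {oven} oven"
              for pan, oven in product(["aluminum", "glass", "plastic"],
                                       ["electric", "gas"])]
boxes = list(range(1, 31))                     # boxes numbered 1 to 30

assignment = assign_equal_groups(boxes, treatments, seed=9)
for treatment, group in assignment.items():
    print(treatment, group)                    # 5 box numbers per treatment
```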
Experimental Designs Continued . . . 2. Randomized block – units are blocked into (homogeneous) groups and then randomly assigned to treatments within each block. Units should be blocked on a variable that affects the response!!! [Diagram: Experimental units -> create blocks. Block 1 -> random assignment -> Treatment A (measure response for A) or Treatment B (measure response for B) -> compare treatments for block 1. Block 2 -> random assignment -> Treatment A (measure response for A) or Treatment B (measure response for B) -> compare treatments for block 2. Then compare the results from the 2 blocks.]
Fertilizer experiment revisited: A farm-product manufacturer wants to determine if the yield of a crop is different when the soil is treated with two different types of fertilizers. Twenty plots of land (10 plots are along a river and 10 plots are away from the river) are planted with the same type of seed but are fertilized differently. At the end of the growing season, the mean yield from the sample plots is compared. Can the experimenter directly control the types of soil in the different plots of land? No – they must use the plots that are available. What can be done to account for this variable? They could block by type of land.
Fertilizer experiment revisited: Describe how to create the blocks of land and then to randomly assign plots to the 2 types of fertilizer. First create 2 blocks of land. Block 1 would be the 10 plots that are by the river. Block 2 would be the 10 plots away from the river. Number the 10 plots in block 1 from 1 to 10. Write the numbers 1 to 10 on identical slips of paper and place them into a hat. Mix well. Randomly select 5 numbers from the hat and assign those plots to Fertilizer A. The remaining 5 are assigned to Fertilizer B. Then repeat the same process for block 2: number its 10 plots from 1 to 10, randomly select 5 numbers from the hat and assign those plots to Fertilizer A, and assign the remaining 5 to Fertilizer B.
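The key feature of a randomized block design is that the random assignment is carried out separately inside each block. A minimal sketch, with hypothetical block labels and the assumed helper name `randomized_block_assignment`:

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, shuffle the plots and split them equally
    among the treatments. Returns {(block, plot): treatment}."""
    rng = random.Random(seed)
    result = {}
    for block_name, plots in blocks.items():
        shuffled = list(plots)
        rng.shuffle(shuffled)                  # separate randomization per block
        size = len(shuffled) // len(treatments)
        for i, treatment in enumerate(treatments):
            for plot in shuffled[i * size:(i + 1) * size]:
                result[(block_name, plot)] = treatment
    return result

# Two blocks of 10 plots each, numbered 1 to 10 within the block.
blocks = {"by river": range(1, 11), "away from river": range(1, 11)}
plan = randomized_block_assignment(blocks, ["Fertilizer A", "Fertilizer B"], seed=10)

for (block, plot), fertilizer in sorted(plan.items()):
    print(block, plot, fertilizer)
```

Each block ends up with 5 plots per fertilizer, so any river effect is balanced across the two treatments.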
Experimental Designs Continued . . . 3. Matched pairs - a special type of block design where each block consists of 2 similar experimental units, each randomly assigned to one of the treatments, OR each block consists of an individual unit that receives both treatments in random order.
Example 3: Two new word-processing programs are to be compared by measuring the speed with which a standard task can be completed. One hundred volunteers will perform the same task on each of the programs in random order and their speeds will be measured. Explain why this is a matched pairs design. Each block consists of an individual who will do both treatments. How could we determine which program the volunteers use first? We could flip a coin for each volunteer (heads they do program A first, tails they do program B first) or use random numbers (odd and even), etc.
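The coin flip for each volunteer can be simulated as follows. The volunteer labels and the function name `assign_program_order` are made up for illustration, and a random number below 0.5 is assumed to play the role of heads.

```python
import random

def assign_program_order(volunteers, seed=None):
    """Simulate one fair coin flip per volunteer to decide which
    program is used first; the other program is used second."""
    rng = random.Random(seed)
    orders = {}
    for volunteer in volunteers:
        heads = rng.random() < 0.5   # assumed stand-in for a coin flip
        orders[volunteer] = ("A", "B") if heads else ("B", "A")
    return orders

volunteers = [f"volunteer-{i}" for i in range(1, 101)]
orders = assign_program_order(volunteers, seed=11)
print(orders["volunteer-1"])  # the order in which this volunteer uses the programs
```

Every volunteer still uses both programs; only the order is randomized, which is what makes each person their own block.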
The ONLY way to show a cause-effect relationship is with a well-designed, well-controlled experiment!!!