1
4.2 Experiments
2
Observational Study versus Experiment
In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response. Definition: An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. An experiment deliberately imposes some treatment on individuals to measure their responses. When our goal is to understand cause and effect, experiments are the only source of fully convincing data. The distinction between observational study and experiment is one of the most important in statistics.
3
Example: Soy good for you?
The November 2009 issue of Nutrition Action discusses what the current research tells us about the supposed benefits of soy. For a long time, scientists have believed that the soy foods in Asian diets explain the lower rates of breast cancer, prostate cancer, osteoporosis, and heart disease in places like China and Japan. However, when experiments were conducted, soy either had no effect or a very small effect on the health of the participants. For example, several different studies randomly assigned elderly women to either soy or placebo, and none of the studies showed that soy was more beneficial for preventing osteoporosis. So, what explains the lower rates of osteoporosis in Asian cultures? We still don’t know. It could be due to genetics, other dietary factors, or any other difference between Asian cultures and non-Asian cultures.
4
Observational Study versus Experiment
Observational studies of the effect of one variable on another often fail because of confounding between the explanatory variable and one or more lurking variables. Definition: A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. Well-designed experiments take steps to avoid confounding.
5
Lurking vs. confounding variable:
The difference between the two is in whether or not the variable was considered in the research study.
Lurking variable: A variable that is not considered in a research study that could influence the relations between the variables in the study.
Confounding variable: A variable that is considered in a research study that could influence the relations between the variables in the study.
A lurking variable is not taken into account by the researchers while a confounding variable is taken into account by the researchers. Both are variables that could influence the relations between the variables of primary interest in a research study.
6
Lurking vs. confounding variable: Examples
Online Assessments
A professor is teaching an online course that requires weekly homework assignments and weekly quizzes. She wants to know if there is a relationship between students’ homework and quiz grades. She makes a graph showing students’ scores on the two assessments. After she starts looking at the data she realizes that the days when students submit their assignments may play a role in their grades on these assignments. Students who wait until the night the assignments are due tend to have lower scores than those who submit a day or two before the deadline. She uses this information to change her analyses to include the submission time. Originally, submission time was a lurking variable because the professor was not including it in her study. Now that she is including this variable in her study, it is a confounding variable.
Weight Comparisons
A student is conducting a research study on differences in body weight between engineering and nursing students at Penn State. His data show that engineering students weigh more than nursing students on average. Then, his advisor points out that Penn State’s College of Engineering is 81.1% male while Penn State’s College of Nursing is 7.7% male. The student conducts a second research study and includes biological sex. He finds that there is not a difference between engineering and nursing students after controlling for biological sex. In the first study, biological sex was a lurking variable because the student was not taking it into account. In the second study, biological sex was a confounding variable because the student was taking it into account. Note that the results of the study changed when biological sex was taken into account.
7
Lurking vs. confounding variable:
A confounding variable is one whose effects on the response variable cannot be distinguished from one or more of the explanatory variables in the study. A lurking variable is one whose effects on the response variable cannot be distinguished from one or more of the explanatory variables in the study, AND IS NOT INCORPORATED INTO THE DESIGN OF THE STUDY. The difference between lurking and confounding variables lies in their inclusion in the study. If a variable was measured and included, its associations with the explanatory and response variables can be determined and (if random assignment was performed) neutralized with methods beyond the AP Syllabus. It is a confounding (or not) variable. The associations between an unmeasured variable and the explanatory and response variables cannot be determined -- whatever its associations are remains a mystery, and it "lurks" beyond the purview of the investigator.
8
Lurking vs. confounding variables
So, a hint: lurking variables are most common in observational studies. If we are *observing* two variables over time, for example, then an external change (seasonal, political, economic, or whatever) could affect both of them and thereby be a lurking variable. They are much less common in designed experiments because we randomize to avoid such things. Lurking variables will show up only if we fail to randomize completely or correctly. If we are *controlling* a treatment and observing a response over time, an external change wouldn't lurk because it couldn't affect our treatment variable. Confounding, however, can show up in an experiment either through poor design or just because there is no reasonable way to avoid it. Things happen that are not under our control, but might be confounded with the factors, making it impossible to tell whether our treatment or the external change was responsible for the response. The best defense is to record as much supplementary information as possible (temperature, precipitation, economic conditions, whatever might matter) so that it can be used as covariates in our analysis. Even that might not cure the problem because the external variables might still be collinear with our factors and impossible to separate from the predictors we care about.
9
Check your understanding p. 233
Does reducing screen brightness increase battery life in laptop computers? To find out, researchers obtained 30 new laptops of the same brand. They chose 15 of the computers at random and adjusted their screens to the brightest setting. The other 15 laptop screens were left at the default setting – moderate brightness. Researchers then measured how long each machine’s battery lasted. Was this an observational study or an experiment? Justify your answer.
10
Check your understanding p. 233
Does eating dinner with their families improve students’ academic performance? According to an ABC News article, “Teenagers who eat with their families at least five times a week are more likely to get better grades in school.” This finding was based on a sample survey conducted by researchers at Columbia University. Was this an observational study or an experiment? Justify your answer. What are the explanatory and response variables? Explain clearly why such a study cannot establish a cause-and-effect relationship. Suggest a lurking variable that may be confounded with whether families eat dinner together.
11
The Language of Experiments
An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response. Here is the basic vocabulary of experiments. Definition: A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables. The experimental units are the smallest collection of individuals to which treatments are applied. When the units are human beings, they often are called subjects. Sometimes, the explanatory variables in an experiment are called factors. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a level) of each of the factors.
12
Example: A louse-y situation
A study published in the New England Journal of Medicine (March 11, 2010) compared two medicines to treat head lice: an oral medication called ivermectin and a topical lotion containing malathion. Researchers studied 812 people in 376 households in seven areas around the world. Of the 185 households randomly assigned to ivermectin, 171 were free from head lice after two weeks, compared with only 151 of the 191 households randomly assigned to malathion. Problem: Identify the experimental units, explanatory and response variables, and the treatments in this experiment.
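The two results are easier to compare as proportions than as raw counts, because the groups have different sizes. The short sketch below does only that arithmetic, using the counts quoted on the slide; the variable and function names are illustrative.

```python
# Counts quoted in the study description on the slide.
ivermectin_households = 185
ivermectin_lice_free = 171
malathion_households = 191
malathion_lice_free = 151


def proportion(successes, total):
    """Return the fraction of households that were lice-free."""
    return successes / total


p_ivermectin = proportion(ivermectin_lice_free, ivermectin_households)
p_malathion = proportion(malathion_lice_free, malathion_households)

print(f"Ivermectin: {p_ivermectin:.1%} of households lice-free")
print(f"Malathion:  {p_malathion:.1%} of households lice-free")
print(f"Difference: {(p_ivermectin - p_malathion) * 100:.1f} percentage points")
```

Note that the households, not the 812 individual people, are the units counted here, because the treatments were assigned household by household.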
13
What is a big advantage of experiments over observational studies?
14
Example: Growing tomatoes
Does adding fertilizer affect the productivity of tomato plants? How about the amount of water given to the plants? To answer these questions, a gardener plants 24 similar tomato plants in identical pots in his greenhouse. He will add fertilizer to the soil in half of the pots. Also, he will water 8 of the plants with 0.5 gallon of water per day, 8 of the plants with 1 gallon of water per day, and the remaining plants with 1.5 gallons of water per day. At the end of three months, he will record the total weight of tomatoes produced on each plant. Problem: Identify the explanatory and response variables and the experimental units, and list all the treatments.
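With two factors, each treatment is one combination of a fertilizer level and a water level. A minimal sketch of how the treatments could be enumerated is shown here. The equal split of 4 plants per treatment is an assumption (it is consistent with 12 fertilized pots and 8 plants per water level, but the slide does not state it), and the names are illustrative.

```python
from itertools import product

# Levels of the two factors described in the example.
fertilizer_levels = ["fertilizer", "no fertilizer"]
water_levels = [0.5, 1.0, 1.5]  # gallons of water per day

# Each treatment is one level of fertilizer combined with one level of water.
treatments = list(product(fertilizer_levels, water_levels))

for number, (fertilizer, water) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {fertilizer}, {water} gallon(s) of water per day")

# Assumption: the 24 plants are divided equally among the treatments.
plants = 24
plants_per_treatment = plants // len(treatments)
print(f"{len(treatments)} treatments, {plants_per_treatment} plants per treatment")
```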
15
How to Experiment Badly
Experiments are the preferred method for examining the effect of one variable on another. By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect. Good designs are essential for effective experiments, just as they are for sampling.
Example, page 236
A high school regularly offers a review course to prepare students for the SAT. This year, budget cuts will allow the school to offer only an online version of the course. Over the past 10 years, the average SAT score of students in the classroom course was [score missing]. The online group gets an average score of [score missing]. That’s roughly 10% higher than the long-time average for those who took the classroom review course. Is the online course more effective?
Students -> Online Course -> SAT Scores
16
How to Experiment Badly
Many laboratory experiments use a design like the one in the online SAT course example:
Experimental Units -> Treatment -> Measure Response
In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badly designed experiments often yield worthless results because of confounding.
17
Example: Does caffeine affect pulse rate?
Many students regularly consume caffeine to help them stay alert. Thus, it seems plausible that taking caffeine might increase an individual’s pulse rate. Is this true? One way to investigate this is to have volunteers measure their pulse rates, drink some cola with caffeine, measure their pulses again after 10 minutes, and then calculate the increase in pulse rate. Unfortunately, even if every student’s pulse rate went up, we couldn’t attribute the increase to caffeine. Why? How can this experiment be improved?
18
How to Experiment Well: The Randomized Comparative Experiment
The remedy for confounding is to perform a comparative experiment in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments. Comparison alone isn’t enough: if the treatments are given to groups that differ greatly, bias will result. The solution to the problem of bias is random assignment. Definition: In an experiment, random assignment means that experimental units are assigned to treatments at random, that is, using some sort of chance process.
19
Example: More Caffeine
Suppose you have a class of 30 students who volunteer to be subjects in the caffeine experiment described earlier. Problem: Explain how you would randomly assign 15 students to each of the two treatments. *When describing your method, use sufficient detail so that two knowledgeable users of statistics could follow your description and carry out the method in exactly the same way.
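One chance process that meets the "sufficient detail" standard is to shuffle the class list and give the first 15 names one treatment and the remaining 15 the other. The sketch below is one way to do that, not the only acceptable method. The student labels, the group names, and the fixed seed are assumptions made for illustration.

```python
import random


def assign_two_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(subjects)   # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical labels for the 30 volunteers.
students = [f"Student {i}" for i in range(1, 31)]

# Assumed treatments: cola with caffeine versus cola without caffeine.
caffeine_group, no_caffeine_group = assign_two_groups(students, seed=1)

print("Caffeine:", sorted(caffeine_group))
print("No caffeine:", sorted(no_caffeine_group))
```

Shuffling is equivalent to writing each name on an identical slip of paper, mixing the slips in a hat, and drawing 15 of them.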
20
The Randomized Comparative Experiment
Definition: In a completely randomized design, the treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment.
Experimental Units -> Random Assignment
Group 1 -> Treatment 1 -> Compare Results
Group 2 -> Treatment 2 -> Compare Results
21
Example: Dueling diets
A health organization wants to know if a low-carb or low-fat diet is more effective for long-term weight loss. The organization decides to conduct an experiment to compare these two diet plans with a control group that is only provided with a brochure about healthy eating. Ninety volunteers agree to participate in the study for one year. Problem: Outline a completely randomized design for this experiment. Write a few sentences describing how you would implement your design.
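A completely randomized design with three treatments can be sketched the same way as with two: shuffle all the volunteers, then cut the shuffled list into equal pieces. The example below assumes equal groups of 30, which the slide does not specify, and uses hypothetical volunteer labels and a hypothetical function name.

```python
import random


def completely_randomized_design(subjects, treatments, seed=None):
    """Randomly split subjects into equal-sized groups, one per treatment."""
    if len(subjects) % len(treatments) != 0:
        raise ValueError("subjects cannot be divided equally among treatments")
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    # Consecutive slices of the shuffled list become the treatment groups.
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


# Hypothetical labels for the 90 volunteers.
volunteers = [f"Volunteer {i}" for i in range(1, 91)]
diet_treatments = ["low-carb diet", "low-fat diet", "brochure only (control)"]

design = completely_randomized_design(volunteers, diet_treatments, seed=2)
for treatment, group in design.items():
    print(f"{treatment}: {len(group)} volunteers")
```

After one year, the weight loss in the three groups would be compared.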
22
Check your understanding. P. 240
Music students often don’t evaluate their own performances accurately. Can small-group discussion help? The subjects were 29 students preparing the end-of-semester performance that is an important part of their grade. Assign 15 students to the treatment: videotape a practice performance, ask the student to evaluate it, then have the student discuss the tape with a small group of other students. The remaining 14 students form a control group who watch and evaluate their tapes alone. At the end of the semester, the discussion-group students evaluated their final performance more accurately. Outline a completely randomized design for this experiment. Follow the model of Figure 4.4.
23
Check your understanding p. 240
Describe how you would carry out the random assignment. Provide enough detail that a classmate could implement your procedure. What is the purpose of the control group in this experiment?
24
Principles of Experimental Design
Randomized comparative experiments are designed to give good evidence that differences in the treatments actually cause the differences we see in the response.
Three Principles of Experimental Design
Control for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered.
Random assignment: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren’t controlled on the treatment groups.
Replication: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.
25
Example: The Physicians’ Health Study
Read the description of the Physicians’ Health Study on page [page number missing]. Explain how each of the three principles of experimental design was used in the study. A placebo is a “dummy pill” or inactive treatment that is indistinguishable from the real treatment.
26
Example: More caffeine
Explain how to use all three principles of experimental design in the caffeine experiment.
27
Experiments: What Can Go Wrong?
The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared. Good experiments, therefore, require careful attention to details to ensure that all subjects really are treated identically. A response to a dummy treatment is called a placebo effect. The strength of the placebo effect is a strong argument for randomized comparative experiments. Whenever possible, experiments with human subjects should be double-blind. Definition: In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.
28
Example: A more expensive placebo?
In a study reported by the New York Times on March 5 (“More Expensive Placebos Bring More Relief”), researchers discovered that placebos have a stronger effect when they are perceived to be more expensive. The study had volunteers rate the pain of an electric shock before and after taking a new medication. However, half of the subjects were told the medication cost $2.50 per dose, while the other half were told the medication cost $0.10 per dose. In reality, both medications were placebos, and both had a strong effect. Of the “cheap” placebo users, 61% experienced pain relief, while 85% of the “expensive” placebo users experienced pain relief. The researchers suggested that people are accustomed to paying more for better medications, which may account for the difference in response. As with any placebo, it’s all about the expectations of the subjects.
29
Check your understanding p. 244
In an interesting experiment, researchers examined the effect of ultrasound on birth weight. One group of pregnant women received an ultrasound; the second group did not. When the subjects’ babies were born, their birth weights were recorded. The women who received the ultrasounds had heavier babies.
1. Did the experimental design take the placebo effect into account? Why is this important?
2. Was the experiment double-blind? Why is this important?
3. Based on your answers to Questions 1 and 2, describe an improved design for this experiment.
30
Against All Odds: Designing Experiments
31
Inference for Experiments
In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. We can use the laws of probability, which describe chance behavior, to learn whether the treatment effects are larger than we would expect to see if only chance were operating. If they are, we call them statistically significant. Definition: An observed effect so large that it would rarely occur by chance is called statistically significant. A statistically significant association in data from a well-designed experiment does imply causation.
32
Activity: Distracted Drivers
Is talking on a cell phone while driving more distracting than talking to a passenger? Read the Activity on page 245. Perform 10 repetitions of your simulation and report the number of drivers in the cell phone group who failed to stop. In what percent of the class’s trials did 12 or more people in the cell phone group fail to stop? Based on these results, how surprising would it be to get a result this large or larger simply due to the chance involved in random assignment? Is this result statistically significant?
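The hands-on simulation can also be sketched in code. The idea is to assume the treatment makes no difference, so the same drivers would have failed to stop regardless of group, and then to re-do the random assignment many times to see how often 12 or more of them land in the cell phone group by chance. Only the threshold of 12 comes from the slide. The 48 drivers, the 24 per group, and the 14 total failures are hypothetical values chosen for illustration and should be replaced with the numbers from the Activity.

```python
import random


def failures_in_cell_phone_group(n_drivers, n_failed, group_size, rng):
    """Re-randomize once and count failures that land in the cell phone group."""
    # 1 marks a driver who failed to stop, 0 a driver who stopped.
    drivers = [1] * n_failed + [0] * (n_drivers - n_failed)
    rng.shuffle(drivers)
    # The first group_size drivers after shuffling form the cell phone group.
    return sum(drivers[:group_size])


def estimate_chance_probability(n_drivers, n_failed, group_size,
                                observed, trials=10_000, seed=None):
    """Estimate how often chance alone gives a count of at least `observed`."""
    rng = random.Random(seed)
    hits = sum(
        failures_in_cell_phone_group(n_drivers, n_failed, group_size, rng) >= observed
        for _ in range(trials)
    )
    return hits / trials


# Hypothetical study sizes; only observed=12 comes from the slide.
p_chance = estimate_chance_probability(
    n_drivers=48, n_failed=14, group_size=24, observed=12, seed=0
)
print(f"Estimated probability of 12 or more by chance: {p_chance:.4f}")
```

If the estimated probability is small, a count of 12 or more would rarely happen through random assignment alone, which is what "statistically significant" means on the previous slide.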
33
Blocking
Completely randomized designs are the simplest statistical designs for experiments. But just as with sampling, there are times when the simplest method doesn’t yield the most precise results. Definition: A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a randomized block design, the random assignment of experimental units to treatments is carried out separately within each block. Form blocks based on the most important unavoidable sources of variability (lurking variables) among the experimental units. Randomization will average out the effects of the remaining lurking variables and allow an unbiased comparison of the treatments. Control what you can, block on what you can’t control, and randomize to create comparable groups.
34
Example: Comparing chocolate chip cookies
Ann is an avid baker who would like to compare two different chocolate chip cookie recipes (A and B). So she recruits 10 volunteer taste testers to rate each type of cookie on a scale from 1 (very bad) to 5 (very good). She will make 10 of each type of cookie, for a total of 20. Each cookie tray will hold only 10 cookies, so she will use two trays and bake them at the same time in the same oven, one sheet on the lower rack and one sheet on the upper rack. Because cookies might bake differently depending on which rack they are on, we will use the 10 locations on the lower-rack cookie sheet as one block and the 10 locations on the upper-rack cookie sheet as the other block. On each of the sheets, Ann will randomly place 5 of each type of cookie. This way, each type of cookie will have 5 on the lower rack and 5 on the upper rack, balancing out the effect of the rack location. Note that in this experiment, the locations in the oven are the experimental units and the recipes are the treatments. Here is a diagram:
35
Example: Better texting?
A cell phone company is considering two different keyboard designs (A and B) for its new line of cell phones. Researchers would like to conduct an experiment using subjects who are frequent texters and subjects who are not frequent texters. The subjects will be asked to text several different messages in 5 minutes. The response variable will be the number of correctly typed words. Explain why a randomized block design might be preferable to a completely randomized design for this experiment. Outline a randomized block experiment using 100 frequent texters and 200 novice texters.
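In a randomized block design the shuffling happens separately inside each block, so every keyboard is tested by the same number of frequent texters and the same number of novices. The sketch below illustrates this with the block sizes from the problem. The even split within each block, the subject labels, and the function name are assumptions.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization is carried out within this block only
        size = len(shuffled) // len(treatments)
        assignment[block_name] = {
            treatment: shuffled[i * size:(i + 1) * size]
            for i, treatment in enumerate(treatments)
        }
    return assignment


# Hypothetical subject labels; block sizes come from the problem statement.
texter_blocks = {
    "frequent texters": [f"F{i}" for i in range(1, 101)],
    "novice texters": [f"N{i}" for i in range(1, 201)],
}

block_design = randomized_block_design(
    texter_blocks, ["Keyboard A", "Keyboard B"], seed=3
)
for block_name, groups in block_design.items():
    for treatment, subjects in groups.items():
        print(f"{block_name}, {treatment}: {len(subjects)} subjects")
```

The number of correctly typed words would then be compared between keyboards within each block.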
36
Matched-Pairs Design Definition
A common type of randomized block design for comparing two treatments is a matched-pairs design. The idea is to create blocks by matching pairs of similar experimental units. Definition: A matched-pairs design is a randomized block experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets each treatment. Sometimes, a “pair” in a matched-pairs design consists of a single unit that receives both treatments. Since the order of the treatments can influence the response, chance is used to determine which treatment is applied first for each unit.
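The chance step in a matched-pairs design amounts to one coin flip per pair. The sketch below shows one way to carry it out for pairs of similar units. The pair labels, treatment names, and function name are hypothetical, and the same approach can decide treatment order when a single unit receives both treatments.

```python
import random


def matched_pairs_assignment(pairs, treatments=("Treatment 1", "Treatment 2"), seed=None):
    """For each matched pair, randomly decide which unit gets which treatment."""
    rng = random.Random(seed)
    assignment = []
    for first_unit, second_unit in pairs:
        order = list(treatments)
        rng.shuffle(order)  # equivalent to flipping a fair coin for this pair
        assignment.append({first_unit: order[0], second_unit: order[1]})
    return assignment


# Hypothetical pairs of similar experimental units.
matched_pairs = [("Unit 1a", "Unit 1b"), ("Unit 2a", "Unit 2b"), ("Unit 3a", "Unit 3b")]

pair_assignment = matched_pairs_assignment(matched_pairs, seed=4)
for pair in pair_assignment:
    print(pair)
```

Every pair receives both treatments, so each pair supplies its own comparison of the two.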
37
Blocking vs. Matched Pairs
38
Example: Standing and Sitting Pulse Rate
Consider the Fathom dotplots from a completely randomized design and a matched-pairs design. What do the dotplots suggest about standing vs. sitting pulse rates?