Feb. 6 Statistic for the day: Number of Florida high school students who take physical education courses online: 1204 Assignment: Continue to review for test on Monday! These slides were created by Tom Hettmansperger and in some cases modified by David Hunter
Friday, Feb. 6 Review. Exam #1 (100 points): Monday, Feb. 9, in class. 60 multiple-choice questions. Responsible for: anything in lecture (except SFD); anything in book Chapters 1, 4, 5, 7, 8, 9. Bring ID! Bring pencils! Bring 1 sheet of notes!
2 Types of studies to obtain data relevant to your research: Randomized Experiment; Observational Study
Literary Digest Survey Results: 2.4 million responded! 43% were for Roosevelt. Literary Digest predicted a landslide victory for Alf Landon.
Turning Data into Information: The distribution of the data. The shape of the distribution: Is it skewed or is it symmetric? What is a typical value? Should we use the mean or the median? What is the spread of the distribution? Should we use the standard deviation or the interquartile range? What are the quartiles?
Mean vs. Median: Which is more “typical” in this (right-skewed) case?
Age at Death of English Rulers 60, 50, 47, 53, 48, 33, 71, 43, 65, 34, 56, 59, 49, 81, 67, 68, 49, 16, 86, 67 Turn these data into information.
Shape: Stem and Leaf Display
The Median and the Quartiles: The first quartile (Q1) is the number that divides the data into the first quarter and the last three quarters. The median (M) divides the data into halves. The third quartile (Q3) divides the data into the first three quarters and the last quarter.
5 Number Summary: Median M = 54.5; First Quartile Q1 = 47.5; Third Quartile Q3 = 67; Lowest = 16; Highest = 86
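The five-number summary can be checked against the ages listed on the earlier slide. The sketch below assumes the "median of each half" convention for quartiles (other conventions exist and can give slightly different values); the function name `five_number_summary` is made up for this illustration.

```python
from statistics import median

# Ages at death of English rulers, copied from the data slide.
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. Needs at least two values.
    """
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]    # the smallest half of the values
    upper = s[-half:]   # the largest half of the values
    return s[0], median(lower), median(s), median(upper), s[-1]

summary = five_number_summary(ages)
print(summary)
```

Under that convention the result agrees with the values on the slide: lowest 16, Q1 47.5, median 54.5, Q3 67, highest 86.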
Anatomy of a Boxplot
Shape: Histogram
Rough way to approximate the standard deviation: Look at the histogram and estimate the range of the middle 95% of the data. The standard deviation is about ¼ of this range
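A quick check of the "one quarter of the middle 95%" rule, as a sketch only. It assumes a bell-shaped (normal) distribution with illustrative values of mean 68 and standard deviation 4; for skewed data the rule may be much rougher.

```python
from statistics import NormalDist

# Assumed, illustrative bell-shaped distribution (not data from the slides).
dist = NormalDist(mu=68, sigma=4)

# End points that cut off 2.5% in each tail, leaving the middle 95%.
low = dist.inv_cdf(0.025)
high = dist.inv_cdf(0.975)

middle_95_range = high - low
approx_sd = middle_95_range / 4   # the rough rule from the slide

print(round(approx_sd, 2), "versus the true value", dist.stdev)
```

The approximation lands close to, but slightly below, the true standard deviation of 4, which is why the slide calls it a rough method.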
Research Question 1: How high should I build my doorways so that 99% of the people will not have to duck? Secondary Question 2: If I built my doors 75 inches (6 feet 3 inches) high, what percent of the people would have to duck? (Assume normal distribution with mean 68, st. dev. 4)
Z-Scores: Measurement in Standard Deviations. Given the mean (68), the standard deviation (4), and a value (say, a height of 75), compute Z = (75 - mean) / SD = (75 - 68) / 4 = 1.75. This says that 75 is 1.75 standard deviations above the mean.
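Both doorway questions can be worked out with Python's standard library in place of a normal table. This is a sketch under the slide's stated assumption (heights are normal with mean 68 and standard deviation 4); the variable names are chosen only for this example.

```python
from statistics import NormalDist

mean, sd = 68, 4                      # values given on the slide
heights = NormalDist(mu=mean, sigma=sd)

# Z-score for a 75-inch doorway.
z = (75 - mean) / sd

# Question 2: percent of people taller than 75 inches (they have to duck).
pct_ducking = 100 * (1 - heights.cdf(75))

# Question 1: doorway height that 99% of people fit under.
door_for_99 = heights.inv_cdf(0.99)

print(f"z = {z}")
print(f"percent ducking at 75 inches: {pct_ducking:.1f}%")
print(f"height needed for 99%: {door_for_99:.1f} inches")
```

Under these assumptions about 4% of people would have to duck under a 75-inch door, and the door would need to be roughly 77.3 inches high for 99% to pass without ducking.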
Morals of the story: Whenever you meet a graph that is very far from square, it is likely to produce an impression different from what you would have obtained from the data themselves. Almost any graph in which the vertical scale does not start at zero is deceptive.
BAD
Bogus vertical scale. Hard to say what the graph should look like.
Portion of income taken by the government. Top: spending equal to the income in western states. Bottom: more densely populated east.
A perplexing polling paradox: People generally believe the results of polls. People do not believe in the scientific principles on which polls are based. According to Gallup, most Americans said that a survey of 1500 to 2000 respondents (a larger-than-average sample size for national polls) CANNOT represent the views of all Americans.
How are Gallup Opinion Polls Taken? Telephone interviews: Random digit dialing. At random pick: Exchange (area code + first three digits; e.g., ); Next two digits, e.g., 22; Last two digits, e.g., 11. Up to three callbacks (why callbacks?). Evenings and weekends. This catches unlisted numbers.
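The dialing steps on this slide can be sketched in a few lines. The exchange list below is hypothetical (made up for illustration, not real Gallup data), and the function name `random_digit_dial` is likewise an assumption.

```python
import random

# Hypothetical exchanges: area code + first three digits.
EXCHANGES = ["814-555", "717-555", "215-555"]

def random_digit_dial(rng):
    """Build one phone number by the three random picks on the slide."""
    exchange = rng.choice(EXCHANGES)   # pick an exchange at random
    next_two = rng.randrange(100)      # next two digits, 00 to 99
    last_two = rng.randrange(100)      # last two digits, 00 to 99
    return f"{exchange}-{next_two:02d}{last_two:02d}"

rng = random.Random(0)                 # fixed seed so the run is repeatable
numbers = [random_digit_dial(rng) for _ in range(5)]
print(numbers)
```

Because the last four digits are generated rather than looked up in a directory, unlisted numbers are just as likely to be dialed as listed ones.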
Designed to be a random sample from the POPULATION of people with telephones. All members of the population are equally likely to be in the sample. Called a SIMPLE RANDOM SAMPLE. Polls typically take roughly 1500 or 1600 people.
We generally will NOT have the benefit of a histogram to get the standard deviation or the margin of error of the sample percentage. SECRET FORMULA FOR THE MARGIN OF ERROR OF A SAMPLE PERCENTAGE: Margin of error = 1 / (square root of sample size). Margin of error: 2 standard deviations.
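A short calculation for typical poll sizes, assuming the common conservative rule that the margin of error of a sample percentage is 1 divided by the square root of the sample size. The function name `margin_of_error` is chosen for this sketch.

```python
from math import sqrt

def margin_of_error(n):
    """Conservative margin of error (as a proportion), assuming 1 / sqrt(n)."""
    return 1 / sqrt(n)

for n in (1500, 1600, 2000):
    print(n, f"{100 * margin_of_error(n):.1f}%")
```

With 1600 respondents the rule gives a margin of error of 2.5 percentage points, and larger samples shrink it only slowly.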
The Morning After Pill: Do you think that the ‘morning-after’ contraceptive pill should be available over the counter? Yes: 59.1%; No: 37.1%; Not sure: 3.8%. USA Today call-in poll (
Volunteer response vs. volunteer sample Contraceptive call-in poll? Volunteer sample! 1936 Literary Digest poll? Volunteer response! Which is worse? Volunteer sample!
Do you have a tattoo? Men: Yes 15%, No 85%. Women: Yes 23%, No 77%. Based on: 100 men, 136 women. Stat100.2 S04
Sampling methods: (Simple) random sampling; Stratified random sampling; Cluster sampling; Systematic sampling. Bad: Haphazard or convenience sampling (as in tattoo survey).
Stratified random sampling: Divide population into subgroups, or strata. From each stratum, select a random sample. Example: Select a random sample from each of four groups of students (in-state non-minority, in-state minority, out-of-state non-minority, out-of-state minority) to ensure adequate representation of each group.
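A minimal sketch of stratified random sampling using the four student groups from the example. The roster is hypothetical (generated for illustration), and the function name `stratified_sample` and the choice of 5 students per stratum are assumptions.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Split the population into strata, then draw a random sample from each."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster of 200 students spread over the four groups.
groups = [(res, grp)
          for res in ("in-state", "out-of-state")
          for grp in ("non-minority", "minority")]
students = [{"id": i, "group": groups[i % 4]} for i in range(200)]

rng = random.Random(1)   # fixed seed so the run is repeatable
sample = stratified_sample(students, lambda s: s["group"], 5, rng)
for group, members in sample.items():
    print(group, [m["id"] for m in members])
```

Every group is guaranteed to appear in the sample, which a simple random sample of the whole roster would not ensure.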
Cluster sampling: Divide population into subgroups, or clusters. Select a random sample of clusters. Measure individuals within selected clusters according to some plan. Example: To study high schoolers, first take a random sample of schools and then look in depth at all students in selected schools.
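A minimal sketch of cluster sampling along the lines of the high school example. The schools and student labels are hypothetical, and the function name `cluster_sample` is an assumption.

```python
import random

# Hypothetical population: 10 schools (clusters) with 30 students each.
schools = {f"school_{k}": [f"s{k}_{j}" for j in range(30)] for k in range(10)}

def cluster_sample(clusters, n_clusters, rng):
    """Pick clusters at random, then keep every individual in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(2)   # fixed seed so the run is repeatable
selected = cluster_sample(schools, 3, rng)
print({name: len(members) for name, members in selected.items()})
```

Only the schools are chosen at random here; within a chosen school all students are included, which is one possible plan among several.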
Systematic sampling: From a list of individuals in the population, select every kth individual. Grisly example: “Decimation”, a term originally used for a punishment for mutinous Roman legions in which the legion was lined up and every tenth person killed.
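A minimal sketch of systematic sampling. Starting at a randomly chosen position among the first k is an assumption added here (the slide only says to take every kth individual); the roster and the name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(items, k, rng):
    """Take every k-th item, starting from a random position among the first k."""
    start = rng.randrange(k)   # assumed random start, 0 to k-1
    return items[start::k]

roster = list(range(1, 101))   # hypothetical list of 100 individuals
rng = random.Random(3)         # fixed seed so the run is repeatable
chosen = systematic_sample(roster, 10, rng)
print(chosen)
```

With 100 individuals and k = 10, exactly 10 are selected, each 10 positions apart on the list.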
Comparisons Randomized Experiments Observational Studies EXPLANATORY VARIABLE says which population we sampled from. RESPONSE VARIABLE says what we measured or counted.
The key to a good observational study or a good randomized experiment is RANDOMIZATION in both cases. In observational studies we need a random sample from each population. In randomized experiments we must randomize the subjects to the different treatments (or treatment and control groups).
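Randomizing subjects to treatments can be as simple as shuffling the list and splitting it. The sketch below assumes two equal-sized groups (treatment and control); the subject labels and the function name `randomize` are hypothetical.

```python
import random

def randomize(subjects, rng):
    """Shuffle the subjects, then split them into treatment and control."""
    shuffled = list(subjects)      # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(20)]   # hypothetical subjects
treatment, control = randomize(subjects, random.Random(4))
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, lurking variables tend to balance out between treatment and control.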
Randomized Experiment Associated concepts and ideas: Control group (provides a benchmark) Blinding: single or double (reduce bias) Placebo (benchmark, blinding) Confounding (a lurking third variable) Pairing or blocking (reduces noise in data)
The Hawthorne effect Imagine the following study, intended to determine the prevalence of cheating: Individual students taking an exam in a particular course are filmed and observed closely by a team of extra observers, who then record the number of instances of cheating they observe. Named for Elton Mayo’s famous study ( ) of workers at the Hawthorne, Illinois plant of the Western Electric Company
Research question: Do cell phones cause cancer? What sort of a study could be used to answer this? Observational Study? Randomized Experiment? If we cannot establish cause and effect, perhaps we can establish an association between cell phones and cancer using an observational study.
Possible Observational Study: Response Variable: whether or not a subject gets cancer. Explanatory Variable: whether or not the subject uses a cell phone. This may require a very long time.
A special kind of observational study: SWITCH RESPONSE AND EXPLANATORY VARIABLES. Response Variable: whether a subject uses a cell phone or not. Explanatory Variable: whether a subject has cancer or not. 1. Select a sample of cancer patients (Cancer Case). 2. Develop a group of people who match the cancer patients but do not have cancer (Control). 3. Compute the % who use cell phones in each group. Called a retrospective Case-Control Study.
Research question: How does putting a smiley face on the bill influence a waitperson’s tip? Response variable: Size of tip. Explanatory variable: Smiley face or not. Interacting variable: Sex of waitperson. Female waitress: Drawing a smiley face increased tip. Male waiter: Drawing a smiley face decreased tip. Source: Journ. Appl. Soc. Psych, 1996
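Step 3 of the case-control design comes down to comparing two percentages. The records below are entirely made up to show the arithmetic and say nothing about real cell phone use or cancer; the function name `percent_users` is also an assumption.

```python
# Hypothetical records: True means the person reports using a cell phone.
cases = [True, False, True, True, False, True, False, True]       # cancer patients
controls = [True, False, False, True, False, True, False, True]   # matched, no cancer

def percent_users(group):
    """Percent of the group that uses a cell phone."""
    return 100 * sum(group) / len(group)

print("cases:   ", percent_users(cases))
print("controls:", percent_users(controls))
```

A noticeably higher percentage among the cases than among the matched controls would suggest an association, though not cause and effect.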