Producing Data: Samples and Experiments

Presentation on theme: "Producing Data: Samples and Experiments"— Presentation transcript:

1 Producing Data: Samples and Experiments
Chapter 5

2 Simple Random Sample
number the population
use a method to randomly select the desired sample size from the entire population
Advantages: every member of the population always has an equal chance of being selected
Disadvantages: sample may not be representative of the population; difficult with large populations
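The two steps of a simple random sample (number the population, then randomly select the desired sample size) can be sketched in Python. This is only an illustration: the population of 500 numbered students and the function name `simple_random_sample` are assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: students numbered 1 to 500.
students = range(1, 501)
srs = simple_random_sample(students, 10, seed=1)
print(srs)
```

`random.sample` draws without replacement, so no student can be selected twice.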

3 Cluster Random Sample
divide population into clusters
use a method to randomly select one or more clusters
use a method to randomly select from the chosen clusters
Advantages: can work well if the population is easy to divide or there are established clusters
Disadvantages: not everyone has an equal chance of being chosen; selected clusters may not be representative of the population
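A minimal Python sketch of the cluster procedure, assuming hypothetical homerooms as the established clusters. The names `cluster_sample` and `homerooms` are illustrative only.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: 8 homerooms with 25 students each.
homerooms = {
    f"room{r}": [f"room{r}-student{s}" for s in range(1, 26)]
    for r in range(1, 9)
}
picked = cluster_sample(homerooms, n_clusters=2, n_per_cluster=5, seed=3)
print(picked)
```

Only students in the two chosen homerooms can appear in the sample, which is why the selected clusters may not represent the whole population.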

4 Stratified Random Sample
divide population into strata
use a method to randomly select a sample from each stratum
Advantages: guarantees representation from each stratum
Disadvantages: not everyone has an equal chance of being chosen; strata (of interest) may be difficult to determine; population may be difficult/laborious to sort
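As a rough illustration, a stratified sample draws separately from every stratum. The grade levels and their sizes below are assumed for the example, and `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: grade levels of different sizes.
grades = {
    "freshmen": [f"F{i}" for i in range(1, 201)],
    "sophomores": [f"S{i}" for i in range(1, 181)],
    "juniors": [f"J{i}" for i in range(1, 161)],
    "seniors": [f"R{i}" for i in range(1, 141)],
}
by_grade = stratified_sample(grades, n_per_stratum=10, seed=2)
print({name: len(chosen) for name, chosen in by_grade.items()})
```

Because the strata differ in size while the sample size per stratum is fixed, a senior is more likely to be chosen than a freshman in this sketch.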

5 Systematic Random Sample
use sample size and population size to determine (estimate) the “magic number” (population size divided by sample size)
use a method to randomly select a starting number, using the “magic number” as the range; repeatedly add the “magic number” to determine the corresponding selections
Advantages: allows rapid method to select from large population; helps provide representation throughout population
Disadvantages: not everyone has equal chance of being chosen; sample may not be representative
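One way to read the “magic number” procedure is: divide population size by sample size, pick a random start within that range, then keep adding the magic number. The sketch below follows that reading; the alumni list and the name `systematic_sample` are assumptions for illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first k items, then take every k-th item."""
    items = list(population)
    if n < 1 or n > len(items):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(items) // n          # the "magic number"
    rng = random.Random(seed)
    start = rng.randrange(k)     # random position among the first k items
    return [items[start + i * k] for i in range(n)]

# Hypothetical list of 1000 alumni IDs, sample of 20, so k = 50.
alumni = list(range(1, 1001))
chosen = systematic_sample(alumni, 20, seed=4)
print(chosen)
```

Consecutive selections are always exactly 50 positions apart, so the sample is spread throughout the list.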

6 Multi-Stage Random Sample
use a method (SRS, cluster, stratified) to randomly select (large) groups
use a method (SRS, cluster, stratified) to randomly select (smaller) groups
repeat until participants are chosen

7 Role of Sampling Design
Statistical inference provides ways to answer specific questions from data with some guarantee that the answers are good ones. Statistical inference will be inaccurate if the method of collecting data is flawed.

8 Other Sampling Designs
Suppose the principal is interested in finding out if Austin High students think more trees should be planted. He makes an announcement and instructs students to come by his office to let him know if tree planting is an issue they support. Will this sample of students give him an accurate picture of all students’ feelings at Austin High?

9 Other Sampling Designs
A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples overrepresent people with strong opinions.

10 Other Sampling Designs
The principal is surprised to find that most of the students coming into his office are in favor of the tree planting. Feeling that his design may not have worked, he ventures into the hallways and starts asking students at random. Will this sample of students give him an accurate picture of all students’ feelings at Austin High?

11 Defining Important Terms
population
sample
sample design
good: simple random sample, cluster, stratified, systematic
poor: voluntary response, convenience sampling
bias: a poor design systematically favors certain outcomes or results.

12 Random-random sample practice
simple random sample convenience sample cluster sample voluntary response systematic sample stratified sample Austin High seniors UT alumni Blender magazine subscribers Texans national pet stores Austin middle students

13 Cautions about sample surveys
Suppose we use a random sample in a survey. What could confound our results?
undercoverage: occurs when a sampling design misses a part of the population
nonresponse: occurs when a significant part of the selected sample refuses to participate in the survey

14 Cautions about sample surveys
response bias: occurs when the person asking the question makes the respondent uncomfortable and possibly influences their answer
wording of questions: occurs when a question is leading and attempts to persuade a respondent toward a particular answer
Remember: sample results sometimes simply do not match the population.

15 Identify potential problems
To obtain a sample of households, a television rating service dials numbers taken at random from telephone directories.
Teen magazine sent a mail-in questionnaire to 500 randomly selected subscribers. One of the questions was the following: “Knowing that the cover price would likely increase, would you prefer the number of advertisements in the magazine to be limited?”

16 Identify potential problems
For a survey of student opinions about high school athletic programs, a member of the school board obtains a random sample of students by listing all high school students and using a random number table to select 30 of them. After making phone calls last weekend, she notes six of the students said that they didn’t have time to participate in the survey.

17 Role of mathematics in sampling
Results will differ from sample to sample. This phenomenon is called sampling variability. Since we deliberately use chance, the results obey the laws of probability allowing fairly consistent results (within a margin of error). The degree of accuracy can be improved by increasing the size of the sample.
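Sampling variability can be seen by drawing many samples of the same size and watching how much the sample proportion moves around. The sketch below assumes a hypothetical population in which 60% support some proposal; the function name and all numbers are illustrative.

```python
import random
import statistics

def spread_of_sample_proportions(true_p, n, repetitions=200, seed=0):
    """Standard deviation of the sample proportion across repeated samples of size n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        yes = sum(rng.random() < true_p for _ in range(n))
        proportions.append(yes / n)
    return statistics.pstdev(proportions)

# Hypothetical true proportion of 0.6; compare small and large samples.
small = spread_of_sample_proportions(0.6, n=50)
large = spread_of_sample_proportions(0.6, n=1000)
print(f"spread with n=50: {small:.3f}, spread with n=1000: {large:.3f}")
```

The spread for the larger sample size comes out noticeably smaller, which is the sense in which a bigger sample improves accuracy.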

18 Designing Experiments: vocab
Vocabulary shift from algebra to statistics
algebra → statistics
Independent → Explanatory variable
Dependent → Response variable
An explanatory variable is also called a “factor.”

19 Example for vocabulary check
A corporation found that technology trainings were often stressful for its employees. One idea was to play background music (jazz or classical). Another idea was to have the presenter and participants dress casually rather than in the usual business attire. Equivalent technology trainings over the next year were randomly assigned a particular condition. A post-training survey was given to measure the stress associated with each training.

20 Example for vocabulary check
Factors: music, attire
Levels: music (3: no music, classical music, jazz music), attire (2: business attire, casual attire)
Treatments: 6 (each music level combined with each attire level)
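The count of 6 treatments comes from pairing every level of one factor with every level of the other (3 × 2). A short Python sketch of that enumeration, with variable names chosen only for illustration:

```python
from itertools import product

music_levels = ["no music", "classical music", "jazz music"]
attire_levels = ["business attire", "casual attire"]

# Each treatment is one combination of a music level and an attire level.
treatments = list(product(music_levels, attire_levels))
for number, (music, attire) in enumerate(treatments, start=1):
    print(number, music, "+", attire)
```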

21 Discussion example 1 One school board member noticed that students in band tended to be in the top 25% of their school. She compiled a list from each high school’s band director and took a random sample of 25 students from each school’s band. She then took a random sample of 25 students from each high school who weren’t in band. She found a slightly higher average G.P.A. among students in band.

22 Discussion example 1 Will this study give evidence that being in band causes an increase in a student’s G.P.A.? Will this study help her generalize that students in band tend to have a slightly higher G.P.A. than students not in band?

23 Vocabulary from example 1
Observational study: a study based on data collected from individuals that meet determined criteria
Lurking variable: an outside factor that is neither the explanatory nor the response variable; it prevents causal relationships from being established in observational studies

24 Discussion example 2 Another school board member is surprised the increase is so slight. First, he contacts each band director and asks for a list of 30 students. He then accesses each high school’s roster and takes the first 40 listed, striking any student’s name he already has. He found the difference in average G.P.A. favoring students in band to be larger than in the first study.

25 Discussion example 2 Will this study give evidence that being in band causes an increase in a student’s G.P.A.? Will this study help him generalize that students in band tend to have a higher G.P.A. than students not in band?

26 Discussion example 3 Walmart is considering buying a gasoline additive that is supposed to improve gas mileage. They found 30 employees in Texas who drive the same car. Fifteen employees are randomly selected to receive the additive; the remaining fifteen are given a bottle with just gas. Each employee is given a set route around the city to drive. The gas mileage is recorded by an onboard computer, which shows the additive gives the driver 12% better gas mileage.

27 Discussion example 3 Will this study give evidence that using the additive will give a car better gas mileage?

28 Vocabulary from example 3
Experiment: a planned study where deliberate conditions are imposed to see how the response variable will change
Confounding variable: a variable associated (noncausally) with the explanatory variable that affects the response variable in some way; it makes it difficult to tell whether the treatment or the confounding variable affected the response variable significantly

29 Lurking versus confounding
[Diagram: two panels relating x, y, and z. Observational study: z is a lurking variable. Experiment: z is a confounding variable.]

30 Randomized comparative experiments
Goal of an experiment: collect statistically significant evidence for a cause-and-effect relationship. The success of an experiment depends on our ability to treat all the experimental units identically except for the actual treatment.

31 Example A baby-food producer claims that her product is superior to that of her leading competitor, in that babies gain weight faster with her product. For the experiment, 30 healthy babies are randomly selected. Using a diagram, outline an experiment.

32 Completely Randomized Design
Random Allocation
Group 1: 15 babies, Treatment 1: her product
Group 2: 15 babies, Treatment 2: competitor’s product
Compare weight gain
Babies will be numbered 01 to 30. Using a random number table, the first 15 selected will be in Group 1 with the remaining placed in Group 2. Each baby’s weight gain will be measured in pounds and compared.
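The random allocation step can be sketched in Python, with a shuffle standing in for the random number table (an assumption; the slide uses a table). The function name `completely_randomized_design` is illustrative.

```python
import random

def completely_randomized_design(units, seed=None):
    """Shuffle the units; the first half forms Group 1 and the rest Group 2."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Babies labeled 01 to 30, as on the slide.
babies = [f"{i:02d}" for i in range(1, 31)]
group1, group2 = completely_randomized_design(babies, seed=5)
print("Group 1 (her product):", sorted(group1))
print("Group 2 (competitor's):", sorted(group2))
```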

33 Example We wish to determine whether or not a new type of fertilizer is more effective than the type currently in use. Researchers have subdivided a 20-acre farm into twenty 1-acre plots. Wheat will be planted on the farm, and at the end of the growing season the number of bushels harvested will be measured. Produce a diagram of the experiment.

34 Randomized Design
Random Allocation
Group 1: 10 acres, Treatment 1: new type
Group 2: 10 acres, Treatment 2: current type
Compare bushels
Land plots will be numbered 01 to 20. Using a random number table, the first 10 selected will be in Group 1 with the remaining placed in Group 2. The bushels of wheat from each plot will be counted and compared.

35 An example of a good design?
In order to test the effectiveness of nicotine patches, Dr. Hurt recruited 240 smokers at various locations. Volunteers were to receive a 22-mg nicotine patch for eight weeks. Almost half (46%) of the nicotine group had quit smoking at the end of the study. Confounding variable: placebo effect

36 Principles of Experimental Design
Control: using comparison ensures that outside factors other than the experimental treatments operate equally on all groups.
Randomization: use of impersonal chance to equalize unanticipated factors, so that the groups are similar in all respects.
Replication: perform the experiment on as many subjects as possible to reduce chance variation in the results.

37 Design Example You are participating in the design of a medical experiment to investigate whether a calcium supplement in the diet will reduce the blood pressure of middle-aged men. Preliminary work suggests that calcium may be effective and that the effect may be greater for African-American men than for white men. Describe a completely randomized design.

38 Design example
Random Assignment
Group 1: Treatment 1 (calcium)
Group 2: Treatment 2 (placebo)
Compare blood pressure
What potential problems might we have because we started with random assignment? How should we alter our experiment?

39 Block Design
All participants are divided into two blocks:
African American men: completely randomized experiment
White men: completely randomized experiment

40 Block Design
Subjects are divided into blocks, with random assignment within each block:
African American men: Group 1 receives Treatment 1 (calcium); Group 2 receives Treatment 2 (placebo)
White men: Group 3 receives Treatment 1 (calcium); Group 4 receives Treatment 2 (placebo)
Compare blood pressure
All African American men will be assigned a random number. The half of the men with the smallest numbers will be assigned to Group 1; the half with the largest numbers will be assigned to Group 2. The process will repeat for the white men. The reduction in blood pressure will be compared.
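The within-block assignment described for this design (give every subject in a block a random number, smallest half to calcium, largest half to placebo) might look like this in Python. The subject labels and block sizes are assumed for illustration, as is the name `randomize_within_blocks`.

```python
import random

def randomize_within_blocks(blocks, seed=None):
    """Within each block, order subjects by a random number and split in half."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        numbered = sorted((rng.random(), subject) for subject in subjects)
        ordered = [subject for _, subject in numbered]
        half = len(ordered) // 2
        assignment[block_name] = {
            "calcium": ordered[:half],   # smallest random numbers
            "placebo": ordered[half:],   # largest random numbers
        }
    return assignment

# Hypothetical blocks of 20 subjects each.
blocks = {
    "African American men": [f"A{i:02d}" for i in range(1, 21)],
    "white men": [f"W{i:02d}" for i in range(1, 21)],
}
plan = randomize_within_blocks(blocks, seed=7)
print({block: {t: len(s) for t, s in groups.items()} for block, groups in plan.items()})
```

Randomizing separately inside each block guarantees that both treatments are represented equally among African American men and among white men.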

41 Improving the Design A block is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. Block design has the same rationale as a stratified random sample. Blocks allow us to reduce the amount of variation to improve the accuracy of our conclusions by creating homogeneous groups. single blind versus double blind

42 Design Example Is the right hand of right-handed people generally stronger than the left? Paul Murky of Murky Research designs an experiment to test this question. He fastens an ordinary bathroom scale to a shelf five feet from the floor, with the end of the scale projecting out from the shelf. Subjects squeeze the scale between their thumb and their fingers on the top. The scale reading in pounds measures hand strength. Is a completely randomized experiment appropriate?

43 Matched Pair Design
Random Allocation
Group 1: Treatment 1 (left hand), then Treatment 2 (right hand)
Group 2: Treatment 2 (right hand), then Treatment 1 (left hand)
Compare difference
A coin will be flipped to decide which hand each participant will use first. On heads, the participant squeezes with the left hand first; on tails, with the right hand first. The difference in pounds on the scale will be compared.
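A small Python sketch of the coin-flip ordering and the paired comparison. The subject names and scale readings are made up purely to show the mechanics; `assign_hand_order` and `paired_differences` are hypothetical helper names.

```python
import random

def assign_hand_order(subjects, seed=None):
    """Flip a fair coin per subject: heads means left hand first, tails right first."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        heads = rng.random() < 0.5
        orders[subject] = ("left", "right") if heads else ("right", "left")
    return orders

def paired_differences(readings):
    """Right-hand reading minus left-hand reading for each subject (pounds)."""
    return {subject: right - left for subject, (right, left) in readings.items()}

subjects = ["subject1", "subject2", "subject3", "subject4"]
orders = assign_hand_order(subjects, seed=9)

# Made-up readings as (right_lbs, left_lbs), for illustration only.
readings = {"subject1": (42, 38), "subject2": (35, 36),
            "subject3": (50, 44), "subject4": (39, 39)}
differences = paired_differences(readings)
print(orders)
print(differences)
```

Every subject receives both treatments, so the analysis works with one difference per subject rather than with two separate groups.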

44 Improving the Design In a matched pair design, each subject in the experiment will receive two (and only two) treatments. The order that each subject receives both treatments is randomly selected to preserve the important aspect of randomization.

45 Why a simulation? A simulation is using a model to imitate a chance behavior based on a specific problem situation. A simulation allows a model to be analyzed when a theoretical probability is unknown or indeterminate.

46 Elements of a simulation
Number assignment Description of a trial Stopping rule Execution of simulation (marking of the number line) Documentation of results

47 Simulation Example Traffic Lights: Coming to school each day, Anne rides through three traffic lights, A, B, and C. The probability that any one light is green is 0.3, and the probability that it is not green is 0.7. Use a simulation to answer the questions below. We must assume that the lights operate independently.
Estimate the probability that Anne will find all traffic lights to be green.
Estimate the probability that Anne will find at least one light to be not green.

48 Simulation Example Number assignment
Number assignment: 0 – 2 green light; 3 – 9 not green (alternatively, 1 – 3 green light; 4 – 0 not green)
Description of a trial/Stopping rule: A trial consists of choosing one digit at a time, each digit representing one traffic light. After each digit we determine whether the light is green or not green; the trial ends after three lights.
Execution of simulation
Documentation of results
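The same simulation can be run in software instead of with a random digit table. The sketch below uses the 0 – 2 green, 3 – 9 not green assignment; the number of trials, the seed, and the function names are assumptions. Under independence the exact answer for three green lights is 0.3 × 0.3 × 0.3 = 0.027, which gives a check on the estimate.

```python
import random

def simulate_trial(rng):
    """One trial: three random digits, one per light; digits 0-2 mean green."""
    digits = [rng.randrange(10) for _ in range(3)]
    return all(digit <= 2 for digit in digits)

def estimate_all_green(trials=100_000, seed=0):
    """Fraction of trials in which all three lights are green."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(rng) for _ in range(trials))
    return hits / trials

p_all_green = estimate_all_green()
p_at_least_one_not_green = 1 - p_all_green
print(f"all green: {p_all_green:.4f}")
print(f"at least one not green: {p_at_least_one_not_green:.4f}")
```

The second estimate is the complement of the first, since "at least one light not green" covers every trial that is not "all three green".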

49 Simulation Example
[Tally table with two columns: three green lights | two or fewer]

