Sampling Design.

1 Sampling Design

2 How do we gather data?
Surveys: opinion polls, interviews
Studies:
  Observational: retrospective (past), prospective (future)
  Experiments

3 Population
The entire group of individuals that we want information about.

4 Census
A complete count of the population.

5 How good is a census?
Do the Frog Fairy Tale activity (slide 48) before reading on.
The answer is 83!

6 Why would we not use a census all the time?
Not accurate: look at the U.S. census. It has a huge amount of error in it, and it takes so long to compile the data that the data are obsolete by the time we get them!
Very expensive: since taking a census of any population takes time, censuses are VERY costly to do!
Perhaps impossible: suppose you wanted to know the average weight of the white-tailed deer population in Texas. Would it be feasible to do a census?
Destructive sampling: if measuring an item destroys it, a census would destroy the population. Examples: breaking strength of soda bottles, lifetime of flashlight batteries, safety ratings for cars.

7 Sample
A part of the population that we actually examine in order to gather information. We use the sample to generalize to the population.

8 Sampling design refers to the method used to choose the sample from the population

9 Sampling frame
A list of every individual in the population.

10 Random Rectangles
Estimate the average area of the rectangles.

11 Judgmental Sample

12 Simple Random Sample (SRS)
An SRS of size n consists of n individuals from the population chosen in such a way that every individual has an equal chance of being selected and every set of n individuals has an equal chance of being selected.
Suppose we were to take an SRS of 100 THS students: put each student's name in a hat, then randomly select 100 names from the hat. Each student has the same chance to be selected! Not only does each student have the same chance to be selected, but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be juniors in order for it to be an SRS!
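A minimal Python sketch of the name-in-a-hat procedure, assuming the sampling frame is available as a list. The roster of 2000 student IDs and the helper name `simple_random_sample` are hypothetical; the standard-library `random.sample` draws without replacement, so every individual and every possible group of n has the same chance of being chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n without replacement (hypothetical helper).

    random.sample makes every individual, and every possible group
    of n individuals, equally likely to be chosen.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical roster standing in for the THS sampling frame.
roster = [f"student_{i:04d}" for i in range(1, 2001)]
srs = simple_random_sample(roster, 100, seed=1)
print(len(srs), srs[:3])
```

Passing a seed only makes the draw reproducible for demonstration; in a real survey you would leave it unset.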

13 Stratified random sample
The population is divided into homogeneous groups called strata, and an SRS is pulled from each stratum. Homogeneous groups are groups whose members are alike in some characteristic.
Suppose we were to take a stratified random sample of 100 THS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 juniors and randomly select 50 seniors.
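A short sketch of the junior/senior example, assuming each stratum's roster is available as a list. The rosters and the helper `stratified_sample` are hypothetical; the key point is that a separate SRS is drawn inside each stratum.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS from each stratum (hypothetical helper).

    strata: dict mapping stratum name -> list of individuals
    sizes:  dict mapping stratum name -> how many to draw from it
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical rosters; grade level is the stratum.
strata = {
    "junior": [f"jr_{i}" for i in range(1000)],
    "senior": [f"sr_{i}" for i in range(1000)],
}
sample = stratified_sample(strata, {"junior": 50, "senior": 50}, seed=2)
print({grade: len(chosen) for grade, chosen in sample.items()})  # {'junior': 50, 'senior': 50}
```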

14 Systematic random sample
Select the sample by following a systematic approach, randomly selecting where to begin.
Suppose we want to do a systematic random sample of THS students. Number a list of the students. There are approximately 2000 students, so if we want a sample of 100, the interval is 2000/100 = 20. Select a number between 1 and 20 at random; that student is the first one chosen. Then choose every 20th student from there.
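The same procedure as a Python sketch, assuming the numbered list is a Python list. The helper name `systematic_sample` is hypothetical; the interval is k = N // n, and the random start is drawn from the first k positions (0-based here, which corresponds to "a number between 1 and 20").

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start within the first interval, then every k-th individual (hypothetical helper)."""
    k = len(population) // n          # sampling interval: 2000 // 100 = 20
    rng = random.Random(seed)
    start = rng.randrange(k)          # 0-based start in 0..k-1
    return population[start::k][:n]   # trim in case N is not a multiple of n

roster = [f"student_{i:04d}" for i in range(1, 2001)]
chosen = systematic_sample(roster, 100, seed=3)
print(chosen[:3])
```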

15 Cluster sample
Clusters are based upon location: randomly pick locations and sample everyone there.
Suppose we want to do a cluster sample of THS students. One way to do this would be to randomly select 10 classrooms during 2nd period and sample all students in those rooms!
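A sketch of the classroom example, assuming each 2nd-period room's roster is known. The 80 rooms of 25 students and the helper `cluster_sample` are hypothetical; note that randomness enters only when choosing rooms, and then everyone in a chosen room is included.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole clusters, then take everyone in them (hypothetical helper)."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), num_clusters)
    return {room: list(clusters[room]) for room in picked}

# Hypothetical 2nd-period rooms with 25 students each.
clusters = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(25)] for r in range(1, 81)}
sample = cluster_sample(clusters, 10, seed=4)
print(sorted(sample), sum(len(v) for v in sample.values()))
```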

16 Multistage sample
Select successively smaller groups within the population in stages, using an SRS at each stage.
To use a multistage approach to sampling THS students, we could first divide 2nd period classes by level (AP, Honors, Advanced, etc.) and randomly select 4 second-period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
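A two-stage sketch of the THS example, assuming a hypothetical nested structure level -> {class -> roster} with 8 classes of 25 students per level. The helper `multistage_sample` is also hypothetical; each stage is an SRS within the groups chosen at the previous stage.

```python
import random

def multistage_sample(levels, classes_per_level, students_per_class, seed=None):
    """Stage 1: SRS of classes within each level.
    Stage 2: SRS of students within each chosen class. (hypothetical helper)"""
    rng = random.Random(seed)
    result = {}
    for level, classes in levels.items():
        for cls in rng.sample(sorted(classes), classes_per_level):
            result[cls] = rng.sample(classes[cls], students_per_class)
    return result

# Hypothetical structure: level -> {class name -> roster of 25}.
levels = {
    level: {f"{level}_class_{c}": [f"{level}_{c}_s{s}" for s in range(25)] for c in range(1, 9)}
    for level in ("AP", "Honors", "Advanced")
}
sample = multistage_sample(levels, classes_per_level=4, students_per_class=5, seed=5)
print(len(sample), sum(len(v) for v in sample.values()))  # 12 60
```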

17 SRS
Advantages: unbiased; easy.
Disadvantages: large variance; may not be representative; must have a sampling frame (list of the population).

18 Stratified random sample
Advantages: a more precise unbiased estimator than an SRS (less variability) when the strata are homogeneous; cost is reduced if the strata already exist.
Disadvantages: difficult to do if you must form the strata yourself; formulas for SD and confidence intervals are more complicated; need a sampling frame.
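A small simulation can show the "less variability" claim. This is a sketch under stated assumptions: a hypothetical population of two equal-sized, homogeneous strata (values near 10 and near 50), equal allocation of 50 per stratum, and 2000 repeated samples of each design. Both designs center on the population mean, but the stratified estimates are far less spread out.

```python
import random
import statistics

rng = random.Random(6)
# Hypothetical population: two homogeneous strata with very different values.
stratum_a = [rng.uniform(9, 11) for _ in range(1000)]
stratum_b = [rng.uniform(49, 51) for _ in range(1000)]
population = stratum_a + stratum_b

def srs_mean(n=100):
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_per_stratum=50):
    # Equal-sized strata with equal allocation, so the plain mean is unbiased.
    chosen = rng.sample(stratum_a, n_per_stratum) + rng.sample(stratum_b, n_per_stratum)
    return statistics.mean(chosen)

srs_means = [srs_mean() for _ in range(2000)]
strat_means = [stratified_mean() for _ in range(2000)]
print("population mean:", round(statistics.mean(population), 2))
print("SRS        SD of estimates:", round(statistics.stdev(srs_means), 3))
print("Stratified SD of estimates:", round(statistics.stdev(strat_means), 3))
```

If the strata were not homogeneous (e.g., students assigned to strata at random), the advantage would largely disappear.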

19 Systematic random sample
Advantages: unbiased; ensures that the sample is distributed across the population; more efficient, cheaper, etc.
Disadvantages: large variance; can be confounded by a trend or cycle in the list; formulas are complicated.
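To see how a cycle can confound a systematic sample, here is a sketch with a hypothetical list whose values repeat every 20 entries, the same as the sampling interval. Every possible starting point lands on the same position in the cycle, so no systematic sample comes close to the true mean of 14.

```python
# Hypothetical list with a repeating cycle of length 20:
# one unusual value in every block of 20 entries.
cycle = [10] * 19 + [90]
population = cycle * 100           # N = 2000; true mean = (19*10 + 90) / 20 = 14
k = 20                             # interval for a sample of n = 100

# Mean of the systematic sample for every possible starting point.
means = {start: sum(population[start::k]) / len(population[start::k]) for start in range(k)}
print(sorted(set(means.values())))  # [10.0, 90.0]
```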

20 Cluster sample
Advantages: unbiased; cost is reduced; a sampling frame of individuals is not needed, which helps when one is not available.
Disadvantages: clusters may not be representative of the population; formulas are complicated.

21 Convenience sample
Advantages: easy; you sample individuals who are convenient.
Disadvantages: not representative of the population; considered a bad sampling method.

22 Judgmental sample
Advantages: the approach is well understood and has been refined by experience over many years; the auditor is given an opportunity to bring his or her judgment and expertise into play.
Disadvantages: it is unscientific; it is wasteful, and samples that are too large are usually selected; you cannot extrapolate the results to the population as a whole because the samples are not representative; personal bias in selecting the sample is unavoidable.

23 Identify the sampling design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
Answer: Stratified random sample

24 Identify the sampling design
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Answer: Cluster sampling

25 Identify the sampling design
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Answer: Systematic random sampling

26 Bias
Bias is an ERROR that favors certain outcomes: anything that causes the data to be wrong! It might be attributed to the researchers, the respondents, or the sampling method!

27 Sources of bias
Things that can cause bias in your sample. You cannot do anything with bad data!

28 Voluntary response
People choose to respond, and usually only people with very strong opinions respond. An example would be magazine surveys that ask readers to mail in the survey; other examples are call-in shows, American Idol, etc. The respondents select themselves to participate in the survey! Remember, the way to identify voluntary response is: self-selection!!

29 Convenience sampling
Ask people who are easy to ask. This produces biased results, yet the method is often used for surveys and for results reported in newspapers and magazines! An example would be stopping friendly-looking people in the mall to survey them. Another example is the surveys left on tables at restaurants: a convenient method!

30 Undercoverage
Some groups of the population are left out of the sampling process. Suppose you take a sample by randomly selecting names from the phone book; some groups will not have the opportunity of being selected:
People with unlisted phone numbers (usually high-income families)
People without phone numbers (usually low-income families)
People with ONLY cell phones (usually young adults)

31 Nonresponse
Nonresponse occurs when an individual chosen for the sample can't be contacted or refuses to cooperate. People are chosen by the researchers, BUT refuse to participate; they are NOT self-selected! This is often confused with voluntary response!
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse (around 70% nonresponse).
One way to help with the problem of nonresponse is to make a follow-up contact with the people who are not home when you first contact them.

32 Response bias
Response bias occurs when the behavior of the respondent or the interviewer causes bias in the sample: for some reason (the interviewer's or the respondent's fault), you get incorrect answers. Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample. Would we get honest answers?

33 Wording of the questions
Wording can influence the answers that are given, through the connotation of words and the use of "big" or technical words. Questions must be worded as neutrally as possible to avoid influencing the response. The level of vocabulary should be appropriate for the population you are surveying: if surveying Podunk, TX, you should avoid complex vocabulary; if surveying doctors, you can use more complex, technical wording.

34 Source of bias?
1) Before the 1936 presidential election between FDR and Republican Alf Landon, the magazine Literary Digest predicted that Landon would win by a 3-to-2 margin, based on a survey sent to 10 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Answer: Undercoverage. Since the Digest's survey came from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (Other answers are possible.)

35 Source of bias?
2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts from students as they leave the bookstore during lunch one day.
Answer: Convenience sampling (an easy way to collect data), or undercoverage (students who buy books from online bookstores are not included).

36 Source of bias?
3) To find the average value of a home in Plano, one averages the price of homes that are listed for sale with a realtor.
Answer: Undercoverage; this leaves out homes that are not for sale or homes that are listed with different realtors. (Other answers are possible.)


38 #4
Population: US citizens
Parameter: type of content that bothers them most on TV
Sampling frame: all US adults
Sample: 1423 randomly selected citizens
Method: not clear
Bias: probably not biased; conclusions could be generalized.
#5
Population: adults
Parameter: proportion who think drinking and driving is a serious problem
Sampling frame: bar patrons
Sample: every 10th person leaving the bar
Method: systematic sampling
Bias: those interviewed had just left the bar. They probably think drinking and driving is less of a problem than do adults in general.


40 a) This is a multistage design, with a cluster sample at the first stage and a simple random sample within each cluster.
b) If any of the three churches you pick at random is not representative of all churches, then you'll introduce sampling error by the choice of that church.


42 #14 A) This is a multistage design, with one day picked at random as a cluster, then five boats picked as clusters within that day, and finally a census taken for each boat.
B) If the day is not representative of all fishing days, that will introduce sampling error. If any of the five boats picked at random is not representative of the types and amounts of fish taken by all boats, then the choice of that boat will introduce sampling error.
#15 A) This is a systematic sample.
B) It is likely to be representative of those waiting for the roller coaster. Indeed, it may do quite well if those at the front of the line respond differently (after their long wait) than those at the back of the line.
C) The sampling frame is patrons willing to wait for the roller coaster on that day at that time. It should be representative.



45 #26 A) Mean gas mileage for the last six fill-ups.
B) Mean gas mileage for the vehicle.
C) Recent driving conditions may not be typical.
D) Mean gas mileage for all cars of this make and model.


47 #32 A) The petition may bias people to say they support the playground. Also, many may not be home on Saturday afternoon.
B) If the food at the largest cafeteria is representative, this should be OK. However, those who really don't like the food won't be eating there.

48 Frog Fairy Tale
Printed below is a story that can be used to demonstrate the effectiveness of a census. Assume that the letter "G" or "g" is a defective product caused by the Gremlin, and that you are the inspector. Allow yourself about 3 minutes to count all the G's and g's. Place your total at the bottom of the story.

49 While strolling through a glen, a giddy English girl tripped on a rather large, almost gigantic frog. The girl staggered but regained her footing and was about to go on when the frog began to speak and gesticulate to gain the girl's attention. "I have not always been a frog," he croaked. The frog's green coloring seemed to glow brightly as he continued. "I was once a gracious knight. A gentleman called Gallant George Grenville, but was changed into this ghastly frog you now see by an ungodly, magical genie. The spell can only be broken if I gain a girl's good graces and spend a night in her garden." The agog girl was skeptical, of course. She gazed at the frog's pleading eyes and soon her doubts gave way to her giddy nature. Giggling, she decided to grant the frog's wish and took him home straightway, putting him by her garden gate. That night the girl slept grandly and sure enough, when she awoke the following morning, there alongside her garden gate was the gracious knight, George Grenville. Well, strangely enough, for a long, long time the girl's mother did not believe that story.

