Published by Austin Watson. Modified over 9 years ago.
2
Chapter 2: Sampling and Measurement
[Diagram labels: Population, Sample, Statistics, Inference, Parameters]
3
Chapter 2 Objectives
1. Identify the different types of variables.
2. Choose a simple random sample, stratified random sample, cluster sample, and systematic random sample in a variety of situations.
3. Given a survey sample, determine whether the sample is a simple random sample, a stratified sample, a cluster sample, or a systematic sample.
4. Identify the different types of sampling bias.
4
Section 2.1 Variables and Their Measurement The characteristics recorded about each item in a sample or population are called variables.
5
Variable Types
1. Categorical (Qualitative) Variables: data that categorizes.
Ex. male/female, Democrat/Republican, yes/no, Chevy/Buick/Pontiac/Oldsmobile, Awful/Fair/Good/Very Good/Excellent
1a) Nominal: categorizes only. Ex. Buick, Chevy, Ford
1b) Ordinal: categories can be ranked or ordered. Ex. taste test; order of finish in a race
6
Variable Types-2
Wendy’s is developing a new hamburger. A panel of taste-testers evaluates the new item.
Categories: Excellent, Very Good, Good, Poor, Gag
Ordinal: there is a natural ranking.
7
Variable Types-3
Wendy’s is developing a new hamburger. A panel of taste-testers evaluates the new item.
Categories: Excellent = 5, Very Good = 4, Good = 3, Poor = 2, Gag = 1
Ordinal: there is a natural ranking.
8
Variable Types-4
2. Quantitative data: data that is measured on a numerical scale.
Ex. height, GPA, income, temperature, SAT
2a) Interval data: no meaningful zero point; the difference between 2 values is meaningful; cannot meaningfully multiply or divide.
Ex. temperature, SAT
9
Variable Types-5
Ex. (cont.) 60°F is not twice as warm as 30°F; the difference between 32° and 30° is the same as the difference between 83° and 81°, 2 degrees in each case. (No meaningful “zero”; 0 degrees is not the absence of all heat.)
2b) Ratio data: zero point is meaningful; can multiply and divide.
Ex. income, height, GPA, pulse rate; $200 is twice as much as $100; $0 is the absence of all money.
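The interval/ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only; the helper `fahrenheit_to_celsius` is an assumption and does not come from the slides. It re-expresses the temperatures from the example in Celsius: the ratio changes with the scale while the differences agree, whereas a ratio of dollar amounts is unchanged when converted to cents.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9

# Interval data: the ratio depends on where the scale puts its zero.
ratio_in_f = 60 / 30
ratio_in_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(ratio_in_f)   # 2.0
print(ratio_in_c)   # roughly -14, so "twice as warm" is not meaningful

# Differences are meaningful: equal gaps in Fahrenheit are equal gaps in Celsius.
gap_low = fahrenheit_to_celsius(32) - fahrenheit_to_celsius(30)
gap_high = fahrenheit_to_celsius(83) - fahrenheit_to_celsius(81)
print(abs(gap_low - gap_high) < 1e-9)   # True

# Ratio data: $200 is twice $100 whether measured in dollars or in cents.
print(200 / 100 == (200 * 100) / (100 * 100))   # True
```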
10
We collect these data from 50 students. Which variable is categorical? A. Eye color B. Head circumference C. Hours of homework last week D. Number of TV sets in home
11
Registration and Records collects data on NCSU students. Which one of the following is quantitative? 1. Class ( freshman, sophomore, etc.) 2. Grade point average 3. Whether the student took an AP class 4. Whether the student has taken the SAT
12
Why Do We Care About the Type of Variable We Have?
- Summaries of categorical data: proportions, counts, tables, bar charts
  Example: student opinion of quality of NCSU campus food. Excellent: 10%, Very Good: 12%, Good: 25%, Fair: 35%, Poor: 18%
- Summaries of quantitative data: averages, medians, standard deviations, histograms
  Example: maximum speed (mph) of 198 roller coasters from around the world. Average: 57.1, median: 55.9, standard deviation: 18.5 mph
The type of data we have dictates the statistical procedures (graphics, summaries, inference techniques) that we can use.
13
Data: values and their context 815, 930, 750, 919 What can you do with these? Find the sum? Find the average? Seems reasonable if these are, for example, SAT scores. BUT these are telephone area codes! Adding and averaging make no sense.
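A short sketch of the same point in code, using Python's standard library. The area codes are the ones above; the SAT-style scores are made up for illustration. Software will compute an average of anything numeric, so choosing the right summary is up to the analyst.

```python
from collections import Counter
from statistics import mean, median

area_codes = [815, 930, 750, 919]        # the values from the example

# Nothing stops software from averaging them; the result is just meaningless.
print(mean(area_codes))                  # 853.5

# Treated as categorical labels, counts are the sensible summary.
code_counts = Counter(str(code) for code in area_codes)
print(code_counts.most_common())

# Hypothetical SAT-style scores (invented for this sketch) are quantitative,
# so averages and medians are appropriate summaries.
sat_scores = [1010, 1150, 1230, 980, 1340]
print(mean(sat_scores), median(sat_scores))
```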
14
Know the context of the data
- Who: items included in the data
- What: variable(s) measured on each item
- Why: purpose for collecting the data
--------------------------------
- Where: location(s) where data collected
- When: last week? 1 year ago? last decade?
- How: internet survey? (worthless); data provided by gov’t agency? (useful)
15
Section 2.2 Randomization and Sample Surveys Producing Valid Data “If you don’t believe in random sampling, the next time you have a blood test tell the doctor to take it all.”
16
Sampling methods
Convenience sampling: Just ask whoever is around.
- Example: “Man on the street” survey (cheap, convenient, often quite opinionated or emotional, and so now very popular with TV “journalism”)
- Which “men”, and on which street?
  - Ask about gun control or legalizing marijuana “on the street” in Berkeley or in some small town in Idaho and you would probably get totally different answers.
  - Even within an area, answers would probably differ if you did the survey outside a high school or a country western bar.
- Bias: Opinions limited to individuals present.
17
Voluntary Response Sampling:
- Individuals choose to be involved. These samples are very susceptible to being biased because different people are motivated to respond or not. Often called “public opinion polls.” These are not considered valid or scientific.
- Bias: Sample design systematically favors a particular outcome.
Example: Ann Landers summarizing responses of readers. 70% of the (10,000) parents who wrote in said that having kids was not worth it; if they had to do it over again, they wouldn’t.
Bias: Most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.
18
CNN on-line surveys (voluntary response sampling): Bias: People have to care enough about an issue to bother replying. This sample is probably a combination of people who hate “wasting the taxpayers’ money” and “animal lovers.”
19
Landon (R) Beats Roosevelt (D)??
The Survey (1/8): In 1936, Literary Digest mailed 10 million questionnaires and received 2.4 million responses. It used names from the phone book.
The Results: Landon leads 57% to 43%.
The Problem: Only high income earners could afford a phone. This was a very biased survey (sampling bias, nonresponse bias).
Literary Digest soon went out of business.
20
Summary of Types of Bias
Bias can destroy our ability to gain insights from our sample:
- Nonresponse bias can arise when sampled individuals will not or cannot respond.
- Response bias arises when respondents’ answers might be affected by external influences, such as question wording or interviewer behavior.
21
Summary of Types of Bias-2
Poor results can also arise from sampling bias:
- Voluntary response samples are almost always biased and should be avoided and distrusted.
- Convenience samples are likely to be flawed for similar reasons.
- Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.
22
Bias-Avoid It!! Bias is the bane of sampling—the one thing above all to avoid. There is usually no way to fix a biased sample and no way to salvage useful information from it.
23
To be useful, a sample should be representative, meaning that characteristics of interest in the population can be estimated from the sample with a known degree of accuracy. To achieve this goal, we select individuals for the sample at random. The value of deliberately introducing randomness is one of the great insights of Statistics. http://abcnews.go.com/blogs/politics/polls/
24
Randomize
Randomization can protect you against factors that you know are in the data.
- It can also help protect against factors you are not even aware of. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about.
- Randomizing makes sure that on the average the sample looks like the rest of the population.
- Randomizing enables us to make rigorous probabilistic statements concerning possible error in the sample.
25
Simple Random Samples
- We want the sample to be representative of the population from which the sample is selected.
- Each individual in the population should have an equal chance to be selected.
- Is this good enough?
26
Example
Select a sample of high school students as follows:
1. Flip a fair coin.
2. If heads, select all female students in the school as the sample.
3. If tails, select all male students in the school as the sample.
Each student has an equal chance to be in the sample, but every sample is a single gender, so it is not representative.
Each individual in the population has an equal chance to be selected. Is this good enough? NO!!
27
Simple Random Sample
A simple random sample (SRS) of size n consists of n units from the population chosen in such a way that every set of n units has an equal chance to be the sample actually selected.
28
Simple Random Sample-2
Suppose a large History class of 500 students has 250 male and 250 female students. To select a random sample of 250 students from the class, I flip a fair coin one time. If the coin shows heads, I select the 250 males as my sample; if the coin shows tails, I select the 250 females as my sample.
What is the chance any individual student from the class is included in the sample? 1/2
This is a random sample. Is it a simple random sample? NO! Not every possible group of 250 students has an equal chance to be selected. Every sample consists of only 1 gender, which is hardly representative.
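The difference between "equal chance for each individual" and "equal chance for each possible sample" can be verified by brute force on a scaled-down version of the class. The sketch below assumes a hypothetical class of 4 students (2 male, 2 female) and samples of size 2; the names and helper function are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations

males = ["M1", "M2"]
females = ["F1", "F2"]
population = males + females
n = 2

# Design A (coin flip): only two samples are possible, each with probability 1/2.
coin_design = {tuple(males): Fraction(1, 2), tuple(females): Fraction(1, 2)}

# Design B (SRS): every set of n students is equally likely.
all_samples = list(combinations(population, n))
srs_design = {sample: Fraction(1, len(all_samples)) for sample in all_samples}

def inclusion_probability(design, unit):
    """Total probability of the samples that contain the given unit."""
    return sum(prob for sample, prob in design.items() if unit in sample)

for unit in population:
    print(unit, inclusion_probability(coin_design, unit), inclusion_probability(srs_design, unit))

# Same inclusion probability for everyone under both designs,
# but the coin flip can produce only 2 of the 6 possible samples.
print(len(coin_design), len(srs_design))
```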
29
Simple Random Sample-3
The easiest way to choose an SRS is with random numbers. Statistical software can generate random digits (e.g., Excel “=RAND()”, ran# button on calculator).
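As one possible software route, Python's standard `random` module can draw an SRS directly. This is a minimal sketch, not the method used in the course; the function name, the population of 500 labels, and the seed are assumptions chosen for illustration.

```python
import random

def simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels so that every set of n labels is equally likely."""
    rng = random.Random(seed)      # a fixed seed makes the draw reproducible
    return rng.sample(list(labels), n)

# Hypothetical population: 500 students labelled 1 to 500.
chosen = simple_random_sample(range(1, 501), 10, seed=1)
print(sorted(chosen))
```

`random.Random.sample` draws without replacement, so no student can be selected twice.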
30
Example: selecting a simple random sample
An academic dept wishes to randomly choose a 3-member committee from the 28 members of the dept.
01 Abbott      08 Goodwin      15 Pillotte    22 Theobald
02 Cicirelli   09 Haglund      16 Raman       23 Vader
03 Crane       10 Johnson      17 Reimann     24 Wang
04 Dunsmore    11 Keegan       18 Rodriguez   25 Wieczoreck
05 Engle       12 Lechtenb’g   19 Rowe        26 Williams
06 Fitzpat’k   13 Martinez     20 Sommers     27 Wilson
07 Garcia      14 Nguyen       21 Stone       28 Zink
31
Example: selecting a simple random sample - solution
Use a random number table; read 2-digit pairs until you have chosen 3 committee members.
For example, start in row 121:
71487 09984 29077 14863 61683 47052 62224 51025
Reading pairs left to right and skipping any pair that matches no label gives 07, 22, 10: Garcia (07), Theobald (22), Johnson (10).
Your calculator generates random numbers; you can also generate random numbers using Excel.
32
Sampling Variability
Suppose we had started in line 145?
19687 12633 57857 95806 09931 02150 43163 58636
Our sample would have been 19 Rowe, 26 Williams, 06 Fitzpatrick.
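The table-reading procedure can be sketched in a few lines. This assumes the 28 members are labelled 01 to 28 in alphabetical order, which is the labelling that reproduces the names given in the two solutions; the function name is invented for illustration.

```python
NAMES = [
    "Abbott", "Cicirelli", "Crane", "Dunsmore", "Engle", "Fitzpat'k", "Garcia",
    "Goodwin", "Haglund", "Johnson", "Keegan", "Lechtenb'g", "Martinez", "Nguyen",
    "Pillotte", "Raman", "Reimann", "Rodriguez", "Rowe", "Sommers", "Stone",
    "Theobald", "Vader", "Wang", "Wieczoreck", "Williams", "Wilson", "Zink",
]
# Assumption: labels run 01..28 in alphabetical order.
LABELS = {f"{i:02d}": name for i, name in enumerate(NAMES, start=1)}

def read_committee(digit_row, labels, size):
    """Read 2-digit pairs left to right, keeping the first `size` distinct valid labels."""
    digits = digit_row.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        pair = digits[i:i + 2]
        if pair in labels and pair not in chosen:   # skip unused labels and repeats
            chosen.append(pair)
            if len(chosen) == size:
                break
    return [(pair, labels[pair]) for pair in chosen]

row_121 = "71487 09984 29077 14863 61683 47052 62224 51025"
line_145 = "19687 12633 57857 95806 09931 02150 43163 58636"
print(read_committee(row_121, LABELS, 3))    # labels 07, 22, 10
print(read_committee(line_145, LABELS, 3))   # labels 19, 26, 06
```

Two different starting rows give two different committees, which is exactly the sample-to-sample difference the slides call sampling variability.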
33
Sampling Variability Samples selected at random generally differ from one another. Each selection of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability. Variability is OK; bias is bad!!
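A small simulation makes sampling variability visible. Everything here is hypothetical: the population is 1,000 made-up measurements, and the seed, sample size, and number of samples are arbitrary choices for the sketch.

```python
import random
from statistics import mean

rng = random.Random(2024)

# Hypothetical population of 1,000 measurements (invented for illustration).
population = [rng.gauss(57, 18) for _ in range(1000)]

# Five simple random samples of size 25, each summarized by its mean.
sample_means = [mean(rng.sample(population, 25)) for _ in range(5)]

print(round(mean(population), 1))
print([round(m, 1) for m in sample_means])
```

The five sample means differ from one another and from the population mean, yet none is systematically too high or too low; that is variability, not bias.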
34
Other Probability Sampling Methods 1. Stratified Random Sampling 2. Cluster Random Sampling 3. Systematic Random Sampling 4. Multistage Sampling – combinations of sampling methods
35
Stratified Random Sampling
This sampling procedure separates the population into mutually exclusive sets (strata), and then selects simple random samples from each stratum.
Examples of strata:
Sex: male, female
Age: under 20, 20-30, 31-40, 41-50
Occupation: professional, clerical, blue-collar
36
Stratified Random Sampling-2
With this procedure we can acquire information about
- the whole population
- each stratum
- the relationships among strata.
37
Stratified Random Sampling-3
There are several ways to build a stratified random sample. For example, keep each stratum’s proportion in the sample equal to the stratum’s proportion in the population.
A sample of size 1,000 is to be selected.
Stratum   Income            Population proportion   Stratum size in sample
1         under $15,000     25%                       250
2         $15,000-29,999    40%                       400
3         $30,000-50,000    30%                       300
4         over $50,000       5%                        50
Total                                               1,000
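The proportional allocation in the income example can be reproduced with a short sketch. The function names and the sampling frame are assumptions for illustration; only the income strata, their proportions, and the total of 1,000 come from the example.

```python
import random

population_shares = {
    "under $15,000": 0.25,
    "$15,000-29,999": 0.40,
    "$30,000-50,000": 0.30,
    "over $50,000": 0.05,
}

def proportional_allocation(shares, sample_size):
    """Give each stratum the same share of the sample as it has of the population."""
    return {stratum: round(share * sample_size) for stratum, share in shares.items()}

def stratified_sample(units_by_stratum, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(units, allocation[stratum])
        for stratum, units in units_by_stratum.items()
    }

allocation = proportional_allocation(population_shares, 1000)
print(allocation)

# Hypothetical sampling frame of 20,000 people, identified by index within each stratum.
frame = {s: list(range(round(share * 20000))) for s, share in population_shares.items()}
sample = stratified_sample(frame, allocation, seed=0)
print({s: len(units) for s, units in sample.items()})
```

With other proportions or totals, rounding can leave the stratum sizes a unit or two away from the intended total, so the allocation should be checked before sampling.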
38
Cluster Random Sampling Sometimes stratifying isn’t practical and simple random sampling is difficult. Splitting the population into similar parts or clusters can make sampling more practical. Each cluster should be a miniature version of the entire population. Then we could select one or a few clusters at random and select a simple random sample from each chosen cluster. This sampling design is called cluster random sampling. If each cluster fairly represents the full population, cluster random sampling will give us an unbiased sample.
39
Cluster Random Sampling Useful When…
1. It is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure), e.g., all items sold in a grocery store.
2. The population members are widely dispersed geographically, e.g., all Toyota dealerships in North Carolina.
40
Mean length of sentences in our course text
We would like to assess the reading level of our course text based on the word-length of the sentences.
Simple random sampling would be awkward: number each sentence in the book?
Better way: choose a few pages at random (the pages are the clusters, and it's reasonable to assume that each page is representative of the entire text). Then count the length of the sentences on those pages, or select a simple random sample of the sentences from each cluster (i.e., from each randomly chosen page).
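The page-as-cluster idea can be sketched as follows. The book is simulated: 200 hypothetical pages, each holding made-up sentence lengths in words. The function name, seeds, and number of pages chosen are assumptions for illustration.

```python
import random
from statistics import mean

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every unit in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {cluster: clusters[cluster] for cluster in chosen}

# Hypothetical book: 200 pages, each with 10 to 20 sentences of 8 to 30 words.
book_rng = random.Random(7)
pages = {
    page: [book_rng.randint(8, 30) for _ in range(book_rng.randint(10, 20))]
    for page in range(1, 201)
}

chosen_pages = cluster_sample(pages, n_clusters=5, seed=11)
lengths = [n for sentence_lengths in chosen_pages.values() for n in sentence_lengths]
print(sorted(chosen_pages), round(mean(lengths), 1))
```

Only the page numbers need to be listed in advance; no sentence outside the chosen pages ever has to be numbered.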
41
Cluster sampling - not the same as stratified sampling!!
We stratify to ensure that our sample represents different groups in the population, and sample randomly within each stratum.
- Each stratum is homogeneous (e.g., male stratum, female stratum), but strata differ from one another.
Clusters are more or less alike, each heterogeneous and resembling the overall population.
We choose cluster random sampling to make sampling more practical or affordable. We conduct a census on, or select an SRS from, each selected cluster.
42
Systematic Random Sampling Sometimes we draw a sample by selecting individuals systematically. For example, you might survey every 10th person on an alphabetical list of students. To make it random, you must still start the systematic selection from a randomly selected individual. The order of items on the list should not be associated with the variables being measured. Systematic sampling can be much less expensive than true random sampling.
43
Systematic Random Sampling-example
You want to select a sample of 50 students from a college dormitory that houses 500 students.
1. On a list of all students living in the dorm, number the students from 001 to 500.
2. Generate a random number between 001 and 010, and start with that student.
3. Every 10th student in the list becomes part of your sample. For example: 3, 13, 23, 33, 43, 53, …, 493.
Questions:
1) Does each student have an equal chance to be in the sample? Yes
2) What is the chance that a student is included in the sample? 1/10
3) Is this an SRS? No
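The dormitory example can be sketched directly. The function name is an assumption; the population size of 500, the step of 10, and the example start of 3 come from the example. Enumerating all ten possible starting points shows why each student has a 1/10 chance and why the design is not an SRS.

```python
import random

def systematic_sample(population_size, step, start):
    """Return every step-th label beginning at start (labels run 1..population_size)."""
    return list(range(start, population_size + 1, step))

start = random.randint(1, 10)              # random starting point, 1 to 10 inclusive
sample = systematic_sample(500, 10, start)
print(len(sample))                         # 50

# Only 10 distinct samples are possible, one per starting point.
possible = [systematic_sample(500, 10, s) for s in range(1, 11)]
appearances = {student: 0 for student in range(1, 501)}
for one_sample in possible:
    for student in one_sample:
        appearances[student] += 1
print(set(appearances.values()))           # {1}: each student is in exactly 1 of the 10 samples

print(systematic_sample(500, 10, 3)[:6])   # [3, 13, 23, 33, 43, 53]
```

Students 3 and 4 can never appear in the same sample, so not every set of 50 students is equally likely.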
44
Multistage Sampling Sometimes we use a variety of sampling methods together. Sampling schemes that combine several methods are called multistage samples. Most surveys conducted by professional polling organizations and government agencies use some combination of stratified and cluster sampling as well as simple random sampling.
45
Example: The American Community Survey
The American Community Survey (ACS) is an ongoing survey … information from the survey generates data that help determine how more than $400 billion in federal and state funds are distributed each year. … combined into statistics that are used to help decide everything from school lunch programs to new hospitals.
http://www.census.gov/acs/www/
46
Mean length of sentences in our course text-continued.
In attempting to assess the reading level of our course text:
- we might worry that it starts out easy and gets harder as the concepts become more difficult
- we want to avoid samples that select too heavily from early or from late chapters
Suppose our course text has 5 sections, with several chapters in each section.
47
Mean length of sentences in our course text-continued.
We could:
i) randomly select 1 chapter from each section
ii) randomly select a few pages from each of the selected chapters
iii) select a simple random sample of sentences from each chosen page.
So what is our sampling strategy?
i) we stratify by section of the book
ii) we randomly choose a chapter to represent each stratum (book section)
iii) within each chapter we randomly choose pages as clusters
iv) finally, we choose an SRS of sentences within each cluster
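The four-stage strategy above can be sketched in code. The book is simulated: 5 sections with a hypothetical 3 chapters each, 10 pages per chapter, and 8 made-up sentence lengths per page. The function name, seeds, and stage sizes are assumptions for illustration.

```python
import random

def multistage_sample(book, pages_per_chapter, sentences_per_page, seed=None):
    """Stratify by section, pick one chapter per section, then pages, then sentences."""
    rng = random.Random(seed)
    sample = []
    for section in sorted(book):                        # every stratum is represented
        chapters = book[section]
        chapter = rng.choice(sorted(chapters))          # one random chapter per section
        pages = chapters[chapter]
        for page in rng.sample(sorted(pages), pages_per_chapter):   # pages act as clusters
            sentences = pages[page]
            size = min(sentences_per_page, len(sentences))
            sample.extend(rng.sample(sentences, size))  # SRS of sentences on the page
    return sample

# Hypothetical text: 5 sections x 3 chapters x 10 pages x 8 sentence lengths (in words).
build_rng = random.Random(0)
book = {
    f"section {s}": {
        f"chapter {s}.{c}": {
            page: [build_rng.randint(8, 30) for _ in range(8)]
            for page in range(1, 11)
        }
        for c in range(1, 4)
    }
    for s in range(1, 6)
}

sample = multistage_sample(book, pages_per_chapter=2, sentences_per_page=3, seed=5)
print(len(sample))   # 30 = 5 sections x 2 pages x 3 sentences
```

Because one chapter is drawn from every section, no sample can come only from the early or only from the late part of the book.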
48
Opinion Polling: What’s Wrong Lately?
Prediction slippage: 2012 US presidential election (correct winner but not very accurate)
Recent inaccurate predictions:
- 2014 US midterms
- 2014 Scottish independence referendum
- 2015 UK election
- 2015 Israeli general election
- 2015 Greek bailout vote
49
Response Rates Declining
50
Contacting People-Extremely Difficult
51
Contacting People-Extremely Difficult - 2
1. Robo-calls (auto-dialed calls) to cellphones are NOT ALLOWED.
2. To obtain between 700 and 1,000 cellphone interviews when the response rate is 8%, approx. 10,000 cellphone numbers must be manually dialed: a budget buster!
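The dialing figure follows from dividing the target number of interviews by the response rate. The sketch below is a rough expected-value calculation under the stated 8% rate; the function name is invented, and the 800-interview case is included only because it is the target that gives exactly 10,000.

```python
import math
from fractions import Fraction

def numbers_to_dial(target_interviews, response_rate):
    """Smallest number of dialed numbers expected to yield the target interviews."""
    return math.ceil(target_interviews / response_rate)

rate = Fraction(8, 100)   # 8% response rate, kept exact to avoid rounding surprises
for target in (700, 800, 1000):
    print(target, numbers_to_dial(target, rate))
```

So the range of 700 to 1,000 interviews corresponds to roughly 8,750 to 12,500 dialed numbers, consistent with the slide's figure of approximately 10,000.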
52
Non-Probability Sampling
- The high cost of obtaining data has driven survey firms to the internet.
- Non-probability sampling: participants are chosen or choose themselves so that the chance of being selected is not known.
  - Major problems with internet polls
  - No one has figured out how to select a representative sample of internet users
53
85% of US Adults Use the Internet * Blogs (e.g. Blogger, Wordpress), Microblogs (e.g. Twitter), Social networking (e.g. Facebook), Content sharing/discussion (YouTube, Reddit)
54
Non-Probability Sampling: Opt-In Online Panels
YouGov: what the world thinks
https://today.yougov.com/about/about-the-yougov-panel/
55
Non-Probability Sampling: Many Online Data-Gathering Services (Free, Pay)
- Google Consumer Surveys
- Google Trends
- Google Analytics
- Twitter Analytics
- Facebook Analytics
- Microsoft
- Yahoo
- Amazon
56
Example: ViralHeat (fee-based)
57
The Billion Prices Project @ MIT
http://bpp.mit.edu/
Aggregates millions of daily e-commerce transactions into a real-time price index for US, China, and ten other countries
58
End of Chapter 2