SAMPLING
Basic concepts
Why not measure everything?
– Practical reason: Measuring every member of a population is too expensive or impractical
– Mathematical reason: Random sampling allows us to test hypotheses using inferential (probability) statistics
Population
– Largest group to which we intend to project the findings of a study (e.g., every inmate in Jay’s prison)
– Parameter: A summary value computed for the whole population; e.g., mean sentence length
Sample
– Any subgroup of the population, however selected
– Samples intended to represent a population must be selected in a way that makes them “representative” (will come up later)
Unit of analysis
– “Persons, places, things or events” under study
– The “container” for the variables
“Member” or “element” of the population
– What we call a case once it’s been drawn into a sample
Sampling frame
– A listing of all “elements” or members of the population
Probability sampling
– “Gold standard”: every element (“case”) in the population has the same chance of being included in the sample
– Random sampling is the most common probability technique
[Figure: a sample drawn from the population of Jay’s correctional institution]
Sampling accuracy and error
Representativeness: Samples should accurately reflect, or represent, the population from which they are drawn
– If a sample is representative, then we can accurately “make inferences” (apply our findings) to the population
    We can simply describe the population
    Or we can test hypotheses and extend our findings to the population
– Warning: we cannot generalize to other populations, only to the population from which the sample was drawn
Sampling error: Unintended differences between a population parameter and the equivalent statistic from an unbiased sample
– Inevitable result of sampling
– Try it out in class! Calculate the parameter, mean age. Then take a random sample (more about that later) and compare the sample statistic to the parameter.
– Any difference between the two is “sampling error.” It should decrease as sample size increases
– Rule of thumb: To minimize sampling error, sample size should be at least 30 for populations up to about 500; for larger populations sample size should be greater
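The in-class comparison of a parameter with a sample statistic can also be tried on a computer. The sketch below is only an illustration: the population of 500 ages is invented, and the helper name average_sampling_error is hypothetical. It repeats the draw many times for several sample sizes and reports the average gap between the sample mean and the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 500 people (invented for illustration).
population = [rng.randint(18, 65) for _ in range(500)]
parameter = statistics.mean(population)  # population parameter: mean age


def average_sampling_error(n, trials=1000):
    """Average absolute gap between the sample mean and the parameter
    over many random samples of size n."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # random sample, without replacement
        errors.append(abs(statistics.mean(sample) - parameter))
    return statistics.mean(errors)


for n in (10, 30, 100):
    print(n, round(average_sampling_error(n), 2))
```

A single sample can land close to or far from the parameter by chance, which is why the sketch averages over many samples before comparing sizes.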
RANDOM (PROBABILITY) SAMPLING
Sampling process
Sample with or without replacement?
– With replacement: Return each case to the population before drawing the next
    Keeps the probability of being drawn the same
    Makes it possible to redraw the same case
– Without replacement: Drawn cases are not returned to the population
    Probability of undrawn cases being selected increases as cases are drawn
– In social science research sampling without replacement is by far the most common
    Most sampling frames are large enough that the probability of being drawn changes only slightly as elements are drawn
Sample: simple or stratified? (examples on next two slides)
– In simple random sampling we randomly draw from the entire population
– In stratified random sampling we divide the population into subgroups according to a characteristic of interest
    For example, male and female; officers and supervisors; violent offenders and property offenders
– Can designate strata before or after sampling
    Proportionate: Draw a sample from the population without regard to strata, then stratify
    Disproportionate (most common): Stratify first, then draw samples of equal size from each stratum
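To make the replacement distinction concrete, here is a minimal sketch using Python's standard random module. The sampling frame of 200 ID numbers is hypothetical. It draws 30 cases each way and shows how little the chance of selection shifts for an undrawn case when sampling without replacement.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: ID numbers for 200 population elements.
frame = list(range(1, 201))

# Without replacement: each drawn case leaves the pool, so no repeats.
without_replacement = rng.sample(frame, 30)

# With replacement: each case goes back before the next draw, so repeats can occur.
with_replacement = rng.choices(frame, k=30)

print(len(set(without_replacement)))  # always 30 distinct cases
print(len(set(with_replacement)))     # 30 or fewer distinct cases

# Chance that one specific undrawn case is picked on a given draw,
# without replacement: 1 in 200 on the first draw, 1 in 171 on the thirtieth.
first_draw = 1 / len(frame)
thirtieth_draw = 1 / (len(frame) - 29)
print(first_draw, thirtieth_draw)
```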
Exercise - using simple random sampling to describe a population
Population: 200 inmates
Mean sentence: 2.94 years
Assignment
– Draw a random sample of 10 and compare its mean to the population parameter. Then do the same with a random sample of 30. How much error is there? Does it change with sample size?
[Figure: frequency (# prisoners) by sentence length in years. Data from Jay’s correctional center; Koko Wachtel, warden]
Exercise - using stratified random sampling to describe a population
Population: 200 inmates; mean sentence 2.94 years
– Property crimes: 150 inmates; mean sentence 2.88 years
– Violent crimes: 50 inmates; mean sentence 3.12 years
Assignment
– Draw a random sample of 30 from each stratum and compare its mean to the corresponding population parameter. How much error is there?
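One way to rehearse the stratified exercise is sketched below. The individual sentence lengths are invented, because the real distribution is not reproduced in these notes; only the stratum sizes (150 and 50) and the stratum means used in the last step come from the slide. The weighting step is a standard addition rather than part of the assignment: equal-size samples over-represent the smaller stratum, so an overall estimate weights each stratum by its share of the population.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical sentence lengths in years (invented values, real stratum sizes).
strata = {
    "property": [rng.choice([1, 2, 3, 4, 5]) for _ in range(150)],
    "violent": [rng.choice([1, 2, 3, 4, 5, 6]) for _ in range(50)],
}
population_size = sum(len(cases) for cases in strata.values())

# Disproportionate stratified sampling: the same number of cases from every stratum.
samples = {name: rng.sample(cases, 30) for name, cases in strata.items()}

# Compare each stratum's sample statistic with its parameter.
for name, cases in strata.items():
    stratum_parameter = statistics.mean(cases)
    stratum_statistic = statistics.mean(samples[name])
    print(name, round(stratum_parameter, 2), round(stratum_statistic, 2),
          round(stratum_statistic - stratum_parameter, 2))

# Overall estimate: weight each stratum's sample mean by its population share.
weighted_estimate = sum(
    statistics.mean(samples[name]) * len(cases) / population_size
    for name, cases in strata.items()
)
print(round(weighted_estimate, 2))

# The same weighting applied to the slide's stratum means recovers the overall mean.
slide_overall = (150 * 2.88 + 50 * 3.12) / 200
print(round(slide_overall, 2))  # 2.94
```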
Exercise - using random sampling to test a hypothesis
Hypothesis: A pre-existing personal relationship between criminal and victim is more likely in violent crimes than in crimes against property
You have full access to crime data for Sin City. These statistics show that in 2014 there were 200 crimes, of which 75 percent were property crimes and 25 percent were violent crimes. For each crime, you know whether the victim and the suspect were acquainted (yes/no).
Applying what we learned from the preceding two slides…
1. Identify the population.
2. How would you sample?
    A. Would you stratify before or after?
    B. Which is better? Why?
Exercise - using random sampling to test hypotheses
Hypothesis 1: Gender affects cynicism (two-tailed)
Hypothesis 2: Male cops are more cynical than female cops (one-tailed)
Sin City Police Department has 200 officers; 150 are male and 50 are female. We wish to test the above hypotheses.
1. Identify the population.
2. How would you sample?
    A. Would you stratify? In advance or later?
    B. Which is better? Why?
Sampling in experiments
Making cops “kinder” and “gentler”
The Anywhere Police Department has 200 patrol officers, of whom 150 are male and 50 are female. Chief Jay wants to test a program that’s supposed to reduce officer cynicism.
Hypothesis: Officers who complete the training program will be less cynical
Dependent variable: Score on cynicism scale (1-5, low to high)
Independent variable: Cynicism reduction program (yes/no)
Stratified disproportionate random sampling
Hypothesis: Officers who complete the training program will be less cynical
Population: 200 patrol officers; 150 males (75%), 50 females (25%)
Within each stratum (males, females):
– EXPERIMENTAL GROUP: Randomly assign 25 officers; program: YES
– CONTROL GROUP: Randomly assign 25 officers; program: NO
Procedure
1. For each group, pre-measure the dependent variable, officer cynicism
2. Apply the intervention (apply the value of the independent variable, the program)
3. For each group, post-measure the dependent variable, officer cynicism
Also compare within-group changes: what do they tell us?
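The random assignment step in this design might look like the sketch below. It assumes a hypothetical roster in which officer IDs 1-150 are male and 151-200 are female, and it reads the diagram as one experimental and one control group of 25 per gender. Measuring cynicism before and after the program is left out.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical roster: officer IDs 1-150 are male, 151-200 are female.
roster = {"male": list(range(1, 151)), "female": list(range(151, 201))}

groups = {}
for gender, officers in roster.items():
    # Draw 50 officers per stratum without replacement. The female stratum
    # has exactly 50 members, so every female officer takes part.
    drawn = rng.sample(officers, 50)
    # rng.sample returns cases in random order, so cutting the list in half
    # assigns officers to the two groups at random.
    groups[(gender, "experimental")] = drawn[:25]
    groups[(gender, "control")] = drawn[25:]

for key, members in groups.items():
    print(key, len(members))
```

Because 25 officers are taken per group regardless of each gender's share of the department, the sample is disproportionate: females make up half of the participants but a quarter of the population.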
OTHER SAMPLING TECHNIQUES
Quasi-probability sampling
Systematic sampling
– Randomly select first element, then choose every 5th, 10th, etc., depending on the size of the sampling frame (number of cases or elements in the population)
– If done with care can give results equivalent to fully random sampling
– Caution: if elements in the sampling frame are ordered in a particular way, a non-representative sample might be drawn
Cluster sampling
– Method
    Divide population into equal-sized groups (clusters) chosen on the basis of a neutral characteristic
    Draw a random sample of clusters. The study sample contains every element of the chosen clusters.
– Often done to study public opinion (city divided into blocks)
– Rule of equally-sized clusters usually violated
– The “neutral” characteristic may not be neutral after all and may affect outcomes!
– Since not everyone in the population has an equal chance of being selected, there may be considerable sampling error
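Both techniques can be sketched in a few lines. The frame of 200 element IDs, the sample size of 20 and the cluster size of 25 are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical sampling frame of 200 element IDs.
frame = list(range(1, 201))

# Systematic sampling: random start within the first interval, then every k-th element.
sample_size = 20
k = len(frame) // sample_size   # sampling interval, here 10
start = rng.randrange(k)        # random index from 0 to k - 1
systematic = frame[start::k]

# Cluster sampling: split the frame into equal-sized clusters, draw clusters
# at random, and keep every element of each chosen cluster.
cluster_size = 25
clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]
chosen = rng.sample(clusters, 2)
cluster_sample = [element for cluster in chosen for element in cluster]

print(systematic)
print(len(clusters), len(cluster_sample))
```

If the frame were sorted so that every tenth element shared some trait, the systematic sample would inherit that pattern, which is the ordering caution noted above.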
Non-probability sampling
Accidental sample
– Subjects who happen to be encountered by researchers
– Example: observer ride-alongs in police cars
Quota sample
– Elements are included in proportion to their known representation in the population
Purposive/“convenience” sample
– Researcher uses best judgment to select elements that typify the population
– Example: Interview all burglars arrested during the past month
Issues
– Can findings be “generalized” or projected to a larger population?
– Are findings valid only for the cases actually included in the samples?
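As a rough illustration of a quota sample, the sketch below borrows the 75/25 split from the officer example as the "known representation" and invents a stream of people in the order a researcher happens to meet them. Both the stream and the sample size of 40 are assumptions. Each person is included only while the quota for their group is still open.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical known population shares (borrowed from the 150/50 officer example).
shares = {"male": 0.75, "female": 0.25}
sample_size = 40
quotas = {group: round(share * sample_size) for group, share in shares.items()}

# Hypothetical stream of people in the order the researcher happens to meet them.
encountered = [rng.choice(["male", "female"]) for _ in range(500)]

selected = {group: 0 for group in quotas}
for person in encountered:
    if selected[person] < quotas[person]:
        selected[person] += 1   # quota still open: include this person
    if selected == quotas:
        break                   # every quota is filled

print(quotas)
print(selected)
```

The finished sample matches the population's proportions, yet no sampling frame or random draw is involved: who gets in depends on who the researcher happens to meet, which is why the generalization questions above still apply.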
PRACTICAL EXERCISE
Class assignment - non-experimental designs
Hypothesis: Higher income persons drive more expensive cars (Income → Car Value)
Independent variable: income
– Categorical, nominal: student or faculty/staff
Dependent variable: car value
– Categorical, ordinal: 1 (cheapest), 2, 3, 4 or 5 (most expensive)
Assignment
– Visit one faculty and one student lot.
– Select ten vehicles in each lot using systematic sampling
– Use the operationalized car values to code each car’s value
– Give each team member a filled-in copy and turn one in per team next week
– We will complete the tables in class
– This assignment is worth five points
PLEASE BRING THESE FORMS TO EVERY CLASS SESSION!