Sampling
- Population: the overall group to which the research findings are intended to apply
- Sampling frame: a list that contains every “element” or member of the population
- Sample: a subset of population “elements” or members
  - Once in a sample, each population member or “element” is called a “case”
Why do we sample?
- To describe characteristics of a population, or test a hypothesis, when measuring each member is too expensive or impractical
- To test a hypothesis using inferential (probability) statistics
Sampling error
- Statistic
  - A mathematical description of any characteristic that can be measured; say, the mean (arithmetic average)
  - A mathematical way to describe the interplay between two or more characteristics; say, the correlation coefficient (r)
- Summary statistic: a statistic that gives an overall measure; say, the mean
- Population parameter: a statistic of the population; say, the mean
- Sample statistic: the same, computed from a sample
- Sampling error: unintended differences between a population parameter and a sample statistic
  - Inevitable result of sampling
  - Try it out in class! Get the population parameter for mean age. Sample, then compare.
  - These differences, or errors, tend to decrease as the size of the sample increases
- Rule of thumb: for populations of moderate size (up to 500 or so), sample size should be at least 30; for larger populations, sample size should be larger
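The in-class exercise can also be tried on a computer. The sketch below is a minimal illustration, not class material: the population of 500 ages is invented, and the helper name `mean_abs_sampling_error` is hypothetical. It draws many random samples of a given size and reports how far, on average, the sample mean lands from the population mean.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=2000, seed=0):
    """Average |sample mean - population mean| over repeated samples of size n."""
    rng = random.Random(seed)
    parameter = statistics.fmean(population)      # population parameter
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)        # drawn without replacement
        errors.append(abs(statistics.fmean(sample) - parameter))
    return statistics.fmean(errors)

# Hypothetical population: 500 ages, invented for illustration only
_rng = random.Random(42)
ages = [_rng.randint(18, 65) for _ in range(500)]

for n in (5, 30, 100):
    print(n, round(mean_abs_sampling_error(ages, n), 2))
```

With these made-up ages, the average error should shrink as n grows, which is the pattern the slide describes.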
Sampling accuracy
- Samples should accurately reflect, or represent, the population from which they are drawn
  - If a sample is representative, then we can generalize (apply, infer) our findings from the sample to the population
  - That’s why we’re studying “inferential statistics”
  - Warning: we can never generalize to other populations
- There are various sampling techniques
- Probability sampling: each element or “case” in the population has exactly the same chance of being selected for the sample
  - “Gold standard” to which all sampling techniques aspire
  - If the sampling frame has 5 elements, each element’s probability of being selected on the first draw is 1/5 (.20)
Sample with or without replacement?
- Sampling with replacement: return each element to the population before drawing the next
  - This keeps the probability of being drawn the same but makes it possible to redraw the same element
  - If the sampling frame (population size) is 5, on the second and subsequent draws the probability of being drawn remains 1/5 (.20)
- Sampling without replacement: during selection, drawn elements are not returned to the population
  - If the sampling frame (population size) is 5, on the second draw each remaining element’s probability of being drawn is 1/4 (.25). On the final draw it’s 1/1 (1.0)
- In social science research, sampling without replacement is by far the most common
  - Most sampling frames are large enough that, as elements are drawn, changes in the probability of being drawn are very small
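The draw-by-draw probabilities above can be reproduced with a few lines of code. This is a small sketch; the function name `draw_probabilities` is hypothetical, and the frame of 200 at the end is just an example of a "sufficiently large" frame.

```python
from fractions import Fraction

def draw_probabilities(frame_size, replace):
    """Chance that any one still-eligible element is picked on each successive draw."""
    probs = []
    remaining = frame_size
    for _ in range(frame_size):
        probs.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1   # the drawn element is not returned to the pool
    return probs

# Frame of 5, as on the slide
print([float(p) for p in draw_probabilities(5, replace=True)])
print([float(p) for p in draw_probabilities(5, replace=False)])

# Larger frame (200 elements, an example size): compare draw 1 with draw 30
big = draw_probabilities(200, replace=False)
print(float(big[0]), float(big[29]))
```

In the frame of 5 the probability climbs from 1/5 to 1/1. In the frame of 200 it only moves from 1/200 on the first draw to 1/171 on the thirtieth, which is why the change is usually ignored.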
Probability sampling exercises
- Every element has an equal chance of being selected
- Data from Jay’s correctional center (Sheba Wachtel, warden)
Simple random sampling
- Population: 200 inmates
- Mean sentence: 2.94 years
- Draw a sample of 30 and compare the population parameter and sample statistic. How much error is there?
[Figure: a population “distribution”]
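For anyone who wants to rehearse the exercise outside class, here is a minimal sketch. The 200 sentence lengths are randomly generated stand-ins, not the correctional center data, so the numbers printed will not match the 2.94-year mean on the slide.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical sentence lengths (years) for 200 inmates; NOT the class data set
population = [round(rng.uniform(0.5, 6.0), 1) for _ in range(200)]

# Simple random sample: every inmate has the same chance of selection
sample = rng.sample(population, 30)   # without replacement

parameter = statistics.fmean(population)   # population parameter (mean)
statistic = statistics.fmean(sample)       # sample statistic (mean)
sampling_error = statistic - parameter

print(f"population mean: {parameter:.2f}")
print(f"sample mean:     {statistic:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```

Changing the seed passed to `random.Random` gives a different sample and a different error, which is the point of the exercise.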
Stratified random sampling
- Divide the population into strata (categories), then randomly sample from each
- In proportionate sampling, the number of elements drawn from each stratum is proportionate to that stratum’s representation in the population
  - Problem: if some strata have few elements, their sample size may become unacceptably small unless we take very large samples from other strata
- Property crimes: 150 (mean sentence: 2.88)
- Violent crimes: 50 (mean sentence: 3.12)
- Draw proportionate samples and compare their statistics to the population. How much error is there?
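A rough sketch of a proportionate draw follows. The stratum sizes and means come from the slide; the individual sentences are simulated, and the 1-in-5 sampling fraction is an assumption chosen so the stratum samples come out as whole numbers.

```python
import random
import statistics

rng = random.Random(7)

# Stratum sizes and mean sentences from the slide
strata_sizes = {"property": 150, "violent": 50}
strata_means = {"property": 2.88, "violent": 3.12}

# The size-weighted average of the stratum means gives the overall mean
total = sum(strata_sizes.values())
weighted_mean = sum(strata_sizes[s] * strata_means[s] for s in strata_sizes) / total
print(round(weighted_mean, 2))  # 2.94

# Hypothetical individual sentences scattered around each stratum mean
strata = {
    s: [max(0.1, rng.gauss(strata_means[s], 1.0)) for _ in range(size)]
    for s, size in strata_sizes.items()
}

# Proportionate sampling: the same fraction (assumed 1 in 5) from every stratum
sample = {s: rng.sample(members, len(members) // 5) for s, members in strata.items()}

for s in strata:
    error = statistics.fmean(sample[s]) - statistics.fmean(strata[s])
    print(s, len(sample[s]), f"error {error:+.2f}")
```

The weighted-mean check shows that the two stratum means on the slide are consistent with the overall mean sentence of 2.94 years.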
Using descriptive statistics to answer simple research questions
- Does the gender of patrol officers affect cynicism?
- Is there more likely to be a personal relationship between suspect and victim in violent crimes or in crimes against property?
- Can training reduce officer cynicism?
Does the gender of patrol officers affect cynicism?
Stratified proportionate random sampling
- Sin City: 200 patrol officers
  - 150 male (75%)
  - 50 female (25%)
- Randomly select 30 officers
  - Expect 22.5 males
  - Expect 7.5 females
- Compare average cynicism scores
- Is there a problem? (Hint: how many females are in the sample?)
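The expected counts, and the size of the problem the hint points at, can be worked out directly. This is a small sketch; the function names are hypothetical, and the minimum of 30 per group is the rule of thumb used elsewhere in these slides.

```python
import math

def expected_counts(strata_sizes, n):
    """Expected cases per stratum when n cases are drawn proportionately."""
    total = sum(strata_sizes.values())
    return {name: n * size / total for name, size in strata_sizes.items()}

def total_needed(strata_sizes, min_per_stratum):
    """Smallest proportionate sample that gives every stratum at least min_per_stratum cases."""
    total = sum(strata_sizes.values())
    smallest = min(strata_sizes.values())
    return math.ceil(min_per_stratum * total / smallest)

officers = {"male": 150, "female": 50}

print(expected_counts(officers, 30))   # {'male': 22.5, 'female': 7.5}
print(total_needed(officers, 30))      # 120
```

To get 30 females with proportionate sampling, the total sample would have to be 120 officers, 90 of them male.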
Disproportionate sampling
- Draw the same number of elements from every stratum, at least thirty (30), regardless of a stratum’s representation in the population
- You are then free to compare a summary statistic, say, the mean, between the strata
- Keep in mind that you can’t combine the cases from these strata into a single sample and then use a statistic, say, the mean, to represent the population
  - Why? Because the statistic will be skewed, or biased, in the direction of the strata that were oversampled
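A toy example makes the bias visible. The scores below are invented and deliberately constant within each stratum (every male scores 60, every female 40) so that the result does not depend on which cases happen to be drawn; the 150/50 split is an assumed population.

```python
import statistics

# Hypothetical scores, constant within each stratum for clarity
population = {"male": [60] * 150, "female": [40] * 50}

# True population mean: females are 25% of the population
pop_mean = statistics.fmean(population["male"] + population["female"])
print(pop_mean)      # 55.0

# Disproportionate sample: 30 cases from each stratum
sample = {group: scores[:30] for group, scores in population.items()}

# Comparing the strata is fine
print(statistics.fmean(sample["male"]), statistics.fmean(sample["female"]))

# Pooling them is not: females are now 50% of the combined sample
pooled_mean = statistics.fmean(sample["male"] + sample["female"])
print(pooled_mean)   # 50.0
```

The pooled mean (50) is pulled away from the population mean (55) toward the oversampled stratum.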
Stratified disproportionate random sampling
- Sin City: 200 patrol officers
  - 150 male (75%)
  - 50 female (25%)
- Randomly select 30 cases from each category
  - 30 males
  - 30 females
- Compare average cynicism scores
- Note: don’t combine these into a single sample!
Sampling exercise - Sin City
Is there more likely to be a personal relationship between suspects and victims in violent crimes or in crimes against property?
- You have full access to crime data for “Sin City” in 2009. These statistics show there were 200 crimes, of which 75 percent were property crimes and 25 percent were violent crimes. For each crime, you know whether the victim and the suspect were acquainted (yes/no).
1. Identify the population.
2. How would you sample?
3. Would you stratify? How?
4. Use proportionate and disproportionate techniques. Which is better? Why?
Stratified proportionate random sampling
Is there more likely to be a personal relationship between suspect and victim in violent crimes or in crimes against property?
- Sin City: 200 crimes in 2009
  - 50 violent (25%)
  - 150 property (75%)
- Randomly select 30 cases (15% of the population)
  - Expect 7.5 violent (25%)
  - Expect 22.5 property (75%)
- Compare the proportions of these cases where the suspect knew the victim
Stratified disproportionate random sampling
- Sin City: 200 crimes in 2009
  - 50 violent (25%)
  - 150 property (75%)
- Randomly select 30 cases from each category
  - 30 property
  - 30 violent
- Compare the proportions within each category where suspect and victim were acquainted
- Note: cannot combine results
Jay’s cynicism reduction program
Does Jay’s goofy training program reduce officer cynicism?
- The Anywhere Police Department has 200 patrol officers, of whom 150 are males and 50 are females. Jay wants to conduct an experiment using control groups to test his program.
1. Identify the population.
2. How would you sample?
3. Would you stratify? How?
4. Is it better to use proportionate or disproportionate techniques? Why?
Does Jay’s goofy training program reduce officer cynicism?
Stratified disproportionate random sampling
- Population: 200 patrol officers
  - 150 males (75%)
  - 50 females (25%)
- Randomly assign 25 officers to each group (control and experimental)
- For each group, pre-measure the dependent variable (officer cynicism)
- Apply the intervention (adjust the value of the independent variable, Jay’s program): control group NO, experimental group YES
- For each group, post-measure the dependent variable (officer cynicism)
- Compare within-group changes. What do they tell us?
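The random assignment step might be sketched as follows. This is one possible reading of the design, not a definitive one: it assumes the strata are male and female officers and that each stratum gets its own control and experimental group of 25. The officer IDs and the function name `assign_groups` are made up for illustration.

```python
import random

def assign_groups(officer_ids, per_group, rng):
    """Randomly pick 2 * per_group officers and split them into two groups."""
    chosen = rng.sample(officer_ids, 2 * per_group)   # random order, no repeats
    return {
        "control": chosen[:per_group],        # does NOT receive the program
        "experimental": chosen[per_group:],   # receives the program
    }

rng = random.Random(3)

# Hypothetical ID lists standing in for the department roster
males = [f"M{i:03d}" for i in range(150)]
females = [f"F{i:03d}" for i in range(50)]

# Assumed design: 25 per group within each stratum
design = {
    "male": assign_groups(males, 25, rng),
    "female": assign_groups(females, 25, rng),
}

for stratum, groups in design.items():
    print(stratum, {name: len(ids) for name, ids in groups.items()})
```

Under this reading the sample is disproportionate: all 50 female officers take part, but only 50 of the 150 males do.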
Quasi-probability sampling
- Systematic sampling
  - Randomly select the first element, then choose every 5th, 10th, etc., depending on the size of the sampling frame (number of cases or elements in the population)
  - Problem: a sampling list that is ordered in a particular way could result in a non-representative sample
- Cluster sampling
  - Method
    - Divide the population into equal-sized groups (clusters) chosen on the basis of a neutral characteristic
    - Draw a random sample of clusters. The study sample contains every element of the chosen clusters.
  - Often done to study public opinion (city divided into blocks)
  - The rule of equally sized clusters is usually violated
  - The “neutral” characteristic may not actually be neutral and may affect outcomes!
  - Since not everyone in the population has an equal chance of being selected, there may be considerable sampling error
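Both techniques are easy to sketch in code. The example below is illustrative only: the frame of 200 numbered elements, the interval of 10, and the clusters of 20 are all assumed values, and the function names are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Random start within the first k elements, then every k-th element."""
    start = rng.randrange(k)
    return frame[start::k]

def cluster_sample(frame, cluster_size, n_clusters, rng):
    """Split the frame into consecutive clusters, pick some at random, keep every element."""
    clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [element for cluster in chosen for element in cluster]

rng = random.Random(11)
frame = list(range(200))   # hypothetical sampling frame of 200 numbered elements

sys_sample = systematic_sample(frame, 10, rng)       # every 10th element
clu_sample = cluster_sample(frame, 20, 3, rng)       # 3 of 10 clusters, 20 each

print(len(sys_sample), sys_sample[:5])
print(len(clu_sample))
```

If the frame were ordered so that every 10th element shared some trait, the systematic sample would either contain only those elements or none of them, which is the ordering problem noted above.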
Non-probability sampling
- Accidental sample
  - Subjects who happen to be encountered by researchers
  - Example: observer ride-alongs in police cars
- Quota sample
  - Elements are included in proportion to their known representation in the population
- Purposive/“convenience” sample
  - Researcher uses best judgment to select elements that typify the population
  - Example: interview all burglars arrested during the past month
- Issues
  - Can your findings be “generalized” or projected to a larger population?
  - Are your findings valid only for those actually included in your samples?