Making inferences from collected data involves two possible tasks:
- Estimation: use sample data to infer a population parameter
  e.g., lifetime risk of being a victim of a violent crime according to NCVS data
- Hypothesis testing: use data to make a decision about the correctness of some hypothesis or prediction
  e.g., whether civil orders of protection really lower recurrent violence against spouses
Both tasks rely on using samples to make statements about populations:
- Sample: a limited number of cases selected to represent the larger population of data points
Key terms/ideas in sampling:
- Representativeness: degree to which the sample is an exact replica in miniature of the population
- Sampling error: degree to which a sample statistic deviates from the population value
- Sampling method: procedure used to draw cases from the population of data points
Two main types of sampling methods:
- Probability sampling: selection procedure in which each data point has a known probability of being selected into the sample
  - Simple random sample: every data point has an equal likelihood of being selected
  - Other types of probability samples? Systematic, stratified, weighted, cluster
  - Doesn’t guarantee representativeness each time
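A minimal sketch of three of the probability sampling methods named above, using only Python's standard library. The population of 1,000 case IDs, the "urban"/"rural" strata, and all function names are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Simple random sample: every case has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Systematic sample: random start, then every k-th case."""
    k = len(population) // n                     # sampling interval
    start = random.Random(seed).randrange(k)     # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified sample: draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for cases in strata.values():
        size = max(1, round(len(cases) * fraction))
        sample.extend(rng.sample(cases, size))
    return sample

population = list(range(1000))                   # hypothetical case IDs
strata = {"urban": population[:700], "rural": population[700:]}

srs = simple_random_sample(population, 50)
sys_s = systematic_sample(population, 50)
strat = stratified_sample(strata, 0.05)

print(len(srs), len(sys_s), len(strat))
```

In each case the probability of selection is known in advance, which is what separates these designs from non-probability samples.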
Two main types of sampling methods:
- Non-probability sampling: selection procedure in which the probability of selection is unknown
  - Specific types of non-probability samples? Accidental, convenience, purposive, snowball, volunteer
  - No guarantee of representativeness
Why use one sampling method versus another?
- Maximize representativeness of data
- Minimize sampling error and bias in data
- Valid use of inferential statistics with data (most of which assume simple random sampling)
Making inferences from sample statistics involves 3 distributions:
- Sample distribution: observed in the cases from which data were collected
- Population distribution: unobserved in the population from which the cases were drawn
- Sampling distribution: unobserved but calculable distribution of statistics for samples of the same size/type as ours (drawn from the same population)
  This distribution is the key to making inferences
“Sampling Distribution”: what is it?
- A hypothetical population of samples (and sample statistics) drawn from the same population
- Has a describable theoretical distribution (based on repeatedly drawing a sample an infinite number of times)
- Has certain parameters determined by the population from which the sample is drawn and the size of the sample (denoted as n)
e.g.: If we draw a sample of 25 cases and compute the sample mean:
- The sample mean has a theoretical sampling distribution whose characteristics are exactly determined by the distribution of the population (μ & σ) and by the sample size (n = 25)
- The mean of the sampling distribution = the mean of the population
- In this case: the σ of the sampling distribution = σ/√25 = σ/5 (i.e., one-fifth the σ of the population)
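The n = 25 example can be checked by simulation. The sketch below assumes a hypothetical normal population with μ = 100 and σ = 15 (values not given in the slides), draws many samples of 25 cases, and compares the spread of the resulting sample means with σ/5.

```python
import random
import statistics

MU, SIGMA, N = 100.0, 15.0, 25   # hypothetical population values; n = 25 as in the example
REPLICATIONS = 10_000            # number of simulated samples

rng = random.Random(42)          # fixed seed so the run is repeatable

sample_means = []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = SIGMA / N ** 0.5    # sigma / sqrt(25) = sigma / 5

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"SD of sample means:   {sd_of_means:.2f} (theory {theoretical_se:.2f})")
```

The simulated values should land close to the population mean and to σ/5, though not exactly on them, because only a finite number of samples is drawn.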
Important features of sampling distributions:
- If the variable is normally distributed in the population, then the sampling distribution of sample means will also be normal
- The mean of the sampling distribution = the mean of the population
- The σ of the sampling distribution = σ/√n
- Use this information to compute the likelihood of any sample mean being drawn from the population (using the standard normal [z] table)
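As an illustration of the last point, the sketch below converts a sample mean to a z score using σ/√n and looks up the upper-tail probability with `statistics.NormalDist` in place of a printed z table. The numbers (μ = 100, σ = 15, n = 25, observed mean 106) and the function name are hypothetical.

```python
from statistics import NormalDist

def prob_mean_at_least(sample_mean, mu, sigma, n):
    """Return (z, P(sample mean >= observed)) under a normal sampling distribution."""
    se = sigma / n ** 0.5                  # sigma of the sampling distribution
    z = (sample_mean - mu) / se            # distance from mu in standard errors
    p_upper = 1.0 - NormalDist().cdf(z)    # area in the upper tail beyond z
    return z, p_upper

z, p = prob_mean_at_least(106.0, 100.0, 15.0, 25)
print(f"z = {z:.2f}, P(mean >= 106) = {p:.4f}")
```

Here the standard error is 15/5 = 3, so a sample mean of 106 sits two standard errors above μ, and a sample mean that high or higher would be drawn only about 2% of the time.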
Additional important features of sampling distributions:
- The σ of the sampling distribution will always be smaller than the σ of the population (for any sample size n > 1)
- The mean of the sampling distribution will always be the population mean
- The sampling distribution of the mean becomes more nearly normal as the sample size gets larger, no matter the distribution of the population [this is called the Central Limit Theorem]
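The Central Limit Theorem can be seen in a small simulation. The sketch below assumes a strongly right-skewed population (exponential with rate 1, chosen only for illustration) and measures how skewed the distribution of sample means is for n = 2 versus n = 50. The `skewness` helper is a hypothetical name for the usual standardized third moment.

```python
import random
import statistics

def skewness(values):
    """Standardized third moment: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, replications, rng):
    """Means of repeated samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]

rng = random.Random(7)           # fixed seed so the run is repeatable
skew_n2 = skewness(sample_means(2, 5000, rng))
skew_n50 = skewness(sample_means(50, 5000, rng))

print(f"skewness of sample means, n = 2:  {skew_n2:.2f}")
print(f"skewness of sample means, n = 50: {skew_n50:.2f}")
```

The skewness for n = 50 should come out much closer to 0 than for n = 2, showing the sampling distribution moving toward a normal shape even though the population itself is far from normal.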
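Using sample statistics to make inferences about population parameters:
- The best estimate of the population mean is the sample mean
- The sample estimate of σ (computed with n in the denominator) is slightly too low; it needs to be adjusted to be an accurate estimate
- Thus there are two different formulas for the sample variance/standard deviation:
  (descriptive) $s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$, $s = \sqrt{s^2}$
  (inferential) $\hat{s}^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, $\hat{s} = \sqrt{\hat{s}^2}$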
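The difference between the two formulas is only the denominator, n versus n − 1. The sketch below computes both by hand on a small hypothetical set of scores and compares them with `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1). The function names are illustrative.

```python
import statistics

def variance_descriptive(values):
    """Sum of squared deviations divided by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def variance_inferential(values):
    """Sum of squared deviations divided by n - 1 (the adjusted estimate)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

desc = variance_descriptive(scores)
infer = variance_inferential(scores)

print(f"descriptive variance: {desc:.3f}  (statistics.pvariance: {statistics.pvariance(scores):.3f})")
print(f"inferential variance: {infer:.3f}  (statistics.variance:  {statistics.variance(scores):.3f})")
```

The inferential value is always the larger of the two, and the gap shrinks as n grows.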
Basic steps in estimating population parameters:
- Select a valid estimator (unbiased, consistent, and efficient)
- Select a valid data sample
  - Corresponds to the population of interest
  - Random sample
  - Complete (no censoring or omissions)
  - Variables measured with the least possible error
- Compute the value of the statistical estimate
- Compute the confidence interval (i.e., plausible margin of sampling error)
Two approaches to estimation:
- Point estimation: use sample data to infer the exact value of a population parameter
  - Highly likely to be wrong or off the mark to some degree
  - e.g., infer that 30% of adults will be victims of violent crime in their lifetimes (could actually be 35% or 25%)
- Interval estimation: instead use sample data to compute a range of values (“confidence intervals”) within which the actual parameter is located (with some calculated margin of certainty or confidence)
  - Yields more approximate but more plausible (or confident) estimates
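To make the contrast concrete, the sketch below turns the 30% example into both a point estimate and an interval estimate. The survey counts (300 victims among 1,000 respondents) are invented for illustration, and the standard error uses the usual large-sample formula for a proportion, √(p(1 − p)/n), which is an assumption of this sketch rather than something stated in the slides.

```python
from statistics import NormalDist

victims, respondents = 300, 1000     # hypothetical survey counts

p_hat = victims / respondents                        # point estimate
se = (p_hat * (1 - p_hat) / respondents) ** 0.5      # large-sample standard error
z = NormalDist().inv_cdf(0.975)                      # critical value for 95% confidence
lower, upper = p_hat - z * se, p_hat + z * se        # interval estimate

print(f"point estimate:    {p_hat:.3f}")
print(f"interval estimate: {lower:.3f} to {upper:.3f}")
```

The point estimate is a single number that is almost certainly not exactly right, while the interval trades precision for a stated level of confidence.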
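Confidence interval estimation:
- Compute the sample mean $\bar{x}$
- Compute the standard error of the mean
  - From the population (σ): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  - From the sample (s or $\hat{s}$): $s / \sqrt{n - 1}$ when s is the descriptive standard deviation (n in the denominator), or $\hat{s} / \sqrt{n}$ when $\hat{s}$ is the adjusted standard deviation (n − 1 in the denominator); the two are equal
- Compute the confidence interval: $\bar{x} \pm z \, \sigma / \sqrt{n}$ or $\bar{x} \pm z \, \hat{s} / \sqrt{n}$, where z is the critical value for the chosen confidence level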
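These steps can be sketched in a few lines of Python. The function name, the scores, and the known σ = 3 are hypothetical. The sketch uses a z critical value from `statistics.NormalDist` in both cases; with a small sample and an estimated standard deviation, a t critical value is usually preferred, which the standard library does not provide.

```python
from statistics import NormalDist, fmean, stdev

def confidence_interval(values, confidence=0.95, sigma=None):
    """z-based confidence interval for the mean.

    Uses the population sigma when supplied, otherwise the
    n - 1 adjusted sample standard deviation.
    """
    n = len(values)
    mean = fmean(values)
    spread = sigma if sigma is not None else stdev(values)   # stdev divides by n - 1
    se = spread / n ** 0.5                                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)            # two-sided critical value
    return mean - z * se, mean + z * se

scores = [4, 8, 6, 5, 7, 6, 3, 9, 6]     # hypothetical data, mean = 6

low_pop, high_pop = confidence_interval(scores, sigma=3.0)   # sigma known
low_s, high_s = confidence_interval(scores)                  # sigma estimated from the sample

print(f"95% CI, sigma known:     {low_pop:.2f} to {high_pop:.2f}")
print(f"95% CI, sigma estimated: {low_s:.2f} to {high_s:.2f}")
```

Raising `confidence` to 0.99 widens both intervals, which is the trade-off between confidence and precision.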