Lecture 6: Primary Data Collection and Sampling
Research Methods I
Primary Data
- There are various methods for collecting primary (original) data
- For example: questionnaire, survey, interview, observation
- Control over the investigation is much greater
- Can more easily avoid "data-driven" research
- Cost can be prohibitive
- Pilot studies can be very helpful
Choice of method
- Shipman: choice is often between sampling and case study
- Intensive versus extensive research design
- Qualitative versus quantitative data
- Interpretivists favour the former; positivists favour the latter
- All primary research involves selection
- Most methods require sampling
Sampling: general principles
- No a priori superiority of any method
- Trade-offs: standardisation versus control, generalisability versus flexibility
- Shipman: the sampling method used depends on the nature of the study undertaken
- Basis for the sample must be transparent
- Cost of surveying the entire population is prohibitive (e.g. census)
- Constraint of feasibility
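Sampling: definitions
- Population: must be defined
- Finite population: e.g. voters
- Sampling unit: a single potential member of the sample
- Sampling frame: list of sampling units (NB 1936 US presidential election, where the Literary Digest poll drew its frame from telephone and car-ownership lists and so was unrepresentative)
- Sample: drawn from the sampling frame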
Probability Sampling
- Probability of each sampling unit being chosen is known (often equal probability)
- Simple random sampling (SRS): classic method, regarded as most reliable, least biased
- Number the members of the sampling frame and select via a random number generator
- Other probabilistic methods are available
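The simple random sampling procedure above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the function name `simple_random_sample`, the `seed` parameter and the frame of 100 numbered units are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same probability of selection.
    """
    rng = random.Random(seed)  # seed is only there to make the draw repeatable
    return rng.sample(frame, n)

# Hypothetical numbered sampling frame of 100 units
frame = [f"unit_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Fixing the seed helps keep the basis for the sample transparent, since the same draw can be reproduced later.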
Systematic sampling
- List the members of the sampling frame
- Choose the first sample member randomly
- Then choose every Kth unit, where K = N/n (N = population size, n = sample size)
- More convenient than SRS for a large population
- A periodic pattern in the list can lead to bias; e.g. if every Kth shop on the list is a corner shop
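A minimal sketch of systematic sampling follows. Two details are assumptions rather than part of the slide: the interval K = N/n is rounded down to a whole number, and the random first member is drawn from the first K units of the list (a common variant). The function name and the frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every Kth unit, with K = N // n."""
    N = len(frame)
    k = N // n  # sampling interval, rounded down (assumption)
    if k < 1:
        raise ValueError("sample size n cannot exceed the frame size N")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of 100 numbered units; K = 100 // 10 = 10
frame = list(range(100))
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

Because the selected positions are always exactly K apart, any pattern in the list that repeats every K entries will be over- or under-represented in the sample.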
Stratified sampling
- Divide the population into groups (strata) of similar members
- Sample taken from each stratum is usually proportionate to its share of the population
- Draw randomly from within each group
- Cost-effective
- Ensures representativeness
- Can lead to an excessive number of sub-groups
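Proportionate stratified sampling can be sketched as below. The function name, the `stratum_of` callback and the urban/rural frame are hypothetical, and rounding each stratum's allocation is an assumption: with awkward proportions the allocations may not sum exactly to n.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, seed=None):
    """Draw a proportionate random sample from each stratum of the frame."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    N = len(frame)
    sample = {}
    for name, members in strata.items():
        # Allocation proportional to the stratum's share of the population
        allocation = round(n * len(members) / N)
        sample[name] = rng.sample(members, min(allocation, len(members)))
    return sample

# Hypothetical frame: 60 urban and 40 rural units
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(frame, lambda unit: unit[0], 10, seed=3)
print({name: len(units) for name, units in sample.items()})
# prints {'urban': 6, 'rural': 4}
```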
Cluster Sampling
- Select large groups (clusters)
- Select sampling units from within the clusters randomly
- Example: take a city, divide it into areas, number the areas, select areas randomly, number the units within the selected areas, select units randomly
- Very cost-effective
- Very good if the sampling frame is poorly defined
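The city example can be sketched as a two-stage draw: first areas, then units within the chosen areas. The function name, the parameters and the city of 10 areas with 20 households each are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample: random clusters, then random units in each."""
    rng = random.Random(seed)
    # Stage 1: select clusters (areas) at random
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: select units at random within each chosen cluster
    return {
        area: rng.sample(clusters[area], min(n_per_cluster, len(clusters[area])))
        for area in chosen
    }

# Hypothetical city: 10 numbered areas, each with 20 numbered households
city = {
    f"area_{a}": [f"area_{a}_household_{h}" for h in range(1, 21)]
    for a in range(1, 11)
}
sample = cluster_sample(city, n_clusters=3, n_per_cluster=5, seed=1)
for area, households in sample.items():
    print(area, households)
```

Only the selected areas need their units listed, which is why the method suits cases where no complete sampling frame exists.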
Non-probability Sampling
- Convenience sampling: select whoever is available
- Quota sampling: collect data according to the proportions found in the population
- Selection of subjects is absolutely crucial
- Requires great skill on the part of interviewers
- Snowball sampling: each subject leads the researcher to the next subject
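Quota sampling can be sketched as filling fixed quotas from whoever becomes available, with no random selection. The function name, the `category_of` callback, the arrivals list and the quotas are hypothetical, and accepting subjects strictly in order of arrival is an assumption; in practice the interviewer exercises judgement over whom to approach.

```python
def quota_sample(arrivals, category_of, quotas):
    """Accept subjects in order of arrival until every quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    selected = []
    for person in arrivals:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled
    return selected

# Hypothetical stream of available subjects: (name, sex)
arrivals = [("A", "f"), ("B", "m"), ("C", "m"), ("D", "f"), ("E", "f"), ("F", "m")]
selected = quota_sample(arrivals, lambda p: p[1], {"f": 2, "m": 2})
print(selected)
# prints [('A', 'f'), ('B', 'm'), ('C', 'm'), ('D', 'f')]
```

The sample matches the population on the quota variable, but who ends up in each quota depends entirely on availability and interviewer choice.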
Non-probability Sampling
- Theoretical sampling: select those most likely to be affected by an issue
- Risk: can ignore things which do not fit
- Risk: can interpret observations according to the theory
- Non-probability sampling cannot claim representativeness as easily, but gives much more discretion and control
Response Rates
- Another possible trade-off is on response rates
- $R = 1 - (n - r)/n$, where $n$ is the sample size and $r$ is the number of responses
- Even if the initial sample size is appropriate ($n' = n/(1 + n/N)$, where $n = s^2/SE^2$: see F-N and N: 194-9), response rates can be low
- Postal questionnaires: typically 20-40%
- Non-response bias
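The formulas on this slide can be worked through numerically. The function names and all input figures are hypothetical, and it is assumed that n is the sample size, r the number of responses, s the standard deviation, SE the acceptable standard error and N the population size.

```python
def response_rate(n, r):
    """R = 1 - (n - r)/n, which is the same as r/n."""
    return 1 - (n - r) / n

def initial_sample_size(s, se):
    """n = s^2 / SE^2."""
    return s**2 / se**2

def corrected_sample_size(n, N):
    """n' = n / (1 + n/N): adjusts n for a finite population of size N."""
    return n / (1 + n / N)

# Hypothetical figures: s = 10, SE = 0.5, population of 2000
n = initial_sample_size(s=10, se=0.5)
n_corrected = corrected_sample_size(n, N=2000)
print(round(n), round(n_corrected))

# Hypothetical postal survey: 400 questionnaires sent, 120 returned
print(round(response_rate(n=400, r=120), 2))
```

With these figures the response rate falls within the 20-40% range quoted for postal questionnaires, so far fewer usable responses arrive than the calculated sample size requires.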
Response Rates
- Non-respondents could affect findings if the reason for non-response is related to the issue
- E.g. reluctance to interview drunks hampers a study on alcoholism
- Response rate can be improved by cover letter, callbacks, skill of researcher, length of questionnaire, types of question
Conclusions
- All types of primary data require selection
- If sampling is used, various methods are possible
- Sampling method relates to research tool
- Different data collection techniques (questionnaires, interviews, etc.): all to be studied in Research Methods 2, and all have advantages and disadvantages