INTRODUCTION TO RESEARCH METHODS IN ECONOMICS Topic 5 Data Collection Strategies These slides are copyright © 2010 by Tavis Barr. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. See for further information.
Data Collection Strategies ● Types of Data: Cross-Sectional, Time Series, Longitudinal ● Using Secondary Data ● Sampling Methods for Primary Data ● Choosing a Sample Size
Cross-Sectional, Time Series... Cross-Sectional: Each unit of observation is surveyed only once – Advantages: ● Easy to conduct ● Statistical techniques are generally also simpler – Disadvantage: ● Suppose we want to measure the effect of variable A on variable B ● Are we measuring the effect of variable A, or just the type of observations where A appears?
Cross-Sectional, Time Series.... Cross-Sectional: Each unit of observation is surveyed only once – Example: ● We want to measure how much higher average income is in cities than in rural areas. ● We take a sample of people from cities and a sample of people from rural areas, and look at the difference in means. ● Does this show how much someone's income would increase if he moved to a city?
Cross-Sectional, Time Series.... ● Time Series: One observational unit is observed many times ● Examples: Mostly macro, finance – The daily price of a stock over a month – Ethiopia's exports from – The monthly unemployment rate from January 2000 – December 2004
Cross-Sectional, Time Series.... ● Time Series: One observational unit is observed many times ● Advantage: – We have many observations of the same thing – This means we can use inferential statistics for large samples based on one entity
Cross-Sectional, Time Series.... ● Time Series: One observational unit is surveyed several times ● Disadvantages: 1. Samples are usually small – Examples: ● Yearly economic growth for years? 30? ● Stock prices, exchange rates, etc., are usually an exception. But for how long a time period will a stock price obey the same rules?
Cross-Sectional, Time Series.... ● Time Series: One observational unit is surveyed several times ● Disadvantages: 2. Observations are not independent and identically distributed – Examples: ● If economic growth was higher than average in 1997, it was probably higher than average in 1996 ● If a stock was overpriced on January 13th, it was probably overpriced on January 12th
Cross-Sectional, Time Series.... ● Time Series: One observational unit is surveyed several times ● Disadvantages: 3. Spurious correlation and regression can arise – Examples: ● We have a time series of the world's annual economic output from 1940 to 2000 ● We have a time series of the number of countries with nuclear weapons from 1940 to 2000 ● Will the two be correlated? Does it mean anything?
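The spurious-correlation problem can be illustrated with a small simulation. This is only a sketch: the two series below are made up (they are not real output or nuclear-weapons data), and the trend slopes, noise levels, and the helper names `pearson` and `diff` are assumptions chosen for illustration. The two series share nothing except an upward trend over 1940 to 2000.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def diff(xs):
    """Year-to-year changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)            # fixed seed so the run is repeatable
n_years = len(range(1940, 2001))  # 61 annual observations

# Hypothetical, independently generated series: each is trend + noise.
series_a = [100 + 3.0 * t + rng.gauss(0, 5) for t in range(n_years)]
series_b = [0.15 * t + rng.gauss(0, 0.5) for t in range(n_years)]

corr_levels = pearson(series_a, series_b)               # driven by the trend
corr_changes = pearson(diff(series_a), diff(series_b))  # trend removed

print(f"correlation in levels:  {corr_levels:.2f}")
print(f"correlation in changes: {corr_changes:.2f}")
```

With trending series like these, the correlation in levels tends to be close to 1 even though neither series influences the other. Correlating the year-to-year changes is one simple way to check whether anything remains once the common trend is removed.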
Cross-Sectional, Time Series.... ● Longitudinal or Panel Data: Several observational units are surveyed several times ● Examples: – We have output per person for every state in Ethiopia from – We observe 2,000 households every year for 10 years. During that time they have births, deaths, job changes, etc.
Cross-Sectional, Time Series.... ● Longitudinal or Panel Data: Several observational units are surveyed several times ● Advantage: We can control for individual characteristics of the observational unit – Example: We observe people over five years. Some of them move from place to place. – We measure the income difference when a person moves from city to country or country to city
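The city/country example can be sketched with a tiny made-up panel. Everything here is hypothetical: the six people, their "ability" values, and the assumed true city premium of 10 are invented so the two comparisons can be checked by hand. The point is only that a cross-sectional gap mixes the city effect with who lives in cities, while following movers holds the person fixed.

```python
from statistics import mean

CITY_PREMIUM = 10  # hypothetical true effect of living in a city

# (person, ability, location in year 1, location in year 2): made-up data
people = [
    ("A", 50, "city", "city"),
    ("B", 45, "city", "rural"),
    ("C", 40, "city", "city"),
    ("D", 25, "rural", "city"),
    ("E", 20, "rural", "rural"),
    ("F", 15, "rural", "rural"),
]

def income(ability, location):
    """Income = personal ability + premium if living in a city."""
    return ability + (CITY_PREMIUM if location == "city" else 0)

# Cross-sectional comparison: year 1 only, city mean minus rural mean.
city_y1 = [income(a, l1) for _, a, l1, _ in people if l1 == "city"]
rural_y1 = [income(a, l1) for _, a, l1, _ in people if l1 == "rural"]
cross_section_gap = mean(city_y1) - mean(rural_y1)

# Panel comparison: income change of people who moved,
# signed so that a move toward the city counts as positive.
mover_changes = []
for _, ability, loc1, loc2 in people:
    if loc1 != loc2:
        change = income(ability, loc2) - income(ability, loc1)
        mover_changes.append(change if loc2 == "city" else -change)
within_person_gap = mean(mover_changes)

print(cross_section_gap)   # 35: city premium plus the ability difference
print(within_person_gap)   # 10: the premium alone
```

In this constructed example the cross-sectional gap is 35 because higher-ability people happen to live in cities, while the movers' income change recovers the assumed premium of 10. Real panel data are noisy, and movers may differ from non-movers, so this is an illustration of the idea rather than an estimation recipe.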
Cross-Sectional, Time Series.... ● Longitudinal or Panel Data: Several observational units are surveyed several times ● Disadvantage: Observations are not independent and identically distributed – Example: ● People who have high incomes in one place will have high incomes in another place ● This means you can predict the person's income one year if you know it in another year ● Statistical techniques have to control for this
Using Secondary Data ● Be sure you have: – A codebook, which describes the relationship between the survey questions and the data on the computer – Examples: ● How are industries categorized? ● Are data top-coded/bottom-coded/excluded? ● Are all questions asked of all people? – A description of the sampling technique, i.e., how the sample was collected
Sampling Methods ● It is ideal to obtain a probability sample, where any member of the population is equally likely to be observed ● Otherwise, we have problems: – The sample mean will not, on average, be equal to the population mean. ● Suppose, for example, that we want to know if men's wages are different from women's wages. ● We take a sample of male and female accounts.
Sampling Methods ● It is ideal to obtain a probability sample, where any member of the population is equally likely to be observed ● Otherwise, we have problems: – The regression coefficient may be incorrect. ● The usual model: y = xb + e ● e reflects random events that affect y ● For example,
Sampling Methods ● Obtaining a probability sample: – Sometimes we have a list of everyone in the population ● Census records of a country ● Registrar's records of a school ● Tax records of registered businesses – Then we can just sort them in random order and pick every 10th, 50th, or 1000th record
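The "sort in random order and pick every k-th record" procedure can be sketched as follows. The function name `systematic_sample` and the numbered population are hypothetical stand-ins for a real list such as registrar's records.

```python
import random

def systematic_sample(records, step, seed=None):
    """Shuffle a copy of the records, then keep every step-th one."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    shuffled = list(records)    # copy so the original list is untouched
    rng.shuffle(shuffled)
    # Index step-1 is the step-th record; then every step-th after it.
    return shuffled[step - 1::step]

# Hypothetical population: record IDs 1..1000
population = list(range(1, 1001))
sample = systematic_sample(population, step=10, seed=42)
print(len(sample))  # 100 records, one in every ten
```

Because the list is shuffled first, each record has the same chance of landing in a selected position, which is what makes the result a probability sample.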
Sampling Methods ● Sometimes we don't have a list of everyone ● We may know the relative size of different strata or clusters in the population – University students: The size of each school – Country: The size of each district – Businesses: The output of each industry
Sampling Methods ● We may know the relative size of different strata or clusters in the population ● We conduct a probability sample within each stratum. We make sure: $\frac{\text{Stratum sample size}}{\text{Overall sample size}} = \frac{\text{Stratum population}}{\text{Overall population}}$
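The proportionality rule on this slide can be turned into a small allocation routine. This is a sketch under assumptions: the school names and enrollments are invented, and the largest-remainder rounding step is an added convention (not something the slides specify) for cases where the proportional shares are not whole numbers.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum population."""
    total = sum(stratum_sizes.values())
    exact = {k: sample_size * v / total for k, v in stratum_sizes.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = sample_size - sum(alloc.values())
    # Assumed rounding rule: hand leftover units to the strata with the
    # largest fractional remainders so the total matches sample_size.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical university with three schools
schools = {"Business": 5000, "Engineering": 3000, "Law": 2000}
allocation = proportional_allocation(schools, sample_size=200)
print(allocation)  # {'Business': 100, 'Engineering': 60, 'Law': 40}
```

Each school's share of the sample (for example 100/200 for Business) equals its share of the population (5000/10000). A probability sample of the allocated size would then be drawn within each stratum.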
Sampling Methods ● Sometimes a probability sample is infeasible and we are stuck with a convenience sample – We want to conduct a lengthy experiment using human subjects. We use whoever is willing to participate. – We want to conduct an economic study of drug users. We only have limited information (hearsay, criminal records) of who uses drugs
Sampling Methods ● Sometimes a probability sample is infeasible and we are stuck with a convenience sample ● Be sure to write down any characteristics by which the sample differs from the overall population – Who was excluded? – Who was over-represented? ● If possible, compare any results with previous results from similar studies
Choosing a Sample Size ● Obviously, the bigger, the better ● Still, the limitations of the sample size will vary depending on what we want to do – Yes/No opinions: a 3 percent margin of error will require about 1,100 observations – Macroeconomic variables: Often highly correlated; many hypotheses can be tested with observations – Regressions using household data: Very noisy; can require thousands of observations
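The "about 1,100 observations" figure for yes/no opinions is consistent with the standard sample-size formula for a proportion, n = z² · p(1 − p) / e². The sketch below makes assumptions the slide does not state: a 95 percent confidence level (z = 1.96), the worst case p = 0.5, a simple random sample, and a population much larger than the sample. The function name is hypothetical.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin_of_error."""
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

n_3pct = sample_size_for_proportion(0.03)
n_5pct = sample_size_for_proportion(0.05)
print(n_3pct)  # 1068, in line with the slide's "about 1,100"
print(n_5pct)  # 385
```

Because the margin of error shrinks only with the square root of n, tightening it from 5 to 3 percent nearly triples the required sample.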
Form of the Sample ● Some possible forms: – In-person interviews – In-person questionnaires – Solicited responses (e.g., by mail) – Telephone interviews – Online interviews
Form of the Sample ● In-person interviews – Advantages: ● Higher response rate ● May be the only way to reach some people, especially poor people ● Can obtain precise answers to technical questions (income, occupation, etc.) – Disadvantages: ● Expensive ● People may refuse to answer private questions
Form of the Sample ● In-Person Questionnaires – Advantages: ● Higher response rate ● May be the only way to reach some people, especially poor people ● Good way to ask very private questions – Disadvantages: ● Expensive ● Assumes literacy
Form of the Sample ● Solicited responses (by mail, handout, etc.) – Advantages: ● Relatively inexpensive ● May get interesting responses – Disadvantages: ● Low response rate; biased toward extreme views ● Assumes literacy
Form of the Sample ● Telephone interviews – Advantages: ● Relatively inexpensive ● Respondents feel some privacy ● Reasonable response rate – Disadvantages: ● Biased toward wealthy in Ethiopia ● In all countries: Landlines biased toward old, cell phones toward young
Form of the Sample ● Online interviews – Advantages: ● Very inexpensive; saves inputting costs as well ● Respondents feel privacy ● Response rate varies by method of solicitation – Disadvantages: ● Very biased toward wealthy in Ethiopia ● Biased toward young everywhere; very poor have less online access in industrialized world