1
Continuous Surveys: Statistical Challenges and Opportunities
  Carl Schmertmann
  Center for Demography & Population Health
  Florida State University
  schmertmann@fsu.edu
2
Outline
  CHALLENGES (long)
    Increased Temporal Complexity
    Increased Sampling Error
    New Weighting Problems
  OPPORTUNITIES (brief, but important)
3
Sample Size Comparison
  US CENSUS LONG FORM:
    17% / decade
  ACS ROLLING SURVEY:
    2 per 1000 Households / month
    24 per 1000 Households / year
    240 per 1000 Households / decade
    = 24% / decade
4
Sampling Differences over Decade
                              Long Form    ACS
  Sample Size                 ≈ 17%        ≈ 24%
  Taken on…                   1 day        3650 days
  Released as…                1 dataset    10+ datasets
  Simultaneous 100% count?    YES          NO
5
1. Temporal Complexity
  (Long Form vs. ACS comparison table repeated from the ‘Sampling Differences over Decade’ slide)
6
What is the Population?
  1-Day Census
    Population membership is binary: {0,1}
    Each individual is IN or OUT
  Continuous Survey
    Population membership is fuzzy: 0 --------------- + --------------- 1
    Individuals can be MORE IN (more person-days of residence) or MORE OUT (fewer)
7
Residents (in 000s)
            J   F   M   A   M   J   J   A   S   O   N   D     ●
  Type A   10  10  10  10  10  10  10  10  10  10  10  10   120
  Type B    2   2   2   2  10  10  10  10  10   2   2   2    64
  ●        12  12  12  12  20  20  20  20  20  12  12  12   184
8
Residents (in 000s), monthly table as above
  Census Population = 12 000 (83% Type A)
9
Residents (in 000s), monthly table as above
  An ACS ‘Data Sandwich’ includes samples from all months
10
Residents (in 000s), monthly table as above
  ACS samples from 184 000 person-months
  Avg Population: 15 333 (65% Type A)
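The person-month arithmetic on this slide can be checked with a short sketch. The month-by-month Type B layout below is an assumption reconstructed from the slide totals (64 for Type B, 184 overall: five months at 10 and seven at 2), and the census month is taken to be April purely for illustration.

```python
# Monthly residents in thousands, January through December.
type_a = [10] * 12
# Assumed layout, reconstructed from the slide totals (Type B sums to 64).
type_b = [2, 2, 2, 2, 10, 10, 10, 10, 10, 2, 2, 2]


def census_snapshot(month_index):
    """Population and Type A share counted on a single reference month."""
    a, b = type_a[month_index], type_b[month_index]
    return a + b, a / (a + b)


def rolling_average():
    """Average population and Type A share over all twelve monthly samples."""
    person_months = sum(type_a) + sum(type_b)
    return person_months / 12, sum(type_a) / person_months


census_pop, census_share_a = census_snapshot(3)  # April, an assumption
avg_pop, avg_share_a = rolling_average()

print(f"Census:  {census_pop * 1000:.0f} residents, {census_share_a:.0%} Type A")
print(f"Average: {avg_pop * 1000:.0f} residents, {avg_share_a:.0%} Type A")
```

The snapshot and the rolling average describe two different populations, which is the point of the slide: the same place is 83% Type A on census day but 65% Type A in person-month terms.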
11
Characteristics change over the Sampling Period
  Persons
    Age
    Marital Status
    Employment
    Education
  Housing Units
    Vacancy
    Number of Occupants
    $ Value
12
Rolling ‘Population’
  The population formed by sandwiching monthly samples is the average frame of a film, not a snapshot.
  Individuals and housing units with changing characteristics are sampled and caught ‘in motion’.
13
Reference Period Problems
  Many ‘long-form’ questions refer to retrospective periods:
    Income in last 12 months
    Place of residence 1 year ago
    Child born in last 12 months?
    Etc.
14
Time Reference Example
  ‘2004’ data come from 12 monthly samples taken in Jan04…Dec04
  The fertility question asks about the 12 months prior to the survey, so there are 12 overlapping reference periods in ‘2004’ data
    ‘Jan04’ question covers Jan03-Jan04
    ‘Feb04’ question covers Feb03-Feb04
    etc.
15
Survey month (●) and the 12 months covered by its ‘last 12 months’ question (x)

          Jan 03      Jan 04      Jan 05
Jan 2004  xxxxxxxxxxxx●...........
Feb 2004  .xxxxxxxxxxxx●..........
Mar 2004  ..xxxxxxxxxxxx●.........
Apr 2004  ...xxxxxxxxxxxx●........
May 2004  ....xxxxxxxxxxxx●.......
Jun 2004  .....xxxxxxxxxxxx●......
Jul 2004  ......xxxxxxxxxxxx●.....
Aug 2004  .......xxxxxxxxxxxx●....
Sep 2004  ........xxxxxxxxxxxx●...
Oct 2004  .........xxxxxxxxxxxx●..
Nov 2004  ..........xxxxxxxxxxxx●.
Dec 2004  ...........xxxxxxxxxxxx●

Number of monthly samples covering each calendar month, Jan 03 through Nov 04:
  1 2 3 4 5 6 7 8 9 10 11 12 11 10 9 8 7 6 5 4 3 2 1
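The triangular pattern of coverage counts can be reproduced with a few lines of code. This is a minimal sketch, assuming each monthly sample's question covers the 12 calendar months immediately before the survey month; the function name and month indexing are illustrative choices, not part of the original slides.

```python
from collections import Counter


def coverage_counts(first_survey=12, n_surveys=12, lookback=12):
    """Count how many monthly samples cover each calendar month.

    Months are indexed from 0 = Jan 03, so the 2004 samples are
    months 12 (Jan 04) through 23 (Dec 04).
    """
    counts = Counter()
    for survey_month in range(first_survey, first_survey + n_surveys):
        # Each sample looks back over the preceding `lookback` months.
        for covered in range(survey_month - lookback, survey_month):
            counts[covered] += 1
    return [counts[m] for m in range(min(counts), max(counts) + 1)]


counts = coverage_counts()
print(counts)
```

Under this assumption, December 2003 is the only month covered by all twelve samples, while January 2003 and November 2004 are each covered by a single sample, so a ‘2004’ estimate is a weighted blend of events spread over 23 months.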
16
Reference Periods for ‘Last 12 Month’ Questions in 1-year ACS Datasets
17
Temporal Issues Summarized
  ‘Data Sandwiches’ contain:
    New meaning of ‘population’
    Units that change over sampling period (moving targets)
    Multiple reference periods for retrospective questions
18
2. Sampling Error
  (Long Form vs. ACS comparison table repeated from the ‘Sampling Differences over Decade’ slide)
19
Small Samples
  More overall data from continuous sampling, but…
  1-, 3-, or 5-Year Sandwiches have smaller samples than the single, decennial long form survey,
  so there is more sampling error in published data
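A rough sense of the size of this effect comes from the sampling rates quoted earlier (about 17% for the long form, 2 per 1000 households per month for the ACS). The sketch below assumes simple random sampling with no finite population correction, so that standard errors scale as 1/sqrt(n); the actual ACS design is more complex, and the numbers are only indicative.

```python
import math

LONG_FORM_RATE = 0.17        # share of households sampled once per decade
ACS_MONTHLY_RATE = 2 / 1000  # share of households sampled each month


def relative_error_vs_long_form(years):
    """Ratio of ACS standard error to long-form standard error.

    Assumes simple random sampling, so error is proportional to 1/sqrt(n).
    """
    acs_rate = ACS_MONTHLY_RATE * 12 * years
    return math.sqrt(LONG_FORM_RATE / acs_rate)


for years in (1, 3, 5):
    ratio = relative_error_vs_long_form(years)
    print(f"{years}-year sandwich: about {ratio:.2f}x the long-form error")
```

Under these assumptions a 1-year dataset has roughly 2.7 times the long-form standard error, a 3-year dataset about 1.5 times, and a 5-year dataset about 1.2 times.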
20
Small Samples
  The problem is especially acute for
    small areas
    narrow age groups
    rare subpopulations
  e.g., How many unmarried teen births per year in Sevier County, Tennessee?
    ACS 2006-2008 says 0 ± 161
21
St. Johns County, FL
2006 1-Year ACS Data for Males

  AGE      BELOW POVERTY        ABOVE POVERTY        POVERTY RATE
           Estimate  MOE        Estimate  MOE        Percent  MOE*
  0-4           746  +/-562        3,495  +/-501        17.6  +/-13.3
  5               0  +/-300          906  +/-467           0  +/-33.1
  6-11          376  +/-363        5,401  +/-769         6.5  +/-6.3
  12-14         231  +/-292        2,787  +/-768         7.7  +/-9.7
  15              0  +/-300        1,342  +/-460           0  +/-22.4
  16-17           0  +/-300        1,995  +/-417           0  +/-15.0
  18-24       1,235  +/-655        5,387  +/-878        18.6  +/-9.9
  25-34         221  +/-371       10,192  +/-889         2.1  +/-3.6
  35-44         202  +/-194       11,558  +/-785         1.7  +/-1.6
  45-54         581  +/-399       12,794  +/-807         4.3  +/-3.0
  55-64         468  +/-452       10,679  +/-550         4.2  +/-4.1
  65-74         245  +/-200        5,825  +/-248         4.0  +/-3.3

  *Denominators have MOE≈0 under ACS sampling and weighting design
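The percentage columns in this table appear to follow from the footnote: if the denominator (below plus above poverty) is treated as exact, the MOE of the rate is simply the MOE of the numerator divided by the denominator. The sketch below is a reconstruction under that assumption, with a hypothetical function name, and is not the Census Bureau's published formula for derived proportions.

```python
def poverty_rate_with_moe(below, below_moe, above):
    """Poverty rate and its margin of error, both in percent.

    Assumes the denominator (below + above) has MOE of about zero,
    as stated in the table footnote.
    """
    total = below + above
    rate = 100 * below / total
    moe = 100 * below_moe / total
    return round(rate, 1), round(moe, 1)


# (below-poverty estimate, its MOE, above-poverty estimate) from the table
rows = {
    "0-4": (746, 562, 3495),
    "5": (0, 300, 906),
    "18-24": (1235, 655, 5387),
}

for age, (below, below_moe, above) in rows.items():
    rate, moe = poverty_rate_with_moe(below, below_moe, above)
    print(f"Age {age}: {rate}% +/- {moe}")
```

For the three sample rows this reproduces the published figures (17.6 +/- 13.3, 0 +/- 33.1, 18.6 +/- 9.9), which shows how a zero estimate in a single-year age group can still carry a margin of error of tens of percentage points.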
22
C24020. SEX BY OCCUPATION – Key West, Florida
  Data Set: 2006-2008 American Community Survey 3-Year Estimates (http://tinyurl.com/acs-alap)
  …etc
23
Temporal Instability Teenage Birth Rate in a County
24
Unfortunate Result
  Aggregating over 1+ years of surveys produces datasets that are often
    Unfamiliar and difficult to understand
    Still too noisy to be useful for planners and researchers
25
3. Weighting Problems: Weighting for Non-Response
  (Long Form vs. ACS comparison table repeated from the ‘Sampling Differences over Decade’ slide)
26
Weighting
  Weighting from Respondents to Total Population requires Population Control Totals:
    (Place x Age x Sex x Race x Ethnicity x …)
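The role of control totals can be illustrated with a basic post-stratification sketch: each respondent in a cell receives a weight equal to the cell's control total divided by the number of respondents in that cell. The cells, counts, and function name below are hypothetical, and real ACS weighting involves many more steps than this.

```python
from collections import Counter


def poststratification_weights(respondent_cells, control_totals):
    """Weight per cell = population control total / respondents in the cell."""
    counts = Counter(respondent_cells)
    return {cell: control_totals[cell] / n for cell, n in counts.items()}


# Hypothetical respondents, each tagged with a (sex, age group) cell.
respondents = [("F", "0-17"), ("F", "18+"), ("F", "18+"), ("M", "18+")]
# Hypothetical population control totals for the same cells.
controls = {("F", "0-17"): 300, ("F", "18+"): 500, ("M", "18+"): 450}

weights = poststratification_weights(respondents, controls)
weighted_total = sum(weights[cell] for cell in respondents)

print(weights)
print(weighted_total)
```

By construction the weighted respondents add up to the control totals, so any error in those totals passes straight through to the weighted estimates.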
27
Decennial Long Form Sample Control Totals
  Measured from a simultaneous enumeration of the population (Sample & Census on same day)
  Only 1 set needed
  Sample and Population defined identically (resid. on Census Day)
28
Continuous Survey Control Totals
  Must be estimated (no simultaneous census)
  Many sets needed (2006, 2007, 2006-8, 2007-9, 2008-12, …)
  Sample and Population defined differently
29
ACS Control Totals (Persons)
  ACS responses are weighted to match official intercensal estimates by
    Year (1 July midpoint snapshot)
    County (sometimes city)
    Age
    Race
    Sex
    Hispanic Origin (yes/no)
30
ACS Control Totals (Persons)
  Potential Errors
    Estimates are wrong:
      Unanticipated internal migration
      Unanticipated international migration
      etc.
    Population definitions don’t match:
      Seasonal fluctuations
      Different race/ethnic categories
31
If every year looks like this… (Residents in 000s, monthly table as above)
  Census Pop = 12 000 (83% Type A)
  Average Pop = 15 333 (65% Type A)
  Intercensal Estim = 12 000 (83% Type A)
32
Weighting Error Example
  ACS weighting to estimates produces:
    Population too small (Census < Avg Pop)
    Population too “A” (seasonal Bs missed)
    Overestimates of variables positively correlated with A (e.g., % with college education)
    Underestimates of variables negatively correlated with A (e.g., % single-parent families)
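The direction of this bias can be seen with a small numerical sketch. The Type A shares come from the example slides (83% on census day, 65% in person-month terms); the college-education rates for each type are hypothetical values chosen only to illustrate a variable positively correlated with Type A.

```python
def blended_rate(share_a, rate_a, rate_b):
    """Population-wide rate when a share `share_a` of the population is Type A."""
    return share_a * rate_a + (1 - share_a) * rate_b


SHARE_A_CENSUS = 10 / 12     # composition implied by the control totals
SHARE_A_AVERAGE = 120 / 184  # composition of the person-month population

# Hypothetical rates: Type A residents more often college educated.
COLLEGE_A, COLLEGE_B = 0.40, 0.20

controlled = blended_rate(SHARE_A_CENSUS, COLLEGE_A, COLLEGE_B)
person_month = blended_rate(SHARE_A_AVERAGE, COLLEGE_A, COLLEGE_B)

print(f"Weighted to control totals: {controlled:.1%}")
print(f"Person-month population:    {person_month:.1%}")
```

With these illustrative rates, weighting to the census-day composition gives about 36.7% college educated against about 33.0% in the person-month population, an overestimate driven entirely by the mismatch in population definitions.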
33
4. Opportunities
  Comparison table
    Columns: Census Survey, Continuous Survey
    Rows: Frequency, Recency, Sample Error, Familiarity
34
Opportunities
  ACS table cells = millions of “seemingly unrelated” maximum likelihood estimates
  Statistical models that exploit likely cell relationships (over times, ages, sexes, places, variables …) could, in principle
    Retain frequency & recency
    Reduce variance of estimates
    Recover familiar measures
35
5. Conclusion
  CONTINUOUS SURVEYS like ACS create
    Big Problems for producers and users
      Unfamiliar, temporally complex data
      Potentially high sample error
      Technical problems with weighting
    Big Opportunities, IF we can develop appropriate statistical models and practices
36
5. Conclusion Thanks! ¡Gracias! Obrigado!