Published by Shana Booker. Modified over 9 years ago.
1
Insights and Inference: Opportunities and challenges with administrative data and non-probability sources (including organic data)
2
1. No new inferential issue
3
Social Science
4
Prediction Causation Description Social Science
5
Prediction Causation Inference Description Social Science
6
Prediction Causation Inference Description Social Science Probability Based (Survey) Data
7
Prediction Causation Inference Description Social Science Positive & Known Selection Probability
8
Prediction Causation Inference Description Positive & Known Selection Probability Social Science
9
Prediction Causation Description Positive & Known Selection Probability Social Science Inference
10
Prediction Causation Description Positive & Known Selection Probability Social Science Inference
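The role of a positive and known selection probability can be illustrated with inverse-probability weighting. The following is only a minimal sketch and is not taken from the slides: the function names, the two-unit sample, and the probabilities are made up for illustration. It shows why each selection probability must be both known (to form the weight) and positive (so the weight exists).

```python
def horvitz_thompson_total(values, probs):
    """Estimate a population total by weighting each sampled value by 1/pi."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p <= 0 or p > 1 for p in probs):
        # A zero probability means part of the population can never be observed.
        raise ValueError("selection probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, probs))


def weighted_mean(values, probs):
    """Estimate a population mean as a ratio of weighted sums (weights = 1/pi)."""
    total = horvitz_thompson_total(values, probs)
    estimated_size = sum(1.0 / p for p in probs)
    return total / estimated_size


# Hypothetical sample: two units with different, known selection probabilities.
y = [10.0, 20.0]
pi = [0.5, 0.25]

ht_total = horvitz_thompson_total(y, pi)  # 10/0.5 + 20/0.25
mean_estimate = weighted_mean(y, pi)      # ht_total / (1/0.5 + 1/0.25)
```

With non-probability or found data the values in `pi` are not given by design and would have to be modeled, which is where the inferential assumptions enter.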
11
Data Generating Process (Groves et al. 2004)
Construct; Measurement; Response; Edited Response
Population Mean; Sampling Frame; Sample; Respondents; Postsurvey Adjusted Data
Survey Statistic
12
Data Generating Process
Construct; Measurement; Response; Edited Response
Population Mean; Sampling Frame; Sample; Respondents; Postsurvey Adjusted Data
Survey Statistic
Listing Information: Time stamps; Driving
Contact Data & Interviewer Observation: Day/Time; Proxy-Y HU Char.
Key Strokes: Response time; Back-ups; Edits
Vocal Characteristics: Pitch; Disfluencies
13
Key Ingredients for Valid Inference
14
1. Data generating process needs to be known
2. Framework as tool to identify errors
3. Model or break confounders
4. Know your inferential goal
15
“Found” -- Boston Street Bumps
16
Data Generating Process
Who? What? Why?
Who is missing? Who is counted repeatedly?
What is not said / measured, and why?
... no matter if data are found or designed
17
2. Data integration
18
Linkage Consent
Gain vs. loss framing: Kahneman and Tversky, 1979
Front vs. back placement
Opt-in vs. opt-out: Thaler and Sunstein 2008; Schwartz 2014

Phone      Front   Back
Gain       90.8    78.7
Loss       90.5    81.2
Total n    613     595

Kreuter et al. 2015
19
Regression Income (Euros) on age and gender Kreuter et al. 2015
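The slide names only the model, a regression of income on age and gender. A minimal ordinary least squares sketch of such a model is given below; it is not the analysis from Kreuter et al. 2015. The data are synthetic and generated without noise from made-up coefficients, and the helper `ols` and all variable names are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic, noise-free data (illustration only, not real survey values):
# income = 1000 + 30 * age + 200 * female
ages = [25, 35, 45, 55, 30, 60]
female = [0, 1, 0, 1, 1, 0]
income = [1000 + 30 * a + 200 * f for a, f in zip(ages, female)]

# Design matrix with an intercept column.
X = [[1.0, float(a), float(f)] for a, f in zip(ages, female)]
intercept, b_age, b_female = ols(X, income)
```

Because the synthetic incomes contain no noise, the fit recovers the generating coefficients. With real linked data, one natural check would be to compare such coefficients between the full sample and the subset that consented to linkage.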
20
3. Skills - Steps - Teams
21
Research Questions. Examples: Behavior of interest (political participation/job searches)
Data Generating Process. Examples: geolocated social media + survey + administrative data
Data Curation/Storage. Example: Record Linkage; Hadoop Distributed File System
Data Analysis. Example: Hadoop MapReduce; High Frequency Data
Data Output/Access. Example: map visualization / privacy
Usher 2015
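Record linkage, named above as a curation example, can be sketched in its simplest form as an exact match on a shared identifier, restricted to respondents who gave linkage consent. This is an assumption-laden toy: the field names (`person_id`, `consent`, `age`, `income`) and all records are hypothetical, and real linkage projects often need probabilistic matching on names, dates, or addresses instead.

```python
def link_records(survey_rows, admin_rows, key="person_id"):
    """Exact-match linkage of consenting survey respondents to admin records."""
    admin_by_key = {row[key]: row for row in admin_rows}
    linked = []
    for row in survey_rows:
        if not row.get("consent"):
            continue  # no consent: the respondent's admin record is never used
        match = admin_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


# Made-up records for illustration only.
survey = [
    {"person_id": 1, "consent": True, "age": 34},
    {"person_id": 2, "consent": False, "age": 51},
    {"person_id": 3, "consent": True, "age": 47},
]
admin = [
    {"person_id": 1, "income": 2800},
    {"person_id": 2, "income": 3100},
    {"person_id": 4, "income": 1900},
]

linked = link_records(survey, admin)
linked_ids = [row["person_id"] for row in linked]
```

In this toy data one respondent drops out for lack of consent and another for lack of a matching record, which are two distinct ways a linked file can differ from the original sample.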
22
Take home
1. Inference rests on the assumption that all confounders are removed through modeling or data collection. This was true in the past and is true now. NOTHING has changed.
2. Probability sampling is in theory a way to remove coverage and self-selection problems. In practice it is an empirical question.
23
Take home
3. With sufficient transparency we can model the processes and gain insight we did not have before because of new data that are now at our disposal.
4. To do this fully, teams with different skills are necessary.
5. BUT we have a lot to offer with the skills we already have.
24
Thank you! fkreuter@umd.edu