Insights and Inference Opportunities and challenges with administrative data and non-probability sources (including organic data)


1 Insights and Inference Opportunities and challenges with administrative data and non-probability sources (including organic data)

2 1. No new inferential issue

3 Social Science

4 Prediction Causation Description Social Science

5 Prediction Causation Inference Description Social Science

6 Prediction Causation Inference Description Social Science Probability Based (Survey) Data

7 Prediction Causation Inference Description Social Science Positive & Known Selection Probability

8 Prediction Causation Inference Description Positive & Known Selection Probability Social Science

9 Prediction Causation Description Positive & Known Selection Probability Social Science Inference

10 Prediction Causation Description Positive & Known Selection Probability Social Science Inference
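
The "Positive & Known Selection Probability" condition on slides 7 to 10 can be illustrated with a small sketch. It uses the Horvitz-Thompson estimator, which the slides do not name; it is swapped in here as one standard way to use known selection probabilities. The population values, the probabilities, and the function names are hypothetical, and independent (Poisson) sampling is assumed so that every possible sample can be enumerated.

```python
from itertools import product


def ht_total(sample_values, sample_probs):
    """Horvitz-Thompson estimate of a population total.

    Each sampled value is weighted by the inverse of its selection
    probability, which must be known and strictly positive.
    """
    if any(p <= 0 or p > 1 for p in sample_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(sample_values, sample_probs))


def expected_ht_total(values, probs):
    """Exact expectation of ht_total under Poisson sampling.

    Assumption: units are selected independently, unit i with
    probability probs[i]. All 2**N possible samples are enumerated.
    """
    expectation = 0.0
    for included in product([0, 1], repeat=len(values)):
        # Probability of drawing exactly this sample.
        p_sample = 1.0
        for inc, p in zip(included, probs):
            p_sample *= p if inc else (1 - p)
        ys = [y for y, inc in zip(values, included) if inc]
        ps = [p for p, inc in zip(probs, included) if inc]
        expectation += p_sample * ht_total(ys, ps)
    return expectation


# Hypothetical three-unit population with unequal selection probabilities.
population = [10.0, 20.0, 30.0]
probs = [0.5, 0.25, 0.5]

true_total = sum(population)
average_estimate = expected_ht_total(population, probs)
```

Averaged over all possible samples, the weighted estimate equals the true total, even though the units are selected with unequal probabilities. If a unit had probability zero, or if the probabilities were unknown, the weights could not be formed, which is the situation the slides contrast with probability-based survey data.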

11 Data Generating Process (Groves et al. 2004): Construct, Measurement, Response, Edited Response; Population Mean, Sampling Frame, Sample, Respondents, Postsurvey Adjusted Data; Survey Statistic

12 Data Generating Process: Construct, Measurement, Response, Edited Response; Population Mean, Sampling Frame, Sample, Respondents, Postsurvey Adjusted Data; Survey Statistic
Listing Information: Time stamps; Driving
Contact Data & Interviewer Observation: Day/Time; Proxy-Y; HU Char.
Key Strokes: Response time; Back-ups; Edits
Vocal Characteristics: Pitch; Disfluencies

13 Key Ingredients for Valid Inference

14
1. Data generating process needs to be known
2. Framework as tool to identify errors
3. Model or break confounders
4. Know your inferential goal

15 “Found” -- Boston Street Bumps

16 Data Generating Process
Who? What? Why?
Who is missing?
Who is counted repeatedly?
What is not said / measured, and why?
… no matter if data are found or designed

17 2. Data integration

18 Linkage Consent
Gain vs. loss framing (Kahneman and Tversky, 1979)
Front vs. back placement
Opt-in vs. opt-out (Thaler and Sunstein 2008; Schwartz 2014)

Phone      Front   Back
Gain       90.8    78.7
Loss       90.5    81.2
Total n    613     595

Kreuter et al. 2015
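
To make the comparison on slide 18 concrete, the sketch below computes the simple differences between the four reported figures (Gain: 90.8 front, 78.7 back; Loss: 90.5 front, 81.2 back). It assumes these are consent rates in percent, which the slide does not state. The dictionary layout and function names are hypothetical. No significance test is attempted, because the slide gives only column totals (613 and 595), not the number of cases per cell.

```python
# Figures as read from slide 18 (Kreuter et al. 2015).
# Assumption: each value is the percentage of respondents consenting to linkage.
consent = {
    ("gain", "front"): 90.8,
    ("gain", "back"): 78.7,
    ("loss", "front"): 90.5,
    ("loss", "back"): 81.2,
}


def placement_gap(framing):
    """Front minus back, in percentage points, for one framing."""
    return round(consent[(framing, "front")] - consent[(framing, "back")], 1)


def framing_gap(placement):
    """Gain minus loss, in percentage points, for one placement."""
    return round(consent[("gain", placement)] - consent[("loss", placement)], 1)


for framing in ("gain", "loss"):
    print(f"{framing} framing, front minus back: {placement_gap(framing)}")
for placement in ("front", "back"):
    print(f"{placement} placement, gain minus loss: {framing_gap(placement)}")
```

Under that reading, the gap between front and back placement (roughly 9 to 12 points) is much larger than the gap between gain and loss framing (under 3 points in either placement).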

19 Regression Income (Euros) on age and gender Kreuter et al. 2015

20 3. Skills - Steps - Teams

21 Usher 2015
Data Generating Process. Examples: geolocated social media + survey + administrative data
Data Curation/Storage. Example: Record Linkage; Hadoop Distributed File System
Data Analysis. Example: Hadoop MapReduce; High Frequency Data
Data Output/Access. Example: map visualization / privacy
Research Questions. Examples: Behavior of interest (political participation/job searches)

22 Take home
1. Inference rests on the assumption that all confounders are removed through modeling or data collection. This was true in the past and is true now. NOTHING has changed.
2. Probability sampling is in theory a way to remove coverage and self-selection problems. In practice, whether it does is an empirical question.

23 Take home
3. With sufficient transparency we can model the processes and gain insight we did not have before, because of the new data that are now at our disposal.
4. To do this fully, teams with different skills are necessary.
5. BUT we have a lot to offer with the skills we already have.

24 Thank you! fkreuter@umd.edu

