Changing Patterns of Social Science Data Usage

1 Changing Patterns of Social Science Data Usage
Patrick Sturgis

2 Context
Rapid increase in new types of data for social science research:
- Social media
- Online surveys
- Administrative data
- Mobile digital devices
- Textual archives
- Transactional data (Uber, bike shares, Airbnb)

3 ‘the coming crisis of empirical sociology’
"The sample survey is not a tool that stands 'outside history'. Its glory years, we contend, are in the past."
"It is unlikely, we suggest, that in the future the sample survey will be a particularly important research tool, and those sociologists who stake the expertise of their discipline to this method might want to reflect on whether this might leave them exposed to marginalization or even redundancy."
(Savage & Burrows, 2007)

4 Motivation
What kinds of data do social scientists use?
- Patterns across disciplines & over time?
- Decline of surveys & increase in 'big data'?
- Transparency and quality of methods
I'll address each of these separately.

5 Survey research in crisis?

6 Low and declining response rates
- Face-to-face surveys now routinely struggle to reach 50% response rates
- RDD is even worse: in the US routinely < 10% (increasing mobile-only households + do-not-call legislation)
- Survey sponsors ask 'what are we getting for our money?'
- Is a low response rate survey better than a well-designed quota sample?

7 Increasing costs
Per achieved interview, costs are high and increasing:
- Simon Jackman estimates $2,000 per complete interview for the 2012 American National Election Study
- My estimate: ~£180 per achieved interview for a PAF sample, 45-minute CAPI, n ~1,500, RR ~50% (rough arithmetic sketched below)
- Compare ~£5 per complete for opt-in panels
'A good quality survey may cost two to three times as much as an inferior version of the same thing' (Marsh)
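The arithmetic behind these figures, sketched very roughly below; the budget and issued-sample totals are derived from the approximate estimates on this slide, not quoted from any actual fieldwork budget.

```python
# Rough cost arithmetic for the face-to-face estimate above.
# All totals are derived from the slide's approximate figures, not from a real budget.
n_achieved = 1500          # achieved interviews
cost_per_achieved = 180    # GBP per achieved interview (estimate above)
response_rate = 0.50       # approximate response rate

total_fieldwork_cost = n_achieved * cost_per_achieved   # ~GBP 270,000
issued_sample = n_achieved / response_rate              # ~3,000 addresses issued

# The same number of completes from an opt-in panel at ~GBP 5 each:
panel_cost = n_achieved * 5                             # ~GBP 7,500

print(f"Face-to-face: ~£{total_fieldwork_cost:,} for {n_achieved} interviews "
      f"from ~{issued_sample:,.0f} issued addresses")
print(f"Opt-in panel equivalent: ~£{panel_cost:,}")
```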

8 Cost drivers
- Average number of calls increasing
- More refusal conversion
- More incentives (UKHLS, £30)
- 30%-40% of fieldwork costs can be spent on the 20% 'hardest to get' respondents

9 US Survey of Consumer Attitudes 1979-1996 (Curtin et al 2000)
Response rate: 70% (1979) -> 68% (1996)
Mean contact attempts and % refusal conversions rose over the period

10 Externalities of 'survey pressure'
- Poor data quality from 'hard to get' respondents
- Fabrication pressure on respondents
- Fabrication pressure on interviewers
- Ethical research practice? (Roger Thomas anecdote)

11 Content analysis of journal articles
(joint work with Rebekah Luff)

12 Content analysis of all papers: 1949-50, 1964-65, 1979-80, 1994-95
Presser (1983) and Saris & Gallhofer (2007)

Field                   | Journals
Economics               | American Economic Review; Journal of Political Economy; Review of Economics and Statistics
Sociology               | American Sociological Review; American Journal of Sociology; Social Forces
Political Science       | American Journal of Political Science; American Political Science Review; Journal of Politics
Social Psychology       | Journal of Personality and Social Psychology
Public Opinion Research | Public Opinion Quarterly

Presser coded the first three periods; Saris & Gallhofer added 1994-95.

13 Metzler et al (2016)
Online survey of Sage social science 'contacts':
- 9,412 respondents
- 33% reported having undertaken big data research
- But response rate < 2%
- Self-definition of 'big data'

14 Findings of Presser, Saris & Gallhofer
Percentages of articles using survey data, by discipline and year (number of articles in parentheses). The 1949-50, 1964-65 and 1979-80 columns are from Presser; the 1994-95 column is from Saris & Gallhofer.

Discipline        | 1949-50   | 1964-65   | 1979-80   | 1994-95   | S&G own classification*
Sociology         | 24% (282) | 54% (259) | 56% (285) | 70% (287) | 47%
Political Science |  3% (114) | 19% (160) | 35% (203) | 42% (303) | 27%
Economics         |  6% (141) | 33% (155) | 29% (317) |     (461) | 20%
Social Psychology |  2% (59)  | 15% (233) | 21% (377) | 50% (347) | 49%
Public Opinion    | 43% (86)  |     (61)  | 91% (53)  | 90% (43)  |

A few points to make about these data:
- They include all of the papers, rather than just the empirical ones, so the percentages will fluctuate if there are more review or theoretical papers.
- As the note below shows, Presser included studies performed by statistical bureaus as 'surveys', so anything by the ONS, for example, would count as a survey. I have not fully worked this through yet, but as you can see from Saris & Gallhofer's second set of figures, which count surveys in a way I think most people would apply today, the percentages drop significantly.
- I concluded that 'total' figures for survey use were fairly meaningless given the changing number of papers within each field.

*Presser included studies performed by organisations for official statistics (statistical bureaus) under the category 'surveys'. Saris and Gallhofer repeated this method but also used their own classification; those results are shown in the last column.

15 Updating the Analysis: 2014-15
1,453 research papers
7 coders, with papers randomly assigned to coders (a toy sketch of one way to do this follows below)
24 data-information codes: theoretical, review, quant/qual/mixed, primary/secondary, survey/administrative, big data, experimental, observation, interview, textual, visual, social media
Note: all of the coding categories are given on the three slides at the end of the presentation, should you want them.
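A minimal sketch of the kind of random assignment described above, assuming a simple round-robin allocation; the paper IDs, coder names, and random seed are illustrative placeholders, not the project's actual procedure.

```python
import random

# Illustrative random assignment of papers to coders.
# Paper IDs and coder names are hypothetical placeholders.
papers = [f"paper_{i:04d}" for i in range(1, 1454)]   # 1,453 papers
coders = [f"coder_{c}" for c in range(1, 8)]          # 7 coders

random.seed(42)          # fixed seed so the assignment is reproducible
random.shuffle(papers)

# Deal papers out round-robin so workloads differ by at most one paper.
assignment = {coder: papers[i::len(coders)] for i, coder in enumerate(coders)}

for coder, batch in assignment.items():
    print(coder, len(batch), "papers")
```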

16 Inter-rater reliability
8% of the 1,453 papers were 'flagged', i.e. coders were unsure of some aspect of the coding; there was wide variation in the number of papers flagged by each coder.
Coder reliability (based on a random subset of papers coded by all coders):
- Average pairwise agreement = 87% (a toy illustration of this calculation follows below)
- Coder average agreement range = %
- Variation in reliability by code type: survey/administrative = 76% vs qualitative codes = 94%
NOTE: the findings in this presentation are draft only, as the flagged papers are included in the analysis as left by the coder; the updated version (where I will have checked them all) is not yet complete.
150 papers out of the full 1,752 (which includes the reliability sample) were flagged; 124 out of the 1,453 papers (8%) were 'flagged'.
Reliability was assessed for 20 categorical codes across 49 complete reliability papers (980 codes x 7 coders). Percentage agreement is reported rather than Cohen's Kappa, as it is a bit more intuitive for this presentation.
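A toy illustration of average pairwise agreement of the kind reported above; the coder labels and papers below are invented, and this is not the project's actual reliability code.

```python
from itertools import combinations

# Toy example: each coder's codes for the same small set of papers
# (hypothetical labels; the real exercise used 49 papers and 20 codes per paper).
codes = {
    "coder_1": ["survey", "quant", "secondary", "survey"],
    "coder_2": ["survey", "quant", "primary",   "survey"],
    "coder_3": ["admin",  "quant", "secondary", "survey"],
}

def pairwise_agreement(a, b):
    """Proportion of items on which two coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

pairs = list(combinations(codes, 2))
average = sum(pairwise_agreement(codes[i], codes[j]) for i, j in pairs) / len(pairs)
print(f"Average pairwise agreement: {average:.0%}")
```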

17 Empirical v Theory/review papers by discipline 2014/15
Please note – some of the ‘theoretical’ and ‘review’ papers have also been coded as having empirical data, which they should not have been. This is one of the data cleaning issues I’m working through.

18 Quant/Qual/Mixed by Discipline 2014/15
Field: Quantitative | Qualitative | Mixed (%)
Economics: 98 | 0.3 | 1
Sociology: 80 | 11 | 10
Political Sciences: 87 | 5 | 8
Social Psychology: 72 | 18
Public Opinion: 97 | 3
TOTAL: 6
N = 1,251

19 Mainly Quantitative Data by Discipline 2014/15
Field: Survey/Poll | Administrative | Census | Digital/Big data* | Experimental (%)
Economics: 31 | 73 | 19 | 3 | 14
Sociology: 52 | 42 | 17 | 4 | 5
Political Sciences: 41 | 58 | 9
Social Psychology: 69 | 1 | 72
Public Opinion: 89 | 33
TOTAL: 48 | 47 | 12 | 24
Note: more than one data type can be selected, so percentages add up to more than 100%.
N =   *Exclusively quantitative

20 Mainly Qualitative Data by Discipline 2014/15
Field: Observational | Interview/focus group | Textual | Visual | Social media/online* (%)
Economics: 0.5 | 2 | 0.3
Sociology: 12 | 15 | 11
Political Sciences: 4 | 1
Social Psychology: 8 | 5 | 24 | 14
Public Opinion:
TOTAL: 10 | 3
Note: more than one data type can be selected, so percentages add up to more than 100%.
Note: textual data analysis seems too high; I'll look into it.
N =   *Exclusively qualitative

21 Surveys 94/95 > 2014/15

22 Experiments 94/95 > 2014/15

23 Observation 94/95 > 2014/15

24 Text analysis 94/95 > 2014/15

25 Transparency and Quality of Methods reporting
Many of Presser's initial criticisms still stand: basic reporting is frequently absent or unclear.
- Inter-rater reliability and the time taken to code papers show how challenging this task can be
- A third of papers using surveys lacked some basic information, e.g. the sampling method
- Some journals put essential details in online appendices or refer to other documents/articles (do reviewers look at these?)
- Some details required 'googling', especially to work out whether data were from a survey or an administrative source; knowledge is assumed

26 Next steps
- Use the human coding data as a training sample for machine learning (a rough sketch follows below)
- Automated sampling and retrieval of online journal articles
- Apply natural language processing to code articles for methodological content
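One plausible shape for this pipeline is sketched below; the file name, column names, and model choice are assumptions for illustration, not the project's actual implementation.

```python
# Illustrative sketch: train a text classifier on the human-coded papers,
# then use it to code newly retrieved articles for methodological content.
# "hand_coded_articles.csv", the column names, and the model are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

coded = pd.read_csv("hand_coded_articles.csv")    # full text + human codes
X, y = coded["full_text"], coded["uses_survey"]   # e.g. a 0/1 'uses survey data' code

clf = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), stop_words="english"),
    LogisticRegression(max_iter=1000),
)

# Check cross-validated accuracy on the human-coded sample before
# applying the classifier to newly retrieved articles.
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```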

27 Reports of the death of surveys greatly exaggerated?

28 Frequency of GB Polls 1940-2015
N of election polls 1945-2010 = 3,500
= 1,942

29 Global spend on online market research
Chart: Mario Callegaro, source Inside Research

30 Survey Futures
- The lower cost of online surveys means we are likely to see more, not fewer, surveys in future
- Population inference is still key to social science
- Big data is failing to live up to the hype for social science applications

31 Survey Futures
- Shorter questionnaires administered at more frequent intervals
- Device-agnostic questionnaires
- Data linkage & 'passive' data collection

32 Example: Wellcome Trust Science Education Tracker (SET)

33 Science Education Tracker waves 1 & 2
- Conducted as part of a survey of adults
- Stratified, multi-stage PAF sample, CAPI
- Interviews with all children aged years in sampled households, plus an additional screener on adjacent houses
- Achieved sample ~450
- Response rate ~50%

34 Science Education Tracker wave 3
- Sample drawn from the National Pupil Database
- Invitation with login details sent by post to the named individual; short online interview (25 mins)
- £10 conditional incentive
- 4,000 achieved interviews, response rate 50%
- 25% of interviews completed on mobile devices

35 Concluding Remarks
- Evidence of changing data use in the content of social science journals
- Big differences by discipline
- Growth in 'big data', including administrative and textual data
- Big increase in experiments
- But no evidence of a decline in survey research
- Reasons to be cheerful about the future

