MethodBox: From Open Data to Open Insight. MethodBox Team, Jul 2011
Presentation outline. Problem: data tsunami + puddles of insight. Solution: collective, efficient science. Deployment: sense-making networks on open data.
Quote: “…you call it Epidemiology and we call it quantitative Social Science” (a leading researcher, Jul 2011). Open data; common methods; potentially complementary expertise.
Obesity Example: Understanding of public health problems such as obesity is fragmented. Data, methods/models and expertise are split across disciplines (e.g. social vs. biomedical) and settings (e.g. academia vs. healthcare).
Puddles of research around the organising principle … but policies need the big picture
Data Example: Time-series data from Health Visitors in Wirral. Deposited with the UKDA but unused for 16 years. Children were measured at the time the obesity epidemic took hold…
Map: Material deprivation affecting children in Wirral (0.3M), UK. Fifths of IDAC 2004 (households with children: % on benefits in …); Red (light) = most deprived, then Red (dark), Purple, Blue (dark), Blue (light) = most affluent.
Maps: BMI of 3 yr olds over successive periods (including 2000 – 2001), shown as fifths of BMI SDS; Red (light) = fattest, then Red (dark), Purple, Blue (dark), Blue (light) = thinnest.
Child Obesity: Action 6 years after the signal in the data
Chart: Body Mass Index (BMI) trend in Wirral 3-year-olds from 1988 to …; x-axis: month of measurement by Health Visitor (Mar-88 to Aug-04); y-axis: three-monthly rolling average BMI SDS; annotations: Clues, Actions.
SDS = standard deviation score from the 1990 British Growth Reference charts; it adjusts for the age and sex of the child.
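The SDS on this chart puts each child's BMI on a common scale regardless of age and sex. The 1990 British Growth Reference tabulates, for each age and sex, three LMS parameters: L (Box-Cox power), M (median) and S (coefficient of variation). A minimal sketch of the standard LMS z-score formula and of a three-month trailing rolling average follows, assuming monthly mean SDS values as input. The function names, the LMS values and the monthly figures are placeholders, not the published reference values or the Wirral data.

```python
import math
from statistics import mean


def lms_sds(x, L, M, S):
    """SDS (z-score) of measurement x under the LMS method.

    L (Box-Cox power), M (median) and S (coefficient of variation) must be the
    reference values for the child's exact age and sex.
    """
    if L == 0:
        # Limiting case of the Box-Cox transform as L -> 0
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)


def rolling_mean(values, window=3):
    """Trailing rolling mean; with monthly values and window=3 this is a three-monthly average."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


# Placeholder LMS parameters (NOT the published 1990 reference values).
L, M, S = -1.5, 16.0, 0.08
child_sds = lms_sds(17.0, L, M, S)  # a BMI above the median gives a positive SDS

# Hypothetical monthly mean SDS values for a cohort of 3-year-olds.
monthly_sds = [0.10, 0.20, 0.30, 0.40]
smoothed = rolling_mean(monthly_sds)  # one value per complete three-month window
```

A BMI exactly at the median M always gives an SDS of 0, and the L = 0 branch keeps the formula defined where the Box-Cox power vanishes.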
Similar Data in 2011: the National Child Measurement Programme, an anonymised national database. It could be opened (like the National Pupil Database) and extended to other policy-relevant, timely research.
Data Already in the UK Data Archive. Example: the Health Surveys for England (annual), whose analyses feed national policies. Does the evidence need to be localised?…
Chart: BMI by income fifth (low to high) for men and women. Women, but not men, from low-income households are fatter in England. Data from the Health Survey for England.
Chart: BMI by income fifth (low to high) for men and women. Women from low-income households and men from high-income households are fatter in Greater Manchester. Data from the Health Survey for England.
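The two income charts compare mean BMI across income fifths, separately for men and women. A minimal sketch of that tabulation follows, assuming one income value per respondent, unweighted data and made-up records (these are not HSE data); the function names are hypothetical. A real HSE analysis would also need to account for the survey design and weights.

```python
from collections import defaultdict
from statistics import mean


def income_fifths(incomes):
    """Assign each respondent to an income fifth (1 = lowest, 5 = highest) by rank.

    Unweighted, with ties broken by input order: a simplification for illustration.
    """
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    fifths = [0] * len(incomes)
    for rank, i in enumerate(order):
        fifths[i] = rank * 5 // len(incomes) + 1
    return fifths


def mean_bmi_by_sex_and_fifth(records):
    """records: (sex, household_income, bmi) tuples -> {(sex, fifth): mean BMI}."""
    # Fifths are computed over all respondents, then BMI is averaged within sex.
    fifths = income_fifths([income for _sex, income, _bmi in records])
    groups = defaultdict(list)
    for (sex, _income, bmi), fifth in zip(records, fifths):
        groups[(sex, fifth)].append(bmi)
    return {key: mean(values) for key, values in sorted(groups.items())}


# Made-up records for illustration only; these are not HSE data.
toy = [("F", 10, 30.0), ("M", 20, 25.0), ("F", 30, 26.0), ("M", 40, 27.0), ("F", 50, 24.0)]
table = mean_bmi_by_sex_and_fifth(toy)
```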
Linked data ≠ linked data, methods & investigators. The previous slides show social-biomedical signals about obesity from under-used datasets, yet biomedical research and social research each keep their own data, methods & investigators.
MethodBox Aim: to increase the sharing and reuse of data sources & extracts, and of data-processing methods, in one in-silico environment (‘e-Lab’) shared by social and health researchers.
e-Lab: socially stimulating science, in silico
Research Object: find, share, reuse
– Data sources
– Data-preparation scripts
– Research protocol
– Statistical analysis scripts
– Slides
– Working datasets
– Figures/graphics
– Manuscripts
– References
– Analysis logs & notes
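One way to picture a Research Object is as a bundle of the resources listed on the e-Lab slide that can be found, shared and reused as a unit. The sketch below is a hypothetical data model, not MethodBox's actual schema: `ResearchObject`, `find`, `reuse` and the file names are illustrative assumptions.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Component kinds taken from the e-Lab slide.
COMPONENT_KINDS = (
    "data_sources", "data_preparation_scripts", "research_protocol",
    "statistical_analysis_scripts", "slides", "working_datasets",
    "figures", "manuscripts", "references", "analysis_logs",
)


@dataclass
class ResearchObject:
    """Hypothetical bundle of related research resources (not MethodBox's actual schema)."""
    title: str
    owner: str
    components: Dict[str, List[str]] = field(default_factory=dict)
    derived_from: Optional[str] = None

    def add(self, kind: str, resource: str) -> None:
        """Attach a resource under one of the known component kinds."""
        if kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {kind}")
        self.components.setdefault(kind, []).append(resource)


def find(objects: List[ResearchObject], kind: str, text: str) -> List[str]:
    """Titles of research objects holding a component of `kind` whose name contains `text`."""
    return [ro.title for ro in objects
            if any(text in name for name in ro.components.get(kind, []))]


def reuse(ro: ResearchObject, new_owner: str, new_title: str) -> ResearchObject:
    """Copy a shared research object for a new user, recording where it came from."""
    return ResearchObject(new_title, new_owner, copy.deepcopy(ro.components), derived_from=ro.title)


# Example with hypothetical file names.
obesity = ResearchObject("Child BMI and deprivation", "researcher_a")
obesity.add("data_sources", "hse_2004_extract.csv")
obesity.add("data_preparation_scripts", "derive_bmi.py")
copy_b = reuse(obesity, "researcher_b", "Adult BMI by income")
```

Recording `derived_from` on reuse is what lets the original contributor stay visible when others build on their work.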
National Dataset Example: Health Surveys for England
– Large-scale (participants × variables)
– Annual since the early 1990s
– Under-used by the NHS, which funds it
– Key barrier: extracting a research-ready subset of the data
– Data archive playground = e-Lab
Supporting and Developing Interdisciplinary Understanding
– Sharing resources: tools, methods, data
– Sharing expertise: discussions and reuse around shared resources
– Promoting interdisciplinary working
– Developing interdisciplinary understanding: language, tacit assumptions, methods
The first step is sharing resources. Shared resources provide the basis for discussion; discussions lead to deeper interdisciplinary understanding; and understanding of other domains promotes more effective interdisciplinary working.
Facilitating a social network of data archive users… …toward a reward environment for sharing data, methods, and expertise
Browsing for data extracts made by a social network of data archive users…
Shopping for variables from across different years of survey collections…
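Shopping for variables across survey years amounts to stacking the chosen columns from several annual files into one extract, while keeping track of which years actually collected each variable. A minimal sketch, assuming each year's data is an in-memory list of row dicts; the function names and variable names (`age`, `sex`, `bmi`) are hypothetical, not real HSE variable names or a MethodBox API.

```python
def build_extract(surveys, basket):
    """Stack the chosen variables from several survey years into one table.

    surveys: {year: list of row dicts}; basket: the variable names the user picked.
    Variables absent from a year's file come out as None, so gaps stay visible.
    """
    extract = []
    for year in sorted(surveys):
        for row in surveys[year]:
            record = {"year": year}
            record.update({var: row.get(var) for var in basket})
            extract.append(record)
    return extract


def coverage(surveys, basket):
    """For each variable in the basket, list the years in which it was collected."""
    return {var: [year for year in sorted(surveys) if any(var in row for row in surveys[year])]
            for var in basket}


# Hypothetical survey rows for two years.
surveys = {
    2005: [{"age": 40, "bmi": 27.1}],
    2004: [{"age": 35, "bmi": 24.0, "sex": 2}],
}
basket = ["age", "sex"]
extract = build_extract(surveys, basket)
years_per_variable = coverage(surveys, basket)
```

Reporting coverage alongside the extract shows a user, before analysis starts, that a variable was not asked in some years.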
Instant access to relevant parts of survey documentation…
Making the data extract visible… Linking a data extract with a script for deriving variables… Sharing and visibility
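Linking an extract with a script that derives variables can be as simple as applying the script to every record and storing a provenance link between the two, so that both are visible and attributable. A minimal sketch, assuming BMI is derived from weight in kg and height in cm; `derive_bmi`, `apply_derivation`, the identifiers and the field names are hypothetical.

```python
def derive_bmi(record):
    """BMI in kg/m^2 from weight (kg) and height (cm); None when either value is missing."""
    weight, height = record.get("weight_kg"), record.get("height_cm")
    if weight is None or height is None or height <= 0:
        return None
    return weight / (height / 100) ** 2


def apply_derivation(extract, extract_id, variable, func, script_id):
    """Add a derived variable to every record and return a provenance link.

    The link ties the extract to the script that derived the variable, so both
    can be shown (and credited) together.
    """
    for record in extract:
        record[variable] = func(record)
    return {"extract": extract_id, "script": script_id, "variable": variable,
            "n_records": len(extract)}


extract = [{"weight_kg": 80.0, "height_cm": 200.0}, {"weight_kg": 70.0, "height_cm": None}]
link = apply_derivation(extract, "extract-001", "bmi", derive_bmi, "derive_bmi.py")
```

Returning None for incomplete records, rather than dropping them, keeps the extract the same size so the derived column lines up with the original rows.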
Enabling user-visibility for data extraction or derivation contributions…
Current MethodBox (video)
Training Course, Apr '10
Trained a mixture of NHS, academic and industry users of the HSE in the use of MethodBox; the course was run in conjunction with CCSR.
Feedback forms were completed by 15 of 16 attendees, who rated MethodBox from 1 (negative) to 7 (positive) on the following statements. “I thought MethodBox was:”
– Terrible / Wonderful: mean = 5.57
– Difficult to understand / Easy: mean = 5.57
– Frustrating to use / Satisfying: mean = 5.79
– Dull / Stimulating: mean = 5.29
– Rigid / Flexible: mean = 5.71
– Difficult to navigate / Easy to navigate: mean = 6
Attitudes to Sharing (Data / Scripts)
– Academic social scientists: Data Yes; Scripts No
– Academic epidemiologists/medical researchers: Data No; Scripts Yes
– NHS & Local Govt. analysts: Yes
MethodBox Evolution
– Amazon-like prompting of users with other variables that may be relevant to the set being extracted
– More surveys/datasets incorporated
– User-contributed & community-curated datasets
…
The feature request list exceeds available resources.
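Amazon-like prompting can be approximated with item-to-item co-occurrence: suggest the variables that other users most often extracted together with the ones already chosen. The sketch below is one simple technique under that assumption, not the method MethodBox uses; the function names and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each ordered pair of variables appears in the same past extract."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def suggest(current, baskets, top_n=3):
    """Suggest variables that other users often extracted alongside the current selection."""
    counts = co_occurrence(baskets)
    chosen = set(current)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in chosen and b not in chosen:
            scores[b] += n
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [var for var, _score in ranked[:top_n]]


# Hypothetical past extracts made by other users.
past_baskets = [["bmi", "age", "sex"], ["bmi", "income"], ["bmi", "age", "income"], ["age", "sex"]]
suggestions = suggest(["bmi"], past_baskets, top_n=2)
```

Co-occurrence counts need no model training and update as soon as a new extract is shared, which suits a growing community of archive users.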
Building on Successful E-Science: myGrid, Taverna and myExperiment are the most widely used scientific workflow and workflow-sharing systems. Over a decade of programme funding has sustained world-leading e-infrastructure R&D, ready to leverage more outputs from open linked data.
Toward Open Insight: Researcher A is an expert in deprivation; Researcher B is an expert in obesity. Both use a common data archive but don’t usually meet. MethodBox shares the expertise of A and B to create a more complete model of deprivation in obesity.
Conclusion: Open data alone is not enough. A social e-infrastructure for science is needed. Sharing insights and methods is key, and can be achieved through systems like MethodBox + ESDS.