e-Infrastructure for Social Science data: Obesity e-Lab & MethodBox Ian Dunlop 15/03/11
Terminology: Obesity e-Lab is the ESRC project; MethodBox is the product
Obesity e-Lab Aims: Enable socially networked research between the social sciences, health sciences and public health; Add value to archived datasets by developing technologies to help on-line users; Seed an “open source” approach to social research publication
Project Objectives Engagement (‘More with less’) – Research communities (Obesity/Cancer, Education) – Public health researchers (Academic, NHS, LA) – Key data providers (ESDS/UKDA) Reduce barriers – For survey datasets – Formation of research communities (cross-disciplinary) Develop tools – Online digital laboratory, an ‘e-Lab’, known as MethodBox Data * Methods * People
e-Lab: Socially-stimulating science, in-silico. Research Object (Find, Share, Reuse): Data-sources, Data-preparation scripts, Research protocol, Statistical analysis scripts, Slides, Working datasets, Figures/Graphics, Manuscripts, References, Analysis-logs & notes
Where we are up to: MethodBox launched at ESDS government event, April 2010 (scored 5.7/7 from 15 responses). 80 registered users, 45 scripts and 58 data extracts. 21 public health researchers trained using a combination of social science and health science approaches. Methodological approach adopted by North West e-Health project (which is 20x bigger than us)
Context, Features, Architecture Context – Investigation Cycle – Survey (Meta) Data overload – How MethodBox fits in MethodBox – Architecture – Screenshots – E-Infrastructure Future Directions
Investigation Cycle Data Our Tooling focus is (survey) Data and Analysis Our main Community focus is Expertise via Methods/Analysis/Scripts Analysis Models Results Questions Publications, Reports or Decisions Tooling Community
Examples: HSE pages 208 pages Variable Definitions Variable Categories Variable SPSS code Questionnaire Instructions 224 pages Questions used To set variables 148 pages Survey Description 9 pages Variable Value Domains 351 pages 46 MB data files Data and Variable Codebook X 17 All Variables
How MethodBox fits in UK Data Archive (UKDA) MethodBox Economic and Social Data Service (ESDS) Survey Curation Survey Mapping diagram not to scale Survey Navigation Survey Commissioning & Collection etc… Improving Access & Use
Ruby delayed job Ruby on Rails Data providers User Dataset import File system MySQL Metadata import User data and metadata import Request ‘catalog’ information Provide metadata
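The architecture slide pairs a Rails front end with background jobs for dataset import. A minimal sketch of that deferred-import idea follows, in plain Ruby. It assumes a tab-separated survey file; `ImportQueue` and `DatasetImportJob` are hypothetical names for illustration and are not the MethodBox code or the Delayed Job API.

```ruby
# Hypothetical in-memory stand-in for a background job queue.
# The real system is described as using Ruby delayed job; its API is not shown here.
class ImportQueue
  def initialize
    @jobs = []
  end

  # Queue a job for later instead of running it during the user's request.
  def enqueue(job)
    @jobs << job
  end

  # Run every queued job in order and return their results.
  def work_off
    results = @jobs.map(&:perform)
    @jobs.clear
    results
  end
end

# Hypothetical job: read a tab-separated dataset and summarise it.
DatasetImportJob = Struct.new(:name, :raw_text) do
  def perform
    lines = raw_text.lines.map(&:chomp).reject(&:empty?)
    header = lines.first.split("\t")
    { name: name, variables: header, rows: lines.size - 1 }
  end
end

queue = ImportQueue.new
sample = "id\tage\tbmi\n1\t34\t27.1\n2\t51\t30.4\n" # made-up sample data
queue.enqueue(DatasetImportJob.new("example_survey", sample))
summary = queue.work_off.first
puts summary.inspect
```

Queuing the import keeps large survey files (tens of megabytes in the HSE example) from blocking the web request that uploaded them.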
Search
Results
Variable info with Stats
Profiles
People & Expertise
Methods
Method Information
Data Extracts
Making the data extract visible… Linking a data extract with a script for deriving variables… Sharing and visibility
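As a rough illustration of linking a data extract with a script that derives variables, the sketch below selects a few variables from survey rows and then derives a BMI value. The rows, variable names and helper methods are hypothetical, chosen only to fit the obesity theme; they are not taken from any real survey or from MethodBox.

```ruby
# Made-up survey rows; variable names are illustrative only.
ROWS = [
  { "id" => 1, "weight_kg" => 70.0, "height_m" => 1.75, "region" => "NW" },
  { "id" => 2, "weight_kg" => 95.0, "height_m" => 1.80, "region" => "SE" }
].freeze

# A "data extract": keep only the variables the researcher selected.
def extract(rows, variables)
  rows.map { |row| row.slice(*variables) }
end

# A "derivation script": add a derived variable (BMI = weight / height squared).
def derive_bmi(rows)
  rows.map do |row|
    bmi = row["weight_kg"] / (row["height_m"]**2)
    row.merge("bmi" => bmi.round(1))
  end
end

data_extract = extract(ROWS, %w[id weight_kg height_m])
derived = derive_bmi(data_extract)
derived.each { |row| puts row.inspect }
```

Keeping the extract and the derivation as separate steps means either one can be shared or reused on its own.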
MethodBox as e-Infrastructure Data Providers – Existing infrastructure (NESSTAR/NESSTAR Server) – Cautious: adopt only ‘proven’ technologies – Willing to ‘try’ things if risk/work is low MethodBox offers – Social Layer, sharing, data tooling – Integration with existing data provider infrastructure – NESSTAR Server – Security infrastructure (Shibboleth) – Automated running of scripts for new datasets (using institutional/national compute) Deployment – ESDS/CCSR first instance (exit strategy) Obesity e-Lab project ends 31/03/12
Future work MethodBox as e-Infrastructure – Target deployment as part of ESDS/CCSR – Integration with NESSTAR system Focus on communities – Greater Manchester Public Health Inequalities Research Network – University of Manchester School of Education – North West e-Health and Arthritis Research UK Ability to ‘run’ methods – Part funded by Obesity e-Lab work in JISC ‘National e-Infrastructure for Social Simulation’ project video at