Organising social science data – computer science perspectives Simon Jones Computing Science and Mathematics University of Stirling, Stirling, Scotland,

Presentation transcript:

Organising social science data – computer science perspectives
Simon Jones
Computing Science and Mathematics, University of Stirling, Stirling, Scotland, UK
Seminar: Data management in the social sciences and the contribution of the DAMES Node
Stirling, 31 January 2012
DAMES: Data Management through e-Social Science

2 DAMES: Background
• DAMES: Case studies, provision and support for data management in the social sciences
• This talk: focusing on "support for data management"
• Infrastructure/tools
• Driven by social science needs for support for advanced data management operations
• “In practice, social researchers often spend more time on data management than any other part of the research process” (Lambert)
• A ‘methodology’ of data management is relevant to ‘harmonisation’, ‘comparability’, ‘reproducibility’ in quantitative social science

3 DAMES: Themes
• Enabling the (social science) researcher:
  • To deposit, search and process heterogeneous data resources
  • To access online services/‘tools’ that enable researchers to carry out repeatable and challenging data management techniques such as: fusion, matching, imputation, …
• Facilitating access is an important goal
• Underlying computer science research themes:
  • Metadata
  • Data curation
  • Data management/processing
  • Portals

4 Data management/processing scenarios
• Curation scenarios include:
  • Uploading occupational data to distribute across the academic community
  • Recording data properties prior to undertaking data fusion involving a survey and an aggregate dataset
• Fusion scenarios include:
  • Linking a micro-social survey with aggregate occupational information (deterministic link)
  • Enhancing a survey dataset with ‘nearest match’ explanatory variables (probabilistic link)
• Other processes: recoding, operationalising, linking, cleaning…
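The two fusion scenarios on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy records and the variable names (`occ`, `occ_score`, `income`) are invented here, and a squared-distance nearest neighbour stands in for whatever matching method the DAMES tools actually use.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def deterministic_link(survey: List[Record],
                       aggregate: Dict[str, Record],
                       key: str) -> List[Record]:
    """Attach aggregate fields to each survey record by exact match on `key`."""
    linked = []
    for row in survey:
        extra = aggregate.get(row[key], {})  # unmatched rows are kept unchanged
        linked.append({**row, **extra})
    return linked


def nearest_match(recipients: List[Record],
                  donors: List[Record],
                  common: List[str],
                  target: str) -> List[Record]:
    """Copy `target` from the donor closest to each recipient on `common` variables."""
    enhanced = []
    for r in recipients:
        closest = min(donors,
                      key=lambda d: sum((r[v] - d[v]) ** 2 for v in common))
        enhanced.append({**r, target: closest[target]})
    return enhanced


# Hypothetical toy data, invented for illustration only.
survey = [
    {"id": 1, "occ": "A", "age": 34, "hours": 38},
    {"id": 2, "occ": "B", "age": 58, "hours": 22},
]
occupation_table = {"A": {"occ_score": 68.0}, "B": {"occ_score": 27.5}}
donors = [
    {"age": 30, "hours": 40, "income": 25000},
    {"age": 60, "hours": 20, "income": 14000},
]

linked = deterministic_link(survey, occupation_table, key="occ")
fused = nearest_match(linked, donors, common=["age", "hours"], target="income")
```

The deterministic link needs an exact shared key, whereas the nearest match only needs variables measured in both data sets. In practice the common variables would usually be rescaled before distances are compared.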

5 Generic data flows
[Diagram: Data set store; Processing]
• Data sets are deposited
• Data sets are selected
• Processing is configured
• Data set selection, and the configuration of processing jobs, must be informed by knowledge about the data sets: metadata
• Result is saved

6 Key role for metadata
• Metadata records are absolutely core to the functioning of the portal infrastructure:
  • For adequate, searchable records for the heterogeneous resources (data tables, command files, notes and documentation)
  • To connect the resources and the data management tools
  • To document the data sets resulting from application of the data management tools: inputs, process, rationale, …
• DAMES requirements:
  • (Micro-)data based, very general
  • DDI (= Data Documentation Initiative)

7 DDI 2 – An XML language
[XML extract; most markup was lost in the transcript. Surviving content:]
An interesting study
12
DAMES Portal
Univ of Stirling
July 29, 2010
<ddi2:grantNo source="Financial_1" agency="Economic and Social Research Council"> RES
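To make the idea of DDI as an XML language concrete, the sketch below builds and re-reads a small DDI-2-style fragment with Python's standard library. It is a sketch under assumptions: the namespace URI is a placeholder rather than the official one, the element nesting (`codeBook`, `stdyDscr`, `citation`, `prodStmt`) follows DDI Codebook naming as far as I know but has not been validated against the schema, and the mapping of the slide's surviving values onto elements is a guess. The grant number is left exactly as truncated on the slide.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration; NOT the official DDI namespace URI.
DDI_NS = "urn:example:ddi2"
ET.register_namespace("ddi2", DDI_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{DDI_NS}}}{tag}"


def build_record(title: str, producer: str, date: str,
                 grant_source: str, grant_agency: str,
                 grant_text: str) -> ET.Element:
    """Build a minimal study-level record (nesting is an assumption)."""
    root = ET.Element(q("codeBook"))
    study = ET.SubElement(root, q("stdyDscr"))
    citation = ET.SubElement(study, q("citation"))
    title_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(title_stmt, q("titl")).text = title
    prod_stmt = ET.SubElement(citation, q("prodStmt"))
    ET.SubElement(prod_stmt, q("producer")).text = producer
    ET.SubElement(prod_stmt, q("prodDate")).text = date
    grant = ET.SubElement(prod_stmt, q("grantNo"),
                          {"source": grant_source, "agency": grant_agency})
    grant.text = grant_text
    return root


record = build_record(
    title="An interesting study",
    producer="Univ of Stirling",
    date="July 29, 2010",
    grant_source="Financial_1",
    grant_agency="Economic and Social Research Council",
    grant_text="RES",  # truncated on the slide; deliberately not completed
)

# Serialise, then parse again and read the grant attributes back.
xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)
grant = parsed.find(f".//{q('grantNo')}")
agency = grant.get("agency")
```

Because the record is ordinary XML, the same fields can be written at deposit time and queried later by any XML-aware tool.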

8 The metadata "cycle"
[Diagram labels: Deposit/curate; Search; Select; Configure/process; Processing; Metadata]
• Data is mirrored by metadata
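The cycle of deposit, search, select and process, with every data set mirrored by a metadata record, might be sketched as below. This is a toy in-memory model, not the DAMES implementation: the class `PortalStore`, its method names and the example data set names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class PortalStore:
    """Toy in-memory stand-in for a data set store plus its metadata store."""
    data: Dict[str, Rows] = field(default_factory=dict)
    metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def deposit(self, name: str, rows: Rows, description: str) -> None:
        """Deposit/curate: store the data and record metadata alongside it."""
        self.data[name] = rows
        self.metadata[name] = {
            "description": description,
            "variables": sorted({key for row in rows for key in row}),
            "derived_from": [],
        }

    def search(self, term: str) -> List[str]:
        """Search: consult only the metadata, never the data itself."""
        term = term.lower()
        hits = []
        for name, meta in self.metadata.items():
            text = " ".join([name, meta["description"],
                             *meta["variables"], *meta["derived_from"]])
            if term in text.lower():
                hits.append(name)
        return sorted(hits)

    def process(self, source: str, result: str,
                step: Callable[[Rows], Rows], rationale: str) -> None:
        """Configure/process: derive a data set and document how it was made."""
        self.deposit(result, step(self.data[source]), rationale)
        self.metadata[result]["derived_from"] = [source]


# Hypothetical example data, invented for illustration only.
store = PortalStore()
store.deposit("shs_extract",
              [{"hh_id": 1, "income": 21000}, {"hh_id": 2, "income": 48000}],
              "Hypothetical household survey extract")
store.process(
    "shs_extract", "shs_banded",
    lambda rows: [{**r, "income_band": "high" if r["income"] > 30000 else "low"}
                  for r in rows],
    "Income recoded into two bands",
)

by_source = store.search("shs_extract")
by_variable = store.search("income_band")
```

Recording `derived_from` at processing time is what lets a later search for the source data also surface the data sets derived from it.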

9 DAMES portal architecture overview
[Diagram labels: Portal; DAMES Resources; External Dataset Repositories; User; Services; Search; Enact Fusion; File Access; Compute Resources; Metadata; Local Datasets]
(Note: Security omitted)

10 Tools
• Since metadata must have a key role in data management…
• …tools for managing and exploiting the metadata have a key role in the use and operation of the DAMES portal:
  • At deposit/curation
  • For searching
  • For informing the configuration of processing steps
• The following slides illustrate use of our tools

11 Curation Tool The source data:

12-23 [Screenshots of the Curation Tool in use; no slide text]

24 Also automatically uploaded to searchable eXist database

25 Metadata searching

26 Browsing the search results

27 Fusion Tool prototype
• Scenario: A social science researcher wishes to fuse Scottish Household Survey (SHS) data with privately collected study data:
  • Uses the data curation tool to upload the data
  • Uses the data fusion/imputation tool to select the data, identify corresponding variables, and to generate a derived dataset (held in the portal)
  • The metadata about this derived dataset is stored and (may be) made public through the portal
  • Another researcher can now search the portal (metadata) for SHS data and find the derived dataset
• DAMES metadata handling must facilitate this process

28 The Fusion Tool prototype
• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"
[Annotation: Metadata accessed]

29
• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"
[Annotation: Metadata accessed]

30
• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"
[Annotations: Skipped; Metadata for result dataset]
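The wizard steps above consult metadata at each selection. One way to picture this is a request object that is checked against the variable lists held in the metadata before it is submitted. The sketch below is hypothetical throughout: `FusionRequest`, `validate`, the method name and the data set and variable names are invented for illustration and are not taken from the DAMES code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class FusionRequest:
    """The choices a user makes in the wizard, in the order they are made."""
    recipient: str
    donor: str
    common_variables: List[str]
    imputed_variables: List[str]
    method: str


def validate(request: FusionRequest,
             variables: Dict[str, Set[str]],
             methods: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the request can be submitted."""
    recipient_vars = variables.get(request.recipient)
    donor_vars = variables.get(request.donor)
    if recipient_vars is None or donor_vars is None:
        return ["unknown data set"]
    problems = []
    for name in request.common_variables:
        if name not in recipient_vars or name not in donor_vars:
            problems.append(f"common variable '{name}' must be in both data sets")
    for name in request.imputed_variables:
        if name not in donor_vars:
            problems.append(f"imputed variable '{name}' is not in the donor")
    if request.method not in methods:
        problems.append(f"unknown method '{request.method}'")
    return problems


# Hypothetical metadata: variable names per data set, as a search would return them.
variables = {
    "shs_extract": {"age", "hours", "region"},
    "private_study": {"age", "hours", "wellbeing"},
}
methods = {"nearest_neighbour"}  # hypothetical method name

good = FusionRequest("shs_extract", "private_study",
                     ["age", "hours"], ["wellbeing"], "nearest_neighbour")
bad = FusionRequest("shs_extract", "private_study",
                    ["age", "region"], ["wellbeing"], "nearest_neighbour")

good_problems = validate(good, variables, methods)
bad_problems = validate(bad, variables, methods)
```

Checking against metadata rather than the data files means the wizard can reject an inconsistent request before any compute resources are used.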

31 Job submission: Information flow
[Diagram components: Wizard; Enactor; Compute resources (Condor); subjob1; subjob2; User's local file store; Further infrastructure]
[Flow labels: notify (job id); fetch job; submit; JFDL/JSDL description.xml; Resultant data; DDI record]

32 Fusion job flow description
• We use a Job Flow Description Language (JFDL) to submit the job to the computing resources pool
• The JFDL job description includes references to:
  • Input data sets
  • Processing steps and their relationships
  • Outputs
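The three ingredients of a job description listed above (inputs, steps with their relationships, outputs) can be illustrated with a plain Python structure. This is not JFDL syntax, which is XML and is not reproduced in this transcript: the dictionary layout, the `after` key and the file names are assumptions made for the sketch. Only the names `subjob1` and `subjob2` come from the information-flow slide. The standard-library `graphlib` module (Python 3.9+) derives a valid execution order from the step relationships.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List

# Hypothetical job flow; the layout and file names are illustrative only.
job_flow: Dict[str, Any] = {
    "name": "fusion-example",
    "inputs": ["recipient.dta", "donor.dta"],
    "steps": {
        "subjob1": {"after": []},           # no prerequisites
        "subjob2": {"after": ["subjob1"]},  # must wait for subjob1
    },
    "outputs": ["fused.dta", "fused.ddi.xml"],
}


def execution_order(flow: Dict[str, Any]) -> List[str]:
    """Order the steps so that each one runs after the steps it depends on."""
    graph = {name: set(step["after"]) for name, step in flow["steps"].items()}
    # static_order() raises graphlib.CycleError if the relationships are circular.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(job_flow)
```

Keeping the relationships declarative lets a scheduler such as Condor decide when each subjob may start, instead of hard-coding the sequence in the submitting tool.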

33 JSDL/JFDL DAMES::Fusion … A brief extract!

34 Technology – other components
• Liferay portal
• eXist
  • XML based database – ideal for storing DDI metadata
• Condor
  • Job management
• iRODS
  • Highly flexible filestore
  • Capable of running automated processes on file upload: e.g. metadata extraction (e.g. STATA files), JFDL → DDI translation, & transfer from file store to metadata store

35 Thank you!