Scientific Data Management - From the Lab to the Web
Semantic Data Management, Dagstuhl Seminar, 22-27 April 2012
José Manuel Gómez Pérez, iSOCO
www.wf4ever-project.org

Some facts: The data deluge
Source: IDC's The 2011 Digital Universe Study – Extracting Value from Chaos
» In 2010 the size of the digital universe exceeded 1 zettabyte (= 1 trillion GB)
» 1.8 ZB in 2011
» 35 ZB expected in 2020
» 90% unstructured data
» 70% user-generated
» 75% resulting from data copying, merging, and transforming
» Metadata is the fastest growing data category
» Much of such data is dynamic, real-time, volatile
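As a back-of-the-envelope reading of the figures above, the sketch below computes the yearly growth rate implied by going from 1.8 ZB in 2011 to 35 ZB in 2020. It assumes smooth compound growth between the two data points, which the slide does not claim; the function name is illustrative only. Under that assumption the rate comes out at roughly 39% per year.

```python
def implied_annual_growth(start_size, end_size, years):
    """Compound annual growth rate implied by two sizes `years` apart."""
    return (end_size / start_size) ** (1.0 / years) - 1.0


# Figures quoted on the slide, in zettabytes.
size_2011_zb = 1.8
size_2020_zb = 35.0

# Assumption: growth compounds evenly over the nine years in between.
rate = implied_annual_growth(size_2011_zb, size_2020_zb, 2020 - 2011)
print(f"Implied growth: {rate:.0%} per year")
```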

Two main challenges: Dealing with dynamicity
» Challenge 1: Identifying and structuring the relevant portions of the data for the task at hand
  › First-class data citizens
» Challenge 2: Managing the lifecycle of data entities
  › Preservation
  › Evolution and versioning
  › Decay
Both technical and social aspects are involved.

Workflows in the Scientific Method: The Research Lifecycle
Example: Genome-Wide Association Studies
[Figure: research lifecycle diagram. Labels: Background, Hypothesis, Assumptions, Input data, Method, Experiment, Results (data), Scientific Interpretation, Publication]

Workflow-based Science: What is a Scientific Workflow?
» A mechanism for coordinating the execution of services and linking together resources.
» The combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving.
Scientific workflows are at the core of scientific data management
  › Enable automation
  › Encourage best practices
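To make the "configurable, structured set of steps" definition concrete, here is a minimal sketch of a workflow as an ordered list of steps, each reading named inputs and writing one named output. It is not how any particular workflow system works; `Step`, `run_workflow`, and the two example steps are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    """One workflow step: reads named inputs, writes one named output."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., object]


def run_workflow(steps: List[Step], data: Dict[str, object]) -> Dict[str, object]:
    """Execute the steps in order, passing intermediate results by name."""
    results = dict(data)  # copy so the caller's input data is left untouched
    for step in steps:
        args = [results[key] for key in step.inputs]
        results[step.output] = step.run(*args)
    return results


# Hypothetical two-step workflow: drop missing values, then average.
steps = [
    Step("clean", ["raw"], "cleaned", lambda xs: [x for x in xs if x is not None]),
    Step("average", ["cleaned"], "mean", lambda xs: sum(xs) / len(xs)),
]

results = run_workflow(steps, {"raw": [1.0, None, 3.0]})
print(results["mean"])
```

Because the steps are plain data, swapping or reconfiguring one of them does not require touching the runner, which is the sense in which such a workflow is configurable.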

Challenge 1: Identifying and structuring the relevant portions of the data for the task at hand
First-class data citizens

Questions for Scientific Data and Workflows | Issues
Who are you? Where and when were you born? Who were your parents (creators)? | Identity and Description, Authenticity, Uniqueness
For which purpose were you conceived, and how have you been used? | Reuse, Repurpose
What do you have inside? | Inspection, Visualization, Annotations
How is your content linked? | Graphical Representation
May I access all your parts? | Access Rights
Which parts can I replace? | Adaptability
What have they done to you? Who and when? Why did they do that? | Provenance, Versioning
Why have you been recommended to me? Can I believe what you are saying or trust your results? | Information Quality
Do you still produce the same results? | Reproducibility
Are you still working? How could I repair you? | Completeness, Stability
How could I thank you? How could I talk about you? | Credit

Research Objects as Technical Objects
Challenge 1: Identifying and structuring the relevant data
Carriers of Research Context
» Referentiable
» Aggregation, Dispersed
  › Heterogeneous
  › Local and External
» Annotated metadata
  › Provenance
  › Structured: Manifests, Recipes, Permissions, Discourse
» Lifecycle
  › Publishing, Evolution
  › Versioning
» Mixed Stewardship
  › Graceful Degradation
» Sharing
» Security & Privacy
» Stereotypical User Profiles
» Services
[Figure labels: Distributed, Third Party Tenancy, Alien Store, Technical Objects, Social Objects, OAI-ORE]

Research Objects as Social Objects
Package, Explore, Inspect, Review, Exchange, Share, Reuse, Publish, Credit

Research Object model core (simplified)
[Figure: classes ro:ResearchObject, ro:Resource, ro:Manifest, ro:AggregatedAnnotation, wfdesc:Workflow; properties ore:aggregates, ore:isDescribedBy, ro:annotatesAggregatedResource]
Note: This figure shows a simplified view of the RO core.
RO specification:
  › ro (aggregation and annotation)
  › wfdesc (workflow description)
  › Minim* (minimum info model)
  › wfprov (workflow provenance)
  › roprov (RO provenance)
  › roevo (evolution model)
* Minim is based on M. Gamble's MIM
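A small sketch may help show how the classes and properties named on this slide fit together. It writes one Research Object as plain (subject, predicate, object) tuples rather than using an RDF library. The example URIs and file names are hypothetical, and the direction of each property follows common OAI-ORE usage, which is an assumption wherever the flattened figure does not show it.

```python
# Hypothetical base URI for one Research Object.
RO = "http://example.org/ro/gwas-study/"

# Each triple is (subject, predicate, object). Only the ro:, ore: and
# wfdesc: terms come from the slide; all resources are made up.
triples = [
    (RO, "rdf:type", "ro:ResearchObject"),
    (RO, "ore:isDescribedBy", RO + "manifest.rdf"),
    (RO + "manifest.rdf", "rdf:type", "ro:Manifest"),
    (RO, "ore:aggregates", RO + "workflow.t2flow"),
    (RO + "workflow.t2flow", "rdf:type", "wfdesc:Workflow"),
    (RO, "ore:aggregates", RO + "input.csv"),
    (RO + "input.csv", "rdf:type", "ro:Resource"),
    (RO, "ore:aggregates", RO + "annotations/1"),
    (RO + "annotations/1", "rdf:type", "ro:AggregatedAnnotation"),
    (RO + "annotations/1", "ro:annotatesAggregatedResource", RO + "workflow.t2flow"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def dangling_annotation_targets():
    """Annotated resources that the Research Object does not aggregate."""
    aggregated = set(objects(RO, "ore:aggregates"))
    return [
        o for s, p, o in triples
        if p == "ro:annotatesAggregatedResource" and o not in aggregated
    ]


print(objects(RO, "ore:aggregates"))
print(dangling_annotation_targets())
```

The second helper illustrates why aggregation and annotation live in the same model: an annotation only makes sense if its target is part of the aggregation.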

Challenge 2: Managing the lifecycle of data entities
Evolution and Decay

RO Evolution & Versioning
Challenge 2: Managing the lifecycle of data entities

Challenge 2: Managing the lifecycle of data entities
Workflow Decay
» Component level (flux/decay/unavailability)
» Data level
» Infrastructure level
Experiment Decay
» Methodological changes
» New technologies
» New resources/components
» New data
RO Decay

Preservation, Conservation, Recreating
» Preserving: Archived Record, Fixed Snapshots, Review, Rerun & Replay
» Conserving: Active Instrument, Live, Rerun & Reuse, Repair & Restore
» Recreating: Archived Record, Active Instrument, Live, Rebuild, Recycle, Repurpose

Possible types of decay (an example)
Challenge 2: Managing the lifecycle of data entities

A Taxonomy of RO decay: Decay Analysis
1. Service tool is missing
2. Service file descriptor disappeared
3. Service up but not contactable
4. Service up but functionality changed
5. Local software dependencies
6. Data unavailability
7. Changes in data formats
8. Chained dependency
9. Credentials deprecated
10. Input data superseded by other data
11. RO metadata outdated (upon versioning)
12. Old-fashioned RO
13. External references lose credit
14. Execution framework no longer available
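One way to make such a taxonomy usable by monitoring tools is to encode it as an enumeration, so that observed decay can be recorded and counted by type. The sketch below does this with the fourteen entries above. The member names and the `summarize` helper are illustrative labels, not identifiers from the Wf4Ever software.

```python
from collections import Counter
from enum import Enum


class DecayType(Enum):
    """The fourteen decay types, numbered as in the taxonomy above."""
    SERVICE_TOOL_MISSING = 1
    SERVICE_DESCRIPTOR_DISAPPEARED = 2
    SERVICE_NOT_CONTACTABLE = 3
    SERVICE_FUNCTIONALITY_CHANGED = 4
    LOCAL_SOFTWARE_DEPENDENCIES = 5
    DATA_UNAVAILABILITY = 6
    DATA_FORMAT_CHANGED = 7
    CHAINED_DEPENDENCY = 8
    CREDENTIALS_DEPRECATED = 9
    INPUT_DATA_SUPERSEDED = 10
    RO_METADATA_OUTDATED = 11
    OLD_FASHIONED_RO = 12
    EXTERNAL_REFERENCES_LOSE_CREDIT = 13
    EXECUTION_FRAMEWORK_UNAVAILABLE = 14


def summarize(observations):
    """Count how often each decay type was observed."""
    return Counter(observations)


# Hypothetical observations from monitoring a set of Research Objects.
observed = [
    DecayType.DATA_UNAVAILABILITY,
    DecayType.SERVICE_NOT_CONTACTABLE,
    DecayType.DATA_UNAVAILABILITY,
]
summary = summarize(observed)
most_common_type, count = summary.most_common(1)[0]
print(most_common_type.name, count)
```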

Sample decay type
A taxonomy of workflow decay

Certificate – Evaluation of Stability and Completeness
Decay Analysis
Stability: Is the RO free from any form of decay preventing workflow execution?
» Focus on reproducibility
» Assisted detection of RO decay
» Active monitoring on decay forms
» RO and workflow provenance
Completeness: Is the minimal aggregation of resources encapsulated by the RO consistent?
» RO checklists
» Produced by scientists
» Automatically checked against minimal model (minim)
» RO evolution
Certificate of quality
» Notification
» Explanation
[Figure labels: Stability 1.0, Completeness 1.0]
Certificate notion originally proposed by Yde de Jong
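The idea of automatically checking an RO against a checklist can be sketched as follows. This is not the Minim model itself: the checklist format, the scoring rule (fraction of requirements met), and all file names are assumptions made for illustration. The `missing` list corresponds loosely to the "Explanation" part of the certificate.

```python
def check_completeness(aggregated, checklist):
    """Return (score, missing) for a set of aggregated resource names.

    `checklist` is a list of (label, predicate) pairs; the score is the
    fraction of requirements met, so 1.0 means fully complete.
    """
    missing = [label for label, is_met in checklist if not is_met(aggregated)]
    score = 1.0 - len(missing) / len(checklist)
    return score, missing


# Hypothetical checklist, as a scientist might write it for one community.
checklist = [
    ("has a workflow definition", lambda files: any(f.endswith(".t2flow") for f in files)),
    ("has input data", lambda files: any(f.startswith("inputs/") for f in files)),
    ("has a hypothesis document", lambda files: "hypothesis.txt" in files),
    ("has recorded results", lambda files: any(f.startswith("results/") for f in files)),
]

# Hypothetical content of one Research Object.
aggregated = {"workflow.t2flow", "inputs/genotypes.csv", "hypothesis.txt"}

score, missing = check_completeness(aggregated, checklist)
print(score, missing)
```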

Lessons learnt: Recap
» Data with a Purpose
» Encapsulate & Conquer
  › Goal-driven (purpose)
  › Aggregation
  › Community-managed
» Nothing is immutable, especially data.
  › Foster evolution
  › Monitor decay
Scalability
Provenance

Questions
Thanks for your attention! Any questions?