David De Roure Creating Research Objects that contain collections of data, papers and research workflows.

“Web as carrier pigeon”

BioEssays, 26(1):99–105, January

Social Complexity Compute & Data Complexity

Outline
1. The myExperiment experiment
2. Workflow Forever
3. Science fiction about science facts

Data Analysis Pipelines
Workflows are the new rock and roll
Machinery for coordinating the execution of services and linking together resources
Repetitive and mundane boring stuff made easier
E. Science laboris

Reuse, Recycling, Repurposing
Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
Paul meets Jo. Jo is investigating Whipworm in mouse.
Jo reuses one of Paul’s workflows without change.
Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
Previously, a manual two-year study by Jo had failed to do this.

Kepler Triana BPEL Taverna Trident Meandre Galaxy

mySpace for scientists! Facebook Not too open! too passé!

- “Facebook for Scientists”... but different to Facebook!
- A repository of research methods
- A community social network of people and things
- A Social Virtual Research Environment
- A probe into researcher behaviour
- Open source (BSD) Ruby on Rails app
- REST and SPARQL interfaces, supports Linked Data
- Influenced BioCatalogue, MethodBox and SysMO-SEEK
myExperiment currently has 309 groups, 2553 workflows, 651 files and 264 packs - see wiki.myexperiment.org

[Diagram: Paul’s Pack and Paul’s Research Object. Items: Workflow 16, Workflow 13, Results, Logs, Metadata, Paper, Slides, Common pathways, QTL. Relations between items: feeds into, produces, included in, published in.]

data method

SELECT ?pack ?contrib
WHERE {
  ?pack rdf:type mepack:Pack .
  ?pack ore:aggregates ?contrib .
}

SELECT ?wf ?uri
WHERE {
  ?wf mebase:has-current-version ?v .
  ?v mecomp:executes-dataflow ?d .
  ?d mecomp:has-component ?c .
  ?c rdf:type mecomp:WSDLProcessor .
  ?c mecomp:processor-uri ?uri .
}
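To make the first query on this slide concrete, here is a minimal Python sketch of what it asks for: every resource typed as a pack, paired with each item the pack aggregates. It is not the myExperiment implementation. The triples, the identifiers such as `pack:1`, and the `example:Workflow` class are made up for illustration, and a real client would send the SPARQL text to the service's endpoint instead of matching patterns by hand.

```python
# Toy illustration only: all sample identifiers below are hypothetical.
RDF_TYPE = "rdf:type"
ORE_AGGREGATES = "ore:aggregates"
PACK_CLASS = "mepack:Pack"

# A tiny in-memory "graph" of (subject, predicate, object) triples.
TRIPLES = [
    ("pack:1", RDF_TYPE, PACK_CLASS),
    ("pack:1", ORE_AGGREGATES, "workflow:16"),
    ("pack:1", ORE_AGGREGATES, "workflow:13"),
    ("pack:2", RDF_TYPE, PACK_CLASS),
    ("pack:2", ORE_AGGREGATES, "file:7"),
    # Not a pack, so it must not appear in the result.
    ("workflow:16", RDF_TYPE, "example:Workflow"),
]


def packs_and_contributions(triples):
    """Mimic: SELECT ?pack ?contrib WHERE {
                 ?pack rdf:type mepack:Pack .
                 ?pack ore:aggregates ?contrib . }"""
    # First pattern: bind ?pack to every subject typed as a pack.
    packs = {s for (s, p, o) in triples if p == RDF_TYPE and o == PACK_CLASS}
    # Second pattern: join on ?pack and bind ?contrib.
    return sorted(
        (s, o) for (s, p, o) in triples if p == ORE_AGGREGATES and s in packs
    )


if __name__ == "__main__":
    for pack, contrib in packs_and_contributions(TRIPLES):
        print(pack, "aggregates", contrib)
```

The two comprehensions correspond to the two triple patterns in the query; the shared `?pack` variable is what joins them.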

Pack analysis
Analysis by Sean Bechhofer
- Workflow - pack contains a number of workflows
- Presentation - encapsulation of a single presentation
- Collection - a number of things (workflows/presentations/papers)
- Heterogeneous - where the workflows do not appear to have a clear common purpose
- Homogeneous - workflows appear to be designed to work together
- Paper - source for a paper
- Tutorial - tutorial material
- Data - collection of data files
- Derived data - results of workflow
- Benchmark - benchmarking data
- Supplementary - stuff associated with a paper
- Noise - tests, tryouts, rubbish
- Oddity - none of the above

The R dimensions
- Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
- Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
- Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
- Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
- Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
- Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
- Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
- Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
From “Replacing the Paper: The Twelve Rs of the e-Research Record”

[Screenshot: Research Object summary page. Annotated regions: title and basic facts; abstract (250 chars max.); simple status indicators (Last execution, Stability, Decay, Annotations, Evolution); aggregated resources (20), described as "20 items in this RO, including 3 big workflows and a small pack"; key resources inside; resources diagram; popularity and users’ opinion (reused by 4 users, cited by 3 users, liked by 13 users); collapsed tabs.]

Q. Are we locking into the paper process?
Publish then filter – put everything out there, then see what sticks
Web-Particle duality – versioning, conservation, preservation

REPRODUCE OR REPEAT?
[Diagram: machine, software, workflow, paper and the Research Record, linked by "repeat" and "reproduce" relations.]
blogs.scilogs.com/eresearch/

openresearchsoftware.metajnl.com

The Executable Thesis
[Diagram: PhD student, executable thesis, new data, new results.]

Notifications and automatic re-runs
Machines are users too
Autonomic Curation
Self-repair
New research? New computer science?
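One way to picture "notifications and automatic re-runs" is a small watcher that re-executes a workflow whenever its input data changes and tells a subscriber about it. The sketch below is only an illustration under stated assumptions: `WatchedWorkflow`, its `check` method, and the word-counting stand-in workflow are all hypothetical names, not part of myExperiment, Taverna, or Wf4Ever.

```python
import hashlib


class WatchedWorkflow:
    """Hypothetical helper: re-runs a workflow when its input data changes."""

    def __init__(self, run, notify):
        self.run = run          # callable taking bytes, returning a result
        self.notify = notify    # callable taking a message string
        self.last_digest = None
        self.last_result = None

    def check(self, data: bytes) -> bool:
        """Re-run and notify if `data` differs from the last input seen.

        Returns True when a re-run happened, False otherwise.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest == self.last_digest:
            return False  # same input as before, nothing to do
        self.last_digest = digest
        self.last_result = self.run(data)
        self.notify(f"workflow re-ran; new result: {self.last_result}")
        return True


if __name__ == "__main__":
    # Stand-in "workflow": count whitespace-separated tokens in the input.
    watcher = WatchedWorkflow(run=lambda d: len(d.split()), notify=print)
    watcher.check(b"new data arrives")       # first input: runs and notifies
    watcher.check(b"new data arrives")       # unchanged: no re-run
    watcher.check(b"new data arrives again") # changed: runs and notifies
```

Comparing content digests rather than timestamps means a re-run is triggered only when the data actually changes, which keeps notifications meaningful for human and machine subscribers alike.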

Luna De Ferrari

Knowledge Infrastructures
Knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds.
Rethinking knowledge now that the facts aren't the facts, experts are everywhere, and the smartest person in the room is the room

Discussion
Automation versus assistance
– Letting humans get on with what they’re best at
Role of narrative and visualisation
– The last mile to the brain
Data quality and uncertainty
– Data wrangling is a significant task today
– Provenance, peer-to-peer review?
Responsible Innovation
– Who owns the intellectual property?
– Who is responsible for damage?
Enabling or preventing a paradigm shift?
– Encoding a research paradigm in the infrastructure?

Links
myExperiment project wiki
Workflow Forever project (Wf4Ever)
Future of Research Communication (FORCE11)
Fourth Paradigm us/collaboration/fourthparadigm/

Jun Zhao, Jose Manuel Gomez-Perez, Khalid Belhajjame, Graham Klyne, Esteban Garcia-Cuesta, Aleix Garrido, Kristina Hettne, Marco Roos, David De Roure, Carole Goble, "Why Workflows Break - Understanding and Combating Decay in Taverna Workflows", accepted for eScience 2012, Chicago, October 2012

Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia-Cuesta, Jose Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure and Carole A. Goble, "Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse", SePublica2012 at ESWC2012, Greece, May

Carole A. Goble, David De Roure and Sean Bechhofer, "Accelerating scientists’ knowledge turns". In press for publication in Lecture Notes in Computer Science.

Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems

De Roure, D., Goble, C. and Stevens, R. (2009) The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp doi: /j.future

Goble, C.A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., and De Roure, D.: myExperiment: a repository and social network for the sharing of bioinformatics workflows, Nucl. Acids Res., 2010