myExperiment: Towards Research Objects David De Roure

Presentation transcript:

myExperiment: Towards Research Objects David De Roure Building Linked Web Communities in Biomedicine to Accelerate Research

What is it? How it’s being used How we built it Towards the e-Laboratory

The social process of Science 2.0 (diagram): scientists, graduate students and undergraduate students; experimentation; data, metadata, provenance, workflows and ontologies; local Web repositories; preprints and metadata; technical reports; reprints; peer-reviewed journal and conference papers; certified experimental results and analyses; digital libraries; virtual learning environment.

Sharing pieces of process: http://usefulchem.wikispaces.com/page/code/EXPLAN001, http://www.microsoft.com/mscorp/tc/trident.mspx, http://www.mygrid.org.uk/tools/taverna/. Not just collaboration in workflows, but collaborating by sharing workflows. Over 400 Taverna workflows are publicly available. Combine different formalisms in one system? E.g. a dataflow Kahn network and a central-clock based calculus.

E. Science laboris. Workflows are the new rock and roll: machinery for coordinating the execution of (scientific) services and linking together (scientific) resources. The era of Service Oriented Applications. Repetitive and mundane boring stuff made easier.

Triana, Trident, Kepler, Taverna, Ptolemy II, BioExtract, BPEL.

Reuse, Recycling, Repurposing. Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle. Paul meets Jo. Jo is investigating Whipworm in mouse. Jo reuses one of Paul’s workflows without change. Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. Previously, a manual two-year study by Jo had failed to do this.

“Facebook for Scientists” ...but different to Facebook! A repository of research methods; a community social network; a Virtual Research Environment. Open source (BSD) Ruby on Rails application with HTML, REST and SPARQL interfaces. Project started March 2007; closed beta since July 2007; open beta November 2007. myExperiment currently has 1712 registered users, 141 groups, 584 Taverna workflows plus 81 others, and 51 packs. Go to www.myexperiment.org to access publicly available content or create an account.

myExperiment Features: User Profiles, Groups, Friends, Sharing, Tags, Workflows, Developer interface. Distinctives: Credits and Attributions, Fine control over privacy, Packs, Federation, Enactment.

Control over sharing: the most important aspect of myExperiment. Designed by scientists.

A Pack (diagram): Workflow 16 and Workflow 13 bundled with their logs, results (QTL, common pathways), metadata, slides and paper.

For Developers: All the myExperiment services are accessible through simple RESTful programming interfaces, so you can use your existing environment and augment it with myExperiment functionality, build entirely new interfaces and functionality, or create mashups. The Ruby on Rails codebase is open source (BSD), so you can run your own myExperiment, perhaps for your own lab or to develop new functionality. Go to wiki.myexperiment.org for information about our Developer Community.
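To make the REST style concrete, here is a minimal Python sketch of a client that builds a listing URL and parses an XML response. The host comes from the slides, but the path `workflows.xml` and the element and attribute names (`workflows`, `workflow`, `uri`) are assumptions for illustration, not the documented myExperiment API; the parser runs on an inline sample so no network access is needed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urljoin

# Host taken from the slides; the listing path is an assumption (check the developer wiki).
BASE_URL = "http://www.myexperiment.org/"
LISTING_PATH = "workflows.xml"


def listing_url(base: str = BASE_URL, path: str = LISTING_PATH) -> str:
    """URL a client would GET to list workflows (path is hypothetical)."""
    return urljoin(base, path)


def parse_workflow_listing(xml_text: str) -> List[Tuple[str, str]]:
    """Return (uri, title) pairs from an assumed <workflows><workflow uri="...">Title</workflow> document."""
    root = ET.fromstring(xml_text)
    # Assumed layout: one <workflow> child per item, URI as an attribute, title as element text.
    return [(wf.get("uri", ""), (wf.text or "").strip()) for wf in root.findall("workflow")]


if __name__ == "__main__":
    sample = (
        "<workflows>"
        '<workflow uri="http://example.org/workflows/1">Example workflow A</workflow>'
        '<workflow uri="http://example.org/workflows/2">Example workflow B</workflow>'
        "</workflows>"
    )
    print(listing_url())
    print(parse_workflow_listing(sample))
```

Fetching a live response is deliberately left out: the service's real URLs and XML layout should be confirmed first, after which the inline sample can be replaced by the body of an HTTP GET.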

What is it? How it’s being used How we built it Towards the e-Laboratory

Adam Belloum

SigWin-detector is a grid-enabled workflow application that takes a sequence of numbers and a series of window sizes as input and detects all significant windows for each window size using a moving median false discovery rate (mmFDR) procedure. Figures: the WS-VLAM composer; the human transcriptome map with discovered RIDGEs; DNA curvature of the Escherichia coli chromosome. More details: http://staff.science.uva.nl/~inda/SigWin-detector.html
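The general idea of window-based detection can be illustrated with a simplified Python sketch: compute the moving median for each window size, build a null distribution by shuffling the sequence, and pick the lowest median threshold whose estimated false discovery rate stays at or below `q`. This permutation-based estimate is a stand-in chosen for illustration. It is not the published mmFDR procedure or the SigWin-detector code, and the function names and defaults (`q=0.05`, `n_perm=50`) are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence


def moving_medians(seq: Sequence[float], w: int) -> List[float]:
    """Median of every contiguous window of length w, indexed by window start."""
    return [statistics.median(seq[i:i + w]) for i in range(len(seq) - w + 1)]


def significant_windows(seq: Sequence[float], window_sizes: Sequence[int],
                        q: float = 0.05, n_perm: int = 50,
                        seed: int = 0) -> Dict[int, List[int]]:
    """Simplified permutation-based FDR over moving medians (illustration only)."""
    rng = random.Random(seed)
    result: Dict[int, List[int]] = {}
    for w in window_sizes:
        observed = moving_medians(seq, w)
        # Null model: moving medians of shuffled copies of the sequence.
        null_runs = []
        for _ in range(n_perm):
            shuffled = list(seq)
            rng.shuffle(shuffled)
            null_runs.append(moving_medians(shuffled, w))
        best = None
        # Try thresholds from high to low; the lowest one passing the FDR bound wins.
        for t in sorted(set(observed), reverse=True):
            discoveries = sum(m >= t for m in observed)  # >= 1, since t is an observed median
            expected_false = sum(sum(m >= t for m in run) for run in null_runs) / n_perm
            if expected_false / discoveries <= q:
                best = t
        result[w] = [] if best is None else [
            i for i, m in enumerate(observed) if m >= best]
    return result


if __name__ == "__main__":
    # Hypothetical input: a flat signal with one elevated stretch.
    signal = [0.0] * 200
    for i in range(90, 110):
        signal[i] = 10.0
    print(significant_windows(signal, [9]))
```

Shuffling preserves the value distribution while destroying position, which is what makes it a reasonable null model for "is this stretch unusually high?"; a production tool would need a more careful FDR estimate.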

Carol Lushbough

Google Gadgets Bringing myExperiment to the iGoogle user

Taverna Plugin Bringing myExperiment to the Taverna user

Facebook

Scientists do share! Consumers > Curators > Producers. Of the 661 workflows, 531 are publicly visible, whereas 502 are publicly downloadable. Of the workflows with restricted access, 3% are entirely private to the contributor; for the remainder, contributors elected to share with individual users and groups. 69 workflows (over 10%) have been shared with the owner granting edit permissions to specific users and groups. In addition, there are 52 instances where users have noted that a workflow is based on another workflow on the site. The most viewed workflow has 1566 views. There are 50 packs, ranging from tutorial examples to bundles of materials relating to specific experiments.

Analysis. Two distinct myExperiment communities: Supermarket shoppers: workflow consumers prefer larger workflows ready to be downloaded and enacted. Tool builders: workflow authors prefer smaller, modularized workflows which can be assembled and customized. Considerations in collaborative curation: quality and sufficiency of good documentation; content decay surveillance; consumers > curators > producers; contributor, expert and community curation; incentives for curation.

What is it? How it’s being used How we built it Towards the e-Laboratory

Architecture (diagram): a mySQL store holding tags, ratings, reviews, profiles, groups, workflows, credits, friendships, packs and files; repositories and stores (EPrints, DSpace, Fedora, S3, SRB); a Managed REST API (XML) for developers and for Facebook, iGoogle and Android clients; an HTML interface; an RDF store (ORE, FOAF, SIOC) with a SPARQL endpoint; a search engine with a Search API; and an enactor with an Enactor API.

Semantically-Interlinked Online Communities (SIOC). SPARQL endpoint query for all accepted Friendships, including the accepted-at time:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    select ?friend1 ?friend2 ?acceptedat
    where {
      ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
      ?z myexp:has-requester ?x .
      ?x sioc:name ?friend1 .
      ?z myexp:has-accepter ?y .
      ?y sioc:name ?friend2 .
      ?z myexp:accepted-at ?acceptedat
    }
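Because the endpoint speaks standard SPARQL, any HTTP client can run the friendship query. The Python sketch below builds a SPARQL 1.1 Protocol GET request and flattens a SPARQL 1.1 JSON results document into rows. The endpoint URL is an assumption (the slide names no address), and the query is the one on the slide with the `Friendship` class written as a prefixed name.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

# Assumed endpoint location; the slide only says that a SPARQL endpoint exists.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

FRIENDSHIP_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?friend1 ?friend2 ?acceptedat
WHERE {
  ?z rdf:type myexp:Friendship .
  ?z myexp:has-requester ?x . ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y . ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}
"""


def query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def fetch(endpoint: str, query: str, timeout: float = 30.0) -> str:
    """Run the query over HTTP, asking for JSON results (needs network access)."""
    req = urllib.request.Request(
        query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def rows(results_json: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into {variable: value} dicts."""
    doc = json.loads(results_json)
    return [{var: binding[var]["value"] for var in binding}
            for binding in doc["results"]["bindings"]]
```

If the endpoint is reachable, `rows(fetch(ENDPOINT, FRIENDSHIP_QUERY))` would return one dict per accepted friendship; variables left unbound in a solution are simply absent from that row's dict.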

http://rdf.myexperiment.org/Aggregation/Pack/56

Exporting packs
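Assuming an OAI-ORE based export, as the ORE label in the architecture suggests, a pack such as http://rdf.myexperiment.org/Aggregation/Pack/56 maps onto a Resource Map that describes an Aggregation, which in turn aggregates the pack's items. The Python sketch below emits that structure as N-Triples using the ORE terms `ResourceMap`, `Aggregation`, `describes`, `isDescribedBy` and `aggregates`. The resource-map URI and item URIs in the usage are hypothetical, and myExperiment's actual export may carry further metadata.

```python
from typing import Iterable, List

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s: str, p: str, o: str) -> str:
    """One N-Triples line whose subject, predicate and object are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def pack_resource_map(rem_uri: str, aggregation_uri: str,
                      items: Iterable[str]) -> List[str]:
    """Describe a pack as an ORE Aggregation described by a Resource Map."""
    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        triple(aggregation_uri, ORE + "isDescribedBy", rem_uri),
    ]
    # Each pack entry becomes an aggregated resource.
    lines += [triple(aggregation_uri, ORE + "aggregates", item) for item in items]
    return lines


if __name__ == "__main__":
    for line in pack_resource_map(
        "http://example.org/rem/pack56",  # hypothetical resource map URI
        "http://rdf.myexperiment.org/Aggregation/Pack/56",
        ["http://example.org/workflow16", "http://example.org/paper"],  # hypothetical items
    ):
        print(line)
```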

Scientific Discourse Relationships Ontology Specification; Open Provenance Model; Communications of the ACM 51, 4 (Apr. 2008), 52-58.

Phase 2: repository integration (institutional: EPrints, Fedora); controlled vocabularies; relationships between items (in and between packs); recommendations; improved search ranking and faceted browsing; indexing of packs; new contribution types (Meandre, Kepler, e-books); further blog/wiki integration; BioCatalogue integration.

Content Capture and Curation: Reuse and Symbiosis. (Diagram: workflows and services at the centre, seeded, refined and validated by four kinds of curation: self-curation by service providers, expert curation, social curation by the user community, and automated curation.) In particular, a platform for research into curation practices, as in the panel today. Expert curation is library-like; suppliers and the crowd are the Web side. Automated is
Expert curators: bioinformaticians who understand the services and workflows, whose job it is to annotate and set up the curation pipelines for services and workflows that are not of their own making.
Self-curation: Some registries are closed; the myGrid registry is only curated by experts from the myGrid project itself. Others encourage service developers to self-curate, emphasising the use of plug-ins to service development environments such as Eclipse; examples include BioMoby’s jMoby plugin, the SAWSDL4J, Lumina and Radiant toolkits for SAWSDL, and WSMO Studio (21). Workflow repositories such as myExperiment rely on self-curation by the workflow developers and community curation by their users. Challenges include (a) the enforcement of controlled vocabularies by self-curators, particularly if the vocabularies are also managed by the developers, as they can quickly become unruly, and (b) incentivising people to contribute their services and workflows for the good of the community.
Community curators: The trend is to follow in the footsteps of popular Web 2.0 social computing sites and encourage community curation through user feedback, blogging, e-tracking, recommendations and folksonomy-based tagging. A community approach to service development and use is being tried by Seekda and BioMoby, and for workflows by myExperiment. Community and self-curation require built-in incentive models for people to contribute, such as credit and attribution, but can be made to work: for example, iCapture successfully pioneered community curation of ontologies (Wilkinson PSB).
Automated curators: Automated scavengers and crawlers identify candidates for submission and extract as much metadata as possible. Functional metadata is hard to auto-curate, requiring specialist metadata extraction tools [54], software plug-ins that incidentally gather metadata from services as they are used in applications, or smart reasoning over seeded service descriptions and workflows [54]. Operational and usage metadata is ripe for automation, generated from monitoring services, application diagnostics, customer reports and Social Network Analysis. Workflow analytics is the term used for processing workflow collections to identify, for example, service co-use patterns and service popularity. Automated curation needs excellent infrastructure.
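Workflow analytics of the kind described above can be sketched in a few lines of Python: given which services each workflow uses, count service popularity (how many workflows use each service) and co-use (how many workflows use each pair of services together). The input format and service names are hypothetical; a real analysis would first extract service references from the workflow definitions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable


def service_popularity(workflows: Dict[str, Iterable[str]]) -> Counter:
    """Number of workflows that use each service (repeats within a workflow count once)."""
    counts: Counter = Counter()
    for services in workflows.values():
        counts.update(set(services))
    return counts


def co_use(workflows: Dict[str, Iterable[str]]) -> Counter:
    """Number of workflows that use each unordered pair of services together."""
    pairs: Counter = Counter()
    for services in workflows.values():
        # Sorting gives each pair a canonical (a, b) order with a < b.
        pairs.update(combinations(sorted(set(services)), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical collection: workflow id -> services it invokes.
    collection = {
        "wf1": ["blast", "kegg", "blast"],
        "wf2": ["kegg", "entrez"],
        "wf3": ["blast", "kegg", "entrez"],
    }
    print(service_popularity(collection).most_common())
    print(co_use(collection).most_common())
```

Counting each service once per workflow keeps popularity from being inflated by workflows that call the same service repeatedly.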

Six Principles of Software Design to Empower Scientists: Fit in, Don’t Force Change; Jam Today and More Jam Tomorrow; Just in Time and Just Enough; Act Local, Think Global; Enable Users to Add Value; Design for Network Effects. Keep your Friends Close; Embed; Keep Sight of the Bigger Picture; Favours will be in your Favour; Know your Users; Expect and Anticipate Change. De Roure, D. and Goble, C. "Software Design for Empowering Scientists," IEEE Software, vol. 26, no. 1, pp. 88-95, January/February 2009.

What is it? How it’s being used How we built it Towards the e-Laboratory

e-Laboratory Lifecycle. Local projects using Taverna and/or myExperiment: SysMO, Ondex, NEMA, Obesity eLab, Shared Genomics, CombeChem, LifeGuide, IBBRE.

What is an e-Laboratory? A laboratory is a facility that provides controlled conditions in which scientific research, experiments and measurements may be performed, offering a work space for researchers. An e-Laboratory is a set of integrated components that, used together, form a distributed and collaborative space for e-Science, enabling the planning and execution of in silico experiments: processes that combine data with computational activities to yield experimental results.

e-Labs. An e-Lab consists of: a community; work objects; generic resources for building and transforming work objects. Sharing infrastructure and content across projects: people, data, methods.

e-Labs + Research Objects. An e-Lab is built from a collection of services, consuming and producing Research Objects. (Diagram: a Workbench / RO-driven UI and RO-aware services, e.g. visualisation, notification and annotation, connected by an RO bus.)
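One way to picture the RO bus is as publish/subscribe: RO-aware services subscribe to kinds of Research Object and may produce new ones, which go back onto the bus. The Python sketch below is a toy, in-process model under that assumption; the class names, RO kinds and example services are hypothetical and say nothing about how an e-Lab bus is actually built.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ResearchObject:
    uri: str
    kind: str  # hypothetical kinds, e.g. "workflow-run", "annotation"
    content: Dict[str, str] = field(default_factory=dict)


# A service handler consumes one RO and returns any ROs it produces.
Handler = Callable[[ResearchObject], List[ResearchObject]]


class ROBus:
    """Toy in-process publish/subscribe bus for RO-aware services."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # URIs in the order they were published

    def subscribe(self, kind: str, handler: Handler) -> None:
        self._subscribers[kind].append(handler)

    def publish(self, ro: ResearchObject) -> None:
        self.log.append(ro.uri)
        # Deliver to each subscribed service and republish whatever it produces.
        # Note: a service that emits the same kind it consumes would recurse forever.
        for handler in list(self._subscribers[ro.kind]):
            for produced in handler(ro):
                self.publish(produced)


if __name__ == "__main__":
    bus = ROBus()

    def annotation_service(ro: ResearchObject) -> List[ResearchObject]:
        return [ResearchObject(ro.uri + "#annotation", "annotation", {"about": ro.uri})]

    def notification_service(ro: ResearchObject) -> List[ResearchObject]:
        print("new annotation about", ro.content["about"])
        return []

    bus.subscribe("workflow-run", annotation_service)
    bus.subscribe("annotation", notification_service)
    bus.publish(ResearchObject("urn:example:run1", "workflow-run"))
    print(bus.log)
```

The appeal of this shape is that services stay decoupled: adding a visualisation service means one more `subscribe` call, not changes to the services already on the bus.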

e-Laboratory Evolution
1st Generation: Current practice of early adopters of e-Lab tools such as Taverna. Characterised by researchers using tools within their particular problem area, with some reuse of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data. Provenance is recorded but not shared and reused. Science is accelerated and practice is beginning to shift to emphasise in silico work.
2nd Generation: Designing and delivering now, e.g. Obesity e-Lab, building on experience with Taverna and myExperiment and on our research results arising from these activities. The key characteristic is reuse of the increasing pool of tools, data and methods across areas and disciplines. Contains some freestanding, recombinant, reproducible Research Objects. Provenance analytics plays a role. New scientific practices are established and opportunities arise for completely new scientific investigations.
3rd Generation: The vision, the e-Labs we'll be delivering in 5 years, illustrated by open science. Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. The key characteristic is radical sharing. Research is significantly data-driven, plundering the backlog of data, results and methods. Increasing automation and decision support for the researcher: the e-Laboratory becomes assistive. Provenance assists design. Curation is autonomic and social.

Assembling e-Laboratories. Example core services: Workflow Monitoring, Event Logging, Social Metadata Annotation Service, Search and Ranking, User Registration, Distributed Data Query, Job Execution, Naming and Identity, Anonymisation, Text Mining, Research Object Management, Probity, Coreference Resolution. An e-Lab is a set of components and resources: an open system, not a software monolith. The utility of components transcends their immediate application. We envisage an ecosystem of cooperating e-Laboratories. What are the e-Lab components and services? What are the Research Objects?

Pack as a graph (diagram, Paul Fisher): Workflow 16, Workflow 13, QTL results, common pathways results, logs, metadata, slides and paper, linked by typed relationships such as produces, included in, published in and feeds into.

David Shotton

Anatomy of a Research Object

SWAN-SIOC Experiments myExperiment Tim Clark

Characteristics of a Research Object. Composite: contain typed interrelationships and dependencies between resources, but are in turn labelled and identifiable as an individual resource. Distributed: structured collections of references to locally managed and externally located resources, with implications for reliability, consistency, mixed stewardship, versioning and identity resolution. Annotated: carry metadata concerning provenance profile, lifecycle profile, sharing profile (permissions, licensing, downloads, views), curation profile (tags, comments, ratings) and usage profile. Repeatable: capture information about the lifecycle of the investigation, enabling experiments to be repeatable (without change), reusable (with reconfiguration), replayable and/or repurposable (as new components or templates). Interoperable: publishable and exchangeable units that facilitate interoperability; OAI-ORE standards increase interoperability and facilitate the consumption of Research Objects between applications.
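The first three characteristics (composite, distributed, annotated) can be captured in a small data model. The Python sketch below is a hypothetical illustration, not a Research Object specification: it aggregates local and external resources, allows typed relationships only between aggregated resources, stores profile annotations per resource, and follows relationships to find everything downstream of a resource. The relationship names in the usage are loosely modelled on the pack diagram and are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class ResearchObject:
    """Hypothetical sketch of a composite, distributed, annotated Research Object."""
    identifier: str
    local: Set[str] = field(default_factory=set)     # resources managed with the RO
    external: Set[str] = field(default_factory=set)  # references to remotely held resources
    relations: List[Relation] = field(default_factory=list)
    annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def members(self) -> Set[str]:
        return self.local | self.external

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Typed relationships may only link resources this RO aggregates.
        for r in (subject, obj):
            if r not in self.members():
                raise ValueError(f"{r} is not part of {self.identifier}")
        self.relations.append((subject, predicate, obj))

    def annotate(self, resource: str, profile: str, value: str) -> None:
        # e.g. profile "sharing" -> "public", profile "provenance" -> "run 42"
        self.annotations.setdefault(resource, {})[profile] = value

    def downstream(self, start: str) -> Set[str]:
        """Resources reachable from `start` by following relations forward."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for s, _, o in self.relations:
                if s == node and o not in seen:
                    seen.add(o)
                    stack.append(o)
        return seen
```

For example, with `workflow16 produces qtl-results`, `qtl-results feedsInto workflow13` and `workflow13 produces pathway-results`, calling `downstream("workflow16")` collects every resource that depends on Workflow 16, which is the kind of question a repeatability check would ask.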

Thoughts. myExperiment provides social infrastructure: it facilitates sharing and enables scientists to “collaborate in order to compete”. myExperiment has a growing community and growing content. New content types: Meandre, Kepler, R, MATLAB, ..., spreadsheets? SPARQL queries? We are targeting how we believe research will be conducted in the future, through the assembly of e-Laboratories which share Research Objects. The SPARQL endpoint is an effective alternative to the API: it provides any service you want! Workflows for Semantic Web scripting?

Contact: David De Roure dder@ecs.soton.ac.uk, Carole Goble carole.goble@manchester.ac.uk. Slide credits: Simon Coles, Paul Fisher, Adam Belloum, Sean Bechhofer, David Shotton.