MyGrid and the Semantic Web Phillip Lord School of Computer Science University of Manchester.

Presentation transcript:

myGrid and the Semantic Web Phillip Lord School of Computer Science University of Manchester

myGrid: eScience and Bioinformatics. UK e-Science Pilot Project, Oct 2001 – April. £3.4 million, plus £0.4 million for studentships. Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield.

Data (Type) Intensive Bioinformatics
ID MURA_BACSU STANDARD; PRT; 429 AA.
DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE ENOLPYRUVYL TRANSFERASE) (EPT).
GN MURA OR MURZ.
OS BACILLUS SUBTILIS.
OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC BACILLUS.
KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT ACT_SITE BINDS PEP (BY SIMILARITY).
FT CONFLICT S -> A (IN REF. 3).
SQ SEQUENCE 429 AA; MW; 02018C5C CRC32;
MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

[Figure: myGrid architecture, built on a Web Service (Grid Service) communication fabric. Roles and tiers labelled: Bioinformaticians, Tool Providers, Service Providers, Applications, Core services, External services, Views. Components labelled: AMBIT Text Extraction Service, Provenance, Personalisation, Event Notification, Gateway, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, Workbench, Taverna, Talisman, Native Web Services, SoapLab, GowLab, Web Portal, Legacy apps, Registries, Ontologies, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor.]

Support not Automation

Thin Semantics
PRETTYSEQ of CDS1|>CDS2|strand_1 from 1 to
| | | | | |
1 atgacggacactgctggtcgctgtggcttcctcctacgcgttcggtcactcctgcacatg 60
1 M T D T A G R C G F L L R V R S L L H M
| | | | | |
61 tccgcagtagtggtgctctcggggaccccctcgccaccccacaataccgctcaccacatg
S A V V V L S G T P S P P H N T A H H M
gccaaacag
A K Q 43
CPGREPORT of CDS1|>CDS2|strand_1 from 1 to 129
Sequence Begin End Score CpG %CG CG/GC
CDS1|>CDS2|strand_
########################################
# Program: restrict
# Rundate: Thu Jul 15 16:32:
# Report_format: table
# Report_file: /scratch/emboss_interfaces/a/unknown/Projects/default/Data/out
########################################
Start End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
4 8 TspGWI ACGGA
TspRI CASTGNN
BtsI GCAGTG
CviJI RGCY
MnlI CCTC
MluI ACGCGT
#

Semantic Discovery with Feta: Query-ontology – discovering workflows and services described in the registry by building a query in Taverna. A common ontology is used both to annotate and to query. (Planning for OBO release.)

Knowledge in Feta: Ontology (OWL-DL), Service Descriptions (XML), Jena Querying (RDF).

Service Discovery GOOD: RDF provides a convenient search capability, with a well-defined link to an ontology. BAD: Unsure about scalability. Issues of security and concurrency will probably also affect us.
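To make the "link to an ontology" concrete, here is a minimal sketch of ontology-aware discovery over RDF-style triples. It is not Feta's implementation: the class hierarchy, the service names and the `performsTask` property are all hypothetical, and a plain Python list stands in for the Jena store. The point it illustrates is that a query for a general task also finds services annotated with more specific subclasses.

```python
# Hypothetical toy ontology: child class -> parent class.
# These names are illustrative only, not the real myGrid ontology.
SUBCLASS_OF = {
    "BLAST": "SequenceAlignment",
    "SmithWaterman": "SequenceAlignment",
    "SequenceAlignment": "BioinformaticsTask",
    "RestrictionMapping": "BioinformaticsTask",
}

# Hypothetical service annotations as (subject, predicate, object) triples.
TRIPLES = [
    ("blastService", "performsTask", "BLAST"),
    ("swService", "performsTask", "SmithWaterman"),
    ("restrictService", "performsTask", "RestrictionMapping"),
]


def is_subclass(cls, ancestor):
    """Return True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def find_services(task):
    """Return services annotated with the task or any of its subclasses."""
    return sorted(
        subject
        for (subject, predicate, obj) in TRIPLES
        if predicate == "performsTask" and is_subclass(obj, task)
    )


print(find_services("SequenceAlignment"))  # ['blastService', 'swService']
```

The linear scan over every triple is also a small illustration of the scalability worry: without indexing, query cost grows with the size of the registry.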

Provenance: Bioinformatics has a data circularity problem. Computational data is hard to trace, reproduce or repeat. We need to store provenance. Service Orientated Architecture and Service Descriptions start to enable us to do this.

Provenance: The Semantic Web

Generating Provenance: Web Services, Taverna, FreeFluo, Metadata Repository (reified), Data Repository, LaunchPad, Haystack.

[Figure: provenance model, distinguishing organisation level provenance from process level provenance. Entities: Workflow run, Workflow design, Experiment design, Project, Person, Organisation, Process, Service, Event, Data item. Links: data derivation (e.g. output data derived from input data), instanceOf, partOf, componentProcess (e.g. web service invocation of NCBI), componentEvent (e.g. completion of a web service invocation at 12.04pm), runBy (e.g. NCBI), run for.] User can add templates to each workflow process to determine links between data items.
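As an illustration of what the data derivation links make possible, the following sketch walks "derived from" links backwards to recover everything a data item depends on. The item names and the dictionary layout are hypothetical; in myGrid these links would be held as metadata, not in a Python dict.

```python
# Hypothetical derivation links: data item -> items it was derived from.
DERIVED_FROM = {
    "report": ["alignment"],
    "alignment": ["query_sequence", "database_snapshot"],
}


def lineage(item):
    """Return every item that `item` was directly or indirectly derived from."""
    seen = []
    stack = list(DERIVED_FROM.get(item, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Follow the derivation links of the ancestor as well.
            stack.extend(DERIVED_FROM.get(current, []))
    return sorted(seen)


print(lineage("report"))
```

Here the lineage of `report` contains `alignment` plus the two inputs the alignment was computed from, which is the kind of trace needed to repeat a computation.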

Provenance GOOD: RDF provides a convenient data model, which is flexible and adaptable. BAD: Visualisation tools are lacking. Scalability is even more of an issue with reification.
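The cost of reification can be seen in a small sketch. In standard RDF reification, one statement is described by four triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), and any provenance about that statement is attached on top. The identifiers and the `ex:` properties below are hypothetical.

```python
def reify(statement_id, triple, annotations):
    """Expand one triple into its RDF reification plus annotation triples."""
    subject, predicate, obj = triple
    reified = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    # Provenance about the statement itself hangs off the statement node.
    reified.extend((statement_id, key, value) for key, value in annotations.items())
    return reified


# Hypothetical example: one derivation link with two pieces of provenance.
plain = ("data:output1", "ex:derivedFrom", "data:input1")
expanded = reify(
    "stmt:1",
    plain,
    {"ex:assertedBy": "run:42", "ex:assertedAt": "12:04"},
)
print(len(expanded))
```

A single plain triple becomes six stored triples in this example, so a store that was already a scalability concern grows several times larger once statements are reified.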

LSIDs: A standard identifier mechanism, aimed at the life sciences. Has a standard resolution mechanism by which the data can be obtained. Has semantics for versioning. Has a standard association with metadata. Abbreviation distressingly similar to LSD.
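An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision that carries the versioning semantics. The sketch below splits one into its parts; the `LSID` tuple, the `parse_lsid` helper and the example identifier are illustrative assumptions, not part of any myGrid or LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID URN into authority, namespace, object and revision."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected urn:lsid:authority:namespace:object[:revision]")
    # The urn:lsid prefix is case-insensitive.
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError("not an LSID: " + text)
    if any(not p for p in parts[2:5]):
        raise ValueError("authority, namespace and object must be non-empty")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier; example.org is a placeholder authority.
print(parse_lsid("urn:lsid:example.org:proteins:MURA_BACSU:2"))
```

Resolution itself (finding the authority's service and fetching data or metadata) is a separate step that this sketch does not attempt.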

Provenance: We used LSIDs within provenance; all of our data is stored and resolved with LSIDs. The notion of a single identifier system within myGrid is attractive.

Worries: We are unclear as to how the metadata/data split should happen with LSID (use the former for mutability, the latter for immutability). We have also been tending toward using “metadata” for RDF-based data, and “data” for relational.
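One reading of the split, sketched under stated assumptions: the bytes behind an LSID never change, while the metadata attached to it may be updated freely. The `LsidStore` class and its method names are hypothetical and only illustrate the mutable/immutable distinction; they do not describe how myGrid stores anything.

```python
class LsidStore:
    """Toy store: data is write-once per LSID, metadata is freely updatable."""

    def __init__(self):
        self._data = {}
        self._metadata = {}

    def put_data(self, lsid, payload):
        existing = self._data.get(lsid)
        if existing is not None and existing != payload:
            # Changed content needs a new identifier (for example a new revision).
            raise ValueError("data behind an LSID must not change")
        self._data[lsid] = payload

    def get_data(self, lsid):
        return self._data[lsid]

    def set_metadata(self, lsid, key, value):
        # Metadata is the mutable side: later values overwrite earlier ones.
        self._metadata.setdefault(lsid, {})[key] = value

    def get_metadata(self, lsid):
        return dict(self._metadata.get(lsid, {}))


store = LsidStore()
lsid = "urn:lsid:example.org:proteins:MURA_BACSU:1"  # hypothetical identifier
store.put_data(lsid, b"MEKLNIAGGD")
store.set_metadata(lsid, "curationStatus", "draft")
store.set_metadata(lsid, "curationStatus", "reviewed")
print(store.get_metadata(lsid))
```

The sketch leaves the slide's open question untouched: deciding which facts count as "data" and which as "metadata" is still a modelling choice.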

LSID GOOD: Defined resolution mechanism, data and metadata. BAD: Unclear how to use data/metadata split.

Acknowledgements
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK.
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire.
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK).
Collaborators: Keith Decker.

Summary GOOD: RDF provides a convenient search capability, with a well-defined link to an ontology. RDF provides a convenient data model, which is flexible and adaptable. LSID: defined resolution mechanism, data and metadata. BAD: Unsure about scalability. Issues of security and concurrency will probably also affect us. Visualisation tools are lacking. Scalability is even more of an issue with reification. LSID: unclear how to use the data/metadata split.