MyGrid and the Semantic Web Phillip Lord School of Computer Science University of Manchester.

1 myGrid and the Semantic Web. Phillip Lord, School of Computer Science, University of Manchester.

2 myGrid: eScience and Bioinformatics. Oct 2001 – April 2005. £3.4 million. UK e-Science Pilot Project. £0.4 million studentships. Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield.

3 Data (Type) Intensive Bioinformatics
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
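The record on this slide carries its types only as two-letter line codes (ID, DE, GN, OS, ...). As a rough illustration of what a consumer has to do to recover structure from such text, here is a minimal Python sketch. The function name `parse_flat_record` and the shortened sample record are assumptions made for this example; this is not a complete Swiss-Prot parser and not myGrid code.

```python
from collections import defaultdict


def parse_flat_record(text):
    """Group the lines of a Swiss-Prot-style flat-file record by line code.

    Assumption for this sketch: coded lines start with a two-letter code,
    and indented lines hold sequence residues.
    """
    fields = defaultdict(list)
    sequence = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented lines after SQ hold the residues in blocks of ten.
            sequence.append(line.replace(" ", ""))
        else:
            code, _, value = line.partition(" ")
            fields[code].append(value.strip())
    # Repeated codes (for example several DE lines) are joined into one value.
    record = {code: " ".join(values) for code, values in fields.items()}
    record["sequence"] = "".join(sequence)
    return record


# Shortened copy of the record shown on the slide.
RECORD = """\
ID   MURA_BACSU STANDARD; PRT; 429 AA.
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG
"""

record = parse_flat_record(RECORD)
print(record["ID"].split()[0], "|", record["OS"])
```

Everything beyond the line code (organism, keywords, feature positions) is still free text inside each value, which is the sense in which the data are type intensive.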

4 [Architecture diagram] Web Service (Grid Service) communication fabric. Labels: AMBIT Text Extraction Service; Provenance; Personalisation; Event Notification; Gateway; Service and Workflow Discovery; myGrid Information Repository; Ontology Mgt; Metadata Mgt; Workbench; Taverna; Talisman; Native Web Services; SoapLab; Web Portal; Legacy apps; Registries; Ontologies; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor; GowLab. Groupings: Bioinformaticians; Tool Providers; Service Providers; Applications; Core services; External services; Views.

5

6 Support not Automation

7 Thin Semantics
PRETTYSEQ of CDS1|>CDS2|strand_1 from 1 to 129
    ---------|---------|---------|---------|---------|---------|
  1 atgacggacactgctggtcgctgtggcttcctcctacgcgttcggtcactcctgcacatg 60
  1 M T D T A G R C G F L L R V R S L L H M 20
    ---------|---------|---------|---------|---------|---------|
 61 tccgcagtagtggtgctctcggggaccccctcgccaccccacaataccgctcaccacatg 120
 21 S A V V V L S G T P S P P H N T A H H M 40
    ---------
121 gccaaacag 129
 41 A K Q 43

CPGREPORT of CDS1|>CDS2|strand_1 from 1 to 129
Sequence              Begin  End  Score  CpG  %CG   CG/GC
CDS1|>CDS2|strand_1   5      109  58     9    64.8  1.12

########################################
# Program: restrict
# Rundate: Thu Jul 15 16:32:30 2004
# Report_format: table
# Report_file: /scratch/emboss_interfaces/a/unknown/Projects/default/Data/out1089905549241
########################################
Start  End  Enzyme_name  Restriction_site  5prime  3prime  5primerev  3primerev
4      8    TspGWI       ACGGA             19      17      .          .
9      15   TspRI        CASTGNN           15      6       .          .
14     19   BtsI         GCAGTG            8       6       .          .
25     28   CviJI        RGCY              26      26      .          .
30     33   MnlI         CCTC              40      39      .          .
36     41   MluI         ACGCGT            36      40      .          .
#---------------------------------------

8 Semantic Discovery with Feta. Query-ontology: discovering workflows and services described in the registry by building a query in Taverna. A common ontology is used to annotate and query. (Planning for OBO release.)

9 Knowledge in Feta. Ontology (OWL-DL); Service Descriptions (XML); Jena Querying (RDF).
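To make the idea of ontology-backed discovery concrete, the following is a small Python sketch that queries service descriptions held as triples, using a class hierarchy to widen the match. It uses only the standard library rather than Jena, and every identifier in it (the `ex:` names, the `ex:performsTask` property, `find_services`) is hypothetical; it illustrates the general technique and is not the Feta implementation.

```python
SUBCLASS_OF = "rdfs:subClassOf"
PERFORMS = "ex:performsTask"  # hypothetical annotation property

# Toy ontology plus service annotations, all invented for this sketch.
TRIPLES = {
    ("ex:BLAST", SUBCLASS_OF, "ex:SimilaritySearch"),
    ("ex:SimilaritySearch", SUBCLASS_OF, "ex:SequenceAnalysis"),
    ("ex:Translation", SUBCLASS_OF, "ex:SequenceAnalysis"),
    ("ex:blastService", PERFORMS, "ex:BLAST"),
    ("ex:transeqService", PERFORMS, "ex:Translation"),
    ("ex:lsidResolver", PERFORMS, "ex:IdentifierResolution"),
}


def subclasses(triples, cls):
    """Return cls plus every class transitively declared a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def find_services(triples, task_class):
    """Return services annotated with task_class or any of its subclasses."""
    wanted = subclasses(triples, task_class)
    return sorted(s for s, p, o in triples if p == PERFORMS and o in wanted)


print(find_services(TRIPLES, "ex:SequenceAnalysis"))
print(find_services(TRIPLES, "ex:SimilaritySearch"))
```

Because annotation and query share one vocabulary, a query for a general task also finds services annotated with a more specific one.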

10 Service Discovery. GOOD: RDF provides a convenient search capability, with a well-defined link to an ontology. BAD: Unsure about scalability. Issues of security and concurrency will probably also affect us.

11 Provenance. Bioinformatics has a data circularity problem. Computational data is hard to trace, reproduce or repeat. We need to store provenance. Service Orientated Architecture and Service Descriptions start to enable us to do this.

12 Provenance: The Semantic Web

13 Generating Provenance. [Diagram labels: Web Services; Taverna; FreeFluo; Metadata Repository (reified); Data Repository; LaunchPad; Haystack]

14 [Provenance model diagram] Entities: Workflow run; Workflow design; Experiment design; Project; Person; Organisation; Process; Service; Event; Data item. Relations: data derivation (e.g. output data derived from input data); instanceOf; partOf; componentProcess (e.g. web service invocation of BLAST @ NCBI); componentEvent (e.g. completion of a web service invocation at 12.04pm); runBy (e.g. BLAST @ NCBI); run for. Levels: Organisation level provenance; Process level provenance. User can add templates to each workflow process to determine links between data items.
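The data derivation links in this model are what make a result traceable. As a hedged illustration, the Python sketch below walks "derived from" links back from one data item to everything it depends on. The item names and the `lineage` helper are invented for this example; the real repository holds these links as RDF rather than a Python dictionary.

```python
# Hypothetical derivation records: each key was derived from the listed items.
DERIVED_FROM = {
    "blast_report": ["query_sequence"],
    "alignment": ["blast_report", "reference_set"],
    "figure": ["alignment"],
}


def lineage(item, derived_from):
    """Return every data item that `item` was directly or indirectly derived from."""
    seen = set()
    stack = list(derived_from.get(item, []))
    while stack:
        current = stack.pop()
        if current in seen:
            # Already visited; this also keeps circular derivations from looping.
            continue
        seen.add(current)
        stack.extend(derived_from.get(current, []))
    return seen


print(sorted(lineage("figure", DERIVED_FROM)))
```

The `seen` set matters here: if derivation records ever form a cycle, the walk still terminates.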

15

16 Provenance. GOOD: RDF provides a convenient data model, which is flexible and adaptable. BAD: Visualisation tools are lacking. Scalability is even more of an issue with reification.
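The scalability worry about reification can be seen with simple counting. In the standard RDF reification vocabulary, describing one statement takes four triples (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object), and any provenance attached to it adds more. The Python sketch below shows that growth; the `ex:` identifiers and the `ex:assertedBy` property are assumptions made for this example.

```python
def reify(triple, statement_id):
    """Return the four triples of the standard RDF reification pattern."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


# Three hypothetical derivation facts from one workflow run.
FACTS = [
    ("ex:output1", "ex:derivedFrom", "ex:input1"),
    ("ex:output2", "ex:derivedFrom", "ex:output1"),
    ("ex:output3", "ex:derivedFrom", "ex:output2"),
]

store = []
for index, fact in enumerate(FACTS):
    statement_id = f"ex:stmt{index}"
    store.extend(reify(fact, statement_id))
    # One provenance annotation attached to the reified statement.
    store.append((statement_id, "ex:assertedBy", "ex:workflowRun42"))

print(len(FACTS), "facts stored as", len(store), "triples")
```

With one annotation per statement, each original fact costs five stored triples, so the store grows several times faster than the data it describes.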

17 LSIDs. Standard identifier mechanism, aimed at the life sciences. Has a standard resolution mechanism by which the data can be obtained. Has semantics for versioning. Has a standard association with metadata. Abbreviation distressingly similar to LSD.
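An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision, which is where the versioning semantics live. The Python sketch below splits an identifier into those parts. The `Lsid` type, the `parse_lsid` helper and the example identifier are illustrative assumptions; the sketch does not perform resolution and skips the finer validation rules of the specification.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier; the authority and namespace are made up.
print(parse_lsid("urn:lsid:example.org:proteins:MURA_BACSU:2"))
```

Two identifiers that differ only in revision name different versions of the same object, so a client can compare the first three components to group versions together.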

18 Provenance. We used LSIDs within provenance; all of our data is stored and resolved with LSIDs. The notion of a single identifier system within myGrid is attractive.

19 Worries. We are unclear as to how the metadata/data split happens with LSID: use the former for mutability, the latter for immutability. We have also tended toward using “metadata” for RDF-based data, and “data” for relational.

20 LSID. GOOD: Defined resolution mechanism, data and metadata. BAD: Unclear how to use the data/metadata split.

21 Acknowledgements
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK.
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire.
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK).
Collaborators: Keith Decker.

22 Summary. GOOD: RDF provides a convenient search capability, with a well-defined link to an ontology. RDF provides a convenient data model, which is flexible and adaptable. LSID: defined resolution mechanism, data and metadata. BAD: Unsure about scalability. Issues of security and concurrency will probably also affect us. Visualisation tools are lacking. Scalability is even more of an issue with reification. LSID: unclear how to use the data/metadata split.

