GGF11 Semantic Grid Applications Workshop, Hilton Hawaiian Village Beach Resort & Spa, Honolulu, Thursday June 10, 2004 Exploring Williams-Beuren Syndrome.

Exploring Williams-Beuren Syndrome
Professor Carole Goble

Acknowledgements
myGrid is an EPSRC-funded UK e-Science Programme pilot project
Particular thanks to the other members of the Taverna project,

Roadmap
myGrid in a nutshell
Gene characterisation in Williams-Beuren Syndrome
Semantic aspects
–Information model
–Service discovery
–Data management: LSID
–Metadata management for provenance: RDF
Lessons learnt and opportunities

Experiment life cycle
Discovering and reusing experiments and resources
Managing lifecycle, provenance and results of experiments
Sharing services & experiments
Personalisation
Forming experiments
Executing and monitoring experiments

In a nutshell
Bioinformatics toolkit
Open (Web) Services
–myGrid components
–External domain services
–No control or influence over service providers
Open to third party metadata
Open extensible architecture
–Assemble your own components
–Designed to work together
–Toolkit
–Axis/Apache based
–RDF and DAML+OIL/OWL
–Jena, OilEd, Instance Store & FaCT
[Diagram labels: Freefluo WfEE; Taverna WfDE; View; UDDI registry; Event Notification; mIR; Pedro; Semantic Discovery; Info. Model; Soaplab; Gowlab; Gateway & CHEF Portal; LSID; Haystack Provenance Browser]

Williams-Beuren Syndrome
Microdeletion of about 1.5 Mbases on Chromosome 7
Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Characterise an unknown gene
Annotation pipelines and gene expression analysis
Services from USA, Japan, various sites in UK

Williams-Beuren Syndrome Microdeletion
[Figure: physical map of the WBS region. Chr 7, ~155 Mb; deleted region ~1.5 Mb at 7q11.23.]
Genes labelled: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14, STAG3, PMS2L
Repeat blocks: Block A (A-cen, A-mid, A-tel), Block B (B-cen, B-mid, B-tel), Block C (C-cen, C-mid, C-tel); pseudogene labels FKBP6T, GTF2IP, NCF1P, GTF2IRD2P
Other labels: WBS, SVAS, patient deletions, clones CTA-315H11 and CTB-51J22, gap

Filling a genomic gap
Two major steps:
Extend into the gap: similarity searches; RepeatMasker, BLAST
Characterise the new sequence: NIX, InterPro, etc.
Numerous web-based services (e.g. BLAST, RepeatMasker)
Cutting and pasting between screens
Large number of steps
Frequently repeated, because information is now rapidly added to public databases
Don’t always get results
Time consuming
Huge amount of interrelated data is produced, handled in a lab book and in files saved to a local hard drive
Mundane
Much knowledge remains undocumented
Bioinformatician does the analysis

Point, click, cut, paste
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE BINDS PEP (BY SIMILARITY).
FT   CONFLICT S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

WBS Workflows
[Workflow diagram; the recoverable labels are listed below]
Starting point: GenBank accession no.; GenBank entry; Seqret; nucleotide sequence (Fasta)
Nucleotide-level services and their outputs:
–GenScan: coding sequence
–sixpack: 6 ORFs
–restrict: restriction enzyme map
–cpgreport: CpG island locations and %
–RepeatMasker: repetitive elements
–prettyseq: translation/sequence file, good for records and publications
–ncbiBlastWrapper: Blastn vs nr, est databases
–transeq: amino acid translation
Protein-level services and their outputs:
–epestfind: identifies PEST sequences
–pscan: identifies FingerPRINTS
–pepstats: MW, length, charge, pI, etc.
–pepcoil: predicts coiled-coil regions
–SignalP, TargetP, PSORTII: predict cellular location
–InterPro, PFAM, Prosite, Smart: identify functional and structural domains/motifs
–Pepwindow? Octanol?: hydrophobic regions
Further search steps: ncbiBlastWrapper (URL inc GB identifier); tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr; RepeatMasker on the query nucleotide sequence; ncbiBlastWrapper; sort for appropriate sequences only
Colour key: pink, outputs/inputs of a service; purple, tailor-made services; green, EMBOSS Soaplab services; yellow, Manchester Soaplab services; grey, unknowns

Collections of Tasks
[Diagram labels]
Tasks: Finding; Description; Service Discovery; Enactment; Building; Workflow; Provenance; Storage; Data Management; Querying; Domain Tasks
Roles: Service Providers; Bioinformaticians; Scientists; Annotation providers

[Diagram: components, activities and roles]
Components: Registry; mIR; Feta; Haystack Provenance Browser; FreeFluo WfEE; Taverna WfDE; Pedro annotation tool; Ontology Store; Others; WSDL; Soaplab
Activities: interface description; annotation/description; query & retrieve; workflow execution; store data/knowledge; invoking; querying/sharing/federating/registering; data descriptions; vocabulary
Roles: Annotation providers; Scientists; Bioinformaticians; Service Providers

High level architecture
[Diagram labels]
Freefluo Workflow Engine
LSID Authority
UDDI
mIR Store Service
Provenance and Data browser, e.g. Haystack
Web services, local tools
User interaction etc.
Taverna Workbench
View Service
Semantic Discovery & Registration
Event Notification Service

WBS task
Wrap services as web services
Register them
Build a workflow using the services
Evolve the workflow
Run it over and over again in case data has changed
Record results & provenance
Inspect and compare results & provenance
Event notification, portal, 3rd party annotation…


User Results
Benchmark: two iterations of workflows (1 day run)
–Reduced gap by [value missing] bp at its centromeric end
–Correctly located all seven known genes in this region
–Identified 33 of the 36 known exons residing in this location
Manually: takes two days or more, including analysis
Now: takes 30 mins to produce results and half a day for analysis
Less boring. Less prone to mistakes.
Once notification is installed, the user won’t even have to initiate it.

Where is the semantics

Information Model v2
Resources and identifiers
People, teams and organizations
Representing the e-science process
Experimental methods for e-science
Scientific data and the life-science identifier
–Types
–Identifier types
–Values and documents
Provenance information
Annotation and argumentation
XML messages between services conform to the IMv2

Semantic discovery
The user does the choosing of services
A common ontology is used to annotate and query any myGrid object, including services
Ontology is built using DAML+OIL and reasoning
Deployed as a static RDF graph
Discover workflows and services described in the registry via Taverna
Example query: look for all workflows that accept an input of semantic type nucleotide sequence
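The example query above can be illustrated with a minimal sketch. It assumes a tiny in-memory ontology and registry; the term names, workflow names and the `accepts_input` predicate are hypothetical and are not taken from the myGrid ontology or the Taverna API. Matching on the requested type or any of its subtypes is one reasonable reading of a query answered by reasoning over a class hierarchy.

```python
# Hypothetical subclass links in a small ontology (child -> parent).
SUBCLASS_OF = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
}

# Hypothetical registry annotations as (subject, predicate, object) triples.
ANNOTATIONS = [
    ("workflow:mask_and_blast", "accepts_input", "nucleotide_sequence"),
    ("workflow:translate", "accepts_input", "dna_sequence"),
    ("workflow:find_domains", "accepts_input", "protein_sequence"),
]


def is_a(term, ancestor):
    """Return True if term equals ancestor or is a descendant of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = SUBCLASS_OF.get(term)
    return False


def find_by_input_type(semantic_type):
    """List registry entries whose input type is the given type or a subtype."""
    return sorted(
        subject
        for subject, predicate, obj in ANNOTATIONS
        if predicate == "accepts_input" and is_a(obj, semantic_type)
    )


print(find_by_input_type("nucleotide_sequence"))
```

Because the ontology is deployed as a static graph, the subclass walk here could equally be precomputed once and stored alongside the annotations.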

Role of Ontologies
[Diagram: ontologies at the centre, supporting the following]
Composing and validating workflows and service compositions & negotiations
Describing & linking provenance records
Change & event notification topics
Resource annotations
Service & resource registration & discovery
Schema mediation
Controlling contents of metadata and data
Knowledge-based guidance and recommendation
Service matching and provisioning
Help

Observations

Services
Practically all the services are remote and third party
Services are changeable and unreliable
Redundant services are essential
WSDL in the wild is poor
Automated annotation
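One way to picture why redundant services matter is a simple fallback loop. This is a sketch only: the provider names and the two stand-in functions are hypothetical, and it does not describe how Taverna or Freefluo actually handle faults.

```python
def call_with_fallback(providers, payload):
    """Try each equivalent provider in turn; return the first successful result."""
    errors = []
    for name, service in providers:
        try:
            return name, service(payload)
        except Exception as exc:  # a remote service can fail in many ways
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_search(sequence):
    """Stand-in for a remote service that is currently down."""
    raise ConnectionError("service unavailable")


def mirror_search(sequence):
    """Stand-in for an equivalent service hosted elsewhere."""
    return f"report for {len(sequence)} bases"


providers = [("primary", flaky_search), ("mirror", mirror_search)]
print(call_with_fallback(providers, "AAGCTTTTCTGG"))
```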

Can you guess what it is yet?

Model of services
[Diagram labels]
service: name, description, author, organisation (shown: WSDL service)
operation: name, description, input, output, task, method, resource, application (shown: workflow, bioMoby service, WSDL operation, Soaplab service)
parameter: name, description, semantic type, format, transport type, collection type, collection format

SHIM Services
[Diagram: SHIM services sitting between the main bioinformatics applications and services]
Services that enable domain services to fit together
Outnumber domain services
Libraries
Candidates for automatic selection, composition and substitution
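A shim in this sense can be very small. The sketch below assumes one domain service that returns FASTA text and another that only accepts a bare sequence; both services and the accession string are hypothetical stand-ins, and only the middle function plays the shim role.

```python
def fasta_to_raw(fasta_text):
    """Shim: drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def fetch_sequence(accession):
    """Stand-in for a domain service that returns FASTA text."""
    return f">{accession} example record\nAAGCTT\nTTCTGG\n"


def gc_content(raw_sequence):
    """Stand-in for a domain service that only accepts a bare sequence."""
    return sum(base in "GC" for base in raw_sequence) / len(raw_sequence)


raw = fasta_to_raw(fetch_sequence("hypothetical-accession"))
print(raw, gc_content(raw))
```

Neither domain service changes; the adapter carries the whole format mismatch, which is why such adapters are natural candidates for automatic selection and substitution.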

Results management
Automated workflows produce lots of heterogeneous data
These are just some of the results from one workflow run for Williams-Beuren Syndrome

Amplification
One input
Many outputs

Dealing with results
FreeFluo is agnostic about the data flowing through it.
Taverna includes a DataThing class, which can be tagged with terms from ontologies, free text descriptions and MIME types, and which may contain arbitrary collection structures.
Using the metadata hints we can locate and launch pluggable view components.
Hybrid typing scheme allows for a ‘best effort’ approach to data typing.
Life science types are intractable for reasonable effort or completeness.
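The "metadata hints" idea can be sketched as follows. This is a hypothetical Python analogue, not Taverna's DataThing class or its API: the `DataItem` fields, the viewer names and the two lookup tables are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical tagged data holder (an illustration, not Taverna's API)."""
    value: object  # a single value or an arbitrarily nested list of values
    ontology_terms: list = field(default_factory=list)
    mime_types: list = field(default_factory=list)
    description: str = ""


# Hypothetical viewer registry: each kind of hint maps to a viewer name.
VIEWERS_BY_TERM = {"BLAST_Report": "blast-report-viewer"}
VIEWERS_BY_MIME = {"text/plain": "text-viewer", "image/png": "image-viewer"}


def choose_viewer(item):
    """Best effort: prefer an ontology-term match, then a MIME match, then a default."""
    for term in item.ontology_terms:
        if term in VIEWERS_BY_TERM:
            return VIEWERS_BY_TERM[term]
    for mime in item.mime_types:
        if mime in VIEWERS_BY_MIME:
            return VIEWERS_BY_MIME[mime]
    return "raw-viewer"


report = DataItem("hit list text", ontology_terms=["BLAST_Report"], mime_types=["text/plain"])
print(choose_viewer(report))
```

Falling back from semantic type to MIME type to a raw view is one way to read "best effort": an item with no usable tags is still displayable.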

Implicit iteration framework handles type mismatches where cardinality changes are required
Permissive type scheme, guides rather than enforces
Graphical view supplemented by tree ‘explorer’ style view
High level language wraps low level operations into sensible conceptual units
Configurable fault handling
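Implicit iteration over a cardinality mismatch can be sketched as below. The sketch assumes that collections are plain Python lists of uniform depth and that each service declares how deeply nested an input it expects; the function names are hypothetical and this is not the Taverna implementation.

```python
def depth(value):
    """Nesting depth: 0 for a single item, 1 for a list of items, and so on."""
    if isinstance(value, list):
        return 1 + (depth(value[0]) if value else 0)
    return 0


def invoke(service, value, expected_depth=0):
    """Call service on value, iterating when value is nested deeper than expected."""
    if depth(value) > expected_depth:
        return [invoke(service, item, expected_depth) for item in value]
    return service(value)


# A service written for one sequence is applied to a list of sequences.
print(invoke(len, ["AAG", "CTTT"]))
```

The result keeps the shape of the input collection, so downstream steps can see which output came from which input.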


Intermediate Results
Workflows change the way the bioinformatician works
Before: analyse results as you go along
After: all results in one go
So linking intermediate results is important

Life Science IDs
I3C / IBM / EBI proposal for a Life Science Identifier
LSID provides a uniform naming scheme.
LSID Resolver guarantees to resolve to the same data object.
LSID Authority dishes them out. Also returns metadata of the object.
Used throughout myGrid as an object naming device.
myGrid Repository acts as an LSID Authority.
LSID allows universal access to results for collaboration, as well as for review.
RDF+LSID explains the context of results, and provides guidance for further investigations.
Pioneered by myGrid

Process Provenance

Link v Data Representation
Data management questions refer to relationships rather than internal content
–What are the origins of this data?
–Which service produced this data?
–Which data is this derived from?
–Who was this data produced for?
Data analysis questions (What is this data telling me?) are delegated to external services.

Representing links
Identify each resource
–Life science identifier: URI with associated data and metadata retrieval protocols
–Understanding that underlying data will not change
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
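The two identifiers above follow the usual LSID layout, `urn:lsid:authority:namespace:object` with an optional trailing revision. A minimal parsing sketch, assuming that layout and using hypothetical names (`LSID`, `parse_lsid`), looks like this:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID into its parts, rejecting strings that do not fit the layout."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


print(parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6"))
```

Splitting out the authority is what lets a client decide which resolver to ask for the data and its metadata.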

Representing links II
Identify link type
–Again use URI
–Allows us to use RDF infrastructure: repositories, ontologies
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3

[Diagram: levels of provenance and the links between them]
Entities: workflow run; workflow design; experiment design; project; person; organisation; process; service; event; data item
Levels: organisation level provenance; process level provenance; data/knowledge level provenance
Links:
–data derivation, e.g. output data derived from input data
–knowledge statements, e.g. similar protein sequence to
–instanceOf
–partOf
–componentProcess, e.g. web service invocation of NCBI
–componentEvent, e.g. completion of a web service invocation at 12.04pm
–runBy, e.g. NCBI
–run for
User can add templates to each workflow process to determine links between data items.

Provenance tracking
[Figure: a BLAST report listing human genomic hits (first hit: Homo sapiens BAC clone CTA-315H11 from 7, complete sequence) with the start of the query nucleotide sequence, drawn inside a graph of links]
Relationship BLAST report has with other items in the repository: urn:lsid:taverna:datathing:15 (rdf:type BLAST_Report) and urn:lsid:taverna:datathing:13 (rdf:type nucleotide_sequence), with link labels similar_sequences_to, masked_sequence_of, filtered_version_of
Other classes of information related to BLAST report: service invocation, workflow invocation, workflow definition, experiment definition, project, person, group, service description, organisation, with link labels created_by, described_by, run_during, invocation_of, part_of, works_for, author, run_for
Automated generation of this web of links preferable
Workflow enactor generates
–LSIDs
–Data derivation links
–Knowledge links
–Process links
–Organisation links
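A web of links like this can be held as plain subject, predicate, object triples and queried for relationships without looking inside the data. In the sketch below the two datathing LSIDs and the link labels come from the slide; the direction of each link, the third sequence and the invocation identifier are assumptions made for illustration.

```python
BLAST_REPORT = "urn:lsid:taverna:datathing:15"
QUERY_SEQUENCE = "urn:lsid:taverna:datathing:13"
RAW_SEQUENCE = "urn:lsid:taverna:datathing:12"   # hypothetical upstream item
INVOCATION = "urn:lsid:taverna:invocation:7"     # hypothetical process record

# Link directions are assumed: subject is derived from, or described by, object.
LINKS = [
    (BLAST_REPORT, "rdf:type", "BLAST_Report"),
    (BLAST_REPORT, "similar_sequences_to", QUERY_SEQUENCE),
    (BLAST_REPORT, "created_by", INVOCATION),
    (QUERY_SEQUENCE, "rdf:type", "nucleotide_sequence"),
    (QUERY_SEQUENCE, "masked_sequence_of", RAW_SEQUENCE),
]

DERIVATION_PREDICATES = {"similar_sequences_to", "masked_sequence_of", "filtered_version_of"}


def objects(subject, predicate):
    """Answer 'which X is linked to this item by this predicate?'."""
    return [o for s, p, o in LINKS if s == subject and p == predicate]


def origins(item):
    """Follow derivation links transitively and return every upstream item."""
    found, stack = [], [item]
    while stack:
        current = stack.pop()
        for s, p, o in LINKS:
            if s == current and p in DERIVATION_PREDICATES and o not in found:
                found.append(o)
                stack.append(o)
    return found


print(origins(BLAST_REPORT))
print(objects(BLAST_REPORT, "created_by"))
```

Both questions are answered from the links alone, which matches the point that data management questions concern relationships rather than content.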

Storage
LSID has no protocol for storage
Taverna/Freefluo implements its own data/metadata storage protocol
[Diagram: Taverna/Freefluo sends data and metadata through a publish interface to the data store and the metadata store]

Retrieval
LSID protocol used to retrieve data and metadata
Query handled separately
[Diagram: Taverna/Freefluo publishes data and metadata through the publish interface; an LSID aware client retrieves them through the LSID interface; an RDF aware client queries the metadata store]
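The split between publishing, LSID-based retrieval and querying can be sketched with an in-memory stand-in. The `Repository` class and its method names are hypothetical; they are not the Taverna/Freefluo storage protocol or the LSID resolution protocol, only an illustration of the three separate paths.

```python
class Repository:
    """Hypothetical in-memory stand-in for a data store plus a metadata store."""

    def __init__(self):
        self._data = {}        # LSID -> data object
        self._metadata = []    # (LSID, predicate, object) triples

    def publish(self, lsid, data, metadata):
        """Publish path: store data and its metadata under one LSID."""
        if lsid in self._data and self._data[lsid] != data:
            raise ValueError("data behind an LSID must not change")
        self._data[lsid] = data
        self._metadata.extend((lsid, p, o) for p, o in metadata)

    def get_data(self, lsid):
        """Retrieval path for an LSID aware client: the data object."""
        return self._data[lsid]

    def get_metadata(self, lsid):
        """Retrieval path for an LSID aware client: metadata about the object."""
        return [(p, o) for s, p, o in self._metadata if s == lsid]

    def query(self, predicate, obj):
        """Separate query path: find LSIDs by a metadata statement."""
        return [s for s, p, o in self._metadata if p == predicate and o == obj]


repo = Repository()
repo.publish("urn:lsid:taverna:datathing:15", "hit list text", [("rdf:type", "BLAST_Report")])
print(repo.query("rdf:type", "BLAST_Report"))
```

Refusing to overwrite data under an existing identifier reflects the understanding, stated earlier, that the data behind an LSID will not change.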

IBM’s BioHaystack
[Screenshots: GenBank record; portion of the web of provenance; managing a collection of sequences for review]

Observations
Managed the transition from generic middleware development to practical, day-to-day useful services
Real users (plural) fundamental to that
End to end support for an entire scenario
Bury the semantics
Show stoppers for practical adoption are not technical show stoppers
–Can I incorporate my favourite service?
–Can I manage the results?
By tapping into (de facto) standards and communities we can leverage others’ results and tools: LSID, Haystack, Pedro.


myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker


Semantic Futures
Information Management
–More on results management
–Complete deployment of Information Model
–Using provenance and event notification, e.g. for impact analysis
Access
–CHEF-based portal for finding workflows, launching & monitoring workflows, launching Taverna, browsing results
–Redeveloping the view registry to be more efficient
–Deploying publicly accessible semantic registry
Workflow enactment
–Reinstate service discovery during enactment
Authorisation & Authentication

Summary
myGrid offers service based middleware components
Open source and freely downloadable
Open Grid Service Architecture-compliant
Allows the scientist to be at the centre of the Grid: personalisation
Generic middleware that suits the creation of bioinformatics applications
Inclusion of rich semantics to facilitate the scientific process
Available from