Exploring Williams-Beuren Syndrome using myGrid


Exploring Williams-Beuren Syndrome using myGrid
R.D. Stevens (a), H.J. Tipney (b), C.J. Wroe (a), T.M. Oinn (c), M. Senger (c), P.W. Lord (a), C.A. Goble (a), A. Brass (a), M. Tassabehji (b)
(a) Department of Computer Science, University of Manchester
(b) University of Manchester, Academic Unit of Medical Genetics, St Mary’s Hospital
(c) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton

Williams-Beuren Syndrome (WBS)
Congenital disorder caused by sporadic gene deletion
1/20,000 live births
Affects multiple systems: muscular, nervous, circulatory
Characteristic facial features
Unique cognitive profile
Mental retardation (IQ mean ~60, ‘normal’ mean ~100)
Outgoing personality, friendly nature, ‘charming’
Haploinsufficiency of the region results in the phenotype

Williams-Beuren Syndrome Microdeletion
[Figure: physical map of the WBS region. Chr 7 (~155 Mb); ~1.5 Mb region at 7q11.23. Also marked: WBS and SVAS patient deletions, clones CTA-315H11 and CTB-51J22, and the ‘Gap’.]
Genes labelled: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14
Block copies labelled: C-cen, A-cen, B-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel
Block contents labelled: STAG3, PMS2L (Block A); FKBP6T, POM121, NOLR1 (Block C); GTF2IP, NCF1P, GTF2IRD2P (Block B)
Eichler E, Clark R & She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:
Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:

Filling a genomic gap in silico
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.
acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
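The first step on this slide, finding new sequence that overlaps the end of the known sequence, can be sketched in a few lines. This is only an illustration: exact string matching stands in for BLAST, and the function name `extend_with_overlap`, the `probe_len` parameter and the two short test strings are assumptions made for the example, not part of the original workflow.

```python
def extend_with_overlap(known: str, candidate: str, probe_len: int = 20) -> str:
    """Extend `known` with the part of `candidate` that lies beyond the overlap.

    The last `probe_len` bases of `known` are used as a probe. If the probe
    occurs in `candidate`, everything after that occurrence is treated as new
    sequence. Exact matching is a stand-in for BLAST (assumption of this sketch).
    """
    known = "".join(known.split()).lower()
    candidate = "".join(candidate.split()).lower()
    probe = known[-probe_len:]
    position = candidate.find(probe)
    if position == -1:
        return known  # no overlap found, nothing to add
    return known + candidate[position + len(probe):]


# Hypothetical inputs cut from the start of the sequence shown on the slide.
known_end = "acatttctac caacagtgga tgaggttgtt"
new_clone = "tggatgaggt tgttggtcta tgttctcacc"
extended = extend_with_overlap(known_end, new_clone, probe_len=10)
print(extended)
```

A real search has to tolerate mismatches and repeats, which is why the workflow relies on BLAST rather than exact matching.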

Filling a genomic gap in silico
Frequently repeated: info rapidly added to public databases
Time consuming and mundane
Don’t always get results
Huge amount of interrelated data is produced, handled in notebooks and files saved to local hard drive
Much knowledge remains undocumented: the bioinformatician does the analysis
Advantages: specialist human intervention at every step, quick and easy access to distributed services
Disadvantages: labour intensive, time consuming, highly repetitive and error-prone process; tacit procedure, so difficult to share both protocol and results

Why Workflows and Services?
Workflow = general technique for describing and enacting a process
Workflow = describes what you want to do, not how you want to do it
Web Service = how you want to do it
Web Service = automated programmatic internet access to applications
Automation
  – Capturing processes in an explicit manner
  – Tedium! Computers don’t get bored/distracted/hungry/impatient!
  – Saves repeated time and effort
Modification, maintenance, substitution and personalisation
Easy to share, explain, relocate, reuse and build
Available to wider audience: don’t need to be a coder, just need to know how to do bioinformatics
Releases scientists/bioinformaticians to do other work
Record
  – Provenance: what the data is like, where it came from, its quality
  – Management of data (LSID - Life Science Identifiers)
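The split between "what" (the workflow) and "how" (the services), together with the provenance record, can be illustrated with a small sketch. It assumes local Python functions in place of web services; the names `WORKFLOW`, `SERVICES` and `enact` are hypothetical and do not correspond to Scufl or Taverna constructs.

```python
from typing import Any, Callable, Dict, List, Tuple


def clean(seq: str) -> str:
    """Remove whitespace and normalise to upper case."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


# "How": each step name is bound to something that can perform it.
SERVICES: Dict[str, Callable[[Any], Any]] = {
    "clean": clean,
    "reverse_complement": reverse_complement,
    "gc_fraction": gc_fraction,
}

# "What": the workflow only names the steps and their order.
WORKFLOW: List[str] = ["clean", "reverse_complement", "gc_fraction"]


def enact(workflow: List[str], services: Dict[str, Callable[[Any], Any]],
          data: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run the steps in order and keep a provenance record for each one."""
    provenance: List[Dict[str, Any]] = []
    for step in workflow:
        output = services[step](data)
        provenance.append({"step": step, "input": data, "output": output})
        data = output
    return data, provenance


result, provenance = enact(WORKFLOW, SERVICES, "acat ttct ac")
for record in provenance:
    print(record)
```

Because the workflow refers to steps only by name, a service can be substituted by changing one entry in `SERVICES` while the workflow itself stays the same.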

myGrid
E-Science pilot research project funded by EPSRC
Manchester, Newcastle, Sheffield, Southampton, Nottingham, EBI and RFCGR, also industrial partners
‘targeted to develop open source software to support personalised in silico experiments in biology on a grid.’
Which means…
  – Distributed computing: machines, tools, databanks, people
  – Personalisation
  – Provenance and data management
  – Enactment and notification
A virtual lab ‘workbench’, a toolkit which serves life science communities.

Workflow Components
Scufl: Simple Conceptual Unified Flow Language
Taverna: writing, running workflows & examining results
SOAPLAB: makes applications available
Freefluo: workflow engine to run workflows
[Figure labels: Freefluo; SOAPLAB Web Service; Any Application; Web Service e.g. DDBJ BLAST]

Williams Workflow Plan
[Figure: workflow plan diagram. Labels recovered from the figure are listed below.]
Input labels: GenBank Accession No; GenBank Entry; Seqret; Nucleotide seq (Fasta)
Nucleotide-level services:
  – GenScan: coding sequence, ORFs
  – sixpack: 6 ORFs
  – restrict: restriction enzyme map
  – cpgreport: CpG island locations and %
  – RepeatMasker: repetitive elements
  – prettyseq: translation/sequence file; good for records and publications
  – ncbiBlastWrapper: Blastn vs nr, est databases
  – transeq: amino acid translation
Protein-level services:
  – epestfind: identifies PEST seq
  – pscan: identifies FingerPRINTS
  – pepstats: MW, length, charge, pI, etc.
  – pepcoil: predicts coiled-coil regions
  – SignalP, TargetP, PSORTII: predict cellular location
  – InterPro: identifies functional and structural domains/motifs
  – Hydrophobic regions: Pepwindow? Octanol?
  – BlastWrapper: tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr
Regulatory elements: RepeatMasker; TF binding prediction; promoter prediction; regulation element prediction; identify regulatory elements in genomic sequence
Other labels: URL inc GB identifier; Query nucleotide sequence; RepeatMasker; BLASTwrapper; Sort for appropriate sequences only
Colour key: pink = outputs/inputs of a service; purple = tailor-made services; green = EMBOSS Soaplab services; yellow = Manchester Soaplab services
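Several services in the plan translate nucleotide sequence into amino acids (sixpack and transeq). As a rough illustration of what such a step computes, the sketch below translates a sequence in all six reading frames with the standard genetic code. It is a plain Python stand-in, not the EMBOSS implementation, and the function names are chosen for this example only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```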

The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence

The Workflow Experience
Correct and biologically meaningful results
Automation
  – Saved time, increased productivity
  – Process split into three; you still require humans!
Sharing
  – Other people have used and want to develop the workflows
Change of work practices
  – Post hoc analysis: data are not analysed piece by piece; all results are received at once
  – Data stored and collected in a more standardised manner
  – Results amplification
  – Results management and visualisation
Have workflows delivered on their promise? YES!

The Workflow Experience
Activation energy versus reusability trade-off
  – Lack of ‘available’ services, levels of redundancy can be limited
  – But once available can be reused for the greater good of the community
Licensing of bioinformatics applications
  – Means can’t be used outside of licensing body
  – No license = access third-party websites
Instability of external services
  – Research level
  – Reliant on other people’s servers
  – Taverna can retry or substitute before graceful failure
Shims
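The "retry or substitute before graceful failure" behaviour can be pictured with the following sketch. It is not Taverna's mechanism or API; `call_with_fallback`, `primary_blast` and `mirror_blast` are hypothetical names, and the two services are local functions that simulate an unavailable server and a working alternative.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple


def call_with_fallback(
    services: Sequence[Callable[[str], str]],
    query: str,
    retries: int = 2,
    delay: float = 0.0,
) -> Tuple[Optional[str], List[str]]:
    """Try each service in turn, retrying each one before moving to the next.

    Returns (result, errors). If every service fails, result is None and the
    caller can report the collected errors instead of crashing.
    """
    errors: List[str] = []
    for service in services:
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                return service(query), errors
            except Exception as exc:
                errors.append(f"{service.__name__} attempt {attempt}: {exc}")
                time.sleep(delay)
    return None, errors


def primary_blast(query: str) -> str:
    """Simulated external service that is currently down."""
    raise ConnectionError("server unavailable")


def mirror_blast(query: str) -> str:
    """Simulated substitute service."""
    return f"hits for {query}"


result, errors = call_with_fallback([primary_blast, mirror_blast], "acatttctac")
print(result)
print(errors)
```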

Shims
shim (shĭm) n. A thin, often tapered piece of material used to fill gaps, make something level, or adjust something to fit properly.
shimmed, shim·ming, shims To fill in, level, or adjust by using shims or a shim.
Explicitly capturing the process
Unrecorded ‘steps’ which aren’t realised until attempting to build something
Enable services to fit together

Shims
‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’
[Figure: workflow diagram showing where shims are needed. Labels recovered from the figure are listed below.]
Goal: identify new sequences and determine their degree of identity
Step labels: Mask; BLAST; Simplify and Compare; Lister; Retrieve; BLAST2
Data labels: Sequence (i.e. last known 3000bp); Old BLAST result; Sequence database entry; GenBank format sequence; Fasta format sequence; Alignment of full query sequence vs full ‘new’ sequence
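One of the format mismatches named on this slide is GenBank format versus Fasta format. A shim for that gap could look roughly like the sketch below. It handles only the `ACCESSION`, `ORIGIN` and `//` lines of a GenBank flat file, and both the function name `genbank_to_fasta` and the sample record (accession `EXAMPLE1`) are made up for illustration.

```python
def genbank_to_fasta(record: str, width: int = 60) -> str:
    """Convert one GenBank flat-file record to FASTA text.

    Minimal shim: reads the accession and the bases listed after ORIGIN,
    dropping the position numbers and the spaces between blocks of ten.
    """
    accession = "unknown"
    chunks = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            if len(parts) > 1:
                accession = parts[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(chunks)
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{accession}"] + wrapped)


# Hypothetical record; the bases are the first 30 from the gap-filling slide.
SAMPLE_RECORD = """LOCUS       EXAMPLE1                  30 bp    DNA
ACCESSION   EXAMPLE1
ORIGIN
        1 acatttctac caacagtgga tgaggttgtt
//
"""

fasta = genbank_to_fasta(SAMPLE_RECORD)
print(fasta)
```

The point of making the shim explicit is that the conversion becomes a recorded, reusable step instead of something done by hand between two services.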

The Biological Results
[Figure: map of the closed gap. Labels: clones CTA-315H11, CTB-51J22, RP11-622P13, RP11-148M21, RP11-731K22; flanking genes ELN and WBSCR14; 314,004bp extension.]
All nine known genes identified: CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28
Four workflow cycles totalling ~10 hours
The gap was correctly closed and all known features identified

Conclusions
It works: a new tool has been developed which is being utilised by biologists
More regularly undertaken, less mundane, less error prone
Once notification is installed, won’t even need to initiate it
More systematic collection and analysis of results
Increased productivity
Services: only as good as the individual services, lots of them, we don’t own them, many are unique and at a single site, research-level software, reliant on other people’s services, licenses
Activation energy

Future Directions
Scheduling and notification
Portals
Results visualisation
Re-use: other genomic disorders, Graves’ disease

Acknowledgments
Dr May Tassabehji
Prof Andy Brass
Medical Genetics team at St Mary’s Hospital, Manchester
Wellcome Trust

myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings (Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK); Hannah Tipney, May Tassabehji, Andy Brass (St Mary’s Hospital, Manchester, UK)
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker