eScience Case Studies Using Taverna
Dr. Georgina Moulton, The University of Manchester


eScience Case Studies Using Taverna
Dr. Georgina Moulton, The University of Manchester (on behalf of the myGrid team)

Traditional Bioinformatics
acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Requirements: automation, reliability, repeatability, little programming skill required, works on distributed resources.

Multi-disciplinary: ~37,000 downloads; ranked 210 on SourceForge; users in the US, Singapore, the UK, Europe and Australia.
Application areas: systems biology, proteomics, gene/protein annotation, microarray data analysis, medical image analysis, heart simulations, high-throughput screening, phenotypical studies, plants, mouse, human, astronomy, aerospace and Dilbert cartoons.

Williams-Beuren Syndrome (WBS)
A contiguous, sporadic gene deletion disorder affecting about 1 in 20,000 live births, caused by unequal crossover (homologous recombination) during meiosis. Haploinsufficiency of the deleted region results in the phenotype: a multisystem phenotype affecting the muscular, nervous and circulatory systems; characteristic facial features; a unique cognitive profile; mental retardation (mean IQ ~60, against a 'normal' mean of ~100); and an outgoing, friendly, 'charming' personality.

Williams-Beuren Syndrome Microdeletion
[Figure: physical map of the ~1.5 Mb WBS microdeletion at 7q11.23 on chromosome 7 (~155 Mb). Genes shown: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14, STAG3 and PMS2L. Repeat blocks A, B and C appear in centromeric (cen), mid and telomeric (tel) copies; block A includes FKBP6T, POM121 and NOLR1, and block C includes GTF2IP, NCF1P and GTF2IRD2P. Also marked: the WBS and SVAS patient deletions, and a 'gap' in the physical map between clones CTA-315H11 and CTB-51J22.]
Eichler E, Clark R and She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:
Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:

Filling a genomic gap in silico
There are two steps to filling the genomic gap:
1. Identify new, overlapping sequence of interest.
2. Characterise the new sequence at the nucleotide and amino acid level.
Doing this the traditional way raises a number of issues:
1. It must be repeated frequently, because information is rapidly added to public databases.
2. It is time-consuming and mundane.
3. It does not always produce results.
4. It produces a huge amount of interrelated data.
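Step 1 can be illustrated with a deliberately simplified sketch. It assumes new sequence is recognised by an exact suffix/prefix overlap with the current end of the known sequence; a practical workflow would more likely rely on similarity searches against public databases, which tolerate mismatches. The names `longest_overlap` and `extend_into_gap` and the `min_len` threshold are hypothetical, not part of Taverna or the original workflows.

```python
from typing import List


def longest_overlap(known: str, candidate: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `known` that equals a prefix of `candidate`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(known), len(candidate))
    for length in range(max_len, min_len - 1, -1):
        if known.endswith(candidate[:length]):
            return length
    return 0


def extend_into_gap(known: str, candidates: List[str], min_len: int = 20) -> str:
    """Greedily append whichever candidate overlaps the current end the most."""
    extended = known.upper()
    remaining = [c.upper() for c in candidates]
    while remaining:
        best = max(remaining, key=lambda c: longest_overlap(extended, c, min_len))
        overlap = longest_overlap(extended, best, min_len)
        if overlap == 0:
            break  # no remaining candidate reaches further into the gap
        extended += best[overlap:]
        remaining.remove(best)
    return extended


# Toy example with a very short overlap threshold.
print(extend_into_gap("AAAACCCCGGGG", ["GGTTTTAAAC", "CCGGGGTTTT"], min_len=4))
# AAAACCCCGGGGTTTTAAAC
```

Greedy extension is the simplest choice here. It stops as soon as nothing overlaps the current end, which echoes the slide's point that the search does not always produce results and has to be re-run as databases grow.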

The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
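Workflows B and C can be pictured as stages chained so that each consumes the previous stage's output, in the dataflow style Taverna uses. The sketch below is a toy illustration, not the Williams workflows themselves: it assumes stage B is reduced to finding forward-strand open reading frames (ATG to the first in-frame stop) and stage C to translating them with the standard genetic code. `find_orfs`, `translate_all` and `run_workflow` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Standard genetic code: codons enumerated with each base position running over "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def find_orfs(seq: str, min_codons: int = 3) -> List[str]:
    """Stage B (toy): forward-strand ORFs from ATG to the first in-frame stop codon."""
    seq = seq.upper()
    orfs: List[str] = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                # Keep ORFs with at least `min_codons` codons before the stop.
                if (pos - start) // 3 >= min_codons:
                    orfs.append(seq[start:pos + 3])
                start = None
    return orfs


def translate(orf: str) -> str:
    """Translate one ORF; unknown codons become 'X' and the trailing stop is dropped."""
    codons = (orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons).rstrip("*")


def translate_all(orfs: List[str]) -> List[str]:
    """Stage C (toy): protein sequence for every ORF produced by stage B."""
    return [translate(orf) for orf in orfs]


def run_workflow(data: Any, stages: List[Callable[[Any], Any]]) -> Any:
    """Feed each stage's output into the next, like linked processors in a workflow."""
    for stage in stages:
        data = stage(data)
    return data


proteins = run_workflow("CCATGGCTTGGTAAGG", [find_orfs, translate_all])
print(proteins)  # ['MAW']
```

Keeping each stage a plain function with one input and one output is what makes the chain easy to re-run unchanged, which is the repeatability the requirements slide asks for.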

The Biological Results
[Figure: the closed gap, showing clones CTA-315H11, CTB-51J22, RP11-622P13, RP11-148M21 and RP11-731K22 and the genes ELN and WBSCR14.]
314,004 bp extension. All nine known genes identified (40/45 exons identified). Genes shown in the new sequence: CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27 and WBSCR28.
Four workflow cycles totalling ~10 hours. The gap was correctly closed and all known features identified.

Case Study: Graves' Disease
Graves' disease is an autoimmune disease that causes hyperthyroidism. Antibodies to the thyrotropin receptor cause constitutive activation of the receptor and increased levels of thyroid hormone.
This was an original myGrid case study. Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004.

Graves' Disease
The experiment:
1. Analyse microarray data to determine genes differentially expressed between Graves' disease patients and healthy controls.
2. Characterise these genes (and any proteins they encode) in an annotation pipeline. Starting from an Affymetrix probeset identifier, extract information about the genes encoded in that region; then, for each gene, extract evidence from other data sources that may support it as a candidate for involvement in the disease.
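The slide does not say which statistical method was used to call differential expression. As a hedged illustration only, the sketch below flags a probeset when both its log2 fold change and its Welch t statistic exceed thresholds. The thresholds, probeset names and expression values are made up, and a real analysis would also compute p-values and correct for multiple testing.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def differentially_expressed(
    patients: Dict[str, List[float]],
    controls: Dict[str, List[float]],
    min_abs_log2_fc: float = 1.0,
    min_abs_t: float = 3.0,
) -> List[Tuple[str, float, float]]:
    """Return (probeset, log2 fold change, t) for probesets passing both thresholds.

    Expression values are assumed to be log2-scaled already, so the difference
    of group means is the log2 fold change.
    """
    hits = []
    for probeset, p_vals in patients.items():
        c_vals = controls[probeset]
        log2_fc = mean(p_vals) - mean(c_vals)
        t = welch_t(p_vals, c_vals)
        if abs(log2_fc) >= min_abs_log2_fc and abs(t) >= min_abs_t:
            hits.append((probeset, log2_fc, t))
    # Strongest evidence first.
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Hypothetical log2 expression values for two probesets.
patients = {"probeset_1": [8.0, 8.2, 7.9, 8.1], "probeset_2": [5.0, 5.1, 4.9, 5.0]}
controls = {"probeset_1": [6.0, 6.1, 5.9, 6.0], "probeset_2": [5.0, 5.2, 4.8, 5.0]}
for probeset, fc, t in differentially_expressed(patients, controls):
    print(probeset, round(fc, 2), round(t, 1))
```

Only `probeset_1` passes here: its patient mean is about 2 log2 units above the control mean, while `probeset_2` has identical group means.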

Annotation Pipeline
Evidence includes: SNPs in coding and non-coding regions; protein products; protein structure and functional features; metabolic pathways; Gene Ontology terms.
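The slide lists the evidence types but not how they are combined. One simple, assumed scheme is to count how many distinct evidence categories support each gene and rank genes by that count. The gene names, source labels and scoring rule below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Set, Tuple


class Evidence(Enum):
    """Evidence categories listed on the slide."""
    CODING_SNP = "SNP in coding region"
    NONCODING_SNP = "SNP in non-coding region"
    PROTEIN_PRODUCT = "protein product"
    PROTEIN_FEATURE = "protein structure or functional feature"
    PATHWAY = "metabolic pathway"
    GO_TERM = "Gene Ontology term"


@dataclass(frozen=True)
class Finding:
    gene: str
    evidence: Evidence
    source: str  # label for the data source that reported it (hypothetical)


def rank_candidates(findings: Iterable[Finding]) -> List[Tuple[str, int]]:
    """Rank genes by how many distinct evidence categories support them."""
    categories: Dict[str, Set[Evidence]] = defaultdict(set)
    for finding in findings:
        categories[finding.gene].add(finding.evidence)
    # Most categories first; ties broken alphabetically by gene name.
    return sorted(((gene, len(ev)) for gene, ev in categories.items()),
                  key=lambda item: (-item[1], item[0]))


findings = [
    Finding("GENE_A", Evidence.CODING_SNP, "source_1"),
    Finding("GENE_A", Evidence.GO_TERM, "source_2"),
    Finding("GENE_A", Evidence.GO_TERM, "source_3"),  # same category, counted once
    Finding("GENE_B", Evidence.PATHWAY, "source_1"),
]
print(rank_candidates(findings))  # [('GENE_A', 2), ('GENE_B', 1)]
```

Counting categories rather than raw findings stops one heavily annotated source from dominating the ranking; a weighted score per category would be an equally reasonable alternative.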