eScience Case Studies Using Taverna
Dr. Georgina Moulton, The University of Manchester (on behalf of the myGrid team)
Traditional Bioinformatics acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Requirements: automation; reliability; repeatability; little programming skill required; works on distributed resources.
Multi-disciplinary: ~37,000 downloads; ranked 210 on SourceForge; users in the US, Singapore, UK, Europe and Australia.
Used for: systems biology; proteomics; gene/protein annotation; microarray data analysis; medical image analysis; heart simulations; high-throughput screening; phenotypical studies; plants, mouse, human; astronomy; aerospace; Dilbert cartoons.
Williams-Beuren Syndrome (WBS)
Contiguous sporadic gene deletion disorder.
1/20,000 live births; caused by unequal crossover (homologous recombination) during meiosis.
Haploinsufficiency of the region results in the phenotype.
Multisystem phenotype: muscular, nervous and circulatory systems.
Characteristic facial features.
Unique cognitive profile.
Mental retardation (IQ, mean ~60; ‘normal’ mean ~100).
Outgoing personality, friendly nature, ‘charming’.
Williams-Beuren Syndrome Microdeletion
[Figure: physical map of the WBS region. Chromosome 7 (~155 Mb) with the ~1.5 Mb microdeletion at 7q11.23. WBS and SVAS patient deletions, clones CTA-315H11 and CTB-51J22, and a ‘Gap’ are marked.]
Genes labelled: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14, STAG3, PMS2L.
Segments labelled: C-cen, C-mid, A-cen, B-mid, B-cen, A-mid, B-tel, A-tel, C-tel. Block A (FKBP6T, POM121, NOLR1), Block C (GTF2IP, NCF1P, GTF2IRD2P), Block B.
References:
Eichler E, Clark R & She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:
Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:
Filling a genomic gap in silico
Two steps to filling the genomic gap:
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Issues with doing this the traditional way:
1. Frequently repeated, because information is rapidly added to public databases
2. Time-consuming and mundane
3. Does not always produce results
4. A huge amount of interrelated data is produced
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
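To make the first step (identifying overlapping sequence) concrete, here is a minimal sketch in Python. The slides do not say which tools or services the workflows call, so this is not the Williams workflow itself: it uses naive exact suffix-prefix matching as a stand-in for a real similarity search, and the function names, the `min_len` threshold and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def longest_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(max_len, min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend_contig(
    contig: str, candidates: Dict[str, str], min_len: int = 20
) -> Tuple[str, Optional[str]]:
    """Extend `contig` with the candidate that overlaps its end the most.

    `candidates` maps a (hypothetical) clone name to its sequence.
    Returns the extended sequence and the name of the clone used,
    or the unchanged contig and None if nothing overlaps.
    """
    best_name, best_k = None, 0
    for name, seq in candidates.items():
        k = longest_overlap(contig, seq, min_len)
        if k > best_k:
            best_name, best_k = name, k
    if best_name is None:
        return contig, None
    # Append only the bases that lie beyond the overlapping region.
    return contig + candidates[best_name][best_k:], best_name


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    contig = "ACGTACGTTAGC"
    clones = {"clone_1": "TTAGCGGATCC", "clone_2": "GGGGGG"}
    print(extend_contig(contig, clones, min_len=4))
```

Repeating `extend_contig` until no candidate overlaps mirrors the idea of running the workflow in cycles, each cycle pushing the known sequence further into the gap. A real search would have to tolerate mismatches and repeats, which exact matching does not.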
The Biological Results
[Figure: map showing clones CTA-315H11, CTB-51J22, RP11-622P13, RP11-148M21 and RP11-731K22, with ELN and WBSCR14 marked; 314,004 bp extension.]
All nine known genes identified (40/45 exons identified): CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28.
Four workflow cycles totalling ~10 hours.
The gap was correctly closed and all known features identified.
Case Study – Graves Disease
Autoimmune disease that causes hyperthyroidism.
Antibodies to the thyrotropin receptor result in constitutive activation of the receptor and increased levels of thyroid hormone.
Original myGrid case study.
Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004.
Graves Disease
The experiment:
1. Analysing microarray data to determine genes differentially expressed between Graves Disease patients and healthy controls
2. Characterising these genes (and any proteins encoded by them) in an annotation pipeline
From an Affymetrix probeset identifier, extract information about the genes encoded in this region.
For each gene, evidence is extracted from other data sources to potentially support it as a candidate for disease involvement.
Annotation Pipeline
Evidence includes: SNPs in coding and non-coding regions; protein products; protein structure and functional features; metabolic pathways; Gene Ontology terms.
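The shape of such an annotation pipeline can be sketched in a few lines of Python. This is only an illustration of the data flow (probeset identifier, then genes, then evidence per gene), not the myGrid workflow: the slides do not name the services involved, so the lookup tables below are in-memory placeholders, and every identifier and value in them is made up.

```python
from typing import Dict, List

# Hypothetical mapping from a probeset identifier to the genes in its region.
PROBESET_TO_GENES: Dict[str, List[str]] = {
    "probeset_0001": ["GENE_A", "GENE_B"],
}

# One placeholder table per evidence type listed on the slide.
# In a real pipeline each table would be a call to an external data source.
EVIDENCE: Dict[str, Dict[str, List[str]]] = {
    "snps": {"GENE_A": ["snp_placeholder_1"]},
    "protein_products": {
        "GENE_A": ["protein_placeholder_A"],
        "GENE_B": ["protein_placeholder_B"],
    },
    "protein_features": {},
    "pathways": {"GENE_B": ["pathway_placeholder_1"]},
    "go_terms": {"GENE_A": ["go_placeholder_1"]},
}


def annotate_probeset(probeset_id: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect evidence from every source for each gene behind a probeset.

    Returns {gene: {evidence_type: [items]}}; a gene with no evidence in a
    given source gets an empty list, and an unknown probeset gives {}.
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for gene in PROBESET_TO_GENES.get(probeset_id, []):
        report[gene] = {
            source: table.get(gene, []) for source, table in EVIDENCE.items()
        }
    return report


if __name__ == "__main__":
    for gene, evidence in annotate_probeset("probeset_0001").items():
        print(gene, evidence)
```

Keeping each evidence source behind the same lookup interface means a new source can be added without changing `annotate_probeset`, which is roughly what a workflow system provides when services are wired together as interchangeable steps.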