Published by Esther Mosley. Modified over 9 years ago.
1
An Introduction to Taverna
caBIG monthly workspace call and Taverna, 2009-03-19
Franck Tanoh
2
History
Phase 1: 2001-2006, funded by EPSRC
3
History
Phase 2: 2006-2009, funded by OMII-UK
Provide software and support to enable a sustained future for the UK e-Science community and its international collaborators.
4
Background
Nucleic Acids Research (NAR) 2009: 1170 databases
[Figure: EMBL database growth]
5
The Community Problems
- Everything is distributed: data, resources and scientists
- Heterogeneous data
- Very few standards: I/O formats, data representation, annotation
- Everything is a string!
- Integration of data and interoperability of resources is difficult
6
Manual Methods of data analysis
[Image: raw nucleotide sequence listing in GenBank flat-file layout, positions 12181-12840]
- Navigating through hyperlinks
- No explicit methods
- Human error
- Tedious and repetitive
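The sequence on this slide is laid out the way GenBank prints it: a position number, then blocks of ten bases. As a minimal sketch of replacing the copy-and-paste step with an explicit method, the Python below strips the numbering and spacing to recover a plain sequence string and checks that the position numbers run on without gaps. `parse_origin_block` and `gc_content` are hypothetical helpers written for this illustration, not part of Taverna; they assume well-formed lines like the two copied from the slide.

```python
import re

# IUPAC nucleotide codes (lower case), as they appear in GenBank sequence listings.
BASES = re.compile(r"[acgtnrykmswbdhv]*")

def parse_origin_block(text: str) -> tuple[int, str]:
    """Return (first position, sequence) from GenBank ORIGIN-style lines.

    Assumes every line looks like '<position> <block> <block> ...', where the
    position numbers the first base on that line.
    """
    start = None
    chunks: list[str] = []
    length = 0
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        position, chunk = int(fields[0]), "".join(fields[1:]).lower()
        if start is None:
            start = position
        elif position != start + length:
            raise ValueError(f"expected a line starting at {start + length}, got {position}")
        if not BASES.fullmatch(chunk):
            raise ValueError(f"unexpected characters on line {position}")
        chunks.append(chunk)
        length += len(chunk)
    if start is None:
        raise ValueError("no sequence lines found")
    return start, "".join(chunks)

def gc_content(seq: str) -> float:
    """Fraction of g and c among the bases (0.0 for an empty sequence)."""
    return (seq.count("g") + seq.count("c")) / len(seq) if seq else 0.0

example = """
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
"""
start, seq = parse_origin_block(example)
print(start, len(seq))  # 12181 120
```

Once the sequence is a plain string, the same code runs identically on every record, which is the property the manual method lacks.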
7
Implicit methods
8
Huge amounts of data
- Region on chromosome: 200+ genes
- Microarray: 1000+ genes
How do I look at ALL the genes systematically?
9
Workflows
- General technique for describing and enacting a process
- Describes what you want to do, not how you want to do it
- High-level description of the experiment
[Diagram: input sequence → Remove repeats (RepeatMasker web service) → Find genes (GenScan web service) → Find orthologues (BLAST web service); outputs include the predicted genes]
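To make "what, not how" concrete, here is a minimal Python sketch under stated assumptions: the workflow is plain data (an ordered list of named steps) and a tiny enactor feeds each step's output into the next while keeping every intermediate result. The three service functions are hypothetical local stand-ins that only mimic the shape of the RepeatMasker, GenScan and BLAST steps; they do not call those services and their logic is deliberately trivial.

```python
import re
from typing import Any, Callable

# A workflow is plain data: an ordered list of (step name, service) pairs.
Workflow = list[tuple[str, Callable[[Any], Any]]]

def remove_repeats(seq: str) -> str:
    """Hypothetical stand-in for RepeatMasker: mask runs of 6+ a's or t's with 'n'."""
    return re.sub(r"a{6,}|t{6,}", lambda m: "n" * len(m.group(0)), seq)

def find_genes(masked: str) -> list[str]:
    """Hypothetical stand-in for GenScan: unmasked stretches of at least 10 bases."""
    return [part for part in re.split(r"n+", masked) if len(part) >= 10]

def find_orthologues(genes: list[str]) -> dict[str, list[str]]:
    """Hypothetical stand-in for BLAST: a real service would return hits per gene."""
    return {gene: [] for gene in genes}

def run_workflow(workflow: Workflow, data: Any) -> dict[str, Any]:
    """Minimal enactor: pass each step's output to the next, recording all results."""
    results: dict[str, Any] = {}
    for name, service in workflow:
        data = service(data)
        results[name] = data
    return results

gene_workflow: Workflow = [
    ("Remove repeats", remove_repeats),
    ("Find genes", find_genes),
    ("Find orthologues", find_orthologues),
]

results = run_workflow(gene_workflow, "acgtacgtacgt" + "a" * 8 + "ggcattgcatgc")
print(results["Find genes"])  # ['acgtacgtacgt', 'ggcattgcatgc']
```

Because the workflow description is separate from the enactor, swapping a stand-in for a real service call changes one entry in `gene_workflow` and nothing else.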
10
“Taverna enables the interoperation between databases and tools by providing a toolkit for composing, executing and managing workflow experiments”
“Allows you to build and run workflows”
- Access to local and remote resources and analysis tools
- Automation of data flow between services
- Iteration over large data sets, etc.
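Iteration over large data sets is handled by the workflow engine rather than by the user. As we understand Taverna's model, a service that expects one item but receives a list is invoked once per item, and when several inputs are lists they are combined either as a cross product (every combination) or a dot product (items paired by position). The Python below is a sketch of those two strategies only, not Taverna's implementation; `lookup` and the gene and database labels are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Sequence

def cross_product(service: Callable[..., Any], *inputs: Sequence[Any]) -> list[Any]:
    """Invoke the service once for every combination of input items."""
    return [service(*args) for args in product(*inputs)]

def dot_product(service: Callable[..., Any], *inputs: Sequence[Any]) -> list[Any]:
    """Invoke the service on items paired by position; inputs must be equally long."""
    if len({len(seq) for seq in inputs}) > 1:
        raise ValueError("dot product needs inputs of equal length")
    return [service(*args) for args in zip(*inputs)]

def lookup(gene: str, database: str) -> str:
    # Hypothetical single-item service.
    return f"{database}:{gene}"

genes = ["geneA", "geneB"]
dbs = ["KEGG", "UniProt"]
print(cross_product(lookup, genes, dbs))
# ['KEGG:geneA', 'UniProt:geneA', 'KEGG:geneB', 'UniProt:geneB']
print(dot_product(lookup, genes, dbs))
# ['KEGG:geneA', 'UniProt:geneB']
```

The service itself stays single-item; only the strategy decides how many calls are made, which is what lets the same workflow scale from one gene to a thousand.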
11
http://taverna.sf.net
A ‘super client’ to a variety of disparate services on both intranets and the Internet
12
Taverna
- Originally designed to support bioinformatics
- Expanded into new areas: chemoinformatics, health informatics, medical imaging, integrative biology, astronomy
- Open source, and always will be
13
Taverna uses Web Services
[Diagram: client applications exchange SOAP messages with a remote application over HTTP requests and responses; the remote application's interface is described in WSDL]
‘A Web service is a software system designed to support interoperable machine-to-machine interaction over a network’ (W3C)
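The exchange in the diagram can be sketched with the Python standard library: the call is wrapped in a SOAP 1.1 envelope and carried in the body of an HTTP POST. This is an illustration of the protocol, not of how Taverna builds its requests. The operation name, namespace, endpoint URL and accession value below are hypothetical; a real client would take the operation and its parameters from the service's WSDL, which this sketch does not parse. The request is prepared but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_envelope(operation: str, namespace: str, params: dict[str, str]) -> bytes:
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{namespace}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

def build_request(endpoint: str, soap_action: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that carries the envelope."""
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": soap_action},
        method="POST",
    )

# Hypothetical service details, for illustration only.
payload = build_envelope("getSequence", "urn:example:seq", {"accession": "ABC123"})
request = build_request("http://example.org/seq", "urn:example:seq#getSequence", payload)
# urllib.request.urlopen(request) would send it; the HTTP response body is another SOAP envelope.
```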
14
Service discovery
- Free-text search over ‘known’ services.
- Semantic search over a service repository; relies on manual service annotation and submission of those annotations to the repository.
Provenance tracking
- Lineage tracking of result data.
- Automatic semantic annotation of data from service annotations.
- Possible because the workflow engine creates a ‘managed environment’ with an overview of all data movement.
Result visualization
- Common renderers in the base distribution include 3D structures, images and graphs.
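Lineage tracking works because every value passes through the engine. A minimal Python sketch of that idea, assuming a toy engine that stores each value under an id and records which processor produced it from which inputs; `ProvenanceLog` and its methods are hypothetical and far simpler than Taverna's actual provenance model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ProvenanceLog:
    """Toy managed environment that records every value and how it was made."""
    values: dict[int, Any] = field(default_factory=dict)
    # value id -> (processor name, ids of the input values it was derived from)
    derived_from: dict[int, tuple[str, list[int]]] = field(default_factory=dict)

    def _store(self, value: Any) -> int:
        vid = len(self.values)  # ids stay unique because values are never removed
        self.values[vid] = value
        return vid

    def add_input(self, value: Any) -> int:
        """Register a workflow input; it has no recorded derivation."""
        return self._store(value)

    def run(self, name: str, func: Callable[..., Any], *input_ids: int) -> int:
        """Invoke a processor on stored values and record where its output came from."""
        result = func(*(self.values[i] for i in input_ids))
        vid = self._store(result)
        self.derived_from[vid] = (name, list(input_ids))
        return vid

    def lineage(self, vid: int) -> list[str]:
        """Processor runs that led to a value, upstream runs first."""
        order: list[str] = []
        seen: set[int] = set()

        def walk(v: int) -> None:
            if v in seen or v not in self.derived_from:
                return
            seen.add(v)
            name, parents = self.derived_from[v]
            for parent in parents:
                walk(parent)
            order.append(name)

        walk(vid)
        return order

log = ProvenanceLog()
raw = log.add_input("acgtnnnnggcc")
cleaned = log.run("Remove Ns", lambda s: s.replace("n", ""), raw)
gc = log.run("Count GC", lambda s: sum(base in "gc" for base in s), cleaned)
print(log.values[gc], log.lineage(gc))  # 6 ['Remove Ns', 'Count GC']
```

The user never writes logging code: recording happens inside `run`, which is the sense in which a managed environment makes provenance automatic.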
15
What types of service?
- WSDL web services
- BioMart
- R processor
- BioMoby
- Soaplab
- Local Java services
- Beanshell
- Workflows
- CDK toolkit
16
What can you do with Taverna?
- Systems biology
- Proteomics
- Gene/protein annotation
- Microarray data analysis
- Medical image analysis
- Heart simulations
- High-throughput screening
- Genotype/phenotype studies
- Health informatics
- Astronomy
- Chemoinformatics
- Data integration
17
Trypanosomiasis in cattle
Andy Brass, Steve Kemp
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
18
Trypanosomiasis Study
- Understanding phenotype: comparing resistant vs susceptible strains (microarrays)
- Understanding genotype: mapping quantitative traits (classical genetics, QTL)
- Need to access microarray data, genomic sequence information and pathway databases AND integrate the results
19
Trypanosomiasis Study
[Workflow diagram]
Key:
A – Retrieve genes in QTL region
B – Annotate genes with external database Ids
C – Cross-reference Ids with KEGG gene ids
D – Retrieve microarray data from MaxD database
E – For each KEGG gene get the pathways it’s involved in
F – For each pathway get a description of what it does
G – For each KEGG gene get a description of what it does
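The key describes a chain of lookups followed by a join. The Python below sketches that shape under loud assumptions: each lookup (steps B to G) is passed in as a plain function, the data are hypothetical placeholders (`g1`, `kg1`, `pwA` and the expression numbers are invented labels, not real identifiers or results), and nothing here contacts KEGG or MaxD. It produces one row per (gene, pathway) pair, then counts how many QTL genes fall in each pathway.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

def annotate_qtl_genes(
    qtl_genes: Iterable[str],                         # A: genes in the QTL region
    to_kegg: Callable[[str], Optional[str]],          # B-C: cross-reference to a KEGG id
    expression_of: Callable[[str], Optional[float]],  # D: microarray value
    pathways_of: Callable[[str], list[str]],          # E: pathways per KEGG gene
    describe_pathway: Callable[[str], str],           # F: pathway description
    describe_gene: Callable[[str], str],              # G: gene description
) -> list[dict]:
    """Join the per-gene and per-pathway lookups into one row per (gene, pathway)."""
    rows = []
    for gene in qtl_genes:
        kegg_id = to_kegg(gene)
        if kegg_id is None:  # genes without a KEGG cross-reference are skipped
            continue
        for pathway in pathways_of(kegg_id):
            rows.append({
                "gene": gene,
                "kegg_id": kegg_id,
                "gene_description": describe_gene(kegg_id),
                "expression": expression_of(gene),
                "pathway": pathway,
                "pathway_description": describe_pathway(pathway),
            })
    return rows

# Hypothetical placeholder data standing in for the remote databases.
KEGG = {"g1": "kg1", "g2": "kg2"}                   # g3 has no cross-reference
PATHWAYS = {"kg1": ["pwA"], "kg2": ["pwA", "pwB"]}
EXPRESSION = {"g1": 1.8, "g2": -0.4}

rows = annotate_qtl_genes(
    ["g1", "g2", "g3"],
    KEGG.get,
    EXPRESSION.get,
    lambda k: PATHWAYS.get(k, []),
    lambda p: f"description of {p}",
    lambda k: f"description of {k}",
)
hits = Counter(row["pathway"] for row in rows)
print(hits.most_common(1))  # [('pwA', 2)]
```

Every gene in the region goes through every step, so no candidate is dropped by hand before the pathways are compared.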
20
Results
- Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.
- Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate.
21
Was the Workflow Approach Successful?
- Scale of analysis task overwhelms researchers (lots of data) → handled by computers
- User bias and premature filtering of datasets (cherry picking) → all data processed systematically
- Hypothesis-driven approach to data analysis → computers know nothing of hypotheses, so they process the data independently of any prior judgments
- Constant changes in data cause problems with re-analysis → a saved workflow can be re-run at any point, over new data sets
- Implicit methodologies (hyperlinking through web pages) → the methodology is captured in the workflow itself
22
Find a web service: BioCatalogue http://www.biocatalogue.org/
23
Reuse workflows: myExperiment http://www.myexperiment.org/
24
A taste of the future: Taverna 2
- Enactor rewrite with more extensibility points
- Redesigned user interface
- Long-running workflows over large data sets
- Enactor invocation extensions
- Asynchronous processor and data streaming
- Data manager
- Security agent
- ...
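Asynchronous processors with data streaming mean a downstream step can start on the first item while upstream steps are still working on later ones. The Python below sketches that pipelining with `asyncio` queues; it illustrates the general idea only and is not Taverna 2's API. `slow_double` and `slow_increment` are hypothetical services whose `asyncio.sleep` stands in for network latency, and `_DONE` is a sentinel marking the end of a stream.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable

_DONE = object()  # sentinel marking the end of a stream

async def source(items: Iterable[Any], out: asyncio.Queue) -> None:
    """Emit workflow inputs one at a time, then close the stream."""
    for item in items:
        await out.put(item)
    await out.put(_DONE)

async def stage(func: Callable[[Any], Awaitable[Any]], inp: asyncio.Queue, out: asyncio.Queue) -> None:
    """Apply an async service to each item as soon as it arrives."""
    while True:
        item = await inp.get()
        if item is _DONE:
            await out.put(_DONE)  # pass end-of-stream downstream
            return
        await out.put(await func(item))

async def sink(inp: asyncio.Queue) -> list[Any]:
    """Collect results until the stream is closed."""
    results = []
    while True:
        item = await inp.get()
        if item is _DONE:
            return results
        results.append(item)

async def slow_double(x: int) -> int:
    await asyncio.sleep(0.01)  # hypothetical remote call
    return 2 * x

async def slow_increment(x: int) -> int:
    await asyncio.sleep(0.01)  # hypothetical remote call
    return x + 1

async def main(items: Iterable[int]) -> list[int]:
    q1, q2, q3 = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    *_, results = await asyncio.gather(
        source(items, q1),
        stage(slow_double, q1, q2),
        stage(slow_increment, q2, q3),
        sink(q3),
    )
    return results

print(asyncio.run(main([1, 2, 3])))  # [3, 5, 7]
```

Each stage handles its items in order, but the two stages overlap in time, so the second service is busy with item 1 while the first is already processing item 2.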
25
Summary
Taverna workflows:
- Combine local and remote resources and analysis tools
- Automate multi-step processes
- Iterate over large data sets
26
Please see http://www.mygrid.org.uk/wiki/Mygrid/Acknowledgements for the most up-to-date list of acknowledgements.