Analysing African and European cattle with Taverna 2.2. Stuart Owen. Based on the work by Professor Andy Brass and Mohammad Khodadadi.


Analysing African and European cattle with Taverna 2.2
Stuart Owen
Based on the work by:
- Professor Andy Brass and Mohammad Khodadadi, University of Manchester, UK
- Harry Noyes and Steve Kemp, University of Liverpool, UK
BOSC2010, Boston.

Analysing African and European cattle with Taverna 2.2
- A bioinformatics case study demonstrating the use of the Taverna 2 workflow system
- This is a snapshot of some exciting science that is currently in progress

Analysing African and European cattle with Taverna 2.2
- …,000 years separation
- African livestock adaptations:
  - Hardier
  - Better disease resistance
- Potential outcomes:
  - Food security
  - Understanding resistance
  - Understanding environmental conditions (drought, parasites)
  - Understanding diversity

Workflow and phases: MAP, FILTER, ANALYSIS

Workflow and phases
- Input SNP file
- Populate DB with start SNPs and resource version numbers
- LiftOver: maps between the UMD3 and BTA4 cow assemblies
- Exon positions from ENSEMBL
- Find SNPs in exon regions
- PolyPhen to mark “dangerous” SNPs

Little more about the phases …
- Input SNP file: the result of 15-fold average coverage of an entire Boran cow
  - 11.9 million SNPs described
  - Resulting from next-generation sequencing
- All initial data is stored in a database, mapped by a runID to the versions of ENSEMBL, LiftOver and PolyPhen.
- LiftOver provides a mapping between two different reference cow assemblies:
  - UMD3: more accurate assembly
  - BTA4: better annotated and ENSEMBL friendly
  - Store BTA4 position, chromosome and allele in the database
  - Filter out, but store, results where there is a mismatch between the base.
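The "filter out, but store" step and the runID-to-versions mapping can be sketched as below. This is only an illustration, not the workflow's actual schema: the table and column names, the helper functions and the version strings are all hypothetical, and it assumes the mismatch in question is between the bases reported at the UMD3 and BTA4 positions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a toy schema: one row per run, one row per lifted-over SNP.

    All table and column names are hypothetical.
    """
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE run (
            run_id           INTEGER PRIMARY KEY,
            ensembl_version  TEXT,
            liftover_version TEXT,
            polyphen_version TEXT
        );
        CREATE TABLE snp (
            run_id        INTEGER REFERENCES run(run_id),
            chromosome    TEXT,
            bta4_position INTEGER,
            allele        TEXT,
            umd3_base     TEXT,
            bta4_base     TEXT,
            base_mismatch INTEGER  -- 1 = excluded from later phases, but kept
        );
        """
    )
    return db


def record_run(db, ensembl_version, liftover_version, polyphen_version):
    """Register a run and return its run_id, tying results to resource versions."""
    cur = db.execute(
        "INSERT INTO run (ensembl_version, liftover_version, polyphen_version) "
        "VALUES (?, ?, ?)",
        (ensembl_version, liftover_version, polyphen_version),
    )
    return cur.lastrowid


def store_lifted_snp(db, run_id, chromosome, bta4_position, allele,
                     umd3_base, bta4_base):
    """Store every SNP; flag (rather than drop) those whose bases disagree.

    Assumption: a mismatch means the two assemblies report different bases.
    Returns True if the SNP passes the filter.
    """
    mismatch = int(umd3_base.upper() != bta4_base.upper())
    db.execute(
        "INSERT INTO snp VALUES (?, ?, ?, ?, ?, ?, ?)",
        (run_id, chromosome, bta4_position, allele,
         umd3_base, bta4_base, mismatch),
    )
    return not mismatch


def snps_for_next_phase(db, run_id):
    """Return only the SNPs that passed the filter for the given run."""
    return db.execute(
        "SELECT chromosome, bta4_position, allele FROM snp "
        "WHERE run_id = ? AND base_mismatch = 0 "
        "ORDER BY chromosome, bta4_position",
        (run_id,),
    ).fetchall()
```

Keeping the rejected rows alongside a flag, rather than deleting them, means a later run against newer resource versions can be compared with this one.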

… Little more about the phases
- ENSEMBL is used to retrieve annotations about the SNPs:
  - For all the SNPs that have the same base, go over all the cow exons in ENSEMBL and try to match each SNP to an exon (exon start < SNP position < exon end); also store the gene ID, allele, associated gene names and biotype.
  - Filter out, but store, ENSEMBL/BTA4 mismatches.
  - A second phase fetches the consequence according to the BTA4 positions.
  - From this information a file is generated for PolyPhen, containing all SNPs whose consequence is non-synonymous.
- A local instance of PolyPhen is queried using the file generated from the ENSEMBL annotations to produce an indication of how much a SNP changes the protein.
- Outcome: an annotated database of ~20,000 “interesting” SNPs
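The exon-matching rule (exon start < SNP position < exon end) can be sketched as below. This is a minimal illustration rather than the workflow's implementation: the function names and the tuple layout are hypothetical, and the exon coordinates are assumed to have already been fetched from ENSEMBL.

```python
from collections import defaultdict


def build_exon_index(exons):
    """Group exons by chromosome and sort them by start position.

    exons: iterable of (chromosome, start, end, gene_id) tuples
    (a hypothetical layout chosen for this sketch).
    """
    index = defaultdict(list)
    for chromosome, start, end, gene_id in exons:
        index[chromosome].append((start, end, gene_id))
    for chromosome in index:
        index[chromosome].sort()
    return dict(index)


def exons_containing(index, chromosome, position):
    """Return gene IDs of exons with start < position < end.

    The strict inequalities follow the rule as stated on the slide.
    """
    hits = []
    for start, end, gene_id in index.get(chromosome, []):
        if start >= position:
            break  # sorted by start, so no later exon can contain the SNP
        if position < end:
            hits.append(gene_id)
    return hits
```

For example, with exons `("1", 100, 200, "G1")` and `("1", 150, 300, "G2")`, a SNP at position 160 on chromosome 1 matches both genes, while a SNP at position 100 matches neither because the comparison is strict. Depending on the coordinate convention in use, inclusive bounds may be more appropriate. For 11.9 million SNPs a real run would also want an interval tree or a database join rather than this linear scan.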

Packaged as a sharable virtual machine image
[Diagram, shown in two versions. Labels: 11.9 million SNPs; LiftOver; ENSEMBL; PolyPhen; Results. One version reads "50,000 annotated SNPs", the other "20,000 annotated SNPs + provenance".]

Packaged as a sharable virtual machine image
- LiftOver, Taverna, PolyPhen and the workflow are packaged as a virtual machine image.
  - Everything (except ENSEMBL) is run locally
  - A full cow analysis takes 2 days; previous attempts would have taken an estimated 3 months for the PolyPhen phase alone.
- Results and experiment can be distributed and shared as a complete package
  - Re-use
  - Repeatable
  - Reproducible
- Future plans to deploy the image on “The Cloud”

Packaged as a sharable virtual machine image
[Diagram. Labels: ENSEMBL; Annotated DB; a MAP, FILTER, ANALYSIS workflow for each of Boran cow, Sheko cow, N’Dama cow, etc.]

Highlights of new Taverna 2.2 features
- Officially released last Wednesday, July 7th 2010
- Loading and sharing of service sets
- Ability to load and edit workflows that contain services that are offline
- Reporting on the state of the workflow
- Tabular representation of a workflow run
- Retrying and parallelization of service calls
- Consistent representation of the intermediate and workflow results
- Pause/resume/cancel of a running workflow
- Command line tool that allows you to execute workflows outside of the workbench
- Faster, Better, Easier