Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental.

Presentation transcript:

Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility?
EUAsiaGrid Workshop, 4-6 May 2010
Chanditha Hapuarachchi
Environmental Health Institute, National Environment Agency

Outline
• Work scope
• Analytical approach
• Current limitations
• What is expected from Grid-enabling?

Work scope
• Understanding the molecular epidemiology of vector-borne infectious diseases in Singapore, with a view to using the information in disease control operations
Objectives
• To determine the routes of pathogen migration (mainly Dengue and Chikungunya viruses)
• To understand the evolutionary dynamics of pathogens
• To understand the outbreak potential of pathogens within the country

What phylogenetic inferences are made?
Molecular epidemiology of DENV & CHIKV:
• Phylogenetic relationships (trees): BEAST, MEGA
• Evolutionary dynamics (evolutionary rates, selection pressure, recombination, etc.): BEAST, HYPHY, etc.
• Population dynamics (Bayesian skyline plots): BEAST
• Temporo-spatial distribution of viruses: BEAST, NETWORK
BEAST is a multi-task software package

CHIKV whole genome tree with spatial model
[Figure: time-scaled tree; locations: India, Sri Lanka, Singapore, Malaysia, Indian Ocean Islands, Kenya; axis: Time (yrs)]

Spatial distribution of different lineages of DENV in Singapore

However, BEAST analysis is time-consuming and requires substantial computing power

Limitations of the BEAST approach?
• Size of dataset
  - Length of sequences
  - No. of sequences
  - E.g. analyzing a dataset of ~90 whole genomes of CHIKV (11.8 kb) takes several days, depending on the available computing power

Limitations…
• Analytical parameters
  - A basic analysis takes ~0.3 hrs per million states (Core 2 Duo, 2.1 GHz, 4 GB RAM, >50% CPU)
  - A general run involves a chain of at least 100 million states (≈30 hrs)
  - The duration increases substantially as parameters change
  - Incorporating a spatial model (7 states) alone increases the runtime to ~0.4 hrs per million states
  - The ultimate duration depends on Effective Sample Size (ESS) values (general requirement: >200)
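The per-state timings quoted on this slide scale to total runtimes by simple multiplication. A minimal Python sketch of that arithmetic follows, assuming runtime grows linearly with chain length on the quoted hardware. The function name is illustrative and is not part of BEAST.

```python
def estimated_runtime_hours(chain_length_states: int,
                            hours_per_million_states: float) -> float:
    """Rough wall-clock estimate for one MCMC run.

    Assumes a constant cost per state (linear scaling), which is a
    simplification: real runs vary with model, data and machine load.
    """
    return chain_length_states / 1_000_000 * hours_per_million_states


CHAIN_LENGTH = 100_000_000  # "at least 100 million" states, from the slide

# Per-million-state costs quoted on the slide.
basic = estimated_runtime_hours(CHAIN_LENGTH, 0.3)
spatial = estimated_runtime_hours(CHAIN_LENGTH, 0.4)

print(f"basic analysis:     ~{basic:.0f} h")
print(f"with spatial model: ~{spatial:.0f} h")
```

Under these assumptions, adding the spatial model raises the estimate for a 100-million-state run from about 30 to about 40 hours, before any extension needed to reach ESS > 200.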

BEAST Tracer output window
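Tracer reports an ESS for each logged parameter. To illustrate what that number measures, the sketch below estimates ESS from a trace as N / (1 + 2 Σ ρ_k), summing lag-k autocorrelations until the first non-positive value. This is a common textbook estimator and is an assumption here; it is not claimed to be the exact procedure Tracer uses. The synthetic traces are hypothetical stand-ins for a BEAST log column.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation.
    Illustrative estimator only; not necessarily what Tracer computes.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(1)
n = 5000

# Independent draws: every sample carries new information.
iid = [rng.gauss(0, 1) for _ in range(n)]

# Strongly autocorrelated chain (AR(1), coefficient 0.9): mimics a slowly mixing MCMC trace.
ar1, x = [], 0.0
for _ in range(n):
    x = 0.9 * x + rng.gauss(0, 1)
    ar1.append(x)

ess_iid = effective_sample_size(iid)
ess_ar1 = effective_sample_size(ar1)
print(f"ESS of independent draws: {ess_iid:.0f} of {n}")
print(f"ESS of correlated chain:  {ess_ar1:.0f} of {n}")
```

The correlated chain yields far fewer effective samples than its length suggests, which is why a slowly mixing run may need to be extended well beyond its planned length to pass the ESS > 200 threshold.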

Limitations…
• Number of parallel runs & users
  - ↑ runs & users → ↓ analytical efficiency
  - A single run takes up >50% of CPU power

Why Grid-enable BEAST?
• Enables efficient data analysis
  - parallel runs
  - multiple users
  - expanded datasets
• Enhances data interpretation
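The potential gain from parallel runs can be sketched with a back-of-the-envelope calculation. The Python example below assumes that runs are independent, that each run occupies one slot at full speed, and that queueing and data-transfer overheads are negligible. The 30-hour figure is the per-run estimate from the slides; the number of runs and grid nodes are hypothetical.

```python
import math


def wall_clock_hours(n_runs: int, hours_per_run: float, concurrent_slots: int) -> float:
    """Total wall-clock time when independent runs are processed in batches.

    Idealised: no slowdown from sharing, no scheduling or transfer overhead.
    """
    if concurrent_slots < 1:
        raise ValueError("need at least one slot")
    batches = math.ceil(n_runs / concurrent_slots)
    return batches * hours_per_run


HOURS_PER_RUN = 30.0  # ~100 million states at ~0.3 h per million (from the slides)
N_RUNS = 10           # hypothetical number of analyses waiting to run

# A single run uses >50% of the CPU, so a workstation is treated as one slot.
workstation = wall_clock_hours(N_RUNS, HOURS_PER_RUN, concurrent_slots=1)

# Hypothetical grid allocation of 5 worker nodes, one run per node.
grid = wall_clock_hours(N_RUNS, HOURS_PER_RUN, concurrent_slots=5)

print(f"one workstation: {workstation:.0f} h")
print(f"5 grid nodes:    {grid:.0f} h")
```

The speed-up in this idealised model comes only from running analyses side by side; a single chain is not made faster, so the per-run time remains the lower bound.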

Can Grid-enabling help to improve the existing performance?