Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental.


1 Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental Health Institute National Environment Agency

2 Outline
• Work scope
• Analytical approach
• Current limitations
• What is expected from Grid-enabling?

3 Work scope
• Understanding the molecular epidemiology of vector-borne infectious diseases in Singapore, with a view to using the information in disease control operations
Objectives
• To determine the routes of pathogen migration (mainly Dengue and Chikungunya viruses)
• To understand the evolutionary dynamics of pathogens
• To understand the outbreak potential of pathogens within the country

4 What phylogenetic inferences are made?
Molecular epidemiology of DENV & CHIKV
• Phylogenetic relationships (trees) (BEAST, MEGA)
• Evolutionary dynamics (evolutionary rates, selection pressure, recombination, etc.) (BEAST, HYPHY, etc.)
• Population dynamics (Bayesian skyline plots) (BEAST)
• Temporo-spatial distribution of viruses (BEAST, NETWORK)
BEAST is a multi-task software package

5 CHIKV whole genome tree with spatial model
[Figure: tree with locations India, Sri Lanka, Singapore, Malaysia, Indian Ocean Islands, Kenya; axis: Time (yrs)]

6 Spatial distribution of different lineages of DENV in Singapore

7 However… BEAST analysis is time consuming and requires substantial computing power

8 Limitations of the BEAST approach?
• Size of dataset
  - Length of sequences
  - No. of sequences
E.g. analyzing a dataset of ~90 whole genomes of CHIKV (11.8 kb) takes several days, depending on the available computing power

9 Limitations…
• Analytical parameters
  - A basic analysis takes ~0.3 hrs per million states (Core 2 Duo, 2.1 GHz, 4 GB RAM, >50% CPU)
  - A general run involves a chain of at least 100 million states (≈30 hrs)
  - The duration increases substantially with changing parameters
  - Incorporating a spatial model (7 states) alone increases the runtime to ~0.4 hrs per million states
  - The ultimate duration depends on Effective Sample Size (ESS) values (general requirement >200)
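
The runtime figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes runtime scales linearly with chain length, which is a simplification; the function name and constants are illustrative, and only the 0.3 and 0.4 hrs-per-million-states rates and the 100 million chain length come from the slide.

```python
# Rough wall-clock estimate for a BEAST run, using the per-million-state
# rates quoted on the slide (Core 2 Duo, 2.1 GHz, 4 GB RAM).
HOURS_PER_MILLION_BASIC = 0.3    # basic analysis (from the slide)
HOURS_PER_MILLION_SPATIAL = 0.4  # with the 7-state spatial model (from the slide)


def estimated_hours(chain_length_millions, hours_per_million):
    """Return the estimated runtime in hours, assuming linear scaling."""
    return chain_length_millions * hours_per_million


basic = estimated_hours(100, HOURS_PER_MILLION_BASIC)
spatial = estimated_hours(100, HOURS_PER_MILLION_SPATIAL)
print(f"basic: ~{basic:.0f} h, spatial: ~{spatial:.0f} h")
```

Under this assumption, adding the spatial model raises a 100-million-state run from roughly 30 to roughly 40 hours, before any extension needed to reach the ESS threshold.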

10 BEAST Tracer output window
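
Tracer reports an ESS for each sampled parameter, and the ">200" rule of thumb refers to that value. As an illustration of what the number measures, here is a minimal sketch of a common textbook estimator (sample count divided by the integrated autocorrelation time, truncated at the first non-positive autocorrelation). It is not claimed to be Tracer's exact algorithm, and the simulated AR(1) trace is a stand-in for a real BEAST log.

```python
import random


def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)  # constant trace: no autocorrelation to estimate
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


random.seed(1)
n = 2000
independent = [random.gauss(0, 1) for _ in range(n)]

# Simulated, strongly autocorrelated trace (AR(1) with coefficient 0.9),
# standing in for a slowly mixing MCMC parameter.
correlated, x = [], 0.0
for _ in range(n):
    x = 0.9 * x + random.gauss(0, 1)
    correlated.append(x)

print("independent ESS:", round(effective_sample_size(independent)))
print("correlated ESS: ", round(effective_sample_size(correlated)))
```

The autocorrelated trace yields a far smaller ESS than the independent one for the same number of samples, which is why a slowly mixing chain has to run longer to pass the threshold.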

11 Limitations…
• Number of parallel runs & users
  - More runs & users → lower analytical efficiency
  - A single run takes up >50% of CPU power

12 Why Grid-enable BEAST?
• Enables efficient data analysis
  - parallel runs
  - multiple users
  - expanded datasets
• Enhances data interpretation
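
To make the "parallel runs" point concrete, the sketch below compares the wall-clock time for several independent runs on one workstation and on a grid. It is a back-of-the-envelope model, not a measurement: it assumes each run needs its own execution slot, that runs do not slow each other down, and that scheduling overhead is zero. The function name, the slot counts, and the choice of four runs are hypothetical; only the ~30 hrs per run comes from the slides.

```python
import math


def wall_clock_hours(n_runs, hours_per_run, concurrent_slots):
    """Hours to finish n_runs when at most concurrent_slots run at once."""
    batches = math.ceil(n_runs / concurrent_slots)
    return batches * hours_per_run


HOURS_PER_RUN = 30  # ~100 million states at ~0.3 hrs per million (from the slides)
N_RUNS = 4          # hypothetical number of independent runs

# Assumption: a workstation handles one run at a time, since a single run
# already takes >50% of the CPU; a grid gives one slot per run.
workstation = wall_clock_hours(N_RUNS, HOURS_PER_RUN, concurrent_slots=1)
grid = wall_clock_hours(N_RUNS, HOURS_PER_RUN, concurrent_slots=N_RUNS)
print(f"workstation: {workstation} h, grid: {grid} h")
```

Under these assumptions the four runs take 120 hours back to back on the workstation and 30 hours on the grid; real gains would depend on queueing and data-transfer costs.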

13 Can Grid-enabling help to improve the existing performance?

