SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Hybrid MPI/Pthreads Parallelization of the RAxML Phylogenetics Code Wayne Pfeiffer.

Slides:

Advertisements

Similar presentations

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin.

Advertisements

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Enabling Phylogenetic Research via the CIPRES Science Gateway Wayne Pfeiffer.

A Parallel GPU Version of the Traveling Salesman Problem Molly A. O’Neil, Dan Tamir, and Martin Burtscher* Department of Computer Science.

Probabilistic Modeling of Molecular Evolution Using Excel, AgentSheets, and R Jeff Krause (Shodor)

Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.

 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.

SAN DIEGO SUPERCOMPUTER CENTER Advanced User Support Project Outline October 9th 2008 Ross C. Walker.

1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.

Molecular Evolution Revised 29/12/06

High-Performance Algorithm Engineering for Computational Phylogenetics [B Moret, D Bader] Kexue Liu CMSC 838 Presentation.

Tree Reconstruction.

Maximum Likelihood. Historically the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Its slow uptake by the scientific community.

Bioinformatics and Phylogenetic Analysis

Branch lengths Branch lengths (3 characters): A C A A C C A A C A C C Sum of branch lengths = total number of changes.

Methods for Phylogenetics and Evolutionary analysis Jianpeng Xu University of Nebraska-Omah a.

Probabilistic methods for phylogenetic trees (Part 2)

Building Phylogenies Parsimony 2.

Tree-Building. Methods in Tree Building Phylogenetic trees can be constructed by: clustering method optimality method.

Phylogenetic Tree Construction and Related Problems Bioinformatics.

CIS786, Lecture 8 Usman Roshan Some of the slides are based upon material by Dennis Livesay and David.

Lecture 8 – Searching Tree Space. The Search Tree.

Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)

Characterizing the Phylogenetic Tree-Search Problem Daniel Money And Simon Whelan ~Anusha Sura.

Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,

Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003

RESEARCH STATEMENT: High Performance Computing Bioinformatics Alexandros Stamatakis ICS-FORTH

Creating the CIPRES Science Gateway for Inference of Large Phylogenetic Trees Mark A. Miller San Diego Supercomputer Center.

Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.

Molecular basis of evolution. Goal – to reconstruct the evolutionary history of all organisms in the form of phylogenetic trees. Classical approach: phylogenetic.

1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen

Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.

/ ZZ88 Performance of Parallel Neuronal Models on Triton Cluster Anita Bandrowski, Prithvi Sundararaman, Subhashini Sivagnanam, Kenneth Yoshimoto,

Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.

Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections

Lecture 25 - Phylogeny Based on Chapter 23 - Molecular Evolution Copyright © 2010 Pearson Education Inc.

PERFORMANCE ANALYSIS cont. End-to-End Speedup  Execution time includes communication costs between FPGA and host machine  FPGA consistently outperforms.

Bioinformatics 2011 Molecular Evolution Revised 29/12/06.

Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.

LATA: A Latency and Throughput- Aware Packet Processing System Author: Jilong Kuang and Laxmi Bhuyan Publisher: DAC 2010 Presenter: Chun-Sheng Hsueh Date:

March 26, 2007 Phyloinformatics of Neuraminidase at Micro and Macro Levels using Grid-enabled HPC Technologies B. Schmidt (UNSW) D.T. Singh (Genvea Biosciences)

Introduction to Phylogenetics

MOLECULAR PHYLOGENETICS Four main families of molecular phylogenetic methods :  Parsimony  Distance methods  Maximum likelihood methods  Bayesian methods.

Calculating branch lengths from distances. ABC A B C----- a b c.

INDIANAUNIVERSITYINDIANAUNIVERSITY 1 Parallel implementation and performance of fastDNAml - a program for maximum likelihood phylogenetic inference Craig.

More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.

Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood Alexandros Stamatakis LRR TU München Contact:

Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.

Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.

Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox

Efficiency of small size tasks calculation in grid clusters using parallel processing.. Olgerts Belmanis Jānis Kūliņš RTU ETF Riga Technical University.

Phylogenetic Trees - Parsimony Tutorial #13

Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?

The Big Issues in Phylogenetic Reconstruction Randy Linder Integrative Biology, University of Texas

CS 395T: Computational phylogenetics January 18, 2006 Tandy Warnow.

Molecular Evolution. Study of how genes and proteins evolve and how are organisms related based on their DNA sequence Molecular evolution therefore is.

Bioinformatics Overview

Introduction to Bioinformatics Resources for DNA Barcoding

Phylogenetic basis of systematics

Case Study 2- Parallel Breadth-First Search Using OpenMP

Inferring a phylogeny is an estimation procedure.

Phylogenetic Inference

CSE 280A: Advanced Topics in Computational Molecular Biology

Methods of molecular phylogeny

Molecular Evolution.

Summary and Recommendations

BNFO 602 Phylogenetics Usman Roshan.

CS 581 Tandy Warnow.

Lecture 8 – Searching Tree Space

High-Throughput Identification and Quantification of Candida Species Using High Resolution Derivative Melt Analysis of Panfungal Amplicons Tasneem Mandviwala,

Summary and Recommendations

Presentation transcript:

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Hybrid MPI/Pthreads Parallelization of the RAxML Phylogenetics Code Wayne Pfeiffer (SDSC/UCSD) & Alexandros Stamatakis (TUM) December 3, 2009

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO What was done? Why is it important? Who cares? Hybrid MPI/Pthreads version of RAxML phylogenetics code was developed MPI code was added to previous Pthreads-only code This allows multiple multi-core nodes in a cluster to be used in a single run Typical problems now run well on 10 nodes (80 cores) of Dash (a new cluster at SDSC) as compared to only on one node (8 cores) before Hybrid code is being made available on TeraGrid via CIPRES portal Work was done as part of ASTA project supporting Mark Miller of SDSC, who oversees the portal This will give biologists easy access to the code

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO What does a phylogenetics code do? Homo_sapiensAAGCTTCACCGGCGCAGTCATTCTCATAAT... PanAAGCTTCACCGGCGCAATTATCCTCATAAT... GorillaAAGCTTCACCGGCGCAGTTGTTCTTATAAT... Pongo AAGCTTCACCGGCGCAACCACCCTCATGAT... Hylobates AAGCTTTACAGGTGCAACCGTCCTCATAAT... / Homo_sapiens | | Pan + | / Gorilla | | \ / Pongo \ \ Hylobates It starts from a multiple sequence alignment (matrix of taxa versus characters) & generates a phylogeny (usually a tree of taxa)

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO A little more about molecular phylogenetics Multiple sequence alignment Specified by DNA bases, RNA bases, or amino acids Obtained by separate analysis of multiple sequences Possible changes to a sequence Substitution (point mutation or SNP: considered in RAxML) Insertion & deletion (also handled by RAxML) Structural variations (e.g., duplication, inversion, & translocation) Recombination & horizontal gene transfer (important in bacteria) Common methods, typically heuristic & based on models of molecular evolution Distance Parsimony Maximum likelihood (used by RAxML) Bayesian

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO RAxML is a maximum likelihood (ML) code with parallelism at various algorithmic levels Heuristic searching within a tree can exploit fine-grained parallelism using Pthreads A variant of subtree pruning and grafting (SPR) called lazy subtree rearrangement (LSR) Three types of searches across trees can exploit coarse- grained parallelism using MPI Multiple ML searches on the same data set starting from different initial trees to explore solution space better Multiple bootstrap searches on resampled data sets to obtain confidence values on interior branches of tree Comprehensive analysis that combines the two previous analyses

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Comprehensive analysis combines many rapid bootstraps followed by a full ML search Complete, publishable analysis in a single run Four stages (with typical numbers of searches) 100 rapid bootstrap searches 20 fast ML searches 10 slow ML searches 1 thorough ML search Coarse-grained parallelism via MPI In first three stages, but decreasing with each stage Fine-grained parallelism via Pthreads Available at all stages Tradeoff in effectiveness between MPI and Pthreads Typically 10 or 20 MPI processes max Optimal number of Pthreads increasing with number of patterns, but limited to number of cores per node

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Some noteworthy points regarding the MPI parallel implementation A thorough search is done for every MPI process (instead of just a single search as in Pthreads-only code) Increases run time only a little, because load is reasonably balanced Often leads to better quality solution Only two MPI calls are made MPI_Barrier after bootstraps MPI_Bcast after thorough searches to select best one for output Treatment of random numbers is reproducible At least for a given number of MPI processes

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO 5 benchmark problems & 4 benchmark computers were used Benchmark problems Benchmark computers (all with quad-core x64 processors) Abe at NCSA8-way nodes with 2.33-GHz Intel Clovertowns Dash at SDSC8-way nodes with 2.4-GHz Intel Nehalems Ranger at TACC16-way nodes with 2.3-GHz AMD Barcelonas Triton PDAF at SDSC32-way nodes with 2.5-GHz AMD Shanghais

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO For problem with 1.8k patterns, RAxML achieves speedup of 35 on 80 cores of Dash using 10 MPI processes & 8 Pthreads

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Parallel efficiency plot of same data clarifies performance differences; on ≥ 8 cores, 4 or 8 Pthreads is optimal; on 8 cores, 4 Pthreads is 1.3x faster than 8 Pthreads

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO First three stages scale well with MPI, but last (thorough search) stage does not; that limits scalability

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO For problem with 19k patterns, performance on Dash is best using all 8 Pthreads; scaling is poorer than for problem with 1.8k patterns

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO For problem with 19k patterns, scaling is better on Triton PDAF than on Dash using all 32 Pthreads available

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO For problem with 19k patterns, Triton PDAF is faster than other computers at high core counts

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO With more bootstraps (per bootstopping): speedup is better & parallel efficiency ≥ 0.44 on 80 cores; optimal number of threads drops; & run time for longest problem goes from > 4 d to < 1.8 h

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO The hybrid code provides three notable benefits for comprehensive analyses Multiple computer nodes can be used in a single run For problem with 1.8k patterns, speedup on 10 nodes (80 cores) of Dash is 6.5 compared to one node (using 8 Pthreads in both cases) Speedup is 35 on 80 cores compared to serial case Number of Pthreads per node can be adjusted for optimal efficiency For same problem, using 2 MPI processes & 4 Pthreads on a single node of Dash is 1.3x faster than using 8 Pthreads alone Additional thorough searches often lead to better solution

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Some final points Newest computers are fastest Dash is fastest for 4 out of 5 problems considered Triton PDAF is fastest for problem with 19k patterns because it allows 32 Pthreads Hybrid code is available in RAxML and later Available now from RAxML Web site ( Available soon on TeraGrid via CIPRES portal (