Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL
Arunesh Mishra
CMSC 838 Presentation
Authors: Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts (SGI)

CMSC 838T – Presentation

Problem Statement
- Multiple Sequence Alignment (MSA)
  - Basis for phylogenetic analysis: infer homology relationships
  - Building protein families: a conserved region may imply common function
  - Aids in function/structure prediction of new proteins
  - Global MSA: Clustal W
  - Is it computationally expensive? Yes, for 100 sequences.
- Goal: parallelize Clustal W
  - Clustal W takes hours for 100 or more sequences
  - Parallelization is possible for the algorithm
- Contribution of the paper
  - Parallel Clustal W
    - Parallel version of basic Clustal W
  - HT Clustal
    - Parallelize heterogeneous Multiple Sequence Alignment problems
  - MULTICLUSTAL
    - Parallel version of an optimization on Clustal W

Talk Overview
- Overview of talk
  - Motivation
  - Background
    - Sequential Clustal W
  - Parallel Clustal W
  - HT Clustal
    - Problem Statement
    - Optimizations
  - MULTICLUSTAL
    - Sequential Algorithm
    - Optimizations
  - Observations

Introduction
- Sequential Clustal W Algorithm
  - Given N sequences of length M each
  - Pairwise Alignment (PW)
    - Creates an N x N distance matrix based on pairwise alignment scores
    - Evolutionary distance
  - Guide Tree (GT) construction (phylogenetic tree)
    - Uses the neighbor-joining algorithm
  - Progressive Multiple Alignment (PA)
    - Use the guide tree to align closely related pairs of sequences
    - Progressively align the next sequence to the existing alignment
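
The first stage above can be sketched in a few lines. This is only an illustration of how the N x N distance matrix is filled from N(N-1)/2 pairwise comparisons: the function `toy_distance` is a hypothetical stand-in (fraction of mismatched positions between equal-length sequences), not the alignment-based score Clustal W actually computes.

```python
from itertools import combinations


def toy_distance(a, b):
    """Hypothetical stand-in for a pairwise alignment score.

    Returns the fraction of mismatched positions between two
    equal-length sequences (an assumption made for brevity).
    """
    if len(a) != len(b):
        raise ValueError("toy_distance expects equal-length sequences")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def distance_matrix(seqs):
    """Fill a symmetric N x N matrix from the N(N-1)/2 unordered pairs."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = toy_distance(seqs[i], seqs[j])
    return dist


matrix = distance_matrix(["ACGT", "ACGA", "TTTT"])
```

Each pair is independent of every other pair, which is what makes this stage a natural candidate for parallel execution.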

Parallel Clustal W
- Problem Statement
  - Parallelize the sequential Clustal W
- Execution time breakdown
  - [Figure: execution time by stage. PW = pairwise alignment, GT = guide tree, PA = progressive alignment]

Parallel Clustal W
- Pairwise Alignment Stage
  - N(N-1)/2 pairwise alignments
  - Send them randomly to different processors
    - Random, because the jobs have different loads
    - Random assignment also produces a statistically uniform distribution (over a large set of jobs)
  - 1.8X speedup achieved on a 1000 sequence MSA with 8 CPUs
- Guide Tree Stage
  - Parallelize "find closest neighbors from distance matrix"
  - Used in the neighbor-joining algorithm
    - Find the minimum element of each row concurrently
    - Use these row minima to find the minimum element of the matrix
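
Both ideas on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `random_partition` and `closest_pair` are hypothetical names, and a thread pool is used only to show the decomposition (in CPython, threads would not give a real speedup for this CPU-bound work).

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def random_partition(n_seqs, n_workers, seed=0):
    """Shuffle all N(N-1)/2 pairs and deal them out to the workers."""
    pairs = list(itertools.combinations(range(n_seqs), 2))
    random.Random(seed).shuffle(pairs)
    return [pairs[w::n_workers] for w in range(n_workers)]


def row_minimum(indexed_row):
    """Return (distance, i, j) for the smallest off-diagonal entry of row i."""
    i, row = indexed_row
    d, j = min((d, j) for j, d in enumerate(row) if j != i)
    return d, i, j


def closest_pair(dist):
    """Find each row minimum concurrently, then reduce to the matrix minimum."""
    with ThreadPoolExecutor() as pool:
        row_minima = list(pool.map(row_minimum, enumerate(dist)))
    return min(row_minima)


buckets = random_partition(n_seqs=5, n_workers=2)
nearest = closest_pair([[0, 5, 2], [5, 0, 3], [2, 3, 0]])
```

The row-wise split mirrors the two bullets above: the per-row searches are independent, and only the final reduction over the row minima is serial.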

Parallel Clustal W
- Progressive Alignment Stage
  - Values of a function score(I,J) are precomputed in parallel
    - Alignment score of sequences I and J
  - Not much parallelization is possible in the third stage
- Overall Speedup
  - Speedup of 10x for a 600-sequence multiple alignment using 16 CPUs
  - Time reduced from 1 hr 7 minutes to 6.5 minutes
  - Relative scaling is better for larger inputs

HT Clustal
- Problem Statement
  - Calculate large numbers of MSAs of various sizes (independent problems)
  - Such problems are seen in high-throughput (HT) research environments
  - Representative problem (from the paper):
    - Perform independent MSAs over 100 sets of sequences
    - Each set has between 20 and 100 sequences, with an average of 60
    - Average sequence length = 390

HT Clustal: Optimizations
- Basic Idea
  - Each MSA operation (on one set of sequences) is independent of the others
  - Run Clustal W as a uniprocessor job on one MSA problem
  - Launch multiple Clustal W jobs on different processors
- Job Scheduling
  - Jobs have different durations, depending on the sequence set
  - Two scheduling options explored:
    - Schedule dynamically: when a processor is free, schedule an MSA job chosen randomly
    - Schedule dynamically: sequence sets are presorted (based on file size)
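
The two scheduling options can be compared with a small simulation. This is a sketch under assumptions: the function `makespan` is hypothetical, job durations are made-up numbers, and the presorted variant is assumed to run the largest jobs first (the slides say only that jobs are presorted by file size).

```python
import heapq


def makespan(durations, n_cpus):
    """Dynamic scheduling: each job, in list order, goes to the CPU
    that becomes free first. Returns the time the last CPU finishes."""
    free_at = [0] * n_cpus          # min-heap of times at which CPUs free up
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


jobs = [2, 2, 2, 6]                 # illustrative durations, arbitrary units
unsorted_time = makespan(jobs, n_cpus=2)
presorted_time = makespan(sorted(jobs, reverse=True), n_cpus=2)
```

In this toy case the long job arriving last leaves one CPU idle while the other finishes, so the unsorted order takes 8 units and the presorted order takes 6. With many jobs per CPU the gap tends to shrink, because a single late long job matters less.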

HT Clustal: Performance Numbers
- Speedups
  - Almost linear speedups
  - 31x on 32 CPUs for the representative MSA problem
  - 116X on 128 CPUs for a larger test case
    - Solution time reduced from 18.5 hours to 9.5 minutes
  - [Figure: speedup for the example MSA set]
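
The quoted figures can be cross-checked with a short calculation. The helper names `speedup` and `efficiency` are illustrative; the inputs are the numbers stated on the slide.

```python
def speedup(serial_minutes, parallel_minutes):
    """Speedup = serial time / parallel time."""
    return serial_minutes / parallel_minutes


def efficiency(speedup_value, n_cpus):
    """Parallel efficiency = speedup / number of CPUs."""
    return speedup_value / n_cpus


# Larger test case: 18.5 hours down to 9.5 minutes on 128 CPUs.
large_case = speedup(18.5 * 60, 9.5)
large_case_eff = efficiency(large_case, 128)

# Representative problem: 31x on 32 CPUs.
representative_eff = efficiency(31, 32)
```

The ratio 1110 / 9.5 is consistent with the reported 116X. Efficiency works out to roughly 0.97 on 32 CPUs and roughly 0.91 on 128 CPUs, which matches the "almost linear" description.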

HT Clustal: Effect of Presorting
- Effect of presorting
  - Figure shows the effect of presorting for the example MSA set
  - 32 CPUs, 100 sets, ~3 jobs per CPU
  - If the average number of jobs per CPU is < 5, presorting helps
  - For a larger number of jobs per CPU, statistical averaging reduces load imbalance

MULTICLUSTAL
- MULTICLUSTAL Algorithm
  - A Perl script to generate high-quality MSAs with little user intervention
  - Searches for the best combination of Clustal W input parameters
    - To reduce gaps and increase clustering
  - Parameters to vary:
    - Scoring matrices: pairwise and multiple
    - Gap open and extension penalties (pairwise and multiple)
  - Sequential algorithm:

        Until all parameters are sufficiently varied {
            alignment = Run Clustal W ()
            Calculate quality of alignment
            Change parameters
        }

  - Quality of alignment
    - A numerical quantity based on
      - Identical amino acid matches
      - Conservative amino acid substitutions
      - Gap events and amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
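
The sequential loop above can be sketched as a grid search. Everything named here is an assumption for illustration: `align` is a caller-supplied function standing in for a Clustal W run, the parameter names are placeholders rather than real Clustal W options, and `quality` is a toy score (identical gap-free columns minus gap runs), not MULTICLUSTAL's actual formula.

```python
import itertools
import re


def quality(alignment):
    """Toy score: +1 per identical, gap-free column; -1 per run of gaps."""
    identical = sum(
        1 for col in zip(*alignment) if len(set(col)) == 1 and col[0] != "-"
    )
    gap_runs = sum(len(re.findall(r"-+", row)) for row in alignment)
    return identical - gap_runs


def best_parameters(align, grid):
    """Try every parameter combination and keep the best-scoring one."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = quality(align(params))      # one alignment run per combination
        if best is None or score > best[0]:
            best = (score, params)
    return best


def fake_align(params):
    """Hypothetical stand-in for running Clustal W with the given parameters."""
    if params["gap_open"] < 10:
        return ["AC-GT", "ACAGT"]
    return ["ACGT-", "ACAGT"]


grid = {"gap_open": [5, 10, 15], "matrix": ["blosum", "pam"]}
result = best_parameters(fake_align, grid)
```

Because every combination triggers a full alignment run, the cost of the search grows with the product of the parameter ranges.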

MULTICLUSTAL Optimizations
- Optimization on MULTICLUSTAL
  - Run Clustal W once
  - Reuse the tree generated in the PW/GT stages
    - Guide tree calculated only once for multiple runs
    - Results in speedups from 1.5X to 3X
  - Use Parallel Clustal W for each run of Clustal W

Observations
- Parallelizability
  - First (pairwise alignment) and second (guide tree) stages are parallelizable
  - Third stage is mostly sequential, so speedup is limited
- 100 sequence MSAs possible?
  - PIR at NBRF (Georgetown University) takes a maximum of 20 sequences for MSA
  - Speedup improves user response; for 20 sequences a PC would be sufficient
- Probable applications:
  - Research environments?
  - PIR servers?
- Speedup only on shared memory SGI 3000 workstation?