
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL
Authors: Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts (SGI)
Presented by Arunesh Mishra, CMSC 838 Presentation

Problem Statement
- Multiple Sequence Alignment (MSA)
  - Basis for phylogenetic analysis: infer homology relationships
  - Building protein families: a conserved region may imply a common function
  - Aids function/structure prediction of new proteins
  - Global MSA: Clustal W
  - Is it computationally expensive? Yes, for 100 sequences.
- Goal: parallelize Clustal W
  - Clustal W takes hours for 100 or more sequences
  - The algorithm admits parallelization
- Contributions of the paper
  - Parallel Clustal W
    - Parallel version of basic Clustal W
  - HT Clustal
    - Parallelizes heterogeneous Multiple Sequence Alignment problems (many independent MSAs of different sizes)
  - MULTICLUSTAL
    - Parallel version of an optimization built on Clustal W

Talk Overview
- Motivation
- Background
  - Sequential Clustal W
- Parallel Clustal W
- HT Clustal
  - Problem statement
  - Optimizations
- MULTICLUSTAL
  - Sequential algorithm
  - Optimizations
- Observations

Introduction: Sequential Clustal W Algorithm
- Given N sequences, each of length M
- Pairwise Alignment (PW)
  - Builds an N x N distance matrix from the pairwise alignment scores
  - Entries approximate evolutionary distance
- Guide Tree (GT) construction (phylogenetic tree)
  - Uses the neighbor-joining algorithm
- Progressive Multiple Alignment (PA)
  - Uses the guide tree to align the most closely related pairs of sequences first
  - Progressively aligns the next sequence to the existing alignment
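To make the PW stage concrete, here is a minimal Python sketch that fills the N x N distance matrix. It assumes a plain Needleman-Wunsch global alignment with toy match/mismatch/gap scores and a distance of 1 minus the fractional identity over gap-free aligned positions; these scoring choices and the names `global_align`, `distance` and `distance_matrix` are illustrative assumptions, not Clustal W's actual scoring.

```python
from itertools import combinations

# Illustrative scores only; Clustal W itself uses substitution matrices
# (e.g. BLOSUM, PAM, Gonnet) and affine gap penalties.
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a: str, b: str) -> float:
    """1 - fractional identity over aligned, gap-free positions."""
    x, y = global_align(a, b)
    aligned = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not aligned:
        return 1.0
    identical = sum(p == q for p, q in aligned)
    return 1.0 - identical / len(aligned)


def distance_matrix(seqs: list) -> list:
    """PW stage: fill the symmetric N x N matrix from the N(N-1)/2 pairs."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = distance(seqs[i], seqs[j])
    return d
```

Each pair costs O(M^2) time in this sketch, so the stage is O(N^2 M^2) overall, and the N(N-1)/2 pairs are independent of one another, which is what makes this stage easy to parallelize.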

Parallel Clustal W
- Problem statement
  - Parallelize the sequential Clustal W
- Execution time breakdown by stage [figure not reproduced]
  - PW = pairwise alignment, GT = guide tree, PA = progressive alignment

Parallel Clustal W
- Pairwise alignment stage
  - N(N-1)/2 independent pairwise alignments
  - Pairs are sent to processors in random order
    - Random, because the jobs have different loads
    - Over a large set of jobs, random assignment yields a statistically uniform load distribution
  - 1.8X speedup achieved on a 1000-sequence MSA with 8 CPUs
- Guide tree stage
  - Parallelize "find the closest neighbors in the distance matrix", the core step of the neighbor-joining algorithm
    - Find the minimum element of each row concurrently
    - Reduce the row minima to the minimum element of the matrix
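The two parallel patterns on this slide can be sketched in Python as follows. `assign_pairs_randomly` shuffles the pair jobs and deals them out round-robin, which is one possible reading of "sent to processors in random order"; `closest_pair` computes per-row minima concurrently and then reduces them. Both names are hypothetical, the matrix stands in for whatever criterion the neighbor-joining step minimizes, and because of CPython's global interpreter lock a `ProcessPoolExecutor` would be needed for real speedup on pure-Python work.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def assign_pairs_randomly(n_seqs: int, n_procs: int, seed: int = 0) -> list:
    """Shuffle all N(N-1)/2 pair jobs, then deal them out round-robin.

    Job cost varies with sequence lengths, so a random order mixes long
    and short jobs on every processor.
    """
    pairs = list(combinations(range(n_seqs), 2))
    random.Random(seed).shuffle(pairs)
    buckets = [[] for _ in range(n_procs)]
    for k, pair in enumerate(pairs):
        buckets[k % n_procs].append(pair)
    return buckets


def _row_minimum(indexed_row):
    """Smallest off-diagonal entry of one row, returned as (value, row, column)."""
    i, row = indexed_row
    j, value = min(
        ((j, v) for j, v in enumerate(row) if j != i), key=lambda jv: jv[1]
    )
    return value, i, j


def closest_pair(matrix: list, workers: int = 4) -> tuple:
    """GT step: row minima computed in parallel, then one reduction over them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        row_minima = list(pool.map(_row_minimum, enumerate(matrix)))
    value, i, j = min(row_minima)
    return i, j, value
```

The reduction over row minima touches only N values, so almost all of the O(N^2) scan is done in the concurrent per-row step.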

Parallel Clustal W
- Progressive alignment stage
  - The function score(I, J), the alignment score of sequences I and J, is precomputed in parallel
  - Little else in this third stage can be parallelized
- Overall speedup
  - 10X speedup for a 600-sequence multiple alignment using 16 CPUs
  - Time reduced from 1 hr 7 min to 6.5 min
  - Relative scaling is better for larger inputs
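As a quick check on these figures, assuming 1 hr 7 min (67 min) is the single-CPU time, the standard speedup and parallel-efficiency definitions give:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

S_{16} = \frac{67\ \text{min}}{6.5\ \text{min}} \approx 10.3, \qquad
E_{16} = \frac{10.3}{16} \approx 0.64
```

That is, roughly two thirds of ideal linear scaling on 16 CPUs.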

HT Clustal
- Problem statement
  - Compute large numbers of independent MSAs of various sizes
  - Such problems arise in high-throughput (HT) research environments
  - Representative problem (from the paper):
    - Perform independent MSAs over 100 sets of sequences
    - Each set has between 20 and 100 sequences, 60 on average
    - Average sequence length: 390

HT Clustal: Optimizations
- Basic idea
  - Each MSA (on one set of sequences) is independent of the others
  - Run Clustal W as a uniprocessor job on one MSA problem
  - Launch multiple Clustal W jobs on different processors
- Job scheduling
  - Jobs have different durations, depending on the sequence set
  - Two dynamic scheduling options were explored (whenever a processor is free, it is given the next MSA job):
    - Jobs taken in random order
    - Jobs presorted by input file size
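The dynamic scheduling policy can be simulated in a few lines of Python. This sketch assumes each job's run time is known in advance (the slide's proxy for it is input file size) and that presorting means largest first; `makespan` is a hypothetical helper, not part of HT Clustal.

```python
import heapq
import random


def makespan(job_times, n_cpus: int, presort: bool = False, seed: int = 0) -> float:
    """Greedy dynamic scheduling: each job goes to whichever CPU frees up first.

    With presort=True jobs are ordered largest first (an assumption about
    the sort direction); otherwise they are taken in a random order.
    """
    jobs = list(job_times)
    if presort:
        jobs.sort(reverse=True)
    else:
        random.Random(seed).shuffle(jobs)
    free_at = [0.0] * n_cpus  # time at which each CPU next becomes idle
    heapq.heapify(free_at)
    for t in jobs:
        start = heapq.heappop(free_at)  # earliest idle CPU picks up the job
        heapq.heappush(free_at, start + t)
    return max(free_at)
```

Largest-first ordering is a heuristic, not a guarantee of optimality: `makespan([3, 3, 2, 2, 2], 2, presort=True)` returns 7.0, while splitting the jobs as {3, 3} and {2, 2, 2} would finish at 6.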

HT Clustal: Performance Numbers
- Speedups
  - Almost linear
  - 31X on 32 CPUs for the representative MSA problem
  - 116X on 128 CPUs for a larger test case
    - Solution time reduced from 18.5 hours to 9.5 minutes
  - Speedup for the example MSA set: [figure not reproduced]
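Converting the reported times to a common unit gives the speedup and efficiency behind these numbers, taking 18.5 hours as the single-CPU time and using S_p = T_1 / T_p and E_p = S_p / p:

```latex
T_1 = 18.5\ \text{h} = 1110\ \text{min}, \qquad
S_{128} = \frac{1110\ \text{min}}{9.5\ \text{min}} \approx 116.8, \qquad
E_{128} = \frac{S_{128}}{128} \approx 0.91

E_{32} = \frac{31}{32} \approx 0.97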

HT Clustal: Effect of Presorting
- Effect of presorting
  - The figure shows the effect of presorting for the example MSA set: 32 CPUs, 100 sets, ~3 jobs per CPU [figure not reproduced]
  - Presorting helps when the average number of jobs per CPU is below 5
  - With more jobs per CPU, statistical averaging alone reduces the load imbalance
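A standard bound from scheduling theory (Graham's list-scheduling analysis, not a result from this paper) suggests why presorting matters only when there are few jobs per CPU. With m CPUs, job times p_j, and greedy dynamic scheduling in any order, the last job to finish started when every CPU was busy, so:

```latex
C_{\max} \le \frac{1}{m}\sum_{j} p_j + p_{\max}
\qquad\text{and, writing } \sum_j p_j = k\,m\,\bar{p},\qquad
\frac{C_{\max}}{\frac{1}{m}\sum_j p_j} \le 1 + \frac{p_{\max}}{k\,\bar{p}}
```

Here k is the average number of jobs per CPU and \bar{p} the mean job time, so the imbalance term shrinks like 1/k as jobs per CPU grow. Sorting largest first (LPT) tightens the worst case to C_max <= (4/3 - 1/(3m)) C*, where C* is the optimal makespan; that improvement matters most when k is small, as in the ~3 jobs per CPU case above.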

MULTICLUSTAL
- MULTICLUSTAL algorithm
  - A Perl script that generates high-quality MSAs with little user intervention
  - Searches for the best combination of Clustal W input parameters
    - Goal: reduce gaps and increase clustering
  - Parameters varied:
    - Scoring matrices (pairwise and multiple)
    - Gap opening and extension penalties (pairwise and multiple)
  - Sequential algorithm:
        until all parameters have been sufficiently varied:
            alignment = run Clustal W()
            calculate the quality of the alignment
            change the parameters
  - Quality of alignment: a numerical score based on
    - Identical amino acid matches
    - Conservative amino acid substitutions
    - Gap events and amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
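A minimal Python sketch of this search loop follows. It assumes a caller-supplied `align(sequences, **params)` function standing in for one full Clustal W run, a small hypothetical `PARAM_GRID`, and a toy `quality` score that counts identical residue pairs per column and penalizes gap events; the conservative-substitution and amino-acid-island terms of the real MULTICLUSTAL score are omitted.

```python
from itertools import product

# Hypothetical search space; the names and values are illustrative only.
PARAM_GRID = {
    "matrix": ["blosum", "pam", "gonnet"],
    "gap_open": [5.0, 10.0, 15.0],
    "gap_ext": [0.1, 0.2],
}


def quality(alignment: list) -> int:
    """Toy score: +1 per identical residue pair within a column,
    -1 per gap event (a maximal run of '-' in one row)."""
    score = 0
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        for residue in set(residues):
            k = residues.count(residue)
            score += k * (k - 1) // 2
    for row in alignment:
        score -= sum(
            1 for i, c in enumerate(row) if c == "-" and (i == 0 or row[i - 1] != "-")
        )
    return score


def best_alignment(sequences, align, grid=PARAM_GRID):
    """Run `align` once per parameter combination and keep the best-scoring result.

    `align(sequences, **params)` is a hypothetical stand-in for one full
    Clustal W run (PW, GT and PA stages).
    """
    keys = list(grid)
    best = None
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        aln = align(sequences, **params)
        q = quality(aln)
        if best is None or q > best[0]:
            best = (q, params, aln)
    return best
```

Every grid point repeats all three Clustal W stages, so the cost grows with the product of the number of values tried per parameter.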

MULTICLUSTAL Optimizations
- Optimizations on MULTICLUSTAL
  - Run Clustal W once in full
  - Reuse the tree generated in its PW/GT stages
    - The guide tree is calculated only once for multiple runs
    - Results in speedups from 1.5X to 3X
  - Use Parallel Clustal W for each run of Clustal W
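Hoisting the guide tree out of the search loop can be sketched in Python as below. The sketch assumes the pairwise-stage parameters are held fixed during the sweep (otherwise the tree would have to be rebuilt), and `build_guide_tree`, `progressive_align` and `quality` are hypothetical caller-supplied callables standing in for the PW/GT stages, the PA stage, and the alignment score.

```python
from itertools import product


def multiclustal_reuse_tree(sequences, build_guide_tree, progressive_align, quality, grid):
    """Sweep only the progressive-alignment parameters, reusing one guide tree."""
    tree = build_guide_tree(sequences)  # PW + GT stages: run once
    keys = list(grid)
    best = None
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        aln = progressive_align(sequences, tree, **params)  # PA stage only
        q = quality(aln)
        if best is None or q > best[0]:
            best = (q, params, aln)
    return best
```

Only the PA stage runs once per parameter combination, while the PW/GT cost is paid a single time.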

Observations
- Parallelizability
  - The first (pairwise alignment) and second (guide tree) stages are parallelizable
  - The third stage is mostly sequential, which limits speedup
- Are 100-sequence MSAs possible in practice?
  - PIR at NBRF (Georgetown University) accepts at most 20 sequences per MSA
  - Speedup improves user response time, but for 20 sequences a PC would be sufficient
- Probable applications:
  - Research environments?
  - PIR servers?
- Speedup only on the shared-memory SGI 3000 workstation?
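Amdahl's law makes the "speedup limited" point quantitative: if a fraction f of the serial run time parallelizes perfectly and the rest does not, the speedup on n CPUs is at most 1 / ((1 - f) + f/n). The Python helper below is a generic sketch; the slides do not give the actual stage fractions, so any f plugged into it is an assumption.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction  # e.g. a mostly sequential PA stage
    return 1.0 / (serial + parallel_fraction / n_cpus)
```

For instance, with a hypothetical f = 0.9 the bound stays below 10X no matter how many CPUs are added.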

