SC'03, Nov. 15–21, 2003: A Million-Fold Speed Improvement in Genomic Repeats Detection
John W. Romein, Jaap Heringa, Henri E. Bal
Vrije Universiteit, Amsterdam


1. Vrije Universiteit, Faculty of Sciences, Department of Computer Science
Bio-Informatics Group & Computer Systems Group
Amsterdam, the Netherlands

2. repeats in bio sequences
- important to detect:
  - essential for evolution
  - protein structure & function
  - diseases
- hard to detect:
  - any length
  - mutations
  - insertions/deletions
  - different fragment sizes
  - tandem and distant

3. repro delineates repeats
- ☺ sensitive
- two phases:
  1. find top alignments (slow)
  2. find repeats
- replaced phase 1 (n = sequence length):
  - old algorithm: ☹ O(n^4), n < 2,000
  - new algorithm: ☺ O(n^3), n < 60,000
- ☺ 3-level parallel: SIMD, SMP, cluster

4. sidestep: sequence alignment
- superpose two sequences (TATGCAG, TCTGAG)
- match symbols vertically (match: +2, mismatch: -1)
- allow gaps (penalty: -2 - 1 × length)
- maximize the score
- compute the matrix using dynamic programming
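The dynamic-programming step on this slide can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the gap penalty is affine (opening a gap costs 2 and every gap symbol costs 1 more, so a gap of length k scores -(2 + k)) and uses Gotoh's three-matrix recurrence. The function name `global_score` is hypothetical.

```python
NEG_INF = float("-inf")


def global_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap_open: int = 2, gap_extend: int = 1) -> int:
    """Global alignment score with affine gaps (Gotoh's recurrence).

    A gap of length k scores -(gap_open + gap_extend * k), i.e. -2 - 1*k.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score of a[:i] vs b[:j]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with b[j-1] against a gap
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a[i-1] against a gap
    for i in range(1, n + 1):
        H[i][0] = -(gap_open + gap_extend * i)       # a[:i] against one long gap
    for j in range(1, m + 1):
        H[0][j] = -(gap_open + gap_extend * j)       # b[:j] against one long gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # either extend an existing gap or open a new one
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    return H[n][m]
```

With these parameters, `global_score("TATGCAG", "TCTGAG")` returns 6: the best alignment pairs TATG with TCTG (one mismatch), puts the C against a gap, and matches AG, giving 2 - 1 + 2 + 2 - 3 + 2 + 2 = 6.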

5. sidestep: local alignment
- find subsequences that match well
- ignore non-matching symbols before and after the subsequences (by disallowing negative matrix values)
- constructing the actual alignment: O(n^3) time
- computing only the scores: O(n^2) time (see paper)
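A score-only local alignment can be sketched as the Smith-Waterman recurrence with the scoring from the previous slide: every cell is clamped at 0 (the "disallow negative values" rule) and the result is the best cell anywhere in the matrix. This is a hedged sketch, assuming affine gaps scored -(2 + length) as before; `local_score` is a hypothetical name. Because only the score is needed, it keeps one row per matrix: O(n^2) time, O(n) memory.

```python
NEG_INF = float("-inf")


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap_open: int = 2, gap_extend: int = 1) -> int:
    """Best local alignment score (Smith-Waterman with affine gaps).

    Only the score is computed, so one row of each matrix suffices.
    """
    m = len(b)
    H_prev = [0] * (m + 1)          # row i-1 of H
    F_prev = [NEG_INF] * (m + 1)    # row i-1 of F (vertical gaps)
    best = 0
    for x in a:
        H_cur = [0] * (m + 1)
        F_cur = [NEG_INF] * (m + 1)
        e = NEG_INF                 # E of the cell to the left (horizontal gaps)
        for j in range(1, m + 1):
            e = max(e - gap_extend, H_cur[j - 1] - gap_open - gap_extend)
            F_cur[j] = max(F_prev[j] - gap_extend, H_prev[j] - gap_open - gap_extend)
            s = match if x == b[j - 1] else mismatch
            # clamping at 0 lets an alignment start anywhere: that makes it local
            H_cur[j] = max(0, H_prev[j - 1] + s, e, F_cur[j])
            best = max(best, H_cur[j])
        H_prev, F_prev = H_cur, F_cur
    return best
```

For example, `local_score("TATGCAG", "TCTGAG")` gives 6, and `local_score("TTTTACGT", "ACGT")` gives 8 because the leading Ts are simply left out of the local alignment.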

6. summary
- (TATGCAG, TCTGAG) => score 6: takes O(n^2) time
- (TATGCAG, TCTGAG) => the actual alignment: takes O(n^3) time
- matching TATGCAG with TCTGAG gives the same result as matching only their best-matching substrings

7. finding top alignments
(figure: top alignments drawn as red lines)
- split the sequence every possible way
  - align each subsequence pair
  - the best is the first top alignment
- trick: find the next best (top) alignment using the O(n^2) algorithm n times; construct the top alignment using the O(n^3) algorithm
- repeat while avoiding the top alignments found so far
- the user typically wants 5–30 top alignments
- keep an ordered list and do the most promising alignments first: only 3–10% need to be realigned
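The ordered-list idea can be sketched as a lazy best-first search over split points. The assumptions here are ours, not the slides': a top alignment is "avoided" by forbidding its aligned residue pairs in later alignments, gaps use a linear cost of 3 per residue to keep the traceback short, and the names `align_split` and `top_alignments` are hypothetical. Since forbidding pairs can only lower a score, every stored score is an upper bound, so only a candidate that reaches the head of the queue with an outdated score has to be realigned.

```python
import heapq

NEG_INF = float("-inf")


def align_split(seq, k, forbidden, match=2, mismatch=-1, gap=3):
    """Smith-Waterman of prefix seq[:k] against suffix seq[k:] (linear gaps).

    Residue pairs (p, q), given as positions in seq, that are in `forbidden`
    may not be aligned again. Returns (score, list of aligned pairs).
    """
    a, b = seq[:k], seq[k:]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, end = 0, (0, 0)

    def diag(i, j):
        if (i - 1, k + j - 1) in forbidden:
            return NEG_INF
        return H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0, diag(i, j), H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    pairs, (i, j) = [], end
    while H[i][j] > 0:                   # trace back until the score drops to 0
        if H[i][j] == diag(i, j):
            pairs.append((i - 1, k + j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


def top_alignments(seq, count):
    """Return (found, realigned); found holds (score, split, pairs), best first."""
    forbidden, found, realigned = set(), [], 0
    # max-heap via negated scores; `version` = len(found) when the score was computed
    heap = [(-align_split(seq, k, forbidden)[0], k, 0) for k in range(1, len(seq))]
    heapq.heapify(heap)
    while heap and len(found) < count:
        neg_score, k, version = heapq.heappop(heap)
        if version < len(found):         # stale upper bound: realign and requeue
            realigned += 1
            score, _ = align_split(seq, k, forbidden)
            heapq.heappush(heap, (-score, k, len(found)))
        elif neg_score >= 0:             # nothing left that scores above 0
            break
        else:                            # up to date and the best: accept it
            score, pairs = align_split(seq, k, forbidden)
            found.append((score, k, pairs))
            forbidden.update(pairs)
            heapq.heappush(heap, (neg_score, k, version))  # now stale
    return found, realigned
```

For example, `top_alignments("ACGTACGT", 3)` first reports split 4, where the prefix ACGT aligns with the suffix ACGT for a score of 8; the `realigned` counter shows how many candidates actually had to be recomputed.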

8. performance: old vs. new
- sequence: longest known protein (titin)
- speed improvement increases with sequence length

9. parallel alignment
- parallelism within an alignment: ☹ loop-carried dependency
- concurrent alignments: ☹ speculative parallelism, ☺ good performance
- three-level parallelism:
  - SSE/SSE2 multimedia extensions (SIMD)
  - shared-memory MIMD
  - distributed-memory MIMD

10. SIMD parallelism
- multimedia extensions: 4 (SSE) or 8 (SSE2) parallel operations on consecutive 2-byte words, used through compiler intrinsics
- compute 4 (or 8) neighboring matrices concurrently
  - ☹ requires an interleaved memory layout
- use fine-grained hardware for a coarse-grained computation
- applicable to any program that does many alignments
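The lane-per-matrix idea can be illustrated in plain Python, with an inner loop standing in for one vector instruction. This is a sketch under stated assumptions: linear gap costs, hypothetical names (`sw_scalar`, `sw_lockstep`), and Python lists in place of SSE registers; on real hardware the per-lane add and max would map to SSE2 intrinsics such as the saturating `_mm_adds_epi16` and `_mm_max_epi16`. It relies on one observation: cell (i, j) compares the same two residues in every neighboring matrix, so the lanes differ only in which cells lie inside their matrix.

```python
LANES = 8          # SSE2: eight 2-byte words per 128-bit register
INT16_MAX = 32767  # 2-byte scores saturate instead of wrapping around


def sw_scalar(seq, k, match=2, mismatch=-1, gap=3):
    """Reference: best local score of seq[:k] against seq[k:] (linear gaps)."""
    a, b = seq[:k], seq[k:]
    prev, best = [0] * (len(b) + 1), 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            h = max(0, prev[j - 1] + (match if x == y else mismatch),
                    prev[j] - gap, cur[j - 1] - gap)
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best


def sw_lockstep(seq, k0, match=2, mismatch=-1, gap=3):
    """Scores for the LANES neighboring split points k0 .. k0+LANES-1.

    Cell (i, j) pairs seq[i] with seq[j] and belongs to the matrix of split k
    when i < k <= j. Lane l of cell (i, j) is stored at
    H[(i * n + j) * LANES + l], i.e. the lanes are interleaved in memory.
    """
    n = len(seq)
    splits = [k0 + lane for lane in range(LANES)]
    H = [0] * (n * n * LANES)
    best = [0] * LANES

    def get(i, j, lane):  # cells before the sequence start act as the zero border
        return H[(i * n + j) * LANES + lane] if i >= 0 and j >= 0 else 0

    for i in range(n):
        for j in range(n):
            s = match if seq[i] == seq[j] else mismatch  # shared by all lanes
            # on SIMD hardware, this lane loop is a handful of vector instructions
            for lane, k in enumerate(splits):
                if not i < k <= j:
                    continue  # masked lane: cell lies outside matrix k, stays 0
                h = max(0, get(i - 1, j - 1, lane) + s,
                        get(i - 1, j, lane) - gap, get(i, j - 1, lane) - gap)
                h = min(h, INT16_MAX)  # mimic saturating 16-bit arithmetic
                H[(i * n + j) * LANES + lane] = h
                best[lane] = max(best[lane], h)
    return best
```

Comparing `sw_lockstep(seq, k0)` with `sw_scalar(seq, k)` for k from k0 to k0 + 7 checks that each lane reproduces the scalar score of its own split point.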

11. SSE/SSE2 performance
- speedups w.r.t. the new algorithm
- superlinear speedups, thanks to:
  - the MAX operator
  - 8 extra mmx/xmm registers
  - scheduling
- cache-aware alignment: 4–6.5 times faster

12. MIMD parallelism
- SIMD (SSE) parallelism is speculative: if a matrix (alignment) is ‘promising’, its neighbors probably are promising too
- MIMD parallelism: dynamic task scheduling, selecting the most promising tasks from a job queue
  - shared memory (SMP): easy
  - distributed memory: MPI, master/worker
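On shared memory, the dynamic scheduling can be sketched with Python's standard `queue.PriorityQueue` and worker threads. This is an assumption-laden sketch, not the authors' implementation: each task is assumed to carry a precomputed "promise" bound, `run_most_promising_first` is a hypothetical name, and CPython threads only illustrate the scheduling order, not real speedup. A distributed version could follow the same pattern, with an MPI master handing the most promising remaining task to whichever worker asks next.

```python
import queue
import threading


def run_most_promising_first(tasks, work, num_workers=4):
    """Run work(task_id) for every (bound, task_id) in tasks, most promising first.

    Each worker repeatedly takes the task with the highest bound from a shared
    priority queue, so faster workers simply take more tasks.
    Returns (results by task_id, completion order).
    """
    jobs = queue.PriorityQueue()
    for bound, task_id in tasks:
        jobs.put((-bound, task_id))     # PriorityQueue pops the smallest item
    results, order = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                _, task_id = jobs.get_nowait()
            except queue.Empty:
                return                  # queue drained: this worker exits
            result = work(task_id)
            with lock:                  # protect the shared result containers
                results[task_id] = result
                order.append(task_id)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, order
```

With a single worker, tasks complete strictly in order of decreasing bound; with several workers, the most promising tasks are merely started first.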

13. total parallel performance
- SMP (2 CPUs): 2 2 times faster
- cluster (64 × 2 CPUs): 548–889-fold speedup
  - up to 125 times faster than the SSE version on 1 CPU

14. conclusions
- new algorithm >> 100 times faster
  - much more for longer sequences
- parallel: SSE(2), SMP, cluster
  - SSE(2) parallelism yields superlinear speedups
  - 128 CPUs: 548–889-fold speedup
- 1,000,000-fold speed improvement

