Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA
Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller-Wittig

Presenter: Erkan Okuyan

Motivation
- Next-generation sequencers (Illumina, 454, SOLiD) produce massive amounts of data: short reads with a high error rate.
- Assembly is sensitive to errors in the reads, so sequencing errors need to be corrected first.
- Because of the size of the data, error correction is computationally demanding.

Definitions
- Let $R = \{r_1, r_2, \ldots, r_k\}$ be a set of $k$ reads with $|r_i| = L$.
- Let $r_i \in \{A, C, G, T\}^L$ for all $1 \le i \le k$.
- Let $m$ (multiplicity) and $l$ (length) satisfy $m > 1$ and $l < L$.

Definition 1 (Solid and Weak): An l-tuple (a DNA string of length $l$) is called solid with respect to $R$ and $m$ if it is a substring of at least $m$ reads in $R$, and weak otherwise.
- An l-tuple that occurs in at least $m$ reads is probably correct.

Definition 2 (Spectrum): The spectrum of $R$ with respect to $m$ and $l$, denoted $T_{m,l}(R)$, is the set of all solid l-tuples with respect to $R$ and $m$.
- The spectrum $T_{m,l}(R)$ serves as an approximation of the set of all correct l-tuples.
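Definition 1 can be checked directly by brute force. The following is a minimal sketch in plain C (CUDA C kernels are written in a C dialect), assuming each read is stored as a string of exactly $L$ bases; the names `is_solid` and `contains_tuple` are hypothetical. Each read is counted at most once, matching "a substring of at least $m$ reads".

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the read (length L) contains the tuple (length l) anywhere. */
static int contains_tuple(const char *read, int L, const char *tuple, int l)
{
    for (int i = 0; i + l <= L; ++i)
        if (memcmp(read + i, tuple, (size_t)l) == 0)
            return 1;
    return 0;
}

/* Definition 1: a tuple is solid w.r.t. R and m if it is a substring of at
 * least m of the k reads, and weak otherwise. A read that contains the tuple
 * several times still counts only once. */
int is_solid(const char *const *reads, int k, int L,
             const char *tuple, int l, int m)
{
    assert(m > 1 && l < L);            /* constraints from the definitions */
    int count = 0;
    for (int r = 0; r < k && count < m; ++r)   /* stop once m is reached */
        count += contains_tuple(reads[r], L, tuple, l);
    return count >= m;
}
```

This scan costs $O(k \cdot L \cdot l)$ per query, which is why the spectrum is built once up front and then queried through a fast membership structure.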

Definitions (continued; $R$, $m$ and $l$ as above)

Definition 3 (T-string): A DNA string $s$ is called a $T_{m,l}(R)$-string if every l-tuple in $s$ is an element of $T_{m,l}(R)$.

Definition 4 (Spectral Alignment Problem, SAP): Given a DNA string $s$ and the spectrum $T_{m,l}(R)$, find a $T_{m,l}(R)$-string $s^*$ that minimizes the distance $d(s, s^*)$ over all $T_{m,l}(R)$-strings.
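To make Definitions 3 and 4 concrete, here is a hypothetical C sketch that tests whether a string is a T-string and solves a deliberately restricted SAP. It assumes $d$ is the Hamming distance and only searches strings at distance 0 or 1 from $s$ (the single-mutation idea used later for fixing reads). The `Spectrum` type, a plain array of solid tuples, is an assumption for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical spectrum: a plain array of solid l-tuples. */
typedef struct {
    const char *const *tuples;
    int count;
    int l;
} Spectrum;

static int in_spectrum(const Spectrum *T, const char *t)
{
    for (int i = 0; i < T->count; ++i)
        if (memcmp(T->tuples[i], t, (size_t)T->l) == 0)
            return 1;
    return 0;
}

/* Definition 3: s (length n) is a T-string if every l-tuple of s is in T. */
int is_T_string(const Spectrum *T, const char *s, int n)
{
    for (int i = 0; i + T->l <= n; ++i)
        if (!in_spectrum(T, s + i))
            return 0;
    return 1;
}

/* Restricted SAP with Hamming distance: only candidates at distance 0 or 1
 * are tried. On success, out (n + 1 bytes) holds s* and d(s, s*) is
 * returned; -1 means no T-string lies within distance 1. */
int sap_within_one(const Spectrum *T, const char *s, int n, char *out)
{
    static const char bases[4] = {'A', 'C', 'G', 'T'};
    memcpy(out, s, (size_t)n);
    out[n] = '\0';
    if (is_T_string(T, out, n))
        return 0;
    for (int pos = 0; pos < n; ++pos) {
        for (int b = 0; b < 4; ++b) {
            if (bases[b] == s[pos])
                continue;
            out[pos] = bases[b];
            if (is_T_string(T, out, n))
                return 1;
        }
        out[pos] = s[pos];             /* undo before the next position */
    }
    return -1;
}
```

For example, with the spectrum {ACG, CGT, GTA} and $l = 3$, the string ACCTA is not a T-string, and the single substitution C→G at position 2 yields the T-string ACGTA.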

CUDA (Compute Unified Device Architecture)
Execution alternates between host and device:
  Serial Code (host)
  Parallel Kernel (device): KernelA<<< nBlk, nTid >>>(args);
  Serial Code (host)
  Parallel Kernel (device): KernelB<<< nBlk, nTid >>>(args);
An integrated host+device application program:
- Serial or modestly parallel parts run in host C code.
- Highly parallel parts run in device SPMD kernel C code.

CUDA Execution
A GPU device:
- is a coprocessor to the CPU (host),
- has its own DRAM (device memory),
- runs many threads in parallel.
Data-parallel portions of an application are expressed as device kernels, which run on many threads.
Differences between GPU and CPU threads:
- GPU threads are extremely lightweight, with very little creation overhead.
- A GPU needs thousands of threads for full efficiency.

Parallel Error Correction with CUDA
Each kernel thread is responsible for correcting a single read $r_i$.
Voting-based algorithm:
- Step 1: calculation of the voting matrix.
- Step 2: single-mutation fixing, trimming, or discarding of the read.
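With one thread per read on a one-dimensional grid, the launch configuration reduces to a ceiling division, and each thread derives its read index from `blockIdx.x * blockDim.x + threadIdx.x`. The C sketch below reproduces that arithmetic on the host; the function names are hypothetical, and the block size of 256 threads used in the usage note is an assumption, not a value from the paper.

```c
#include <assert.h>

/* Number of thread blocks so that k reads get one thread each
 * (ceiling division of k by the block size). */
int num_blocks(int k, int threads_per_block)
{
    assert(threads_per_block > 0);
    return (k + threads_per_block - 1) / threads_per_block;
}

/* Read handled by a thread, mirroring the in-kernel expression
 * blockIdx.x * blockDim.x + threadIdx.x. The last block may be only
 * partly used, so indices >= k must not touch any read. */
int read_index(int block_idx, int block_dim, int thread_idx, int k)
{
    int i = block_idx * block_dim + thread_idx;
    return i < k ? i : -1;             /* -1: this thread has no read */
}
```

For one million reads and 256 threads per block this gives 3907 blocks, enough threads to keep the GPU busy; a hypothetical kernel `correct_reads` would be launched as `correct_reads<<<num_blocks(k, 256), 256>>>(...)` and would return immediately when its index is out of range.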

Step 1: Voting Matrix Calculation
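The figure for this slide did not survive. The C sketch below shows one plausible form of the per-read voting step, under the assumption that every weak l-tuple tries all three substitutions at each of its positions and casts a vote for every substitution that makes it solid. `Spectrum`, `in_spectrum`, `voting_matrix` and `MAX_L` are hypothetical; on the GPU, `in_spectrum` is where the Bloom filter lookup would go.

```c
#include <assert.h>
#include <string.h>

#define MAX_L 128                      /* assumed upper bound on read length */

static const char BASES[4] = {'A', 'C', 'G', 'T'};

/* Hypothetical spectrum: linear scan over a list of solid tuples. */
typedef struct { const char *const *tuples; int count; int l; } Spectrum;

static int in_spectrum(const Spectrum *T, const char *t)
{
    for (int i = 0; i < T->count; ++i)
        if (memcmp(T->tuples[i], t, (size_t)T->l) == 0)
            return 1;
    return 0;
}

/* Voting step for one read r of length L (the work of one thread).
 * For every weak l-tuple starting at i, each position j of the tuple is
 * replaced by the 3 other bases; every substitution that makes the tuple
 * solid adds one vote to V[i + j][base]. */
void voting_matrix(const Spectrum *T, const char *r, int L, int V[][4])
{
    int l = T->l;
    char t[MAX_L];
    assert(L <= MAX_L && l <= L);
    memset(V, 0, sizeof(int[4]) * (size_t)L);
    for (int i = 0; i + l <= L; ++i) {
        if (in_spectrum(T, r + i))
            continue;                  /* solid tuple: casts no votes */
        memcpy(t, r + i, (size_t)l);
        for (int j = 0; j < l; ++j) {
            for (int b = 0; b < 4; ++b) {
                if (BASES[b] == r[i + j])
                    continue;
                t[j] = BASES[b];
                if (in_spectrum(T, t))
                    V[i + j][b]++;
            }
            t[j] = r[i + j];           /* restore the original base */
        }
    }
}
```

For the read ACCTA and the spectrum {ACG, CGT, GTA}, all three weak tuples vote for G at position 2, so that cell ends up with 3 votes and every other cell with 0.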

Step 2: Fixing/Trimming/Discarding Reads
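The details of this slide are also missing, so the following is only a sketch of an assumed policy: apply the single highest-voted mutation if it turns the read into a T-string; otherwise keep the longest stretch whose l-tuples are all solid, provided it has at least `min_len` bases; otherwise discard the read. All names, including the `min_len` threshold, are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *const *tuples; int count; int l; } Spectrum;

static int in_spectrum(const Spectrum *T, const char *t)
{
    for (int i = 0; i < T->count; ++i)
        if (memcmp(T->tuples[i], t, (size_t)T->l) == 0)
            return 1;
    return 0;
}

static int is_T_string(const Spectrum *T, const char *s, int n)
{
    for (int i = 0; i + T->l <= n; ++i)
        if (!in_spectrum(T, s + i))
            return 0;
    return 1;
}

typedef enum { READ_OK, READ_FIXED, READ_TRIMMED, READ_DISCARDED } Outcome;

/* Step 2 for one read r of length L, given its voting matrix V.
 * r may be modified in place; *start and *len describe the kept part. */
Outcome fix_trim_discard(const Spectrum *T, char *r, int L, int V[][4],
                         int min_len, int *start, int *len)
{
    static const char BASES[4] = {'A', 'C', 'G', 'T'};
    int l = T->l;
    *start = 0;
    *len = L;
    if (is_T_string(T, r, L))
        return READ_OK;

    /* Fixing: try the (position, base) pair with the most votes. */
    int best_p = -1, best_b = -1, best_v = 0;
    for (int p = 0; p < L; ++p)
        for (int b = 0; b < 4; ++b)
            if (V[p][b] > best_v) { best_v = V[p][b]; best_p = p; best_b = b; }
    if (best_p >= 0) {
        char old = r[best_p];
        r[best_p] = BASES[best_b];
        if (is_T_string(T, r, L))
            return READ_FIXED;
        r[best_p] = old;               /* mutation did not help: undo */
    }

    /* Trimming: longest run of consecutive solid tuples. */
    int run = 0, best_run = 0, best_end = -1;
    for (int i = 0; i + l <= L; ++i) {
        run = in_spectrum(T, r + i) ? run + 1 : 0;
        if (run > best_run) { best_run = run; best_end = i; }
    }
    /* best_run consecutive solid tuples cover best_run + l - 1 bases. */
    if (best_run > 0 && best_run + l - 1 >= min_len) {
        *start = best_end - best_run + 1;
        *len = best_run + l - 1;
        return READ_TRIMMED;
    }

    /* Discarding. */
    *len = 0;
    return READ_DISCARDED;
}
```

With the spectrum {ACG, CGT, GTA}, the read ACCTA is fixed to ACGTA by its top vote, while ACGTAAA (no useful votes) is trimmed to its solid prefix ACGTA when `min_len` is 4 and discarded when `min_len` is 6.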

Fast Membership Tests
- The first step (kernel) dominates the runtime: it requires $(L-l)\cdot(l + 3 \cdot p \cdot l)$ membership tests, where $p$ is the number of l-tuples that do not belong to the spectrum.
- A space-efficient Bloom filter speeds up membership tests against the spectrum.
- The Bloom filter is computed on the CPU and stored in texture memory (read-only memory with a fast cache) on the device.

Bloom Filter
A probabilistic data structure:
- no false negatives,
- a small percentage of false positives,
- space-efficient and fast.
It uses a bit array $B$ of length $m$ and $d$ hash functions $h_1, \ldots, h_d$ (here $m$ is the filter size, unrelated to the multiplicity $m$):
- to insert $x$, set $B[h_i(x)] = 1$ for $i = 1, \ldots, d$;
- to query $y$, check whether $B[h_i(y)] = 1$ for all $i = 1, \ldots, d$.
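The insert and query rules above translate almost line for line into C. This is a minimal sketch, not the paper's implementation: the slides do not name the hash functions, so here the $d$ functions are derived by double hashing, $h_i(x) = (h'(x) + i \cdot h''(x)) \bmod m$, from two seeded FNV-1a hashes. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *bits;     /* bit array B */
    size_t m;          /* number of bits */
    int d;             /* number of hash functions */
} Bloom;

/* 64-bit FNV-1a, with the offset basis perturbed by a seed. */
static uint64_t fnv1a(const char *s, size_t n, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < n; ++i) {
        h ^= (uint8_t)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* h_i(x) by double hashing (an assumption of this sketch). */
static size_t bloom_hash(const Bloom *bf, const char *x, size_t n, int i)
{
    uint64_t h1 = fnv1a(x, n, 0);
    uint64_t h2 = fnv1a(x, n, 0x9e3779b97f4a7c15ULL) | 1;  /* forced odd */
    return (size_t)((h1 + (uint64_t)i * h2) % bf->m);
}

int bloom_init(Bloom *bf, size_t m, int d)
{
    bf->bits = calloc((m + 7) / 8, 1);
    bf->m = m;
    bf->d = d;
    return bf->bits != NULL;
}

void bloom_free(Bloom *bf) { free(bf->bits); bf->bits = NULL; }

/* Insert x: set B[h_i(x)] = 1 for i = 1..d. */
void bloom_insert(Bloom *bf, const char *x, size_t n)
{
    for (int i = 1; i <= bf->d; ++i) {
        size_t b = bloom_hash(bf, x, n, i);
        bf->bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* Query y: returns 1 iff all d bits are set. A 0 answer is always right
 * (no false negatives); a 1 answer may be a false positive. */
int bloom_query(const Bloom *bf, const char *y, size_t n)
{
    for (int i = 1; i <= bf->d; ++i) {
        size_t b = bloom_hash(bf, y, n, i);
        if (!(bf->bits[b / 8] & (1u << (b % 8))))
            return 0;
    }
    return 1;
}
```

Because a query only reads bits, the finished array can be copied to read-only device memory and probed concurrently by every thread without synchronization.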

Bloom Filter Example
Elements a and b are inserted into a Bloom filter with $m = 10$ bits, $n = 2$ elements and $d = 3$ hash functions.
- Querying c returns false, since some of its bits are 0.
- Querying d returns true, since all of its bits are 1; because d was never inserted, this is a false positive.
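How likely is a false positive such as the query for d? Under the usual assumption of independent, uniformly distributed hash functions, the standard approximation for the false-positive rate after inserting $n$ elements into $m$ bits with $d$ hash functions, evaluated for this example, is:

```latex
P_{\mathrm{fp}} \approx \left(1 - e^{-dn/m}\right)^{d}
  = \left(1 - e^{-3 \cdot 2 / 10}\right)^{3}
  = \left(1 - e^{-0.6}\right)^{3}
  \approx (0.4512)^{3}
  \approx 0.092
```

So roughly one query in eleven for an absent element would wrongly succeed in this tiny filter; growing $m$ relative to $n$ drives the rate down quickly, which is what makes a filter sized for the whole spectrum practical.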

Overall Algorithm
1) Pre-computation on the CPU: program the Bloom filter bit-vector (via a counting Bloom filter) by hashing each l-tuple present in the reads $R$.
2) Data transfer from CPU to GPU: allocate device memory and transfer the Bloom filter and the reads.
3) Execute the CUDA kernel.
4) Data transfer from GPU to CPU: transfer the set of corrected/trimmed reads.
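Step 1) can be sketched as two passes over the reads: first count every l-tuple in a counting Bloom filter, then set the bits of a plain Bloom filter for the tuples whose estimated count reaches the multiplicity $m$. This C sketch rests on several assumptions not stated on the slide: counts are per occurrence rather than per read (an approximation of Definition 1), 8-bit counters saturate at 255, and the filter size `NBITS` and hash count `NHASH` are arbitrary. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBITS (1u << 20)   /* filter size in bits (assumed) */
#define NHASH 4            /* number of hash functions d (assumed) */

static uint64_t fnv1a(const char *s, int n, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (int i = 0; i < n; ++i) {
        h ^= (uint8_t)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* i-th slot of tuple t (length l), by double hashing. */
static uint32_t slot(const char *t, int l, int i)
{
    uint64_t h1 = fnv1a(t, l, 0);
    uint64_t h2 = fnv1a(t, l, 0x9e3779b97f4a7c15ULL) | 1;
    return (uint32_t)((h1 + (uint64_t)i * h2) % NBITS);
}

/* Estimated count: minimum of the tuple's counters (never an underestimate). */
static unsigned min_count(const uint8_t *cnt, const char *t, int l)
{
    unsigned c = 255;
    for (int i = 0; i < NHASH; ++i)
        if (cnt[slot(t, l, i)] < c)
            c = cnt[slot(t, l, i)];
    return c;
}

/* Fills bits (NBITS / 8 bytes, caller-owned) with the solid tuples of the
 * k reads of length L. Returns 0 if the counter array cannot be allocated. */
int build_spectrum_filter(const char *const *reads, int k, int L, int l,
                          unsigned m, uint8_t *bits)
{
    uint8_t *cnt = calloc(NBITS, 1);           /* one 8-bit counter per slot */
    if (!cnt)
        return 0;
    memset(bits, 0, NBITS / 8);
    for (int r = 0; r < k; ++r)                /* pass 1: count tuples */
        for (int p = 0; p + l <= L; ++p)
            for (int i = 0; i < NHASH; ++i) {
                uint32_t s = slot(reads[r] + p, l, i);
                if (cnt[s] < 255)
                    cnt[s]++;
            }
    for (int r = 0; r < k; ++r)                /* pass 2: keep solid tuples */
        for (int p = 0; p + l <= L; ++p)
            if (min_count(cnt, reads[r] + p, l) >= m)
                for (int i = 0; i < NHASH; ++i) {
                    uint32_t s = slot(reads[r] + p, l, i);
                    bits[s / 8] |= (uint8_t)(1u << (s % 8));
                }
    free(cnt);
    return 1;
}

/* Membership test against the finished bit-vector (what a thread would do). */
int spectrum_query(const uint8_t *bits, const char *t, int l)
{
    for (int i = 0; i < NHASH; ++i) {
        uint32_t s = slot(t, l, i);
        if (!(bits[s / 8] & (1u << (s % 8))))
            return 0;
    }
    return 1;
}
```

Only the compact bit-vector, not the counter array, would then be transferred to the GPU in step 2).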

Performance Evaluation
System parameters:
- NVIDIA GeForce GTX 280 with 1 GB memory
- AMD Opteron dual-core 2.2 GHz CPU with 2 GB memory
Datasets:
- Artificial sets (1%, 2%, 3% error rates) based on yeast chromosomes (S.cer5, S.cer7) and bacterial genomes (H.inf, E.col)
- Real set: Staphylococcus aureus strain MW2 (H.Aci), error rate ~1%

Performance Evaluation

Discussion/Conclusion (GOOD)
- Runtime savings by factors of 10 to 19 are reported.
- Larger datasets are not an issue as long as the Bloom filter fits in texture memory (the reads can be processed in more than one round of read-load/read-correct).
- Further parallelization on distributed-memory GPU farms is possible.
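The multi-round argument is simple arithmetic: the Bloom filter stays resident on the device, and the remaining memory bounds how many reads fit in one round. The C sketch below assumes one byte per base and separate input and output buffers, so each read costs about $2L$ bytes; this memory model and the function names are hypothetical, and real code would also reserve memory for the voting matrices and other buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Reads per round when device_bytes of memory are available, bloom_bytes
 * are taken by the resident Bloom filter, and each read of length L needs
 * about 2 * L bytes (input plus output buffer). 0 means the filter does
 * not fit, which is the real limit on dataset size. */
size_t reads_per_round(size_t device_bytes, size_t bloom_bytes, size_t L)
{
    if (bloom_bytes >= device_bytes)
        return 0;
    return (device_bytes - bloom_bytes) / (2 * L);
}

/* Number of read-load/read-correct rounds needed for k reads. */
size_t rounds_needed(size_t k, size_t per_round)
{
    assert(per_round > 0);
    return (k + per_round - 1) / per_round;
}
```

Under this model the number of reads only changes the number of rounds, while the Bloom filter size determines whether the approach works at all.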

Discussion/Conclusion (BAD)
- The implementation does not exploit the fast shared memory within thread blocks: a read $r_i$ does not have to be handled by a single thread, and its voting matrix could be constructed in parallel, so further speed-up is possible.
- The predetermined read length $L$ is somewhat restrictive.

Thank You
