Sequence Alignment in DNA Under the Guidance of : Prof. Kolin Paul Presented By: Lalchand Gaurav Jain.


Agenda
● Application Domain & Objective
● General Alignment Procedure
● Scope of Parallelism in BWT
● Selection Sort and Quick Sort Implementation
● BWT Implementation on GPU
● Comparative Study


Application Domain & Objective
Objective: to present an efficient implementation, in particular a parallel one, that helps solve the problem of searching for short sequences in DNA.
Application domains:
● Analyzing gene expression
● Mapping variations between individuals
● Mapping homologous proteins
● Assembling the genome of an organism

Basic Alignment Procedure
[Diagram. Labels: Genome; Indexing; Intermediate size: 10^18; Reads; O(log G) Searching; {Location, Occurrence}. Stage markers: "To be parallelized", "Parallelized".]

Scope of Parallelism in BWT
● With the BWT, a string of length w can be found in O(w) time.
● The BWT is closely related to the suffix array
– The suffix array is the lexicographically sorted list of all suffixes of the genome.
● Bwt[i] = ref[SA[i] - 1] (Bwt[i] = $ when SA[i] = 1, i.e. when the suffix starts at the first position of the reference)
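The relation between the suffix array and the BWT can be illustrated with a minimal Python sketch. This is not the presented implementation: the function names are hypothetical, the suffix array is built by naive sorting, positions are 0-based (so the special case becomes SA[i] = 0 rather than 1), and the reference is assumed to end in a unique `$` that sorts before every other symbol.

```python
def suffix_array(ref):
    """Start positions of all suffixes of ref, in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for a small
    illustration, not for genome-sized input.
    """
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def bwt_from_suffix_array(ref, sa):
    """Apply Bwt[i] = ref[SA[i] - 1], emitting '$' for the suffix at position 0.

    Assumes 0-based positions and that ref ends with the terminator '$'.
    """
    return "".join("$" if pos == 0 else ref[pos - 1] for pos in sa)


ref = "ACGTA$"
sa = suffix_array(ref)
bwt = bwt_from_suffix_array(ref, sa)
print(sa)   # [5, 4, 0, 1, 2, 3]
print(bwt)  # AT$ACG
```

The sort is the expensive step; once the suffix array exists, the BWT itself is a single independent lookup per position.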

Initial Step - 1
● Implementation of BWT using Selection Sort (OpenMP)

Selection Sort - OpenMP

Initial Step - 2
● Implementation of BWT using Selection Sort (OpenMP)
● Implementation of BWT using Quick Sort (OpenMP)

Quick Sort - OpenMP

Initial Step - 3
● Implementation of BWT using Selection Sort (OpenMP)
● Implementation of BWT using Quick Sort (OpenMP)
● Implementing BWT on GPU (Bitonic Sort)

Why Bitonic?
● A bitonic sequence is the concatenation of two sub-sequences sorted in opposite directions
– Or a cyclic shift of such a sequence
● Implemented by comparator networks
– Works in place
– No communication
● Naturally suited to SIMD architectures
– Each thread executes the same code on different data
● O(log² n) time and O(n log² n) work
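The comparator network can be sketched sequentially in Python. This is an illustrative sketch rather than the GPU kernel from the slides: the function name is hypothetical, the input length is assumed to be a power of two, and the inner `for i` loop stands in for what would be one thread per element on a SIMD device.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence with a bitonic comparator network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            # Every i below is independent of the others, so on a GPU
            # each index could be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending and a[i] > a[partner]:
                        a[i], a[partner] = a[partner], a[i]
                    elif not ascending and a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([7, 2, 9, 4, 1, 8, 3, 6]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

The pattern of comparisons depends only on `n`, never on the data, which is why the same code can run in lockstep across threads.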

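Burrows-Wheeler Transform
Basic String Sorting Algorithm

Input: A C G T A $

Rotations before sorting (index, rotation):
0 ACGTA$
1 CGTA$A
2 GTA$AC
3 TA$ACG
4 A$ACGT
5 $ACGTA

Rotations after sorting (index, rotation):
5 $ACGTA
4 A$ACGT
0 ACGTA$
1 CGTA$A
2 GTA$AC
3 TA$ACG

Output (last column of the sorted rotations): A T $ A C G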
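The basic string sorting example can be reproduced with a few lines of Python. The function name is hypothetical, and the sketch builds every rotation explicitly, which is only practical for short strings; it assumes `$` sorts before the other symbols, as it does in ASCII.

```python
def bwt_by_rotations(text):
    """Sort all rotations of text; return their start indices and last column."""
    rotations = [(i, text[i:] + text[:i]) for i in range(len(text))]
    rotations.sort(key=lambda pair: pair[1])
    indices = [i for i, _ in rotations]
    last_column = "".join(rotation[-1] for _, rotation in rotations)
    return indices, last_column


indices, output = bwt_by_rotations("ACGTA$")
print(indices)  # [5, 4, 0, 1, 2, 3]
print(output)   # AT$ACG
```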

Steps Performed
● Copy the genome from host memory to device memory.
● Create an indices array pointing into the reference string.
● Compare suffixes based on the indices array
– Swap indices accordingly
● Sorts n elements in log² n kernel calls
– Each of O(1) time and O(n) work
● One more step to obtain the BWT from the suffix array
– Bwt[i] = ref[SA[i] - 1] (Bwt[i] = $ when SA[i] = 1)
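The steps above can be mimicked on the CPU in Python to show how the pieces fit together. This is a hedged sketch, not the CUDA code behind the slides: `compare_exchange_stage` stands in for one kernel call, the host-to-device copy is omitted, all names are hypothetical, positions are 0-based, and the reference length is assumed to be a power of two. Suffixes are compared by slicing for clarity only.

```python
def compare_exchange_stage(ref, idx, k, j):
    """One bitonic stage over the indices array (one 'kernel call').

    The pairs (i, i ^ j) are disjoint, so on a GPU each i could be
    processed by a separate thread.
    """
    for i in range(len(idx)):
        partner = i ^ j
        if partner > i:
            first, second = ref[idx[i]:], ref[idx[partner]:]
            ascending = (i & k) == 0
            if (ascending and first > second) or (not ascending and first < second):
                idx[i], idx[partner] = idx[partner], idx[i]


def suffix_array_bitonic(ref):
    """Sort suffix start positions with a bitonic network; count the stages."""
    n = len(ref)
    if n < 1 or n & (n - 1):
        raise ValueError("reference length must be a power of two")
    idx = list(range(n))       # indices array pointing into the reference
    launches = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            compare_exchange_stage(ref, idx, k, j)
            launches += 1
            j //= 2
        k *= 2
    return idx, launches


ref = "ACGTACG$"               # length 8, a power of two
sa, launches = suffix_array_bitonic(ref)
# Final step: BWT from the suffix array (0-based, so '$' when SA[i] == 0).
bwt = "".join("$" if pos == 0 else ref[pos - 1] for pos in sa)
print(sa, launches, bwt)
```

Only the indices are swapped; the reference string itself never moves, which keeps each compare-exchange cheap in terms of memory traffic.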

CPU-GPU Interaction (BWT)
[Diagram. Labels: Genome; Cuda_Memcpy & kernel call; Suffix Array; O(log₂ G) Searching.]

Evaluation: BWT with Bitonic Sort

Comparison between Expected (GPU) and Exact Result
Expected = (Quick_Sort_time * 2) / 240


Thanks

Future Work
● Run in limited-memory environments
– Compute in parts
● Use the memory hierarchy of the GPU
– Sort keys are cached in registers or shared memory
● Long runs of a repeated character
– Store a position indicating the end of the run
● Can only sort sequences whose length is a power of 2
– A length of 2^k + 1 has to be extended to 2^(k+1)
– Padding with the largest symbol
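The padding idea in the last item can be sketched in Python as follows. The helper names are hypothetical, and the caller is assumed to supply a sentinel that compares greater than or equal to every real key, so that the padding ends up at the tail after sorting and can be dropped.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    return 1 << (max(n, 1) - 1).bit_length()


def pad_to_power_of_two(keys, largest):
    """Append copies of `largest` until the length is a power of two."""
    keys = list(keys)
    return keys + [largest] * (next_power_of_two(len(keys)) - len(keys))


keys = [7, 2, 9, 4, 1]                       # length 5 = 2^2 + 1
padded = pad_to_power_of_two(keys, float("inf"))
print(len(padded))                           # 8 = 2^3
print(sorted(padded)[:len(keys)])            # [1, 2, 4, 7, 9]
```

The worst case is a length of 2^k + 1, where padding nearly doubles the input; that cost is the motivation for listing this as future work.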