Accelerator Architecture in Computational Biology and Bioinformatics, February 24th, 2018, Vienna, Austria Exploring Speed/Accuracy Trade-offs in Hardware.

Accelerator Architecture in Computational Biology and Bioinformatics, February 24th, 2018, Vienna, Austria Exploring Speed/Accuracy Trade-offs in Hardware Accelerated Pre-Alignment in Genome Analysis Mohammed Alser, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan Bilkent University, TU Dresden, ETH Zürich

Executive Summary Problem: There is a significant performance gap between high-throughput DNA sequencers and read mappers. Observation: The inaccuracy of state-of-the-art pre-alignment filters leads to a high computational burden. Goal: Identify and mitigate the sources of inaccuracy in state-of-the-art filters. Key Results: A pre-alignment filter is beneficial if it is at least 2x faster than alignment and can reject at least 80% of incorrect mappings.

What makes Read Mapper SLOW? As Prof. Onur explained, there is a performance bottleneck between the sequencer and the read mapper. Bridging this gap requires understanding what makes the read mapper slow.

What makes Read Mapper SLOW? Key Observation #1: 90% of the read mapper's execution time is spent in read alignment. If we analyze the execution time of current read mappers that include an alignment step, we observe that 90% of the time is spent in that step. Alser et al., Bioinformatics (2017)

What makes Read Mapper SLOW? (cont'd) Key Observation #2: 98% of candidate locations have high dissimilarity with a given read. An overwhelming majority of the candidate locations are highly dissimilar to the read, so the time spent verifying them is wasted. H. Cheng, et al., "BitMapper: an efficient all-mapper based on bit-vector computing," BMC Bioinformatics, vol. 16, p. 1, 2015. H. Xin, et al., "Accelerating read mapping with FastHASH," BMC Genomics, vol. 14, p. S13, 2013.

What makes Read Mapper SLOW? (cont'd) Key Observation #3: Read alignment relies on quadratic-time dynamic-programming algorithms, and data dependencies limit the computation parallelism. 1- Read alignment follows the basic dynamic-programming approach, which runs in quadratic time. 2- Data dependencies between the entries limit the parallelism. Each cell depends on three pre-computed cells (the immediate left, upper, and upper-left cells). Thus, we can compute the vectors one after another (left-to-right, top-to-bottom, or along anti-diagonals) but not in parallel. 3- Computing the entire matrix for every mapping is wasteful. We can save a significant amount of time if we can find a way to detect the incorrect mappings with cheap heuristics, much cheaper than computing the alignment.
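The quadratic cost and the three-cell dependency described above can be seen in a minimal sketch of a dynamic-programming edit distance. This is an illustration only: it assumes unit costs for substitutions, insertions, and deletions (plain Levenshtein distance), and the function name `edit_distance` is hypothetical rather than taken from any read mapper.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance computed with a full (len(x)+1) x (len(y)+1) matrix."""
    rows, cols = len(x) + 1, len(y) + 1
    dp = [[0] * cols for _ in range(rows)]

    # First column and first row: cost of deleting or inserting a whole prefix.
    for i in range(rows):
        dp[i][0] = i
    for j in range(cols):
        dp[0][j] = j

    # Every inner cell needs its upper, left, and upper-left neighbours,
    # so cells in the same row cannot be filled independently of each other.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                 # deletion (upper cell)
                dp[i][j - 1] + 1,                 # insertion (left cell)
                dp[i - 1][j - 1] + substitution,  # match or substitution (upper-left cell)
            )
    return dp[-1][-1]


print(edit_distance("ISTANBUL", "ISTNBUL"))  # one deleted character
```

The two nested loops visit every cell once, which is where the quadratic running time comes from.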

So, can we do better?

Proposed Strategy 1- Compute the alignment for only similar sequences: differentiate between correct and incorrect mappings, remove the incorrect ones, and align only similar sequences. 2- Highly parallel matrix computation: parallelize the matrix computation. 3- Highly accurate filtering algorithm: design an accurate filter that removes most of the incorrect mappings.

1- Align Only Similar Sequences The main aim of the pre-alignment filtering is to remove dissimilar sequences and allow only similar ones to be further processed.

The Effect of Pre-Alignment (Filter + Alignment, assuming alignment processes 100 mappings/sec) Pre-alignment can save from more than 40% up to 80% of the total processing time. What effect does pre-alignment have on the overall execution time? That depends on how many incorrect mappings it can remove and how fast it can remove them.
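The dependence on filter speed and rejection rate can be made concrete with a small back-of-the-envelope model. This is a sketch under stated assumptions, not the model behind the slide's chart: every mapping passes through the filter, only the mappings the filter does not reject are aligned, and the two stages run one after the other. The function name `total_time` and its parameters are hypothetical.

```python
def total_time(n_mappings, align_rate, filter_rate=None, reject_fraction=0.0):
    """Seconds needed to process n_mappings.

    align_rate and filter_rate are in mappings per second.
    reject_fraction is the share of mappings the filter removes before alignment.
    """
    if filter_rate is None:
        # No pre-alignment filter: every mapping is aligned.
        return n_mappings / align_rate
    filter_time = n_mappings / filter_rate
    align_time = n_mappings * (1.0 - reject_fraction) / align_rate
    return filter_time + align_time


baseline = total_time(1000, align_rate=100)
# A filter that is 2x faster than alignment and rejects 80% of the mappings.
filtered = total_time(1000, align_rate=100, filter_rate=200, reject_fraction=0.8)
print(f"baseline: {baseline:.1f} s, with filter: {filtered:.1f} s")
```

In this model a filter only pays off when the alignment time it avoids exceeds the time the filter itself consumes, which is why both its speed and its rejection rate matter.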

2- Highly Parallel Matrix Computation Our second proposed method to accelerate read mappers is to parallelize the matrix computation. To explain our new matrix, here is an example of two exactly matching sequences: I S T A N B U L and I S T A N B U L (8 matches, 0 mismatches). Now imagine that a base is deleted for some reason.

2- Highly Parallel Matrix Computation (cont'd) I S T A N B U L vs. I S T A N B U L: 8 matches, 0 mismatches. Let the deleted character be "A". What effect does the deletion have on the overall alignment?

2- Highly Parallel Matrix Computation (cont'd) I S T A N B U L vs. I S T N B U L: 3 matches, 5 mismatches. After the deletion, the trailing bases shift to the left to form a single sequence. When we align it against the original, we get many mismatches even though there is only ONE edit. To cancel the effect of the deletion and align the sequences correctly, we have to shift the sequence to the right and align again.

2- Highly Parallel Matrix Computation (cont'd) I S T A N B U L vs. I S T N B U L together with a right-shifted copy of I S T N B U L: 7 matches, 1 mismatch. With the help of another right-shifted copy of the sequence, we can find more similarities between the two sequences. What about other scenarios, such as an insertion, or a combination of a deletion and an insertion?
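The ISTANBUL example can be reproduced with a few lines of Python. This is a minimal sketch: the helper `mismatch_mask` is a hypothetical name, it uses 0 for a match and 1 for a mismatch, and it treats positions that fall outside the shifted sequence as mismatches.

```python
def mismatch_mask(reference, query, shift=0):
    """Compare reference[i] with query[i - shift]; 0 = match, 1 = mismatch.

    A positive shift moves the query to the right before comparing.
    """
    mask = []
    for i, ref_char in enumerate(reference):
        j = i - shift
        same = 0 <= j < len(query) and query[j] == ref_char
        mask.append(0 if same else 1)
    return mask


reference, query = "ISTANBUL", "ISTNBUL"   # "A" was deleted from the query

unshifted = mismatch_mask(reference, query, shift=0)
right_shifted = mismatch_mask(reference, query, shift=1)
# A position counts as matched if it matches in either copy (bit-wise AND of the masks).
combined = [a & b for a, b in zip(unshifted, right_shifted)]

print("unshifted matches:", unshifted.count(0))
print("combined matches: ", combined.count(0))
```

The unshifted comparison recovers only the prefix before the deleted character, while the right-shifted copy recovers the suffix after it.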

2- Highly Parallel Matrix Computation (cont'd) We need to compute 2E+1 vectors, where E is the edit distance threshold: dp[i][j] = 0 if X[i] = Y[j], and dp[i][j] = 1 if X[i] ≠ Y[j]. No data dependencies! This is how we compute the filter matrix between the reference and the query. We compare each character of one sequence with its corresponding character in the other sequence (match = 0, mismatch = 1). The yellow diagonal vector represents the XOR between the two sequences. The pink diagonal vectors (the 2 deletion masks in the figure) represent right-shifted copies of the query compared to the reference. The blue vectors (the 2 insertion masks in the figure) represent left-shifted copies of the query. This guarantees that we can correctly examine any two sequences regardless of the type of edits they have, with NO DATA DEPENDENCIES between the cells.
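A software sketch of the 2E+1 masks is shown below. It is only an illustration of the idea on this slide, not the hardware design: the function name `build_masks` and the dictionary layout (keyed by shift amount, positive for right shifts) are assumptions, and out-of-range positions are treated as mismatches.

```python
def build_masks(reference, query, max_edits):
    """Return {shift: mask} for shift in -E..E, i.e. 2E+1 masks in total.

    mask[i] is 0 if reference[i] == query[i - shift] and 1 otherwise.
    shift > 0: right-shifted query (deletion masks)
    shift < 0: left-shifted query (insertion masks)
    shift == 0: plain position-by-position comparison
    """
    masks = {}
    for shift in range(-max_edits, max_edits + 1):
        mask = []
        for i, ref_char in enumerate(reference):
            j = i - shift
            same = 0 <= j < len(query) and query[j] == ref_char
            # Each cell reads only two input characters, never another cell.
            mask.append(0 if same else 1)
        masks[shift] = mask
    return masks


masks = build_masks("ISTANBUL", "ISTNBUL", max_edits=1)
for shift in sorted(masks):
    print(f"shift {shift:+d}: {masks[shift]}")
```

Because no cell depends on any other cell, all masks and all positions within a mask could be computed at the same time, which is what makes the scheme attractive for hardware.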

3- Highly accurate filtering algorithm Our third proposed method is to design a highly accurate filtering algorithm. Pigeonhole principle: if E items are put into E+1 boxes, then one or more boxes would be empty. Likewise, E edits split a sequence into at most E+1 identical segments. Our aim is to find these E+1 segments quickly. Example: I S T A N B U L vs. I S T N B U L.

3- Highly accurate filtering algorithm (cont'd) MAGNET: (1) Check for substitutions. (2) The longest identical subsequence ≥ (m−E)/(E+1). (3) Extraction & Encapsulation (divide-and-conquer fashion). 1- We check for exact matching. If there are not enough matches in the first vector, we continue. 2- Each mask nominates its longest segment of consecutive zeros, and we pick the longest of all nominated segments. We evaluate its length against the lower bound (m−E)/(E+1), which is attained when all edits are equispaced and all E+1 subsequences have the same length. If the segment satisfies the bound, we move to step 3. In the example, there are not many matches in the first mask, and the longest segment satisfies 38 ≥ 75/4.
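The bound check in step 2 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `longest_zero_run` and `passes_pigeonhole_bound` are hypothetical names, masks are plain lists with 0 for a match, and m is taken to be the sequence length.

```python
def longest_zero_run(mask):
    """Length of the longest run of consecutive 0s (matches) in one mask."""
    best = current = 0
    for bit in mask:
        current = current + 1 if bit == 0 else 0
        best = max(best, current)
    return best


def passes_pigeonhole_bound(masks, m, max_edits):
    """True if the longest zero run over all masks is at least (m - E) / (E + 1).

    The comparison is done with integers to avoid rounding issues:
    longest >= (m - E) / (E + 1)  <=>  longest * (E + 1) >= m - E
    """
    longest = max(longest_zero_run(mask) for mask in masks)
    return longest * (max_edits + 1) >= m - max_edits


# Masks of the ISTANBUL / ISTNBUL example for E = 1 (shift 0, right shift, left shift).
example_masks = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(passes_pigeonhole_bound(example_masks, m=8, max_edits=1))
```

With the numbers quoted on the slide, a longest segment of 38 is compared against 75/4, which corresponds to m − E = 75 and E + 1 = 4.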

3- Highly accurate filtering algorithm (cont'd) MAGNET step 3, Extraction & Encapsulation: Replace the longest match and all its corresponding positions in the other masks with '1's. We also encapsulate the longest match with one bit on its right and one bit on its left. This encapsulation represents the edits that divide a single long match into smaller matches. We then divide the problem into two subproblems and apply the third step recursively to the right side and the left side separately (divide-and-conquer approach).

3- Highly accurate filtering algorithm (cont'd) MAGNET: When the algorithm terminates, the number of edits equals the number of encapsulation bits (5 edits in the example). Counting the encapsulation bits reveals the number of edits.
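The extraction and encapsulation steps can be approximated in software with a short recursive routine. This is a simplified sketch in the spirit of the steps above, not the MAGNET hardware algorithm: the names `longest_zero_run_in` and `estimate_edits` are hypothetical, and counting every position of a range with no remaining match as one edit is an assumption of this sketch.

```python
def longest_zero_run_in(masks, lo, hi):
    """Return (start, length) of the longest run of 0s inside [lo, hi) over all masks."""
    best_start, best_len = lo, 0
    for mask in masks:
        run_start = None
        for i in range(lo, hi + 1):
            is_zero = i < hi and mask[i] == 0
            if is_zero and run_start is None:
                run_start = i
            elif not is_zero and run_start is not None:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = None
    return best_start, best_len


def estimate_edits(masks, lo=0, hi=None):
    """Estimate the number of edits by counting encapsulation bits recursively."""
    if hi is None:
        hi = len(masks[0])
    if lo >= hi:
        return 0
    start, length = longest_zero_run_in(masks, lo, hi)
    if length == 0:
        return hi - lo  # assumption: no match left, count each position as an edit
    end = start + length
    # One encapsulation bit on each side of the extracted match, if it fits in the range.
    left_bit = 1 if start - 1 >= lo else 0
    right_bit = 1 if end < hi else 0
    # Divide and conquer: recurse on what remains to the left and to the right.
    return (left_bit + right_bit
            + estimate_edits(masks, lo, start - 1)
            + estimate_edits(masks, end + 1, hi))


istanbul_masks = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # no shift
    [1, 1, 1, 1, 0, 0, 0, 0],  # query shifted right by one
    [1, 1, 1, 1, 1, 1, 1, 1],  # query shifted left by one
]
print(estimate_edits(istanbul_masks))
```

For the ISTANBUL example the routine first extracts the four-character match from the right-shifted mask, places one encapsulation bit at the deleted position, and then finds the three-character prefix on the left side.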

MAGNET Accelerator We implement our algorithm in Verilog and design a hardware accelerator for it. Each processing core is able to examine a single mapping. We integrate many hardware processing cores in the architecture of MAGNET for examining many mappings in a parallel fashion.

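VC709 Resource Utilization

Single core (MAGNET 1 core, GateKeeper):
Edit distance threshold | MAGNET slice LUT | MAGNET slice register | GateKeeper slice LUT | GateKeeper slice register
2 | 10.5% | 0.86% | 0.39% | 0.01%
5 | 37.8% | 2.3% | 0.71% |

Multiple cores (MAGNET 8 cores, 2 cores; GateKeeper 16 cores):
Edit distance threshold | MAGNET slice LUT | MAGNET slice register | GateKeeper slice LUT | GateKeeper slice register
2 | 85% | 7% | 32% | 2%
5 | 83% | 6% | 45% |

GateKeeper occupies at least 10x fewer resources than MAGNET, which makes it possible to integrate more GateKeeper processing cores than MAGNET cores.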

False Accept Rate However, MAGNET's false accept rate is 7x to 105x lower.

True Reject Rate MAGNET also rejects 87% to 99% of incorrect mappings.

Alignment vs Pre-Alignment Speedup

Work | Platform | Mappings per second
MAGNET | FPGA (Virtex7) | 37,500,000
GateKeeper [17] | | 1,665,811,051
SHD [4] | Intel SSE | 18,820,572
Myers's algorithm [12] | Intel SSE [13] | 2,146,266
Smith-Waterman [4] | | 201,783
Smith-Waterman [16] | FPGA (Virtex4) (128 bp) | 689,543
 | GPU (128 bp) | 86,192
Smith-Waterman [15] | | 4,000

MAGNET requires 2x less time than SHD and 44x more time than GateKeeper. MAGNET is 17x faster than the accelerated implementation [13] of Myers's algorithm [12].
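The quoted speedups follow directly from the throughput figures in the table. The short check below only divides the numbers listed above; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Throughput in mappings per second, copied from the table above.
rates = {
    "MAGNET": 37_500_000,
    "GateKeeper": 1_665_811_051,
    "SHD": 18_820_572,
    "Myers (Intel SSE)": 2_146_266,
}


def speedup(faster, slower):
    """How many times more mappings per second `faster` processes than `slower`."""
    return rates[faster] / rates[slower]


print(f"MAGNET vs SHD:        {speedup('MAGNET', 'SHD'):.1f}x")
print(f"GateKeeper vs MAGNET: {speedup('GateKeeper', 'MAGNET'):.1f}x")
print(f"MAGNET vs Myers:      {speedup('MAGNET', 'Myers (Intel SSE)'):.1f}x")
```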

Speed/Accuracy Trade-offs (end-to-end) (Filter + Alignment, assuming alignment processes 100 mappings/sec) So we have MAGNET, which is accurate but slow, and GateKeeper, which is fast but inaccurate. Which is better, SPEED or ACCURACY?

Conclusion We introduce MAGNET, a fast and accurate FPGA pre-alignment filter. Adding a pre-alignment filter to genome analysis is beneficial if the filter is at least 2x faster than alignment and able to reject at least 80% of incorrect mappings. FPGAs will likely continue to be the best acceleration platform for computational genomics (Aluru et al., IEEE Design & Test, 2014). Integrating FPGA accelerators with the sequencer can help to hide the complexity and details of the underlying hardware.

Acknowledgements ALKAN Lab: Can Alkan, Mohammed Alser. SAFARI Lab: Onur Mutlu, Hasan Hassan. CfAED center: Akash Kumar.
