Field-Programmable Logic and its Applications INTERNATIONAL CONFERENCE August 30 – September 01, 2004 Albert A. Conti, Tom Van Court, Martin C. Herbordt.

Presentation transcript:

Processing Repetitive Sequence Structures at Streaming Rate

Albert A. Conti, Tom Van Court, Martin C. Herbordt
Department of Electrical and Computer Engineering, Boston University, Boston, MA

String Matching for Bioinformatics

Repeating patterns make up a significant fraction of DNA and protein molecules. These repeating regions are important to biological function because they may act as catalytic, regulatory or evolutionary sites, and because they have been implicated in human diseases such as Fragile-X mental retardation and Huntington's disease [1]. While identifying exact-matching repetitive structures is a task easily handled by a standard PC, identifying structures with a variable number of mismatches, insertions and/or deletions is computationally prohibitive. Existing solutions include expensive dedicated platforms and inaccurate heuristic methods.

[1] G. Benson. A space efficient algorithm for finding the best nonoverlapping alignment score. In M. Crochemore and D. Gusfield, editors, Proc. 5th Annual Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 807. Springer-Verlag.

Our Model, Problems we address

In this first study, we examined what could be done with the simplest algorithmic models. Our program is to investigate techniques for analyzing repetitive sequence structure by feeding sequences through the FPGA at streaming rate. By "streaming rate" we mean that characters are processed systolically with emphasis on simple logic.

Example sequence: C G A T G C G C T G G T T C A A C T G

[Figure: the example sequence annotated with a tandem repeat of length 5 with 1 mismatch (C G A T G | C G C T G) and an even palindrome of length 4/5 with 1 insertion/deletion (G T T C A | A C T G).]

The following tasks were examined on an FPGA and analyzed. Each task enumerates quantities for strings of arbitrary length, but with n determined by the available hardware.

1. tandem repeats of length 1 to n with k or fewer mismatches
2. palindromes of length 1 to n with k or fewer mismatches
3. tandem repeats of length 1 to n with k or fewer mismatches and one edit error
4. palindromes of length 1 to n with k or fewer mismatches and one edit error
5. tandem arrays of arbitrary length with period from 1 to n

In our system, an Avnet Virtex II Pro Development Board housing a Xilinx XC2VP20 FPGA acts as a coprocessor. Designs implemented on the FPGA for each task are all organized in a two-tier structure. Input is streamed through arrays of comparators/counters in the first tier. In the second tier, which we call post processing, we decide what information to send off chip, and determine higher order structures such as arrays of repeats.

Implementations for detection

[Figure: two-tier structure. Data input feeds Tier 1 (structure-specific comparator arrays and systolic logic surrounded by shift registers for the input stream). High-bandwidth intermediate results feed Tier 2 (post-processing filters), which produces low-bandwidth output. Results can be sent off chip or processed further.]

Palindromes: Our method here is simple. Pair-wise comparisons are made for all characters 1 to n/2. Results from these comparisons are added systolically to arrive at the number of matching characters n/2 clock cycles later.

Tandem repeats: Our method of detecting repeats is similar to the method for detecting palindromes. The difference is that we can take advantage of comparisons made in previous steps through the string. When our frame of reference shifts for length=4, there is only one comparison that was not made in the previous step. Because only a single comparison changes at every step through the string, the number of mismatches (k) for any given length can change by no more than one. k is updated for each length at each step according to a table of expired comparison, new comparison and Δk. We can perform this computation for each length up to n/2 by replicating the logic.

[Figure labels: len=2, len=3, len=4; stream C C G A T G C G C T G A A C T; new compare; expired compare; cell inputs c IN and c NEXT, output eq; k ± 1.]

Extending these models for edit errors: The basic cells are modified to look at registers to the left and right of their pair-wise matches. In addition, a combinatorial network is used to detect every possible insertion/deletion point for each length.

[Figure: a cell for palindrome detection with a single insertion or deletion.]

Precise tandem arrays: An additional level of counters counts successive shifts with mismatches below a certain threshold. The value in each counter, divided by the length of the repeat it is looking for, is the number of consecutive repeated cycles detected.

Results: > 500x speedup

The following tables report the maximum size and minimum clock period (post place-and-route timing) of each problem that will fit on our target FPGA. The serial version times are those of a C program running on a 3 GHz Xeon-based workstation-class PC. Please note that while designs were tested for correctness on the Xilinx XC2VP20, maximum size and timing figures are based on the Xilinx XC2VP100.

[Table: Task, max n. Rows are tasks 1 to 5 as listed above; the max n values are not preserved.]

[Table: max n, Serial Version for Task 1, FPGA Version for Task 1, Serial Version for Task 3, FPGA Version for task. Surviving entries: 2.3 us, 4.6 us, 8.8 us, 17.1 us, 33.1 us, 5 ns, 10.5 us, 36.0 us, 5 ns.]
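
The palindrome and tandem-repeat counting described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FPGA design: the function names `palindrome_mismatches` and `tandem_mismatches` are hypothetical, palindromes are treated as plain textual mirror images, and the Δk rule is assumed to be "add the new comparison's mismatch, subtract the expired comparison's mismatch", which is consistent with k changing by at most one per step.

```python
from collections import deque


def palindrome_mismatches(window):
    """Count mismatching mirror pairs in one window (textual palindrome assumed)."""
    half = len(window) // 2
    # Pair character j from the left with character j from the right.
    return sum(window[j] != window[-1 - j] for j in range(half))


def tandem_mismatches(seq, length):
    """Yield (end_index, k) for every window of 2*length characters in seq.

    k is the number of positions j in the first half of the window with
    seq[j] != seq[j + length]. It is updated incrementally: each step adds
    one new comparison and drops one expired comparison (assumed delta-k rule).
    """
    k = 0
    in_window = deque()  # mismatch flags of the comparisons still in the window
    for i in range(length, len(seq)):
        new_mismatch = seq[i - length] != seq[i]  # the single new comparison
        in_window.append(new_mismatch)
        k += new_mismatch
        if len(in_window) > length:
            k -= in_window.popleft()  # the single expired comparison
        if len(in_window) == length:
            yield i, k


if __name__ == "__main__":
    seq = "CGATGCGCTGGTTCAACTG"  # example sequence from the poster
    print(next(tandem_mismatches(seq, 5)))  # first full window: CGATG CGCTG
    print(palindrome_mismatches("TCAACT"))  # substring of the example sequence
```

For the poster's example sequence, the first length-5 window (C G A T G | C G C T G) gives k = 1, matching the "tandem repeat of length 5 with 1 mismatch" annotation. In hardware, one such counter would presumably be replicated per repeat length, whereas this sketch handles a single length per call.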