Field-Programmable Logic and its Applications INTERNATIONAL CONFERENCE, August 30 – September 01, 2004

Processing Repetitive Sequence Structures at Streaming Rate
String Matching for Bioinformatics

Albert A. Conti, Tom Van Court, Martin C. Herbordt
Department of Electrical and Computer Engineering, Boston University, Boston, MA
herbordt | alconti |

Repeating patterns make up a significant fraction of DNA and protein molecules. These repeating regions are important to biological function because they may act as catalytic, regulatory or evolutionary sites, and because they have been implicated in human diseases such as Fragile-X mental retardation and Huntington’s disease [1].

While identifying exact-matching repetitive structures is a task easily handled by a standard PC, identifying structures with a variable number of mismatches, insertions and/or deletions is computationally prohibitive. Existing solutions rely either on expensive dedicated platforms or on inaccurate heuristic methods.

Our Model, Problems we address

In this first study, we examined what could be done with the simplest algorithmic models. Our program is to investigate techniques for analyzing repetitive sequence structure by feeding sequences through the FPGA at streaming rate. By “streaming rate” we mean that characters are processed systolically, with an emphasis on simple logic.

Figure: the sequence C G A T G C G C T G G T T C A A C T G A, annotated with a tandem repeat of length 5 with 1 mismatch and an even palindrome of length 4/5 with 1 insertion/deletion.

The following tasks were implemented on an FPGA and analyzed. Each task processes strings of arbitrary length; only the maximum structure size n is fixed, by the available hardware.
1. tandem repeats of length 1 to n with k or fewer mismatches
2. palindromes of length 1 to n with k or fewer mismatches
3. tandem repeats of length 1 to n with k or fewer mismatches and one edit error
4. palindromes of length 1 to n with k or fewer mismatches and one edit error
5. tandem arrays of arbitrary length with period from 1 to n

In our system, an Avnet Virtex II Pro Development Board housing a Xilinx XC2VP20 FPGA acts as a coprocessor. The designs for all tasks share a two-tier structure. In the first tier, input is streamed through arrays of comparators/counters. In the second tier, which we call post-processing, we decide what information to send off chip and determine higher-order structures such as arrays of repeats.

Results: > 500x speedup

Implementations for detection

Figure: two-tier design. Data input → Tier 1 (structure-specific comparator arrays and systolic logic, surrounded by shift registers for the input stream) → high-bandwidth intermediate results → Tier 2 (post-processing filters) → low-bandwidth output.

Tandem repeats:
Our method for detecting tandem repeats is similar to the method for detecting palindromes (described below). The difference is that we can reuse comparisons made in previous steps through the string. For example, at length 4, when the frame of reference shifts by one character, only one comparison is new; all the others were already made in the previous step, and one old comparison expires. Because exactly one comparison changes at every step through the string, the number of mismatches k for any given length can change by at most one per step. k is updated for each length at each step according to the table below. We perform this computation for every length up to n/2 by replicating the logic.

expired comparison | new comparison | Δk
match | match | 0
mismatch | mismatch | 0
match | mismatch | +1
mismatch | match | -1

Extending these models for edit errors:
The basic cells are modified to also examine the registers to the left and right of their pairwise comparisons. In addition, a combinatorial network detects every possible insertion/deletion point for each length.

Figure: a cell for palindrome detection with a single insertion or deletion.

Precise Tandem Arrays:
An additional level of counters counts successive shifts at which the number of mismatches stays below a given threshold.
Dividing the value in one of these counters by the length of the repeat it is looking for gives the number of consecutive repeated cycles detected.

Figure (tandem repeats): comparison frames for len=2, len=3 and len=4 over the sequence C C G A T G C G C T G A A C T, marking the new compare, the expired compare and the mismatch count k.

Palindromes:
Our method here is simple. Pairwise comparisons are made for character pairs 1 to n/2, counting outward from the center of the candidate palindrome. The results of these comparisons are summed systolically, yielding the number of matching characters n/2 clock cycles later. Results can be sent off chip or processed further.

Size and timing:
The tables below report the maximum size and minimum clock period (post place-and-route timing) of each problem that fits on our target FPGA. Serial times are for a C program running on a 3 GHz Xeon-based workstation-class PC. Note that while the designs were tested for correctness on the Xilinx XC2VP20, the maximum-size and timing figures are based on the Xilinx XC2VP100.

Table 1: maximum n for each of tasks 1 to 5 listed above.

Table 2: serial versus FPGA timing for tasks 1 and 3, by max n.
Serial version, Task 1: 2.3 us, 4.6 us, 8.8 us, 17.1 us, 33.1 us
FPGA version, Task 1: 5 ns
Serial version, Task 3: 10.5 us, 36.0 us
FPGA version, Task 3: 5 ns

[1] G. Benson. A space efficient algorithm for finding the best nonoverlapping alignment score. In M. Crochemore and D. Gusfield, editors, Proc. 5th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 807. Springer-Verlag.
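To make the palindrome method described under "Palindromes:" concrete, here is a minimal software model in Python. It is a sketch under stated assumptions, not the authors' hardware design: it covers even palindromes only, compares characters by plain equality (biological palindromes are often reverse-complement palindromes, which would compare a character against the complement of its partner instead), and the function names `palindrome_mismatch_counts` and `even_palindromes` are hypothetical. The running sum over pairs, working outward from the center, loosely plays the role of the systolic adder chain, so one pass yields the mismatch count for every length up to n at once.

```python
from typing import List, Tuple


def palindrome_mismatch_counts(s: str, center: int, max_pairs: int) -> List[int]:
    """Mismatch counts for even palindromes centered between s[center-1] and s[center].

    Entry h-1 of the result is the number of mismatching pairs among the h innermost
    pairs, i.e. for a candidate palindrome of total length 2h. The running sum loosely
    models the systolic adder chain (hypothetical software analogue).
    """
    counts: List[int] = []
    k = 0
    for h in range(1, max_pairs + 1):
        left, right = center - h, center + h - 1
        if left < 0 or right >= len(s):
            break  # the candidate would run off either end of the string
        k += int(s[left] != s[right])  # one pairwise comparison per pair
        counts.append(k)
    return counts


def even_palindromes(s: str, n: int, k: int) -> List[Tuple[int, int, int]]:
    """All (start, length, mismatches) for even palindromes of length 2..n with <= k mismatches."""
    hits = []
    for center in range(1, len(s)):
        for h, mism in enumerate(palindrome_mismatch_counts(s, center, n // 2), start=1):
            if mism <= k:
                hits.append((center - h, 2 * h, mism))
    return hits


print(palindrome_mismatch_counts("ACGTTGCT", 4, 4))  # [0, 0, 0, 1]
```

With k = 0, `even_palindromes("ACGTTGCT", 8, 0)` reports the length-6 palindrome CGTTGC starting at index 1; raising k to 1 also admits the full 8-character string, whose outermost pair (A, T) is the single mismatch.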
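The incremental update in the "Tandem repeats:" paragraph can be checked in software. The sketch below is a hypothetical model, not the authors' implementation: we read "length" as the period p (the length of one copy), so each window compares s[i:i+p] against s[i+p:i+2p]. Only the first window is filled directly; after that, every one-character shift applies the Δk table (expired comparison out, new comparison in), which is why k never moves by more than one per step. The names `delta_k`, `tandem_mismatch_stream` and `tandem_repeats` are illustrative.

```python
from typing import List, Tuple


def delta_k(expired_mismatch: bool, new_mismatch: bool) -> int:
    """Δk table: +1 if a match expires and a mismatch enters, -1 for the reverse, else 0."""
    return int(new_mismatch) - int(expired_mismatch)


def tandem_mismatch_stream(s: str, p: int) -> List[int]:
    """k for every window comparing s[i:i+p] with s[i+p:i+2p], updated incrementally."""
    if p < 1 or 2 * p > len(s):
        return []
    # The first window is filled directly with p pairwise comparisons.
    k = sum(s[j] != s[j + p] for j in range(p))
    ks = [k]
    for i in range(1, len(s) - 2 * p + 1):
        expired = s[i - 1] != s[i - 1 + p]      # comparison leaving the window
        new = s[i + p - 1] != s[i + 2 * p - 1]  # comparison entering the window
        k += delta_k(expired, new)
        ks.append(k)
    return ks


def tandem_repeats(s: str, max_period: int, k_max: int) -> List[Tuple[int, int, int]]:
    """All (start, period, k) with period 1..max_period and at most k_max mismatches."""
    hits = []
    # One independent update loop per period, mirroring the replicated per-length logic.
    for p in range(1, max_period + 1):
        for i, k in enumerate(tandem_mismatch_stream(s, p)):
            if k <= k_max:
                hits.append((i, p, k))
    return hits


print(tandem_mismatch_stream("ACGTACGTTCGT", 4))  # [0, 1, 1, 1, 1]
```

In hardware each period would have its own counter updated in the same clock cycle; the outer loop over p here simply serializes that replication.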
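The idea in "Extending these models for edit errors:" of testing every possible insertion/deletion point can be modeled in software for the palindrome case. This is a sketch under our own assumptions, not the authors' cell design: for each pair we keep an aligned comparison and a shifted comparison (the partner's right-hand neighbour, loosely analogous to the extra registers the modified cells inspect), and then try every insertion point d, using aligned comparisons inside d and shifted ones outside it. Checking an extra character on either arm covers one insertion or one deletion, since a deletion on one arm looks like an insertion on the other. The names `best_right_insertion` and `best_one_indel` are hypothetical.

```python
from typing import Optional, Tuple


def best_right_insertion(s: str, c: int, h: int) -> Optional[Tuple[int, int]]:
    """Fewest mismatches for an even palindrome of h pairs centered between s[c-1] and
    s[c], allowing one extra character on the right arm s[c..c+h].

    Returns (mismatches, d), where d is the right-arm offset of the extra character,
    or None when the candidate does not fit inside s.
    """
    if h < 1 or c - h < 0 or c + h >= len(s):
        return None
    # aligned[j]: pair j compared without any shift.
    aligned = [s[c - 1 - j] != s[c + j] for j in range(h)]
    # shifted[j]: pair j compared one position further right on the right arm.
    shifted = [s[c - 1 - j] != s[c + j + 1] for j in range(h)]
    best: Optional[Tuple[int, int]] = None
    for d in range(h + 1):  # every possible insertion point
        k = sum(aligned[:d]) + sum(shifted[d:])
        if best is None or k < best[0]:
            best = (k, d)
    return best


def best_one_indel(s: str, c: int, h: int) -> Optional[Tuple[int, str, int]]:
    """Best of an extra character on the right arm or on the left arm.

    The left arm is handled by mirroring: reversing s turns it into a right arm.
    """
    right = best_right_insertion(s, c, h)
    left = best_right_insertion(s[::-1], len(s) - c, h)
    options = []
    if right is not None:
        options.append((right[0], "right", right[1]))
    if left is not None:
        options.append((left[0], "left", left[1]))
    return min(options) if options else None


print(best_one_indel("ACGTTGXCA", 4, 4))  # (0, 'right', 2)
```

Here ACGTTGCA with an X inserted after TG on the right arm is recognized as a perfect palindrome with one insertion at right-arm offset 2.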
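The second-level counters described under "Precise Tandem Arrays:" can be sketched as a run counter over the per-window mismatch counts of one period p. This is a hypothetical model under our own conventions, not the authors' circuit: we treat "below a certain threshold" as "at most `max_mismatches`", start the counter at 0 in the first qualifying window, and increment it on each further qualifying shift. With that convention an exact array of r copies reports shifts // p = r - 2, because the two copies compared inside the first window are not counted; a different starting convention would shift this by a constant. The names `window_mismatches` and `tandem_arrays` are illustrative.

```python
from typing import List, Optional, Tuple


def window_mismatches(s: str, p: int) -> List[int]:
    """k for every window s[i:i+p] vs s[i+p:i+2p] (computed directly for brevity)."""
    return [sum(s[i + j] != s[i + p + j] for j in range(p))
            for i in range(len(s) - 2 * p + 1)]


def tandem_arrays(s: str, p: int, max_mismatches: int) -> List[Tuple[int, int, int]]:
    """Maximal runs of windows whose mismatch count stays at or below max_mismatches.

    Returns (first_window, shifts, cycles): shifts is the value of the run counter
    (successive qualifying shifts) and cycles = shifts // p.
    """
    if p < 1:
        raise ValueError("period must be at least 1")
    runs: List[Tuple[int, int, int]] = []
    run_start: Optional[int] = None
    ks = window_mismatches(s, p)
    for i, k in enumerate(ks + [max_mismatches + 1]):  # sentinel closes a trailing run
        if k <= max_mismatches:
            if run_start is None:
                run_start = i  # counter starts at 0 in the first qualifying window
        elif run_start is not None:
            shifts = (i - 1) - run_start
            runs.append((run_start, shifts, shifts // p))
            run_start = None
    return runs


print(tandem_arrays("TTACGACGACGTT", 3, 0))  # [(2, 3, 1)]
```

In the example, the three copies of ACG produce one run starting at window 2 with 3 qualifying shifts, i.e. one cycle beyond the initial pair of copies.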