Published by Quentin Marsh. Modified over 9 years ago.
1
Efficient Implementation of a String Matching Algorithm for SRC and Cray Reconfigurable Computers
Esam El-Araby (1), Mohamed Taher (1), Tarek El-Ghazawi (1), Mohamed Abouellail (1), Nandakishore Sastry (2), and Kris Gaj (2)
(1) The George Washington University, (2) George Mason University
2
1017 / MAPLD2005, El-Araby
Outline: Introduction; SRC Hardware & Software; Cray XD1 Hardware & Software; String Matching Algorithms; Implementation Methodology; Results and Comparisons; Conclusions
3
Introduction
[Figure: two block diagrams side by side. Microprocessor System: processors (µP), each with its own memory, plus I/O, joined by an interface. Reconfigurable Processor System: FPGAs, each with its own memory, plus I/O, joined by an interface.]
5
SRC Architecture (Hi-Bar™ Based Systems)
  Hi-Bar sustains 1.4 GB/s per port with 180 ns latency per tier.
  Up to 256 input and 256 output ports with two tiers of switch.
  Common Memory (CM) has a controller with DMA capability.
  The controller can perform other functions such as scatter/gather.
  Up to 8 GB of DDR SDRAM is supported per CM node.
[Figure: SRC-6 system diagram. Labels: MAP®, SNAP™, microprocessors (µP) with memory, Common Memory, chaining GPIO, SRC Hi-Bar switch, PCI-X, Gigabit Ethernet, and the customers' existing networks (storage area network, local area network, wide area network, disk).]
6
SRC Reconfigurable Processor
7
SRC Programming Environment
[Figure labels: µP system, FPGA system, HLL (C), HDL (VHDL)]
8
SRC Programming Environment (cont'd)
9
SRC Programming Environment (cont'd)
[Figure: a main program written in C or Fortran calls Function_1(a, d, e) and Function_2(d, e, f). Function_1 invokes Macro_1(a, b, c), Macro_2(b, d), and Macro_2(c, e). Function_2 invokes Macro_3(s, t), Macro_1(n, b), and Macro_4(t, k). A second panel shows the FPGA contents after the Function_1 call: Macro_1 and Macro_2 blocks connected by the signals a, b, c, d, and e.]
11
Cray XD1 System Architecture (One Chassis)
  Compute: 12 AMD Opteron 32/64-bit x86 processors; High Performance Linux
  RapidArray Interconnect: 12 communications processors; 1 Tb/s switch fabric
  Active Management: dedicated processor
  Application Acceleration: 6 co-processors
[Figure: RapidArray components in a Cray XD1 chassis. The FPGA and the 2nd RAP are on the expansion module.]
12
Cray XD1 Application Acceleration Interfaces
  XC2VP30-50 running at up to 200 MHz
  4 QDR II RAMs with over 400 HSTL-I I/O at 200 MHz DDR (400 MTransfers/s)
  16-bit simplified HyperTransport I/F at 400 MHz DDR (800 MTransfers/s)
  QDR and HT I/F take up <20% of the XC2VP30; the rest is available for user applications
[Figure: Virtex-II Pro block diagram. User Logic connects to four QDR RAM Interface Cores, each with ADDR(20:0), D(35:0), and Q(35:0) signals to QDR II SRAM, and to the RapidArray Transport Core, whose TX and RX links go to the RAP.]
13
Cray XD1 Development Flow
[Figure labels: Hardware Flow, Software Flow, Standard Hardware Flow]
14
Cray XD1 Hardware Development Flow
[Figure labels: Standard Flow, Additional High-Level Tools]
15
Design Methodology using Cray XD1
  1. Write the application in C for the system microprocessor.
  2. Identify the computation-intensive routine(s).
  3. Generate a bitstream using the Cray cores (RT & QDR II) and the language of choice:
       Create the module in HDL (Verilog, VHDL)
       Create the module using High Level Language tools
       Validate the module
       Synthesize (XST, Leonardo, Synplify Pro)
       Create the bitstream using the Xilinx place & route tools
  4. Replace the routines with Cray API calls.
  5. Run the application.
17
String Matching - Introduction
String matching: detecting the occurrence of a particular substring, called the pattern, in another string, called the text.
Types of string matching: exact string matching and approximate string matching.
Exact string matching:
  Matches the pattern only where it occurs completely, that is, unbroken and with no irrelevant data between any of its letters.
  Numerous applications: NIDS (network intrusion detection systems), text editing, etc.
Approximate string matching:
  The pattern rarely matches the text completely.
  Finds application in computational biology (DNA matching), image detection, handwriting recognition, etc.
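As a minimal sketch of the exact case, the function below slides the pattern across the text and reports every position where it occurs unbroken. It is a naive scan written in Python purely for illustration; the name `find_exact` is hypothetical, and this is not the hardware design the slides describe.

```python
def find_exact(pattern: str, text: str) -> list:
    """Return every start index at which pattern occurs, unbroken, in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    hits = []
    # Try each possible alignment of the pattern against the text.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(find_exact("ATA", "CATATAC"))   # overlapping occurrences are reported
    print(find_exact("GAATC", "CATAC"))   # no exact occurrence
```

An exact matcher finds nothing for the GAATC / CATAC pair used later in the deck, which is the motivation for approximate matching.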
18
DNA Matching Basics
Why align two protein or DNA sequences?
  Determine whether they are descended from a common ancestor (homologous)
  Infer a common function
  Locate functional elements
  Infer protein structure, if the structure of one of the sequences is known
Problem: find the best pairwise alignment of GAATC and CATAC. Candidate alignments:
  GAATC    GAATC-    GAAT-C    GAAT-C    -GAAT-C    GA-ATC
  CATAC    CA-TAC    C-ATAC    CA-TAC    C-A-TAC    CATA-C
We need a way to measure the quality of a candidate alignment.
Alignment scores consist of two parts: a substitution matrix and a gap penalty.
19
DNA Matching Basics (cont'd)
Scoring aligned bases
  Purines: A, G. Pyrimidines: C, T.
  Transition (cheap): a substitution within one class. Transversion (expensive): a substitution between the classes.
  A hypothetical substitution matrix:
         A    C    G    T
    A   10   -5    0   -5
    C   -5   10   -5    0
    G    0   -5   10   -5
    T   -5    0   -5   10
  Example, gaps not yet scored:
    GAAT-C
    CA-TAC    -5 + 10 + ? + 10 + ? + 10 = ?
Scoring gaps
  Linear gap penalty: every gap receives a score of d.
    GAAT-C    d = -4
    CA-TAC    -5 + 10 + -4 + 10 + -4 + 10 = 17
  Affine gap penalty: opening a gap receives a score of d; extending a gap receives a score of e.
    G--AATC   d = -4
    CATA--C   e = -1
              -5 + -4 + -1 + 10 + -4 + -1 + 10 = 5
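The two worked scores can be reproduced with a short Python sketch. The function names are hypothetical, and the substitution values are assumptions read from the slide: 10 for a match and -5 for a transversion are confirmed by the worked sums, while 0 for a transition is inferred from the matrix and is not exercised by either example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution(a: str, b: str) -> int:
    """Score one aligned pair of bases (assumed values: 10 / 0 / -5)."""
    if a == b:
        return 10
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 0 if same_class else -5  # transition is cheap, transversion is not


def score_alignment(top: str, bottom: str, d: int = -4, e=None) -> int:
    """Score a gapped alignment column by column.

    d is the gap-open score. With e=None every gap column costs d
    (linear penalty); otherwise e is the gap-extend score (affine penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    if e is None:
        e = d
    total = 0
    gap_row = None  # which row carried a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            total += e if gap_row == row else d
            gap_row = row
        else:
            total += substitution(a, b)
            gap_row = None
    return total


if __name__ == "__main__":
    print(score_alignment("GAAT-C", "CA-TAC", d=-4))          # linear penalty
    print(score_alignment("G--AATC", "CATA--C", d=-4, e=-1))  # affine penalty
```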
20
Approximate String Matching Algorithm (Smith-Waterman Algorithm)
  1. Read sequences A and B into two arrays.
  2. Size the traceback and similarity matrices to (length of A + 1) x (length of B + 1).
  3. Set the first row and first column of the similarity matrix to 0.
  4. Initialize the traceback arrays to -1 (default value).
  5. Compute similarity matrix [i][j].
  6. Update the traceback array.
  7. Similarity matrix complete? If no, return to step 5; if yes, continue.
  8. Trace back for the best alignments.
NOTE: The traceback array carries the coordinates of one of the three cells involved in the calculation of cell [i][j] in the similarity matrix.
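The flowchart can be sketched in software as follows. This is an illustrative Python version under stated assumptions, not the FPGA design: it uses a single match/mismatch score (10 / -5) instead of the full substitution matrix, a linear gap penalty of -4, and a hypothetical function name. As in the flowchart, the traceback entries default to -1 and otherwise hold the coordinates of the neighbouring cell that produced the score.

```python
def smith_waterman(a: str, b: str, match=10, mismatch=-5, gap=-4):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay 0; traceback defaults to (-1, -1).
    sim = [[0] * cols for _ in range(rows)]
    trace = [[(-1, -1)] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The three cells involved in computing cell [i][j].
            candidates = [
                (sim[i - 1][j - 1] + sub, (i - 1, j - 1)),  # diagonal
                (sim[i - 1][j] + gap, (i - 1, j)),          # gap in b
                (sim[i][j - 1] + gap, (i, j - 1)),          # gap in a
            ]
            score, origin = max(candidates, key=lambda c: c[0])
            if score > 0:  # local alignment: scores never drop below 0
                sim[i][j] = score
                trace[i][j] = origin
            if sim[i][j] > best:
                best, best_pos = sim[i][j], (i, j)

    # Follow the stored coordinates back from the best-scoring cell.
    top, bottom = [], []
    i, j = best_pos
    while trace[i][j] != (-1, -1):
        pi, pj = trace[i][j]
        top.append(a[i - 1] if pi == i - 1 else "-")
        bottom.append(b[j - 1] if pj == j - 1 else "-")
        i, j = pi, pj
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

The sketch returns only one best alignment; on ties, Python's `max` keeps the first candidate, so the diagonal move is preferred.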
22
Implementation Schemes in SRC
  Software-only implementation: C function for the µP
  Software/hardware implementation: C function for the MAP
  Hardware-only implementation: VHDL macro
[Figure labels: µP System, FPGA System]
23
Operational Scenarios for Cray XD1
Operational environment: µP-Initiated Transfers; FPGA-Initiated Transfers; Write-Only Transfers
25
Performance Results
Rate = (FPGA freq.) x (cells/cycle) x (# SWPEs)
Opteron implementation (SSEARCH34)*: 100 million cell updates per second (CUPS)
Cray Inc. implementation*:
  Current unoptimized design: 80 MHz x 1 x 32 = 2.56 billion CUPS (GCUPS); 25x speedup vs. Opteron
  With optimization: 100 MHz x 1 x 50 = 5.0 GCUPS
  With future Virtex 4 FPGA: 100 MHz x 1 x 150 = 15 GCUPS
Our implementation:
  SRC-6, current unoptimized design: 100 MHz x 1 x (16x16) = 25.6 GCUPS; 10x speedup vs. Cray; 256x speedup vs. Opteron
  Cray XD1, current unoptimized design: 200 MHz x 1 x (16x16) = 51.2 GCUPS; 20x speedup vs. Cray; 512x speedup vs. Opteron
* CUG'05, New Mexico, May 2005
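The rate formula and the quoted speedups can be checked with a few lines of Python. The figures are those on the slide; the function and variable names are hypothetical, and the middle factor is read as cell updates per cycle per processing element.

```python
def cups(freq_hz: int, cells_per_cycle: int, num_pes: int) -> int:
    """Cell updates per second = clock rate x cells per cycle x number of PEs."""
    return freq_hz * cells_per_cycle * num_pes


OPTERON = 100_000_000                      # SSEARCH34 on Opteron, 100 MCUPS
CRAY_INC = cups(80_000_000, 1, 32)         # Cray Inc. current unoptimized design
SRC6 = cups(100_000_000, 1, 16 * 16)       # SRC-6, 16x16 processing elements
XD1 = cups(200_000_000, 1, 16 * 16)        # Cray XD1, 16x16 processing elements

if __name__ == "__main__":
    for name, rate in [("Cray Inc.", CRAY_INC), ("SRC-6", SRC6), ("Cray XD1", XD1)]:
        print(f"{name}: {rate / 1e9:.2f} GCUPS, "
              f"{rate / OPTERON:.1f}x vs. Opteron, "
              f"{rate / CRAY_INC:.1f}x vs. Cray Inc.")
```

The Cray Inc. design works out to 25.6x the Opteron rate, which the slide rounds to 25x.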
26
Conclusions
  The Smith-Waterman sequence alignment algorithm has been implemented on both SRC-6 and Cray XD1 systems.
  Similarities and differences are highlighted with regard to:
    System hardware architecture
    Ease of programming
    Programming model
    Development time
    Hardware/software libraries
    Performance
  The speed-up vs. microprocessor is reported.
  Primary bottlenecks limiting the performance of both systems are recognized.
  The capability to share and port applications between the SRC and Cray systems is explored.