Published by Craig Brantingham. Modified over 10 years ago.
1
Using Network Processors in Genomics
Herbert Bos *†, Kaiming Huang *
{herbertb,khuang}@liacs.nl
* Leiden Universiteit, Netherlands
† Vrije Universiteit, Netherlands
http://www.liacs.nl/~herbertb/projects/biocomp/
H. Bos – Leiden University, 13/02/2004
2
Case study: BLAST
● searches a nucleotide/protein database for a query
● BLAST discovers similarity rather than exact matches
● two main phases:
  1. scoring (registering where query and DNA DB match)
  2. alignment (dynamic programming)
● only the first phase runs on NPUs
3
Window matching
7
Window matching
● naïve approach: roughly W*N*M comparisons
● does not scale
● string search algorithms: Aho-Corasick
  – all windows matched at the same time
  – shifting genome one nucleotide at a time
  – matching algorithm transformed into a DFA
● DFA may be quite large
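To make the DFA idea concrete, here is a minimal Python sketch (not the authors' NPU code) that turns a set of windows into a full transition table, one entry per state and symbol. The function names `build_dfa` and `scan` are hypothetical, and the sketch assumes every input symbol belongs to the alphabet acgt.

```python
from collections import deque

ALPHABET = "acgt"  # assumption: input contains only these four symbols

def build_dfa(windows):
    """Return (delta, outputs) where delta[state][symbol] is the next state."""
    goto = [{}]        # trie edges per state; state 0 is the root
    outputs = [set()]  # windows recognised on entering each state
    for w in windows:
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                outputs.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        outputs[s].add(w)

    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    for ch in ALPHABET:                # root: symbols without an edge stay at the root
        delta[0][ch] = goto[0].get(ch, 0)
    queue = deque(goto[0].values())    # breadth-first, so shallower states are finished first
    while queue:
        s = queue.popleft()
        outputs[s] |= outputs[fail[s]]
        for ch in ALPHABET:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:                      # precompute what the failure link would do
                delta[s][ch] = delta[fail[s]][ch]
    return delta, outputs

def scan(delta, outputs, text):
    """Feed text through the DFA; report (end index, window) for each match."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = delta[state][ch]       # exactly one table lookup per nucleotide
        for w in sorted(outputs[state]):
            hits.append((i, w))
    return hits

delta, outputs = build_dfa(["acg", "cgc", "gcc", "ccg", "cga"])
table_entries = sum(len(row) for row in delta)
```

The table holds one entry per (state, symbol) pair, so for this toy window set it has 13 × 4 = 52 entries. That product is why the DFA for a long query may be quite large, in exchange for a single lookup per input symbol.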
8
Aho-Corasick
● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}
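The window set follows mechanically from the query and the window size. A minimal Python sketch, with the hypothetical helper name `windows`:

```python
def windows(query, size):
    """Return every substring of length `size`, sliding one symbol at a time."""
    return [query[i:i + size] for i in range(len(query) - size + 1)]

print(windows("acgccga", 3))  # ['acg', 'cgc', 'gcc', 'ccg', 'cga']
```

A query of length L yields L - size + 1 windows. Repeated windows could be collapsed into a set before building the automaton.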
9
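Aho-Corasick
● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}
[Figure: trie with states 0 to 12. Paths from state 0: a, c, g lead to states 1, 2, 3; c, g, c to 4, 5, 6; g, c, c to 7, 8, 9; after state 4, c, g lead to 10, 11; after state 5, a leads to 12; t loops on state 0.]
Failure function:
s     1  2  3  4  5  6  7  8  9   10  11  12
f(s)  0  4  5  0  7  8  0  4  10  4   5   1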
10
Aho-Corasick
Output function (same example):
state   3    6    9    11   12
output  acg  cgc  gcc  ccg  cga
11
Aho-Corasick
Example input (same automaton): tacgcga
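The example on these slides can be reproduced with a short Python sketch of the textbook Aho-Corasick construction. This is an illustration, not the IXP1200 implementation, and the function names are hypothetical. States are numbered in insertion order, which matches the numbering on the slides for this window order.

```python
from collections import deque

def build_trie(windows):
    """Build the goto function (trie) and the per-state output lists."""
    goto, out = [{}], [[]]
    for w in windows:
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(w)
    return goto, out

def failure_links(goto, out):
    """Compute f(s) breadth-first: the longest proper suffix that is also a trie path."""
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches ending at the suffix
            queue.append(t)
    return fail

def scan(text, goto, fail, out):
    """Report (start index, window) for every window occurrence in text."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]               # follow failure links until an edge exists
        s = goto[s].get(ch, 0)        # a symbol with no edge at the root stays at state 0
        hits.extend((i - len(w) + 1, w) for w in out[s])
    return hits

goto, out = build_trie(["acg", "cgc", "gcc", "ccg", "cga"])
fail = failure_links(goto, out)
print(fail[1:])                            # [0, 4, 5, 0, 7, 8, 0, 4, 10, 4, 5, 1]
print(scan("tacgcga", goto, fail, out))    # [(1, 'acg'), (2, 'cgc'), (4, 'cga')]
```

The printed failure values agree with the f(s) table, and the input tacgcga passes through states 0, 1, 2, 3, 6, 5, 12, reporting acg, cgc and cga.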
12
IXPBlast Architecture
[Figure: block diagram. Labels: control processor (Pentium), PCI bus, NPU (IXP1200) with StrongARM and microengines (ME), scratch, SRAM, DRAM, Gbps ports.]
15
IXPBlast Architecture
[Figure: the same block diagram, shown together with the Aho-Corasick trie (states 0 to 12).]
18
IXPBlast: packet handling
● packets read and processed in batches of 100,000
● “spilling” must be taken into account
● currently no feedback
[Figure: diagram with positions numbered 0 to 31]
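One plausible reading of “spilling”, and it is only an assumption here, is that a window can straddle the boundary between two consecutive packets. The Python sketch below handles that case by carrying the last size - 1 symbols of each chunk into the next one. For brevity it uses a plain set lookup instead of the automaton, and `scan_chunks` is a hypothetical name.

```python
def scan_chunks(chunks, windows, size):
    """Report (genome position, window) hits across chunk boundaries."""
    wanted = set(windows)
    tail = ""     # last size-1 symbols of the previous chunk
    offset = 0    # genome position of the first symbol of the current chunk
    hits = []
    for chunk in chunks:
        data = tail + chunk
        base = offset - len(tail)      # genome position of data[0]
        for i in range(len(data) - size + 1):
            if data[i:i + size] in wanted:
                hits.append((base + i, data[i:i + size]))
        offset += len(chunk)
        # The tail is shorter than a window, so no hit is reported twice.
        tail = data[-(size - 1):] if size > 1 else ""
    return hits

wins = ["acg", "cgc", "gcc", "ccg", "cga"]
print(scan_chunks(["tac", "gcg", "a"], wins, 3))  # [(1, 'acg'), (2, 'cgc'), (4, 'cga')]
```

With an automaton-based matcher the same effect could be obtained by keeping the current state between packets rather than re-reading the tail.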
19
Results
● 232 MHz IXP1200 ~ 1.8 GHz Pentium 4
● 1611-nucleotide query (MyD88)
● 1.4 GB genome (Zebrafish)
  – IXP1200: 90 sec with DFA
  – IXP1200: 129 sec with “trie”
  – P4: 132 sec with “trie”
● number of matches: 524856
20
Results
Query size  DNA DB size  Impl.        Performance
1611        1.4 GB       P4           132 sec
1611        1.4 GB       IXP1200      129 sec
1611        1.4 GB       IXP1200 DFA  90 sec
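The relative speedups and scan rates implied by these timings can be worked out with a few lines of Python. The run times are taken from the table; treating “1.4 GB” as 1.4 × 10^9 bytes is an assumption, and `summarize` is a hypothetical helper.

```python
GENOME_BYTES = 1.4e9  # assumption: "1.4 GB" taken as 1.4 * 10**9 bytes

# Run times in seconds, copied from the results table.
RUN_TIMES = {"P4 trie": 132, "IXP1200 trie": 129, "IXP1200 DFA": 90}

def summarize(run_times, baseline="P4 trie"):
    """Return {impl: (speedup over baseline, throughput in MB/s)}."""
    base = run_times[baseline]
    return {
        name: (base / sec, GENOME_BYTES / sec / 1e6)
        for name, sec in run_times.items()
    }

for name, (speedup, mb_per_s) in summarize(RUN_TIMES).items():
    print(f"{name:13s} speedup {speedup:.2f}x  throughput {mb_per_s:.1f} MB/s")
```

Under these assumptions the DFA variant on the IXP1200 is roughly 1.47 times faster than the Pentium 4 run (132 / 90), while the “trie” variant is about level with it (132 / 129 ≈ 1.02).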
21
Conclusions
● NPUs are useful in other application domains
● Newer hardware is expected to perform much better
● “Throughput processors”
● Adapting our current approach to use BLAST tricks/heuristics
22
Network processors
● geared for high throughput
● used exclusively in network systems
● example: intrusion detection
● similar to looking for genes in genomes
● differences
[Figure: Radisys IXP1200 board]
23
Application domain: “Genomics”
● example: search genome for occurrence of “patterns”
● similar problems as IDS; poor performance on GPP (cannot exploit parallelism)
  – throughput-driven
  – how about FPGAs?
  – how about clusters?
● NPU
  – easier to program than FPGAs
  – cheaper than cluster computing
  – “on the desktop”: IP never leaves the room