IP Routing Processing with Graphic Processors Author: Shuai Mu, Xinya Zhang, Nairen Zhang, Jiaxin Lu, Yangdong Steve Deng, Shu Zhang Publisher: IEEE Conference On DATE, pp.93-98, 2010 Presenter: Ye-Zhi Chen Date: 2011/8/24 1
Introduction Internet traffic will grow at an accelerated rate, routers, therefore, have to deliver increasing processing capacity accordingly. Throughput and Programmability Hardware - Higher Throughput, but Lower Programmability Software – Higher Programmability, but Lower Throughput Netowrk Processors(NPs) – The lack of mature programming models and software development tools and incompatibility of architectures GPU – With high-performance computing and its programming is accessible with CUDA. 2
CUDA CUDA Program – composed of codes running on both CPU and GPU. Kernel - The function called by CPU but executed on GPU Block – threads inside a block could exchange data through the shared memory and synchronize with one another. 3
Network Intrusion Detection Signature matching - checks if network payloads contain pre-supplied signatures to at line rates. 60% Two Algorithms Bloom filter – 1. hash table 2. space-efficient data structure 3. Errors – hash conflicts Aho-Corisick (AC) – 1. DFA 4
Implementation 5 Input
Implementation Transfer packet : Individually : simplest way but time-consuming Batch : batch many small transfers into a larger one. Paged-locked memory :be mapped into the address space of the device Store Bloom vector and transition table in GPU’s texture memory Divide each packet into smaller chunks, and every two neighboring chunks have an overlapped content with a length equal to the largest match texts. 6
Rounting Table Lookup Longest prefix match(LPM) Radix tree Portable Routing Table trie 7
Result 8 CPU : 0.6 Gbit/s GPU : Kernel only : 19 Gbit/s Transfer : 3.4 Gbit/s Paged-locked : 17 Gbit/s
Result 9 CPU : 0.6 Gbit/s GPU : Kernel only : 3.6 Gbit/s Transfer : 2.3 Gbit/s Paged-locked : 3.2 Gbit/s
Result 10 The GPU performance of DFA improves rapidly and approaches a peak throughput of 9.2Gbit/s, which is more than 15 times faster than CPU
Result 11