1 High-performance TCAM-based IP Lookup Engines. Authors: Hui Yu, Jing Chen, Jianping Wang, and S.Q. Zheng. Publisher: IEEE INFOCOM 2008. Presenter: 林呈俞. Date: 2008/9/24
2 Outline: Introduction; Previous works (MSMB scheme, MSMB-PT scheme, MSMB-LPT scheme); Goals of this paper; Proposed works (M-MSMB-LPT scheme, MSMB-LPT-I scheme); Experimental results
3 Introduction (1/3) To achieve high IP lookup performance, TCAMs have been proposed for implementing IP lookup accelerators. In practice, one TCAM-based routing table is shared by multiple packet streams in one line card or across multiple line cards. Previous works reconfigure a TCAM into several independent blocks: MSMB, MSMB-PT, and MSMB-LPT.
4 Introduction (2/3) MSMB (Multi-Selector and Multi-Block) scheme: proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible. With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups. One would need M parallel RDs for this system.
5 Introduction (3/3) MSMB-PT (Popular-Prefix Table) scheme: this scheme exploits the temporal locality of packet destinations in order to alleviate the TCAM contention problem caused by traffic bias. Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.
6 MSMB-LPT (Local PT) (1/2) A flow is a stream of packets that are transmitted as a bursty sequence. For a given router R, the packets of flows arriving at the same input of R are biased toward a small set of IP prefixes. For any bursty traffic period of an input of R, this bias of IP addresses is called the temporal locality of flows. The major difference between MSMB-LPT and MSMB-PT is as follows. MSMB-PT: captures temporal locality global to all inputs. MSMB-LPT: captures temporal locality of flows. The LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions. MSMB-LPT improves on MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption).
7 MSMB-LPT (Local PT) (2/2) Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i. Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
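A per-input LPT can be sketched as a small prefix cache. The snippet below is a minimal illustration, not the authors' design: the class name `LocalPrefixTable`, its methods, and the longest-match-among-cached-entries rule are all assumptions, and FIFO replacement is borrowed from the experimental setup described later in the slides. A miss (returning `None`) stands for a request that would be forwarded through a CR to a TCAM block.

```python
from collections import OrderedDict
from ipaddress import ip_address, ip_network


class LocalPrefixTable:
    """Hypothetical per-input LPT holding at most n prefixes (n >= 1), FIFO replacement."""

    def __init__(self, n):
        self.n = n
        self.entries = OrderedDict()  # prefix -> next hop, oldest entry first

    def lookup(self, dst):
        """Return the next hop of the longest cached prefix matching dst, or None on a miss."""
        addr = ip_address(dst)
        best = None
        for prefix, hop in self.entries.items():
            if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
                best = (prefix, hop)
        return None if best is None else best[1]

    def insert(self, prefix, hop):
        """Cache a prefix returned by a TCAM search, evicting the oldest entry when full."""
        net = ip_network(prefix)
        if net in self.entries:
            return  # FIFO: an existing entry keeps its position
        if len(self.entries) >= self.n:
            self.entries.popitem(last=False)  # drop the oldest entry
        self.entries[net] = hop
```

For example, with `n = 2`, inserting a third prefix evicts the first one inserted, regardless of how recently it was hit.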
8 Goals of this paper: How to design a TCAM-based IP lookup engine that (1) improves MSMB-LPT without using more HW resources and (2) satisfies given performance requirements? For large m (number of inputs): How to design a scalable TCAM-based IP lookup engine? How to find tradeoffs among cost, performance, and reliability?
9 Proposed work (1/5) Definitions: an MSMB-LPT has a configuration (m, n, k): m inputs, an LPT of size n, and k TCAM blocks. The total number of prefixes is M (each block contains M/k prefixes). The parameters m and k are carefully selected to achieve optimized cost and performance. Are there better MSMB schemes for given m and k? Two proposed schemes: M-MSMB-LPT and MSMB-LPT-I.
10 Proposed work (2/5) Multiple (M)-MSMB-LPT: for large m (number of inputs), we propose to use w identical copies of an MSMB-LPT of configuration (m', n, k), where m' = m / w. Input i*m' + j serves as the j-th input of the (i+1)-th MSMB-LPT.
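The input-to-copy mapping can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `locate_input` is hypothetical, inputs are assumed to be numbered from 1, and w is assumed to divide m evenly.

```python
def locate_input(g, m, w):
    """Map global input number g (1-based) to (copy, local_input), both 1-based.

    Follows the slide's rule: input i*m' + j is the j-th input of the
    (i+1)-th MSMB-LPT, with m' = m / w.
    """
    if m % w != 0:
        raise ValueError("this sketch assumes w divides m evenly")
    if not 1 <= g <= m:
        raise ValueError("input number out of range")
    m_prime = m // w
    i, j0 = divmod(g - 1, m_prime)  # g = i*m' + j, where j = j0 + 1
    return i + 1, j0 + 1
```

With m = 12 and w = 3 (so m' = 4), `locate_input(5, 12, 3)` gives `(2, 1)`: input 5 is the first input of the second MSMB-LPT.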
11 Proposed work (3/5) Multiple (M)-MSMB-LPT: the w TCAM blocks TCAM j,u (j = 1 to w) have the same content as TCAM u in MSMB-LPT. We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m', n, k). In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently. Figure: MSMB-LPT j takes inputs (j-1)*m' + 1, (j-1)*m' + 2, ..., j*m' and contains k CRs and k TCAM blocks.
12 Proposed work (4/5) MSMB-LPT-Interleaved TCAMs (MSMB-LPT-I): an MSMB-LPT-I of configuration (m, n, w, k) has m inputs, an LPT of size n, and wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle. The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAM j in the MSMB-LPT scheme. Figure: Input 1, Input 2, ..., Input m; k bundles.
13 Proposed work (5/5) Processes run concurrently for i = 1 to m and j = 1 to k (figure label: n_i-th key from input i). The concurrent TCAM-search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
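A round-robin m-to-w selector can be sketched as follows. This is a software illustration under stated assumptions rather than the hardware design from the paper: the class name `RoundRobinSelector` is hypothetical, inputs are indexed from 0, and the rule that priority moves to the input after the last one granted is one common round-robin choice that the slides do not specify.

```python
class RoundRobinSelector:
    """Hypothetical CR: grants at most w of the m requesting inputs per cycle."""

    def __init__(self, m, w):
        self.m = m
        self.w = w
        self.next_start = 0  # 0-based input index with highest priority next cycle

    def select(self, requests):
        """Return the granted input indices for one cycle, in priority order."""
        pending = set(requests)
        granted = []
        # Scan all m inputs once, starting from the current priority pointer.
        for offset in range(self.m):
            idx = (self.next_start + offset) % self.m
            if idx in pending:
                granted.append(idx)
                if len(granted) == self.w:
                    break
        if granted:
            # Assumption: priority moves just past the last granted input.
            self.next_start = (granted[-1] + 1) % self.m
        return granted
```

Requests that are not granted in a cycle correspond to TCAM contentions; in this sketch the caller would simply present them again in the next cycle.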
14 Experimental results (1/9) We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I. A first-in-first-out (FIFO) replacement policy is used for LPT updates. Round-robin (RR) arbitration is used for TCAM contention resolution. Two packet traces are used in the simulations: 1. one generated according to the routing table described in [17]; 2. one derived from the actual packet flows given in [19]. The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT. The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configuration (m, n, w, k) as follows: (m, n, 1, k) = MSMB-LPT with (m, n, k); (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k). Example: MSMB-LPT-I with (6, n, 1, 4) indicates the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4). Figure labels: # bundles, # blocks.
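The configuration equivalence above is simple arithmetic, which the following sketch spells out. The helper name `equivalent_m_msmb_lpt_configs` and the `max_w` cutoff are hypothetical; n is passed through unchanged, so it can stay symbolic.

```python
def equivalent_m_msmb_lpt_configs(m, n, k, max_w):
    """List M-MSMB-LPT configurations (w*m, n, w, k), for w = 2..max_w, whose
    performance the slide says is indicated by MSMB-LPT-I with (m, n, 1, k)."""
    return [(w * m, n, w, k) for w in range(2, max_w + 1)]


# Reproduces the slide's example for MSMB-LPT-I with (6, n, 1, 4).
print(equivalent_m_msmb_lpt_configs(6, "n", 4, 3))
```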
15 Experimental results (2/9) Performance metrics: TCAM contention ratio, speedup over naïve MSMB, and TCAM utilization. Quantities used in the metric definitions: # contentions at TCAM blocks; total # key search time; total # parallel cycles to complete IP lookup for all packets in a trace; A_MSMB-LPT-I(j): total # cycles in which the TCAM j blocks are searched.
16 Experimental results (3/9) Power consumption
17 Experimental results (4/9) Speedup (figure panels: 48 TCAM blocks and 16 TCAM blocks)
18 Experimental results (5/9) Power consumption
19 Experimental results (6/9) Contention ratio with 36 inputs and 4 TCAM blocks in each bundle as the number of TCAM bundles increases (from 1 to 2, from 4 to 6). Configurations: (36, n, w, 4), w = 1, 2, 4, 6.
20 Experimental results (7/9) Given the available TCAM resources, such as 2 TCAM bundles with 4 TCAM blocks in each bundle, it is important to know the expected contention ratio under different numbers of inputs. Configurations: (m, n, 2, 4), m = 6, 12, 18, 36.
21 Experimental results (8/9) Speedup gain from increasing the number of TCAM bundles for a given # inputs. Configurations: (36, n, w, 4), w = 1, 2, 4, 6.
22 Experimental results (9/9) The speedup changes with the number of inputs. Configurations: (m, n, 2, 4), m = 6, 12, 18, 36.