Dynamic Pipelining: Making IP-Lookup Truly Scalable Jahangir Hasan T. N. Vijaykumar Presented by Sailesh Kumar.

Presentation transcript:

2 - Sailesh Kumar - 2/22/2016
A Simple Router
[Figure: arriving packets, IP lookup, VOQs, crossbar]
• The routing table contains (prefix, dest.) pairs
• IP lookup finds the dest. with the longest matching prefix
• At OC-768 rates and beyond, IP lookup has only a few nanoseconds per packet (2 ns at 160 Gbps) and can become a bottleneck

This Paper's Contribution
• This paper presents an IP-lookup ASIC architecture that addresses the following 5 scalability challenges:
  » Memory size: grows slowly with the number of prefixes
  » Lookup throughput: line rate
  » Implementation cost: complexity, chip area, etc.
  » Power dissipation: grows slowly with the number of prefixes and with line rate
  » Routing table update cost: O(1)
• No existing lookup architecture effectively addresses all 5 challenges!

Previous Work
• Several IP lookup schemes have been proposed
• Memory access time > packet inter-arrival time
  » Must use pipelining
• Several papers have proposed using pipelining
[Table: schemes compared on Space, Throughput, Updates, Power, and Area. Rows: TCAMs; HLP [Varghese et al., ISCA'03]; DLP [Basu, Narlikar, Infocom'05]; this paper]

IP Address Lookup
• Routing tables at router input ports contain (prefix, next hop) pairs.
• The address in the packet is compared to the stored prefixes, starting at the left.
• The prefix that matches the largest number of address bits is the desired match.
• The packet is forwarded to the specified next hop.
[Figure: routing table of (prefix, next hop) pairs, including 01* → 5, 110* → 3, 1011* → 5, 0001* → 0, with an example address. Taken from CSE 577 Lecture Notes]
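The matching rule on this slide can be sketched as a plain linear scan. This is only an illustration of the definition, not how a router implements it. The function name is made up here, the first four table entries are the ones still legible on the slide, and the entry `1* → 9` is a hypothetical addition so that two prefixes of different lengths match the same address.

```python
def longest_prefix_match(table, address):
    """Return the next hop of the longest prefix in `table` that matches `address`.

    `table` maps bit-string prefixes (written without the trailing '*') to
    next hops; `address` is a bit string. Returns None when nothing matches.
    """
    best_prefix, best_hop = None, None
    for prefix, hop in table.items():
        if address.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix, best_hop = prefix, hop
    return best_hop


# Entries legible on the slide, plus one hypothetical entry ("1" -> 9).
table = {"01": 5, "110": 3, "1011": 5, "0001": 0, "1": 9}

print(longest_prefix_match(table, "10110000"))  # both 1* and 1011* match; 1011* wins
print(longest_prefix_match(table, "10000000"))  # only 1* matches
print(longest_prefix_match(table, "00100000"))  # no stored prefix matches
```

A scan like this costs time proportional to the table size per packet, which is why the following slides move to tries.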

Address Lookup Using Tries
• Prefixes are stored in "alphabetical order" in the tree.
• Prefixes are "spelled out" by following a path from the top.
  » Green dots mark prefix ends.
• To find the best prefix, spell out the address in the tree.
• The last green dot passed marks the longest matching prefix.
[Figure: binary trie with an example address]
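A minimal sketch of the trie walk described above, assuming a 1-bit (binary) trie. The class and function names are illustrative, and the example table (0* → P1, 1* → P2, 101* → P3) is the small one used on the controlled prefix expansion slide. A node's `next_hop` being set plays the role of a green dot.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0' or '1' -> TrieNode
        self.next_hop = None  # set when a prefix ends here (a "green dot")


def insert(root, prefix, next_hop):
    """Spell the prefix out from the root, creating nodes as needed."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def lookup(root, address):
    """Spell the address out and remember the last green dot passed."""
    node, best = root, root.next_hop
    for bit in address:
        node = node.children.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = TrieNode()
for prefix, hop in [("0", "P1"), ("1", "P2"), ("101", "P3")]:
    insert(root, prefix, hop)

print(lookup(root, "1010"))  # passes 1* and 101*; the last one seen is 101*
print(lookup(root, "1100"))  # only 1* lies on the path
```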

Leaf Pushing
Routing table (prefix → next hop): 1* → P2, 101* → P3, 0* → P1
• Without leaf pushing, every internal node might need to store next-hop information.
• Leaf pushing pushes P2 down to all leaves below it.
• Lookups then always end at a leaf, so no longest-match bookkeeping is needed during the walk; with proper encoding it also reduces the node size.
• It complicates updates, as all affected leaves need to be updated.
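Leaf pushing can be sketched as a recursive pass over a binary trie. The code below is an illustration under assumptions: the `Node` layout and function names are invented here, and the table is read from the slide as 1* → P2, 101* → P3, 0* → P1. Missing children are created so that every internal node ends up with two children and only leaves carry a next hop.

```python
class Node:
    def __init__(self, hop=None):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.hop = hop


def insert(root, prefix, hop):
    node = root
    for bit in prefix:
        b = int(bit)
        if node.child[b] is None:
            node.child[b] = Node()
        node = node.child[b]
    node.hop = hop


def leaf_push(node, inherited=None):
    """Push next hops down so that only leaves store them."""
    hop = node.hop if node.hop is not None else inherited
    if node.child[0] is None and node.child[1] is None:
        node.hop = hop          # a leaf keeps its own hop or the inherited one
        return
    for b in (0, 1):
        if node.child[b] is None:
            node.child[b] = Node()   # new leaf that receives the pushed hop
        leaf_push(node.child[b], hop)
    node.hop = None             # internal nodes no longer store a hop


def leaves(node, path=""):
    """Collect {bit path: hop} for all leaves (call after leaf_push)."""
    if node.child[0] is None and node.child[1] is None:
        return {path: node.hop}
    out = {}
    for b in (0, 1):
        out.update(leaves(node.child[b], path + str(b)))
    return out


root = Node()
for prefix, hop in [("1", "P2"), ("101", "P3"), ("0", "P1")]:
    insert(root, prefix, hop)
leaf_push(root)
pushed = leaves(root)
print(pushed)  # P2 now appears at leaves 100 and 11
```

Note how P2 is replicated into two leaves. That replication is what makes an update to 1* touch several leaves.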

Multibit Trie
• Match several bits in one step instead of a single bit.
  » Equivalent to turning sub-trees of the binary trie into single nodes.
• Each node may be associated with several prefixes.
• For a stride of s, tree depth is reduced by a factor of s.
[Figure: multibit trie with an example address]

Controlled Prefix Expansion
Routing table (prefix → next hop): 1* → P2, 101* → P3, 0* → P1
Expanded table for a stride-2 multibit trie: 00* → P1, 01* → P1, 10* → P2, 11* → P2, 1010* → P3, 1011* → P3
• Controlled prefix expansion aligns prefixes to the stride boundaries.
• In the worst case, controlled prefix expansion causes unpredictable, distribution-dependent growth in the routing table size.
• There are schemes that use variable strides to improve the average case, but the worst case remains the same.
[Figure: stride-2 multibit trie for the expanded table]
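The expansion on this slide can be reproduced with a short sketch. The function names are illustrative and a fixed stride is assumed. Each prefix is padded to the next multiple of the stride with every possible bit combination, and an original longer prefix overrides an expanded shorter one when they collide.

```python
def expand_prefix(prefix, stride):
    """Expand a bit-string prefix to the next multiple of `stride` bits."""
    pad = (-len(prefix)) % stride
    if pad == 0:
        return [prefix]
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)]


def controlled_prefix_expansion(table, stride):
    """Return a table whose prefix lengths are all multiples of `stride`."""
    expanded = {}
    # Shorter originals first, so more specific prefixes win on collisions.
    for prefix, hop in sorted(table.items(), key=lambda kv: len(kv[0])):
        for p in expand_prefix(prefix, stride):
            expanded[p] = hop
    return expanded


table = {"1": "P2", "101": "P3", "0": "P1"}
expanded = controlled_prefix_expansion(table, stride=2)
for prefix in sorted(expanded, key=lambda p: (len(p), p)):
    print(prefix + "*", expanded[prefix])
```

Three prefixes become six here. How much a table grows depends on how far its prefix lengths sit from the stride boundaries.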

Need for Pipelined Tries
• Tomorrow's routers will run at 160 Gbps, i.e. 2 ns per packet
• At most one memory access per 2 ns (maybe fewer)
• Moreover, there may be millions of prefixes
• In the worst case, memory requirements will be very high
  » Larger memories will be slower
• This calls for an architecture that
  » Uses multiple smaller memories
  » Accesses them in a pipelined manner
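The 2 ns figure follows from dividing packet size by line rate. The slide does not state the packet size; the sketch below assumes minimum-size 40-byte packets, which is the value that makes 160 Gbps come out at 2 ns. The function name is illustrative.

```python
def packet_time_ns(line_rate_gbps, packet_bytes=40):
    """Time between back-to-back packets, in nanoseconds.

    Bits divided by Gbit/s gives nanoseconds directly.
    The 40-byte default is an assumption (minimum-size packets).
    """
    return packet_bytes * 8 / line_rate_gbps


print(packet_time_ns(160))  # 320 bits at 160 Gbps
print(packet_time_ns(40))   # the same packet at 40 Gbps
```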

Pipelined Trie-based IP-lookup
• Tree data structure, prefixes in leaves (leaf pushing)
• Process the IP address level by level to find the longest match
• Each level in a different stage → overlap multiple packets
[Figure: trie with leaves P1 to P7; for example, P4 = 10010*]

Closest Previous Work
Data Structure Level Pipelining (DLP): level-to-stage mapping
• Maps trie level to stage, but this is a static mapping
• Updates change the prefix distribution, but the mapping persists
• In the worst case, any stage can hold all prefixes
• Large worst-case memory for each stage
• No bound on worst-case update cost → could be O(1) using Tree Bitmap
• But the constant is huge: 1852 memory accesses per update [SIGCOMM Comm Review '04]
[Figure: prefixes 0*, 00*, 000*, ... (P1, P2, P3, ...) mapped level by level to stages. Figure taken from Hasan et al.]

Memory Bound per Stage
The figure shows the worst-case prefix distribution: 1 million prefixes, each 32 bits long. In this case:
• The largest stage will be 5 MB.
• Total memory size will be 80 MB, as opposed to 6 MB for the prefixes themselves.
• Moreover, a 5 MB memory can't be accessed faster than about 6 ns.
Figure taken from Hasan et al.

Hardware Level Pipelining - HLP
• HLP pipelines the memory accesses at the hardware level
• Multiple words of memory are read together in a pipelined manner
  » Throughput is limited only by the memory array access time
• Such memories can improve IP lookup throughput
• On its own, HLP is not scalable: a higher degree of pipelining leads to prohibitive chip area and power dissipation
Figure taken from Sherwood et al.

Key Idea
• HLP doesn't scale well in chip area and power
• DLP scales well in power but doesn't scale well in
  » Memory size (due to the static level-to-stage mapping)
  » Throughput, as one stage can't go faster than 6 ns
• Combine the two (SDP)
  » Use a DLP, but with a better mapping so that each stage is smaller
  » Use HLP at every stage to accelerate it further

Key Idea: Use Dynamic Mapping
• Map node height to stage (instead of level to stage)
• Height changes with updates and captures the distribution of prefixes below the node
• Hence the name dynamic mapping
• However, the worst-case memory requirements remain the same, i.e. when all prefixes are 32 bits long
[Figure: prefixes 0*, 00*, 000*, ... (P1, P2, P3, ...) mapped by height to stages. Figure taken from Hasan et al.]

Key Idea: Use Jump Nodes
• Use jump nodes so that the worst-case memory requirements can be reduced
• This also restores the relation between height and prefix distribution
• One could argue that jump nodes would reduce the memory requirements of DLP too. They do not; we will soon see why.
[Figure: chain of single-child nodes leading to P4 and P5 replaced by a jump node. Figure taken from Hasan et al.]

Another Example of Jump Nodes
[Figure: a trie after leaf pushing, and the same trie after adding jump nodes (Jump 100, Jump 11)]
Note that this trie will need more than one node operation for table updates, contrary to what the paper claims!

Tries with Jump Nodes
Key properties:
(1) Number of leaves = number of prefixes
  » No replication
  » Avoids the inflation caused by prefix expansion and leaf pushing
(2) Updates do not propagate to subtrees
  » No replication
(3) Each internal node has 2 children
  » Jump nodes collapse away single-child nodes
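Properties (1) and (3) can be checked on a small path-compressed trie. The sketch below is a generic construction written for illustration, not the paper's data structure: the dict-based node layout and function names are invented here, and it assumes that no prefix is a proper prefix of another (how nested prefixes are handled is not covered in the slides). Each node stores the bits it skips in `jump`.

```python
def common_prefix(strings):
    """Longest common prefix of a non-empty list of bit strings."""
    shortest = min(strings, key=len)
    for i, ch in enumerate(shortest):
        if any(s[i] != ch for s in strings):
            return shortest[:i]
    return shortest


def build(entries, pos=0):
    """Build a trie with jump nodes from (prefix, next_hop) pairs.

    All prefixes in `entries` agree on their first `pos` bits.
    Assumes no prefix is a proper prefix of another.
    """
    if len(entries) == 1:
        prefix, hop = entries[0]
        return {"jump": prefix[pos:], "hop": hop}          # leaf
    lcp = common_prefix([p for p, _ in entries])
    split = len(lcp)                                       # first differing bit
    zeros = [(p, h) for p, h in entries if p[split] == "0"]
    ones = [(p, h) for p, h in entries if p[split] == "1"]
    return {"jump": lcp[pos:],
            "children": [build(zeros, split + 1), build(ones, split + 1)]}


def count(node):
    """Return (internal_nodes, leaves)."""
    if "hop" in node:
        return 0, 1
    (i0, l0), (i1, l1) = count(node["children"][0]), count(node["children"][1])
    return i0 + i1 + 1, l0 + l1


def lookup(node, address):
    pos = 0
    while True:
        jump = node["jump"]
        if address[pos:pos + len(jump)] != jump:
            return None                 # skipped bits do not match
        pos += len(jump)
        if "hop" in node:
            return node["hop"]
        if pos >= len(address):
            return None
        node = node["children"][int(address[pos])]
        pos += 1


entries = [("0000", "A"), ("0001", "B"), ("1", "C")]
trie = build(entries)
internal, leaf_count = count(trie)
print(internal, leaf_count)             # nodes in the compressed trie
print(lookup(trie, "00011111"))
```

With N prefixes the construction yields N leaves and N - 1 two-child internal nodes, which is consistent with a total bounded by 2N.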

Total versus Per-Stage Memory
• Jump nodes bound the total size by 2N
• Would DLP + jump nodes → small per-stage memory?
• No, DLP is still a static mapping → large worst-case per-stage memory
• The total is bounded, but not the per-stage size
[Figure labels: log2 N, W - log2 N, N. Figure taken from Hasan et al.]

SDP's Per-Stage Memory Bound
Proposition: Map all nodes of height h to the (W-h)th pipeline stage
Result: Size of the kth stage = min(N / (W-k), 2^k)

Key Observation #1
A node of height h has at least h prefixes in its subtree
• There is at least one path of length h to some leaf
• There are h-1 nodes along the path
• Each node leads to at least 1 leaf
• The path has h-1+1 leaves = h prefixes
Figure taken from Hasan et al.

Key Observation #2
No more than N / h nodes of height h for any prefix distribution
• Assume there are more than N / h nodes of height h
• Each accounts for at least h prefixes (obs #1)
• Total prefixes would exceed N
• By contradiction, obs #2 is true
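Both observations can be checked mechanically on small trees in which every internal node has two children. The sketch below is an illustration only: trees are nested tuples, the names are invented here, and height is counted in nodes so that a leaf has height 1, which is the convention that matches the counting in observation #1. Observation #2 relies on nodes of equal height having disjoint subtrees, since neither can be an ancestor of the other.

```python
from collections import Counter


def analyze(tree, per_height):
    """Return (height, leaf_count) of `tree` and tally nodes by height.

    A leaf is any non-tuple value; an internal node is a (left, right) pair.
    Heights are counted in nodes, so a leaf has height 1 (an assumption).
    """
    if not isinstance(tree, tuple):
        per_height[1] += 1
        return 1, 1
    (hl, nl), (hr, nr) = analyze(tree[0], per_height), analyze(tree[1], per_height)
    height, leaves = 1 + max(hl, hr), nl + nr
    assert leaves >= height              # observation #1
    per_height[height] += 1
    return height, leaves


def check(tree):
    """Verify both observations and return {height: number of nodes}."""
    per_height = Counter()
    _, n = analyze(tree, per_height)
    for h, nodes in per_height.items():
        assert nodes * h <= n            # observation #2: nodes <= N / h
    return dict(per_height)


def caterpillar(n):
    """Maximally unbalanced tree with n leaves."""
    tree = "P0"
    for i in range(1, n):
        tree = (tree, f"P{i}")
    return tree


balanced = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
print(check(balanced))
print(check(caterpillar(6)))
```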

Main Result of the Proposition
• Map all nodes of height h to the (W-h)th pipeline stage
• The k-th stage has at most N / (W-k) nodes, from obs #2
• A 1-bit trie has binary fanout → at most 2^k nodes in the k-th stage
• Size of the k-th stage = min(N / (W-k), 2^k) nodes
• Results in ~20 MB for 1 million prefixes
• 4x better than DLP
[Figure: per-stage memory for static pipelining (DLP) versus dynamic pipelining (SDP). Figure taken from Hasan et al.]
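The per-stage bound is easy to tabulate. The sketch below evaluates min(N / (W-k), 2^k) in nodes for N = 1 million and W = 32. The function name is illustrative, stages are assumed to be numbered k = 0 to W-1, and the division is rounded down because node counts are integers. The node size in bytes is not given on the slide, so the sketch stays in units of nodes.

```python
def stage_bound(n_prefixes, width, k):
    """Upper bound on the number of nodes in stage k (0 <= k < width)."""
    height = width - k                      # stage k holds nodes of height W - k
    return min(n_prefixes // height, 2 ** k)


N, W = 10 ** 6, 32
bounds = [stage_bound(N, W, k) for k in range(W)]

for k in (0, 15, 16, 31):
    print(k, bounds[k])
print("sum of per-stage bounds:", sum(bounds))
```

Early stages are limited by the 2^k fanout term and later stages by N / (W-k). The crossover for these parameters falls between stages 15 and 16.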

Optimum Incremental Updates
• 1 update → changes the height and stage of many nodes
• Must migrate all affected nodes → inefficient update?
• Not many nodes need to be moved, as only the ancestors' heights can be affected
• Each ancestor is in a different stage = 1 node-write in each stage = 1 write bubble for any update
• Updating SDP is not just O(1) but exactly 1
Figure taken from Hasan et al.

Incremental Updates
[Figure: pipeline stages Pipe 0, Pipe 1, Pipe 2, Pipe 3, Pipe 4, ...]

Incremental Updates
[Figure: pipeline stages Pipe 0, Pipe 1, Pipe 2, Pipe 3, Pipe 4, ... with a jump node]
The implementation complexity may be fairly high, because jump nodes might need to be computed on the fly (e.g. for 7)

Efficient Memory Management
• Tree Bitmap and segmented hole compaction require multiple memory accesses for updates
• A multibit trie with variable stride requires even more complex memory management
• SDP: no variable striding / compression → all nodes are the same size
• No fragmentation or compaction upon updates
• Memory management is trivial and has zero fragmentation

Scaling SDP for Throughput
• Each SDP stage can be further pipelined in hardware
• HLP [ISCA'03] pipelined only in hardware, without DLP
  » Too deep at high line rates
• Combine HLP + SDP for feasibly deep hardware
• Throughput matches future line rates
[Figure: number of HLP stages per SDP stage, with regions labelled Size = N / (W-k) and Size = 2^k. Figure taken from Hasan et al.]

Experiments
[Three slides of result figures. Figures taken from Hasan et al.]

Discussion / Questions
Figure taken from Hasan et al.