Basic Data Structures for IP lookups and Packet Classification


Prefix
Length format: $b_{n-1} \ldots b_0 / l$ ($l$ is the prefix length). In IPv4, $d_3.d_2.d_1.d_0 / l$ can also be used.
Mask format: $b_{n-1} \ldots b_0 / m_{n-1} \ldots m_0$ (prefix length is $l$), where $m_j = 1$ for all $n-1 \ge j \ge n-l+1$ and $m_j = 0$ otherwise. For IPv4: $d_3.d_2.d_1.d_0 / m_3.m_2.m_1.m_0$.
Ternary format: $b_{n-1} \ldots b_{n-l+1} * \ldots *$ (prefix length is $l$), where $b_j = 0$ or $1$ for $n-1 \ge j \ge n-l+1$. Writing $t_k$ for the ternary digit at position $k$: if $t_k$ is *, then $t_j$ must also be * for all $j < k$. A single don't-care bit can be used to denote a series of don't-care bits, e.g., 1* denotes 1**** in the 5-bit address space.

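Prefix
(n+1)-bit format with a trailing 1: $b_{n-1} \ldots b_{n-l+1} 1 0 \ldots 0$ ($l$ is the prefix length). For the prefix $b_{n-1} \ldots b_{n-l+1}*$ of length $l$ in ternary format, the prefix bits are followed by one trailing '1' and then $n - l$ 0's.
(n+1)-bit format with a trailing 0: $b_{n-1} \ldots b_{n-l+1} 0 1 \ldots 1$. For the prefix $b_{n-1} \ldots b_{n-l+1}*$ of length $l$ in ternary format, the prefix bits are followed by one trailing '0' and then $n - l$ 1's.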
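The (n+1)-bit encoding can be sketched in a few lines of Python. This is only an illustration: the function names `encode_prefix` and `decode_prefix` and the string-based representation are assumptions made here, not part of the slides.

```python
def encode_prefix(prefix: str, n: int, marker: str = "1") -> str:
    """Encode a ternary prefix such as '00***' into the (n+1)-bit format.

    marker='1' appends one '1' followed by 0's; marker='0' appends one '0'
    followed by 1's. Both names are hypothetical helpers for illustration.
    """
    bits = prefix.rstrip("*")
    if len(bits) > n or any(c not in "01" for c in bits):
        raise ValueError("prefix must be at most n binary digits plus '*'")
    filler = "0" if marker == "1" else "1"
    return bits + marker + filler * (n - len(bits))


def decode_prefix(code: str, marker: str = "1") -> str:
    """Recover the ternary prefix from an (n+1)-bit code word."""
    n = len(code) - 1
    filler = "0" if marker == "1" else "1"
    stripped = code.rstrip(filler)          # drop the trailing filler bits
    if not stripped:
        raise ValueError("code word without a marker bit is not used")
    bits = stripped[:-1]                    # drop the marker bit itself
    return bits + ("*" if len(bits) < n else "")


print(encode_prefix("*****", 5))   # default prefix in the 6-bit space
print(encode_prefix("00***", 5))
print(decode_prefix("111000"))
```

Because every prefix carries exactly one marker bit, prefixes of different lengths map to distinct (n+1)-bit values, which is why one code word (all 0's or all 1's, depending on the marker) is never produced.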

5-bit prefixes in the format $b_{n-1} \ldots b_{n-l+1} 1 0 \ldots 0$
[Figure: the prefixes *****, 0****, 00*** and 11*** placed in the 6-bit binary address space; 000000 is not used.]

5-bit prefixes in the format $b_{n-1} \ldots b_{n-l+1} 0 1 \ldots 1$
[Figure: the prefixes *****, 0****, 00*** and 11*** placed in the 6-bit binary address space; 111111 is not used.]

Prefix properties
Disjoint prefixes: two prefixes are said to be disjoint if they do not share any address.
Prefix enclosure: let $A = b_{n-1} \ldots b_j \ldots b_i*$ and $B = b_{n-1} \ldots b_j*$ with $j > i$. Prefix A is enclosed by B ($B \supset A$), since the IP address space covered by A is a subset of that covered by B, where $\supset$ is the enclosure operator. Enclosure is a special case of overlapping.
Prefix comparison: the inequality 0 < * < 1 is used to compare two prefixes in the ternary representation of prefixes.
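The three properties can be checked directly on ternary strings. The sketch below assumes prefixes are written as binary digits followed by optional '*' characters; the helper names `encloses`, `disjoint` and `ternary_key` are made up for this example. Note that `encloses` also returns True for two equal prefixes.

```python
def encloses(b: str, a: str) -> bool:
    """True if prefix b covers every address covered by prefix a."""
    pb, pa = b.rstrip("*"), a.rstrip("*")
    return pa.startswith(pb)


def disjoint(a: str, b: str) -> bool:
    """Two prefixes either nest or share no address at all."""
    return not encloses(a, b) and not encloses(b, a)


def ternary_key(prefix: str, n: int) -> tuple:
    """Sort key implementing the digit order 0 < * < 1 on n-digit prefixes."""
    order = {"0": 0, "*": 1, "1": 2}
    padded = prefix.rstrip("*").ljust(n, "*")   # expand '1*' to '1****'
    return tuple(order[c] for c in padded)


table = ["111*", "10*", "1010*", "10101"]
print(sorted(table, key=lambda p: ternary_key(p, 5)))
print(encloses("10*", "1010*"), disjoint("111*", "10*"))
```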

Prefix properties
The most specific prefixes (MSP): the prefixes that do not cover any others. MSPs are disjoint, so they can be put in an array for binary search.
Prefixes can be grouped in layers based on MSP. IPv4 tables need six layers at most.
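One way to picture the layering is to peel off the most specific prefixes repeatedly. The following is a minimal quadratic-time sketch under that reading of the slide; the function name `msp_layers` is hypothetical, and a real implementation would use a trie or sorted order instead of pairwise comparison.

```python
def msp_layers(prefixes):
    """Group ternary prefixes into layers of most specific prefixes.

    Layer 1 holds the prefixes that cover no other prefix; they are removed
    and the process repeats on what is left.
    """
    remaining = {p.rstrip("*") for p in prefixes}
    layers = []
    while remaining:
        layer = sorted(
            p for p in remaining
            if not any(q != p and q.startswith(p) for q in remaining)
        )
        layers.append(layer)
        remaining -= set(layer)
    return layers


for level, layer in enumerate(msp_layers(["111*", "10*", "1010*", "10101"]), 1):
    print(level, layer)
```

Within each layer the prefixes are pairwise disjoint, so each layer can be stored as a sorted array and searched by binary search.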

Prefix properties

| Database AS6447 (year-month) | 2000-4 | 2002-4 | 2005-4 |
| --- | --- | --- | --- |
| Number of prefixes | 79,530 | 124,798 | 163,535 |
| Level-1 prefixes | 73,891 (92.9%) | 114,745 (91.9%) | 150,245 (91.9%) |
| Level-2 prefixes | 4,874 (6.1%) | 8,496 (6.8%) | 11,135 (6.8%) |
| Level-3 prefixes | 642 (0.8%) | 1,290 (1%) | 1,775 (1.1%) |
| Level-4 prefixes | 104 (0.1%) | 235 (0.2%) | 329 (0.2%) |
| Level-5 prefixes | 17 | 29 | 45 |
| Level-6 prefixes | 2 | 3 | 6 |

Prefix properties
[Figure: number of prefixes versus prefix length.]

Prefix
Forwarding table example:

| ID | Prefix | Next-hop |
| --- | --- | --- |
| P1 | 111* | H1 |
| P2 | 10* | H2 |
| P3 | 1010* | H3 |
| P4 | 10101 | H4 |

P1 is disjoint from the other three prefixes, and $P2 \supset P3 \supset P4$. Lookup requires the longest prefix match (LPM), not an exact match. Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.

Example Forwarding Table

| ID | Prefix | Next-hop |
| --- | --- | --- |
| P1 | 111* | H1 |
| P2 | 10* | H2 |
| P3 | 1010* | H3 |
| P4 | 10101 | H4 |

Lookup requires the longest prefix match (LPM), not an exact match. Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult, so trie-based schemes emerge naturally.

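Binary Trie (Radix Trie)
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr.
[Figure: binary trie with nodes A to I built from P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3) and P4 = 10101 (H4). Two operations are illustrated: lookup of 10111, and adding P5 = 1110*, which creates node I.]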
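The trie node and the two operations from the slide can be sketched as follows. The class and function names are illustrative choices, addresses are given as bit strings for readability, and the next hop "H5" used for P5 is a made-up value since the slide does not give one.

```python
class TrieNode:
    """One binary trie node: two child pointers and an optional next hop."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # [left-ptr (bit 0), right-ptr (bit 1)]
        self.next_hop = None           # set only if the node is a prefix


def insert(root: TrieNode, prefix: str, next_hop: str) -> None:
    """Walk the prefix bits, creating nodes as needed, and mark the end."""
    node = root
    for bit in prefix.rstrip("*"):
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def lookup(root: TrieNode, address: str):
    """Longest prefix match: remember the last next hop seen on the path."""
    node, best = root, root.next_hop
    for bit in address:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = TrieNode()
for prefix, hop in [("111*", "H1"), ("10*", "H2"), ("1010*", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
print(lookup(root, "10111"))      # the walk falls off the trie after 101
insert(root, "1110*", "H5")       # hypothetical next hop for P5
print(lookup(root, "11100"))
```

The lookup for 10111 passes through the node for 10* (P2), continues to 101, finds no child for the next bit, and returns the next hop remembered from P2.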

Binomial spanning tree
[Figure: a 4-cube and its corresponding binomial spanning tree.]

Perfect code: Hamming code (7, 4)
7-cube example: 0000000 with the nodes 1000000, 0100000, 0010000, 0001000, 0000100, 0000010 and 0000001.
7-cube = $2^4$ (16) one-level binomial spanning trees.

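Perfect code: Hamming code (7, 4)

$$H_7 = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}, \qquad G_7 = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}$$

(a) Parity-check and generator matrices of Hamming code (7, 4).

Decoding: r = received code. Syndrome $s = (s_2\ s_1\ s_0) = r \cdot H_7^T$ (inner product with the transpose of $H_7$). Corrected code = r + ErrorPattern[s].

| Syndrome | ErrorPattern |
| --- | --- |
| 000 | 0000-000 |
| 001 | 0000-001 |
| 010 | 0000-010 |
| 011 | 0010-000 |
| 100 | 0000-100 |
| 101 | 0100-000 |
| 110 | 1000-000 |
| 111 | 0001-000 |

(c) Decoding table.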

Perfect code: Hamming code (7, 4)
The 16 codewords are generated as $u \cdot G_7$:

| u | Codeword |
| --- | --- |
| 0000 | 0000-000 |
| 0001 | 0001-111 |
| 0010 | 0010-011 |
| 0011 | 0011-100 |
| 0100 | 0100-101 |
| 0101 | 0101-010 |
| 0110 | 0110-110 |
| 0111 | 0111-001 |
| 1000 | 1000-110 |
| 1001 | 1001-001 |
| 1010 | 1010-101 |
| 1011 | 1011-010 |
| 1100 | 1100-011 |
| 1101 | 1101-100 |
| 1110 | 1110-000 |
| 1111 | 1111-111 |
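Encoding and single-error correction for this code can be reproduced with plain list arithmetic modulo 2. The sketch below uses the generator and parity-check matrices given for Hamming code (7, 4); the function names and the dictionary-based error-pattern table are choices made for this example.

```python
G7 = [[1, 0, 0, 0, 1, 1, 0],
      [0, 1, 0, 0, 1, 0, 1],
      [0, 0, 1, 0, 0, 1, 1],
      [0, 0, 0, 1, 1, 1, 1]]

H7 = [[1, 1, 0, 1, 1, 0, 0],
      [1, 0, 1, 1, 0, 1, 0],
      [0, 1, 1, 1, 0, 0, 1]]


def encode(u):
    """Codeword = u . G7 over GF(2)."""
    return [sum(u[i] * G7[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(r):
    """Syndrome (s2, s1, s0) = r . H7^T over GF(2)."""
    return tuple(sum(r[j] * row[j] for j in range(7)) % 2 for row in H7)


# The syndrome of a single-bit error at position j is column j of H7,
# so the decoding table can be built by trying each single-bit error.
ERROR_PATTERN = {(0, 0, 0): [0] * 7}
for j in range(7):
    e = [0] * 7
    e[j] = 1
    ERROR_PATTERN[syndrome(e)] = e


def correct(r):
    """Corrected code = r + ErrorPattern[syndrome(r)] over GF(2)."""
    e = ERROR_PATTERN[syndrome(r)]
    return [(a + b) % 2 for a, b in zip(r, e)]


codeword = encode([1, 0, 1, 1])
received = codeword.copy()
received[2] ^= 1                      # flip one bit in transit
print(codeword, syndrome(received), correct(received) == codeword)
```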

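Perfect code: Golay code (23, 12)
$2^{12}$ 3-level binomial spanning trees, each containing
$$\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 1 + 23 + \frac{23 \cdot 22}{2} + \frac{23 \cdot 22 \cdot 21}{3 \cdot 2} = 24 + 23 \cdot 11 + 23 \cdot 11 \cdot 7 = 24 + 253 \cdot 8 = 24 + 2024 = 2048 = 2^{11}$$
nodes.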

Ranges
Why ranges? The source/destination port fields of rule tables for packet classification are ranges, and prefixes are special cases of ranges, so prefixes can also be represented by ranges.
Prefix $b_{n-1} \ldots b_{n-l+1}*$ of length $l$ is the range of addresses from $b_{n-1} \ldots b_{n-l+1} 0 \ldots 0$ to $b_{n-1} \ldots b_{n-l+1} 1 \ldots 1$, denoted as $[b_{n-1} \ldots b_{n-l+1} 0 \ldots 0,\ b_{n-1} \ldots b_{n-l+1} 1 \ldots 1]$.
Overlapping: two ranges are overlapping if they are not disjoint.
Partially overlapping: two ranges are partially overlapping if they are neither disjoint nor enclosing.
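Converting a prefix to its range amounts to padding the prefix bits with 0's for the start and with 1's for the finish. A minimal sketch, assuming ternary strings as input; the function names are hypothetical.

```python
def prefix_to_range(prefix: str, n: int) -> tuple:
    """Return (start, finish) of the addresses covered by a ternary prefix."""
    bits = prefix.rstrip("*")
    start = int(bits.ljust(n, "0"), 2)    # pad the don't-care bits with 0's
    finish = int(bits.ljust(n, "1"), 2)   # pad the don't-care bits with 1's
    return start, finish


def partially_overlapping(r1: tuple, r2: tuple) -> bool:
    """Neither disjoint nor enclosing."""
    (e1, f1), (e2, f2) = r1, r2
    is_disjoint = f1 < e2 or f2 < e1
    is_enclosing = (e1 <= e2 and f2 <= f1) or (e2 <= e1 and f1 <= f2)
    return not is_disjoint and not is_enclosing


print(prefix_to_range("01011*", 6))   # a length-5 prefix in a 6-bit space
print(partially_overlapping((0, 10), (5, 20)))
```

Two prefix ranges can never be partially overlapping; only general ranges, such as port ranges in classification rules, can.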

Elementary Intervals for Ranges
Definition: let the set of $k$ elementary intervals constructed from a set $R$ of ranges in the address space $0 \ldots N-1$ be $X = \{X_i \mid X_i = [e_i, f_i],\ i = 1, \ldots, k\}$. X must satisfy the following:
(1) $e_1 = 0$ and $f_k = N - 1$;
(2) $f_i = e_{i+1} - 1$ for $i = 1$ to $k - 1$;
(3) all addresses in $X_i$ are covered by the same subset of $R$ (called the range matching set of $X_i$), denoted by $EI_i$;
(4) $EI_i \ne EI_{i+1}$ for $i = 1$ to $k - 1$.

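Elementary Intervals for Ranges

| ID | Prefix | Range | Minus-1 start | Minus-1 finish | Traditional start | Traditional finish |
| --- | --- | --- | --- | --- | --- | --- |
| P1 | 000000/2 | [0, 15] | - | 15 | 0 | 15 |
| P2 | 010000/2 | [16, 31] | 15 | 31 | 16 | 31 |
| P3 | 000100/4 | [4, 7] | 3 | 7 | 4 | 7 |
| P4 | 100000/1 | [32, 63] | 31 | - | 32 | 63 |
| P5 | 010110/5 | [22, 23] | 21 | 23 | 22 | 23 |
| P6 | 110000/2 | [48, 63] | 47 | - | 48 | 63 |
| P7 | 110000/4 | [48, 51] | 47 | 51 | 48 | 51 |
| P8 | 110111/6 | [55, 55] | 54 | 55 | 55 | 55 |
| P9 | 100000/3 | [32, 39] | 31 | 39 | 32 | 39 |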

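Elementary Intervals for Ranges
Graphical view, listed as a table:

| Elementary interval | Addresses | Range matching set |
| --- | --- | --- |
| X1 | [0, 3] | EI1 = {P1} |
| X2 | [4, 7] | EI2 = {P1, P3} |
| X3 | [8, 15] | EI3 = {P1} |
| X4 | [16, 21] | EI4 = {P2} |
| X5 | [22, 23] | EI5 = {P2, P5} |
| X6 | [24, 31] | EI6 = {P2} |
| X7 | [32, 39] | EI7 = {P4, P9} |
| X8 | [40, 47] | EI8 = {P4} |
| X9 | [48, 51] | EI9 = {P4, P6, P7} |
| X10 | [52, 54] | EI10 = {P4, P6} |
| X11 | [55, 55] | EI11 = {P4, P6, P8} |
| X12 | [56, 63] | EI12 = {P4, P6} |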
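The elementary intervals can be derived from the nine example ranges by cutting the address space at every range start and at every finish plus one. The following is a straightforward sketch of that idea, not an optimized construction; the function name and the dictionary layout are assumptions for this example.

```python
def elementary_intervals(ranges: dict, N: int) -> list:
    """Return [((start, finish), matching_set), ...] covering 0 .. N-1."""
    cuts = {0, N}
    for e, f in ranges.values():
        cuts.add(e)          # an interval begins where a range starts
        cuts.add(f + 1)      # and where a range has just ended
    points = sorted(p for p in cuts if 0 <= p <= N)

    result = []
    for lo, nxt in zip(points, points[1:]):
        matching = frozenset(
            name for name, (e, f) in ranges.items() if e <= lo <= f
        )
        if result and result[-1][1] == matching:
            # Adjacent intervals must have different matching sets: merge.
            (start, _), _ = result[-1]
            result[-1] = ((start, nxt - 1), matching)
        else:
            result.append(((lo, nxt - 1), matching))
    return result


RANGES = {
    "P1": (0, 15), "P2": (16, 31), "P3": (4, 7), "P4": (32, 63), "P5": (22, 23),
    "P6": (48, 63), "P7": (48, 51), "P8": (55, 55), "P9": (32, 39),
}
for interval, matching in elementary_intervals(RANGES, 64):
    print(interval, sorted(matching))
```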

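Segment Tree
[Figure: segment tree built over the elementary intervals. The leaf nodes are X1 [0,3], X2 [4,7], X3 [8,15], X4 [16,21], X5 [22,23], X6 [24,31], X7 [32,39], X8 [40,47], X9 [48,51], X10 [52,54], X11 [55,55] and X12 [56,63]; the internal nodes carry the keys 3, 7, 15, 21, 23, 31, 39, 47, 51, 54 and 55, and the ranges P1 to P9 are stored at the nodes they cover.]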

Interval Tree
Each node in an interval tree is associated with a key, which must be covered by at least one range. Interval trees are distinguished by whether a node can store exactly one range or more than one.
Fat interval tree: each node is allowed to store more than one range. The number of nodes in the interval tree is O(N).
Insertion: to insert a range R = [e, f], if R covers the root's key, R is stored in the root. Otherwise, R is inserted in the left (right) subtree of the root when f is smaller (e is larger) than the key of the root. When R does not cover the key of any node that is traversed, a new node with a key selected from the addresses e to f is created and inserted as the left or right child of the node that was last visited.
Search takes O(log N + k) time, where k is the number of prefixes that match the given address.
Prefix insertion and deletion are very expensive because ranges in some nodes may need to be relocated after tree rotations.

Interval Tree
Thin interval tree: each node of the interval tree stores exactly one range.
Since ranges may overlap, two comparison rules are used to decide whether a range is smaller or larger than another range. For two ranges R1 = [e1, f1] and R2 = [e2, f2]: (1) R1 < R2 if e1 < e2; (2) if the start points tie, R1 < R2 if R2 is a subrange of R1 (i.e., e1 = e2 and f2 < f1).
Each node also stores a max value: the maximum of the finish endpoints of all ranges stored in the subtree rooted at that node.
In contrast with the fat interval tree, prefix insertion and deletion take O(log N) time. However, O(min{N, k log N}) time is needed to find the longest matching prefix as well as the highest-priority matching prefix, where k is the number of matched prefixes for a given address.

Hash Table
Purpose: narrowing down the search space.
Index = Hash_function(key) % m, where key may be the first k bits of the IP address and m is the size of the hash table.
Perfect hash: no collisions.
Minimal perfect hash: a perfect hash whose hash table has size k for k different hashing keys.

Hash Table
Difficulties: prefixes and ranges cannot be used directly as the keys of the hash functions.
[Figure: keys k1 and k2 are mapped into an array of m elements by H(k1) % m and H(k2) % m; both land on the same element, causing a collision.]

Hash Table: 8-bit Segmentation table
An 8-bit segmentation table is usually used for IPv4 forwarding tables because there is no prefix of length shorter than 8.
[Figure: an array of 256 elements (0 to 255) indexed by H(prefix) % 256, i.e. the 8 most significant bits of the prefix; for example, prefix 0.x.y.z maps to element 0. Each element holds the prefixes with the same first 8 MSB bits, which may be an empty set.]

Hash Table: 16-bit Segmentation table
Prefixes of length <= 16 must be stored properly. There are two options:
(1) Duplicate them, for example by duplicating 0.0.b.c/15 into buckets 0 and 1, or by storing the port of 0.0.b.c/15 in elements 0 and 1.
(2) Put them into another set (good for updates, but two sets must be searched in the worst case).
[Figure: an array of $2^{16}$ elements (0 to $2^{16}-1$) indexed by H(prefix) % $2^{16}$, i.e. the 16 most significant bits of the prefix; for example, prefix 0.0.y.z maps to element 0. Each element holds the prefixes with the same first 16 MSB bits, which may be an empty set.]
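The duplication option can be sketched as follows. This is a simplified illustration: a Python dict stands in for the array of 2^16 elements, each bucket is searched linearly, and prefixes are assumed to be given as (32-bit value with host bits zero, length, next hop) tuples. All names are hypothetical.

```python
SEG_BITS = 16


def ip(a: int, b: int, c: int, d: int) -> int:
    """Pack a dotted quad into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d


def build(prefixes):
    """Place each prefix in every bucket whose 16-bit index it covers."""
    table = {}
    for value, length, hop in prefixes:
        first = value >> (32 - SEG_BITS)
        count = 1 << max(0, SEG_BITS - length)   # short prefixes are duplicated
        for index in range(first, first + count):
            table.setdefault(index, []).append((value, length, hop))
    return table


def lookup(table, addr: int):
    """Index by the 16 MSBs, then pick the longest match in that bucket."""
    best = None
    for value, length, hop in table.get(addr >> (32 - SEG_BITS), []):
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if addr & mask == value and (best is None or length > best[0]):
            best = (length, hop)
    return best[1] if best else None


table = build([(ip(0, 0, 0, 0), 15, "A"), (ip(0, 1, 2, 0), 24, "B")])
print(sorted(table))                       # the /15 lands in buckets 0 and 1
print(lookup(table, ip(0, 1, 2, 5)), lookup(table, ip(0, 1, 3, 5)))
```

Duplication keeps lookup to a single bucket, at the cost of touching up to 2^(16 - length) buckets whenever a short prefix is inserted or deleted, which is the update penalty that the separate-set option avoids.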

Hash Table: Compression
Since there are many empty elements in the segmentation table, a bitmap can be used to compress it.
[Figure: a $2^{16}$-bit bitmap containing M 1's points into an array of M elements (0 to M-1). Prefixes 0.0.y.z and 0.1.y.z correspond to the first two bits of the bitmap. Each array element holds the prefixes with the same first 16 MSB bits and must be non-empty.]
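A common way to realize such a compressed table is to locate an element by counting the 1's that precede its bit in the bitmap. The sketch below takes that approach as an assumption; it uses a Python integer as the bitmap and recounts the bits on every access, whereas practical designs typically keep precomputed partial counts. The class name is hypothetical.

```python
class CompressedTable:
    """Bitmap plus dense array: only non-empty buckets occupy array slots."""

    def __init__(self, buckets: dict):
        self.bitmap = 0          # bit i is 1 if bucket i is non-empty
        self.elements = []       # the M non-empty buckets, in index order
        for index in sorted(buckets):
            if buckets[index]:
                self.bitmap |= 1 << index
                self.elements.append(buckets[index])

    def get(self, index: int):
        if not (self.bitmap >> index) & 1:
            return None          # empty bucket, nothing stored
        below = self.bitmap & ((1 << index) - 1)
        rank = bin(below).count("1")     # number of 1's before this bit
        return self.elements[rank]


table = CompressedTable({0: ["0.0.y.z"], 1: ["0.1.y.z"], 5: [], 65535: ["255.255.y.z"]})
print(len(table.elements), table.get(65535), table.get(5))
```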

Bloom filter
H1(key) = P1, H2(key) = P2, H3(key) = P3, H4(key) = P4, …, Hk(key) = Pk, where Hi() is a hash function, e.g. MD5.
[Figure: a bit vector of m bits in which the bits at positions P1 to Pk are set to 1.]
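A minimal Bloom filter along these lines is sketched below. Deriving the k hash functions by prefixing the key with the function index before applying MD5 is an assumption made for this example, as are the class name and the one-byte-per-bit storage, chosen for clarity rather than compactness.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)       # one byte per bit, all initially 0

    def _positions(self, key: bytes):
        """Yield P1 .. Pk, where Pi = Hi(key) mod m."""
        for i in range(self.k):
            digest = hashlib.md5(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(key))


bf = BloomFilter(m=1024, k=4)
bf.add(b"10.0.0.0/8")
print(b"10.0.0.0/8" in bf)
```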

Bloom filter
After inserting n keys (kn bits), the probability that a particular bit is still 0 is $(1 - 1/m)^{kn}$. So the probability of a false positive is
$$p = \left(1 - (1 - 1/m)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k.$$
The right-hand side is minimized when $k = \ln 2 \cdot m/n$.

| m/n | k | p |
| --- | --- | --- |
| 6 | 4 | 0.0561 |
| 8 | 6 | 0.0215 |
| 12 | 8 | 0.00314 |
| 16 | 11 | 0.000458 |
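The listed probabilities can be approximated with the standard estimate p ≈ (1 - e^(-kn/m))^k, which is assumed here to be the expression behind the slide's figures. The function names are illustrative; the values agree with the listed ones to within rounding.

```python
import math


def false_positive_rate(bits_per_key: float, k: int) -> float:
    """Approximate p for a filter with m/n = bits_per_key and k hash functions."""
    return (1.0 - math.exp(-k / bits_per_key)) ** k


def optimal_k(bits_per_key: float) -> float:
    """The real-valued k = ln(2) * m/n that minimizes the approximation."""
    return math.log(2) * bits_per_key


for ratio, k in [(6, 4), (8, 6), (12, 8), (16, 11)]:
    print(ratio, k, round(optimal_k(ratio), 2), false_positive_rate(ratio, k))
```

In each row the chosen k is the integer nearest to ln 2 · m/n.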

Bloom filter
Update: update the whole SC, triggered in one of two ways.
Threshold: when the digests differ beyond a threshold, say 5% or 10%.
Regular time intervals: say, every 5 minutes.

Counting Bloom filter
Deletion operation for the local digest: for each bit in the m-bit vector, use an l-bit counter to record the number of times that bit has been turned on by different URLs. l = 4 is chosen by experience.
If deletion is not supported, the cache summary must be rebuilt from scratch periodically to erase stale bits and prevent bit pollution.
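A counting Bloom filter with l-bit counters can be sketched as below. The MD5-based hash derivation, the class name, and the policy of leaving saturated counters untouched on deletion are assumptions for this illustration, not details taken from the slides.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int, k: int, counter_bits: int = 4):
        self.m, self.k = m, k
        self.max_count = (1 << counter_bits) - 1   # 15 for 4-bit counters
        self.counters = [0] * m

    def _positions(self, key: bytes):
        for i in range(self.k):
            digest = hashlib.md5(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            if self.counters[p] < self.max_count:   # saturate, never wrap
                self.counters[p] += 1

    def remove(self, key: bytes) -> None:
        """Only call for keys that were added earlier."""
        for p in self._positions(key):
            # A saturated counter has lost its exact count, so leave it alone.
            if 0 < self.counters[p] < self.max_count:
                self.counters[p] -= 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(key))


cbf = CountingBloomFilter(m=1 << 16, k=4)
cbf.add(b"http://example.org/a")
cbf.add(b"http://example.org/b")
cbf.remove(b"http://example.org/a")
print(b"http://example.org/b" in cbf, sum(cbf.counters))
```

The bit vector sent to other nodes is obtained by treating every non-zero counter as a 1, so the counters only need to live where deletions happen.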