Balajee Vamanan and T. N. Vijaykumar School of Electrical & Computer Engineering CoNEXT 2011.


• Packet classification: find the highest-priority rule that matches a packet
• Packet classification is key for security, traffic monitoring/analysis, and QoS
• Classifier: a set of rules

Source IP | Destination IP | Source Port | Dest. Port | Protocol | Action
120…0/ /20: :17 | 0xFF/0xFF | Accept
138…1/0 | 174…0/8 | 50:10000 | 0:65535 | 0x06/0xFF | Deny

Packet classification is prevalent in modern routers
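The definition on this slide (return the highest-priority rule that matches a packet) can be sketched as a plain linear search. This is only an illustrative baseline, not the scheme proposed in the talk. The `Rule` type, the use of inclusive integer ranges for every field, and the convention that a smaller priority number means higher priority are all assumptions made for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high); an assumed field encoding


class Rule(NamedTuple):
    priority: int               # assumption: smaller number = higher priority
    ranges: Tuple[Range, ...]   # one range per header field
    action: str


def matches(rule: Rule, packet: Sequence[int]) -> bool:
    """A packet matches when every field value lies inside the rule's range."""
    return all(lo <= value <= hi for (lo, hi), value in zip(rule.ranges, packet))


def classify(rules: Sequence[Rule], packet: Sequence[int]) -> Optional[Rule]:
    """Linear search over all rules: keep the best-priority match seen so far."""
    best = None
    for rule in rules:
        if matches(rule, packet) and (best is None or rule.priority < best.priority):
            best = rule
    return best


# Two hypothetical two-field rules (e.g. an address field and a port field).
rules = [
    Rule(1, ((0, 10), (0, 65535)), "Deny"),
    Rule(0, ((5, 5), (80, 80)), "Accept"),
]
print(classify(rules, (5, 80)).action)
```

Every lookup touches every rule here, which is the cost that both TCAM partitioning and decision trees try to avoid.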

• Line rates are increasing (40 Gbps now, 160 Gbps soon)
• Classifier size (number of rules) is increasing
  • Custom rules for VPNs, QoS
• Rules are getting more dynamic too
• Larger classifiers at faster lookup and update rates
• Much work on lookups, but not on updates

Must perform well in lookups and updates at low power

• Two flavors of updates
  • Virtual interfaces: add/remove 10,000s of rules per minute
  • QoS: update 10s of rules per flow (milliseconds)
• For either flavor, update rate (1/ms) << packet rate (1/ns)
• Current approaches
  • Either incur high update effort despite low update rates
    ▪ Eat up memory bandwidth
    ▪ Hold up memory for long periods
    ▪ → packet drops, buffering complexity, missed deadlines
  • Or do not address updates
• Recent OpenFlow, online classification → faster updates

Updates remain a key problem

• TCAM
  • Unoptimized TCAMs search all rules per lookup → high power
  • Modern partitioned TCAMs prune search → reduce power
    ▪ Extended TCAMs [ICNP 2003]
  • Physically order rules per priority for fast highest-priority match
    ▪ This ordering fundamentally affects update effort
  • Updates move many rules to maintain order
    ▪ E.g., updates: 10 per ms; lookups: 1 per 10 ns; 100,000 rules
    ▪ If 10% of updates move (read + write) 10% of the rules
    ▪ Updates need 20,000 ops/ms = 0.2 op per 10 ns
    ▪ → 20% bandwidth overhead

High-effort updates in TCAM degrade throughput and latency
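The bandwidth estimate on this slide can be reproduced step by step. The sketch below only restates the slide's arithmetic; the function and parameter names are made up for the example.

```python
def tcam_update_overhead(updates_per_ms, lookup_period_ns, num_rules,
                         frac_updates_moving, frac_rules_moved):
    """Return (TCAM ops per ms spent on updates, fraction of lookup slots consumed)."""
    moving_updates_per_ms = updates_per_ms * frac_updates_moving  # 10 * 10% = 1
    rules_moved_per_update = num_rules * frac_rules_moved         # 100,000 * 10% = 10,000
    ops_per_ms = moving_updates_per_ms * rules_moved_per_update * 2  # read + write per move
    lookup_slots_per_ms = 1_000_000 / lookup_period_ns            # 1 ms = 1,000,000 ns
    return ops_per_ms, ops_per_ms / lookup_slots_per_ms


ops, overhead = tcam_update_overhead(
    updates_per_ms=10, lookup_period_ns=10, num_rules=100_000,
    frac_updates_moving=0.10, frac_rules_moved=0.10)
print(f"{ops:.0f} ops/ms, {overhead:.0%} of lookup bandwidth")
```

With the slide's inputs this gives 20,000 ops/ms against 100,000 lookup slots per ms, that is, 0.2 op per 10 ns slot.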

• Decision trees
  • Build decision trees to prune search per lookup
  • Do not address updates
  • No ordering like TCAMs, but updates may cause tree imbalance
    ▪ Imbalance increases lookup accesses
    ▪ Re-balancing is costly

Previous schemes are not good in both lookups and updates

• TreeCAM: three novel ideas
  • Dual tree versions to decouple lookups and updates
    ▪ Coarse tree in TCAM → reduce lookup accesses
    ▪ Tree/TCAM hybrid
    ▪ Fine tree in control memory → reduce update effort
  • Interleaved layout of leaves to cut ordering effort
  • Path-by-path updates to avoid hold-up of memory
    ▪ Allow non-atomic updates interspersed with lookups
• Performs well in lookups and updates
  • 6-8 TCAM accesses for lookups
  • Close to ideal TCAM for updates

• Introduction
• Background: Decision Trees
• Dual tree versions
  ▪ Coarse Tree
  ▪ Fine Tree
• Updates using Interleaved Layout
• Path-by-path updates
• Results
• Conclusion

• Rules are hypercubes in rule space
• Build a decision tree by cutting the rule space to separate rules into smaller subspaces (child nodes)
• Stop when a leaf holds a small number of rules; this threshold is called binth (e.g., 16)
• Packets traverse the tree during classification
• Many heuristics
  • Dimension, number of cuts
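A minimal sketch of the cutting idea described above: cut the rule space into equal-size children, stop at leaves holding at most binth rules, and classify a packet by walking from the root to a leaf. The heuristic used here (try each dimension with a fixed number of equal cuts and keep the one whose largest child is smallest) is a stand-in chosen for brevity, not the heuristic of any particular published scheme. The `Rule` layout and the "smaller number = higher priority" convention are assumptions.

```python
from typing import NamedTuple, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


class Rule(NamedTuple):
    priority: int               # assumption: smaller number = higher priority
    ranges: Tuple[Range, ...]   # the rule's hypercube, one range per dimension
    action: str


def overlaps(rule, box):
    return all(r_lo <= b_hi and b_lo <= r_hi
               for (r_lo, r_hi), (b_lo, b_hi) in zip(rule.ranges, box))


def split(box, dim, cuts):
    """Cut box into `cuts` equal-width children along dimension `dim`."""
    lo, hi = box[dim]
    width = (hi - lo + 1) // cuts
    children = []
    for i in range(cuts):
        c_lo = lo + i * width
        c_hi = hi if i == cuts - 1 else c_lo + width - 1
        children.append(box[:dim] + ((c_lo, c_hi),) + box[dim + 1:])
    return children


def build(rules, box, binth=16, cuts=2):
    """box is a tuple of ranges; returns a nested dict representing the tree."""
    leaf = {"rules": sorted(rules, key=lambda r: r.priority)}
    if len(rules) <= binth:
        return leaf
    best = None
    for dim in range(len(box)):
        if box[dim][1] - box[dim][0] + 1 < cuts:
            continue  # dimension too narrow to cut further
        parts = [(b, [r for r in rules if overlaps(r, b)]) for b in split(box, dim, cuts)]
        worst = max(len(rs) for _, rs in parts)
        if best is None or worst < best[0]:
            best = (worst, dim, parts)
    if best is None or best[0] == len(rules):
        return leaf  # no cut separates the rules; stop to guarantee termination
    _, dim, parts = best
    return {"dim": dim, "children": [(b, build(rs, b, binth, cuts)) for b, rs in parts]}


def lookup(node, packet):
    """Assumes the packet lies inside the root box."""
    while "children" in node:
        dim = node["dim"]
        node = next(child for b, child in node["children"]
                    if b[dim][0] <= packet[dim] <= b[dim][1])
    for rule in node["rules"]:  # leaf rules are already in priority order
        if all(lo <= v <= hi for (lo, hi), v in zip(rule.ranges, packet)):
            return rule
    return None


rules = [
    Rule(0, ((0, 3), (0, 3)), "r0"),
    Rule(1, ((4, 7), (0, 7)), "r1"),
    Rule(2, ((0, 7), (0, 7)), "r2"),  # wide rule, replicated into both children
]
tree = build(rules, ((0, 7), (0, 7)), binth=2)
print(lookup(tree, (1, 5)).action)
```

Note how the wide rule r2 is copied into every child it overlaps. This replication is the memory cost that cutting heuristics try to limit.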

• Idea: partition rules among TCAM subarrays using decision trees
• 4K-entry subarrays → coarse tree with each leaf in a subarray
• 2-deep tree → fast lookup
• Packets traverse subarrays
• Previous heuristics are complicated
• We propose a simple sort heuristic
  • Sort rules and cut equally
    ▪ EffiCuts → min. rule replication

[Figure: coarse tree (Root, Leaf 1, Leaf 2) over the X-Y rule space, with leaves mapped to TCAM Subarrays 1-3]

Coarse Trees → Search Pruning (trees) + Parallel Search (TCAM)

Fine Trees: Small binth, reduced ordering effort in TCAM
• Key observation: a packet cannot match multiple leaves → only rules within the same leaf need ordering
• Reduce update effort → tree with small binth (fine tree)
  ▪ Not real, just annotations
  ▪ One coarse-tree leaf contains some contiguous fine-tree leaves
  ▪ Both trees kept consistent
• Updates are slow → store fine tree in control memory

• Add a rule
  • Create empty space at the right priority level via repeated swaps
  • With a naïve, contiguous layout of leaves, this requires many repeated swaps
    ▪ As many swaps as #rules
• Observation: only overlapping rules need ordering per priority
  • Too hard in full generality

[Figure: Root with Leaf 1 and Leaf 2; TCAM entries ordered by priority, with empty entries]

• Insight: leaves are naturally non-overlapping → only rules in a leaf need to be ordered
  • Trivial, unlike general overlap
• Interleave rules across all leaves
  • Contiguously place same-priority-level rules from all leaves
  • Order only across priority levels
  • Move at most one rule per level
    ▪ binth levels → small for fine tree
    ▪ Update effort ≈ binth swaps

[Figure: Root with Leaf 1 and Leaf 2; interleaved TCAM entries ordered by priority, with empty entries]
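The interleaved layout can be illustrated with a toy model. The sketch below is not the paper's implementation: the class name, the fixed number of slots per level, and the "smaller number = higher priority" convention are assumptions. It models the TCAM as a flat array in which level k holds the k-th highest-priority rule of each leaf, so an insertion displaces at most one rule of the same leaf per level.

```python
from typing import NamedTuple, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


class Rule(NamedTuple):
    priority: int               # assumption: smaller number = higher priority
    ranges: Tuple[Range, ...]
    action: str


class InterleavedTcam:
    """Toy model: entries[level * slots_per_level + slot] = (leaf_id, rule) or None."""

    def __init__(self, num_levels, slots_per_level):
        self.num_levels = num_levels          # plays the role of binth
        self.slots_per_level = slots_per_level
        self.entries = [None] * (num_levels * slots_per_level)

    def insert(self, leaf_id, rule):
        """Insert a rule into a leaf; return the number of TCAM entry writes."""
        writes, carry = 0, rule
        for level in range(self.num_levels):
            block = range(level * self.slots_per_level, (level + 1) * self.slots_per_level)
            mine = next((i for i in block
                         if self.entries[i] is not None and self.entries[i][0] == leaf_id), None)
            if mine is None:
                # This leaf has no rule at this level: drop the carried rule into an empty entry.
                free = next((i for i in block if self.entries[i] is None), None)
                if free is None:
                    raise RuntimeError("no empty entry at this level")
                self.entries[free] = (leaf_id, carry)
                return writes + 1
            resident = self.entries[mine][1]
            if carry.priority < resident.priority:
                # Carried rule outranks the resident: swap, push the resident one level down.
                self.entries[mine] = (leaf_id, carry)
                carry = resident
                writes += 1
        raise RuntimeError("leaf holds more rules than there are levels")

    def lookup(self, packet):
        """A TCAM reports the first (lowest-address) matching entry."""
        for entry in self.entries:
            if entry is not None and all(
                    lo <= v <= hi for (lo, hi), v in zip(entry[1].ranges, packet)):
                return entry[1]
        return None


tcam = InterleavedTcam(num_levels=4, slots_per_level=2)
# Leaf "A" covers x in 0..3 and leaf "B" covers x in 4..7, so the leaves do not overlap.
tcam.insert("A", Rule(5, ((0, 3),), "a5"))
tcam.insert("A", Rule(2, ((0, 1),), "a2"))
writes = tcam.insert("A", Rule(3, ((1, 2),), "a3"))
tcam.insert("B", Rule(4, ((4, 5),), "b4"))
tcam.insert("B", Rule(1, ((4, 7),), "b1"))
print(writes, tcam.lookup((2,)).action)
```

Because a packet can only match rules of one leaf, it does not matter that rules of different leaves sit next to each other within a level. Only the per-leaf order across levels has to be maintained, which is what bounds the number of writes per insertion by the number of levels.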

• Paper breaks down into detailed cases
• Worst case: rules move across TCAM subarrays
• Interleaved layout effort ≈ 3 * binth * #subarrays swaps
  ▪ ≈ 3 * 8 * 25 = 600 swaps (100K rules and 4K rules/subarray)
• Contiguous layout effort ≈ #rules
  ▪ ≈ 100K swaps (100K rules)
• Details in the paper
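The two worst-case estimates on this slide can be checked with a few lines. The function names are made up for the example, and "K" is taken as 1,000 (the subarray count is 25 either way).

```python
import math


def interleaved_worst_case_swaps(num_rules, rules_per_subarray, binth):
    """Slide's estimate: about 3 * binth * #subarrays swaps."""
    num_subarrays = math.ceil(num_rules / rules_per_subarray)
    return 3 * binth * num_subarrays


def contiguous_worst_case_swaps(num_rules):
    """Slide's estimate: about as many swaps as there are rules."""
    return num_rules


interleaved = interleaved_worst_case_swaps(100_000, 4_000, binth=8)
contiguous = contiguous_worst_case_swaps(100_000)
print(interleaved, contiguous, round(contiguous / interleaved))
```

With the slide's parameters the interleaved layout needs about 600 swaps against about 100,000 for the contiguous layout, a gap of roughly two orders of magnitude.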

• Problem: update moves hold up memory for long periods
• Make updates non-atomic
• Packet lookups can be interspersed between updates
• Details in the paper

• Key metrics: lookups and updates
  • Lookup accesses: #accesses per packet match
  • Update effort: #accesses per one-rule addition
• Software simulator
  • EffiCuts, TCAM, and TreeCAM
  • TCAM: Unoptimized and Partitioned TCAM (Extended TCAM)
    ▪ 4K subarrays
  • Decision tree algorithm (EffiCuts)
  • Tree/TCAM hybrid (TreeCAM)
    ▪ Coarse tree binth = 4K, fine tree binth = 8
• ClassBench-generated classifiers: ACL, FW, and IPC

• EffiCuts requires many more SRAM accesses than TCAMs
• Extended TCAM and TreeCAM require at most 8 accesses, even for 100,000-rule classifiers
• Extended TCAM does not handle updates

• Compare TCAM-basic, TCAM-ideal, and TreeCAM
  • EffiCuts and Extended TCAM do not discuss updates
• TCAM-basic: periodic empty entries
• TCAM-ideal: adapt existing work on longest prefix match for packet classification
  • Identify groups of overlapping rules, and ideally assume
    ▪ Enough empty entries at the end of every group
    ▪ Two groups of overlapping rules DO NOT merge (ideal)
• We generate a worst-case update stream for each scheme
• Details in the paper

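Classifier Type | Classifier Size | TCAM-basic: Empty Slots | TCAM-basic: Max # TCAM Ops | TCAM-ideal: Max. Overlaps | TCAM-ideal: Max # TCAM Ops | TreeCAM: #Subarrays | TreeCAM: Max # TCAM Ops
ACL | 10K | 44K | 30K | … | … | … | …
ACL | …K | 60K | 276K | … | … | … | …
FW | 10K | 68K | 64K | … | … | … | …
FW | …K | 82K | 567K | … | … | … | …
IPC | 10K | 43K | 22K | … | … | … | …
IPC | …K | 46K | 236K | … | … | … | …

TreeCAM is close to Ideal and two orders of magnitude better than TCAM-basic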

• Previous schemes do not perform well in both lookups and updates
• TreeCAM uses three techniques to address this challenge:
  • Dual tree versions: decouple lookups and updates
    ▪ Coarse trees for lookups and fine trees for updates
  • Interleaved layout bounds the update effort
  • Path-by-path update enables non-atomic updates, which can be interspersed with packet lookups
• TreeCAM achieves 6-8 lookup accesses and close to ideal TCAM for updates, even for large classifiers (100K rules)

TreeCAM scales well with classifier size, line rate, and update rate