Deep Packet Inspection: Which Implementation Platform? Sarang Dharmapurikar, Cisco

Implementation Platform

Several choices, each with pros and cons
  – ASICs
  – FPGAs
  – Network processors
  – Graphics processors (NVIDIA)
  – Multi-core, multi-threaded commodity processors
Each needs evaluation with respect to
  – Cost
  – Speed
  – Overall system performance (DPI is just a small piece of the puzzle)
  – Ease of use and upgrading
A hardware-software co-design approach
  – Profile a DPI system and push a component into hardware only if the resulting overall speedup is worthwhile (Amdahl's law)
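The Amdahl's law point can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the profile fractions are assumptions, not measurements from the talk. It shows how little the whole system gains when the accelerated component is a small share of per-packet time.

```python
def amdahl_speedup(accelerated_fraction: float, component_speedup: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime
    is made `component_speedup` times faster (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / component_speedup)


# Hypothetical profiles: share of per-packet time spent in the component
# that would be moved to hardware, assuming hardware makes it 10x faster.
for fraction in (0.3, 0.6, 0.9):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of runtime accelerated 10x -> {overall:.2f}x overall")
```

With these assumed numbers, a component that accounts for 30% of the runtime yields well under a 1.5x system speedup even with a 10x hardware engine, which is why profiling comes before any hardware decision.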

ASIC

Examples: ClassiPi, NetLogic, Tarari, some Cisco ASICs
Requires too much investment
  – NRE close to a million dollars
A long design cycle
  – Most of the time is consumed in verification
Hard to upgrade
  – Algorithms evolve
  – It is hard to build a flexible enough ASIC
Applications get locked to a platform
  – Migrating to a new platform requires a lot of software rewriting

FPGA

Very flexible, but expensive and power-hungry
  – Virtex-5 offers 330,000 lookup-table units
  – 4 MB of SRAM
The latest Xilinx FPGAs contain multiple PowerPC cores
Possible to design hybrid hardware/software systems
  – The components that assist DPI, such as TCP reassembly, normalization, and flow classification, are done in hardware
Several FPGA platforms for networking acceleration are available today
  – NetFPGA
  – FPX
Need to be careful in the DPI approach
  – Raw signature-matching techniques that spend FPGA logic resources on each signature won't scale

Network Processors

Intel IXP2850
  – 16 micro-engines, each with 2 KB D$, 8 KB I$, and a 16-entry CAM
  – An integrated XScale processor for the control path, with 32 KB I$ and 32 KB D$
  – 2 crypto units
  – 16 KB shared scratch-pad SRAM
Cisco QuantumFlow processor
  – 40 packet processing engines (PPEs) at 1.2 GHz
  – 4 threads per PPE
  – Dedicated hardware for queuing, buffering, IP lookup, and classification

Commodity processors

Really powerful server-class processors are coming up
  – Intel's Nehalem
      8 cores, 2 threads per core
      32 KB L1, 256 KB L2, 10+ MB of shared L3 cache
  – Sun's Niagara2
      8 cores, 8 threads per core
      16 KB I$ and 8 KB D$ per core, 4 MB shared L2 cache
      Integrated cryptographic coprocessor units
Need to think multi-core, multi-threaded
  – Think in terms of a complete system, not just pattern matching
  – Which core should do what?
Need to design cache-friendly data structures
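As one illustration of what a cache-friendly layout can look like, the sketch below stores a matching automaton as a single contiguous array, so each input byte costs one indexed load instead of a pointer chase through per-state objects. It is a minimal example under stated assumptions: it builds a KMP-style DFA for one literal pattern, not a full multi-signature DPI engine, and the function names are hypothetical. Python is used only to show the layout; the cache benefit itself would be realized in a language such as C.

```python
from array import array

ALPHABET = 256  # one table column per possible byte value


def build_flat_dfa(pattern: bytes) -> array:
    """Build a KMP-style DFA for one literal pattern as a flat table.

    Row s occupies table[s * 256 : (s + 1) * 256] and holds the next
    state for every input byte. State len(pattern) is the accepting state.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    table = array("H", [0]) * ((m + 1) * ALPHABET)
    table[pattern[0]] = 1
    fallback = 0  # state to resume from after a mismatch
    for state in range(1, m + 1):
        row = state * ALPHABET
        fb_row = fallback * ALPHABET
        # Mismatch transitions behave like the fallback state's transitions.
        table[row:row + ALPHABET] = table[fb_row:fb_row + ALPHABET]
        if state < m:
            table[row + pattern[state]] = state + 1
            fallback = table[fb_row + pattern[state]]
    return table


def count_matches(table: array, pattern_len: int, payload: bytes) -> int:
    """Scan the payload once; count (possibly overlapping) occurrences."""
    state = 0
    hits = 0
    for byte in payload:
        state = table[state * ALPHABET + byte]  # single indexed lookup
        if state == pattern_len:
            hits += 1
    return hits


pattern = b"abab"
dfa = build_flat_dfa(pattern)
print(count_matches(dfa, len(pattern), b"abababab xx abab"))
```

The design choice is the usual trade-off: a dense table wastes memory on unused transitions but keeps the per-byte work to one predictable array access, while sparse or compressed representations save memory at the cost of extra lookups.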

Conclusion

While hardware can assist DPI systems, building proprietary hardware is not a good idea
Let's understand the actual performance needs
  – Let's not be misguided by marketing needs
Need to think of hardware-software co-design
  – Requires careful profiling of DPI systems to identify the components that can be pushed to hardware
Need to design algorithms for multi-core, multi-threaded processors
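One common way to split DPI work across cores, sketched here as an assumption rather than as the speaker's design, is to pin each flow to one worker so that per-flow state such as TCP reassembly buffers never has to be shared between cores. The names `FlowKey` and `worker_for_flow` are hypothetical, and the hash (CRC-32 over a canonical string) is chosen only because it is deterministic and in the standard library.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def worker_for_flow(key: FlowKey, num_workers: int) -> int:
    """Map a flow to a worker index in [0, num_workers).

    The two endpoints are sorted first so that both directions of a
    connection land on the same worker.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    endpoint_a = (key.src_ip, key.src_port)
    endpoint_b = (key.dst_ip, key.dst_port)
    low, high = sorted((endpoint_a, endpoint_b))
    canonical = f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{key.protocol}".encode()
    return zlib.crc32(canonical) % num_workers


forward = FlowKey("10.0.0.1", "192.0.2.7", 40000, 80, 6)
reverse = FlowKey("192.0.2.7", "10.0.0.1", 80, 40000, 6)
print(worker_for_flow(forward, 8) == worker_for_flow(reverse, 8))
```

Static hashing like this keeps each worker's flow state local to its own cache, but it can leave cores unevenly loaded when a few flows dominate the traffic, so a real system would need to measure that before committing to it.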