

Presentation transcript:

High-Performance Hardware Monitors to Protect Network Processors from Data Plane Attacks
Kekai Hu, Harikrishnan Chandrikakutty, Deepak Unnikrishnan, Tilman Wolf, and Russell Tessier
Department of Electrical and Computer Engineering, University of Massachusetts Amherst

Slide 2 (Russell Tessier): Outline
- The problem: software attacks on network routers
  - Routers now include programmable processors
- Our solution: include a monitor for the processor
- Software to generate the monitor information
- Implementation on DE4 NetFPGA board
- Experimental results

Slide 3: Computer Networks
- Networks provide connectivity between end-systems
- Success of the Internet: hourglass architecture
- Success is also a problem: diverse apps, diverse systems
- Changing requirements for the network layer

Slide 4: Network Architecture
- Extensions to current Internet
- Customization of data plane
- Requires router systems with ability to adapt

Slide 5: Programmable Router
- Network processor
  - General-purpose processing capability in data path
  - Packet processing in software
- High-performance processing hardware
  - Scalability through high levels of parallelism
- Example: 40-core network processors
- Key challenge: security due to software processing
  - Protocol processing on packet processor

Slide 6: Exploit of Vulnerable Network Processor
- Vulnerability in network processor can be exploited
- "In-network" denial-of-service attack (new type of attack)
- Key questions
  - Can such vulnerabilities occur in packet processing code? (Yes, we show one example.)
  - Can vulnerabilities be exploited? (Yes, for von Neumann and for Harvard architectures.)

Slide 7: Header Insertion Exploit
- Stack smashing during header insertion
  - Control flow can be changed to attack code
- Attack code: infinite transmission loop
  - Devastating denial-of-service attack
- Prototype implementation
  - Custom network processor on NetFPGA
- Harvard architecture: possible, but less exciting

Slide 8: Attack model
- Congestion management protocol
  - Inserts a custom header
- Integer overflow vulnerability
  - len1 = 12
  - len2 =
  - sum = 10
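The integer overflow on this slide can be illustrated with a minimal sketch. The transcript does not preserve the value of len2 or the width of the length field, so both are assumptions here: a 16-bit unsigned field and a hypothetical len2 of 65534, chosen only because it reproduces the sum of 10 shown on the slide. The buffer size and function names are likewise illustrative.

```python
# Assumption: lengths are held in a 16-bit unsigned field (not stated on the slide).
LEN_BITS = 16
MASK = (1 << LEN_BITS) - 1


def wrapped_sum(len1: int, len2: int) -> int:
    """Add two lengths the way a fixed-width unsigned register would."""
    return (len1 + len2) & MASK


def vulnerable_check(len1: int, len2: int, buffer_size: int) -> bool:
    """Bounds check performed on the wrapped sum: this is the flaw."""
    return wrapped_sum(len1, len2) <= buffer_size


def safe_check(len1: int, len2: int, buffer_size: int) -> bool:
    """Bounds check on the true sum (Python integers do not wrap)."""
    return len1 + len2 <= buffer_size


len1 = 12           # from the slide
len2 = MASK - 1     # hypothetical value (65534); missing in the transcript
BUFFER_SIZE = 64    # hypothetical header buffer size

total = wrapped_sum(len1, len2)
accepted_by_flawed_check = vulnerable_check(len1, len2, BUFFER_SIZE)
accepted_by_safe_check = safe_check(len1, len2, BUFFER_SIZE)
```

Under these assumptions the wrapped sum is small enough to pass the flawed check even though the real combined length far exceeds the buffer, which is the kind of condition a stack-smashing header insertion could exploit.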

Slide 9: Defense Mechanism: Hardware Monitor
- Hardware monitor co-located with each processor core
  - Core reports hash of each executed instruction
- Monitoring graph represents correct behavior
  - Obtained from offline analysis of binary
  - Deviations trigger reset
- Change of software easy
  - Just need matching monitoring graph
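The monitoring idea can be sketched in software as a loop that follows the monitoring graph one reported hash at a time and signals a reset on the first deviation. This is only an illustration: the dictionary layout, the function name `run_monitor`, and the example hash values are assumptions, and the actual monitor is a hardware block.

```python
def run_monitor(graph, start, hashes):
    """Follow the monitoring graph for a stream of reported hashes.

    graph: dict mapping state -> {hash value: next state} (hypothetical layout).
    Returns ("ok", n) if all n hashes were valid, or ("reset", i) where i is
    the index of the first hash that deviates from the graph.
    """
    state = start
    for i, h in enumerate(hashes):
        next_state = graph.get(state, {}).get(h)
        if next_state is None:
            return ("reset", i)   # deviation from expected behavior
        state = next_state
    return ("ok", len(hashes))


# Hypothetical three-instruction loop; the hash values are made up.
example_graph = {0: {3: 1}, 1: {7: 2}, 2: {3: 0}}

normal_run = run_monitor(example_graph, 0, [3, 7, 3, 3])
attack_run = run_monitor(example_graph, 0, [3, 7, 9])
```

Swapping the software on the core then only requires supplying a different `graph`, which mirrors the point that a change of software just needs a matching monitoring graph.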

Slide 10: Offline Analysis of Processing Binary
- Executed instruction reported by core as 4-bit hash
  - Hash combines address, opcode, registers
  - Hash allows for compact representation of information
- Monitoring graph
  - Each instruction represented as a state
  - Edges correspond to execution of instruction
  - Control-flow operations lead to multiple possible next states
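A rough sketch of the offline analysis, under several assumptions: the slides do not give the hash function, so `hash4` below is a made-up XOR fold; the program encoding (address mapped to opcode, register field, and successor addresses) is hypothetical; and a state is taken to mean "this instruction was executed last", with each edge labeled by the hash of the instruction executed next.

```python
from collections import defaultdict


def hash4(address: int, opcode: int, regs: int) -> int:
    """Fold address, opcode and register fields into 4 bits (hypothetical hash)."""
    word = address ^ (opcode << 8) ^ (regs << 16)
    h = 0
    while word:
        h ^= word & 0xF
        word >>= 4
    return h


def build_monitoring_graph(program):
    """program: dict address -> (opcode, regs, [successor addresses]).

    Returns dict state -> {hash: set of next states}. A set with more than
    one element means two successors collide on the same 4-bit hash.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for addr, (_, _, successors) in program.items():
        for nxt in successors:
            opcode, regs, _ = program[nxt]
            graph[addr][hash4(nxt, opcode, regs)].add(nxt)
    return {s: {h: set(t) for h, t in edges.items()} for s, edges in graph.items()}


# Hypothetical four-instruction program; the one at 0x04 is a branch.
program = {
    0x00: (0x23, 0x12, [0x04]),
    0x04: (0x04, 0x34, [0x08, 0x10]),
    0x08: (0x21, 0x56, [0x10]),
    0x10: (0x02, 0x00, [0x00]),
}
graph = build_monitoring_graph(program)
branch_targets = set().union(*graph[0x04].values())
```

Only the branch at 0x04 has more than one possible next state, matching the note that control-flow operations lead to multiple next states.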

Slide 11: NFA-to-DFA conversion
- Problem: non-deterministic finite automaton (NFA)
  - State may have two next states with same edge value due to hash
  - Implementation would need to keep track of multiple states
- Solution: NFA-to-DFA conversion (powerset construction)
  - Well-known algorithm [Hopcroft and Ullman, 1976]
  - Deterministic finite automaton (DFA) requires only one state
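For reference, a minimal sketch of the powerset (subset) construction for an automaton without epsilon transitions. The dictionary representation and the example NFA are assumptions made for illustration; they are not taken from the presentation.

```python
from collections import deque


def nfa_to_dfa(nfa, start):
    """Powerset construction.

    nfa: dict state -> {symbol: set of next states}.
    Returns (dfa, start_set), where each DFA state is a frozenset of NFA
    states and dfa maps DFA state -> {symbol: DFA state}.
    """
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        # Merge the moves of every NFA state contained in this DFA state.
        moves = {}
        for s in current:
            for symbol, targets in nfa.get(s, {}).items():
                moves.setdefault(symbol, set()).update(targets)
        dfa[current] = {sym: frozenset(t) for sym, t in moves.items()}
        for nxt in dfa[current].values():
            if nxt not in dfa:
                queue.append(nxt)
    return dfa, start_set


# Hypothetical NFA: from A, hash value 5 may lead to B or to C (a collision).
example_nfa = {
    "A": {5: {"B", "C"}},
    "B": {1: {"D"}},
    "C": {2: {"D"}},
    "D": {5: {"A"}},
}
dfa, dfa_start = nfa_to_dfa(example_nfa, "A")
```

In the resulting DFA the ambiguous pair B and C becomes a single combined state, so the monitor only ever has to track one current state.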

Slide 12: Implementation of DFA Monitor
- Each state may have up to 16 next states
  - Most states only have one or two next states
- Requirements
  - Compact representation
  - Fast processing of each hash value (single memory access)
- Idea: grouping of next states in contiguous memory
  - Trick: determine offset into memory based on order of hash value

Slide 13: Implementation of DFA Monitor
- DFA monitor system
  - Memory keeps all valid hash values in one bit vector
  - Next state based on offset of group and position of hash value
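One way to read the bit-vector scheme is sketched below. Each memory entry is assumed to hold a 16-bit vector of valid hash values plus the base address of a contiguous group of next-state entries; the next address is the base plus the number of valid hash values smaller than the reported one. The entry format, `compile_dfa`, and `step` are hypothetical and may differ from the actual hardware layout.

```python
def compile_dfa(dfa, start):
    """Lay out a DFA (state -> {hash: next state}) as a flat memory.

    Every state, including those without successors, must be a key of dfa.
    Entry 0 is the start state. Each entry is (valid_vector, group_base).
    """
    base = {}
    addr = 1
    for state in dfa:                 # reserve one contiguous group per state
        base[state] = addr
        addr += len(dfa[state])

    def record(state):
        vector = 0
        for h in dfa[state]:
            vector |= 1 << h          # mark hash value h as valid
        return (vector, base[state])

    memory = [None] * addr
    memory[0] = record(start)
    for state in dfa:
        # Successors are stored in increasing order of their hash value.
        for position, h in enumerate(sorted(dfa[state])):
            memory[base[state] + position] = record(dfa[state][h])
    return memory


def step(memory, addr, h):
    """Return the next memory address for hash h, or None on a deviation."""
    vector, group_base = memory[addr]
    if not (vector >> h) & 1:
        return None
    rank = bin(vector & ((1 << h) - 1)).count("1")   # valid hashes below h
    return group_base + rank


# Hypothetical DFA with made-up hash values.
example_dfa = {0: {3: 1, 9: 2}, 1: {7: 0}, 2: {7: 0}}
memory = compile_dfa(example_dfa, 0)
```

In this layout a state reachable from several predecessors is stored once per group it appears in, which is one plausible reading of the later remark about multiple memory entries for certain states.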

Slide 14: Altera DE4 NetFPGA Infrastructure
- Open source port of NetFPGA 1G [1]
- Networking research platform
  - Altera DE4 board
  - 5x resources compared to NetFPGA 1G
  - PCI Express, 8 GB DDR2
- Configurable designs
  - Reference router
  - Packet generator

Slide 15: Altera DE4 NetFPGA reference router
- Complete IPv4 router
  - Forwards packets on all four 1 Gbps interfaces
- Design components
  - Input queues
  - Input arbiter
  - Output port lookup
  - Output queues
- Prototype system replaces output port lookup module

Slide 16: Single-core system architecture
- Single-core network processor system integrated along with Altera DE4 NetFPGA reference router pipeline

Slide 17: Harvard memory architecture
- Separate physical memory space for instruction and data
  - General memory error techniques for code-injection attacks will not be successful
  - Attacks need to be generated with existing library functions

Slide 18: Experimental Setup
- Altera DE4 reference router with prototype system
- Altera DE4 packet generator
  - Generate packets
  - Capture forwarded packets
- Evaluation metrics
  - Attack detection
  - Throughput
  - Resource utilization

Slide 19: Offline analysis
- MIPS-GCC compiler generates binary and branch information
- Powerset construction to perform NFA-to-DFA transformation
- Memory initialization file is loaded into memory

Slide 20: Evaluation
- Monitoring speed
  - Single memory access
  - Lookup into fixed-size register file
- Memory size of monitor
  - More states due to NFA-to-DFA conversion
  - More states due to multiple entries in memory for certain states
  - In practice, overhead is below 10%
- Very fast and compact hardware monitor

Slide 21: Benchmarks
- NpBench [1]
  - Modern network applications
  - Three specific functional groups
    - Traffic management and quality of service group
    - Security and media processing group
    - Packet processing group

[1] B.K. Lee and L.K. John, "NpBench: a benchmark suite for control plane and data plane applications for network processors," in Proc. of 21st International Conference on Computer Design, vol., no., pp , Oct

Slide 22: Prototype Implementation on FPGA
- Small overhead compared to processor core (4096 states):
- Correct operation: attack packet detected and dropped

Slide 23: Attack with Defense in Place
- Attack packet dropped, router continues to operate

Slide 24: Throughput
- Throughput performance of the network processor with security monitor
  - CM protocol and IPv4 application

Slide 25: Multicore Monitor
- Dynamic workloads pose problem for hardware monitor
  - Processing may differ between packets
  - Monitors need to match processing
- Mapping between processors and monitors
  - 1-to-1 mapping requires frequent reload of monitor
  - Any-to-any mapping costly to implement
  - Clusters with n-to-m mapping provide balance
- Interconnect is configured dynamically depending on workload
  - Mapping between core and monitor
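The trade-off between mappings can be made concrete with a toy model of one cluster in which any core may be attached to any monitor. The class name, the allocation policy (prefer a free monitor that already holds the right monitoring graph, otherwise reload a free one), and the application names are all assumptions for illustration, not the presented design.

```python
class MonitorCluster:
    """Toy model of an n-to-m cluster of cores and monitors (hypothetical)."""

    def __init__(self, loaded_apps):
        self.loaded = list(loaded_apps)          # graph held by each monitor
        self.owner = [None] * len(self.loaded)   # core attached to each monitor
        self.reloads = 0                         # monitoring graph reloads so far

    def release(self, core):
        """Detach the core from whichever monitor it currently uses."""
        for m, c in enumerate(self.owner):
            if c == core:
                self.owner[m] = None

    def attach(self, core, app):
        """Connect core to a monitor holding app's graph; return its index."""
        self.release(core)
        # First choice: a free monitor that already holds the right graph.
        for m, loaded_app in enumerate(self.loaded):
            if self.owner[m] is None and loaded_app == app:
                self.owner[m] = core
                return m
        # Otherwise reload any free monitor with the required graph.
        for m in range(len(self.loaded)):
            if self.owner[m] is None:
                self.loaded[m] = app
                self.reloads += 1
                self.owner[m] = core
                return m
        raise RuntimeError("no free monitor in cluster")


# Two cores sharing three monitors; "ipv4" and "cm" are placeholder names.
cluster = MonitorCluster(["ipv4", "ipv4", "cm"])
first = cluster.attach(0, "ipv4")    # matching graph already loaded
second = cluster.attach(1, "cm")     # matching graph already loaded
third = cluster.attach(0, "cm")      # only cm monitor is busy: reload needed
fourth = cluster.attach(1, "ipv4")   # spare ipv4 monitor avoids a reload
```

With a strict 1-to-1 mapping every change of application on a core would force a reload; having a few spare monitors in the cluster lets some switches reuse a graph that is already loaded.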

Slide 26: System Architecture of Clustered System
- Multiple cores can access multiple monitors
  - Dynamic configuration of crossbar
- Secure loading of monitors through external interface

Slide 27: Cluster Design
- Simple implementation of clustered monitor
  - Dynamic configuration through programming of demultiplexers

Slide 28: Dual-Ported Monitor Implementation
- Memory of monitor can be shared between two monitors
  - Effective use of dual-ported memory
  - Two monitoring graphs can be used in parallel

Slide 29: Prototype Implementation on FPGA
- Resource cost for a multi-core system (4 cores, 6 monitors):

Slide 30: Conclusions
- Current and future Internet needs to meet new demands
- Programmable routers provide packet processing platform
  - Systems problem: security vulnerabilities
  - Attacks can be launched within data plane (i.e., not control access)
  - Monitor-based hardware defense mechanism is effective
- Hardware monitor design and prototype
  - Uses compact DFA (less than 10% more states than NFA)
  - Verification with single memory access per instruction
  - Defense shown for Harvard architecture attack
- Technique extended to multicore network processors
- Promising defense against attacks on network infrastructure