Lecture 13: Reconfigurable Computing Applications October 10, 2013 ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications.

Presentation transcript:


Hardware-Assisted Simulated Annealing
° Use FPGA to perform FPGA placement
° Take advantage of parallelism and specialization
° Some limitations
  - Global view of cost
  - Convergence
  - Scalability
° Lots of benefits
  - Massive parallelism
  - Self-contained reconfigurable system
Courtesy: Wrighton/DeHon

Systolic Architectures
[Figure: the memory bottleneck, contrasted with a systolic chain of alternating Compute and Memory blocks]

Strategy
° Reformulate simulated annealing allowing only local swaps
° Consider all swaps in parallel
° Maintain information in “systolic cells”
  - Represent current placement spatially
  - Construct hardware to operate on entire placement at once
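The strategy above can be sketched in software. This is a minimal illustration, not the Wrighton/DeHon hardware: it assumes a one-dimensional row of cells, two-pin nets, a linear wirelength cost, and a geometric cooling schedule. The names `wirelength` and `anneal_local_swaps` and all parameter defaults are hypothetical. Each step considers only neighboring pairs, alternating between even and odd pairs so that the swaps in one step touch disjoint cells; the hardware would evaluate them concurrently, while this sketch visits them one after another.

```python
import math
import random

def wirelength(pos, nets):
    """Linear wirelength: sum of |x_a - x_b| over all two-pin nets."""
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def anneal_local_swaps(n_cells, nets, steps=400, t0=2.0, alpha=0.98, seed=0):
    """Simulated annealing restricted to swaps of adjacent slots (assumed 1-D row)."""
    rng = random.Random(seed)
    order = list(range(n_cells))          # order[x] = id of the cell in slot x
    rng.shuffle(order)
    pos = {cid: x for x, cid in enumerate(order)}
    cost = wirelength(pos, nets)
    best_cost, best_order = cost, order[:]
    temp = t0
    for step in range(steps):
        # Even steps use pairs (0,1), (2,3), ...; odd steps use (1,2), (3,4), ...
        for x in range(step % 2, n_cells - 1, 2):
            a, b = order[x], order[x + 1]
            pos[a], pos[b] = x + 1, x     # tentative swap
            new_cost = wirelength(pos, nets)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                order[x], order[x + 1] = b, a   # accept
                cost = new_cost
            else:
                pos[a], pos[b] = x, x + 1       # reject: undo
        if cost < best_cost:
            best_cost, best_order = cost, order[:]
        temp *= alpha                     # one cooling step
    return best_order, best_cost

if __name__ == "__main__":
    chain = [(i, i + 1) for i in range(7)]   # cells 0..7 connected in a chain
    print(anneal_local_swaps(8, chain))
```

Restricting moves to neighbors is what keeps communication local; the cost is that a cell needs many steps to cross the array, which is one reason convergence appears among the limitations.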

Local Swaps
[Figure: local swaps use only local communication, allowing massively parallel operation]

Individual Swap Element
[Figure: block diagram of one swap element. Labels:
  - State: myX, myY, counter, myID
  - Fanout0(id, x, y), Fanout1(id, x, y), Fanout2(id, x, y), Fanout2N(id, x, y)
  - Fanin0(id, x, y), Fanin1(id, x, y), Fanin2(id, x, y), FaninN(id, x, y)
  - PosChain(id, x, y); position chain in, position chain out
  - Left data in, left data out, right data in, right data out
  - Up data in/out, down data in/out
  - Randomness, arithmetic unit]

Linear Wirelength Improvement

Choosing 400 Cooling Steps

VPR Comparison Methodology

Speedups
° VPR on 2.2 GHz Xeon workstation
° 500x for ex5p
  - 18% channel growth
° 1200x for spla
  - 41% channel growth
° More opportunity for speedups with better cooling schedules
° Better quality with better cost functions
° Feasible on a Virtex2000E part

Networking Application: Reconfigurable Firewall
° Networking hardware well suited for reconfigurable hardware
  - Target signatures change often
  - Massive quantities of stream-based data
  - Repetitive operations
° Connecting up to a realistic networking environment is hard
  - Washington University experimental setup one of the best
  - Shows importance of both memory and processing capability
° Numerous experiments performed over the past five years
Courtesy: Lockwood

Network Routing
° FPGAs popular in network hardware
° New protocols implemented directly in silicon
° Easy to upgrade in the field
° Washington University Gigabit Switch (WUGS)
  - Switch provides up to 160 Gbps of bandwidth

FPGA-based Router
° FPX module contains two FPGAs
° NID – network interface device
  - Performs data queuing
° RAD – reprogrammable application device
  - Specialized control sequences

Reconfigurable Data Queuing
° Data may be congested
° FPGA can be programmed for virtual channels

Hardware Setup
° Stacked boards part of system
° Scalable to multiple boards
° Allows for cooling, power

IP Lookup Function
° RAD can be used to evaluate packet headers
° Headers evaluated in groups of four bits
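To make the four-bits-at-a-time idea concrete, here is a minimal software sketch of longest-prefix matching over a trie with a 4-bit stride. It is an illustrative stand-in and not the circuit that runs on the RAD: it assumes IPv4 addresses, expands prefixes whose length is not a multiple of four into the entries they cover, and uses hypothetical names (`Node`, `StrideTrie`, the next-hop strings).

```python
import ipaddress

STRIDE = 4  # address bits examined per step

class Node:
    def __init__(self):
        self.children = [None] * (1 << STRIDE)
        self.next_hop = [None] * (1 << STRIDE)  # best next hop per 4-bit value
        self.plen = [-1] * (1 << STRIDE)        # prefix length that set it

class StrideTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, cidr, hop):
        net = ipaddress.ip_network(cidr)        # assumed IPv4, e.g. "10.0.0.0/8"
        prefix, length = int(net.network_address), net.prefixlen
        node, depth = self.root, 0
        # Walk down one node per full group of four prefix bits.
        while length - depth > STRIDE:
            nib = (prefix >> (32 - depth - STRIDE)) & 0xF
            if node.children[nib] is None:
                node.children[nib] = Node()
            node = node.children[nib]
            depth += STRIDE
        # Expand the remaining 0..4 bits over every entry they cover.
        free = STRIDE - (length - depth)
        base = ((prefix >> (32 - depth - STRIDE)) & 0xF) >> free << free
        for idx in range(base, base + (1 << free)):
            if length >= node.plen[idx]:        # longer prefixes win
                node.next_hop[idx], node.plen[idx] = hop, length

    def lookup(self, addr):
        key = int(ipaddress.ip_address(addr))
        node, best, shift = self.root, None, 32 - STRIDE
        while node is not None and shift >= 0:
            nib = (key >> shift) & 0xF          # next group of four bits
            if node.next_hop[nib] is not None:
                best = node.next_hop[nib]       # deeper hits are longer prefixes
            node = node.children[nib]
            shift -= STRIDE
        return best

if __name__ == "__main__":
    table = StrideTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("128.252.0.0/16", "port1")
    table.insert("128.252.5.0/24", "port2")
    table.insert("128.252.4.0/22", "port3")
    for a in ("128.252.5.5", "128.252.6.1", "128.252.9.9", "10.0.0.1"):
        print(a, table.lookup(a))
```

A 32-bit address needs at most eight steps with this stride; a wider stride would cut the step count at the cost of larger nodes.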

FPX Hardware Platform

FPX Hardware in WUGS-20 Switch

System-On-Chip Firewall

Content Matching Module
[Figure: block diagram of regex_app (given) inside wrapper_module.vhd. Labels:
  - clk, reset_l, enable_l, ready_l
  - dataen_appl_in, d_appl_in, sof_appl_in, eof_appl_in, sod_appl_in, tca_appl_in (from Protocol Wrappers)
  - dataen_out_appl, d_out_appl, sof_out_appl, eof_out_appl, sod_out_appl, tca_out_appl (to existing MP1 circuit)
  - Matched (to extended bits of CAM)
  - Bus widths marked 32, 32, and 8]

Packet Matching with Content Addressable Memory
° Sample Packet:
  - Source Address = (dotted.decimal)
  - Destination Address = (dotted.decimal)
  - Source Port = 4096 (decimal)
  - Destination Port = 50 (decimal)
  - Protocol = TCP (6)
  - Payload = “Consolidate your loans. CALL NOW”
    – Payload Lists = { General SPAM (0), Save Money SPAM (1) }
    – Content Vector = “ ” (binary) = x”03” (hex)
° Packet fields (all values shown in hex):
  - Src IP = 80FC0505
  - Dest IP = 8D8E0202
  - Src Port = 1000
  - Dest Port = 0050
  - Proto =
  - Content =

Sample Filter
  - Source Address = / 16
  - Destination Address = / 16
  - Source Port = Don’t Care
  - Destination Port = 50
  - Protocol = TCP (6)
  - Payload includes general SPAM (List 0)

All values in hex. Mask: 1 = care, 0 = don’t care.

Field       IP Packet   Value      Mask
Src IP      80FC0505    80FC0000   FFFF0000
Dest IP     8D8E0202    8D8E0000   FFFF0000
Src Port    1000        0000       0000
Dest Port   0050        0050       FFFF
Proto                   06         FF
Content     03          01         01

DROP the packet: it matches the filter
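The value/mask comparison on this slide can be checked with a few lines of Python. This is a sketch under stated assumptions: the field names and the `matches` helper are hypothetical, and the packet's protocol byte is taken as 06 because the sample packet is described as TCP (6). A field matches when the packet and the filter value agree on every bit where the mask is 1.

```python
# Field values from the slide, in hex. The packet's proto byte is assumed
# to be 0x06, since the sample packet is described as TCP (6).
packet = {"src_ip": 0x80FC0505, "dst_ip": 0x8D8E0202, "src_port": 0x1000,
          "dst_port": 0x0050, "proto": 0x06, "content": 0x03}
value = {"src_ip": 0x80FC0000, "dst_ip": 0x8D8E0000, "src_port": 0x0000,
         "dst_port": 0x0050, "proto": 0x06, "content": 0x01}
mask = {"src_ip": 0xFFFF0000, "dst_ip": 0xFFFF0000, "src_port": 0x0000,
        "dst_port": 0xFFFF, "proto": 0xFF, "content": 0x01}

def matches(packet, value, mask):
    """True when packet and value agree on every bit the mask cares about."""
    return all((packet[f] & mask[f]) == (value[f] & mask[f]) for f in mask)

if __name__ == "__main__":
    action = "DROP" if matches(packet, value, mask) else "FORWARD"
    print(action)
```

The source port is ignored because its mask is all zeros, and the content mask of 01 looks only at bit 0 (the general SPAM list), so the packet's content vector of 03 still matches.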

Packet Classifier with FlowID
[Figure: CAM table feeding a priority encoder. Labels:
  - Input key, 112 bits: bits in IP header (Source Address, Destination Address, Source Port, Dest. Port, Protocol) plus Payload Match Bits
  - CAM table: CAM MASK [1..N] and CAM VALUE [1..N], with value comparators and mask matchers
  - Flow list: Flow ID [1..N], 16 bits
  - Priority encoder selects the resulting flow identifier]
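A software analogue of this classifier, as a rough sketch: each CAM entry holds a mask, a value, and a flow ID, and a priority encoder returns the flow ID of the first entry that matches. The 112-bit field layout below (32 + 32 + 16 + 16 + 8 + 8 bits), the rule that the lowest-numbered entry wins, and the names `pack` and `classify` are assumptions made for illustration and are not taken from the slide.

```python
# Assumed layout of the 112-bit key, most significant field first.
WIDTHS = (("src_ip", 32), ("dst_ip", 32), ("src_port", 16),
          ("dst_port", 16), ("proto", 8), ("payload_bits", 8))

def pack(fields):
    """Concatenate the fields into one 112-bit integer; missing fields are 0."""
    key = 0
    for name, width in WIDTHS:
        v = fields.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        key = (key << width) | v
    return key

def classify(key, cam_table):
    """cam_table: list of (mask, value, flow_id). Returns a flow_id or None."""
    # Hardware compares against every entry at once; here it is a loop.
    hits = [((key ^ value) & mask) == 0 for mask, value, _ in cam_table]
    # Priority encoder: the lowest-numbered matching entry wins (assumed).
    for hit, (_, _, flow_id) in zip(hits, cam_table):
        if hit:
            return flow_id
    return None

if __name__ == "__main__":
    spam_rule = (
        pack({"src_ip": 0xFFFF0000, "dst_ip": 0xFFFF0000, "dst_port": 0xFFFF,
              "proto": 0xFF, "payload_bits": 0x01}),          # mask
        pack({"src_ip": 0x80FC0000, "dst_ip": 0x8D8E0000, "dst_port": 0x0050,
              "proto": 0x06, "payload_bits": 0x01}),          # value
        7,                                                    # hypothetical flow ID
    )
    catch_all = (0, 0, 1)   # mask of all zeros matches every packet
    pkt = pack({"src_ip": 0x80FC0505, "dst_ip": 0x8D8E0202, "src_port": 0x1000,
                "dst_port": 0x0050, "proto": 0x06, "payload_bits": 0x03})
    print(classify(pkt, [spam_rule, catch_all]))
```

Ordering the table from most to least specific lets a catch-all entry at the end act as a default flow.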

Other Modules Implemented
° IPv4 CAM Filter
  - 104-bit header matching
° Fast IP Lookup (FIPL)
  - Longest prefix match
  - MAE-West at 10M pkts/second
° Packet Content Scanner
  - Regular expression search
° Data Queueing
  - Per-flow queue in SDRAM
° IPv6 Tunneling Module
  - Tunnels IPv6 over IPv4
° Statistics Module
  - Event counter
° Traffic Generator
  - Per-flow mixing
° Video Recoder
  - Motion JPEG
° Embedded Processor
  - KCPSM