Gigabit Routing on a Software-exposed Tiled-Microprocessor

Name: Gigabit Routing on a Software-exposed Tiled-Microprocessor
Uploaded: 2018-01-11T15:07:10+00:00
Duration: PTM11S15
Channel: Gordon Maxwell

Gigabit Routing on a Software-exposed Tiled-Microprocessor
James W Anderson, Anthony Degangi, Anant Agarwal, Umar Saif
MIT Computer Science and AI Laboratory

Network Routers
xKb/sec, ~5 ports
xGb/sec, ~10^2 ports
Network “Switch”
Network “Processor”

Three Challenges
Performance: Gb/sec (OC-192)
Architectural Scalability
  Throughput: x2.2/year
  Port count: for edge routers
Programmability
  Network services: NAT, firewalls, VPN
  “Layer 7” switches
  Monitoring: loss rate, link utilization, traffic patterns

Network Processors
Conventional wisdom
Tiled “all-purpose” architectures

MIT RAW Microprocessor
Tiled architecture
Low-latency mesh networks
Software-exposed pins
Compute pipeline
8 32-bit channels
2 DOR dynamic networks
  Memory Dynamic (MDN)
  General Dynamic (GDN)
2 static networks
  Streaming
  Tile-multicast
8-stage 32b MIPS-style single-issue in-order compute processor
4-stage 32b pipelined FPU
32 KB DCache
32 KB IMem
Routers and wires for three on-chip mesh networks
Registered at input --> longest wire = length of tile

RAW Microprocessor and Network Routing
Software-exposed tiled architecture: parallel processing
Software-exposed pins: flexible buffering
Software-exposed point-to-point networks: efficient, scalable switching

However ..
Processing
  Network Processors: special-purpose hardware
  RAW Microprocessor: software running on RAW general-purpose tiles
Switching
  Network Processors: special-purpose switching fabric
  RAW Microprocessor: RAW general-purpose on-chip networks
Buffering
  Network Processors: centrally-accessible specialized memory controllers, dedicated interconnects
  RAW Microprocessor: external to the chip, connected to software-exposed pins, accessed via RAW on-chip networks

IPv4 Router: RFC 1812
Look-up: DIR-24-8-BASIC [Gupta98]
Header verification
TTL update, header re-compute: Incremental Checksum [RFC 1141]
Switch to destination
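The verification and TTL-update steps listed on this slide can be illustrated with a short sketch. It is not the router's code (which ran on RAW tiles); it is a minimal Python illustration, assuming a 20-byte option-free header held in a `bytes` object. The function names and the set of checks in `verify_header` are hypothetical choices for this example. The checksum patch follows the RFC 1141 idea: decrementing the TTL lowers one 16-bit header word by 0x0100, so the stored checksum is raised by 0x0100 with end-around carry instead of re-summing the header.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words, with carries folded in."""
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_checksum(header: bytes) -> int:
    """Full recomputation, with the checksum field (bytes 10-11) treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum(zeroed) & 0xFFFF


def verify_header(header: bytes) -> bool:
    """A few basic sanity checks (an illustrative subset, not all of RFC 1812)."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    (total_length,) = struct.unpack("!H", header[2:4])
    if total_length < ihl * 4:
        return False
    # A valid header sums to 0xFFFF when the checksum field is included.
    return ones_complement_sum(header[: ihl * 4]) == 0xFFFF


def decrement_ttl(header: bytes) -> bytes:
    """Decrement the TTL and patch the checksum incrementally (RFC 1141 style)."""
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired; packet should be dropped, not forwarded")
    (old,) = struct.unpack("!H", header[10:12])
    new = old + 0x0100                      # TTL is the high byte of its word
    new = (new & 0xFFFF) + (new >> 16)      # end-around carry
    return (header[:8] + bytes([ttl - 1]) + header[9:10]
            + struct.pack("!H", new) + header[12:])
```

The incremental update touches two header fields regardless of header length, which is why it suits a per-packet fast path. RFC 1624 later documented a corner case in which this style of update yields 0xFFFF where a full recomputation yields 0x0000; both values verify, but they differ bit-for-bit.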

Evaluation Methodology
Maximum Loss Free Forwarding Rate (MLFFR)
  Minimum-sized 64-byte packets: millions of packets per second (mpps)
  Maximum-sized 1500-byte packets: Gigabit/sec
  Captured Internet trace: ~128 bytes
Packet latency
RAW clocked at 425 MHz
Comparison with IXP1200 as a reference point
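The two reporting units used in this methodology (mpps for small packets, Gb/sec for large ones) are related by the packet size, and the 425 MHz clock turns a packet rate into a per-packet cycle budget. The helpers below are a small arithmetic sketch. The function names are hypothetical, and the conversion assumes that only the packet bytes themselves are counted (no preamble or inter-frame gap), which the slides do not specify.

```python
CLOCK_HZ = 425e6  # RAW clock rate stated on the slide


def gbps_from_mpps(mpps: float, packet_bytes: int) -> float:
    """Bit rate in Gb/sec for a given packet rate and packet size."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9


def mpps_from_gbps(gbps: float, packet_bytes: int) -> float:
    """Packet rate in mpps for a given bit rate and packet size."""
    return gbps * 1e9 / (packet_bytes * 8) / 1e6


def cycles_per_packet(mpps: float, clock_hz: float = CLOCK_HZ) -> float:
    """Aggregate chip cycles available per forwarded packet at a given rate."""
    return clock_hz / (mpps * 1e6)


if __name__ == "__main__":
    for size in (64, 1500):
        print(size, "bytes at 1 mpps:", gbps_from_mpps(1.0, size), "Gb/sec")
    print("cycles per packet at 5 mpps:", cycles_per_packet(5.0))
```

Small packets stress the per-packet work (lookup, verification, checksum update), while large packets stress the data path, which is why the two packet sizes are reported in different units.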

RAW Router, Take 1: Parallelism
[Figure: line cards feeding tiles for Header Verify, Lookup (2 stage lookup), Header recompute and Drain-tile (drain FIFO, interrupt); packet buffer and lookup tables in external SRAM and DRAM]
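The lookup tables in this design hold the DIR-24-8-BASIC structure named earlier in the deck, and the "2 stage lookup" label plausibly corresponds to its two table levels. The following is a minimal Python sketch of that data structure, not the RAW implementation. It assumes 16-bit entries whose top bit marks a pointer into the second table, uses next hop 0 to mean "no route", and builds the tables by inserting routes in order of increasing prefix length (a simplification that avoids handling updates). The class and method names are hypothetical.

```python
from array import array
import ipaddress

LONG_FLAG = 0x8000  # top bit set: entry is a block index into tbllong


class Dir24_8:
    def __init__(self, routes):
        """routes: iterable of (cidr_string, next_hop) with 1 <= next_hop < 0x8000."""
        self.tbl24 = array("H", [0]) * (1 << 24)   # indexed by the top 24 address bits
        self.tbllong = array("H")                  # 256-entry blocks for prefixes > /24
        parsed = [(ipaddress.ip_network(cidr), hop) for cidr, hop in routes]
        # Shorter prefixes first, so longer ones overwrite them where they overlap.
        for net, hop in sorted(parsed, key=lambda r: r[0].prefixlen):
            if not 1 <= hop < LONG_FLAG:
                raise ValueError("next hop must fit in 15 bits and be non-zero")
            self._insert(int(net.network_address), net.prefixlen, hop)

    def _insert(self, prefix, length, hop):
        if length <= 24:
            start = prefix >> 8
            for i in range(start, start + (1 << (24 - length))):
                self.tbl24[i] = hop
            return
        slot = prefix >> 8
        entry = self.tbl24[slot]
        if not entry & LONG_FLAG:
            # Allocate a block, seeded with the covering shorter route (if any).
            block = len(self.tbllong) >> 8
            self.tbllong.extend([entry] * 256)
            entry = LONG_FLAG | block
            self.tbl24[slot] = entry
        base = ((entry & 0x7FFF) << 8) | (prefix & 0xFF)
        for i in range(base, base + (1 << (32 - length))):
            self.tbllong[i] = hop

    def lookup(self, addr: int) -> int:
        """Return the next hop for a 32-bit address (0 if no route matches)."""
        entry = self.tbl24[addr >> 8]              # first memory access
        if entry & LONG_FLAG:                      # second access only for long prefixes
            return self.tbllong[((entry & 0x7FFF) << 8) | (addr & 0xFF)]
        return entry
```

A lookup costs one table read for prefixes up to /24 and two reads otherwise, at the price of a large first-level table. In this sketch a route shorter than about /8 is slow to insert, because every covered first-level entry is written in a Python loop.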

Flow of Packets
L: Lookup, V: Verify, U: Update, D: Drain
[Figure: line cards around the chip, each served by an L, V, U, D tile pipeline; lookup DRAM]

RAW Router, Take 1
Static network for streaming packets
  Feed the pipeline
  Stream the payload to DRAM
General Dynamic Network
  Header forwarding 3 -> 4
Memory Dynamic Network
  From memory to line-card
[Figure: line cards, SRAM and DRAM, with Static, MDN and GDN paths marked]

Version I Performance
1.8 Gb/sec --> 6.17 Gb/sec
2.9 mpps --> 6.23 mpps

Memory Dynamic Network
RAW Router Version 1: shared buffering
Bus contention
Memory Dynamic Network DOR: x --> y
[Figure: line cards sharing SRAM and DRAM over the Memory Dynamic Network]
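Dimension-ordered routing (DOR) in x --> y order means a message first travels along the x axis to the destination column and only then along y. The sketch below illustrates why shared buffering over such a network can lead to contention: flows heading to the same memory port converge on the same links. It is a toy model in Python, with hypothetical tile coordinates and flows that are not taken from the router's actual layout.

```python
from collections import Counter


def dor_route(src, dst):
    """Tiles visited from src to dst under x-then-y dimension-ordered routing."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                  # resolve the x dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                  # then the y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def link_load(flows):
    """Count how many flows cross each directed tile-to-tile link."""
    load = Counter()
    for src, dst in flows:
        path = dor_route(src, dst)
        load.update(zip(path, path[1:]))
    return load


if __name__ == "__main__":
    # Three hypothetical senders in row 0, all targeting a memory port at (3, 2).
    flows = [((0, 0), (3, 2)), ((1, 0), (3, 2)), ((2, 0), (3, 2))]
    for link, count in sorted(link_load(flows).items()):
        print(link, count)
```

Because the route is fixed by the source and destination, the only way to relieve a hot link in this model is to move the endpoints, which matches the deck's emphasis on the layout of routing functions and memory ports.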

RAW Router, Take 2: Buffering and Switching
[Figure: line cards feeding tiles for Header Verify, Lookup (2 stage lookup), Header recompute and Drain-tile (drain FIFO, interrupt); four SDRAMs]

RAW Router, Take 2
Respects DOR
No “bus contention” for DMAs (bottleneck is shared SDRAMs)
2x memory BW
No need to look at packet length
Dynamic networks for “out-of-band” communication
[Figure: line cards, lookup tiles and four SDRAMs, with Static, MDN and GDN paths marked]

Optimized buffering and switching
6.17 Gb/sec --> 8.68 Gb/sec
6.17 mpps --> 6.77 mpps

RAW Router, Take 3: Reducing Memory Transactions
Streaming DDR
No fragmentation of frames
Pipelined memory requests
[Figure: line cards and SDRAMs]

Streaming packet buffers + 64-byte minimum buffering
8.68 Gb/sec --> 9.57 Gb/sec
6.77 mpps --> 9.79 mpps

Buffering on Line-cards
9.57 Gb/sec --> 15.03 Gb/sec
9.79 mpps --> 9.79 mpps

All dynamic networks
9.57 Gb/sec --> 8.50 Gb/sec
9.57 mpps --> 6.94 mpps

Evaluation with captured Trace

Packet Latency
Router     Packet size (bytes)   Time (ns)   Cycles
RAW null   64                    416         177
RAW IPv4   64                    690         293
RAW null   1500                  3490        1483
RAW IPv4   1500                  5394        2292
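The latency figures come in pairs, and each pair is consistent with the 425 MHz clock stated in the methodology only if the smaller number is read as cycles and the larger as nanoseconds (one cycle at 425 MHz is about 2.35 ns). The short check below makes that relationship explicit; the helper name is hypothetical.

```python
CLOCK_MHZ = 425  # RAW clock rate stated in the evaluation methodology


def cycles_to_ns(cycles: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1000.0 / clock_mhz


# (larger figure, smaller figure) pairs as they appear in the latency table
PAIRS = [(416, 177), (690, 293), (3490, 1483), (5394, 2292)]

if __name__ == "__main__":
    for larger, smaller in PAIRS:
        print(f"{smaller} cycles -> {cycles_to_ns(smaller):.1f} ns (table: {larger})")
```

Each computed value lands within about 1 ns of the corresponding table entry, so the small differences look like rounding.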

Conclusions
Tiled-architectures = NPU performance + enhanced programmability
RAW’s low-level software-control was vital for deriving performance:
  Layout of routing functions: 30% improvement by altering layout
  Role and behavior of the on-chip networks: 15% improvement by using GDN and static networks in place of MDN

Conclusions
Network oblivious: 30-35% degradation
No static networks: 10-30% degradation
Buffering on line-cards: 35% improvement

Thank you!
Questions: umar@mit.edu
