ECE 526 – Network Processing Systems Design
Hardware Architecture for Protocol Processing
Chapter 8: D. E. Comer
Goal
Understand the hardware architecture of protocol processing
Learn the key metric of a protocol processing system: aggregate packet rate
Learn the key requirements of a protocol processing system
  High throughput
  Scalability
Survey mechanisms for designing scalable protocol processing systems
Ning Weng ECE 526
Outline
First generation of network processing system architecture
Figure of merit of network processing systems
Possible ways to improve the performance of network processing systems
1st Generation Network System
Traditional software-based router
Uses conventional hardware
  A single general-purpose processor handles most tasks
  A single shared memory
  I/O over a shared bus
  NICs use the same design as other I/O devices
Cheap, but performance is poor!
Figure of Merit of Network Systems
Interface data rate
  Rate at which data enters/leaves an interface
Aggregate data rate
  Sum of the interface rates
  Measure of the total data rate the system can handle
  Note: the aggregate rate is crucial if the CPU handles traffic from all interfaces
  Can be misleading when packet sizes vary and the processing cost per packet is constant
Aggregate packet rate
  Total number of packets per second entering/leaving the system
  More important for protocol processing that does not touch the payload. Why?
Packet rate vs. data rate
  CPU metric: packet rate (packets per second)
  Interface hardware metric: bit (data) rate
  Small packets are the critical case when the data rate is fixed and the processing cost per packet is constant
  Header processing (e.g., forwarding): cost per packet
  Payload processing (e.g., encryption or string matching): cost grows with packet size
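The header-versus-payload distinction can be made concrete with a minimal cost model. It assumes two hypothetical constants, a fixed instruction count per packet for header processing and an instruction count per byte for payload processing; the values are illustrative only. At a fixed data rate, header-processing load scales with the packet rate, so 64-byte packets cost about 24 times more than 1518-byte packets, while payload-processing load does not depend on packet size at all.

```python
# Minimal cost model: processing load at a fixed data rate.
# HEADER_COST and PAYLOAD_COST_PER_BYTE are hypothetical, illustrative constants.

HEADER_COST = 500            # instructions per packet (header processing, e.g. forwarding)
PAYLOAD_COST_PER_BYTE = 10   # instructions per byte (payload processing, e.g. encryption)


def packet_rate(data_rate_bps, packet_bytes):
    """Packets per second when the link carries only packets of one size."""
    return data_rate_bps / (packet_bytes * 8)


def header_load(data_rate_bps, packet_bytes):
    """Instructions per second when every packet costs the same (payload untouched)."""
    return packet_rate(data_rate_bps, packet_bytes) * HEADER_COST


def payload_load(data_rate_bps, packet_bytes):
    """Instructions per second when cost grows with every byte.

    For simplicity the whole packet is treated as payload.
    """
    return packet_rate(data_rate_bps, packet_bytes) * packet_bytes * PAYLOAD_COST_PER_BYTE


if __name__ == "__main__":
    rate = 1e9  # 1 Gbps
    for size in (64, 1518):
        print(f"{size:5d} B: header {header_load(rate, size):.3e} instr/s, "
              f"payload {payload_load(rate, size):.3e} instr/s")
```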
Data Rate vs. Packet Rate
Packet size: small, 64 bytes; large, 1518 bytes
For protocol processing at the same data rate, which is more difficult for a network processing system: the smallest packets or the largest packets?
How do we calculate the packet rate?
Aggregate Packet Rate
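One way to calculate the packet rate: divide each interface's bit rate by the packet size in bits, then sum over all interfaces to get the aggregate packet rate. This sketch assumes every interface is saturated with packets of a single size, and the port mix in the example is hypothetical. The optional 20-byte Ethernet overhead (8-byte preamble and start-of-frame delimiter plus 12-byte inter-frame gap) shows how wire framing lowers the rate: a 10 Gbps link carries about 14.88 million 64-byte frames per second with it, versus about 19.53 million without it.

```python
# Aggregate packet rate of a system with several interfaces.
# Assumption: each interface is fully loaded with packets of a single size.
# ETHERNET_OVERHEAD models the 8-byte preamble/SFD plus the 12-byte inter-frame gap;
# leave overhead_bytes at 0 to count only the frame bytes (simple estimate).

ETHERNET_OVERHEAD = 20  # bytes on the wire per frame beyond the frame itself


def interface_packet_rate(rate_bps, packet_bytes, overhead_bytes=0):
    """Packets per second on one interface."""
    return rate_bps / ((packet_bytes + overhead_bytes) * 8)


def aggregate_packet_rate(interfaces, packet_bytes, overhead_bytes=0):
    """Sum of packet rates over all interfaces (a list of rates in bit/s)."""
    return sum(interface_packet_rate(r, packet_bytes, overhead_bytes) for r in interfaces)


if __name__ == "__main__":
    ports = [1e9] * 8 + [10e9] * 2  # hypothetical: eight 1 Gbps ports, two 10 Gbps ports
    for size in (64, 1518):
        frame_only = aggregate_packet_rate(ports, size)
        on_wire = aggregate_packet_rate(ports, size, ETHERNET_OVERHEAD)
        print(f"{size:5d} B: {frame_only / 1e6:7.2f} Mpps (frame only), "
              f"{on_wire / 1e6:7.2f} Mpps (with overhead)")
```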
Time per Packet
The aggregate packet rate determines the time available per packet
Processing each packet requires on the order of 100s to 1000s of instructions
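As a minimal sketch of this relationship, the time per packet is the reciprocal of the aggregate packet rate, and dividing a processor's instruction rate by the packet rate gives the number of instructions available per packet. The 1 GHz CPU that executes one instruction per cycle is a hypothetical example. At 1 Gbps of 64-byte packets it leaves 512 ns, or 512 instructions, per packet, which is at the low end of the 100s to 1000s of instructions that packet processing requires.

```python
# Time budget per packet for a given aggregate packet rate, and how many
# instructions fit in that budget.
# Assumption: a hypothetical CPU that retires a fixed number of instructions per second.


def time_per_packet_ns(packet_rate_pps):
    """Nanoseconds available to process each packet."""
    return 1e9 / packet_rate_pps


def instruction_budget(packet_rate_pps, cpu_ips):
    """Instructions the CPU can spend on each packet (cpu_ips = instructions per second)."""
    return cpu_ips / packet_rate_pps


if __name__ == "__main__":
    rate = 1e9 / (64 * 8)  # 1 Gbps of 64-byte packets: 1,953,125 packets/s
    cpu = 1e9              # hypothetical 1 GHz CPU at one instruction per cycle
    print(f"time per packet: {time_per_packet_ns(rate):.0f} ns")
    print(f"instruction budget: {instruction_budget(rate, cpu):.0f} instructions/packet")
```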
Feasibility Analysis
Design a software router for a 10 Gbps data rate
  Assume small packets (64 B)
  Assume each packet needs 10,000 instructions to process
Can the Intel 80986@2009 do the job?
  CPU: 24 GHz, 1 billion transistors
  Address bus width: 64 bits
  The CPU is a RISC machine that executes one instruction per clock cycle
Hint: What is the packet rate? What is the processing requirement in MIPS?
A single-CPU router lacks scalability. What about multi-core?
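The hint can be worked through with the slide's own numbers: 10 Gbps of 64-byte packets, 10,000 instructions per packet, a 24 GHz clock, and one instruction per cycle. This sketch ignores Ethernet framing overhead and optimistically assumes that several CPUs would scale perfectly. It gives a packet rate of 19,531,250 packets/s and a requirement of 195,312.5 MIPS against 24,000 MIPS available, so at least 9 such CPUs would be needed.

```python
import math

# Feasibility check for the software router on this slide.
# Inputs come from the slide: 10 Gbps, 64-byte packets, 10,000 instructions per
# packet, 24 GHz clock, one instruction per cycle. Framing overhead is ignored.


def required_mips(data_rate_bps, packet_bytes, instr_per_packet):
    """Millions of instructions per second needed to keep up with the link."""
    pps = data_rate_bps / (packet_bytes * 8)
    return pps * instr_per_packet / 1e6


def cpu_mips(clock_hz, instr_per_cycle=1.0):
    """Millions of instructions per second the CPU can execute."""
    return clock_hz * instr_per_cycle / 1e6


if __name__ == "__main__":
    need = required_mips(10e9, 64, 10_000)
    have = cpu_mips(24e9)
    print(f"required: {need:,.1f} MIPS, available: {have:,.1f} MIPS")
    # Perfect scaling across CPUs is assumed here, which is optimistic.
    print(f"CPUs needed: {math.ceil(need / have)}")
```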
Scalability
The ability of a system to be easily extended in “size” and performance
  E.g., a computer with spare memory and disk slots; a router that can add more ports or faster links
Why do we care about scalability?
  Designing a new system is time-consuming and expensive
  Performance requirements increase quickly
  Other reasons
How can we make a network system more scalable?
  Optimized processing engines
  Intelligent NICs
  Parallel processing by duplicating processing engines and NICs
Processing Power
Overcoming processing bottlenecks:
  Specialized hardware (ASICs)
  Fine-grained parallelism
  Symmetric coarse-grain parallelism
  Asymmetric coarse-grain parallelism
  Special-purpose coprocessors
Other improvements:
  NICs with onboard processing
  Smart NICs: essentially the same as per-port processing engines
Parallelism in Processors
Fine-grained parallelism
  Exploits instruction-level parallelism
  Examples: VLIW, SMT, etc.
  Limited by the parallelism available in the workload
Symmetric coarse-grain parallelism
  Multiple identical CPUs in parallel
  Inter-processor communication can limit performance
Asymmetric coarse-grain parallelism
  Multiple different CPUs in parallel
  E.g., one processor for layer 2, one for layer 3
Special-purpose coprocessors
  Custom logic for lookups, checksums, etc.
  High performance, but not (fully) programmable
Key question: how can such a system be programmed?
Duplicating processing engines leads to the advanced router architecture
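Why inter-processor communication limits symmetric coarse-grain parallelism can be sketched with Amdahl's law, a standard model swapped in here rather than taken from the slides. The serial fraction is a hypothetical stand-in for per-packet work that must be serialized across processors, such as synchronizing on shared state. Whatever the number of CPUs, the speedup never exceeds 1 / serial (20x for a 5% serial fraction).

```python
# Amdahl's law as a rough model of symmetric coarse-grain parallelism.
# Assumption: a fraction `serial` of the per-packet work (e.g. synchronizing on
# shared state between processors) cannot be parallelized; the rest divides
# evenly over n identical CPUs.


def amdahl_speedup(n, serial):
    """Speedup of n identical CPUs over one CPU."""
    return 1.0 / (serial + (1.0 - serial) / n)


if __name__ == "__main__":
    for serial in (0.0, 0.05, 0.2):
        row = ", ".join(f"n={n}: {amdahl_speedup(n, serial):5.2f}x" for n in (1, 4, 16, 64))
        print(f"serial={serial:.2f} -> {row}")
```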
Advanced Router Architecture
S. Keshav et al., IEEE Communications Magazine, 1998
Port: point of attachment for a physical link
Switching fabric (SF): interconnects input and output ports
Line card: the device connecting a link to the SF
Routing processor: runs routing protocols to create forwarding tables
Queues: buffers between an input port and the SF, or between the SF and an output port
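To show how these components fit together, here is a toy Python model of the architecture. All class and method names (Packet, LineCard, RoutingProcessor, SwitchingFabric, build_table, lookup, cycle) are hypothetical. The routing processor builds the forwarding table from static routes standing in for a routing protocol, and the switching fabric is assumed to be an ideal crossbar that moves one packet per input port per cycle after a longest-prefix-match lookup.

```python
import ipaddress
from collections import deque
from dataclasses import dataclass, field

# Toy model of the router components. All names are hypothetical; the fabric is
# an ideal crossbar that moves at most one packet per input port per cycle.


@dataclass
class Packet:
    dst: str
    payload: bytes = b""


@dataclass
class LineCard:
    port: int
    in_queue: deque = field(default_factory=deque)   # buffer between input port and SF
    out_queue: deque = field(default_factory=deque)  # buffer between SF and output port


class RoutingProcessor:
    def build_table(self, routes):
        """Build a forwarding table from (prefix, port) pairs.

        Static routes stand in for the output of a routing protocol.
        """
        return [(ipaddress.ip_network(prefix), port) for prefix, port in routes]


class SwitchingFabric:
    def __init__(self, cards, table):
        self.cards = cards
        self.table = table

    def lookup(self, dst):
        """Longest-prefix match of dst against the forwarding table."""
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, port) for net, port in self.table if addr in net]
        return max(matches)[1] if matches else None

    def cycle(self):
        """Move at most one packet from each input queue to its output queue."""
        for card in self.cards:
            if card.in_queue:
                pkt = card.in_queue.popleft()
                out = self.lookup(pkt.dst)
                if out is not None:  # no matching route: drop the packet
                    self.cards[out].out_queue.append(pkt)


if __name__ == "__main__":
    cards = [LineCard(i) for i in range(3)]
    routes = [("10.0.0.0/8", 1), ("10.1.0.0/16", 2), ("0.0.0.0/0", 0)]
    fabric = SwitchingFabric(cards, RoutingProcessor().build_table(routes))
    cards[0].in_queue.append(Packet("10.1.2.3"))  # most specific route: port 2
    cards[1].in_queue.append(Packet("10.9.9.9"))  # most specific route: port 1
    fabric.cycle()
    for card in cards:
        print(card.port, [p.dst for p in card.out_queue])
```

Nothing drains the output queues in this toy; a fuller model would add a transmit step per output port and contention inside the fabric, which is what the queues on the slide absorb.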
Advanced Router Architecture
Changing requirements
  Increasing link speeds
  Increasing number of ports
  Growing routing tables
  Increasing processing complexity
Scalable system design:
  Exploit parallelism wherever possible: per-port, per-flow, per-packet, instruction-level
  One processing engine per port (instead of a single CPU)
  Multiple processors per port
  “Better” processors
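The benefit of one processing engine per port can be quantified with a small sketch. It assumes equal port rates, worst-case 64-byte packets, and a hypothetical 1,000 instructions per packet, and it ignores work shared between engines (e.g., routing-table updates). A central CPU's requirement grows linearly with the number of ports, while each per-port engine's requirement stays fixed at the single-port load: 1,953.125 MIPS for a 1 Gbps port in this example.

```python
# Compare the processing needed by a single central CPU with the processing
# needed by each engine when there is one engine per port.
# Assumptions: all ports run at the same rate, worst-case 64-byte packets, a
# fixed instruction count per packet, and no work shared between engines.


def central_cpu_mips(num_ports, port_rate_bps, packet_bytes, instr_per_packet):
    """MIPS a single CPU needs to handle the traffic of every port."""
    pps = num_ports * port_rate_bps / (packet_bytes * 8)
    return pps * instr_per_packet / 1e6


def per_port_engine_mips(port_rate_bps, packet_bytes, instr_per_packet):
    """MIPS each engine needs when it sees only its own port's traffic."""
    return central_cpu_mips(1, port_rate_bps, packet_bytes, instr_per_packet)


if __name__ == "__main__":
    for ports in (4, 16, 64):
        central = central_cpu_mips(ports, 1e9, 64, 1_000)
        engine = per_port_engine_mips(1e9, 64, 1_000)
        print(f"{ports:3d} ports: central CPU {central:11,.1f} MIPS, "
              f"each engine {engine:9,.1f} MIPS")
```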
Reminder
Read Comer: chapters 11 & 12
Sep. 19: project group leaders email me the names of your group members
Sep. 24: homework 1 due