ECE 526 – Network Processing Systems Design
Network Processor Introduction
Chapters 11 and 12: D. E. Comer
Ning Weng, ECE 526
Goal
Understand why 1st-, 2nd- and 3rd-generation network processing systems are inefficient
─ Scalability plus flexibility
Recognize the need for a new solution: the 4th generation (network processor technology)
Learning
─ the courage to appreciate the challenges
─ the skill to characterize the “real” problem
─ the art of proposing an engineering solution
Be aware that "network processor" is currently a conceptual and general term
Recall: 1st-Generation Network Processing System
Feasibility study
─ Design a software router with a 10 Gbps data rate, assuming small packets (64 B) that each need 10,000 instructions to process
─ Can an Intel CPU do the job? CPU: 2.4 GHz; 125,000 MIPS (million instructions per second); 1 billion transistors; …
─ Conclusion: not feasible
What is the real problem here?
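The feasibility arithmetic can be checked in a few lines. This sketch reuses the slide's numbers (10 Gbps, 64 B packets, 10,000 instructions per packet, 125,000 MIPS); the optional `overhead_bytes` parameter is an addition that models Ethernet's 8-byte preamble/SFD and 12-byte inter-frame gap, which lowers the packet rate but does not change the conclusion.

```python
def required_mips(link_bps: float, pkt_bytes: int, instr_per_pkt: int,
                  overhead_bytes: int = 0) -> float:
    """Millions of instructions per second needed to keep up with the link.

    overhead_bytes models per-packet bytes on the wire that carry no packet
    data (for Ethernet: 8-byte preamble/SFD + 12-byte inter-frame gap = 20).
    """
    bits_per_pkt = (pkt_bytes + overhead_bytes) * 8
    packets_per_sec = link_bps / bits_per_pkt
    return packets_per_sec * instr_per_pkt / 1e6


if __name__ == "__main__":
    available_mips = 125_000  # CPU figure from the slide
    for overhead in (0, 20):
        need = required_mips(10e9, 64, 10_000, overhead_bytes=overhead)
        print(f"overhead={overhead:2d} B: need {need:,.1f} MIPS, "
              f"feasible={need <= available_mips}")
```

Without overhead the router needs about 195,312.5 MIPS; with the Ethernet overhead it still needs about 148,809.5 MIPS. Both exceed 125,000 MIPS, and the count ignores memory-access stalls entirely.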
The Real Problem
Technology push: uneven
─ Link bandwidth is scaling much faster than CPU and memory technology
─ Transistor scaling and VLSI technology help, but not enough
Application pull: harder
─ More complex applications are required
─ Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet
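One way to make "processing complexity" concrete is to convert the instruction count and the memory-access count into cycles and compare the total with the time between back-to-back packets. The sketch below assumes one instruction per cycle, a flat 100-cycle memory latency, a 2.4 GHz clock and a workload of 200 instructions plus 4 memory accesses per packet; all of these numbers are hypothetical.

```python
import math


def cycle_budget(link_bps: float, pkt_bytes: int, clock_hz: float) -> float:
    """Cycles available per packet when minimum-size packets arrive back to back."""
    interarrival_s = pkt_bytes * 8 / link_bps
    return interarrival_s * clock_hz


def cycles_per_packet(instructions: int, mem_accesses: int,
                      cpi: float = 1.0, mem_latency: int = 100) -> float:
    """Processing complexity expressed in cycles.

    Assumes every memory access stalls the core for the full latency
    (no caching, no latency hiding by other threads).
    """
    return instructions * cpi + mem_accesses * mem_latency


if __name__ == "__main__":
    budget = cycle_budget(10e9, 64, 2.4e9)   # about 122.9 cycles per packet
    cost = cycles_per_packet(200, 4)         # 200 + 4 * 100 = 600 cycles
    print(f"budget {budget:.1f} cycles, cost {cost:.0f} cycles, "
          f"cores needed {math.ceil(cost / budget)}")
```

In this example the four memory accesses cost twice as much as all 200 instructions, which is why memory accesses are counted separately from instructions when characterizing complexity.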
What is the ideal platform?
─ Structured ASIC
─ FPGA
─ Network processor
─ Reconfigurable co-processors
2nd and 3rd Generations
2nd generation: offloading and decentralization
3rd generation: further offloading and the use of specialized devices (ASICs + embedded processors)
Problems: loss of flexibility and very high cost. Why?
Why Not ASIC?
High development cost
─ Network processing is a moderate-volume market
Long time to market
─ Network processing services change quickly
Difficult to simulate
─ Complex protocols
Expensive and time-consuming to change
Little reuse across products
Limited reuse across versions
No consensus on a framework or supporting chips
Requires expertise
Network Processors
Question: where does an NP gain its higher performance compared with a conventional processor?
Instruction Set: Minimality
Not as general as RISC and CISC processors
─ E.g., no floating-point instructions
─ Optimized for packet-processing functions only
Not specific to a protocol or to part of a protocol
Seek a minimal instruction set sufficient to handle arbitrary protocols,
─ plus specific instructions for protocol processing
Example: atomic operations
─ A hard problem; will be covered later
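To see why atomic operations matter, consider several threads updating a shared per-flow packet counter. The sketch first replays, by hand, an interleaving in which two read-modify-write sequences lose an update, then uses a hypothetical `fetch_and_add` helper built on `threading.Lock` to stand in for the single atomic instruction an NP might provide. Real NP atomics are hardware instructions, not software locks; the lock only models their indivisibility.

```python
import threading


def lost_update_demo(counter: dict, key: str) -> int:
    """Replay by hand an interleaving of two read-modify-write sequences."""
    t1 = counter[key]        # thread 1 reads the counter
    t2 = counter[key]        # thread 2 reads it before thread 1 writes back
    counter[key] = t1 + 1    # thread 1 writes its result
    counter[key] = t2 + 1    # thread 2 overwrites it: one increment is lost
    return counter[key]


_lock = threading.Lock()


def fetch_and_add(counter: dict, key: str, delta: int = 1) -> int:
    """Hypothetical stand-in for an atomic instruction: the lock makes the
    read and the write one indivisible step. Returns the old value."""
    with _lock:
        old = counter[key]
        counter[key] = old + delta
        return old


def count_packets(num_threads: int = 4, packets_per_thread: int = 10_000) -> int:
    """Each thread counts its packets on one shared per-flow counter."""
    counter = {"flow-A": 0}

    def worker() -> None:
        for _ in range(packets_per_thread):
            fetch_and_add(counter, "flow-A")

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["flow-A"]


if __name__ == "__main__":
    print(lost_update_demo({"flow-A": 0}, "flow-A"))  # 1, although two packets arrived
    print(count_packets())                             # 40000: no update is lost
```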
Architecture: Multiprocessor
Parallelism
─ The network-processing workload is highly parallel: flow-level, queue-level, packet-level and protocol-level
Pipelining
─ Pipelining improves system throughput at the cost of longer delay
─ Is this acceptable?
System-on-chip
─ Processing: RISC cores
─ Memory: registers, cache, instruction store, scratch pad, SRAM and SDRAM
─ I/O: network and switch-fabric interfaces
Question: how hard is it to build and use these NPs?
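Flow-level parallelism is often exploited by hashing each packet's flow identifier so that all packets of one flow go to the same processing engine, which preserves per-flow packet order without cross-engine synchronization. The sketch assumes a 5-tuple flow key and uses `zlib.crc32` as the hash purely for illustration; `pick_engine` and `dispatch` are hypothetical names.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_engine(flow: FiveTuple, num_engines: int) -> int:
    """Map a flow to an engine. CRC32 is deterministic, so the same flow
    always lands on the same engine (Python's built-in hash() of str is
    salted per process)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_engines


def dispatch(packets, num_engines: int):
    """packets: iterable of (sequence_number, FiveTuple). Returns per-engine queues."""
    queues = [[] for _ in range(num_engines)]
    for seq, flow in packets:
        queues[pick_engine(flow, num_engines)].append((seq, flow))
    return queues


if __name__ == "__main__":
    a = FiveTuple("10.0.0.1", "10.0.0.2", 6, 1234, 80)
    b = FiveTuple("10.0.0.3", "10.0.0.4", 17, 5000, 53)
    pkts = [(0, a), (1, b), (2, a), (3, a), (4, b)]
    for i, q in enumerate(dispatch(pkts, 4)):
        print(i, [seq for seq, _ in q])
```

The trade-off is load balance: a few heavy flows can overload one engine, which is one reason the slide lists queue-, packet- and protocol-level parallelism as alternatives.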
Typical Processing
Case Study: IPv4 Packet Forwarding
─ 2-port router (2 Gbps) on a Xilinx Virtex-II Pro FPGA (2VP30)
─ Task graph: From (0), From (1) → Lookup IPRoute → To (0), To (1)
─ IP lookup: longest prefix match (trie lookup algorithm)
─ Prefixes (hex : binary): a = 0*, b = 002 : *, c = 002F : *, d = FFE : 000*, e = FFF : *
[Figure: lookup trie for prefixes a to e, with memory accesses 1, 2, 5 and 6 marked at successive levels from the root]
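Longest-prefix-match lookup can be sketched with a unibit (binary) trie: walk the destination address one bit at a time and remember the last node that ended a prefix. This is simpler than the multibit trie in the figure, which consumes several bits per step; counting visited nodes shows how many memory accesses a naive implementation would make. The class name `BinaryTrie`, the example prefixes and the next-hop labels are hypothetical.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.next_hop = None          # set if a prefix ends at this node


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):          # one trie level per prefix bit
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str):
        """Return (next_hop of the longest matching prefix, nodes visited)."""
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop                    # default route, if any
        visited = 1                             # reading the root
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            visited += 1                        # each node read ~ one memory access
            if node.next_hop is not None:
                best = node.next_hop            # a longer prefix matched
        return best, visited


def build_example_trie() -> BinaryTrie:
    t = BinaryTrie()
    t.insert("0.0.0.0/0", "default")
    t.insert("10.0.0.0/8", "port0")
    t.insert("10.1.0.0/16", "port1")
    return t


if __name__ == "__main__":
    t = build_example_trie()
    for dst in ("10.1.2.3", "10.2.3.4", "192.168.1.1"):
        print(dst, t.lookup(dst))
```

Up to 33 node reads per IPv4 lookup is what motivates multibit tries like the one in the figure, which trade extra memory for fewer accesses per packet.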
Multiprocessor for Header Processing
[Figure: packet reception and packet transmission connected by several parallel chains of processing elements (Verify, Lookup-1, Lookup-2, Transmit) linked by FSL FIFO queues, plus BRAM, an OPB bus, RS232, a timer and LEDs]
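The pipelining trade-off can be quantified with a simple model: a pipeline's throughput is set by its slowest stage, its latency is the sum of all stages plus the hand-offs between them, and replicating pipelines multiplies throughput. The stage names follow the figure, but the cycle counts, the 100 MHz clock and the 10-cycle FIFO hand-off are hypothetical, and the hand-off is assumed to add latency without limiting throughput.

```python
def pipeline_metrics(stage_cycles, num_pipelines=1, clock_hz=100e6, handoff_cycles=0):
    """Return (latency in cycles, throughput in packets per second).

    Throughput is limited by the slowest stage; each stage boundary adds
    handoff_cycles of FIFO transfer delay to the latency only.
    """
    latency = sum(stage_cycles) + handoff_cycles * (len(stage_cycles) - 1)
    throughput = num_pipelines * clock_hz / max(stage_cycles)
    return latency, throughput


if __name__ == "__main__":
    stages = {"Verify": 100, "Lookup-1": 200, "Lookup-2": 200, "Transmit": 100}
    # One processor running every stage back to back behaves like a one-stage pipeline.
    print("single core ", pipeline_metrics([sum(stages.values())]))
    print("1 pipeline  ", pipeline_metrics(list(stages.values()), handoff_cycles=10))
    print("4 pipelines ", pipeline_metrics(list(stages.values()), 4, handoff_cycles=10))
```

Under these assumptions pipelining triples throughput (the bottleneck drops from 600 to 200 cycles per packet) while latency grows from 600 to 630 cycles; replicating the pipeline four times raises throughput again without changing latency.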
Typical using NPs
System Implementation Space
Memory Architecture
Memory access bottleneck
Memory is area-consuming
─ Limited on-chip memory
─ Limited bandwidth to off-chip memory: pin and package cost
─ Off-chip memory access is slow: about 100 cycles
Possible solutions
─ Profile the application's memory access pattern
─ Propose a heterogeneous memory architecture
─ Memory-aware mapping
─ Transactional memory (project topic)
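Memory-aware mapping can be illustrated with a greedy heuristic: place the data structures with the most accesses per byte into the fastest memory that still has room. Only the 100-cycle off-chip latency comes from the slide; the tier sizes, the other latencies, the data structures and their access counts are hypothetical. Greedy placement is a swapped-in heuristic and is not optimal in general, since the underlying problem is a knapsack-style assignment.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity: int   # bytes
    latency: int    # cycles per access


@dataclass
class DataObject:
    name: str
    size: int              # bytes
    accesses_per_pkt: int


def map_memory(objects, tiers):
    """Greedy placement by access density; returns (placement, cycles per packet)."""
    tiers = sorted(tiers, key=lambda t: t.latency)          # fastest tier first
    free = {t.name: t.capacity for t in tiers}
    placement = {}
    for obj in sorted(objects, key=lambda o: o.accesses_per_pkt / o.size, reverse=True):
        for tier in tiers:
            if free[tier.name] >= obj.size:
                free[tier.name] -= obj.size
                placement[obj.name] = tier
                break
        else:
            raise ValueError(f"no memory tier can hold {obj.name}")
    cycles = sum(o.accesses_per_pkt * placement[o.name].latency for o in objects)
    return {name: tier.name for name, tier in placement.items()}, cycles


def example():
    tiers = [Tier("SDRAM", 64 << 20, 100), Tier("SRAM", 256 << 10, 20),
             Tier("scratchpad", 4 << 10, 3)]
    objects = [
        DataObject("trie top levels", 2 << 10, 4),
        DataObject("flow table", 128 << 10, 1),
        DataObject("packet buffer", 16 << 20, 2),
        DataObject("counters", 1 << 10, 1),
    ]
    return map_memory(objects, tiers)


if __name__ == "__main__":
    placement, cycles = example()
    print(placement, cycles)
```

Here the mapping costs 235 memory cycles per packet; placing everything in SDRAM would cost 8 accesses at 100 cycles each, or 800 cycles. The access counts that drive the heuristic are exactly what the profiling step is meant to supply.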
Application Mapping
Current approach: fixed topology, assembly coding and hand-tuning
[Figure: Mapping]
Basic Steps for Mapping
─ Application description
─ High-level optimizations
─ Task graph (platform specific)
─ Architecture configuration
─ HW/SW partitioning
─ Task allocation
─ Data layout
─ Communication assignment
─ Compilation / synthesis
─ Profile
[Figure: the router task graph (From (0), From (1), Lookup IPRoute, To (0), To (1)) mapped onto four FPGA processing elements (PEs) and a memory (MEM)]
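The task-allocation step can be sketched with the longest-processing-time-first (LPT) heuristic: sort tasks by cost and repeatedly give the next task to the least-loaded processing element. The task names come from the router task graph in the figure, but their cycle costs, the four-PE platform and the choice of LPT are hypothetical; a real mapper would also weigh communication cost and data layout.

```python
import heapq


def allocate(task_costs: dict, num_pes: int):
    """LPT greedy: returns (task -> PE index, per-PE load in cycles)."""
    heap = [(0, pe) for pe in range(num_pes)]   # (current load, PE index)
    heapq.heapify(heap)
    assignment = {}
    loads = [0] * num_pes
    # sorted() is stable even with reverse=True, so equal-cost tasks keep dict order.
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, pe = heapq.heappop(heap)           # least-loaded PE so far
        assignment[task] = pe
        loads[pe] = load + cost
        heapq.heappush(heap, (loads[pe], pe))
    return assignment, loads


ROUTER_TASKS = {  # hypothetical cycles per packet
    "From(0)": 100, "From(1)": 100, "Lookup IPRoute": 400, "To(0)": 80, "To(1)": 80,
}

if __name__ == "__main__":
    assignment, loads = allocate(ROUTER_TASKS, 4)
    print(assignment)
    print("per-PE load:", loads, "bottleneck:", max(loads))
```

With these costs the lookup PE carries 400 cycles while the others carry at most 160, which suggests splitting or replicating the lookup task and profiling again.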
Summary
Network processor
─ Special-purpose, programmable hardware device
─ Optimized for network processing
─ Building block of network processing systems
─ Fundamental ideas: flexibility through programmability; scalability through parallelism and pipelining
Here, "NP" is a concept
─ We will study an example network processor soon
For Next Class & Announcements
Read Comer: Chapters 13 and 14
Lab 1 total grade reduced to 82
HW 1 due Wed.
Project topic will be announced after Wed.