Slide 1: Exploiting Task-level Concurrency in a Programmable Network Interface
June 11, 2003
Hyong-youb Kim, Vijay S. Pai, and Scott Rixner
Rice Computer Architecture Group
http://www.cs.rice.edu/CS/Architecture/
Slide 2: Why a Programmable Network Interface?
More complex functionality on the network interface
  – TCP offload, iSCSI, etc.
Easy maintenance
  – Bug fixes, upgrades, customization, etc.
Performance?
  – 51% lower web server throughput than an ASIC NIC
  – A big problem
Slide 3: Improving Performance
Increase clock speed and/or complexity
  – Typical solutions for general-purpose processors
  – Do not work for embedded processors
Design constraints: limited power and area
  – Power proportional to C·V²·f
  – Higher f requires higher V
  – Thus, power roughly proportional to f³
  – Complexity increases C for only marginal gains
Implication: a simple, low-frequency processor
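The power argument above can be made concrete with a small sketch. The P ∝ C·V²·f relation and the assumption that voltage must scale with frequency come from the slide; the function name and scaling factors are illustrative.

```python
# Relative dynamic power under P ∝ C·V²·f. With the slide's assumption
# that higher f requires proportionally higher V, power for a fixed
# design scales as f³, while adding cores scales power only linearly.
def relative_power(freq_scale, cores=1):
    return cores * freq_scale ** 3

# Doubling the clock buys ~2x compute at 8x the power,
# while a second core buys ~2x compute at only 2x the power.
double_clock = relative_power(2.0)        # 8x power
two_cores = relative_power(1.0, cores=2)  # 2x power
```

This is exactly the trade the talk exploits: two slow cores deliver the extra computation at a fraction of the power cost of one faster core.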
Slide 4: Use Parallel Programming
Use multiple programmable cores
  – Increase computational capacity
  – Achieve performance within the power limit: multiple cores consume far less power than one higher-frequency core
Improvements with two cores over a single core
  – 65-157% for bidirectional traffic
  – 27-51% for web server workloads
  – Web server throughput comparable to an ASIC NIC
Slide 5: Outline
Background
  – Tigon programmable Gigabit Ethernet controller
  – Network interface processing: send/receive
Parallelization of Firmware
Experimental Results
Conclusion
Slide 6: Tigon Gigabit Ethernet Controller
Two programmable cores
  – Based on MIPS, running at 88 MHz
  – Small on-chip memory (scratch pad) per core
  – Shared off-chip SRAM
Supports event-driven firmware
  – No interrupts
  – Event handlers run to completion
  – Handlers on the same core require no synchronization
The released firmware fully utilizes only one core; no previous Ethernet firmware utilized both.
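The run-to-completion event model above can be sketched as a small dispatch loop. The event names, priority order, and handler bodies here are illustrative; the real Tigon dispatches from hardware event state, not a Python set.

```python
# Run-to-completion dispatch in the Tigon's style: no interrupts, one
# handler at a time per core, so handlers on the same core never need
# to synchronize with each other.
def dispatch(pending, handlers, priority):
    """Repeatedly run the highest-priority pending handler to completion."""
    trace = []
    while pending:
        event = min(pending, key=priority.index)
        pending.discard(event)
        raised = handlers[event]()       # a handler may raise further events
        trace.append(event)
        pending.update(raised or ())
    return trace

# Illustrative send-side chain: mailbox write -> descriptor DMA -> transmit.
priority = ["mailbox", "dma_read_done", "send_data_ready"]
handlers = {
    "mailbox": lambda: ["dma_read_done"],          # start descriptor fetch
    "dma_read_done": lambda: ["send_data_ready"],  # descriptor has arrived
    "send_data_ready": lambda: [],                 # transmit the frame
}
```

Because each handler runs to completion before the next is chosen, the "unit of concurrency" for parallelization (slide 12) is naturally the handler itself.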
Slide 7: Send Processing
[Diagram: host CPU and main memory connected over the PCI bus to the network interface card]
1. CPU creates a buffer descriptor in main memory
2. CPU alerts the NIC via a memory-mapped mailbox write: descriptor produced
3. NIC fetches the buffer descriptor via DMA
4. NIC transfers the packet via DMA
5. NIC transmits the packet
6. NIC alerts the CPU via an interrupt: descriptor consumed
Tigon events: Mailbox, Send Buffer Descriptor Ready, DMA Read Complete, Send Data Ready, Update Send Consumer, DMA Write Complete
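The six send steps above can be modeled as a toy producer/consumer ring. The class and field names are ours, not actual driver or firmware symbols; DMA and interrupts are collapsed into plain method calls.

```python
# Toy model of the send path: the host produces descriptors and bumps a
# mailbox index; the NIC consumes descriptors up to that index.
class ToySendPath:
    def __init__(self):
        self.host_ring = []   # buffer descriptors in host main memory
        self.mailbox = 0      # producer index, written via memory-mapped I/O
        self.consumed = 0     # consumer index, reported back to the host
        self.wire = []        # frames actually transmitted

    def host_send(self, packet):
        self.host_ring.append(packet)       # 1. create buffer descriptor
        self.mailbox = len(self.host_ring)  # 2. mailbox write: produced

    def nic_poll(self):
        while self.consumed < self.mailbox:
            desc = self.host_ring[self.consumed]  # 3. DMA-fetch descriptor
            self.wire.append(desc)                # 4-5. DMA packet, transmit
            self.consumed += 1                    # 6. report consumption
        return self.consumed
```

The mailbox/consumer-index pair is what lets host and NIC proceed asynchronously: each side only writes its own index.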
Slide 8: Receive Processing: Pre-allocation
[Diagram: host CPU and main memory connected over the PCI bus to the network interface card]
1. CPU allocates a receive buffer in main memory
2. CPU creates a buffer descriptor
3. CPU alerts the NIC via a memory-mapped mailbox write: descriptor produced
4. NIC fetches the buffer descriptor via DMA
Tigon events: Mailbox, Receive Buffer Descriptor Ready, DMA Read Complete
Slide 9: Receive Processing: Actual Receive
[Diagram: host CPU and main memory connected over the PCI bus to the network interface card]
1. NIC stores the incoming packet
2. NIC creates a buffer descriptor
3. NIC transfers the packet into the pre-allocated receive buffer via DMA
4. NIC transfers the buffer descriptor via DMA
5. NIC alerts the CPU via an interrupt: descriptor produced
Tigon events: Receive Complete, DMA Write Complete, Update Receive Return Producer
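The two receive phases (pre-allocation on slide 8, actual receive on slide 9) fit together as follows. Again the names are illustrative and DMA/interrupts are collapsed into method calls.

```python
# Toy model of the receive path: the host pre-allocates buffers, the NIC
# fills one per incoming packet and returns a descriptor to the host.
class ToyReceivePath:
    def __init__(self):
        self.free_bufs = []    # pre-allocated host buffers (slide 8)
        self.return_ring = []  # filled descriptors returned to the host

    def host_prealloc(self, n):
        # Steps 1-4 of pre-allocation: allocate buffers, create
        # descriptors, and let the NIC fetch them via DMA.
        self.free_bufs.extend(f"buf{i}" for i in range(n))

    def nic_receive(self, packet):
        if not self.free_bufs:
            return False                   # no pre-allocated buffer: drop
        buf = self.free_bufs.pop(0)        # steps 1-3: store and DMA packet
        self.return_ring.append((buf, packet))  # steps 4-5: return + interrupt
        return True
```

Pre-allocation is what makes receive latency-tolerant: the NIC never has to wait for the host to find a buffer while a packet is arriving.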
Slide 10: Tigon Uniprocessor Performance
[Chart: maximum UDP throughput of the Tigon with uniprocessor firmware vs. the Intel PRO/1000 MT and Netgear 622T]
  – Intel achieves 100% higher throughput than the Tigon
  – Maximum UDP throughput decreases at smaller frame sizes due to network headers and per-frame overhead
Slide 11: Outline
Background
Parallelization of Firmware
  – Principles
  – Resource Sharing Patterns
  – Partitioning Process
Experimental Results
Conclusion
Slide 12: Principles
Identify the unit of concurrency
  – The event handler
Analyze resource sharing patterns
Profile the uniprocessor firmware
Partition event handlers so as to
  – Balance load
  – Minimize synchronization
  – Maximize on-chip memory utilization
Slide 13: Resource Sharing Patterns
[Diagram: the event handlers — Mailbox, DMA Read Complete, Send Buffer Descriptor Ready, Send Data Ready, Update Send Consumer, Receive Buffer Descriptor Ready, Receive Complete, Update Receive Return Producer, DMA Write Complete — linked by the resources they share]
Shared resources:
  – Shared data objects
  – Shared DMA read channel
  – Shared DMA write channel
Slide 14: Partitioning Process
[Diagram: the event handlers from the previous slide, annotated with profiled CPU shares (6%, 4%, 3%, 5%, 14%, 30%, 1%, 31%) and candidate CPU A/CPU B splits of 30%/31%, 52%/41%, and finally 47%/53%]
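One way to read the partitioning step is as a load-balancing pass over the profiled handler shares. The greedy sketch below is our simplification: it only balances time, whereas the talk's process also minimizes cross-CPU sharing of data objects and DMA channels, which is why the reported final split is 47%/53% rather than perfectly even.

```python
# Greedy longest-processing-time assignment of profiled handler loads
# (the percentages shown on the profile slide) onto two CPUs.
def partition(loads):
    cpus = {"A": [], "B": []}
    for load in sorted(loads, reverse=True):
        lighter = min(cpus, key=lambda cpu: sum(cpus[cpu]))
        cpus[lighter].append(load)
    return cpus

profile = [6, 4, 3, 5, 14, 30, 1, 31]  # handler shares from the slide
split = partition(profile)
```

On these numbers the greedy pass lands on a 47/47 split of the 94% of profiled time, close to the deck's 47%/53%; the remaining difference comes from the sharing constraints the sketch ignores.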
Slide 15: Final Partition
[Diagram: final assignment of the event handlers, shared data objects, and the DMA read/write channels to the two cores]
CPU A: 47%, CPU B: 53%
Slide 16: Outline
Background
Parallelization of Firmware
Experimental Results
  – Improved Maximum Throughput
  – Improved Web Server Throughput
Conclusion
Slide 17: Experimental Setup
Network interface card
  – 3Com 710024 Gigabit Ethernet interface card, based on the Tigon
Firmware versions
  – Uniprocessor firmware: 12.4.13 from the original manufacturer
  – Parallel firmware: modified version of 12.4.13
Benchmarks
  – UDP bidirectional, unidirectional, and ping traffic
  – Web server (thttpd) and software router (Click)
Testbed
  – PCs with an AMD Athlon 2600+ CPU and 2 GB RAM
  – FreeBSD 4.7
Slide 18: Overall Improvements
[Chart: parallel vs. uniprocessor firmware throughput, showing a 157% improvement and a 65% improvement]
Slide 19: Sources of Improvements
[Chart: breakdown of the gains]
  – 37% improvement due to the two processors
  – 70% improvement due to the scratch pads
Slide 20: Comparison to ASIC NICs
[Chart: 3Com 710024 / Tigon (1997) with uniprocessor and parallel firmware vs. Intel PRO/1000 MT / Intel (2002) and Netgear GA622T / National Semiconductor (2001)]
  – Intel is now only 21% ahead of the Tigon
Slide 21: Impact on Web Server Throughput
[Chart: web server throughput across NICs]
  – 27-51% overall improvement
  – Comparable to the ASIC NICs
Slide 22: Parallelization Makes Programmability Viable
Programmability is useful for complex functions
Embedded processors have limited clock speeds
  – Hence limited uniprocessor performance
Use multiple cores to improve performance
Two cores vs. a single core
  – 65% increase in maximum throughput
  – 51% increase in web server throughput
  – Web server throughput comparable to ASIC NICs
Slide 24: UDP Send: Overall Improvements
Slide 25: UDP Send: Sources of Improvements
Slide 26: UDP Receive: Overall Improvements
Slide 27: UDP Receive: Sources of Improvements
Slide 28: UDP Ping: Overall Improvements
Slide 29: UDP Ping: Sources of Improvements
Slide 30: Impact on Routing Throughput