1 of 20
Smart-NICs: Power Proxying for Reduced Power Consumption in Network Edge Devices
Karthikeyan Sabhanatarajan, Ann Gordon-Ross+, Mark Oden, Mukund Navada, Alan D. George+
High Performance Computing and Simulation Research Laboratory
Department of Electrical and Computer Engineering
University of Florida, Gainesville
This work was supported by the U.S. National Science Foundation
+ Also affiliated with the NSF Center for High-Performance Reconfigurable Computing
2 of 20 Introduction
[Figure: INTERNET]
3 of 20 Introduction
Connected edge devices account for 2% of the total power consumed in the US [EPA-06]
– 130 TWh/year
– This is $13 billion per year at $0.10 per kWh
– One single-unit nuclear power plant outputs 8 TWh/year
– This translates to 16 single-unit nuclear power plants!
Why so much power?
– PCs can consume up to 200 W
– 1 billion PCs worldwide by 2010 [Kanellos-04]
What can we do?
– PCs are idle 75% of the time [Purushothaman-06]
– But only 10% of PCs are allowed to sleep during that time [EPA-06]
– Sleeping reduces power consumption by 80% or more
– If PCs were allowed to sleep, only 3 single-unit nuclear power plants would be required
Question: Why aren't these PCs asleep?
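The cost and plant-count figures on this slide follow from simple unit arithmetic. The sketch below reproduces them from the slide's own inputs (130 TWh/year, $0.10 per kWh, 8 TWh/year per single-unit plant); the function and constant names are illustrative only.

```python
# Inputs taken from the slide: 130 TWh/year for connected edge devices [EPA-06],
# an electricity price of $0.10 per kWh, and 8 TWh/year per single-unit plant.
ANNUAL_ENERGY_TWH = 130.0
PRICE_PER_KWH_USD = 0.10
PLANT_OUTPUT_TWH = 8.0

KWH_PER_TWH = 1e9  # 1 TWh = 10^9 kWh

def annual_cost_usd(energy_twh: float, price_per_kwh: float) -> float:
    """Yearly electricity cost in dollars for a given yearly energy use."""
    return energy_twh * KWH_PER_TWH * price_per_kwh

def plant_equivalents(energy_twh: float, plant_twh: float) -> float:
    """Number of single-unit plants needed to supply this yearly energy."""
    return energy_twh / plant_twh

cost = annual_cost_usd(ANNUAL_ENERGY_TWH, PRICE_PER_KWH_USD)
plants = plant_equivalents(ANNUAL_ENERGY_TWH, PLANT_OUTPUT_TWH)
print(f"${cost / 1e9:.1f} billion per year, about {plants:.2f} plants")
```

130 / 8 = 16.25, which the slide rounds to 16 plants.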
4 of 20 Maintaining Network Connectivity
[Figure: Alice sends a FILE QUERY PACKET across the INTERNET to Bob's IDLE PC running a GNUTELLA FILE SHARING APPLICATION, which answers with a FILE RESPONSE PACKET; a sleeping PC (Z Z z z) cannot answer]
Alice checks whether Bob has a file she needs for peer-to-peer (P2P) file sharing
Problem: the PC must stay awake to maintain network connectivity
5 of 20 A Solution – Power Proxying
The primary challenge is to maintain network connectivity while the PC is powered down to standby (sleep) mode
Some packets do not require a complex response
– Automated responses are sufficient
– The Network Interface Card (NIC) can act as a proxy for the PC
– The PC sleeps while the NIC services packets with automated responses
– This technique is known as power proxying
– We call such a NIC a "Smart"-NIC (SNIC)
6 of 20 Power Proxying
[Figure: Alice's FILE QUERY PACKET reaches Bob's IDLE PC (GNUTELLA FILE SHARING APPLICATION) over the INTERNET; the SNIC returns the FILE RESPONSE PACKET while the PC sleeps]
The sleeping PC delegates handling of its network traffic to the SNIC
7 of 20 Power Proxying
[Figure: Bob's sleeping PC behind its SNIC; traffic from the INTERNET includes proxiable packets (answered by an SNIC response), chatter packets, and non-proxiable (wake-up) packets]
8 of 20 What to Proxy? - Proxiable Protocols
Proxiable protocols: network protocols amenable to proxying
– Responses can be automated
– Examples: keep-alive packets, IP conflict avoidance, etc.
Four categories of proxiable packets:
– ARP (Address Resolution Protocol)
– ICMP (Internet Control Message Protocol)
– TCP (Transmission Control Protocol)
– UDP (User Datagram Protocol)
[Figure: example exchanges with an IDLE PC: ARP query → ARP response; ping → ping response; P2P file query → P2P response; mail notification]
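ARP is one of the four proxiable categories, and answering an ARP query is a good example of a response that can be fully automated. The sketch below builds the reply a sleeping host would have sent. It assumes the classifier has already stripped the Ethernet header and handles only Ethernet/IPv4 ARP (RFC 826 layout); `proxy_arp_reply` and its parameters are hypothetical names, not the authors' SNIC implementation.

```python
import socket
import struct
from typing import Optional

# Ethernet/IPv4 ARP body (28 bytes, Ethernet header excluded):
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"
ARP_REQUEST, ARP_REPLY = 1, 2

def proxy_arp_reply(arp_body: bytes, host_ip: str, host_mac: bytes) -> Optional[bytes]:
    """Build the ARP reply the sleeping host would have sent, or return None
    if the packet is not an Ethernet/IPv4 ARP request for host_ip."""
    if len(arp_body) < struct.calcsize(ARP_FMT):
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack_from(ARP_FMT, arp_body)
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or oper != ARP_REQUEST:
        return None
    if tpa != socket.inet_aton(host_ip):
        return None  # request is for another host: not ours to answer
    # Swap roles: the proxied host becomes the sender, the requester the target.
    return struct.pack(ARP_FMT, htype, ptype, hlen, plen, ARP_REPLY,
                       host_mac, tpa, sha, spa)

# Example: 10.0.0.1 asks "who has 10.0.0.2?" while 10.0.0.2 is asleep.
request = struct.pack(ARP_FMT, 1, 0x0800, 6, 4, ARP_REQUEST,
                      bytes.fromhex("aabbccddeeff"), socket.inet_aton("10.0.0.1"),
                      bytes(6), socket.inet_aton("10.0.0.2"))
reply = proxy_arp_reply(request, "10.0.0.2", bytes.fromhex("001122334455"))
```

Because the reply depends only on fields in the request plus the host's own addresses, no application state on the sleeping PC is needed.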
9 of 20 Power Proxying Operation
[Figure: SNIC containing a packet classifier, rules, and an application handler; classification may be done in HW or SW]
1. The PC decides to sleep
2. The PC offloads power proxy rules to the SNIC
3. The PC sleeps and the SNIC proxy is activated
4. A packet arrives
5. Header inspection: the classifier extracts the source address, source port, and destination port
6. Rule checking: the extracted fields are compared against the rules
7. (a) No match, not chatter: wake up the PC
   (b) No match, chatter: discard the packet
   (c) Match: invoke the application handler identified by the rule's App ID
8. The application handler determines the response
9. The SNIC sends the proxied response
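Steps 5 through 7 amount to a three-way decision per packet. The sketch below models it, assuming exact-match rules keyed on (source address, source port, destination port) mapped to an application ID. The slides do not say how chatter is recognized, so the `chatter_ports` set is a hypothetical stand-in; all names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple

class Action(Enum):
    WAKE_PC = "7(a) wake up PC"
    DISCARD = "7(b) discard chatter"
    INVOKE_HANDLER = "7(c) invoke application handler"

@dataclass(frozen=True)
class Header:
    src_addr: str
    src_port: int
    dst_port: int

RuleKey = Tuple[str, int, int]

@dataclass
class ProxyRules:
    """Rule set the PC offloads to the SNIC before sleeping (step 2)."""
    rules: Dict[RuleKey, int] = field(default_factory=dict)  # key -> application ID
    chatter_ports: Set[int] = field(default_factory=set)     # hypothetical chatter test

def classify(hdr: Header, table: ProxyRules) -> Tuple[Action, Optional[int]]:
    # Step 5: header inspection pulls out the three fields the rules use.
    key: RuleKey = (hdr.src_addr, hdr.src_port, hdr.dst_port)
    # Step 6: rule checking; with disjoint rules an exact-match lookup suffices.
    app_id = table.rules.get(key)
    if app_id is not None:
        return Action.INVOKE_HANDLER, app_id
    if hdr.dst_port in table.chatter_ports:
        return Action.DISCARD, None
    return Action.WAKE_PC, None

# Example: one Gnutella-style rule (port 6346) and one chatter port.
table = ProxyRules({("10.0.0.1", 6346, 6346): 1}, {137})
print(classify(Header("10.0.0.1", 6346, 6346), table))
```

Waking the PC is the fall-through case, so an unrecognized packet is never silently lost; only packets positively identified as chatter are dropped.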
10 of 20 Packet Classifier Requirements
PC-based classifier vs. router-based classifier:
1) Link rate: both must sustain link rates of 10/100/1000/10000 Mbps
2) Packet loss: PC-based allows no packet loss
3) Operation: PC-based operates only during system inactivity; router-based operates continually
4) Destinations: PC-based processes only packets addressed to a particular destination plus broadcast/multicast packets; router-based processes packets to any destination
5) Processing resources: PC-based has limited resources (processors clocked in MHz); router-based has processors clocked in the GHz range
6) Rule set: PC-based has a limited number of rules, directly dependent on the number of proxiable applications running; router-based has a larger number of rules with a wide complexity range
7) Rule overlap: PC-based packets match only one rule (rules are disjoint); router-based packets can match multiple rules
11 of 20 Packet Classifier - SW vs. HW
Software classifier vs. hardware classifier:
1) Frequency: SW is limited to operating frequencies between 66 MHz and 400 MHz; custom HW can be designed for the required frequency
2) Throughput: SW cannot meet network throughput demands even with the fastest packet classification algorithms; HW can easily meet them
3) Power: SW consumes high power even during idle periods; HW consumes comparatively lower power
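One way to see why the software frequency range is tight is to count the processor cycles available per packet. The sketch assumes the worst case of back-to-back minimum-size Ethernet frames (64-byte frame plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap, i.e. 84 bytes on the wire); `cycles_per_packet` is an illustrative helper, not taken from the slides.

```python
# Worst case: back-to-back minimum-size Ethernet frames.
# 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes.
MIN_WIRE_BYTES = 64 + 8 + 12

def cycles_per_packet(clock_hz: float, link_bps: float) -> float:
    """Processor cycles available to classify one minimum-size frame."""
    packet_time_s = MIN_WIRE_BYTES * 8 / link_bps
    return packet_time_s * clock_hz

# Budget at 1 Gbps across the software frequency range on this slide.
for clock_mhz in (66, 100, 300, 400):
    budget = cycles_per_packet(clock_mhz * 1e6, 1e9)
    print(f"{clock_mhz} MHz: about {budget:.0f} cycles per packet at 1 Gbps")
```

Every cycle in that budget must cover header parsing, rule lookup, and FIFO handling in software, whereas a dedicated hardware pipeline can be clocked to match the link rate directly.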
12 of 20 Custom HW Packet Classification
[Figure: a header processor takes each incoming packet from the MAC core and extracts the source IP address, source port, and destination port; each field is looked up in its own CAM (Source IP Address CAM, Source Port CAM, Dest Port CAM) loaded with the rules; the CAM match and address outputs produce a match ID (with a MultiMatch flag), which gives the packet class and application ID and invokes the application handler]
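The CAM arrangement can be modeled in software to show how per-field lookups combine into a single match ID. This is a sketch under an assumption the slide does not spell out: each CAM returns a bit vector with one bit per rule slot, the three vectors are ANDed, and a priority encoder picks the lowest matching slot while a MultiMatch flag reports more than one hit. The class and function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

class Cam:
    """Software model of a CAM: one stored key per rule slot. A lookup compares
    the key against every slot (in parallel in hardware) and returns a bit
    vector with bit i set when slot i matches."""
    def __init__(self, entries: Sequence[object]):
        self.entries = list(entries)

    def lookup(self, key: object) -> int:
        vec = 0
        for i, entry in enumerate(self.entries):
            if entry == key:
                vec |= 1 << i
        return vec

def hw_classify(src_ip_cam: Cam, src_port_cam: Cam, dst_port_cam: Cam,
                src_ip: str, src_port: int, dst_port: int) -> Tuple[Optional[int], bool]:
    """Return (match_id, multimatch); a rule matches only if all three fields match."""
    vec = (src_ip_cam.lookup(src_ip)
           & src_port_cam.lookup(src_port)
           & dst_port_cam.lookup(dst_port))
    if vec == 0:
        return None, False
    match_id = (vec & -vec).bit_length() - 1  # priority encoder: lowest set bit
    multimatch = (vec & (vec - 1)) != 0       # more than one bit set
    return match_id, multimatch

# Rule slot i is stored at index i of every CAM.
ips = Cam(["10.0.0.1", "10.0.0.1", "10.0.0.9"])
sports = Cam([6346, 25, 6346])
dports = Cam([6346, 1025, 6346])
print(hw_classify(ips, sports, dports, "10.0.0.9", 6346, 6346))
```

Since PC-based rules are disjoint, MultiMatch should never fire in normal operation; in this model it serves as a consistency check on the offloaded rule set.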
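13 of 20 Packet Classifier Placement
[Figure: SNIC datapath with packets from the PHY entering the MAC core, Rx and Tx FIFOs, the packet classifier with a packet descriptor FIFO, and a microprocessor (uP) running the application handler that generates the response]
No change to the critical path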
14 of 20 Experimental Setup
Software packet classifier
– Implemented on the RiceNIC platform using a PowerPC 405
– RiceNIC is a programmable NIC
– PowerPC clocked at 300 MHz and 100 MHz
Hardware packet classifier
– CAMs generated as block memory using Xilinx IP cores
– Prototyped in Verilog HDL
– System implemented and simulated using Xilinx ISE 9.1 and ModelSim
– Clocked at 1.25 MHz, 12.5 MHz, and 125 MHz, corresponding to 10 Mbps, 100 Mbps, and 1000 Mbps
– Power calculated using Xilinx XPower
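The three hardware clock frequencies are each exactly one-eighth of the corresponding link rate. That is consistent with a classifier datapath that accepts one byte per clock cycle, the same width as GMII, which carries 8 bits per cycle at 125 MHz for Gigabit Ethernet. The byte-per-cycle datapath is an inference here, not something the slide states; `classifier_clock_mhz` is a hypothetical helper.

```python
# Assumption: the classifier consumes one byte of the packet stream per clock.
DATAPATH_BITS = 8

def classifier_clock_mhz(link_mbps: float, datapath_bits: int = DATAPATH_BITS) -> float:
    """Clock needed to keep pace with the link when datapath_bits arrive per cycle."""
    return link_mbps / datapath_bits

for rate in (10, 100, 1000):
    print(f"{rate} Mbps -> {classifier_clock_mhz(rate)} MHz")
```

Clocking the hardware only as fast as the link requires is what lets its power scale down at lower link rates.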
15 of 20 Results – Packet Classification Time
Hardware classification outperforms software classification at both 300 MHz and 100 MHz
[Figure: worst-case packet classification time for each protocol class with 100 rules, ordered by increasing packet classification complexity]
16 of 20 Results – Classification Time vs. Rules
As more applications are identified as proxiable, rule set sizes will increase
– Thus, scalability is important
[Figure: classification time vs. number of rules; software time grows logarithmically, hardware time is constant]
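The logarithmic-versus-constant trend can be illustrated by counting comparisons. The slides do not name the software classification algorithm, so this sketch swaps in a plain binary search over sorted rule keys as one representative logarithmic lookup; a CAM, by contrast, compares every stored entry in a single parallel step regardless of rule count. `binary_search_probes` is a hypothetical helper.

```python
from typing import List, Tuple

def binary_search_probes(sorted_keys: List[int], key: int) -> Tuple[bool, int]:
    """Software lookup over sorted rule keys; returns (found, comparisons made)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, probes

# Worst-case comparisons grow with log2(rule count); a CAM lookup stays at 1.
for n in (10, 100, 1000):
    rules = list(range(n))
    worst = max(binary_search_probes(rules, k)[1] for k in range(-1, n + 1))
    print(f"{n:5d} rules: {worst} software comparisons vs. 1 parallel CAM lookup")
```

Each tenfold increase in rules adds only a few software comparisons, but every comparison costs processor cycles per packet, while the CAM's cost is paid in silicon area rather than time.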
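17 of 20 Results – Packet Throughput
Throughput is measured in millions of packets per second (MPPS)
[Figure: throughput of the SW and HW classifiers against the minimum throughput required for 1 Gbps]
Software cannot meet the requirement!
Hardware exceeds the throughput required for Gbps links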
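The minimum throughput a classifier must sustain at a given link rate is set by the smallest frames, which arrive most often. The sketch below computes that worst-case packet rate, assuming standard Ethernet framing (64-byte minimum frame plus 20 bytes of preamble/SFD and inter-frame gap); `max_packet_rate_mpps` is an illustrative name.

```python
# Per-frame wire overhead: 8-byte preamble/SFD + 12-byte minimum inter-frame gap.
ETH_OVERHEAD_BYTES = 8 + 12

def max_packet_rate_mpps(link_bps: float, frame_bytes: int = 64) -> float:
    """Worst-case packet rate (millions of packets per second) for a link
    carrying back-to-back frames of frame_bytes."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire / 1e6

for label, bps in (("10 Mbps", 1e7), ("100 Mbps", 1e8), ("1 Gbps", 1e9), ("10 Gbps", 1e10)):
    print(f"{label}: {max_packet_rate_mpps(bps):.3f} MPPS")
```

At 1 Gbps this works out to roughly 1.49 MPPS, so a classifier must finish each packet in about 672 ns to avoid dropping packets.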
18 of 20 Results – HW Speedup vs. SW
[Figure: hardware speedup over software ranges from 9x down to 2.5x]
19 of 20 Results – Power Consumption
The SW classifier consumes 2.4x the power of the HW classifier
– SW = mW and 441 mW for 100 MHz and 300 MHz, respectively
– HW = 180 mW for 100 rules
Link rate scalability
– For SW to meet 1 Gbps throughput, it must be clocked at 500 MHz
– This requires an additional 294 mW of power
– Resulting in 4x the power of HW
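The power ratios on this slide can be checked from the reported figures. The sketch assumes the additional 294 mW is measured relative to the 441 mW software classifier at 300 MHz; that reading gives ratios consistent with both the "4x" on this slide and the "75% less power" conclusion.

```python
HW_MW = 180.0              # HW classifier, 100 rules
SW_300MHZ_MW = 441.0       # SW classifier at 300 MHz
EXTRA_FOR_500MHZ_MW = 294.0  # assumed to be added on top of the 300 MHz figure

sw_500mhz_mw = SW_300MHZ_MW + EXTRA_FOR_500MHZ_MW  # SW clocked for 1 Gbps

ratio_300 = SW_300MHZ_MW / HW_MW   # SW power as a multiple of HW power
ratio_500 = sw_500mhz_mw / HW_MW
hw_savings = 1 - HW_MW / sw_500mhz_mw  # fraction of power HW saves vs. SW at 1 Gbps

print(f"300 MHz: {ratio_300:.2f}x, 500 MHz: {ratio_500:.2f}x, HW saves {hw_savings:.0%}")
```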
20 of 20 Conclusions
PCs consume a lot of power
– They are left powered on to maintain network connectivity
Introduced power proxying
– The SNIC maintains network connectivity so the PC can sleep
– Can increase sleep time by 85% [Purushothaman-06]
Low-power hardware-based packet classifier to enable power proxying
– Exceeds the Gigabit Ethernet throughput requirement
– Up to 9x speedup in packet classification time over a software packet classifier
– 75% less power than a software packet classifier
– Better scalability than a software packet classifier with respect to future rule set sizes and link rates