Guangdeng Liao, Xia Zhu, Steen Larsen, Laxmi Bhuyan, Ram Huggahalli (University of California, Riverside; Intel Labs)

• Motivation
• Network Processing
• Experiment Setup
• Power Studies
  ◦ Intel Nehalem Server
  ◦ Niagara 2 Server
• System Architecture Implications

• Network speed increases at a rate of 10X and is rapidly transitioning to 10 Gbps and beyond.
• Existing studies of network processing have focused on performance.
• Power is becoming increasingly important because of environmental and economic concerns.
• Detailed power studies of network processing over high-speed networks can guide the design of a more power-efficient platform.

• Unlike traditional CPU- and memory-intensive applications, network processing involves multiple platform components.

• Focused on mainstream servers: Intel Nehalem servers interconnected with 10GbE.
• Used Iperf to generate network traffic.
• Used a data acquisition system (DAQ) to measure the power consumption of individual hardware components.
• Studied the power benefits of integrating 10GbE NICs into CPUs by using a Niagara 2 server.

• Sensing resistors are added to the +12V, +5V, and +3.3V power supply rails, as well as to each DIMM and the 10GbE PCI-E NIC.

[Figure labels: AC/DC power supply (120V AC); CPU power and regulation; memory DIMM power; PCIe NIC power; current + voltage sensing; data acquisition]
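The sensing-resistor measurement above can be sketched as follows. This is a minimal illustration, not the authors' DAQ software: the resistor value, the sample readings, and the function names are all hypothetical. The current on a rail follows from Ohm's law applied to the voltage drop across the sensing resistor, and power is the rail voltage times that current.

```python
def rail_power_watts(rail_voltage, sense_drop_volts, sense_resistance_ohms):
    """Power drawn on one rail, from the voltage drop across its sensing resistor."""
    if sense_resistance_ohms <= 0:
        raise ValueError("sense resistance must be positive")
    current_amps = sense_drop_volts / sense_resistance_ohms  # Ohm's law: I = V / R
    return rail_voltage * current_amps  # P = V * I


def average_power_watts(samples, sense_resistance_ohms):
    """Mean power over a list of (rail_voltage, sense_drop_volts) samples."""
    if not samples:
        raise ValueError("no samples")
    total = sum(
        rail_power_watts(voltage, drop, sense_resistance_ohms)
        for voltage, drop in samples
    )
    return total / len(samples)


if __name__ == "__main__":
    # Hypothetical readings on the +12V rail with an assumed 0.005 ohm resistor.
    readings = [(12.0, 0.010), (12.0, 0.012), (11.9, 0.011)]
    print(f"{average_power_watts(readings, 0.005):.2f} W")
```

Averaging over many samples matters in practice because the instantaneous draw fluctuates with traffic; the three readings here only illustrate the calculation.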

• NIC idling power: ~9 Watts.
• ~25 Watts and ~17 Watts are dissipated for small and large I/O sizes, respectively.
• CPU is the major power consumer, followed by memory.

• Small packets have very low power efficiency because current systems process small packets inefficiently.
• The CPU utilization breakdown shows that the overhead comes mainly from the OS kernel.
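A short sketch of how power efficiency could be compared across I/O sizes. It assumes that power efficiency means throughput delivered per watt, which the slides do not define explicitly. The throughput figures are hypothetical placeholders; only the ~25 W and ~17 W power figures come from the slides.

```python
def power_efficiency_gbps_per_watt(throughput_gbps, power_watts):
    """Throughput delivered per watt dissipated (assumed definition)."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput_gbps / power_watts


if __name__ == "__main__":
    # Power figures are from the slides; throughput values are hypothetical.
    cases = {
        "small I/O": {"throughput_gbps": 1.0, "power_watts": 25.0},
        "large I/O": {"throughput_gbps": 9.0, "power_watts": 17.0},
    }
    for name, case in cases.items():
        efficiency = power_efficiency_gbps_per_watt(**case)
        print(f"{name}: {efficiency:.3f} Gbps/W")
```

Under this definition, small packets lose on both counts: they dissipate more power while moving less data.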

• ~22 Watts and ~18 Watts are dissipated for small and large I/O sizes, respectively.
• CPU is the major power consumer, followed by memory.

• Small packets have very low power efficiency.
• Unlike the receive side, the CPU utilization breakdown shows that the overhead comes mainly from SoftIRQ.

• Sun Niagara 2 integrates two 10GbE NICs into the CPU die.
• Conducted experiments on an integrated NIC (INIC) and a discrete NIC (DNIC), and compared the power efficiency of network processing.

• With dual ports, DNIC has ~20 Watts of idling power and INIC has ~17 Watts. The saving comes from the PCI-E interface.
• INIC has better power efficiency than DNIC, mainly because fewer CPU cycles are consumed for processing received packets.
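The idle-power saving implied by these two figures can be worked out directly. The sketch below uses only the ~20 W and ~17 W values from the slide; the function name is illustrative.

```python
def idle_saving(dnic_idle_watts, inic_idle_watts):
    """Return (absolute saving in watts, saving as a fraction of DNIC idle power)."""
    if dnic_idle_watts <= 0:
        raise ValueError("DNIC idle power must be positive")
    saving_watts = dnic_idle_watts - inic_idle_watts
    return saving_watts, saving_watts / dnic_idle_watts


if __name__ == "__main__":
    # Dual-port idle figures quoted on the slide.
    saving_watts, fraction = idle_saving(20.0, 17.0)
    print(f"saving: {saving_watts:.0f} W ({fraction:.0%} of DNIC idle power)")
```

Because idle power is paid regardless of traffic, this saving applies continuously and not only under load.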

• INIC has similar power efficiency to DNIC.

• High-speed NICs have high idling power and are not energy proportional.
• Small packets have low power efficiency. Optimization opportunities lie in the OS kernel, such as buffer management and context switches.
• CPU is the major contributor to the power consumption of network processing, followed by memory.
• Integrating NICs gives slightly better power efficiency.

• Reducing CPU power cost
  ◦ Small in-order cores have better power efficiency for network processing.
  ◦ Heterogeneous CPUs incorporate small cores for network processing.

[Figure labels: core; Core; Cache/Interconnect; core….; atom; NHM]

• Reducing memory power cost
  ◦ On the receive side, there are two memory accesses for each packet.
  ◦ Solution: packets are delivered to caches, and a new instruction is added to invalidate packets in caches.
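A back-of-the-envelope sketch of the memory traffic this proposal targets. The two-accesses-per-packet figure is from the slide. The zero-access case is an idealized assumption that every packet is consumed and invalidated in the cache before it is evicted. Packet counts, sizes, and names are hypothetical.

```python
BASELINE_ACCESSES_PER_PACKET = 2   # from the slide: two memory accesses per packet
CACHE_DELIVERY_ACCESSES = 0        # idealized assumption: packet never reaches memory


def receive_memory_traffic_bytes(packets, packet_bytes, accesses_per_packet):
    """Bytes moved to or from memory for a batch of received packets."""
    if packets < 0 or packet_bytes < 0 or accesses_per_packet < 0:
        raise ValueError("arguments must be non-negative")
    return packets * packet_bytes * accesses_per_packet


if __name__ == "__main__":
    # Hypothetical batch: one million packets of 1500 bytes each.
    packets, size = 1_000_000, 1500
    baseline = receive_memory_traffic_bytes(packets, size, BASELINE_ACCESSES_PER_PACKET)
    proposed = receive_memory_traffic_bytes(packets, size, CACHE_DELIVERY_ACCESSES)
    print(f"baseline: {baseline / 1e9:.1f} GB, with cache delivery: {proposed / 1e9:.1f} GB")
```

A real system would fall between the two cases, since some packets would be evicted to memory before the CPU consumes them.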

• Reducing memory power cost
  ◦ On the transmit side, packets incur write-backs.
  ◦ Solution: transmitted packets are fed from caches and invalidated after the data transfer.

• Reducing NIC power cost (idling power)
  ◦ Integrate NICs into the CPU to avoid PCI-E power consumption.
  ◦ Apply a rate-adaptation scheme to NICs to save power at low traffic rates.
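One way a rate-adaptation policy could look, as a minimal sketch. The slides do not specify a scheme, so the rate set, the headroom factor, and the function name are all assumptions: the policy picks the lowest link rate that still covers the offered load with some margin.

```python
# Assumed set of selectable link rates in Gbps (100 Mbps, 1 Gbps, 10 Gbps).
LINK_RATES_GBPS = (0.1, 1.0, 10.0)


def select_link_rate(offered_load_gbps, headroom=1.25):
    """Pick the lowest link rate that covers the offered load plus headroom.

    Falls back to the highest available rate when the load exceeds every option.
    """
    if offered_load_gbps < 0:
        raise ValueError("offered load must be non-negative")
    needed = offered_load_gbps * headroom
    for rate in LINK_RATES_GBPS:  # rates are listed in ascending order
        if rate >= needed:
            return rate
    return LINK_RATES_GBPS[-1]


if __name__ == "__main__":
    for load in (0.05, 0.5, 0.9, 9.0):
        print(f"offered {load} Gbps -> link at {select_link_rate(load)} Gbps")
```

The headroom factor is there to avoid switching to a rate that the next small burst would immediately saturate; a practical scheme would also need hysteresis to limit how often the link renegotiates.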

Q & A