Power is Leading Design Constraint

Direct Impacts of Power Management
– IDC: servers account for 2% of US energy consumption and that figure is growing exponentially; the HPC cluster market is growing 44%/year, and by 2013 HPC clusters will be the largest fraction of the server market
– Dramatic power reduction for HPC will therefore have an enormous impact on power and carbon footprint

Indirect Impacts of Power Management
– Makes construction of exascale machines feasible
– Directs power towards useful work
  – 99% of energy use is not targeted at useful work
  – Thermals dictate design limits: powering up parts of the system only part of the time enables higher bandwidth and higher computational rates
  – More performance for the application
– Broader impact on energy reduction across the IT sector

Computing Energy Consumption

State of the Art

Power down underutilized components
– DVFS (SW/HW) to scale down the voltage and frequency of underutilized components (see the sketch below)
– Memory can also be put into low-power modes when underutilized
– MAID disk arrays can be powered down incrementally to reduce power

Explicitly manage data movement
– SSDs for lower I/O power while maintaining performance
– Offload work to accelerators when they are more effective
– Manage data movement through the memory hierarchy (logistics)

Current approaches are narrowly focused and do not scale.
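To make the DVFS item concrete, the sketch below throttles one core through the Linux cpufreq sysfs interface. This is a minimal illustration, not a recommended mechanism: the exact sysfs paths, governors, and allowed frequency values vary by kernel and platform, writing them usually requires root privileges, and `cap_core_frequency` is a name chosen here only for illustration.

```c
/* Minimal sketch: request a lower CPU frequency ceiling through the Linux
 * cpufreq sysfs interface. Illustrative only -- paths, governors, and
 * permitted values vary by kernel version, driver, and site policy, and
 * writing these files normally requires root privileges. */
#include <stdio.h>

/* Cap the frequency of one core at `khz` kilohertz (hypothetical helper). */
static int cap_core_frequency(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;               /* no permission or no cpufreq driver */
    fprintf(f, "%ld\n", khz);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Example: throttle core 0 to 1.2 GHz while it is underutilized. */
    if (cap_core_frequency(0, 1200000) != 0)
        perror("scaling_max_freq");
    return 0;
}
```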

Problems

No scalable system-level approaches
– Power management services derived from the commodity market make only local decisions
– Locally optimal decisions are not globally optimal
– No scalable data aggregation or filtering for control-system decisions

Lack of standards for power monitoring, control, and policy description (a hypothetical interface sketch follows this slide)
– Required for both vertical and horizontal integration

The control loop for system-scale optimization is fundamentally broken
– No predictive models of the response to control decisions
– No common expression of policy or objectives
– No comprehensive monitoring or data aggregation

No tool support for integrating power management into application codes (application developers have enough to worry about)
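Since the slide's point is that no such standard exists, the sketch below is purely hypothetical: a rough illustration of the kind of node-level monitoring/control/policy interface being called for. Every name in it (the `pm_` functions, the domain enum) is invented here for illustration and belongs to no existing library or standard.

```c
/* Hypothetical sketch of a standardized node-level power interface of the
 * kind this slide argues is missing today. None of these names are part of
 * any existing standard; they only illustrate the shape such an API might
 * take so that vendors, runtimes, and libraries could interoperate. */
#ifndef PM_API_SKETCH_H
#define PM_API_SKETCH_H

#include <stdint.h>

typedef enum {
    PM_DOMAIN_PACKAGE,   /* whole CPU socket */
    PM_DOMAIN_MEMORY,    /* DRAM / memory controller */
    PM_DOMAIN_ACCEL      /* attached accelerator */
} pm_domain_t;

/* Monitoring: cumulative energy in microjoules for one domain. */
int pm_read_energy_uj(pm_domain_t domain, uint64_t *energy_uj);

/* Control: request a power cap in milliwatts; the platform may clamp it. */
int pm_set_cap_mw(pm_domain_t domain, uint32_t cap_mw);

/* Policy: register a declarative objective the runtime should optimize,
 * e.g. "minimize energy subject to <= 5% slowdown". */
int pm_set_objective(const char *objective_description);

#endif /* PM_API_SKETCH_H */
```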

Research Agenda

– Power/performance monitoring and aggregation that scales to a 1B+ core system (see the aggregation sketch below)
– A control system, spanning the system software stack, that can disseminate control decisions across 1B+ cores
– Scalable control algorithms to bridge the gap between global and local models
  – Analytical power models of system response
  – Empirical models based on advanced learning theory
– Optimally tune the system based on the control loop
  – Comprehensive instrumentation that connects to the control system
  – A declarative objective-function specification for the control system
  – Both online and offline tuning options based on advanced search-pruning heuristics
– Effective power-aware and scalable resource control
  – Manage heterogeneous computing resources at the OS level
  – Manage data movement and locality in the memory hierarchy
  – Adaptable software to handle the diversity of hardware features and designs
– Power instrumentation and control standardization
  – For coordination of the international effort
  – For horizontal integration (e.g. so library components can interoperate effectively)
  – For vertical integration (e.g. so that local DVFS coordinates with global system scheduling)
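As a small illustration of the first item, scalable aggregation of power telemetry can reuse the tree-structured reductions MPI already provides, keeping aggregation cost logarithmic in node count. The sketch below assumes a placeholder `read_node_power_watts()` sensor function; a real system would query a node-level sensor interface instead.

```c
/* Sketch of scalable power-telemetry aggregation: each node contributes one
 * local sample and MPI_Reduce combines them in O(log N) steps, so the same
 * pattern works at 10 nodes or at exascale node counts. */
#include <mpi.h>
#include <stdio.h>

/* Placeholder: read this node's instantaneous power draw in watts. */
static double read_node_power_watts(void)
{
    return 250.0;   /* stub value; a real system would query a sensor here */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = read_node_power_watts();
    double total = 0.0, peak = 0.0;

    /* Tree-structured reductions keep aggregation cost logarithmic in the
     * number of nodes -- the property a 1B+ core control loop would need. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local, &peak,  1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("nodes=%d total=%.1f W, max per node=%.1f W\n",
               size, total, peak);

    MPI_Finalize();
    return 0;
}
```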

Cross-Cutting Research Agenda

Resource Management: OS and system management services
– Standardized policy descriptions for fine-grained on-chip management
– Standardized monitoring interfaces for energy and resource utilization ("PAPI for energy"; see the sketch below)
– Standardized models of hardware power impact and algorithm performance to support logistical decisions (when and where to move computation, and the response to adaptations)

Algorithms: base the order of complexity on the energy cost of operations rather than on flop counts
– Communication-avoiding algorithms (how far can FLOPS be traded for reduced communication before it stops paying off?)
– Enable libraries to be annotated with parameterized energy models so that a policy can manage those trade-offs across different architectures
– A standardized approach to lightweight models that predict the response to resource adjustment

Libraries: how to build energy-efficiency models and management interfaces into software libraries in a standardized way (software engineering)
– e.g. how to ensure that SCALAPACK uses policy and strategy descriptions and controls that are compatible with FFTW's

Compilers: automatically instrument code for programmability
– Automatically expose "knobs" for control and "sensors" for monitoring
– Automatically generate models that predict the response to resource adaptation

Applications: effective declarative annotations to convey application characteristics and requirements
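The "PAPI for energy" idea already has a partial embodiment in PAPI's RAPL component on recent Intel platforms; a minimal reading sketch follows. The event name and its availability are platform- and installation-dependent assumptions, and on RAPL the counter reports energy in nanojoules.

```c
/* Minimal sketch of "PAPI for energy": read package energy around a code
 * region via PAPI's RAPL component. The event name, its units, and whether
 * the component is present at all depend on the platform and the installed
 * PAPI build, so treat this as illustrative rather than portable. */
#include <papi.h>
#include <stdio.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long energy_nj = 0;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;
    if (PAPI_create_eventset(&eventset) != PAPI_OK)
        return 1;
    /* Event name assumed from the RAPL component; may differ per machine. */
    if (PAPI_add_named_event(eventset, "rapl:::PACKAGE_ENERGY:PACKAGE0") != PAPI_OK)
        return 1;

    PAPI_start(eventset);
    /* ... region of interest: a library kernel, a solver iteration, ... */
    PAPI_stop(eventset, &energy_nj);

    printf("package energy: %.3f J\n", energy_nj / 1e9);
    return 0;
}
```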

What Happens If We Do Nothing?

– HPC system power will be infeasibly large: 100+ megawatts by DARPA projections
– Or the design trade-offs made to keep power under control will narrow application scope and reduce delivered performance

Metrics / Benefits

Performance: reduce power without a corresponding impact on performance
Programmability: application developers cannot be expected to manage power explicitly
– Transparency requires support from compilers, libraries, and the system
Composability: SCALAPACK must be able to work with FFTW
– Minimize the number of incompatible ad-hoc approaches
– Organize the international effort
Scalability: the OS, the system-level resource manager, and applications must be able to use a common infrastructure to pursue a unified strategy toward the objectives
– Useful to embedded, departmental, AND exascale systems

Priority Research Direction for Power/Energy (PE) Efficiency Cross-Cut

Key challenges
– Power efficiency: the leading design constraint, but the optimization strategy is a complex objective
– Scalability: chip-, node-, and system-level objectives
– Optimal control: requires accurate predictive models
– Integration: cannot make policy decisions without an integrated, cohesive approach to control, prediction, and monitoring

Summary of research direction
– Power/performance monitoring and aggregation that scales to a 1B+ core system
– A control system that can disseminate control decisions across 1B+ cores
– Scalable control algorithms to bridge the gap between global and local models
– Optimally tune the system based on the control loop
– Power-aware and scalable resource control
– Power instrumentation and control standardization

Potential impact on software component
– Energy efficiency: apply power exactly where it is needed (reduces total power)
– Performance: under a power constraint, apply power where it matters most for performance
– Programmability: achieve these objectives without huge additional effort from applications

Potential impact on usability, capability, and breadth of community
– Makes delivery of an exascale system feasible
– Active power management reduces the design trade-offs that limit delivered application performance
– Broader impact across the entire HPC/server industry
– Local optimizations can see impact in 2-4 years; comprehensive system-level benefit in 5-10 years

4.4.2 Power/Energy Efficiency Adaptation (roadmap: power reduction over baseline)
– Baseline
– Energy monitoring interface standards: factor of 1.5x
– OS-level / node-level energy efficiency adaptation: factor of 2x
– Compatible energy-aware libraries and standardized interfaces: factor of 5x power reduction
– Automated code instrumentation (compilers and code generators): factor of 10x
– Automated system-level adaptation for energy efficiency

Extra

Research Problems

Optimal control (sensors and actuators)
– Need to define policy objectives more complex than just "reduce power": describe the trade-off space and express it to the control system
– Need a model that accurately predicts the effect of actuators on performance and power
  – Must be able to predict the energy impact of any change
  – Need a standard method for expressing the predictive model
– Must have accurate, scalable, standardized interfaces to monitor the response to model-driven adaptation (a predictor/corrector method; see the sketch below)

Dynamic response
– Explicit software control is not fast enough (the desired behavior must be expressed as policy)
– Must have a standardized approach for expressing policy
– Need a scalable approach to data reduction to enable fast policy decisions
– Need scalable approaches for strategy optimization: optimizing energy efficiency is itself a daunting optimization problem

Scaling
– The commodity market will give us chip-level adaptation; fine-grained (chip-level), node-level, and system-level policy must all be handled
– Requires standardized interfaces to express policy and models and to collect sensor data, enabling a unified response strategy that achieves the objective
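A minimal sketch of that predictor/corrector loop follows, with all of the model, sensor, and actuator hooks as hypothetical placeholders: predict the effect of an actuator setting, apply it, measure the actual response, and fold the prediction error back into the model.

```c
/* Sketch of the predictor/corrector control loop described above. The
 * model, sensor, and actuator functions are hypothetical placeholders;
 * only the control structure matters: predict, act, measure, correct. */
#include <stdio.h>

static double model_bias = 0.0;   /* learned correction to the power model */

/* Hypothetical model: predicted node power (W) at a given frequency cap. */
static double predict_power(double freq_ghz) {
    return 80.0 * freq_ghz + model_bias;   /* crude linear stand-in */
}

/* Hypothetical actuator and sensor hooks. */
static void   apply_frequency_cap(double freq_ghz) { (void)freq_ghz; }
static double measure_power(void)                  { return 190.0; }

int main(void)
{
    double cap = 2.4;                       /* current frequency cap, GHz */
    const double power_budget = 180.0;      /* watts per node */

    for (int step = 0; step < 10; ++step) {
        /* Predictor: pick the highest cap the model says meets the budget. */
        while (cap > 0.8 && predict_power(cap) > power_budget)
            cap -= 0.1;
        apply_frequency_cap(cap);

        /* Corrector: fold the observed error back into the model. */
        double observed = measure_power();
        model_bias += 0.5 * (observed - predict_power(cap));
        printf("step %d: cap=%.1f GHz observed=%.0f W\n", step, cap, observed);
    }
    return 0;
}
```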

What Are the Problems?

Scalability
– Depth and breadth (horizontal and vertical integration)
– Diversity in scale and response time is nontrivial

Optimality
– Devices can only make local decisions
– Locally optimal decisions are not optimal for the global system
– Data assimilation to make global decisions requires software support

Responsiveness
– Software cannot make decisions fast enough
– Data assimilation for control decisions is a huge problem
– The optimal point of control is not easy to find