The SEEC Computational Model Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011.


1 The SEEC Computational Model Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011

2 In the beginning…*
Application programmers had one goal: performance.
*The beginning, in this case, refers to the beginning of my career (1999)

3 But Modern Systems Have Increased the Burden on Application Programmers
Many additional constraints: performance, power, quality, resiliency
Even worse, constraints can change dynamically
–E.g., power cap, workload fluctuation, core failure

4 Example: Developing a Video Encoder for Power and Performance
Allocate resources for the best case: the encoder must drop frames to keep up; power is low.
Allocate resources for the worst case: the encoder exceeds its goals; power is high and resources are wasted.
Application developers must solve constrained optimization problems in dynamic environments.

5 Most Execution Models Designed for Performance
Coherent shared memory vs. message passing:
–Concurrency: multi-threaded vs. multi-process
–Communication: through memory vs. through the network
–Coordination: locks vs. messages
–Control: procedural in both
Procedural control is insufficient to meet the needs of modern systems.

6 Angstrom Replaces Procedural Control with Self-Aware Control
Procedural control (decide, act):
–Runs in open loop; assumptions made at design time, based on guesses about the future
–Application optimized for the system; no flexibility to adapt to changes
Self-aware control (decide, act, observe):
–Runs in closed loop; understands user goals and monitors the environment
–System optimizes for the application; flexibly adapts its behavior
The self-aware model allows the system to solve constrained optimization problems dynamically.
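The closed-loop structure this slide describes can be sketched as a plain feedback loop; all names and the proportional gain here are illustrative, not SEEC's API:

```python
# Minimal sketch of a closed-loop observe-decide-act pattern.

def run_closed_loop(goal, measure, actuate, steps=20):
    """Observe progress, decide on a correction, act, and repeat."""
    setting = 1.0          # current actuator setting (e.g., normalized core count)
    observations = []
    for _ in range(steps):
        observed = measure(setting)   # Observe: read progress toward the goal
        error = goal - observed       # Decide: how far off are we?
        setting += 0.5 * error        # simple proportional correction
        actuate(setting)              # Act: apply the new setting
        observations.append(observed)
    return observations
```

An open-loop, procedural design would fix `setting` once at design time and never consult `measure` again, which is exactly the inflexibility the slide contrasts against.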

7 Outline
Introduction/Motivating Example
The SEEC Model and Implementation
Experimental Validation
Conclusions

8 Angstrom’s SElf-awarE Computing (SEEC) Model
Goal: reduce programmer burden by continuously optimizing online
Key features:
1. Applications explicitly state goals and progress (through an API)
2. System software and hardware state their available actions (through an SPI)
3. The SEEC runtime system dynamically selects actions to maintain goals
A unified decision engine adapts applications, system software, and hardware.

9 Example Self-Aware System Built from SEEC
Video encoder with goals of 30 beats/s and minimum power.
The control and learning system observes progress (e.g., 20, 30, 33 beats/s), decides, and acts through actuators: algorithm, cores, frequency, bandwidth.

10 Roles in the SEEC Model
–Application developer (Observe): express application goals and progress (e.g., frames/second)
–Systems developer (Act): provide a set of actions and a callback function (e.g., allocation of cores to a process)
–SEEC runtime system: read goals and performance (Observe); determine how to adapt, e.g., how much to speed up the application (Decide); initiate actions based on the results of the decide phase (Act)

11 Registering Application Goals (Observe)
Performance
–Goals: target heart rate and/or latency between tagged heartbeats
–Progress: issue heartbeats at important intervals
Quality
–Goals: distortion (distance from an application-defined nominal value)
–Progress: distortion over the last heartbeat
Power
–Goals: target heart rate per Watt and/or target energy between tagged heartbeats
–Progress: power/energy over the last heartbeat interval
Research to date focuses on meeting performance while minimizing power and maximizing quality.
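The heartbeat idea above can be sketched in a few lines. This is an illustrative mock-up, not the actual Application Heartbeats API: the application declares a target rate, emits one beat per unit of work (e.g., per encoded frame), and the runtime reads the observed rate.

```python
import time

class HeartbeatMonitor:
    """Hypothetical heartbeat registration and progress reporting."""

    def __init__(self, target_rate):
        self.target_rate = target_rate   # goal, in heartbeats per second
        self.timestamps = []

    def heartbeat(self, now=None):
        """Issue a heartbeat at an important interval."""
        self.timestamps.append(time.time() if now is None else now)

    def observed_rate(self, window=10):
        """Heart rate over the last `window` beats."""
        ts = self.timestamps[-window:]
        if len(ts) < 2:
            return 0.0
        return (len(ts) - 1) / (ts[-1] - ts[0])
```

A quality or power goal would follow the same shape, reporting distortion or energy per heartbeat interval instead of a rate.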

12 Registering System Actions (Act)
Each action has the following attributes:
Estimated speedup
–Predicted benefit of taking the action
Cost
–Predicted downside of taking the action (increased power, lowered quality)
Callback
–A function that takes an id and implements the associated action
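A registered action can be modeled as a record carrying exactly the three attributes on this slide. Names here are illustrative, not SEEC's actual SPI:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    """One registered system action."""
    action_id: int
    est_speedup: float               # predicted benefit (multiplicative)
    cost: float                      # predicted downside (e.g., extra Watts)
    callback: Callable[[int], None]  # takes an id and implements the action

class ActionRegistry:
    def __init__(self):
        self.actions = {}

    def register(self, action):
        self.actions[action.action_id] = action

    def take(self, action_id):
        """Invoke the callback registered for this action id."""
        self.actions[action_id].callback(action_id)
```

A systems developer would register one `Action` per actuator setting (core count, clock step, bandwidth share) and let the runtime invoke the callbacks.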

13 Picking Actions to Meet Goals (Decide)
Decisions select actions given observations:
–Read application goals and heartbeats
–Determine the required speedup with a control system
–Translate that speedup into one or more actions
The control system provides predictable and analyzable behavior.
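The last two steps can be sketched as one decide call. This is a hypothetical simplification: actions are (name, estimated speedup, cost) tuples, and a real system may compose several actions rather than pick one.

```python
def decide(goal_rate, observed_rate, actions):
    """Compute the speedup that closes the gap, then pick an action for it."""
    needed = goal_rate / max(observed_rate, 1e-9)   # required speedup
    sufficient = [a for a in actions if a[1] >= needed]
    if sufficient:
        return min(sufficient, key=lambda a: a[2])  # cheapest sufficient action
    return max(actions, key=lambda a: a[1])         # otherwise, best effort
```

For example, with a 30 beats/s goal and 20 beats/s observed, the needed speedup is 1.5x, so the cheapest action promising at least 1.5x is chosen.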

14 Optimizing Resource Allocation with SEEC
SEEC can observe, decide, and act. How does this enable optimal resource allocation?
Let’s implement the video encoder example from the introduction.

15 Performance/Watt Adaptation in Video Encoding
[Chart: performance and power (W) over time (s), with the performance goal marked]

16 System Models in SEEC
SEEC’s control system takes actions based on models of the speedup and cost associated with each action.
What if the models are inaccurate?

17 Updating Models in SEEC (Decide)
After every action, SEEC updates its application and system models.
Kalman filters are used to estimate the true state of the application and system.
–We are exploring a number of different alternatives here
SEEC combines the predictability of control systems with the adaptability of learning systems.
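A scalar Kalman-filter update of the kind the slide describes can track an action's true speedup as a slowly drifting value; the noise parameters `q` and `r` below are illustrative, not SEEC's tuning.

```python
def kalman_update(x_est, p, z, q=1e-4, r=1e-2):
    """One predict/correct step: x_est is the current speedup estimate,
    p its variance, z a new (noisy) speedup measurement."""
    p = p + q                        # predict: the true value may have drifted
    k = p / (p + r)                  # Kalman gain: trust in the measurement
    x_est = x_est + k * (z - x_est)  # correct the estimate
    p = (1.0 - k) * p                # shrink the variance
    return x_est, p
```

Starting from a model that guesses 2x speedup when measurements keep reporting 1.2x, repeated updates pull the estimate to the true value while the variance shrinks.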

18 SEEC Online Learning of Speedup Model for Application with Local Minima

19 Handling Multiple Applications (Decide)
Control actions are computed separately for each application.
For finite resources, several alternatives:
–Priorities (for real-time, admission control)
–Weighting/centroid method (for overall system throughput)
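The weighting alternative can be sketched as proportional division of a finite core budget; this is a hypothetical illustration (the slide also lists strict priorities for real-time work), with weights standing in for how much speedup each application still needs.

```python
def partition_cores(total_cores, weights):
    """Split a core budget among apps in proportion to per-app weights."""
    shares = {app: total_cores * w / sum(weights.values())
              for app, w in weights.items()}
    alloc = {app: int(s) for app, s in shares.items()}   # round down first
    leftover = total_cores - sum(alloc.values())
    # hand remaining cores to the apps with the largest fractional remainders
    by_remainder = sorted(shares, key=lambda a: shares[a] - alloc[a], reverse=True)
    for app in by_remainder[:leftover]:
        alloc[app] += 1
    return alloc
```

Under a priority scheme, a real-time application would instead be granted its full demand before any proportional split of the remainder.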

20 Outline
Introduction/Motivating Example
The SEEC Model and Implementation
Experimental Validation
Conclusions

21 Systems Built with SEEC
–Dynamic loop perforation: skip some loop iterations. Tradeoff: performance vs. quality. Benchmarks: 7/13 PARSECs
–Dynamic Knobs: make static parameters dynamic. Tradeoff: performance vs. quality. Benchmarks: bodytrack, swaptions, x264, SWISH++
–Core scheduler: assign N cores to an application. Tradeoff: compute vs. power. Benchmarks: 11/13 PARSECs
–Clock scaler: change processor speed. Tradeoff: compute vs. power. Benchmarks: 11/13 PARSECs
–Bandwidth allocator: assign memory controllers to an application. Tradeoff: memory vs. power. Benchmarks: STREAM (makes no difference for PARSEC)
–Power manager: combination of the three above. Tradeoff: performance vs. power. Benchmarks: PARSEC, STREAM, simple test apps (mergesort, binary search)
–Learned models: power manager with speedup and cost learned online. Tradeoff: performance vs. power. Benchmarks: PARSECs
–Multi-app control: power manager with multiple applications. Tradeoff: performance vs. power and quality for multiple applications. Benchmarks: combinations of PARSECs


23 Dynamic Knobs: Creating Adaptive Applications
–Application goals: maintain performance and minimize quality loss
–System actions: turn static command-line parameters into a dynamic structure; adjust memory locations to change application settings
–Experiment: maintain performance when clock speed changes; benchmarks: bodytrack, swaptions, SWISH++, x264
Detail in Hoffmann et al., “Dynamic Knobs for Power-Aware Computing,” ASPLOS 2011.
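The dynamic-knob idea can be sketched as a parameter that used to be fixed on the command line becoming a variable the runtime moves across calibrated settings, trading quality for speed. Everything here, including the x264-style setting strings and speedup numbers, is illustrative, not data from the paper.

```python
class DynamicKnob:
    def __init__(self, settings):
        # settings: (value, estimated speedup) pairs, calibrated offline;
        # sorted from slowest/highest-quality to fastest/lowest-quality
        self.settings = sorted(settings, key=lambda s: s[1])
        self.index = 0                      # start at full quality

    @property
    def value(self):
        return self.settings[self.index][0]

    def speed_up(self):
        """Runtime response to lost compute: a faster, lower-quality setting."""
        self.index = min(self.index + 1, len(self.settings) - 1)

    def restore_quality(self):
        """Clock recovered: move back toward the baseline setting."""
        self.index = max(self.index - 1, 0)
```

On a clock drop, the runtime calls `speed_up()` to hold the frame rate; when the clock returns, `restore_quality()` walks back to the baseline, matching the recovery behavior shown in the results slides.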

24 Results: Enabling Dynamic Applications (bodytrack)
When the clock drops from 2.4 to 1.6 GHz, performance drops without SEEC; with SEEC, performance recovers.
When the clock rises from 1.6 back to 2.4 GHz, SEEC returns quality to the baseline.

25 Results: Enabling Dynamic Applications (swaptions, SWISH++, x264)
SEEC maintains baseline performance despite noise, with near-perfect behavior in some cases.
Dynamic knobs automatically enable a dynamic response for a range of applications using a single mechanism.

26 Optimizing Performance per Watt for Video Encoding
–Application goals: maintain 30 frames/s while minimizing power; adapt system behavior to the needs of individual inputs
–System actions: change cores, clock speed, and memory bandwidth
–Experiment: benchmark x264 with 16 different 1080p inputs; compare performance/Watt with SEEC to the best static resource allocation for each input

27 Average Performance per Watt for Video Encode

28 Performance per Watt for All Inputs

29 Learning Models Online
–Application goals: four performance targets (Min, Median, Max, Max+); minimize power consumption for each target; adapt system behavior when the initial models are wrong
–System actions: change cores, clock speed, and memory bandwidth
–Experiment: initial models are wildly optimistic (they assume linear speedup from any resource increase); benchmark: STREAM; converge to within 5% of each target performance, then measure power and convergence time

30 Complex Response to Resources
Adding memory bandwidth can slow the app down; so can adding cores.

31 Power Savings

32 Convergence Time

33 Managing Application and System Resources Concurrently
–Application goals: bodytrack maintains performance while minimizing power; x264 maintains performance while minimizing quality loss
–System actions: change the core allocation of both applications; change x264’s algorithms
–Experiment: manage multiple applications when the clock frequency changes; maintain the performance of both applications

34 Results: SEEC Management of Multiple Applications (bodytrack, x264)
When the clock drops from 2.4 to 1.6 GHz, without SEEC one application misses its goals while the other exceeds them.
SEEC allocates cores to bodytrack, removes cores from x264, and adjusts x264’s algorithm so both meet their goals.

35 Summary of Case Studies
–Dynamic Knobs: maintains application performance in the face of a loss of compute resources
–Performance/Watt: outperforms an oracle static resource allocation by adapting to fluctuations in the input data
–Performance/Watt with learning: learns models online starting from an initial model that is ~10x off the true values
–Multi-app control: maintains the performance of multiple apps by managing algorithms and system resources to adapt to a loss of compute resources

36 Outline
Introduction/Motivating Example
The SEEC Framework
Experimental Validation
Conclusions

37 Angstrom’s Hardware Support for Self-Awareness
“If you want to change something, measure it.”
We need sensors to measure everything we want to manage:
–Performance (there is a rich body of work in this area)
–Power
–Quality
–Resiliency
–…

38 Need Performance-Level Support for Power Analysis
1. gprof -> pprof?
[Sample gprof call-graph output: columns for index, % time, self, children, called, and name, annotated “Power?? Energy???” to highlight the measurements gprof lacks]

39 Angstrom’s Hardware Support for SEEC
Low-power core for the SEEC runtime system

40 Angstrom’s System Software and Hardware Support for SEEC
Actuators: algorithm, cores, frequency, memory bandwidth, cache, registers, network bandwidth, ALU width, TLB size, …
Enable tradeoffs wherever possible
Support fine-grain allocation of resources

41 Conclusions
SEEC is designed to help ease programmer burden
–Solves resource allocation problems
–Adapts to fluctuations in the environment
SEEC has three distinguishing features
–Incorporates goals and feedback directly from the application
–Allows independent specification of adaptation
–Uses an adaptive second-order control system to manage adaptation
Demonstrated the benefits of SEEC in several experiments
–SEEC can optimize performance per Watt for video encoding
–SEEC can adapt algorithms and resource allocation to meet goals in the face of power caps or other changes in the environment
SEEC navigates tradeoff spaces
–So build more tradeoffs into apps, system software, system hardware, etc.

42 SEEC References
Application Heartbeats interface:
–Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010.
Controlling heartbeat-enabled systems:
–Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling Software Applications within the Heartbeats Framework. CDC 2010.
–Maggio, Hoffmann, Agarwal, Leva. Control-Theoretic Allocation of CPU Resources. FeBID 2011.
Creating adaptive applications:
–Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042, 2009.
–Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.
The SEEC framework:
–Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Management of Multicore Resources. MIT-CSAIL-TR-2011-016, 2011.
–Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Computing. MIT-CSAIL-TR-2010-049, 2010.

