A Run-Time Feedback Based Energy Estimation Model for Embedded Systems
Selim Gürün, Chandra Krintz
Department of Computer Science, U.C. Santa Barbara


1 A Run-Time Feedback Based Energy Estimation Model for Embedded Systems
Selim Gürün, Chandra Krintz
Department of Computer Science, U.C. Santa Barbara
International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS), Seoul, Korea, October 22-25, 2006

CODES-ISSS’06 2 Power-Aware Execution: Big Picture
Power-aware methods divide task execution into operations and prepare an execution plan for each.
  Operation: the smallest user-visible unit of execution
  Typical operations: rendering a scene, translating a sentence, calculating a shortest path on a map
We need to know the energy cost of each plan.
Knowing the future energy cost of operations requires profiling them at run-time.
Identify Operations -> Profile at Runtime -> Predict Future Costs -> Develop Power-Aware Execution Strategy

CODES-ISSS’06 3 Outline
Extant run-time power profiling techniques
Power profiling methodologies for embedded computers
Proposed model
  Overview
  Model construction
  Capturing system dynamics
Evaluation
Summary and Conclusion

CODES-ISSS’06 4 Run-Time Energy Profiling: Overview
OS interfaces such as ACPI:
  + Provide a simple API to battery voltage sensors
  + OK for different hardware power levels
  - Very coarse
  - Not precise
Execution time:
  + Simple to measure
  + Fast and precise
  - Not correlated with power
  - Not suitable when hardware power levels change: DVS, sleep
HPMs (hardware performance monitors):
  + Fast access
  + Quite accurate
  - Architecture dependent
  - Not designed for power estimation; many events are missing

CODES-ISSS’06 5 Run-Time Energy Profiling: HPMs
CPU counters provide unparalleled insight into program behavior
  Cache and TLB misses
  Instructions executed per cycle (IPC)
How can we accurately gather program energy consumption by monitoring key parameters?
Use HPMs as pseudo CPU component access counters:
  Energy Consumption = I Cache * a0 + D Cache * a1 + ALU * a2 + …
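The counter-weighted sum on this slide can be sketched in a few lines of Python. This is only an illustration: the counter names and the per-event weights below are made up, and real weights would have to come from calibration against measured energy.

```python
# Hypothetical per-event energy weights (the a0, a1, a2, ... of the slide),
# in nanojoules per event. The numbers are invented for illustration only.
WEIGHTS_NJ = {
    "icache_access": 0.5,
    "dcache_access": 0.75,
    "alu_op": 0.25,
}


def estimate_energy_nj(counts, weights=WEIGHTS_NJ):
    """Return sum(count_i * a_i) over every counter listed in `weights`.

    Counters missing from `counts` are treated as zero.
    """
    return sum(weights[name] * counts.get(name, 0) for name in weights)


# Example counter snapshot for one profiling interval (made-up values).
sample_counts = {"icache_access": 1000, "dcache_access": 400, "alu_op": 800}
energy_nj = estimate_energy_nj(sample_counts)
print(energy_nj)  # 1000.0
```

The estimate is only as good as the weights, and it covers only the components that have a counter.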

CODES-ISSS’06 6 Run-Time Energy Profiling: HPMs
CPU counters provide unparalleled insight into program behavior
  Cache and TLB misses
  Instructions executed per cycle (IPC)
How can we accurately gather program energy consumption by monitoring key parameters?
Use HPMs as pseudo CPU component access counters:
  Energy Consumption = I Cache * a0 + D Cache * a1 + ALU * a2 + …
Useful but not enough:
  The CPU consumes only a portion of total energy; power-aware strategies need the full picture.
  Fails when hardware changes its behavior: DVS, sleep states
A different strategy is needed!

CODES-ISSS’06 7 Proposed Energy Profiling Model
Offline Analysis
Continuous model improvement at run-time
Fine-Grain Energy Estimation

CODES-ISSS’06 8 Case Study: Intel XScale on Stargate
32-bit XScale, 400 MHz
64 MB RAM
Runs Familiar Linux
No display
Wireless Compact Flash
XScale major HPM events:
  Instruction/data cache misses
  Data dependency stalls
  Instruction/data TLB misses
  Branches mispredicted
  Instructions executed
SCL

CODES-ISSS’06 9 Constructing Model
Are there any correlations between HPM values and full-system power consumption?
Absolutely! But some challenges exist.
Good correlation in the memory/CPU subsystem:
  High IPC -> CPU-intensive application
  High cache misses/hits -> memory-intensive application
But I/O is the problem!
  Some heuristics are possible, e.g. low memory activity and low IPC -> possible I/O wait state
  Better to use software counters embedded into drivers
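The heuristics on this slide might look like the following Python sketch. The function name, the misses-per-kilo-instruction metric, and every threshold are assumptions chosen for illustration; the slides do not give concrete cut-off values.

```python
def classify_interval(instructions, cycles, cache_misses,
                      high_ipc=0.8, low_ipc=0.2,
                      high_mpki=20.0, low_mpki=1.0):
    """Label one profiling interval from raw HPM counts.

    All thresholds are hypothetical defaults, not values from the slides.
    mpki = cache misses per 1000 instructions.
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    ipc = instructions / cycles
    mpki = 1000.0 * cache_misses / instructions if instructions else 0.0

    # Low memory activity and low IPC -> possible I/O wait state.
    if ipc <= low_ipc and mpki <= low_mpki:
        return "possible-io-wait"
    # Many cache misses -> memory-intensive.
    if mpki >= high_mpki:
        return "memory-intensive"
    # High IPC -> CPU-intensive.
    if ipc >= high_ipc:
        return "cpu-intensive"
    return "mixed"


print(classify_interval(900_000, 1_000_000, 100))     # cpu-intensive
print(classify_interval(300_000, 1_000_000, 15_000))  # memory-intensive
print(classify_interval(100_000, 1_000_000, 50))      # possible-io-wait
```

As the slide notes, a heuristic like this only guesses at I/O activity; software counters in the drivers observe it directly.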

CODES-ISSS’06 10 Model Coefficients
E = X1 * a1 + X2 * a2 + X3 * a3 + …
  Xi: independent variables
  ai: coefficients
Estimate coefficients using least squares linear regression (LSQ)
  Stable and simple
  Linearity assumption
LSQ model: which variables?
  Only major:
    Efficient, clear
    Easier to understand
    Less accurate
  All related:
    More accurate
    Run-time overhead
    Modeling difficulties due to variable dependencies
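A minimal pure-Python sketch of estimating the coefficients by least squares follows. Solving the normal equations (X^T X) a = X^T y with Gaussian elimination is one common way to do this and is an assumption here; the slides do not say how the fit was computed. The sample data and helper names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear variables?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Return coefficients a minimizing sum((y - X a)^2)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Synthetic training data: each row is (X1, X2) for one interval, e.g.
# clock cycles and data stalls in arbitrary units. Values are made up.
X = [[100, 5], [200, 20], [150, 40], [300, 10], [250, 30]]
true_a = [2.0, 10.0]  # coefficients used to generate y
y = [sum(a * x for a, x in zip(true_a, row)) for row in X]

coeffs = fit_least_squares(X, y)
print(coeffs)  # close to [2.0, 10.0] because the data is noise-free
```

With real measurements the recovered coefficients would differ from any "true" values, and the fit degrades when the variables are strongly dependent on each other.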

CODES-ISSS’06 11 Parameter Selection & Dependencies
Hard to include all variables:
  Too many parameters clutter the model
  Parameter dependencies -> unstable parameter estimates
    E.g. Volume = a0 + a1 * pounds + a2 * grams
Work-around is non-trivial; HPM characteristics, e.g.:
  TLB miss -> more CPU cycles & cache misses
  Memory stall -> fewer instructions executed
Multicollinearity!
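The pounds-and-grams example can be made concrete with a short Python sketch. Because grams is an exact multiple of pounds, the two variables are perfectly correlated and very different coefficient pairs produce the same predictions, which is why a regression cannot pin the coefficients down. The sample weights and helper names are invented for illustration.

```python
import math

GRAMS_PER_POUND = 453.59237


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def predict(a1, a2, pounds, grams):
    """Model from the slide without the intercept: a1*pounds + a2*grams."""
    return a1 * pounds + a2 * grams


pounds = [1.0, 2.0, 3.5, 5.0]               # made-up sample
grams = [p * GRAMS_PER_POUND for p in pounds]

r = pearson(pounds, grams)

# Two very different coefficient pairs ...
pred_a = [predict(GRAMS_PER_POUND, 0.0, p, g) for p, g in zip(pounds, grams)]
pred_b = [predict(0.0, 1.0, p, g) for p, g in zip(pounds, grams)]

# ... that fit the data equally well.
same = all(math.isclose(a, b) for a, b in zip(pred_a, pred_b))
print(r, same)
```

HPM events are rarely exact multiples of each other, but strongly dependent events (a TLB miss also costing cycles and cache misses) push the fit toward the same instability.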

CODES-ISSS’06 12 Run-Time Energy Estimation
Simple model
  Computation: Core Clock Cycles, Data Stalls
  Communication: Core Clock Cycles, Bytes Transmitted, Bytes Received
Complex model
  Computation: Core Clock Cycles, Instruction Cache Misses, Instructions NDelivered, Data Stalls, ITLB Misses, DTLB Misses
  Communication: Core Clock Cycles, Bytes Transmitted, Bytes Received, Packets Transmitted, Packets Received

CODES-ISSS’06 13 Run-Time Model Improvement
Global coefficients
  Compute using the off-line model
Continuously update coefficients
  Improve using the most recent data
  Gradually phase out previous measurements
  Recursive least squares (RLS) with exponential decay
  Smaller decay factor -> more agile
Global Coefficients -> Measure Power -> Update with RLS
Model parameters:
  Decay factor
  Update period
  Measurement error
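Recursive least squares with exponential decay can be sketched as below. This is the textbook RLS update with a forgetting factor, written in plain Python; the class name, the initial uncertainty value, and the demo data are assumptions, and the slides do not show the authors' implementation.

```python
class RecursiveLeastSquares:
    """Textbook RLS with an exponential forgetting (decay) factor."""

    def __init__(self, initial_coeffs, decay=0.95, initial_uncertainty=1000.0):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.coeffs = list(initial_coeffs)  # e.g. off-line global coefficients
        self.decay = decay                  # smaller -> forgets faster
        k = len(self.coeffs)
        # P tracks how uncertain the coefficients are; start with a large value.
        self.P = [[initial_uncertainty if i == j else 0.0 for j in range(k)]
                  for i in range(k)]

    def predict(self, x):
        return sum(a * xi for a, xi in zip(self.coeffs, x))

    def update(self, x, measured):
        """Fold one (counter vector, measured energy) pair into the model."""
        k = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(k)) for i in range(k)]
        denom = self.decay + sum(x[i] * Px[i] for i in range(k))
        gain = [v / denom for v in Px]
        error = measured - self.predict(x)
        self.coeffs = [a + g * error for a, g in zip(self.coeffs, gain)]
        # P is symmetric, so x^T P equals Px transposed.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.decay
                   for j in range(k)] for i in range(k)]
        return error


# Demo with made-up numbers: the off-line coefficients are [2.0, 10.0],
# but the system now behaves as if they were [3.0, 8.0].
model = RecursiveLeastSquares([2.0, 10.0], decay=0.95)
samples = [[100, 5], [200, 20], [150, 40], [300, 10], [250, 30]]
drifted = [3.0, 8.0]
for step in range(40):
    x = samples[step % len(samples)]
    measured = sum(a * xi for a, xi in zip(drifted, x))  # noise-free feedback
    model.update(x, measured)
print(model.coeffs)  # approaches [3.0, 8.0]
```

A decay of 1.0 weights all past measurements equally; lowering it makes the model follow changes faster but also makes it more sensitive to noisy readings.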

CODES-ISSS’06 14 Feedback Source: DS 2760
Measures current flow into and out of the battery
Internally: a small A/D converter attached to a high-precision internal resistor
Pros/Cons:
  + Highly available, e.g. iPAQs, sensor network gateways, cell phones
  - Not precise enough for monitoring task energy consumption: 0.25 mAh error in each reading
  - Slow, one-wire serial interface
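To put the 0.25 mAh reading error into energy terms, a charge reading can be converted to joules once a battery voltage is known. The sketch below uses the exact relation 1 mAh = 3.6 C; the 4.0 V battery voltage is an assumption for illustration and is not given on the slide.

```python
COULOMBS_PER_MAH = 3.6  # 1 mAh = 0.001 A * 3600 s


def mah_to_joules(charge_mah, voltage_v):
    """Energy (J) represented by a charge (mAh) at a given voltage (V)."""
    return charge_mah * COULOMBS_PER_MAH * voltage_v


ASSUMED_BATTERY_VOLTAGE_V = 4.0  # hypothetical value, not from the slides
reading_error_j = mah_to_joules(0.25, ASSUMED_BATTERY_VOLTAGE_V)
print(reading_error_j)
```

Under that assumed voltage, a single reading can be off by a few joules, which is large compared with the energy of a short task.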

CODES-ISSS’06 15 Stargate and Our Evaluation Bench
[Figure labels: PowerTool VPerfmon VPMon SCL; High-precision Data Acquisition Device; Programmable Power Supply]

CODES-ISSS’06 16 Methodology
Collect energy consumption periodically
  Every 10 million instructions (a so-called interval)
Validate model accuracy on imprecise measurement data
  Inject uniformly distributed random error
  Evaluate various precision (error) levels: 1X to 8X
Predict energy consumption of each interval
  Continuously improve model parameters every 10M * K intervals
Use a large group of workloads
  Computational benchmarks
  Computational + communication-oriented benchmarks
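The error-injection step can be sketched as follows. The helper names, the sample energies, and the noise half-width are all hypothetical, and how the 1X to 8X levels map onto a concrete error magnitude is not stated on the slide, so the sketch simply takes the maximum error as a parameter.

```python
import random


def inject_uniform_error(readings, max_error, rng):
    """Add independent uniform noise in [-max_error, +max_error] to each reading."""
    return [r + rng.uniform(-max_error, max_error) for r in readings]


def mean_abs_percent_error(predicted, measured):
    """Average of |predicted - measured| / measured over all intervals, in percent."""
    ratios = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(ratios) / len(ratios)


rng = random.Random(2006)  # fixed seed so the sketch is repeatable

# Made-up "true" energy per 10M-instruction interval, in millijoules.
true_energy_mj = [10.0, 12.0, 9.5, 11.0]

# Simulate an imprecise sensor by perturbing each interval's reading.
noisy_energy_mj = inject_uniform_error(true_energy_mj, max_error=0.5, rng=rng)

# How far the noisy readings are from the truth, as an average percentage.
error_pct = mean_abs_percent_error(noisy_energy_mj, true_energy_mj)
print(noisy_energy_mj, error_pct)
```

Repeating the run with different `max_error` values gives one way to compare model accuracy across precision levels.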

CODES-ISSS’06 17 Static vs. Adaptive Models

CODES-ISSS’06 18 Average Error Rates
Error rates and interval sizes, simple model
Columns (measurement precision): Interval Size, 1X, 2X, 4X, 8X, Best
  %  29.9%  14.5%  16.9%  3.8%
  %  24.1%  12.9%  7.3%  2.7%
  %  22.0%  9.1%  7.7%  2.8%

CODES-ISSS’06 19 Average Error Rates: Complex Model
Columns: Interval Size, 8X, Best (with Simple Model for comparison)
  %  28.1%  3.8%  4.3%
  %  33.3%  2.7%  3.8%
  %  24.0%  2.8%  4.1%
Measurement imprecision reduces the quality of the complex model more than that of the simple one!

CODES-ISSS’06 20 Related Work
High-end CPU power models
  Define CPU component access rate using HPM access heuristics
  Power consumption of OS calls as a function of IPC
Embedded CPU power models
  Five HPM counters for XScale
  Also evaluated a memory model
Memory models
  UltraSparc memory subsystem
All of the above are static models
Power profiling setups
  PowerScope

CODES-ISSS’06 21 Summary & Conclusions
Our goal: an accurate, efficient run-time power profiling system
  Hardware counters are key
  Define software counters for I/O
  Smart battery monitors expose dynamics in power behavior
  We propose a hybrid system that combines both
Lessons learned
  Dynamic models are much better than static ones in power modeling
  Models should decay old measurements conservatively when measurement errors are present
  Measurement errors in the presence of multicollinearity can be deadly

CODES-ISSS’06 22 Backup Slides

CODES-ISSS’06 23 Decay Factor vs. Accuracy

CODES-ISSS’06 24 Execution Cost

CODES-ISSS’06 25 Benefit from an Offline Profiler

CODES-ISSS’06 26 Power-Aware Execution: Case Study
Speech recognition execution plans (src: Flinn’01)
  Baseline Local Reduced Remote Reduced
Requires run-time power prediction of different execution plans!