Hardware/Software Integration in Portable Systems
Trevor Pering, University of California, Berkeley

Outline
1. Background: The InfoPad Project
2. Energy Efficient Microprocessors
3. System Design Environment
This talk describes several research projects over the last six years that have relied heavily on integrated hardware/software design.

InfoPad Overview
Perform all computation in the network to minimize client energy dissipation.
Workstation capabilities on a portable device!
[Diagram: InfoPad; High-bandwidth radio connection; Wireless Basestation; Centralized Application Compute Server; Internet; Database]

InfoPad Software Architecture
Communicate through centralized server to provide transparent ‘wired’ semantics.
Maintain state in the network, not on the Pad.
Transmit audio and raw bitmaps across the wireless link.
Example: Hand-held speech-enabled web-browser
[Diagram: InfoPad; Wireless Basestation; “PadServer”; Speech Recognizer; Web Browser; Internet]

InfoPad Hardware Flexibility
Use hardware/software integration to provide energy-efficient high-level functionality.
Only header sent to microprocessor.
Entire packet routed to dedicated hardware.
Embedded software responsible for high-level functions.
Main data-flow handled by custom low-power ASICs.
[Diagram: Radio; RX Packet; Packet Header; 10 MIPS μProcessor (Control, Statistics, Reliability, Debugging); Frame-buffer update; Frame Buffer]

InfoPad Evolution
High-level system design optimizes complete solution and drives new research.
Total Power: ~7 W
Where did the power go? No local computation? Commercial radios; Commercial DC/DC; Inefficient implementation
[Diagram: InfoPad; Intercom; Energy-Efficient Processors]

Outline
1. The InfoPad Project: Energy-efficient integrated system design
2. Energy Efficient Microprocessors: Dynamic Voltage Scaling
3. System Design Environment

Dynamic Voltage Scaling (DVS)
Trade off energy and speed through voltage to minimize energy consumed.
$E \propto V^2$
$f_{max} \propto (V - c)/V$
$E \propto f_{max}$
Energy ~ Work × Speed
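
The two proportionalities on this slide can be tabulated to see how energy per operation falls as the supply voltage, and with it the maximum clock rate, is lowered. This is a minimal sketch in normalized units. The function names, the 3.3 V reference supply, and the value c = 0.8 V are illustrative assumptions, not figures from the talk.

```python
# Illustrative model of the DVS relations E ∝ V^2 and f_max ∝ (V - c)/V.
# V_REF and C are assumed values chosen only for demonstration.
V_REF = 3.3  # assumed reference supply voltage (volts)
C = 0.8      # assumed constant c in (V - c)/V (volts)


def relative_energy_per_op(v, v_ref=V_REF):
    """Energy per operation relative to the reference voltage (E ∝ V^2)."""
    return (v / v_ref) ** 2


def relative_max_speed(v, c=C, v_ref=V_REF):
    """Maximum clock rate relative to the reference voltage (f_max ∝ (V - c)/V)."""
    if v <= c:
        return 0.0  # at or below c the model gives no usable speed
    return ((v - c) / v) / ((v_ref - c) / v_ref)


if __name__ == "__main__":
    for v in (3.3, 2.5, 1.8, 1.2):
        print(f"V={v:.1f}  speed={relative_max_speed(v):.2f}  "
              f"energy/op={relative_energy_per_op(v):.2f}")
```

Under these assumed constants, both speed and energy per operation shrink together as voltage drops, which is the trade-off the slide describes.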

DVS vs. Fixed-Voltage
Reduce both speed and voltage to minimize both power and energy.
10x energy savings
DVS: Voltage: 3x, Speed: 10x, Energy: 10x, Power: 100x
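
The 100x power figure is consistent with the other numbers on the slide, since average power is energy per operation times operations per second. A short worked check follows, taking the slide's 3x voltage and 10x speed reductions as given and using $E \propto V^2$. The subscripts "fixed" and "DVS" are labels introduced here, with $E$ the energy per operation and $f$ the operation rate.

```latex
\[
\frac{E_{\text{fixed}}}{E_{\text{DVS}}}
= \left(\frac{V_{\text{fixed}}}{V_{\text{DVS}}}\right)^{2}
= 3^{2} = 9 \approx 10,
\qquad
\frac{P_{\text{fixed}}}{P_{\text{DVS}}}
= \frac{E_{\text{fixed}}\, f_{\text{fixed}}}{E_{\text{DVS}}\, f_{\text{DVS}}}
\approx 10 \times 10 = 100 .
\]
```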

DVS Project Charter
Design microprocessor system to support low-power devices.
Scale voltage of entire microprocessor system!
General-purpose software controls system voltage.
I/O operations independent of processor architecture.
[Diagram: lpARM; SRAM; I/O; Dynamic Voltage Regulator; Intercom]

DVS Scheduling Framework
Use real-time framework to constrain task voltage scheduling.
Idle time represents wasted energy.
Lower speed, lower voltage, lower energy.
Energy ~ Work × Speed
[Plot: µProc. Speed vs. Time; labels: Start, Deadline, Work]
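
A small numeric comparison makes the cost of idle time concrete. The sketch below reads the slide's relation as energy proportional to work times speed, in normalized units where speed 1.0 is full speed and one unit of work takes one time unit at full speed. That reading, the function names, and the example numbers are assumptions.

```python
def slowest_feasible_speed(work, start, deadline, max_speed=1.0):
    """Lowest constant speed that finishes `work` between start and deadline."""
    window = deadline - start
    if window <= 0:
        raise ValueError("deadline must be after start")
    speed = work / window
    if speed > max_speed:
        raise ValueError("task cannot meet its deadline even at full speed")
    return speed


def task_energy(work, speed):
    """Normalized energy under the assumed model Energy ~ Work x Speed."""
    return work * speed


if __name__ == "__main__":
    work, start, deadline = 0.4, 0.0, 1.0  # example task (assumed numbers)

    # Strategy 1: run at full speed, then sit idle until the deadline.
    full_speed_energy = task_energy(work, 1.0)

    # Strategy 2: stretch the task to fill the window at reduced speed.
    slow = slowest_feasible_speed(work, start, deadline)
    stretched_energy = task_energy(work, slow)

    print(f"full speed: {full_speed_energy:.2f}, stretched: {stretched_energy:.2f}")
```

In this example the stretched task uses about 0.16 units of energy against 0.4 at full speed, so the 60% idle time in the first strategy corresponds to energy that a slower, lower-voltage run would have saved.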

DVS Scheduling
Schedule all tasks so as to minimize system energy dissipation.
Similar to minimizing $\sum x_i^2$ with constant $\sum x_i$.
Task runs faster to meet timing constraints.
[Plot: µProc. Speed vs. Time; starts S1, S2, S3; deadlines D1, D2, D3; work W1, W2, W3]
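
The analogy can be made precise with a short Lagrange-multiplier argument. This is a sketch of the standard result rather than a derivation taken from the talk; $n$, $C$ and $\lambda$ are symbols introduced here.

```latex
\[
\min_{x_1,\dots,x_n} \; \sum_{i=1}^{n} x_i^{2}
\quad \text{subject to} \quad \sum_{i=1}^{n} x_i = C
\]
\[
\frac{\partial}{\partial x_i}
\left[ \sum_{j=1}^{n} x_j^{2} - \lambda \Big( \sum_{j=1}^{n} x_j - C \Big) \right]
= 2 x_i - \lambda = 0
\;\Longrightarrow\;
x_i = \frac{\lambda}{2} = \frac{C}{n} \quad \text{for all } i .
\]
```

The minimum is reached when all $x_i$ are equal. Read in scheduling terms, this suggests that the lowest-energy schedule keeps the speed as uniform as the deadlines allow, and a task runs faster only where a timing constraint forces it.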

DVS Simulation
Simulate run-time scheduler to fully understand voltage-scaling behavior.
[Plot: Speed vs. Time; starts S1, S2, S3; deadlines D1, D2, D3]
[Diagram labels: Theory; Implementation; Reality; Task Variance; Weather; Interrupts; User Input; Cache Behavior; Scheduling Overhead; Intercom]

Simulation Benchmarks
Model accurate I/O interaction to evaluate effects of voltage scaling.
Benchmarks: Audio Decryption, Graphical UI, MPEG Decode, Run-Time Support
[Diagram labels: Intercom; SPEC]

Simulation Infrastructure
Develop support environment to model complete software system.
Application code: { Frame_Start(deadline); Decode_MPEG_Frame(); Frame_Finish(); }
MPEG: Priority 80; GUI: Priority 23
[Diagram labels: GUI; MPEG; Run-time Scheduler; Voltage Scheduler; Application support libraries; Windowing; Cryptography; I/O Support; Speed  Priority; lpARM]
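
The bracketed calls on this slide suggest that an application announces each frame's deadline to the run-time system and reports when the frame is done. The following sketch mimics that pattern. Only the names Frame_Start, Decode_MPEG_Frame and Frame_Finish come from the slide; the FrameTracker class, its fields, the injectable clock, and the 30 frames-per-second rate are hypothetical.

```python
import time


class FrameTracker:
    """Hypothetical bookkeeping behind Frame_Start()/Frame_Finish()."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = None
        self._deadline = None
        self.history = []  # list of (elapsed_seconds, met_deadline)

    def frame_start(self, deadline):
        """Mark the start of a frame that is due `deadline` seconds from now."""
        self._start = self._clock()
        self._deadline = self._start + deadline

    def frame_finish(self):
        """Mark the end of the frame and record whether the deadline was met."""
        if self._start is None:
            raise RuntimeError("frame_finish() called without frame_start()")
        now = self._clock()
        record = (now - self._start, now <= self._deadline)
        self.history.append(record)
        self._start = self._deadline = None
        return record


def decode_mpeg_frame():
    """Placeholder for the application's real per-frame work."""
    pass


if __name__ == "__main__":
    tracker = FrameTracker()
    tracker.frame_start(deadline=1 / 30)  # assumed 30 frames per second
    decode_mpeg_frame()
    print(tracker.frame_finish())
```

Keeping a per-frame history is one way a voltage scheduler could learn how much work a typical frame needs; whether the original system stored exactly this information is not stated on the slide.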

Simulation Run-Time Algorithm
Relax scheduling constraints to schedule efficiently in real-time.
Schedule all tasks as if they were currently runnable: O(n log n)
Speed = Work / Time
Execute W1 because W2 is not yet runnable.
[Plot: µProc. Speed vs. Time; starts S1, S2, S3; deadlines D1, D2, D3; work W1, W2, W3; Present time; O(n^3)]
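
One plausible reading of the relaxed algorithm is sketched below: ignore start times, sort the pending tasks by deadline, and pick the lowest speed that still completes the accumulated work before every deadline. The sort dominates the cost, which matches the O(n log n) bound on the slide. The function name, the task representation, and the normalized maximum speed of 1.0 are assumptions, and this is not claimed to be the scheduler actually used in the project.

```python
def relaxed_speed(tasks, now, max_speed=1.0):
    """Pick a processor speed, treating every task as runnable right now.

    tasks: iterable of (remaining_work, deadline) pairs, with work measured
           in time units at full speed (an assumed normalization).
    now:   the present time, in the same units as the deadlines.
    """
    speed = 0.0
    cumulative_work = 0.0
    # Earliest deadline first; sorting makes the whole routine O(n log n).
    for work, deadline in sorted(tasks, key=lambda task: task[1]):
        cumulative_work += work
        time_left = deadline - now
        if time_left <= 0:
            return max_speed  # a deadline has already passed: run flat out
        # Speed = Work / Time for everything due by this deadline.
        speed = max(speed, cumulative_work / time_left)
    return min(speed, max_speed)


if __name__ == "__main__":
    pending = [(2.0, 10.0), (3.0, 5.0)]  # example tasks (assumed numbers)
    print(relaxed_speed(pending, now=0.0))
```

In the example, the task due at time 5 needs speed 3/5, while both tasks together need only 5/10 by time 10, so the tighter early deadline sets the speed.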

Run-Time Scheduling Dynamics
Periodically re-evaluate schedule to adjust for unforeseen events.
Initial speed estimate
Workload calculated to be average of previous frames: E(work)
Thread accomplishing more than expected, reduce speed
Deadline exceeded, increase speed
Higher-priority task
Run faster to make up lost time
Optimal schedule
[Plot: µProc. Speed vs. Time]
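
The expected-work estimate mentioned on this slide can be sketched as a running average over recent frames. The slide does not say how many frames are averaged or what is used before any history exists, so the window size, the initial guess, and the class name below are all assumptions.

```python
from collections import deque


class WorkloadEstimator:
    """Estimate E(work) for the next frame as the mean of recent frames."""

    def __init__(self, window=8, initial=1.0):
        self._samples = deque(maxlen=window)  # oldest samples drop off
        self._initial = initial               # used until a frame is recorded

    def record(self, work):
        """Store the measured work (for example, cycles) of a finished frame."""
        self._samples.append(work)

    def expected(self):
        """Return the average of the stored frames, or the initial guess."""
        if not self._samples:
            return self._initial
        return sum(self._samples) / len(self._samples)


if __name__ == "__main__":
    estimator = WorkloadEstimator(window=3, initial=10.0)
    for measured in (4.0, 6.0, 8.0, 10.0):
        estimator.record(measured)
    print(estimator.expected())
```

An estimate like this only sets the initial speed for a frame; the periodic re-evaluation described on the slide is what corrects it when a frame turns out to be lighter or heavier than average.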

Run-Time Execution Trace
Simulate the entire system to measure overhead and effectiveness.
Scheduling Overhead < 3%
[Trace over Time: System Idle; Voltage Scheduler; MPEG Decoder; Interrupt Handler; Frame Deadlines; μProcessor Speed]

Results: Run-Time Voltage Scaling
Dynamic Voltage Scaling significantly reduces energy dissipation!
Normalized to 3.3 V fixed-voltage processor
Combination of independent benchmarks
Includes 10% DVS implementation overhead

Run-Time Performance Analysis
Application characteristics strongly affect voltage scaling performance.
Software can automatically recognize and adjust for bi-modal GUI distribution.
[Plots: Audio; MPEG; GUI; axis from 0 to 2x deadline, normalized to deadline at max processor speed]

Beyond Dynamic Voltage Scaling
Voltage scheduling framework can be applied to many different designs and technologies.
[Plot: Speed vs. Time; starts S1, S2, S3; deadlines D1, D2, D3]
[Diagram labels: Intercom; lpARM; DSP; CPU; mem; Disk; *; +]

Outline
1. The InfoPad Project: Energy-efficient integrated system design
2. Dynamic Voltage Scaling: Software control to minimize energy
3. System Design Environment: Top-Down Microprocessor Design

The lpARM Project
Combine diverse backgrounds to develop an energy-efficient microprocessor.
0.6 µm DVS ARM8 processor with 16 kB on-chip cache
Speed: MHz Voltage: V Energy: nJ/cycle Power: mW
Control & Software: Trevor Pering
Processor Design: Tom Burd
Dynamic Voltage Regulator: Tony Stratakos
Processor validation & optimization
Silicon expected May 1999
[Diagram: lpARM; SRAM; I/O; Dynamic Voltage Regulator]

lpARM Top-Down Design
Use top-down design flow to optimize and verify design.
Iterative design
[Diagram: Functional Specification; Functional Simulation (ANSI C); Instruction Simulation (Cycle-level); Hardware Simulation (VHDL/Layout); =? comparison; Intercom; lpARM]

lpARM Feature Specification
Simulate high-level system to discover desired implementation features.
Energy-saving processor features: Dynamic speed control, Execution cycle counter, Low-power sleep mode, Interrupt speed control, …
[Diagram: System Simulation; Functional Specification; Scale voltage to minimize energy]

lpARM End-to-End Verification
Compare inter-simulation results to verify end-to-end design.
Application-level frame checksum:
Frame 1 Chk: 0x2dbf92c2
Frame 2 Chk: 0x32fe4cda
Frame 3 Chk: 0x3aa0d4ac
Frame 4 Chk: 0x93efa7c8
Frame 5 Chk: 0x28f4efa9
Strict cycle-level comparison
Memory hierarchy coherency
[Diagram: Functional Specification; Functional Simulation; Instruction Simulation; VHDL Simulation; Transistor Simulation; =? comparison; lpARM; SRAM]
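
The application-level check on this slide amounts to computing one checksum per decoded frame in each simulator and comparing the two lists. The sketch below illustrates the idea; the slide does not say which checksum was used, so CRC-32 via Python's zlib module is a stand-in, and the function names and sample frame data are hypothetical.

```python
import zlib


def frame_checksums(frames):
    """Return one 32-bit checksum per frame (CRC-32 is an assumed stand-in)."""
    return [zlib.crc32(frame) & 0xFFFFFFFF for frame in frames]


def first_mismatch(reference, candidate):
    """Return the 1-based number of the first differing frame, or None."""
    for number, (ref, cand) in enumerate(zip(reference, candidate), start=1):
        if ref != cand:
            return number
    if len(reference) != len(candidate):
        # One run produced fewer frames; the first missing frame mismatches.
        return min(len(reference), len(candidate)) + 1
    return None


if __name__ == "__main__":
    # Stand-in frame buffers from two simulators (made-up bytes).
    functional_frames = [b"frame-1", b"frame-2", b"frame-3"]
    instruction_frames = [b"frame-1", b"frame-2", b"frame-3"]

    reference = frame_checksums(functional_frames)
    candidate = frame_checksums(instruction_frames)
    for number, checksum in enumerate(reference, start=1):
        print(f"Frame {number} Chk: 0x{checksum:08x}")
    print("first mismatch:", first_mismatch(reference, candidate))
```

Comparing compact checksums instead of full frame buffers keeps the check cheap enough to run across simulators of very different speeds, at the cost of only identifying which frame diverged rather than where.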

lpARM Application Evaluation
Evaluate target applications to accurately represent system behavior.
Direct-mapped cache is very application sensitive.
Intra-group normalized to 32-CAM
‘DVS energy’ includes system performance

lpARM System-Level Optimization
Evaluate the complete system early-on to direct architectural design.
Other parameters analyzed: Write-back/Write-through, Allocation policy, Write-buffer size, Associativity

lpARM Design Summary
Simulating top-down hardware/software design improves end result.
Hardware and software components combine to form a system solution.
Scale voltage to minimize energy.
[Plot: Speed vs. Time; starts S1, S2, S3; deadlines D1, D2, D3]
[Diagram labels: Top-down; Control & Software; Processor Design; Voltage Regulator; Intercom; lpARM]

Conclusion
1. The InfoPad Project: Energy-efficient integrated system design
2. Dynamic Voltage Scaling: Software control to minimize energy
3. Top-Down Microprocessor Design: Application-driven energy optimization
Effective energy-efficient systems require complete top-to-bottom integrated design.