Adaptive Video Coding to Reduce Energy on General Purpose Processors
Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones
University of Illinois at Urbana-Champaign

Introduction
- Wireless multimedia increasingly common
- Recent advances reduce constraints:
  - 2 GHz+ processors
  - High-speed wireless networks
- Systems are now energy-limited
- Energy management is essential

Adaptation
- Adaptation is key to energy management
- Hardware adaptation already common
- Software adaptation also possible
- Challenges:
  - How do we control adaptations?
  - How do we coordinate different adaptations?

GRACE Project
- Targets mobile multimedia devices
- Coordinated adaptation of all system layers:
  - Hardware, application, network, OS
- Complete cross-layer adaptation framework
- Preserves separation between layers

Goals of this work
- Target wireless video transmission
- Adapt the application: adaptive video encoder
- Adapt the hardware: adaptive CPU
- Implement part of the GRACE framework
- Trade off between CPU and network energy

Contributions
- Apply existing adaptive-CPU research
- Energy-adaptive video encoder
  - Trades off between network and CPU energy
  - Allows adaptation with fixed QoS
- Cross-layer adaptation framework
  - Coordinates app and CPU adaptation
  - Preserves logical separation between layers
- 20% energy savings over existing systems

Presentation Overview
- System model
- System architecture and design
- Cross-layer adaptation process
- Results

System Model
- Total Energy = CPU Energy + Network Energy
- [System diagram: Video Capture feeds an Adaptive Video Encoder running on an Adaptive CPU; encoded frames go out over the Wireless Network; a Control component coordinates the encoder and CPU adaptations]
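
The model above, written with the per-instruction and per-byte energies that appear on the later Optimization Algorithm slides (symbols are mine, not the authors'):

    E_{total} = E_{CPU} + E_{net} = N_{instr} \cdot E_{instr}(c) + N_{byte} \cdot E_{byte}

where N_{instr} and N_{byte} depend on the encoder (application) configuration, E_{instr}(c) on the CPU configuration c, and E_{byte} is the measured per-byte cost of the wireless interface.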

CPU Hardware Adaptation [MICRO'01]
- Reduce performance to save energy
- Voltage and frequency scaling
  - Lower frequency → lower voltage → lower energy
- Architecture adaptation
  - Issue width
  - Active functional units (ALUs, etc.)
  - Instruction window size
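
As background on why this works (a standard first-order CMOS relation, not part of the slides): dynamic energy per task scales roughly as

    E_{dyn} \approx C_{eff} \, V_{dd}^{2} \, N_{cycles}, \qquad f_{max} \propto (V_{dd} - V_t)^{\alpha} / V_{dd} \quad (\alpha \approx 1.3\text{--}2)

so a lower frequency target permits a lower supply voltage V_{dd}, cutting energy roughly quadratically, while architecture adaptation (narrower issue width, fewer active units) reduces the effective switched capacitance C_{eff}.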

Adaptive Encoder
- Based on the TMN H.263 encoder
  - Changed to logarithmic motion search
- Encoder adapts for energy
  - Trades off between network and CPU energy
  - More computation → fewer bits
- Adapt motion search and DCT
  - Computationally expensive
  - Eliminating them affects primarily the bit rate

Adaptive Encoder Details
- Motion search and DCT thresholds
  - Terminate motion search early when SAD is under threshold
  - Skip the DCT if a block's SAD is under threshold
  - Transmit a "DCT flag" bit for each 8x8 block
  - Extends the H.263 standard
- Adaptation effect of setting the thresholds to infinity:
  - Reduces CPU load by ~50%
  - Increases data rate by 2x or more
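
A minimal C sketch of the two threshold knobs described on this slide. The names (app_config, ms_threshold, dct_threshold, sad_8x8) are illustrative, not the actual TMN encoder identifiers, and the encoder plumbing around them is omitted.

    #include <stdlib.h>

    /* Sum of absolute differences over an 8x8 block. */
    int sad_8x8(const unsigned char *cur, const unsigned char *ref, int stride)
    {
        int sad = 0;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
        return sad;
    }

    /* The two adaptation knobs; raising them trades bits for CPU cycles.
     * Setting both to "infinity" roughly halves CPU load but can double
     * the data rate, as the slide notes. */
    struct app_config {
        int ms_threshold;   /* terminate motion search once best SAD < this */
        int dct_threshold;  /* skip the DCT for blocks whose SAD < this     */
    };

    /* Checked after each refinement step of the logarithmic motion search. */
    int stop_motion_search(int best_sad, const struct app_config *cfg)
    {
        return best_sad < cfg->ms_threshold;
    }

    /* Decides the per-8x8-block "DCT flag" bit (the H.263 extension):
     * 1 = coefficients follow, 0 = the block's DCT was skipped. */
    int dct_flag(int block_sad, const struct app_config *cfg)
    {
        return block_sad >= cfg->dct_threshold;
    }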

Adaptation Control
- When do we adapt?
  - Adapt before every frame
- What configurations do we choose?
  - Must minimize total CPU + network energy
  - Must complete each frame within its allocated time
- How do we find the optimal configurations?
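
Written as the constrained optimization the controller solves each frame (notation is mine):

    \min_{a,\,c} \; E_{CPU}(a, c) + E_{net}(a) \quad \text{subject to} \quad T_{exec}(a, c) \le T_{frame}

where a ranges over the encoder (threshold) configurations, c over the CPU configurations, and T_{frame} is the frame's allocated time.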

Optimization
- Application and CPU reconfiguration are linked
  - Application reconfiguration changes the workload
  - CPU reconfiguration changes performance
  - App config affects the optimal CPU configuration … and vice versa
- Two-stage approach:
  1. For each app config, find the best CPU config and its energy
  2. Pick the lowest-energy application configuration

Optimization Algorithm
1. For each app config, find:
   - Best CPU config: completes the frame in time with the least energy [MICRO'01]
   - CPU energy = instruction count × energy per instruction [MICRO'01] (requires a predicted instruction count)
   - Network energy = byte count × energy per byte [measured on WaveLAN] (requires a predicted byte count)
   - Total energy = CPU energy + network energy
2. Pick the app config with the lowest total energy
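
A C sketch of the two-stage search above, run once per frame. The function names and the cpu_optimizer() interface are placeholders standing in for the [MICRO'01] CPU optimizer and the offline-fitted predictors; they are not the authors' actual APIs.

    #include <float.h>

    struct cpu_choice {
        int    config;             /* index of the chosen CPU configuration */
        double energy_per_instr;   /* Joules per instruction in that mode   */
    };

    /* Placeholder for the [MICRO'01] optimizer: returns the lowest-energy CPU
     * configuration that still finishes the predicted work in the deadline. */
    struct cpu_choice cpu_optimizer(double predicted_instrs, double frame_time);

    /* Placeholders for the offline-fitted predictors (see the Predictors slide). */
    double predict_instr_count(int app_config);
    double predict_byte_count(int app_config);

    /* Stage 1 builds the per-app-config energy estimates; stage 2 keeps the
     * minimum.  energy_per_byte is the measured WaveLAN per-byte cost. */
    int choose_app_config(int num_app_configs, double frame_time,
                          double energy_per_byte, struct cpu_choice *best_cpu)
    {
        double best_energy = DBL_MAX;
        int    best_app    = 0;

        for (int a = 0; a < num_app_configs; a++) {
            double instrs = predict_instr_count(a);
            double bytes  = predict_byte_count(a);

            struct cpu_choice cpu = cpu_optimizer(instrs, frame_time);

            double total = instrs * cpu.energy_per_instr   /* CPU energy     */
                         + bytes  * energy_per_byte;       /* network energy */

            if (total < best_energy) {        /* stage 2: pick lowest energy */
                best_energy = total;
                best_app    = a;
                *best_cpu   = cpu;
            }
        }
        return best_app;   /* handed to the application and CPU adaptors */
    }

The selected application and CPU configurations would then be installed before the frame is captured, encoded, and transmitted, as in the Stage 2 diagram below.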

Adaptation Process: Stage 1
[Diagram: for each application configuration, predictors estimate the next frame's instruction count and byte count; the CPU Optimizer finds the best CPU configuration; the CPU Energy Estimator and the Network Energy Estimator predict CPU and network energy, and their sum becomes that configuration's entry (Conf 1 … Conf n) in the app-configuration energy table.]

Adaptation Process: Stage 2
[Diagram: the lowest-energy entry is picked from the app-configuration energy table; the chosen configuration is handed to the CPU Adaptor and the Application Adaptor, and the frame is then captured, encoded, and transmitted.]

Predictors
- How do we predict instruction and byte counts?
  - Fixed software → use the previous frame's data
  - Adaptive software → this no longer works!
- Solution: offline profiling
  - Encode reference sequences offline
  - Transition randomly between app configs
  - Fit predictors to the transitions between configs
    - Map the last frame's instruction and byte counts to the new app config
    - Linear, 1st-order predictors
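
A sketch of what "linear, 1st-order predictors fit to transitions between configs" could look like in C. The table layout and names are assumptions; the slides only say that the coefficients are fit offline, one model per configuration transition.

    #define NUM_APP_CONFIGS 4            /* illustrative size */

    struct lin_model { double a, b; };   /* y = a * x + b, fit offline */

    /* model[old][new] maps the count measured under the previous frame's
     * configuration to a prediction for the next frame under the new one. */
    static struct lin_model instr_model[NUM_APP_CONFIGS][NUM_APP_CONFIGS];
    static struct lin_model byte_model [NUM_APP_CONFIGS][NUM_APP_CONFIGS];

    double predict(const struct lin_model *m, double last_count)
    {
        return m->a * last_count + m->b;
    }

    /* Predict the next frame's instruction count for candidate config `next`,
     * given what the encoder just measured under config `prev`. */
    double predict_instrs(int prev, int next, double last_instrs)
    {
        return predict(&instr_model[prev][next], last_instrs);
    }

    double predict_bytes(int prev, int next, double last_bytes)
    {
        return predict(&byte_model[prev][next], last_bytes);
    }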

Experiments
- RSIM CPU simulator
  - State-of-the-art CPU and memory
- Princeton Wattch energy model
  - Reported energy typical of modern CPUs
- Simulation conditions:
  - Fixed and adaptive CPU
  - Fixed and adaptive software
  - Foreman sequence

Fixed vs. Adaptive Systems
- Adaptive hardware saves 70% over the fixed system
- Adaptive application saves:
  - 30% on fixed hardware
  - 20% on adaptive hardware (total savings of 80%)
- [Bar chart: energy (J), split into network and CPU components, for the fixed system, adaptive H/W, adaptive S/W, and fully adaptive system]

Algorithm Comparison
- Baseline: fixed software, adaptive hardware
- Adaptive software:
  - Adaptive DCT/motion thresholds
  - Instruction and byte counts for the next frame are predicted
- Oracle:
  - Instruction and byte counts for the next frame are exact
- Adapt-Once:
  - Adapt once at the start of encoding
  - Minimize total energy across the entire sequence

Algorithm Comparison
- [Bar chart: energy (J), network and CPU components, for Fixed, Adapt-Once, Adaptive, and Oracle]
- Energy consumption of Adaptive is within 3% of Oracle
  - Simple predictors are sufficient for energy savings
- Adaptive saves 5% over Adapt-Once
  - Frame-by-frame adaptation can save energy

Other test cases
- Low-power CPU:
  - Network energy dominated
  - Software adaptation did not save energy
- Carphone sequence:
  - Little inter-frame variation
  - One-shot adaptation was sufficient
  - Adapt-Once, Adaptive, and Oracle used the same energy
  - Adaptive software saved ~15%

Conclusions
- A new framework for coordinated CPU/application adaptation
  - Combined benefits of both adaptations
  - Preserves separation between layers
- Adaptive applications save energy:
  - Up to 20% on adaptive hardware
  - Up to 30% on fixed hardware