1 Targeted execution enabling increased power efficiency John Goodacre Director, Program Management ARM Processor Division August 2009 MPSoC 2009 Anirban.

Slides:



Advertisements
Similar presentations
Zhou Peng, Zuo Decheng, Zhou Haiying Harbin Institute of Technology 1.
Advertisements

Programming Technologies, MIPT, April 7th, 2012 Introduction to Binary Translation Technology Roman Sokolov SMWare
4. Workload directed adaptive SMP multicores
Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,
System Level Benchmarking Analysis of the Cortex™-A9 MPCore™ John Goodacre Director, Program Management ARM Processor Division October 2009 Anirban Lahiri.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.
Energy-Efficient System Virtualization for Mobile and Embedded Systems Final Review 2014/01/21.
1 MemScale: Active Low-Power Modes for Main Memory Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini Rutgers University.
Accurately Approximating Superscalar Processor Performance from Traces Kiyeon Lee, Shayne Evans, and Sangyeun Cho Dept. of Computer Science University.
Enabling Efficient On-the-fly Microarchitecture Simulation Thierry Lafage September 2000.
1 DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems Magnus Jahre †, Marius Grannaes † ‡ and Lasse Natvig † † Norwegian.
Mathew Paul and Peter Petrov Proceedings of the IEEE Symposium on Application Specific Processors (SASP ’09) July /6/13.
Platforms, ASIPs and LISATek Federico Angiolini DEIS Università di Bologna.
Disco Running Commodity Operating Systems on Scalable Multiprocessors.
1 Breakout thoughts (compiled with N. Carter): Where will RAMP be in 3-5 Years (What is RAMP, where is it going?) Is it still RAMP if it is mapping onto.
Microsoft Virtual Server 2005 Product Overview Mikael Nyström – TrueSec AB MVP Windows Server – Setup/Deployment Mikael Nyström – TrueSec AB MVP Windows.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis.
Projects Using gem5 ParaDIME (2012 – 2015) RoMoL (2013 – 2018)
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.
Leveling the Field for Multicore Open Systems Architectures Markus Levy President, EEMBC President, Multicore Association.
8/16/2015\course\cpeg323-08F\Topics1b.ppt1 A Review of Processor Design Flow.
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
A Survey on Virtualization Technologies
1 Layers of Computer Science, ISA and uArch Alexander Titov 20 September 2014.
Folklore Confirmed: Compiling for Speed = Compiling for Energy Tomofumi Yuki INRIA, Rennes Sanjay Rajopadhye Colorado State University 1.
Low-Power Wireless Sensor Networks
Uncovering the Multicore Processor Bottlenecks Server Design Summit Shay Gal-On Director of Technology, EEMBC.
1 Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
System Design with CoWare N2C - Overview. 2 Agenda q Overview –CoWare background and focus –Understanding current design flows –CoWare technology overview.
1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.
An Analysis of Efficient Multi-Core Global Power Management Policies Authors: Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret.
Moshovos © 1 RegionScout: Exploiting Coarse Grain Sharing in Snoop Coherence Andreas Moshovos
OPERATING SYSTEMS CS 3530 Summer 2014 Systems with Multi-programming Chapter 4.
Centre d’Excellence en Technologies de l’Information et de la Communication Evolution dans la gestion d’infrastructure de type Cloud (SDI)
Operating System 4 THREADS, SMP AND MICROKERNELS.
Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling.
Operating Systems: Internals and Design Principles
Chair MPSoC MPSoC Programming Solution “ CoreManager” hardware unit for:  Dependency checking  Task scheduling  Local memory management of PEs  C programmable.
PowerMixer IP : IP-Level Power Modeling for Processors Shan-Chien Fang 1 Jia-Lu Liao 2 Chen-Wei Hsu 2 Chia-Chien Weng 2 Shi-Yu Huang 2 Wen-Tsan Hsieh 3.
Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
Ensieea Rizwani An energy-efficient management mechanism for large-scale server clusters By: Zhenghua Xue, Dong, Ma, Fan, Mei 1.
IMPROVING THE PREFETCHING PERFORMANCE THROUGH CODE REGION PROFILING Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture Department UPC.
Sunpyo Hong, Hyesoon Kim
1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.
1 of 14 Lab 2: Design-Space Exploration with MPARM.
Page 1 2P13 Week 1. Page 2 Page 3 Page 4 Page 5.
Fast Energy Evaluation of Embedded Applications for Many-core Systems Felipe Rosa, Luciano Ost, Thiago Raupp, Fernando Moraes, Ricardo Reis.
Presenter: Yi-Ting Chung Fast and Scalable Hybrid Functional Verification and Debug with Dynamically Reconfigurable Co- simulation.
Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.
PINTOS: An Execution Phase Based Optimization and Simulation Tool) PINTOS: An Execution Phase Based Optimization and Simulation Tool) Wei Hsu, Jinpyo Kim,
Lucas De Marchi sponsors: co-authors: Liria Matsumoto Sato
Dynamic and On-Line Design Space Exploration for Reconfigurable Architecture Fakhreddine Ghaffari, Michael Auguin, Mohamed Abid Nice Sophia Antipolis University.
Overview Motivation (Kevin) Thermal issues (Kevin)
CS161 – Design and Architecture of Computer Systems
Multi-Processing in High Performance Computer Architecture:
Green Software Engineering Prof
Architecture Background
Hardware Support for Embedded Operating System Security
Hardware Multithreading
A Survey on Virtualization Technologies
A High Performance SoC: PkunityTM
Operating System 4 THREADS, SMP AND MICROKERNELS
What is Computer Architecture?
rePLay: A Hardware Framework for Dynamic Optimization
Presentation transcript:

1 Targeted execution enabling increased power efficiency John Goodacre Director, Program Management ARM Processor Division August 2009 MPSoC 2009 Anirban Lahiri Adya Shrotriya Nicolas Zea Technology Researchers This project in ARM is in part funded by ICT-eMuCo, a European project supported under the Seventh Framework Programme (7FP) for research and technological development

2 Interesting System Configurations…

3 Background  Smooth transition between energy and performance levels  Reduced loss due to leakage power as cores can be switched off  Addresses the application performance diversity Power Voltage (V) MIPS Disclaimer : The plots are indicative of practical architectures and systems. Big Core Little Core Big Core Little Core Power Voltage (V) Big Core Tasks Little Core Tasks High Performance Applications Low-power Applications Fig 1c : Diversity of Multitask Workloads Fig 1b : Power-Performance Diversity of Single Task Workloads Fig 1a : Power at Peak performance per Operating Voltage 1.2 Big-Switch Power Aware SMP Scheduled

4 Analyzing Diversity  Code compatibility (due to uniform ISA) ensures easy dynamic task migration (Fig 2a)  Task migration for power efficiency based on required performance (Fig 2b). Example shows a set of tasks T 1 – T 5 MIPS Time Performance (Avg. MIPS) Power (mW) Big Core Little Core Power (mW) Core Executing the Task MIPS Fig 2b : Task migrations over time based on performance requirement in a Multitask Workload  Prevents smaller tasks from corrupting high performance task execution. E.g. Task T 1 in Fig 2b.  Important to further analyse temporal effects of SoC power Fig 2a : Single task migrating across cores over time T1T1 T2T2 T3T3 T4T4 T5T5 T3T3 T4T4 Time T2T2 T5T5 T3T3 T2T2 T1T1 T3T3 T4T4 T1T1 T5T5

5 Methodologies Being Utilized C Code App / OS (Parallel / SMP ?) ARMCC / GCC System Generator RTL Netlist Emulator (Palladium) Simulation (PowerTheater) On-board Execution Simulator RV Cache Models Execution Trace Cache Hit / Miss Ratio OS Behaviour OS - Profiler Execution Trace Power per H/w Unit Cache Hit / Miss Ratio Power Estimation Model Power Estimation Model Compare Accuracy Power Estimation Model Flow 1 (High-level Simulation) Flow 2 (Hardware Emulation) Flow 3 (Low-level Simulation) Using RVT Statistical Methods

6 Software Model Considerations Power Aware SMPBig-Switch Level of OS modificationRequires affinity to be driven by performance requirement Potentially no changes required Maximum power saveCan operate as big- switch too Little and big core need performance continuum Level of task diversity and peak performance Enable better scalabilityLimited to performance of single CPU Implementation complexity OS needs a speculative understanding of performance demands Invisible to OS, operates similar to interrupt service routine Management Responsibility OS performance monitorApplication dependent FlexibilitySMP / AMP designsSingle CPU only

7 Summary Expectations Application Scenario Power-Aware SMP Scheduled Big-SwitchBig-Core OnlyLittle Core Only Big-Task (700MIPS) 520mW Big-Core 0.8V) + Little-Core (20mW Leakage) Big-Core 0.8V) Big-Core 0.8V) - Small-Task (350MIPS) 250mW Big-Core (50mW Leakage) + Little-Core 0.8V) Little-Core (200mW) Big-Core (500mW) Little-Core (200mW) 1 Big-Tasks + 3 Small Tasks (1100MIPS) 700mW Big-Core 0.8V) + Little-Core 0.8V) Big Core 1.1V) Big Core 1.1V) - 3 Big-Tasks + 5 Small Tasks (1400MIPS) 950mW Big-Core 1.1V) + Little-Core 0.8V) --- Operating Voltage (Volts) Big-Core MIPS at Peak Frequency Little-Core MIPS at Peak Frequency Big-Core Power at Peak Frequency (mW) Little-Core Power at Peak Frequency (mW) Possible Power savings up to 50% Performance enhancements up to 30% seen by reducing corruption of high performance tasks Key to still understand the costs of migration

8 Thank you