Hardware/Software Mechanisms for Cross-Layer Power Proportionality “Power Prop” Alex Yakovlev, Andrey Mokhov, Sascha Romanovsky, Max Rykunov, Alexei Iliasov.


Hardware/Software Mechanisms for Cross-Layer Power Proportionality ("Power Prop"). Alex Yakovlev, Andrey Mokhov, Sascha Romanovsky, Max Rykunov, Alexei Iliasov and Danil Sokolov, Schools of EEE and CS, Newcastle University. Power Prop: the more you get, the more you give!

Moore’s law and Power trends

Part I: Power Proportionality

Power proportionality issues reported in the literature (source: S. Dawson-Haggerty et al., Power Optimization – Reality Check, UC Berkeley, 2009):
- For commodity systems the performance-power trade-off is linear, so the best strategy is "race to sleep"; additional "run" power states are of little use, and changes to existing commodity operating systems have little influence. The focus should be on the time to transition to and from sleep!
- For new types of systems, such as wireless sensor networks (WSN), there is a non-linear region; the slogan is: learn how to run CMOS slowly and exploit scheduling optimisations.
(Figure: Core i7 power drawn at different frequencies.)
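To see why a linear performance-power trade-off favours racing to sleep, here is a toy energy model. It assumes, hypothetically, that run power is P_static + k·f, that a task needs a fixed number of cycles W within a window T, and that the system sleeps at P_sleep for the rest of the window. Under these assumptions E(f) = kW + P_sleep·T + (P_static − P_sleep)·W/f, so whenever P_static > P_sleep the energy falls as f rises. Transition costs are ignored and all parameter values are illustrative, not measurements.

```python
def task_energy(freq_hz: float, work_cycles: float, window_s: float,
                p_static_w: float, k_w_per_hz: float, p_sleep_w: float) -> float:
    """Energy (J) to run a fixed task at freq_hz, then sleep for the rest of the window.

    Assumed (hypothetical) linear model: run power = p_static_w + k_w_per_hz * freq_hz.
    Sleep-transition time and energy are ignored.
    """
    t_run = work_cycles / freq_hz
    if t_run > window_s:
        raise ValueError("task does not fit in the window at this frequency")
    p_run = p_static_w + k_w_per_hz * freq_hz
    return p_run * t_run + p_sleep_w * (window_s - t_run)


# Illustrative parameters only: 1e9 cycles in a 1 s window, 10 W static,
# 10 W per GHz dynamic, 1 W asleep.
params = dict(work_cycles=1e9, window_s=1.0, p_static_w=10.0,
              k_w_per_hz=1e-8, p_sleep_w=1.0)
for f_ghz in (1, 2, 3):
    print(f"{f_ghz} GHz: {task_energy(f_ghz * 1e9, **params):.1f} J")
```

With these numbers the energy drops from about 20 J at 1 GHz to about 14 J at 3 GHz. The dynamic term k·W does not depend on f in this model, so only the gap between static and sleep power decides the outcome; if power instead falls faster than frequency (the non-linear region mentioned for WSN-class systems), running slowly can win.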

Power proportionality: service-modulated processing and energy-modulated processing

Part II: Reconfigurable Processors

Achieving power proportionality:
- Support for a wide range of voltages
  - Asynchronous design
  - Unstable voltage supply (energy harvesting)
- Components optimised for different modes
  - Survival mode (power)
  - Mission mode (energy efficiency)
  - Emergency mode (performance)
- Reconfigurable instructions
  - Altering instruction behaviour at runtime

Pathway from a high-level specification to a low-level MCU implementation: chip design (CS + EE), chip tapeout (EE), chip testing (CS + EE)

Reconfigurable Instructions: $\mathrm{DP3}(x, y) = x_1 y_1 + x_2 y_2 + x_3 y_3$

Resource-level refinement
Functionality: $\mathrm{DP3}(x, y) = x_1 y_1 + x_2 y_2 + x_3 y_3$
Abstract specification:
- Initialisation: $c := 0$
- Invariant: $(c = 1) \Rightarrow (\mathit{res} = x_1 y_1 + x_2 y_2 + x_3 y_3)$
- Event: if $(c = 0)$ then $(\mathit{res} := x_1 y_1 + x_2 y_2 + x_3 y_3 \;\&\; c := 1)$
Open the black box and show what is inside:
- Perform the multiplications with fast 2-input multipliers
- Perform the addition with a 3-input adder
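The refinement step can be illustrated with a small executable sketch, assuming integer operands. `dp3_abstract` mirrors the guarded-event specification (one atomic event that computes the result and sets c := 1, with the invariant checked afterwards), while `dp3_refined` opens the black box into three 2-input multiplications feeding a 3-input adder. All names here, including the unit stand-ins `fast_mul2` and `add3`, are hypothetical; comparing outputs on examples only illustrates the idea and is not a formal refinement proof.

```python
from typing import Sequence


def dp3_abstract(x: Sequence[int], y: Sequence[int]) -> int:
    """Abstract specification: a single atomic event computes the whole result."""
    c, res = 0, None                      # Initialisation: c := 0
    if c == 0:                            # Event guard
        res = x[0] * y[0] + x[1] * y[1] + x[2] * y[2]
        c = 1                             # res and c are updated in the same event
    # Invariant: (c = 1) => (res = x1*y1 + x2*y2 + x3*y3)
    assert c != 1 or res == x[0] * y[0] + x[1] * y[1] + x[2] * y[2]
    return res


def fast_mul2(p: int, q: int) -> int:
    """Hypothetical stand-in for a fast 2-input multiplier."""
    return p * q


def add3(p: int, q: int, r: int) -> int:
    """Hypothetical stand-in for a 3-input adder."""
    return p + q + r


def dp3_refined(x: Sequence[int], y: Sequence[int]) -> int:
    """Refinement: three 2-input multiplications feed one 3-input adder."""
    if len(x) != 3 or len(y) != 3:
        raise ValueError("DP3 takes two 3-element vectors")
    products = [fast_mul2(xi, yi) for xi, yi in zip(x, y)]
    return add3(*products)


# The refined version should agree with the abstract specification.
assert dp3_refined((1, 2, 3), (4, 5, 6)) == dp3_abstract((1, 2, 3), (4, 5, 6)) == 32
```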

Reconfigurable Instructions (figure: five DP3 implementation variants labelled Fastest, 2 multipliers, Least peak power, Dedicated component and Balanced)

(Figure: x = 1, y = 0, z = 1)

Part III: Intel 8051

Final remarks
Towards power proportionality:
- Voltage range: 0.2 V to 1.5 V
- Performance range: 2.7K to 67M instructions/sec
Survival of components:
- Full capability mode: 0.89 V to 1.5 V
- RAM fails at 0.89 V
- Program counter unreliable below 0.74 V
- Asynchronous control survives down to 0.2 V
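Taking the end points of the reported ranges at face value, the performance span covers roughly four orders of magnitude while the supply-voltage span is less than one:

$$\frac{67 \times 10^{6}\ \text{instructions/s}}{2.7 \times 10^{3}\ \text{instructions/s}} \approx 2.5 \times 10^{4}, \qquad \frac{1.5\ \text{V}}{0.2\ \text{V}} = 7.5$$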

Evaluation PCB (with FPGA)

Project outcomes:
Conference and journal papers:
- "Towards Reconfigurable Processors for Power-Proportional Computing", A. Mokhov, M. Rykunov, D. Sokolov and A. Yakovlev, Proceedings of the 12th IEEE Low Voltage Low Power Conference (FTFC), Paris, France.
- "Design-for-Adaptivity of Microarchitectures", M. Rykunov, A. Mokhov, D. Sokolov, A. Yakovlev and A. Koelmans, Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors, Washington D.C., USA.
- "Synthesis of processor instruction sets from high-level ISA specifications", A. Mokhov, M. Rykunov, D. Sokolov, A. Yakovlev, A. Iliasov and A. Romanovsky, IEEE Transactions on Computers.
- "Design of Processors with Reconfigurable Microarchitecture", A. Mokhov, M. Rykunov, D. Sokolov and A. Yakovlev, Journal of Low Power Electronics and Applications (under review).

Project outcomes (cont.):
- Several MSc projects.
- PhD thesis: "Design of Asynchronous Microprocessor for Power Proportionality" (Nov. 2013).
- The PowerProp project established several important industrial connections, e.g. Maxeler Technologies and IBM Research.
- Some PowerProp theory, tool support and software ideas have moved to a new Programme Grant, PRiME (EP/K034448/1).
- CPU design ideas will be used in the SAVVIE project (EP/K012908/1).
- The project helped to promote joint CS+EE developments in Workcraft (a graph-based EDA environment), which is used in several EPSRC projects.

Thank you!

Parameterised Graphs for formal specification of multi-modal systems
The DP3 instruction computes the dot product x·y = x1·y1 + x2·y2 + x3·y3.
-- declaration of the functional units
a = unit "2-input adder"
b = unit "3-input adder"
c = unit "2-input multiplier"
d = unit "fast 2-input multiplier"
e = unit "dedicated DP3 unit"
-- specification of each instruction
inst_a = (d1 + d2 + d3) -> b
inst_b = c1 -> c1 -> c1 -> b
inst_c = e
inst_d = (c2 + c1) -> a + c1 -> c1 -> a
inst_e = d1 -> d1 -> (a + c1) -> a
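One way to read these specifications is through the algebra of graphs underlying parameterised graphs: `+` overlays two graphs (union of vertices and edges), and `->` also connects every vertex on the left to every vertex on the right, with `->` binding tighter than `+`. The Python sketch below is an illustrative assumption, not the project's tooling. It writes `->` as `*` so that Python's operator precedence matches, then evaluates each variant to the set of functional units it occupies and its dependency edges. Plain graph semantics cannot count repeated use, so a chain such as `c1 -> c1 -> c1` collapses to a self-loop on `c1`; the actual specification language presumably interprets repeated units as sequential reuse.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Graph:
    vertices: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]

    def __add__(self, other: "Graph") -> "Graph":
        """Overlay (written + in the spec): union of vertices and edges."""
        return Graph(self.vertices | other.vertices, self.edges | other.edges)

    def __mul__(self, other: "Graph") -> "Graph":
        """Connect (written -> in the spec): overlay plus all left-to-right edges."""
        cross = frozenset((u, w) for u in self.vertices for w in other.vertices)
        return Graph(self.vertices | other.vertices, self.edges | other.edges | cross)


def unit(name: str) -> Graph:
    """A single functional-unit instance as a one-vertex graph."""
    return Graph(frozenset({name}), frozenset())


# Unit instances (a: 2-input adder, b: 3-input adder, c: 2-input multiplier,
# d: fast 2-input multiplier, e: dedicated DP3 unit).
a, b, e = unit("a"), unit("b"), unit("e")
c1, c2 = unit("c1"), unit("c2")
d1, d2, d3 = unit("d1"), unit("d2"), unit("d3")

variants = {
    "inst_a": (d1 + d2 + d3) * b,
    "inst_b": c1 * c1 * c1 * b,
    "inst_c": e,
    "inst_d": (c2 + c1) * a + c1 * c1 * a,
    "inst_e": d1 * d1 * (a + c1) * a,
}

for name, g in variants.items():
    print(name, sorted(g.vertices), sorted(g.edges))
```

The vertex sets give a quick view of which units each variant ties up, for example only `e` for `inst_c` and `b`, `d1`, `d2`, `d3` for `inst_a`.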


Intel 8051 Instruction Set

CJNE Instruction

Branch taken / Branch not taken
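The two cases can be pinned down with a small behavioural model of the standard 8051 CJNE semantics: compare two unsigned byte operands, jump if they differ, and set the carry flag when the first operand is smaller. CJNE is a three-byte instruction, so the PC advances by 3 before the signed 8-bit offset is applied. This Python sketch is a reference model written for illustration, not the project's asynchronous implementation; the function name and argument layout are hypothetical.

```python
def cjne(first: int, second: int, pc: int, rel: int) -> tuple[int, int]:
    """Model CJNE <first>, <second>, rel on an 8051-style core.

    Returns (next_pc, carry). Operands are unsigned bytes, pc is 16-bit,
    rel is the raw 8-bit offset byte from the instruction encoding.
    """
    pc = (pc + 3) & 0xFFFF                     # CJNE occupies three bytes
    carry = 1 if first < second else 0         # unsigned comparison sets C
    if first != second:                        # branch taken
        offset = rel - 0x100 if rel & 0x80 else rel  # sign-extend the offset
        pc = (pc + offset) & 0xFFFF
    return pc, carry


print(cjne(0x10, 0x10, 0x0100, 0x05))  # equal operands: branch not taken
print(cjne(0x05, 0x10, 0x0100, 0xFE))  # unequal: branch taken backwards, C set
```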

Measurements: Current & Latency

Measurements: Power

Measurements: Energy Efficiency

Some measurements:
- 0.89 V to 1.5 V: full capability mode.
- 0.74 V to 0.89 V: at 0.89 V the RAM starts to fail, so the chip can only operate using internal registers.
- 0.22 V to 0.74 V: at 0.74 V the program counter starts to fail; however, the control logic synthesised using the Conditional Partial Order Graph (CPOG) model continues to operate correctly down to 0.22 V.
- 67 MIPS at 1.2 V.
- ~2700 instructions per second at 0.25 V.
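The measured thresholds above can be summarised as a simple lookup from supply voltage to operating region. This Python sketch hard-codes the reported values; the function name and region labels are hypothetical. Because the RAM and program counter are reported to start failing at 0.89 V and 0.74 V, a voltage exactly on a boundary is assigned to the lower, degraded region.

```python
def operating_region(vdd: float) -> str:
    """Map a supply voltage (V) to the operating region reported for the chip."""
    if vdd > 1.5:
        return "above reported range"
    if vdd > 0.89:
        return "full capability"
    if vdd > 0.74:
        return "registers only (RAM unreliable)"
    if vdd >= 0.22:
        return "control logic only (program counter unreliable)"
    return "below reported range"


for v in (1.2, 0.8, 0.5, 0.25):
    print(f"{v:.2f} V: {operating_region(v)}")
```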