Todd Austin, University of Michigan. X-Stack Energy Optimization: Fact or Fiction

Presentation transcript:

Todd Austin, University of Michigan
X-Stack Energy Optimization: Fact or Fiction

Austin’s Theory of Inevitable H/W Progression

Software Construction:
- Hand coding
- Basic compilation
- Static optimization
- Profile-guided optimization
- Run-time code generation and optimization

Hardware Construction:
- Hand-drawn circuits
- Basic logic synthesis
- Static circuit optimization
- Typical-case optimization

Razor Typical-Case Optimization
- Circuit-level Razor latch detects timing errors
- Architecture-level pipeline recovery sequence fixes pipeline state after timing error
- Operating system-level voltage control sets energy level to minimize error rates
- 50% energy savings for < 1% performance loss

[Figure: Razor flip-flop (main flip-flop on clk, shadow latch on delayed clock clk_del, error comparator producing Error_L) and a five-stage pipeline (IF, ID, EX, MEM, WB) with Razor FFs between stages, a stabilizer FF before WB, and error, bubble, recover, and flush control signals.]
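The detection and voltage-control ideas on this slide can be illustrated with a minimal behavioral sketch. It is not the Razor circuit itself: the class `RazorFF`, the function `adjust_voltage`, the time units, and the step-up/step-down control policy are all assumptions made for illustration. The model assumes the shadow latch always samples late enough to hold the correct value, which a real design has to guarantee by construction.

```python
from dataclasses import dataclass


@dataclass
class RazorFF:
    """Hypothetical behavioral model of one Razor flip-flop."""
    shadow_delay: float  # extra time the shadow latch waits (same units as clk_period)
    q: int = 0           # value currently held by the flip-flop

    def clock(self, d, settle_time, clk_period):
        """Clock in value d, whose input settles settle_time after the previous edge.

        Returns (value captured by the main flip-flop, error flag).
        """
        stale = self.q
        # The main flip-flop sees the new value only if the logic settled in time.
        main = d if settle_time <= clk_period else stale
        # The shadow latch samples later, on the delayed clock.
        shadow = d if settle_time <= clk_period + self.shadow_delay else stale
        # The comparator flags a timing error when the two samples disagree.
        error = main != shadow
        # Recovery: the shadow value is restored into the main flip-flop.
        self.q = shadow
        return main, error


def adjust_voltage(v, error_rate, target=0.01, step=0.01, v_min=0.8, v_max=1.2):
    """Assumed control policy: raise voltage when errors exceed the target,
    otherwise lower it to save energy. All constants are illustrative."""
    if error_rate > target:
        return min(v_max, v + step)
    return max(v_min, v - step)


ff = RazorFF(shadow_delay=0.5)
print(ff.clock(d=1, settle_time=0.9, clk_period=1.0))  # data arrives in time
print(ff.clock(d=0, settle_time=1.2, clk_period=1.0))  # late data, flagged as an error
```

In this sketch a late-arriving value is caught because the main flip-flop still holds the stale value while the shadow latch has the new one. The controller keeps lowering the voltage until the observed error rate crosses the target, which is the typical-case trade-off the slide describes.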

CryptoManiac Typical-Case Optimization
- Circuit-level functional unit design tucks pre- and post-Boolean ops into clock cycle
- Architecture-level ISA extension exposes pre/post-ops
- Application-level programming re-expresses algorithms to leverage optimization
- 20% performance benefit (could recast as energy benefit)

[Figure: CryptoManiac system (CM processors, keystore, request scheduler, in/out queues carrying requests and results) and functional units: pipelined 32-bit MUL, 1K-byte SBOX cache, 32-bit adder, 32-bit rotator, and XOR/AND logical units, annotated with delay classes {tiny}, {short}, and {long}.]
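As a rough illustration of what exposing pre/post-ops means for the programmer, the sketch below models a few fused 32-bit operations in which a cheap Boolean op rides along with a longer arithmetic or rotate op. The function names (`add_xor`, `xor_add`, `rot_xor`) and their operand order are hypothetical and are not taken from the CryptoManiac ISA.

```python
MASK = 0xFFFFFFFF  # operate on 32-bit words


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    x &= MASK
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def add_xor(a, b, k):
    """Add, then XOR as a post-op: (a + b) ^ k."""
    return ((a + b) & MASK) ^ (k & MASK)


def xor_add(a, b, k):
    """XOR as a pre-op, then add: (a ^ k) + b."""
    return (((a ^ k) & MASK) + b) & MASK


def rot_xor(a, n, k):
    """Rotate, then XOR as a post-op: rotl(a, n) ^ k."""
    return rotl32(a, n) ^ (k & MASK)


# A cipher step written as two separate operations...
t = (0x12345678 + 0x9ABCDEF0) & MASK
two_ops = t ^ 0x0F0F0F0F
# ...and the same step re-expressed as one fused operation.
one_op = add_xor(0x12345678, 0x9ABCDEF0, 0x0F0F0F0F)
print(two_ops == one_op)
```

The application-level work the slide mentions amounts to rewriting kernels so that such add-then-XOR or rotate-then-XOR pairs line up with the fused forms the hardware offers.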

X-Stack Optimization Thoughts
- Opportunities
  - Big design wins to be found for x-stack optimizations
- Challenges
  - Approach destroys abstractions, places burdens on layers up the stack (e.g., architects and programmers)
  - Typically need to work outside of your comfort zone
  - Reviewers are often skeptical of x-stack modeling fidelity
- Advice
  - Play well with others (seek effective collaborations)
  - Be open to learning new technologies/applications/fields
  - Be prepared to pursue physical demonstrations

Austin’s Theory of Inevitable H/W Progression

Software Construction:
- Hand coding
- Basic compilation
- Static optimization
- Profile-guided optimization
- Run-time code generation and optimization

Hardware Construction:
- Hand-drawn circuits
- Basic logic synthesis
- Static circuit optimization
- Typical-case optimization
- Composable H/W acceleration