1 Energy-Efficient Register Access Jessica H. Tseng and Krste Asanović MIT Laboratory for Computer Science, Cambridge, MA 02139, USA SBCCI2000.

Slides:

Advertisements

Similar presentations

COEN 180 SRAM. High-speed Low capacity Expensive Large chip area. Continuous power use to maintain storage Technology used for making MM caches.

Advertisements

Final Project : Pipelined Microprocessor Joseph Kim.

1 ITCS 3181 Logic and Computer Systems B. Wilkinson Slides9.ppt Modification date: March 30, 2015 Processor Design.

Semiconductor Memory Design. Organization of Memory Systems Driven only from outside Data flow in and out A cell is accessed for reading by selecting.

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

1 A Self-Tuning Cache Architecture for Embedded Systems Chuanjun Zhang*, Frank Vahid**, and Roman Lysecky *Dept. of Electrical Engineering Dept. of Computer.

Power Reduction Techniques For Microprocessor Systems

Lecture 2-Berkeley RISC Penghui Zhang Guanming Wang Hang Zhang.

Embedded Software Optimization for MP3 Decoder Implemented on RISC Core Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao Zhejiang University Information.

Chuanjun Zhang, UC Riverside 1 Low Static-Power Frequent-Value Data Caches Chuanjun Zhang*, Jun Yang, and Frank Vahid** *Dept. of Electrical Engineering.

Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.

State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.

Appendix A Pipelining: Basic and Intermediate Concepts

Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.

University of Michigan Electrical Engineering and Computer Science 1 A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded.

© Digital Integrated Circuits 2nd Sequential Circuits Digital Integrated Circuits A Design Perspective Designing Sequential Logic Circuits Jan M. Rabaey.

Design of Robust, Energy-Efficient Full Adders for Deep-Submicrometer Design Using Hybrid-CMOS Logic Style Sumeer Goel, Ashok Kumar, and Magdy A. Bayoumi.

Ronny Krashinsky Seongmoo Heo Michael Zhang Krste Asanovic MIT Laboratory for Computer Science SyCHOSys Synchronous.

A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,

Review: Basic Building Blocks  Datapath l Execution units -Adder, multiplier, divider, shifter, etc. l Register file and pipeline registers l Multiplexers,

Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu

HW/SW PARTITIONING OF FLOATING POINT SOFTWARE APPLICATIONS TO FIXED - POINTED COPROCESSOR CIRCUITS - Nalini Kumar Gaurav Chitroda Komal Kasat.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

Lecture 9. MIPS Processor Design – Instruction Fetch Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System Education &

A Centralized Cache Miss Driven Technique to Improve Processor Power Dissipation Houman Homayoun, Avesta Makhzan, Jean-Luc Gaudiot, Alex Veidenbaum University.

Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.

1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.

Microprocessor Microarchitecture Limits of Instruction-Level Parallelism Lynn Choi Dept. Of Computer and Electronics Engineering.

ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.

General Concepts of Computer Organization Overview of Microcomputer.

Houman Homayoun, Sudeep Pasricha, Mohammad Makhzan, Alex Veidenbaum Center for Embedded Computer Systems, University of California, Irvine,

Gary MarsdenSlide 1University of Cape Town Chapter 5 - The Processor  Machine Performance factors –Instruction Count, Clock cycle time, Clock cycles per.

Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.

ECE 445 – Computer Organization

Computer Organization & Programming Chapter 6 Single Datapath CPU Architecture.

IT253: Computer Organization Lecture 9: Making a Processor: Single-Cycle Processor Design Tonga Institute of Higher Education.

EECE 476: Computer Architecture Slide Set #5: Implementing Pipelining Tor Aamodt Slide background: Die photo of the MIPS R2000 (first commercial MIPS microprocessor)

UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.

PATMOS’02 Energy-Efficient Design of the Reorder Buffer* *supported in part by DARPA through the PAC-C program and NSF Dmitry Ponomarev, Gurhan Kucuk,

Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File Stephen Hines, Gary Tyson, and David Whalley Computer Science Dept. Florida.

16 Bit Logarithmic Converter Tinghao Liang and Sara Nadeau.

WARP PROCESSORS ROMAN LYSECKY GREG STITT FRANK VAHID Presented by: Xin Guan Mar. 17, 2010.

Seok-jae, Lee VLSI Signal Processing Lab. Korea University

11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.

Introduction to Computer Organization Pipelining.

COM181 Computer Hardware Lecture 6: The MIPs CPU.

8085 FAQ R.RAJKUMAR DEPT OF CSE SRM UNIVERSITY. FAQ What is a Microprocessor? - Microprocessor is a program-controlled device, which fetches the instructions.

Copyright 2006 by Timothy J. McGuire, Ph.D. 1 MIPS Programming Model CS 333 Sam Houston State University Dr. Tim McGuire.

Cache Pipelining with Partial Operand Knowledge Erika Gunadi and Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin—Madison.

Presented by Rania Kilany.  Energy consumption  Energy consumption is a major concern in many embedded computing systems.  Cache Memories 50%  Cache.

Load-Sensitive Flip-Flop Characterization Seongmoo Heo and Krste Asanović MIT Laboratory for Computer Science WVLSI 2001.

PipeliningPipelining Computer Architecture (Fall 2006)

Dynamic Zero Compression for Cache Energy Reduction Luis Villa Michael Zhang Krste Asanovic

Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:

Computer Organization and Architecture + Networks

Variable Word Width Computation for Low Power

SECTIONS 1-7 By Astha Chawla

Morgan Kaufmann Publishers

PowerPC 604 Superscalar Microprocessor

Morgan Kaufmann Publishers The Processor

CDA 3101 Spring 2016 Introduction to Computer Organization

Latches and Flip-flops

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Morgan Kaufmann Publishers The Processor

Masamitsu Tanaka, Nagoya Univ.

A High Performance SoC: PkunityTM

The Processor Lecture 3.1: Introduction & Logic Design Conventions

Guest Lecturer TA: Shreyas Chand

COMS 361 Computer Organization

Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy

Presentation transcript:

1 Energy-Efficient Register Access Jessica H. Tseng and Krste Asanović MIT Laboratory for Computer Science, Cambridge, MA 02139, USA SBCCI2000

2 Introduction Motivation –Hand-held portable multimedia wireless device. (ex: Palm-pilots and wireless-phone) –Register files represent a substantial portion of energy budget in modern microprocessor. (ex: Motorola’s M.CORE--16% of total processor power, 42% of datapath power) Objective –Software invisible techniques to reduce switching activity frequency and switching capacitance of a register file.

3 Overview Methodology / Background. 7 Energy Saving Techniques. Combining Techniques. Conclusion.

4 Methodology Use dynamic benchmark traces of SPECint95 and g721 (total of 14 billion instructions) to evaluate modification to a conventional register file for energy efficiency. Custom layout the register file and bypass network in Magic layout program—0.25  m TSMC CMOS process. Use SPACE 2D extractor to extract layout parasitic for circuit simulation. Use HSpice to simulate the extracted netlist and to determine the effective switching capacitance. Use Matlab to build energy evaluation model.

5 Single Issue MIPS-II Compatible RISC Microprocessor For low-power embedded application

6 Register File High performance dynamic design. (target for nominal processor clock rate of 400 MHz). Three ports integer register file. (two singled-ended read ports and one differential write port) Static address decoder.

7 Modified Storage Cell On average, 82% of the bits fetched from the register file are zeros. 17% storage cell area increase (9% overall regfile area increase) and no read delay penalty. 27% energy saving.

8 Precise Read Control On average, each instruction only requires 1.3 operands. Gating the read word line enable pulse prevents discharge of bitlines. No area overhead and no access time penalty, assuming the decoding of required operands completes in the first half of the cycle. Saving ranges from 15% to 27% with an average of 21%.

9 Latch Clock Gating Only 81% of instructions use values held in the rs or rt latches. 10% of instructions use values held in sd latches. Not clocking latches whose values are not needed. No area overhead and no read delay penalty. 8% energy saving consistently.

10 Bypass Skip 36% of all necessary operands are supplied by the bypass network. Similar implementation to precise-read control. No area overhead and no increase in latency if we can finish determining the bypass control in the first half of the cycle. Saving ranges from 11% to 23% with an average of 16%.

11 Bypass Register 0 40% of regfile write accesses and 25% of regfile read accesses are Register 0. Remove the zero cells from the register file and provide zero input to the bypass mux. Avoid bitline swings on reading R0 and writing R0. 3% total area increase and no access time penalty. Saving ranges from 7% to 17% with an average of 14%.

12 Split Bitline The 8 most popular register account for 83% of all regfile accesses. Particular registers are always accessed more frequently than others. (MIPS assembler conventions) Use n-type transistor to separate the two partitions. 2% total area increase and 3% delay penalty to access the least-frequently-used registers. Saving ranges from 11% to 13% with an average of 12%.

13 Read Caching In some cases, two successive instructions read the same register from the register file. –add r4, r1, r6 –xor r9, r1, r2 Not clocking the latch and not reading the register file for the second instruction. 9% of accesses to the rs latch can be supplied via read cashing and only 2% for rt and sd latches. 1% total area increase with no access time penalty. Only 1% energy saving due to the high control logic overhead.

14 Conclusion An energy saving of 59% can be achieved by combining these seven methods.