An Instruction Buffer for a Low-Power DSP
Mike Lewis, AMULET group

A low-power DSP architecture
- Targeted at digital mobile phones: a microprocessor + DSP combination
- Multi-level power reduction strategy:
  - Asynchronous
  - Large register file
  - Parallel structure
  - Parallel instructions cached

A low-power DSP architecture: pipeline walkthrough
[Figure: DSP block diagram with Fetch, Buffer, Decode, Index reg., Operand, ALU and Load-store unit; Register Bank (2x128x16 bit); P mem, X/Y mem and VLIW mem; interrupt inputs int0, int1, nmi; index register values and opcode paths]
1. Fetch unit: autonomous instruction fetch
2. Instruction buffer: 32-entry FIFO
3. Decode instruction, read VLIW operand
4. Substitute and update index registers
5. Read registers and VLIW opcode
6. Perform operation
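Because the fetch unit fetches autonomously into the 32-entry buffer, it can run ahead of a stalled decode stage. The step model below is a minimal sketch of that decoupling, assuming one fetch and one decode per step and a decode stage stalled for a hypothetical number of steps; `simulate`, `BUFFER_DEPTH` and the parameters are illustrative names, not part of the design.

```python
from collections import deque
from typing import List, Tuple

BUFFER_DEPTH = 32  # instruction buffer size given on the slides

def simulate(program: List[int], decode_stall_steps: int, steps: int) -> Tuple[List[int], int, int]:
    """Unit-step sketch: an autonomous fetch unit feeding a bounded FIFO.

    Returns (decoded instructions, buffer occupancy at the end, next fetch address).
    """
    buffer: deque = deque()
    pc = 0
    decoded: List[int] = []
    for t in range(steps):
        # Fetch unit: runs ahead on its own whenever the buffer has room.
        if pc < len(program) and len(buffer) < BUFFER_DEPTH:
            buffer.append(program[pc])
            pc += 1
        # Decode: (hypothetically) stalled for the first decode_stall_steps steps.
        if t >= decode_stall_steps and buffer:
            decoded.append(buffer.popleft())
    return decoded, len(buffer), pc
```

With decode stalled for 40 steps, fetch stops once the buffer holds 32 instructions, then resumes at one per step as decode drains it.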

The instruction buffer
- Stores pre-fetched instructions
- Performs hardware-based loops:
  - Instructions are read from memory into the buffer
  - Subsequent iterations use the stored copies
  - The buffer manages the loop counter
- Capacity: 32 instructions, with up to 16 nested loops
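The loop mechanism can be illustrated with a behavioural sketch that counts where each executed instruction comes from. It assumes a hypothetical program representation (a string is one instruction, a tuple `("loop", count, body)` is a hardware loop) and ignores any buffer entry the loop instruction itself might occupy; only the 32-entry and 16-level limits come from the slide.

```python
from typing import List, Tuple, Union

MAX_ENTRIES = 32  # buffer capacity from the slide
MAX_DEPTH = 16    # maximum loop nesting from the slide

# Hypothetical representation: str = one instruction, ("loop", count, body) = hardware loop.
Item = Union[str, Tuple[str, int, list]]

def body_size(items: List[Item]) -> int:
    """Number of buffer entries a (possibly nested) loop body occupies."""
    return sum(1 if isinstance(item, str) else body_size(item[2]) for item in items)

def run(items: List[Item], stats: dict, from_buffer: bool = False, depth: int = 0) -> List[str]:
    """Return the executed instruction trace, counting memory reads vs buffer reads."""
    trace: List[str] = []
    for item in items:
        if isinstance(item, str):
            stats["buffer" if from_buffer else "memory"] += 1
            trace.append(item)
            continue
        _, count, body = item
        if depth + 1 > MAX_DEPTH:
            raise ValueError("loop nesting exceeds the buffer limit")
        if body_size(body) > MAX_ENTRIES:
            raise ValueError("loop body does not fit in the buffer")
        for i in range(count):
            # First iteration streams from memory (unless an enclosing loop is
            # already replaying from the buffer); later iterations use stored copies.
            trace += run(body, stats, from_buffer or i > 0, depth + 1)
    return trace
```

For `["init", ("loop", 4, ["mac", "mac", ("loop", 3, ["add"])]), "done"]` the trace holds 22 instructions, of which only 5 are read from memory.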

Requirements
- Low power consumption
- Minimise latency
- Low cycle time: 25ns max

Asynchronous buffer designs
- Micropipeline
  - Very good cycle time
  - Poor latency and power consumption
[Figure: chain of latch stages, each with Ain/Rin/Aout/Rout handshake signals and a latch enable (En)]
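Why a micropipeline has good cycle time but poor latency and power can be seen in a unit-delay step model: each word is captured by every latch on its way through, so latch activity and first-word latency both grow with depth, while a new word can still enter every step. This is a minimal sketch under those simplifying assumptions; `micropipeline_transfer` is a hypothetical name and steps are not calibrated to real gate delays.

```python
from typing import List, Optional, Tuple

def micropipeline_transfer(words: List[int], depth: int = 32) -> Tuple[List[int], int, Optional[int]]:
    """Push `words` through a `depth`-stage micropipeline in unit-delay steps.

    Returns (output words, total latch captures, step at which the first word
    left the pipeline). Each step: the last stage hands its word to an
    always-ready consumer, every other word moves forward one stage if the
    next stage is empty, and a new word enters stage 0 if it is free.
    """
    stages: List[Optional[int]] = [None] * depth
    pending = list(words)
    out: List[int] = []
    latch_events = 0
    first_out_step: Optional[int] = None
    step = 0
    while pending or any(s is not None for s in stages):
        step += 1
        if stages[-1] is not None:          # output handshake with the consumer
            out.append(stages[-1])
            stages[-1] = None
            if first_out_step is None:
                first_out_step = step
        for i in range(depth - 2, -1, -1):  # tail-first, so a word moves at most once per step
            if stages[i] is not None and stages[i + 1] is None:
                stages[i + 1], stages[i] = stages[i], None
                latch_events += 1           # the word is captured again by the next latch
        if pending and stages[0] is None:   # input handshake with the producer
            stages[0] = pending.pop(0)
            latch_events += 1
    return out, latch_events, first_out_step
```

For 5 words through 32 stages the model records 160 latch captures (32 per word) and the first word emerges only at step 33.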

Asynchronous buffer designs
- Word-slice FIFO
  - Latches arranged in parallel
  - Writes disabled by ANDing the full indications
  - Read requested by ORing all read requests
[Figure: word-slice FIFO operation. Parallel slices, each a tristate latch with enable (En), output enable (OE), Full flag, wr/rd inputs and an Rd_req output; write and read tokens pass between slices; external signals Write request, Write disable, Read request, Read acknowledge, Data in, Data out]
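The word-slice organisation can be modelled as a ring of parallel slots with a circulating write token and read token. The sketch below assumes each slice's read request is "full AND holds the read token" (the slide only says the requests are ORed) and uses hypothetical class and method names. Note that every word is latched exactly once, however deep the FIFO is.

```python
class WordSliceFIFO:
    """Behavioural sketch: parallel latches with circulating write/read tokens."""

    def __init__(self, depth: int = 32):
        self.slots = [None] * depth
        self.full = [False] * depth
        self.write_token = 0   # index of the slice holding the write token
        self.read_token = 0    # index of the slice holding the read token
        self.latch_events = 0  # each word is latched once, independent of depth

    def write_disabled(self) -> bool:
        # Per the slide: writes are disabled by ANDing the full indications.
        return all(self.full)

    def read_request(self) -> bool:
        # Per the slide: the read request is the OR of the per-slice requests
        # (assumed here: a slice requests when it is full and holds the read token).
        return any(f and i == self.read_token for i, f in enumerate(self.full))

    def write(self, word) -> bool:
        if self.write_disabled():
            return False
        i = self.write_token
        self.slots[i] = word
        self.full[i] = True
        self.latch_events += 1
        self.write_token = (i + 1) % len(self.slots)  # pass the write token on
        return True

    def read(self):
        if not self.read_request():
            return None
        i = self.read_token
        word = self.slots[i]
        self.full[i] = False
        self.read_token = (i + 1) % len(self.slots)   # pass the read token on
        return word
```

Because reads and writes proceed in ring order, the slice holding the write token is full only when every slice is full, which is why ANDing the full flags is sufficient to disable writes.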

Looping behaviour
- Loops require:
  - Changing the flow of the read token
  - Preventing stages from being emptied, but making sure that they appear to be empty
[Figure: read token redirected from the loop-end stage back to the loop-start stage; Full flags and an End of loop signal]
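The change to the read-token flow can be sketched at slot level: at the loop-end slice the read token is sent back to the loop-start slice while the loop counter is non-zero, and slices are only emptied on the final pass. This is a minimal sketch assuming the loop body starts at slot 0 and the full flags are the only state; `run_buffered_loop` is a hypothetical name, and how the real design makes retained stages appear empty to the rest of the control logic is not modelled.

```python
from typing import List, Tuple

def run_buffered_loop(body: List[str], iterations: int, depth: int = 32) -> Tuple[List[str], int, List[bool]]:
    """Replay `body` `iterations` times from buffer slots; return (trace, writes, full flags)."""
    if not 1 <= len(body) <= depth:
        raise ValueError("loop body must fit in the buffer")
    if iterations < 1:
        raise ValueError("need at least one iteration")
    slots = [None] * depth
    full = [False] * depth
    for i, instr in enumerate(body):   # each instruction is written (latched) once
        slots[i] = instr
        full[i] = True
    writes = len(body)
    loop_start, loop_end = 0, len(body) - 1
    remaining = iterations             # loop counter held by the buffer
    read_token = loop_start
    trace: List[str] = []
    while True:
        trace.append(slots[read_token])
        last_pass = remaining == 1
        if last_pass:
            full[read_token] = False   # final iteration: empty the stage normally
        if read_token == loop_end:
            if last_pass:
                break
            remaining -= 1
            read_token = loop_start    # redirect the read token to the loop start
        else:
            read_token += 1
    return trace, writes, full
```

Four iterations of a three-instruction body produce a 12-instruction trace from only three writes, and every slot is empty once the loop completes.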

Evaluation
- Power efficiency, latency, cycle time: what defines 'good' performance?
- Compare with a known design:
  - A 32-entry micropipeline FIFO was chosen
  - Compare operation in non-looping mode

Evaluation
- PowerMill used to gather results
- The test harness feeds identical random instructions to both designs, at various speeds, and also checks that the outputs are correct
- Energy per transfer measured:
  - at maximum throughput for each design
  - at a rate much lower than the maximum
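The functional side of such a harness can be sketched as follows. This is a minimal sketch, not the harness actually used: it assumes two simplified behavioural models (`RingFIFO` standing in for a word-slice style buffer, `ShiftFIFO` for a micropipeline), 32-bit random words (the instruction width is an assumption), and a per-step consumer read probability standing in for "various speeds". It only checks correctness; energy, measured with PowerMill, is not modelled.

```python
import random
from collections import deque

class RingFIFO:
    """Word-slice style model: parallel slots with a write pointer and a read pointer."""
    def __init__(self, depth: int = 32):
        self.slots = [None] * depth
        self.head = self.tail = self.count = 0

    def can_write(self) -> bool:
        return self.count < len(self.slots)

    def can_read(self) -> bool:
        return self.count > 0

    def write(self, word: int) -> None:
        self.slots[self.tail] = word
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def read(self) -> int:
        word = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return word

class ShiftFIFO:
    """Micropipeline style model: words ripple stage by stage toward the output."""
    def __init__(self, depth: int = 32):
        self.stages = [None] * depth

    def can_write(self) -> bool:
        return self.stages[0] is None

    def can_read(self) -> bool:
        return self.stages[-1] is not None

    def step(self) -> None:
        for i in range(len(self.stages) - 2, -1, -1):
            if self.stages[i] is not None and self.stages[i + 1] is None:
                self.stages[i + 1], self.stages[i] = self.stages[i], None

    def write(self, word: int) -> None:
        self.stages[0] = word

    def read(self) -> int:
        word, self.stages[-1] = self.stages[-1], None
        return word

def drive(fifo, words, read_rate: float, rng: random.Random) -> list:
    """Feed `words` in and drain them; the consumer reads with probability read_rate per step."""
    out, pending = [], deque(words)
    while len(out) < len(words):
        if fifo.can_read() and rng.random() < read_rate:
            out.append(fifo.read())
        step = getattr(fifo, "step", None)
        if step is not None:
            step()
        if pending and fifo.can_write():
            fifo.write(pending.popleft())
    return out

def compare(seed: int = 1, n: int = 200) -> bool:
    """Feed the same seeded random stream to both models at several read rates."""
    rng = random.Random(seed)
    words = [rng.getrandbits(32) for _ in range(n)]   # identical stream for both designs
    for rate in (1.0, 0.5, 0.1):                      # stand-in for "various speeds"
        for make in (RingFIFO, ShiftFIFO):
            got = drive(make(), words, rate, random.Random(seed + 1))
            assert got == words, f"{make.__name__} corrupted data at rate {rate}"
    return True
```

Seeding both the data stream and the consumer's random timing makes each run reproducible, so a failure can be replayed exactly.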

Results
- Cycle time
  - 6.0ns (167MHz) for the instruction buffer
  - about 2.0ns (488MHz) for the micropipeline FIFO
  - This is the expected result: the micropipeline FIFO is known to have a good cycle time
  - The instruction buffer is well within the 25ns target
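Cycle time and maximum rate are reciprocals, which gives a quick check of the figures on this slide and of the margin against the 25ns requirement:

```latex
\begin{aligned}
f &= \frac{1}{T_{\text{cycle}}} \\
\frac{1}{6.0\,\text{ns}} &\approx 166.7\,\text{MHz} \approx 167\,\text{MHz} \\
\frac{1}{488\,\text{MHz}} &\approx 2.05\,\text{ns} \\
\frac{25\,\text{ns}}{6.0\,\text{ns}} &\approx 4.2
\end{aligned}
```

So the instruction buffer meets the requirement with roughly a factor of four to spare, and the 488MHz figure corresponds to a micropipeline cycle time of about 2.05ns.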

Results
- Latency
  - 2.7ns for the instruction buffer
  - 26ns for the micropipeline FIFO
  - Big benefit from the parallel structure

Results
- Energy consumption per transfer
  - Maximum speed: 0.32nJ for the instruction buffer, 0.67nJ for the micropipeline FIFO
  - 50MHz (well below maximum): 0.48nJ for the instruction buffer, 0.77nJ for the micropipeline FIFO
- The instruction buffer consumes 48%-62% of the energy of the simpler micropipeline
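The relative figures quoted on the results slides follow directly from the reported numbers. The short script below only re-derives them (no new data); the dictionary layout and `relative_energy` are illustrative.

```python
# Figures copied from the results slides (latency in ns, energy in nJ per transfer).
latency_ns = {"instruction buffer": 2.7, "micropipeline FIFO": 26.0}
energy_nj = {
    "maximum speed": {"instruction buffer": 0.32, "micropipeline FIFO": 0.67},
    "50MHz": {"instruction buffer": 0.48, "micropipeline FIFO": 0.77},
}

def relative_energy(point: str) -> float:
    """Instruction-buffer energy as a fraction of the micropipeline FIFO's at one operating point."""
    e = energy_nj[point]
    return e["instruction buffer"] / e["micropipeline FIFO"]

# How many times longer the micropipeline FIFO takes to pass a word through.
latency_ratio = latency_ns["micropipeline FIFO"] / latency_ns["instruction buffer"]

if __name__ == "__main__":
    for point in energy_nj:
        print(f"{point}: {relative_energy(point):.0%}")  # 48% and 62%
    print(f"latency ratio: {latency_ratio:.1f}x")         # 9.6x
```

The 48% figure comes from the maximum-speed measurements and the 62% figure from the 50MHz measurements; the latency advantage is close to a factor of ten.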

Conclusions
- Cycle time well within specification
- Good latency achieved
- Low power consumption
  - Outperforms the much simpler FIFO design
  - A study on the full extracted layout suggests the word-slice FIFO is still better with wiring added [13]