NISC: No-Instruction-Set Computer


NISC: No-Instruction-Set Computer, by Saleh Shakhsi Khazeni

Contents: What is NISC? Why NISC? Implementing methods (software implementation, hardware implementation, ASIP implementation, NISC implementation). Case study: designing custom hardware for DCT. NISC benefits. References.

What is NISC?

Why NISC? Compared to a general-purpose CPU: 7 times speedup, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction.
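The energy figure is tied to the other two, because energy is power multiplied by execution time. A minimal sketch of that relation using the ratios quoted on this slide; the function name is illustrative, and the slides do not say how each figure was measured:

```python
def energy_savings(speedup: float, power_reduction: float) -> float:
    """Energy = power * time, so the two savings ratios multiply."""
    return speedup * power_reduction


# Ratios quoted on the slide, relative to a general-purpose CPU.
estimate = energy_savings(speedup=7.0, power_reduction=1.64)
print(f"estimated energy savings: {estimate:.2f}x")
```

The product comes out at roughly 11.5, in the same range as the quoted 12.5 times energy savings; the slides do not explain the remaining difference.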

Implementing methods: software implementation, hardware implementation, ASIP implementation, NISC implementation.

Software implementation: general-purpose CPUs. Flexible (+), low cost (+), short time-to-market (+), low performance (-), high energy consumption (-).

Hardware implementation: Application-Specific Integrated Circuits (ASICs). Not flexible (-), high cost (-), long time-to-market (-), high performance (+), low energy consumption (+).

ASIP implementation: Application-Specific Instruction-set Processors (ASIPs). One ALU and some custom functional units on a CPU. Requires a compiler to generate custom instructions and a decoder to decode them.

General Overview of ASIP Architecture
Example instruction sequence (address, instruction):
    400680 subiu $25,$25,1
    400688 lbu $13,0($7)
    400690 lbu $2,0($4)
    400698 sll $2,$2,0x18
    4006a0 sra $14,$2,0x18
    4006a8 addiu $4,$4,1
    4006b0 srl $8,$2,0x1c
    4006b8 sll $2,$8,0x2
    4006c0 addu $2,$2,$25
    4006c8 lw $2,0($2)
    4006d0 xori $13,$13,1
    4006d8 addu $10,$10,$2
    4006e0 bgez $10,4006f0
Figure labels: Register File, ID/EXE Reg, CFU, ALU, MUX, EXE/MEM Reg; GPP, Augmented HW.
GPP: General Purpose Processor. CFU: Custom Functional Unit.
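As an illustration of what a custom functional unit can buy, the sll/sra pair in the sequence above (shift left by 0x18, then arithmetic shift right by 0x18) sign-extends a byte, and a single custom operation could produce the same value. The sketch below models both on 32-bit registers. The name custom_sign_extend8 is hypothetical, and the slide does not say which instructions its CFU actually covers:

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, n: int) -> int:
    """Shift left logical on a 32-bit register."""
    return (x << n) & MASK32


def sra(x: int, n: int) -> int:
    """Shift right arithmetic on a 32-bit register."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32


def two_instructions(byte: int) -> int:
    # sll $2,$2,0x18 followed by sra $14,$2,0x18, as in the sequence above.
    return sra(sll(byte, 0x18), 0x18)


def custom_sign_extend8(byte: int) -> int:
    # Hypothetical single custom operation that gives the same result.
    b = byte & 0xFF
    return (b - 0x100) & MASK32 if b & 0x80 else b


# Both versions agree for every possible byte value.
assert all(two_instructions(b) == custom_sign_extend8(b) for b in range(256))
```

Folding a dependent pair like this into one operation is the kind of gain a custom instruction aims for, at the cost of the extra compiler and decoder support noted on the previous slide.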

NISC implementation: the NISC compiler generates the controller and the control words from C code and a given datapath.
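The idea of control words driving a datapath directly can be sketched with a toy model. Everything here is an assumption made for illustration: the field names, the four-entry register file, the three ALU operations, and the hand-written control words that stand in for compiler output. It is not the format used by the NISC toolset:

```python
from typing import List, NamedTuple


class ControlWord(NamedTuple):
    # Hypothetical fields: each one drives a datapath component directly.
    rf_read_a: int   # register file read port A address
    rf_read_b: int   # register file read port B address
    alu_op: str      # ALU function select: "add", "mul" or "and"
    rf_write: int    # register file write address
    write_en: bool   # register file write enable


ALU = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "and": lambda a, b: a & b,
}


def run(control_words: List[ControlWord], regs: List[int]) -> List[int]:
    """Apply one control word per cycle; nothing is decoded along the way."""
    regs = list(regs)
    for cw in control_words:
        result = ALU[cw.alu_op](regs[cw.rf_read_a], regs[cw.rf_read_b])
        if cw.write_en:
            regs[cw.rf_write] = result
    return regs


# Hand-written control words standing in for compiler output:
# r2 = r0 * r1, then r3 = r2 + r0.
program = [
    ControlWord(0, 1, "mul", 2, True),
    ControlWord(2, 0, "add", 3, True),
]
print(run(program, [3, 4, 0, 0]))  # prints [3, 4, 12, 15]
```

Changing the datapath in this model means changing the fields of the control word, which is why the compiler needs the datapath description as an input alongside the C code.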

NISC scalability

Case study: Designing custom hardware for DCT. The Discrete Cosine Transform (DCT) and the Inverse Discrete Cosine Transform (IDCT) are important parts of the JPEG and MPEG standards. The algorithm contains two for loops and uses Add, And, Multiply and Not-equal (!=) operations. Simple general-purpose datapath (GPD).

Case study: Designing custom hardware for DCT. An optimized NISC implementation needs some transformations. Software transformation: the two for loops can be merged into one by combining the loops' counters.
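Merging two loop counters into one can be sketched as follows. The 8x8 block size, the use of a shift to recover the outer counter, and the function names are assumptions; the slides only state that the counters are combined:

```python
N = 8  # assumed block size (8x8)


def nested(n: int = N):
    """Two nested loops, each with its own counter."""
    visited = []
    for i in range(n):
        for j in range(n):
            visited.append((i, j))
    return visited


def merged(n: int = N):
    """One loop with a combined counter ij; i and j are recovered from it."""
    shift = n.bit_length() - 1  # log2(n); n is assumed to be a power of two
    mask = n - 1
    visited = []
    ij = 0
    while ij != n * n:          # a single not-equal test ends the loop
        i = ij >> shift         # upper bits give the outer counter
        j = ij & mask           # AND with a mask gives the inner counter
        visited.append((i, j))
        ij += 1
    return visited


# Both versions visit the same (i, j) pairs in the same order.
assert nested() == merged()
```

With a single counter the hardware needs one increment and one comparison per iteration instead of two of each, which fits the And and Not-equal operations listed for the algorithm.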

Case study: Designing custom hardware for DCT. Initial custom datapath, CDCT1: operation chaining reduces register file (RF) accesses, which improves energy consumption and performance.
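Why chaining reduces RF accesses can be seen by counting them for a multiply followed by an add. The register file model, the register numbering and the function names below are illustrative assumptions, not the CDCT1 design itself:

```python
class RegisterFile:
    """Register file that counts its own read and write accesses."""

    def __init__(self, values):
        self.values = list(values)
        self.accesses = 0

    def read(self, idx):
        self.accesses += 1
        return self.values[idx]

    def write(self, idx, value):
        self.accesses += 1
        self.values[idx] = value


def mac_unchained(rf):
    # Step 1: the product is written back to the register file (r3 = r0 * r1).
    rf.write(3, rf.read(0) * rf.read(1))
    # Step 2: it is read again for the addition (r2 = r2 + r3).
    rf.write(2, rf.read(2) + rf.read(3))


def mac_chained(rf):
    # The multiplier output feeds the adder directly (r2 = r2 + r0 * r1).
    product = rf.read(0) * rf.read(1)
    rf.write(2, rf.read(2) + product)


a = RegisterFile([3, 4, 5, 0])
b = RegisterFile([3, 4, 5, 0])
mac_unchained(a)
mac_chained(b)
print(a.accesses, b.accesses)  # prints 6 4
```

Both versions leave the same result in r2, but the chained one skips the write and re-read of the intermediate product.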

Case study: Designing custom hardware for DCT. CDCT2: bus customization. All the global buses are replaced with point-to-point connections, and a pipeline register is added to the datapath.

Case study: Designing custom hardware for DCT. CDCT3: simplify the ALU and comparator by eliminating the unused parts of the ALU, comparator and RF. CDCT4 and CDCT5: controller pipelining, adding control word (CW) and status registers.

Case study: Designing custom hardware for DCT. CDCT6: bit-width reduction. Because the address-calculation pipeline stage does not need 16-bit operations, the bit width of the RF, OR, ALU and Comp can be reduced to 8 bits.
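A quick way to check such a width reduction is to compute how many bits the largest address needs. The memory layout here (three 8x8 arrays stored back to back) is an assumption made only for the example; the slides do not give the actual address range:

```python
def bits_needed(max_value: int) -> int:
    """Smallest unsigned width that can hold max_value."""
    return max(1, max_value.bit_length())


# Assumption: three 8x8 arrays (two inputs, one output) stored back to back.
BLOCK = 8
NUM_ARRAYS = 3
largest_address = NUM_ARRAYS * BLOCK * BLOCK - 1  # 191

print(bits_needed(largest_address))  # prints 8
```

Under that assumption every address fits in 8 bits, so 16-bit address arithmetic would carry eight unused bits through each component.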

Case study: Designing custom hardware for DCT. Comparison of performance, power, energy and area of the NISC designs.

Case study: Designing custom hardware for DCT. Total power consumption: in CDCT4 the power consumption increases because of (1) the higher clock frequency and the larger number of pipeline registers, and (2) the higher logic power due to the CW register gates.

Case study: Designing custom hardware for DCT. Execution time, power, energy and area of the designs.

NISC benefits: easy hardware description using C code; eliminates the complexity of controller design; better performance; lower power; less area; high speedup through more pipelining.

References:
1. NISC Technology home page, www.ics.uci.edu/~nisc
2. Daniel D. Gajski, "NISC: The Ultimate Reconfigurable Component", CECS Technical Report TR 03-28
3. M. Reshadi, B. Gorjiara, D. Gajski, "NISC Technology and Preliminary Results", CECS Technical Report 05-11, August 2005
4. Mehrdad Reshadi and Daniel Gajski, "NISC Modeling and Compilation", CECS Technical Report 04-33, December 2004