Hamid Noori*, Farhad Mehdipour†, Norifumi Yoshimatsu‡,

A Reconfigurable Functional Unit for an Adaptive Dynamic Extensible Processor

Hamid Noori*, Farhad Mehdipour†, Norifumi Yoshimatsu‡, Kazuaki Murakami*, Koji Inoue* and Morteza Saheb Zamani†
*Department of Informatics, Kyushu Univ., Japan
‡Fukuoka Laboratory for Emerging & Enabling Technology of SoC, Japan
†Computer Engineering and Information Technology Department, Amirkabir Univ. of Technology, Iran
E-mail: noori@c.csce.kyushu-u.ac.jp, nyoshimatsu@fleets.jp, {murakami,inoue}@i.kyushu-u.ac.jp, {mehdipur,szamani}@aut.ac.ir

Operation Modes

Normal mode
  Profiling (optional)
  Executing Custom Instructions on the RFU and the other parts of the code on the base processor
Training mode
  Profiling
  Detecting start addresses of Hot Basic Blocks (HBBs)
  Generating Custom Instructions
  Generating configuration data for the RFU
  Binary rewriting
  Initializing the Sequencer Table
  ♦ Online: needs simple hardware for profiling; all tasks are run on the base processor
  ♦ Offline: needs a PC trace after taken branches/jumps

General Overview of the Architecture

Adaptive Dynamic Extensible Processor
  Base Processor: N-way in-order general RISC (Reg File, Fetch, Decode, Execute, Memory, Write)
  Augmented Hardware:
    Profiler: detects start addresses of Hot Basic Blocks (HBBs)
    RFU: executes Custom Instructions
    Sequencer: switches between the main processor and the RFU
Training Mode: detecting start addresses of HBBs; running tools for generating Custom Instructions, generating configuration data for the ACC and initializing the Sequencer Table
Normal Mode: monitoring the PC and switching between the main processor and the ACC

[Figure: applications go through binary-level profiling (processor + profiler), binary rewriting, and executing CIs (processor + RFU + sequencer)]

Tool Chain

Custom instructions
1- Exclude floating-point, multiply, divide and load instructions
2- Include at most one STORE, at most one BRANCH/JUMP and all other fixed-point instructions

Generating Custom Instructions
  Finding the biggest sequence of instructions in the HBB that can be executed on the ACC
  Moving supportable instructions and appending them to the head of the detected instruction sequence, after checking flow-dependency and anti-dependency
  Moving supportable instructions and appending them to the tail of the detected instruction sequence, after checking flow-dependency and anti-dependency
  Rewriting the object code if instructions have been moved
Moving instructions should not modify the logic of the application.
Custom instruction generation is done without considering any other constraints.

  4052c0  addiu $29,$29,-32
  4052c8  mov.d $f0,$f12
  4052d0  sw    $18,24($29)
  4052d8  addu  $18,$0,$6
  4052e0  sw    $31,28($29)
  4052e8  sw    $16,16($29)
  4052f0  mfc1  $16,$f0
  4052f8  mfc1  $17,$f1
  405300  srl   $6,$17,0x14
  405308  andi  $6,$6,2047
  405310  sltiu $2,$6,2047
  405318  addu  $6,$6,$18
  405320  sltiu $2,$6,2047
  405328  lui   $2,32783
  405330  and   $17,$17,$2
  405338  andi  $2,$6,2047
  405340  sll   $2,$2,0x14
  405348  or    $17,$17,$2
  405350  mtc1  $16,$f0
  405358  mtc1  $17,$f1
  405360  lw    $31,28($29)
  405370  lw    $16,16($29)
  405378  addiu $29,$29,32
  405380  jr    $31

Speedup
[Figure: speedup results]

RFU Architecture
[Figure: RFU architecture. Legend: functional unit, base connection, optimized connection]

Integrating RFU with the Base Processor
[Figure: labels include input from register file (Reg0 … Reg31), DEC/EXE pipeline registers, FU1, FU2, FU3, FU4, ACC, Sequencer, Config Mem, Decoder, RFU, EXE/MEM pipeline registers, output to register file]
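For the offline training mode, the slides only state that a PC trace after taken branches/jumps is needed to detect the start addresses of Hot Basic Blocks. A minimal Python sketch of that counting step follows, assuming the trace is simply a list of branch/jump target PCs; the function name `detect_hbb_starts` and the `min_fraction` hotness threshold are hypothetical and are not taken from the slides.

```python
from collections import Counter
from typing import Iterable, List


def detect_hbb_starts(taken_targets: Iterable[int], min_fraction: float = 0.05) -> List[int]:
    """Return candidate HBB start addresses, hottest first.

    taken_targets: PCs observed right after a taken branch or jump.
    Each such PC starts a basic block, so its count approximates how
    often that block is entered. min_fraction is a hypothetical hotness
    threshold, not a value from the slides.
    """
    counts = Counter(taken_targets)
    total = sum(counts.values())
    if total == 0:
        return []
    hot = [(pc, n) for pc, n in counts.items() if n / total >= min_fraction]
    # Most frequently entered blocks first; ties broken by address.
    hot.sort(key=lambda item: (-item[1], item[0]))
    return [pc for pc, _ in hot]


if __name__ == "__main__":
    # Hypothetical trace: two hot targets and one cold one.
    trace = [0x4052C0] * 60 + [0x405300] * 30 + [0x400100] * 2
    print([hex(pc) for pc in detect_hbb_starts(trace)])  # ['0x4052c0', '0x405300']
```

Counting only taken-branch targets misses blocks entered by fall-through, so a hardware profiler could well rank blocks differently from this sketch.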
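The custom-instruction rules (no floating-point, multiply, divide or load instructions; at most one STORE and at most one BRANCH/JUMP) suggest one way to find the biggest supportable run in an HBB: a sliding window over the instruction mnemonics. The sketch below assumes that approach; the opcode sets are a hypothetical, partial MIPS classification that only covers the example listing, and `longest_ci_window` is an illustrative name rather than the authors' tool.

```python
from typing import Sequence, Tuple

# Hypothetical, partial opcode classes (enough for the example HBB).
EXCLUDED = {"lw", "lh", "lhu", "lb", "lbu",             # loads
            "mult", "multu", "div", "divu",             # multiply/divide
            "mov.d", "mfc1", "mtc1", "add.d", "mul.d"}  # floating point
STORES = {"sw", "sh", "sb"}
BRANCHES = {"beq", "bne", "blez", "bgtz", "j", "jal", "jr", "jalr"}


def longest_ci_window(mnemonics: Sequence[str]) -> Tuple[int, int]:
    """Return (start, end) of the longest run [start, end) with no excluded
    instruction, at most one store and at most one branch/jump."""
    best = (0, 0)
    left = 0
    stores = branches = 0
    for right, op in enumerate(mnemonics):
        if op in EXCLUDED:
            # An excluded instruction can never be inside a window.
            left, stores, branches = right + 1, 0, 0
            continue
        stores += op in STORES
        branches += op in BRANCHES
        # Shrink from the left until the store/branch limits hold again.
        while stores > 1 or branches > 1:
            dropped = mnemonics[left]
            stores -= dropped in STORES
            branches -= dropped in BRANCHES
            left += 1
        if right + 1 - left > best[1] - best[0]:
            best = (left, right + 1)
    return best


if __name__ == "__main__":
    hbb = ["addiu", "mov.d", "sw", "addu", "sw", "sw", "mfc1", "mfc1",
           "srl", "andi", "sltiu", "addu", "sltiu", "lui", "and", "andi",
           "sll", "or", "mtc1", "mtc1", "lw", "lw", "addiu", "jr"]
    print(longest_ci_window(hbb))  # (8, 18)
```

On the example listing this selects srl through or, the ten fixed-point instructions between the mfc1 pair and the mtc1 pair.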
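Appending instructions to the head or tail of the detected sequence means moving them across the instructions in between, which is safe only if no dependency gets reordered. The sketch below models each instruction by the registers it writes and reads; it checks flow (RAW) and anti (WAR) dependencies as the slides describe, and additionally output (WAW) dependencies, which the slides do not mention. Treating all of memory as one pseudo-register `MEM` is a conservative assumption, and `make_inst`, `conflicts` and `can_move_across` are hypothetical names.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Inst(NamedTuple):
    text: str
    dst: FrozenSet[str]  # registers (or "MEM") written
    src: FrozenSet[str]  # registers (or "MEM") read


def make_inst(text: str, dst: Iterable[str], src: Iterable[str]) -> Inst:
    return Inst(text, frozenset(dst), frozenset(src))


def conflicts(a: Inst, b: Inst) -> bool:
    """True if swapping a and b could change the result: a flow (RAW),
    anti (WAR) or output (WAW) dependency links them."""
    return bool(a.dst & b.src or a.src & b.dst or a.dst & b.dst)


def can_move_across(inst: Inst, between: Iterable[Inst]) -> bool:
    """True if inst can be moved past every instruction in `between`."""
    return not any(conflicts(inst, other) for other in between)


if __name__ == "__main__":
    # Head: move addu $18,$0,$6 down to just before srl $6,$17,0x14.
    addu18 = make_inst("addu $18,$0,$6", ["$18"], ["$0", "$6"])
    head_gap = [
        make_inst("sw $31,28($29)", ["MEM"], ["$31", "$29"]),
        make_inst("sw $16,16($29)", ["MEM"], ["$16", "$29"]),
        make_inst("mfc1 $16,$f0", ["$16"], ["$f0"]),
        make_inst("mfc1 $17,$f1", ["$17"], ["$f1"]),
    ]
    print(can_move_across(addu18, head_gap))  # True

    # Tail: move addiu $29,$29,32 up to just after or $17,$17,$2.
    addiu29 = make_inst("addiu $29,$29,32", ["$29"], ["$29"])
    tail_gap = [
        make_inst("mtc1 $16,$f0", ["$f0"], ["$16"]),
        make_inst("mtc1 $17,$f1", ["$f1"], ["$17"]),
        make_inst("lw $31,28($29)", ["$31"], ["$29", "MEM"]),
        make_inst("lw $16,16($29)", ["$16"], ["$29", "MEM"]),
    ]
    print(can_move_across(addiu29, tail_gap))  # False: lw reads $29 first
```

With instructions from the example listing, addu $18,$0,$6 can move past the two stores and the two mfc1 instructions to join the head of the srl … or run, while addiu $29,$29,32 cannot join the tail because lw $31,28($29) reads $29 before it.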
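In normal mode the sequencer monitors the PC and switches between the base processor and the RFU. One way to picture the Sequencer Table is a map from a custom instruction's start address to the RFU configuration to use and the PC at which the base processor resumes. Both fields (`config_id`, `resume_pc`) are assumptions for illustration, as is the fixed resume point, which ignores custom instructions that end in a branch. The 8-byte step matches the spacing of addresses in the example listing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INST_BYTES = 8  # address step between instructions in the example listing


@dataclass(frozen=True)
class SequencerEntry:
    config_id: int  # hypothetical: RFU configuration to load from config memory
    resume_pc: int  # hypothetical: PC where the base processor continues


def next_step(pc: int, table: Dict[int, SequencerEntry]) -> Tuple[str, Optional[int], int]:
    """One sequencer decision: (unit, config_id, next_pc)."""
    entry = table.get(pc)
    if entry is None:
        return ("base", None, pc + INST_BYTES)  # ordinary instruction
    return ("rfu", entry.config_id, entry.resume_pc)  # custom instruction


def dispatch_trace(start_pc: int, end_pc: int,
                   table: Dict[int, SequencerEntry]) -> List[Tuple[str, str, Optional[int]]]:
    """Follow straight-line code from start_pc up to end_pc (exclusive),
    recording which unit handles each step."""
    steps = []
    pc = start_pc
    while pc < end_pc:
        unit, config_id, next_pc = next_step(pc, table)
        steps.append((hex(pc), unit, config_id))
        pc = next_pc
    return steps


if __name__ == "__main__":
    # Hypothetical entry: the srl ... or run (0x405300 to 0x405348) as one CI.
    table = {0x405300: SequencerEntry(config_id=0, resume_pc=0x405350)}
    for step in dispatch_trace(0x4052F0, 0x405360, table):
        print(step)
```

Here the base processor handles the two mfc1 instructions, the RFU replaces the ten instructions from 0x405300 to 0x405348 in one step, and the base processor resumes at the mtc1 at 0x405350.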