An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon.

Slides:

Advertisements

Similar presentations

ECE-777 System Level Design and Automation Hardware/Software Co-design

Advertisements

Parallell Processing Systems1 Chapter 4 Vector Processors.

Floating-Point FPGA (FPFPGA) Architecture and Modeling (A paper review) Jason Luu ECE University of Toronto Oct 27, 2009.

Extensible Processors. 2 ASIP Gain performance by:  Specialized hardware for the whole application (ASIC). −  Almost no flexibility. −High cost.  Use.

University of Michigan Electrical Engineering and Computer Science 1 Reducing Control Power in CGRAs with Token Flow Hyunchul Park, Yongjun Park, and Scott.

Azad Madni Professor Director, SAE Program Viterbi School of Engineering Platform-based Engineering: Rapid, Risk-mitigated Development.

1 Enhancing a Reconfigurable Instruction Set Processor with Partial Predication and Virtual Opcode Support Nikolaos Vassiliadis, George Theodoridis and.

Architecture and Compilation for Reconfigurable Processors Jason Cong, Yiping Fan, Guoling Han, Zhiru Zhang Computer Science Department UCLA Nov 22, 2004.

Chapter 13 Embedded Systems

The Effect of Data-Reuse Transformations on Multimedia Applications for Different Processing Platforms N. Vassiliadis, A. Chormoviti, N. Kavvadias, S.

Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis.

Trend towards Embedded Multiprocessors Popular Examples –Network processors (Intel, Motorola, etc.) –Graphics (NVIDIA) –Gaming (IBM, Sony, and Toshiba)

Author: D. Brooks, V.Tiwari and M. Martonosi Reviewer: Junxia Ma

Educational Computer Architecture Experimentation Tool Dr. Abdelhafid Bouhraoua.

Dynamic Hardware Software Partitioning A First Approach Komal Kasat Nalini Kumar Gaurav Chitroda.

Reduced Instruction Set Computers (RISC) Computer Organization and Architecture.

COMPUTER ORGANIZATIONS CSNB123 May 2014Systems and Networking1.

1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.

Exploring the Tradeoffs of Configurability and Heterogeneity in Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable.

Development in hardware – Why? Option: array of custom processing nodes Step 1: analyze the application and extract the component tasks Step 2: design.

03/12/20101 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010.

Yongjoo Kim*, Jongeun Lee**, Jinyong Lee*, Toan Mai**, Ingoo Heo* and Yunheung Paek* *Seoul National University **UNIST (Ulsan National Institute of Science.

Basics and Architectures

A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,

1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Archs, VHDL 3 Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.

Architectures for mobile and wireless systems Ese 566 Report 1 Hui Zhang Preethi Karthik.

COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.

Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:

Paper Review: XiSystem - A Reconfigurable Processor and System

A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.

Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.

TRIPS – An EDGE Instruction Set Architecture Chirag Shah April 24, 2008.

Automated Design of Custom Architecture Tulika Mitra

Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu

Efficient Mapping onto Coarse-Grained Reconfigurable Architectures using Graph Drawing based Algorithm Jonghee Yoon, Aviral Shrivastava *, Minwook Ahn,

HW/SW PARTITIONING OF FLOATING POINT SOFTWARE APPLICATIONS TO FIXED - POINTED COPROCESSOR CIRCUITS - Nalini Kumar Gaurav Chitroda Komal Kasat.

ASIP Architecture for Future Wireless Systems: Flexibility and Customization Joseph Cavallaro and Predrag Radosavljevic Rice University Center for Multimedia.

Mahesh Sukumar Subramanian Srinivasan. Introduction Embedded system products keep arriving in the market. There is a continuous growing demand for more.

A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor Farhad Mehdipour, H. Noori, B.

1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.

Macro instruction synthesis for embedded processors Pinhong Chen Yunjian Jiang (william) - CS252 project presentation.

designKilla: The 32-bit pipelined processor Brought to you by: Victoria Farthing Dat Huynh Jerry Felker Tony Chen Supervisor: Young Cho.

Embedding Constraint Satisfaction using Parallel Soft-Core Processors on FPGAs Prasad Subramanian, Brandon Eames, Department of Electrical Engineering,

Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs Mark Hammerquist, Roman Lysecky Department of Electrical and Computer.

A Single-Pass Cache Simulation Methodology for Two-level Unified Caches + Also affiliated with NSF Center for High-Performance Reconfigurable Computing.

NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.

NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.

1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.

An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.

Algorithm and Programming Considerations for Embedded Reconfigurable Computers Russell Duren, Associate Professor Engineering And Computer Science Baylor.

A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator Farhad Mehdipour, Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki.

Jason Li Jeremy Fowers 1. Speedups and Energy Reductions From Mapping DSP Applications on an Embedded Reconfigurable System Michalis D. Galanis, Gregory.

Design Space Exploration for a Coarse Grain Accelerator Farhad Mehdipour, Hamid Noori, Morteza Saheb Zamani*, Koji Inoue, Kazuaki Murakami Kyushu University,

IT-SOC 2002 © 스마트 모빌 컴퓨 팅 Lab 1 RECONFIGURABLE PLATFORM DESIGN FOR WIRELESS PROTOCOL PROCESSORS.

High Performance, Low Power Reconfigurable Processor for Embedded Systems Farhad Mehdipour, Hamid Noori, Koji Inoue, Kazuaki Murakami Kyushu University,

EKT303/4 Superscalar vs Super-pipelined.

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke

WARP PROCESSORS ROMAN LYSECKY GREG STITT FRANK VAHID Presented by: Xin Guan Mar. 17, 2010.

Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology Presented by: Luis Ortiz Department of Computer.

1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.

Sunpyo Hong, Hyesoon Kim

CML Path Selection based Branching for CGRAs ShriHari RajendranRadhika Thesis Committee : Prof. Aviral Shrivastava (Chair) Prof. Jennifer Blain Christen.

Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.

The Effect of Data-Reuse Transformations on Multimedia Applications for Application Specific Processors N. Vassiliadis, A. Chormoviti, N. Kavvadias, S.

Floating-Point FPGA (FPFPGA)

Evaluating Register File Size

Application-Specific Customization of Soft Processor Microarchitecture

Jason Cong, Guoling Han, Zhiru Zhang VLSI CAD Lab

Jinquan Dai, Long Li, Bo Huang Intel China Software Center

Application-Specific Customization of Soft Processor Microarchitecture

Presentation transcript:

An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon Nikolaidis Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, Greece Aristotle University of Thessaloniki, Greece AUTH Introduction  Broad diversity of algorithms  Rapid evolution of standards  High-performance demands Characteristics of modern applications Requirements to amortize cost over high production volumes  High levels of flexibility => fast time-to-market  Adaptability => increased reusability An appealing option: Reconfigurable Instruction Set Processor (RISP)  Couple a standard processor with Reconfigurable Hardware (RH)  Processor = bulk of the flexibility  RH = adaptation to the targeted application through dynamic Instruction Set Extensions (ISEs) Motivation New features in order to program such hybrid architectures  Partition the application between the RH and the processor  Configure the RH and feed the compiler with RH related information (e.g. execution and reconfiguration latencies)  Introduce extra code to pass data and control directives to and from the RH Transparent incorporation of these new features is a must  In order to preserve the time-to-market close to that of the traditional software design flow  and continue to target software-oriented groups of users with small or no experience in hardware design Objective Design a development framework for a dynamic Reconfigurable Instruction Set Processor (RISP)  Fully automated  Hides all RH related issues requiring limited interaction with the user  Appropriate for exploration and fine-tune of the architecture towards a target application  Allow different values for various architectural parameters  Retargetable to different instances of the architecture Target Architecture Core processor  Single issue, 5 Pipeline stages, 32-bit RISC  Can issue one reconfigurable instruction per execution cycle  Reconfigurable instructions explicitly encoded in the instruction word Interface  Provides control and data communication Reconfigurable Functional Unit (RFU)  Tightly coupled to the core’s pipeline  Provides reconfigurable ISE  ISE = MISOs of the cores primitive instructions MISO: Multiple-Input-Single-Output  Multi-context config. mem  Can provide a different config. bitstream per execution cycle  1-D array of coarse-grain PEs  “Floating” between two concurrent pipeline stages  Exploit spatial/temporal computation Conclusions  An automated development framework for a target RISP has been introduced  The framework can be used both for fine-tune an instance of the RISP at design time but also to program it after fabrication  The framework hides all reconfigurable hardware related issues from the user  The target RISP architecture can be used by any software-oriented user with no grip of hardware design  Support of a new architecture instance by the framework is possible without any modification Acknowledgement This work was supported by the General Secretariat of Research and Technology of Greece and the European Union. Framework usage demonstration - Experimental Results Speedup vs. number of PEs Speedup vs. PEs/MISO inputs Speedup vs. number of reconfigurable instructions Code size and instruction fetches reduction 16 words of 134 bits Config. Memory Size ALU, Shift, Multiply PEs Functionality 8 Num of PEs 32-bits Granularity Value Configuration Exploration for fine-tuning of critical architecture parameters  A set of benchmarks derived from different benchmarking suites like MiBench and Powerstone were considered  The possibilities of using the framework for fine-tune critical architectural parameters and/or to program/evaluate a specific instance of the architecture are demonstrated  Comparisons results were performed considering the RISC processor of the architecture with and without support of the RFU unit Configuration of an evaluation instance Evaluation of the derived instance  x2.91 avg. speedup  38% avg. code size reduction => less memory requirements  62% avg. instruction memory accesses reduction => significant power savings Development Framework Flow  Instrument the CDFG with profiling annotations  Convert CDFG to equivalent C code  Compile and execute  Pattern generation to identify MISO cluster of operations  Mapping  Assign operations PEs / Route the 1-D array  Analyze data paths to minimize delay  Report candidate instruction semantics  Maximum number of inputs in the pattern (i.e. # of reg.file read ports)  Permitted types of operations in the pattern (i.e. PE functionality)  Maximum number of operations in the patterns (i.e. # of PEs)  Number of different ISEs  Graph isomorphism to discover identical instructions  Rank instructions based on the estimated offered speedup  Select the best ISEs