A Floating Point Divider for Complex Numbers in the NIOS II Presented by John-Marc Desmarais Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel.

A Floating Point Divider for Complex Numbers in the NIOS II Presented by John-Marc Desmarais Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro and Miodrag Bolic {dshap092,

Overview
- Floating point division
- Instruction Set Extensions (ISE)
- NIOS II processor
- Instruction hardware
- Software interface
- Experiment
- Conclusion
carg.site.uottawa.ca CARG 2010

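Floating Point Division
Unlike real multiplication and real division, mathematical operations on complex numbers are usually provided by slow software routines. Consider complex division:
(a + bi) / (c + di) = [(ac + bd) + (bc - ad)i] / (c^2 + d^2)
Computed in software, this takes six multiplications, three additions/subtractions and two divisions, which makes it slow.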
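For reference, a minimal C sketch of the software path that the slide calls slow: the textbook complex-division formula written out with plain `float` arithmetic. The type `cfloat` and the function name `cdiv_sw` are our own illustrative names, not part of the original design. If the processor has no floating-point hardware, each of these operations additionally becomes a software emulation call.

```c
/* Illustrative complex type: real and imaginary parts as single-precision floats. */
typedef struct {
    float re;
    float im;
} cfloat;

/* Textbook complex division (a + bi) / (c + di):
 *   re = (ac + bd) / (c^2 + d^2)
 *   im = (bc - ad) / (c^2 + d^2)
 * Six multiplications, three additions/subtractions and two divisions.
 * No scaling is performed, so very large or very small operands can
 * overflow or underflow in the c^2 + d^2 term. */
static cfloat cdiv_sw(cfloat x, cfloat y)
{
    float a = x.re, b = x.im;
    float c = y.re, d = y.im;
    float denom = c * c + d * d;
    cfloat q;
    q.re = (a * c + b * d) / denom;
    q.im = (b * c - a * d) / denom;
    return q;
}
```

For example, (1 + 2i) / (3 + 4i) = (11 + 2i) / 25 = 0.44 + 0.08i. Production libraries often use a scaled variant (e.g. Smith's algorithm) to avoid the overflow noted in the comment; the plain form is shown because it matches the operation count above.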

Floating Point Division
Fast complex dividers are needed by a growing number of applications, such as signal-processing systems for image and audio manipulation, GPS, and multi-antenna systems. Example: STSDAS offers math libraries for image analysis, including stsdas.analysis.fourier.carith, which is used to multiply or divide two complex images [1].

Instruction Set Extensions
Instruction-set extensions (ISE), as the name implies, add custom instructions to a processor's instruction set. Many commercial processors allow such internal custom instructions to be added:
1. Tensilica Xtensa (VLIW)
2. Altera NIOS II
3. Xilinx MicroBlaze
4. MIPS CorExtend
In recent years there has been much research into the automatic identification of instruction-set extensions.

Instruction Set Extensions
These automated approaches vary. Some work at the functional C level of the program, identifying hotspot functions. Others work at a lower level, treating the program as data-flow and control-flow graphs.
[Figure: ISE design flow: modify ISA, add custom hardware, modify compiler, assembler and linker, regenerate custom program]
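A hedged sketch of the lower-level view: complex division written as a small data-flow graph (DFG), plus a pass that counts its external inputs, outputs and operations. The node layout and the names `dfg_node`, `cdiv_dfg` and `dfg_count` are our own illustration, not the representation used by any particular identification tool.

```c
#include <stddef.h>

/* Node kinds in a tiny data-flow graph. */
typedef enum { OP_IN, OP_MUL, OP_ADD, OP_SUB, OP_DIV } op_kind;

typedef struct {
    op_kind kind;
    int src0, src1;   /* indices of operand nodes; -1 for external inputs */
    int is_output;    /* 1 if the value leaves the candidate instruction */
} dfg_node;

/* DFG of (a + bi) / (c + di), nodes listed in topological order. */
static const dfg_node cdiv_dfg[] = {
    {OP_IN, -1, -1, 0},   /* 0: a */
    {OP_IN, -1, -1, 0},   /* 1: b */
    {OP_IN, -1, -1, 0},   /* 2: c */
    {OP_IN, -1, -1, 0},   /* 3: d */
    {OP_MUL, 0, 2, 0},    /* 4: a*c */
    {OP_MUL, 1, 3, 0},    /* 5: b*d */
    {OP_MUL, 1, 2, 0},    /* 6: b*c */
    {OP_MUL, 0, 3, 0},    /* 7: a*d */
    {OP_MUL, 2, 2, 0},    /* 8: c*c */
    {OP_MUL, 3, 3, 0},    /* 9: d*d */
    {OP_ADD, 4, 5, 0},    /* 10: ac + bd */
    {OP_SUB, 6, 7, 0},    /* 11: bc - ad */
    {OP_ADD, 8, 9, 0},    /* 12: c^2 + d^2 */
    {OP_DIV, 10, 12, 1},  /* 13: real part (output) */
    {OP_DIV, 11, 12, 1},  /* 14: imaginary part (output) */
};

typedef struct { int inputs, outputs, ops; } dfg_stats;

/* Count external inputs, outputs and operation nodes: the quantities an
 * identification pass would compare against the register-file port limits. */
static dfg_stats dfg_count(const dfg_node *g, size_t n)
{
    dfg_stats s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (g[i].kind == OP_IN)
            s.inputs++;
        else
            s.ops++;
        if (g[i].is_output)
            s.outputs++;
    }
    return s;
}
```

The counts (four inputs, two outputs, eleven operations) are what make this candidate awkward for a register file that supplies two operands and accepts one result per instruction.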

Instruction Set Extensions
An ISE candidate has limited I/O access to the register file, and the instruction width poses a further I/O barrier. Possible remedies:
1. Multiport register file
2. Register file replication
3. Shadow registers
4. Multicycle reads (Altera's NIOS II)
5. Dedicated data links (MicroBlaze)
Solution (Pozzi05): we use multicycle reads/writes from/to the register bank to squeeze several operands through the two-input, one-output register-file interface.
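To make the instruction-width barrier concrete, a sketch that packs and unpacks a NIOS II custom instruction word. We assume the field layout as we recall it from the Nios II Processor Reference (5-bit A, B and C register fields, the readra/readrb/writerc flags, an 8-bit N selector and opcode 0x32); check it against the official reference before relying on it. The names `custom_fields`, `encode_custom` and `decode_custom` are hypothetical. Only two source-register fields and one destination field fit in the 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed NIOS II custom instruction layout:
 *   bits 31..27 A       first source register (dataa)
 *   bits 26..22 B       second source register (datab)
 *   bits 21..17 C       destination register (result)
 *   bit  16     readra  1: read A from the general-purpose register file
 *   bit  15     readrb  1: read B from the general-purpose register file
 *   bit  14     writerc 1: write C to the general-purpose register file
 *   bits 13..6  N       custom instruction selector (0..255)
 *   bits  5..0  OP      0x32 for every custom instruction */
typedef struct {
    unsigned a, b, c;
    unsigned readra, readrb, writerc;
    unsigned n;
} custom_fields;

#define NIOS2_OP_CUSTOM 0x32u

static uint32_t encode_custom(custom_fields f)
{
    /* Field ranges: three register indices, three flags, one 8-bit selector. */
    assert(f.a < 32 && f.b < 32 && f.c < 32);
    assert(f.readra <= 1 && f.readrb <= 1 && f.writerc <= 1);
    assert(f.n < 256);
    return ((uint32_t)f.a << 27) | ((uint32_t)f.b << 22) | ((uint32_t)f.c << 17) |
           ((uint32_t)f.readra << 16) | ((uint32_t)f.readrb << 15) |
           ((uint32_t)f.writerc << 14) | ((uint32_t)f.n << 6) | NIOS2_OP_CUSTOM;
}

static custom_fields decode_custom(uint32_t w)
{
    custom_fields f;
    f.a = (w >> 27) & 0x1Fu;
    f.b = (w >> 22) & 0x1Fu;
    f.c = (w >> 17) & 0x1Fu;
    f.readra = (w >> 16) & 1u;
    f.readrb = (w >> 15) & 1u;
    f.writerc = (w >> 14) & 1u;
    f.n = (w >> 6) & 0xFFu;
    return f;
}
```

Because the word has room for exactly two operand registers and one result register, a four-input, two-output operation has to be split across several invocations, which is the multicycle approach described above.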

NIOS II Processor
[Figures: generic custom instruction datapath; our custom logic block]

Instruction Hardware
As the figures show, a sequence of three calls to the custom instruction implements a complex operation with four inputs and two outputs.
[Figure: timing over clock cycles]
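Why three calls? A back-of-the-envelope lower bound, under simplifying assumptions of our own (stated in the code comment), gives exactly that number for a four-input, two-output operation over a two-read, one-write interface. The function names `ceil_div` and `min_calls` are hypothetical.

```c
/* Lower bound on the number of invocations of a custom instruction that
 * reads `reads_per_call` operands and writes `writes_per_call` results per
 * call, for an operation with `inputs` operands and `outputs` results.
 * Simplifying assumptions (ours, not from the slides):
 *   - every output depends on every input, so no result is available
 *     before the call that delivers the last operand;
 *   - pending operands and results are held inside the custom logic,
 *     not in shadow or replicated registers.
 * reads_per_call and writes_per_call must be positive. */
static int ceil_div(int x, int y)
{
    return (x + y - 1) / y;
}

static int min_calls(int inputs, int outputs, int reads_per_call, int writes_per_call)
{
    /* Calls needed just to stream all operands in. */
    int load_calls = ceil_div(inputs, reads_per_call);
    if (load_calls < 1)
        load_calls = 1;   /* at least one call is always issued */
    /* The last loading call can already return writes_per_call results;
     * any remaining results need extra calls. */
    int extra_outputs = outputs - writes_per_call;
    if (extra_outputs < 0)
        extra_outputs = 0;
    return load_calls + ceil_div(extra_outputs, writes_per_call);
}
```

For complex division, `min_calls(4, 2, 2, 1)` is 2 loading calls plus 1 extra readout call, i.e. 3; a real division (`min_calls(2, 1, 2, 1)`) needs only 1.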

Instruction Hardware
[Figures: operation when n = 0 (above) and n = 1 (right)]
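The slides show the two selector values but not the exact protocol, so here is one hypothetical protocol that fits three calls with only n = 0 and n = 1, written as a host-side behavioral model. This is our assumption, not the authors' hardware: n = 0 latches the numerator and returns the imaginary part of the previous quotient; n = 1 latches the denominator, computes the quotient and returns the real part. The names `cdiv_unit` and `cdiv_unit_call` are illustrative.

```c
/* Behavioral model of a multicycle complex divider exposed as one custom
 * instruction with two operands (dataa, datab) and one result per call.
 * HYPOTHETICAL protocol (our assumption):
 *   n = 0: latch numerator (dataa = a, datab = b);
 *          result = imaginary part of the previous quotient
 *   n = 1: latch denominator (dataa = c, datab = d) and divide;
 *          result = real part of the new quotient
 * One division = n=0, n=1, then n=0 again to read the imaginary part. */
typedef struct {
    float num_re, num_im;  /* latched numerator */
    float q_im;            /* imaginary part held for the next n=0 call */
} cdiv_unit;

static float cdiv_unit_call(cdiv_unit *u, int n, float dataa, float datab)
{
    if (n == 0) {
        float prev_im = u->q_im;
        u->num_re = dataa;
        u->num_im = datab;
        return prev_im;
    }
    if (n == 1) {
        float a = u->num_re, b = u->num_im;
        float c = dataa, d = datab;
        float denom = c * c + d * d;
        u->q_im = (b * c - a * d) / denom;
        return (a * c + b * d) / denom;
    }
    return 0.0f;  /* unsupported selector in this model */
}
```

Folding the imaginary-part readout into the n = 0 call would let the third call of one division double as the first call of the next; whether the real design does this is not stated in the slides.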

Software Interface
The complex-division hardware can be used from assembly (via inline assembly) or from C/C++ code.
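A sketch of what the C side could look like. Nios II GCC exposes custom instructions through `__builtin_custom_*` builtins; this sketch uses `__builtin_custom_nff` (two float operands, no result) and `__builtin_custom_fnff` (two float operands, float result). The selector values and the three-call protocol (n = 0 latches the numerator and returns the previous imaginary part; n = 1 latches the denominator and returns the real part) are hypothetical choices of ours, not the authors' interface. The `#else` branch is a plain-C fallback so the wrapper also compiles and can be tested off-target.

```c
/* HYPOTHETICAL selector values: they must match how the divider component
 * is actually configured in the system. */
#define CDIV_N_NUM 0  /* latch numerator / read previous imaginary part */
#define CDIV_N_DEN 1  /* latch denominator / read real part */

typedef struct { float re, im; } cfloat;

#if defined(__nios2__)
/* On the Nios II target: three custom-instruction calls per division. */
static cfloat cdiv(cfloat x, cfloat y)
{
    cfloat q;
    __builtin_custom_nff(CDIV_N_NUM, x.re, x.im);           /* load a, b */
    q.re = __builtin_custom_fnff(CDIV_N_DEN, y.re, y.im);   /* load c, d; real part */
    q.im = __builtin_custom_fnff(CDIV_N_NUM, 0.0f, 0.0f);   /* imaginary part */
    return q;
}
#else
/* Off-target fallback: the same quotient computed in plain C. */
static cfloat cdiv(cfloat x, cfloat y)
{
    float denom = y.re * y.re + y.im * y.im;
    cfloat q;
    q.re = (x.re * y.re + x.im * y.im) / denom;
    q.im = (x.im * y.re - x.re * y.im) / denom;
    return q;
}
#endif
```

Callers see a single `cdiv()` function either way, so application code does not change when the hardware divider is present.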

Experiment
We used a NIOS II processor and a PLL as the starting point for the design.

Experiment

Conclusion
We designed a complex-divider instruction-set extension for the NIOS II. This instruction accelerated the execution of code that uses complex division. In the future we would like to implement additional complex operations and publish the core on OPENCORES.org. Applications can be accelerated with instruction-set extensions, and complex division is one case with a tangible benefit.