NATIONAL POLYTECHNIC INSTITUTE COMPUTING RESEARCH CENTER IPN-CICMICROSE Lab Design and implementation of a Multimedia Extension for a RISC Processor Eduardo.

Presentation transcript:

NATIONAL POLYTECHNIC INSTITUTE COMPUTING RESEARCH CENTER IPN-CICMICROSE Lab Design and implementation of a Multimedia Extension for a RISC Processor Eduardo Jonathan Martínez Montes Prof. Marco Antonio Ramírez Salinas

IPN-CICMICROSE Lab 2 OUTLINE
I. Background: 1. Motivation 2. Multimedia applications 3. State of the art
II. Problem Description: 1. Overview 2. SISD 3. SIMD 4. SISD vs SIMD 5. Saturation arithmetic 6. Example 7. Instruction format
III. Objective: 1. Main objective 2. Specific objectives
IV. Hypothesis: 1. Multimedia support (MDMX, vector to vector arithmetic)
V. Technical Merits: 1. Data path 2. Vector units

IPN-CICMICROSE Lab 3 BACKGROUND: Motivation
Lagarto is a superscalar embedded processor currently under development by the HPC research team at CIC-IPN. It is intended to support research and teaching. The processor requires many blocks to be designed and built, so this work is one part of a larger project.

IPN-CICMICROSE Lab 4 BACKGROUND: Motivation (cont.)

IPN-CICMICROSE Lab 5 BACKGROUND: Motivation (cont.)

IPN-CICMICROSE Lab 6 BACKGROUND: Multimedia applications
- Photo editing
- Video editing
- Rendering
- Video games

IPN-CICMICROSE Lab 7 BACKGROUND: State of the art
- AVX2 - Intel, 2013
- Sandy Bridge and Bulldozer - Intel and AMD, 2011
- Advanced Vector Extensions (AVX) - Intel, 2008
- SSE4 - Intel, 2006
- SSE and SSE2 - AMD, 2004
- SSE3 - Intel, 2004
- Advanced 3DNow! (3DNow! 2) - AMD, 2003
- AltiVec - IBM, 2002
- SSE2 - Intel
- 3DNow! - AMD, 2000
- Streaming SIMD Extensions (SSE) - Intel, 1999
- Pentium II (MMX) - Intel, 1998
- AltiVec - Motorola

IPN-CICMICROSE Lab 8 PROBLEM DESCRIPTION: Overview
The Multimedia Extension is a vector machine embedded in situ with the main superscalar processor. It is used to handle multimedia applications.
Diagram labels: Lagarto processor, Main processor, Multimedia extension.

IPN-CICMICROSE Lab 9 PROBLEM DESCRIPTION: SISD
Single Instruction Single Data (SISD) refers to a computer architecture in which a single processor executes a single instruction stream, operating on one data item at a time.

IPN-CICMICROSE Lab 10 PROBLEM DESCRIPTION: SIMD
Single Instruction Multiple Data (SIMD) is a class of parallel computer. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. These machines exploit data-level parallelism.

IPN-CICMICROSE Lab 11 PROBLEM DESCRIPTION: SISD vs SIMD

IPN-CICMICROSE Lab 12 PROBLEM DESCRIPTION: Saturation arithmetic
Saturation arithmetic is a version of arithmetic in which every operation, such as addition or multiplication, is limited to a fixed range between a minimum and a maximum value. If the result of an operation is greater than the maximum, it is clamped to the maximum. If it is below the minimum, it is clamped to the minimum.
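
The clamping rule can be illustrated with a minimal Python sketch. The 8-bit unsigned range (0 to 255) and the function names are assumptions chosen for illustration; they are not taken from the slides.

```python
def saturate(value, lo, hi):
    """Clamp value to the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def add_sat_u8(a, b):
    """Unsigned 8-bit addition with saturation (assumed range 0..255)."""
    return saturate(a + b, 0, 255)


def sub_sat_u8(a, b):
    """Unsigned 8-bit subtraction with saturation (assumed range 0..255)."""
    return saturate(a - b, 0, 255)


def add_wrap_u8(a, b):
    """Ordinary modular 8-bit addition, shown for contrast."""
    return (a + b) & 0xFF


print(add_sat_u8(200, 100))   # 255: result above the maximum is clamped
print(add_wrap_u8(200, 100))  # 44: modular arithmetic wraps around instead
print(sub_sat_u8(50, 100))    # 0: result below the minimum is clamped
```

For pixel data, the saturating result is usually the desired one: a bright pixel made brighter stays white rather than wrapping around to a dark value.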

IPN-CICMICROSE Lab 13 PROBLEM DESCRIPTION: Example
Example: compute the negative of an image.

IPN-CICMICROSE Lab 14 PROBLEM DESCRIPTION: Example (cont.), SISD processing

IPN-CICMICROSE Lab 15 PROBLEM DESCRIPTION: Example (cont.), SIMD processing
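
The negative-image example can be sketched in Python to contrast the two processing styles. This is a software model only. It assumes 8-bit grayscale pixels, eight pixels packed into one 64-bit word, and the identity 255 - p = p XOR 0xFF in place of a hardware vector subtract; all function names are hypothetical.

```python
LANES = 8                 # assumed: eight 8-bit pixels per 64-bit word
MASK64 = (1 << 64) - 1    # all ones across the 64-bit word


def negative_sisd(pixels):
    """SISD style: one subtraction per pixel, one pixel per step."""
    out = []
    for p in pixels:
        out.append(255 - p)
    return out


def pack_u8x8(pixels):
    """Pack eight 8-bit pixels into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def unpack_u8x8(word):
    """Split a 64-bit word back into eight 8-bit pixels."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def negative_simd(pixels):
    """SIMD style: one operation negates eight packed pixels per step."""
    if len(pixels) % LANES != 0:
        raise ValueError("pixel count must be a multiple of 8")
    out = []
    for i in range(0, len(pixels), LANES):
        word = pack_u8x8(pixels[i:i + LANES])
        # 255 - p equals p XOR 0xFF for any 8-bit p, so a single
        # 64-bit XOR negates all eight lanes at once.
        out.extend(unpack_u8x8(word ^ MASK64))
    return out


image = list(range(0, 256, 16))          # 16 sample pixel values
print(negative_simd(image) == negative_sisd(image))  # True
```

Both functions produce the same image; the SIMD version needs one eighth as many arithmetic steps for the same number of pixels.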

IPN-CICMICROSE Lab 16 HYPOTHESIS: Instruction format
Co-processor instruction opcodes: COP1 = 010001, COP2 = 010010

IPN-CICMICROSE Lab 17 HYPOTHESIS: Instruction format (cont.)
Data format and item chooser

IPN-CICMICROSE Lab 18 HYPOTHESIS: Instruction format (cont.)
Fields: Source 2, Source 1, Destination

IPN-CICMICROSE Lab 19 HYPOTHESIS: Instruction format (cont.)
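
The instruction format slides can be sketched as a small encoder and decoder. Only the COP2 opcode value (010010) and the field order (data format and item chooser, source 2, source 1, destination) come from the slides. The 6/5/5/5/5/6 bit split of the 32-bit word and the field names are assumptions modelled on the usual MIPS register-type layout.

```python
COP2_OPCODE = 0b010010  # from the slides


def encode(fmt_sel, src2, src1, dest, func, opcode=COP2_OPCODE):
    """Build a 32-bit word; field widths are an assumed 6/5/5/5/5/6 split."""
    return (
        ((opcode & 0x3F) << 26)
        | ((fmt_sel & 0x1F) << 21)   # data format and item chooser
        | ((src2 & 0x1F) << 16)      # source 2 register
        | ((src1 & 0x1F) << 11)      # source 1 register
        | ((dest & 0x1F) << 6)       # destination register
        | (func & 0x3F)              # operation selector
    )


def decode(word):
    """Split a 32-bit word into the assumed fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "fmt_sel": (word >> 21) & 0x1F,
        "src2": (word >> 16) & 0x1F,
        "src1": (word >> 11) & 0x1F,
        "dest": (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }


def is_cop2(word):
    """True when the major opcode routes the word to coprocessor 2."""
    return decode(word)["opcode"] == COP2_OPCODE


word = encode(fmt_sel=3, src2=4, src1=5, dest=6, func=0x0B)
print(decode(word))
print(is_cop2(word))  # True
```

A decoder of this shape lets the main pipeline check only the top six bits to decide whether an instruction belongs to the multimedia extension.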

IPN-CICMICROSE Lab 20 OBJECTIVE: Objectives
General objective: Design a multimedia extension unit for a RISC processor (Lagarto).
Specific objectives:
- Design a vector adder with and without saturation arithmetic.
- Design a multiplier with and without saturation arithmetic.
- Implement the complete instruction set of the MIPS Digital Media Extension (MDMX).

IPN-CICMICROSE Lab 21 HYPOTHESIS: MIPS Digital Media Extension
Lagarto II processor with MDMX:
- MDMX supports video, audio, and graphics pixel processing. It is not part of the MIPS instruction set; a processor that implements MDMX must also implement the MIPS-V ISA.
- MDMX is not intended for general-purpose computing. Software support is via shared libraries and assembly language only.

IPN-CICMICROSE Lab 22 HYPOTHESIS: MIPS Digital Media Extension (cont.)
- MDMX shares a register file with the Floating Point Unit. Data is moved between the shared register file and memory with the existing floating-point load and store double operations.
- Registers are interpreted in two formats: Quad Half and Oct Byte.
- MDMX also shares the 8 floating-point condition code bits.
- MDMX has a private 192-bit accumulator register.
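
The two register interpretations can be sketched in Python. The sketch assumes a 64-bit register (implied by the use of load and store double), with Oct Byte meaning eight 8-bit lanes and Quad Half meaning four 16-bit lanes. The lane numbering (lane 0 in the least significant bits), the even split of the accumulator across lanes, and the function names are assumptions for illustration.

```python
ACC_BITS = 192  # accumulator width, from the slides


def as_oct_byte(reg):
    """View a 64-bit register as eight 8-bit lanes (lane 0 = low byte)."""
    return [(reg >> (8 * i)) & 0xFF for i in range(8)]


def as_quad_half(reg):
    """View the same 64-bit register as four 16-bit lanes."""
    return [(reg >> (16 * i)) & 0xFFFF for i in range(4)]


def acc_lane_bits(lanes):
    """Accumulator bits per lane, assuming the 192 bits are split evenly."""
    return ACC_BITS // lanes


reg = 0x0807060504030201
print(as_oct_byte(reg))                       # [1, 2, 3, 4, 5, 6, 7, 8]
print([hex(h) for h in as_quad_half(reg)])    # ['0x201', '0x403', '0x605', '0x807']
print(acc_lane_bits(8), acc_lane_bits(4))     # 24 48
```

Under the even-split assumption, each accumulator lane is three times as wide as its source element, which leaves headroom for summing many products before any clamping is needed.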

IPN-CICMICROSE Lab 23 HYPOTHESIS: Vector to vector arithmetic
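
A vector-to-vector operation applies the same arithmetic to each pair of corresponding lanes in two source registers. The sketch below models a Quad Half addition with optional saturation. Treating the 16-bit lanes as signed values, the lane ordering, and the name vadd_qh are assumptions, not details from the slides.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def unpack_qh(reg):
    """Split a 64-bit register into four signed 16-bit lanes."""
    return [to_signed16(reg >> (16 * i)) for i in range(4)]


def pack_qh(lanes):
    """Pack four 16-bit lanes into a 64-bit register (masks each lane)."""
    reg = 0
    for i, v in enumerate(lanes):
        reg |= (v & 0xFFFF) << (16 * i)
    return reg


def vadd_qh(vs, vt, saturating=True):
    """Lane-wise add of two Quad Half registers."""
    out = []
    for a, b in zip(unpack_qh(vs), unpack_qh(vt)):
        s = a + b
        if saturating:
            s = max(-32768, min(32767, s))  # clamp to the signed 16-bit range
        out.append(s)  # without saturation, pack_qh wraps the value
    return pack_qh(out)


vs = pack_qh([32767, -32768, 100, -5])
vt = pack_qh([1, -1, 23, 5])
print(unpack_qh(vadd_qh(vs, vt)))                    # [32767, -32768, 123, 0]
print(unpack_qh(vadd_qh(vs, vt, saturating=False)))  # [-32768, 32767, 123, 0]
```

The first two lanes show the difference between the two modes: with saturation the result stays at the range limit, while without it the value wraps to the opposite end of the range.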

IPN-CICMICROSE Lab 24 TECHNICAL MERITS: Data path

IPN-CICMICROSE Lab 25 TECHNICAL MERITS: Vector units
- Vector adder with and without saturation arithmetic.
- Vector subtractor with and without saturation arithmetic.
- Vector multiplier with and without saturation arithmetic.
- Vector instruction queue.
- Vector load/store queue.

IPN-CICMICROSE Lab 26 Q&A