CS/EE 5810 CS/EE 6810 F00: Multimedia

Presentation transcript:

CS/EE 5810 CS/EE 6810 F00: 1 Multimedia

CS/EE 5810 CS/EE 6810 F00: 2 New Architecture Direction
“… media processing will become the dominant force in computer architecture and microprocessor design”
“… new media-rich applications … involve significant real-time processing of continuous media streams and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and f.p.”
–“How Multimedia Workloads will Change Processor Design,” Diefendorff & Dubey, IEEE Computer (9/97)
Needs include high memory bandwidth, high network bandwidth, continuous media data types, real-time response, fine-grain parallelism
Also significant focus on system bus performance
–Common bridge to the memory system and I/O
–Critical performance component for SMP server platforms

CS/EE 5810 CS/EE 6810 F00: 3 Multimedia Workloads
Multimedia
–Video conferencing
–Video authoring
–Animation
–Games
Algorithms
–Image compression (JPEG)
–Video compression (MPEG)
–3-D graphics
–Encryption

CS/EE 5810 CS/EE 6810 F00: 4 Multimedia Characteristics
Real-time response
–Video, audio
Continuous media data types
–8-16 bits sufficient for many applications
Data parallelism
–E.g. apply the same operation to the whole image
–Vector or SIMD work well here
Coarse-grained parallelism
–E.g. video encoding/decoding, audio encoding/decoding
Small loops
–Most time spent in kernels
–Amenable to hand-optimization
High memory bandwidth
–Video, 3-D graphics
–Caches not large enough

CS/EE 5810 CS/EE 6810 F00: 5 Multimedia ISA Extensions
HP PA-RISC
–MAX-2
SUN SPARC
–VIS
Intel x86
–MMX
MIPS
–MDMX
PowerPC
–AltiVec

CS/EE 5810 CS/EE 6810 F00: 6 MMX
“MMX Technology Extension to the Intel Architecture,” Alex Peleg and Uri Weiser, IEEE Micro, August 1996
Goals
–Improve performance of multimedia applications
»Graphics, MPEG video
»Image processing, speech recognition
–Remain completely compatible with Intel x86 ISA
–Minimize cost
Approach
–Use packed data types
–Exploit SIMD parallelism
–Make use of existing wide data paths

CS/EE 5810 CS/EE 6810 F00: 7 Data Types and Operands
Three fixed-point integer types packed into a 64-bit quadword
–Packed Byte: 8 8-bit bytes
–Packed Word: 4 16-bit words
–Packed Doubleword: 2 32-bit doublewords
User-controlled fixed point
Eight 64-bit GP registers (mm0-mm7)
MMX shares FPU
–Can’t do FP and MMX at the same time
Random access to registers
–Learned lesson from FP unit design (the x87 FP registers are accessed as a stack)

CS/EE 5810 CS/EE 6810 F00: 8 MMX Operations
57 MMX instructions work on all data types
Support for saturation arithmetic
–Simplifies handling of underflow and overflow
–Matches physical behavior
Packed operations
–Addition/subtraction, multiplication, compares, shifts
Conversion operations
–Pack/unpack
Performance improvement
–Fewer loads and stores
–Fewer arithmetic operations, but more conversion
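
To make the saturation bullet concrete, here is a minimal sketch in portable C (not MMX code) of a packed unsigned byte add with and without saturation. The function names `packed_add_sat_u8` and `packed_add_wrap_u8` are hypothetical, chosen for illustration; the saturating version is meant to behave like an unsigned saturating packed byte add, one lane at a time.

```c
#include <stdint.h>

/* Hypothetical helper: add eight unsigned 8-bit lanes packed into 64-bit
   words, clamping each lane at 255 instead of wrapping around. */
uint64_t packed_add_sat_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (8 * lane)) & 0xFF);
        unsigned y = (unsigned)((b >> (8 * lane)) & 0xFF);
        unsigned sum = x + y;
        if (sum > 255)
            sum = 255;                      /* saturate: bright stays bright */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}

/* Hypothetical helper: same lanes, but each lane wraps modulo 256.
   Carries never cross from one lane into the next. */
uint64_t packed_add_wrap_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (8 * lane)) & 0xFF);
        unsigned y = (unsigned)((b >> (8 * lane)) & 0xFF);
        result |= (uint64_t)((x + y) & 0xFF) << (8 * lane);
    }
    return result;
}
```

With pixel value 0xF0 plus 0x20, the saturating add yields 0xFF (full brightness), whereas the wraparound add yields 0x10 (nearly black), which is why saturation "matches physical behavior" for image and audio data.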

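CS/EE 5810 CS/EE 6810 F00: 9 MMX Operations
Packed multiply-add, word to doubleword
–Inputs: A3 A2 A1 A0 and B3 B2 B1 B0
–Products: A3×B3, A2×B2, A1×B1, A0×B0
–Results: A3×B3 + A2×B2 and A1×B1 + A0×B0
Packed compare greater-than, word
–Each result word is all ones (11…1) where the comparison holds and all zeros (00…0) otherwise, e.g. 00…0 11…1 00…0 11…1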
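
The two operations on this slide can be sketched in portable C as follows. This is an emulation for illustration only, not MMX code; the names `lane16`, `packed_madd_i16`, and `packed_cmpgt_i16` are hypothetical, and lane 0 is assumed to be the least significant 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: read 16-bit lane i (0 = least significant)
   as a signed value. */
static int32_t lane16(uint64_t v, int i)
{
    uint32_t u = (uint32_t)((v >> (16 * i)) & 0xFFFF);
    return (u < 0x8000u) ? (int32_t)u : (int32_t)u - 0x10000;
}

/* Packed multiply-add, word to doubleword: multiply the four signed word
   pairs, then add adjacent products into two 32-bit results. */
uint64_t packed_madd_i16(uint64_t a, uint64_t b)
{
    int64_t lo = (int64_t)lane16(a, 0) * lane16(b, 0)
               + (int64_t)lane16(a, 1) * lane16(b, 1);
    int64_t hi = (int64_t)lane16(a, 2) * lane16(b, 2)
               + (int64_t)lane16(a, 3) * lane16(b, 3);
    /* Keep the low 32 bits of each sum, as a doubleword lane would. */
    return ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
}

/* Packed compare greater-than, word: each lane becomes all ones where
   a > b (signed) and all zeros otherwise. */
uint64_t packed_cmpgt_i16(uint64_t a, uint64_t b)
{
    uint64_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (lane16(a, i) > lane16(b, i))
            mask |= (uint64_t)0xFFFF << (16 * i);
    }
    return mask;
}
```

For example, with A = (4, 3, 2, 1) and B = (8, 7, 6, 5), listed from A3 down to A0, the multiply-add gives 3×7 + 4×8 = 53 in the high doubleword and 1×5 + 2×6 = 17 in the low doubleword. The compare result is a mask, which is what makes it usable for branch-free selection.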

CS/EE 5810 CS/EE 6810 F00: 10 Using MMX
Assembly language coding
Use of libraries
–E.g. IDCT, DCT, matrix multiply…
Use of C macros (“intrinsics”)
–Generate optimized assembly code
–Performs register allocation and instruction scheduling
»MMX64 t0, t1; t0 = padd(t0, t1);
–Requires intimate knowledge of MMX
Could a compiler generate MMX code?

CS/EE 5810 CS/EE 6810 F00: 11 Chroma Keying
Weatherman example
»for (i = 0; i < imagesize; i++) new_image[i] = (x[i] == blue) ? y[i] : x[i];
MMX version (mm1 is assumed to already hold eight copies of the blue key value)
–movq mm3, mem1 ; load 8 pixels from weatherman
–movq mm4, mem2 ; load 8 pixels from map
–pcmpeqb mm1, mm3 ; generate select mask
–pand mm4, mm1 ; AND map with mask
–pandn mm1, mm3 ; AND weatherman with inverse mask
–por mm4, mm1 ; OR masked images together
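
The compare/AND/AND-NOT/OR sequence above can be sketched in portable C on one 64-bit word of eight 8-bit pixels. This is an illustrative emulation rather than MMX code; `packed_cmpeq_u8` and `chroma_key8` are hypothetical names, and 8-bit pixels are an assumption taken from the "load 8 pixels" comments.

```c
#include <stdint.h>

/* Hypothetical helper: per-byte equality compare. Each byte of the result
   is 0xFF where the bytes of a and b match and 0x00 otherwise. */
uint64_t packed_cmpeq_u8(uint64_t a, uint64_t b)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint64_t x = (a >> (8 * lane)) & 0xFF;
        uint64_t y = (b >> (8 * lane)) & 0xFF;
        if (x == y)
            mask |= (uint64_t)0xFF << (8 * lane);
    }
    return mask;
}

/* Hypothetical helper: chroma-key eight pixels at once with no branches
   in the select step. Where the weatherman pixel equals the key color,
   take the map pixel; elsewhere keep the weatherman pixel. */
uint64_t chroma_key8(uint64_t weatherman, uint64_t map, uint8_t blue)
{
    uint64_t key  = 0x0101010101010101ULL * blue;       /* broadcast key   */
    uint64_t mask = packed_cmpeq_u8(key, weatherman);   /* select mask     */
    uint64_t from_map = map & mask;                     /* map under mask  */
    uint64_t from_man = weatherman & ~mask;             /* inverse mask    */
    return from_map | from_man;                         /* merge images    */
}
```

The point of the mask formulation is that the per-pixel conditional from the scalar loop disappears: the comparison produces data rather than control flow, so all eight pixels are processed by the same straight-line sequence.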