

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 13 SIMD Multimedia Extensions Prof. Zhang Gang gzhang@tju.edu.cn School of Computer Sci. & Tech. Tianjin University, Tianjin, P. R. China

Characteristics of media applications
Many media applications operate on narrower data types than 32-bit processors were optimized for. Many graphics systems use 8 bits to represent each of the three primary colors plus 8 bits for transparency. Depending on the application, audio samples are usually represented with 8 or 16 bits. SIMD multimedia extensions started from these simple observations.

Partitioned adders
By partitioning the carry chains within a wide adder, a processor can operate on several narrow operands at once. Breaking the carry chain at each element boundary keeps a carry out of one element from spilling into its neighbor. A processor using a 256-bit adder could therefore perform simultaneous operations on short vectors of thirty-two 8-bit operands, sixteen 16-bit operands, eight 32-bit operands, or four 64-bit operands. Figure 4.8 summarizes typical multimedia SIMD instructions.
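The effect of a partitioned carry chain can be imitated in ordinary C on a 64-bit word. This is only a software sketch of the idea, not how the hardware adder is built: it assumes eight 8-bit lanes packed into a `uint64_t`, and the function names `add_u8x8`, `add_u8x8_ref`, and `check_add_u8x8` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add eight 8-bit lanes packed in one 64-bit word.
 * No carry is allowed to cross a lane boundary, so each lane
 * wraps around modulo 256 independently of its neighbors. */
uint64_t add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t HIGH = 0x8080808080808080ULL; /* top bit of every lane */

    /* Add the low 7 bits of each lane. The largest per-lane sum is
     * 0x7F + 0x7F = 0xFE, so the carry stays inside the lane (in bit 7). */
    uint64_t low = (a & ~HIGH) + (b & ~HIGH);

    /* The top bit of each lane is a7 XOR b7 XOR carry-in; the carry-in
     * already sits in bit 7 of 'low', so XOR in the two operand bits. */
    return low ^ ((a ^ b) & HIGH);
}

/* Reference version: unpack, add, and repack one lane at a time. */
uint64_t add_u8x8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * i);
    }
    return r;
}

/* Small self-check comparing the two versions. */
void check_add_u8x8(void)
{
    uint64_t a = 0x80FF7F0102030405ULL;
    uint64_t b = 0x8001010A0B0C0D0EULL;
    assert(add_u8x8(a, b) == add_u8x8_ref(a, b));
}
```

With an ordinary 64-bit add, `0xFF + 0x01` would carry into the next byte and give `0x100`; with the partitioned version the low lane wraps to `0x00` and the neighboring lane is untouched.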

What are the differences between vector and SIMD instructions?
Like vector instructions, a SIMD instruction specifies the same operation on vectors of data. Unlike vector machines with large register files, SIMD instructions tend to specify fewer operands and hence use much smaller register files. For comparison, the VMIPS vector register file has 8 vector registers, each of which can hold as many as sixty-four 64-bit elements.

SIMD extensions have three major omissions
Compared with vector architectures, multimedia SIMD extensions leave out three things:
1. They fix the number of data operands in the opcode. This has led to the addition of hundreds of instructions in the MMX, SSE, and AVX extensions of the x86 architecture.
2. They do not offer the more sophisticated addressing modes of vector architectures, namely strided accesses and gather-scatter accesses.
3. They do not offer mask registers, so they cannot support conditional execution of elements.
These omissions make it harder for the compiler to generate SIMD code and increase the difficulty of programming in SIMD assembly language.
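One practical consequence of fixing the operand count in the opcode is that code has to handle array lengths that are not a multiple of the SIMD width. The sketch below models this in portable C rather than with real SIMD instructions: the inner loop over `LANES` stands in for one fixed-width SIMD operation, and the width of four 32-bit floats (one 128-bit register) is an assumption chosen for illustration. The name `saxpy_fixed_width` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 }; /* assumed width: four 32-bit floats per SIMD group */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_fixed_width(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    size_t i = 0;

    /* Main loop: every iteration processes exactly LANES elements,
     * because the element count is baked into the instruction. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++)
            y[i + lane] = a * x[i + lane] + y[i + lane];
    }

    /* Cleanup loop: the n % LANES leftover elements are handled one
     * at a time, since the fixed-width operation cannot be shortened. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

For `n = 6` the main loop runs once for elements 0 to 3 and the cleanup loop handles elements 4 and 5. A vector architecture avoids the separate cleanup loop by setting its vector-length register for the last, shorter group.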

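Explanation of abbreviations
MMX: Multimedia Extensions (1996), 64-bit registers. Eight 8-bit integer operations or four 16-bit integer operations at a time.
SSE: Streaming SIMD Extensions (1999), 128-bit registers, followed by SSE2 (2001), SSE3 (2004), and SSE4 (2007). Eight 16-bit integer operations, four 32-bit integer/floating-point operations, or two 64-bit integer/floating-point operations at a time. The 64-bit double-precision floating-point operations arrived with SSE2.
AVX: Advanced Vector Extensions (2010), 256-bit registers. Four 64-bit floating-point operations at a time; 256-bit integer operations came later with AVX2.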
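The operation counts for each extension follow from dividing the register width by the element width. A minimal sketch, assuming the usual register widths of 64 bits for MMX, 128 bits for SSE, and 256 bits for AVX; the helper name `simd_lanes` is made up for illustration.

```c
#include <assert.h>

/* Number of elements that fit in one SIMD register.
 * Both widths are given in bits; the element width must
 * divide the register width evenly. */
unsigned simd_lanes(unsigned register_bits, unsigned element_bits)
{
    assert(element_bits != 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

For example, `simd_lanes(64, 8)` gives the eight 8-bit operations of MMX, `simd_lanes(128, 32)` gives the four 32-bit operations of SSE, and `simd_lanes(256, 64)` gives the four 64-bit operations of AVX.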

Why are Multimedia SIMD Extensions so popular?
Given these weaknesses, why are multimedia SIMD extensions so popular?
1. They cost little to add to the standard arithmetic unit and are easy to implement.
2. They require little extra state compared to vector architectures.
3. They do not need a lot of memory bandwidth, whereas a vector architecture does.
4. They do not have to deal with the virtual memory problem of a page fault in the middle of a vector, because each SIMD group of operands is transferred separately and is aligned in memory, so it cannot cross a page boundary.

Exercises
1. Why can a processor using a 256-bit adder perform simultaneous operations on short vectors of eight 32-bit operands?
2. What is the meaning of MMX?
3. What is the meaning of SSE?
4. What is the meaning of AVX?
5. What are the major omissions of multimedia SIMD extensions?
6. Why are multimedia SIMD extensions so popular?