Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Topic 13: SIMD Multimedia Extensions
Prof. Zhang Gang (gzhang@tju.edu.cn)
School of Computer Sci. & Tech., Tianjin University, Tianjin, P. R. China

Characteristics of media applications
- Many media applications operate on narrower data types than 32-bit processors were optimized for.
- Many graphics systems used 8 bits to represent each of the three primary colors, plus 8 bits for transparency.
- Depending on the application, audio samples are usually represented with 8 or 16 bits.
- SIMD multimedia extensions started from these simple observations.

Partitioned adders
- Figure 4.8 summarizes typical multimedia SIMD instructions.
- The carry chains within a wide adder can be partitioned, so that carries do not propagate across element boundaries and one adder operates on several narrow operands at once.
- A processor using a 256-bit adder could perform simultaneous operations on short vectors of:
  - thirty-two 8-bit operands
  - sixteen 16-bit operands
  - eight 32-bit operands
  - four 64-bit operands
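The effect of partitioning a carry chain can be imitated in software. The sketch below is illustrative only and is not taken from the slides: the function name `partitioned_add` and the 32-bit test word are assumptions. It performs one wide addition but blocks carries at each lane boundary by adding the low bits of every lane first and then fixing up each lane's top bit with XOR.

```python
def partitioned_add(a: int, b: int, lane_bits: int, total_bits: int = 256) -> int:
    """Add two packed words lane by lane using a single wide addition.

    Hypothetical helper for illustration: carries never cross a lane
    boundary, so each lane wraps around independently.
    """
    if total_bits % lane_bits != 0:
        raise ValueError("lane width must divide the adder width")
    lanes = total_bits // lane_bits

    # Mask with only the top bit of every lane set.
    high = 0
    for i in range(lanes):
        high |= 1 << (i * lane_bits + lane_bits - 1)
    full = (1 << total_bits) - 1
    low = full & ~high

    # Add the lower (lane_bits - 1) bits of each lane. With the top bits
    # cleared, a carry can reach a lane's top bit but cannot leave the lane.
    partial = (a & low) + (b & low)

    # The top bit of each lane is a_top XOR b_top XOR carry_in.
    return partial ^ ((a ^ b) & high)


# Four 8-bit lanes packed into a 32-bit word.
a = 0x01FF7F80
b = 0x01010180
print(hex(partitioned_add(a, b, lane_bits=8, total_bits=32)))  # 0x2008000
print(hex((a + b) & 0xFFFFFFFF))                               # 0x3008100
```

The ordinary 32-bit sum lets carries spill from one byte into the next, while the partitioned sum wraps each byte separately. Hardware achieves the same result by cutting the carry chain rather than by masking.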

What are the differences between vector and SIMD instructions?
- Like a vector instruction, a SIMD instruction specifies the same operation on vectors of data.
- Unlike vector machines, which have large register files, SIMD instructions tend to specify fewer operands and hence use much smaller register files.
- VMIPS, for example, has 8 vector registers, each of which can hold as many as sixty-four 64-bit elements.
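To put a number on the register-file difference, the following sketch computes the architectural state of the VMIPS vector registers from the figures on the slide. The SIMD comparison point is an assumption added here, not something the slides state: it uses the eight 128-bit registers of the original SSE extension.

```python
def register_file_bytes(num_registers: int, elements_per_register: int,
                        bits_per_element: int) -> int:
    """Total register-file capacity in bytes (hypothetical helper)."""
    return num_registers * elements_per_register * bits_per_element // 8


# VMIPS: 8 vector registers, each holding 64 elements of 64 bits.
vmips_bytes = register_file_bytes(8, 64, 64)

# Assumed comparison point: the 8 registers of the original SSE,
# each treated as a single 128-bit element.
sse_bytes = register_file_bytes(8, 1, 128)

print(vmips_bytes)               # 4096
print(sse_bytes)                 # 128
print(vmips_bytes // sse_bytes)  # 32
```

Under these assumptions the vector register file holds 32 times as much state as the SIMD one.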

SIMD extensions have three major omissions
- They fix the number of data operands in the opcode.
  - This has led to the addition of hundreds of instructions in the MMX, SSE, and AVX extensions of the x86 architecture.
- They do not offer the more sophisticated addressing modes of vector architectures.
  - There are no strided accesses and no gather-scatter accesses.
- They do not offer mask registers.
  - Conditional execution of individual elements is not supported.
- These omissions make it harder for the compiler to generate SIMD code and increase the difficulty of programming in SIMD assembly language.
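The missing mask registers can be made concrete with a small model. The sketch below is an assumption-laden illustration, not the slides' method: it represents a SIMD register as a Python list of 32-bit lanes and uses a common workaround for conditional execution, which is to compute the result for every lane and then select per lane with a bitwise blend. The helper names (`compare_ne_zero`, `blend`, `conditional_sub`) are hypothetical.

```python
LANE_MASK = 0xFFFFFFFF  # one 32-bit lane


def compare_ne_zero(xs):
    """Per-lane compare: all ones where the lane is nonzero, else zero."""
    return [LANE_MASK if x != 0 else 0 for x in xs]


def blend(mask, if_set, if_clear):
    """Per-lane select using only AND, OR, and NOT."""
    return [(m & p) | (~m & LANE_MASK & q)
            for m, p, q in zip(mask, if_set, if_clear)]


def conditional_sub(xs, ys):
    """Emulate: if xs[i] != 0 then xs[i] = xs[i] - ys[i], without masked execution."""
    # The subtraction runs on every lane, including lanes whose
    # result will be thrown away.
    diff = [(x - y) & LANE_MASK for x, y in zip(xs, ys)]
    mask = compare_ne_zero(xs)
    return blend(mask, diff, xs)


print(conditional_sub([5, 0, 7, 0], [1, 2, 3, 4]))  # [4, 0, 4, 0]
```

The extra compare and blend steps, and the wasted work on unselected lanes, are part of what the compiler or assembly programmer has to manage when the hardware does not provide masks.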

Explanation of abbreviations
- MMX: Multimedia Extensions (1996), 64-bit registers
  - Eight 8-bit integer ops or four 16-bit integer ops
- SSE: Streaming SIMD Extensions (1999), 128-bit registers
  - Eight 16-bit integer ops
  - Four 32-bit integer/fp ops or two 64-bit integer/fp ops
  - Successors: SSE2 (2001), SSE3 (2004), SSE4 (2007)
- AVX: Advanced Vector Extensions (2010), 256-bit registers
  - Four 64-bit integer/fp ops
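The operation counts listed for each extension follow directly from dividing the register width by the element width. The sketch below checks that arithmetic. The register widths of 64, 128, and 256 bits are the standard ones for MMX, SSE, and AVX, and the function name `ops_per_instruction` is a hypothetical helper.

```python
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256}


def ops_per_instruction(extension: str, element_bits: int) -> int:
    """Number of elements one instruction processes at the given element width."""
    width = REGISTER_BITS[extension]
    if width % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return width // element_bits


for ext, bits in [("MMX", 8), ("MMX", 16),
                  ("SSE", 16), ("SSE", 32), ("SSE", 64),
                  ("AVX", 64)]:
    print(f"{ext}: {ops_per_instruction(ext, bits)} x {bits}-bit ops")
```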

Why are Multimedia SIMD Extensions so popular?
Given the weaknesses listed above, why are multimedia SIMD extensions so popular?
- They cost little to add to the standard arithmetic unit and are easy to implement.
- They require little extra state compared to vector architectures.
- They do not need the large memory bandwidth that a vector architecture requires.
- They do not have to deal with the virtual-memory problems that arise when a page fault occurs in the middle of a vector.

Exercises
1. Why can a processor using a 256-bit adder perform simultaneous operations on short vectors of eight 32-bit operands?
2. What is the meaning of MMX?
3. What is the meaning of SSE?
4. What is the meaning of AVX?
5. What are the major omissions of multimedia SIMD extensions?
6. Why are multimedia SIMD extensions so popular?