Advanced Computer Architecture 5MD00 / 5Z032 Instruction Set Design


Advanced Computer Architecture 5MD00 / 5Z032
Instruction Set Design
Henk Corporaal
www.ics.ele.tue.nl/~heco
h.corporaal@tue.nl
TU Eindhoven, 2007

Lecture overview
  ISA and Evolution
  Architecture classes
  Addressing
  Operands
  Operations
  Encoding
  RISC
  SIMD extensions
9/20/2018 ACA H.Corporaal

Instruction Set Architecture
  The instruction set architecture serves as the interface between software and hardware.
  It provides the mechanism by which the software tells the hardware what should be done.
  Architecture definition: “the architecture of a system/processor is (a minimal description of) its behavior as observed by its immediate users”
  [Figure: the instruction set architecture as the layer between software and hardware]

Instruction Set Design Issues
  Where are operands stored? registers, memory, stack, accumulator
  How many explicit operands are there? 0, 1, 2, or 3
  How is the operand location specified? register, immediate, indirect, . . .
  What type & size of operands are supported? byte, int, float, double, string, vector, . . .
  What operations are supported? add, sub, mul, move, compare, . . .

Operands
  How are operands designated?
    fixed – always in the same place
    by opcode – always the same for groups of instructions
    by a field in the instruction – requires decode first
  What is the format of the data?
    binary
    character
    decimal (packed and unpacked)
    floating-point – IEEE 754 (others used less and less)
    size – 8-, 16-, 32-, 64-, 128-bit
  What is the influence on ISA?

Operand Locations

Classifying ISAs
  Accumulator (before 1960):
    1 address    add A             acc ← acc + mem[A]
  Stack (1960s to 1970s):
    0 address    add               tos ← tos + next
  Memory-Memory (1970s to 1980s):
    2 address    add A, B          mem[A] ← mem[A] + mem[B]
    3 address    add A, B, C       mem[A] ← mem[B] + mem[C]
  Register-Memory (1970s to present):
    2 address    add R1, A         R1 ← R1 + mem[A]
                 load R1, A        R1 ← mem[A]
  Register-Register (Load/Store) (1960s to present):
    3 address    add R1, R2, R3    R1 ← R2 + R3
                 load R1, R2       R1 ← mem[R2]
                 store R1, R2      mem[R1] ← R2
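To make the classes concrete, here is a minimal sketch that runs C = A + B on three of them. The tiny interpreters, the tuple instruction format, and the names run_accumulator, run_stack, and run_load_store are illustrative assumptions, not part of the lecture. For brevity, load and store in the load/store model name a memory location directly instead of taking the address from a register.

```python
def run_accumulator(program, mem):
    """1-address machine: every operation implicitly uses the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]          # acc <- acc + mem[A]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown op {op}")
    return mem


def run_stack(program, mem):
    """0-address machine: add takes both operands from the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(mem[instr[1]])
        elif op == "add":
            top = stack.pop()
            nxt = stack.pop()
            stack.append(nxt + top)        # tos <- tos + next
        elif op == "pop":
            mem[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return mem


def run_load_store(program, mem):
    """3-address register machine: only load/store touch memory."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "load":                   # load Rd, A
            regs[instr[1]] = mem[instr[2]]
        elif op == "add":                  # add Rd, Rs1, Rs2
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "store":                # store Rs, A
            mem[instr[2]] = regs[instr[1]]
        else:
            raise ValueError(f"unknown op {op}")
    return mem


def fresh_memory():
    return {"A": 3, "B": 4, "C": 0}


ACC_PROGRAM = [("load", "A"), ("add", "B"), ("store", "C")]
STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
LOAD_STORE_PROGRAM = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "R3", "C"),
]
```

In this sketch the accumulator program needs three instructions and the other two need four, but only the load/store version leaves A and B in registers for later reuse without another memory access.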

Evolution of Architectures
  Single Accumulator (EDSAC 1950)
  Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
  Separation of Programming Model from Implementation
    High-level Language Based (B5000 1963)
    Concept of a Family (IBM 360 1964)
  General Purpose Register Machines
    Complex Instruction Sets (VAX, Intel 8086 1977-80)
    Load/Store Architecture (CDC 6600, Cray 1 1963-76)
  RISC (MIPS, SPARC, 88000, IBM RS6000, . . . 1987+)

Addressing Modes
  Types
    Register – data in a register
    Immediate – data in the instruction
    Memory – data in memory
  Calculation of Effective Address
    Direct – address in instruction
    Indirect – address in register
    Displacement – address = register or PC + offset
    Indexed – address = register + register
    Memory Indirect – address at address in register
  What is the influence on ISA?

Types of Addressing Mode (VAX)
  Addressing Mode         Example                 Action
  1. Register direct      Add R4, R3              R4 <- R4 + R3
  2. Immediate            Add R4, #3              R4 <- R4 + 3
  3. Displacement         Add R4, 100(R1)         R4 <- R4 + M[100 + R1]
  4. Register indirect    Add R4, (R1)            R4 <- R4 + M[R1]
  5. Indexed              Add R4, (R1 + R2)       R4 <- R4 + M[R1 + R2]
  6. Direct               Add R4, (1000)          R4 <- R4 + M[1000]
  7. Memory Indirect      Add R4, @(R3)           R4 <- R4 + M[M[R3]]
  8. Autoincrement        Add R4, (R2)+           R4 <- R4 + M[R2]; R2 <- R2 + d
  9. Autodecrement        Add R4, -(R2)           R2 <- R2 - d; R4 <- R4 + M[R2]
  10. Scaled              Add R4, 100(R2)[R3]     R4 <- R4 + M[100 + R2 + R3*d]
  Here d is the size of the operand in bytes.
  Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX.
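The address arithmetic in the Action column can be written out as a small function. This is a hedged sketch: the function name effective_address, the mode strings, the keyword parameters, and the sample register and memory contents are all assumptions made for illustration. It covers only the modes that actually form a memory address (3 to 7 and 10) and leaves out the register side effects of autoincrement and autodecrement.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address that an operand designates.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes (assumed to be 4 by default).
    """
    if mode == "displacement":        # 100(R1)       -> 100 + R1
        return disp + regs[base]
    if mode == "register_indirect":   # (R1)          -> R1
        return regs[base]
    if mode == "indexed":             # (R1 + R2)     -> R1 + R2
        return regs[base] + regs[index]
    if mode == "direct":              # (1000)        -> 1000
        return disp
    if mode == "memory_indirect":     # @(R3)         -> M[R3]
        return mem[regs[base]]
    if mode == "scaled":              # 100(R2)[R3]   -> 100 + R2 + R3*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"mode {mode!r} does not form a memory address here")


# Hypothetical machine state used only for the example.
REGS = {"R1": 200, "R2": 40, "R3": 8}
MEM = {8: 5000}
```

With this state, effective_address("scaled", REGS, MEM, base="R2", index="R3", disp=100) evaluates 100 + 40 + 8*4.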

Operations
  Types
    ALU – Integer arithmetic and logical functions
    Data transfer – Loads/stores
    Control – Branch, jump, call, return, traps, interrupts
    System – O/S calls, virtual memory management
    Floating point – Floating point arithmetic
    Decimal – Decimal arithmetic
    String – moves, compares, search, etc.
    Graphics – Pixel/vertex operations
    Vector – Vector (SIMD) functions
  Addressing
    Which addressing modes for which operands are supported?

80x86 Instruction Frequency

Relative Frequency of Control Instructions
  Design hardware to handle branches quickly, since these occur most frequently.

Frequency of Operand Sizes on 32-bit Load-Store Machines
  For floating-point operations, want good performance for 64-bit operands.
  For integer operations, want good performance for 32-bit operands.
  Recent architectures also support 64-bit integers.

Instruction Encoding
  Variable
    Instruction length varies based on opcode and address specifiers
    For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes
    Good code density, but difficult to decode and pipeline
  Fixed
    Only a single size for all instructions
    For example, MIPS, PowerPC, and SPARC all have 32-bit instructions
    Not as good code density, but easier to decode and pipeline
  Hybrid
    Have multiple format lengths specified by the opcode
    For example, IBM 360/370
    Compromise between code density and ease of decode

Instruction Encoding

Example: MIPS
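The figure for this slide is not preserved in the transcript. As a stand-in, the sketch below packs and unpacks the standard MIPS R-type layout (6-bit opcode, 5-bit rs, rt, rd, and shamt, 6-bit funct), which shows why a fixed 32-bit format is easy to decode: every field sits at a known bit position. The function names and the dictionary return type are assumptions chosen for illustration.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value   # append the field below the previous ones
    return word


def decode_r_type(word):
    """Extract the fields again with fixed shifts and masks."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t0, $s1, $s2  ($t0 = register 8, $s1 = 17, $s2 = 18, funct 0x20 = add)
ADD_T0_S1_S2 = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
```

Decoding never has to look at the opcode to find where rs or rt start, which is the property the "Fixed" encoding class trades code density for.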

Compilers and ISA
  Compiler Goals
    All correct programs compile correctly
    Most compiled programs execute quickly
    Most programs compile quickly
    Achieve small code size
    Provide debugging support
  Multiple Source Compilers
    Same compiler can compile different languages
  Multiple Target Compilers
    Same compiler can generate code for different machines

Compiler Phases
  Compilers use phases to manage complexity:
    Front end: convert language to intermediate form
    High-level optimizer: procedure inlining and loop transformations
    Global optimizer: global and local optimization, plus register allocation
    Code generator (and assembler): dependency elimination, instruction selection, scheduling

Designing ISA to Improve Compilation
  Provide enough general-purpose registers to ease register allocation (more than 16)
  Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal
  Provide primitive constructs rather than trying to map to a high-level language
  Allow compilers to help make the common case fast

A "Typical" RISC
  32-bit fixed format instruction (few formats)
  32 32-bit GPR
  3-address, reg-reg arithmetic instruction
  Single address mode for load/store: base + displacement
    no indirection
  Simple branch conditions
  Pipelined implementation
  Separate Instruction and Data level-1 caches
  Delayed branch?

Comparison of MIPS with 80x86
  How would you expect the x86 and MIPS architectures to compare on the following:
    CPI on SPEC benchmarks
    Ease of design and implementation
    Ease of writing assembly language & compilers
    Code density
    Overall performance
  What other advantages/disadvantages are there to the two architectures?

Graphics and Multimedia Instruction Set Extensions
  Support graphics and multimedia applications
    Intel’s MMX Technology
    Intel’s Internet Streaming SIMD Extensions
    AMD’s 3DNow! Technology
    Sun’s Visual Instruction Set
    Motorola’s and IBM’s AltiVec Technology
  These extensions improve the performance of
    Computer-aided design
    Internet applications
    Computer visualization
    Video games
    Speech recognition

MMX Data Types
  MMX Technology supports operations on the following 64-bit integer data types:
    Packed byte (eight 8-bit elements)
    Packed word (four 16-bit elements)
    Packed double word (two 32-bit elements)
    Packed quad word (one 64-bit element)

SIMD Operations
  MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD)
  PADD[W]: Packed add word
      A3       A2       A1       A0
    + B3       B2       B1       B0
    = A3+B3    A2+B2    A1+B1    A0+B0
  In this example, 4 parallel adds are performed on 16-bit elements
  Most MMX instructions only require a single cycle
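The lane-wise behaviour of a packed add can be modelled on an ordinary 64-bit integer. The sketch below is a software model only, with assumed helper names (pack_words, unpack_words, paddw); it is not how the hardware is implemented, but it shows that each 16-bit lane is added independently and that a carry out of one lane never reaches its neighbour.

```python
LANES = 4          # four words per 64-bit MMX register
LANE_BITS = 16
LANE_MASK = 0xFFFF


def pack_words(words):
    """Pack four 16-bit values into one 64-bit integer, words[0] in the low bits."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & LANE_MASK) << (i * LANE_BITS)
    return value


def unpack_words(value):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(value >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def paddw(a, b):
    """Wrap-around add of four 16-bit lanes held in 64-bit integers a and b."""
    sums = [x + y for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)   # pack_words masks each sum, so carries are dropped per lane
```

For example, adding the lanes [1, 2, 3, 0xFFFF] and [10, 20, 30, 1] gives [11, 22, 33, 0]: the last lane wraps to zero without disturbing the others.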

Saturating Arithmetic
  Both wrap-around and saturating adds are supported
  With saturating arithmetic, results that overflow/underflow are set to the largest/smallest representable value
  PADD[W]: Packed wrap-around add
  PADDUS[W]: Packed unsigned saturating add
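The difference between the two kinds of add is easiest to see per lane. In this sketch the lanes are plain Python lists rather than a packed register, and all function names are assumptions made for illustration; the signed helper is included only to show the underflow case.

```python
def add_wrap_u16(x, y):
    """Wrap-around add: keep only the low 16 bits of the sum."""
    return (x + y) & 0xFFFF


def add_sat_u16(x, y):
    """Unsigned saturating add: clamp at the largest 16-bit value."""
    return min(x + y, 0xFFFF)


def add_sat_s16(x, y):
    """Signed saturating add: clamp to the range -32768 .. 32767."""
    return max(-32768, min(32767, x + y))


def paddw_lanes(a, b):
    """Model of a packed wrap-around add over lists of 16-bit lanes."""
    return [add_wrap_u16(x, y) for x, y in zip(a, b)]


def paddusw_lanes(a, b):
    """Model of a packed unsigned saturating add over lists of 16-bit lanes."""
    return [add_sat_u16(x, y) for x, y in zip(a, b)]
```

Saturation matters for pixel data: a bright pixel that overflows should stay white rather than wrap around to a dark value.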

Pack and Unpack Instructions
  Pack and unpack instructions provide conversion between standard data types and packed data types
  PACKSS[DW]: Pack signed, with saturation, double word to packed word

Multiply-Add Operations
  Many graphics applications require multiply-accumulate operations
    Vector Dot Products
    Matrix Multiplies
    Fast Fourier Transforms (FFTs)
    Filter implementations
  PMADDWD: Packed multiply-add word to double

Vector Dot Product
  A dot product on an 8-element vector can be performed using 9 MMX instructions
  Without MMX, 40 instructions are required
  [Figure: partial sums a0*c0+..+a3*c3 and a4*c4+..+a7*c7 are combined into a0*c0+..+a7*c7]
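The data flow of the dot product can be modelled in a few lines. This is a sketch of the arithmetic only, not the 9-instruction sequence from the slide: loads, shifts, and register moves are left out, lanes are plain Python lists, and the names pmaddwd, paddd, and dot8 are illustrative. The model also ignores 32-bit wrap-around, which real hardware would apply to each result.

```python
def pmaddwd(a, b):
    """Multiply four pairs of 16-bit words and add adjacent products.

    Returns two double-word results: [a0*b0 + a1*b1, a2*b2 + a3*b3].
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("pmaddwd works on four words per operand")
    products = [x * y for x, y in zip(a, b)]
    return [products[0] + products[1], products[2] + products[3]]


def paddd(a, b):
    """Packed add of two double-word lanes."""
    return [x + y for x, y in zip(a, b)]


def dot8(a, c):
    """Dot product of two 8-element vectors built from the packed operations."""
    low = pmaddwd(a[0:4], c[0:4])     # covers a0*c0 .. a3*c3
    high = pmaddwd(a[4:8], c[4:8])    # covers a4*c4 .. a7*c7
    partial = paddd(low, high)        # two partial sums, one per lane
    return partial[0] + partial[1]    # final horizontal add
```

Two multiply-add operations replace eight multiplies and four of the seven adds, which is where most of the instruction-count saving comes from.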

Packed Compare Instructions
  Packed compare instructions allow a bit mask to be set or cleared
  This is useful when images with certain qualities need to be extracted
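A common use of such a mask is selecting pixels without branching. The sketch below assumes a chroma-key style scenario that is not taken from the slides: every pixel equal to a key colour is replaced by the corresponding background pixel. Lanes are plain Python lists of 16-bit values, and the function names model the compare and logical operations only loosely.

```python
LANE_MASK = 0xFFFF


def pcmpeqw(a, b):
    """Packed compare equal: all ones in a lane where a == b, all zeros elsewhere."""
    return [LANE_MASK if x == y else 0 for x, y in zip(a, b)]


def pand(a, b):
    """Packed bitwise AND."""
    return [x & y for x, y in zip(a, b)]


def pandn(a, b):
    """Packed AND NOT: (NOT a) AND b, restricted to 16 bits per lane."""
    return [(~x & LANE_MASK) & y for x, y in zip(a, b)]


def por(a, b):
    """Packed bitwise OR."""
    return [x | y for x, y in zip(a, b)]


def replace_key(pixels, key, background):
    """Replace every pixel equal to key with the matching background pixel."""
    mask = pcmpeqw(pixels, [key] * len(pixels))
    from_background = pand(mask, background)   # kept only where the mask is set
    from_foreground = pandn(mask, pixels)      # kept only where the mask is clear
    return por(from_background, from_foreground)
```

Because the selection is done with logical operations on the mask, all lanes are processed in the same way and no per-pixel conditional branch is needed.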

MMX Instructions
  MMX Technology adds 57 new instructions to the x86 architecture. Some of these instructions include:
    PADD(b, w, d)         Packed addition
    PSUB(b, w, d)         Packed subtraction
    PCMPEQ(b, w, d)       Packed compare equal
    PMULLw                Packed word multiply low
    PMULHw                Packed word multiply high
    PMADDwd               Packed word multiply-add
    PSRL(w, d, q)         Packed shift right logical
    PACKSS(wb, dw)        Pack data
    PUNPCK(bw, wd, dq)    Unpack data
    PAND, POR, PXOR       Packed logical operations

Performance Comparison
  The following shows the performance of Pentium processors with and without MMX Technology
    Application         Without MMX   With MMX   Speedup
    Video                    155.52     268.70      1.72
    Image Processing         159.03     743.90      4.67
    3D geometry              161.52     166.44      1.03
    Audio                    149.80     318.90      2.13
    Overall                  156.00     255.43      1.64

MMX Technology Summary
  MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
  It provides a speedup of 1.5 to 2.0 for certain applications.
  MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
  MMX data types use the x86 floating point registers to avoid adding state to the processor.
    Makes it easy to handle context switches
    Makes it hard to perform MMX and floating point instructions at the same time
  Increases the chip area by only about 5%.

Questions on MMX
  What are the strengths and weaknesses of MMX Technology?
  How could MMX Technology potentially be improved?
  How did the developers of MMX preserve backward compatibility with the x86 architecture?
    Why was this important?
    What are the disadvantages of this approach?
  What restrictions/limitations are there on the use of MMX Technology?

Internet Streaming SIMD Extensions
  Intel’s Internet Streaming SIMD Extensions (ISSE)
    Help improve the performance of video and 3D applications
    Are designed for streaming data, which is used once and then discarded
    Add 70 new instructions beyond MMX Technology
    Add new 128-bit registers
    Provide the ability to perform parallel floating point operations
      Four parallel operations on 32-bit numbers
      Reciprocal and reciprocal square root instructions – normalization
      Packed average instruction – motion compensation
    Provide data prefetch instructions
    Make certain applications 1.5 to 2.0 times faster