Chapter 4 Instruction Set Examples Advanced Computer Architecture COE 501.


DLX Architecture
Introduced by Hennessy and Patterson in 1990
– Derived from many different instruction set architectures from MIPS, Sun, IBM, Intel, HP, AMD, etc.
DLX is a typical RISC architecture.
– 32-bit fixed-length instructions
– 3 instruction formats
– Load/store architecture
– Simple branch conditions (no condition codes)
DLX registers
– 32 32-bit general-purpose registers (R0 = 0)
– 32 32-bit (or 16 64-bit) floating point registers
– Special purpose registers (e.g., FP Status and PC)

DLX Design Decisions
DLX is based on the following design decisions
– Use general purpose registers with a load-store architecture
– Support commonly used addressing modes
» displacement, immediate, and register deferred
– Support simple instructions that occur frequently
» load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
– Support commonly required data sizes
» 8-bit (byte), 16-bit (half word), and 32-bit (word) integers
» 32-bit (float) and 64-bit (double) floating point
– Use fixed-length instructions that are easy to decode
– Provide plenty of general purpose registers and separate floating point registers

DLX Instruction Formats
Register-Register (R-type): Op | rs1 | rs2 | rd | func
– Example: ADD R1, R2, R3
– ALU reg. operations, read/write special registers, and moves
Register-Immediate (I-type): Op | rs1 | rd | immediate
– Example: SUB R1, R2, #3
– ALU imm. operations, loads and stores, conditional branch, jump (and link) register
Jump / Call (J-type): Op | offset added to PC
– Example: JUMP end
– Jump, jump and link, trap, and return from exception
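The I-type layout above can be sketched as bit packing. This is a minimal illustration, not the reference encoding: the field widths (6-bit opcode, 5-bit register fields, 16-bit immediate) are an assumption based on the usual DLX description, since the transcript lost them, and the opcode value and function names are hypothetical.

```python
# Assumed field widths: Op(6) | rs1(5) | rd(5) | immediate(16).
# The opcode number used in the example is made up for illustration.

def encode_i_type(opcode, rs1, rd, imm):
    """Pack the four I-type fields into one 32-bit instruction word."""
    assert 0 <= opcode < 64, "opcode must fit in 6 bits"
    assert 0 <= rs1 < 32 and 0 <= rd < 32, "register numbers must fit in 5 bits"
    assert -(1 << 15) <= imm < (1 << 15), "immediate must fit in 16 signed bits"
    return (opcode << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def decode_i_type(word):
    """Unpack an I-type word; the immediate is sign-extended."""
    opcode = (word >> 26) & 0x3F
    rs1 = (word >> 21) & 0x1F
    rd = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # top bit set means a negative immediate
        imm -= 1 << 16
    return opcode, rs1, rd, imm


# Something like SUB R1, R2, #3 with a hypothetical opcode of 0x0A.
word = encode_i_type(opcode=0x0A, rs1=2, rd=1, imm=3)
fields = decode_i_type(word)
```

Because every format is exactly 32 bits and the opcode always sits in the same position, the decoder can extract fields with fixed shifts and masks, which is the property the slide's "easy to decode" claim relies on.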

Examples of DLX Instructions
Data Transfer
– LW R1, 30(R2)    Regs[R1] <= Mem[30 + Regs[R2]]
– SD F0, 40(R3)    Mem[40 + Regs[R3]] <= Regs[F0]; Mem[44 + Regs[R3]] <= Regs[F1]
– Loads and stores also for bytes, half words, and floats
– How would you perform a register move? A no-op?
Arithmetic and Logic
– SUB R1, R2, R3    Regs[R1] <= Regs[R2] - Regs[R3]
– SLLI R1, R2, #5    Regs[R1] <= Regs[R2] << 5
– LHI R1, #42    Regs[R1] <= 42 ## 0^16 (42 concatenated with 16 zero bits)
– SLT R1, R2, R3    if (Regs[R2] < Regs[R3]) Regs[R1] <= 1 else Regs[R1] <= 0
– How would you load a 32-bit immediate into a register?
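One common answer to the 32-bit immediate question is to pair LHI with an OR-immediate. The sketch below models that pair in Python under one stated assumption: that ORI zero-extends its 16-bit immediate, as it does in MIPS. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def lhi(imm16):
    """LHI: put a 16-bit immediate in the upper half and zero the lower half."""
    return ((imm16 & 0xFFFF) << 16) & MASK32


def ori(value, imm16):
    """ORI: OR a register value with a zero-extended 16-bit immediate (assumed)."""
    return (value | (imm16 & 0xFFFF)) & MASK32


def load_imm32(constant):
    """Build a 32-bit constant from two 16-bit immediates."""
    upper = (constant >> 16) & 0xFFFF
    lower = constant & 0xFFFF
    r1 = lhi(upper)         # LHI R1, #upper
    r1 = ori(r1, lower)     # ORI R1, R1, #lower
    return r1


value = load_imm32(0x12345678)
```

Using an add-immediate instead of ORI would need care when bit 15 of the lower half is set, because a sign-extended immediate would then subtract from the upper half.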

Examples of DLX Instructions
Control
– JALR R2    Regs[R31] <= PC + 4; PC <= Regs[R2]
– JR R3    PC <= Regs[R3]
– BNEZ R4, name    if (Regs[R4] != 0) PC <= name else PC <= PC + 4
– How would you implement a subroutine call and return?
Floating Point
– MULF F1, F2, F3    Regs[F1] <= Regs[F2] * Regs[F3]
– ADDD F0, F2, F4    Regs[F0&F1] <= Regs[F2&F3] + Regs[F4&F5]
– What would be difficult about adding a floating-point multiply and add instruction to DLX?
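The subroutine question can be worked through with the register-transfer rules listed above: JALR saves the return address in R31, and JR R31 jumps back to it. The sketch below traces the program counter through one call and return; the addresses are hypothetical.

```python
def run_call_and_return():
    """Trace the PC through JALR R2 followed by JR R31."""
    regs = [0] * 32
    pc = 100                 # hypothetical address of the JALR instruction
    regs[2] = 400            # hypothetical address of the subroutine, held in R2
    trace = [pc]

    # JALR R2: Regs[R31] <= PC + 4, PC <= Regs[R2]
    regs[31] = pc + 4
    pc = regs[2]
    trace.append(pc)

    # ... the subroutine body would execute here ...

    # JR R31: PC <= Regs[R31]
    pc = regs[31]
    trace.append(pc)
    return trace


trace = run_call_and_return()
```

A subroutine that itself makes a call would have to save R31 first (for example to memory), since the nested JALR overwrites it.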

DLX Instruction Set (Appendix C.3)
Data transfer
– Load/store word
– Load/store halfword or byte (signed/unsigned loads)
– Load/store floating point single/double
– Register moves
Arithmetic and Logic
– Add/subtract (signed or unsigned, reg. or imm.)
– Multiply/divide (signed or unsigned, operands in FP reg.)
– And, or, xor (reg. or imm.)
– Load high word (loads upper half of a reg. with imm.)
– Shifts (LL, RL, RA) (reg. or imm.)
– Set conditionals (LT, GT, LE, GE, EQ, NE) (reg. or imm.)

DLX Instruction Set
Control
– Conditional branch on register (compare with zero)
– Conditional branch on FP status bit (bit true or false)
– Jump, jump register (26-bit imm. or reg.)
– Jump and link, jump and link register (26-bit imm. or reg.)
– Trap, return from exception (trap to and return from O.S.)
Floating Point
– Add, subtract, multiply, divide (single or double)
– FP converts (convert between single, double, and integer)
– FP compares (single or double, sets bit in FP status)

DLX Instruction Usage

DLX Summary
Simple load/store architecture
– Only accesses memory on loads/stores
– All other operations use registers and immediates
Designed for pipeline efficiency
– Fixed-length instruction encoding
– Simple instructions
Easy to compile to
– Simple, frequently used instructions
– Orthogonal instruction set
– Few addressing modes
Reduces execution time by
– reducing CPI
– reducing clock cycle time
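The last point follows from the standard CPU time relation: execution time = instruction count × CPI × clock cycle time. The sketch below applies it to two made-up machines. All numbers are hypothetical and chosen only to show how a lower CPI and a shorter cycle can outweigh a higher instruction count.

```python
def execution_time(instruction_count, cpi, cycle_time_ns):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * cycle_time_ns * 1e-9


# Hypothetical machine with complex instructions: fewer instructions,
# but more cycles per instruction and a longer clock cycle.
complex_time = execution_time(1_000_000_000, cpi=4.0, cycle_time_ns=10.0)

# Hypothetical load/store machine: 30% more instructions,
# but a lower CPI and a shorter clock cycle.
simple_time = execution_time(1_300_000_000, cpi=1.5, cycle_time_ns=5.0)

speedup = complex_time / simple_time
```

The relation also shows why the three factors cannot be tuned in isolation: simplifying instructions tends to raise the instruction count while lowering the other two terms.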

History of the Intel 80x86
1971: Intel invents microprocessor
1974: 8080 introduced
– 8-bit microprocessor, used in the Altair personal computer
– Accumulator machine
1978: 8086 introduced
– 16-bit microprocessor
– Accumulator plus dedicated registers
1980: IBM selects 8088 as basis for IBM PC
– 8088 is 8-bit external bus version of 8086
1980: 8087 floating point coprocessor
– adds 60 floating point instructions
– 80-bit floating point registers
– uses hybrid stack/register scheme

History of the Intel 80x86
1982: 80286 introduced
– 24-bit address
– memory mapping & protection
1985: 80386 introduced
– 32-bit address, 32-bit GP registers
– Support for multitasking
1989: 80486 introduced
– Built-in math coprocessor
– More powerful cache and instruction pipelining
1992: Pentium introduced
– Superscalar processor (multiple instructions per cycle)
1995: Pentium Pro introduced
– More aggressive superscalar with register renaming, branch prediction, and speculative execution

History of the Intel 80x86
Pentium II introduced
– Incorporated MMX Technology
– 57 new instructions for processing video, audio, and graphics
– Supports single instructions that operate on multiple data (SIMD)
– 8 data of 8 bits, 4 data of 16 bits, or 2 data of 32 bits
Pentium III introduced
– Features Internet Streaming SIMD Extensions
– Improves performance of 3D graphics and internet applications
– Allows one instruction to be executed on 4 pairs of 32-bit floating point data
Itanium introduced
– Release of IA-64 architecture
– New 64-bit RISC-like architecture
– 128-bit instruction bundles (3 instructions per bundle)
The design of the Intel architecture was driven by the desire for backward compatibility
– Highly irregular architecture
– Over 50 million sold per year

Intel 80x86 Integer Registers

x86 Operand Types
x86 instructions typically have two operands, where one operand is both a source and a destination operand. Possible combinations include:

Source/destination type    Second source type
Register                   Register
Register                   Immediate
Register                   Memory
Memory                     Register
Memory                     Immediate

No memory-memory or immediate-immediate combinations.
Immediates can be 8, 16, or 32 bits.

Intel 80x86 Floating Point Registers Operations on the top of stack and one register within the stack

Usage of Intel 80x86 Floating Point Registers

Option                               NASA 7    Spice
Stack (2nd operand ST(1))            0.3%      2.0%
Register (2nd operand ST(i), i>1)    23.3%     8.3%
Memory                               76.3%     89.7%

Above are dynamic instruction percentages (i.e., based on counts of executed instructions).
Stack unused by Solaris compilers for fastest execution.

80x86 Instruction Format
Instruction sizes vary from 1 to 17 bytes.

80x86 Instructions
– Data movement (move, push, pop)
– Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic)
– Control flow (branches, jumps, calls, returns)
– String instructions (move and compare)
– FP data movement (load, load const., store)
– Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
– Comparisons (can send result to ALU)
– Transcendental functions (sin, cos, log, etc.)

80x86 Addressing Mode Usage for 32-bit Mode

Addressing Mode                   Gcc    Espr.   NASA7   Spice   Avg.
Register indirect                 10%    10%     6%      2%      7%
Base + 8-bit disp                 46%    43%     32%     4%      31%
Base + 32-bit disp                2%     0%      24%     10%     9%
Indexed                           1%     0%      1%      0%      1%
Based indexed + 8b disp           0%     0%      4%      0%      1%
Based indexed + 32b disp          0%     0%      0%      0%      0%
Base + Scaled Indexed             12%    31%     9%      0%      13%
Base + Scaled Index + 8b disp     2%     1%      2%      0%      1%
Base + Scaled Index + 32b disp    6%     2%      2%      33%     11%
32-bit Direct                     19%    12%     20%     51%     26%

80x86 Length Distribution

Instruction Counts: 80x86 vs. DLX

SPEC pgm    x86               DLX               DLX÷86
gcc         3,771,327,742     3,892,063,...
espresso    2,216,423,413     2,801,294,...
spice       15,257,026,309    16,965,928,...
nasa7       15,603,040,963    6,118,740,...

DLX tends to perform more instructions for integer programs, while the 80x86 performs more instructions for floating point programs.
80x86 performs many more data transfers
– Two to four times more for floating point programs
– About 1.25 times more for integer programs
– Why so many more for floating point?

Comparison
How would you expect the x86 and MIPS architectures to compare on the following?
– CPI on SPEC benchmarks
– Ease of design and implementation
– Ease of writing assembly language & compilers
– Code density
– Overall performance
What other advantages/disadvantages are there to the two architectures?

Graphics and Multimedia Instruction Set Extensions
Several companies have extended their computers' instruction sets to better support graphics and multimedia applications.
– Intel's MMX Technology
– Intel's Internet Streaming SIMD Extensions
– AMD's 3DNow! Technology
– Sun's Visual Instruction Set
– Motorola's and IBM's AltiVec Technology
These extensions improve the performance of
– Computer-aided design
– Internet applications
– Computer visualization
– Video games
– Speech recognition

MMX Data Types
MMX Technology supports operations on the following 64-bit integer data types.
– Packed byte (eight 8-bit elements)
– Packed word (four 16-bit elements)
– Packed double word (two 32-bit elements)
– Packed quad word (one 64-bit element)

SIMD Operations
MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD).
PADD[W]: Packed add word

    A3      | A2      | A1      | A0
  + B3      | B2      | B1      | B0
  = A3+B3   | A2+B2   | A1+B1   | A0+B0

In the above example, 4 parallel adds are performed on 16-bit elements.
Most MMX instructions only require a single cycle.
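The packed add can be modeled in software to make the lane behavior concrete. This is a rough Python emulation, not MMX code: a 64-bit integer stands in for an MMX register, lane 0 is taken to be the least significant word, and the helper names are illustrative.

```python
def unpack_words(value):
    """Split a 64-bit integer into four 16-bit lanes, lane 0 least significant."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]


def pack_words(lanes):
    """Combine four 16-bit lanes back into one 64-bit integer."""
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & 0xFFFF) << (16 * i)
    return result


def paddw(a, b):
    """Packed add word: four independent 16-bit adds with wrap-around."""
    sums = [(x + y) & 0xFFFF for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)


a = pack_words([1, 2, 3, 0xFFFF])
b = pack_words([10, 20, 30, 1])
lanes = unpack_words(paddw(a, b))
```

The top lane wraps to zero instead of carrying out of the register, which is the difference between a packed add and an ordinary 64-bit add.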

Saturating Arithmetic
Both wrap-around and saturating adds are supported.
With saturating arithmetic, results that overflow are set to the largest representable value (and results that underflow to the smallest).
PADD[W]: Packed wrap-around add
PADDUS[W]: Packed saturating add
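The difference between the two modes shows up only when a lane overflows. The sketch below compares them for a single 16-bit lane; the function names are illustrative and are not MMX mnemonics.

```python
def add_wrap(x, y, bits=16):
    """Wrap-around add: keep only the low bits of the sum."""
    return (x + y) & ((1 << bits) - 1)


def add_unsigned_saturate(x, y, bits=16):
    """Unsigned saturating add: clamp the sum at the largest unsigned value."""
    return min(x + y, (1 << bits) - 1)


def add_signed_saturate(x, y, bits=16):
    """Signed saturating add: clamp the sum to the signed range."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, x + y))


wrapped = add_wrap(0xFFF0, 0x0020)                  # overflow wraps to a small value
saturated = add_unsigned_saturate(0xFFF0, 0x0020)   # overflow sticks at the maximum
```

For pixel data, saturation is usually the wanted behavior: brightening an already bright pixel should leave it white, whereas wrap-around would turn it nearly black.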

Pack and Unpack Instructions
Pack and unpack instructions provide conversion between standard data types and packed data types.
PACKSS[DW]: Pack with signed saturation, packed double word to packed word

Multiply-Add Operations
Many graphics applications require multiply-accumulate operations
– Vector Dot Products
– Matrix Multiplies
– Fast Fourier Transforms (FFTs)
– Filter implementations
PMADDWD: Packed multiply-add word to double

Vector Dot Product
A dot product on an 8-element vector can be performed using 9 MMX instructions.
Without MMX, 40 instructions are required.
[Figure: the partial sums a0*c0 + ... + a3*c3 and a4*c4 + ... + a7*c7 are added to form a0*c0 + ... + a7*c7]
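The reduction can be sketched with a software model of the multiply-add step. This is an approximation under stated assumptions: lanes are plain Python lists of small non-negative integers, signed overflow is not modeled, the order in which partial sums are combined is one possible choice, and the helper names are illustrative.

```python
def pmaddwd(a, b):
    """Multiply four 16-bit pairs and add adjacent products into two 32-bit sums."""
    assert len(a) == len(b) == 4
    products = [x * y for x, y in zip(a, b)]
    return [products[0] + products[1], products[2] + products[3]]


def paddd(a, b):
    """Packed add double word: two independent 32-bit adds with wrap-around."""
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]


def dot8(a, c):
    """Dot product of two 8-element vectors using two multiply-adds."""
    low = pmaddwd(a[0:4], c[0:4])      # covers elements 0..3
    high = pmaddwd(a[4:8], c[4:8])     # covers elements 4..7
    partial = paddd(low, high)         # two partial sums remain
    return (partial[0] + partial[1]) & 0xFFFFFFFF   # final horizontal add


a = [1, 2, 3, 4, 5, 6, 7, 8]
total = dot8(a, a)
```

Each multiply-add replaces four multiplies and two adds, which is where most of the instruction-count saving over scalar code comes from.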

Packed Compare Instructions
Packed compare instructions allow a bit mask to be set or cleared.
This is useful when images with certain qualities need to be extracted.
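A typical use of the mask is branch-free selection, for example replacing every pixel that matches a key color with a background pixel. The sketch below models this with 16-bit lanes held in Python lists; the pixel values and function names are hypothetical.

```python
def pcmpeqw(a, b):
    """Per-lane compare: all ones (0xFFFF) where lanes are equal, else zero."""
    return [0xFFFF if x == y else 0x0000 for x, y in zip(a, b)]


def select(mask, if_set, if_clear):
    """Lane by lane: (mask AND if_set) OR (NOT mask AND if_clear)."""
    return [(m & s) | (~m & 0xFFFF & c) for m, s, c in zip(mask, if_set, if_clear)]


foreground = [7, 0x1234, 7, 9]        # hypothetical pixels; 7 is the key color
key = [7, 7, 7, 7]
background = [100, 101, 102, 103]

mask = pcmpeqw(foreground, key)
composite = select(mask, background, foreground)
```

Because the mask is all ones or all zeros per lane, the selection needs only logical operations and no per-pixel branch.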

MMX Instructions
MMX Technology adds 57 new instructions to the x86 architecture. Some of these instructions include
– PADD(b, w, d)    Packed addition
– PSUB(b, w, d)    Packed subtraction
– PCMPEQ(b, w, d)    Packed compare equal
– PMULLW    Packed word multiply low
– PMULHW    Packed word multiply high
– PMADDWD    Packed word multiply-add
– PSRL(w, d, q)    Packed shift right logical
– PACKSS(wb, dw)    Pack data
– PUNPCK(bw, wd, dq)    Unpack data
– PAND, POR, PXOR    Packed logical operations

Performance Comparison
The following shows the performance of Pentium processors with and without MMX Technology.
[Figure: bar chart of speedup by application (Video, Image Processing, 3D geometry, Audio, Overall), with MMX vs. without MMX]

MMX Technology Summary
MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
It provides a speedup of 1.5 to 2.0 for certain applications.
MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
MMX data types use the x86 floating point registers to avoid adding state to the processor.
– Makes it easy to handle context switches
– Makes it hard to perform MMX and floating point instructions at the same time
MMX only increases the chip area by about 5%.

Questions on MMX
What are the strengths and weaknesses of MMX Technology?
How could MMX Technology potentially be improved?
How did the developers of MMX preserve backward compatibility with the x86 architecture?
– Why was this important?
– What are the disadvantages of this approach?
What restrictions/limitations are there on the use of MMX Technology?

Internet Streaming SIMD Extensions
Intel's Internet Streaming SIMD Extensions (ISSE)
– Help improve the performance of video and 3D applications
– Are designed for streaming data, which is used once and then discarded
– Add 70 new instructions beyond MMX Technology
– Add new 128-bit registers
– Provide the ability to perform parallel floating point operations
» Four parallel operations on 32-bit numbers
» Reciprocal and reciprocal root instructions (normalization)
» Packed average instruction (motion compensation)
– Provide data prefetch instructions
– Make certain applications 1.5 to 2.0 times faster