CPE 631: Instruction Set Principles and Examples
Electrical and Computer Engineering, University of Alabama in Huntsville
Aleksandar Milenkovic



Outline
- What is Instruction Set Architecture?
- Classifying ISA
- Elements of ISA
  - Programming Registers
  - Type and Size of Operands
  - Addressing Modes
  - Types of Operations
  - Instruction Encoding
- Role of Compilers

Shift in Applications Area
- Desktop computing: emphasizes performance of programs with integer and floating-point data types; little regard for program size or processor power
- Servers: used primarily for database, file server, and web applications; FP performance matters much less than integer and string performance
- Embedded applications: value cost and power, so code size is important because less memory is both cheaper and lower power
- DSPs and media processors: can be used in embedded applications; emphasize real-time performance and often deal with infinite, continuous streams of data
  - Architects of these machines traditionally identify a small number of key kernels that are critical to success; these kernels are therefore often supplied by the manufacturer

What is ISA?
- Instruction Set Architecture: the computer as visible to the assembly language programmer or compiler writer
- ISA includes
  - Programming Registers
  - Operand Access
  - Type and Size of Operands
  - Instruction Set
  - Addressing Modes
  - Instruction Encoding

Classifying ISA
- Stack architectures: operands are implicitly on the top of the stack
- Accumulator architectures: one operand is implicitly the accumulator
- General-purpose register (GPR) architectures: only explicit operands, either registers or memory locations
  - register-memory: access memory as part of any instruction
  - register-register: access memory only with load and store instructions

Classifying ISA (cont’d)
- Four classes: Stack, Accumulator, Register-Memory, Load-store (or Register-Register)
- [Figure: operand paths between processor and memory for each of the four classes; the stack machine operates on the top of stack (TOS)]

Example: Code Sequence for C = A+B

| Stack | Accumulator | Register-Memory | Load-store |
|---|---|---|---|
| Push A | Load A | Load R1,A | Load R1,A |
| Push B | Add B | Add R3,R1,B | Load R2,B |
| Add | Store C | Store C,R3 | Add R3,R1,R2 |
| Pop C | | | Store C,R3 |
| 4 instr., 3 mem. op. | 3 instr., 3 mem. op. | 3 instr., 3 mem. op. | 4 instr., 3 mem. op. |
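The instruction and memory-operand counts for C = A+B can be checked with a small interpreter. This is only an illustrative sketch: the tuple encoding, the `run` helper, and the rule that operand names starting with "R" are registers are assumptions made here, not part of the slides.

```python
def run(program, memory):
    """Execute a tiny instruction list; return (instruction count, memory accesses)."""
    stack, regs, acc = [], {}, 0
    mem_ops = 0

    def read(operand):
        # Assumption: names starting with "R" are registers, all others are memory.
        nonlocal mem_ops
        if operand.startswith("R"):
            return regs[operand]
        mem_ops += 1
        return memory[operand]

    def write(operand, value):
        nonlocal mem_ops
        if operand.startswith("R"):
            regs[operand] = value
        else:
            mem_ops += 1
            memory[operand] = value

    for op, *args in program:
        if op == "push":
            stack.append(read(args[0]))
        elif op == "pop":
            write(args[0], stack.pop())
        elif op == "add" and not args:            # stack: add the two top entries
            stack.append(stack.pop() + stack.pop())
        elif op == "load" and len(args) == 1:     # accumulator: acc <- operand
            acc = read(args[0])
        elif op == "add" and len(args) == 1:      # accumulator: acc <- acc + operand
            acc += read(args[0])
        elif op == "store" and len(args) == 1:    # accumulator: operand <- acc
            write(args[0], acc)
        elif op == "load":                        # load Rd, src
            write(args[0], read(args[1]))
        elif op == "add":                         # add Rd, src1, src2
            write(args[0], read(args[1]) + read(args[2]))
        elif op == "store":                       # store dst, Rs
            write(args[0], read(args[1]))
    return len(program), mem_ops


STACK = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACCUMULATOR = [("load", "A"), ("add", "B"), ("store", "C")]
REG_MEM = [("load", "R1", "A"), ("add", "R3", "R1", "B"), ("store", "C", "R3")]
LOAD_STORE = [("load", "R1", "A"), ("load", "R2", "B"),
              ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

for name, prog in [("stack", STACK), ("accumulator", ACCUMULATOR),
                   ("register-memory", REG_MEM), ("load-store", LOAD_STORE)]:
    mem = {"A": 2, "B": 3, "C": 0}
    print(name, run(prog, mem), "C =", mem["C"])
```

All four sequences touch memory three times; they differ only in how many instructions carry those accesses.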

Development of ISA
- Early computers used stack or accumulator architectures
  - accumulator architecture is easy to build
  - stack architecture closely matches expression evaluation algorithms (without optimisations!)
- GPR architectures have dominated since 1975
  - registers are faster than memory
  - registers are easier for a compiler to use
  - registers can hold variables
    - memory traffic is reduced, and programs speed up
    - code density is increased (registers are named with fewer bits than memory locations)

Programming Registers
- Ideally, use of GPRs should be orthogonal; i.e., any register can be used as any operand with any instruction
- May be difficult to implement; some CPUs compromise by limiting use of some registers
- How many registers?
  - PDP-11: 8; some reserved (e.g., PC, SP); only a few left, typically used for expression evaluation
  - VAX 11/780: 16; some reserved (e.g., PC, SP, FP); enough left to keep some variables in registers
  - RISC: 32; can keep many variables in registers

Operand Access
- Number of operands
  - 3: instruction specifies result and 2 source operands
  - 2: one of the operands is both a source and a result
- How many of the operands may be memory addresses in ALU instructions?

| Number of memory addresses | Maximum number of operands | Examples |
|---|---|---|
| 0 | 3 | SPARC, MIPS, HP-PA, PowerPC, Alpha, ARM, Trimedia |
| 1 | 2 | Intel 80x86, Motorola 68000, TI TMS320C54 |
| 2/3 | 2/3 | VAX |

Operand Access: Comparison

| Type | Advantages | Disadvantages |
|---|---|---|
| Reg-Reg (0,3) | Simple, fixed-length instruction encoding. Simple code generation. Instructions take a similar number of clocks to execute. | Higher instruction count. Some instructions are short, so the bit encoding may be wasteful. |
| Reg-Mem (1,2) | Data can be accessed without loading first. Instruction format tends to be easy to decode and yields good density. | A source operand is destroyed in a binary operation. Clocks per instruction vary by operand location. |
| Mem-Mem (3,3) | Most compact. | Large variation in instruction size and clocks per instruction. Memory bottleneck. |

Type and Size of Operands (cont’d)
- Distribution of data accesses by size (SPEC)

| Size | Int | Fp |
|---|---|---|
| Double word | 0% | 69% |
| Word | 74% | 31% |
| Half word | 19% | 0% |
| Byte | 7% | 0% |

- Summary: a new 32-bit architecture should support
  - 8-, 16- and 32-bit integers; 64-bit floats
  - 64-bit integers may be needed for 64-bit addressing
  - others can be implemented in software
- Operands for media and signal processing
  - Pixel: 8b (red), 8b (green), 8b (blue), 8b (transparency of the pixel)
  - Fixed-point (DSP): cheap floating-point
  - Vertex (graphic operations): x, y, z, w

Addressing Modes
- Addressing mode: how a computer system specifies the address of an operand
  - constants
  - registers
  - memory locations
  - I/O addresses
- Memory addressing
  - since 1980 almost every machine uses addresses to the level of 1 byte =>
  - How do byte addresses map onto a 32-bit word?
  - Can a word be placed on any byte boundary?

Interpreting Memory Addresses
- Big Endian: address of most significant byte = word address (xx00 = Big End of the word); IBM 360/370, MIPS, SPARC, HP-PA
- Little Endian: address of least significant byte = word address (xx00 = Little End of the word); Intel 80x86, DEC VAX, DEC Alpha
- Alignment: require that objects fall on an address that is a multiple of their size

Interpreting Memory Addresses (cont’d)
- [Figure: bytes 0x00 to 0x03 of a 32-bit word (bit 31 = msb, bit 0 = lsb) stored at addresses a to a+3 in big-endian and little-endian order; aligned and not aligned placement of words in memory at a, a+4, a+8, a+C]
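A short sketch of byte ordering and the alignment rule, using Python's built-in `int.to_bytes`. The helper names `word_to_bytes` and `is_aligned` are made up for this illustration.

```python
def word_to_bytes(value, byteorder):
    """Return the four bytes of a 32-bit word in increasing address order."""
    return list(value.to_bytes(4, byteorder))


def is_aligned(address, size):
    """An object is aligned when its address is a multiple of its size."""
    return address % size == 0


word = 0x00010203
big = word_to_bytes(word, "big")        # most significant byte at the lowest address
little = word_to_bytes(word, "little")  # least significant byte at the lowest address
print(big, little)
print(is_aligned(8, 4), is_aligned(6, 4))
```

With this word, the big-endian layout places 0x00 at address a, while the little-endian layout places 0x03 there.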

Addressing Modes: Examples

| Addr. mode | Example | Meaning | When used |
|---|---|---|---|
| Register | ADD R4,R3 | Regs[R4] ← Regs[R4]+Regs[R3] | a value is in a register |
| Immediate | ADD R4,#3 | Regs[R4] ← Regs[R4]+3 | for constants |
| Displacement | ADD R4,100(R1) | Regs[R4] ← Regs[R4]+Mem[100+Regs[R1]] | local variables |
| Register indirect | ADD R4,(R1) | Regs[R4] ← Regs[R4]+Mem[Regs[R1]] | accessing using a pointer |
| Indexed | ADD R4,(R1+R2) | Regs[R4] ← Regs[R4]+Mem[Regs[R1]+Regs[R2]] | array addressing (base + offset) |
| Direct | ADD R4,(1001) | Regs[R4] ← Regs[R4]+Mem[1001] | addressing static data |
| Memory indirect | ADD R4,@(R3) | Regs[R4] ← Regs[R4]+Mem[Mem[Regs[R3]]] | if R3 holds the address of a pointer p, this yields *p |
| Autoincrement | ADD R4,(R3)+ | Regs[R4] ← Regs[R4]+Mem[Regs[R3]]; Regs[R3] ← Regs[R3]+d | stepping through arrays within a loop; d is the size of an element |
| Autodecrement | ADD R4,-(R3) | Regs[R3] ← Regs[R3]-d; Regs[R4] ← Regs[R4]+Mem[Regs[R3]] | similar to autoincrement |
| Scaled | ADD R4,100(R2)[R3] | Regs[R4] ← Regs[R4]+Mem[100+Regs[R2]+Regs[R3]*d] | to index arrays |
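The "Meaning" column translates almost directly into code. The sketch below covers the modes without side effects (autoincrement and autodecrement are left out); the function name `fetch_operand`, its keyword arguments, and the register and memory contents are hypothetical, chosen only for illustration.

```python
def fetch_operand(mode, regs, mem, **f):
    """Return the source operand value for some of the addressing modes."""
    if mode == "register":
        return regs[f["reg"]]
    if mode == "immediate":
        return f["imm"]
    if mode == "displacement":
        return mem[f["disp"] + regs[f["base"]]]
    if mode == "register_indirect":
        return mem[regs[f["base"]]]
    if mode == "indexed":
        return mem[regs[f["base"]] + regs[f["index"]]]
    if mode == "direct":
        return mem[f["addr"]]
    if mode == "memory_indirect":
        return mem[mem[regs[f["base"]]]]
    if mode == "scaled":
        return mem[f["disp"] + regs[f["base"]] + regs[f["index"]] * f["d"]]
    raise ValueError(f"unsupported mode: {mode}")


# Made-up machine state: a few registers and a sparse memory.
regs = {"R1": 200, "R2": 8, "R3": 2, "R4": 10}
mem = {116: 6, 200: 300, 208: 9, 300: 7, 1001: 4}

# ADD R4,100(R1): Regs[R4] <- Regs[R4] + Mem[100 + Regs[R1]]
regs["R4"] += fetch_operand("displacement", regs, mem, base="R1", disp=100)
print(regs["R4"])
```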

Addressing Mode Usage
- 3 programs measured on a machine with all address modes (VAX)
  - register direct modes are not counted (one-half of the operand references)
  - PC-relative is not counted (exclusively used for branches)
- Results

| Addressing mode | Average | Range |
|---|---|---|
| Displacement | 42% | not given |
| Immediate | 33% | not given |
| Register indirect | 13% | 3 - 24% |
| Scaled | 7% | 0 - 16% |
| Memory indirect | 3% | 1 - 6% |
| Misc. | 2% | 0 - 3% |

- Figure annotations: 75% and 85% (cumulative shares; displacement plus immediate is 42% + 33% = 75%)

Displacement, immediate size
- Displacement
  - 1% of addresses require > 16 bits
  - 25% of addresses require > 12 bits
- Immediate
  - Do immediates need to be supported by all operations? Share of operations that use an immediate:

| Operation | Int | Fp |
|---|---|---|
| Loads | 10% | 45% |
| Compares | 87% | 77% |
| ALU operations | 58% | 78% |
| All instructions | 35% | 10% |

  - What is the range of values?
    - 50% - 70% fit within 8 bits
    - 75% - 80% fit within 16 bits

Addressing modes: Summary
- Data addressing modes that are important: Displacement, Immediate, Register Indirect
- Displacement size should be 12 to 16 bits
- Immediate should be 8 to 16 bits
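Whether a given displacement or immediate fits a field of a given width is a simple range check. The sketch below assumes signed two's-complement fields, which the slides do not state; `fits_signed` is a name invented for this example.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's-complement field of `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


for value in (100, -2048, 2048, 40000):
    widths = [bits for bits in (8, 12, 16) if fits_signed(value, bits)]
    print(value, "fits in", widths)
```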

Addressing Modes for Signal Processing
- DSPs deal with continuous, infinite streams of data => circular buffers
  - Modulo or circular addressing mode
- FFT shuffles data at the start or end
  - 0 (000) => 0 (000), 1 (001) => 4 (100), 2 (010) => 2 (010), 3 (011) => 6 (110), ...
  - Bit-reverse addressing mode: take the original value, reverse its bits, and use the result as an address
- The 6 most frequently used modes found in desktop processors account for 95% of DSP addressing-mode usage
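Both DSP addressing modes are easy to express in software, which shows what the hardware computes for free. This is a sketch only; the helper names and the buffer parameters are assumptions for illustration.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, as in FFT bit-reverse addressing."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def circular_next(pointer, step, start, length):
    """Advance a pointer inside the circular buffer [start, start + length)."""
    return start + (pointer - start + step) % length


print([bit_reverse(i, 3) for i in range(8)])
print(circular_next(7, 1, start=0, length=8))  # wraps back to the buffer start
```

The first four outputs of `bit_reverse` reproduce the mapping on the slide: 0, 4, 2, 6.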

Typical Operations

| Category | Operations |
|---|---|
| Data movement | load (from memory), store (to memory), mem-to-mem move, reg-to-reg move, input (from I/O device), output (to I/O device), push (to stack), pop (from stack) |
| Arithmetic | integer (binary + decimal): add, subtract, multiply, divide |
| Shift | shift left/right, rotate left/right |
| Logical | not, and, or, xor, clear, set |
| Control | unconditional/conditional jump |
| Subroutine linkage | call/return |
| System | OS call, virtual memory management |
| Synchronization | test-and-set |
| Floating-point | FP add, subtract, multiply, divide, compare, SQRT |
| String | string move, compare, search |
| Graphics | pixel and vertex operations, compression/decompression |

Top ten 8086 instructions
- Simple instructions dominate instruction frequency => support them

| Rank | Instruction | % total execution |
|---|---|---|
| 1 | load | 22% |
| 2 | conditional branch | 20% |
| 3 | compare | 16% |
| 4 | store | 12% |
| 5 | add | 8% |
| 6 | and | 6% |
| 7 | sub | 5% |
| 8 | move reg-reg | 4% |
| 9 | call | 1% |
| 10 | return | 1% |
| | Total | 96% |

Operations for Media and Signal Processing
- Multimedia processing and limits of human perception
  - use narrower data words (don’t need 64b fp) => wide ALUs operate on several data items at the same time
  - partitioned add: e.g., perform four 16-bit adds on a 64-bit ALU
  - SIMD (Single Instruction, Multiple Data) or vector instructions (see Appendix F)
  - Figure 2.17 (page 110)
- DSP processors
  - algorithms often need saturating arithmetic: if the result is too large to be represented, it is set to the largest representable number
  - often need several rounding modes
  - MAC (Multiply and Accumulate) instructions
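A minimal sketch of a partitioned add and a saturating add. The function names are invented for this example, and clamping negative overflow to the smallest representable value is an assumption; the slide mentions only the "too large" case.

```python
def partitioned_add16(x, y):
    """Four independent 16-bit adds on two 64-bit words; carries do not cross lanes."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((x >> shift) + (y >> shift)) & 0xFFFF  # keep this lane's 16 bits
        result |= lane_sum << shift
    return result


def saturating_add16(a, b):
    """Add two signed 16-bit values, clamping to the range instead of wrapping."""
    return max(-32768, min(32767, a + b))


# Lane 2 overflows (0xFFFF + 0x0001) and wraps to 0 without disturbing lane 3.
print(hex(partitioned_add16(0x0001_FFFF_0002_0003, 0x0001_0001_0002_0003)))
print(saturating_add16(30000, 10000))
```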

Instructions for Control Flow
- Control flow instructions
  - Conditional branches (75% int, 82% fp)
  - Call/return (19% int, 8% fp)
  - Jump (6% int, 10% fp)
- Addressing modes for control flow
  - PC-relative
  - for returns and indirect jumps the target is not known at compile time => specify a register that contains the target address

Instructions for Control Flow (cont’d)
- Methods for branch evaluation
  - Condition code, CC (ARM, 80x86, PowerPC): tests special bits set by ALU instructions
  - Condition register (Alpha, MIPS): tests an arbitrary register
  - Compare and branch (PA-RISC, VAX): compare is a part of the branch
- Procedure invocation options
  - do the control transfer and possibly some state saving
  - at least the return address must be saved (in a link register)
  - the compiler generates loads and stores to save the state
- Caller saving vs. callee saving

Encoding an Instruction Set
- The instruction set architect must choose how to represent instructions in machine code
  - The operation is specified in one field called the opcode
  - Each operand is specified by a separate address specifier (tells which addressing mode is used)
- Balance among
  - Many registers and addressing modes add to richness
  - Many registers and addressing modes increase code size
  - Lengths of code objects should "match" the architecture; e.g., 16 or 32 bits

Basic variations in encoding
a) Variable (e.g., VAX)
   Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
b) Fixed (e.g., DLX, MIPS, PowerPC, ...)
   Operation | Address field 1 | Address field 2 | Address field 3
c) Hybrid (e.g., IBM 360/370, Intel 80x86)
   Operation | Address specifier 1 | Address field 1
   Operation | Address specifier 1 | Address specifier 2 | Address field
   Operation | Address specifier | Address field 1 | Address field 2
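A fixed encoding packs the operation and address fields at fixed bit positions in a word of constant length. The sketch below uses a hypothetical layout (6-bit opcode, three 5-bit register fields, the remaining low bits unused) that is not taken from the slides or from any particular ISA.

```python
OPCODE_BITS = 6   # hypothetical field widths, chosen for illustration only
REG_BITS = 5
UNUSED_BITS = 32 - OPCODE_BITS - 3 * REG_BITS


def encode(opcode, rd, rs1, rs2):
    """Pack opcode and three register numbers into one 32-bit word."""
    fields = ((opcode, OPCODE_BITS), (rd, REG_BITS), (rs1, REG_BITS), (rs2, REG_BITS))
    word = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field value out of range")
        word = (word << bits) | value
    return word << UNUSED_BITS  # unused low bits are left as zero


def decode(word):
    """Unpack a word produced by encode into (opcode, rd, rs1, rs2)."""
    mask = (1 << REG_BITS) - 1
    word >>= UNUSED_BITS
    rs2 = word & mask
    rs1 = (word >> REG_BITS) & mask
    rd = (word >> (2 * REG_BITS)) & mask
    opcode = word >> (3 * REG_BITS)
    return opcode, rd, rs1, rs2


word = encode(0x20, 3, 1, 2)
print(f"{word:032b}", decode(word))
```

Because every field sits at a known position, decoding needs only shifts and masks; a variable encoding would first have to inspect the address specifiers to find where each field starts.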

Summary of Instruction Formats
- If code size is most important, use variable-length instructions
- If performance is most important, use fixed-length instructions
- Reduced code size in RISCs
  - hybrid version with both 16-bit and 32-bit instructions
    - narrow instructions support fewer operations, smaller address and immediate fields, fewer registers, and a 2-address format
    - ARM Thumb, MIPS MIPS16 (Appendix C)
  - IBM: compressed code is kept in main memory, ROMs, and disk; caches keep decompressed code

Role of Compilers
- Structure of recent compilers
  1) Front-end
     - transforms the language to a common intermediate form
     - language dependent, machine independent
  2) High-level optimizations
     - e.g., loop transformations, procedure inlining, ...
     - somewhat language dependent, machine independent
  3) Global optimizer
     - global and local optimizations, register allocation
     - small language dependencies, some machine dependencies
  4) Code generator
     - instruction selection, machine-dependent optimizations
     - language independent, highly machine dependent

Compiler Optimizations
1) High-level optimizations: done on the source
2) Local optimizations: optimize code within a basic block
3) Global optimizations: extend local optimizations across branches (loops)
4) Register allocation: associates registers with operands
5) Processor-dependent optimizations: take advantage of specific architectural knowledge
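As one concrete instance of a local optimization, the sketch below performs constant folding and propagation inside a single basic block. The three-address tuple format `(dest, op, a, b)`, the `"const"` marker, and the restriction to `+` and `*` are assumptions made for this illustration; real compilers work on richer intermediate forms.

```python
def fold_constants(block):
    """Fold '+' and '*' operations whose operands are known constants in this block."""
    known = {}   # variable name -> constant value established earlier in the block
    out = []
    for dest, op, a, b in block:
        a = known.get(a, a)   # substitute a known constant for a variable operand
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            value = a + b if op == "+" else a * b
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out


block = [
    ("t1", "+", 2, 3),       # t1 = 2 + 3
    ("t2", "*", "t1", "x"),  # t2 = t1 * x   (x is unknown at compile time)
    ("t3", "+", "t1", 4),    # t3 = t1 + 4
]
for instr in fold_constants(block):
    print(instr)
```

The analysis stops at the end of the basic block; carrying the `known` table across branches is what a global optimization would add.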