Computer Architecture

Computer Architecture, Princess Sumaya University for Technology, Dr. Esam Al_Qaralleh

Instruction Set Architecture (ISA)

Outline
• Introduction
• Classifying instruction set architectures
• Instruction set measurements
• Memory addressing
• Addressing modes for signal processing
• Type and size of operands
• Operations in the instruction set
• Operations for media and signal processing
• Instructions for control flow
• Encoding an instruction set
• MIPS architecture

Instruction Set Principles and Examples

Basic Issues in Instruction Set Design
• What operations, and how many? Load/store/increment/branch are sufficient for any computation, but not practical (programs become far too long).
• How (and how many) operands are specified? Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← B).
• How are they encoded into an instruction format? Instruction lengths should be multiples of bytes.
• Typical instruction set: 32-bit word; basic operand addresses are 32 bits long; basic operands (like integers) are 32 bits long; in general, an instruction may refer to 3 operands (A ← B + C).
• Challenge: encode operations in a small number of bits.

Brief Introduction to ISA
• Instruction set architecture: a set of instructions, each directly executed by the CPU's hardware.
• How is it represented? In a binary format, since the hardware understands only bits.
• Options: fixed or variable length formats. Fixed: each instruction is encoded in a field of the same size (typically 1 word). Variable: half-word, whole-word, and multiple-word instructions are possible.
• Example fixed format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)

What Must be Specified?
• Instruction format (encoding): how is it decoded?
• Location of operands and result: where, other than memory? How many explicit operands? How are memory operands located?
• Data type and size
• Operations: which are supported?

Example of Program Execution
• Opcodes used: 1 = load AC from memory, 2 = store AC to memory, 5 = add to AC from memory.
• The program adds the contents of memory location 940 to the contents of location 941 and stores the result at 941.
• Each instruction passes through a fetch phase and an execute phase.
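A minimal C sketch of that fetch-execute loop. The encoding is an assumption made for illustration: 16-bit instruction words with a 4-bit opcode in the top bits and a 12-bit address below, locations 940 and 941 treated as hex addresses 0x940 and 0x941, and opcode 0 used as a hypothetical HALT. The `Machine` type and `run` function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit accumulator machine: opcode = top 4 bits, address = low 12 bits. */
enum { OP_HALT = 0x0, OP_LOAD = 0x1, OP_STORE = 0x2, OP_ADD = 0x5 };

typedef struct {
    uint16_t mem[4096]; /* 12-bit address space */
    uint16_t pc;        /* program counter */
    uint16_t ac;        /* accumulator */
} Machine;

/* Repeat fetch and execute until a HALT instruction is fetched. */
void run(Machine *m) {
    for (;;) {
        uint16_t ir = m->mem[m->pc & 0x0FFF]; /* fetch into the instruction register */
        m->pc++;
        uint16_t op = ir >> 12;
        uint16_t addr = ir & 0x0FFF;
        switch (op) {
        case OP_LOAD:  m->ac = m->mem[addr]; break;                     /* 1: load AC from memory */
        case OP_STORE: m->mem[addr] = m->ac; break;                     /* 2: store AC to memory */
        case OP_ADD:   m->ac = (uint16_t)(m->ac + m->mem[addr]); break; /* 5: add to AC from memory */
        case OP_HALT:  return;
        default:       assert(!"unknown opcode"); return;
        }
    }
}
```

Loading 0x1940, 0x5941, 0x2941, 0x0000 at addresses 0x300 to 0x303 gives load, add, store, halt; with 3 in 0x940 and 2 in 0x941, location 0x941 ends up holding 5.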

Classifying Instruction Set Architecture

Instruction Set Design: the instruction set influences everything.

Instruction Characteristics
• Usually a simple operation, identified by the op-code field.
• Operations require operands: 0, 1, or 2.
• To identify where the operands are, they must be addressed; the address refers to some piece of storage. Typical storage possibilities are main memory, registers, or a stack.
• Two options: explicit or implicit addressing.
• Implicit: the op-code implies the address of the operands. ADD on a stack machine pops the top 2 elements of the stack, then pushes the result; HP calculators work this way.
• Explicit: the address is specified in some field of the instruction.
• Note the potential for 3 addresses: 2 operands + the destination.
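Implicit operand addressing on a stack machine can be sketched in C as follows; the fixed stack depth and all function names are hypothetical. Note that `stack_add` names no operands at all: its inputs are whatever currently sits on top of the stack.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_DEPTH = 64 }; /* arbitrary depth for the sketch */

typedef struct {
    long data[STACK_DEPTH];
    size_t top; /* number of values currently on the stack */
} Stack;

void push(Stack *s, long v) {
    assert(s->top < STACK_DEPTH);
    s->data[s->top++] = v;
}

long pop(Stack *s) {
    assert(s->top > 0);
    return s->data[--s->top];
}

/* Zero-address ADD: the operands are implied to be the top two stack entries. */
void stack_add(Stack *s) {
    long b = pop(s);
    long a = pop(s);
    push(s, a + b);
}

/* SUB shows that order matters: the value pushed first is the left operand. */
void stack_sub(Stack *s) {
    long b = pop(s);
    long a = pop(s);
    push(s, a - b);
}
```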

Classifying Instruction Set Architectures: based on the CPU's internal storage options AND the number of operands. These choices critically affect the instruction count, CPI, and cycle time.

Operand Locations for Four ISA Classes

Code sequence for C = A + B in each class
• Stack: Push A; Push B; Add (pops the top two values, A and B, and pushes the sum); Pop C
• Accumulator (AC): Load A; Add B (adds AC, which holds A, to B and leaves the result in AC); Store C
• Register (register-memory): Load R1, A; Add R3, R1, B; Store R3, C
• Register (load-store): Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C

Modern Choice: Load-Store Register (GPR) Architecture
Reasons for choosing a GPR (general-purpose register) architecture:
• Registers, like other internal storage (stacks and accumulators), are faster than memory.
• Registers are easier and more effective for a compiler to use: (A+B) - (C*D) - (E*F) may be evaluated in any order (e.g., for pipelining), but on a stack machine it must be evaluated left to right.
• Registers can hold variables, which reduces memory traffic, speeds up programs, and improves code density (fewer bits are needed to name a register than a memory location).
• Compiler writers prefer that all registers be equivalent and unreserved.
• Number of GPRs: at least 16.

Two Characteristics Divide GPR Architectures
• Number of operands: three-operand (1 result and 2 source operands) or two-operand (1 operand is both source and result, plus 1 source).
• How many operands are memory addresses: 0 to 3 (two sources + 1 result).
• The resulting classes: load-store, register-memory, memory-memory.

Pros and Cons of the Three Most Common GPR Computers, written as (memory operands, total operands)
• Register-register (0,3). Pros: simple, fixed-length instruction encoding; simple code-generation model; instructions take similar numbers of clocks to execute. Con: higher instruction count.
• Memory-memory (3,3). Pro: most compact. Cons: large variation in instruction size; memory access bottleneck.
• Register-memory (1,2). Pros: data can be accessed without loading it first; easy to encode, with good density. Cons: one operand is destroyed; limited number of registers.

Memory Addressing

Memory Addressing Basics
• All architectures must address memory. What is accessed: a byte, a word, multiple words?
• Today's machines are byte addressable, and main memory is organized in 32- to 64-byte lines.
• Byte order within a word is either Big-Endian or Little-Endian.
• Byte addressing creates a natural alignment problem: an access of size s bytes at byte address A is aligned if A mod s = 0. A misaligned access takes multiple aligned memory references.
• The memory addressing mode influences the instruction count (IC) and clock cycles per instruction (CPI).
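The alignment rule translates directly into code. The second helper counts how many aligned s-byte references a possibly misaligned s-byte access touches, assuming s is a power of two; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An access of size bytes at byte address addr is aligned if addr mod size == 0. */
bool is_aligned(uint64_t addr, uint64_t size) {
    assert(size > 0);
    return addr % size == 0;
}

/* Aligned size-byte references needed to cover bytes addr .. addr + size - 1. */
unsigned aligned_refs_needed(uint64_t addr, uint64_t size) {
    assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
    uint64_t first = addr / size;              /* aligned block holding the first byte */
    uint64_t last = (addr + size - 1) / size;  /* aligned block holding the last byte */
    return (unsigned)(last - first + 1);
}
```

For example, a 4-byte access at address 6 is misaligned and spans the aligned blocks starting at 4 and 8, so it needs two references.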

Byte Ordering
• Idea: the bytes in a long word are numbered 0 to 3. Which is the most (least) significant?
• This can cause problems when exchanging binary data between machines.
• Big Endian: byte 0 is most significant, byte 3 least. IBM 360/370, Motorola 68K, SPARC.
• Little Endian: byte 0 is least significant, byte 3 most. Intel x86, VAX.
• Alpha: the chip can be configured to operate either way. DEC workstations are little endian; Cray T3E Alphas are big endian.

Byte Ordering Example
union { unsigned char c[8]; unsigned short s[4]; unsigned int i[2]; unsigned long l[1]; } dw;
(Figure: how c[0]..c[7], s[0]..s[3], i[0]..i[1], and l[0] overlap in the same 8 bytes.)

Byte Ordering on Alpha (little endian)
(Figure: byte layout of dw, LSB to MSB within each field.)
Print output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]

Byte Ordering on x86 (little endian)
(Figure: byte layout of dw, LSB to MSB within each field.)
Print output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0] (long is 32 bits here, so l[0] covers only c[0]..c[3])

Byte Ordering on Sun (big endian)
(Figure: byte layout of dw, MSB to LSB within each field.)
Print output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
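The slides' test program can be approximated with the sketch below. It assumes fixed-width types from `<stdint.h>` instead of `char`/`short`/`int`/`long`, so each field has the same size on every host; `DoubleWord`, `make_test_pattern`, and `host_is_little_endian` are hypothetical names. Writing the bytes through `c` and reading them back through `s`, `i`, or `l` reveals the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Same overlay as the slide's union, with fixed-width fields. */
typedef union {
    uint8_t  c[8];
    uint16_t s[4];
    uint32_t i[2];
    uint64_t l[1];
} DoubleWord;

/* Fill the bytes with 0xf0..0xf7, as in the slides' test program. */
DoubleWord make_test_pattern(void) {
    DoubleWord dw;
    for (int j = 0; j < 8; j++)
        dw.c[j] = (uint8_t)(0xf0 + j);
    return dw;
}

/* 1 if byte 0 is the least significant byte of a short (little endian), else 0. */
int host_is_little_endian(void) {
    DoubleWord dw = make_test_pattern();
    assert(dw.s[0] == 0xf1f0 || dw.s[0] == 0xf0f1);
    return dw.s[0] == 0xf1f0;
}
```

On an x86 host `dw.i[0]` reads back as 0xf3f2f1f0; on a big-endian host it reads back as 0xf0f1f2f3.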

Addressing Modes
• Immediate: Add R4, #3 means Regs[R4] ← Regs[R4] + 3 (the operand is in the instruction)
• Register: Add R4, R3 means Regs[R4] ← Regs[R4] + Regs[R3] (the operand is in a register)
• Register indirect: Add R4, (R1) means Regs[R4] ← Regs[R4] + Mem[Regs[R1]] (R1 holds the operand's memory address)

Addressing Modes (Cont.)
• Direct (absolute): Add R4, (1001) means Regs[R4] ← Regs[R4] + Mem[1001]
• Memory indirect: Add R4, @(R3) means Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]] (R3 points to a memory word that holds the operand's address)

Addressing Modes (Cont.)
• Displacement: Add R4, 100(R1) means Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
• Scaled: Add R1, 100(R2)[R3] means Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d], where d is the scale factor (the operand size)
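The operand-fetch rules above can be sketched in C. This is a toy model under stated assumptions: memory is an array indexed by word rather than by byte, the register and memory sizes are arbitrary, and all names are hypothetical. Each function returns the operand that the corresponding mode would add to Regs[R4].

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: arbitrary sizes, word-indexed memory (an assumption). */
enum { NREGS = 8, MEMWORDS = 2048 };

typedef struct {
    int64_t regs[NREGS];
    int64_t mem[MEMWORDS];
} Cpu;

/* Bounds-checked Mem[addr]. */
static int64_t mem_read(const Cpu *c, int64_t addr) {
    assert(addr >= 0 && addr < MEMWORDS);
    return c->mem[addr];
}

int64_t op_immediate(int64_t imm)                       { return imm; }                                  /* #3 */
int64_t op_register(const Cpu *c, int r)                { return c->regs[r]; }                           /* R3 */
int64_t op_register_indirect(const Cpu *c, int r)       { return mem_read(c, c->regs[r]); }              /* (R1) */
int64_t op_direct(const Cpu *c, int64_t addr)           { return mem_read(c, addr); }                    /* (1001) */
int64_t op_memory_indirect(const Cpu *c, int r)         { return mem_read(c, mem_read(c, c->regs[r])); } /* @(R3) */
int64_t op_displacement(const Cpu *c, int64_t d, int r) { return mem_read(c, d + c->regs[r]); }          /* 100(R1) */

/* 100(R2)[R3]: base + displacement + index * scale. */
int64_t op_scaled(const Cpu *c, int64_t d, int rb, int ri, int64_t scale) {
    return mem_read(c, d + c->regs[rb] + c->regs[ri] * scale);
}
```

Memory indirect costs two memory reads per operand, which is one reason the measurements that follow show it is rarely used.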

Typical Address Modes (I)

Typical Address Modes (II)

Use of Memory Addressing Modes (Figure 2.7): measured on a VAX, which supported every mode; register mode, about 50% of all operand references, is not counted.

Displacement Address Size: average of 5 programs from SPECint92 and SPECfp92 (figure shows the integer average and the FP average). Only 1% of addresses need more than 16 bits.

Immediate Addressing Mode: 10 programs from SPECint92 and SPECfp92

Immediate Addressing Mode: 50% to 60% of immediates fit within 8 bits, and 75% to 80% fit within 16 bits (figure shows gcc, spice, and TeX).

Short Summary: Memory Addressing
• Support at least three addressing modes: displacement, immediate, and register deferred (register indirect), plus register mode.
• These represent 75% to 99% of the addressing modes used in the benchmarks.
• The address field for displacement mode should be at least 12 to 16 bits (covering 75% to 99% of displacements).
• The immediate field should be at least 8 to 16 bits (covering 50% to 80% of immediates).
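Whether a displacement or immediate fits in a field of a given width is a range check on two's-complement values. The sketch below assumes signed fields; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if value fits a bits-wide two's-complement field:
   -2^(bits-1) <= value <= 2^(bits-1) - 1. */
bool fits_signed(int64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return value >= min && value <= max;
}

/* Smallest signed field width that can hold value. */
unsigned min_signed_bits(int64_t value) {
    unsigned bits = 1;
    while (bits < 63 && !fits_signed(value, bits))
        bits++;
    return bits;
}
```

For example, a displacement of 737 needs 11 bits, so it fits a 12- to 16-bit displacement field but not an 8-bit one.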

Operand Type & Size
Typical types, assuming word = 32 bits:
• Character: 1 byte, ASCII or EBCDIC (IBM), 4 per word
• Short integer: 2 bytes, 2's complement
• Integer: one word, 2's complement
• Float: one word, usually IEEE 754 these days
• Double-precision float: 2 words, IEEE 754
• BCD or packed decimal: 4-bit values packed 8 per word
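Packed decimal stores one decimal digit per 4-bit nibble, so a 32-bit word holds 8 digits. A minimal sketch, assuming the least significant digit goes in the low nibble and ignoring sign handling; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack up to 8 decimal digits, one per nibble, low digit in the low nibble. */
uint32_t to_packed_bcd(uint32_t n) {
    assert(n <= 99999999u);
    uint32_t word = 0;
    for (int shift = 0; n != 0; shift += 4) {
        word |= (n % 10) << shift;
        n /= 10;
    }
    return word;
}

/* Unpack 8 nibbles (most significant first) back into a binary integer. */
uint32_t from_packed_bcd(uint32_t word) {
    uint32_t n = 0;
    for (int shift = 28; shift >= 0; shift -= 4) {
        uint32_t digit = (word >> shift) & 0xF;
        assert(digit <= 9); /* nibbles 0xA..0xF are not valid BCD digits */
        n = n * 10 + digit;
    }
    return n;
}
```

A convenient property of this layout is that the packed value of 1234 prints as 0x1234 in hex.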

Data Access Patterns

Short Summary: Type and Size of Operands
• The future, as we move to 64-bit machines: larger offsets, immediates, etc. are likely, and usage of 64- and 128-bit values will increase.
• DSPs need accumulating registers wider than the operand size in memory, to aid accuracy in fixed-point arithmetic.

ALU Operations

What Operations are Needed
• Arithmetic and logical: integer arithmetic (ADD, SUB, MULT, DIV, SHIFT); logical operations (AND, OR, XOR, NOT)
• Data transfer: copy, load, store
• Control: branch, jump, call, return, trap
• System: OS and memory management. We'll ignore these for now, but remember they are needed.
• Floating point: same as arithmetic, but usually on bigger operands
• Decimal
• String: move, compare, search
• Graphics: pixel and vertex, compression/decompression operations

Top 10 Instructions for 80x86
load 22%, conditional branch 20%, compare 16%, store 12%, add 8%, and 6%, sub 5%, move register-register 4%, call 1%, return 1%
• The most widely executed instructions are the simple operations of an instruction set.
• The top 10 instructions for the 80x86 account for 96% of instructions executed.
• Make them fast, as they are the common case.

Control Instructions are a Big Deal
• Jumps: unconditional transfer
• Conditional branches: How is the condition code set (by a flag, or as part of the instruction)? How is the target specified? How far away is it?
• Calls: Where is the return address kept? How are the arguments passed? Callee vs. caller save!
• Returns: Where is the return address? How far away is it? How are the results passed?

Breakdown of Control Flows
• Call/return: integer 19%, FP 8%
• Jump: integer 6%, FP 10%
• Conditional branch: integer 75%, FP 82%

Branch Address Specification
• For unconditional and conditional branches the target is known at compile time, hence it is specified in the instruction, either as a register containing the target address or as a PC-relative offset.
• Consider word-length addresses, registers, and instructions: if a full address is desired, pick the register option, BUT setup and effective-address computation will take longer.
• If a smaller offset is enough, PC-relative addressing works.
• PC-relative addressing is also position independent, which simplifies the linker's job.
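A sketch of PC-relative target computation, assuming the MIPS convention that a 16-bit signed offset counts instructions (4-byte words) from the instruction after the branch, i.e. target = PC + 4 + offset × 4. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit field without relying on implementation-defined casts. */
static int64_t sext16(uint32_t field) {
    int64_t v = field & 0xFFFF;
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Target of a branch at pc whose 16-bit offset field is offset_field. */
uint64_t branch_target(uint64_t pc, uint32_t offset_field) {
    return pc + 4 + (uint64_t)(sext16(offset_field) * 4);
}

/* Offset field a branch at pc needs in order to reach target. */
uint32_t branch_offset_field(uint64_t pc, uint64_t target) {
    int64_t bytes = (int64_t)target - (int64_t)(pc + 4);
    assert(bytes % 4 == 0);                           /* targets are word aligned */
    int64_t words = bytes / 4;
    assert(words >= INT16_MIN && words <= INT16_MAX); /* must fit in 16 bits */
    return (uint32_t)words & 0xFFFF;
}
```

Because the field depends only on the distance between branch and target, `branch_offset_field(0x1000, 0x1010)` and `branch_offset_field(0x5000, 0x5010)` both return 3; that is what makes PC-relative code position independent.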

Returns and Indirect Jumps
• The branch target is not known at compile time, so a way to specify the target dynamically is needed.
• Use a register, or permit any addressing mode (e.g., as in register indirect: Regs[R4] ← Regs[R4] + Mem[Regs[R1]]).
• Also useful for case or switch statements, dynamically shared libraries, and higher-order functions or function pointers.
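Function pointers give a C-level view of an indirect jump: the target address is loaded from data at run time and then reached through a register. The handlers and table below are hypothetical; a compiler may (but need not) lower a dense switch statement to a similar table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers for three cases of a switch. */
static int on_zero(int x) { return x; }
static int on_one(int x)  { return x + 1; }
static int on_two(int x)  { return x * 2; }

typedef int (*Handler)(int);

/* A table of code addresses, like a jump table built for a dense switch. */
static const Handler jump_table[] = { on_zero, on_one, on_two };

/* The callee is chosen at run time: its address is loaded from the table
   and reached through an indirect call, not a PC-relative branch. */
int dispatch(size_t selector, int x) {
    assert(selector < sizeof jump_table / sizeof jump_table[0]);
    Handler target = jump_table[selector];
    return target(x);
}
```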

Branch Stats: 90% are PC-Relative
• Call/return: TeX 16%, Spice 13%, GCC 10%
• Jump: TeX 18%, Spice 12%, GCC 12%
• Conditional: TeX 66%, Spice 75%, GCC 78%

Branch Distances

Condition Testing Options (PSW: Program Status Word)

What Kinds of Compares do Branches Use? A large fraction of comparisons are tests against zero.

Direction, Frequency, and Real Change
Key points:
• 75% are forward branches.
• Most backward branches are loops, taken about 90% of the time.
• Branch statistics are both compiler and application dependent.
• Any loop optimizations may have a large effect.

Short Summary: Operations in the Instruction Set
• Branch addressing should be able to reach about 100+ instructions either above or below the branch, which implies a PC-relative branch displacement of at least 8 bits.
• Provide register-indirect and PC-relative addressing for jump instructions, to support returns as well as many other features of current systems (dynamic allocations).
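As a worked check of the 8-bit figure, assuming the displacement counts instructions and is stored in two's complement, an 8-bit field $d$ covers:

```latex
-2^{7} \le d \le 2^{7} - 1 \quad\Longrightarrow\quad -128 \le d \le 127 \ \text{instructions}
```

which reaches roughly 100+ instructions in either direction.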

Encoding an Instruction Set

Encoding the ISA
• Encode instructions into a binary representation for execution by the CPU.
• We can pick anything, but the encoding affects the size of code (so it should be tight) and the CPU design, in particular instruction decode, so it may have a big influence on CPI or cycle time.
• Must balance several competing forces: the desire for many addressing modes and registers; the desire to make the average program size compact; the desire to have instructions encoded into lengths that are easy to handle in a pipelined implementation (multiples of bytes).

3 Popular Encoding Choices
• Variable (compact code, but difficult to decode): the primary opcode is fixed in size, but opcode modifiers may exist; the opcode specifies the number of arguments, each used as an address field. Best when there are many addressing modes and operations. Uses as few bits as possible, but individual instructions can vary widely in length, e.g., VAX integer ADD versions vary between 3 and 19 bytes.
• Fixed (easy to decode, but lengthy code): every instruction looks the same, though some fields may be interpreted differently; the operation and the addressing mode are combined into the opcode, e.g., all modern RISC machines.
• Hybrid: a set of fixed formats, e.g., IBM 360 and Intel 80x86.
Trade-off: size of program vs. ease of decoding.

3 Popular Encoding Choices (Cont.)

An Example of Variable Encoding: VAX
addl3 r1, 737(r2), (r3) is a 32-bit integer add instruction with 3 operands; it needs 6 bytes:
• Opcode for addl3: 1 byte
• Each VAX address specifier is 1 byte (4 bits: addressing mode, 4 bits: register)
• r1: 1 byte (register addressing mode + r1)
• 737(r2): 1 byte for the address specifier (displacement addressing + r2), plus 2 bytes for the displacement 737
• (r3): 1 byte for the address specifier (register indirect + r3)
Length of VAX instructions: 1 to 53 bytes
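The byte count can be reproduced by building the instruction byte by byte. The numeric values here are assumptions recalled from the VAX architecture reference and should be checked against the manual: ADDL3 opcode 0xC1, and specifier modes 5 (register), 6 (register deferred), and 0xC (word displacement). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values (verify against the VAX reference): opcode and specifier modes. */
enum { VAX_ADDL3 = 0xC1, MODE_REG = 0x5, MODE_REG_DEFERRED = 0x6, MODE_WORD_DISP = 0xC };

/* One address-specifier byte: mode in the high nibble, register in the low nibble. */
static uint8_t specifier(unsigned mode, unsigned reg) {
    assert(mode < 16 && reg < 16);
    return (uint8_t)((mode << 4) | reg);
}

/* Encode "addl3 r1, disp(r2), (r3)" into buf and return its length in bytes. */
size_t encode_addl3_example(uint8_t buf[8], int16_t disp) {
    uint16_t d = (uint16_t)disp;                /* two's-complement bit pattern */
    size_t n = 0;
    buf[n++] = VAX_ADDL3;                       /* 1-byte opcode */
    buf[n++] = specifier(MODE_REG, 1);          /* r1 */
    buf[n++] = specifier(MODE_WORD_DISP, 2);    /* specifier for disp(r2) ... */
    buf[n++] = (uint8_t)(d & 0xFF);             /* ... then the displacement, low byte first */
    buf[n++] = (uint8_t)(d >> 8);
    buf[n++] = specifier(MODE_REG_DEFERRED, 3); /* (r3) */
    return n;
}
```

Under these assumed values, `disp = 737` yields the 6 bytes C1 51 C2 E1 02 63.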

Short Summary: Encoding the Instruction Set
• Choose between variable and fixed instruction encoding.
• If code size matters more than performance → variable encoding.
• If performance matters more than code size → fixed encoding.

Role of Compilers

Critical goals in ISA from the compiler viewpoint:
• What features will lead to high-quality code?
• What makes it easy to write efficient compilers for an architecture?

Compiler and ISA
• ISA decisions are no longer made to ease assembly-language programming; because programs are written in high-level languages (HLLs), the ISA is a compiler target today.
• The performance of a computer will be significantly affected by the compiler.
• Understanding compiler technology today is critical to designing and efficiently implementing an instruction set.
• Architecture choice affects the code quality and the complexity of building a compiler for it.

Goal of the Compiler
• Primary goal is correctness.
• Second goal is speed of the object code.
• Others: speed of compilation; ease of providing debug support; interoperability among languages; flexibility of the implementation (languages may not change much, but they do evolve, e.g., Fortran 66 ⇒ HPF).
• Make the frequent cases fast and the rare case correct.

Optimization Observations
• Branches are hard to reduce.
• The biggest reduction is often in memory references.
• Some ALU operation reduction happens, but it is usually a few %.
• Implication: branches, calls, and returns become a larger relative % of the instruction mix, and control instructions are among the hardest to speed up.

How can Architects Help Compiler Writers
• Provide regularity: address modes, operations, and data types should be orthogonal (independent) of each other. This simplifies code generation, especially multi-pass. Counterexample: restricting which registers can be used for a certain class of instructions.
• Provide primitives, not solutions: special features that match an HLL construct are often unusable, and what works in one language may be detrimental to others.

How can Architects Help Compiler Writers (Cont.)
• Simplify trade-offs among alternatives. How do you write good code? What is good code? The old metric, IC or code size, is no longer sufficient because of caches and pipelining. Anything that makes code-sequence performance obvious is a definite win! Example: how many times should a variable be referenced before it is cheaper to load it into a register?
• Provide instructions that bind the quantities known at compile time as constants; don't hide compile-time constants. Instructions that work off something the compiler thinks could be a run-time-determined value handcuff the optimizer.

Short Summary: Compilers
• The ISA should have at least 16 GPRs (not counting FP registers) to simplify register allocation using graph coloring.
• Orthogonality suggests that all supported addressing modes apply to all instructions that transfer data.
• Simplicity: understand that less is more in ISA design. Provide primitives instead of solutions, simplify trade-offs between alternatives, and don't bind constants at runtime.
• Counterexample: lack of compiler support for multimedia instructions.
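To make the graph-coloring remark concrete, here is a deliberately simple greedy coloring of an interference graph in C. It is not a full Chaitin-style allocator (no simplify/spill ordering, no coalescing); it only illustrates why more registers mean fewer spills. The types and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_VARS = 32 };

/* adj[a][b] is true when variables a and b are live at the same time,
   so they cannot share a register. */
typedef struct {
    int nvars;
    bool adj[MAX_VARS][MAX_VARS];
} Interference;

void interfere(Interference *g, int a, int b) {
    assert(a != b && a >= 0 && b >= 0 && a < g->nvars && b < g->nvars);
    g->adj[a][b] = g->adj[b][a] = true;
}

/* Greedy coloring in variable order: each variable gets the lowest register
   not held by an already-colored neighbor; reg[v] = -1 means v is spilled.
   Returns the number of spilled variables. */
int color(const Interference *g, int nregs, int reg[]) {
    assert(nregs >= 0 && nregs <= MAX_VARS);
    int spills = 0;
    for (int v = 0; v < g->nvars; v++) {
        bool used[MAX_VARS] = { false };
        for (int u = 0; u < v; u++)
            if (g->adj[v][u] && reg[u] >= 0)
                used[reg[u]] = true;
        reg[v] = -1;
        for (int r = 0; r < nregs; r++)
            if (!used[r]) { reg[v] = r; break; }
        if (reg[v] < 0)
            spills++;
    }
    return spills;
}
```

With variables 0, 1, and 2 mutually interfering and variable 3 interfering only with 2, three registers give no spills, while two registers force one spill.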

The MIPS Architecture

Expectations for a New ISA
• Use general-purpose registers with a load-store architecture.
• Support displacement (offset size 12 to 16 bits), immediate (size 8 to 16 bits), and register-indirect addressing.
• Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
• Support these simple instructions: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, return.
• Use fixed instruction encoding if interested in performance, and variable instruction encoding if interested in code size.
• Provide at least 16 general-purpose registers (GPRs) plus separate floating-point registers, make sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set.

MIPS
• Simple load-store ISA
• Enables an efficient pipeline implementation
• Fixed instruction set encoding
• Efficiency as a compiler target
• The MIPS64 variant is discussed here

Registers for MIPS
• 32 64-bit integer GPRs: R0, R1, ..., R31; R0 = 0 always
• 32 FPRs, used for single or double precision: for single precision F0, F1, ..., F31 (32-bit); for double precision F0, F2, ..., F30 (64-bit)
• Extra status registers, accessed via moves through GPRs
• Instructions for moving between an FPR and a GPR

Data Types for MIPS
• Integer data: 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words
• FP: 32-bit single precision and 64-bit double precision
• MIPS64 operations work on 64-bit integers and 32- or 64-bit floating point.
• Bytes, half words, and words are loaded into the GPRs with either zeros or the sign bit replicated to fill the 64 bits of the GPRs.
• All references between memory and either GPRs or FPRs are through loads or stores.
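The difference between filling a 64-bit register with zeros and with the replicated sign bit can be modeled as below. The function names are hypothetical; in MIPS64 the corresponding loads are LB/LBU, LH/LHU, and LW/LWU.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extending load: replicate bit (bits-1) into the upper bits. */
uint64_t load_signed(uint64_t raw, unsigned bits) {
    assert(bits == 8 || bits == 16 || bits == 32);
    uint64_t mask = ((uint64_t)1 << bits) - 1;
    uint64_t sign = (uint64_t)1 << (bits - 1);
    raw &= mask;
    return (raw ^ sign) - sign; /* xor/subtract trick; unsigned wraparound is well defined */
}

/* Zero-extending load: upper bits are filled with zeros. */
uint64_t load_unsigned(uint64_t raw, unsigned bits) {
    assert(bits == 8 || bits == 16 || bits == 32);
    return raw & (((uint64_t)1 << bits) - 1);
}
```

The byte 0xFF becomes 0xFFFFFFFFFFFFFFFF (-1) through the signed path but stays 0xFF (255) through the unsigned one.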

Addressing Modes for MIPS
• Data addressing: immediate and displacement (16 bits)
• Displacement: Add R4, 100(R1) (Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]])
• Register indirect: place 0 in the displacement field. Add R4, (R1) (Regs[R4] ← Regs[R4] + Mem[Regs[R1]])
• Absolute addressing (16 bits): use R0 as the base register. Add R4, (1001) (Regs[R4] ← Regs[R4] + Mem[1001])
• Byte addressable, with 64-bit addresses
• Mode selection for Big Endian or Little Endian

MIPS Instruction Format
• The addressing mode is encoded into the opcode.
• All instructions are 32 bits, with a 6-bit primary opcode.

MIPS Instruction Format (Cont.)
I-type instruction: opcode (6 bits) | rs (5) | rt (5) | immediate (16)
• Loads and stores: LW R1, 30(R2); S.S F0, 40(R4)
• ALU operations on immediates: DADDIU R1, R2, #3 (rt ← rs op immediate)
• Conditional branches: BEQZ R3, offset (rs is the register checked; rt is unused; the immediate specifies the offset)
• Jump register, jump and link register: JR R3 (rs is the target register; rt and immediate are unused but = 0)
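A sketch of packing and unpacking the I-type fields with shifts and masks. Opcode 35 in the usage note is offered only as an illustration (it is recalled as LW's opcode; check the MIPS64 opcode table), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* I-type: opcode(6) | rs(5) | rt(5) | immediate(16), most significant field first. */
uint32_t encode_itype(unsigned opcode, unsigned rs, unsigned rt, int32_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 65535); /* signed offset or unsigned 16-bit value */
    return ((uint32_t)opcode << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16) | ((uint32_t)imm & 0xFFFF);
}

unsigned itype_opcode(uint32_t w) { return w >> 26; }
unsigned itype_rs(uint32_t w)     { return (w >> 21) & 0x1F; }
unsigned itype_rt(uint32_t w)     { return (w >> 16) & 0x1F; }

/* Sign-extended immediate, as used for load/store displacements and branch offsets. */
int32_t itype_simm(uint32_t w) {
    int32_t v = (int32_t)(w & 0xFFFF);
    return v >= 0x8000 ? v - 0x10000 : v;
}
```

`encode_itype(35, 2, 1, 30)` describes LW R1, 30(R2), with the base register R2 in rs and the destination R1 in rt, and yields 0x8C41001E.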

MIPS Instruction Format (Cont.)
R-type instruction: opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
• Register-register ALU operations: rd ← rs funct rt, e.g., DADDU R1, R2, R3
• The function field encodes the data path operation (Add, Sub, ...), reads/writes of special registers, and moves.
J-type instruction: opcode (6 bits) | offset (26)
• Jump, jump and link, trap and return from exception
• The offset is added to the PC.
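The R-type and J-type layouts can be handled the same way. The struct and function names are hypothetical, and the field values in the usage note are arbitrary rather than real MIPS64 opcode assignments.

```c
#include <assert.h>
#include <stdint.h>

/* R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), most significant first. */
typedef struct {
    unsigned opcode, rs, rt, rd, shamt, funct;
} RType;

RType decode_rtype(uint32_t w) {
    RType r;
    r.opcode = w >> 26;
    r.rs     = (w >> 21) & 0x1F;
    r.rt     = (w >> 16) & 0x1F;
    r.rd     = (w >> 11) & 0x1F;
    r.shamt  = (w >> 6) & 0x1F;
    r.funct  = w & 0x3F;
    return r;
}

uint32_t encode_rtype(RType r) {
    assert(r.opcode < 64 && r.rs < 32 && r.rt < 32 && r.rd < 32 && r.shamt < 32 && r.funct < 64);
    return ((uint32_t)r.opcode << 26) | ((uint32_t)r.rs << 21) | ((uint32_t)r.rt << 16) |
           ((uint32_t)r.rd << 11) | ((uint32_t)r.shamt << 6) | (uint32_t)r.funct;
}

/* J-type: the low 26 bits hold the offset field. */
uint32_t jtype_offset(uint32_t w) { return w & 0x03FFFFFF; }
```

Encoding an `RType` with rs = 2, rt = 3, rd = 1 (the register roles in DADDU R1, R2, R3) and decoding it again returns the same fields.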

MIPS Instruction Mix: SPECint2000

MIPS Instruction Mix (Cont.): SPECfp2000