Lecture 6: Instruction Set Architecture (Continued)

Michael B. Greenwald, Computer Architecture (CIS 501), Fall 1999

Administration
Solutions for HW #1 will be posted on the web site as soon as class is over; hardcopy will normally be made available in class. Homework #2 is due today. Homework #3 will be handed out on Thursday. Please read Chapter 5.

Instruction Encoding: Issues
Size of compiled code.
Speed of decoding.
Number of resources (registers) and access methods (addressing modes).

Kinds of Addressing Modes
Mode                 Operand
Register direct      Ri
Immediate (literal)  v
Direct (absolute)    M[v]
Register indirect    M[Ri]
Base+Displacement    M[Ri + v]
Base+Index           M[Ri + Rj]
Scaled Index         M[Ri + Rj*d + v]
Autoincrement        M[Ri++]
Autodecrement        M[Ri--]
Memory Indirect      M[M[Ri]]
Indirection Chains   (PDP-10, LispM)
(Ri, Rj: registers in the register file; v: constant in the instruction; d: scale factor; M[...]: memory.)
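The addressing modes listed above can be read as small recipes for locating an operand. The following Python sketch models them over a toy register file (`regs`, a list) and memory (`mem`, a dict). The function name `operand` and the mode strings are hypothetical labels chosen here, not any real ISA's mnemonics. It treats M[Ri--] literally as read-then-decrement; some real machines decrement before the access instead.

```python
# Toy model: `regs` is a list of register values and `mem` a dict mapping
# addresses to values. Mode names are illustrative labels, not ISA mnemonics.

def operand(mode, regs, mem, i=0, j=0, v=0, d=1):
    """Return the operand selected by `mode`, using registers Ri/Rj,
    constant v, and scale/step d. Auto-modes update regs[i] in place."""
    if mode == "register":            # Ri
        return regs[i]
    if mode == "immediate":           # v
        return v
    if mode == "direct":              # M[v]
        return mem[v]
    if mode == "register_indirect":   # M[Ri]
        return mem[regs[i]]
    if mode == "displacement":        # M[Ri + v]
        return mem[regs[i] + v]
    if mode == "indexed":             # M[Ri + Rj]
        return mem[regs[i] + regs[j]]
    if mode == "scaled":              # M[Ri + Rj*d + v]
        return mem[regs[i] + regs[j] * d + v]
    if mode == "autoincrement":       # M[Ri++]: read, then add d to Ri
        value = mem[regs[i]]
        regs[i] += d
        return value
    if mode == "autodecrement":       # M[Ri--]: read, then subtract d from Ri
        value = mem[regs[i]]
        regs[i] -= d
        return value
    if mode == "memory_indirect":     # M[M[Ri]]
        return mem[mem[regs[i]]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = [0, 100, 2]
mem = {100: 7, 102: 9, 104: 11, 7: 42, 20: 5}
print(operand("scaled", regs, mem, i=1, j=2, d=2))   # M[100 + 2*2] = M[104]: prints 11
print(operand("memory_indirect", regs, mem, i=1))    # M[M[100]] = M[7]: prints 42
```

Each extra mode is one more case for the decoder, which is the encoding cost the previous slide refers to.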

Case Study: Quantitative Analysis of Constants
Three kinds of constants: immediate literals, displacements (addressing mode), and branch distances.
For each, we look at frequency and range/magnitude.

Immediate Literals
Used for ALU operations, comparisons, MOV immediate to register for a constant, and MOV immediate to register for an address.
Frequency:

Immediate Literals: Range
Measured on the VAX (32-bit immediates): 0 is implicit; 94% of immediates are positive; 50-70% fit within 8 bits and 75-80% fit within 16 bits.
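To check statistics like these against a particular field width, one needs the number of bits a constant requires. A minimal Python sketch, assuming two's-complement encoding; the helper names `signed_bits`, `unsigned_bits`, and `fits` are hypothetical. Because most immediates are positive, the sketch also shows how a zero-extended field buys one extra bit of positive range. Whether a given ISA sign- or zero-extends a particular immediate is an assumption to check against its manual.

```python
def signed_bits(value):
    """Smallest two's-complement field width that can hold `value`."""
    # An n-bit signed field holds -2**(n-1) .. 2**(n-1) - 1: magnitude bits
    # plus one sign bit. ~value maps -1 -> 0, -128 -> 127, and so on.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def unsigned_bits(value):
    """Width needed if the field is zero-extended (value must be >= 0)."""
    if value < 0:
        raise ValueError("negative values need a signed field")
    return max(value.bit_length(), 1)


def fits(value, bits, signed=True):
    """Does `value` fit in an immediate field of `bits` bits?"""
    needed = signed_bits(value) if signed else unsigned_bits(value)
    return needed <= bits


print(fits(200, 8))                 # False: a signed 8-bit field stops at 127
print(fits(200, 8, signed=False))   # True: a zero-extended 8-bit field reaches 255
```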

Frequency of addressing modes
Immediate and Displacement dominate

Displacement Addressing Mode: Range
Displacement values are widely distributed. Measured on a machine with 16-bit displacements: 12 bits capture 75% of displacements and 16 bits capture 99%; only 1% need 16 bits or more.
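A distribution like this comes from counting, for each candidate field width, how many observed displacements fit. A sketch of that bookkeeping in Python, assuming two's-complement displacements; `coverage` is a hypothetical helper, and the sample values are invented purely to show the method. A real study would weight each displacement by how often it occurs.

```python
def signed_bits(value):
    """Smallest two's-complement width holding `value` (magnitude + sign bit)."""
    return (value if value >= 0 else ~value).bit_length() + 1


def coverage(displacements, widths):
    """Map each candidate field width to the fraction of displacements it holds."""
    needed = [signed_bits(d) for d in displacements]
    return {w: sum(n <= w for n in needed) / len(needed) for w in widths}


# Hypothetical displacements, invented for illustration (not measured data).
sample = [0, 4, -8, 12, 100, -300, 2000, 40000]
print(coverage(sample, [4, 8, 12, 16]))
# {4: 0.375, 8: 0.625, 12: 0.875, 16: 0.875}
```

The designer then picks the smallest width whose coverage is "good enough" and handles the remaining cases with extra instructions.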

Branch Instructions (see textbook)
Conditional branches dominate calls/returns and jumps (unconditional branches). Unlike displacements and immediates, branch distances are short when counted in instructions: 0% are 0, 64% fit in 4 bits, 80% in 5 bits, and 99% in 8 bits, with a flat tail beyond 8 bits. Use PC-relative addressing.
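PC-relative addressing stores the branch distance rather than the target, and counting in instructions rather than bytes lets a small field reach further. A minimal Python sketch, assuming fixed 4-byte instructions and an offset measured from the instruction after the branch (PC + 4), as MIPS does; other ISAs choose a different origin or count in bytes. The names `branch_offset` and `branch_target` are hypothetical.

```python
INSTR_BYTES = 4  # assumption: fixed 32-bit instructions


def branch_offset(branch_pc, target_pc, field_bits=16):
    """Encode a branch distance in instructions, relative to branch_pc + 4."""
    delta = target_pc - (branch_pc + INSTR_BYTES)
    if delta % INSTR_BYTES:
        raise ValueError("target is not instruction-aligned")
    offset = delta // INSTR_BYTES
    lo, hi = -(1 << (field_bits - 1)), (1 << (field_bits - 1)) - 1
    if not lo <= offset <= hi:
        raise ValueError(f"offset {offset} does not fit in {field_bits} bits")
    return offset


def branch_target(branch_pc, offset):
    """Decode: recover the target address from the stored offset."""
    return branch_pc + INSTR_BYTES + offset * INSTR_BYTES


print(branch_offset(0x1000, 0x1010))       # 3 instructions past PC + 4
print(hex(branch_target(0x1000, -1)))      # 0x1000: a branch to itself
```

With `field_bits=4` the sketch accepts offsets from -8 to 7 instructions, which by the figures above already covers most branches.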

High Level Languages to Machine Code: Compilation
Almost no one programs in assembly language anymore. To understand system performance, and to design and implement efficient instruction sets, we need to understand compiler technology.
Quality of output: how easily can compilers generate efficient code for a given ISA?
Complexity and ease of implementation: how complicated must a compiler be to generate code for this ISA?

Compiler Goals
Correctness.
Quality of the generated code: performance.
Quality of code generation: efficiency, portability (multiple targets), flexibility (multiple sources), debugging and profiling.

Compiler Structure
High-level language
-> Front end (per language): translate to a common representation
-> Intermediate form
-> High-level optimizations: CSE, inlining, loop transformations
-> Intermediate form
-> Global optimizations: global and local optimizations, register allocation
-> Intermediate form
-> Code generator: instruction selection, machine-specific optimizations (ordering, etc.)
-> Assembly language
-> Assembler
The front end is language dependent and machine independent; the optimization phases are independent of both machine and language; the code generator is machine dependent and language independent.

Compiler Structure: Phase Ordering
Which optimizations come first? Optimizations depend on the results of other optimizations. Which optimizations belong in which phase? That depends on which representation is easiest to work with and where the necessary information (types, ...) is available. Procedure inlining is one of the most important high-level optimizations because it enables other optimizations, but the trade-off depends on code size.

Compilers: optimizations
High level: procedure inlining, partial evaluation.
Local: common subexpression elimination, constant propagation.
Global: copy propagation/substitution, code motion (loop invariants), induction variables.
Code generator: strength reduction, pipeline scheduling, layout (to reduce branch offsets), peephole optimizations.
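Constant propagation and folding, two of the local optimizations listed above, can be shown on a single basic block of three-address code. A toy Python sketch; the tuple representation `(dest, op, a, b)` and the function `propagate_constants` are assumptions made for illustration, and the sketch ignores memory and aliasing entirely.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(block):
    """Constant propagation + folding over one basic block.

    Each instruction is (dest, op, a, b) with op in OPS, or (dest, "=", a, None)
    for a copy. Operands are ints (constants) or strings (variable names).
    """
    known = {}   # variable -> constant value at this point in the block
    out = []
    for dest, op, a, b in block:
        # Propagate: replace variables whose value is a known constant.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "=" and isinstance(a, int):
            known[dest] = a
            out.append((dest, "=", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)          # fold: evaluate at compile time
            known[dest] = value
            out.append((dest, "=", value, None))
        else:
            known.pop(dest, None)          # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out


block = [("x", "=", 4, None), ("y", "*", "x", 8), ("z", "+", "y", "n"), ("w", "-", "y", 2)]
for instr in propagate_constants(block):
    print(instr)   # y and w fold to constants; z keeps the unknown n
```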

Compilers: Allocation of storage
High level, storage categories:
Stack: extent (and scope) follows control flow; allocation and deallocation are easy.
Heap: persistent, large, dynamically allocated.
Static/global: persistent, large, statically allocated.
Low level, implementation:
Memory: slow, addressable, plentiful.
Registers: fast, cannot be pointed to, scarce (although on the KL10, low memory addresses referred to the registers).
Register allocation is one of the most important low-level optimization techniques.

Register Allocation: Restrictions and the General Technique (Graph Coloring)
Aliasing: there can be multiple ways of naming the same object, and the compiler cannot guarantee that it will catch them all and maintain the object's identity.
Quantity: there is a finite number of registers; if more than N variables are active at one time, they cannot all be in registers.
General technique, graph coloring: construct a graph in which each variable is a vertex, and two variables share an edge if they are active at the same time. Color the graph so that no two vertices sharing an edge have the same color, using as few colors as possible. Each color is a register. If there are too few registers, two adjacent vertices would have to share a color, so one of them must be stored back in memory. (Which one? When? ...)
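The graph-coloring idea can be sketched in a few lines of Python. This assumes each variable's live range is a simple interval of program points (real allocators derive liveness by dataflow analysis) and uses a greedy heuristic: color the most-constrained vertices first and spill whenever no register is free. It is not Chaitin's full algorithm, and it answers "which one to spill?" naively. The names `interference_graph` and `color` are hypothetical.

```python
def interference_graph(live_ranges):
    """Vertices are variables; an edge joins two variables live at the same time.

    live_ranges maps name -> (start, end), an inclusive interval of program points.
    """
    graph = {v: set() for v in live_ranges}
    names = list(live_ranges)
    for k, u in enumerate(names):
        for v in names[k + 1:]:
            (s1, e1), (s2, e2) = live_ranges[u], live_ranges[v]
            if s1 <= e2 and s2 <= e1:      # the two intervals overlap
                graph[u].add(v)
                graph[v].add(u)
    return graph


def color(graph, num_registers):
    """Greedy coloring; each color is a register number, spills stay in memory."""
    assignment, spilled = {}, []
    # Color high-degree (most constrained) variables first.
    for v in sorted(graph, key=lambda name: -len(graph[name])):
        taken = {assignment[n] for n in graph[v] if n in assignment}
        free = [reg for reg in range(num_registers) if reg not in taken]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)              # no register left: v stays in memory
    return assignment, spilled


live = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 7)}
g = interference_graph(live)
print(color(g, 2))   # ({'c': 0, 'a': 1, 'd': 1}, ['b'])
```

With three registers the same graph colors without spills; with two, `a`, `b`, and `c` are all live at once, so one of them has to go to memory.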

ISA impact on compiler writers
Regularity: keep components separate, e.g. make all addressing modes available for all opcodes.
Provide primitives with minimal semantics: an optimization for one language is a burden for all others (and often wrong, or inefficient, even for the original language!).
Simplify trade-offs: make costs easy to understand.
Allow compile-time constants, to avoid run-time interpretation.

A "Typical" RISC
32-bit fixed-format instructions (3 formats, 1 size).
32 32-bit GPRs (R0 contains zero; double-precision values take a register pair).
3-address, register-register arithmetic instructions.
A single addressing mode for loads/stores: base + displacement, no indirection.
Delayed load and delayed branch.
Simple branch conditions.
See: SPARC, MIPS, MC88100, AMD 29000, i960, i860, PA-RISC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3.

Example: MIPS
Register-Register: Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | shamt [10:6] | func [5:0]
Register-Immediate: Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0] (spec doesn't distinguish)
Branch: Op [31:26] | Rs1 [25:21] | Rs2/Cond [20:16] | immediate [15:0]
Jump / Call: Op [31:26] | target [25:0]
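Fixed-position fields make encoding and decoding a matter of shifts and masks. A Python sketch of the MIPS layouts above; `pack`, `unpack`, and the field-name tables are hypothetical. The example encodes MIPS `add $t0, $t1, $t2` (op 0, func 0x20, where $t1, $t2, $t0 are registers 9, 10, 8), using the slide's Rs1/Rs2/Rd names for MIPS's rs/rt/rd. Negative immediates would need masking to 16 bits (`imm & 0xFFFF`) before packing.

```python
# (field name, width in bits), from bit 31 down to bit 0.
R_FIELDS = [("op", 6), ("rs1", 5), ("rs2", 5), ("rd", 5), ("shamt", 5), ("func", 6)]
I_FIELDS = [("op", 6), ("rs1", 5), ("rd", 5), ("imm", 16)]
J_FIELDS = [("op", 6), ("target", 26)]


def pack(fields, layout):
    """Concatenate field values, high bits first, into one 32-bit word."""
    word = 0
    for name, width in layout:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word, layout):
    """Inverse of pack: slice a 32-bit word back into named fields."""
    fields, shift = {}, 32
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


w = pack({"op": 0, "rs1": 9, "rs2": 10, "rd": 8, "shamt": 0, "func": 0x20}, R_FIELDS)
print(f"{w:#010x}")   # 0x012a4020
```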

Recommendations
Use GPRs with a load-store architecture.
Support the popular addressing modes: displacement (with an offset field sized to the measured displacement range), immediate, and register deferred.
Simple instructions: ALU operations, few branch conditions.
Simple data types: integers (8, 16, 32 bits) and IEEE 754 floating point (64-bit).
Fixed-length instructions for performance, variable-length for code size.
At least 16 GPRs and separate FPRs; all addressing modes apply to all data-transfer instructions.
...

DLX Registers
32 GPRs holding 32-bit integers: GPR0-GPR31. GPR0 always contains 0.
Special registers, e.g. floating-point status.
32 FPRs holding 32-bit single-precision floats (also used for integer multiply and divide), or 16 double-precision FPRs holding 64-bit doubles.

DLX Data Types and Addressing Modes
Integers of 8, 16, and 32 bits; 32-bit single-precision and 64-bit double-precision IEEE 754 floats. (All integer operations are 32 bits.)
Addressing modes: immediate and displacement. Using R0 as the base register gives absolute addressing (limited to addresses reachable with a 16-bit displacement). A displacement of 0 gives register deferred.
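With only displacement addressing, DLX obtains absolute and register-deferred addressing as special cases. A Python sketch, assuming the 16-bit displacement is sign-extended (as in DLX and MIPS) and addresses wrap at 32 bits; `effective_address` and `sign_extend16` are hypothetical names. Under that assumption, an R0-based address reaches 0x0000-0x7FFF and the top 32 KB of the address space.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field):
    """Interpret a 16-bit displacement field as -32768..32767."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def effective_address(regs, base, disp16):
    """Displacement addressing: regs[base] + sign-extended 16-bit displacement."""
    return (regs[base] + sign_extend16(disp16)) & MASK32


regs = [0] * 32          # regs[0] plays R0 and is always 0
regs[5] = 0x1000
print(hex(effective_address(regs, 0, 0x1234)))   # absolute (base R0): 0x1234
print(hex(effective_address(regs, 5, 0)))        # register deferred (disp 0): 0x1000
```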

DLX Formats
I-type instruction: Opcode [31:26] | rs1 [25:21] | rd [20:16] | Immediate [15:0]
Used for loads/stores, all immediates, conditional branches (no rd), jump register, and jump and link register.
R-type instruction: Opcode [31:26] | rs1 [25:21] | rs2 [20:16] | rd [15:11] | function [10:0]
Used for ALU operations (func encodes the data-path operation: +, -, ...), reading/writing special registers, and moves.
J-type instruction: Opcode [31:26] | offset added to PC [25:0]
Used for jump, jump and link, trap, and return from exception.
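Because the opcode sits in bits 31-26 in every format, a decoder can read it first and then slice the remaining fields according to the format. A Python sketch; `decode` takes the format letter as a parameter because the opcode-to-format table is not given here, and it assumes the I-type immediate and J-type offset are signed two's-complement values. The opcode in the example is arbitrary, not a real DLX encoding.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit word, as in the diagrams."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Treat a `width`-bit field as two's complement."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def decode(word, fmt):
    """Split a DLX instruction word into fields for format 'I', 'R', or 'J'."""
    opcode = bits(word, 31, 26)   # same position in every format
    if fmt == "I":
        return {"opcode": opcode, "rs1": bits(word, 25, 21),
                "rd": bits(word, 20, 16),
                "imm": sign_extend(bits(word, 15, 0), 16)}
    if fmt == "R":
        return {"opcode": opcode, "rs1": bits(word, 25, 21),
                "rs2": bits(word, 20, 16), "rd": bits(word, 15, 11),
                "func": bits(word, 10, 0)}
    if fmt == "J":
        return {"opcode": opcode, "offset": sign_extend(bits(word, 25, 0), 26)}
    raise ValueError(f"unknown format {fmt!r}")


word = (0x23 << 26) | (1 << 21) | (2 << 16) | 0xFFF8   # arbitrary opcode 0x23
print(decode(word, "I"))   # {'opcode': 35, 'rs1': 1, 'rd': 2, 'imm': -8}
```

Keeping rs1 in the same bit position across formats means register reads can start before the format is fully known.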

DLX Operations
Four classes: loads/stores, ALU, branches and jumps, floating point.
Some operations are synthesized out of existing instructions:
MOV R2, R1 = ADD R2, R1, R0
LI R2, #foo = ADDI R2, R0, #foo
32-bit constant (address or data) = LHI R7, #hi-16-bits followed by ADDI R7, R7, #lo-16-bits
Compare: there is no status register, only BEQZ and BNEZ. Comparisons (<, <=, =, /=, >, >=) place a 1 or a 0 in a destination register, which BEQZ/BNEZ can then test.
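The synthesized operations listed above can be checked with a tiny register-level model in Python. This is a sketch, not a DLX simulator: the class `Regs` and the lowercase helpers (`add`, `addi`, `lhi`, `slt`, `beqz`, `bnez`, `mov`, `li`, `load32`) are hypothetical, there are no overflow traps, and ADDI is assumed to sign-extend its immediate. That assumption is why `load32` adds one to the high half whenever bit 15 of the low half is set.

```python
MASK32 = 0xFFFFFFFF


class Regs:
    """32 integer registers; writes to R0 are dropped, so R0 always reads 0."""

    def __init__(self):
        self._r = [0] * 32

    def __getitem__(self, i):
        return self._r[i]

    def __setitem__(self, i, value):
        if i != 0:
            self._r[i] = value & MASK32


def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed(x):
    """Read a 32-bit register value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# Primitive instructions (semantics simplified).
def add(r, rd, rs1, rs2):
    r[rd] = r[rs1] + r[rs2]


def addi(r, rd, rs1, imm):
    r[rd] = r[rs1] + sext16(imm)


def lhi(r, rd, imm):
    r[rd] = (imm & 0xFFFF) << 16


def slt(r, rd, rs1, rs2):
    """Set rd to 1 if rs1 < rs2 (signed), else 0: no condition codes needed."""
    r[rd] = 1 if to_signed(r[rs1]) < to_signed(r[rs2]) else 0


def beqz(r, rs):
    return r[rs] == 0      # True means the branch is taken


def bnez(r, rs):
    return r[rs] != 0


# Synthesized operations, built only from the primitives above.
def mov(r, rd, rs):
    add(r, rd, rs, 0)                # MOV rd, rs = ADD rd, rs, R0


def li(r, rd, imm):
    addi(r, rd, 0, imm)              # LI rd, #imm = ADDI rd, R0, #imm


def load32(r, rd, value):
    """Any 32-bit constant: LHI rd, #hi then ADDI rd, rd, #lo."""
    lo = value & 0xFFFF
    # ADDI sign-extends lo, effectively subtracting 0x10000 when bit 15 is
    # set, so compensate by adding one to the high half in that case.
    hi = ((value >> 16) + (1 if lo & 0x8000 else 0)) & 0xFFFF
    lhi(r, rd, hi)
    addi(r, rd, rd, lo)


r = Regs()
load32(r, 7, 0x12348765)
print(hex(r[7]))        # 0x12348765
li(r, 1, 5)
li(r, 2, 9)
slt(r, 3, 1, 2)         # "if R1 < R2 goto ..." = set, then BNEZ
print(bnez(r, 3))       # True
```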

DLX Operations (cont.)
Integer MULT/DIV is done in the FPU: operands must be moved to FP registers.

How efficient is DLX?
DLX trades an increased instruction count for reduced CPI, reduced cycle time, and reduced complexity (hence reduced cost).

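MIPS vs. VAX (performance)
MIPS executes about 2x as many instructions as the VAX, but at about 1/6 the CPI, giving about 3x the performance (at comparable clock rates). Cost is also significant. These data come from the VAX designers themselves. That's why DEC no longer makes the VAX.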
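The 3x figure follows from the CPU performance equation, time = instruction count × CPI × clock cycle time, assuming both machines run with the same clock cycle time t_clk, IC_MIPS = 2 IC_VAX, and CPI_MIPS = CPI_VAX / 6:

$$
\frac{T_{\mathrm{VAX}}}{T_{\mathrm{MIPS}}}
= \frac{IC_{\mathrm{VAX}} \times CPI_{\mathrm{VAX}} \times t_{\mathrm{clk}}}{IC_{\mathrm{MIPS}} \times CPI_{\mathrm{MIPS}} \times t_{\mathrm{clk}}}
= \frac{1}{2} \times 6 = 3
$$

If the clock rates differ, the ratio of cycle times multiplies this result.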

80x86 v. DLX: Instruction Counts
SPEC program   x86               DLX             DLX÷x86
gcc            3,771,327,742     3,892,063,…
espresso       2,216,423,413     2,801,294,…
spice          15,257,026,309    16,965,928,…
nasa7          15,603,040,963    6,118,740,…
(spice and nasa7: floating-point programs)

Summary: Instruction Set Architecture
The instruction set is still important even though compilers, rather than programmers, are its clients.
Five primary design dimensions: operand storage, number of (explicit) operands, effective address, type and size of operands, and operations.
Favor GPRs, load-store, and simple addressing modes and comparisons.

Summary: Instruction Set Architecture
We need to understand compilation.
DLX: quantitative principles applied to instruction set design.
