Computer Architecture Principles Dr. Mike Frank

Presentation transcript:

Computer Architecture Principles Dr. Mike Frank CDA 5155 Summer 2003 Module #9 Basics of Instruction Set Architectures

Moving on to Chapter 2... Topic: Principles of instruction set design… An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprising: A set of instructions (really, instruction types), with associated argument fields, assembly syntax, and machine encoding. A set of named storage locations: registers, memory, … Programmer-accessible caches? A set of addressing modes (ways to name locations). Often an I/O interface (usu. memory-mapped).

Classifying Architectures

Broad Classes of Processor Architectures Can be distinguished by radically differing programming models: Instruction-based programming model (focus of this course): Subdivision of hardware into memory, register file, ALUs, etc. Traditional assembly-language operations, like high-level RTL statements: load, add, move, etc. operating on programmer-visible registers. Microinstruction-based programming model: Similar microarchitecture to that used in instruction-based archs., but microinstructions directly specify control signals to be sent to internal mach. registers, muxes, ALUs, etc.; equivalent to low-level RTL statements. Dataflow / systolic array programming models: “Program” specifies custom pattern of connectivity among pre-existing arrays of hardware functional units operating in parallel. Examples: MIT J-machine, RAW project. Circuit-based programming models: “Program” specifies interconnection and internal logic of functional units. Examples: FPGAs (Field-Programmable Gate Arrays) (Xilinx, Altera). Others? (E.g., mesh of processing elements, ea. w. all features.)

The Best of All Possible Worlds? Chip Multiprocessor: Homogeneous 2-D (or 3-D) mesh of processing elements Heterogeneous Processing Element Systolic dataflow array Superscalar, highly dynamic, heavily pipelined, RISC CPU core Special-purpose units (graphics, signals, media) Local memory hierarchy 1st-level cache 2nd level cache Reconfigurable FPGA core 3rd-level cache or local DRAM Communications/power/cooling grid

Chapter 2 contents Introduction Classifying ISAs Addressing modes …for signal processing Type & size of operands Operands for media & Signal Processing Operations in the IS Ops for media & SP Control flow instrs. Encoding an IS Role of Compilers The MIPS Architecture Trimedia TM32 CPU Fallacies & Pitfalls Closing material Also, see Appendices C-F (online at mkp.com)

2.2. Classifying Architectures One important classification scheme is by how instructions name their operands (the type of internal storage). Stack architecture: Operands implicitly on top of a stack. (Early machines, Intel x87 floating-point.) Accumulator architecture: One operand is implicitly an accumulator (a special register). (Early machines.) General-purpose register arch.: Operands may be any of a large (typically 10s-100s) # of registers. Register-memory architectures: One operand may be in memory. Load-store architectures: All operands are registers, except in special load and store instructions.

Illustrating Architecture Types Assembly for C:=A+B:
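The figure for this slide did not survive in the transcript. As a rough stand-in, the sketch below runs C := A + B on toy models of the four classes named in section 2.2. The mnemonics in the comments are generic textbook-style assembly, and the function names and the dictionary-based "memory" are assumptions made for illustration, not any real ISA.

```python
# Toy models of four architecture classes, each computing C := A + B.
# Mnemonics in the comments are generic textbook-style assembly, not a real ISA.

def run_stack(mem):
    stack = []
    stack.append(mem["A"])                   # Push A
    stack.append(mem["B"])                   # Push B
    stack.append(stack.pop() + stack.pop())  # Add   (operands implicit on the stack)
    mem["C"] = stack.pop()                   # Pop C
    return 4                                 # instructions executed

def run_accumulator(mem):
    acc = mem["A"]                           # Load A   (accumulator is implicit)
    acc += mem["B"]                          # Add B
    mem["C"] = acc                           # Store C
    return 3

def run_register_memory(mem):
    reg = {}
    reg["R1"] = mem["A"]                     # Load  R1, A
    reg["R3"] = reg["R1"] + mem["B"]         # Add   R3, R1, B  (one memory operand)
    mem["C"] = reg["R3"]                     # Store R3, C
    return 3

def run_load_store(mem):
    reg = {}
    reg["R1"] = mem["A"]                     # Load  R1, A
    reg["R2"] = mem["B"]                     # Load  R2, B
    reg["R3"] = reg["R1"] + reg["R2"]        # Add   R3, R1, R2 (registers only)
    mem["C"] = reg["R3"]                     # Store R3, C
    return 4

STYLES = {
    "stack": run_stack,
    "accumulator": run_accumulator,
    "register-memory": run_register_memory,
    "load-store": run_load_store,
}

def demo():
    results = {}
    for name, run in STYLES.items():
        mem = {"A": 2, "B": 3, "C": 0}       # fresh memory for every style
        count = run(mem)
        results[name] = (mem["C"], count)
    return results

results = demo()
for name, (c, count) in results.items():
    print(f"{name:16s} C = {c}  ({count} instructions)")
```

All four sequences leave the same value in C; what differs is how many operands each instruction names explicitly and where those operands may live.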

Number of Operands A further classification is by the max. number of operands, and the # that can be memory. 2-operand (e.g. a += b): src/dest(reg), src(reg); src/dest(reg), src(mem): IBM 360, x86, 68k; src/dest(mem), src(mem): VAX. 3-operand (e.g. a = b+c): dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, &c.; dest(reg), src1(reg), src2(mem); dest(mem), src1(mem), src2(mem): VAX.

Memory Addressing Modes & Conventions

2.3. Memory Addressing A memory address n names the location of the (n+1)th “item” in memory. If each item is a byte (octet, 8-bit chunk), then the ISA’s memory system is byte-addressed. (Standard.) Numbering with larger chunks (e.g., 32 bits) is also possible; such memories are called word-addressed. Objects consisting of several consecutive items might be accessible as a unit: bytes, half-words (2 bytes), words (4 bytes), double words (8 bytes).

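Endians & Alignment [Figure: a row of bytes labeled by increasing byte address, 0 to 7.] Alignment: a word at byte address 4 is word-aligned; a word at byte address 2 is only halfword-aligned; a word at byte address 1 is only byte-aligned (non-aligned). Byte order, shown for the word at address 4: little-endian puts the least-significant byte “first” (byte 0, the LSB, at the lowest address; byte 3, the MSB, at the highest). Big-endian puts the most-significant byte “first” (byte 3, the MSB, at the lowest address; byte 0, the LSB, at the highest).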
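A small sketch of the two conventions on this slide, using Python's built-in `int.to_bytes`. The test value and the helper name `is_aligned` are chosen here for illustration; the value is picked so that each byte's contents equal its significance (byte 0 is the LSB, byte 3 is the MSB).

```python
# A 32-bit word whose bytes are numbered by significance: 0x03 is the MSB, 0x00 the LSB.
value = 0x03020100

# Byte sequences as they would appear at increasing memory addresses.
little = value.to_bytes(4, "little")   # LSB stored at the lowest address
big = value.to_bytes(4, "big")         # MSB stored at the lowest address

def is_aligned(address, size):
    """An object of `size` bytes is aligned if its address is a multiple of `size`."""
    return address % size == 0

print("little-endian:", list(little))
print("big-endian:   ", list(big))
for address in (4, 2, 1):
    print(f"word at {address}: word-aligned={is_aligned(address, 4)}, "
          f"halfword-aligned={is_aligned(address, 2)}")
```

Reading the bytes back with the wrong convention reverses them, which is why byte order matters whenever binary data moves between machines.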

Addressing Modes In example assembly syntax in middle column, ( ) indicates memory access. (A typical syntax.) In RTL syntax on right, [ ] denotes accessing a member of an array, Register or Memory.

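Addressing Modes Visualization [Figure with columns: Name, Instr. Field(s), Reg. File, Memory. Immediate: the imm field is the operand. Register: the reg field selects a register. Direct: the addr field names a memory location. Indirect: the reg field selects a register that holds the memory address. Displacement: reg (“base” address) + imm (offset) names a memory location. Slide caption: “all your base are belong to us”.]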

Addr. Mode Vis. Cont. [Figure with columns: Mode Name, Instr. Field(s), Reg. File, Memory. Indexed: reg1 (“base” address) + reg2 (offset). Memory Indirect: reg selects a register; the memory word it points to holds the operand's address. Scaled, written (r1)[r2]: reg1 (base address) + reg2 (index) × rowsz. Example row size = 8 locations.]
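The modes pictured on these two slides differ only in how the effective address is computed. The sketch below spells out that arithmetic; the register names, register contents, and memory contents are made-up values for illustration, and the scaled mode follows the slide's simplified form (base + index × row size) with no extra displacement.

```python
# Hypothetical machine state, chosen only to make the arithmetic easy to follow.
regs = {"r1": 100, "r2": 3}
memory = {100: 200}          # memory[100] holds a pointer, used by memory-indirect mode

def ea_direct(addr):
    return addr                                   # address comes straight from the instruction

def ea_register_indirect(reg):
    return regs[reg]                              # register holds the address

def ea_displacement(base_reg, offset):
    return regs[base_reg] + offset                # base register + immediate offset

def ea_indexed(base_reg, index_reg):
    return regs[base_reg] + regs[index_reg]       # base register + offset register

def ea_memory_indirect(reg):
    return memory[regs[reg]]                      # register points at a pointer in memory

def ea_scaled(base_reg, index_reg, row_size):
    return regs[base_reg] + regs[index_reg] * row_size   # base + index * element size

print("direct          ", ea_direct(64))
print("register indirect", ea_register_indirect("r1"))
print("displacement    ", ea_displacement("r1", 12))
print("indexed         ", ea_indexed("r1", "r2"))
print("memory indirect ", ea_memory_indirect("r1"))
print("scaled (rowsz=8)", ea_scaled("r1", "r2", 8))
```

Immediate and register modes are left out because they do not touch memory, so there is no effective address to compute.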

Addressing Mode Usage (out of non-register modes, on a VAX)

Offset Distribution (Alpha, optimized, SPEC CPU2000)

Popularity of Immediates (Alpha, optimized, SPEC CPU2000)

Distribution of Immediates

Instruction Categories

2.7. Types of Instructions

Instruction Distribution

Control-Flow Instructions

2.9. Control Flow Instructions Four basic types: (Conditional) branches. (Unconditional) jumps. Procedure calls. Procedure returns. Control flow addressing modes: Often PC-relative (PC + displacement), which makes code relocatable. Also useful: register indirect jumps (reg. has addr.). Uses: Case / switch statements. Virtual functions / methods (abstract class method calls). Higher-order functions / function pointers. Dynamically shared libraries.
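The two addressing styles named above can be sketched as follows. This is a simplified model: real ISAs usually measure the displacement from the instruction after the branch and may scale it, and the offsets, load addresses, and handler names here are invented for the example.

```python
def branch_target(pc, displacement):
    """PC-relative addressing: the target is the branch's own address plus a displacement."""
    return pc + displacement

# A branch at offset 0x10 inside a code block jumps to offset 0x40 in the same block.
BRANCH_OFFSET = 0x10
TARGET_OFFSET = 0x40
DISPLACEMENT = TARGET_OFFSET - BRANCH_OFFSET     # encoded once, independent of load address

def resolved_target(load_base):
    """Where the branch lands when the code block is loaded at `load_base`."""
    return branch_target(load_base + BRANCH_OFFSET, DISPLACEMENT)

# Register-indirect jump: the target is looked up at run time, as in a switch statement.
def case_zero():
    return "zero"

def case_one():
    return "one"

def case_default():
    return "other"

JUMP_TABLE = [case_zero, case_one]               # table of code "addresses"

def switch(selector):
    in_range = 0 <= selector < len(JUMP_TABLE)
    handler = JUMP_TABLE[selector] if in_range else case_default
    return handler()                             # jump through the looked-up address

print(hex(resolved_target(0x1000)), hex(resolved_target(0x8000)))
print(switch(0), switch(1), switch(7))
```

The same encoded displacement works at both load addresses, which is the sense in which PC-relative code is relocatable.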

Conditional Branch Options Condition Code (CC) Register E.g.: X86, ARM, PPC, SPARC, … ALU ops set condition code flags in the CCR Branch just checks the flag Condition register E.g.: Alpha, MIPS Comparison instruction puts result in a GPR Branch instruction checks the register Compare & Branch E.g.: PA-RISC, VAX Compare & branch in 1 instruction.

Special Control-Flow Instrs. In DSPs: Repeat instruction Repeat subsequent code block n times Avoids some loop overhead

Procedure Calling Conventions Two major calling conventions: Caller saves: Before the call, procedure caller saves registers that will be needed later Callee saves: Inside the call, called procedure saves registers that it will overwrite Can be more efficient if many small procedures Many archs. use a combination of schemes: E.g., MIPS: Some registers caller-saves, some callee-saves

Control Flow Instr. Distrib.

Branch Distances

Comparison Types

Data Access Sizes

Outline of Today’s Lecture Additions for signal & media processing: Addressing modes Operands Instruction types Instruction set encodings Role of compilers Examples: MIPS, Trimedia Fallacies & Pitfalls

Last Lecture / This Lecture Chapter 2 contents Introduction Classifying ISAs Addressing modes …for signal processing Type & size of operands Operands for media & Signal Processing Operations in the IS Ops for media & SP Control flow instrs. Encoding an IS Role of Compilers The MIPS Architecture Trimedia TM32 CPU Fallacies & Pitfalls Closing material Also, see Appendices C-F (online at mkp.com)

DSP & Multimedia Instruction-Set Extensions

Special DSP/media Addr. Modes Modulo or circular addressing: For dealing with circular buffers for handling infinite, continuous streams of data Automatically increment pointer, reset to start of buffer if at end Bit reverse addressing: Facilitates Fast Fourier Transform (FFT) operation The n low-order bits of an address are reversed before making the access Special modes rarely used even in DSP code Mainly just in hand-coded assembly library routines Strided, gather/scatter addressing: Used in SIMD vector machines
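Both special modes are simple address transformations that DSP hardware applies automatically. A software sketch of each, with function names and buffer parameters chosen here for illustration:

```python
def circular_next(pointer, start, length):
    """Modulo (circular) addressing: post-increment, wrapping to the start of the buffer."""
    pointer += 1
    if pointer == start + length:
        pointer = start
    return pointer

def bit_reverse(address, n):
    """Bit-reverse addressing: reverse the n low-order bits, leave the upper bits alone."""
    mask = (1 << n) - 1
    low = address & mask
    reversed_low = 0
    for _ in range(n):
        reversed_low = (reversed_low << 1) | (low & 1)   # move the lowest bit to the top
        low >>= 1
    return (address & ~mask) | reversed_low

# Walk a 4-entry circular buffer that starts at address 16.
pointer = 16
visited = []
for _ in range(6):
    visited.append(pointer)
    pointer = circular_next(pointer, start=16, length=4)
print(visited)

# Access order for an 8-point FFT (n = 3 address bits).
print([bit_reverse(i, 3) for i in range(8)])
```

The second list is the familiar FFT shuffle order, which is why hardware support for it removes an explicit reordering pass.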

Special DSP & Media Operands Media processing (e.g., 2-D & 3-D graphics): Vertex (x,y,z,w coordinates, each a 32-bit float) w is a visibility or color value Pixel (R,G,B,A channels, each an 8-bit integer) Red, Green, Blue; A is transparency Signal processing Fixed point (fractions between −1 and +1)

Special DSP & Media Operations Partitioned add, etc.: Use same hardware for multiple small ops as for a single large op. E.g., use the same hardware that makes up one 64-bit ALU to do four 16-bit adds simultaneously. Or, 2 single-precision FP ops w. 1 instruction. Examples: Intel MMX, PowerPC AltiVec. SIMD (single-inst., multiple data) / vector ops: Same idea, more general; used on supercomputers. Saturating add, etc.: Max out @ MAXINT, instead of throwing an overflow exception. Multiply-accumulate (MAC): Used in dot products for vector & matrix multiplications. Others: Max, min, pack, unpack, merge, permute, shuffle, abs
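A software sketch of three of these operations, to make their semantics concrete. The 16-bit lane width follows the slide's example; the function names and the signed 16-bit range used for saturation are assumptions for illustration, and real hardware does the lane adds in parallel rather than in a loop.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

def partitioned_add16(a, b):
    """Four independent 16-bit adds packed into one 64-bit word; carries never cross lanes."""
    result = 0
    for lane in range(4):
        shift = LANE_BITS * lane
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift    # drop the carry out of this lane
    return result

def saturating_add16(a, b):
    """Signed 16-bit add that clamps at the extremes instead of overflowing."""
    return max(-32768, min(32767, a + b))

def dot_product(xs, ys):
    """Dot product as a chain of multiply-accumulate (MAC) steps."""
    accumulator = 0
    for x, y in zip(xs, ys):
        accumulator += x * y                         # one MAC per element pair
    return accumulator

print(hex(partitioned_add16(0x0001_FFFF_0003_0004, 0x0001_0001_0001_0001)))
print(saturating_add16(32000, 1000), saturating_add16(-32000, -1000))
print(dot_product([1, 2, 3], [4, 5, 6]))
```

In the first call the third lane wraps from 0xFFFF to 0x0000 without disturbing its neighbor, which is the behavior that distinguishes a partitioned add from one ordinary 64-bit add.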

2.10. Instruction Set Encodings Competing forces in IS encoding design: Want as many registers & modes as possible. Large register & mode fields → larger programs. Want simplicity of pipelined execution path. Some solutions: Variable-length encoding (VAX, x86). Fixed-length encoding (most RISC). Hybrid (e.g., MIPS16, Thumb). Dynamic decompression (IBM CodePack).

Instruction Set Encodings

Compiler Technology and ISA Design

2.11. Compiler Passes

Compiler Optimizations

Compiler Optimizations cont.

Effect of Optimization

Compilers Need Architectures that… Provide regularity: Orthogonality (independence) of registers used, addressing modes, and operations used. Provide primitives, not solutions: Don’t directly support specific kernels or languages. Simplify trade-offs among alternatives: Make it easy to tell the fastest code sequence @ compile time. Don’t interpret values known at compile time: Allow compile-time constants to be provided in immediates.

ISA Example: MIPS

Design Principles used in MIPS 2.2. Use GPRs, load-store architecture 2.3. Best addr. Modes: Displacement (12-16 bits), immediate (8-16 bits), register indirect. 2.5. Data sizes/types: 8-64 bit integers, 64-bit IEEE 754 standard doubles 2.7. Support load, store, add, subtract, move, shift. 2.9. Compares: =, ≠, <, branch (relative 8+-bit), jump, call, return 2.10. Fixed encoding for performance, variable for code size 2.11. GPRs, orthogonality, simplicity

MIPS64 Registers 32-bit instructions 32 64-bit GPRs, R0-R31. Really, only 31: R0 is just a constant 0. 32 64-bit FPRs, F0-F31 Can hold 32-bit floats also (with other ½ unused). “SIMD” extensions operate on 2 floats in 1 FPR A few special registers Floating-point status register Load/store 8-, 16-, 32-, 64-bit integers All sign-extended to fill 64-bit GPR Also 32-bit floats/doubles

MIPS Addressing Modes Register (arith./logical ops only) Immediate (arith./logical only) & Displacement (load/stores only) 16-bit immediate / offset field Register indirect: use 0 as displacement offset Direct (absolute): use R0 as displacement base Byte-addressed memory, 64-bit address Software-settable big-endian/little-endian flag Alignment required

MIPS Instruction Layouts

I-type instructions

R-type instructions

J-type instructions
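The layout figures for these three slides are missing from the transcript. The sketch below packs fields using the standard 32-bit MIPS formats (I-type: 6-bit opcode, two 5-bit registers, 16-bit immediate; R-type: opcode, three 5-bit registers, 5-bit shift amount, 6-bit function; J-type: opcode plus 26-bit target). The helper names are made up, and the example opcode and function values are those of the classic MIPS32 `addi`, `lw`, and `add`, used here as an assumption since the slides' MIPS64 examples may use different mnemonics.

```python
def encode_i(opcode, rs, rt, immediate):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_j(opcode, target):
    """J-type: opcode(6) | target(26)."""
    return (opcode << 26) | (target & 0x3FFFFFF)

def decode_fields(word):
    """Split a word into the fields shared by the I and R formats."""
    return {
        "opcode": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }

addi_word = encode_i(0x08, rs=0, rt=8, immediate=5)          # addi $t0, $zero, 5
lw_word = encode_i(0x23, rs=29, rt=8, immediate=-4)          # lw   $t0, -4($sp)
add_word = encode_r(0, rs=9, rt=10, rd=8, shamt=0, funct=0x20)  # add  $t0, $t1, $t2
jump_word = encode_j(0x02, target=0x00400000 >> 2)           # j    0x00400000

for name, word in [("addi", addi_word), ("lw", lw_word), ("add", add_word), ("j", jump_word)]:
    print(f"{name:5s} 0x{word:08X}")
```

Because every format keeps the opcode and register fields in the same bit positions, a decoder can start reading registers before it knows which format it is looking at, which is part of what the fixed-length encoding buys.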

MIPS Operations Go through figures 2.28-2.31 in textbook. Also, see appendix C online @ www.mkp.com/CA3 (if it’s up) for more details on MIPS & other RISC-style architectures. Patterson & Hennessy, Computer Organization & Design: The Hardware/Software Interface, appendix A-10 has a description of the MIPS ISA. (Textbook for CDA3101.)

MIPS dynamic instr. frequencies Integer benchmarks Floating-point benchmarks Also see figs. 2.32 (int) & 2.33 (fp) in textbook

MIPS versus VAX (2.16)

Other ISA Examples

2.13 Trimedia TM32 CPU A “media processor”. Typical media apps include: Data communication (Viterbi decoding), Audio coding (AC3, MP3 encode/decode), Video coding (MPEG2 encode/decode, DVD decode), Video processing (various transforms), Graphics (3D render). Some special features: 5x VLIW encoding (compare 3x in IA-64). Instructions compressed in memory, like w. CodePack.

Intel architecture lineage 1971, 4-bit: 4004 (1st gen.-purpose comm. µP). 1970’s, 8-bit: 8008, 8080; accumulator machine. 1978-80, 16-bit: 8086, + 8087 (stack-based FPU); extended accumulator: additional dedicated registers. 1982, 16-bit with 24-bit addresses: 80286 (bigger addr. space, etc.). 1985, 32-bit (quasi-GPR): 80386, 80486, Pentium, PII, PIII, P4; 2×10^8 in svc. in 1995. Any time now, 64-bit: IA-64 (Merced/Itanium).

Intel 80x86 ISA See Williams, Computer Systems Architecture: A Networking Approach, chapter 7 We’ll put this on reserve (& online)

Intel IA-64 See various technical documents available from intel.com, other websites

Some Motorola arch. lineage 8-bit: 6800, 650x, 6809, 68HC11: early Radio Shack home computers, many embedded systems. 16-bit: 68HC12, 68HC16. 32-bit: 68000, 68010, 68020, 68030, 68040, 68060: Apple Lisa, Macintosh, orig. Commodore Amiga. Motorola adopted 32-bit PowerPC for its later processors: 821, 823, 850, 860: PDAs, embedded systems.

Thursday’s Lecture We’ll survey other current architectures: Other RISC archs.: Alpha, Sparc, PPC, PA, ARM… Transmeta Crusoe?

Other archs. We’ll survey other current architectures: Other RISC archs.: Alpha, Sparc, PPC, PA, ARM… Transmeta Crusoe?

Other RISC architectures MIPS (MIPS Technologies, Inc.): SGIs, CPU cores & embedded systems: routers, PlayStation 2, web appliances, Aibo, TV systems. PowerPC (IBM, Motorola, and Apple): Embedded systems, Macintoshes, new Amiga. SPARC (Sun Microsystems): Workstations, servers, PCI cards, storage systems… PA-RISC (Hewlett-Packard): Workstations, servers. Alpha (DEC → Compaq → HP?): 64-bit Unix workstations, VMS servers.

Some details from some archs. Go through sections of various books… H&P: App. C, RISC archs. App. D, 80x86 archs. Stallings: PDP-8 & PDP-10 ISAs VAX ISA PII & PPC opcodes, addressing modes, inst. formats IA-64 inst. Format Williams: Chapter on Pentium Section on IA-64 Chapter on MC68300 microcontroller

Fallacies & Pitfalls Fallacies: There is a “typical” program. A flawed architecture cannot succeed. You can design a flawless architecture. Pitfalls: “High-level” ISA features Reducing code size w/o considering compiler Expecting good perf. from DSP compilers