Caltech CS184 Spring2003 -- DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 2: April 2, 2003 Instruction Set Architecture.

Slides:



Advertisements
Similar presentations
CPU Structure and Function
Advertisements

Instruction Set Design
Chapter 8: Central Processing Unit
Chapter 3 Instruction Set Architecture Advanced Computer Architecture COE 501.
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
1 Lecture 3: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation.
Lecture 3: Instruction Set Principles Kai Bu
RISC / CISC Architecture By: Ramtin Raji Kermani Ramtin Raji Kermani Rayan Arasteh Rayan Arasteh An Introduction to Professor: Mr. Khayami Mr. Khayami.
INSTRUCTION SET ARCHITECTURES
1 ECE462/562 ISA and Datapath Review Ali Akoglu. 2 Instruction Set Architecture A very important abstraction –interface between hardware and low-level.
Topics covered: CPU Architecture CSE 243: Introduction to Computer Architecture and Hardware/Software Interface.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
10/9: Lecture Topics Starting a Program Exercise 3.2 from H+P Review of Assembly Language RISC vs. CISC.
ELEN 468 Advanced Logic Design
Computer Organization and Architecture
Microprocessors General Features To be Examined For Each Chip Jan 24 th, 2002.
Tuan Tran. What is CISC? CISC stands for Complex Instruction Set Computer. CISC are chips that are easy to program and which make efficient use of memory.
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
Computer Organization and Architecture
Chapter 11 Instruction Sets
COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.
Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 2: March 30, 2005 Instruction Set Architecture.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day3:
Classifying GPR Machines TypeNumber of Operands Memory Operands Examples Register- Register 30 SPARC, MIPS, etc. Register- Memory 21 Intel 80x86, Motorola.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
1 Instruction Set Architecture (ISA) Alexander Titov 10/20/2012.
Instruction Set Architecture The portion of the machine visible to the programmer Issues: Internal storage model Addressing modes Operations Operands Encoding.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
Computer Architecture and Organization
ECEG-3202 Computer Architecture and Organization Chapter 7 Reduced Instruction Set Computers.
Lecture 04: Instruction Set Principles Kai Bu
 1998 Morgan Kaufmann Publishers MIPS arithmetic All instructions have 3 operands Operand order is fixed (destination first) Example: C code: A = B +
Next Generation ISA Itanium / IA-64. Operating Environments IA-32 Protected Mode/Real Mode/Virtual Mode - if supported by the OS IA-64 Instruction Set.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
What is a program? A sequence of steps
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 5: January 17, 2003 What is required for Computation?
Computer Organization Instructions Language of The Computer (MIPS) 2.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 3: April 4, 2003 Pipelined ISA.
15-740/ Computer Architecture Lecture 3: Performance
William Stallings Computer Organization and Architecture 8th Edition
ELEN 468 Advanced Logic Design
Computer Organization & Design Microcode for Control Sec. 5
Overview Introduction General Register Organization Stack Organization
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
Computer Architecture (CS 207 D) Instruction Set Architecture ISA
Lecture 4: MIPS Instruction Set
CS170 Computer Organization and Architecture I
The University of Adelaide, School of Computer Science
ECE232: Hardware Organization and Design
Instruction encoding We’ve already seen some important aspects of processor design. A datapath contains an ALU, registers and memory. Programmers and compilers.
Computer Instructions
Introduction to Microprocessor Programming
COMS 361 Computer Organization
CPU Structure CPU must:
Lecture 4: Instruction Set Design/Pipelining
Chapter 11 Processor Structure and function
CS/COE0447 Computer Organization & Assembly Language
Presentation transcript:

Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 2: April 2, 2003 Instruction Set Architecture

Caltech CS184 Spring DeHon 2 Today Datapath review H&P view on ISA Questions Themes Compilers RISC

Caltech CS184 Spring DeHon 3 RISC? What does RISC mean?

Caltech CS184 Spring DeHon 4 Terminology Primitive Instruction (pinst) –Collection of bits which tell a single bit- processing element what to do –Includes: select compute operation input sources in space –(interconnect) input sources in time –(retiming) Compute net0 add mem slot#6

Caltech CS184 Spring DeHon 5 Instructions Distinguishing feature of programmable architectures? –Instructions -- bits which tell the device how to behave Compute net0 add mem slot#6

Caltech CS184 Spring DeHon 6 Single ALU Datapath

Caltech CS184 Spring DeHon 7 With Branch and Indirect Instr: ALUOP Bsel Write Bsrc Asrc DST Baddr CS184a Day 5

Caltech CS184 Spring DeHon 8 ISA Model based around Sequential Instruction Execution Visible-Machine-State  Instruction  New Visible-Machine-State New Visible-Machine-State identifies next instruction –PC  PC+1 or PC  branch-target

Caltech CS184 Spring DeHon 9 Machine State: Initial Counter: 0 Instruction Memory: 000: : : Data Memory: 000: A 001: B 010: ? 011: ? 100: : ? 110: ? 111: ? CS184a Day 5

Caltech CS184 Spring DeHon 10 ISA Visible Machine State –Registers (including PC) –Memory Instructions –ADD R1, R2, R3 –LD R3, R4 –BNE R4, R2, loop

Caltech CS184 Spring DeHon 11 Datapath [Datapath from PH (Fig. 5.1)]

Caltech CS184 Spring DeHon 12 Instructions Primitive operations for constructing (describing) a computation Need to do? –Interconnect (space and time) –Compute (intersect bits) –Control (select operations to run)

Caltech CS184 Spring DeHon 13 Detail Datapath [Datapath from PH (Fig. 5.13)]

Caltech CS184 Spring DeHon 14 uCoded / Decoded uCoded –Bits directly control datapath –Horizontal vs. Vertical –Not abstract from implementation Decoded –more compressed –only support most common operations –abstract from implementation –time/area to decode to datapath control signals

Caltech CS184 Spring DeHon 15 Detail Datapath [Datapath from PH (Fig. 5.13)]

Caltech CS184 Spring DeHon 16 H&P View ISA design done? Not many opportunities to completely redefine Many things mostly settled –at least until big technology perturbations arrive Implementation (uArch) is where most of the action is Andre: maybe we’ve found a nice local minima...

Caltech CS184 Spring DeHon 17 H&P Issues Registers/stack/accumulator –# operands, memory ops in instruction Addressing Modes Operations Control flow Primitive Data types Encoding

Caltech CS184 Spring DeHon 18 Register/stack/accumulator Driven largely by cost model –ports into memory –latency of register versus memory –instruction encoding (bits to specify) Recall Assignment 3: Instructions #4

Caltech CS184 Spring DeHon 19 Register/stack/accumulator Today: Load-Store, General Register arch. Registers more freedom of addressing than stack Load into register, then operate –Separate op, not much slower than memory addressing mode –usually use more than once (net reduction)

Caltech CS184 Spring DeHon 20 Addressing Modes Minimal: –Immediate #3 –Register R1 –register indirect (R1) Others: –displacement –indirect (double derference) –auto increment/decrement ( p[x++]=y) –scaled

Caltech CS184 Spring DeHon 21 Addressing Modes More capable –less instructions –potentially longer instructions bits and cycle time –many operations (complicate atomicity of instructions) Add (R2)+,(R3)+,(R4)+

Caltech CS184 Spring DeHon 22 Address Space Quote The Virtual Address eXtension of the PDP-11 architecture... provides a virtual address of about 4.3 gigabytes which, even given the rapid improvement of memory technology, should be adequate far into the future. William Strecker, “VAX-11/780—A Virtual address Extension to the PDP-11 Family,” AFIPS Proc., National Computer Conference, 1978

Caltech CS184 Spring DeHon 23 Operations ALU/Arithmetic –add, sub, or, and, xor –compare Interconnect –move registers –load, store Control –jump –conditional branch –procedure call/return

Caltech CS184 Spring DeHon 24 Operations: ALU Small set of SIMD operations Covers very small fraction of the space of all w  w  w

Caltech CS184 Spring DeHon 25 Operations: Branching Models: –ops set condition codes, branch on condition codes Extra state –compare result placed in register; branch on register zero or one –comparison part of branch May affect critical path for branch resolution

Caltech CS184 Spring DeHon 26 Operations: Procedure call/return ? Save registers? Update PC –call target –return address Change stack and frame pointers –store old –install new

Caltech CS184 Spring DeHon 27 Operations: Procedure call/return Question: How much should instruction do? Lesson: High variance in work needs to be done –which registers need to save –best way to transfer arguments to procedures –better to expose primitives to the compiler and let it specialize the set of operations to the particular call

Caltech CS184 Spring DeHon 28 Data Types Powers of two from bytes to double words? –8, 16, 32, 64 –(very implementation driven decision) Floating Point types Are pointers integers? Alignment requirements

Caltech CS184 Spring DeHon 29 Encoding Variable vs. Fixed How complex is the decoding? –Fields in the same place…or have to be routed/muxed? –Sequential requirements in decode? E.g. must decode previous byte to know what to do with next byte?

Caltech CS184 Spring DeHon 30 Detail Datapath [Datapath from PH (Fig. 5.13)]

Caltech CS184 Spring DeHon 31 Encoding: RISC/Modern [DLX Instruction Format from HP2nd ed. (Fig. 2.21)]

Caltech CS184 Spring DeHon 32 Operation Complexity Contradiction? –Providing primitives –including floating point ops

Caltech CS184 Spring DeHon 33 Local Minima? Self-Fulfilling? –How would we quantitatively validate need for a new operation? –[cue: bridge story] –This is what we use as primitives –Funny, we don’t find a need for other primitives…

Caltech CS184 Spring DeHon 34 Themes Common case fast Provide primitives (building blocks) Let compiler specialize to particular operation Make decode/operation simple so implementation is fast

Caltech CS184 Spring DeHon 35 Compilers 1960  1990 shift –increasing capability and sophistication of compilers –e.g. inter-procedural optimization register assignment (register usage) strength reduction dataflow analysis and instruction reordering (some progress) alias analysis

Caltech CS184 Spring DeHon 36 Compilers Gap between programmer and Architecture Increasingly bridged by compiler Less need to make assembly language human programmable More opportunity for compiler to specialize, partial evaluate –(do stuff at compile time to reduce runtime) RISC: “Relegate Interesting Stuff to Compiler”

Caltech CS184 Spring DeHon 37 Implementation Significance André Agree: Implementation issues are significant in the design of ISA Many of these issues are more interesting when we discuss in light of implementation issues

Caltech CS184 Spring DeHon 38 ISA Driven by 1.Implementation costs 2.Compiler technology 3.Application structure Can’t do good architecture in isolation from any of these issues.

Caltech CS184 Spring DeHon 39 RISC?

Caltech CS184 Spring DeHon 40 VAX Instructions Vary in length 1 to 53 bytes Some very complex –Powerful call routines –Polynomial evaluate (polyf) –Calculate CRC (crc)

Caltech CS184 Spring DeHon 41 VAX / MIPS procedure

Caltech CS184 Spring DeHon 42 RISC Reduced Instruction Set Computers Idea: –Provide/expose minimal primitives –Make sure primitives fast –Compose primitives to build functionality –Provide orthogonal instructions

Caltech CS184 Spring DeHon 43 RISC Equation Time= CPI  Instructions  CycleTime CISC: –Minimize: Instructions –Result in High CPI –Maybe High CycleTime RISC: –Target single-cycle primitives (CPI~1) –Instruction Count increases –Simple encoding, ops  reduced Cycle Time

Caltech CS184 Spring DeHon 44 VAX Data [Emer/Clark, ISCA 1984]

Caltech CS184 Spring DeHon 45 RISC Enabler 1 “large”, fast On-Chip SRAM –Large enough to hold kernel exploded in RISC Ops ~ 1--10K 32b words? Previous machines: –Off-chip memory bandwidth bottleneck –Fetch single instruction from off chip –Execute large number of microinstructions from on- chip ROM ROM smaller than SRAM Small/minimal machine  make room for cache

Caltech CS184 Spring DeHon 46 RISC Enable 2 High Level Programming –Bridge semantic gap by compiler –As opposed to providing powerful building blocks to assembly language programmer

Caltech CS184 Spring DeHon 47 Fit Problem “A great dal depends on being able to fit an entire CPU design on a single chip." "RISC computers benefit from being realizable at an earlier date."

Caltech CS184 Spring DeHon 48 Common Case "wherever there is a system function that is expensive or slow in all its generality, but where software can recognize a frequently occurring degenerate case (or can move the entire function from runtime to compile time) that function is moved from hardware to software, resulting in lower cost and improved performance." – 801 paper

Caltech CS184 Spring DeHon 49 Measurement Good Don’t assume you know what’s going on – measure Tune your intuition "Boy, you ruin all our fun -- you have data.” – DEC designers in response to a detailed quantitative study [Emer/Clark Retrospective on 11/780 performance characterization]

Caltech CS184 Spring DeHon 50 VAX/RISC Compare [Bhandarkar/Clark ASPLOS 1991]

Caltech CS184 Spring DeHon 51 VAX/RISC Compare [Bhandarkar/Clark ASPLOS 1991]

Caltech CS184 Spring DeHon 52 VAX Smoking gun?: –3-operand instructions –One cycle per operand field –If field a mem-op, wash with ld/st in RISC –If register-op, costs more …long way to supporting gap…

Caltech CS184 Spring DeHon 53 ISA Growth Can add instructions (upward compatibility) Do we ever get to remove any?

Caltech CS184 Spring DeHon 54 RISC Everyone believe RISC –X86 only one left –...and it’s a RISC core… …but do they understand it? –Today’s processors pretty complicated –Who’s managing instruction scheduling? –What mean to FPGAs?

Caltech CS184 Spring DeHon 55 Big Ideas Common Case –Measure to verify/know what it is! Primitives Highly specialized instructions brittle Design pulls –simplify processor implementation –simplify coding Orthogonallity (limit special cases) Compiler: fill in gap between user and hardware architecture