CSC 2405 Computer Systems II Advanced Topics. Instruction Set Architecture.


CSC 2405 Computer Systems II Advanced Topics

Instruction Set Architecture

3 Chapter 4 Instruction Set Architecture
Assembly Language View
– Processor state: registers, memory, …
– Instructions: addl, movl, leal, …
  How instructions are encoded as bytes
Layer of Abstraction
– Above: how to program machine
  Processor executes instructions in a sequence
– Below: what needs to be built
  Use variety of tricks to make it run fast
  E.g., execute multiple instructions simultaneously
[Figure: layers of abstraction, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]

4 Chapter 4 Instruction Set Architectures
Basic ISA Classes

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1, A                   Load R1, A
Push B    Add B         Add R1, B                    Load R2, B
Add       Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                Store C, R3

The differences between the classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.
Registers are the class that won out. The more registers on the CPU, the better.
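
The two extreme columns of the table can be compared by running them. The following is a minimal sketch, not part of the original slides: `run_stack` and `run_load_store` are hypothetical toy interpreters, and the tuple encoding of each instruction is an assumption made for illustration.

```python
# Toy interpreters (hypothetical) for two ISA classes from the slide.
# Memory is a dict from variable name to value.

def run_stack(program, memory):
    """Stack machine: operands are implicit (top of stack)."""
    mem = dict(memory)
    stack = []
    for op, *args in program:
        if op == "Push":            # Push X: push mem[X]
            stack.append(mem[args[0]])
        elif op == "Add":           # Add: pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "Pop":           # Pop X: pop into mem[X]
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return mem


def run_load_store(program, memory):
    """Load-store machine: only Load and Store touch memory."""
    mem = dict(memory)
    regs = {}
    for op, *args in program:
        if op == "Load":            # Load Rd, X
            regs[args[0]] = mem[args[1]]
        elif op == "Add":           # Add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":         # Store X, Rs
            mem[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown op {op}")
    return mem


STACK_PROG = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
LOAD_STORE_PROG = [
    ("Load", "R1", "A"),
    ("Load", "R2", "B"),
    ("Add", "R3", "R1", "R2"),
    ("Store", "C", "R3"),
]
```

Both programs leave the same value in C; they differ only in where the operands live while the addition happens.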

5 Chapter 4 80x86 Instruction Frequency

6 Chapter 4 Relative Frequency of Control Instructions
Design hardware to handle branches quickly, since they are the most frequent control instructions.

7 Chapter 4 CISC Instruction Sets
– Complex Instruction Set Computer
– Dominant style through mid-80’s
Stack-oriented instruction set
– Use stack to pass arguments, save program counter
– Explicit push and pop instructions
Arithmetic instructions can access memory
– addl %eax, 12(%ebx,%ecx,4)
  requires memory read and write
  Complex address calculation
Condition codes
– Set as side effect of arithmetic and logical instructions
Philosophy
– Add instructions to perform “typical” programming tasks

8 Chapter 4 RISC Instruction Sets
– Reduced Instruction Set Computer
– Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)
Fewer, simpler instructions
– Might take more to get given task done
– Can execute them with small and fast hardware
Register-oriented instruction set
– Many more (typically 32) registers
– Use for arguments, return pointer, temporaries
Only load and store instructions can access memory
– Similar to Y86 mrmovl and rmmovl
No condition codes
– Test instructions return 0/1 in register

9 Chapter 4 Example RISC Instruction Formats
Register-Register (R-type): fields Op, rs1, rs2, rd, func
– Example: ADD R1, R2, R
– Used for ALU reg. operations, read/write special registers and moves
Register-Immediate (I-type): fields Op, rs1, rd, immediate
– Example: SUB R1, R2, #3
– Used for ALU imm. operations, loads and stores, conditional branch, jump (and link)
Jump / Call (J-type): fields Op, offset added to PC
– Example: JUMP end
– Used for jump, jump and link, trap and return from exception

10 Chapter 4 CISC vs. RISC
Original Debate
– Strong opinions!
– CISC proponents: easy for compiler, fewer code bytes
– RISC proponents: better for optimizing compilers, can make run fast with simple chip design
Current Status
– For desktop processors, choice of ISA not a technical issue
  With enough hardware, can make anything run fast
  Code compatibility more important
– For embedded processors, RISC makes sense
  Smaller, cheaper, less power

Logic Design

12 Chapter 4 Overview of Logic Design
Fundamental Hardware Requirements
– Communication: how to get values from one place to another
– Computation
– Storage
Bits are Our Friends
– Everything expressed in terms of values 0 and 1
– Communication: low or high voltage on wire
– Computation: compute Boolean functions
– Storage: store bits of information

13 Chapter 4 Digital Signals
– Use voltage thresholds to extract discrete values from continuous signal
– Simplest version: 1-bit signal
  Either high range (1) or low range (0)
  With guard range between them
– Not strongly affected by noise or low quality circuit elements
  Can make circuits simple, small, and fast
[Figure: voltage versus time, with the signal read as 0, then 1, then 0]

14 Chapter 4 Computing with Logic Gates
– Outputs are Boolean functions of inputs
– Respond continuously to changes in inputs
  With some, small delay
[Figure: voltage versus time for inputs a and b and output a && b, showing the rising delay and falling delay]

15 Chapter 4 Combinational Circuits
Acyclic Network of Logic Gates
– Continuously responds to changes on primary inputs
– Primary outputs become (after some delay) Boolean functions of primary inputs
[Figure: acyclic network between primary inputs and primary outputs]

16 Chapter 4 Bit Equality
– Generate 1 if a and b are equal
Hardware Control Language (HCL)
– Very simple hardware description language
  Boolean operations have syntax similar to C logical operations
– We’ll use it to describe control logic for processors
HCL Expression: bool eq = (a&&b)||(!a&&!b)
[Figure: “Bit equal” circuit with inputs a, b and output eq]
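
The HCL expression can be checked against its full truth table. This is a small sketch in Python rather than HCL; the function name `bit_eq` is chosen here for illustration.

```python
def bit_eq(a, b):
    """Python transcription of the HCL expression (a&&b)||(!a&&!b)."""
    return bool((a and b) or (not a and not b))


# Truth table: eq is 1 exactly when the two input bits match.
TRUTH_TABLE = {(a, b): bit_eq(a, b) for a in (0, 1) for b in (0, 1)}
```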

17 Chapter 4 Word Equality
– 32-bit word size
– HCL representation
  Equality operation
  Generates Boolean value
HCL Representation: bool Eq = (A == B)
[Figure: 32 “Bit equal” blocks comparing a31/b31 … a0/b0 and producing eq31 … eq0, which combine into Eq; word-level representation: A and B feed an “=” block with output Eq]
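
The word-level comparison is the AND of 32 bit-level comparisons. A minimal sketch of that structure, assuming words are passed as non-negative Python integers; `word_eq` and its `width` parameter are names introduced here.

```python
def bit_eq(a, b):
    """1-bit equality, as in the HCL expression (a&&b)||(!a&&!b)."""
    return bool((a and b) or (not a and not b))


def word_eq(a, b, width=32):
    """Eq is true only if every pair of corresponding bits is equal."""
    return all(bit_eq((a >> i) & 1, (b >> i) & 1) for i in range(width))
```

In hardware all 32 bit comparisons happen in parallel; the loop here only models the logic, not the timing.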

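18 Chapter 4 1-Bit Latch
[Figure: D latch with inputs D (data) and C (clock), internal signals R and S, and outputs Q+ and Q–. Latching: C = 1, D = d gives Q+ = d and Q– = !d. Storing: C = 0 holds Q+ = q and Q– = !q.]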

19 Chapter 4 Registers
– Stores word of data
  Different from program registers seen in assembly code
– Collection of edge-triggered latches
– Loads input on rising edge of clock
[Figure: register structure: eight edge-triggered latches (D, C, Q+) sharing one Clock, with inputs i7 … i0 and outputs o7 … o0]
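
The rising-edge behavior can be modeled in a few lines. This is a behavioral sketch only, not a gate-level model; the class name `Register`, the `tick` method, and the 8-bit default width (taken from the eight latches in the figure) are assumptions for illustration.

```python
class Register:
    """Behavioral model of an edge-triggered hardware register."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.output = 0
        self._prev_clock = 0

    def tick(self, clock, data):
        """Present clock and data levels; return the current output.

        The output changes only when the clock goes from 0 to 1.
        At any other time the input is ignored.
        """
        if self._prev_clock == 0 and clock == 1:
            self.output = data & self.mask
        self._prev_clock = clock
        return self.output
```

Holding the clock high while the input changes leaves the output untouched, which is what distinguishes an edge-triggered register from a level-sensitive latch.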

20 Chapter 4 Random-Access Memory
– Stores multiple words of memory
  Address input specifies which word to read or write
– Register file
  Holds values of program registers %eax, %esp, etc.
  Register identifier serves as address
  – ID 8 implies no read or write performed
– Multiple Ports
  Can read and/or write multiple words in one cycle
  – Each has separate address and data input/output
[Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and Clock]
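
A sketch of the register file's port behavior, using the signal names from the figure (srcA, srcB, valA, valB, dstW, valW). The class and method names are hypothetical, and two details are assumptions rather than statements from the slide: reads are modeled as combinational, and a read with ID 8 is modeled as returning 0.

```python
RNONE = 8  # register ID meaning "no read or write performed"


class RegisterFile:
    """Two read ports (A, B) and one write port (W)."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs

    def read(self, srcA, srcB):
        """Both read ports can be used in the same cycle."""
        valA = self.regs[srcA] if srcA != RNONE else 0  # assumption: 0 for RNONE
        valB = self.regs[srcB] if srcB != RNONE else 0
        return valA, valB

    def clock_edge(self, dstW, valW):
        """The write port updates state only on the clock edge."""
        if dstW != RNONE:
            self.regs[dstW] = valW
```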

21 Chapter 4 Basic Logic Gates
[Figure: basic logic gate symbols]
NOTE: okay to use just a circle for NOT.

22 Chapter 4 More than 2 Inputs?
AND/OR can take any number of inputs.
– AND = 1 if all inputs are 1.
– OR = 1 if any input is 1.
– Similar for NAND/NOR.
Can implement with multiple two-input gates

23 Chapter 4 Logical Completeness
Can implement ANY truth table with AND, OR, NOT.
1. AND combinations that yield a "1" in the truth table.
2. OR the results of the AND gates.
[Figure: truth table with columns A, B, C, D and the corresponding AND-OR circuit]
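
The two-step recipe can be written out directly. In this sketch, `sum_of_products` is a hypothetical helper that builds a function from any truth table given as a dict; the 3-input majority function is an example chosen here, not one from the slides.

```python
from itertools import product


def sum_of_products(truth_table):
    """Build a circuit from a truth table using only AND, OR, NOT.

    truth_table maps each input tuple of 0/1 values to an output 0/1.
    """
    # Step 1: one AND term per row whose output is 1.
    rows_with_one = [row for row, out in truth_table.items() if out]

    def circuit(*inputs):
        # Each AND term takes x directly where the row has a 1,
        # and NOT x where the row has a 0.
        and_terms = (
            all((x if want else not x) for x, want in zip(inputs, row))
            for row in rows_with_one
        )
        # Step 2: OR the results of the AND terms.
        return int(any(and_terms))

    return circuit


# Example truth table: output is 1 when at least two inputs are 1.
MAJORITY = {bits: int(sum(bits) >= 2) for bits in product((0, 1), repeat=3)}
majority = sum_of_products(MAJORITY)
```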

24 Chapter 4 DeMorgan's Law
Converting AND to OR (with some help from NOT)
Consider the following gate: [Figure: gate with inputs A and B]
To convert AND to OR (or vice versa), invert inputs and output.
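
The rule "invert inputs and output" can be verified exhaustively over two inputs. A short sketch; the function names are chosen here for illustration.

```python
def and_from_or(a, b):
    """AND built from an OR gate with inverted inputs and inverted output."""
    return not ((not a) or (not b))


def or_from_and(a, b):
    """OR built from an AND gate with inverted inputs and inverted output."""
    return not ((not a) and (not b))


BITS = (False, True)
DEMORGAN_HOLDS = all(
    and_from_or(a, b) == (a and b) and or_from_and(a, b) == (a or b)
    for a in BITS
    for b in BITS
)
```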

25 Chapter 4 Decoder
n inputs, 2^n outputs
– exactly one output is 1 for each possible input pattern
[Figure: 2-bit decoder]
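
A decoder's behavior fits in one function. In this sketch the n input bits are passed as a single integer, and output i is 1 exactly when the input pattern equals i; the name `decoder` and the list representation of the outputs are assumptions for illustration.

```python
def decoder(value, n):
    """n-input decoder: return a list of 2**n outputs, exactly one of them 1."""
    if not 0 <= value < 2 ** n:
        raise ValueError("input pattern does not fit in n bits")
    return [1 if i == value else 0 for i in range(2 ** n)]
```

For the 2-bit decoder on the slide, the input pattern 10 (decimal 2) raises output 2 and leaves the other three outputs at 0.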

Sequential Processors

27 Chapter 4 Sequential HW Structure
State
– Program counter register (PC)
– Condition code register (CC)
– Register File
– Memories
  Access same memory space
  Data: for reading/writing program data
  Instruction: for reading instructions
Instruction Flow
– Read instruction at address specified by PC
– Process through stages
– Update program counter
[Figure: sequential datapath. Stages: Fetch (instruction memory, PC increment), Decode (register file), Execute (ALU, CC), Memory (data memory), Write back (register file), PC. Signals: icode, ifun, rA, rB, valC, valP, srcA, srcB, dstA, dstB, valA, valB, aluA, aluB, Bch, valE, Addr, Data, valM, newPC]

28 Chapter 4 Sequential Stages
Fetch
– Read instruction from instruction memory
Decode
– Read program registers
Execute
– Compute value or address
Memory
– Read or write data
Write Back
– Write program registers
PC
– Update program counter
[Figure: sequential datapath with the stages Fetch, Decode, Execute, Memory, Write back, PC and the signals passed between them]

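29 Chapter 4 Instruction Decoding
Instruction Format
– Instruction byte: icode:ifun
– Optional register byte: rA:rB
– Optional constant word: valC
[Figure: example encoding 5 0 rA rB D, with fields labeled icode, ifun, rA, rB, valC; the register byte and constant word are optional]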
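
The format above can be unpacked with a few shifts and masks. This is a sketch under stated assumptions: the constant word is taken to be 4 bytes (the 32-bit word size used earlier) stored little-endian, and the caller says whether the register byte and constant word are present through the hypothetical flags `need_regids` and `need_valC`, since the slide does not list which instructions carry them.

```python
def decode(code, need_regids, need_valC):
    """Split an instruction's bytes into icode, ifun, rA, rB, valC."""
    icode = code[0] >> 4      # high 4 bits of the instruction byte
    ifun = code[0] & 0xF      # low 4 bits of the instruction byte
    pos = 1
    rA = rB = None
    if need_regids:           # optional register byte rA:rB
        rA = code[pos] >> 4
        rB = code[pos] & 0xF
        pos += 1
    valC = None
    if need_valC:             # optional constant word (assumed little-endian)
        valC = int.from_bytes(code[pos:pos + 4], "little")
        pos += 4
    return {"icode": icode, "ifun": ifun, "rA": rA, "rB": rB,
            "valC": valC, "length": pos}


# Bytes matching the figure's pattern 5 0 rA rB D with rA=1, rB=2, D=4.
EXAMPLE = decode(bytes([0x50, 0x12, 0x04, 0x00, 0x00, 0x00]), True, True)
```

The returned length is what a fetch stage would add to the PC to find the next instruction.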

30 Chapter 4 Sequential Summary
Implementation
– Express every instruction as series of simple steps
– Follow same general flow for each instruction type
– Assemble registers, memories, predesigned combinational blocks
– Connect with control logic
Limitations
– Too slow to be practical
– In one cycle, must propagate through instruction memory, register file, ALU, and data memory
– Would need to run clock very slowly
– Hardware units only active for fraction of clock cycle

Pipelined Processors

32 Chapter 4 What is Pipelining
Computers execute billions of instructions, so instruction throughput is what matters
IDEA: Divide instruction execution up into several pipeline stages. For example: IF ID EX MEM WB
Simultaneously have different instructions in different pipeline stages
The length of the longest pipeline stage determines the cycle time
Desirable pipeline features (e.g., RISC):
– all instructions same length
– registers located in same place in instruction format
– memory operands only in loads or stores

33 Chapter 4 What Is Pipelining
Laundry Example
Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
– Washer takes 30 minutes
– Dryer takes 40 minutes
– “Folder” takes 20 minutes
[Figure: the four loads A, B, C, D]

34 Chapter 4 What Is Pipelining
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each starting after the previous one finishes]

35 Chapter 4 What Is Pipelining
Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, overlapped so each stage starts as soon as it is free]
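
Both laundry totals can be reproduced with a small scheduling model. This is a sketch, assuming each machine handles one load at a time and a finished load may wait until the next machine is free; the function names are chosen here for illustration.

```python
def sequential_time(stage_minutes, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Each load enters a stage as soon as both it and the stage are ready."""
    stage_free_at = [0] * len(stage_minutes)   # when each machine is next free
    for _ in range(loads):
        ready = 0                              # when this load left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + minutes
            ready = stage_free_at[s]
    return stage_free_at[-1]


LAUNDRY = [30, 40, 20]   # washer, dryer, folder (minutes)
SEQ_MINUTES = sequential_time(LAUNDRY, 4)    # 360 minutes = 6 hours
PIPE_MINUTES = pipelined_time(LAUNDRY, 4)    # 210 minutes = 3.5 hours
```

The 40-minute dryer is the bottleneck: once the pipeline is full, one load finishes every 40 minutes regardless of how fast the washer and folder are.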

36 Chapter 4 What Is Pipelining: Pipelining Lessons
– Pipelining doesn’t help latency of single task, it helps throughput of entire workload
– Pipeline rate limited by slowest pipeline stage
– Multiple tasks operating simultaneously
– Potential speedup = Number pipe stages
– Unbalanced lengths of pipe stages reduces speedup
– Time to “fill” pipeline and time to “drain” it reduces speedup
[Figure: timeline from 6 PM to 9 PM; loads A, B, C, D in task order, overlapped]
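
The effect of fill and drain time on speedup can be made concrete for the ideal case. This sketch assumes perfectly balanced stages of one cycle each and no stalls, so n tasks on a k-stage pipeline take k + n - 1 cycles; `pipeline_speedup` is a name introduced here.

```python
def pipeline_speedup(num_stages, num_tasks):
    """Ideal speedup of a balanced pipeline over unpipelined execution."""
    unpipelined_cycles = num_tasks * num_stages
    pipelined_cycles = num_stages + num_tasks - 1   # fill, then one task per cycle
    return unpipelined_cycles / pipelined_cycles
```

With a single task there is no speedup at all, and the speedup approaches the number of stages only as the number of tasks grows large.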

37 Chapter 4 Real-World Pipelines: Car Washes
Idea
– Divide process into independent stages
– Move objects through stages in sequence
– At any given time, multiple objects being processed
[Figure: three car wash arrangements: Sequential, Parallel, Pipelined]

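38 Chapter 4 Pipeline Diagrams
Unpipelined
– Cannot start new operation until previous one completes
3-Way Pipelined
– Up to 3 operations in process simultaneously
[Figure: timing diagrams. Unpipelined: OP1, OP2, OP3 run back to back. 3-way pipelined: OP1, OP2, OP3 each pass through stages A, B, C, each starting one stage after the previous operation]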

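39 Chapter 4 Data Dependencies
System
– Each operation depends on result from preceding one
[Figure: combinational logic feeding a clocked register (Reg) whose output feeds back to the logic; OP1, OP2, OP3 execute one after another over time]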

40 Chapter 4 Data Hazards
– Result does not feed back around in time for next operation
– Pipelining has changed behavior of system
[Figure: three-stage pipeline (Comb. logic A, B, C, each followed by a clocked Reg); OP1, OP2, OP3, OP4 each pass through stages A, B, C, overlapped in time]

41 Chapter 4 One Memory Port/Structural Hazards
[Figure: pipeline diagram with instructions in order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles (Cycle 1 to Cycle 7); each instruction passes through Ifetch, Reg, ALU, DMem, Reg]

42 Chapter 4 One Memory Port/Structural Hazards
[Figure: pipeline diagram with instructions in order (Load, Instr 1, Instr 2, Stall, Instr 3) against time in clock cycles (Cycle 1 to Cycle 7); each instruction passes through Ifetch, Reg, ALU, DMem, Reg, and the stall slot is filled with bubbles]
How do you “bubble” the pipe?

43 Chapter 4 Data Hazard on R1
Instructions in order:
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline diagram of these instructions against time in clock cycles, with stages IF, ID/RF, EX, MEM, WB (Ifetch, Reg, ALU, DMem, Reg)]

44 Chapter 4 Three Generic Data Hazards
Read After Write (RAW)
Instr J tries to read operand before Instr I writes it
I: add r1,r2,r3
J: sub r4,r1,r3
Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

45 Chapter 4 Three Generic Data Hazards
Write After Read (WAR)
Instr J writes operand before Instr I reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

46 Chapter 4 Three Generic Data Hazards
Write After Write (WAW)
Instr J writes operand before Instr I writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
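
The three cases reduce to comparing register names between an earlier instruction I and a later instruction J. In this sketch each instruction is a hypothetical tuple (destination, source 1, source 2), matching the operand order of the examples above. It reports potential hazards from the names alone; whether one actually occurs depends on the pipeline's timing.

```python
def classify_hazards(instr_i, instr_j):
    """Return the set of potential hazards when J follows I."""
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    hazards = set()
    if dest_i in srcs_j:
        hazards.add("RAW")   # J reads what I writes
    if dest_j in srcs_i:
        hazards.add("WAR")   # J writes what I reads
    if dest_i == dest_j:
        hazards.add("WAW")   # J writes what I writes
    return hazards


# The I/J pairs from the three slides.
RAW_PAIR = classify_hazards(("r1", "r2", "r3"), ("r4", "r1", "r3"))
WAR_PAIR = classify_hazards(("r4", "r1", "r3"), ("r1", "r2", "r3"))
WAW_PAIR = classify_hazards(("r1", "r4", "r3"), ("r1", "r2", "r3"))
```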

47 Chapter 4 Data Forwarding
Naïve Pipeline
– Register isn’t written until completion of write-back stage
– Source operands read from register file in decode stage
  Needs to be in register file at start of stage
Observation
– Value generated in execute or memory stage
Trick
– Pass value directly from generating instruction to decode stage
– Needs to be available at end of decode stage
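
The trick amounts to a selection made in the decode stage: prefer a value still in flight over the stale one in the register file. A minimal sketch, assuming the in-flight results are given as (destination, value) pairs ordered from the most recent instruction to the oldest; `select_operand` and this list representation are hypothetical, not the slides' control logic.

```python
def select_operand(src, regfile, in_flight):
    """Choose the value for source register src in the decode stage.

    regfile: dict mapping register name to its last written-back value.
    in_flight: (dest, value) pairs for results not yet written back,
               most recent instruction first.
    """
    for dest, value in in_flight:
        if dest == src:
            return value        # forward the newest pending result
    return regfile[src]         # no pending write: register file is current
```

Scanning from the most recent instruction matters when several pending instructions write the same register, because the reader must see the latest one.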

48 Chapter 4 Forwarding to Avoid Data Hazard
Instructions in order:
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline diagram of these instructions against time in clock cycles; each passes through Ifetch, Reg, ALU, DMem, Reg]