CSC 4250 Computer Architectures September 15, 2006 Appendix A. Pipelining.

What is Pipelining?
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.
Pipelining exploits parallelism among the instructions in a sequential instruction stream.
Recall the formula: CPU time = IC × CPI × cct, where IC is the instruction count, CPI is the average number of clock cycles per instruction, and cct is the clock cycle time.
Pipelining yields a reduction in the average execution time per instruction; i.e., it decreases the CPI.
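The formula above can be checked with a few lines of Python. This is only a sketch: the function name `cpu_time` and the workload numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_ns):
    """CPU time = IC x CPI x clock cycle time (result in ns)."""
    return instruction_count * cpi * clock_cycle_time_ns

# Hypothetical workload: 1,000,000 instructions on a 1 ns clock.
# The CPI values (5 unpipelined, 1 ideally pipelined) are assumptions.
unpipelined = cpu_time(1_000_000, cpi=5.0, clock_cycle_time_ns=1.0)
pipelined = cpu_time(1_000_000, cpi=1.0, clock_cycle_time_ns=1.0)
print(unpipelined / pipelined)
```

With IC and the clock cycle time held fixed, the ratio of the two CPU times is simply the ratio of the two CPIs.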

RISC Architectures
RISC stands for Reduced Instruction Set Computer.
All operations on data apply to data in registers.
The only operations that affect memory are loads and stores, which move data from memory to a register or from a register to memory, respectively.
Instruction formats are few in number, with all instructions typically the same size.

Three Classes of Instructions
We consider:
1. ALU instructions
2. Load and store instructions
3. Branches (no jumps)

ALU Instructions
Take either two registers or a register and a sign-extended immediate, operate on them, and store the result into a third register.
DADD R1,R2,R3
  Fields: Opcode | R2 (rs) | R3 (rt) | R1 (rd) | shamt | opx
  Reg[R1] ← Reg[R2] + Reg[R3]
DADDI R1,R2,#3
  Fields: Opcode | R2 (rs) | R1 (rt) | Immediate
  Reg[R1] ← Reg[R2] + 3

Load and Store Instructions
Take a register source (the base register) and an immediate field (the offset). Their sum, the effective address, is the memory address. The second register is the destination (load) or the source (store) of the data.
LD R2,30(R1)
  Fields: Opcode | R1 | R2 | Immediate
  Reg[R2] ← Mem[30+Reg[R1]]
SD R2,30(R1)
  Fields: Opcode | R1 | R2 | Immediate
  Mem[30+Reg[R1]] ← Reg[R2]

Branches
Branches are conditional transfers of control.
The branch destination is obtained by adding a sign-extended offset to the current PC.
We consider only comparison against zero: BEQZ R1,name
BEQZ is a pseudo-instruction for BEQ with R0: BEQ R1,R0,name
  Fields: Opcode | R1 | R0 | Immediate

RISC Instruction Set
Every instruction takes at most five clock cycles:
1. Instruction fetch cycle (IF)
2. Instruction decode/register fetch cycle (ID)
3. Execution/effective address cycle (EX)
4. Memory access/branch completion cycle (MEM)
5. Write-back cycle (WB)

Instruction Fetch (IF)
Send the program counter (PC) to memory and fetch the current instruction from memory.
Update the PC by adding 4 (why 4?).
Operations:
  IR ← Mem[PC]
  NPC ← PC + 4

Instruction Decode/Register Fetch (ID)
Decode the instruction.
Read the registers. Decoding is done in parallel with reading registers (fixed-field decoding).
Sign-extend the offset field.
Operations (A and B are temporary registers):
  A ← Reg[rs]
  B ← Reg[rt]
  Imm ← sign-extended immediate field of IR
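Fixed-field decoding and sign extension can be sketched in Python as follows. The slides do not give bit widths, so the layout here (32-bit word, 6-bit opcode, 5-bit register fields, 16-bit immediate) is an assumption borrowed from the usual MIPS formats, and the opcode value in the example is an arbitrary placeholder.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def decode(ir):
    """Split a 32-bit instruction word into fixed-position fields.

    Assumed layout: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] imm[15:0].
    """
    return {
        "opcode": (ir >> 26) & 0x3F,
        "rs": (ir >> 21) & 0x1F,
        "rt": (ir >> 16) & 0x1F,
        "rd": (ir >> 11) & 0x1F,
        "imm": sign_extend(ir & 0xFFFF),
    }

# Hypothetical word: placeholder opcode 0x18, rs=2, rt=1, immediate=-3.
word = (0x18 << 26) | (2 << 21) | (1 << 16) | (-3 & 0xFFFF)
fields = decode(word)
print(fields["rs"], fields["rt"], fields["imm"])
```

Because rs and rt always sit in the same bit positions, the register numbers can be extracted without waiting for the opcode to be decoded, which is what allows the register reads to proceed in parallel with decoding.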

Execution/Effective Address (EX)
The ALU operates on the operands prepared in ID, performing one of four possible functions:
Memory reference (add base register and offset):
  ALUOutput ← A + Imm
Register-Register ALU instruction:
  ALUOutput ← A func B
Register-Immediate ALU instruction:
  ALUOutput ← A op Imm
Branch:
  ALUOutput ← NPC + (Imm << 2)
  Cond ← (A == 0)

Memory Access/Branch Completion (MEM)
The PC is updated:
  PC ← NPC
Access memory if needed (LMD = Load Memory Data register):
  LMD ← Mem[ALUOutput] or Mem[ALUOutput] ← B
Branch:
  if (Cond) PC ← ALUOutput

Write Back (WB)
Register-Register ALU:
  Reg[rd] ← ALUOutput
Register-Immediate ALU:
  Reg[rt] ← ALUOutput
Load:
  Reg[rt] ← LMD
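The five steps (IF, ID, EX, MEM, WB) can be strung together into a small unpipelined interpreter. This is a minimal sketch, not the slides' implementation: instructions are stored as symbolic tuples `(op, rs, rt, rd, imm)` rather than encoded words, the `state` dictionary and the function name `step` are hypothetical, and treating R0 as hard-wired to zero is an assumption carried over from MIPS.

```python
def step(state):
    """Run one instruction through IF, ID, EX, MEM, WB (unpipelined)."""
    reg, mem, imem = state["reg"], state["mem"], state["imem"]

    # IF: fetch the instruction and compute the next sequential PC.
    ir = imem[state["pc"]]
    npc = state["pc"] + 4

    # ID: read both source registers and pick up the immediate.
    op, rs, rt, rd, imm = ir
    a, b = reg[rs], reg[rt]

    # EX: one ALU operation, chosen by instruction class.
    cond = False
    if op in ("LD", "SD"):
        alu_output = a + imm            # effective address
    elif op == "DADD":
        alu_output = a + b              # register-register
    elif op == "DADDI":
        alu_output = a + imm            # register-immediate
    elif op == "BEQZ":
        alu_output = npc + (imm << 2)   # branch target
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported opcode {op}")

    # MEM: update the PC and access data memory if needed.
    state["pc"] = alu_output if cond else npc
    lmd = None
    if op == "LD":
        lmd = mem[alu_output]
    elif op == "SD":
        mem[alu_output] = b

    # WB: write the result into the register file.
    if op == "DADD":
        reg[rd] = alu_output
    elif op == "DADDI":
        reg[rt] = alu_output
    elif op == "LD":
        reg[rt] = lmd
    reg[0] = 0  # assumption: R0 is hard-wired to zero, as in MIPS

state = {
    "pc": 0,
    "reg": [0] * 32,
    "mem": {30: 7},
    "imem": {
        0: ("LD", 0, 2, 0, 30),     # LD R2,30(R0)
        4: ("DADDI", 2, 1, 0, 3),   # DADDI R1,R2,#3
        8: ("DADD", 1, 2, 3, 0),    # DADD R3,R1,R2
        12: ("SD", 0, 3, 0, 38),    # SD R3,38(R0)
    },
}
for _ in range(4):
    step(state)
print(state["reg"][1:4], state["mem"][38])
```

Note that a store does nothing in WB and an ALU instruction does nothing in MEM beyond the PC update, which is why not every instruction class needs all five steps.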

Simple RISC Pipeline
                 Clock number
Instr. #      1    2    3    4    5    6    7    8    9
Instr. i      IF   ID   EX   ME   WB
Instr. i+1         IF   ID   EX   ME   WB
Instr. i+2              IF   ID   EX   ME   WB
Instr. i+3                   IF   ID   EX   ME   WB
Instr. i+4                        IF   ID   EX   ME   WB
What are the stages needed for an ALU instruction?
What are the stages needed for a Store instruction?
What are the stages needed for a Branch instruction?
Which stage is expected to take the most time?
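The staggered pattern in the table follows one rule: with no hazards, instruction i+k occupies stage number s (counting from 0) in clock cycle k + s + 1. A short Python sketch that regenerates the table from that rule is shown below; the function name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "ME", "WB"]

def pipeline_diagram(num_instructions):
    """Return rows of an ideal (hazard-free) pipeline timing table."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows

rows = pipeline_diagram(5)
print("cycle     " + " ".join(f"{c:>2}" for c in range(1, len(rows[0]) + 1)))
for i, row in enumerate(rows):
    label = "i" if i == 0 else f"i+{i}"
    print(f"{label:<10}" + " ".join(f"{cell:>2}" for cell in row))
```

Reading a single column of the result shows which instruction occupies each stage in that cycle; from cycle 5 onward all five stages are busy.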

Figure A.2. Pipeline

Three Observations on Overlapping Execution
1. Use separate instruction and data memories, which are typically implemented as separate instruction and data caches. Separate caches eliminate the conflict for a single memory that would otherwise arise between instruction fetch and data memory access.

Three Observations on Overlapping Execution
2. The register file is used in two stages: for reading in ID and for writing in WB. These uses are distinct. Hence, we need to perform two reads and one write every clock cycle (why two reads?). To handle reads and a write to the same register (and for another reason that will arise), we perform the register write in the first half of the clock cycle and the reads in the second half.
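The effect of the write-then-read ordering can be sketched with a toy register file. This is a behavioral model under stated assumptions, not hardware: the class name `RegisterFile` and its `cycle` method are hypothetical, and the two half-cycles are modeled simply as two sequential steps inside one call.

```python
class RegisterFile:
    """Register file that writes in the first half of a cycle, reads in the second."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        """Model one clock cycle.

        write: optional (register, value) pair from the WB stage.
        reads: register numbers requested by the ID stage.
        """
        # First half: the WB-stage write lands.
        if write is not None:
            reg, value = write
            self.regs[reg] = value
        # Second half: the ID-stage reads see the freshly written value.
        return tuple(self.regs[r] for r in reads)

rf = RegisterFile()
a, b = rf.cycle(write=(1, 42), reads=(1, 2))
print(a, b)
```

An instruction in ID that reads R1 in the same cycle that an older instruction writes R1 in WB therefore receives the new value rather than the stale one.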

Three Observations on Overlapping Execution
3. To start a new instruction every clock cycle, we must increment and store the PC every clock cycle, and this must be done during the IF stage in preparation for the next instruction. A further problem is that a branch does not change the PC until the MEM stage (this problem will be handled soon).

Pipeline Registers
Prevent interference between two different instructions in adjacent stages of the pipeline.
Carry the data of a given instruction from one stage to the next.
Are triggered by the clock edge: values change instantaneously on the clock edge.
Add pipelining overhead.
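The clock-edge behavior can be sketched as a simultaneous update of all pipeline registers. In the Python sketch below, each register just holds an instruction label; the register names IF/ID, ID/EX, EX/MEM, MEM/WB follow the usual textbook naming, and the function name `clock_edge` is hypothetical.

```python
def clock_edge(latches, new_instruction):
    """Advance every pipeline register by one stage on a clock edge.

    latches[k] holds the instruction sitting in the register after stage k
    (IF/ID, ID/EX, EX/MEM, MEM/WB). All registers load together, so each
    one takes the value its left-hand neighbour held before the edge.
    """
    return [new_instruction] + latches[:-1]

latches = [None, None, None, None]    # IF/ID, ID/EX, EX/MEM, MEM/WB
for instr in ["i", "i+1", "i+2"]:
    latches = clock_edge(latches, instr)
print(latches)
```

Building a new list from the old values, rather than overwriting entries in place from left to right, is what models the simultaneous update: an in-place update would let one instruction overwrite its neighbor before the neighbor had moved on.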

Figure A.3. Pipeline Registers

Example
Consider an unpipelined processor. Assume a 1 ns clock cycle, with 4 cycles for ALU operations and branches and 5 cycles for memory operations. Suppose the relative frequencies of ALU operations, branches, and memory operations are 40%, 20%, and 40%, respectively. Pipelining adds 0.2 ns of overhead to the clock cycle. What is the speedup from pipelining?

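Answer
Average execution time per instruction on the unpipelined processor
  = Clock cycle × Average CPI
  = 1 ns × ((40% + 20%) × 4 + 40% × 5)
  = 4.4 ns
On the pipelined processor, an instruction ideally completes every clock cycle, and the overhead stretches the clock cycle to 1 ns + 0.2 ns = 1.2 ns.
Speedup from pipelining
  = 4.4 ns / 1.2 ns
  ≈ 3.7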
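The arithmetic in the answer can be reproduced with a short Python sketch. The function name `unpipelined_time` is hypothetical; the numbers are the ones given in the example, and the pipelined time assumes an ideal pipeline that completes one instruction per (stretched) clock cycle.

```python
def unpipelined_time(clock_ns, mix):
    """Average time per instruction: clock x sum(frequency x cycles)."""
    return clock_ns * sum(freq * cycles for freq, cycles in mix)

# (frequency, cycles) for ALU operations, branches, memory operations.
mix = [(0.40, 4), (0.20, 4), (0.40, 5)]
before = unpipelined_time(1.0, mix)
after = 1.0 + 0.2  # pipelined: one instruction per cycle, clock stretched by overhead
speedup = before / after
print(round(before, 1), round(speedup, 1))
```

Changing the overhead term shows how quickly the speedup falls below the ideal: with zero overhead the speedup would equal the unpipelined average CPI of 4.4.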