RISC Pipelining. CS 147, Spring 2011, Kui Cheung.


Classic five-stage instruction pipeline
Fetch – fetch the instruction from memory
Decode – determine what action is required
Execute – execute the instruction
Memory – data cache access
Writeback – write the result to a register

Arm9 (Nintendo DS): 5-stage pipeline
Using a basketball team as an analogy, we can assign the following positions to the different stages:
1) The coach gives a play to the point guard.
2) The point guard passes the ball to the right person to execute the play.
3) The small forward (SF) or power forward (PF) continues setting up the play with some fancy moves, then passes the ball to the center.
4) The center continues the setup and passes the ball to the shooting guard (SG) for a clean shot.
5) The shooting guard takes the shot.

Arm9 (Nintendo DS): 5-stage pipeline
1) Fetch the instruction from memory into the instruction register (IR)
2) Determine what action to take
3) Execute the instruction
4) Access the cache if needed
5) Write the result to a register
Example: MOV Reg1, Mem1
1) Fetch the instruction (MOV Reg1, Mem1)
2) Decode it as a move from memory to register
3) Obtain the address of the memory location to be moved
4) Fetch the data from memory
5) Write the data to Reg1

[Figure: pipeline timing diagram, instructions plotted against clock cycles 1 to 9, each instruction passing through FI, DI, EX, MEM, WB]
FI – fetch instruction
DI – decode instruction
EX – execute instruction
MEM – data cache access
WB – write back
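A minimal sketch of how such a timing diagram can be generated, assuming an ideal pipeline with no stalls in which one new instruction enters per clock cycle. The names timing_table and render are illustrative and do not come from the slides.

```python
STAGES = ["FI", "DI", "EX", "MEM", "WB"]

def timing_table(n_instructions, stages=STAGES):
    """Rows of stage names per clock cycle for an ideal, stall-free pipeline.

    Assumes n_instructions >= 1 and one instruction issued per cycle.
    """
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i occupies stage s during cycle i + s (0-based).
            row[i + s] = name
        rows.append(row)
    return rows

def render(rows):
    """Format the table as text, with clock cycles numbered from 1."""
    header = "".join(f"{c:>5}" for c in range(1, len(rows[0]) + 1))
    lines = ["      " + header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"I{i:<5}" + "".join(f"{cell:>5}" for cell in row))
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(timing_table(5)))
```

Under these assumptions, five instructions on five stages finish in 5 + 5 - 1 = 9 cycles, which matches the 1 to 9 cycle axis of the diagram.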

Pipeline Delay
(a) No data load delay in the pipeline
[Figure: each instruction passes through FI, DI, EX, MEM, WB]
1) MOV Reg1, Mem1: move data from Mem1 to Reg1
2) MOV Reg1, Reg2: move data from Reg2 to Reg1
3) MOV Mem2, Reg1: move data from Reg1 to Mem2

Pipeline Delay
(b) Data dependency delay
[Figure: each instruction passes through FI, DI, EX, MEM, WB, with a stall (bubble) before the second instruction]
1) MOV Reg1,Mem1: move data from Mem1 to Reg1
2) MOV Reg2,(Reg1): move data from Reg1 to Reg2
The first instruction writes the data from Mem1 into Reg1. The second instruction must wait for that data to be loaded into Reg1, so the pipeline stalls (a bubble).

Pipeline Delay
Add a NOP (no operation) to fill the gap
[Figure: each instruction passes through FI, DI, EX, MEM, WB]
1) MOV Reg1,Mem1: move data from Mem1 to Reg1
2) NOP: no operation performed
3) MOV Reg2,(Reg1): move data from Reg1 to Reg2
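The NOP insertion can be sketched in a few lines. This is an illustration only: it assumes a single load delay slot (one NOP is enough) and that only a load followed immediately by a reader of the loaded register needs padding. The Instr record and the insert_nops function are hypothetical names, not part of any real toolchain.

```python
from collections import namedtuple

# Hypothetical instruction record: dest is the register written, srcs the
# registers read, is_load marks a memory-to-register move.
Instr = namedtuple("Instr", "text dest srcs is_load")
NOP = Instr("NOP", None, (), False)

def insert_nops(program):
    """Insert one NOP after a load whose result the very next instruction reads."""
    out = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev.is_load and prev.dest in instr.srcs:
            out.append(NOP)  # fill the load delay slot
        out.append(instr)
    return out

# Example (b): the second instruction reads Reg1 right after it is loaded.
dependent = [
    Instr("MOV Reg1,Mem1", "Reg1", (), True),
    Instr("MOV Reg2,(Reg1)", "Reg2", ("Reg1",), False),
]

# Example (a): no instruction reads a register straight after it is loaded.
independent = [
    Instr("MOV Reg1, Mem1", "Reg1", (), True),
    Instr("MOV Reg1, Reg2", "Reg1", ("Reg2",), False),
    Instr("MOV Mem2, Reg1", None, ("Reg1",), False),
]

if __name__ == "__main__":
    print([i.text for i in insert_nops(dependent)])
    print([i.text for i in insert_nops(independent)])
```

The first program gains a NOP between its two moves, and the second is returned unchanged, which mirrors cases (b) and (a).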

(c) Control dependency delay
[Figure: each instruction passes through FI, DI, EX, MEM, WB]
101 ADD Reg3, Reg2, Reg1: add Reg2 to Reg1 and put the result in Reg3
102 NOP: no operation performed
103 BEQ Reg3, Reg4, 106: if Reg3 = Reg4, jump to 106, else continue to 104
104 MOV Mem1, Reg3: move Reg3 to Mem1
105 ADD Reg4, Reg1, Reg2: add Reg2 to Reg1 and put the result in Reg4
106 MOV Mem1, Reg4: move Reg4 to Mem1
Data dependency delay: at this point Reg3 equals Reg2 + Reg1, and line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
Control dependency delay: the pipeline waits for 103 to decide between going on to 104 and jumping to 106. Here Reg3 = Reg4, so it jumps to 106.

(c) Control dependency delay: guess that the branch will happen
[Figure: each instruction passes through FI, DI, EX, MEM, WB]
101 ADD Reg3, Reg2, Reg1: add Reg2 to Reg1 and put the result in Reg3
102 NOP: no operation performed
103 BEQ Reg3, Reg4, 106: if Reg3 = Reg4, jump to 106, else continue to 104
104 MOV Mem1, Reg3: move Reg3 to Mem1
105 ADD Reg4, Reg1, Reg2: add Reg2 to Reg1 and put the result in Reg4
106 MOV Mem1, Reg4: move Reg4 to Mem1
Data dependency delay: at this point Reg3 equals Reg2 + Reg1, and line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
Correct guess: Reg3 = Reg4, so the branch jumps to 106 and no time is wasted.

(c) Control dependency delay: wrong guess
[Figure: each instruction passes through FI, DI, EX, MEM, WB]
101 ADD Reg3, Reg2, Reg1
102 NOP
103 BEQ Reg3, Reg4, 106
104 MOV Mem1, Reg3
105 ADD Reg4, Reg1, Reg2
106 MOV Mem1, Reg4
107 MOV Reg2, Mem2
Data dependency delay: at this point Reg3 equals Reg2 + Reg1, and line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
Wrong guess: Reg3 is not equal to Reg4, so the pipeline is cleared and 104 is fetched next. A wrong guess can lead to wasted time.
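The effect of a right or wrong guess can be sketched as a fetch trace. This is a simplified model, not the Arm9's actual behavior: it assumes the branch outcome is known one fetch slot after the branch is fetched, so exactly one speculatively fetched instruction is at risk. The function fetch_trace and its parameters are hypothetical.

```python
def fetch_trace(predict_taken, actually_taken,
                first=101, branch=103, target=106, last=107):
    """List (address, status) pairs in fetch order for the example program.

    Assumption: one speculative fetch happens before the branch resolves.
    A speculative fetch on the wrong path is marked "squashed".
    """
    trace = []
    pc = first
    while pc <= last:
        trace.append((pc, "ok"))
        if pc == branch:
            guess = target if predict_taken else pc + 1
            actual = target if actually_taken else pc + 1
            if guess != actual:
                trace.append((guess, "squashed"))  # wasted fetch slot
            pc = actual
        else:
            pc += 1
    return trace

if __name__ == "__main__":
    print(fetch_trace(predict_taken=True, actually_taken=True))
    print(fetch_trace(predict_taken=True, actually_taken=False))
```

With a correct guess the trace goes 101, 102, 103, 106, 107 with nothing squashed. With a wrong guess, 106 is fetched once speculatively and thrown away before 104 is fetched.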

Pure RISC Pipeline
Simple primitive instructions and addressing modes
Instructions execute in one clock cycle
Uniform-length instructions and a fixed instruction format
Instructions interface with memory via fixed mechanisms (load/store)
Pipelining
Instruction set is orthogonal (little overlap in instruction functionality)
Hardwired control
Complexity pushed to the compiler

Pure RISC Pipeline
Register-to-register cycle
1) F: instruction fetch
2) E: execute, performs an ALU operation with register input and output
Load and store cycle
1) F: instruction fetch
2) E: execute, calculates the memory address
3) W: memory, register-to-memory or memory-to-register operation

Pure RISC Pipeline
a) Traditional pipeline
[Figure: instructions plotted against clock cycles 1 to 7, stages F, E, W]
100 MOVE Reg1, Mem1: move Mem1 to Reg1
101 ADD 1, Reg1: add 1 to Reg1
102 JUMP 105: jump to 105
103 ADD Reg1, Reg2: add Reg1 to Reg2
105 MOVE Mem2, Reg1: move Reg1 to Mem2
When the jump executes, 103 is cleared from the pipeline and 105 is fetched.
F – fetch, E – execute, W – write back

Pure RISC Pipeline
b) RISC pipeline with inserted NOP
[Figure: instructions plotted against clock cycles 1 to 7, stages F, E, W]
100 MOVE Reg1, Mem1: move Mem1 to Reg1
101 ADD 1, Reg1: add 1 to Reg1
102 JUMP 105: jump to 105
103 NOP: no operation
105 MOVE Mem2, Reg1: move Reg1 to Mem2
A NOP is added so that no special circuitry is needed to clear the pipeline.
F – fetch, E – execute, W – write back

Pure RISC Pipeline
c) Reversed instructions
[Figure: instructions plotted against clock cycles 1 to 7, stages F, E, W]
100 MOVE Reg1, Mem1: move Mem1 to Reg1
101 JUMP 105: jump to 105
102 ADD 1, Reg1: add 1 to Reg1
105 MOVE Mem2, Reg1: move Reg1 to Mem2
Delayed branch: when a branch occurs, delay its execution and fetch the next instruction first. For example, fetch 102 before executing the JUMP to 105. This way 102 can execute at the same time that 105 is fetched.
F – fetch, E – execute, W – write back
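The three variants can be compared with a small model. This is a sketch under stated assumptions rather than a reproduction of the slide figures: one instruction is fetched per cycle, every instruction uses all three stages (F, E, W), and a jump takes effect one fetch slot late, so the instruction after it is always fetched. The run function and the program dictionaries are hypothetical, and only forward jumps are handled.

```python
STAGES = 3  # F, E, W

def run(program, flush_delay_slot):
    """Return (fetch list, total cycles) for a program with forward jumps only.

    program maps address -> (opcode, jump target or None).
    Each fetch list entry is (address, opcode, was_flushed).
    """
    addrs = sorted(program)
    next_addr = dict(zip(addrs, addrs[1:]))
    fetched = []
    pc = addrs[0]
    while pc is not None:
        op, target = program[pc]
        fetched.append((pc, op, False))
        if op == "JUMP":
            slot = next_addr.get(pc)
            if slot is not None:
                # The instruction after the jump is fetched before the jump
                # executes: flushed in a traditional pipeline, executed in a
                # delayed-branch pipeline.
                fetched.append((slot, program[slot][0], flush_delay_slot))
            pc = target
        else:
            pc = next_addr.get(pc)
    return fetched, len(fetched) + STAGES - 1

traditional = {100: ("MOVE", None), 101: ("ADD", None), 102: ("JUMP", 105),
               103: ("ADD", None), 105: ("MOVE", None)}
with_nop = {100: ("MOVE", None), 101: ("ADD", None), 102: ("JUMP", 105),
            103: ("NOP", None), 105: ("MOVE", None)}
reversed_order = {100: ("MOVE", None), 101: ("JUMP", 105),
                  102: ("ADD", None), 105: ("MOVE", None)}

if __name__ == "__main__":
    print(run(traditional, flush_delay_slot=True)[1])
    print(run(with_nop, flush_delay_slot=False)[1])
    print(run(reversed_order, flush_delay_slot=False)[1])
```

Under these assumptions the traditional and NOP variants both take 7 cycles, while the reversed order takes 6, because the slot after the jump does useful work instead of holding a flushed instruction or a NOP.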

Superpipeline
[Figure: pipeline with stages A B C D E F G H I J K L; the branch is executed and the pipeline is cleared]
In theory, more and shorter stages could allow more instructions to be processed at the same time. But a branch could lead to wasted cycles.
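A rough worked example of this trade-off follows. All of it rests on assumptions that are not in the slides: the total work per instruction is fixed and split evenly across the stages, latch overhead is ignored, and each flushed branch throws away stages - 1 cycles of work (the branch is resolved in the last stage). The function total_time and the numbers fed to it are made up for illustration.

```python
def total_time(n_instructions, stages, work_per_instruction, flushed_branches):
    """Total run time for a simple pipeline model (arbitrary time units).

    Assumptions: cycle time = work_per_instruction / stages, and every
    flushed branch costs stages - 1 wasted cycles.
    """
    cycle_time = work_per_instruction / stages
    cycles = n_instructions + stages - 1 + flushed_branches * (stages - 1)
    return cycles * cycle_time

def flush_share(n_instructions, stages, flushed_branches):
    """Fraction of all cycles spent on flushed work in the same model."""
    wasted = flushed_branches * (stages - 1)
    return wasted / (n_instructions + stages - 1 + wasted)

if __name__ == "__main__":
    for k in (5, 12):
        print(k, total_time(100, k, 60, 0), total_time(100, k, 60, 10),
              round(flush_share(100, k, 10), 2))
```

In this model the 12-stage pipeline is still faster overall, but with 10 flushed branches in 100 instructions about half of its cycles are wasted, compared with a bit over a quarter for the 5-stage pipeline.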

Arm11 Pipeline
Arm11 (iPhone 3G): 8-stage pipeline
Fetch Instruction, Decode, Execute, Memory, Writeback

Arm Cortex A8 (iPhone 3GS, Samsung Galaxy S): 13-stage pipeline
Fetch Instruction (2 stages)
Decode (5 stages)
Execute, Memory, Writeback (6 stages)
Dynamic branch prediction, 95% accuracy
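To get a feel for what a 95% prediction accuracy buys, here is a back-of-the-envelope sketch using the common textbook estimate of cycles per instruction (CPI). Only the 95% figure comes from the slide. The branch frequency and the misprediction penalty are invented inputs, not Cortex A8 data, and effective_cpi is a hypothetical helper.

```python
def effective_cpi(branch_fraction, accuracy, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction including branch misprediction stalls.

    branch_fraction: share of instructions that are branches (assumed)
    accuracy:        fraction of branches predicted correctly
    penalty_cycles:  cycles lost per misprediction (assumed)
    """
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty_cycles

if __name__ == "__main__":
    # Assumed: 20% branches and a 10-cycle penalty, purely for illustration.
    print(effective_cpi(0.2, 0.95, 10))
    print(effective_cpi(0.2, 0.50, 10))
```

With these made-up inputs, 95% accuracy gives a CPI of about 1.1, while a coin-flip predictor at 50% gives about 2.0, which is why accurate prediction matters more as pipelines get deeper.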

i7 (Nehalem) Superpipeline
14 stages: Fetch, Decode, Execute, Memory, Writeback

References
http://www.jp.arm.com/event/pdf/forum2008/t1-1.pdf
http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/pipelining/index.html
http://www.bit-tech.net/hardware/cpus/2008/11/03/intel-core-i7-nehalem-architecture-dive/5
http://qu.academia.edu/AwsYousif/Papers/120709/A_New_Trend_for_CISC_and_RISC_Architectures
Course textbook: Computer Organization and Architecture, 7th edition, William Stallings