COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept.


©S.Maciulevičius
Instruction execution
A computer executes sequences of instructions I_1, I_2, I_3, ..., I_n. Every instruction I_i consists of several steps, or phases, which can be described as follows:
• F – instruction fetch
• D – instruction decoding
• O – operand fetch
• X – operation execution
• W – result storing
Of course, the partition can be different (it depends on the processor).

Sequential execution
In sequential execution, the (i+1)-th instruction starts only after the i-th instruction has finished:
    FDOXW FDOXW FD...
The phases have different durations.

Pipeline
Pipelined execution of instructions requires rhythmic functioning of the pipeline: all stages must take the same time, although the phases F, D, O, X, W have different durations t_F, t_D, t_O, t_X, t_W.
Duration of a stage (phase): $\max(t_F, t_D, t_O, t_X, t_W)$
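To make the effect of the stage duration concrete, here is a minimal Python sketch that compares sequential and pipelined execution time. It assumes an ideal pipeline without hazards in which a new instruction enters every stage period; the stage durations, the instruction count, and the function names are hypothetical values chosen for illustration, not taken from the slides.

```python
def sequential_time(stage_times, n):
    """Total time when each instruction finishes before the next one starts."""
    return n * sum(stage_times.values())


def pipelined_time(stage_times, n):
    """Total time when every stage is clocked at the slowest phase's duration.

    Assumes an ideal pipeline: no hazards, one new instruction per stage period.
    """
    stage_period = max(stage_times.values())
    k = len(stage_times)  # number of stages
    # The first instruction needs k periods; each later one adds one period.
    return (k + n - 1) * stage_period


# Hypothetical phase durations in nanoseconds (not from the slides).
stage_times = {"F": 2, "D": 1, "O": 2, "X": 3, "W": 1}
n = 100  # hypothetical number of instructions

seq = sequential_time(stage_times, n)
pipe = pipelined_time(stage_times, n)
speedup = seq / pipe
print(seq, pipe, round(speedup, 2))
```

With these example numbers the speedup stays below the number of stages, because every stage is stretched to the duration of the slowest phase.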

Pipeline
The execution of the (i+1)-th instruction then starts one step later than that of the i-th:
i)    FDOXW
i+1)   FDOXW
i+2)    FDOXW
i+3)     FDOXW
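The staggered start of successive instructions can be sketched in a few lines of Python. This is only an illustration of an ideal five-stage F, D, O, X, W pipeline without stalls; the function names are hypothetical.

```python
STAGES = "FDOXW"


def pipeline_chart(n_instructions, stages=STAGES):
    """Return one text row per instruction; row i is shifted right by i cycles."""
    return [" " * i + stages for i in range(n_instructions)]


def stage_in_cycle(i, cycle, stages=STAGES):
    """Stage occupied by instruction i in a given cycle (both 0-based), or None."""
    pos = cycle - i  # instruction i enters the pipeline in cycle i
    return stages[pos] if 0 <= pos < len(stages) else None


for label, row in zip(["i)", "i+1)", "i+2)", "i+3)"], pipeline_chart(4)):
    print(f"{label:<6}{row}")
```

Reading a column of the chart shows which phase each instruction is in during one cycle; `stage_in_cycle` answers the same question for a single instruction.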

Pipeline implementation
Pipelined execution of instructions requires correct transfer of information between the stages:
[Figure: Data passes through a chain of stage circuits, each followed by a latch; the latches are driven by a common Clock]
Of course, the latches (levels of memory cells) between the stages can be omitted; the pipeline design then becomes more complex, but the pipeline can run faster.

Example of pipeline
A 4-stage pipeline can look as in this picture:
[Figure: 4-stage pipeline executing ADD R3 R2 R1. Labels: PC, Address, Instruction, Register file, values of R1 and R2, ALU, Result, R3; stages OF, X, W; Clock]

PowerPC pipelines
[Figure: PowerPC pipelines]
• Instruction queue (from cache): IQ-7, IQ-6, IQ-5, IQ-4, IQ-3, IQ-2, IQ-1, IQ-0 (IU decode)
• IU: IU buffer, IU execute, Write
• FPU: FPU buffer, FPU decoding, FPU execute 1, FPU execute 2, Write
• BPU: BPU decode/execute, Write
• Load/Store

PowerPC pipeline – IU
[Figure: timing diagram for the instructions 0 add, 1 and, 2 mul, 3 cmp, 4 add. Rows: IQ-1, IQ-0 (decoding), IU buffer, IU execution, Writing. Legend: decoding, execution, writing, waiting in IQ, waiting in IU buffer]

PowerPC pipeline – IU
Decoding each IU instruction requires 1 cycle.
After decoding, the operation is executed in the integer pipeline.
The mul instruction requires 5 cycles for execution, so cmp cannot be executed in the 5th cycle; it goes to the IU buffer and stays there until the functional unit becomes free.
As a result, the add instruction stays in the decoding stage.

Pipeline hazards
In reality, a pipeline does not work as perfectly as depicted previously. There are typically three types of hazards:
• A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time.
• A data hazard refers to a situation where an instruction needs the result of a previous instruction as an operand.
• A control hazard occurs when the processor executes a branch or jump operation; the pipeline must be refilled from the target address.

Data hazards
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.
ADD R1, R2, R3    FDOXW
SUB R4, R5, R1     FDOXW
AND R6, R1, R7      FDOXW
OR  R8, R1, R9       FDOXW
XOR R10, R1, R11      FDOXW
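The dependences in this instruction sequence can be found mechanically. The following Python sketch lists every read-after-write pair; it assumes the first operand is the destination register (as in the add and sub examples of this lecture) and uses a hypothetical tuple encoding of the instructions. Whether a listed dependence actually becomes a hazard depends on the pipeline timing, which this sketch does not model.

```python
# Instruction sequence from the slide: (opcode, destination, source1, source2).
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R5", "R1"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    found = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for idx, (_op, dest, *sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                found.append((last_writer[reg], idx, reg))
        # Record the write only after checking the sources of this instruction.
        last_writer[dest] = idx
    return found


for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer][0]} (#{consumer}) reads {reg} written by "
          f"{program[producer][0]} (#{producer})")
```

All four instructions after ADD read R1, so each of them depends on the result of ADD.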

Data hazards
Suppose we have these two instructions:
add r1, r2, r3 ; r1 := r2 + r3
sub r4, r1, r5 ; r4 := r1 – r5
add:  FDOXW   (W writes r1)
sub:   FDOXW  (O reads r1)
Without special measures, sub would read r1 in its O phase before add has written it in its W phase.
A similar situation occurs in this case:
ld r1, a ; r1 := MEM[a]
add r4, r1, r5 ; r4 := r1 + r5

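Data hazards
Data hazards can be eliminated using:
Software tools:
– inserting NOOP instructions
– changing the order of instructions
Hardware tools:
– stalling the pipeline
– adding special data lines (bypassing)
[Figure: the first instruction passes F, D, O, X, W and writes r1 in W; the second passes F, D, then waits and reads r1 in O before continuing with X]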
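As an illustration of the software approach, the Python sketch below inserts NOOP instructions until every consumer is far enough behind its producer. It assumes a five-stage F, D, O, X, W pipeline without bypassing, in which a register written in W can be read in O only in a later cycle; under that assumption a consumer must issue at least 3 slots after its producer. The constant, the function name, and the tuple encoding are hypothetical.

```python
# Assumption: producer writes in W (5th stage), consumer reads in O (3rd stage),
# and the read must happen in a later cycle than the write.
MIN_DISTANCE = 3


def insert_noops(program, min_distance=MIN_DISTANCE):
    """Return a copy of `program` padded with NOOPs so that no source register
    is read fewer than `min_distance` slots after it was written."""
    out = []
    last_write_slot = {}  # register -> slot in `out` of its latest writer
    for op, dest, *sources in program:
        earliest = len(out)
        for reg in sources:
            if reg in last_write_slot:
                earliest = max(earliest, last_write_slot[reg] + min_distance)
        while len(out) < earliest:
            out.append(("NOOP",))
        last_write_slot[dest] = len(out)
        out.append((op, dest, *sources))
    return out


program = [
    ("add", "r1", "r2", "r3"),  # r1 := r2 + r3
    ("sub", "r4", "r1", "r5"),  # r4 := r1 - r5
]
for instr in insert_noops(program):
    print(" ".join(instr))
```

A compiler that reorders instructions would try to fill these slots with useful independent instructions instead of NOOPs; a design with bypassing would need a smaller distance.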

Data hazards - Bypassing
[Figure: bypassing datapath. Labels: Data bus, Main memory, Register file, Mux, ALU, Result buffer, Bypass for data load, Bypass for result]

Control hazards
Control hazards can cause a greater performance loss for a pipeline than data hazards.
When a branch is executed, it may or may not change the PC (program counter) to something other than its current value plus 4.
If a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not taken.

Control hazards
Branches and jumps
[Figure: pipeline diagram of a branch followed by the instructions i+1, i+2, i+3, i+4; the branch sets the PC in its X phase, and the following instructions are stalled]
After a branch is recognized, the pipeline is stalled until the branch target address is calculated.
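The cost of such stalls can be estimated with a simple model. The Python sketch below computes the average number of cycles per instruction when every branch stalls the pipeline for a fixed number of cycles; the branch fraction, the stall length, and the function names are hypothetical example values, and all other hazards are ignored.

```python
def effective_cpi(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average cycles per instruction when each branch adds `stall_cycles` stalls."""
    return base_cpi + branch_fraction * stall_cycles


def slowdown(branch_fraction, stall_cycles, base_cpi=1.0):
    """How many times slower the pipeline runs compared with the stall-free case."""
    return effective_cpi(branch_fraction, stall_cycles, base_cpi) / base_cpi


# Hypothetical workload: 20% branches, 3 stall cycles per branch.
cpi = effective_cpi(branch_fraction=0.2, stall_cycles=3)
print(round(cpi, 2), round(slowdown(0.2, 3), 2))
```

Even a moderate share of branches noticeably raises the average, which is why the stall should be shortened or avoided where possible.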

Control hazards
What can be done to reduce possible time losses?
• Find out as soon as possible whether the branch is taken
• Calculate the new value of the PC as soon as possible
Measures to reduce the delay:
• Using branch prediction
• Changing the instruction order
• Using multithreading
• Using buffers for storing unused instructions
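The slides do not say which prediction scheme is meant. As one common illustration, the Python sketch below implements a 2-bit saturating counter per branch address; the class name, the initial counter state, and the example branch address are assumptions made for this sketch.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address.

    States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
    """

    def __init__(self, initial_state=1):
        self.initial_state = initial_state  # assumption: start weakly not taken
        self.counters = {}  # branch address -> counter state 0..3

    def predict(self, pc):
        return self.counters.get(pc, self.initial_state) >= 2

    def update(self, pc, taken):
        state = self.counters.get(pc, self.initial_state)
        state = min(3, state + 1) if taken else max(0, state - 1)
        self.counters[pc] = state


def accuracy(outcomes, pc=0x40):
    """Fraction of correctly predicted outcomes for a single branch at address `pc`."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch that is taken 9 times and then falls through once.
loop_branch = [True] * 9 + [False]
print(accuracy(loop_branch))
```

For the loop branch in the example, only the first iteration and the final fall-through are mispredicted.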

Superpipelining
Superpipelining simply refers to pipelining that uses a longer pipeline (with more stages) than "regular" pipelining.
In theory, a design with more stages, each doing less work, can be scaled to a higher clock frequency.
However, this depends a lot on other design characteristics, and it isn't true by default that a processor claiming superpipelining is "better".

Superpipeline
The pipeline rhythm can also be achieved in another way: the phases F, X and W are each split into two stages, so that the stages F, D, O, X, W with durations t_F, t_D, t_O, t_X, t_W become
F1 F2 D O X1 X2 W1 W2
Duration of a stage (phase): $\max(t_X/2, t_D)$
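The shorter stage duration of the superpipeline can be checked numerically. The Python sketch below assumes that the phases F, X and W are each split into two equal halves and that latch overhead is ignored; the phase durations and the function name are hypothetical. With the example values the general maximum reduces to max(t_X/2, t_D), the expression given on the slide.

```python
def stage_duration(phase_times, split=()):
    """Stage duration when each phase named in `split` is divided into two equal halves."""
    return max(t / 2 if name in split else t for name, t in phase_times.items())


# Hypothetical phase durations in nanoseconds (not from the slides).
phase_times = {"F": 2, "D": 1.2, "O": 1, "X": 3, "W": 2}

base = stage_duration(phase_times)                         # ordinary 5-stage pipeline
sup = stage_duration(phase_times, split=("F", "X", "W"))   # 8-stage superpipeline
print(base, sup)
```

Halving the long phases lets the clock run faster, but each instruction now passes through eight stages, so hazards cost more cycles.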

Superpipeline
Such a superpipeline looks like this:
F1 F2 D  O  X1 X2 W1 W2
   F1 F2 D  O  X1 X2 W1 W2
      F1 F2 D  O  X1 X2 W1 W2
         F1 F2 D  O  X1 X2 W1 W2
            F1 F2 D  O  X1 X2 W1 W2
               F1 F2 D  O  X1 X2 W1 W2

Superpipeline in Pentium II
IFU1 IFU2 IFU3 ID1 ID2 RAT ROB DIS EX RET1 RET2
IFU – Instruction Fetch Unit
ID – Instruction Decode
RAT – Register Alias Table (register allocator)
ROB – Reorder Buffer
DIS – Dispatcher
EX – Execute Stage
RET – Retire Unit

Haswell pipeline
The Haswell pipeline is shown on the next two slides:
• First part of the pipeline: Front End
• Second part of the pipeline: Back End; this part is usually presented as the Haswell Execution Engine

[Figure: Haswell pipeline, Front End]

[Figure: Haswell pipeline, Back End (Haswell Execution Engine)]