COMP25212 Further Pipeline Issues

Presentation transcript:

COMP25212 Further Pipeline Issues

Cray 1: Designed in 1976. Cost $8,800,000. 8 MB main memory. Max performance 160 MFLOPS. Weight 5.5 tons. Power 115 kW (250 kW including storage and cooling).


More Pipeline Detail
[Figure: five-stage pipeline datapath (IF, ID, EX, MEM, WB) showing the PC, Instruction Cache, Register Bank, MUX, ALU and Data Cache.]

Data Hazards
The pipeline can cause other problems. Consider:
    ADD R1,R2,R3
    MUL R0,R1,R1
The ADD instruction produces a value in R1. The following MUL instruction uses R1 as input.
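The dependency between the two instructions above can be checked mechanically. The following is a minimal sketch in Python, not part of the original slides: the `Instr` record and the `raw_hazards` helper are hypothetical names introduced only for illustration, and the check only looks at adjacent instructions.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def raw_hazards(program):
    """Return (producer_index, consumer_index, register) for every pair of
    adjacent instructions where the second reads what the first writes."""
    hazards = []
    for i in range(len(program) - 1):
        producer, consumer = program[i], program[i + 1]
        if producer.dest in consumer.srcs:
            hazards.append((i, i + 1, producer.dest))
    return hazards


program = [
    Instr("ADD", "R1", ("R2", "R3")),  # ADD R1,R2,R3
    Instr("MUL", "R0", ("R1", "R1")),  # MUL R0,R1,R1
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```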

Instructions in the Pipeline
[Figure: the same five-stage datapath (IF, ID, EX, MEM, WB) with ADD R1,R2,R3 and MUL R0,R1,R1 in the pipeline.]

The Data isn’t Ready
At the end of its ID cycle, the MUL instruction should have selected the value in R1 to put into the buffer at the input to the EX stage. But at this time the correct value for R1 from the ADD instruction is only just being put into the buffer at the output of the EX stage. It won’t get to the input of the Register Bank until one cycle later, and then it probably needs another cycle to be written into R1.
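The timing problem can be made concrete by numbering the cycles. This is a small sketch under the assumption that each stage takes exactly one cycle and that ADD is issued in cycle 1 with MUL immediately behind it; `stage_cycles` is a hypothetical helper, not something from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycles(issue_cycle):
    """Map each stage name to the cycle in which an instruction occupies it,
    assuming one cycle per stage and no stalls."""
    return {stage: issue_cycle + k for k, stage in enumerate(STAGES)}


add = stage_cycles(1)  # ADD R1,R2,R3 enters IF in cycle 1
mul = stage_cycles(2)  # MUL R0,R1,R1 enters IF in cycle 2

# MUL reads R1 in its ID cycle, the same cycle in which ADD is still in EX,
# while ADD does not reach WB until two cycles after that.
print(add["EX"], mul["ID"], add["WB"])  # 3 3 5
```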

Insert Delays?
One solution is to detect such data dependencies in hardware and hold the instruction in the decode stage until the data is ready, which means ‘bubbles’ and wasted cycles again. Another is to use the compiler to try to reorder instructions. This only works if we can find something useful to do; otherwise the compiler has to insert NOPs, which is wasteful.
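How many bubbles (or NOPs) are needed depends on details the slides leave open. The sketch below assumes a five-stage pipeline with no forwarding, with the result written in WB; the flag `read_after_write_same_cycle` is a hypothetical parameter that models a register bank able to write and then read within one cycle. The function name is also an assumption made for illustration.

```python
ID_OFFSET = 1  # cycles after issue at which an instruction reads registers
WB_OFFSET = 4  # cycles after issue at which an instruction writes its result


def stalls_without_forwarding(distance, read_after_write_same_cycle=False):
    """Bubbles needed when the consumer is issued `distance` instructions
    after the producer and there are no forwarding paths."""
    # Earliest cycle (relative to the producer's issue) in which the
    # consumer may safely read the register bank.
    earliest_read = WB_OFFSET if read_after_write_same_cycle else WB_OFFSET + 1
    # Cycle in which the consumer would read if it were never delayed.
    natural_read = distance + ID_OFFSET
    return max(0, earliest_read - natural_read)


print(stalls_without_forwarding(1))  # 3
print(stalls_without_forwarding(1, read_after_write_same_cycle=True))  # 2
print(stalls_without_forwarding(4))  # 0
```

Under these assumptions the ADD/MUL pair costs two or three wasted cycles, which is the number of independent instructions a compiler would have to find, or the number of NOPs it would have to insert.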

Forwarding
[Figure: datapath (PC, Instruction Cache, Register Bank, MUX, ALU, Data Cache) with ADD R1,R2,R3 and MUL R0,R1,R1 in the pipeline.]
We can add extra paths for specific cases. Control becomes more complex.
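The effect of such an extra path can be sketched as an operand-selection step in front of the ALU. This is an illustrative model only: `select_operand`, the dictionary register bank and the `(dest, value)` tuple standing in for the buffer at the output of EX are all assumptions, and the register contents are made-up values.

```python
def select_operand(reg, register_bank, ex_output_buffer):
    """Choose the ALU input for `reg`.

    ex_output_buffer is a (dest_register, value) pair left by the previous
    instruction, or None. If it holds the register we want, take the value
    from the forwarding path instead of the (stale) register bank.
    """
    if ex_output_buffer is not None and ex_output_buffer[0] == reg:
        return ex_output_buffer[1]
    return register_bank[reg]


# Illustrative register contents; R1 still holds an old value.
regs = {"R0": 0, "R1": 99, "R2": 4, "R3": 6}

# ADD R1,R2,R3 has just left EX: its result is in the buffer, not yet in R1.
ex_buffer = ("R1", regs["R2"] + regs["R3"])

# MUL R0,R1,R1 with forwarding picks up the new value of R1.
forwarded = select_operand("R1", regs, ex_buffer) * select_operand("R1", regs, ex_buffer)
stale = regs["R1"] * regs["R1"]
print(forwarded, stale)  # 100 9801
```

The comparison inside `select_operand` is the extra control logic the slide refers to: every operand read now needs a register-number check against instructions further down the pipeline.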

Why did it Occur?
Due to the design of our pipeline. In this case, the result we want is ready one stage ahead of where it was needed, so why pass it down the pipeline? But what if we have the sequence:
    LDR R1,[R2,R3]
    MUL R0,R1,R1
The LDR instruction means load R1 from memory address R2+R3.

Pipeline Sequence for LDR
– Fetch
– Decode and read registers (R2 & R3)
– Execute: add R2+R3 to form the address
– Memory access: read from the address
– Now we can write the value into register R1
We have designed the ‘worst case’ pipeline to work for all instructions.

Forwarding
[Figure: datapath (PC, Instruction Cache, Register Bank, MUX, ALU, Data Cache) with LDR R1,[R2,R3], a NOP and MUL R0,R1,R1 in the pipeline.]
We can add extra paths for specific cases. Control becomes more complex.
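The NOP between LDR and MUL follows from when each kind of result becomes available. A minimal sketch, assuming full forwarding into the EX stage, an ALU result ready at the end of EX and a loaded value ready at the end of MEM; the names `RESULT_READY` and `stalls_with_forwarding` are hypothetical.

```python
EX_OFFSET = 2  # cycles after issue at which an instruction is in EX

# Stage offset at the end of which the producer's result exists
# (assumed: ALU results after EX, loaded values after MEM).
RESULT_READY = {"ALU": 2, "LOAD": 3}


def stalls_with_forwarding(producer_kind, distance=1):
    """Bubbles still needed, even with forwarding, when the consumer is
    issued `distance` instructions after the producer."""
    ready = RESULT_READY[producer_kind]
    # The consumer's EX stage must start in a cycle after the result exists.
    natural_ex = distance + EX_OFFSET
    return max(0, ready + 1 - natural_ex)


print(stalls_with_forwarding("ALU"))   # 0  (ADD then MUL)
print(stalls_with_forwarding("LOAD"))  # 1  (LDR then MUL)
print(stalls_with_forwarding("LOAD", distance=2))  # 0
```

Under these assumptions forwarding removes the delay entirely for the ADD/MUL pair but leaves one unavoidable bubble after a load, unless an independent instruction can be placed in that slot.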

Longer Pipelines
As mentioned previously, we can go to longer pipelines:
– Do less per pipeline stage
– Each step takes less time
– So we can increase the clock frequency
– But there is a greater penalty for hazards
– More complex control
Negative returns?
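The trade-off in the list above can be explored with a toy model. Everything in this sketch is an assumption chosen for illustration, not a figure from the course: a fixed total logic delay split evenly across the stages, a fixed per-stage latch overhead, a fixed fraction of instructions that hit a hazard, and a hazard penalty proportional to pipeline depth.

```python
def time_per_instruction(stages, logic_delay_ns=10.0, latch_overhead_ns=0.2,
                         hazard_rate=0.2, penalty_fraction=0.5):
    """Average time per instruction (ns) in a toy pipeline model.

    All default parameter values are illustrative assumptions.
    """
    # Less work per stage gives a shorter cycle, plus a fixed latch overhead.
    cycle_time = logic_delay_ns / stages + latch_overhead_ns
    # Assume a hazard costs a number of cycles proportional to the depth.
    penalty_cycles = penalty_fraction * stages
    cpi = 1.0 + hazard_rate * penalty_cycles
    return cycle_time * cpi


for n in (5, 10, 20, 40):
    print(n, round(time_per_instruction(n), 2))
```

With these particular numbers the time per instruction improves up to a point and then gets worse as the pipeline grows deeper, which is one way to read the "negative returns" question; different assumed parameters move the crossover point.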

Where Next?
Despite these difficulties, it is possible to build processors which approach 1 cycle per instruction (CPI). Given that the computational model is one of serial instruction execution, can we do any better than this?
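As a rough illustration of what "approaching 1 CPI" means, the sketch below counts cycles for a run of instructions on a five-stage pipeline. It assumes one instruction is issued per cycle apart from stall cycles; the function name `cpi` and the example counts are hypothetical.

```python
def cpi(n_instructions, stall_cycles=0, stages=5):
    """Cycles per instruction for a simple in-order pipeline:
    (stages - 1) cycles to fill, one cycle per instruction, plus stalls."""
    total_cycles = (stages - 1) + n_instructions + stall_cycles
    return total_cycles / n_instructions


print(cpi(1000))                    # 1.004
print(cpi(1000, stall_cycles=100))  # 1.104
```

The fill cost becomes negligible for long instruction streams, so the remaining gap above 1 comes from hazards; going below 1 would require completing more than one instruction per cycle.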