Review CSE 4711 Computer Design and Organization Architecture = Design + Organization + Performance Topics in this class: –Central processing unit: deeply.

Slides:

Advertisements

Similar presentations

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.

Advertisements

Advanced Computer Architectures Laboratory on DLX Pipelining Vittorio Zaccaria.

Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

COMP 4211 Seminar Presentation Based On: Computer Architecture A Quantitative Approach by Hennessey and Patterson Presenter : Feri Danes.

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

Instruction-Level Parallelism (ILP)

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Chapter 12 Pipelining Strategies Performance Hazards.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer.

DLX Instruction Format

The Basics: Pipelining J. Nelson Amaral University of Alberta 1.

1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)

Appendix A Pipelining: Basic and Intermediate Concepts

Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

CIS429.S00: Lec12- 1 Miscellaneous pipelining topics Why pipelining is so hard: exception handling Advanced pipelining techniques: loop unrolling.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

1 Manchester Mark I, This was the second (the first was a small- scale prototype) machine built at Cambridge. A production version of this computer.

CSE378 Pipelining1 Pipelining Basic concept of assembly line –Split a job A into n sequential subjobs (A 1,A 2,…,A n ) with each A i taking approximately.

Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.

Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.

1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.

Spring 2003CSE P5481 Precise Interrupts Precise interrupts preserve the model that instructions execute in program-generated order, one at a time If an.

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

Lecture 16: Basic Pipelining

Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.

CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining.

ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Computer Organization

Computer Organization CS224

Lecture 15: Pipelining: Branching & Complications

Pipelining: Hazards Ver. Jan 14, 2014

Pipelining Wrapup Brief overview of the rest of chapter 3

Appendix C Pipeline implementation

Exceptions & Multi-cycle Operations

Computer Design and Organization

Pipelining: Advanced ILP

Morgan Kaufmann Publishers The Processor

Pipelining review.

Pipelining in more detail

CSE 586 Computer Architecture

Performance of computer systems

\course\cpeg323-05F\Topic6b-323

Pipelining Basic concept of assembly line

How to improve (decrease) CPI

Pipeline control unit (highly abstracted)

The Processor Lecture 3.6: Control Hazards

Control unit extension for data hazards

An Introduction to pipelining

The Processor Lecture 3.5: Data Hazards

Instruction Execution Cycle

Project Instruction Scheduler Assembler for DLX

Computer Design and Organization

Overview What are pipeline hazards? Types of hazards

Pipeline control unit (highly abstracted)

Pipeline Control unit (highly abstracted)

Control Hazards Branches (conditional, unconditional, call-return)

Pipelining Basic concept of assembly line

Performance of computer systems

Control unit extension for data hazards

Pipelining Basic concept of assembly line

Morgan Kaufmann Publishers The Processor

Control unit extension for data hazards

Throughput = #instructions per unit time (seconds/cycles etc.)

Interrupts and exceptions

Guest Lecturer: Justin Hsia

CMSC 611: Advanced Computer Architecture

Presentation transcript:

Review CSE 4711 Computer Design and Organization Architecture = Design + Organization + Performance Topics in this class: –Central processing unit: deeply pipelined, multiple instr. per cycle, exploitation of instruction level parallelism (in-order and out-of- order), support for speculation (branch prediction, spec. loads). –Memory hierarchy: multi-level cache hierarchy, includes hardware and software assists for enhanced performance –Multiprocessors: SMP’s and CMP’s –cache coherence and synchronization –Multithreading: Fine, coarse and SMT –Some “advanced” topic: current research in dept.

Review CSE 4712 Technological improvements CPU : –Annual rate of speed improvement is 35% before 1985 and 60% from 1985 until 2003 –Slightly faster than increase in number of transistors on-chip (Moore’s law) Memory: –Annual rate of speed improvement (decrease in latency) is < 10% –Density quadruples in 3 years. I/O : –Access time has improved by 30% in 10 years –Density improves by 50% every year

Review CSE 4713 Moore’s Law

Review CSE 4714 Evolution of Intel Microprocessor Speeds

Review CSE 4715 Power Dissipation

Review CSE 4716 Processor-Memory Performance Gap (there is a much nicer graph in H&P 4 th Ed Figure 5.2 page 289 although it assumes that the processor speed is still improving) x Memory latency decrease (10x over 8 years but densities have increased 100x over the same period) o x86 CPU speed (100x over 10 years) “Memory gap” “Memory wall” x x x x x x o o o o o 386 Pentium Pentium Pro Pentium III Pentium IV

Review CSE 4717 Performance evaluation basics Performance inversely proportional to execution time Elapsed time includes: user + system; I/O; memory accesses; CPU per se CPU execution time (for a given program): 3 factors –Number of instructions executed –Clock cycle time (or rate) –CPI: number of cycles per instruction (or its inverse IPC) CPU execution time = Instruction count * CPI * clock cycle time

Review CSE 4718 Components of the CPI CPI for single instruction issue with ideal pipeline = 1 Previous formula can be expanded to take into account classes of instructions –For example in RISC machines: branches, f.p., load-store. –For example in CISC machines: string instructions CPI =  CPI i * f i where f i is the frequency of instructions in class i We’ll talk about “contributions to the CPI” from, e.g,: –memory hierarchy –branch (misprediction) –hazards etc.

Review CSE 4719 Comparing and summarizing benchmark performance For execution times, use (weighted) arithmetic mean: Weighted Ex. Time =  Weight i * Time i For rates, use (weighted) harmonic mean: Weighted Rate = 1 /  (Weight i / Rate i ) As per Jim Smith (1988 – CACM) “Simply put, we consider one computer to be faster than another if it executes the same set of programs in less time” Common benchmark suites: SPEC for int and fp (SPEC92, SPEC95, SPEC2000, SPEC2006), SPECweb, SPECjava etc., Ogden benchmark (linked lists), multimedia etc.

Review CSE Computer design: Make the common case fast Amdahl’s law (speedup) Speedup = (performance with enhancement)/(performance base case) Or equivalently Speedup = (exec.time base case)/(exec.time with enhancement) Application to parallel processing –s fraction of program that is sequential –Speedup S is at most 1/s –That is if 20% of your program is sequential the maximum speedup with an infinite number of processors is at most 5

Review CSE Pipelining One instruction/result every cycle (ideal) –Not in practice because of hazards Increase throughput (wrt non-pipelined implementation) –Throughput = number of results/second Speed-up (over non-pipelined implementation) –In the ideal case, if n stages, the speed-up will be close to n. Can’t make n too large: physical limitations and load balancing between stages & hazards Might slightly increase the latency of individual instructions (pipeline overhead)

Review CSE Basic pipeline implementation Five stages: IF, ID, EXE, MEM, WB What are the resources needed and where –ALU’s, Registers, Multiplexers etc. What info. is to be passed between stages –Requires pipeline registers between stages: IF/ID, ID/EXE, EXE/MEM and MEM/WB –What is stored in these pipeline registers? Design of the control unit.

Review CSE Inst. mem. 4 PC ALU Data mem. Regs. sese 2 zero IF ID/RR EXE Mem WB IF/ID ID/EX EX/MEM MEM/WB (PC) (Rd) data control Five instructions in progress; one of each color

Review CSE Hazards Structural hazards –Resource conflict (mostly in multiple instruction issue machines; also for resources which are used for more than one cycle) Data dependencies –Most common RAW but also WAR and WAW in OOO execution Control hazards –Branches and other flow of control disruptions Consequence: stalls in the pipeline –Equivalently: insertion of bubbles or of no-ops

Review CSE Pipeline speed-up

Review CSE Example of structural hazard For single issue machine: common data and instruction memory (unified cache) –Pipeline stall every load-store instruction (control easy to implement) Better solutions –Separate I-cache and D-cache –Instruction buffers –Both + sophisticated instruction fetch unit! Will see more cases in multiple issue machines

Review CSE Data hazards Data dependencies between instructions that are in the pipe at the same time. For single pipeline in order issue: Read After Write hazard (RAW) AddR1, R2, R3#R1 is result register SubR4, R1,R2#conflict with R1 AddR3, R5, R1#conflict with R1 OrR6,R1,R2#conflict with R1 Add R5, R2, R1#R1 OK now (5 stage pipe)

Review CSE || | | | | | | | | | | ||| |||| ||||| Add R1, R2, R3 R1 available here Sub R4,R1,R2 R 1 needed here ADD R3,R5,R1 OR R6,R1,R2 Add R5,R1,R2|||||| OK IF ID EXE MEM WB OK if in ID stage one can write in 1 st part of cycle and read in 2 nd part

Review CSE Forwarding Result of ALU operation is known at end of EXE stage Forwarding between: –EXE/MEM pipeline register to ALUinput for instructions i and i+1 –MEM/WB pipeline register to ALUinput for instructions i and i+2 Note that if the same register has to be forwarded, forward the last one to be written –Forwarding through register file (write 1st half of cycle, read 2nd half of cycle) Need of a “forwarding box” in the Control Unit to check on conditions for forwarding Forwarding between load and store (memory copy)

Review CSE || | | | | | | | | | | ||| |||| ||||| Add R1, R2, R3 R1 available here Sub R4,R1,R2 R 1 needed here ADD R3,R5,R1 OR R6,R1,R2 Add R5,R1,R2|||||| OK w/o forwarding IF ID EXE MEM WB

Review CSE Other data hazards Write After Write (WAW). Can happen in –Pipelines with more than one write stage –More than one functional unit with different latencies (see later) Write After Read (WAR). Very rare –With VAX-like autoincrement addressing modes

Review CSE Forwarding cannot solve all conflicts For example, in a simple MIPS-like pipeline Lw R1, 0(R2)#Result at end of MEM stage SubR4, R1,R2#conflict with R1 AddR3, R5, R1#OK with forwarding OrR6,R1,R2# OK with forwarding

Review CSE || | | | | | | | | | | ||| |||| ||||| LW R1, 0(R2) R1 available here Sub R4,R1,R2 R 1 needed here ADD R3,R5,R1 OR R6,R1,R2 IF ID EXE MEM WB OK No way! OK

Review CSE ||| | | | | | | |||| ||||| LW R1, 0(R2) R1 available here Sub R4,R1,R2 R 1 needed here ADD R3,R5,R1 OR R6,R1,R2 IF ID EXE MEM WB | | ||||| | Insert a bubble

Review CSE Hazard detection unit If a Load (instruction i-1) is followed by instruction i that needs the result of the load, we need to stall the pipeline for one cycle, that is –instruction i-1 should progress normally –instruction i should not progress –no new instruction should be fetched Controlled by a “hazard detection box” within the Control unit; it should operate during the ID stage

Review CSE Inst. mem. 4 PC ALU Data mem. Regs. sese 2 zero IF ID/RR EXE Mem WB IF/ID ID/EX EX/MEM MEM/WB (PC) (Rd) data control Forwarding unit Control unit Stall unit

Review CSE Control Hazards Branches (conditional, unconditional, call-return) Interrupts: asynchronous event (e.g., I/O) –Occurrence of an interrupt checked at every cycle –If an interrupt has been raised, don’t fetch next instruction, flush the pipe, handle the interrupt Exceptions (e.g., arithmetic overflow, page fault etc.) –Program and data dependent (repeatable), hence “synchronous”

Review CSE Exceptions Occur “within” an instruction, for example: –During IF: page fault –During ID: illegal opcode –During EX: division by 0 –During MEM: page fault; protection violation Handling exceptions –A pipeline is restartable if the exception can be handled and the program restarted w/o affecting execution

Review CSE Precise exceptions If exception at instruction i then –Instructions i-1, i-2 etc complete normally (flush the pipe) –Instructions i+1, i+2 etc. already in the pipeline will be “no-oped” and will be restarted from scratch after the exception has been handled Handling precise exceptions: Basic idea –Force a trap instruction on the next IF –Turn off writes for all instructions i and following –When the target of the trap instruction receives control, it saves the PC of the instruction having the exception –After the exception has been handled, an instruction “return from trap” will restore the PC.

Review CSE Precise exceptions (cont’d) Relatively simple for integer pipeline –All current machines have precise exceptions for integer and load- store operations (page fault) More complex for f-p –Might be more lenient in treating exceptions –Special f-p formats for overflow and underflow etc.

Review CSE Integer pipeline (RISC) precise exceptions Recall that exceptions can occur in all stages but WB Exceptions must be treated in instruction order –Instruction i starts at time t –Exception in MEM stage at time t + 3 (treat it first) –Instruction i + 1 starts at time t +1 –Exception in IF stage at time t + 1 (occurs earlier but treat in 2nd)

Review CSE Treating exceptions in order Use pipeline registers –Status vector of possible exceptions carried on with the instruction. –Once an exception is posted, no writing (no change of state; easy in integer pipeline -- just prevent store in memory) –When an instruction leaves MEM stage, check for exception.