
ECE 232 Hardware Organization and Design
Lecture: Pipelining, Advanced Issues
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
Notes: Give qualifications of instructors.
DAP: teaching computer architecture at Berkeley since 1977; co-author of the textbook used in class; best known as one of the pioneers of RISC; author of an article on the future of microprocessors in SciAm, Sept 1995.
RY: took 152 as a student, TAed 152, instructor in 152; undergrad and grad work at Berkeley; joined NexGen to design fast 80x86 microprocessors; one of the architects of UltraSPARC, the fastest SPARC microprocessor, shipping this Fall.

Outline
° Interrupts, traps, faults
° MIPS clocking
° Software pipelining
° Loop unrolling
° Historical perspective
Notes: Credential: bring a computer die photo wafer. This can be a hidden slide; I just want to use it for my own planning. I have rearranged Culler's lecture slides slightly and added more slides. This covers everything he covers in his first lecture (and more) but may ... We will save the fun part, "Levels of Organization," for the end (so students can stay awake): I will show the internal structure of the SS10/20. Notes to Patterson: you may want to edit the slides in your section or add extra slides to tailor them to your needs.

The Big Picture: Where Are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
Today's Topics:
° Interrupts in pipeline processor
° Advanced issues
[Figure label: Pipelined datapath]
Notes: So where are we in the overall scheme of things? Well, we just finished designing the processor's datapath. Now I am going to show you how to design the control for the datapath. +1 = 7 min. (X:47)

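Recap: Pipelined Datapath with Data Stationary Control
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A and B latches, im, alu, S, D mem, m) with lw $2,20($5) in flight. Each stage carries its own control fields (op, rw, n). Stage labels: Operand Register Selects; ALU Op; PC <= PC + 4 + immed; MEM Op; Result Reg Select and Enable.]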

Details of "Data Stationary Control"
° The Main Control generates the control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in the Reg/Dec stage produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg and RegWr. The signals travel through the ID/Ex, Ex/Mem and Mem/Wr registers to the Exec, Mem and Wr stages. Pipeline registers shown: IF/ID, ID/Ex, Ex/Mem, Mem/Wr.]
Notes: The main control here is identical to the one in the single cycle processor. It generates all the control signals necessary for a given instruction during that instruction's Reg/Decode stage. All these control signals are saved in the ID/Exec pipeline register at the end of the Reg/Decode cycle. The control signals for the Exec stage (ALUSrc, ... etc.) come from the output of the ID/Exec register; that is, they are delayed ONE cycle from the cycle in which they are generated. The rest of the control signals, which are not used during the Exec stage, are passed down the pipeline and saved in the Exec/Mem register. The control signals for the Mem stage (MemWr, Branch) come from the output of the Exec/Mem register; that is, they are delayed two cycles from the cycle in which they are generated. Finally, the control signals for the Wr stage (MemtoReg & RegWr) come from the output of the Mem/Wr register: they are delayed three cycles from the cycle in which they are generated. +2 = 45 min. (Y:45)
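The delay pattern described on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the DECODE table, its two opcodes, and the function name simulate are assumptions, and only a subset of the slide's signals is modeled. Each pipeline register is a variable that is shifted once per clock.

```python
# Hypothetical decode table; signal names follow the slide, values are
# the usual single-cycle MIPS settings for lw and sw (an assumption here).
DECODE = {
    "lw":  {"ALUSrc": 1, "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1},
    "sw":  {"ALUSrc": 1, "MemWr": 1, "Branch": 0, "MemtoReg": 0, "RegWr": 0},
    "nop": {"ALUSrc": 0, "MemWr": 0, "Branch": 0, "MemtoReg": 0, "RegWr": 0},
}

EXEC_SIGNALS = ("ALUSrc",)            # consumed 1 cycle after decode
MEM_SIGNALS = ("MemWr", "Branch")     # consumed 2 cycles after decode
WR_SIGNALS = ("MemtoReg", "RegWr")    # consumed 3 cycles after decode


def simulate(program):
    """Return, for each cycle, the control signals seen by each stage."""
    nop = DECODE["nop"]
    id_ex = ex_mem = mem_wr = nop      # pipeline registers start empty
    trace = []
    for op in list(program) + ["nop"] * 3:   # 3 extra cycles to drain
        decoded = DECODE[op]                 # Main Control, Reg/Dec stage
        trace.append({
            "decode": op,
            "exec": {s: id_ex[s] for s in EXEC_SIGNALS},
            "mem": {s: ex_mem[s] for s in MEM_SIGNALS},
            "wr": {s: mem_wr[s] for s in WR_SIGNALS},
        })
        # Clock edge: every control word moves one register down the pipe.
        mem_wr, ex_mem, id_ex = ex_mem, id_ex, decoded
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(simulate(["lw", "sw"])):
        print(cycle, row)
```

With the program lw, sw, the lw decoded in cycle 0 drives Exec in cycle 1, Mem in cycle 2 and Wr in cycle 3, which is the 1, 2, 3 cycle delay listed in the bullets.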

Pipeline Hazards Again
° Structural hazard
° Control hazard
° Data hazards
- RAW (read after write) Data Hazard
- WAW (write after write) Data Hazard
- WAR (write after read) Data Hazard
[Figure: pairs of overlapping pipeline diagrams, one pair per hazard. Stage sequences shown: I-Fetch DCD MemOpFetch OpFetch Exec Store; I-Fetch DCD OpFetch Jump; IF DCD EX Mem WB; IF DCD OF Ex Mem; IF DCD OF Ex RS.]

Data Hazards
° Avoid some "by design"
- eliminate WAR by always fetching operands early (DCD) in pipe
- eliminate WAW by doing all WBs in order (last stage, static)
° Detect and resolve remaining ones
- stall or forward (if possible)
[Figure: pipeline diagrams for the RAW Data Hazard (IF DCD EX Mem WB; IF DCD OF Ex Mem) and the WAW Data Hazard (IF DCD OF Ex RS; IF DCD EX Mem WB).]

Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
° A RAW hazard exists on register r if r ∈ Rregs(i) ∩ Wregs(j)
- Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction.
- When instruction issues, reserve its result register.
- When an operation completes, remove its write reservation.
° A WAW hazard exists on register r if r ∈ Wregs(i) ∩ Wregs(j)
° A WAR hazard exists on register r if r ∈ Wregs(i) ∩ Rregs(j)
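The three conditions on this slide are set intersections, so they can be written down almost literally. The sketch below is illustrative only: the dictionary layout with "reads" and "writes" keys and the function name hazards are assumptions made for the example.

```python
def hazards(i, j):
    """Classify hazards between instruction i (about to issue) and a
    predecessor j still in the pipe.

    Each instruction is a dict with 'reads' (Rregs) and 'writes' (Wregs)
    given as sets of register names. Returns the conflicting registers
    for each hazard type; an empty set means no hazard of that type.
    """
    return {
        "RAW": i["reads"] & j["writes"],   # Rregs(i) ∩ Wregs(j)
        "WAW": i["writes"] & j["writes"],  # Wregs(i) ∩ Wregs(j)
        "WAR": i["writes"] & j["reads"],   # Wregs(i) ∩ Rregs(j)
    }


if __name__ == "__main__":
    add = {"reads": {"r2", "r3"}, "writes": {"r1"}}   # add r1,r2,r3
    sub = {"reads": {"r1", "r3"}, "writes": {"r4"}}   # sub r4,r1,r3
    print(hazards(sub, add))
```

For sub r4,r1,r3 issued behind add r1,r2,r3, only the RAW set is non-empty (it contains r1).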

Record of Pending Writes
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A and B latches, im, alu, S, D mem, m). The current operand registers (rs, rt) are compared against the pending writes (op, rw fields) held in the later stages.]
hazard <= ((rs == rwex) & regWex) OR
          ((rs == rwmem) & regWmem) OR
          ((rs == rwwb) & regWwb) OR
          ((rt == rwex) & regWex) OR
          ((rt == rwmem) & regWmem) OR
          ((rt == rwwb) & regWwb)
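The hazard equation on this slide translates directly into a boolean function. The sketch below is a plain transcription under stated assumptions: the function name raw_hazard is made up for the example, the parameters reuse the slide's signal names with the MEM-stage write enable spelled regWmem, and register numbers are plain integers.

```python
def raw_hazard(rs, rt, rwex, regWex, rwmem, regWmem, rwwb, regWwb):
    """Return True if either source register (rs, rt) of the current
    instruction matches a pending, enabled write in EX, MEM or WB."""
    return bool(
        ((rs == rwex) and regWex)
        or ((rs == rwmem) and regWmem)
        or ((rs == rwwb) and regWwb)
        or ((rt == rwex) and regWex)
        or ((rt == rwmem) and regWmem)
        or ((rt == rwwb) and regWwb)
    )


if __name__ == "__main__":
    # rs = 5 matches an enabled pending write in the MEM stage.
    print(raw_hazard(rs=5, rt=0, rwex=2, regWex=True,
                     rwmem=5, regWmem=True, rwwb=9, regWwb=False))
```

A match against a stage whose write enable is off does not count, which is why each comparison is gated by its regW signal.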

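Resolve RAW by forwarding
° Detect the nearest valid write to the operand register and forward it into the op latches, bypassing the remainder of the pipe
° Increase muxes to add paths from pipeline registers
° Data Forwarding = Data Bypassing
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A and B latches, im, alu, S, D mem, m) with a forward mux in front of the operand latches, fed from the op/rw fields and results of the later stages and selected by rs, rt.]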
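The "nearest valid write" rule is a priority selection, sketched below in Python. This is an illustration under assumptions: the function name forward_operand and the tuple layout are made up, and the sketch assumes every pending result is already available in its stage, which does not hold for a load that has not yet reached memory.

```python
def forward_operand(reg, regfile_value, stages):
    """Pick the value to latch for source register `reg`.

    `stages` lists pending writes nearest-first (EX, then MEM, then WB)
    as (rw, write_enable, value) tuples. The first enabled match wins,
    because it is the most recent write to that register. If nothing
    matches, the register file value is used.
    """
    for rw, write_enable, value in stages:
        if write_enable and rw == reg:
            return value
    return regfile_value


if __name__ == "__main__":
    pending = [(1, True, 10), (1, True, 20), (2, True, 30)]
    print(forward_operand(1, 0, pending))   # nearest write to r1
    print(forward_operand(3, 7, pending))   # no match, register file
```

Ordering the list nearest-first plays the role of the forward mux priority: an older pending write to the same register is ignored when a newer one exists.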

What about memory operations?
° If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
° What does delaying WB on arithmetic operations cost?
- cycles?
- hardware?
° What about data dependence on loads?
R1 <- R4 + R5
R2 <- Mem[ R2 + I ]
R3 <- R2 + R1
=> "Delayed Loads"
[Figure: instruction fields op Rd Ra Rb in successive stages; latches R and T; Rd to reg file.]

Compiler Avoiding Load Stalls:

What about Interrupts, Traps, Faults?
° External Interrupts:
- Allow pipeline to drain
- Load PC with interrupt address
° Faults (within instruction, restartable)
- Force trap instruction into IF
- Disable writes till trap hits WB
- Must save multiple PCs or PC + state
Refer to MIPS solution

Exception Handling
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A and B latches, im, alu, S, D mem, m) with lw $2,20($5) in flight, annotated stage by stage:]
° detect bad instruction address
° detect bad instruction
° detect overflow
° detect bad data address
° Allow exception to take effect

Exception Problem
° Exceptions/Interrupts: 5 instructions executing in 5 stage pipeline
- How to stop the pipeline?
- Restart?
- Who caused the interrupt?
Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic exception
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
° Load with data page fault, Add with instruction page fault?
° Solution 1: interrupt vector/instruction, check last stage
° Solution 2: interrupt ASAP, restart everything incomplete
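The load/add question on this slide can be made concrete with a small Python sketch. It assumes an ideal 5-stage pipe with no stalls, where instruction k enters IF in cycle k; the function names and the dictionary mapping instruction index to faulting stage are made up for the example. It compares the order in which faults are detected with the order in which Solution 1 (carry the status to the last stage) would take them.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def detection_cycle(instr, stage):
    """Cycle in which instruction `instr` (0 = oldest) is in `stage`,
    assuming one instruction enters IF per cycle and nothing stalls."""
    return instr + STAGES.index(stage)


def detection_order(faults):
    """Instructions sorted by when their fault is first detected."""
    return sorted(faults, key=lambda k: detection_cycle(k, faults[k]))


def taken_order(faults):
    """Instructions sorted by when they reach the last stage, which is
    where Solution 1 checks the exception status carried with each one."""
    return sorted(faults, key=lambda k: detection_cycle(k, "WB"))


if __name__ == "__main__":
    # Instruction 0: load with a data page fault (MEM stage).
    # Instruction 1: add with an instruction page fault (IF stage).
    faults = {0: "MEM", 1: "IF"}
    print(detection_order(faults), taken_order(faults))
```

The add's fault is detected in cycle 1, before the load's in cycle 3, yet checking at the last stage handles the load first, so exceptions are taken in program order.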

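Resolution: Freeze above & Bubble Below
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A and B latches, im, alu, S, D mem, m, Regs) with per-stage control fields (op, rw, rs, rt, n). The stages above the stalled instruction are marked "freeze"; a "bubble" is injected into the stage below.]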
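One clock step of "freeze above, bubble below" can be sketched as a list operation. This is a sketch under assumptions: the pipe is a Python list ordered from IF to WB, "above" is read as the earlier stages (as drawn in the figure), and the names step, BUBBLE and stall_at are invented for the example.

```python
BUBBLE = "nop"


def step(pipe, fetched, stall_at=None):
    """Advance a pipeline by one clock.

    `pipe` lists the instruction in each stage, first stage first
    (e.g. [IF, ID, EX, MEM, WB]). With no stall, everything shifts down
    one stage and `fetched` enters the first stage. With `stall_at` set
    to a stage index, that stage and all earlier ones hold their
    instruction (freeze), a bubble enters the next stage, and the later
    stages keep draining. `fetched` is not consumed during a stall.
    """
    if stall_at is None:
        return [fetched] + pipe[:-1]
    if not 0 <= stall_at < len(pipe) - 1:
        raise ValueError("stall_at must name a stage before the last one")
    frozen = pipe[:stall_at + 1]
    below = [BUBBLE] + pipe[stall_at + 1:-1]
    return frozen + below


if __name__ == "__main__":
    pipe = ["i4", "i3", "i2", "i1", "i0"]
    print(step(pipe, "i5"))              # normal advance
    print(step(pipe, "i5", stall_at=1))  # ID stalls: IF and ID freeze
```

In the stalled case the instructions in IF and ID stay put, EX receives a nop, and i2 and i1 move on toward WB.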

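FYI: MIPS R3000 clocking discipline
° 2-phase non-overlapping clocks (phi1, phi2)
° Pipeline stage is two (level sensitive) latches
[Figure: phi1 and phi2 waveforms; an edge-triggered register shown next to a pair of latches clocked by phi1 and phi2.]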

MIPS R3000 Instruction Pipeline
[Figure: stage chart. Stages: Inst Fetch; Decode / Reg. Read; ALU / E.A.; Memory; Write Reg. Activities shown under the stages: TLB, I-Cache, RF, Operation, E.A., TLB, D-Cache, WB. Resource usage row: TLB, I-cache, RF, ALU, D-Cache, WB.]
Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Data Hazard on r1
[Figure: pipeline diagram, time in clock cycles across, instruction order down. Stages: IF, ID/RF, EX, MEM, WB. Resources per instruction: Im, Reg, ALU, Dm, Reg.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
With MIPS R3000 pipeline, no need to forward from WB stage
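Why the WB bypass disappears can be checked with a little cycle arithmetic. The sketch below assumes the 5-stage pipe of this slide (register read in ID/RF, register write in WB, one instruction issued per cycle, no stalls); the function name needs_forwarding and the flag split_phase_regfile are made up for the example.

```python
def needs_forwarding(distance, split_phase_regfile=True):
    """Does a consumer issued `distance` instructions after the producer
    need a forwarded value instead of a register file read?

    Cycles are counted from the producer's IF (cycle 0), so the producer
    writes the register file in WB at cycle 4 and the consumer reads it
    in ID/RF at cycle distance + 1.
    """
    wb_cycle = 4
    read_cycle = distance + 1
    if read_cycle > wb_cycle:
        return False                      # value is already in the file
    if read_cycle == wb_cycle:
        # Write in phase 1, read in phase 2: the read sees the new value.
        return not split_phase_regfile
    return True                           # value not yet written back


if __name__ == "__main__":
    for name, d in (("sub", 1), ("and", 2), ("or", 3), ("xor", 4)):
        print(name, needs_forwarding(d))
```

Under these assumptions sub and and still need forwarding from the earlier pipeline registers, while or reads r1 in the same cycle that add writes it and needs no bypass when the write happens in phase 1 and the read in phase 2.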

MIPS R3000 Multicycle Operations
Ex: Multiply, Divide, Cache Miss
° Stall all stages above multicycle operation in the pipeline
° Drain (bubble) stages below it
° Use control word of local stage state to step through multicycle operation
[Figure: instruction fields op Rd Ra Rb and mul Rd Ra Rb in successive stages; latches A, B, R, T; Rd to reg file.]

Issues in Pipelined design
° Pipelining
  Limitation: Issue rate, FU stalls, FU depth
° Super-pipeline
- Issue one instruction per (fast) cycle
- ALU takes multiple cycles
  Limitation: Clock skew, FU stalls, FU depth
° Super-scalar
- Issue multiple scalar instructions per cycle
  Limitation: Hazard resolution
° VLIW ("EPIC")
- Each instruction specifies multiple scalar operations
- Compiler determines parallelism
  Limitation: Packing
° Vector operations
- Each instruction specifies series of identical operations
  Limitation: Applicability
[Figure: beside each technique, overlapping IF D Ex M W pipeline diagrams; the VLIW and vector diagrams show one IF D followed by several parallel Ex M W sequences.]

Historical Perspective
[Figure: timeline of architectural ideas, most recent first.]
° Today, early 90's: RISC Superscalars
° 80's: RISC pipelines (mips, sparc, ...)
° vector proc.; Load/Store ISA (cdc 6600, 7600, Cray-1, ...)
° Cache (ibm 360/85, ...)
° Dynamic Inst. Scheduling with extensive pipelining (ibm 360/91), 1966
° Virtual Memory (multics, ge-645, ibm 360/67, ...), 1967; TLB
° Inst. Pipelining, Inst. Buffering (Stretch, 100x ibm704), 1961
° Microprogramming
Other annotations in the figure: 80ns; 2Kb Ctrl. St; 4x16b bus; 960ns mem; 32KB cache; 60-160ns; 25x basic model; 60ns; hardwired; 8x16b bus; 780ns mem.

Technology Perspective
[Figure labels: 4 bit, 8 bit, 16 bit, 32 bit, 64 bit, Superscalar]

Partitioned Instruction Issue (simple Superscalar)
° Independent Int and FP issue to separate pipelines
[Figure: I-Cache; Inst Issue and Bypass; Int Reg and FP Reg; Operand / Result Busses; Int Unit, Load / Store Unit, FP Add, FP Mul; D-Cache.]
Single Issue Total Time = Int Time + FP Time
Max Speedup = Total Time / MAX(Int Time, FP Time)

Example: DAXPY
Basic Loop:            Cycles   Assumptions
load  Ra <- Ai         1
load  Ry <- Yi         1
fmult Rm <- Ra*Rx      1 + 6    6 cycle mult, 3 stage
fadd  Rs <- Rm+Ry      1 + 4    4 cycle add, 2 stage
store Ai <- Rs         1
inc   Yi               1
dec   i                1
inc   Ai               1
branch                 1
Total Single Issue Cycles: 19 (7 integer, 12 floating point)
Minimum with Dual Issue: 12
Potential Speedup: 1.6 !!!
Actual Cycles: 18
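The totals on this slide can be re-derived in a few lines of Python. The cycle counts are taken from the slide; treating loads, the store, the index updates and the branch as the "integer" side is an assumption, chosen because it reproduces the slide's split of 7 integer and 12 floating-point cycles. The variable and function names are made up for the example.

```python
# (instruction, issue side, issue cycles, extra latency cycles)
LOOP = [
    ("load  Ra <- Ai",    "int", 1, 0),
    ("load  Ry <- Yi",    "int", 1, 0),
    ("fmult Rm <- Ra*Rx", "fp",  1, 6),
    ("fadd  Rs <- Rm+Ry", "fp",  1, 4),
    ("store Ai <- Rs",    "int", 1, 0),
    ("inc   Yi",          "int", 1, 0),
    ("dec   i",           "int", 1, 0),
    ("inc   Ai",          "int", 1, 0),
    ("branch",            "int", 1, 0),
]


def side_time(side):
    """Total cycles charged to one issue side (int or fp)."""
    return sum(issue + extra for _, s, issue, extra in LOOP if s == side)


int_time = side_time("int")
fp_time = side_time("fp")
total_single_issue = int_time + fp_time          # Int Time + FP Time
min_dual_issue = max(int_time, fp_time)          # MAX(Int Time, FP Time)
potential_speedup = total_single_issue / min_dual_issue
actual_speedup = total_single_issue / 18         # slide: Actual Cycles 18

if __name__ == "__main__":
    print(int_time, fp_time, total_single_issue, min_dual_issue)
    print(round(potential_speedup, 2), round(actual_speedup, 2))
```

The potential speedup is 19/12, about 1.58, which the slide rounds to 1.6. With the actual count of 18 cycles the speedup is only 19/18, about 1.06, because the dependent fmult, fadd and store keep the two pipelines from overlapping.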

Unrolling

Software Pipelining

Multiple Pipes / Harder Superscalar
Issues:
° Reg. File ports
° Detecting Data Dependences
° Bypassing
° RAW Hazard
° WAR Hazard
° Multiple load/store ops?
° Branches
[Figure: two instruction registers (IR0, IR1) sharing a Register File; latches A, B, R, T; D$.]

Branch penalties in superscalar
Example: resolved in op-fetch stage, single exposed delay (a la MIPS, Sparc)
[Figure: two issue alignments of I-fetch, Branch and delay slot; one squashes 2 instructions, the other squashes 1.]

Summary
° Pipelines pass control information down the pipe just as data moves down pipe
° Forwarding/Stalls handled by local control
° Exceptions stop the pipeline
° MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
° More performance from deeper pipelines, parallelism