Give qualifications of instructors: DAP

ECE 232 Hardware Organization and Design Lecture Pipelining Advanced issues
Give qualifications of instructors. DAP: teaching computer architecture at Berkeley since 1977; co-author of the textbook used in class; best known as one of the pioneers of RISC; currently author of an article on the future of microprocessors in SciAm, Sept 1995. RY: took 152 as a student, TAed 152, instructor in 152; undergrad and grad work at Berkeley; joined NextGen to design fast 80x86 microprocessors; one of the architects of UltraSPARC, the fastest SPARC microprocessor, shipping this Fall. Maciej Ciesielski

Outline
Interrupts, traps, faults
MIPS clocking
Software pipelining
Loop unrolling
Historical perspective
Credential: bring a computer die photo wafer. This can be a hidden slide. I just want to use this to do my own planning. I have rearranged Culler’s lecture slides slightly and added more slides. This covers everything he covers in his first lecture (and more) but may We will save the fun part, “Levels of Organization,” for the end (so students can stay awake): I will show the internal structure of the SS10/20. Notes to Patterson: You may want to edit the slides in your section or add extra slides to tailor them to your needs.

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Control and Datapath (together, the Processor), Memory, Input, Output
Today’s Topics: Interrupts in pipeline processor; Advanced issues
So where are we in the overall scheme of things? Well, we just finished designing the processor’s datapath. Now I am going to show you how to design the control for the datapath. +1 = 7 min. (X:47)
Pipelined datapath

Recap: Pipelined Datapath with Data Stationary Control
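[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, alu, S, D mem, m) executing lw $2,20($5); each pipeline register carries im, op, rw, n. Control used per stage: Operand Register Selects; ALU Op; PC <= PC + immed; MEM Op; Result Reg Select and Enable]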

Details of “Data Stationary Control”
The Main Control generates the control signals during Reg/Dec
Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
Control signals for Mem (MemWr, Branch) are used 2 cycles later
Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr, which pass through the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers across the Reg/Dec, Exec, Mem, and Wr stages]
The main control here is identical to the one in the single cycle processor. It generates all the control signals necessary for a given instruction during that instruction’s Reg/Dec stage. All these control signals are saved in the ID/Ex pipeline register at the end of the Reg/Dec cycle. The control signals for the Exec stage (ALUSrc, etc.) come from the output of the ID/Ex register. That is, they are delayed ONE cycle from the cycle in which they are generated. The control signals that are not used during the Exec stage are passed down the pipeline and saved in the Ex/Mem register. The control signals for the Mem stage (MemWr, Branch) come from the output of the Ex/Mem register. That is, they are delayed two cycles from the cycle in which they are generated. Finally, the control signals for the Wr stage (MemtoReg and RegWr) come from the output of the Mem/Wr register: they are delayed three cycles from the cycle in which they are generated. +2 = 45 min. (Y:45)
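One way to picture data stationary control is as a control word that is computed once in Reg/Dec and then shifted through the pipeline registers alongside its instruction. The Python sketch below is only an illustration: the `Control` and `ControlPipe` classes, the field names, and the control values chosen for the load are assumptions made for this example, loosely modeled on the signal names on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """Control word produced by Main Control in Reg/Dec (illustrative fields)."""
    ext_op: int = 0      # Exec-stage signal
    alu_src: int = 0     # Exec-stage signal
    alu_op: str = "nop"  # assumed to belong to the Exec group
    reg_dst: int = 0     # assumed to belong to the Exec group
    mem_wr: int = 0      # Mem-stage signal
    branch: int = 0      # Mem-stage signal
    mem_to_reg: int = 0  # Wr-stage signal
    reg_wr: int = 0      # Wr-stage signal


BUBBLE = Control()  # all-zero control word: performs no writes


class ControlPipe:
    """Control words held in the ID/Ex, Ex/Mem and Mem/Wr pipeline registers."""

    def __init__(self):
        self.id_ex = BUBBLE
        self.ex_mem = BUBBLE
        self.mem_wr_reg = BUBBLE

    def clock(self, decoded):
        """Clock edge: each control word moves one pipeline register down."""
        self.mem_wr_reg, self.ex_mem, self.id_ex = self.ex_mem, self.id_ex, decoded


# Control word for a load such as lw $2,20($5) (values assumed for illustration)
lw = Control(ext_op=1, alu_src=1, alu_op="add", mem_to_reg=1, reg_wr=1)
pipe = ControlPipe()
pipe.clock(lw)      # lw is in Exec: alu_src is read from ID/Ex (1 cycle later)
pipe.clock(BUBBLE)  # lw is in Mem: mem_wr is read from Ex/Mem (2 cycles later)
pipe.clock(BUBBLE)  # lw is in Wr: reg_wr is read from Mem/Wr (3 cycles later)
```

Because each stage reads only the register directly above it, no stage needs to know which instruction it is executing beyond the control word it was handed.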

Pipeline Hazards Again
Structural hazard: I-Fetch DCD MemOpFetch OpFetch Exec Store, followed by IFetch DCD ° ° °
Control hazard: I-Fetch DCD OpFetch Jump, followed by IFetch DCD ° ° °
Data hazards: RAW (read after write), WAW (write after write), WAR (write after read)
[Figure: overlapped pipeline diagrams for each data hazard, with stage sequences IF DCD EX Mem WB, IF DCD OF Ex Mem, and IF DCD OF Ex RS]

Data Hazards
Avoid some “by design”:
eliminate WAR by always fetching operands early (DCD) in pipe
eliminate WAW by doing all WBs in order (last stage, static)
Detect and resolve remaining ones:
stall or forward (if possible)
[Figure: RAW and WAW data hazard pipeline diagrams, with stage sequences IF DCD EX Mem WB, IF DCD OF Ex Mem, and IF DCD OF Ex RS]

Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
A RAW hazard exists on register r if r ∈ Rregs(i) ∩ Wregs(j)
Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction. When an instruction issues, reserve its result register. When an operation completes, remove its write reservation.
A WAW hazard exists on register r if r ∈ Wregs(i) ∩ Wregs(j)
A WAR hazard exists on register r if r ∈ Wregs(i) ∩ Rregs(j)
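The three set conditions translate directly into set intersections. The following Python sketch is a minimal illustration, assuming each instruction is described only by the registers it reads and writes; the function name `hazards` and the register-name strings are choices made for this example.

```python
def hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify hazards between issuing instruction i and an earlier,
    still in-flight instruction j, using the Rregs/Wregs definitions."""
    i_reads, i_writes = set(i_reads), set(i_writes)
    j_reads, j_writes = set(j_reads), set(j_writes)
    return {
        "RAW": i_reads & j_writes,   # i reads what j has yet to write
        "WAW": i_writes & j_writes,  # both write the same register
        "WAR": i_writes & j_reads,   # i writes what j has yet to read
    }


# j: add r1,r2,r3   followed by   i: sub r4,r1,r3
result = hazards(i_reads={"r1", "r3"}, i_writes={"r4"},
                 j_reads={"r2", "r3"}, j_writes={"r1"})
print(result["RAW"])
```

Here only the RAW set is non-empty (it contains `r1`), which is the case the pending-write record has to catch.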

Record of Pending Writes
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, alu, S, D mem, m); rs and rt of the instruction in decode are the current operand registers, and the rw fields carried in the later pipeline registers are the pending writes]
hazard <= ((rs == rwex) & regWex) OR
          ((rs == rwmem) & regWme) OR
          ((rs == rwwb) & regWwb) OR
          ((rt == rwex) & regWex) OR
          ((rt == rwmem) & regWme) OR
          ((rt == rwwb) & regWwb)
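The hazard equation is six comparators ORed together. A compact way to express the same logic in Python, assuming the pending writes are supplied as `(rw, regW)` pairs for the EX, MEM, and WB stages (the function name and argument layout are assumptions for this sketch):

```python
def raw_hazard(rs, rt, pending):
    """Return True if rs or rt matches a pending, enabled register write.

    pending: iterable of (rw, reg_write_enable) pairs, one per later stage
             (EX, MEM, WB), mirroring rwex/regWex, rwmem/regWme, rwwb/regWwb.
    """
    return any(enable and rw in (rs, rt) for rw, enable in pending)


# sub r4,r1,r3 in decode while add r1,r2,r3 is in EX
print(raw_hazard(rs=1, rt=3, pending=[(1, True), (0, False), (0, False)]))
```

Like the slide's equation, this sketch does not special-case register 0 or instructions that use only one source operand; a real design would likely qualify the comparison further.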

Resolve RAW by forwarding
[Figure: pipelined datapath with a forward mux in front of the operand latches; rs and rt are compared against the rw fields in the later pipeline registers]
Detect the nearest valid write to the operand register and forward it into the op latches, bypassing the remainder of the pipe
Increase muxes to add paths from pipeline registers
Data Forwarding = Data Bypassing
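The "nearest valid write" rule amounts to a priority mux: the youngest in-flight result for the register wins, and the register file is the fallback. A minimal Python sketch, assuming each later stage exposes a `(rw, regW, value)` triple (the names are illustrative, not taken from the slides):

```python
def forward_operand(reg, regfile_value, stages):
    """Select the operand value for `reg`.

    stages: list of (rw, reg_write_enable, value), ordered nearest stage first
            (EX, then MEM, then WB). The first enabled match is the most
            recent write, so it takes priority over older ones.
    """
    for rw, enable, value in stages:
        if enable and rw == reg:
            return value
    return regfile_value  # no pending write: use the register file


# Two in-flight writes to r1; the nearer one (99) must win over the older (42)
print(forward_operand(1, 10, [(1, True, 99), (1, True, 42), (2, True, 7)]))
```

The ordering of `stages` is what encodes "nearest"; reversing it would forward a stale value.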

What about memory operations?
[Figure: pipeline registers carrying op Rd Ra Rb; R and T latches; Rd to reg file]
° If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
° What does delaying WB on arithmetic operations cost? – cycles? – hardware?
° What about data dependence on loads?
R1 <- R4 + R5
R2 <- Mem[ R2 + I ]
R3 <- R2 + R1
=> "Delayed Loads"

Compiler Avoiding Load Stalls:
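As a hedged illustration of how a compiler might avoid a load stall, the sketch below counts load-use pairs in the three-instruction example (R1 <- R4 + R5, R2 <- Mem[R2 + I], R3 <- R2 + R1) and shows that hoisting the load above the independent add removes the stall. It assumes a single load delay slot and represents instructions as `(op, dest, sources)` tuples, both of which are simplifications made for this example.

```python
def load_use_pairs(program):
    """Count instructions that read a register loaded by the instruction
    immediately before them (one load delay slot assumed)."""
    count = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "load" and prev_dest in cur_sources:
            count += 1
    return count


original = [
    ("add", "R1", ("R4", "R5")),   # R1 <- R4 + R5
    ("load", "R2", ("R2",)),       # R2 <- Mem[R2 + I]
    ("add", "R3", ("R2", "R1")),   # R3 <- R2 + R1 (uses the loaded R2)
]

# The first add does not touch R2, so the load can be moved ahead of it
reordered = [original[1], original[0], original[2]]

print(load_use_pairs(original), load_use_pairs(reordered))
```

The reordering is legal here because the load and the first add share no registers; a real scheduler would have to check those dependences before moving anything.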

What about Interrupts, Traps, Faults?
External Interrupts:
Allow pipeline to drain
Load PC with interrupt address
Faults (within instruction, restartable):
Force trap instruction into IF
Disable writes till trap hits WB
Must save multiple PCs or PC + state
Refer to MIPS solution

Exception Handling
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, alu, S, D mem, m) executing lw $2,20($5), annotated stage by stage:]
detect bad instruction address
detect bad instruction
detect overflow
detect bad data address
Allow exception to take effect

Exception Problem
Exceptions/Interrupts: 5 instructions executing in 5 stage pipeline
How to stop the pipeline? Restart? Who caused the interrupt?
Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic exception
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
Load with data page fault, Add with instruction page fault?
Solution 1: interrupt vector/instruction, check last stage
Solution 2: interrupt ASAP, restart everything incomplete
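The load/add question can be made concrete with a little cycle arithmetic. The Python sketch below is an illustration under stated assumptions: instruction k enters IF in cycle k, the pipeline never stalls, and the helper names are invented for this example. It shows that the add's instruction page fault is detected before the load's data page fault, while checking at the last stage (Solution 1) sees the load's fault first, in program order.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def detection_cycle(index, stage):
    """Cycle in which instruction `index` raises its fault, assuming it enters
    IF in cycle `index` and advances one stage per cycle."""
    return index + STAGES.index(stage)


def last_stage_cycle(index):
    """Cycle in which instruction `index` reaches the last stage (WB)."""
    return index + STAGES.index("WB")


# Instruction 0: load with a data page fault (raised in MEM)
# Instruction 1: add with an instruction page fault (raised in IF)
faults = {0: "MEM", 1: "IF"}

first_detected = min(faults, key=lambda i: detection_cycle(i, faults[i]))
first_at_last_stage = min(faults, key=last_stage_cycle)
print(first_detected, first_at_last_stage)
```

Handling faults in detection order would service the later instruction first, which is why carrying the fault status down the pipe and acting on it at the last stage keeps exceptions in program order.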

Resolution: Freeze above & Bubble Below
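[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, alu, S, D mem, m, Regs) with op, rw, rs, rt, im, n fields in the pipeline registers; the stages above the faulting point are marked "freeze" and a "bubble" is inserted into the stage below]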

FYI: MIPS R3000 clocking discipline
2-phase non-overlapping clocks (phi1, phi2)
Pipeline stage is two (level sensitive) latches
[Figure: phi1 and phi2 waveforms, with an edge-triggered clock shown for comparison]

MIPS R3000 Instruction Pipeline
Stages: Inst Fetch; Decode / Reg. Read; ALU / E.A; Memory; Write Reg
[Figure: activity per stage (TLB, I-Cache, RF, Operation, E.A, TLB, D-Cache, WB) and resource usage per half cycle (TLB, I-cache, RF, ALU, ALU, D-Cache, WB)]
Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Data Hazard on r1
[Figure: pipeline diagram, Time (clock cycles) across and Instr. Order down; stages IF, ID/RF, EX, MEM, WB drawn with Im, Reg, ALU, Dm blocks]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
With MIPS R3000 pipeline, no need to forward from WB stage
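The reason no WB forwarding path is needed can be shown with a toy register file that writes in the first half of the cycle and reads in the second. This Python sketch is only a behavioral model under that assumption; the `RegFile` class and its `cycle` method are invented for the example.

```python
class RegFile:
    """Register file that writes in phase 1 and reads in phase 2 of a cycle."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        # Phase 1: the instruction in WB writes its result
        if write is not None:
            dest, value = write
            self.regs[dest] = value
        # Phase 2: the instruction in decode reads its operands,
        # so it already sees the value written in phase 1
        return tuple(self.regs[r] for r in reads)


rf = RegFile()
# Same cycle: add r1,r2,r3 is in WB (writing 7 to r1) while a later
# instruction in decode reads r1 and r9
operands = rf.cycle(write=(1, 7), reads=(1, 9))
print(operands)
```

With a read-then-write ordering the decode stage would see the stale value of r1, and a bypass from WB would be required.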

MIPS R3000 Multicycle Operations
[Figure: pipeline registers carrying op Rd Ra Rb, with mul Rd Ra Rb held in the execute stage; B, R, T latches; Rd to reg file]
Ex: Multiply, Divide, Cache Miss
Stall all stages above the multicycle operation in the pipeline
Drain (bubble) stages below it
Use control word of local stage state to step through the multicycle operation
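The "stall above, bubble below" rule can be sketched as one update step on a list of pipeline slots. In this Python illustration the pipeline is a five-element list ordered IF to WB, `None` stands for a bubble, and the function name `step` and the `stall_stage` parameter are assumptions made for the example.

```python
def step(pipe, fetched, stall_stage=None):
    """Advance a 5-stage pipeline by one cycle.

    pipe: list [IF, ID, EX, MEM, WB]; None represents a bubble.
    fetched: instruction entering IF if the pipeline is not stalled.
    stall_stage: index of the stage holding a multicycle operation, or None.
    """
    if stall_stage is None:
        return [fetched] + pipe[:-1]
    # Stages 0..stall_stage hold their contents (freeze), a bubble enters the
    # stage just below, and everything further down keeps draining.
    held = pipe[:stall_stage + 1]
    drained = pipe[stall_stage + 1:-1]
    return (held + [None] + drained)[:len(pipe)]


pipe = ["i4", "i3", "mul", "i1", "i0"]
print(step(pipe, "i5", stall_stage=2))
```

In this example `mul` stays in EX together with the two younger instructions, a bubble moves into MEM, and `i1` proceeds to WB.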

Issues in Pipelined design
° Pipelining
Limitation: Issue rate, FU stalls, FU depth
° Super-pipeline
- Issue one instruction per (fast) cycle
- ALU takes multiple cycles
Limitation: Clock skew, FU stalls, FU depth
° Super-scalar
- Issue multiple scalar instructions per cycle
Limitation: Hazard resolution
° VLIW (“EPIC”)
- Each instruction specifies multiple scalar operations
- Compiler determines parallelism
Limitation: Packing
° Vector operations
- Each instruction specifies series of identical operations
Limitation: Applicability
[Figure: IF D Ex M W pipeline diagrams for each organization]

Historical Perspective
[Timeline figure, as labeled from top to bottom:]
Today
early 90's: RISC Superscalars
80's: RISC pipelines (mips, sparc, ...)
vector proc.
Cache (ibm 360/85, ...)
Load/Store ISA (cdc 6600, 7600, Cray-1, ...)
Dynamic Inst. Scheduling with extensive pipelining (ibm 360/91)
Virtual Memory (multics, ge-645, ibm 360/67, ...)
Inst. Pipelining, Inst. Buffering (Stretch - 100x ibm704, 1961)
Microprogramming
Other annotations on the figure: 80ns, 2Kb Ctrl. St, 4x16b bus, 960ns mem, 32KB cache, 60-160ns, 1966, 25x basic model, 1967, TLB, 60ns, hardwired, 8x16b bus, 780ns mem

Technology Perspective
4 bit 8 bit 16 bit 32 bit 64 bit Superscalar

Partitioned Instruction Issue (simple Superscalar)
Independent Int and FP issue to separate pipelines
[Figure: I-Cache; Inst Issue and Bypass; Int Reg and FP Reg; Operand / Result Busses; Int Unit, Load / Store Unit, FP Add, FP Mul; D-Cache]
Single Issue Total Time = Int Time + FP Time
Max Speedup: Total Time / MAX(Int Time, FP Time)

Example: DAXPY
Basic Loop:             Cycles   Assumptions
load   Ra <- Ai         1
load   Ry <- Yi         1
fmult  Rm <- Ra*Rx               cycle mult, 3 stage
fadd   Rs <- Rm+Ry               cycle add, 2 stage
store  Ai <- Rs         1
inc    Yi               1
dec    i                1
inc    Ai               1
branch                  1
Total Single Issue Cycles: 19 (7 integer, 12 floating point)
Minimum with Dual Issue: 12
Potential Speedup: !!!
Actual Cycles: 18
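The totals on this slide can be checked against the partitioned-issue bound, where the best dual-issue time is the larger of the integer and floating-point times. The short Python calculation below uses only the cycle counts stated on the slide (7 integer, 12 floating point, 18 actual); the variable names are chosen for this example.

```python
int_cycles = 7    # integer-pipe cycles, from the slide
fp_cycles = 12    # floating-point-pipe cycles, from the slide
actual_cycles = 18

single_issue = int_cycles + fp_cycles          # one instruction stream
dual_issue_min = max(int_cycles, fp_cycles)    # perfect Int/FP overlap

potential_speedup = single_issue / dual_issue_min
actual_speedup = single_issue / actual_cycles

print(round(potential_speedup, 2), round(actual_speedup, 2))
```

The gap between the two ratios reflects how little of the integer work overlaps with the floating-point work in the basic loop, since the dependent fmult and fadd sit on the critical path.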

Unrolling
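As a hedged illustration of unrolling, the sketch below applies it to a DAXPY loop written as y[i] = a*x[i] + y[i]. The unroll factor of 4 and the function names are assumptions made for this example. Unrolling performs several independent iterations per trip, so the increment and branch overhead is paid once per four elements and the independent multiplies and adds are exposed for overlap.

```python
def daxpy(a, x, y):
    """Basic loop: one element per iteration."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


def daxpy_unrolled4(a, x, y):
    """Same computation, unrolled by 4 with a cleanup loop for leftovers."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        # Four independent element updates per loop-overhead (inc/branch)
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
        y[i + 3] = a * x[i + 3] + y[i + 3]
        i += 4
    while i < n:
        # Remaining 0 to 3 elements
        y[i] = a * x[i] + y[i]
        i += 1


x = [float(i) for i in range(10)]
y_basic = [1.0] * 10
y_unrolled = [1.0] * 10
daxpy(2.0, x, y_basic)
daxpy_unrolled4(2.0, x, y_unrolled)
print(y_basic == y_unrolled)
```

In Python the transformation has no performance meaning; it is shown only to make the structure of the unrolled loop and its cleanup code explicit.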

Software Pipelining
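A minimal sketch of software pipelining, again on a DAXPY loop written as y[i] = a*x[i] + y[i]: the loop body is restructured so that each trip starts the multiply for element i while finishing the add and store for element i-1, with a prologue and epilogue to fill and drain the software pipeline. The two-stage split and the function name are assumptions made for this illustration.

```python
def daxpy_software_pipelined(a, x, y):
    """DAXPY with the multiply of iteration i overlapped with the
    add/store of iteration i-1."""
    n = len(x)
    if n == 0:
        return
    # Prologue: start the first multiply
    product = a * x[0]
    for i in range(1, n):
        # Steady state: begin iteration i, complete iteration i-1
        next_product = a * x[i]
        y[i - 1] = product + y[i - 1]
        product = next_product
    # Epilogue: complete the last iteration
    y[n - 1] = product + y[n - 1]


x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]
daxpy_software_pipelined(2.0, x, y)
print(y)
```

Within the steady-state body the multiply and the add belong to different iterations, so neither waits on the other, which is the property a hardware pipeline with a multi-cycle multiplier can exploit.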

Multiple Pipes/ Harder Superscalar
Issues:
Reg. File ports
Detecting Data Dependences
Bypassing
RAW Hazard
WAR Hazard
Multiple load/store ops?
Branches
[Figure: two-issue datapath with IR0 and IR1, Register File, A, B, R, T latches, and D$]

Branch penalties in superscalar
Example: resolved in op-fetch stage, single exposed delay (à la MIPS, SPARC)
[Figure: two issue alignments of I-fetch, Branch, delay; one squashes 2 instructions, the other squashes 1]

Summary
Pipelines pass control information down the pipe just as data moves down the pipe
Forwarding/Stalls handled by local control
Exceptions stop the pipeline
MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
More performance from deeper pipelines, parallelism
