EE 333 Computer Organization, Fall 2006 (University of Portland School of Engineering)
Lecture 20: Pipelining ("bucket brigade"), the MIPS pipeline and its control, and the Pentium 4 architecture
Pipelining overview
Pipelining
–Increased performance through parallel operations
–Goal: complete several operations at the same time
Hazards
–Conditions which inhibit parallel operations
–Techniques exist to minimize the problem
A laundry pipeline
To do laundry: wash, dry, fold, put away
Each step takes 30 minutes, but there are four students' loads
Done one load at a time, that is 4 × 4 × 30 minutes = 8 hours: laundry started at 6 PM is not done until 2 AM
Let's speed it up (pipeline)
Move each load from one step to the next, but start the next load before the first is complete
Now the laundry takes only until 9:30 PM: party time!!
This is the "bucket brigade" approach
Speedup
–Ratio of serial time to parallel time: speedup = (serial time) / (parallel time)
–A metric for comparing the advantage of parallel operation
What is the laundry speedup?
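One way to check these numbers, sketched in Python under the assumptions stated on the laundry slides (four loads, four 30-minute steps):

```python
# Laundry speedup: four loads, four 30-minute steps (wash, dry, fold, put away).
STEP_MIN = 30
STEPS = 4      # wash, dry, fold, put away
LOADS = 4      # one load per student

serial_min = LOADS * STEPS * STEP_MIN            # 480 min = 8 hours (6 PM -> 2 AM)
pipelined_min = (LOADS + STEPS - 1) * STEP_MIN   # 210 min = 3.5 hours (6 PM -> 9:30 PM)

print(serial_min / 60, "h serial vs", pipelined_min / 60, "h pipelined")
print("speedup =", serial_min / pipelined_min)   # ~2.29
```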
A computer pipeline
Assume
–Instructions require multiple clocks to complete
–Each instruction follows approximately the same steps (stages)
Method
–Start the initial instruction on the first clock
–On each following clock, start the next instruction
MIPS instruction steps/stages
1. IF: Fetch the instruction from memory
2. ID: Read registers while decoding the instruction
3. EX: Execute the operation or calculate an address
4. MEM: Access an operand in data memory
5. WB: Write the result into a register
MIPS pipeline
Clock:    1    2    3    4    5    6    7    8    9
Instr 1:  IF   ID   EX   MEM  WB
Instr 2:       IF   ID   EX   MEM  WB
Instr 3:            IF   ID   EX   MEM  WB
Instr 4:                 IF   ID   EX   MEM  WB
Instr 5:                      IF   ID   EX   MEM  WB
The first instruction ends (WB) on clock 5, the same clock on which the fifth instruction starts (IF).
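The staggered schedule above can be generated mechanically; a small sketch (assuming the five stages and five instructions from this slide) prints the stage each instruction occupies clock by clock:

```python
# Print the five-stage MIPS pipeline schedule for five instructions.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
N_INSTR = 5

for i in range(N_INSTR):                     # instruction i+1 enters IF on clock i+1
    row = ["    "] * i                       # idle columns before the instruction starts
    row += [f"{s:<4}" for s in STAGES]
    print(f"instr {i + 1}: " + " ".join(row))
# instr 1 writes back on clock 5, the same clock on which instr 5 fetches.
```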
What is the MIPS pipeline speedup for five instructions?
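A worked answer, assuming the ideal one-stage-per-clock pipeline with no hazards:

```python
STAGES = 5
N = 5                                   # five instructions

serial_clocks = N * STAGES              # 25 clocks, one instruction at a time
pipelined_clocks = N + STAGES - 1       # 9 clocks: fill the pipe, then one result per clock

print("speedup =", serial_clocks / pipelined_clocks)   # 25/9 ~= 2.78
```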
What about a large program?
For n instructions on the five-stage pipeline:
–Series time: 5n clocks
–Pipelined time: n + 4 clocks
–Speedup: 5n / (n + 4), which approaches 5 as n grows
What is the speedup of a pipeline with p stages?
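Under the same no-hazard assumption, serial execution of n instructions takes p·n clocks and pipelined execution takes n + p − 1 clocks, so the speedup is p·n / (n + p − 1), which approaches p for large n. A quick numerical check:

```python
def pipeline_speedup(p, n):
    """Ideal speedup of a p-stage pipeline over serial execution of n instructions."""
    return (p * n) / (n + p - 1)

for n in (5, 100, 10_000):
    print(n, pipeline_speedup(5, n))    # tends toward 5 for the five-stage MIPS pipeline
```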
MIPS pipelined datapath
Pipeline registers are added to the datapath between successive stages to hold each instruction's intermediate results.
Pipelined Control
Control signals needed in later stages are determined during decode (from the IF/ID register) and carried forward:
–saved for the EX stage
–saved for the MEM stage
–saved for the WB stage
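A minimal sketch of this idea in Python (the register and field names here are illustrative, not the exact signal names in the figure): control bits decoded in ID are simply copied forward each clock, and each pipeline register keeps only the bits still needed downstream.

```python
from dataclasses import dataclass, field

@dataclass
class IDEX:
    # Control decoded in ID, consumed by later stages.
    ex_ctrl:  dict = field(default_factory=dict)   # e.g. ALU operation, ALU source select
    mem_ctrl: dict = field(default_factory=dict)   # e.g. memory read/write enables
    wb_ctrl:  dict = field(default_factory=dict)   # e.g. register write enable, result select

@dataclass
class EXMEM:
    mem_ctrl: dict = field(default_factory=dict)   # forwarded from ID/EX
    wb_ctrl:  dict = field(default_factory=dict)

@dataclass
class MEMWB:
    wb_ctrl:  dict = field(default_factory=dict)   # the only control still needed

def advance(id_ex: IDEX, ex_mem: EXMEM, mem_wb: MEMWB) -> None:
    # Each clock, copy the still-needed control bits into the next pipeline register
    # (back to front, so nothing is overwritten before it is passed along).
    mem_wb.wb_ctrl = ex_mem.wb_ctrl
    ex_mem.mem_ctrl, ex_mem.wb_ctrl = id_ex.mem_ctrl, id_ex.wb_ctrl
```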
Datapath & pipelined control
Pentium 4 pipeline
Twenty stages long, for a theoretical speedup of 20
Hazards (forced sequential operations) reduce the actual speedup
–Some instructions are executed "out of order" to avoid a hazard
–Multiple (optimistic) execution paths may be pursued; one is selected to produce the result and the other data is discarded
Early Pentium 4
Socket 423/478
42 M transistors, 0.18 and 0.13 µm process technology
2.0 GHz core frequency, ~60 W
Integrated heat spreader, built-in thermal monitor
NetBurst Architecture
Faster system bus
Advanced transfer cache
Advanced dynamic execution (execution trace cache, enhanced branch prediction)
Hyper-pipelined technology
Rapid execution engine
Enhanced floating-point and multimedia support (SSE2)
Architecture Overview
Front Side Bus
FSB Bandwidth
Clocked at 100 MHz, quad "pumped" (four transfers per bus clock)
128 B cache lines, 64-bit (8 B) accesses
Split transactions, pipelined
External bandwidth: 100 MHz × 8 B × 4 = 3.2 GB/s
Makes better use of bus bandwidth
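A quick check of the bandwidth arithmetic; the transfers-per-line figure is derived from the stated 128 B line and 8 B bus width:

```python
bus_clock_hz   = 100e6    # 100 MHz bus clock
bytes_per_xfer = 8        # 64-bit data bus
xfers_per_clk  = 4        # quad pumped

bandwidth = bus_clock_hz * bytes_per_xfer * xfers_per_clk
print(bandwidth / 1e9, "GB/s")                                        # 3.2 GB/s

line_bytes = 128
print(line_bytes // bytes_per_xfer, "transfers per 128 B cache line")  # 16 transfers = 4 bus clocks
```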
L2 Advanced Transfer Cache
Full-Speed L2 Cache
256 KB capacity
Eight-way set associative, 128 B line
Wide instruction & data interface of 256 bits (32 B)
Read latency of 7 clocks, but clocked at the core frequency (2.0 GHz)
Internal bandwidth: 32 B × 2.0 G = 64 GB/s
Optimizes data transfers to/from memory
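The internal-bandwidth figure, and the set count implied by the stated size, line length, and associativity, can be checked the same way:

```python
core_clock_hz = 2.0e9
bus_bytes     = 32                      # 256-bit cache interface
print(core_clock_hz * bus_bytes / 1e9, "GB/s internal bandwidth")   # 64 GB/s

size_bytes, line_bytes, ways = 256 * 1024, 128, 8
sets = size_bytes // (line_bytes * ways)
print(sets, "sets")                     # 256 sets of eight 128 B lines
```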
L1 Data Cache
L1 Data Cache
8 KB capacity
Four-way set associative, 64 B line
Read latency of 2 clocks, but …
Dual ported: one load and one store per clock
Supports an advanced pre-fetch algorithm
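The same geometry check for the L1 data cache:

```python
size_bytes, line_bytes, ways = 8 * 1024, 64, 4
sets = size_bytes // (line_bytes * ways)
print(sets, "sets")        # 32 sets of four 64 B lines
```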
Dynamic Execution
Trace Cache & Branch Prediction
Replaces the traditional L1 instruction cache
The trace cache holds ~12K decoded instructions (micro-operations), removing decode latency
Improved branch prediction algorithm eliminates about 33% of the Pentium III's mis-predictions (pipeline stalls)
Keeps the correct instructions executing
Execution Engine
Hyper Pipelined Technology
The execution pipeline contains 20 stages
–Out-of-order, speculative execution unit
–Up to 126 instructions "in flight", including 48 loads and 24 stores
Rapid execution engine
–2 ALUs, clocked at 2× (one simple instruction every ½ clock)
–2 AGUs, clocked at 2×
Results in higher throughput and reduced latency
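A rough peak-throughput estimate for the double-clocked ALUs, assuming one simple integer operation per half clock per ALU as stated (real code will see far less because of dependences and memory stalls):

```python
core_clock_hz = 2.0e9
alus          = 2
ops_per_clock = 2          # double-pumped: one simple op each half clock

peak_ops = core_clock_hz * alus * ops_per_clock
print(peak_ops / 1e9, "G simple integer ops/s peak")   # 8 G ops/s
```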
Streaming SIMD Extensions
FPU and MMX
–128-bit format
–AGU data movement register
SSE2 (extends MMX and SSE)
–144 new instructions
–Double-precision floating point
–Integer
–Cache and memory management
Performance increases across a broad range of applications