Review: Instruction Set Evolution

Slides:

Advertisements

Similar presentations

PipelineCSCE430/830 Pipeline: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Prof. Yifeng Zhu, U of Maine Fall,

Advertisements

OMSE 510: Computing Foundations 4: The CPU!

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Chapter 8. Pipelining.

Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.

Pipelining I (1) Fall 2005 Lecture 18: Pipelining I.

Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.

ENGS 116 Lecture 41 Instruction Set Design Part II Introduction to Pipelining Vincent H. Berk September 28, 2005 Reading for today: Chapter 2.1 – 2.12,

Computer Architecture

CS252/Patterson Lec 1.1 1/17/01 Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Pipelining Datapath Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley) and Hank Walker (TAMU)

1 Atanasoff–Berry Computer, built by Professor John Vincent Atanasoff and grad student Clifford Berry in the basement of the physics building at Iowa State.

Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

Introduction to Pipelining Rabi Mahapatra Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley)

9.2 Pipelining Suppose we want to perform the combined multiply and add operations with a stream of numbers: A i * B i + C i for i =1,2,3,…,7.

CS1104: Computer Organisation School of Computing National University of Singapore.

Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.

CPE 731 Advanced Computer Architecture Pipelining Review Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of California,

Integrated Circuits Costs

B 0000 Pipelining ENGR xD52 Eric VanWyk Fall

EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.

Pipelining (I). Pipelining Example  Laundry Example  Four students have one load of clothes each to wash, dry, fold, and put away  Washer takes 30.

Analogy: Gotta Do Laundry

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining.

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

Pipelining Example Laundry Example: Three Stages

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

HazardsCS510 Computer Architectures Lecture Lecture 7 Pipeline Hazards.

CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.

CS252/Patterson Lec 1.1 1/17/01 معماري کامپيوتر - درس نهم pipeline برگرفته از درس : Prof. David A. Patterson.

HazardsCS510 Computer Architectures Lecture Lecture 7 Pipeline Hazards.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.

Lecture 5. MIPS Processor Design Pipelined MIPS #1 Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212 Computer Architecture.

Lecture 18: Pipelining I.

Pipelines An overview of pipelining

Lecture 15: Pipelining: Branching & Complications

Pipelining: Hazards Ver. Jan 14, 2014

CS 3853 Computer Architecture Lecture 3 – Performance + Pipelining

CMSC 611: Advanced Computer Architecture

5 Steps of MIPS Datapath Figure A.2, Page A-8

Pipeline Implementation (4.6)

ECE232: Hardware Organization and Design

Appendix A - Pipelining

Chapter 3: Pipelining 순천향대학교 컴퓨터학부 이 상 정 Adapted from

CPE 631 Lecture 03: Review: Pipelining, Memory Hierarchy

Chapter 4 The Processor Part 2

CMSC 611: Advanced Computer Architecture

Appendix A - Pipelining

Lecturer: Alan Christopher

Serial versus Pipelined Execution

An Introduction to pipelining

Chapter 8. Pipelining.

Electrical and Computer Engineering

Pipelining Appendix A and Chapter 3.

Throughput = #instructions per unit time (seconds/cycles etc.)

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

Computer Architecture

Presentation transcript:

Review: Instruction Set Evolution Single Accumulator (EDSAC 1950) Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953) Separation of Programming Model from Implementation High-level Language Based Concept of a Family (B5000 1963) (IBM 360 1964) General Purpose Register Machines Complex Instruction Sets Load/Store Architecture (CDC 6600, Cray 1 1963-76) (Vax, Intel 432 1977-80) RISC (Mips, Sparc, HP PA, PowerPC, . . .1987-91)

Review: 80x86 v. DLX Instruction Counts SPEC pgm x86 DLX DLX÷86 gcc 3,771,327,742 3,892,063,460 1.03 espresso 2,216,423,413 2,801,294,286 1.26 spice 15,257,026,309 16,965,928,788 1.11 nasa7 15,603,040,963 6,118,740,321 0.39

Review From Last Time Intel Summary Archeology: history of instruction design in a single product Address size: 16 bit vs. 32-bit Protection: Segmentation vs. paged Temp. storage: accumulator vs. stack vs. registers “Golden Handcuffs” of binary compatibilitys Intel has shown HP/Intel announcement of common future instruction set by 2000 means end of 80x86??? “Beauty is in the eye of the beholder” At 50M/year sold, it is a beautiful business

Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes “Folder” takes 20 minutes A B C D

Sequential Laundry 6 PM 7 8 9 10 11 Midnight 30 40 20 30 40 20 30 40 Time 30 40 20 30 40 20 30 40 20 30 40 20 T a s k O r d e A B C D Sequential laundry takes 6 hours for 4 loads If they learned pipelining, how long would laundry take?

Pipelined Laundry Start work ASAP 6 PM 7 8 9 10 11 Midnight Time 30 40 20 T a s k O r d e A B C D Pipelined laundry takes 3.5 hours for 4 loads

Pipelining Lessons 6 PM 7 8 9 30 40 20 A B C D Pipelining doesn’t help latency of single task, it helps throughput of entire workload Pipeline rate limited by slowest pipeline stage Multiple tasks operating simultaneously Potential speedup = Number pipe stages Unbalanced lengths of pipe stages reduces speedup Time to “fill” pipeline and time to “drain” it reduces speedup 7 8 9 Time T a s k O r d e 30 40 20 A B C D

5 Steps of DLX Datapath Figure 3.1, Page 130 Instruction Fetch Instr. Decode Reg. Fetch Execute Addr. Calc Memory Access Write Back IR L M D

Pipelined DLX Datapath Figure 3.4, page 137 Instruction Fetch Instr. Decode Reg. Fetch Execute Addr. Calc. Write Back Memory Access Data stationary control local decode for each instruction phase / pipeline stage

Visualizing Pipelining Figure 3.3, Page 133 Time (clock cycles) I n s t r. O r d e

Its Not That Easy for Computers Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle Structural hazards: HW cannot support this combination of instructions Data hazards: Instruction depends on result of prior instruction still in the pipeline Control hazards: Pipelining of branches & other instructionsCommon solution is to stall the pipeline until the hazardbubbles” in the pipeline

One Memory Port/Structural Hazards Figure 3.6, Page 142 Time (clock cycles) Load I n s t r. O r d e Instr 1 Instr 2 Instr 3 Instr 4

One Memory Port/Structural Hazards Figure 3.7, Page 143 Time (clock cycles) Load I n s t r. O r d e Instr 1 Instr 2 stall Instr 3

Speed Up Equation for Pipelining Speedup from pipelining = Ave Instr Time unpipelined Ave Instr Time pipelined = CPIunpipelined x Clock Cycleunpipelined CPIpipelined x Clock Cyclepipelined = CPIunpipelined Clock Cycleunpipelined CPIpipelined Clock Cyclepipelined Ideal CPI = CPIunpipelined/Pipeline depth Speedup = Ideal CPI x Pipeline depth Clock Cycleunpipelined CPIpipelined Clock Cyclepipelined x x

Speed Up Equation for Pipelining CPIpipelined = Ideal CPI + Pipeline stall clock cycles per instr Speedup = Ideal CPI x Pipeline depth Clock Cycleunpipelined Ideal CPI + Pipeline stall CPI Clock Cyclepipelined Speedup = Pipeline depth Clock Cycleunpipelined 1 + Pipeline stall CPI Clock Cyclepipelined x x

Example: Dual-port vs. Single-port Machine A: Dual ported memory Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate Ideal CPI = 1 for both Loads are 40% of instructions executed SpeedUpA = Pipeline Depth/(1 + 0) x (clockunpipe/clockpipe) = Pipeline Depth SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clockunpipe/(clockunpipe / 1.05) = (Pipeline Depth/1.4) x 1.05 = 0.75 x Pipeline Depth SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33 Machine A is 1.33 times faster

Data Hazard on R1 Figure 3.9, page 147 Time (clock cycles) IF ID/RF EX MEM WB I n s t r. O r d e add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11

Three Generic Data Hazards InstrI followed by InstrJ Read After Write (RAW) InstrJ tries to read operand before InstrI writes it

Three Generic Data Hazards InstrI followed by InstrJ Write After Read (WAR) InstrJ tries to write operand before InstrI reads it Can’t happen in DLX 5 stage pipeline because: All instructions take 5 stages, Reads are always in stage 2, and Writes are always in stage 5

Three Generic Data Hazards InstrI followed by InstrJ Write After Write (WAW) InstrJ tries to write operand before InstrI writes it Leaves wrong result ( InstrI not InstrJ) Can’t happen in DLX 5 stage pipeline because: All instructions take 5 stages, and Writes are always in stage 5 Will see WAR and WAW in later more complicated pipes

Forwarding to Avoid Data Hazard Time (clock cycles) I n s t r. O r d e add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11

HW Change for Forwarding

Data Hazard Even with Forwarding Figure 3.12, Page 153 Time (clock cycles) lw r1, 0(r2) I n s t r. O r d e sub r4,r1,r6 and r6,r1,r7 or r8,r1,r9

Data Hazard Even with Forwarding Figure 3.13, Page 154 Time (clock cycles) I n s t r. O r d e lw r1, 0(r2) sub r4,r1,r6 and r6,r1,r7 or r8,r1,r9

Software Scheduling to Avoid Load Hazards Try producing fast code for a = b + c; d = e – f; assuming a, b, c, d ,e, and f in memory. Slow code: LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,Ra LW Re,e LW Rf,f SUB Rd,Re,Rf SW d,Rd Fast code: LW Rb,b LW Rc,c LW Re,e ADD Ra,Rb,Rc LW Rf,f SW a,Ra SUB Rd,Re,Rf SW d,Rd

Compiler Avoiding Load Stalls

Pipelining Summary Just overlap tasks, and easy if tasks are independent Speed Up Š Pipeline Depth; if ideal CPI is 1, then: Hazards limit performance on computers: Structural: need more HW resources Data: need forwarding, compiler scheduling Control: discuss next time Pipeline Depth Clock Cycle Unpipelined Speedup = X 1 + Pipeline stall CPI Clock Cycle Pipelined