Published by Iris Cole. Modified over 9 years ago.
1
Computer Architecture, Lec 1: Introduction
Dr. Eng. Amr T. Abdel-Hamid, CSEN 601, Spring 2011
Textbook slides: Computer Architecture: A Quantitative Approach, 4th Edition, John L. Hennessy & David A. Patterson, with modifications.
2
Dr. Amr Talaat Elect 707 Computer Architecture
3
CPU History in a Flash
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
- 125 mm² chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm² at 65 nm
  - Caches via DRAM or 1 transistor SRAM
- Processor is the new transistor?
4
Instruction Set Architecture: Critical Interface
[Figure: the instruction set as the interface between software and hardware]
Properties of a good abstraction:
- Lasts through many generations (portability)
- Used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
5
ISA vs. Computer Architecture
- Old definition of computer architecture = instruction set design
  - Other aspects of computer design were called implementation
  - Insinuates that implementation is uninteresting or less challenging
- Our view is: computer architecture >> ISA
- The architect’s job is much more than instruction set design; the technical hurdles today are more challenging than those in instruction set design
6
Computer Architecture is Design and Analysis
Architecture is an iterative process:
- Searching the space of possible designs
- At all levels of computer systems
[Figure: design loop in which creativity produces candidate designs and cost/performance analysis sorts them into good, mediocre, and bad ideas]
7
Administrivia
8
Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century.
[Figure: Computer Architecture (Organization, Hardware/Software Boundary) at the center, together with Interface Design (ISA), Measurement & Evaluation, and Parallelism; surrounded by Technology, Programming Languages, Operating Systems, History, Applications, and Compilers]
9
Why Study Computer Architecture?
- Culture of anticipating and exploiting advances in technology
- Careful, quantitative comparisons
  - Define, quantify, and summarize relative performance
  - Define and quantify relative cost
  - Define and quantify dependability
  - Define and quantify power
- Culture of well-defined interfaces that are carefully implemented and thoroughly checked
Quantitative Principles of Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation
10
1) Taking Advantage of Parallelism
- Increasing throughput of a server computer via multiple processors or multiple disks
- Detailed HW design (DSD course shortly)
  - Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
  - Multiple memory banks are searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence
  - Not every instruction depends on its immediate predecessor, so executing instructions completely or partially in parallel is possible
  - Classic 5-stage pipeline: 1) Instruction Fetch (Ifetch), 2) Register Read (Reg), 3) Execute (ALU), 4) Data Memory Access (Dmem), 5) Register Write (Reg)
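The benefit of overlapping instructions can be checked with a little arithmetic. The sketch below is not from the slides; it assumes an ideal pipeline in which every stage takes one clock cycle and no hazards occur, and the function names are made up for illustration.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Ideal pipeline: fill it once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 4, 1000):
        u = unpipelined_cycles(n)
        p = pipelined_cycles(n)
        print(f"{n:5d} instructions: {u} vs {p} cycles, speedup {u / p:.2f}")
```

For long instruction sequences the speedup approaches the number of stages, which is the ideal case that hazards later erode.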
11
2) The Principle of Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time
- Two different types of locality:
  - Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, HW has relied on locality for memory performance
[Figure: processor (P), cache ($), and memory (MEM)]
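A toy cache model makes the two kinds of locality concrete. The following is a minimal sketch, not part of the original slides: it assumes a direct-mapped cache with 64 lines of 32 bytes, and both the function name `hit_rate` and those sizes are illustrative choices.

```python
def hit_rate(addresses, n_lines=64, block_bytes=32):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * n_lines          # block number currently held by each line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes  # which memory block the byte belongs to
        index = block % n_lines      # the one line that block may occupy
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: fetch the block, evicting the old one
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    sequential = range(0, 4096, 4)           # word-by-word array walk (spatial locality)
    repeated = list(range(0, 256, 4)) * 10   # small loop re-run 10 times (temporal locality)
    strided = range(0, 32768, 32)            # one access per block, never revisited
    print("sequential:", hit_rate(sequential))
    print("repeated:  ", hit_rate(repeated))
    print("strided:   ", hit_rate(strided))
```

The sequential walk misses once per 32-byte block and hits on the other seven words, the repeated loop misses only on its first pass, and the strided walk never hits because it has neither kind of locality.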
12
Levels of the Memory Hierarchy
Level            | Capacity         | Access time              | Cost
CPU Registers    | 100s Bytes       | 300-500 ps (0.3-0.5 ns)  |
L1 and L2 Cache  | 10s-100s KBytes  | ~1 ns to ~10 ns          | $1000s/GByte
Main Memory      | GBytes           | 80 ns-200 ns             | ~$100/GByte
Disk             | 10s TBytes       | 10 ms (10,000,000 ns)    | ~$1/GByte
Tape             | infinite         | sec-min                  | ~$1/GByte

Staging between levels (transfer unit, size, managed by):
- Registers <-> L1 Cache: instruction operands, 1-8 bytes, program/compiler
- L1 Cache <-> L2 Cache: blocks, 32-64 bytes, cache controller
- L2 Cache <-> Memory: blocks, 64-128 bytes, cache controller
- Memory <-> Disk: pages, 4K-8K bytes, OS
- Disk <-> Tape: files, MBytes, user/operator
Upper levels are faster; lower levels are larger.
13
3) Focus on the Common Case
- Common sense guides computer design
  - Since it’s engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing the more common case of no overflow
  - May slow down overflow, but overall performance is improved by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster? => Amdahl’s Law
14
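4) Amdahl’s Law
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
Best you could ever hope to do:
$$\text{Speedup}_{maximum} = \frac{1}{1 - \text{Fraction}_{enhanced}}$$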
15
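Amdahl’s Law example
- New CPU is 10X faster
- I/O-bound server, so 60% of the time is spent waiting for I/O and only 40% benefits from the faster CPU
$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
- Apparently, it’s human nature to be attracted by 10X faster, vs. keeping in perspective that it’s just 1.6X faster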
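The example can be reproduced numerically. This is a small sketch of Amdahl's Law as it is usually stated; the function names are illustrative, and the inputs are the slide's own numbers (40% of the time is CPU-bound, and the CPU becomes 10X faster).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Best possible speedup: the enhanced part takes no time at all."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # 60% of the time is I/O wait, so only 40% benefits from the 10X CPU.
    print(f"overall speedup: {amdahl_speedup(0.4, 10):.4f}")
    print(f"upper bound:     {amdahl_limit(0.4):.4f}")
```

Even an infinitely fast CPU could not push this server past roughly 1.67X, because the I/O wait is untouched.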
16
5) Processor Performance Equation
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
What affects each term:
             | Inst Count | CPI | Clock Rate
Program      | X          |     |
Compiler     | X          | (X) |
Inst. Set    | X          | X   |
Organization |            | X   | X
Technology   |            |     | X
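The three factors multiply directly, so the equation is easy to evaluate. In the sketch below the function name and the sample numbers (one billion instructions, CPI of 1.5, 2 GHz clock) are hypothetical and chosen only to illustrate the arithmetic.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


if __name__ == "__main__":
    base = cpu_time(1e9, 1.5, 2e9)
    # A change that lowers CPI but also slows the clock is a net win only
    # if the product of the three factors goes down.
    alt = cpu_time(1e9, 1.2, 1.8e9)
    print(f"baseline:    {base:.3f} s")
    print(f"alternative: {alt:.3f} s")
```

Comparing designs on the full product, rather than on clock rate or CPI alone, is the point of the table of influences above.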
17
5 Steps of MIPS Datapath (Figure A.2, Page A-8)
[Figure: MIPS datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: Next PC, Next SEQ PC, adder (+4), instruction memory (Address, Inst), register file (RS1, RS2, RD), sign extend (Imm), MUXes, ALU, Zero? test, data memory, LMD, WB Data]
18
5 Steps of MIPS Datapath (Figure A.3, Page A-9)
[Figure: the same five-step datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back]
19
5 Steps of MIPS Datapath (Figure A.3, Page A-9)
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
Data stationary control: local decode for each instruction phase / pipeline stage
20
Visualizing Pipelining (Figure A.2, Page A-8)
[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles (Cycle 1 to Cycle 7) on the horizontal axis; each instruction passes through Ifetch, Reg, ALU, DMem, Reg and starts one cycle after its predecessor]
21
Pipelining is not quite that easy!
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (missing sock)
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
22
One Memory Port/Structural Hazards (Figure A.4, Page A-14)
[Figure: pipeline diagram over Cycle 1 to Cycle 7 for the sequence Load, Instr 1, Instr 2, Instr 3, Instr 4; with one memory port, the Load's DMem access and the Ifetch of Instr 3 need memory in the same cycle]
23
One Memory Port/Structural Hazards (Similar to Figure A.5, Page A-15)
[Figure: pipeline diagram over Cycle 1 to Cycle 7 for the sequence Load, Instr 1, Instr 2, Stall, Instr 3; a bubble occupies the stalled slot so that Instr 3 is fetched one cycle later]
How do you “bubble” the pipe?
24
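Speed Up Equation for Pipelining
$$\text{CPI}_{pipelined} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$
$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$
For simple RISC pipeline, CPI = 1:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$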
25
Example: Dual-port vs. Single-port
- Machine A: dual-ported memory (“Harvard Architecture”)
- Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed
$$\text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{unpipe}}{\text{clock}_{pipe}} = \text{Pipeline Depth}$$
$$\text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{unpipe}}{\text{clock}_{unpipe} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth}$$
$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33$$
- Machine A is 1.33 times faster
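The comparison can be replayed in a few lines. This sketch follows the slide's simplification that ideal CPI is 1 and that the clock term reduces to a relative factor (1.0 for Machine A, 1.05 for Machine B); the function name, the parameter names, and the pipeline depth of 5 are illustrative assumptions, and the depth cancels in the ratio.

```python
def pipeline_speedup(depth: int, stall_cpi: float, clock_factor: float = 1.0) -> float:
    """Speedup over the unpipelined machine, assuming ideal CPI = 1."""
    return depth / (1.0 + stall_cpi) * clock_factor


if __name__ == "__main__":
    depth = 5  # illustrative; any depth gives the same A/B ratio

    # Machine A: dual-ported memory, so loads never stall instruction fetch.
    speedup_a = pipeline_speedup(depth, stall_cpi=0.0)

    # Machine B: 40% of instructions are loads, each costing one stall cycle,
    # partly offset by a 1.05x faster clock.
    speedup_b = pipeline_speedup(depth, stall_cpi=0.4 * 1, clock_factor=1.05)

    print(f"A: {speedup_a:.2f}  B: {speedup_b:.2f}  A/B: {speedup_a / speedup_b:.2f}")
```

The 5% clock advantage does not come close to paying for a stall on 40% of the instructions.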
26
Data Hazard on R1 (Figure A.6, Page A-17)
[Figure: pipeline diagram (stages IF, ID/RF, EX, MEM, WB) with instruction order on the vertical axis and time in clock cycles on the horizontal axis, for the sequence below; every instruction after the add reads r1]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
27
Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
I: add r1,r2,r3
J: sub r4,r1,r3
- Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
28
Three Generic Data Hazards
Write After Read (WAR): Instr J writes an operand before Instr I reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
- Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
- Can’t happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5
29
Three Generic Data Hazards
Write After Write (WAW): Instr J writes an operand before Instr I writes it
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
- Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
- Can’t happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
- Will see WAR and WAW in more complicated pipes
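The three cases differ only in which register names overlap between the earlier instruction I and the later instruction J. The sketch below classifies a pair by name alone; the `Instr` record and the `dependences` function are illustrative, not from the slides. It reports potential hazards: whether one actually occurs depends on the pipeline, as noted for WAR and WAW in the 5-stage MIPS pipe.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Name-based dependences from earlier instruction i to later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes (true dependence)
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i still needs to read (anti-dependence)
    if i.dest == j.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found


if __name__ == "__main__":
    # The I/J pairs from the three slides above.
    print(dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))
    print(dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3"))))
    print(dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3"))))
```

Only RAW reflects real data flow; WAR and WAW come from reusing a register name, which is why renaming can remove them in more complicated pipelines.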
30
Forwarding to Avoid Data Hazard (Figure A.7, Page A-19)
[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles on the horizontal axis, for the sequence below, with forwarding paths carrying the add's result to the later instructions that read r1]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
31
HW Change for Forwarding (Figure A.23, Page A-37)
[Figure: muxes on the ALU inputs select among the register values (ID/EX), NextPC, the immediate, and results fed back from the EX/MEM and MEM/WR pipeline registers; data memory sits after the ALU]
What circuit detects and resolves this hazard?
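One common answer to that question is a forwarding unit that compares the source register of the instruction entering EX against the destination registers held in the later pipeline registers. The sketch below uses the usual textbook conditions rather than anything shown on the slide; the function name, argument names, and returned labels are hypothetical, and registers are plain integers with register 0 treated as hardwired to zero.

```python
def forward_select(id_ex_src: int,
                   ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int) -> str:
    """Choose where one ALU operand should come from."""
    # The most recent result wins: check the EX/MEM register first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return "EX/MEM"
    # Otherwise take an older result that is about to be written back.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return "MEM/WB"
    # No in-flight producer: the value read from the register file is current.
    return "REGFILE"


if __name__ == "__main__":
    # add r1,r2,r3 is one stage ahead of sub r4,r1,r3: forward from EX/MEM.
    print(forward_select(1, True, 1, False, 0))
    # add r1,r2,r3 is two stages ahead of and r6,r1,r7: forward from MEM/WB.
    print(forward_select(1, False, 0, True, 1))
```

The same comparison is made independently for each ALU operand, and its result drives the select lines of the corresponding input mux.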
32
Forwarding to Avoid LW-SW Data Hazard (Figure A.8, Page A-20)
[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles on the horizontal axis, for the sequence below, with forwarding paths for r1 into the lw and sw and for the loaded r4 into the sw]
add r1,r2,r3
lw r4, 0(r1)
sw r4,12(r1)
or r8,r6,r9
xor r10,r9,r11
33
Data Hazard Even with Forwarding (Figure A.9, Page A-21)
[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles on the horizontal axis, for the sequence below; the value loaded into r1 is not available until after the lw's DMem stage, too late for the sub's ALU stage]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
34
Data Hazard Even with Forwarding (Similar to Figure A.10, Page A-21)
[Figure: pipeline diagram over time in clock cycles for the sequence below; a bubble delays the sub, and the instructions behind it, by one cycle]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
How is this detected?
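A typical way to detect this case is an interlock in the decode stage: stall when the instruction currently in EX is a load whose destination is a source of the instruction being decoded. The sketch below states that textbook condition; it is an assumption about the mechanism, not something shown on the slide, and the function and argument names are hypothetical. Registers are plain integers with register 0 treated as hardwired to zero.

```python
from typing import Iterable


def must_stall(id_ex_is_load: bool, id_ex_dest: int, if_id_srcs: Iterable[int]) -> bool:
    """True when the decoding instruction needs a load result that is not ready yet."""
    return bool(id_ex_is_load and id_ex_dest != 0 and id_ex_dest in tuple(if_id_srcs))


if __name__ == "__main__":
    # lw r1,0(r2) followed immediately by sub r4,r1,r6: one bubble is needed.
    print(must_stall(True, 1, (1, 6)))
    # add r1,r2,r3 followed by sub r4,r1,r3: forwarding alone is enough.
    print(must_stall(False, 1, (1, 3)))
```

After the one-cycle bubble, the loaded value can be forwarded from the MEM/WB register like any other result.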
35
Control Hazard on Branches: Three Stage Stall
[Figure: pipeline diagram over time in clock cycles for the sequence below; the three instructions after the beq are fetched before the branch outcome is known]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
What do you do with the 3 instructions in between? How do you do it? Where is the “commit”?
36
Branch Stall Impact
- If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1 + 0.3 x 3 = 1.9!
- Two-part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier
- MIPS branch tests if register = 0 or ≠ 0
- MIPS solution:
  - Move Zero test to ID/RF stage
  - Adder to calculate new PC in ID/RF stage
  - 1 clock cycle penalty for branch versus 3
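The effect of shortening the branch penalty follows from the same stall arithmetic. The sketch below uses the slide's numbers (base CPI of 1, 30% branches); the function name is illustrative, and it assumes every branch pays the full penalty.

```python
def cpi_with_branch_stalls(base_cpi: float, branch_fraction: float,
                           penalty_cycles: int) -> float:
    """Effective CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    slow = cpi_with_branch_stalls(1.0, 0.30, 3)  # branch resolved late in the pipe
    fast = cpi_with_branch_stalls(1.0, 0.30, 1)  # zero test and adder moved to ID/RF
    print(f"3-cycle penalty: CPI = {slow:.2f}")
    print(f"1-cycle penalty: CPI = {fast:.2f}")
    print(f"improvement: {slow / fast:.2f}x")
```

Moving the zero test and the target adder into ID/RF takes the CPI from 1.9 to 1.3 under these assumptions.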
37
Pipelined MIPS Datapath (Figure A.24, page A-38)
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; the Zero? test and an additional adder appear alongside the register file and sign extend in the Instr. Decode / Reg. Fetch stage]
Interplay of instruction set design and cycle time.