ECE-3056-B Quiz-2 Topic Areas John Copeland March 28, 2014

05 Single-Cycle Datapath
Asynchronous logic – combinational logic. Synchronous logic – latches, registers.
Data path – the actual path taken is determined by the control signals and muxes.
Instruction decoding – determines the values of the control signals.
3 types of instruction formats – know how they are used to simplify decoding.
Opcode (bits 31-26); "funct" field (bits 5-0) determines the ALU operation.
PC register – 3 possible inputs: PC+4, branch target, jump target.
Given a diagram like slide 05-28, or VHDL like 05-32, know how to figure out which control lines are set (true) for: add, lw, sw, j, jr, beq. See 05-33, 34, 35, or 37. (A decoding sketch follows below.)
Energy use depends on the average number of gates that switch per cycle. Which uses more: the register set or the ALU (during an add)?
Given VHDL code, or combinational logic, for a simple ALU, be able to modify it to add another operation (e.g., shift, subtract, bit-wise logic, …).
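
A minimal decoding sketch (in Python rather than the course's VHDL; the signal names RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp follow common textbook single-cycle conventions and may not match the slides exactly):

```python
# Behavioral sketch of single-cycle instruction decoding (not the course VHDL).
# Don't-care control values are simply shown as 0 here.

def decode(instr):
    """Split a 32-bit MIPS instruction word into its opcode and funct fields."""
    opcode = (instr >> 26) & 0x3F    # bits 31-26
    funct = instr & 0x3F             # bits 5-0 (meaningful for R-type only)
    return opcode, funct

CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
               MemRead=0, MemWrite=0, Branch=0, ALUOp='funct'),  # R-type (add, sub, ...)
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp='add'),    # lw
    0x2B: dict(RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp='add'),    # sw
    0x04: dict(RegDst=0, ALUSrc=0, MemtoReg=0, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=1, ALUOp='sub'),    # beq
}

if __name__ == "__main__":
    op, fn = decode(0x8C220018)      # lw $2, 24($1)
    print(hex(op), CONTROL[op])      # 0x23 plus the lw control-signal settings
```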

Datapath With Jumps Added (Slide 05-37)

06 Multi-Cycle Datapath
Divide the datapath into five phases – Fetch Instruction, Decode Instruction, Execute (calculate), Memory R/W, Write Back (IF, ID, EX, MEM, WB).
Given a diagram like slide 06-8, 11, 17, 18, 20, 22, 23, or 24, know how to figure out which control lines (ALUOp, ALUSrcA, MemWrite, …) are set (true) for: add, lw, sw, j, jr, beq, …
Interpret a state diagram like those in the 06 slides – how many clock cycles for br, add, lw? For given MIPS assembler code: how many cycles to execute? (A cycle-count sketch follows below.)
State diagram for decoding an instruction – ways to implement it:
ROM implementation – very inefficient.
PLA implementation – Inputs: 6 opcode bits and 4 state bits ("funct" bits decoded separately). Outputs: one for each control line (control signal). Vertical lines: minimized by choosing similar bit patterns for closely related functions (06-32).
CISC (complex instruction set computer) – uses microcode; more clock ticks per instruction.
Unexpected change in the flow of control: exception (error) vs. interrupt (I/O).
Two registers just for exception handling: Cause and Status.
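
A small cycle-counting sketch, assuming the usual multi-cycle breakdown (lw = 5, sw = 4, R-type = 4, beq = 3, j = 3 cycles); check these values against the state diagram in the 06 slides:

```python
# Cycles per instruction class in a five-phase multi-cycle design:
# IF and ID are common to all; lw adds EX, MEM, and WB; beq and j
# finish after three cycles.
CYCLES = {'lw': 5, 'sw': 4, 'rtype': 4, 'beq': 3, 'j': 3}

def total_cycles(program):
    """program is a list of instruction classes taken from a short MIPS snippet."""
    return sum(CYCLES[i] for i in program)

# Example: lw $t0,0($s0); add $t1,$t0,$t2; beq $t1,$zero,L  ->  5 + 4 + 3
print(total_cycles(['lw', 'rtype', 'beq']))   # 12 cycles
```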

Pipelining Example (Slide 08a-21)
[Pipelined datapath diagram: IF/ID, ID/EX, EX/MEM, and MEM/WB registers with the instruction sequence add $14,$5,$6; lw $13,24($1); add $12,$3,$4; sub $11,$2,$3; lw $10,20($1) in flight. Note the pipeline stage execution times and what is happening in the register file.]

Pipelined Control (Slide 08a-23)
Control signals are derived from the instruction, as in the single-cycle implementation.
Pass the control signals along the pipeline like data (sketched below).
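
A toy Python sketch of that idea, with dictionaries standing in for the ID/EX, EX/MEM, and MEM/WB pipeline registers (signal names are the usual textbook ones, not necessarily the slides'):

```python
# Control bits are produced once in ID and then simply copied forward each
# clock, with each later stage reading only the bits it needs (toy model).
ID_EX, EX_MEM, MEM_WB = {}, {}, {}

def clock_tick(decoded_controls):
    """Shift the control signals one stage to the right, like any other data."""
    global ID_EX, EX_MEM, MEM_WB
    MEM_WB = {k: EX_MEM.get(k) for k in ('MemtoReg', 'RegWrite')}       # WB-stage bits
    EX_MEM = {k: ID_EX.get(k) for k in ('Branch', 'MemRead', 'MemWrite',
                                        'MemtoReg', 'RegWrite')}        # MEM- and WB-stage bits
    ID_EX = dict(decoded_controls)                                      # full set from the decoder

# Example: latch the control bits for a lw and let them ride the pipeline.
clock_tick(dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                MemRead=1, MemWrite=0, Branch=0))
```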

07 Performance
Determined by: algorithm and language, compiler, (hardware) architecture.
Different metrics, and different numbers for different "benchmark" programs.
SPEC2005 – performance running a standard set of benchmark programs.
"Throughput" – work done per unit time.
"Response time" (latency) – time to do one task (e.g., display a Web page).
Trading throughput versus latency (for multitasking or parallel processing).
Instructions per second = clock rate / average cycles per instruction.
Programs with heavy I/O or memory reads/writes may depend less on CPU speed.
Amdahl's Law – know the definition and be able to do calculations with it (a worked example follows below). Corollary – "make the common case fast."
Increased clock rate → higher temperature and lower energy efficiency.
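
A minimal sketch of an Amdahl's Law calculation; the fraction and speedup factor below are made-up numbers for illustration:

```python
# Amdahl's Law: overall speedup when a fraction f of the execution time is
# sped up by a factor s, while the remaining (1 - f) is unaffected.
def amdahl_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical example: making 80% of a program 4x faster gives
# 1 / (0.2 + 0.8/4) = 2.5x overall, not 4x -- "make the common case fast".
print(amdahl_speedup(0.8, 4))   # 2.5
```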

08a Pipelined Datapath
Use clocked registers to divide the datapath into five stages – Fetch Instruction, Decode Instruction, Execute, Memory R/W, Write Back (IF, ID, EX, MEM, WB).
Use more clocked registers to delay the control signals as the instruction moves through the stages (08a-23).
Hazards: structural, control, data.
Structural – need to separate instruction and data memory (or caches).
Control – guess "to branch" or not; stall when the guess is wrong – how? Can the branch be determined 2 or 3 instructions ahead, and if so, how? How many real branch instructions are there? Only beq (branch if equal). Jump/branch – the next instruction must be started before the branch resolves (kill it if the guess was wrong).
Data – add logic devices to "forward" data needed by the next instruction. Forwarding unit – controls when to forward, from the EX or MEM stage.
Problem with "load word" and use – requires a nop insertion unless the order of the following instructions can be swapped ("code scheduling"). (See the hazard-check sketch after this list.)
MIPS can predict the branch as always taken (wrong when the loop ends). More complicated CPUs have "dynamic" branch prediction.
MIPS – an exception causes a jump to the "exception handler" procedure in the OS.
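
A minimal sketch of the load-use check described above; the condition mirrors the usual textbook hazard detection unit, and the register numbers in the example are hypothetical:

```python
# Load-use hazard check: if the instruction in EX is a load and its
# destination register matches a source of the instruction in ID,
# stall for one cycle (insert a nop bubble).
def load_use_stall(idex_memread, idex_rt, ifid_rs, ifid_rt):
    return idex_memread and idex_rt in (ifid_rs, ifid_rt)

# lw $10, 20($1) followed immediately by add $12, $10, $4 (hypothetical pair):
# the load's destination ($10) is a source of the add, so a bubble is needed
# unless the compiler can schedule an independent instruction in between.
print(load_use_stall(True, 10, 10, 4))   # True -> stall
```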

08b Pipelined – Thread-Level Parallelism
Flynn's taxonomy – Single/Multiple Instruction streams, Single/Multiple Data streams.
Multithreading – MIMD – parallel threads or LWPs (light-weight processes).
Process – a running program with state (register values, including the PC and the pipeline registers). Scheduled by the OS – swapped out, restarted later.
Threads have their own state and stack memory (local data), but share static data.
Hardware-assisted switching between threads – separate hardware registers; instructions may be interleaved in the pipeline ("fine-grained multithreading").
"Coarse-grained multithreading" – swap threads on a long stall (like a cache miss).
Synchronization between threads – requires hardware "atomic commands": "test and set" without switching threads between the test and the set (a conceptual sketch follows below). When altering common data, a thread must tell the OS the start/end of a critical section.
Simultaneous multithreading on multiple cores (datapaths).
Amdahl's Law – increased multithreading reaches a "point of diminishing returns."
Big-time parallelism – requires special many-core computers, high-level languages, and shared plus local memory. A "grid" has cores on a 2-D grid of buses. Communication (assigning tasks, collecting results) – by passing messages.
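
A conceptual Python sketch of "test and set"; the threading.Lock here merely stands in for the atomicity that real hardware provides with a single atomic instruction (or, on MIPS, an ll/sc pair):

```python
# Spinlock built on a conceptual atomic test-and-set: read the old value and
# write 1 as one indivisible step, with no thread switch in between.
import threading

_atomic = threading.Lock()   # stand-in for the hardware atomicity guarantee
flag = 0                     # 0 = lock free, 1 = lock held

def test_and_set():
    global flag
    with _atomic:            # hardware would make this a single instruction
        old = flag
        flag = 1
        return old

def acquire_spinlock():
    while test_and_set() == 1:
        pass                 # spin until the old value was 0 (lock was free)

def release_spinlock():
    global flag
    flag = 0
```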

09a Memory
Different levels – large, slow, and low cost → small, fast, and high cost.
Hard disk (terabytes) – $0.25 per gigabyte; millisecond access time.
DRAM (gigabytes) – $ per gigabyte; several nanoseconds access time.
Static RAM (megabytes) – $2000 per gigabyte; fraction of a nanosecond.
Registers – on the CPU; limited by area and power.
Use multiple levels to get a low average cost and high speed for the data being used.
Multicore computers need a fast cache memory for each core.
Data movement takes more energy than computation.
What to put in the cache? Temporal locality (recently used) and spatial locality (nearby).
How to locate data – direct-mapped cache address parts: [ tag ][ cache index ][ byte location ] (see the address-split sketch after this list).
Size of cache = 2^(number of index bits) × 2^(number of byte-location bits) = number of blocks × block size. Each block also has a valid bit, a dirty bit, and a tag.
To read or write data: look at the indexed block, match the tag, select the byte location.
If not found (tag did not match), put the block's contents in the write queue if "dirty" before getting the new data from memory (all byte addresses that start with the tag + index).
"Write through" – immediately update memory; "write back" – just write the cache.
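
A minimal sketch of the address split, using the field widths from the 09a-20 example (5-bit word addresses, 8 one-word blocks, so 3 index bits and no byte-offset field):

```python
# Split a word address into [ tag ][ index ] for a direct-mapped cache with
# 2**INDEX_BITS one-word blocks (byte offset omitted in this example).
INDEX_BITS = 3

def split_word_address(addr):
    index = addr & ((1 << INDEX_BITS) - 1)   # low 3 bits select the block
    tag = addr >> INDEX_BITS                 # remaining high bits are the tag
    return tag, index

# Mem[10110] lives in block 110 with tag 10; Mem[11010] in block 010 with tag 11.
print(split_word_address(0b10110))   # (2, 6)  ->  tag 10, index 110
print(split_word_address(0b11010))   # (3, 2)  ->  tag 11, index 010
```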

Previous state of the cache (09a-20):

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Then this happens:

Word addr  Binary addr  Hit/miss  Cache block
?          ?            ?         000

What is the new state of the cache? Answer on 09a-21. (A lookup sketch follows below.)
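
A small simulation of the lookup-and-fill rule against the state above; the second access address below is hypothetical (chosen only to show a miss), since the actual quiz access is left as a "?" here:

```python
# Direct-mapped lookup/fill for the 8-block, one-word-per-block cache above.
# State: index -> (valid, tag); a hit requires a valid block with a matching tag.
cache = {0b010: (True, 0b11), 0b110: (True, 0b10)}   # from the table above

def access(word_addr):
    """Return 'hit' or 'miss' and update the cache (fill the block on a miss)."""
    index, tag = word_addr & 0b111, word_addr >> 3
    valid, stored_tag = cache.get(index, (False, None))
    if valid and stored_tag == tag:
        return 'hit'
    cache[index] = (True, tag)       # fetch Mem[word_addr] into this block
    return 'miss'

print(access(0b11010))   # hit  (block 010 already holds tag 11)
print(access(0b00011))   # hypothetical address: miss, fills block 011 with tag 00
```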