CSCE 513 Computer Architecture

Presentation transcript:

CSCE 513 Computer Architecture
Lecture 4: Pipelines II
Topics
  IEEE 754
  ISA and frequency counts
  Pipelining
  Data Hazards
  Forwarding
  Load-Use Hazard
  Control Hazards
Readings: Appendix C
September 13, 2017

Overview
Last Time
  Review of single-cycle design
  5-stage pipeline
  Lecture 3 slides 1-20
New
  Slides 20-51 of Lecture 3
  IEEE 754 floating point
  Normal pipeline operations – the ideal world
  Hazards
    Data hazards: RAW, WAR, WAW, forwarding, load-use
    Control hazards
  Performance with stalls
References
  Appendix C

gcc -S matmul.c produces matmul.s
232 lines
Inner loop
    C[i][j] = 0.0;
    for (x = 0; x < k; ++x) {
        C[i][j] = C[i][j] + A[i][x] * B[x][j];
    }
%st: floating-point value on top of the stack
%st(1): the entry one below it
matmul.s, 690 lines
Part of inner loop
    ..... …..
    movl  48(%esp), %ecx
    sall  $3, %ecx
    addl  %ecx, %eax
    fldl  (%eax)
    fmulp %st, %st(1)
    faddp %st, %st(1)
    fstpl (%edx)
    addl  $1, 52(%esp)

Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.2 The pipeline can be thought of as a series of data paths shifted in time. This shows the overlap among the parts of the data path, with clock cycle 5 (CC 5) showing the steady-state situation. Because the register file is used as a source in the ID stage and as a destination in the WB stage, it appears twice. We show that it is read in one part of the stage and written in another by using a solid line, on the right or left, respectively, and a dashed line on the other side. The abbreviation IM is used for instruction memory, DM for data memory, and CC for clock cycle.

Appendix A – Instruction Set Architecture (ISA)
  Memory addressing
    Big Endian vs. Little Endian
    Alignment
  Address modes – fig A.6 (next slide)
  Frequency of 80x86 instruction execution – fig A.7, A.13
  Role of compilers – optimization, register allocation
  MIPS review: Appendix A.9
    Registers: 32 integer registers R0-R31 (R0 = 0), float/double registers F0-F31

Address Modes – fig A.6
  Register
  Immediate
  Displacement
  Register Indirect
  Indexed
  Direct
  Memory Indirect
  Autoincrement
  Scaled

Frequency of Address Modes A.7

Frequency of 80x86 Instructions A.13

Compiler Optimizations figure A.20

Figure C.22 Inserting Pipeline Registers into Data Path

Figure C.22 Inserting Pipeline Registers into Data Path
Fields in pipeline registers
  IF/ID.IR → ID/EX.IR → EX/MEM.IR ; instruction copied
  ID/EX.A, ID/EX.B
  EX/MEM.ALUoutput
  …

Figure C.23

Figure C.21 Examples of Data Hazards
No dependencies, with accesses in order
Instruction        1 2 3 4 5 6 7 8
LD   R1, 44(R2)
DADD R5, R6, R7
DSUB R8, R6, R7
OR   R9, R4, R7

Examples of Data Hazards
Dependency requiring a stall
Note: Instr1 = Load, … and Instr2 = DADD, DSUB, OR, AND, …
  rt in Instr1 == rs in Instr2
  What type of circuit implements the == ?
Instruction        1 2 3 4 5 6 7 8
LD   R1, 44(R2)
DADD R5, R1, R7
DSUB R8, R6, R7
OR   R9, R1, R7

Examples of Data Hazards
Dependence
Instruction        1 2 3 4 5 6 7 8
LD   R1, 44(R2)
DADD R5, R6, R7
DSUB R8, R6, R1
OR   R9, R4, R7

Figure C.21 Examples of Data Hazards
Forwarding through the registers
Instruction        1 2 3 4 5 6 7 8
LD   R1, 44(R2)
DADD R5, R6, R7
DSUB R8, R6, R7
OR   R9, R1, R7

Figure C.9 (new slide) Data Forwarding
The load instruction can bypass its results to the AND and OR instructions, but not to the DSUB, since that would mean forwarding the result in “negative time.”

Logic to detect Hazards

Forwarding Figure C.26
Columns: Pipeline Reg. Source | Opcode of Source | Pipeline Reg. Destination | Opcode of Destination | Destination of forwarding | Comparison (if equal then forward)

Figure C.23 Forwarding Paths

Load/Use Hazard

Delays for Mis-predicted Branches

Figure C.24 Avoiding some Branch Stalls

Figure FP Latencies

Figure C.29 MIPS Pipeline +FP Units

Figure C.31 Supporting multiple outstanding FP operations

Figure C.32 Timings of Independent FP operations

Figure C.33 Stalls due to RAW hazards

Figure C.34 Simultaneous write-back

Figure C.40

Dynamic Scheduling

Homework …



Figure C-23 Events on Pipeline


Terms: interrupt, fault, and exception
The terms interrupt, fault, and exception are used, although not in a consistent fashion. We use the term exception to cover all these mechanisms, including the following:
  I/O device request
  Invoking an operating system service from a user program
  Tracing instruction execution
  Breakpoint (programmer-requested interrupt)
  Integer arithmetic overflow
  FP arithmetic anomaly
  Page fault (not in main memory)
  Misaligned memory accesses (if alignment is required)
  Memory protection violation
  Using an undefined or unimplemented instruction
  Hardware malfunctions
  Power failure

Linux
  File System Hierarchy
  Basic Commands: ls cd mv cp rm pwd gcc ./a.out
  Editors: emacs, vim, nano, pico, gedit, kate, …

Linux - lscpu
cocsce-l1d39-16> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               800.156
BogoMIPS:              7184.02
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

man -k ls | grep "^ls"
cocsce-l1d39-16> man -k ls | grep "^ls"
lsattr (1) - list file attributes on a Linux second extended file sy...
lsb (8) - Linux Standard Base support for Debian
lsb_release (1) - print distribution-specific information
lsblk (8) - list block devices
lscpu (1) - display information on CPU architecture
lsdiff (1) - show which files are modified by a patch
lsearch (3) - linear search of an array
lseek (2) - reposition read/write file offset
lshw (1) - list hardware
lsinitramfs (8) - list content of an initramfs image
lsmod (8) - Show the status of modules in the Linux Kernel
lsof (8) - list open files
lspci (8) - list all PCI devices
lspcmcia (8) - display extended PCMCIA debugging information
lspgpot (1) - extracts the ownertrust values from PGP keyrings and li...
lstopo (1) - Show the topology of the system
lstopo-no-graphics (1) - Show the topology of the system
lsusb (8) - list USB devices