The University of Adelaide, School of Computer Science

Presentation transcript:

EEL 6764 Principles of Computer Architecture: Final Review
Dr Hao Zheng, Dept. of Comp Sci & Eng, U of South Florida
Computer Architecture: A Quantitative Approach, Fifth Edition
16 May 2019
Chapter 2 - Instructions: Language of the Computer

Computer Architecture: Big Picture
[Figure: block diagram with components uP, Cache, Interconnect, Memory, HD, DISP, ..., KB]

ISA Design
- Different architectures
  - Classified by organization of operands: stack, accumulator, GP registers
  - Classified by granularity: CISC vs RISC
- Objectives: performance, implementability, compatibility
- Considerations
  - Number of registers
  - Number of operands
  - Memory addressing modes
  - Types of instructions
  - Instruction encoding: fixed vs flexible, and the impact on code size

ILP and Pipelining
- ILP: overlapped execution of different instructions
- Pipelining: architecture to exploit ILP
- Architecture of the 5-stage MIPS pipeline: IF, ID, EX, MEM, WB
- Ideal pipeline performance
  - Speedup is close to the number of pipeline stages
  - Limited by pipeline hazards, mem & FU latency, pipeline register delays
- Pipeline hazards
  - Structural: a hardware issue
  - Data: dependencies in programs
  - Control: related to branches
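The "speedup is close to the number of pipeline stages" point can be made concrete with the usual textbook relation, speedup = pipeline depth / (1 + stall cycles per instruction). The sketch below assumes an ideal CPI of 1, perfectly balanced stages, and no pipeline register overhead; the function name `pipeline_speedup` is illustrative only.

```python
def pipeline_speedup(depth: int, stall_cycles_per_instr: float) -> float:
    """Speedup of a pipelined over an unpipelined machine.

    Assumes ideal CPI = 1, balanced stages, and zero latch overhead,
    so only hazard-induced stalls reduce the speedup below `depth`.
    """
    if depth < 1:
        raise ValueError("pipeline depth must be at least 1")
    if stall_cycles_per_instr < 0:
        raise ValueError("stall cycles per instruction cannot be negative")
    return depth / (1.0 + stall_cycles_per_instr)


if __name__ == "__main__":
    # With no stalls, a 5-stage pipeline reaches its ideal speedup of 5.
    print(pipeline_speedup(5, 0.0))
    # An average of 0.25 stall cycles per instruction lowers it to 4.
    print(pipeline_speedup(5, 0.25))
```

Under these assumptions, even a modest stall rate moves the speedup noticeably away from the stage count, which is why hazard handling matters.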

ILP and Pipelining
- Hazard handling: stall (simple, but undesirable)
- Structural: replicate HW, or pipeline slow components
- Data: RAW, WAR, WAW
  - Forwarding for RAW
  - Register renaming for WAR & WAW
- Branch
  - Branch prediction: static & dynamic, 1-bit & 2-bit predictors
  - HW speculation
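To compare the 1-bit and 2-bit predictors mentioned above, here is a minimal sketch that models a single predictor entry for one branch. The class names, the saturating-counter encoding (0 and 1 predict not taken, 2 and 3 predict taken), and the initial states are assumptions made for illustration; real predictors index a table of such entries by branch address.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time."""

    def __init__(self, last_taken: bool = True):
        self.last_taken = last_taken

    def predict(self) -> bool:
        return self.last_taken

    def update(self, taken: bool) -> None:
        self.last_taken = taken


class TwoBitPredictor:
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, state: int = 3):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor) -> int:
    """Run the predictor over a sequence of actual outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken three times, then falls through; the loop runs 3 times.
    loop_branch = [True, True, True, False] * 3
    print(count_mispredictions(loop_branch, OneBitPredictor()))
    print(count_mispredictions(loop_branch, TwoBitPredictor()))
```

On this loop pattern the 1-bit entry mispredicts both the loop exit and the next loop entry, while the 2-bit counter mispredicts only the exit.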

ILP and Pipelining
- Scheduling: what is it?
  - Static vs dynamic: issues of static scheduling
  - Register renaming: WAW & WAR hazards
- Tomasulo’s algorithm
  - Register renaming + dynamic scheduling
  - Reservation stations, CDB, tags for registers
  - Operations: see Figure 3.13
- HW speculation: what is it, and how does it work?
  - What problem does it solve?
  - What additional HW is used and operated? (see Fig. 3.18)
  - Efficiency depends on branch prediction accuracy
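The way register renaming removes WAR and WAW hazards can be illustrated with a small map-table sketch. This is not Tomasulo's reservation-station mechanism itself: it assumes a simplified scheme in which every destination gets a fresh physical register, and the instruction format `(dest, src1, src2)` and the names `rename` and `P0, P1, ...` are hypothetical.

```python
def rename(instrs, arch_regs):
    """Rename architectural registers to fresh physical registers.

    instrs: list of (dest, src1, src2) architectural register names.
    arch_regs: architectural registers that hold live values on entry.
    Returns the same instructions expressed with physical register names.
    """
    # Initial mapping: each architectural register has its own physical one.
    mapping = {reg: f"P{i}" for i, reg in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources read the current mapping, which preserves true (RAW) dependences.
        s1, s2 = mapping[src1], mapping[src2]
        # Each write gets a new physical register, so later writes can no
        # longer clobber values that earlier instructions still read (WAR)
        # or write (WAW).
        mapping[dest] = f"P{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], s1, s2))
    return renamed


if __name__ == "__main__":
    program = [
        ("F0", "F2", "F4"),  # F0 = F2 op F4
        ("F6", "F0", "F8"),  # RAW on F0
        ("F8", "F2", "F4"),  # WAR with the previous read of F8
        ("F6", "F8", "F2"),  # WAW with the earlier write of F6
    ]
    for instr in rename(program, ["F0", "F2", "F4", "F6", "F8"]):
        print(instr)
```

After renaming, the two writes to F6 target different physical registers and the write to F8 no longer conflicts with the earlier read, so only the RAW dependences remain.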

ILP and Pipelining
- Multi-issue
  - Goal: reduce CPI to < 1
  - Challenge: complexity of issue logic (see Fig. 3.22)
- Multithreading: target thread-level parallelism
  - Fine-grained
  - Coarse-grained
  - Simultaneous multithreading: additional PCs and registers

Memory Hierarchy Design
- Mem latency: limiting factor of performance
- Cache: reduce average memory access latency
  - Spatial & temporal locality
  - Organization
  - Set associativity
  - Cache misses, and their causes
  - Performance measurement
  - Write policy: write-back vs write-through
  - Optimization techniques
- Virtual memory: memory as cache for hard disk
  - Roles: memory management and protection
  - Organization: page, page table, page faults; addr translation
  - TLB: cache for page table
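For the "performance measurement" item, the standard metric is average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below assumes all times are in the same unit (cycles here) and that a second cache level simply replaces the miss penalty of the first; the function name `amat` and the example numbers are illustrative, not taken from the slides.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Single-level cache: 1-cycle hit, 5% miss rate, 100-cycle memory access.
    print(amat(1, 0.05, 100))
    # Two levels: the L1 miss penalty is the AMAT of L2
    # (10-cycle hit, 20% local miss rate, 100-cycle memory access).
    print(amat(1, 0.05, amat(10, 0.20, 100)))
```

Nesting the calls this way shows why adding a second cache level helps: it shrinks the effective miss penalty seen by the first level.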

TLP and Multiprocessors
- Parallel execution of instructions from different threads
- Multiprocessor = a set of processors connected together
  - Support MIMD execution model
  - Multiprocessing vs multithreading
- Architectures
  - Symmetric MP, aka centralized shared memory MP
  - Distributed shared memory MP
- Coordination: shared memory
- Caching: reduce remote memory access latency
  - Introduce coherence problem
  - Cache coherence protocols: snooping vs directory
  - Write-invalidate vs write-update
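A write-invalidate snooping protocol can be sketched for a single cache block shared by several caches. The slides do not name a specific protocol, so this assumes the basic three-state MSI scheme (Modified, Shared, Invalid); the class `SnoopingBus` and its methods are hypothetical, and data movement and write-backs are noted in comments rather than modelled.

```python
class SnoopingBus:
    """Tracks the MSI state of one memory block in each cache on a shared bus."""

    def __init__(self, n_caches: int):
        self.states = ["I"] * n_caches  # every cache starts without the block

    def read(self, cache_id: int) -> None:
        if self.states[cache_id] != "I":
            return  # read hit in state S or M: no bus traffic
        # Read miss is broadcast; a Modified owner supplies the data
        # (writing it back) and downgrades to Shared.
        for other in range(len(self.states)):
            if other != cache_id and self.states[other] == "M":
                self.states[other] = "S"
        self.states[cache_id] = "S"

    def write(self, cache_id: int) -> None:
        if self.states[cache_id] == "M":
            return  # write hit on an exclusive copy: no bus traffic
        # Write-invalidate: every other copy is invalidated before the write.
        for other in range(len(self.states)):
            if other != cache_id:
                self.states[other] = "I"
        self.states[cache_id] = "M"


if __name__ == "__main__":
    bus = SnoopingBus(3)
    bus.read(0)
    bus.read(1)
    print(bus.states)
    bus.write(1)
    print(bus.states)
    bus.read(2)
    print(bus.states)
```

A write-update protocol would instead broadcast the new value to the other sharers and leave their copies valid, trading extra bus traffic for fewer subsequent misses.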

Vector Processors
- DLP: parallel operations on different data
- Vector architecture
  - Exploit DLP
  - Support SIMD execution model
  - Target applications with many vector operations/loops
- Main architecture features
  - Vector registers, vector load/store unit, addressing with stride, gather-scatter
  - Pipelined functional units, multiple lanes, chaining
  - Vector length register, strip mining
  - Vector predicate register for conditional execution over vectors
- Differences against superscalar processors
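Strip mining and the vector length register can be illustrated by splitting a loop of arbitrary length into pieces no longer than the maximum vector length (MVL). In this sketch, plain Python lists stand in for vector registers, each strip corresponds to one pass with the vector length register set to `vl`, and the names `strip_mine`, `daxpy`, and the default MVL of 64 are assumptions for illustration.

```python
def strip_mine(n: int, mvl: int):
    """Split n loop iterations into (start, vl) strips with vl <= mvl."""
    if mvl < 1:
        raise ValueError("maximum vector length must be at least 1")
    strips = []
    start = 0
    while start < n:
        vl = min(mvl, n - start)  # value loaded into the vector length register
        strips.append((start, vl))
        start += vl
    return strips


def daxpy(a, x, y, mvl: int = 64):
    """Compute a*x + y strip by strip, as a vector unit with limited MVL would."""
    out = list(y)
    for start, vl in strip_mine(len(x), mvl):
        end = start + vl
        # One vector multiply-add over vl elements.
        out[start:end] = [a * xi + yi for xi, yi in zip(x[start:end], y[start:end])]
    return out


if __name__ == "__main__":
    print(strip_mine(10, 4))
    print(daxpy(2, [1, 2, 3, 4, 5], [10, 10, 10, 10, 10], mvl=2))
```

This version handles the short leftover strip last; processing the odd-sized piece first is an equally valid ordering.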