
16.482 / 16.561 Computer Architecture and Design
Instructor: Dr. Michael Geiger
Spring 2015
Lecture 11: Final Exam Preview

Lecture outline
- Announcements/reminders
  - HW 9 due today
  - Final exam: Thursday, 4/23
    - Must complete course evaluation prior to starting exam
    - Will post link as soon as it's available; print and bring to exam
- Today's lecture: Final exam preview
1/17/2019 Computer Architecture Lecture 11

Final exam notes
- Allowed to bring:
  - Two 8.5" x 11" double-sided sheets of notes
  - Calculator
  - No other notes or electronic devices
- Test policies are the same as last time
  - Can't remove anything from your bag during the exam, no sharing (pencils, erasers, etc.), only one person allowed out of the room at a time
- Exam will last until 9:30
  - Will be written for ~90 minutes ... in theory
- Covers all lectures after Exam 1
  - Material starts with multiple issue/multithreading
- Question formats
  - Problem solving: largely similar to homework problems, but shorter
  - Some short answer: explain concepts in a short paragraph

Review: TLP, multithreading
- ILP (which is implicit) is limited on realistic hardware for a single thread of execution
- Could instead exploit thread-level parallelism (TLP)
- Focus on multithreading:
  - Coarse-grained: switch threads on a long-latency stall
  - Fine-grained: switch threads every cycle
  - Simultaneous: allow multiple threads to execute in the same cycle
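The coarse- and fine-grained policies above can be sketched as toy issue schedulers. This is a minimal illustration, not real hardware behavior; the thread labels and the stall test are hypothetical.

```python
# Toy sketch of two multithreading policies as issue schedulers.
# "threads" is a list of thread labels; "stalled" is a hypothetical
# predicate telling us the current thread hit a long-latency stall.

def coarse_grained(threads, stalled, n_cycles=8):
    """Switch threads only when the current thread stalls."""
    current = threads[0]
    schedule = []
    for cycle in range(n_cycles):
        if stalled(current, cycle):
            threads = threads[1:] + threads[:1]  # rotate to next thread
            current = threads[0]
        schedule.append(current)
    return schedule

def fine_grained(threads, n_cycles=8):
    """Switch threads every cycle, round-robin."""
    return [threads[c % len(threads)] for c in range(n_cycles)]
```

Simultaneous multithreading (SMT) would go one step further and issue from several of these threads within the same cycle.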

Review: Memory hierarchies
- Want large, fast, low-cost memory; can't have it all at once
- Use multiple levels: cache (may have >1), main memory, disk
- Discussed operation of the hierarchy, focusing heavily on:
  - Caching: principle of locality, addressing the cache, the "4 questions" (block placement, identification, replacement, write strategy)
  - Virtual memory: page tables & address translation, page replacement, TLB
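The block placement/identification questions come down to splitting an address into tag, index, and offset fields. A minimal sketch (parameter names are mine, not from the slides):

```python
def cache_fields(addr, block_size, n_sets):
    """Split a byte address into (tag, index, offset) for a cache
    with n_sets sets of block_size-byte blocks.
    Assumes block_size and n_sets are powers of two."""
    offset = addr % block_size            # byte within the block
    index = (addr // block_size) % n_sets # which set to look in
    tag = addr // (block_size * n_sets)   # compared against stored tags
    return tag, index, offset
```

For example, with 64-byte blocks and 128 sets, address 0x1234 lands in set 72 at byte offset 52.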

Review: cache optimizations
- Way prediction
  - Want the benefits of both:
    - Direct-mapped: fast hits
    - Set-associative: fewer conflicts
  - Predict which way within the set holds the data
- Trace cache
  - Intel-specific (originally; ARM uses it as well)
  - Track dynamic "traces" of decoded micro-ops/instructions
- Multi-banked caches
  - Cache physically split into pieces ("banks")
  - Data sequentially interleaved across banks
  - Spread accesses across banks
  - Allows for cache pipelining, non-blocking caches
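Sequential interleaving, as described for multi-banked caches, is just a modulo mapping from block number to bank. A minimal sketch under that assumption:

```python
def bank_for_address(addr, block_size, n_banks):
    """Sequentially interleaved banks: consecutive cache blocks
    map to consecutive banks, wrapping around."""
    block_number = addr // block_size
    return block_number % n_banks
```

Consecutive blocks hit different banks, so accesses to nearby blocks can proceed in parallel.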

Review: cache optimizations
- Critical word first / early restart
  - Fetch the desired word in the cache block first on a miss
  - Restart the processor as soon as the desired word is received
- Prefetching
  - Anticipate misses and fetch blocks into the cache ahead of time
  - Hardware: next-sequential prefetching (fetch the block after the one causing a miss) is the most common/effective
  - Software: prefetch instructions
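Next-sequential prefetching can be sketched in a couple of lines; the cache and prefetch buffer here are just sets of block numbers, purely for illustration.

```python
def handle_miss(cache, missed_block, prefetch_buffer):
    """On a miss to block B, fetch B into the cache and
    prefetch block B+1 into a prefetch buffer (next-sequential)."""
    cache.add(missed_block)
    prefetch_buffer.add(missed_block + 1)
```

If the program then touches block B+1 (spatial locality), the access hits in the prefetch buffer instead of missing.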

Review: Storage
- Disks -> RAID
  - Build redundancy (parity) into the disk array because MTTF_array = MTTF_disk / (# disks)
  - Files "striped" across disks along with parity
- Different levels of RAID offer improvements
  - RAID 1: mirroring
  - RAID 3: introduced parity disk
  - RAID 4: per-sector error correction -> allows small reads
  - RAID 5: interleaved parity -> allows small writes
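The parity redundancy in RAID 3/4/5 is bitwise XOR across the data disks in a stripe: XORing the surviving disks with the parity recovers a failed disk. A small sketch:

```python
from functools import reduce

def parity(stripe):
    """XOR parity across the data blocks in one stripe."""
    return reduce(lambda a, b: a ^ b, stripe)

def reconstruct(surviving_blocks, parity_value):
    """Recover the block on a failed disk from the surviving
    blocks in the stripe plus the stored parity."""
    return parity(surviving_blocks) ^ parity_value
```

The same XOR trick also explains the "small write" cost in RAID 5: updating one block requires reading the old data and old parity to recompute the new parity.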

Review: Classifying multiprocessors
- Parallel architecture = computer architecture + communication architecture
- Classifying by memory:
  - Centralized-memory (symmetric) multiprocessor
    - Typically for smaller systems -> bandwidth demands on both network and memory system
  - Physically distributed-memory multiprocessor
    - Scales better -> bandwidth demands (mostly) on the network
- Classifying by communication:
  - Message-passing multiprocessor: processors explicitly pass messages
  - Shared-memory multiprocessor: processors communicate through a shared address space
    - If using centralized memory: UMA (uniform memory access time)
    - If using distributed memory: NUMA (non-uniform memory access time)

Review: Cache coherence protocols
- Need to maintain coherence; two types of protocols for doing so
  - Directory-based:
    - One (logically) centralized structure holds sharing information
    - One entry per memory block
  - Snooping:
    - Each cache holding a copy of the data has a copy of its sharing information
    - One entry per cache block
    - Caches "snoop" the bus and respond to relevant transactions
  - You should have an understanding of both types of protocols
- Two ways to handle writes
  - Write invalidate: ensure the cache has exclusive access to the block before performing the write; all other copies are invalidated
  - Write update (or write broadcast): update all cached copies of a block whenever the block is written
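The write-invalidate policy can be sketched as a toy snooping step. Each cache here is a dict from block name to state, with "S" (shared) and "M" (modified); this is a simplification of real protocols such as MSI/MESI, for illustration only.

```python
def write(caches, writer, block):
    """Write-invalidate: the writer gains exclusive access to the
    block; every other cache's copy is invalidated via snooping."""
    for i, cache in enumerate(caches):
        if i != writer:
            cache.pop(block, None)  # snooped invalidate: drop the copy
    caches[writer][block] = "M"     # writer now holds the block Modified
```

A write-update protocol would instead push the new value into every cache still holding the block.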

Review: More cache coherence
- Should be familiar with coherence protocol state transition diagrams
  - What happens on different CPU transactions? (read/write, hit/miss)
  - For a snooping protocol, what happens for relevant transactions sent over the bus? (read or write miss)
    - In a snooping protocol, all state transitions occur per cache block, since the cache block tracks sharing info
  - For a directory protocol, what happens for each message sent to a given processor or directory? (read/write miss, invalidate and/or fetch, data reply)
    - State transitions occur in both cache blocks and directory entries (the directory holds sharing info)
- Additional cache misses: coherence misses
  - True sharing misses: one processor writes to the same location that's later accessed by another processor
  - False sharing misses: one processor writes to the same block (but a different location) later accessed by another processor
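A state transition diagram like the ones referenced above is just a next-state table. Here is a stripped-down MSI-style table for one cache block; it omits the bus transactions and write-backs a full diagram would also generate, so treat it as a study sketch, not a complete protocol.

```python
# States for one cache block: "M" (modified), "S" (shared), "I" (invalid).
# Events: local CPU reads/writes, and transactions snooped from the bus.
NEXT_STATE = {
    ("I", "cpu_read"):       "S",  # read miss: fetch a shared copy
    ("I", "cpu_write"):      "M",  # write miss: fetch exclusive copy
    ("S", "cpu_read"):       "S",  # read hit
    ("S", "cpu_write"):      "M",  # upgrade: invalidate other copies
    ("M", "cpu_read"):       "M",  # read hit
    ("M", "cpu_write"):      "M",  # write hit
    ("S", "bus_read_miss"):  "S",  # another cache reads: still shared
    ("S", "bus_write_miss"): "I",  # another cache writes: invalidate
    ("M", "bus_read_miss"):  "S",  # supply data, downgrade to shared
    ("M", "bus_write_miss"): "I",  # supply data, invalidate
}

def next_state(state, event):
    return NEXT_STATE[(state, event)]
```

Working a sequence of CPU and bus events through this table is good practice for the kinds of transition questions described above.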

Final notes
- Next time: Final Exam (Thursday, 4/23)
  - Must complete course evaluation prior to starting exam
  - Will post link as soon as it's available; print and bring to exam
- Reminders
  - HW 9 due today