
16.482 / 16.561 Computer Architecture and Design
Instructor: Dr. Michael Geiger
Spring 2015
Lecture 11: Final Exam Preview

Lecture outline
- Announcements/reminders
  - HW 9 due today
  - Final exam: Thursday, 4/23
    - Must complete course evaluation prior to starting exam
    - Will post link as soon as it's available; print and bring to exam
- Today's lecture: Final exam preview
1/17/2019 Computer Architecture Lecture 11

Final exam notes
- Allowed to bring:
  - Two 8.5" x 11" double-sided sheets of notes
  - Calculator
  - No other notes or electronic devices
- Test policies are the same as last time
  - Can't remove anything from your bag during the exam, no sharing (pencils, erasers, etc.), only one person allowed out of the room at a time
- Exam will last until 9:30
  - Will be written for ~90 minutes ... in theory
- Covers all lectures after Exam 1
  - Material starts with multiple issue/multithreading
- Question formats
  - Problem solving: largely similar to homework problems, but shorter
  - Some short answer: explain concepts in a short paragraph

Review: TLP, multithreading
- ILP (which is implicit) is limited on realistic hardware for a single thread of execution
- Could instead exploit thread-level parallelism (TLP)
- Focus on multithreading:
  - Coarse-grained: switch threads on a long-latency stall
  - Fine-grained: switch threads every cycle
  - Simultaneous: allow multiple threads to execute in the same cycle
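The coarse- and fine-grained policies above can be sketched as toy issue schedulers. This is a minimal illustration, not real hardware behavior; the thread labels and the stall test are hypothetical.

```python
# Toy sketch of two multithreading policies as issue schedulers.
# "threads" is a list of thread labels; "stalled" is a hypothetical
# predicate telling us the current thread hit a long-latency stall.

def coarse_grained(threads, stalled, n_cycles=8):
    """Switch threads only when the current thread stalls."""
    current = threads[0]
    schedule = []
    for cycle in range(n_cycles):
        if stalled(current, cycle):
            threads = threads[1:] + threads[:1]  # rotate to next thread
            current = threads[0]
        schedule.append(current)
    return schedule

def fine_grained(threads, n_cycles=8):
    """Switch threads every cycle, round-robin."""
    return [threads[c % len(threads)] for c in range(n_cycles)]
```

Simultaneous multithreading (SMT) would go one step further and issue from several of these threads within the same cycle.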

Review: Memory hierarchies
- Want large, fast, low-cost memory; can't have it all at once
- Use multiple levels: cache (may have >1), main memory, disk
- Discussed operation of the hierarchy, focusing heavily on:
  - Caching: principle of locality, addressing the cache, the "4 questions" (block placement, identification, replacement, write strategy)
  - Virtual memory: page tables & address translation, page replacement, TLB
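The block placement/identification questions come down to splitting an address into tag, index, and offset fields. A minimal sketch (parameter names are mine, not from the slides):

```python
def cache_fields(addr, block_size, n_sets):
    """Split a byte address into (tag, index, offset) for a cache
    with n_sets sets of block_size-byte blocks.
    Assumes block_size and n_sets are powers of two."""
    offset = addr % block_size            # byte within the block
    index = (addr // block_size) % n_sets # which set to look in
    tag = addr // (block_size * n_sets)   # compared against stored tags
    return tag, index, offset
```

For example, with 64-byte blocks and 128 sets, address 0x1234 lands in set 72 at byte offset 52.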

Review: cache optimizations
- Way prediction
  - Want the benefits of both:
    - Direct-mapped: fast hits
    - Set-associative: fewer conflicts
  - Predict which way within the set holds the data
- Trace cache
  - Intel-specific (originally; ARM uses it as well)
  - Track dynamic "traces" of decoded micro-ops/instructions
- Multi-banked caches
  - Cache physically split into pieces ("banks")
  - Data sequentially interleaved across banks
  - Spread accesses across banks
  - Allows for cache pipelining, non-blocking caches
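Sequential interleaving, as described for multi-banked caches, is just a modulo mapping from block number to bank. A minimal sketch under that assumption:

```python
def bank_for_address(addr, block_size, n_banks):
    """Sequentially interleaved banks: consecutive cache blocks
    map to consecutive banks, wrapping around."""
    block_number = addr // block_size
    return block_number % n_banks
```

Consecutive blocks hit different banks, so accesses to nearby blocks can proceed in parallel.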

Review: cache optimizations
- Critical word first / early restart
  - Fetch the desired word in the cache block first on a miss
  - Restart the processor as soon as the desired word is received
- Prefetching
  - Anticipate misses and fetch blocks into the cache ahead of time
  - Hardware: next-sequential prefetching (fetch the block after the one causing a miss) is the most common/effective
  - Software: prefetch instructions
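Next-sequential prefetching can be sketched in a couple of lines; the cache and prefetch buffer here are just sets of block numbers, purely for illustration.

```python
def handle_miss(cache, missed_block, prefetch_buffer):
    """On a miss to block B, fetch B into the cache and
    prefetch block B+1 into a prefetch buffer (next-sequential)."""
    cache.add(missed_block)
    prefetch_buffer.add(missed_block + 1)
```

If the program then touches block B+1 (spatial locality), the access hits in the prefetch buffer instead of missing.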

Review: Storage
- Disks -> RAID
  - Build redundancy (parity) into the disk array because MTTF_array = MTTF_disk / (# disks)
  - Files "striped" across disks along with parity
- Different levels of RAID offer improvements
  - RAID 1: mirroring
  - RAID 3: introduced parity disk
  - RAID 4: per-sector error correction -> allows small reads
  - RAID 5: interleaved parity -> allows small writes
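The parity redundancy in RAID 3/4/5 is bitwise XOR across the data disks in a stripe: XORing the surviving disks with the parity recovers a failed disk. A small sketch:

```python
from functools import reduce

def parity(stripe):
    """XOR parity across the data blocks in one stripe."""
    return reduce(lambda a, b: a ^ b, stripe)

def reconstruct(surviving_blocks, parity_value):
    """Recover the block on a failed disk from the surviving
    blocks in the stripe plus the stored parity."""
    return parity(surviving_blocks) ^ parity_value
```

The same XOR trick also explains the "small write" cost in RAID 5: updating one block requires reading the old data and old parity to recompute the new parity.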

Review: Classifying multiprocessors
- Parallel architecture = computer architecture + communication architecture
- Classifying by memory:
  - Centralized-memory (symmetric) multiprocessor
    - Typically for smaller systems -> bandwidth demands on both network and memory system
  - Physically distributed-memory multiprocessor
    - Scales better -> bandwidth demands (mostly) on the network
- Classifying by communication:
  - Message-passing multiprocessor: processors explicitly pass messages
  - Shared-memory multiprocessor: processors communicate through a shared address space
    - If using centralized memory: UMA (uniform memory access time)
    - If using distributed memory: NUMA (non-uniform memory access time)

Review: Cache coherence protocols
- Need to maintain coherence; two types of protocols for doing so
  - Directory-based:
    - One (logically) centralized structure holds sharing information
    - One entry per memory block
  - Snooping:
    - Each cache holding a copy of the data has a copy of its sharing information
    - One entry per cache block
    - Caches "snoop" the bus and respond to relevant transactions
  - You should have an understanding of both types of protocols
- Two ways to handle writes
  - Write invalidate: ensure the cache has exclusive access to the block before performing the write; all other copies are invalidated
  - Write update (or write broadcast): update all cached copies of a block whenever the block is written
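The write-invalidate policy can be sketched as a toy snooping step. Each cache here is a dict from block name to state, with "S" (shared) and "M" (modified); this is a simplification of real protocols such as MSI/MESI, for illustration only.

```python
def write(caches, writer, block):
    """Write-invalidate: the writer gains exclusive access to the
    block; every other cache's copy is invalidated via snooping."""
    for i, cache in enumerate(caches):
        if i != writer:
            cache.pop(block, None)  # snooped invalidate: drop the copy
    caches[writer][block] = "M"     # writer now holds the block Modified
```

A write-update protocol would instead push the new value into every cache still holding the block.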

Review: More cache coherence
- Should be familiar with coherence protocol state transition diagrams
  - What happens on different CPU transactions? (read/write, hit/miss)
  - For a snooping protocol, what happens for relevant transactions sent over the bus? (read or write miss)
    - In a snooping protocol, all state transitions occur per cache block, since the cache block tracks sharing info
  - For a directory protocol, what happens for each message sent to a given processor or directory? (read/write miss, invalidate and/or fetch, data reply)
    - State transitions occur in both cache blocks and directory entries (the directory holds sharing info)
- Additional cache misses: coherence misses
  - True sharing misses: one processor writes to the same location that's later accessed by another processor
  - False sharing misses: one processor writes to the same block (but a different location) later accessed by another processor
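A state transition diagram like the ones referenced above is just a next-state table. Here is a stripped-down MSI-style table for one cache block; it omits the bus transactions and write-backs a full diagram would also generate, so treat it as a study sketch, not a complete protocol.

```python
# States for one cache block: "M" (modified), "S" (shared), "I" (invalid).
# Events: local CPU reads/writes, and transactions snooped from the bus.
NEXT_STATE = {
    ("I", "cpu_read"):       "S",  # read miss: fetch a shared copy
    ("I", "cpu_write"):      "M",  # write miss: fetch exclusive copy
    ("S", "cpu_read"):       "S",  # read hit
    ("S", "cpu_write"):      "M",  # upgrade: invalidate other copies
    ("M", "cpu_read"):       "M",  # read hit
    ("M", "cpu_write"):      "M",  # write hit
    ("S", "bus_read_miss"):  "S",  # another cache reads: still shared
    ("S", "bus_write_miss"): "I",  # another cache writes: invalidate
    ("M", "bus_read_miss"):  "S",  # supply data, downgrade to shared
    ("M", "bus_write_miss"): "I",  # supply data, invalidate
}

def next_state(state, event):
    return NEXT_STATE[(state, event)]
```

Working a sequence of CPU and bus events through this table is good practice for the kinds of transition questions described above.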

Final notes
- Next time: Final Exam (Thursday, 4/23)
  - Must complete course evaluation prior to starting exam
  - Will post link as soon as it's available; print and bring to exam
- Reminders
  - HW 9 due today