Shared Memory Multiprocessors

Symmetric Multiprocessors
- SMPs are the most prevalent form of parallel architecture
- Provide uniform access to a global physical address space
- Dominate the server market; on their way to dominating the desktop
- "Throughput engines" for multiple sequential jobs
- Also attractive for parallel programming
  - Uniform access via ordinary loads/stores
  - Automatic movement/replication of shared data in local caches
- Can also support the message-passing programming model
  - No operating system involvement needed; address translation and buffer protection are provided by hardware

Extended Memory Hierarchies
(Figure omitted.) Four common organizations:
- Shared cache: processors connect through a switch to a shared first-level cache backed by main memory
- Bus-based shared memory (SMP): processors with private caches share a bus to memory and I/O devices
- Dancehall (UMA): processors with private caches reach all memory modules through an interconnection network
- Distributed memory (NUMA): each processor has a local memory module; remote memory is reached over the interconnection network

Shared Cache
- Processors connect through an interconnect to a shared cache
- Useful for connecting a small number of processors (2-8) on a board or chip
- Scalability is limited
  - The interconnect is on the path of every cache access
  - The cache is required to have tremendous bandwidth

Bus-Based Symmetric Shared Memory
- Small to medium scale (20-30 processors)
- Dominates the parallel machine market
- Scalability: bus bandwidth is the bottleneck

Dancehall
- Symmetry still holds: any processor is uniformly far away from any memory block
- Scalability is limited by the interconnection network
  - The distance between a processor and a memory block is several hops

Distributed Memory
- Asymmetric: processors are closer to their local memory blocks
- Exploits data locality to handle cache misses locally
- Scalability: the most scalable of these hierarchies

Cache Coherence Problem (1)
- Caches play a key role
  - Reduce average data access time
  - Reduce bandwidth demands on the shared interconnect
- Problem with private processor caches
  - Copies of a variable can be present in multiple caches
  - Writes may not become visible to other processors
  - They keep accessing the stale value in their caches
  - This is frequent, and unacceptable!

Cache Coherence Problem (2)
Example (u is initially 5 in memory; caches are write-back):
1. P1 reads u and caches u = 5
2. P3 reads u and caches u = 5
3. P3 writes u = 7 (only its own cached copy is updated)
4. P1 reads u: still sees the stale value 5
5. P2 reads u: fetches 5 from memory
Processors see different values for u after event 3.
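As a concrete sketch of this example, the toy model below gives each processor a private write-back cache with no coherence protocol; the class and variable names are my own, not from the slides:

```python
# Hypothetical sketch: private write-back caches with NO coherence protocol,
# illustrating how P1 keeps reading a stale value after P3's write.

class PrivateCache:
    """A trivial private cache: holds local copies, writes back lazily."""
    def __init__(self, memory):
        self.memory = memory   # shared backing store (a dict)
        self.lines = {}        # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]   # miss: fetch from memory
        return self.lines[addr]                    # hit: possibly stale copy

    def write(self, addr, value):
        # Write-back: update only the local copy; memory is not updated yet.
        self.lines[addr] = value

memory = {"u": 5}
p1, p2, p3 = (PrivateCache(memory) for _ in range(3))

p1.read("u")          # event 1: P1 caches u = 5
p3.read("u")          # event 2: P3 caches u = 5
p3.write("u", 7)      # event 3: P3 writes u = 7 (only in its own cache)

print(p1.read("u"))   # event 4: P1 still sees the stale value 5
print(p2.read("u"))   # event 5: P2 misses and fetches 5 from memory
```

With a coherence protocol, event 3 would invalidate or update the copy in P1's cache, so events 4 and 5 would see 7.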

Cache Memory: Background

Background
- Block placement
- Block identification
- Block replacement
- Write policy
- Performance

Block Placement
Three categories:
- Direct mapped
- Fully associative
- Set associative

Direct Mapped Cache
- A memory block has only one place to go in the cache
- Cache block index = (memory block number) MOD (number of cache blocks)
- Example: with an 8-block cache, memory block 31 maps to cache block 31 MOD 8 = 7
- Can also be called one-way set associative (we will see why shortly)

Fully Associative Cache
- Any memory block can be placed anywhere in the cache

Set Associative Cache
In a w-way set associative cache:
- The cache is divided into sets; each set has w blocks
- A memory block has w places to go in the cache
- Set index = (memory block number) MOD (number of sets)
- Example: 2-way set associative, 8 cache blocks: #sets = (#cache blocks)/w = 8/2 = 4
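The three placement policies can be unified in one small sketch; the function name and example figures below are my own. Direct mapped is the w = 1 case, and fully associative is the case where w equals the number of cache blocks (one big set):

```python
# Hypothetical sketch: which cache slots a memory block may occupy
# in a w-way set associative cache.

def candidate_slots(block, num_cache_blocks, w):
    """Return the cache slot indices where `block` may be placed."""
    num_sets = num_cache_blocks // w
    set_index = block % num_sets      # (memory block number) MOD (# sets)
    first = set_index * w             # the set occupies w adjacent slots
    return list(range(first, first + w))

# 8-block cache, 2-way set associative: #sets = 8/2 = 4
print(candidate_slots(12, 8, 2))   # 12 MOD 4 = set 0 -> slots [0, 1]
print(candidate_slots(31, 8, 1))   # direct mapped: 31 MOD 8 -> slot [7]
print(candidate_slots(31, 8, 8))   # fully associative: any of the 8 slots
```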

Block Identification (1)
A physical address is divided into fields: Tag | Index | Offset
- Block address = the Tag and Index fields together
- Index width = log(#sets)
- Offset width = log(block size)
(Base-2 logarithms.)

Block Identification (2)
- The Index identifies the set
- All stored tags in the set are compared against the address Tag
- Only one should match
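A minimal sketch of the tag/index/offset split, assuming power-of-two set counts and block sizes; the function name and the example address are my own:

```python
# Hypothetical sketch: decompose a physical address into (tag, index, offset)
# for a cache with power-of-two #sets and block size.

def split_address(addr, num_sets, block_size):
    offset_bits = block_size.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(# sets)
    offset = addr & (block_size - 1)            # low bits: byte within block
    index = (addr >> offset_bits) & (num_sets - 1)  # middle bits: set number
    tag = addr >> (offset_bits + index_bits)    # remaining high bits
    return tag, index, offset

# 4 sets (2 index bits), 16-byte blocks (4 offset bits):
print(split_address(0x12F4, 4, 16))   # -> (75, 3, 4)
```

The lookup then reads set 3 and compares the stored tags against 75; at most one way should match.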

Block Replacement
- FIFO: first in, first out
- LRU: least recently used
- Random
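As an illustration of one of these policies, LRU within a single set can be sketched with an ordered map; the class name and the access sequence below are my own:

```python
# Hypothetical sketch: LRU replacement within one cache set, keeping
# resident blocks in recency order (oldest first).

from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> present, oldest first

    def access(self, tag):
        """Access `tag`; return the evicted tag, or None."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # hit: mark most recently used
            return None
        evicted = None
        if len(self.blocks) == self.ways:
            evicted, _ = self.blocks.popitem(last=False)  # evict LRU block
        self.blocks[tag] = True
        return evicted

s = LRUSet(ways=2)
print(s.access("A"))   # miss, set not full -> None
print(s.access("B"))   # miss, set not full -> None
print(s.access("A"))   # hit -> None; A becomes most recently used
print(s.access("C"))   # miss, set full -> evicts "B" (the LRU block)
```

FIFO differs only in that a hit would not call `move_to_end`, so residency order, not recency, decides the victim.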

Write Policy
Write-through caches:
- Values are updated in main memory immediately
- Main memory always has up-to-date values
- Leads to slower performance
- Easier to implement
Write-back caches:
- Values are not immediately updated in main memory
- Main memory may contain outdated values
- Leads to faster performance
- Harder to implement
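The performance difference comes from main-memory traffic, which this deliberately simplified sketch counts for the same write stream; it assumes the write-back cache holds every dirty block until a single final write-back, and all names are my own:

```python
# Hypothetical sketch: main-memory updates under the two write policies
# for one stream of writes (no evictions until the end).

def memory_writes(writes, policy):
    """writes: list of addresses written; returns # of main-memory updates."""
    if policy == "write-through":
        return len(writes)        # every store also updates memory
    # write-back: memory is updated once per dirty block, at write-back time
    return len(set(writes))

stream = ["u", "u", "v", "u"]
print(memory_writes(stream, "write-through"))  # 4 updates, one per store
print(memory_writes(stream, "write-back"))     # 2 updates: blocks u and v
```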

Cache Performance
- Average access time = hit time + miss rate × miss penalty
- Miss penalty: the time taken to access main memory; on the order of 100x the hit time
- Miss rate: depends on several factors, including the cache design and the program
- If the miss rate is small enough, the average access time approaches the hit time
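Plugging illustrative numbers into the formula (the figures are my own, not from the slides, chosen so the miss penalty is about 100x the hit time):

```python
# Hypothetical sketch: average memory access time (AMAT).

def average_access_time(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# hit time 1 ns, miss penalty 100 ns
print(average_access_time(1.0, 0.02, 100.0))   # 1 + 0.02*100 = 3.0 ns
print(average_access_time(1.0, 0.001, 100.0))  # 1.1 ns, near the hit time
```

Even a 2% miss rate triples the average access time here, which is why miss rate dominates cache design.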

More on Cache Memory For more, read Sections 5.1 through 5.3 of J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., Palo Alto, CA, third edition, 2002.