Shared Memory Multiprocessors
Symmetric Multiprocessors (SMPs)
- The most prevalent form of parallel architecture
- Provide uniform access to a global physical address space
- Dominate the server market; on their way to dominating the desktop
- "Throughput engines" for multiple sequential jobs
- Also attractive for parallel programming
  - Uniform access via ordinary loads/stores
  - Automatic movement/replication of shared data in local caches
- Can support the message-passing programming model
  - No operating system involvement needed; address translation and buffer protection are provided by hardware
Extended Memory Hierarchies
[Figure: four organizations — shared cache, bus-based shared memory SMP, dancehall (UMA), and distributed memory (NUMA)]
Shared Cache
- An interconnect sits between the processors and a shared first-level cache
- Useful for connecting a small number of processors (2-8) on a board or chip
- Scalability: limited
  - The interconnect is in the critical path of every cache access
  - The cache is required to have tremendous bandwidth
Bus-Based Symmetric Shared Memory
- Small to medium scale (20-30 processors)
- Dominates the parallel machine market
- Scalability: bus bandwidth is the bottleneck
Dancehall
- Symmetry still holds: any processor is uniformly far away from any memory block
- Scalability: limited by the interconnection network
  - The distance between a processor and a memory block is several hops
Distributed Memory
- Asymmetric: processors are closer to their local memory blocks
- Exploits data locality to handle cache misses locally
- Scalability: the most scalable of these hierarchies
Cache Coherence Problem (1)
- Caches play a key role
  - Reduce average data access time
  - Reduce bandwidth demands on the shared interconnect
- Problem with private processor caches
  - Copies of a variable can be present in multiple caches
  - Writes may not become visible to other processors
  - They'll keep accessing the stale value in their caches
  - Frequent and unacceptable!
Cache Coherence Problem (2)
Example (u is initially 5 in memory):
1. P1 reads u and caches the value 5
2. P3 reads u and caches the value 5
3. P3 writes u = 7 (only its own cached copy is updated)
4. P1 reads u and gets the stale value 5 from its cache
5. P2 reads u
Processors see different values for u after event 3
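The event sequence above can be sketched as a toy simulation. This is a hypothetical model (names `read`, `write`, and the per-processor cache dictionaries are illustrative, not from the source): three private write-back caches over one memory, with no coherence protocol.

```python
# Hypothetical model: private write-back caches with no coherence protocol.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(p, addr):
    if addr not in caches[p]:              # miss: fill from memory
        caches[p][addr] = memory[addr]
    return caches[p][addr]

def write(p, addr, value):
    read(p, addr)                          # fetch the block on a write miss
    caches[p][addr] = value                # write-back: memory is NOT updated

read("P1", "u")         # event 1: P1 caches u = 5
read("P3", "u")         # event 2: P3 caches u = 5
write("P3", "u", 7)     # event 3: P3 writes u = 7 in its cache only
print(read("P1", "u"))  # event 4: P1 still sees the stale 5
print(read("P2", "u"))  # event 5: P2 misses and loads the stale 5 from memory
```

After event 3, P3 sees 7 while P1 and P2 see 5 — exactly the incoherence the slide describes.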
Cache Memory: Background
Background
- Block placement
- Block identification
- Block replacement
- Write policy
- Performance
Block Placement
Three categories:
- Direct mapped
- Fully associative
- Set associative
Direct Mapped Cache
- A memory block has only one place to go in the cache:
  (block address) MOD (#cache blocks)
- Can also be called one-way set associative (we will see why soon)
[Figure: in an 8-block cache, memory block 31 maps to cache block 31 MOD 8 = 7]
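The mapping can be written in a couple of lines. A minimal sketch, assuming the 8-block cache from the figure (the function name is illustrative):

```python
NUM_BLOCKS = 8  # 8-block cache, as in the figure

def direct_mapped_slot(block_addr):
    # A memory block can live in exactly one cache block.
    return block_addr % NUM_BLOCKS

print(direct_mapped_slot(0))   # memory block 0  -> cache block 0
print(direct_mapped_slot(31))  # memory block 31 -> cache block 7
```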
Fully Associative Cache
- Any memory block can be placed anywhere in the cache
[Figure: any of memory blocks 0-31 can go into any of cache blocks 0-7]
Set Associative Cache
- In a w-way set associative cache
  - Divide the cache into sets; each set has w blocks
  - A memory block has w places to go in the cache, all within set
    (block address) MOD (#sets)
- Example: 2-way set associative
  - #sets = (#cache blocks)/w = 8/2 = 4
[Figure: memory block 31 maps to set 31 MOD 4 = 3 in an 8-block, 2-way cache]
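The set mapping for the 2-way example can be sketched the same way (the helper name is illustrative):

```python
NUM_BLOCKS, W = 8, 2         # 8 cache blocks, 2-way set associative
NUM_SETS = NUM_BLOCKS // W   # = 4 sets

def set_index(block_addr):
    # The block may occupy either of the W ways within this set.
    return block_addr % NUM_SETS

print(set_index(31))  # memory block 31 -> set 3
```

With w = 1 each set holds a single block and this degenerates to the direct-mapped case — which is why direct mapped is also called one-way set associative.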
Block Identification (1)
- A physical address is divided into a block address and a block offset
- The block address is further divided into a tag and an index
  - Index: log2(#sets) bits
  - Offset: log2(block size) bits
- Layout: | Tag | Index | Offset |
Block Identification (2)
- The index identifies the set
- All stored tags in the set are compared against the address tag
- At most one should match; a match is a hit
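The tag/index/offset split above can be sketched with bit arithmetic. A minimal sketch, assuming a 64-byte block and the 4-set cache from the earlier example (both sizes are illustrative):

```python
import math

BLOCK_SIZE = 64  # bytes (assumed)
NUM_SETS = 4     # as in the 2-way example (assumed)

OFFSET_BITS = int(math.log2(BLOCK_SIZE))  # log2(block size)
INDEX_BITS = int(math.log2(NUM_SETS))     # log2(#sets)

def split_address(paddr):
    offset = paddr & (BLOCK_SIZE - 1)                 # low bits
    index = (paddr >> OFFSET_BITS) & (NUM_SETS - 1)   # middle bits pick the set
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)         # remaining high bits
    return tag, index, offset

print(split_address(0x1234))  # tag/index/offset for one sample address
```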
Block Replacement
- FIFO: first in, first out
- LRU: least recently used
- Random
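LRU is the policy that needs the most bookkeeping, so it is worth sketching. A minimal sketch of one w-way set with LRU replacement (the `LRUSet` class is a hypothetical helper, not from the source):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` blocks and LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # order = recency, oldest first

    def access(self, tag):
        """Return True on hit, False on miss (filling the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: now most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None              # fill the block
        return False

s = LRUSet(2)
s.access("A"); s.access("B"); s.access("A")  # set holds {B, A}; A is newest
s.access("C")   # miss: evicts B (least recently used), not A
print(s.access("A"))  # True  (A survived)
print(s.access("B"))  # False (B was evicted)
```

FIFO differs only in that a hit does not call `move_to_end`, so eviction order is fill order regardless of reuse.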
Write Policy
- Write-through caches
  - Values are updated in main memory immediately
  - Main memory always has up-to-date values
  - Slower performance
  - Easier to implement
- Write-back caches
  - Values are not updated in main memory until the block is evicted
  - Main memory may contain outdated values
  - Faster performance
  - Harder to implement
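The difference in what memory sees can be sketched in a few lines (a hypothetical single-level model; the `store` helper is illustrative):

```python
def store(policy, cache, memory, addr, value):
    cache[addr] = value
    if policy == "write-through":
        memory[addr] = value  # memory updated on every store
    # write-back: memory is updated only when the block is later evicted

mem_wt, mem_wb = {"u": 5}, {"u": 5}
store("write-through", {}, mem_wt, "u", 7)
store("write-back", {}, mem_wb, "u", 7)
print(mem_wt["u"])  # 7: memory always up to date
print(mem_wb["u"])  # 5: memory stale until the dirty block is written back
```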
Cache Performance
- Average access time = hit time + miss rate x miss penalty
- Miss penalty
  - Time taken to access main memory
  - On the order of 100s of times the hit time
- Miss rate
  - Depends on several factors: cache design and program behavior
- If the miss rate is very small, the average access time approaches the hit time
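The formula is a one-liner; the numbers below are illustrative, not from the source:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.02, 100))   # 1-cycle hit, 2% misses, 100-cycle penalty -> 3.0
print(amat(1, 0.001, 100))  # tiny miss rate -> approaches the 1-cycle hit time
```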
More on Cache Memory For more, read Sections 5.1 through 5.3 of J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., Palo Alto, CA, third edition, 2002.