SE-292 High Performance Computing

SE-292 High Performance Computing: Memory Hierarchy
R. Govindarajan (govind@serc)

Reality Check
Question 1: Are real caches built to work on virtual addresses or physical addresses?
Question 2: What about multiple levels in caches?
Question 3: Do modern processors use pipelining of the kind that we studied?

Virtual Memory System
Supports memory management when multiple processes run concurrently (page based, segment based, ...) and ensures protection across processes.
Address Space: the range of memory addresses a process can address; includes program (text), data, heap, and stack. A 32-bit address gives a 4 GB address space.
With VM, the address generated by the processor is a virtual address.

Page-Based Virtual Memory
A process' address space is divided into a number of fixed-size pages. A page is the basic unit of transfer between secondary storage and main memory.
Different processes share the physical memory, so each virtual address must be translated to a physical address.

Virtual Pages to Physical Frame Mapping
[Diagram: the virtual pages of Process 1 through Process k are mapped to frames scattered across main memory.]

Page Mapping Info (Page Table)
A page can be mapped to any frame in main memory. Where is the mapping stored and accessed? In the Page Table. Each process has its own page table!
Address translation: virtual-to-physical address translation.

Address Translation
A virtual address (issued by the processor) is translated to a physical address (used to access memory).
[Diagram: the 32-bit virtual address is split into an 18-bit virtual page number and a 14-bit offset. The virtual page number indexes the page table; each entry holds a physical page number plus Protection (Pr), Dirty (D), and Valid (V) bits. The physical frame number concatenated with the offset forms the physical address.]
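A minimal C sketch of this translation, assuming the 18-bit VPN / 14-bit offset split above; the page_table array and PTE layout are illustrative, not from the slides:

#include <stdint.h>

#define OFFSET_BITS 14                         /* 16 KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

/* Hypothetical page table entry: frame number plus status bits */
typedef struct {
    uint32_t frame;                 /* physical frame number */
    unsigned valid : 1, dirty : 1, prot : 2;
} PTE;

/* One entry per virtual page: 2^18 entries for a 32-bit space */
PTE page_table[1u << (32 - OFFSET_BITS)];

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;    /* upper 18 bits */
    uint32_t offset = vaddr & OFFSET_MASK;     /* lower 14 bits */
    PTE e = page_table[vpn];
    /* A real system would trap to the OS (page fault) if !e.valid */
    return (e.frame << OFFSET_BITS) | offset;
}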

Memory Hierarchy: Secondary to Main Memory
Analogous to the main memory to cache relationship. When the required virtual page is in main memory: page hit. When it is not: page fault.
The page fault penalty is very high (tens of milliseconds) as it involves a disk (secondary storage) access, so the page fault ratio should be very low (10^-4 to 10^-5). Page faults are handled by the OS.

Page Placement
A virtual page can be placed anywhere in physical memory (fully associative). The page table keeps track of the page mapping, with a separate page table for each process.
The page table size is quite large! Assume a 32-bit address space and 16KB page size: number of entries in the page table = 2^32 / 2^14 = 2^18 = 256K; page table size = 256K * 4B = 1 MB = 64 pages! The page table itself may therefore be paged (multi-level page tables).
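A hedged sketch of how a two-level page table cuts this cost, assuming the 18-bit VPN is split into two 9-bit indices (the split and names are illustrative):

#include <stdint.h>
#include <stddef.h>

#define L1_BITS     9
#define L2_BITS     9
#define OFFSET_BITS 14

typedef struct { uint32_t frame; unsigned valid : 1; } PTE;

/* Top-level directory: 512 pointers. Second-level tables are
   allocated lazily, so an unused region of the address space
   costs only a NULL pointer instead of 512 resident PTEs. */
PTE *directory[1u << L1_BITS];

PTE *lookup(uint32_t vaddr) {
    uint32_t vpn = vaddr >> OFFSET_BITS;
    uint32_t i1  = vpn >> L2_BITS;                /* upper 9 bits */
    uint32_t i2  = vpn & ((1u << L2_BITS) - 1);   /* lower 9 bits */
    if (directory[i1] == NULL)
        return NULL;                              /* page fault */
    return &directory[i1][i2];
}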

Page Identification
Use the virtual page number to index into the page table. Accessing the page table causes one extra memory access!
[Diagram: the 18-bit virtual page number selects a page table entry (physical page number plus Pr, V, D bits); the resulting 18-bit physical frame number is combined with the 14-bit offset.]

Page Replacement and Write Policies
Page replacement can use more sophisticated policies than a cache: Least Recently Used, the second-chance algorithm, recency vs. frequency (see the sketch below).
Write policies: write-back, write-allocate.
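A minimal sketch of the second-chance (clock) algorithm named above; the frame count and reference-bit array are assumptions for illustration:

#include <stdbool.h>

#define NFRAMES 64

/* One reference bit per frame, set by hardware on each access */
bool ref_bit[NFRAMES];
int  hand = 0;                  /* the clock hand */

/* Sweep past frames whose reference bit is set, clearing it;
   evict the first frame whose bit is already clear. */
int choose_victim(void) {
    while (ref_bit[hand]) {
        ref_bit[hand] = false;  /* give it a second chance */
        hand = (hand + 1) % NFRAMES;
    }
    int victim = hand;
    hand = (hand + 1) % NFRAMES;
    return victim;
}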

Translation Look-Aside Buffer
Accessing the page table causes one extra memory access! To reduce translation time, use a translation look-aside buffer (TLB), which caches recent address translations.
TLB organization is similar to cache organization (direct mapped, set-associative, or fully associative). The TLB is small (4 - 512 entries). TLBs are important for fast translation.

Translation using TLB
Assume a 128-entry, 4-way set-associative TLB (32 sets, hence a 5-bit index).
[Diagram: the 18-bit virtual page number is split into a 13-bit tag and a 5-bit index. The index selects a TLB set; the tag is compared against the tags of the set's four entries (each holding a physical page number plus Pr, D, V bits). On a TLB hit the matching entry supplies the physical frame number; the 14-bit offset passes through unchanged.]
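A minimal sketch of that lookup, under the same 13-bit tag / 5-bit index split (the TLBEntry layout is illustrative):

#include <stdint.h>
#include <stdbool.h>

#define OFFSET_BITS 14
#define TLB_SETS    32          /* 128 entries / 4 ways */
#define WAYS        4
#define INDEX_BITS  5

typedef struct {
    uint32_t tag;               /* upper 13 bits of the VPN */
    uint32_t frame;             /* physical frame number */
    bool     valid;
} TLBEntry;

TLBEntry tlb[TLB_SETS][WAYS];

/* Returns true on a TLB hit, storing the physical address in
   *paddr; on a miss, hardware or the OS walks the page table. */
bool tlb_lookup(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn   = vaddr >> OFFSET_BITS;
    uint32_t index = vpn & (TLB_SETS - 1);       /* low 5 bits */
    uint32_t tag   = vpn >> INDEX_BITS;          /* high 13 bits */
    for (int w = 0; w < WAYS; w++) {
        TLBEntry *e = &tlb[index][w];
        if (e->valid && e->tag == tag) {
            *paddr = (e->frame << OFFSET_BITS) |
                     (vaddr & ((1u << OFFSET_BITS) - 1));
            return true;                         /* TLB hit */
        }
    }
    return false;                                /* TLB miss */
}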

Q1: Caches and Address Translation
[Diagram: two organizations. Physically addressed cache: the virtual address first goes through the MMU, and the resulting physical address indexes the cache (going to main memory on a cache miss). Virtually addressed cache: the virtual address indexes the cache directly; only on a miss does the MMU translate it to a physical address for main memory.]

Which is less preferable?
Physically addressed cache: hit time is higher (the cache is accessed only after translation).
Virtually addressed cache: data/instructions of different processes with the same virtual address can be in the cache at the same time, so either flush the cache on a context switch, or include the process id as part of each cache directory entry. Synonyms: virtual addresses that translate to the same physical address; more than one copy in the cache can lead to a data consistency problem.

Another possibility: overlapped operation
[Diagram: the MMU translates the virtual address to a physical address while, in parallel, the virtual address indexes into the cache directory; the tag comparison then uses the physical address.] This is a virtually indexed, physically tagged cache.

Addresses and Caches
'Physically addressed cache': physically indexed, physically tagged.
'Virtually addressed cache': virtually indexed, virtually tagged.
Overlapped cache indexing and translation: virtually indexed, physically tagged; or physically indexed, virtually tagged (?).

Physically Indexed, Physically Tagged Cache
16KB page size; 64KB direct mapped cache with 32B block size.
[Diagram: the virtual address (18-bit virtual page number + 14-bit page offset) first passes through the MMU. The resulting physical address (18-bit physical page number + 14-bit page offset) is then split for the cache into a 16-bit cache tag, an 11-bit cache index, and a 5-bit block offset.]
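A small sketch of that field extraction, under the same assumed sizes (64 KB direct mapped, 32 B blocks: 2048 lines, so 11 index bits, 5 offset bits, and 32 - 11 - 5 = 16 tag bits):

#include <stdint.h>

#define BLOCK_BITS 5
#define INDEX_BITS 11

void cache_fields(uint32_t paddr,
                  uint32_t *tag, uint32_t *index, uint32_t *off) {
    *off   = paddr & ((1u << BLOCK_BITS) - 1);
    *index = (paddr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
    *tag   = paddr >> (BLOCK_BITS + INDEX_BITS);
}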

Virtually Indexed, Virtually Tagged Cache
[Diagram: the virtual address supplies everything the cache needs: the 18-bit virtual page number serves as the tag, with an 11-bit cache index and a 5-bit block offset drawn from the low-order bits, so hit/miss is determined without translation. The MMU produces the physical address (18-bit physical page number + 14-bit page offset) only when main memory must be accessed.]

Virtually Indexed, Physically Tagged Cache
[Diagram: the 11-bit cache index and 5-bit block offset are taken from the virtual address, so cache indexing begins immediately; in parallel, the MMU translates the address, and the resulting physical address supplies the 16-bit cache tag for the comparison.]
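One caveat worth noting (an observation, not from the slides): if the index and block-offset bits all fall within the page offset, the virtual index equals the physical index and no aliasing can arise; with the sizes above (11 + 5 = 16 bits vs. a 14-bit page offset), two index bits come from the virtual page number, so potential aliases must be handled (e.g., by page coloring). A tiny check:

#include <stdio.h>

#define PAGE_OFFSET_BITS 14     /* 16 KB pages */
#define BLOCK_BITS       5      /* 32 B blocks */
#define INDEX_BITS       11     /* 64 KB direct mapped */

int main(void) {
    int cache_addr_bits = INDEX_BITS + BLOCK_BITS;
    if (cache_addr_bits <= PAGE_OFFSET_BITS)
        printf("Alias-free: the index fits in the page offset\n");
    else
        printf("%d index bit(s) come from the VPN: aliases possible\n",
               cache_addr_bits - PAGE_OFFSET_BITS);
    return 0;
}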

Multi-Level Caches
A small L1 cache gives a low hit time, and hence a faster CPU cycle time. A large L2 cache reduces the L1 miss penalty, and is typically set-associative to reduce the L2 miss ratio.
Typically, L1 is direct mapped with separate I- and D-cache organizations, while L2 is unified and set-associative. L1 and L2 are on-chip; L3 is also moving on-chip.

Multi-Level Caches
[Diagram: the CPU and MMU access split L1 I- and D-caches (access time 2-4 ns); L1 misses go to a unified L2 cache (access time 16-30 ns); L2 misses go to main memory (access time 100 ns).]

Cache Performance
One-level cache: AMAT = Hit Time_L1 + Miss Rate_L1 x Miss Penalty_L1
Two-level caches: Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2, so
AMAT = Hit Time_L1 + Miss Rate_L1 x (Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2)
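A small sketch plugging illustrative numbers into this formula (the latencies and miss rates are assumptions, not figures from the slides):

#include <stdio.h>

/* Average Memory Access Time for a two-level cache hierarchy */
double amat(double hit_l1, double miss_rate_l1,
            double hit_l2, double miss_rate_l2, double mem_penalty) {
    double miss_penalty_l1 = hit_l2 + miss_rate_l2 * mem_penalty;
    return hit_l1 + miss_rate_l1 * miss_penalty_l1;
}

int main(void) {
    /* Assumed: 1 ns L1 hit, 5% L1 miss rate,
       10 ns L2 hit, 20% L2 miss rate, 100 ns memory access */
    printf("AMAT = %.2f ns\n", amat(1.0, 0.05, 10.0, 0.20, 100.0));
    return 0;   /* prints AMAT = 2.50 ns */
}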

Putting it Together: Alpha 21264
48-bit virtual and 44-bit physical addresses.
64KB 2-way set-associative L1 I-cache with 64-byte blocks (512 sets). The L1 I-cache is virtually indexed and tagged (address translation required only on a miss), with an 8-bit ASID for each process (to avoid a cache flush on context switch).

Alpha 21264 (contd.)
8KB page size (13-bit page offset). 128-entry fully associative TLB. 8MB direct mapped unified L2 cache with 64B block size. Critical word (16B) first. Prefetches the next 64B into the instruction prefetcher.

21264 Data Cache
The L1 data cache uses a virtual address index but a physical address tag, so address translation proceeds along with the cache access. 64KB 2-way set-associative L1 data cache with write-back.

Q2: High Performance Pipelined Processors
Pipelining overlaps the execution of consecutive instructions, improving processor performance. Current processors use more aggressive techniques for more performance. Some exploit Instruction Level Parallelism: often, many consecutive instructions are independent of each other and can be executed in parallel (at the same time).

Instruction Level Parallelism Processors
Challenge: identifying which instructions are independent.
Approach 1: build processor hardware to analyze and keep track of dependences. Superscalar processors: Pentium 4, RS6000, ...
Approach 2: the compiler does the analysis and packs suitable instructions together for parallel execution by the processor. VLIW (very long instruction word) processors: Intel Itanium.

ILP Processors (contd.)
[Diagram: pipeline charts contrasting the three styles. Pipelined: one instruction enters the IF-ID-EX-MEM-WB pipeline per cycle. Superscalar: multiple independent instructions enter the pipeline in the same cycle. VLIW/EPIC: the compiler packs several operations into one long instruction that flows through the pipeline as a unit.]

Multicores
Multiple cores on a single die. Early efforts utilized multiple cores for multiple programs: throughput-oriented rather than speedup-oriented! Can they be used by parallel programs?