1 PATH: Page Access Tracking Hardware to Improve Memory Management. Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown, University of Toronto.


1 PATH: Page Access Tracking Hardware to Improve Memory Management
Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown
University of Toronto, Canada

2 Page Access Tracking Challenge

- Storage management research: many sophisticated algorithms, most of which require accurate knowledge of the memory access trace. Adopted mostly for file systems or databases; not straightforward for virtual memory.
- Problem: limited page access tracking makes it hard to measure either reuse distance or temporal locality.
- Conventional access tracking mechanisms:
  - Monitoring page faults: most page accesses are missed.
  - Scanning page table bits: high scanning overhead forces a low scanning frequency.

3 Page Access Tracking Challenge (cont'd)

- Access tracking with performance counters:
  - Statistical data sampling: favours only hot pages; hard to track reuse distance or temporal locality.
  - Recording TLB misses: high overhead, since TLBs are small (misses are very frequent) and TLB miss handling is performance-critical.
- Hardware approach [Zhou et al., ASPLOS'04]:
  + Effective for its purpose (but inflexible).
  - Impractical hardware resource requirements: ~1 MB of hardware buffer per 1 GB of physical memory.
- Software approach [Yang et al., OSDI'06]:
  - Divides pages into active and inactive sets, page-protecting members of the inactive set.
  - Overhead can still be too high.

4 Page Access Tracking in Software

Performance of adaptive page replacement for FFT vs. runtime overhead of page access tracking in software: even with a large active set, which yields poor replacement performance, the overhead is 10%; achieving acceptable performance costs 90% overhead.

5 Page Access Tracking Hardware (PATH)

Advantages:
- Extra hardware resources required are small (around 10 KB).
- Off the common path.
- Scalable: does not grow with physical memory.

[Diagram: on a TLB miss, the CPU core looks up the page tables, and the virtual address is recorded in the Page Access Buffer; when the buffer overflows, an interrupt flushes it to the Page Access Log.]
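The logging path above can be modelled in software. This is a minimal sketch, not the actual hardware: the class name, the fixed 4 KB page size, and the callback-as-interrupt convention are assumptions for illustration.

```python
class PageAccessBuffer:
    """Software model of PATH's logging path: TLB misses deposit
    virtual page numbers in a small buffer; when the buffer fills,
    an "overflow interrupt" hands the batch to an OS-level handler."""

    def __init__(self, capacity, overflow_handler):
        self.capacity = capacity
        self.handler = overflow_handler  # stands in for the overflow interrupt
        self.buf = []

    def on_tlb_miss(self, vaddr, page_size=4096):
        # Record the virtual page number of the missed access.
        self.buf.append(vaddr // page_size)
        if len(self.buf) >= self.capacity:
            # Buffer full: raise the "interrupt" and start a fresh buffer.
            self.handler(self.buf)
            self.buf = []
```

Because the handler only runs once per buffer-full of misses, the common-case cost is a single append, which is the point of keeping the buffer off the critical path.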

6 Information Provided by PATH

- Raw form: the page access log itself.
- Abstraction: precise LRU stack.
- Abstraction: miss rate curve (MRC).

7 Basic Abstraction: LRU Stack

- Accessed and updated for each entry in the Page Access Log.
- Implementation:
  - Lookup: a page-table-like structure gives O(1) lookup time.
  - Update: a doubly linked list; only a few pointers are updated per page access.
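The two structures on this slide can be sketched together: a hash map standing in for the page-table-like lookup structure, and a doubly linked list whose head is the MRU position. The class and method names are illustrative, not from the paper.

```python
class Node:
    __slots__ = ("page", "prev", "next")

    def __init__(self, page):
        self.page = page
        self.prev = self.next = None


class LRUStack:
    def __init__(self):
        self.lookup = {}              # page -> Node: O(1) lookup
        self.head = self.tail = None  # head = MRU end, tail = LRU end

    def access(self, page):
        # Called once per Page Access Log entry: move (or insert)
        # the page at the MRU position by updating a few pointers.
        node = self.lookup.get(page)
        if node is not None:
            self._unlink(node)
        else:
            node = Node(page)
            self.lookup[page] = node
        self._push_front(node)

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def order(self):
        # Current stack from MRU to LRU (for inspection only).
        out, n = [], self.head
        while n:
            out.append(n.page)
            n = n.next
        return out
```

Each access touches at most a handful of pointers, which is what makes maintaining a precise LRU stack affordable once the accesses themselves come cheaply from PATH.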

8 Basic Abstraction: Miss Rate Curve (MRC)

- Basic info: the number of misses for a given memory size in a period of time.
- Basic use: estimating the "memory needs" of an application.

9 Computing MRC Online

Mattson's stack algorithm, for LRU:
- Memory sizes < LRU distance: miss.
- Memory sizes >= LRU distance: hit.

[Diagram: a page access and its distance from the MRU and LRU ends of the stack.]
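The rule above turns a trace into a full MRC in one pass: each reuse contributes a hit at every memory size at or above its LRU stack distance, and a miss below it. A simple (list-based, hence O(n) per access rather than the O(1) lookup of the previous slide) sketch:

```python
def mrc(trace, max_size):
    """Mattson's stack algorithm for LRU: return, for each memory
    size m = 1..max_size, the number of misses the trace incurs."""
    stack = []      # index 0 = MRU position
    dist_hist = {}  # LRU stack distance -> reuse count
    cold = 0        # first-touch accesses miss at every size
    for page in trace:
        if page in stack:
            d = stack.index(page) + 1  # 1-based LRU distance
            dist_hist[d] = dist_hist.get(d, 0) + 1
            stack.remove(page)
        else:
            cold += 1
        stack.insert(0, page)          # page becomes MRU
    # misses(m) = cold misses + reuses whose distance exceeds m
    return [cold + sum(c for d, c in dist_hist.items() if d > m)
            for m in range(1, max_size + 1)]
```

For the trace 1, 2, 1, 2, 3, 1 this gives 6 misses with one page, 4 with two, and 3 (the cold misses) with three, i.e. the curve flattens once memory covers the working set.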

10 Runtime Overhead Tradeoff

The larger the Page Access Buffer (active set), the more page accesses are filtered:
+ less run-time overhead,
- but a less accurate page access trace.

11 Runtime Overhead, Example: FFT [plot: overhead vs. number of active set entries]

12 Runtime Overhead, Example: LU-non-contiguous [plot: overhead vs. number of active set entries]

13 Runtime Overhead Summary

- Overall, a 2K-entry Page Access Buffer seems to be the best point in the tradeoff between performance and runtime overhead.
- PATH's overhead is less than 6% across a wide variety of applications, and negligible in most cases.

14 Case 1: Adaptive Page Replacement

- Region-based page replacement: use different replacement policies for different regions of the virtual address space. Rationale: each region is likely to contain a data structure with a fairly stable access pattern.
- Low Inter-reference Recency Set (LIRS): handles sequential and looping patterns, but requires tracking page accesses. Originally developed for file system caching; easily enabled by the PATH-generated information.

15 Region-based Replacement

Using the MRC of each region to compare replacement policies:

16 Region-based Replacement (cont'd)

Dividing memory among regions: minimize the total miss rate by giving memory to the regions with the greatest "benefit per page".
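One natural reading of "benefit per page" is a greedy marginal-gain allocation over the per-region MRCs. The sketch below is an assumption about how such a divider could work, not the paper's algorithm; `mrcs[r][m]` is assumed to hold region r's miss count when given m pages.

```python
def divide_memory(mrcs, total_pages):
    """Greedy benefit-per-page sketch: hand out pages one at a time,
    each to the region whose miss count drops the most from it."""
    alloc = [0] * len(mrcs)
    for _ in range(total_pages):
        best, gain = None, 0
        for r, curve in enumerate(mrcs):
            a = alloc[r]
            if a + 1 < len(curve):
                g = curve[a] - curve[a + 1]  # misses saved by one more page
                if g > gain:
                    best, gain = r, g
        if best is None:  # no region benefits from more memory
            break
        alloc[best] += 1
    return alloc
```

Because MRCs are typically non-increasing, this greedy rule gives each page to the region with the steepest remaining curve, which is exactly the "more benefit per page" criterion on the slide.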

17 Simulation Results LU-contiguous (SPLASH2)

18 Simulation Results BT (NAS Benchmark)

19 Case 2: Prefetching

- Spatial locality-based: prefetch pages spatially adjacent to the faulted page.
  + Simple and easy to implement; effective in many cases.
  - Major drawback: oblivious to non-spatial access patterns.
- Temporal locality-based: prefetch pages that are regularly accessed together, using PATH to track the temporal locality of pages.

20 Temporal Locality-based Prefetching

- Page Proximity Graph (PPG): each page is a node; there is an edge from p to q if q is regularly accessed shortly after p (temporal locality).
- PPG update: add page q to p's proximity set if q repeatedly appears in the LRU stack in close proximity to p.
- Basic prefetching scheme: breadth-first traversal starting from the faulted page.
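The three bullets above fit in a small sketch: a sliding window approximates "close proximity in the LRU stack", an edge is promoted once a pair co-occurs often enough, and a fault triggers a breadth-first walk. The window size, repeat threshold, and prefetch budget are illustrative parameters, not values from the paper.

```python
from collections import defaultdict, deque

class ProximityPrefetcher:
    """Sketch of a Page Proximity Graph: edge p -> q appears once q
    is seen within `window` accesses after p at least `repeats` times;
    prefetch candidates come from a BFS over the graph."""

    def __init__(self, window=2, repeats=2):
        self.window = window
        self.repeats = repeats
        self.counts = defaultdict(int)   # (p, q) -> co-occurrence count
        self.edges = defaultdict(set)    # p -> proximity set of p
        self.recent = deque(maxlen=window)

    def record(self, page):
        # PPG update: count co-occurrences within the recent window
        # and promote a pair to an edge once it repeats enough.
        for p in self.recent:
            if p != page:
                self.counts[(p, page)] += 1
                if self.counts[(p, page)] >= self.repeats:
                    self.edges[p].add(page)
        self.recent.append(page)

    def prefetch(self, fault_page, budget=4):
        # Breadth-first traversal from the faulted page.
        out, seen, q = [], {fault_page}, deque([fault_page])
        while q and len(out) < budget:
            for nxt in sorted(self.edges[q.popleft()]):
                if nxt not in seen:
                    seen.add(nxt)
                    out.append(nxt)
                    q.append(nxt)
        return out[:budget]
```

After seeing the loop 1, 2, 3 twice, a fault on page 1 would yield 2 and 3 as prefetch candidates, which is the kind of non-spatial pattern the spatial scheme on the previous slide cannot capture.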

21 Prefetching LU non-contiguous (SPLASH2)

22 Conclusions

- Page Access Tracking Hardware: small (10 KB), low-overhead, and generic.
- Cases studied: adaptive page replacement, process memory allocation (see paper), and prefetching.
- Significant performance improvement can be achieved by tracking page accesses.

23 Future Directions

- Other case studies: NUMA page placement, super-page management.
- Per-thread page access tracking: augmenting page accesses with thread info.
- Multiprocessor issues: combining traces collected on multiple CPUs.

24 Questions