Distributed Systems CS 15-440 Caching – Part IV Lecture 17, November 15, 2017 Mohammad Hammoud

Today… Last Lecture: Cache Consistency. Today’s Lecture: Replacement Policies. Announcements: Project 4 is out; it is due on November 27. The deadline for PS5 is extended to November 18, by midnight. Quiz II is on November 16, during the recitation time.

Key Questions What data should be cached and when? (Fetch Policy) How can updates be made visible everywhere? (Consistency or Update Propagation Policy) What data should be evicted to free up space? (Cache Replacement Policy)

Working Sets Given a time interval T, WorkingSet(T) is defined as the set of distinct data objects accessed during T. It is a function of the width of T. Its size (referred to as the working set size) is what ultimately matters: comparing it against the cache size captures whether the cache is adequate for the program’s behavior. What happens if a client process performs repetitive accesses to some data, with a working set size that is larger than the underlying cache?
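The working set size can be measured directly from an access trace. The sketch below is illustrative only (Python is used merely for concreteness, and the trace is made up): it slides a window of width T over the trace and reports the number of distinct objects inside the window at each point in time.

```python
from collections import Counter

def working_set_sizes(trace, T):
    """For each position in the trace, return the number of distinct
    objects accessed within the window of the last T accesses."""
    window = Counter()          # object -> occurrences inside the current window
    sizes = []
    for i, obj in enumerate(trace):
        window[obj] += 1
        if i >= T:              # slide the window: the access at i - T falls out
            old = trace[i - T]
            window[old] -= 1
            if window[old] == 0:
                del window[old]
        sizes.append(len(window))   # |WorkingSet(T)| at this instant
    return sizes

# A made-up trace that loops over four pages.
trace = [0, 1, 2, 3] * 5
print(working_set_sizes(trace, T=3))   # window narrower than the loop: size stays at 3
print(working_set_sizes(trace, T=8))   # window wider than the loop: size saturates at 4
```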

The LRU Policy: Sequential Flooding To answer this question, assume: three pages, A, B, and C, as fixed-size caching units; an access pattern A, B, C, A, B, C, and so on; and a cache pool that consists of only two frames (i.e., equal-sized page containers). Under LRU, every access is a page fault: A and B fill the two frames, then C evicts A, A evicts B, B evicts C, and so on. Although the access pattern exhibits temporal locality, no locality is exploited! This phenomenon is known as “sequential flooding”. For this access pattern, MRU works better!
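The flooding effect is easy to reproduce. The following sketch (illustrative Python; the pattern and the two-frame cache come straight from the slide) replays A, B, C, A, B, C, ... against a two-frame cache under LRU and under MRU, confirming that LRU misses on every access while MRU manages to hit.

```python
def simulate(trace, frames, policy):
    """Count hits and misses for a tiny cache; policy is 'LRU' or 'MRU'.
    The cache is a list ordered from least to most recently used."""
    cache, hits, misses = [], 0, 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.remove(page)                 # refresh recency
        else:
            misses += 1
            if len(cache) == frames:           # cache full: pick a victim
                victim = cache[0] if policy == "LRU" else cache[-1]
                cache.remove(victim)
        cache.append(page)                     # the page becomes the most recently used
    return hits, misses

pattern = ["A", "B", "C"] * 4                  # A, B, C, A, B, C, ...
print("LRU:", simulate(pattern, 2, "LRU"))     # (0, 12): every single access is a fault
print("MRU:", simulate(pattern, 2, "MRU"))     # (5, 7): MRU clearly does better here
```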

Types of Accesses Why did LRU not perform well with this access pattern, even though it is repeatable? Because the cache size was dwarfed by the working set size. As the time interval T is increased, how would the working set size change? Assuming: Sequential accesses (e.g., unrepeated full scans): it will monotonically increase, and the working set becomes very cache-unfriendly. Regular accesses, which demonstrate typical good locality: it will increase non-monotonically (e.g., grow and shrink across program phases, not necessarily at equal widths), and the working set will be cache-friendly only if the cache size is not dwarfed by the working set size. Random accesses, which demonstrate no or very little locality (e.g., accesses to a hash table): the working set will be cache-unfriendly if its size is much larger than the cache size.
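These three behaviors can be contrasted numerically by measuring the number of distinct objects in the first T accesses of a trace as T grows. The traces below are invented for illustration (again in Python), but they mirror the three cases: a one-time scan, a repetitive loop, and random hash-table-like probes.

```python
import random

def distinct_in_prefix(trace, T):
    """|WorkingSet(T)| measured over the first T accesses of the trace."""
    return len(set(trace[:T]))

sequential = list(range(1000))                               # one full scan, never repeated
looping    = list(range(10)) * 100                           # repetitive accesses with good locality
random.seed(0)
randomized = [random.randrange(1000) for _ in range(1000)]   # e.g., probes into a hash table

for T in (10, 100, 1000):
    print(T,
          distinct_in_prefix(sequential, T),   # keeps growing with T: cache-unfriendly
          distinct_in_prefix(looping, T),      # saturates at 10: friendly if the cache holds >= 10 pages
          distinct_in_prefix(randomized, T))   # grows roughly with T: little locality to exploit
```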

Example Reference trace: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1. Three LRU-managed caches are replayed side by side over this trace: Cache X (3 frames), Cache Y (4 frames), and Cache Z (5 frames). The slides step through the first eleven references one at a time, maintaining the LRU chain (most recently used page first) together with each cache's frame contents and hit/miss counters; the outcome of each step is summarized below.

Ref:            7     0     1     2     0     3     0     4     2     3     0
Cache X (3):  miss  miss  miss  miss  hit   miss  hit   miss  miss  miss  miss
Cache Y (4):  miss  miss  miss  miss  hit   miss  hit   miss  hit   hit   hit
Cache Z (5):  miss  miss  miss  miss  hit   miss  hit   miss  hit   hit   hit

After these eleven references: Cache X has 2 hits and 9 misses; Cache Y has 5 hits and 6 misses; Cache Z has 5 hits and 6 misses.
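The walkthrough can be verified mechanically. Below is a minimal LRU simulator (Python, for illustration only); replaying the first eleven references of the trace reproduces the counts above, and the full trace can be replayed the same way.

```python
from collections import OrderedDict

def lru_counts(trace, frames):
    """Simulate an LRU-managed cache with the given number of frames."""
    cache = OrderedDict()                  # keys ordered from LRU (first) to MRU (last)
    hits = misses = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # refresh recency
        else:
            misses += 1
            if len(cache) == frames:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = None
    return hits, misses

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for frames in (3, 4, 5):
    print(frames, "frames:", lru_counts(trace[:11], frames))   # (2, 9), (5, 6), (5, 6)
```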

Observation: The Stack Property Adding cache space never hurts, but it may or may not help. This is referred to as the “Stack Property”. LRU has the stack property, but not all replacement policies have it; FIFO, for example, does not.
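A classic counterexample for FIFO is Belady's anomaly: on the trace below, a FIFO-managed cache with four frames incurs more faults than one with three frames, so adding space actually hurts. A self-contained sketch (Python, illustrative):

```python
from collections import deque

def fifo_faults(trace, frames):
    """Count page faults for a FIFO-managed cache of the given size."""
    cache, queue, faults = set(), deque(), 0
    for page in trace:
        if page not in cache:
            faults += 1
            if len(cache) == frames:
                cache.remove(queue.popleft())   # evict the page that entered first
            cache.add(page)
            queue.append(page)
    return faults

belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # Belady's classic trace
print(fifo_faults(belady, 3))   # 9 faults
print(fifo_faults(belady, 4))   # 10 faults: more frames, yet more faults
```

LRU can never exhibit this anomaly: the contents of an LRU cache of size k are always a subset of the contents of an LRU cache of size k+1 on the same trace, which is exactly the stack property.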

Competing Workloads What happens if multiple workloads run in parallel, sharing the same cache? Thrashing (or interference) will arise, potentially polluting the cache, especially if one workload is a one-time-only scan. How can we isolate the effects of such interference? Apply static (or fixed) partitioning, wherein the cache is sliced into multiple fixed partitions. This requires a-priori knowledge of the workloads; with full knowledge in advance, OPT could even be applied! Or apply dynamic partitioning, wherein the partitions are adaptively resized based on the workloads’ evolving access patterns. This requires monitoring and tracking the characteristics of the workloads.
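To see why isolation matters, consider a toy scenario (all numbers and workload shapes below are made up): one workload loops over four pages while another performs a one-time scan. Sharing a six-frame LRU cache lets the scan flush the looping pages, whereas a static 4 + 2 split preserves the loop's hits.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny LRU cache, used only to contrast shared versus partitioned space."""
    def __init__(self, frames):
        self.frames, self.data, self.hits = frames, OrderedDict(), 0
    def access(self, page):
        if page in self.data:
            self.hits += 1
            self.data.move_to_end(page)
        else:
            if len(self.data) == self.frames:
                self.data.popitem(last=False)
            self.data[page] = None

w1 = ["w1-%d" % (i % 4) for i in range(100)]   # loops over 4 pages: good locality
w2 = ["w2-%d" % i for i in range(100)]         # one-time scan over 100 distinct pages

shared = LRUCache(6)                           # both workloads share six frames
for p1, p2 in zip(w1, w2):
    shared.access(p1)
    shared.access(p2)

part1, part2 = LRUCache(4), LRUCache(2)        # static partitioning: 4 frames + 2 frames
for p1, p2 in zip(w1, w2):
    part1.access(p1)
    part2.access(p2)

print("shared cache hits:     ", shared.hits)              # 0: the scan floods the loop's pages
print("partitioned cache hits:", part1.hits + part2.hits)  # 96: the loop keeps its frames
```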

Adaptive Replacement Cache As an example of a cache that applies dynamic partitioning, we will study: Adaptive Replacement Cache (ARC)

ARC Structure ARC splits the cache into two LRU lists: L1 = Top Part (T1) + Bottom Part (B1), and L2 = Top Part (T2) + Bottom Part (B2).

ARC Structure Content: T1 and T2 contain cached objects plus metadata; B1 and B2 contain only history (e.g., the keys of recently evicted objects). Together, the four lists remember exactly twice the number of pages that fit in the cache! This can greatly help in discovering patterns among pages that were evicted. (T1: Data + Metadata, B1: Metadata, forming L1; T2: Data + Metadata, B2: Metadata, forming L2.)

ARC Structure Sizes: Size(T1) + Size(T2) = c pages, where c is the cache capacity. Size(T1) = p pages and Size(T2) = c - p pages, where p is an adaptive target. B1 and B2 together remember c recently evicted pages. (L1 = [T1 of size p | B1]; L2 = [T2 of size c - p | B2].)

ARC Policy Rules: L1 hosts pages that have been seen only once (L1 captures recency); L2 hosts pages that have been seen at least twice (L2 captures frequency). Key Idea: Adaptively (in response to observed workload characteristics) decide how many pages to keep in L1 versus L2. When recency interferes with frequency, ARC detects that and acts in a way that preserves temporal locality at L2.

ARC Policy: Details For a requested page Q, one of four cases will happen. Case I: a hit in T1 or T2. If the hit is at T1, evict the LRU page in T2 and keep a record of it in B2; move Q to the MRU position of T2. Case II: a miss in T1 ∪ T2, but a hit in B1. Remove Q’s record from B1 and increase T1’s size by increasing p (this automatically decreases T2’s size, since Size(T2) = c - p); evict the LRU page in T2 and keep a record of it in B2; fetch Q and place it at the MRU position in T2.

ARC Policy: Details (continued) Case III: a miss in T1 ∪ T2, but a hit in B2. Remove Q’s record from B2 and increase T2’s size by decreasing p (this automatically decreases T1’s size, since Size(T1) = p); evict the LRU page in T2 and keep a record of it in B2; fetch Q and place it at the MRU position in T2. Case IV: a miss in T1 ∪ B1 ∪ T2 ∪ B2. Evict the LRU page in T1 and keep a record of it in B1; fetch Q and place it at the MRU position in T1.
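Putting the four cases together, here is a compact sketch that loosely follows the cases as they are described on these slides; it is not the published ARC algorithm of Megiddo and Modha. In particular: p is adapted by a fixed step of 1 (the real algorithm uses adaptive step sizes); on a hit no page is evicted, since no new page enters the cache, so the sketch simply promotes the page even though the slide's Case I mentions an eviction from T2; evictions happen only when the cache is actually full; and each history list is capped at c entries, an assumption the slides do not spell out.

```python
from collections import OrderedDict

class SimplifiedARC:
    """A sketch of the four ARC cases described on these slides (not the full algorithm)."""

    def __init__(self, c):
        self.c = c                    # cache capacity: |T1| + |T2| <= c
        self.p = 0                    # adaptive target size for T1 (tracked for illustration;
                                      # the full algorithm uses it to choose the victim list)
        self.t1, self.b1 = OrderedDict(), OrderedDict()   # seen once + its eviction history
        self.t2, self.b2 = OrderedDict(), OrderedDict()   # seen twice or more + its eviction history

    def _evict(self, prefer_t2):
        """Evict one LRU page into its history list, but only if the cache is full.
        Falling back to the other list when the preferred one is empty is an assumption."""
        if len(self.t1) + len(self.t2) < self.c:
            return
        t, b = (self.t2, self.b2) if (prefer_t2 and self.t2) or not self.t1 else (self.t1, self.b1)
        victim, _ = t.popitem(last=False)
        b[victim] = None
        if len(b) > self.c:           # assumption: keep at most c history entries per list
            b.popitem(last=False)

    def access(self, q):
        if q in self.t1 or q in self.t2:            # Case I: hit in T1 or T2
            (self.t1 if q in self.t1 else self.t2).pop(q)
            self.t2[q] = None                        # promote Q to the MRU position of T2
        elif q in self.b1:                           # Case II: miss, but remembered in B1
            del self.b1[q]
            self.p = min(self.p + 1, self.c)         # recency is paying off: grow T1's target
            self._evict(prefer_t2=True)              # slides: evict from T2 into B2
            self.t2[q] = None
        elif q in self.b2:                           # Case III: miss, but remembered in B2
            del self.b2[q]
            self.p = max(self.p - 1, 0)              # frequency is paying off: grow T2's target
            self._evict(prefer_t2=True)
            self.t2[q] = None
        else:                                        # Case IV: a completely new page
            self._evict(prefer_t2=False)             # slides: evict from T1 into B1
            self.t1[q] = None                        # new pages enter at the MRU of T1
```

With this structure, a one-time scan only churns T1 and B1: scanned pages enter T1, eventually fall into B1, and are never requested again, leaving T2 untouched; this is the scan resistance discussed next.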

Scan-Resistance of ARC Observe that a new page is always placed at the MRU position in T1. From there, it gradually makes its way toward the LRU position in T1, unless it is used once again before eviction; but this will not happen with one-time-only scans. Hence, T2 will not be impacted by the scan!

Scan-Resistance of ARC This makes ARC scan-resistant, whereby T2 will: be effectively isolated; grow at the expense of T1, since more hits will occur at B2 (which causes an increase in T2’s size); and effectively handle temporal locality, even with mixed workloads (i.e., workloads with and without locality running concurrently).

Next Class Server-Side Replication