Lightweight Recoverable Virtual Memory
ACM SIGOPS, 1993.
M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere, James J. Kistler
Presented by JongMyoung Kim, HyoSeok Lee
Lightweight Recoverable Virtual Memory
- VM subsystem
- Paging problem
- Fault-tolerant application
ACID properties for transactions
A transaction is a sequence of operations.
- Atomicity: all operations succeed or all fail
- Consistency: data is in a legal state before and after the transaction
- Isolation: state changes are not visible until the transaction commits
- Durability: if the transaction succeeds, its effects persist
Recoverable Virtual Memory
- Software library that provides transactional properties for virtual memory
- In use since the early 1990s
- Gives applications flexibility in how they use transactions
RVM Operations
Recoverable Virtual Memory: Minimalism
Design challenge
- System level: functionality, performance
- Software engineering level: usability, maintenance
Lightweight
- Ease of learning and use
- Minimal impact upon system resource usage
Camelot
- Predecessor of RVM
- Supports local and distributed nested transactions
- Flexibility in logging, synchronization, and commitment
- External page management: dirty recoverable pages are not paged out until commit
- Relies on Mach IPC
Lessons from Camelot
Decreased scalability compared to AFS
- High CPU utilization, ~19% due to Camelot
- Paging and context switching overheads
Additional programming constraints
- All clients must be descendants of the Disk Manager task
- Debugging is more difficult
+ Starting a Coda server under a debugger is complex
- Clients must use kernel threads
+ Kernel thread context switches are more expensive
Increased complexity
- Code size, complexity, and tight dependence on Mach
- Maintenance and porting become difficult
- Hard to decide whether a problem lies in Camelot or in Mach
LRVM Design
Simplification
- Simple layered approach
Elimination
- No nested or distributed transactions
- No concurrency control
+ Serializability
+ Deadlocks, starvation …
- No handling of media failure
LRVM Design Rationale
Portability
- Operating system dependence
+ Uses only a small, widely supported, Unix subset of Mach system calls
+ No external page management
- External data segment
+ Small fraction of disk space, files …
+ RVM’s backing store for a recoverable region
+ Independent of the region’s VM swap space
Just a library
- Applications and RVM need to trust each other
- Each application has its own log, not one per system
Segments and Regions
- External data segment
- The contents of a region are copied from the external data segment into VM when the region is mapped
RVM Primitives
Log file
- Flush(): forces the log records of committed no-flush transactions to disk
- Truncate(): reflects the changes of all committed transactions in the log to the external data segment
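The relationship between the primitives can be illustrated with a toy model. This is only a sketch, not the RVM library: `ToyRVM` and all of its method names are hypothetical, recoverable memory is a Python bytearray, and the "disk" log is a plain list. Having `truncate()` flush first is also an assumption made to keep the model small.

```python
class ToyRVM:
    """Toy in-memory model of RVM-style primitives (hypothetical, not the real API)."""

    def __init__(self, size):
        self.segment = bytearray(size)   # stands in for the external data segment
        self.region = bytearray(size)    # mapped copy in virtual memory
        self.disk_log = []               # log records already forced to "disk"
        self.spooled = []                # committed no-flush records still in memory
        self.active = {}                 # tid -> list of (offset, old_bytes)
        self.next_tid = 0

    def begin_transaction(self):
        tid = self.next_tid
        self.next_tid += 1
        self.active[tid] = []
        return tid

    def set_range(self, tid, offset, length):
        # Save the old value so the transaction can be aborted later.
        old = bytes(self.region[offset:offset + length])
        self.active[tid].append((offset, old))

    def abort_transaction(self, tid):
        # Restore saved old values, newest first.
        for offset, old in reversed(self.active.pop(tid)):
            self.region[offset:offset + len(old)] = old

    def end_transaction(self, tid, flush=True):
        # Record the new value of every declared range.
        records = [(off, bytes(self.region[off:off + len(old)]))
                   for off, old in self.active.pop(tid)]
        self.spooled.extend(records)
        if flush:
            self.flush()

    def flush(self):
        # Force all spooled (committed, no-flush) records to the log.
        self.disk_log.extend(self.spooled)
        self.spooled.clear()

    def truncate(self):
        # Reflect all committed changes to the segment, then free the log.
        self.flush()
        for offset, new in self.disk_log:
            self.segment[offset:offset + len(new)] = new
        self.disk_log.clear()
```

A no-flush commit leaves `disk_log` empty until `flush()` is called, which is the window in which a crash would lose the transaction.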
Log Management
Format of a typical log record
Crash recovery
- Traverse the log (from tail to head)
- Build an in-memory representation of the latest committed changes
- Apply the modifications to the external data segment on disk
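The recovery steps above can be sketched as follows. The record layout (a dict with `offset`, `data`, and `committed` keys) and the function name `recover` are assumptions for illustration; the real log format is the one shown on the slide's figure.

```python
def recover(log, segment):
    """Replay committed log records onto the segment, reading newest first."""
    latest = {}  # byte offset -> value from the newest committed record
    for record in reversed(log):              # tail (newest) to head (oldest)
        if not record["committed"]:
            continue                          # skip incomplete transactions
        for i, value in enumerate(record["data"]):
            # setdefault keeps the first value seen, i.e. the newest one
            latest.setdefault(record["offset"] + i, value)
    for offset, value in latest.items():      # apply modifications to "disk"
        segment[offset] = value
    return segment
```

Reading from the tail means an older record never overwrites a newer one, so each byte is written to the segment at most once.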
Log Management
Log truncation
- When truncation completes, the area marked “Truncation Epoch” is freed for new log records
To minimize implementation effort
- Reuse the crash recovery code for truncation
Log Management
Incremental truncation
- Pages modified by uncommitted transactions cannot be written to the segment
- The uncommitted reference count is incremented on each set_range() call
- The uncommitted reference count is decremented when the transaction commits
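A minimal sketch of the reference-count rule, assuming one count per page and a FIFO queue of dirty pages. The class and method names are hypothetical and the model tracks page numbers only, not page contents.

```python
from collections import deque


class Page:
    def __init__(self, number):
        self.number = number
        self.uncommitted_refs = 0   # set_range() calls not yet committed
        self.dirty = False


class IncrementalTruncator:
    def __init__(self):
        self.pages = {}
        self.queue = deque()        # dirty pages, oldest committed change first
        self.written = []           # page numbers written back to the segment

    def _page(self, number):
        return self.pages.setdefault(number, Page(number))

    def set_range(self, page_no):
        self._page(page_no).uncommitted_refs += 1

    def commit(self, page_no):
        page = self._page(page_no)
        page.uncommitted_refs -= 1
        if not page.dirty:
            page.dirty = True
            self.queue.append(page)

    def truncate_step(self):
        """Write back the head page unless an uncommitted change touches it."""
        if not self.queue or self.queue[0].uncommitted_refs > 0:
            return False
        page = self.queue.popleft()
        page.dirty = False
        self.written.append(page.number)
        return True
```

In this sketch a page with a non-zero count blocks the head of the queue, so no uncommitted data can reach the segment.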
Optimization
Intra-transaction
- Ignore duplicate set-range calls
- Coalesce overlapping or adjacent memory ranges
Inter-transaction
- For no-flush transactions, discard old log records before a log flush
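Both optimizations can be sketched in a few lines. The function names and the `(start, length)` tuple layout are assumptions for illustration. `drop_superseded` is a deliberate simplification: it only discards an older record when a newer one covers exactly the same range.

```python
def coalesce(ranges):
    """Merge duplicate, overlapping, or adjacent (start, length) ranges."""
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def drop_superseded(records):
    """Keep only the newest unflushed record for each identical range.

    `records` is ordered oldest to newest as (start, length, data) tuples.
    """
    newest = {}
    for start, length, data in records:
        newest.pop((start, length), None)   # re-insert to keep recency order
        newest[(start, length)] = data
    return [(s, l, d) for (s, l), d in newest.items()]
```

For example, `coalesce([(0, 4), (0, 4), (2, 6), (8, 2), (20, 5)])` returns `[(0, 10), (20, 5)]`: the duplicate is ignored, and the overlapping and adjacent ranges collapse into one.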
Evaluation: environment
TPC-A benchmark
- Hypothetical bank
- with one or more branches,
- multiple tellers per branch,
- many customer accounts per branch
Localized access pattern
- 70% of the transactions update accounts on 5% of the pages
- 25% of the transactions update accounts on a different 15% of the pages
- 5% of the transactions update accounts on the remaining 80% of the pages
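The localized pattern can be reproduced with a small sampler. This is a sketch under assumptions: the function name `pick_page` is hypothetical, and the hot, warm, and cold sets are taken to be contiguous page ranges, which the slide does not state.

```python
import random


def pick_page(num_pages, rng):
    """Pick a page index under the 70/25/5 localized access pattern.

    Assumes num_pages is large enough (e.g. >= 100) that all three
    groups are non-empty.
    """
    hot = num_pages * 5 // 100      # 5% of pages receive 70% of updates
    warm = num_pages * 15 // 100    # a different 15% receive 25%
    r = rng.random()
    if r < 0.70:
        return rng.randrange(0, hot)
    if r < 0.95:
        return rng.randrange(hot, hot + warm)
    return rng.randrange(hot + warm, num_pages)   # remaining 80% receive 5%
```

Passing a seeded `random.Random` instance keeps a benchmark run repeatable.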
Evaluation : Transactional Throughput Paging
Evaluation: CPU cost per transaction
- Scalability
- Paging
- Less frequent log truncation
Evaluation : Effectiveness of Optimization
Review
In the 1990s
- Networks were slow and unstable
- Terminals became “powerful” clients
+ 33MHz CPU, 16MB RAM, 100MB hard drive
- Mobile users appeared
+ First IBM ThinkPad in 1992
- Work could be done at the client without the network
Is this a good paper from today's point of view?
- The eliminated features cannot be provided
- The evaluation is only a benchmark, not a real-world workload
- Comparing with Camelot, not RVM reliability
Regularities Considered Harmful: Forcing Randomness to Memory Accesses to Reduce Row Buffer Conflicts for Multi-Core, Multi-Bank Systems
ACM SIGARCH, 2013.
Heekwon Park, Seungjae Baek, Jongmoo Choi, Donghee Lee, Sam H. Noh
Presented by JongMyoung Kim, HyoSeok Lee
Computing Environment
- Multi-core architecture
- Multi-bank architecture
- Main memory is shared by multiple cores
Memory organization
- Channel
- Ranks
- Banks
- Rows
Memory Structure
Components of main memory: Channel, DIMM, Rank, Chip, Bank, Row, Row Buffer
[Figure: each channel holds DIMMs, each DIMM holds ranks of chips, and each bank holds rows plus one row buffer]
Accessing Memory
[Figure: Virtual Address (VA) -> MMU -> Physical Address (PA) -> Memory Controller -> Row Buffer. Accessing a different row in the same bank causes a row buffer conflict and a delay.]
Access Experiment
- There are two variables, CL1 and CL2
- CL1 is at a fixed address
- The address of CL2 is increased step by step
- A process keeps accessing CL1 and CL2 in each iteration
- If CL1 and CL2 are in the same bank but in different rows, a conflict occurs
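The experiment can be imitated with a small simulation. This is a sketch, not the authors' setup: the row size, the bank count, and the address-to-bank mapping in `bank_and_row` are all assumed values chosen for illustration, and real memory controllers use different mappings.

```python
ROW_SIZE = 8192   # bytes per row (assumed)
NUM_BANKS = 8     # number of banks (assumed)


def bank_and_row(addr):
    """Map a physical address to (bank, row) under a simple interleaved layout."""
    row_index = addr // ROW_SIZE
    return row_index % NUM_BANKS, row_index // NUM_BANKS


def count_conflicts(cl1, cl2, iterations):
    """Alternate accesses to cl1 and cl2 and count row buffer conflicts."""
    open_row = {}   # bank -> row currently held in that bank's row buffer
    conflicts = 0
    for _ in range(iterations):
        for addr in (cl1, cl2):
            bank, row = bank_and_row(addr)
            if bank in open_row and open_row[bank] != row:
                conflicts += 1          # another row must replace the open one
            open_row[bank] = row
    return conflicts
```

With these assumed parameters, `count_conflicts(0, ROW_SIZE, 10)` is 0 because the two addresses fall in different banks, while `count_conflicts(0, ROW_SIZE * NUM_BANKS, 10)` is 19 because every access after the first evicts the other row from the same bank.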
The Result of Access Experiment
How about memory partitioning?
- Each core has its own memory banks
- This removes the conflicts caused by the multi-core environment
Problems
- Reduced memory parallelism
- Large consecutive memory space
- Scalability
How about randomness?
- To avoid row buffer conflicts, the allocation of page frames is important
- There are more conflicts in a multi-core environment
Memory Container
- Each core has its own memory space, called a “memory container”
- The size of a memory container is the minimum number of pages that covers all banks
- ex) 12MB
Goal
- Allocate memory as sparsely as possible to avoid conflicts
- Each core allocates pages from its own memory container
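One way to picture the idea is a per-core pool from which page frames are drawn at random. This is a hypothetical model, not the paper's allocator: `MemoryContainer`, `make_containers`, and the contiguous frame numbering per core are all assumptions.

```python
import random


class MemoryContainer:
    """Per-core pool of free page frames (hypothetical model)."""

    def __init__(self, frames, seed=None):
        self.free = list(frames)
        self.rng = random.Random(seed)

    def alloc(self):
        # Pick a random free frame: swap it to the end, then pop it.
        i = self.rng.randrange(len(self.free))
        self.free[i], self.free[-1] = self.free[-1], self.free[i]
        return self.free.pop()


def make_containers(num_cores, frames_per_container):
    """Give each core its own disjoint range of frame numbers."""
    return [
        MemoryContainer(
            range(core * frames_per_container, (core + 1) * frames_per_container),
            seed=core,
        )
        for core in range(num_cores)
    ]
```

Random selection spreads consecutive allocations over the container instead of handing out neighbouring frames, and each core draws only from its own pool. `alloc()` raises `ValueError` once a container is empty; a real allocator would need a refill policy.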
Criticism #1 – Buffer conflict << swap
- Memory swapping affects performance more than row buffer conflicts do
- They use 32GB of DDR3 memory
- With that much memory, swapping might never happen
- If swapping did happen, the improvement in elapsed time would be minor
Criticism #2 – Memory container
- The memory container is NOT able to reduce multi-core buffer conflicts
- Managing memory containers causes additional overhead!
- Memory partitioning is a more practical method in a multi-core environment
- Each process has its own memory partition
- For shared variables
Q&A
JongMyoung Kim, HyoSeok Lee