G22.3250-001, Robert Grimm, New York University
Recoverable Virtual Memory

Altogether Now: The Three Questions
- What is the problem?
- What is new or different?
- What are the contributions and limitations?

The Goal
- Simplify the implementation of fault-tolerant distributed applications
- Protect updates to persistent data structures
- Use transactions
  - Well-understood, general mechanism
- What are transactions? And, what is ACID?

Lessons from Camelot
- General transaction facility for Mach
  - Nested and distributed transactions
  - Recoverable virtual memory (RVM)
    - Implemented through Mach's external pager interface
- Coda built on RVM
  - File system meta-data
  - Persistent data for replica control
  - Internal housekeeping data
  - But not file system data

Lessons from Camelot (cont.)
- Three issues
  - Decreased scalability compared to AFS
    - High CPU utilization, ~19% due to Camelot [Satya, 4/90]
    - Paging and context-switching overheads
  - Additional programming constraints
    - All clients must inherit from the Disk Manager task
      - Debugging is more difficult
    - Clients must use kernel threads
  - Increased complexity
    - Camelot exploits the Mach interface and triggers new bugs
      - Hard to determine the source: Camelot or Mach?
- So, what are the lessons here?

RVM Design Methodology
- The designer taketh away
  - No nested or distributed transactions
  - No concurrency control
  - No support for recovering from media failures
- Instead, a simple layered approach

RVM Design Methodology (cont.)
- The designer likes portability
  - Only use widely available VM primitives
    - No external pager, no pin/unpin calls
  - Rely on a regular file as backing store (the external data segment)
    - Slower startup (no demand paging), duplicate paging (VM, RVM)
    - Trades off complexity against resource usage
- The designer likes libraries
  - No curmudgeonly collection of tasks; rather, a simple library
    - Applications and RVM need to trust each other
  - Each application has its own log, not one per system with its own disk
    - Log-structured file systems and RAID might help

RVM Architecture
- RVM backed by (external data) segments
- RVM mapped in regions (of segments), as sketched below
  - Mapped only once per process, must not overlap
  - Multiples of the page size, page-aligned
- RVM segment loader tracks mappings
- RVM allocator manages the persistent heap
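
To make the mapping step concrete, here is a minimal sketch of how a client might map one region, written against the Coda release of the RVM library. The header name, segment path, and sizes are assumptions, and error handling is abbreviated; this is not code from the paper.

    /* Sketch: map one region of an external data segment. The segment
     * path and length are hypothetical; struct setup is assumed to
     * follow the Coda RVM library (rvm_malloc_region, RVM_MK_OFFSET). */
    #include "rvm.h"

    char *map_recoverable_heap(void)
    {
        rvm_region_t *region;

        /* One-time library initialization. */
        if (rvm_initialize(RVM_VERSION, NULL) != RVM_SUCCESS)
            return NULL;

        /* Describe the region: a page-aligned slice of a regular file
         * that serves as the external data segment. */
        region = rvm_malloc_region();
        region->data_dev = "/vice/rvm/segment0";   /* hypothetical path */
        region->offset   = RVM_MK_OFFSET(0, 0);    /* start of the segment */
        region->length   = 8 * 1024 * 1024;        /* multiple of page size */
        region->vmaddr   = NULL;                   /* let RVM pick an address */

        /* rvm_map copies the region from the segment into virtual memory. */
        if (rvm_map(region, NULL) != RVM_SUCCESS)
            return NULL;
        return region->vmaddr;
    }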

RVM Implementation
- No-undo/redo log
  - Old values buffered in virtual memory
  - Bounds and contents of old-value records determined by the set-range operation
- No-restore and no-flush are more efficient (why?)
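
A sketch of one transaction over mapped memory, using the primitives and mode names from the paper; the function, the variable names, and the tid allocation helpers are assumptions borrowed from the Coda RVM library, not verbatim API.

    /* Sketch: one transaction against mapped recoverable memory.
     * `balance` points into a mapped region. */
    #include "rvm.h"

    rvm_return_t deposit(long *balance, long amount)
    {
        rvm_tid_t *tid = rvm_malloc_tid();   /* assumed helper */
        rvm_return_t ret;

        /* "restore" buffers old values so the transaction can abort;
         * "no_restore" skips that copy, which is why it is cheaper. */
        rvm_begin_transaction(tid, restore);

        /* Declare the range about to change: RVM saves the old value
         * (undo in VM) and will log the new value (redo) on commit. */
        rvm_set_range(tid, (char *)balance, sizeof(*balance));
        *balance += amount;

        /* "flush" forces the redo record to the disk log before
         * returning; "no_flush" defers the write, trading bounded
         * persistence for speed. */
        ret = rvm_end_transaction(tid, flush);
        rvm_free_tid(tid);
        return ret;
    }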

Crash Recovery and Log Truncation
- Recovery: traverse the log, build an in-memory representation, apply the modifications, update the log
- Truncation: apply recovery to an old epoch of the log
  - Advantage: same code for recovery and truncation
  - Disadvantages
    - Increases log traffic
    - Degrades forward processing
    - Results in bursty system performance
- Alternative: incremental truncation
  - Not quite implemented
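
The shared recovery/truncation pass can be pictured with a toy model; the record layout and names below are invented for illustration and are not from the RVM sources.

    /* Toy model of the recovery pass: apply an epoch's committed
     * records to the external data segment. Real RVM first builds an
     * in-memory tree so each page is written only once. */
    #include <string.h>

    typedef struct {
        long        offset;   /* byte offset within the segment */
        long        length;   /* number of bytes modified */
        const char *bytes;    /* the new (committed) value */
    } log_rec_t;

    void apply_epoch(const log_rec_t *recs, int nrecs, char *segment)
    {
        /* Oldest first: later records overwrite earlier ones, leaving
         * the newest committed value for every range. */
        for (int i = 0; i < nrecs; i++)
            memcpy(segment + recs[i].offset, recs[i].bytes, recs[i].length);

        /* A real implementation now updates the log status block so the
         * epoch's space can be reclaimed. Truncation runs this same
         * pass over a frozen "old epoch" while new records accumulate. */
    }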

RVM Optimizations
- Intra-transaction
  - Ignore duplicate set-range calls
  - Coalesce overlapping or adjacent memory ranges (sketched below)
- Inter-transaction
  - For no-flush transactions, discard older log records before flushing them
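
A self-contained sketch of the range coalescing; the types and function are illustrative, not RVM's own.

    /* Sketch: coalesce two set-range records when they overlap or are
     * adjacent, so only one old-value record needs to be kept. */
    #include <stddef.h>

    typedef struct { char *start; size_t len; } range_t;

    /* Merge b into a if possible; returns 1 if merged, 0 if disjoint. */
    int coalesce(range_t *a, const range_t *b)
    {
        char *a_end = a->start + a->len;
        char *b_end = b->start + b->len;

        if (b->start > a_end || a->start > b_end)
            return 0;                       /* disjoint: keep both records */

        if (b->start < a->start) a->start = b->start;
        if (b_end > a_end)       a_end = b_end;
        a->len = (size_t)(a_end - a->start);
        return 1;                           /* one record instead of two */
    }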

Evaluation: Issues and Experimental Setup
- Complexity: source code size
  - 10K lines of RVM + 10K of utilities vs. 60K of Camelot + 10K of utilities
- Performance: modified TPC-A benchmark
  - Simulates a hypothetical bank
  - Accesses in-memory data structures
  - Three account access patterns
    - Best case: sequential
    - Worst case: random
    - Between the extremes: localized
      - 70% of transactions update 5% of the pages
      - 25% of transactions update 15% of the pages
      - 5% of transactions update the remaining 80% of the pages
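
For concreteness, the localized pattern can be modeled as below; the function is a hypothetical sketch, assuming the page count is a multiple of 100.

    /* Sketch of the "localized" access pattern: 70% of transactions hit
     * the hottest 5% of pages, 25% the next 15%, 5% the remaining 80%.
     * Assumes npages is a multiple of 100; rand() suffices for a sketch. */
    #include <stdlib.h>

    size_t pick_page(size_t npages)
    {
        int r = rand() % 100;
        if (r < 70)                                       /* 70%: hot 5% */
            return rand() % (npages / 20);
        if (r < 95)                                       /* 25%: next 15% */
            return npages / 20 + rand() % (npages * 15 / 100);
        return npages / 5 + rand() % (npages * 4 / 5);    /* 5%: cold 80% */
    }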

Evaluation: Throughput
- What metric does the x-axis represent?
- Why does Camelot perform considerably worse?

Evaluation: CPU Utilization
- How did they arrive at these numbers?
  - I.e., did they repeat the experiments from the introduction?
- Why does Camelot require so much more CPU?

Evaluation: Effectiveness of Optimizations
- On Coda servers (transactions are flushed)
  - ~21% log size savings
- On Coda clients (transactions are no-flush)
  - 48%-82% log size savings
  - More important because clients have fewer resources and run on battery

What Do You Think? Oh, wait…

What about RVM’s interface?

Remember Appel and Li? Persistent Stores!
- Basic idea: persistent object heap
  - Modifications can be committed or aborted
  - Advantage over traditional databases: object accesses are (almost) as fast as regular memory accesses
- Implementation strategy
  - Database is a memory-mapped file
  - Uncommitted writes are temporary
- Requirements
  - Trap, Unprot, and file mapping with copy-on-write
  - Can be simulated through ProtN, Unprot, and Map2
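
The memory-mapped-file strategy is easy to sketch with standard POSIX calls; the path is hypothetical, and a real store would also need write-protection traps (Appel and Li's Trap/Unprot) to track dirty pages for commit.

    /* Sketch: a persistent store as a copy-on-write mapped file. With
     * MAP_PRIVATE, stores go to anonymous page copies, never to the
     * file, so aborting is just discarding the mapping; committing
     * would write the dirty pages back explicitly. */
    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void *map_store(const char *path, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; }

        /* Copy-on-write mapping: reads come from the file, uncommitted
         * writes stay private to this process. */
        void *base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE, fd, 0);
        close(fd);                 /* the mapping survives the close */
        if (base == MAP_FAILED)
            return NULL;

        *len_out = (size_t)st.st_size;
        return base;
    }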

For Real: What Do You Think?