Edelweiss: Automatic Storage Reclamation for Distributed Programming Neil Conway Peter Alvaro Emily Andrews Joseph M. Hellerstein University of California,

Presentation transcript:

Edelweiss: Automatic Storage Reclamation for Distributed Programming Neil Conway Peter Alvaro Emily Andrews Joseph M. Hellerstein University of California, Berkeley

Mutable shared state Frequent source of bugs Hard to scale

Event Logging Accumulate & exchange sets of immutable events –No mutation/deletion To delete: add new event –“Event X should be ignored” Current state: query over event log

Example: Key-Value Store

Mutable State
    tbl = Hash.new
    Insert(k, v): tbl[k] = v        (update-in-place)
    Delete(k):    tbl.delete(k)     (deletion)
    View():       tbl

Event Logging
    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]    (set union)
    Delete(k):    d_log << k
    View():       i_log.notin(d_log, :k => :k)    (compute “live” keys)
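The event-logging column of this slide can be sketched in plain Ruby. This is only an illustration: the class name `LoggedKVS` and its method names are assumptions, and the Bloom-style `notin` anti-join is approximated with an ordinary `reject`.

```ruby
require 'set'

# Sketch of the event-logged key-value store from the slide.
# LoggedKVS is a hypothetical name; state is two grow-only sets.
class LoggedKVS
  def initialize
    @i_log = Set.new   # insertion events, stored as [key, value]
    @d_log = Set.new   # deletion events, stored as key
  end

  # Insert and delete only ever add events (set union).
  def insert(k, v)
    @i_log << [k, v]
  end

  def delete(k)
    @d_log << k
  end

  # Live view: insertions whose key has no matching deletion.
  # This stands in for i_log.notin(d_log, :k => :k).
  def view
    @i_log.reject { |(k, _v)| @d_log.include?(k) }.to_h
  end
end

kvs = LoggedKVS.new
kvs.insert(:a, 1)
kvs.insert(:b, 2)
kvs.delete(:a)
live = kvs.view
```

Under these semantics a deletion is permanent: a later insertion with the same key would still be filtered out of the view, because the deletion event stays in `d_log`.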

Benefits of Event Logging 1.Concurrency 2.Replication 3.Undo/redo 4.Point-in-time query, audit trails (Sometimes: performance!)

Example Applications Multi-version concurrency control (MVCC) Write-ahead logging (WAL) Stream processing Log-structured file systems Also: CRDTs, tombstones, purely functional data structures, accounting ledgers.

Observation: Logs consume unbounded storage Solution: Discard log entries that are “no longer useful” (garbage collection)

Observation: Logs consume unbounded storage Challenge: Discard log entries that are “no longer useful” (garbage collection)

Traditional Approach “No longer useful” defined by application semantics –No framework support –Every system requires custom GC logic –Reinvented many times >25 papers propose ~same scheme!

Engineering Challenges 1. Difficult to implement correctly –Too aggressive: destroy live data –Too conservative: storage leak 2. Ongoing maintenance burden –GC scheme and application code must be updated together

Our Approach 1. New language: Edelweiss –Based on Datalog –No constructs for deletion or mutation! 2. Automatically generate safe, application-specific distributed GC protocols 3. Present several in-depth case studies –Reliable unicast/broadcast, key-value store, causal consistency, atomic registers

Base Data (“Event Logs”) Derived Data ( “Live View”) Query

The queries define how log entries contribute to the view. Goal: Find log entries that will never contribute to the view in the future. A log entry is useful iff it might contribute to the view.

Semantics of Base Data Accumulate and broadcast to other nodes Datalog: monotonic –Set union: grows over time CALM Theorem [CIDR’11]: event log guaranteed to be eventually consistent

Semantics of Derived Data Grows and shrinks over time –e.g., KVS keys added and removed Hence, not monotonic

Common Pattern
Live View = set difference between growing sets

Key-Value Store:     Insertions that haven’t been deleted
Reliable Broadcast:  Outbound messages that haven’t been acknowledged
Causal Consistency:  Writes that haven’t been replaced by a causally later write to the same key
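The reliable-broadcast instance of this pattern might look like the following sketch. The names `sent`, `acked`, and `unacked` are illustrative and do not come from the talk; message ids are assumed to be plain integers.

```ruby
require 'set'

# Two grow-only sets, as in the pattern above (names are hypothetical).
sent  = Set.new   # ids of every message ever sent
acked = Set.new   # ids of every message ever acknowledged

sent << 1 << 2 << 3
acked << 2

# Live view = set difference between the two growing sets:
# messages that still need to be retransmitted.
unacked = sent - acked
```

Because `acked` only grows, an id that leaves `unacked` can never return to it, which is the property the reclamation analysis relies on.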

Semantics of Set Difference
X = Y – Z
–Z grows: X shrinks
–If t appears in Z, t will never again appear in X
–“Anti-monotone with respect to Z”

    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]
    Delete(k):    d_log << k
    View():       i_log.notin(d_log, :k => :k)

Can reclaim from i_log upon match in d_log
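A small sketch of why reclaiming from `i_log` upon a match in `d_log` is safe: the live view is the same before and after reclamation, and later deletions cannot bring a discarded entry back. The helpers `live_view` and `reclaim!` are hypothetical and written by hand here; in Edelweiss the equivalent deletion rules are generated automatically.

```ruby
require 'set'

# Live view: insertions with no matching deletion (anti-join on the key).
def live_view(i_log, d_log)
  i_log.reject { |(k, _v)| d_log.include?(k) }.to_set
end

# Hypothetical reclamation step: physically drop insertions whose key
# already appears in d_log. They can never re-enter the view.
def reclaim!(i_log, d_log)
  i_log.delete_if { |(k, _v)| d_log.include?(k) }
end

i_log = Set[[:a, 1], [:b, 2], [:c, 3]]
d_log = Set[:a]

before = live_view(i_log, d_log)
reclaim!(i_log, d_log)
after = live_view(i_log, d_log)

d_log << :b                       # a later deletion only shrinks the view
later = live_view(i_log, d_log)
```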

Other Analysis Techniques Reclaim from negative notin input –Often called “tombstones” –E.g., how to reclaim from d_log in the KVS Reclaim from join input tables Disseminate GC metadata automatically Exploit user knowledge for better GC –Punctuations [Tucker & Maier ‘03]

Whole Program Analysis For each query q, find condition when input t will never contribute to q’s output –“Reclamation condition” (RC) For each tuple t, find the conjunction of the RCs for t over all queries –When all consumers no longer need t: safe to reclaim
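The conjunction of reclamation conditions can be illustrated as follows. This sketch assumes each consuming query is represented by a predicate that returns true once a tuple can no longer affect that query; the second consumer (an "archived" audit query) is invented purely for illustration.

```ruby
require 'set'

# A tuple is safe to reclaim only when every consumer's reclamation
# condition (RC) holds for it.
def reclaimable?(t, reclamation_conditions)
  reclamation_conditions.all? { |rc| rc.call(t) }
end

d_log    = Set[:a, :b]   # keys that have been deleted
archived = Set[:a]       # hypothetical: keys already copied to an audit store

rcs = [
  ->(t) { d_log.include?(t[0]) },     # RC for the live-view query
  ->(t) { archived.include?(t[0]) }   # RC for the hypothetical audit query
]

i_log   = Set[[:a, 1], [:b, 2], [:c, 3]]
garbage = i_log.select { |t| reclaimable?(t, rcs) }.to_set
```

Here `[:b, 2]` satisfies the first condition but not the second, so it is retained until all consumers are done with it.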

Edelweiss Input Program (“positive” program: no deletion or state mutation) → Source-to-Source Rewriter (compute RCs, add deletion rules) → Datalog Output Program (input program + deletion rules) → Datalog Evaluator

Comparison of Program Size Only 19 rules!

Takeaways No storage management code! –Similar to malloc / free vs. GC Programs are concise and declarative –Developer: just compute current view –Log entries removed automatically Reclamation logic and application code always in sync

Conclusions Event logging: powerful design pattern –Problem: need for hand-written distributed storage reclamation code Datalog: natural fit for event logging Storage reclamation as a compiler rewrite? Results: –Automatic, safe GC synthesis! –High-level, declarative programs No storage management code Focus on solving domain problem

Thank You!

Future Work: Checkpoints Closely related to simple event logging –Summarize many log entries with a single “checkpoint” record –View = last checkpoint + Query(ΔLogs) General goal: reclaim space by structural transformation, not just discarding data
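As a rough illustration of "view = last checkpoint + query over the log entries since then", consider a ledger whose view is a running balance. The ledger scenario, the `Checkpoint` struct, and the helper names are all assumptions made for this sketch; the talk lists checkpoints only as future work.

```ruby
# A checkpoint record summarizing all log entries folded into it so far.
Checkpoint = Struct.new(:balance)

# View = last checkpoint + query over the entries logged since then.
def balance(checkpoint, delta_log)
  checkpoint.balance + delta_log.sum
end

# Fold the delta log into a new checkpoint and start an empty delta log.
def take_checkpoint(checkpoint, delta_log)
  [Checkpoint.new(balance(checkpoint, delta_log)), []]
end

cp  = Checkpoint.new(0)
log = [100, -30, 45]

before  = balance(cp, log)
cp, log = take_checkpoint(cp, log)   # three entries become one record
log << -15
current = balance(cp, log)
```

The space saving comes from transforming the log's structure rather than from discarding entries that no query needs.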

Future Work: Theory Current analysis is somewhat ad hoc If program does not reclaim storage, two possibilities: 1.Program is “not reclaimable” in principle (Possible program bug!) 2.Our analysis is not complete (Possible analysis bug!) How to characterize the class of “not reclaimable” programs?

Reclaiming KVS Deletions
Good question
X.notin(Y): how to reclaim from Y?
1. Y is a dense ordered set; compress it.
2. Prove that each Y tuple matches exactly one X tuple

    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]
    Delete(k):    d_log << k
    View():       i_log.notin(d_log, :k => :k)

k is a key of i_log
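Option 1 (compressing a dense ordered set) could be sketched as below, assuming the deleted keys are integers handed out more or less consecutively. That assumption, and the helper names `compress` and `deleted?`, are illustrative and not taken from the talk.

```ruby
require 'set'

# Collapse runs of consecutive integer ids into ranges, so a dense d_log
# needs space proportional to the number of gaps, not the number of ids.
def compress(ids)
  ids.sort.slice_when { |a, b| b != a + 1 }.map { |run| (run.first..run.last) }
end

# Membership test against the compressed form.
def deleted?(ranges, id)
  ranges.any? { |r| r.cover?(id) }
end

d_log  = Set[1, 2, 3, 5, 6, 9]
ranges = compress(d_log)
```

With this representation the anti-join in `View()` can still ask whether a key was deleted, while the individual deletion events no longer need to be stored.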