
Data Versioning Systems Research Proficiency Exam Ningning Zhu Advisor Tzi-cker Chiueh Computer Science Department State University Of New York at Stony Brook Feb 10, 2003

Definitions Data Object Granularity of Data Object file, tuple, database table, database logical volume, database, block device Version of a Data Object A consistent state, a snapshot, a point-in-time image Data Repository Version Repository

Why do we need data versioning? Documentation Versioning Control Human mistakes Malicious attacks Software failure History Study

Data Versioning Vs. Other Techniques Backup Mirroring Replication Redundancy Perpetual storage

Design Issues Resource Consumption Storage capacity, CPU Storage bandwidth, network bandwidth Performance old versions, current object Throughput, latency Maintenance Effort

Design Options Who performs versioning? User, Application, file system, database system, object store, virtual disks, block-device Where and what to save? Separate version repository? Full image vs. delta How? Frequency Scope

Data Versioning Techniques Save Represent Extract

Save: naive approach (1)

Save: Split Mirror (2)

Save: copy-old-while-update-new (3)

Save: keep-old-and-create-new (4)

Represent (1) Full image Easy to extract, consumes more resources Delta Reference direction Reference object Differencing algorithm Chain of delta and full image

Represent: Chain structure (2) Forward delta V1, D(1,2), D(2,3), V4, D(4,5), D(5,6), V7 Forward delta with version jumping V1, D(1,2), D(1,3), V4, D(4,5), D(4,6), V7 Reverse delta V1, D(3,2), D(4,3), V4, D(6,5), D(7,6), V7
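The three chain layouts differ mainly in how many deltas must be applied to extract a given version. The sketch below is illustrative only: it assumes a version is a small dictionary and a delta is a set of changed and deleted keys, and the names `make_delta`, `apply_delta`, and the `extract_*` helpers are hypothetical, not taken from any system in the talk. Full images are kept at V1, V4, and V7, as on the slide.

```python
# Illustrative sketch: versions are dicts, deltas record changed and deleted keys.
FULL = (1, 4, 7)  # versions stored as full images, as on the slide
V = {i: {"a": i, "b": i % 2, f"k{i}": True} for i in range(1, 8)}

def make_delta(src, dst):
    """Delta D(src, dst) that turns src into dst."""
    changed = {k: v for k, v in dst.items() if k not in src or src[k] != v}
    deleted = [k for k in src if k not in dst]
    return {"set": changed, "del": deleted}

def apply_delta(base, delta):
    out = dict(base)
    for k in delta["del"]:
        del out[k]
    out.update(delta["set"])
    return out

# Forward delta: D(1,2), D(2,3), D(4,5), D(5,6)
forward = {(i, i + 1): make_delta(V[i], V[i + 1])
           for i in range(1, 7) if i + 1 not in FULL}
# Forward delta with version jumping: D(1,2), D(1,3), D(4,5), D(4,6)
jumping = {(b, n): make_delta(V[b], V[n])
           for b in FULL for n in (b + 1, b + 2) if n in V and n not in FULL}
# Reverse delta: D(3,2), D(4,3), D(6,5), D(7,6)
reverse = {(i + 1, i): make_delta(V[i + 1], V[i])
           for i in range(1, 7) if i not in FULL}

def extract_forward(n):
    base = max(f for f in FULL if f <= n)   # nearest earlier full image
    state = V[base]
    for i in range(base, n):
        state = apply_delta(state, forward[(i, i + 1)])
    return state, n - base                  # (version, deltas applied)

def extract_jumping(n):
    base = max(f for f in FULL if f <= n)
    if n == base:
        return V[base], 0
    return apply_delta(V[base], jumping[(base, n)]), 1

def extract_reverse(n):
    base = min(f for f in FULL if f >= n)   # nearest later full image
    state = V[base]
    for i in range(base, n, -1):
        state = apply_delta(state, reverse[(i, i - 1)])
    return state, base - n
```

In this toy model, extracting V3 takes two delta applications with plain forward deltas and one with version jumping or reverse deltas. Reverse deltas favor recent versions, at the price of rewriting a delta whenever a new version arrives.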

Represent: differencing algorithm (3) Insert/Delete (diff) vs. Insert/Copy (bdiff) Rabin fingerprint Given a sequence of bytes: SHA-1: Collision free hashing function
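Insert/copy differencing needs a cheap way to find regions of the new version that already exist in the reference version, which is where fingerprints come in. The sketch below is a simplified stand-in: it uses an integer polynomial rolling hash modulo a prime rather than the polynomial arithmetic over GF(2) of a true Rabin fingerprint, and `BASE`, `MOD`, and all function names are assumptions chosen for illustration.

```python
# Simplified rolling fingerprint (integer polynomial hash, not true GF(2) Rabin).
BASE = 257
MOD = (1 << 61) - 1  # a large prime; an arbitrary choice for this sketch

def fingerprint(window: bytes) -> int:
    """Fingerprint of one window, computed from scratch."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h

def rolling_fingerprints(data: bytes, w: int):
    """Yield (offset, fingerprint) for every w-byte window, O(1) work per slide."""
    if w <= 0 or len(data) < w:
        return
    top = pow(BASE, w - 1, MOD)  # weight of the byte that leaves the window
    h = fingerprint(data[:w])
    yield 0, h
    for i in range(1, len(data) - w + 1):
        h = ((h - data[i - 1] * top) * BASE + data[i + w - 1]) % MOD
        yield i, h

def find_copy_candidates(reference: bytes, target: bytes, w: int):
    """Return (target_offset, reference_offset) pairs usable as copy instructions."""
    index = {}
    for off, fp in rolling_fingerprints(reference, w):
        index.setdefault(fp, off)  # remember the first occurrence only
    matches = []
    for t_off, fp in rolling_fingerprints(target, w):
        r_off = index.get(fp)
        # Verify the bytes, since equal fingerprints do not prove equal content.
        if r_off is not None and reference[r_off:r_off + w] == target[t_off:t_off + w]:
            matches.append((t_off, r_off))
    return matches
```

A differencing tool would extend each match into the longest possible copy and emit the unmatched bytes as inserts; that step is omitted here.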

XDFS Drawback of traditional version control Slow extraction, fragmentation, lack of atomicity support XDFS A user-level file system with versioning support Separate version labeling with delta compression Effective delta chain Built upon Berkeley DB

Log Structured File System-SpriteLFS Access assumption: small write Data Structure Inode Inode map Indirect block Segment summary Segment usage table Superblock(fixed disk location) Checkpoint region(fixed disk location) Directory change log

Research Data Versioning System File System Elephant Comprehensive Versioning File System Object-store Self-Secure-Storage-System Oceanstore Database System Postgres and Fastrek Storage System Petal and Frangipani

Elephant File System (1) Retention Policy Keep one Keep all Keep safe Keep landmark (intelligently add landmark)

Elephant File System (2) Metadata organization

S4: Self-Secure Storage System (1) Object-store interface Log everything Audit log Efficient metadata logging

S4: Metadata Inefficiency (2)

CVFS: Comprehensive Versioning (1) Journal based logging vs. Multi-version B-tree

CVFS: Comprehensive Versioning (2) Journal-based vs. Multi-version B-tree Assumptions about metadata access Optimizations: Cleaner: pointers in version repository Both forward delta and reverse delta Checkpointing and clustering Bounded old version access by forcing checkpoint

Oceanstore: decentralized storage A global-scale persistent storage A deep archival system Data Entity is identified by Internal data structure is similar to S4. Use B+ tree for object block indexing

Postgres: a multi-version database (1) Versioning support “Save” of a version in the database context Optimized towards “extract” Database Structure and Operation Tables made up of tuples First and secondary indices Transaction log: Update -> Delete + Insert

Postgres: record structure (2) Extra fields for versioning: OID: record ID, shared by versions of this record Xmin: TID of the inserting transaction Tmin: Commit time of Xmin Xmax: TID of the deleting transaction Tmax: Commit time of Xmax PTR: forward pointer from old -> new
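The versioning fields are enough to answer "what did this record look like at time T". The sketch below is a toy model, not Postgres code: it assumes a version is visible over the half-open interval from Tmin up to (but excluding) Tmax, treats an update as delete plus insert, and uses hypothetical names (`Record`, `update`, `as_of`). The PTR field is left out for brevity.

```python
# Toy model of per-record versioning fields; not actual Postgres internals.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Record:
    oid: int                      # record ID shared by all versions
    value: str
    xmin: int                     # TID of the inserting transaction
    tmin: float                   # commit time of xmin
    xmax: Optional[int] = None    # TID of the deleting transaction, if any
    tmax: Optional[float] = None  # commit time of xmax, if any

def visible_at(rec: Record, t: float) -> bool:
    """Assumed rule: a version is valid from tmin up to, but not including, tmax."""
    return rec.tmin <= t and (rec.tmax is None or t < rec.tmax)

def update(table: List[Record], oid: int, value: str, tid: int, commit_time: float) -> None:
    """Update = delete the current version + insert a new one."""
    for rec in table:
        if rec.oid == oid and rec.tmax is None:
            rec.xmax, rec.tmax = tid, commit_time
    table.append(Record(oid, value, xmin=tid, tmin=commit_time))

def as_of(table: List[Record], oid: int, t: float) -> Optional[str]:
    """Value of the record as it stood at time t, or None if it did not exist."""
    for rec in table:
        if rec.oid == oid and visible_at(rec, t):
            return rec.value
    return None

table = [Record(oid=1, value="v1", xmin=100, tmin=10.0)]
update(table, oid=1, value="v2", tid=101, commit_time=20.0)
```

With this table, `as_of(table, 1, 15.0)` yields the old value and `as_of(table, 1, 25.0)` the new one, while nothing is visible before time 10.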

Postgres: Save (3)

Postgres: Represent & Extract (4) Full image + forward delta SQL query with TIME parameter Build indices using R-tree for ops: Contained in, overlap with Secondary indices When a delta record is inserted, if secondary indices need to be changed, a full image needs to be constructed

Postgres: Frequency of extraction (5) No archive Timestamp never filled in Light archive Extract time from TIME meta table Heavy archive First use, extract time from TIME metadata, then fill the field Later use, directly from data record

Postgres: Hardware Assumption (6) Another level of archival storage WORM (optical disks) Optimizations: Indexing Accessing method Query plan Combine indexing at magnetic disks and archival storage

Fastrek: application of versioning Built on top of Postgres Tracking read operation Tracking write operation Tmin, Tmax Data dependency analysis Fast and intelligent repair

Petal and Frangipani Petal: a distributed storage system that supports virtual disk snapshots -> Frangipani: A distributed file system built on top of Petal Versioning by creating virtual disk snapshots Coarse granularity: mainly for backup purposes

Commercial Data Versioning Systems Network Appliance IBM EMC

Network Appliance: WAFL Network Appliance Customized for NFS and RAID Automatic checkpointing Utilize NVRAM: fast recovery Good performance: update batching, least blocking upon versioning Easy extraction: .snapshot directory

WAFL: system layout

WAFL: Limited Versioning

Network Appliance: SnapMirror Built upon WAFL Synchronous Mirroring Semi-synchronous Mirroring Asynchronous Mirroring 15 minutes interval, save 50% of update SnapMirror: Get block information from blockmap Schedule mirroring at block-device level

IBM (Flash Copy ESS) A block-device mirroring system Copy-old-while-update-new Use ESS cache and fast write to mask write latency Use a bitmap to keep track of each block of the old version and new version
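Copy-old-while-update-new with a per-block bitmap can be sketched in a few lines. This is a minimal in-memory model under stated assumptions, not IBM's implementation: the class name `CowSnapshot`, the list-of-blocks volume, and the `saved` side store are all hypothetical.

```python
# Minimal in-memory model of copy-old-while-update-new; not the ESS implementation.
class CowSnapshot:
    """Point-in-time image of a volume, materialized lazily on first overwrite."""

    def __init__(self, volume):
        self.volume = volume                  # live volume, keeps receiving writes
        self.copied = [False] * len(volume)   # bitmap: old block already saved?
        self.saved = {}                       # block number -> old contents

    def write(self, blk, data):
        # The first write to a block after the snapshot copies the old data aside.
        if not self.copied[blk]:
            self.saved[blk] = self.volume[blk]
            self.copied[blk] = True
        self.volume[blk] = data

    def read_snapshot(self, blk):
        # Untouched blocks are still shared with the live volume.
        return self.saved[blk] if self.copied[blk] else self.volume[blk]

vol = [b"a", b"b", b"c"]
snap = CowSnapshot(vol)
snap.write(1, b"X")
snap.write(1, b"Y")   # a second write to the same block needs no further copy
```

Only the first overwrite of a block pays for the extra copy, which is the cost that the slide's cache and fast-write techniques are meant to hide.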

EMC (TimeFinder) Split mirror Implementation

Proposal: Non-point-in-time versioning What is the most valuable state? Operation-based journaling Natural metadata journaling efficiency Design Transparent mirroring and versioning Primary site non-journaling, mirror site journaling against intrusion, mistake Applied to network file server

Repairable File Service: architecture

Represent: operation-based Delta: NFS packets Journal: Reverse delta chain No checkpointing overhead A chain of 2 months will cost <$100 Efficient metadata journaling bytes for inode, directory update One hash table entry for indirect block update

Save: a hybrid approach Data block update Copy-old-create-new Metadata update: Naïve: Read old, write old, update new Variation of Naïve: Guess old, write old, update-new Variation of Naïve: Get old, write old, update-new
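The naive "read old, write old, update new" path combined with a reverse delta journal amounts to an undo log. The sketch below illustrates that idea under loud assumptions: it journals whole-file contents in memory rather than NFS packets, and `UndoJournal`, `write`, and `rollback_to` are hypothetical names, not the Repairable File Service interface.

```python
# Sketch of an operation journal holding reverse deltas (undo records).
# Whole-file contents stand in for the NFS-level deltas named on the slides.
class UndoJournal:
    def __init__(self):
        self.files = {}   # path -> current contents (the primary state)
        self.log = []     # (sequence number, path, old contents or None)

    def write(self, path, data):
        old = self.files.get(path)                    # read old
        self.log.append((len(self.log), path, old))   # write old to the journal
        self.files[path] = data                       # update new

    def rollback_to(self, seq):
        """Undo every operation numbered seq or later, newest first."""
        while self.log and self.log[-1][0] >= seq:
            _, path, old = self.log.pop()
            if old is None:
                self.files.pop(path, None)   # the file did not exist before
            else:
                self.files[path] = old

fs = UndoJournal()
fs.write("/etc/passwd", b"good")      # operation 0
fs.write("/etc/passwd", b"tampered")  # operation 1
fs.write("/tmp/rootkit", b"payload")  # operation 2
fs.rollback_to(1)                     # repair: undo operations 1 and 2
```

Because the journal stores only what each operation overwrote, there is no periodic checkpoint to take. A selective repair would undo only the operations flagged by dependency analysis instead of everything after a cut-off.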

User Level Journaling File System

System Layout

Extract: intelligent and fast repair Dependency logging Dependency analysis Fast Repair Fast extract of most valuable state of a data system Drawback: Poor performance for other extract specifications

Conclusion (1) Hardware technology -> DV possible Capacity Random access storage CPU time Penalty of data loss -> DV a necessity Data loss System down time DV technology: Journaling, B+, differencing algorithm

Conclusion (2) DV at application level DV at file system/database level DV at storage system/block device level A combined and flexible solution to satisfy all DV requirements at low cost.

Future Trend (1) Comprehensive versioning Perpetual versioning High performance versioning Comparable to non-versioning system Intrusion oriented versioning Testing new untrusted application Reduce system maintenance cost Semantic extraction

Future Trend (2) In decentralized storage system, integrate and separate DV with Replication Redundancy Mirroring Encryption Avoid similar functionality being implemented by multiple modules