File System for Mobile Computing Quick overview of AFS identify the issues for file system design in wireless and mobile environments Design Options for.

Presentation transcript:

File System for Mobile Computing
o Quick overview of AFS
o Identify the issues for file system design in wireless and mobile environments
o Design options for mobile file systems
o Coda File System

Sharing Semantics
o Unix semantics (read after write): every operation on a file is instantly visible to all processes.
o Session semantics: a process that intends to write does so on its own copy. After the process closes the file, the changes become visible to other processes.
o Close-to-open cache consistency (NFS): clients flush all changes back to the server on each file close, and check the server for file changes on each file open.
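The close-to-open rule can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the `Server` and `Client` classes, their method names, and the per-file version counter are hypothetical and do not reflect the actual NFS protocol.

```python
class Server:
    """Hypothetical server: authoritative file contents plus a version counter."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def store(self, name, content):
        self.data[name] = content
        self.versions[name] = self.versions.get(name, 0) + 1

    def current_version(self, name):
        return self.versions.get(name, 0)

    def fetch(self, name):
        return self.data.get(name, ""), self.current_version(name)


class Client:
    """Hypothetical client with close-to-open consistency."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> (content, version)
        self.dirty = set()  # names written locally but not yet flushed

    def open(self, name):
        # On every open, check the server and refetch if the cached copy is stale.
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.current_version(name):
            self.cache[name] = self.server.fetch(name)
        return self.cache[name][0]

    def write(self, name, content):
        # Writes stay in the local cache until close.
        version = self.cache.get(name, ("", 0))[1]
        self.cache[name] = (content, version)
        self.dirty.add(name)

    def close(self, name):
        # On close, flush local changes back to the server.
        if name in self.dirty:
            self.server.store(name, self.cache[name][0])
            self.cache[name] = self.server.fetch(name)
            self.dirty.discard(name)
```

In this model a second client that opens the file before the writer closes it still sees the old contents; an open after the close sees the new ones.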

Commonly used mechanisms and techniques in distributed file systems
Caching at clients
o Exploit temporal locality of reference
o Key issue: the size of cached units. Options: individual pages of files; whole files; large blocks
o Where to maintain the cache? In main memory; on disk
o Cache validation: the client contacts the server, or the server notifies clients when cached data is rendered stale
o Spatial locality within a file: read-ahead of file data
Transferring data in bulk
o Amortize fixed protocol overheads over many consecutive pages of a file
o Depends on spatial locality of reference within files for effectiveness
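The amortization argument for bulk transfer can be made concrete with a rough cost model. The function below is an illustrative sketch: the linear cost model, the function name, and all parameter values in the usage note are assumptions, not measurements.

```python
def transfer_time(file_bytes, chunk_bytes, per_request_overhead_s, bandwidth_bytes_per_s):
    """Estimated time to move a file in chunks of chunk_bytes.

    Assumed model: each request pays a fixed protocol overhead, and the
    payload moves at a constant bandwidth.
    """
    requests = -(-file_bytes // chunk_bytes)  # ceiling division
    return requests * per_request_overhead_s + file_bytes / bandwidth_bytes_per_s
```

With a hypothetical 10 ms per-request overhead and 1 MB/s link, fetching a 64 KB file page by page (4 KB chunks) pays the overhead 16 times, while one bulk transfer pays it once.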

Mechanisms (cont'd)
Hints
o Improve performance if correct; have no semantically negative consequence if wrong
o Most often used for file location information
Encryption
o Prevents unauthorized release and modification of information
Mount points
o Glue file name spaces into a single, seamless, hierarchically structured name space
o Two ways: each client individually mounts subtrees from servers; or mount information is embedded in the data stored on the file servers

Andrew File System
Goal: distributed file system for large-scale systems
General principles:
o Name space: a private client name space, plus a globally shared and location-independent name space
o Unit of sharing: volume (a variable file set forming a partial subtree of the name space; the basis of disk quotas)
o Cache coherence: upon each open, contact the server to verify that the cache is up to date (AFS-1)
o Locating the server: volume location database (caches the volume-to-server mapping)
o Pathname traversal: the client caches directories, initiates lookups one at a time, and performs the lookup itself

AFS (continued)
General principles (cont'd)
o Availability: every directory on a pathname must be available
o State: callback state
o Caching:
  – What is cached: status, directories, whole files
  – The client has two caches: a status cache (kept in virtual memory) to rapidly service stat calls, and a data cache (kept on local disk)
  – Where it is cached: disk
  – Cache size: fixed (64KB)
  – Modifications propagate on close; modifications to directories are immediately visible
  – No more cache consistency check on open (unlike AFS-1): the cache is assumed to be correct until the client is informed via callback
  – Callbacks: when a client caches a file, the server promises to notify it upon any change (inconsistency)
  – Callback problems: load at the server increases if many clients cache the same file; client and server may be out of sync (server crash and recovery)
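The callback mechanism can be sketched as a server that remembers which clients cache each file and notifies them when the file changes. This is a toy model, not AFS code: the class names, the `break_callback` method, and the `fetches` counter are hypothetical, and server crash recovery (one of the callback problems listed above) is not modeled.

```python
class CallbackServer:
    """Hypothetical server that tracks callback promises per file."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # name -> set of clients holding a callback promise

    def fetch(self, name, client):
        # Handing out a copy implies a promise to notify this client on change.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files.get(name, "")

    def store(self, name, content, writer):
        self.files[name] = content
        # Break the callbacks of every other client caching this file.
        for client in self.callbacks.pop(name, set()):
            if client is not writer:
                client.break_callback(name)
        self.callbacks.setdefault(name, set()).add(writer)


class CachingClient:
    """Hypothetical client that trusts its cache until a callback is broken."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # counts round trips to the server

    def open(self, name):
        # No validation on open: a cached copy is assumed correct.
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name, self)
            self.fetches += 1
        return self.cache[name]

    def close(self, name, new_content=None):
        # Modifications propagate to the server on close.
        if new_content is not None:
            self.cache[name] = new_content
            self.server.store(name, new_content, self)

    def break_callback(self, name):
        self.cache.pop(name, None)
```

Repeated opens of an unchanged file cost no server traffic in this model; the price is the per-file callback state the server must keep for every caching client.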

ACID properties for transactions
o Atomicity: a transaction either commits or aborts. If a transaction commits, all its effects remain; otherwise, its effects are undone.
  – Each transaction appears indivisible with respect to crashes
o Consistency: a transaction is a correct transformation of the system state.
  – It preserves the state invariants
o Isolation/Serializability: concurrent transactions are isolated from the updates of other incomplete transactions, since these updates do not constitute a consistent state.
  – Transactions appear indivisible to each other
o Durability: once a transaction commits, its effects persist even if there are system failures.

Mobile File Systems
Requirements:
– access the same files as if connected
– retain the same consistency semantics for shared files as if connected
– availability and reliability as if connected
– ACID (atomic/recoverable, consistent, isolated/serializable, durable) properties for transactions
Constraints:
– disconnection and/or partial connection
– low-bandwidth connection
– variable bandwidth and latency
– connection cost

Mobile File Systems
Four major aspects of disconnected or partially connected operation:
– hoarding: what to pre-fetch
– consistency: what to keep consistent when connectivity is partial
– emulation: how to operate when disconnected
– conflict resolution: how to resolve conflicts

Design Options for Hoarding
Application hints (Coda): the application provides a hoarding database
o +: can list all files needed
o --: cannot predict accurately in advance
o --: does not know all system files used
Prior run (disconnected AFS): the application runs the same program before disconnecting
o +: no need to explicitly provide a hoard database
o +: AFS automatically does the caching
o --: no single run of an application typically brings over all files
o --: cannot predict all applications
o --: tedious to do "cat filename" for each data file you want

Design Options for Hoarding (cont'd)
Snapshot spying (updated version of Coda): look at a time window and hoard all files used in that window
o +: works well along with a hoarding database
o --: still has the single-run problem
Semantic distance (Ficus): measure the correlation between file accesses (e.g. the distance between file opens and closes) and cluster files
o +: based on long-term user behavior
o +: independent of a time window
o --: cannot relate concurrent accesses
o ?: not known how well it works
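Snapshot spying combined with a hoarding database can be sketched as a filter over an access trace. This is an illustrative sketch only: the trace format (a list of `(timestamp, path)` pairs), the function name, and the parameters are assumptions and not Coda's actual interface.

```python
def files_to_hoard(access_log, now, window, hoard_db=()):
    """Return the sorted set of paths to hoard before disconnection.

    access_log: iterable of (timestamp, path) pairs (assumed trace format).
    window:     length of the observation window, in the same time unit.
    hoard_db:   paths listed explicitly by the user or application.
    """
    # Everything touched inside the window is a hoarding candidate.
    recent = {path for ts, path in access_log if now - window <= ts <= now}
    # Explicit hints are always included, whether or not they were touched.
    return sorted(recent | set(hoard_db))
```

For example, with accesses at times 10, 50 and 95, `now=100` and `window=60`, only the last two files fall inside the window; files the user needs but did not touch in that window must come from the explicit list, which is the single-run problem noted above.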

Options for Hoarding (cont'd)
Application context: create a working set for each application, and load all the files for the desired applications
o +: solves the single-run problem
o +: works well for application/system files
o --: still cannot predict which applications the user will run
o --: does not work well for data files (e.g. emacs will create a working set of all previous data files unless pruned carefully)

Options for Consistency Management
Optimistic consistency during disconnection: the server cannot be contacted during disconnection, so assume there is NO conflict
o +: allows disconnected operation
o --: causes potential conflicts
Conservative consistency policy: lock the file before disconnection (or when callback recall fails)
o +: prevents conflicts
o --: if a portable caches files and disconnects, backbone users cannot access those files (i.e., requires full connectivity among the clients currently caching a file)

Consistency Management (cont'd)
Conservative on shared files, optimistic on private files: monitor file sharing activity at the server (how many users perform concurrent read/write sharing of a file), and be conservative for read/write shared files
o +: reduces conflicts while improving access to unshared files
o --: cannot accurately monitor files, particularly when servers are replicated and the network may be partitioned
Replay: when the communication pipe is thin, replay the actions (commands) rather than keep files consistent
o +: good for actions that create large files (e.g. gcc)
o --: significant processing overhead
o --: almost impossible to keep environments and context fully consistent

Consistency Management (cont'd)
Block-by-block consistency: when the communication pipe is thin, fetch/update critical blocks on demand
o +: works well in conjunction with option (3), conservative on shared files and optimistic on private files; can propagate changes more often to files that are known to be shared
o --: needs optimistic concurrency as a backup
o --: does not use application semantics
Application-dependent consistency: the application imposes structure on a file and specifies which parts of the file need to be kept consistent
o +: semantic consistency uses bandwidth intelligently
o +: can work for both aware and unaware applications
o --: complex mechanisms to handle unstructured files
o --: per-user mechanism, since different applications may use different templates on the same file

Emulation
Goal: the client emulates whatever basic distributed file system is being adapted to support disconnection
Tasks involved:
o Create files
  – Can create either temporary or permanent file ids
  – Can hoard the directory structure in order to reduce potential conflicts
o Maintain logs:
  – For session semantics (1), log only file-mutating closes
  – For replay semantics (4), log all operations
  – For block-by-block semantics (5), log all writes
  – For application-dependent semantics (6), log only the writes which need to be kept consistent
  – Compress logs by deleting entries upon unlink, overwrites, etc.
o Propagate logs:
  – Upon partial or total reconnection, propagate the log back to the server
  – Can prioritize the updates to propagate
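Log maintenance with compression can be sketched as follows. This is a simplified model under stated assumptions: the log holds only `(op, path)` records, the three operation names (`"create"`, `"store"`, `"unlink"`) and the function name are hypothetical, and a `"store"` stands for a file-mutating close as in the session-semantics case above.

```python
def append_to_log(log, op, path):
    """Append an operation to the disconnected-operation log, compressing as we go.

    log is a list of (op, path) tuples, oldest first, and is modified in place.
    """
    if op == "store":
        # A later store overwrites an earlier one: keep only the newest.
        log[:] = [r for r in log if r != ("store", path)]
        log.append((op, path))
    elif op == "unlink":
        if ("create", path) in log:
            # The file was created while disconnected: drop its create and
            # everything logged for it since, and do not log the unlink.
            start = len(log) - 1 - log[::-1].index(("create", path))
            log[start:] = [r for r in log[start:] if r[1] != path]
        else:
            # The file exists on the server: earlier stores are moot,
            # but the unlink itself must still be propagated.
            log[:] = [r for r in log if r[1] != path]
            log.append((op, path))
    else:
        log.append((op, path))
    return log
```

A temporary file that is created, written and removed while disconnected leaves no trace in the log, so nothing about it has to cross the link at reconnection.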

Conflict Detection
Detect write-write conflicts: detect a conflict conservatively when the same base version of the file has been updated concurrently during disconnection by both the portable and the backbone
o +: simple
o --: inadequate in some cases
Detect read-write and write-write conflicts: provide serializability
o +: satisfies a key requirement for transactions
o --: complex
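Write-write conflict detection based on a shared base version can be sketched as a comparison at reconnection time. The data layout here is an assumption made for illustration: integer version numbers per file, a `cached` map of `path -> (base_version, modified)`, and the result labels are all hypothetical.

```python
def reintegrate(server_versions, cached):
    """Classify each cached file at reconnection.

    server_versions: path -> current version on the backbone server.
    cached:          path -> (base_version, modified), where base_version is the
                     version fetched before disconnection and modified says
                     whether the portable changed the file while disconnected.
    """
    result = {}
    for path, (base, modified) in cached.items():
        current = server_versions.get(path, 0)
        if not modified:
            # Only the backbone (if anyone) changed the file: no conflict.
            result[path] = "refetch" if current != base else "up-to-date"
        elif current == base:
            # Only the portable changed the file: safe to propagate.
            result[path] = "propagate"
        else:
            # Both sides updated the same base version: write-write conflict.
            result[path] = "conflict"
    return result
```

The check is conservative: it flags any concurrent update of the same base version, even when the two updates happen to be compatible, and it says nothing about read-write conflicts.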

Conflict Detection and Resolution
Application-specific conflict detection procedures: the application provides the rules to detect conflicts and merge updates when files have been updated concurrently
o +: works well for structured files
o --: cumbersome and difficult for unstructured files
Ownership: the owner has the "correct" copy in case of conflict
o +: simple semantics
o --: some external resolution still needs to be performed by the user who is notified that his/her changes have been discarded

Conflict Resolution
Application-specific conflict resolution procedures: the application provides the rules to resolve conflicts
o +: can automate fully
o --: very difficult to write applications in such an environment without adequate library support
Multi-level read-write: need to introduce multi-level read/write semantics
o All reads and writes are provisional until they have been propagated and conflicts have been resolved
o Notion of provisional vs. committed operations

Main Features of Coda
Main goals: availability and scalability
o Disconnected operation for mobile computing
o High performance through client-side persistent caching
o Server replication
o Security support
o Continued operation during partial network failures in server networks
o Good scalability
o Application-transparent adaptation
o Well-defined semantics of sharing, even in the presence of network failures

Trickle Reintegration
A mechanism that propagates updates to the servers asynchronously, while minimally impacting foreground activity
o Defers update propagation to the servers; conceptually similar to write-back caching
o Makes the write-disconnected state permanent
o Preserves log optimization through an aging window
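The aging window can be sketched as a split of the log into records old enough to ship and records still held back so that later operations may cancel them. This is an illustrative sketch: the `(timestamp, record)` log format, the function name, and the window length used in the usage note are assumptions, not Coda's implementation or defaults.

```python
def records_ready_to_ship(log, now, aging_window):
    """Split the log into (ready, remaining).

    log: list of (timestamp, record) pairs, oldest first.
    A record becomes eligible for background propagation only after it has
    spent at least aging_window time units in the log, which leaves recent
    records available for log optimization (e.g. cancellation by overwrite).
    """
    ready = [(ts, rec) for ts, rec in log if now - ts >= aging_window]
    remaining = [(ts, rec) for ts, rec in log if now - ts < aging_window]
    return ready, remaining
```

With a hypothetical window of 600 time units, a record logged at time 100 is shipped at time 700 or later, while one logged at time 500 stays in the log and can still be optimized away.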

Mobile File Systems
Options:
– hoard, cache or prefetch part/whole files: what to cache, where to cache, what grain to cache at
– a variety of consistency semantics: optimistic, conservative, application-dependent, block-by-block
– relaxed properties (e.g. only isolation) for transactions
– reconciliation upon reconnection: when to propagate updates, which updates to propagate given limited/partial connection capability, how to resolve detected conflicts
– application-level hints or directions for caching/consistency/partial consistency: use of semantics for validation, caching and consistency
– profiling for hoarding/caching
– reservation of system resources, and loss profiles to arbitrate between applications during conflict