Outline Objective –Attribute file naming: “Why can’t I find my files?” –Hoarding techniques and their use in disconnected file systems Administrative –New.

Slides:



Advertisements
Similar presentations
Automating Hoarding Prasun Dewan Department of Computer Science University of North Carolina
Advertisements

File System for Mobile Computing Quick overview of AFS identify the issues for file system design in wireless and mobile environments Design Options for.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
File Management Chapter 12. File Management A file is a named entity used to save results from a program or provide data to a program. Access control.
Overview of Mobile Computing (3): File System. File System for Mobile Computing Issues for file system design in wireless and mobile environments Design.
G Robert Grimm New York University Disconnected Operation in the Coda File System.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Disconnected Operation in the Coda File System James J. Kistler and M. Satyanarayanan Carnegie Mellon University Presented by Deepak Mehtani.
Disconnected Operation in the Coda File System James J. Kistler and M. Satyanarayanan Carnegie Mellon University Presented by Cong.
Coda file system: Disconnected operation By Wallis Chau May 7, 2003.
Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.
P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar.
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Virtual Memory Chapter 8.
Disconnected Operation In The Coda File System James J Kistler & M Satyanarayanan Carnegie Mellon University Presented By Prashanth L Anmol N M Yulong.
Hands-On Microsoft Windows Server 2003 Administration Chapter 5 Administering File Resources.
Jeff Chheng Jun Du.  Distributed file system  Designed for scalability, security, and high availability  Descendant of version 2 of Andrew File System.
Web Caching Schemes For The Internet – cont. By Jia Wang.
Naming Names in computer systems are used to share resources, to uniquely identify entities, to refer to locations and so on. An important issue with naming.
File Management Chapter 12.
University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee.
Mobility Presented by: Mohamed Elhawary. Mobility Distributed file systems increase availability Remote failures may cause serious troubles Server replication.
Client-Server Computing in Mobile Environments
Distributed Computing COEN 317 DC2: Naming, part 1.
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
5.1 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
Replication for Mobile Computing Prasun Dewan Department of Computer Science University of North Carolina
CS 153 Design of Operating Systems Spring 2015 Lecture 17: Paging.
Distributed Computing COEN 317 DC2: Naming, part 1.
Distributed File Systems Case Studies: Sprite Coda.
Operating Systems ECE344 Ding Yuan Paging Lecture 8: Paging.
Lecture 12 Recoverability and failure. 2 Optimistic Techniques Based on assumption that conflict is rare and more efficient to let transactions proceed.
1 Mobile File Systems: Disconnected and Weakly Connected File Systems 3/29/2004 Richard Yang.
Replication and Consistency (3). Reference r Automated Hoarding for Mobile Computers by G. Kuenning and G. Popek.
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
1 File Management Chapter File Management n File management system consists of system utility programs that run as privileged applications n Concerned.
1 Shared Files Sharing files among team members A shared file appearing simultaneously in different directories Share file by link File system becomes.
Feb 27, 2001CSCI {4,6}900: Ubiquitous Computing1 Announcements.
DISCONNECTED OPERATION IN THE CODA FILE SYSTEM J. J. Kistler M. Sataynarayanan Carnegie-Mellon University.
A Low-bandwidth Network File System Athicha Muthitacharoen et al. Presented by Matt Miller September 12, 2002.
CS425 / CSE424 / ECE428 — Distributed Systems — Fall 2011 Some material derived from slides by Prashant Shenoy (Umass) & courses.washington.edu/css434/students/Coda.ppt.
Proxy, Observer, Symbolic Links Rebecca Chernoff.
Distributed File Systems
Information/File Access and Sharing Coda: A Case Study J. Kistler, M. Satyanarayanan. Disconnected operation in the Coda File System. ACM Transaction on.
Caching Consistency and Concurrency Control Contact: Dingshan He
Copyright © 2006, GemStone Systems Inc. All Rights Reserved. Increasing computation throughput with Grid Data Caching Jags Ramnarayan Chief Architect GemStone.
Outline for Today’s Lecture Administrative: Objective: –Disconnection in file systems –Energy management.
20 Copyright © 2008, Oracle. All rights reserved. Cache Management.
Storage Systems CSE 598d, Spring 2007 OS Support for DB Management DB File System April 3, 2007 Mark Johnson.
Outline for Today Objective –Metadata complications –More on naming Attribute-based file naming: “Why can’t I find my files?” Administrative –Not yet.
NETW3005 Virtual Memory. Reading For this lecture, you should have read Chapter 9 (Sections 1-7). NETW3005 (Operating Systems) Lecture 08 - Virtual Memory2.
THE EVOLUTION OF CODA M. Satyanarayanan Carnegie-Mellon University.
Lecture 14 Page 1 CS 111 Online File Systems: Naming, Reliability, and Advanced Issues CS 111 On-Line MS Program Operating Systems Peter Reiher.
Chapter Five Distributed file systems. 2 Contents Distributed file system design Distributed file system implementation Trends in distributed file systems.
Feb 22, 2001CSCI {4,6}900: Ubiquitous Computing1 Announcements Send today with people in your project group. People seem to be dropping off and I.
Mobility Victoria Krafft CS /25/05. General Idea People and their machines move around Machines want to share data Networks and machines fail Network.
Mobile File Systems.
Jonathan Walpole Computer Science Portland State University
Coda / AFS Thomas Brown Albert Ng.
Nomadic File Systems Uri Moszkowicz 05/02/02.
Introduction to Operating Systems
Disconnected Operation in the Coda File System
Presented By Abhishek Khaitan Adhip Joshi
Today: Coda, xFS Case Study: Coda File System
Outline Announcements Lab2 Distributed File Systems 1/17/2019 COP5611.
Specialized Cloud Architectures
Outline Review of Quiz #1 Distributed File Systems 4/20/2019 COP5611.
Presentation transcript:

Outline Objective –Attribute file naming: “Why can’t I find my files?” –Hoarding techniques and their use in disconnected file systems Administrative –New programming project up –Revisions in schedule – I’m out of town on April 13

An Example of the Problem usr project coursearchive cwd fall02fall01 fall00 fall99 fall03 spring02 spring99 spring01 spring00 cps210 cps220 Find the lecture where hoarding was discussed

An Example of the Problem usr project coursearchive cwd fall02fall01 fall00 fall99 fall03 spring02 spring99 spring01 spring00 cps210 cps220 … Find the lecture where hoarding was discussed

spring00 spring01 spring02 spring99 An Example of the Problem usr project coursearchive cwd fall02fall01 fall00 fall99 fall03 spring02 spring99 spring01 spring00 cps210 cps220 Find the lecture where hoarding was discussed cps210 With symbolic links

It gets worse: /home/home5/carla/talks 2 laptops (one lives at work, one at home) desktop machine at home Forest not a tree! –Growing more like kudzu An Example of the Problem

Attributes in File Systems Metadata: How to assign? –User provided – too much work –Content analysis – restricted by formats Semantic file system provided transducers –Context analysis Access-based or inter-file relationships Once you have them –Virtual directories – “views” –Indexing

spring00 spring01 spring02 spring99 Virtual Directories usr project coursearchive cwd fall02fall01 fall00 fall99 fall03 spring02 spring99 spring01 spring00 cps210 cps220 Find the lecture where hoarding was discussed Query: Automated symbolic links

Lecture10.ppt Virtual Directories usr project coursearchive cwd fall02fall01 fall00 fall99 fall03 spring02 spring99 spring01 spring00 cps210 cps220 Find the lecture where hoarding was discussed Query: AND Lecture10.ppt devices.ppt raid.ppt Versions?

Issues with Virtual Directories What if I want to create a file under a virtual directory that doesn’t have a path location already? How does the system maintain consistency? We should make sure that when a file changes, its contents are still consistent with the query. –What if somewhere a new file is created that should match the query and be included? –What if currently matching file is changed to not match? How do I construct a query that captures exactly the set of files I wish to group together?

Example: HAC File System (Gopal & Manber, OSDI99) Semantic directories created within the hierarchy (given a pathname in the tree) by issuing a query over the scope inherited from parent –Physically exist as directory files containing symlinks Creates symbolic links to all files that satisfy query User can also explicitly add symbolic links to this semantic directory as well as remove ones returned by the query as posed. –Query is a starting point for organization. Reevaluate queries whenever something in scope changes. Uses Glimpse – an index-based file search tool for content- based queries.

Context-based Relationships Premise: Context is what user might remember best. Previous work –Hoarding for disconnected access (inter-file relationships) –Google: textual context for link and feedback from search behavior (assumption of popularity over many users)

Access-based Use context of user’s session at access time Application knowledge – modify apps to provide hints –Example: subject of associated with attached file Feedback from “find” type queries –Searches are for rarely accessed files and usually only one user – limits statistical info

Traced File Creation Behavior

Inter-file Attributes can be shared/propagated among related files Determining relationships –User access patterns – temporal locality –Inter-file content analysis Similarity – duplication -- hashing Versions

Challenges Evaluation – issues of deployment for user study Mechanisms –Storage of large numbers of attributes that get automatically generated –User interface Context switches –Creating false positive relationships

“Cache It”

Prefetching To avoid the access latency of moving the data in for that first cache miss. Prediction! “Guessing” what data will be needed in the future. –It’s not for free: Consequences of guessing wrong Overhead

Background: Inter-file Relationships

Hoarding - Prefetching for Disconnected Information Access Caching for availability (not just latency) Cache misses, when operating disconnected, have no redeeming value. (Unlike in connected mode, they can’t be used as the triggering mechanism for filling the cache.) How to preload the cache for subsequent disconnection? Planned or unplanned. What does it mean for replacement?

SEER’s Hoarding Scheme: Semantic Distance Observer monitors user access patterns, classifying each access by type. Correlator calculates semantic distance among files Clustering algorithm assign each file to one or more projects Only entire projects are hoarded.

Defining Semantic Distance Temporal semantic distance - elapsed time between two file references Time scale effects :-( Sequence-based semantic distance - number of intervening file references between 2, of interest. At what point? Open? Close? Lifetime semantic distance - accounts for concurrently open files - overlapping lifetimes

How to turn semantic distance between two references into semantic distance between files? Summarize - geometric mean. Using months of data. Only store n nearest neighbors for each file and files within distance M External investigators can incorporate some extra info (e.g. heuristics used by Tait, makefile) 0 1 3

Real World Complications Meaningless clutter in the reference stream (e.g. find command) Shared libraries - an apparent link between unrelated files - want to hoard but not use in distance calculations and clustering Rare but critical files, temp files, directories Multi-tasking clutter Delete and recreate by same filename. Examine metadata then open – 1 or 2 accesses? SEER tracing itself – avoid accesses by root

Evaluation Metric –Hoard misses usually do not allow continuation of activity (stops trace) – counting misses is meaningless. –Time to 1 st miss – would depend on hoard size –Miss-free hoard size – size necessary to ensure no misses Method –Live deployment – difficulty in making comparisons Only long enough disconnections Subtract off suspensions –Trace-driven simulation -- reproducible What kind of traces are valid?

Context of Hoarding

Disconnected and Weakly Connected Coda Satya, Kistler, Mummert, Ebling, Kumar, and Lu, “Experience with Disconnected Operation in a Mobile Computing Environment”, USENIX Symp. On Mobile and Location- Independent Computing, Mummert, Ebling, and Satya, “Exploiting Weak Connectivitiy for Mobile File Access”, SOSP95.

Coda Single location-transparent UNIX FS. Scalability - coarse granularity (whole-file caching, volume management) First class (server) replication and second class (client caching) replication Optimistic replication & consistency maintenance.  Designed for disconnected operation for mobile computing clients

Optimistic vs. Pessimistic High availability Conflicting updates are the potential problem - requiring detection and resolution. Avoids conflicts by holding of shared or exclusive locks. How to arrange when disconnection is involuntary? Leases [Gray, SOSP89] puts a time-bound on locks but what about expiration?

Client-cache State Transitions emulationreintegration hoarding physical reconnection logical reconnection disconnection

Hoard Database Per-workstation, per-user set of pathnames with priority User can explicit tailor HDB using scripts called hoard profiles Delimited observations of reference behavior (snapshot spying with bookends)

Coda Hoarding State Balancing act - caching for 2 purposes at once: –performance of current accesses, –availability of future disconnected access. Prioritized algorithm - Priority of object for retention in cache is f(hoard priority, recent usage). Hoard walking (periodically or on request) maintains equilibrium - no uncached object has higher priority than any of cached objects

The Hoard Walk Hoard walk - phase 1 - reevaluate name bindings (e.g., any new children created by other clients?) Hoard walk - phase 2 - recalculate priorities in cache and in HDB, evict and fetch to restore equilibrium

Hierarchical Cache Mgt Ancestors of a cached object must be cached in order to resolve pathname. Directories with cached children are assigned infinite priority

Callbacks During Hoarding Traditional callbacks - invalidate object and refetch on demand With threat of disconnection –Purge files and refetch on demand or hoard walk –Directories - mark as stale and fix on reference or hoard walk, available until then just in case.

Emulation State Pseudo-server, subject to validation upon reconnection Cache management by priority –modified objects assigned infinite priority –freeing up disk space - compression, replacement to floppy, backout updates Replay log also occupies non-volatile storage (RVM - recoverable virtual memory)

Client-cache State Transitions with Weak Connectivity emulation write disconnected hoarding physical reconnection strong connection disconnection weak connection disconnection

Cache Misses with Weak Connectivity At least now it’s possible to service misses but $$$ and it’s a foreground activity (noticable impact). Maybe not User patience threshold - estimated service time compared with what is acceptable Defer misses by adding to HDB and letting hoard walk deal with it User interaction during hoard walk.

Other Hoarding Strategies Detection of “file working sets” Tait, Acharya, Lei, and Chang, “Intelligent File Hoarding for Mobile Computers”, MOBICOM95. Capture semantic relationships among files in “semantic distance” measure - SEER Kuenning and Popek, “Automated Hoarding for Mobile Computers”, SOSP97.

Tait’s Hoarding Scheme: Transparent Analytical Spying Designed for voluntary disconnection (loading the briefcase to leave) Automatic detection of working sets on a per-application basis –Continuous monitoring and logging file access patterns –Trees representing program executions are constructed

Program Execution Tree /bin/prog /u/me/proc1/u/me/proc2 /u/me/mydata /tmp/datafile Root exec’ed by shell exec create open exec

Generalize at Hoard Time To determine all of a program’s dependencies Essentially a union of all saved trees (a limited number based on LRU) from previous executions of this program Per-application not per-user basis (global) Set of heuristics to fix over-generalization (if requested) –differentiate between data (private?) and application files (e.g. extensions, directory, time)