Ubiquitous Data Access Doppalapudi Raghu Chaitanya Jaliparthi Gangadhar.


Ubiquitous Data Access Doppalapudi Raghu Chaitanya Jaliparthi Gangadhar

Outline: Ubiquitous Data; History (NFS, AFS); Coda File System; Cedar; LBFS; Operation Shipping; MFS; Data Staging on Untrusted Surrogates; Portable SoulPads; Portable and Distributed Storage; GFS; Conclusion

Ubiquitous Data “In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec “Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Ubiquitous data access will put existing data management techniques to the test, in all aspects – searching, location, reliability, consistency, …

Ubiquitous Data Access, State of the Art: Everyone uses a database system and/or a search engine every day, although they may not realize it (the true test of “ubiquity”). The Internet and WWW have become a ubiquitous means of global data dissemination and exchange. Databases play a crucial but largely invisible role here. XML and related standards are enabling increasingly sophisticated interoperation. Wireless access provides anytime-anywhere access and enables location-centric applications.

Characteristics of Ubiquitous Data Systems: functionality, scalability, serializability, optimality, interoperability, personalization, globalization, synchronization, flow regulation, integration

History: NFS (1985), Sun Microsystems. NFS allows a computer attached to a network to access the file systems on the hard disk of another computer on the network.

AFS (Andrew File System): AFS was developed at CMU. It has many benefits in security and scalability. AFS uses Kerberos for authentication. Read and write operations on an open file are directed only to the locally cached copy. When a modified file is closed, the changed portions are copied back to the file server. Cache consistency is maintained by a mechanism called callback. AFS influenced many of today's distributed file systems, such as Coda.
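The callback idea on this slide can be illustrated with a toy model. This is a minimal sketch, not AFS's actual protocol: the class and method names (`FileServer`, `Client`, `break_callback`) are hypothetical, and for brevity the whole file is written back on close rather than only the changed portions.

```python
class FileServer:
    """Toy server that remembers which clients hold a callback promise per file."""

    def __init__(self):
        self.files = {}        # path -> bytes
        self.callbacks = {}    # path -> set of clients holding a valid callback

    def fetch(self, client, path):
        # Hand out the data together with a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Break the callbacks of every other client caching this file.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}        # path -> bytes
        self.valid = set()     # paths whose callback is still intact

    def open(self, path):
        # Contact the server only if there is no valid cached copy.
        if path not in self.valid:
            self.cache[path] = self.server.fetch(self, path)
            self.valid.add(path)
        return self.cache[path]

    def close(self, path, data):
        # Changes travel back to the server when the file is closed.
        self.cache[path] = data
        self.valid.add(path)
        self.server.store(self, path, data)

    def break_callback(self, path):
        self.valid.discard(path)
```

A client whose callback has been broken refetches the file on its next `open`; until then it can serve reads from its cache without contacting the server.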

CODA

Coda File System: Coda is a network file system that achieves high availability using two techniques: server replication and disconnected operation. Disconnected operation is the mode of operation that enables a client to continue accessing critical data during temporary failures of network connectivity. Server replication involves maintaining read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG). The main idea is to cache data to improve availability.

Design: On each client, a user-level process called Venus manages a file cache on the local disk. It is Venus that bears the brunt of disconnected operation.

Venus States: Venus operates in three states: hoarding, emulation, and reintegration.
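The three states and the moves between them, as the following slides describe them, can be written as a small transition table. This is a sketch only: the event names (`disconnected`, `reconnected`, `replay_done`) are hypothetical labels chosen for illustration.

```python
from enum import Enum


class VenusState(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "reintegration"


# Allowed transitions keyed by (current state, event). Event names are assumed.
TRANSITIONS = {
    (VenusState.HOARDING, "disconnected"): VenusState.EMULATION,
    (VenusState.EMULATION, "reconnected"): VenusState.REINTEGRATION,
    (VenusState.REINTEGRATION, "replay_done"): VenusState.HOARDING,
}


def next_state(state, event):
    """Return the state after `event`; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```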

Hoarding: Applies when there is good connectivity between client and server. In this state Venus hoards useful data in anticipation of disconnection. It must estimate which files will be used later and prefetch them for disconnected operation. Hoard walking: Venus keeps the client cache in equilibrium, caching high-priority files for high availability, and periodically restores equilibrium by performing a hoard walk.
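A hoard walk can be pictured as restoring the invariant that the cache holds the highest-priority files. The sketch below assumes a single numeric priority per file and a cache capacity counted in files; the function name and parameters are hypothetical, and how Venus actually computes priorities is not covered by the slide.

```python
def hoard_walk(cache, priorities, capacity, fetch):
    """Bring `cache` back to equilibrium: hold the `capacity` highest-priority files.

    cache      -- dict mapping path -> contents (mutated in place)
    priorities -- dict mapping path -> number, higher means more important
    capacity   -- maximum number of files the cache may hold (assumed unit)
    fetch      -- callable path -> contents, usable while still connected
    """
    wanted = sorted(priorities, key=priorities.get, reverse=True)[:capacity]
    for path in list(cache):
        if path not in wanted:
            del cache[path]            # evict entries that no longer rank high enough
    for path in wanted:
        if path not in cache:
            cache[path] = fetch(path)  # prefetch in anticipation of disconnection
    return cache
```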

Emulation: Applies when the client is disconnected from, or only very weakly connected to, the server. Venus acts as a pseudo-server and assumes full responsibility for file access. When a client asks for a file, Venus provides it if it is stored in the cache. If the requested file is not in the cache, Venus reports an error to the application; the failure is not presented as a cache miss. Logging: during emulation Venus records sufficient information to replay update activity when it reintegrates.
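The behaviour described here, serving from the cache, failing on a miss, and logging updates for later replay, might look as follows in miniature. The class name, the `("store", path, data)` record shape, and the choice of `OSError` are assumptions made for this sketch. Dropping an older store record for the same file is a simplified form of log optimization.

```python
class EmulatingVenus:
    """Toy cache manager acting as a pseudo-server while disconnected."""

    def __init__(self, cached_files):
        self.cache = dict(cached_files)   # path -> bytes hoarded before disconnection
        self.replay_log = []              # ordered ("store", path, data) records

    def read(self, path):
        if path not in self.cache:
            # The application sees an ordinary error; nothing can be fetched now.
            raise OSError(f"{path}: not available while disconnected")
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        # A newer store supersedes an older store of the same file.
        self.replay_log = [r for r in self.replay_log if r[1] != path]
        self.replay_log.append(("store", path, data))
```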

Reintegration: Begins when network connectivity between client and server is restored. Reintegration is a transitory state through which Venus passes in changing roles from pseudo-server to cache manager. Venus propagates the changes made during emulation and updates its cache to reflect the current server state. Conflicts are handled at this stage.
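Replaying logged updates and spotting conflicts can be sketched with per-file version numbers. This is an illustration under stated assumptions, not Coda's actual mechanism: the log is taken to be a list of `("store", path, data)` tuples, the server is a dict mapping path to `(version, data)`, and `base_versions` holds the version the client cached before disconnecting.

```python
def reintegrate(log, server, base_versions):
    """Replay logged stores on the server and return the paths that conflicted.

    A store conflicts when the server's version differs from the version the
    client last saw, meaning someone else updated the file in the meantime.
    """
    conflicts = []
    for _op, path, data in log:
        current_version, _ = server.get(path, (0, None))
        if current_version != base_versions.get(path, 0):
            conflicts.append(path)        # left for conflict handling
            continue
        server[path] = (current_version + 1, data)
        base_versions[path] = current_version + 1
    return conflicts
```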

Drawbacks: Updates are not visible to other clients. Cache misses may impede progress. Exhaustion of cache space is a concern. Update conflicts become more likely. Updates are at risk due to theft, loss, or damage.

Google Gears

Cedar

Mobile Database Access over Low-Bandwidth Networks: Relational databases are at the core of business processes. Cedar is useful for mobile commerce, traveling salespeople, and disaster recovery. A stale client replica can be used to reduce data transmission volume. Basics of databases.

Cedar Architecture

Content Addressable Storage: Stores information so that it can be retrieved based on its content. The system records a content address, which is an identifier uniquely and permanently linked to the information content itself. A request to retrieve information from a CAS system must provide the content identifier, from which the system can determine the physical location of the data and retrieve it. Any change to a data element necessarily changes its content address. A CAS device does not permit editing information once it has been stored.
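These properties are easy to see in a few lines of code. The sketch below assumes SHA-256 as the content address and an in-memory dict as the storage device; the class name is hypothetical.

```python
import hashlib


class ContentAddressableStore:
    """Write-once store whose keys are derived from the stored bytes."""

    def __init__(self):
        self._blobs = {}   # content address -> bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Storing identical content twice is a no-op; existing entries are never edited.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]
```

Changing even one byte yields a different address, so an "edit" always creates a new entry beside the old one.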

Cedar Protocol

Transparency of Cedar: Application transparency. Database transparency. Adaptive interposition. Commonality detection: exploiting structure in data; generating compact CAS descriptions.

Creating and Refreshing Client Replicas: Hoard granularity. Database hoard profiles. Tools for handling. Refreshing stale client replicas.

Results of Cedar

Drawbacks of Cedar

LBFS-Low bandwidth Network File System

LBFS (Low-Bandwidth Network File System): A network file system designed to use the network efficiently over low-bandwidth links. LBFS exploits similarities between files, or between versions of the same file, to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. It is combined with compression and caching to improve performance.

Design: The LBFS server divides the files it stores into chunks and indexes the chunks by hash value. The client similarly indexes a large persistent cache. When data is to be transferred, each side identifies the chunks that are already present in the system, so that only the missing chunks need to be sent.
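Chunking and hash indexing can be sketched as below. This is an illustration under loud assumptions: chunk boundaries are chosen from the content using `zlib.adler32` over a small sliding window as a stand-in fingerprint, and the window size, mask, and function names are made up for the example rather than taken from LBFS.

```python
import hashlib
import zlib

WINDOW = 16   # bytes examined for each boundary decision (assumed value)
MASK = 0x3F   # boundary when the low 6 bits of the fingerprint are zero (assumed)


def split_chunks(data: bytes):
    """Split data at content-defined boundaries; each chunk is at least WINDOW bytes."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if end - start < WINDOW:
            continue                      # window would reach into the previous chunk
        fingerprint = zlib.adler32(data[end - WINDOW:end])
        if fingerprint & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])       # trailing remainder
    return chunks


def chunk_index(data: bytes):
    """Index the chunks of a file by the SHA-1 hash of their contents."""
    return {hashlib.sha1(c).hexdigest(): c for c in split_chunks(data)}


def chunks_to_send(new_data: bytes, known_hashes):
    """Return only the chunks whose hashes the other side does not already hold."""
    return [c for c in split_chunks(new_data)
            if hashlib.sha1(c).hexdigest() not in known_hashes]
```

Because boundaries depend on content rather than on byte offsets, an insertion near the start of a file tends to disturb only nearby chunks, so later chunks keep their hashes and need not be resent.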

Reading a file in LBFS

Observations

Drawbacks: The same file appears different when encrypted differently, so LBFS is not useful for such data. Synchronization problems arise with different chunk sizes. LBFS is useful only when files share at least some commonality.

Operation Shipping

Operation Shipping for Mobile File Systems: How can an updated large file be propagated from a weakly connected client to its server? Operation shipping, or operation-based update propagation, can be used to solve the problem, in contrast to value shipping, which sends the file contents themselves.

Operation Shipping: The user operation is sent to a surrogate client that is strongly connected to the server. The surrogate replays the user operation, regenerates the files, checks whether they are identical to the original files, and, if so, sends the files to the server on behalf of the client. Forward error correction is used to repair minor re-execution discrepancies.
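The decision between the two propagation modes can be sketched as follows. Everything here is a simplification for illustration: the function names are hypothetical, the identity check uses a SHA-256 fingerprint, the `replay` callable stands in for re-execution on the surrogate, and forward error correction is left out entirely.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ship_update(operation: str, client_result: bytes, replay):
    """Try operation shipping; fall back to value shipping if the replay differs.

    replay -- callable run on the surrogate: operation -> regenerated file bytes
    Returns (mode, bytes sent over the weak link, data delivered to the server).
    """
    expected = fingerprint(client_result)
    message = operation.encode() + expected.encode()   # small: command plus fingerprint
    regenerated = replay(operation)
    if fingerprint(regenerated) == expected:
        # The surrogate's copy is identical, so it forwards the file itself.
        return "operation", len(message), regenerated
    # Replay diverged: the client must send the file contents over the weak link.
    return "value", len(message) + len(client_result), client_result
```

With a 1000-byte result and the operation string `"make"`, a successful replay costs 68 bytes on the weak link in this model, against 1068 bytes when the fallback is taken.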

Operation shipping

Observations: Network traffic reductions from 12 to 400 times. Speedups ranging from 1.4 to nearly 50 times. Correctness of the re-executed file is ensured. Operation shipping may not be feasible when the surrogate does not support the user operation. Side effects can make the re-executed file differ from the original file; in such cases the system has to fall back to value shipping.

Data Staging on Untrusted Surrogates

Data Staging on Untrusted Surrogates: How can untrusted computers be used to facilitate secure mobile data access? Data staging can improve the performance of distributed file systems. It opportunistically prefetches files and caches them on a nearby surrogate. Surrogates are untrusted and unmanaged, so end-to-end encryption and secure hashes are used to provide privacy and authenticity of data. Results show a reduction in average latency of 54%.
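The authenticity half of this design can be sketched with a secure hash check. The sketch assumes the client already holds trusted SHA-256 digests obtained from the file server and models the surrogate as a dict; the function name is hypothetical, and encryption is deliberately omitted because the Python standard library has no block cipher.

```python
import hashlib
import hmac


def fetch_verified(path, trusted_hashes, surrogate):
    """Return staged data only if it matches the digest obtained from the trusted server.

    trusted_hashes -- dict path -> hex SHA-256 digest (from the file server)
    surrogate      -- dict path -> bytes (untrusted nearby cache)
    Returns None when the data is absent or fails verification, so the caller
    can fall back to fetching from the distant server.
    """
    data = surrogate.get(path)
    if data is None:
        return None
    digest = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(digest, trusted_hashes[path]):
        return None   # corrupted or tampered copy is rejected
    return data
```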

System model

Observations

Pros/Cons. Pros: Reduces latency between server and client. Increases pervasiveness by supporting small devices with little memory and limited power. Cons: Surrogates are located manually at present. Malicious surrogates pose risks such as eavesdropping, denial of service, and data corruption.

Portable SoulPads

Architecture: ISR (Internet Suspend/Resume): the user's computation state is stored as a checkpointed virtual machine image. Remote desktop.

SoulPad: Knoppix as the auto-configuring host OS. VMware Workstation as the VMM. Windows or Linux as the guest OS.

Observations: SoulPad provides AES-128 block encryption. When the USB drive is removed, all memory related to SoulPad operations is erased. Backups are created on network file systems whenever the host has an Internet connection. Resume and suspend latencies. Application response times. Instruction set architecture diversity.

Practical Implementation: MojoPac. Install MojoPac on a USB pen drive. Install software on MojoPac. Use that software on whichever system you want. Copyright violations need to be addressed.

Integrating Portable and Distributed Storage

Architecture: Portable storage and distributed storage each have their own pros and cons. Performance and availability increase when the two are integrated. Lookaside caching.
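Lookaside caching, named on this slide, can be sketched as a hash-directed shortcut on a cache miss. The sketch assumes the distributed file system supplies a trusted SHA-256 digest for each file and that the portable device carries an index from digest to contents; all names are hypothetical.

```python
import hashlib


def read_file(path, server_hashes, portable_index, fetch_from_server):
    """Serve a cache miss from portable storage when it holds identical content.

    server_hashes     -- dict path -> hex digest, obtained from the file server
    portable_index    -- dict hex digest -> bytes stored on the portable device
    fetch_from_server -- callable path -> bytes, the slow path over the network
    Returns (data, source) where source is "portable" or "server".
    """
    wanted = server_hashes[path]
    local = portable_index.get(wanted)
    # Re-hash before use so a stale or damaged device copy is never trusted.
    if local is not None and hashlib.sha256(local).hexdigest() == wanted:
        return local, "portable"
    return fetch_from_server(path), "server"
```

The server remains the authority on what the current contents are; the portable device only saves the transfer when it happens to hold the same bytes.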

GFS (Google File System)

GFS: A scalable distributed file system for large distributed data-intensive applications. Fault tolerant while running on inexpensive hardware. Google's storage platform for the generation and processing of data. Hundreds of terabytes of storage across thousands of disks on thousands of machines, accessed by hundreds of clients.

GFS Architecture

Working of GFS: A single master, multiple chunkservers, and multiple clients. Files are divided into fixed-size chunks (giant blocks) of 64 MB, each with a 64-bit ID. Clients read and write chunks directly from chunkservers. Chunks are the unit of replication. The master maintains all metadata: the namespace and access control, the map from filenames to chunk IDs, and the current locations of each chunk. Metadata is cached at clients.
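The read path implied by this slide can be sketched in a few classes. This is a toy model under stated assumptions: class and attribute names are hypothetical, chunkservers are plain dicts, reads are assumed not to cross a chunk boundary, and the client always picks the first listed replica.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB fixed-size chunks


class Master:
    """Holds metadata only: file -> chunk ids, and chunk id -> replica locations."""

    def __init__(self):
        self.file_chunks = {}   # filename -> list of chunk ids, in file order
        self.locations = {}     # chunk id -> list of chunkserver names

    def lookup(self, filename, index):
        chunk_id = self.file_chunks[filename][index]
        return chunk_id, self.locations[chunk_id]


class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers   # name -> dict of chunk id -> bytes
        self.metadata_cache = {}           # (filename, chunk index) -> lookup result
        self.master_requests = 0

    def read(self, filename, offset, length):
        index = offset // CHUNK_SIZE       # which chunk holds this byte offset
        key = (filename, index)
        if key not in self.metadata_cache:
            self.metadata_cache[key] = self.master.lookup(filename, index)
            self.master_requests += 1
        chunk_id, servers = self.metadata_cache[key]
        start = offset % CHUNK_SIZE
        # Data flows directly between client and chunkserver, bypassing the master.
        data = self.chunkservers[servers[0]][chunk_id]
        return data[start:start + length]
```

Repeated reads of the same chunk reuse the cached metadata, so the master is contacted once per chunk rather than once per read.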

Other Google Technologies: Bigtable: A Distributed Storage System for Structured Data. Used for Google Earth and Google Finance. Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.
