Published by Phoebe Wilkins. Modified over 9 years ago.
1
Ubiquitous Data Access
Doppalapudi Raghu Chaitanya, Jaliparthi Gangadhar
2
Outline
- Ubiquitous data
- History: NFS, AFS
- CODA file system
- Cedar
- LBFS
- Operation shipping for mobile file systems
- Data staging on untrusted surrogates
- Portable SoulPads
- Portable and distributed storage
- GFS
- Conclusion
3
Ubiquitous Data
- “In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” (Asilomar Report on Database Research, Dec. 1998)
- “Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.”
- Ubiquitous data access will put existing data management techniques to the test in all aspects: searching, location, reliability, consistency, …
4
Ubiquitous Data Access: State of the Art
- Everyone uses a database system and/or search engine every day, although they may not realize it (the true test of “ubiquity”).
- The Internet and WWW have become a ubiquitous means of global data dissemination and exchange. Databases play a crucial but largely invisible role here.
- XML and related standards are enabling increasingly sophisticated interoperation.
- Wireless access provides anytime-anywhere access and enables location-centric applications.
5
Characteristics of Ubiquitous Data Systems: functionality, scalability, serializability, optimality, interoperability, personalization, globalization, synchronization, flow regulation, integration
6
History
- NFS (1985), Sun Microsystems
- NFS allows a computer attached to a network to access the file systems on the hard disk of another computer on that network.
7
AFS (Andrew File System)
- AFS was developed at CMU.
- AFS has many benefits in security and scalability.
- AFS uses Kerberos for authentication.
- Read and write operations on an open file are directed only to the locally cached copy.
- When a modified file is closed, the changed portions are copied back to the file server.
- Cache consistency is maintained by a mechanism called callback.
- AFS influenced many of today’s distributed file systems, such as CODA.
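The open/close and callback behaviour described on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the AFS implementation: the class names `AfsServer` and `AfsClient` are hypothetical, whole files are written back on close instead of only the changed portions, and Kerberos authentication is left out.

```python
class AfsServer:
    """Toy file server that remembers which clients cache each file."""

    def __init__(self):
        self.files = {}       # path -> file contents
        self.callbacks = {}   # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Hand out the file together with a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Break the callback of every other client that caches this file.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class AfsClient:
    """Toy client: reads and writes touch only the local cache until close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> contents, valid while the callback holds

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data   # local only, nothing is sent yet

    def close(self, path):
        self.server.store(self, path, self.cache[path])

    def break_callback(self, path):
        self.cache.pop(path, None)   # the next open refetches from the server
```

In this sketch a second client keeps reading its cached copy until the writer closes the file; at that point the server breaks the callback and the next `open` fetches the new version.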
8
CODA
9
CODA File System
- CODA is a network file system that achieves high availability using two techniques: server replication and disconnected operation.
- Disconnected operation is the mode of operation that enables a client to continue accessing critical data during temporary failures of network connectivity.
- Server replication involves maintaining read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG).
- The main idea is to cache data in order to improve availability.
10
Design
- On each client, a user-level process called Venus manages a file cache on the local disk.
- It is Venus that bears the brunt of disconnected operation.
11
Venus States
Venus operates in three states:
- Hoarding
- Emulation
- Reintegration
12
Hoarding
- Venus is in this state when there is good connectivity between client and server.
- Venus hoards useful data in anticipation of disconnection.
- It must estimate which files will be used later and prefetch them for disconnected operation.
- Hoard walking: the client cache is kept in equilibrium, with high-priority files cached for high availability. Venus periodically restores equilibrium by performing a hoard walk.
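A hoard walk can be pictured as a function that evicts low-priority entries and prefetches missing high-priority ones. The sketch below is a simplification under stated assumptions: `hoard_walk`, the `priorities` table, the `fetch` callback, and a capacity measured in number of files are all hypothetical, and the priority here is a single fixed number per path.

```python
def hoard_walk(cache, priorities, fetch, capacity):
    """Restore equilibrium: the cache ends up holding the `capacity`
    highest-priority paths and nothing else.

    cache      -- dict path -> data, modified in place
    priorities -- dict path -> number (higher means more important)
    fetch      -- callable path -> data, stands in for a server fetch
    """
    wanted = sorted(priorities, key=priorities.get, reverse=True)[:capacity]
    # Evict anything that no longer belongs in the hoard.
    for path in list(cache):
        if path not in wanted:
            del cache[path]
    # Prefetch wanted files that are missing from the cache.
    for path in wanted:
        if path not in cache:
            cache[path] = fetch(path)
    return cache
```

Running this periodically while connected keeps the cache ready for a disconnection that may happen at any time.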
13
Emulation
- Venus enters this state when the client is disconnected from the server (or only very weakly connected).
- Venus acts as a pseudo-server and assumes full responsibility for file access.
- When a client asks for a file, Venus provides it if it is stored in the cache.
- If the requested file is not in the cache, Venus reports an error to the caller instead of treating it as an ordinary cache miss.
- Logging: during emulation, Venus records sufficient information to replay update activity when it reintegrates.
14
Reintegration
- Venus enters this state when network connectivity between client and server is restored.
- Reintegration is a transitory state through which Venus passes when changing roles from pseudo-server back to cache manager.
- Venus propagates the changes made during emulation and updates its cache to reflect the current server state.
- Conflict handling
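The three Venus states can be tied together in a small state machine. This is a minimal sketch, not Coda's code: the `Venus` class below is a hypothetical stand-in, the server is modelled as a plain dict, and conflict handling is omitted, so replayed updates simply overwrite the server copy.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "reintegration"


class Venus:
    def __init__(self, server):
        self.server = server   # dict path -> data, stands in for the file server
        self.cache = {}
        self.log = []          # replay log of updates made while disconnected
        self.state = State.HOARDING

    def hoard(self, paths):
        for path in paths:
            self.cache[path] = self.server[path]

    def disconnect(self):
        self.state = State.EMULATION

    def read(self, path):
        if path in self.cache:
            return self.cache[path]
        if self.state is State.EMULATION:
            # The miss cannot be serviced, so it surfaces as an error.
            raise FileNotFoundError(path)
        self.cache[path] = self.server[path]
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.state is State.EMULATION:
            self.log.append((path, data))   # remember the update for replay
        else:
            self.server[path] = data

    def reconnect(self):
        self.state = State.REINTEGRATION
        for path, data in self.log:         # replay logged updates in order
            self.server[path] = data        # assumption: no conflict detection
        self.log.clear()
        for path in self.cache:             # refresh cache from server state
            self.cache[path] = self.server[path]
        self.state = State.HOARDING
```

A typical session is `hoard(...)`, `disconnect()`, some reads and writes against the cache, then `reconnect()`, after which the server holds the logged updates.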
15
Drawbacks
- Updates made while disconnected are not visible to other clients.
- Cache misses may impede progress.
- Exhaustion of cache space is a concern.
- Update conflicts become more likely.
- Updates are at risk from theft, loss, or damage of the client.
16
Google gears
17
Cedar
18
Mobile Database Access over Low-Bandwidth Networks
- Relational databases are at the core of business processes.
- Cedar is useful for mobile commerce, traveling salespeople, and disaster recovery.
- A stale client replica can be used to reduce data transmission volume.
- Basics of databases
19
Cedar Architecture
20
Content Addressable Storage
- Stores information so that it can be retrieved based on its content.
- The system records a content address, an identifier uniquely and permanently linked to the information content itself.
- A request to retrieve information from a CAS system must provide the content identifier, from which the system can determine the physical location of the data and retrieve it.
- Any change to a data element necessarily changes its content address.
- A CAS device does not permit editing information once it has been stored.
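The properties listed on this slide follow directly from deriving the address from a cryptographic hash of the content. A minimal in-memory sketch, assuming SHA-256 as the hash (the slide does not name one) and a hypothetical `ContentAddressableStore` class:

```python
import hashlib


class ContentAddressableStore:
    """Write-once store whose keys are derived from the stored bytes."""

    def __init__(self):
        self._blobs = {}   # content address -> data

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Storing the same content twice is a no-op; nothing is ever edited.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]
```

Because the address is a function of the bytes, changing even one byte yields a different address, and identical content is stored only once.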
21
Cedar Protocol
22
Transparency of Cedar
- Application transparency
- Database transparency
- Adaptive interposition
- Commonality detection
- Exploiting structure in data
- Generating compact CAS descriptions
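Commonality detection with a stale replica can be illustrated roughly as follows. This sketch is only loosely modelled on Cedar's idea and rests on several assumptions: rows are plain tuples, each row is hashed individually with SHA-256, and the function names `row_hash`, `server_answer`, and `client_query` are hypothetical. The client computes a tentative result from its stale replica and sends only the hashes; the server ships full rows only where the authoritative result differs.

```python
import hashlib


def row_hash(row):
    """Content address of one result row (assumption: repr() is stable)."""
    return hashlib.sha256(repr(row).encode()).hexdigest()


def server_answer(master_rows, tentative_hashes):
    """For each authoritative row, reply with its hash if the client
    already holds it, otherwise with the row itself."""
    have = set(tentative_hashes)
    reply = []
    for row in master_rows:
        h = row_hash(row)
        reply.append(("hash", h) if h in have else ("row", row))
    return reply


def client_query(stale_rows, master_rows):
    """Return the authoritative result and how many full rows were shipped."""
    tentative = {row_hash(r): r for r in stale_rows}
    reply = server_answer(master_rows, list(tentative))
    result = [tentative[v] if kind == "hash" else v for kind, v in reply]
    shipped = sum(1 for kind, _ in reply if kind == "row")
    return result, shipped
```

The client always ends up with the server's authoritative answer, so consistency is not weakened; the stale replica only reduces how much data has to cross the slow link.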
23
Creating and Refreshing Client Replicas
- Hoard granularity
- Database hoard profiles
- Tools for handling
- Refreshing stale client replicas
24
Results of Cedar
25
Drawbacks of cedar
26
LBFS: Low-Bandwidth Network File System
27
LBFS: Low-Bandwidth Network File System
- A network file system designed to use the network efficiently when connectivity is poor.
- LBFS exploits similarities between files, or between versions of the same file, to save bandwidth.
- It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache.
- It is applied together with compression and caching to improve performance.
28
Design
- The LBFS server divides the files it stores into chunks and indexes the chunks by hash value.
- The client likewise indexes a large persistent cache.
- Before transferring data, each side identifies the chunks the other already holds, so only missing chunks are sent.
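The chunk-and-index design can be sketched with content-defined chunking: a boundary is declared wherever a rolling hash over the last few bytes matches a fixed pattern, so an insertion near the start of a file shifts data without changing most chunks. The code below is a toy under stated assumptions: it uses a simple polynomial rolling hash in place of whatever fingerprint LBFS uses, the constants `WINDOW`, `MASK`, `BASE`, and `MOD` are arbitrary, and there are no minimum or maximum chunk sizes.

```python
import hashlib

WINDOW = 16            # bytes covered by the rolling hash (assumption)
MASK = 0x3F            # boundary when the low 6 bits are all ones
BASE = 257
MOD = (1 << 31) - 1


def split_chunks(data: bytes):
    """Split data at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest byte in the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_index(data: bytes):
    """Index the chunks of a file by hash value."""
    return {hashlib.sha1(c).hexdigest() for c in split_chunks(data)}


def chunks_to_send(data: bytes, receiver_index):
    """Return only the chunks whose hash the receiver does not already hold."""
    return [c for c in split_chunks(data)
            if hashlib.sha1(c).hexdigest() not in receiver_index]
```

Because a boundary depends only on the bytes inside the window, prepending a header to a file leaves the later boundaries in place, and a receiver that indexed the old version needs only the first chunk or so of the new one.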
29
Reading a file in LBFS
30
Observations
31
Drawbacks
- Identical files look different when encrypted differently, so LBFS cannot save bandwidth on them.
- Synchronization problems arise with different chunk sizes.
- Useful only when files share at least some commonality.
32
Operation Shipping
33
Operation Shipping for Mobile File Systems
- How can a large updated file be propagated from a weakly connected client to its server?
- Operation shipping, or operation-based update propagation, can be used to solve the problem.
- Value shipping, by contrast, sends the updated file contents themselves.
34
Operation shipping
- The user operation is sent to a surrogate client that is strongly connected to the server.
- The surrogate replays the user operation, regenerates the files, checks whether they are identical to the original files, and, if so, sends the files to the servers on behalf of the client.
- Forward error correction is used to repair minor re-execution discrepancies.
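The replay-and-verify step can be sketched as a single decision function. This is an illustration under stated assumptions, not the protocol from the paper: `ship_update` and its parameters are hypothetical, the surrogate's supported operations are modelled as a dict of Python callables, the regenerated output is checked with a SHA-256 fingerprint, and forward error correction is left out, so any discrepancy falls back to value shipping.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ship_update(operation, inputs, client_output, surrogate_ops):
    """Decide how an update crosses the weak link.

    Returns (method, payload_size): ('operation', size of the operation
    name) when the surrogate reproduces the client's output exactly,
    otherwise ('value', size of the file contents).
    """
    replay = surrogate_ops.get(operation)
    if replay is not None:
        regenerated = replay(inputs)
        # The surrogate compares fingerprints, not whole files.
        if digest(regenerated) == digest(client_output):
            return "operation", len(operation)
    # Unsupported operation or mismatching output: ship the value.
    return "value", len(client_output)
```

The saving comes from the size gap: the name of an operation is a few bytes, while the file it produces may be megabytes.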
35
Operation shipping
36
Observations
- Network traffic is reduced by factors of 12 to 400.
- Speedups range from 1.4 to nearly 50 times.
- Correctness of the re-executed file is ensured.
- May not be feasible when the surrogate does not support the user operation.
- Side effects can make the re-executed file differ from the original file; in such cases the client has to fall back to value shipping.
37
Data Staging on Untrusted Surrogates
38
Data Staging on Untrusted Surrogates
- How can untrusted computers be used to facilitate secure mobile data access?
- Data staging can improve the performance of distributed file systems.
- Data staging opportunistically prefetches files and caches them on a nearby surrogate.
- Surrogates are untrusted and unmanaged; end-to-end encryption and secure hashes provide privacy and authenticity of data.
- Results show a 54% reduction in average latency.
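The authenticity half of this scheme can be sketched with a secure hash obtained from the trusted server. The code below makes several assumptions: `UntrustedSurrogate` and `fetch_via_surrogate` are hypothetical names, SHA-256 stands in for the secure hash, and the staged blob is assumed to be ciphertext already, since encryption itself is not shown.

```python
import hashlib


class UntrustedSurrogate:
    """Nearby machine that caches staged blobs; it may lie or lose data."""

    def __init__(self):
        self.staged = {}

    def stage(self, key, blob):
        self.staged[key] = blob

    def read(self, key):
        return self.staged.get(key)


def fetch_via_surrogate(key, trusted_hash, surrogate, fetch_from_server):
    """Use the surrogate's copy only if it matches the hash from the
    trusted server; otherwise fall back to the distant server."""
    blob = surrogate.read(key)
    if blob is not None and hashlib.sha256(blob).hexdigest() == trusted_hash:
        return blob, "surrogate"
    return fetch_from_server(key), "server"
```

A tampering surrogate can therefore cost the client time, but it cannot make the client accept corrupted data.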
39
System model
40
observations
41
Pros/Cons
PROS
- Reduces the latency between a server and a client.
- Increases pervasiveness by supporting small devices with small memory and limited power.
CONS
- Surrogates are located manually at present.
- Malicious surrogates pose risks such as eavesdropping, denial of service, and corruption of data.
42
Portable SoulPads
43
Architecture
- ISR (Internet Suspend/Resume)
- The user’s computation state is stored as a checkpointed virtual machine image.
- Remote desktop
44
SoulPad
- Knoppix for the auto-configuring host OS
- VMware Workstation for the VMM
- Windows or Linux for the guest OS
45
Observations
- SoulPad provides AES-128 block encryption.
- When the USB drive is removed, all memory related to SoulPad operations is erased.
- Backups are created on network file systems whenever the host has an Internet connection.
- Resume and suspend latencies
- Application response times
- Instruction set architecture diversity
46
Practical Implementation: Mojopac
- Install Mojopac on a USB pen drive.
- Install software on Mojopac.
- Use that software on whichever system you want.
- Copyright violations need to be addressed.
47
Integrating Portable and Distributed Storage
48
Architecture
- Portable storage and distributed storage each have their own pros and cons.
- Integrating portable and distributed storage increases performance and availability.
- Lookaside caching
49
GFS: Google File System
50
GFS
- A scalable distributed file system for large, distributed, data-intensive applications.
- Fault tolerant while running on inexpensive hardware.
- Google’s storage platform for the generation and processing of data.
- Provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, accessed concurrently by hundreds of clients.
51
GFS Architecture
52
Working of GFS
- Single master, multiple chunkservers, multiple users
- Files are divided into fixed-size chunks (giant blocks) of 64 MB
- Each chunk has a 64-bit id
- Clients read and write chunks directly from chunkservers
- Chunks are the unit of replication
- The master maintains all metadata: the namespace and access control, the map from filenames to chunk ids, and the current locations of each chunk
- Metadata is cached at clients
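The division of labour between master, chunkservers, and clients can be sketched as follows. This is a toy model under stated assumptions: the `Master` and `Client` classes and their fields are hypothetical, chunkservers are plain dicts, reads that span two chunks are not handled, and replica choice is simply "first in the list".

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB fixed-size chunks


class Master:
    """Holds metadata only; file data never flows through it."""

    def __init__(self):
        self.file_chunks = {}       # filename -> list of chunk ids
        self.chunk_locations = {}   # chunk id -> list of chunkserver names

    def lookup(self, filename, chunk_index):
        chunk_id = self.file_chunks[filename][chunk_index]
        return chunk_id, self.chunk_locations[chunk_id]


class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers   # name -> {chunk id: bytes}
        self.metadata_cache = {}           # (filename, index) -> lookup result

    def read(self, filename, offset, length):
        # Translate the byte offset into a chunk index within the file.
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)
        if key not in self.metadata_cache:
            # One round trip to the master, then the answer is cached.
            self.metadata_cache[key] = self.master.lookup(filename, chunk_index)
        chunk_id, locations = self.metadata_cache[key]
        # Read the bytes directly from a chunkserver holding a replica.
        data = self.chunkservers[locations[0]][chunk_id]
        start = offset % CHUNK_SIZE
        return data[start:start + length]
```

Keeping the master on the metadata path only, and caching its answers at clients, is what lets a single master serve many clients without becoming the data bottleneck.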
53
Other Google technologies
- Bigtable: A Distributed Storage System for Structured Data
- Used for Google Earth and Google Finance, among other Google products.
- Bigtable has successfully provided a flexible, high-performance solution for all of these products.
54
References
1. Disconnected Operation in the Coda File System. James J. Kistler, CMU
2. Exploiting Weak Connectivity for Mobile File Access. Lily B. Mummert, CMU
3. A Low-Bandwidth Network File System. Athicha Muthitacharoen, MIT
4. Data Staging on Untrusted Surrogates. Jason Flinn, Intel Research
5. Operation Shipping for Mobile File Systems. Yui-Wah Lee, IEEE
6. Improving Mobile Database Access over WANs. Niraj Tolia, CMU
7. Reincarnating PCs with Portable SoulPads. Ramón Cáceres, IBM Research
8. Pervasive Personal Computing in an Internet Suspend/Resume System. M. Satyanarayanan, CMU
9. Integrating Portable and Distributed Storage. Niraj Tolia, CMU
10. The Google File System. Sanjay Ghemawat, Google
11. Coda File System. M. Satyanarayanan, CMU