Published by Godfrey Carpenter. Modified over 9 years ago.
1
Data Staging on Untrusted Surrogates
Jason Flinn, Shafeeq Sinnamohideen, Niraj Tolia, Mahadev Satyanarayanan
Intel Research Pittsburgh, University of Michigan, Carnegie Mellon University
2
Mobile Data Access: Expectation vs. Reality
- Mobile computers are increasingly connected
  - Expectation of ubiquitous data access
  - Distributed file systems can help
- Does reality match expectations?
  - Size, weight, and energy constraints
  - Less storage, processing power, etc.
- How to match reality and expectations?
  - Use untrusted, unmanaged infrastructure!
3
Problem: Limited Storage
- Latency is often the real performance killer
  - File systems: many sequential RPCs
  - Network latency not improving (much)!
- What if one can’t cache all files of interest?
  - Borrow storage from a nearby surrogate
  - Use it as an “L2 file cache”
[Diagram labels: Client, Surrogate, File server]
4
Problem: Limited Battery Energy
- File system consumes a lot of energy:
  - Network communication
  - Storage (disk spin-ups, reads, writes)
- Surrogate helps preserve client battery
  - Use surrogate cache to avoid disk spin-ups
  - Prefetch updates to surrogate, not client
5
Problem: Limited Bandwidth
- How to fetch large updates in a short window?
  - Example: passing through an airport gate
  - 11 Mbps (or more) local wireless bandwidth
  - Wide-area Internet bandwidth often less
- InfoStation (Wu, Badrinath, et al.)
  - Cache updates before mobile user arrives
  - Blast data as user passes through cell
- Surrogate: mechanism for caching file data
6
Location, Location, Location
- Requirement: surrogate located near the client!
  - Must be opportunistic (use what’s there)
- Vision: surrogates ubiquitously deployed
  - Computers getting ever cheaper
  - Already 802.11b wireless networks in cafes
- Can’t trust or assume good behavior!
7
Outline
- Motivation
- Architecture and design
- Implementation
- Evaluation
- Related work and conclusions
8
Data Staging Architecture
[Diagram. Component labels: wimpy client, file client, proxy, surrogate, staging server, desktop, data pump, file server, Coda. Flow labels: staged reads; modifications and unstaged reads; files; encrypted files; file keys and hashes (via secure channel); file system traffic; high latency.]
9
Trust (or Lack Thereof)
- Trusted: client, file server, desktop, file system
- Untrusted: surrogate, network
- How to deal with untrusted surrogate?
  - End-to-end encryption (privacy)
  - Cryptographic hashes (authenticity)
  - Read-only data (can’t “lose” updates)
  - Monitor performance (mitigate DoS)
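The encryption and hashing defenses listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the slides do not name a cipher or hash function, so SHA-256 is an assumption, and the keystream cipher is a toy stand-in for a real cipher that keeps the example within the Python standard library. The function names `seal` and `open_sealed` are hypothetical, and hashing the ciphertext (rather than the plaintext) is also an assumption.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key || counter). Illustration only,
    standing in for a real cipher; do not use for actual protection."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, _keystream(key, len(data))))


def seal(plaintext: bytes):
    """Trusted side: pick a fresh per-file key, encrypt, hash the ciphertext.
    The ciphertext may go to the untrusted surrogate; the key and digest
    must travel to the client over a secure channel."""
    key = os.urandom(32)
    ciphertext = _xor(key, plaintext)
    digest = hashlib.sha256(ciphertext).digest()
    return ciphertext, key, digest


def open_sealed(ciphertext: bytes, key: bytes, digest: bytes):
    """Client side: return the plaintext, or None if the data handed back
    by the surrogate does not match the expected hash."""
    actual = hashlib.sha256(ciphertext).digest()
    if not hmac.compare_digest(actual, digest):
        return None
    return _xor(key, ciphertext)
```

A surrogate that holds only the ciphertext learns nothing useful without the key, and any modification it makes is caught by the hash comparison, after which the client can fall back to the file server.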
10
Ease of Management
- Can’t require a system administrator!
- Build on commodity software
  - Apache with Perl scripts (643 LoC)
- No long-term state
  - OK to trip over power cord!
- Allow file system diversity
  - Minimalist API
  - Currently support Coda and NFS
11
Surrogate API
Register()     Get lease, quota for surrogate
Renew()        Renew a lease
Deregister()   Explicitly stop using surrogate
Stage()        Put data on the surrogate
Unstage()      Remove data from surrogate
Get()          Retrieve data from surrogate
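To make the six calls concrete, here is a hedged in-memory sketch of a surrogate that hands out leases and quotas. Only the call names and their one-line descriptions come from the slide. The parameter lists, return values, default lease length, default quota, and the injectable `clock` are all assumptions for illustration; the real surrogate was built on Apache with Perl scripts, not Python.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Lease:
    expires_at: float
    quota_bytes: int
    files: Dict[str, bytes] = field(default_factory=dict)

    def used(self) -> int:
        return sum(len(blob) for blob in self.files.values())


class Surrogate:
    """Hypothetical surrogate: soft state only, nothing survives a restart."""

    def __init__(self, lease_seconds=300.0, quota_bytes=64 * 2**20,
                 clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._quota_bytes = quota_bytes
        self._clock = clock
        self._leases: Dict[int, Lease] = {}
        self._next_id = 1

    def _live(self, client_id: int) -> Optional[Lease]:
        """Return the client's lease, discarding it if it has expired."""
        lease = self._leases.get(client_id)
        if lease is not None and lease.expires_at <= self._clock():
            del self._leases[client_id]  # expired: staged data is dropped
            return None
        return lease

    def register(self) -> int:
        """Register(): grant a lease and quota, return a client id."""
        client_id = self._next_id
        self._next_id += 1
        self._leases[client_id] = Lease(
            self._clock() + self._lease_seconds, self._quota_bytes)
        return client_id

    def renew(self, client_id: int) -> bool:
        """Renew(): extend a live lease; False if it already expired."""
        lease = self._live(client_id)
        if lease is None:
            return False
        lease.expires_at = self._clock() + self._lease_seconds
        return True

    def deregister(self, client_id: int) -> None:
        """Deregister(): explicitly stop using the surrogate."""
        self._leases.pop(client_id, None)

    def stage(self, client_id: int, file_id: str, data: bytes) -> bool:
        """Stage(): store data if the lease is live and the quota allows."""
        lease = self._live(client_id)
        if lease is None:
            return False
        replaced = len(lease.files.get(file_id, b""))
        if lease.used() - replaced + len(data) > lease.quota_bytes:
            return False
        lease.files[file_id] = data
        return True

    def unstage(self, client_id: int, file_id: str) -> None:
        """Unstage(): remove one staged file, if present."""
        lease = self._live(client_id)
        if lease is not None:
            lease.files.pop(file_id, None)

    def get(self, client_id: int, file_id: str) -> Optional[bytes]:
        """Get(): return staged data, or None if missing or lease expired."""
        lease = self._live(client_id)
        return None if lease is None else lease.files.get(file_id)
```

Tying staged data to a lease means an abandoned client's data disappears on its own once the lease runs out, so the surrogate needs no cleanup by an administrator.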
12
Which Files to Stage?
- Must predict the files most likely to be accessed
  - Prediction orthogonal to data staging
  - Client proxy has hooks for prediction code
- Hoarding: user manually specifies files, dirs
- Clustering: per-activity LRU caching
[Figure: spectrum from less transparent to more transparent: Manual Copy, Coda Hoarding, User-Driven Clustering, SEER]
13
Client Proxy Data Structures
- Client proxy is the final arbiter of validity
- For each staged file, maintains:
  - Valid bit
  - Data length
  - Encryption key and secure hash

File id   Valid?   Length   Key       Hash
0x3fdc    Yes      32,558   0xeabc…   0xea67…
0x3fe6    No       23,458   0xabc3…   0x7345…
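The per-file table on this slide could be represented along these lines. This is a sketch under assumptions: the class and method names (`StagedEntry`, `StagingTable`, `record`, `invalidate`, `lookup`) are hypothetical, and the slide does not say exactly when the valid bit is cleared, so the sketch simply assumes the proxy clears it whenever it learns the staged copy is stale.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StagedEntry:
    valid: bool     # may the staged copy still be used?
    length: int     # data length in bytes
    key: bytes      # per-file encryption key
    digest: bytes   # secure hash used to check what the surrogate returns


class StagingTable:
    """Client-proxy view of which staged files may be read from the surrogate."""

    def __init__(self) -> None:
        self._entries: Dict[int, StagedEntry] = {}

    def record(self, file_id: int, length: int, key: bytes, digest: bytes) -> None:
        """Remember metadata for a newly staged file and mark it valid."""
        self._entries[file_id] = StagedEntry(True, length, key, digest)

    def invalidate(self, file_id: int) -> None:
        """Clear the valid bit (assumed trigger: the staged copy became stale)."""
        entry = self._entries.get(file_id)
        if entry is not None:
            entry.valid = False

    def lookup(self, file_id: int) -> Optional[StagedEntry]:
        """Return the entry only if the file is staged and still valid."""
        entry = self._entries.get(file_id)
        return entry if entry is not None and entry.valid else None
```

Because the proxy alone decides validity, the surrogate never needs to be told about updates or trusted to honor them.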
14
Staging Data
- Client proxy sends list of files to data pump
- For each file, data pump:
  - Reads file and attributes from file system
  - Encrypts file, generates hash over data
  - Sends encrypted data to surrogate
  - Sends key, hash, length to client
- Staging is asynchronous with client file accesses
  - If file staged, client gets it from surrogate
  - Otherwise, gets it from file server
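The client read path in the last two bullets can be sketched as a single function. Everything here is illustrative: `read_file`, `xor_decrypt`, and the callables `surrogate_get` and `server_fetch` are hypothetical names, the repeating-key XOR is only a placeholder for a real cipher, and SHA-256 over the encrypted bytes is an assumption since the slide does not specify the hash. Falling back to the file server when the hash check fails is likewise an assumed policy.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

# file id -> (key, digest) for files the proxy currently considers staged
Staged = Dict[int, Tuple[bytes, bytes]]


def xor_decrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (repeating-key XOR). Illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def read_file(
    file_id: int,
    staged: Staged,
    surrogate_get: Callable[[int], Optional[bytes]],
    server_fetch: Callable[[int], bytes],
    decrypt: Callable[[bytes, bytes], bytes] = xor_decrypt,
) -> Tuple[bytes, str]:
    """Return (data, source), where source is "surrogate" or "server"."""
    meta = staged.get(file_id)
    if meta is not None:
        key, digest = meta
        blob = surrogate_get(file_id)
        if blob is not None and hashlib.sha256(blob).digest() == digest:
            return decrypt(key, blob), "surrogate"
        # The surrogate lost or altered the data: stop relying on this entry.
        del staged[file_id]
    # Not staged (or staging failed): take the high-latency path.
    return server_fetch(file_id), "server"
```

Since staging runs asynchronously, a file that has not been staged yet is simply served by the file server, and later reads switch to the surrogate once the proxy holds the key and hash.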
15
Outline
- Motivation
- Architecture and design
- Implementation
- Evaluation
- Related work and conclusions
16
Experimental Setup
[Diagram labels: Coda file server; Ethernet; 30 ms delay; 802.11b wireless access point; surrogate; client: iPAQ 3850 with 64 MB Coda cache]
- Cold cache: no data on client or surrogate
- Warm cache: data initially on client and surrogate
17
Benchmark: Image Trace
- Record accesses to digital photo library in Coda
  - Take the first 10,148 accesses
  - 150 MB unique data, 401 MB total data read
- Replay trace as fast as possible (DFSTrace)
- Variables:
  - Wastage ratio: extra data prefetched
  - Miss ratio: amount of data never prefetched
- Assume wastage ratio 33%, miss ratio 0%
  - Then do sensitivity analysis
18
Baseline Image Results Staging reduces execution time 45-48%!
19
Sensitivity Analysis Higher miss ratio has relatively greater effect
20
Longer-Duration File Traces
- Used Mummert’s Coda file system traces
  - Traces of client activity (open, mkdir, etc.)
  - Duration: 16-55 hours
  - Working set size: 57-254 MB
- Methodology:
  - Keep inter-request delays when prefetching
  - Eliminate delays afterwards
21
File Trace Results Up to 48% reduction in cumulative file access delay
22
Request Latency Breakdown
23
Related Work
- Web Caching (Akamai, Squid)
- Different data access patterns, consistency
- Fluid Replication (Kim02)
- Assume more trust and management
- OceanStore (Kubiatowicz02)
- Staging minimalist, file-system agnostic
- Builds on work in file prefetching, InfoStations
24
Conclusion
- Possible to significantly improve distributed file system performance with untrusted, unmanaged infrastructure!
- Future work:
  - Grow set of supported file systems
  - Surrogate discovery and migration
  - Support for energy-awareness
http://info.pittsburgh.intel-research.net