1
OceanStore Status and Directions
ROC/OceanStore Retreat, 1/16/01
John Kubiatowicz, University of California at Berkeley
2
Questions about ubiquitous information:
Where is persistent information stored?
–Want: Geographic independence for availability, durability, and freedom to adapt to circumstances
How is it protected?
–Want: Encryption for privacy, signatures for authenticity, and Byzantine commitment for integrity
Can we make it indestructible?
–Want: Redundancy with continuous repair and redistribution for long-term durability
Is it hard to manage?
–Want: Automatic optimization, diagnosis, and repair
3
Everyone’s Data, One Utility
Millions of servers, billions of clients …
1000-YEAR durability (excepting fall of society)
Maintains Privacy, Access Control, Authenticity
Incrementally Scalable (“Evolvable”)
Self Maintaining!
Not quite peer-to-peer:
–Utilizing servers in infrastructure
–Some computational nodes more equal than others
4
Want Automatic Maintenance
Can’t possibly manage billions of servers by hand!
System should:
–Be fault-tolerant (high MTTF)
–Repair itself (low MTTR through adaptation)
–Incorporate new elements
Can we guarantee data is available for 1000 years?
–New servers added from time to time
–Old servers removed from time to time
–Everything just works
Many components with geographic separation
–System not disabled by natural disasters
–Can adapt to changes in demand and regional outages
–Gain in stability through statistics
5
OceanStore Assumptions
Untrusted Infrastructure:
–The OceanStore is composed of untrusted components
–Only ciphertext within the infrastructure
Responsible Party:
–Some organization (i.e. service provider) guarantees that your data is consistent and durable
–Not trusted with content of data, merely its integrity
Mostly Well-Connected:
–Data producers and consumers are connected to a high-bandwidth network most of the time
–Exploit multicast for quicker consistency when possible
Promiscuous Caching:
–Data may be cached anywhere, anytime
6
This Talk: making it real!
(Or: you will hear reality from my students)
7
The Path of an OceanStore Update
[Figure labels: Clients, Inner-Ring Servers, Multicast trees, Second-Tier Caches]
8
Important Components:
Data Object: (Distribution-enabled data format)
–Must support copy-on-write and versioning efficiently
–Must allow sparse population of data in caches
–Must smoothly interface with archive
Inner Ring: (Byzantine Agreement)
–Check write access control
–Serialize updates/resolve micro-conflicts
–Sign result with Threshold Signature
–Erasure code result and send fragments
Second Tier Server: (Promiscuous Caches)
–Serve local clients
–Tie itself into Dissemination tree
–Apply updates that it receives through tree
–Decision point for caching policies: tentative vs committed
9
Implementation Framework
[Figure labels: Operating System (Asynchronous Disk, Asynchronous Network), Java Virtual Machine, Thread Scheduler, Dispatch, and the modules Consistency, Location & Routing, Archival, Introspection]
Event-driven Implementation Model in Java
–Divided into a sequence of communicating “stages”
–Communication between stages in the form of “snoopable” messages
–More than 100,000 lines of Java, comments, and test scripts
–Substantially functioning!
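The staged, event-driven model on this slide can be illustrated with a small sketch. This is not the OceanStore Java code: the `Dispatcher` class, the stage names, and the message shape are all hypothetical, chosen only to show stages that communicate through messages which any observer may "snoop".

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# A message is (destination stage name, payload). Hypothetical shape.
Message = Tuple[str, object]


class Dispatcher:
    """Single-threaded event loop that delivers messages between stages."""

    def __init__(self) -> None:
        self.stages: Dict[str, Callable[[object], List[Message]]] = {}
        self.snoopers: List[Callable[[Message], None]] = []
        self.queue: Deque[Message] = deque()

    def register(self, name: str, handler: Callable[[object], List[Message]]) -> None:
        # A stage is a handler that consumes a payload and returns new messages.
        self.stages[name] = handler

    def snoop(self, observer: Callable[[Message], None]) -> None:
        # Observers see every message before it is delivered.
        self.snoopers.append(observer)

    def send(self, dest: str, payload: object) -> None:
        self.queue.append((dest, payload))

    def run(self) -> None:
        while self.queue:
            message = self.queue.popleft()
            for observer in self.snoopers:
                observer(message)
            dest, payload = message
            self.queue.extend(self.stages[dest](payload))


def run_demo():
    dispatcher = Dispatcher()
    seen, archived = [], []
    dispatcher.snoop(seen.append)
    # The "consistency" stage forwards a committed update to "archival".
    dispatcher.register("consistency", lambda u: [("archival", f"committed:{u}")])
    # The "archival" stage stores the item and emits no further messages.
    dispatcher.register("archival", lambda item: archived.append(item) or [])
    dispatcher.send("consistency", "u1")
    dispatcher.run()
    return seen, archived
```

Because observers attach to the dispatcher rather than to individual stages, something like the introspection module could watch traffic without the stages knowing about it.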
10
GUIDs for Naming
Unique, location independent identifiers:
–Every version of every unique entity has a permanent Version-GUID (or VGUID)
  Hash over content
  Versioning supports time-travel
–Each object has a permanent (version-independent) Archival-GUID (or AGUID)
–Signed associations between AGUIDs and latest VGUIDs are produced by inner ring (called Heartbeats)
Naming hierarchy:
–Users map from names to AGUIDs via hierarchy of OceanStore objects
  Each link is an AGUID
[Figure labels: Foo, Bar, Baz, Myfile, Out-of-Band “Root link”]
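A minimal sketch of the two identifier kinds, under stated assumptions: SHA-256 stands in for the unspecified hash, and an HMAC with a shared key stands in for the inner ring's signature (the slides call for a threshold signature, which this does not implement). The function names and the heartbeat fields are hypothetical.

```python
import hashlib
import hmac


def vguid(content: bytes) -> str:
    """Version GUID: a hash over the content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def _heartbeat_bytes(aguid: str, latest_vguid: str, seq: int) -> bytes:
    return f"{aguid}|{latest_vguid}|{seq}".encode()


def make_heartbeat(aguid: str, latest_vguid: str, seq: int, key: bytes) -> dict:
    """Signed association AGUID -> latest VGUID.

    HMAC is only a stand-in for the inner ring's threshold signature.
    """
    tag = hmac.new(key, _heartbeat_bytes(aguid, latest_vguid, seq), hashlib.sha256)
    return {"aguid": aguid, "vguid": latest_vguid, "seq": seq, "sig": tag.hexdigest()}


def verify_heartbeat(hb: dict, key: bytes) -> bool:
    expected = hmac.new(
        key, _heartbeat_bytes(hb["aguid"], hb["vguid"], hb["seq"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, hb["sig"])
```

Since a VGUID is derived from the content, any replica can check a version it fetched from an untrusted cache by rehashing it; only the AGUID-to-VGUID mapping needs a signature.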
11
Data Object Structure
All about flexibility and validation
12
Status: Data Object Development
Second-Tier Replica support: functional
–Second-tier caches can hold multiple versions
–Tie themselves into multicast trees
  Several dissemination tree algorithms explored
  Updates forwarded from inner ring through trees
Complete B-Tree object structure developed
–Data blocks named with unforgeable hashes
  Hashes can point to archival fragments/live blocks
–Supports copy on write
–Top block defines complete version
  Missing blocks filled in from archive or other replicas
Update commits with distributed threshold signatures
–Byzantine commitment not quite integrated into prototype
Traffic generator for testing
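The combination of hash-named blocks, copy on write, and a top block that defines a version can be sketched as follows. This is a deliberately flattened stand-in (one top block listing its data blocks) rather than the B-Tree the slide describes, and `BlockStore`, `write_version`, `read_version`, and `update_block` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List


class BlockStore:
    """Content-addressed store: each block is named by the hash of its bytes."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self.blocks[name] = data
        return name

    def get(self, name: str) -> bytes:
        data = self.blocks[name]
        # A reader can validate any block fetched from an untrusted replica.
        if hashlib.sha256(data).hexdigest() != name:
            raise ValueError("block does not match its name")
        return data


def write_version(store: BlockStore, chunks: List[bytes]) -> str:
    """Store the chunks plus a top block; the top block's hash names the version."""
    names = [store.put(chunk) for chunk in chunks]
    return store.put(json.dumps(names).encode())


def read_version(store: BlockStore, top: str) -> List[bytes]:
    return [store.get(name) for name in json.loads(store.get(top))]


def update_block(store: BlockStore, top: str, index: int, new_data: bytes) -> str:
    """Copy on write: only the changed block and the top block are new."""
    names = json.loads(store.get(top))
    names[index] = store.put(new_data)
    return store.put(json.dumps(names).encode())
```

Old versions remain readable after an update because nothing is overwritten; the new version shares every unchanged block with its predecessor.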
13
Exploiting Law of Large Numbers for Durability
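One way to read this slide title is through a simple binomial model, sketched here under assumptions that are not in the slides: an object is erasure coded into n fragments, any m of them suffice to rebuild it, and each fragment survives independently with probability p. The parameter values in the usage note are illustrative only.

```python
from math import comb


def survival_probability(n: int, m: int, p: float) -> float:
    """Probability that at least m of n independent fragments survive."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def compare(p: float = 0.9):
    # Same 2x storage overhead in both cases (illustrative numbers).
    two_replicas = survival_probability(2, 1, p)
    coded_16_of_32 = survival_probability(32, 16, p)
    return two_replicas, coded_16_of_32
```

Calling `compare()` contrasts two whole copies with a 16-of-32 code at equal storage cost. The model depends entirely on the independence assumption, which is why correlated failures matter so much in practice.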
14
The Dissemination Process
[Figure labels: Introspection, Human Input, Network Monitoring, Model Builder, model, Set Creator, set, Disseminator, probe, type, fragments]
15
Achieving Low MTTR: Global Heartbeats
Trigger repair when level of redundancy is too low
Continuous sweep (slowly over time)
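The repair trigger can be sketched as a sweep that counts live fragments per object and flags any object whose count has dropped below a threshold. Everything here is hypothetical: the `placement` mapping, the `live_servers` set (which the slides suggest would be derived from heartbeats), and the threshold value.

```python
from typing import Dict, List, Set


def live_fragment_count(fragment_servers: List[str], live_servers: Set[str]) -> int:
    """Count fragments whose hosting server is currently known to be alive."""
    return sum(1 for server in fragment_servers if server in live_servers)


def sweep(
    placement: Dict[str, List[str]],
    live_servers: Set[str],
    repair_threshold: int,
) -> List[str]:
    """Return the objects whose redundancy has fallen below the threshold.

    placement maps an object id to the servers holding its fragments.
    """
    return sorted(
        obj
        for obj, servers in placement.items()
        if live_fragment_count(servers, live_servers) < repair_threshold
    )
```

The threshold would sit above the minimum number of fragments needed for reconstruction, so that repair starts while the object is still recoverable.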
16
Status: Archival Infrastructure
Archival Fragments generated by Inner Ring
–Multi-stage-based implementation at inner ring
–Storage servers hold fragments
–Caching servers (2nd-tier replicas) hold data objects
Independence Analysis (mostly there)
–Node discovery technique exists
–Analysis of long-running reliability data
–Dissemination-set creator: initial versions
Storage servers (naïve but functional):
–Initial implementation: cache + object store
–Ongoing tuning efforts
–Redesign in the works
17
Location Independent Routing
Paradigm: Routing
–Route messages to objects by GUID regardless of location
Fast, probabilistic search for “routing cache”:
–Built from attenuated Bloom filters
–Approximation to gradient search
Redundant Plaxton Mesh used for underlying routing infrastructure:
–Randomized data structure with locality properties
–Redundant, insensitive to faults, and repairable
–Amenable to continuous adaptation to adjust for:
  Changing network behavior
  Faulty servers
  Denial of service attacks
Tomorrow: 3 talks on Routing
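Routing by GUID in a Plaxton-style mesh resolves the destination identifier one digit at a time. The sketch below illustrates only that idea, with loud simplifications: it matches prefixes (the digit order is an assumption), it gives every node global knowledge of all node IDs instead of a per-node routing table, and it ignores locality and redundancy. All function names are hypothetical.

```python
from typing import List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two equal-length IDs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def next_hop(current: str, target: str, nodes: List[str]) -> Optional[str]:
    """Pick a node that matches the target in more digits than the current node.

    The smallest improvement is preferred, to mimic fixing one digit per hop.
    """
    have = shared_prefix_len(current, target)
    better = [n for n in nodes if shared_prefix_len(n, target) > have]
    if not better:
        return None
    return min(better, key=lambda n: (shared_prefix_len(n, target), n))


def route(source: str, target: str, nodes: List[str]) -> List[str]:
    """Follow next_hop until the target is reached or no closer node exists."""
    path = [source]
    while path[-1] != target:
        hop = next_hop(path[-1], target, nodes)
        if hop is None:
            break
        path.append(hop)
    return path
```

Each hop strictly increases the number of matched digits, so a route takes at most as many hops as the identifier has digits. When no node carries the exact identifier, the walk ends at the closest match it can reach.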
18
Status: Location Independent Routing
Basic Tapestry infrastructure is operational
–Single-path static routing: works
–Multi-path adaptive routing: mostly there
–Dynamic integration of new nodes: implemented
Network adaptation almost there (Patchwork)
–Framework for measurement of network properties
–Periodic beacons measure loss and network latency
Exploitation of differences in nodes:
–Brocade backbone supplement to Tapestry: improves routing
–Differentiation in service experiments ongoing
Theoretical Results on Tapestry
–Construction/analysis of dynamic integration algorithms
–Voluntary/involuntary node deletion algorithms
–View of Tapestry as data structure for solving nearest neighbor
Attenuated Bloom Filters are operational
–Implemented and functional
–Optimizes short-distance routing infrastructure!
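An attenuated Bloom filter keeps one Bloom filter per hop distance for each outgoing link; a query follows the link whose filter matches at the smallest distance. The sketch below is an illustration under assumptions, not the prototype's code: the filter width `M`, the hash count `K`, the SHA-256-based hashing, and all names are made up for the example.

```python
import hashlib
from typing import Dict, List, Optional

M = 256  # bits per filter level (assumed)
K = 3    # hash functions per key (assumed)


def bit_mask(guid: str) -> int:
    """Bitmask with the K positions that this GUID sets in a filter."""
    mask = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{guid}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % M)
    return mask


class AttenuatedBloomFilter:
    """levels[d] summarizes objects believed to be d hops away via one link."""

    def __init__(self, depth: int) -> None:
        self.levels: List[int] = [0] * depth

    def add(self, guid: str, distance: int) -> None:
        self.levels[distance] |= bit_mask(guid)

    def first_match(self, guid: str) -> Optional[int]:
        """Smallest distance whose filter may contain the GUID, else None."""
        mask = bit_mask(guid)
        for distance, bits in enumerate(self.levels):
            if bits & mask == mask:
                return distance
        return None


def choose_neighbor(filters: Dict[str, AttenuatedBloomFilter], guid: str) -> Optional[str]:
    """Pick the link whose filter reports the object at the smallest distance."""
    best_name, best_distance = None, None
    for name, abf in filters.items():
        distance = abf.first_match(guid)
        if distance is not None and (best_distance is None or distance < best_distance):
            best_name, best_distance = name, distance
    return best_name
```

A match can be a false positive, so a query that follows these hints still needs a fallback, which is the role the slides give to the underlying mesh routing.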
19
Introspection: The New Architectural Creed
Using Moore’s law gains for something other than performance
Examples:
–Online algorithmic validation
–Model building for data rearrangement
  Availability
  Better prefetching
–Extreme Durability (1000-year time scale?)
  Use of erasure coding and continuous repair
–Stability through Statistics
  Use of redundancy to gain more predictable behavior
  Systems version of Thermodynamics!
–Continuous Dynamic Optimization of other sorts
[Figure labels: Compute, Monitor, Adapt]
20
Status: Introspection
Development of OIL framework for introspection: this framework is operational
–Collection facilities can observe all events in the system
–Multiple aggregation models available
Example 1: Clustering for prefetching
–Currently builds hidden Markov model of access patterns utilizing OIL framework
–Almost there: use models to better prefetch objects
  Placement of replicas assisted by Bloom filters (almost)
Example 2: Observation of network behavior
–Framework for observation of network latencies
–Adaptation of network topology: almost there
Example 3: Grammar building for prefetching
–Experiment of introspection at processor level
–Talk later today about this (Mark Whitney)
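Model-driven prefetching can be illustrated with a much simpler model than the one named on the slide. The sketch below swaps in a first-order Markov chain over observed accesses (not a hidden Markov model, and not the OIL framework); the `AccessModel` class and its methods are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


class AccessModel:
    """First-order Markov chain over object accesses (a simplification)."""

    def __init__(self) -> None:
        self.transitions: Dict[str, Counter] = {}
        self.last: Optional[str] = None

    def observe(self, guid: str) -> None:
        """Record that guid was accessed right after the previous access."""
        if self.last is not None:
            self.transitions.setdefault(self.last, Counter())[guid] += 1
        self.last = guid

    def predict(self, guid: str, k: int = 1) -> List[str]:
        """Return up to k objects most often accessed right after guid."""
        followers = self.transitions.get(guid, Counter())
        return [name for name, _ in followers.most_common(k)]
```

A cache could feed every access event into `observe` and prefetch whatever `predict` returns for the object just requested.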
21
Status: Medium Scale Test and Emulation
Two medium clusters from IBM SUR Grant
–Each cluster 21 servers:
  Each with two 1 GHz processors
  One GByte of RAM, 73 GB of disk
–1 GB switch per cluster
–MIRNET switch
Plan to have continuous OceanStore components running
–In approximately 1 month
Emulation technology: currently works
–Able to simulate large-scale network by simulating network latencies
–Multiple OceanStore nodes emulated/node
22
Reality: Web Caching through OceanStore
23
Day Dreams? (Becoming real)
NFS file system built in OceanStore (Exists)
–Still have to integrate ACLs
–Update to latest prototype
Windows Installable File System (Planning)
–“USB Keys” hold cryptographic keys and personal identity
–Automatic downloading and verification of filesystem
IMAP OceanStore gateway (Planning)
Lotus Notes Domino Server
–Exploring use of work flow on top of OceanStore
24
OceanStore Conclusions
OceanStore: everyone’s data, one big utility
–Global Utility model for persistent data storage
Very Soon: Working OceanStore cluster!
–Event-driven programming in Java
–You will hear about components today and tomorrow
OceanStore assumptions:
–Untrusted infrastructure with a responsible party
–Mostly connected with conflict resolution
–Continuous on-line optimization
25
For more info:
OceanStore vision paper for ASPLOS 2000: “OceanStore: An Architecture for Global-Scale Persistent Storage”
OceanStore paper on Maintenance (IEEE IC): “Maintenance-Free Global Data Storage”
Both available on OceanStore web site: http://oceanstore.cs.berkeley.edu/