Concurrency Control & Caching Consistency: Issues and Survey. Dingshan He, November 18, 2002.
Outline Infrastructure assumptions Concurrency control & caching consistency issues in the infrastructure Survey of concurrency control & caching consistency solutions in existing systems –Storage Tank –OceanStore –Coda Discussion
Infrastructure Assumptions Entities –Clients –OSDs –Regional Managers Mobility –Clients could have high mobility –OSDs have moderate mobility –Regional Managers are relatively static
Infrastructure Assumptions (cont.) Connectivity –Disconnection is possible at any place –Clients could have weak connectivity (low bandwidth/long latency) Any of the three kinds of entities can be dynamically created and inserted into the infrastructure
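To make these assumptions concrete, here is a minimal sketch of the three entity kinds and the attributes the later slides rely on; all class and field names are hypothetical and not taken from any of the surveyed systems.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mobility(Enum):
    HIGH = "high"          # clients
    MODERATE = "moderate"  # OSDs
    STATIC = "static"      # regional managers

@dataclass
class Entity:
    entity_id: str
    mobility: Mobility
    connected: bool = True          # disconnection is possible at any place

@dataclass
class Client(Entity):
    mobility: Mobility = Mobility.HIGH
    weak_link: bool = False         # low bandwidth / long latency

@dataclass
class OSD(Entity):
    mobility: Mobility = Mobility.MODERATE

@dataclass
class RegionalManager(Entity):
    mobility: Mobility = Mobility.STATIC
    managed_osds: list = field(default_factory=list)
```

Any of these can be created dynamically, e.g. `Client("c1")`, matching the assumption that entities may be inserted into the infrastructure at any time.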
Client Behavior Caching information for performance as well as in expectation of disconnection High mobility –Transfer between regional managers –Hand-off of concurrency control & caching consistency information Weak connectivity –Reduce message volume
OSD Behavior Mobility –Transfer between regional managers –Handing over of concurrency control & caching consistency management tasks –Redirecting requests to new regional managers
Regional Manager Behavior Support transfer of clients and OSDs Efficiently perform hand-overs Disconnection: –Regional managers get partitioned –Maintain strong consistency within connected partitions –Maintain enough information for reintegration
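A toy sketch of the hand-over and request-redirection behavior described on the last three slides, assuming a regional manager keeps per-entity consistency state plus a forwarding pointer for entities that have moved (names are hypothetical):

```python
class RegionalManager:
    def __init__(self, name):
        self.name = name
        self.state = {}        # entity_id -> locks / leases / cache-consistency info
        self.forwarding = {}   # entity_id -> manager that now owns its state

    def hand_over(self, entity_id, new_mgr):
        """Transfer consistency-management state for a migrating client or OSD."""
        new_mgr.state[entity_id] = self.state.pop(entity_id, {})
        self.forwarding[entity_id] = new_mgr    # redirect requests that still arrive here

    def lookup(self, entity_id):
        """Serve a request, following the forwarding pointer if the entity has moved."""
        if entity_id in self.state:
            return self.state[entity_id]
        return self.forwarding[entity_id].lookup(entity_id)
```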
Exploiting Object Features No single solution can satisfy all situations Each object has its own requirements Our design should identify these requirements, abstract them into several levels, and apply the corresponding mechanisms accordingly
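One way to read this slide is as a per-object policy table; the levels and mechanism names below are an illustrative guess, not a fixed design:

```python
from enum import Enum

class ConsistencyLevel(Enum):
    STRONG = 1          # e.g., shared metadata that must never be stale
    CLOSE_TO_OPEN = 2   # e.g., ordinary shared files
    EVENTUAL = 3        # e.g., read-mostly data that tolerates disconnection

# Hypothetical mapping from an object's requirement to the mechanism applied to it.
MECHANISM = {
    ConsistencyLevel.STRONG:        "synchronous locking at the regional manager",
    ConsistencyLevel.CLOSE_TO_OPEN: "leases plus write-back on close",
    ConsistencyLevel.EVENTUAL:      "optimistic replication with later reintegration",
}

def mechanism_for(level: ConsistencyLevel) -> str:
    return MECHANISM[level]
```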
Survey of Several Existing Systems IBM Storage Tank OceanStore Coda
Storage Tank with OSD
IBM Storage Tank Protocol A locking and data consistency model Allows the IBM Storage Tank distributed storage system to look and behave like a local file system Objective: provide strong data consistency between clients and servers in a distributed environment
Storage Tank Features Concurrency control –Semi-preemptive session locks –Byte-range locks (mandatory and advisory) –Cache coherency data locks Sequential consistency Direct I/O for applications that do their own caching (e.g., databases) Publish consistency for web updates Aggressive caching –Write-back caching of data and metadata –Session state via semi-preemptive locks
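A minimal model of a semi-preemptive session lock, assuming the key property is only that the holder may flush its write-back cached data before the server downgrades the lock; this is a sketch, not the Storage Tank wire protocol:

```python
from enum import Enum

class LockMode(Enum):
    NONE = 0
    SHARED = 1       # may read and cache
    EXCLUSIVE = 2    # may cache dirty data (write-back)

class ClientCache:
    def __init__(self):
        self.dirty_blocks = []
    def flush(self):
        self.dirty_blocks.clear()   # stand-in for writing dirty data back to storage

class SessionLock:
    """Semi-preemptive: the server can demand the lock back, but the holder is
    allowed to write back its dirty cached data before giving it up."""
    def __init__(self, cache: ClientCache):
        self.cache = cache
        self.mode = LockMode.NONE

    def server_demands(self, new_mode: LockMode):
        if self.mode == LockMode.EXCLUSIVE and new_mode != LockMode.EXCLUSIVE:
            self.cache.flush()      # preserve data consistency before the downgrade
        self.mode = new_mode
```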
Storage Tank Features (cont.) Data consistency across client failures –Leases for failure detection and coordinated recovery Implicit leases Opportunistic renewal
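A sketch of implicit leases with opportunistic renewal, assuming the point is that any message from a client refreshes its lease so no dedicated renewal traffic is needed while the client is active; the 30-second value is made up:

```python
import time

LEASE_SECONDS = 30   # assumed value, not taken from the Storage Tank design

class LeaseTable:
    """Server-side failure detection via implicit leases."""
    def __init__(self):
        self.expiry = {}   # client_id -> time after which the client is presumed failed

    def on_message(self, client_id):
        # Opportunistic renewal: every request from the client also renews its lease.
        self.expiry[client_id] = time.monotonic() + LEASE_SECONDS

    def expired_clients(self):
        now = time.monotonic()
        return [c for c, t in self.expiry.items() if t < now]
```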
Storage Tank Client Cache –Data –Metadata –Locks
Comments on Storage Tank Designed to provide performance comparable to that of file systems built on bus-attached, high-performance storage Works in a data-center model Physically restricted to enterprise-wide data sharing
OceanStore's Update Model An update is a list of predicate-action pairs If a predicate evaluates to true, the corresponding action is applied and the update commits Each update is applied atomically Can perform many useful predicates and actions against encrypted data –Search over encrypted data –Delete and append using a position-dependent block cipher
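A hedged sketch of the predicate-action structure described above; the real OceanStore encoding (and its operations over encrypted data) is considerably richer:

```python
def apply_update(obj_state, update):
    """`update` is an ordered list of (predicate, action) pairs. The first pair
    whose predicate holds has its action applied atomically and the update
    commits; if no predicate holds, the update aborts and the state is unchanged."""
    for predicate, action in update:
        if predicate(obj_state):
            return action(obj_state)   # commit: return the new object state
    return obj_state                   # abort

# Example: commit an append only if the object is still at the expected version.
update = [
    (lambda s: s["version"] == 7,
     lambda s: {**s, "version": 8, "data": s["data"] + b" new block"}),
]
new_state = apply_update({"version": 7, "data": b"old"}, update)
```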
OceanStore Consistency Solution Uses a two-tier architecture –Primary tier: uses distributed consistency Replicas use a Byzantine agreement protocol Replicas sign decisions using proactive signatures –Secondary tier: acts as a distributed read/write cache Kept up-to-date via “push” or “pull” Supports connected and disconnected modes of operation
OceanStore Update Serialization Clients optimistically timestamp updates with commit times Secondary replicas tentatively order updates by these timestamps Primary tier picks a total order guided by the timestamps using a Byzantine agreement protocol
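A small sketch of this two-step serialization, assuming a secondary replica simply keeps updates sorted by the client timestamp until the primary tier's agreed total order replaces that tentative order (class and method names are hypothetical):

```python
import bisect

class SecondaryReplica:
    def __init__(self):
        self.tentative = []   # (client_timestamp, seq, update), kept sorted
        self._seq = 0         # tie-breaker so equal timestamps remain orderable

    def receive(self, client_timestamp, update):
        # Tentative order: optimistic, guided only by the client's commit time.
        bisect.insort(self.tentative, (client_timestamp, self._seq, update))
        self._seq += 1

    def install_total_order(self, ordered_updates):
        # The primary tier's Byzantine-agreed order wins; tentative results are
        # discarded and recomputed in this final order.
        self.tentative = [(i, i, u) for i, u in enumerate(ordered_updates)]
```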
Comments on OceanStore Its infrastructure is similar to the one we assume It does not separate metadata and data
Coda Volume Management Volume Storage Group (VSG): set of servers with replicas of a volume The degree of replication and the identities of the replication sites are specified when a volume is created This information is stored in the volume replication database present at every server Venus keeps track of the Available VSG (AVSG) for every volume from which it has cached data
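Minimal bookkeeping mirroring this slide, with a hypothetical volume name: the VSG is fixed in the volume replication database, and Venus tracks the currently reachable subset as the AVSG.

```python
# VSG fixed when the volume is created; the database is present at every server.
volume_replication_db = {
    "vol.example": ["server1", "server2", "server3"],
}

class Venus:
    def __init__(self):
        self.avsg = {}   # volume -> members of the VSG currently reachable

    def refresh_avsg(self, volume, reachable_servers):
        vsg = volume_replication_db[volume]
        self.avsg[volume] = [s for s in vsg if s in reachable_servers]
```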
Coda Read/Write Strategy The client obtains data from one member of its AVSG, called the preferred server Other servers are contacted to verify that the preferred server does have the latest copy When a file is closed after modification, it is transferred in parallel to all members of the AVSG
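A sketch of this read/write strategy; `fetch`, `version_of`, and `store` stand in for Coda's actual RPCs and are hypothetical helpers:

```python
class StaleCopy(Exception):
    pass

def read(avsg, fetch, version_of, path):
    preferred = avsg[0]                       # the preferred server for this client
    data, version = fetch(preferred, path)
    # Ask the remaining AVSG members only for version information, to check that
    # the preferred server's copy is the latest one.
    if any(version_of(server, path) > version for server in avsg[1:]):
        raise StaleCopy(path)                 # would trigger resolution / refetch
    return data

def close_after_write(avsg, store, path, data):
    # Conceptually done in parallel (Coda uses MultiRPC for this).
    for server in avsg:
        store(server, path, data)
```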
Coda's Disconnected Operation Aim: provide a file system resilient to network failures Venus acts as a pseudo-server Updates have to be revalidated with respect to integrity and protection by the real servers Venus records sufficient information to replay update activity in a per-volume log called the replay log
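A toy per-volume replay log of the kind Venus keeps while disconnected; the record fields are illustrative, not Coda's actual log format:

```python
class ReplayLog:
    """Enough information to replay each update at the servers on reintegration."""
    def __init__(self):
        self.records = []

    def log(self, op, path, storeid, args):
        self.records.append({"op": op, "path": path, "storeid": storeid, "args": args})

replay_logs = {}   # volume -> ReplayLog, one log per volume

def record_update(volume, op, path, storeid, args):
    replay_logs.setdefault(volume, ReplayLog()).log(op, path, storeid, args)
```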
Coda's Reintegration The replay log is shipped in parallel to the AVSG and executed independently at each member Each replica of an object is tagged with a storeid The storeid of each object mentioned in the replay log is compared with the storeid of the server's replica 1) Lock referenced objects, 2) validate and execute each operation, 3) transfer data, and 4) commit the transaction and release locks
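The four steps above, sketched against a hypothetical server interface; `lock`, `storeid`, `execute`, `fetch_back_data`, `commit`, and `unlock` are assumed helpers, not Coda's real API:

```python
class ConflictDetected(Exception):
    pass

def reintegrate(server, replay_log):
    objs = {r["path"] for r in replay_log.records}
    server.lock(objs)                                  # 1) lock referenced objects
    try:
        for r in replay_log.records:
            # Conflict check: the storeid recorded in the log must still match the
            # storeid of the server's replica, otherwise the update conflicts.
            if server.storeid(r["path"]) != r["storeid"]:
                raise ConflictDetected(r["path"])
            server.execute(r)                          # 2) validate and execute each op
        server.fetch_back_data(objs)                   # 3) transfer (back-fetch) file data
        server.commit()                                # 4) commit the transaction...
    finally:
        server.unlock(objs)                            #    ...and release the locks
```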
Comments on Coda Designed for a specific setting, particularly a campus environment Optimistic replica control –Conflicting updates –Security of cached replicas