Plethora: A Wide-Area Read-Write Storage Repository Design Goals, Objectives, and Applications Suresh Jagannathan, Christoph Hoffmann, Ananth Grama Computer Sciences, Purdue University.
Plethora: Design Goals ● To build a wide-area read-write object repository from semi-static peers for supporting a single seamless distributed storage resource. ● To support desirable features of end-user performance, global resource utilization, robustness, and application support.
Plethora: Motivation ● A number of applications require: – Large aggregate storage; – Distributed access to data; – Collaborative operations on shared datasets; – Content-based retrieval; – Distributed services infrastructure; and – A high degree of availability and robustness. Such applications motivate the design decisions in Plethora.
Trends in Storage Software
Sample Applications: GriPhyN ● The Grid Physics Network (GriPhyN) is a classic example of an application built around a large dataset that is accessed by many people. ● Data is generated at a rate of roughly 1 PB/year in the form of high-energy physics experiment readouts (each experiment corresponds to roughly 1 MB of data). ● Researchers across the world access selected experiments.
Sample Applications: GriPhyN Tier 0 is the data source (in this case, CERN); tier 1 is a national center (Fermilab); tier 2 consists of regional centers; tier 3 consists of workgroup servers; and tier 4 consists of individual desktops.
Sample Application: Collaborative Design ● The volume of data associated with typical product lifecycle stages (concept, design, analysis, manufacturing, and field support) grows exponentially. ● At the same time, the need for effective data access, sharing, capture, and protection becomes increasingly important. ● Scalable and distributed solutions to these problems are critical components of PLM.
Sample Application: Collaborative Design ● Desirable Characteristics: – Complexity and Interoperability – Distributed Design Collaboration – Reuse and Versioning – Availability – Performance
Collaborative Design: State-of-the-art Source:
Collaborative Design: State-of-the-art ● The client-server model is ideal for local-area environments but does not scale to a larger number of installations. ● Availability relies on conventional mechanisms such as snapshots. These do not facilitate real-time recovery or account for network failures. ● Minimal client-side support for end-user performance. ● Little or no support for content-based location, application-specific consistency mechanisms, or versioning techniques.
Plethora: System Overview ● Plethora Routing Core: Routing data requests to appropriate sites. ● Robustness: Novel erasure coding schemes. ● Versioning Semantics: Supporting efficient read-write access and data reuse over wide-area networks. ● Content-based Location and Placement: Routing queries on content.
Plethora Routing Core ● Design goals: reliability, performance, end-user latency. – Locality-enhancing multi-level overlays of participating sites. – Efficient caching techniques for end-user latency. – Network maintenance via redundant overlay links and real-time monitoring and updating.
Plethora: Robustness ● Novel erasure coding techniques: – Conventional (n,m) techniques can reconstruct data if any m of the n stored blocks can be accessed. – These techniques are resilient to multiple network and disk failures. – These techniques, however, have considerable communication and computing overhead for block updates, block reconstruction, and for reconstituting the code. – Plethora relies on novel codes that minimize these overheads.
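To make the (n,m) idea and its update overhead concrete, here is a minimal sketch of the simplest conventional code: a single XOR parity block, so n = m + 1 and any m of the n blocks suffice. This is not one of Plethora's novel codes, which the slides do not specify; the function names `encode`, `reconstruct`, and `update` are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the m data blocks followed by one XOR parity block (n = m + 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stored, missing_index):
    """Rebuild the block at missing_index from the other n - 1 = m blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


def update(stored, index, new_block):
    """Overwrite one data block in place and patch the parity block.

    Even in this tiny code, one logical write touches two stored blocks,
    which is the kind of update overhead the slide refers to.
    """
    old_block = stored[index]
    stored[index] = new_block
    stored[-1] = xor_blocks([stored[-1], old_block, new_block])
```

For example, `stored = encode([b"abcd", b"efgh", b"ijkl"])` yields four blocks, and `reconstruct(stored, 1)` recovers `b"efgh"` without reading block 1. Codes that tolerate more than one lost block (for example Reed-Solomon) follow the same pattern with more parity blocks and correspondingly higher update and reconstruction costs.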
Plethora: Versioning Semantics ● Scaling to wide-area systems requires alternative concurrent data access semantics. Plethora relies on versioning semantics to facilitate performance. – Each access is to a version of an object. – Updates to objects are not reflected globally unless they are committed. – The resulting version tree for each object can be reconciled in an application-specific manner. ● Versioning systems are ideally suited to high-latency environments with real-time applications. They also facilitate version-based data reuse.
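The versioning rules above can be sketched as a small in-memory version tree. This is an illustrative model only, not Plethora's implementation; the class and method names (`VersionedObject`, `write`, `commit`, `visible_leaves`) are assumptions, and the reconciliation step itself is left to the application, as the slide states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    vid: int
    parent: Optional[int]
    data: bytes
    committed: bool = False
    children: List[int] = field(default_factory=list)


class VersionedObject:
    """One object and its version tree (hypothetical model)."""

    def __init__(self, initial: bytes):
        root = Version(vid=0, parent=None, data=initial, committed=True)
        self.versions: Dict[int, Version] = {0: root}
        self._next_id = 1

    def read(self, vid: int) -> bytes:
        """Every access names a specific version."""
        return self.versions[vid].data

    def write(self, parent: int, data: bytes) -> int:
        """Create an uncommitted child version; nothing is visible globally yet."""
        vid = self._next_id
        self._next_id += 1
        self.versions[vid] = Version(vid, parent, data)
        self.versions[parent].children.append(vid)
        return vid

    def commit(self, vid: int) -> None:
        """Make a version globally visible."""
        self.versions[vid].committed = True

    def visible_leaves(self) -> List[int]:
        """Committed versions with no committed child.

        More than one entry means concurrent committed branches that the
        application may want to reconcile.
        """
        return sorted(
            v.vid
            for v in self.versions.values()
            if v.committed
            and not any(self.versions[c].committed for c in v.children)
        )
```

Two clients writing from version 0 each get their own child version without coordinating; only after both commit does `visible_leaves()` report two branches for the application to merge or keep side by side.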
Plethora: Content-Based Location ● Content-based location is critical for supporting design applications. – Each data object has keys corresponding to searchable attributes installed in the Plethora routing core (keys are derived using conventional hashing techniques). – The routing core is then used to route queries generated at clients (using the same hash function) to locate data objects. – By giving applications the ability to install keys, Plethora supports a powerful content-based search capability.
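The install-then-query flow above can be sketched as follows. This is a simplified stand-in, assuming SHA-1 as the "conventional" hash and a single flat hash ring in place of Plethora's multi-level overlay; `RoutingCore`, `key_for`, `install`, and `query` are hypothetical names, and all sites live in one process for illustration.

```python
import bisect
import hashlib


def key_for(attribute: str, value: str) -> int:
    """Hash an attribute/value pair to an integer key (SHA-1 is an assumption)."""
    digest = hashlib.sha1(f"{attribute}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class RoutingCore:
    """Toy routing core: each key is owned by the next site clockwise on a ring."""

    def __init__(self, site_names):
        self._ring = sorted((key_for("site", name), name) for name in site_names)
        self._points = [point for point, _ in self._ring]
        # Per-site index: key -> set of object ids.
        self._index = {name: {} for name in site_names}

    def _site_for(self, key: int) -> str:
        # Wrap around to the first site when the key is past the last point.
        i = bisect.bisect_left(self._points, key) % len(self._ring)
        return self._ring[i][1]

    def install(self, object_id: str, attributes: dict) -> None:
        """Install one key per searchable attribute of the object."""
        for attribute, value in attributes.items():
            key = key_for(attribute, value)
            site = self._site_for(key)
            self._index[site].setdefault(key, set()).add(object_id)

    def query(self, attribute: str, value: str) -> set:
        """Hash the query with the same function and look it up at the owning site."""
        key = key_for(attribute, value)
        return set(self._index[self._site_for(key)].get(key, set()))
```

Because clients and installers use the same hash function, a query such as `core.query("material", "steel")` is routed to the same site that holds the installed key, without consulting any central directory.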
Plethora: Deliverables ● Fully functional Plethora client. ● Extensive system-level and application-level scaling studies and performance characterization (simulations and deployment). ● Sample applications demonstrating large storage capabilities, access performance, collaboration facilities, and mobile applications that maximize value for the sponsor.