Published by Vincent Warren. Modified over 6 years ago.
1
Replication Degree Customization for High Availability
Kai Shen 12/7/2018
Ming Zhong, Kai Shen, Joel Seiferas
Google, University of Rochester
EuroSys 2008
2
Problem Context
- Replication degree: a tradeoff between availability and space
- Skewed data popularity distributions
  - intuitive to highly replicate popular objects
  - goal: improve availability under a given space constraint
- Should we worry about space constraints today?
  - decentralized wide-area systems: high machine failure/inaccessibility rates demand high-degree replication
    - 0.265 failure rate from a PlanetLab machine accessibility trace
  - centrally managed local-area clusters: often require data to be in memory, and memory cost is relatively high
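To make the "high-degree replication" point concrete, here is a minimal sketch that counts how many replicas are needed to push per-object unavailability below a target, assuming (as the later analytical slide does) that an object with k replicas is unavailable with probability p^k. The function name `replicas_needed` and the target values are hypothetical, chosen only for illustration; the 0.265 failure rate is the figure quoted on the slide.

```python
def replicas_needed(p, target_unavailability):
    """Smallest k such that p**k <= target_unavailability.

    Assumes 0 < p < 1 and target_unavailability > 0, and that an
    object with k replicas is unavailable with probability p**k.
    """
    k = 1
    while p ** k > target_unavailability:
        k += 1
    return k


# 0.265 is the PlanetLab failure rate from the slide;
# the 0.001 target is an illustrative assumption.
print(replicas_needed(0.265, 0.001))
```

With a failure rate this high, even a modest availability target needs several replicas per object, which is why space becomes a real constraint.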
3
Related Work
- uniform-degree replication for availability or durability
  - Farsite, CFS, GFS, Glacier, IrisStore, MOAT
- simple adaptive approach in some production systems
  - higher replication for the most popular objects; lower replication for the rest
- other uses of skewed data object popularity
  - fast search/lookup: Cohen et al., Beehive
- other ways to improve data availability in distributed systems
  - replica placement: Farsite
  - erasure coding: Reed-Solomon
4
Basic Analytical Result
Problem formulation:
- $p$ is the machine failure probability, so the unavailability of an object with $k$ replicas is $p^k$
- a system with $n$ objects: object $i$ has popularity $r_i$ and size $s_i$
- find object replication degrees $k_1, k_2, \dots, k_n$ that minimize the expected unavailability $\sum_{1 \le i \le n} r_i \, p^{k_i}$ subject to the space constraint $\sum_{1 \le i \le n} s_i \, k_i \le K$
Result:
- Lagrangian function: $\sum_{1 \le i \le n} r_i \, p^{k_i} + \lambda \left( \sum_{1 \le i \le n} s_i \, k_i - K \right)$
- the optimum is reached when the function's partial derivatives with respect to the $k_i$'s and $\lambda$ are all zero
- therefore $k_i = C + \log_{1/p}(r_i / s_i)$, where $C$ is a constant
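The step from the Lagrangian to the closed form can be spelled out as follows. This is a sketch that treats each $k_i$ as a continuous variable and takes the space constraint as tight at the optimum; both are assumptions made here for the derivation, since the slide does not say how integer rounding is handled.

```latex
\begin{align*}
L(k_1,\dots,k_n,\lambda)
  &= \sum_{1 \le i \le n} r_i \, p^{k_i}
     + \lambda \Big( \sum_{1 \le i \le n} s_i \, k_i - K \Big) \\
\frac{\partial L}{\partial k_i}
  &= r_i \, p^{k_i} \ln p + \lambda s_i = 0
  \;\Longrightarrow\;
  p^{k_i} = \frac{\lambda s_i}{r_i \ln(1/p)} \\
k_i
  &= \log_{1/p} \frac{r_i}{s_i} + \log_{1/p} \frac{\ln(1/p)}{\lambda}
   = C + \log_{1/p} \frac{r_i}{s_i} \\
\frac{\partial L}{\partial \lambda}
  &= \sum_{1 \le i \le n} s_i \, k_i - K = 0
  \;\Longrightarrow\;
  C = \frac{K - \sum_{1 \le i \le n} s_i \log_{1/p}(r_i / s_i)}
           {\sum_{1 \le i \le n} s_i}
\end{align*}
```

The constant $C$ is the same for every object because it depends only on $\lambda$ and $p$; the last line fixes its value from the space budget $K$.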
5
Challenges in Systems Context
Basic analytical result:
- optimal object replication degree $k_i = C + \log_{1/p}(r_i / s_i)$
- $p$ is the machine failure probability; $r_i$ is object popularity; $s_i$ is object size
Systems issues:
- complex system models: multi-object operations, nonuniform machine failure rates
- realistic workload behaviors: skewness and stability of object popularity
- maintenance overhead: replica creation/deletion under dynamic system changes
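The basic result can be sketched in a few lines of Python. The sketch computes continuous (unrounded) degrees, with the constant C chosen so that the space budget is used exactly. The function names, the sample popularities, and the budget are hypothetical; rounding to integers and enforcing a minimum degree are omitted because the slides do not describe them.

```python
import math


def optimal_degrees(popularity, size, p, space_budget):
    """Continuous degrees k_i = C + log_{1/p}(r_i / s_i).

    C is chosen so that sum(s_i * k_i) == space_budget.
    Integer rounding and minimum-degree limits are not handled.
    """
    log_base = math.log(1.0 / p)
    offsets = [math.log(r / s) / log_base for r, s in zip(popularity, size)]
    weighted = sum(s * o for s, o in zip(size, offsets))
    c = (space_budget - weighted) / sum(size)
    return [c + o for o in offsets]


def expected_unavailability(popularity, degrees, p):
    """Sum of r_i * p**k_i over all objects."""
    return sum(r * p ** k for r, k in zip(popularity, degrees))


# Illustrative inputs (assumptions, not data from the paper).
popularity = [0.6, 0.3, 0.1]
size = [1.0, 1.0, 1.0]
p = 0.265
budget = 9.0  # same space as uniform 3-way replication

custom = optimal_degrees(popularity, size, p, budget)
uniform = [3.0, 3.0, 3.0]
print(custom)
print(expected_unavailability(popularity, custom, p))
print(expected_unavailability(popularity, uniform, p))
```

Under the same space budget, the customized degrees give a lower expected unavailability than the uniform assignment for this skewed input.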
6
Complex System Models
The basic analytical result can be adapted to handle complex system models.
- Multi-object operations
  - the unavailability of a multi-object operation is approximately the sum of the unavailabilities of the individual objects
  - when the $x_i$'s are small, $1 - \prod_{1 \le i \le m} (1 - x_i) \approx \sum_{1 \le i \le m} x_i$
- Nonuniform machine failure rates
  - redefine $p$ as the average per-machine failure rate
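A quick numerical check of the multi-object approximation, as a minimal sketch: it compares the exact unavailability of an operation that needs all of its objects with the simple sum. The per-object unavailabilities in `xs` are made-up illustrative values, and the exact form assumes the objects fail independently.

```python
import math


def exact_unavailability(xs):
    """1 - prod(1 - x_i): the operation fails if any object is unavailable."""
    return 1.0 - math.prod(1.0 - x for x in xs)


def summed_unavailability(xs):
    """The first-order approximation used on the slide."""
    return sum(xs)


xs = [0.001, 0.002, 0.0005]  # hypothetical per-object unavailabilities
print(exact_unavailability(xs), summed_unavailability(xs))
```

The sum slightly overestimates the exact value, and the gap shrinks as the individual unavailabilities get smaller.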
7
Object Popularity/Size Skewness
- Trace-driven examination of the correlation between popularity skewness and size skewness
- The most popular objects do not necessarily receive the most replication
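The last point follows from the popularity-to-size ratio in the analytical result: a large popular object can end up with a lower degree than a small, less popular one. A minimal sketch with two hypothetical objects (the names, popularities, and sizes are invented for illustration):

```python
import math


def degree_offset(popularity, size, p):
    """The object-specific term log_{1/p}(r_i / s_i) of the optimal degree."""
    return math.log(popularity / size) / math.log(1.0 / p)


# (popularity, size) pairs; values are illustrative assumptions.
objects = {
    "large_popular": (0.5, 100.0),
    "small_less_popular": (0.1, 1.0),
}
p = 0.265

offsets = {name: degree_offset(r, s, p) for name, (r, s) in objects.items()}
most_popular = max(objects, key=lambda name: objects[name][0])
most_replicated = max(offsets, key=offsets.get)
print(most_popular, most_replicated)
```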
8
Object Popularity Stability
- Stable object popularity is important both for learning object popularity and for keeping adjustment overhead low
- Illustration of stability across month-long trace segments
- Not designed to handle flash crowds
9
Dynamic Maintenance Overhead
- System adaptation may require dynamic maintenance
  - object popularity may change over time
- Low maintenance overhead due to
  - stable object popularities
  - stable replica assignment in the analytical result: object replication degree $k_i = C + \log_{1/p}(r_i / s_i)$
  - coarse-grain adaptation, e.g., supporting only two replication degrees (low/high)
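One way to see why the analytical assignment is stable: because the degree depends on the logarithm of popularity/size, a multiplicative change in that ratio shifts the degree only additively. The sketch below computes that shift with C held fixed; the function name `degree_shift` is hypothetical, and 0.265 is the PlanetLab failure rate quoted earlier in the deck.

```python
import math


def degree_shift(p, ratio_change):
    """Change in k_i when r_i / s_i is multiplied by ratio_change (C held fixed)."""
    return math.log(ratio_change) / math.log(1.0 / p)


p = 0.265
print(degree_shift(p, 2.0))      # popularity doubles
print(degree_shift(p, 1.0 / p))  # ratio grows by a factor of 1/p
```

A doubling in popularity moves the ideal degree by less than one replica at this failure rate; a full extra replica is warranted only when the ratio grows by a factor of 1/p.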
10
Trace-driven Evaluation – Availability Improvement
- Availability improvement on four real application traces and two machine failure patterns
- Compared to uniform replication under the same space constraint
11
Trace-driven Evaluation – Changing Space Constraint
- Results on the PlanetLab machine failure pattern
- Availability improvement is independent of the space limit
- High replication is needed for some decentralized wide-area systems
12
Trace-driven Evaluation – Dynamic Maintenance Overhead
- Overhead of weekly changes in replication degree
  - number of replica creations/deletions
  - size of replica creations
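The two overhead metrics listed above can be computed from consecutive degree assignments. This is a minimal sketch under the assumption that degrees are integers and that moving from one assignment to the next creates or deletes exactly the difference in replicas; the function name and sample values are hypothetical, and this is not the paper's measurement code.

```python
def maintenance_overhead(old_degrees, new_degrees, sizes):
    """Return (replica creations, replica deletions, total size of creations)."""
    creations = 0
    deletions = 0
    created_size = 0
    for old, new, size in zip(old_degrees, new_degrees, sizes):
        if new > old:
            creations += new - old
            created_size += (new - old) * size
        elif old > new:
            deletions += old - new
    return creations, deletions, created_size


# Illustrative week-to-week change for three objects (assumed values).
print(maintenance_overhead([3, 2, 4], [4, 2, 2], [10, 5, 1]))
```

Only creations contribute to the size metric, since deleting a replica moves no data.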
13
Conclusion
- Results
  - analytical result: replication is optimal when the object replication degree is linear in log(popularity/size)
  - addresses systems issues in complex system models, realistic workload behaviors, and maintenance overhead
- Big picture: skewed and stable data distributions motivate per-object adaptation in distributed system management
  - adapt replication degree for high availability [this paper]
  - adapt Bloom filter hash number for low false-positive rate [other result]
  - adapt co-placement of correlated data objects for fast multi-object operations [other result]