Published by Annabel Ross. Modified over 9 years ago.
1
Seneca: Remote Mirroring Done Write
Minwen Ji, Alistair Veitch and John Wilkes
HP Labs
December 2, 2015
2
Motivations: Reliability and Availability
2 out of 5 enterprises that experience a disaster go out of business within 5 years [Gartner Report]
Outages cost >$250K/hour (25%) or >$5M/hour (4%) [Eagle Rock Online Survey]
3
Our Contributions
A taxonomy of the design choices for remote mirroring
An asynchronous protocol that is designed to handle many kinds and sequences of failures
Checking the correctness of the protocol using I/O automata-based simulation
4
Remote Mirroring Overview
Competing goals: high performance, low cost, and low data loss
[Figure: local and remote sites connected by a wide area network. Labels: App, Mirroring Module, Primary, Primary Log, Secondary, Secondary Log]
5
Design Choices
Synchronous vs. asynchronous
  – Propagate update to mirror before vs. after write request returned to application
Divergence: zero bounded, op/byte/time bounded, resource bounded, unbounded
  – Amount of data allowed to be out-of-sync between mirrors
As-is logging vs. write coalescing
  – Store all versions vs. a subset of versions of overwritten data in log
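The difference between as-is logging and write coalescing can be illustrated with a minimal sketch. The class names `AsIsLog` and `CoalescingLog` and the block/data representation are assumptions made for illustration only; they are not taken from the Seneca implementation.

```python
from collections import OrderedDict


class AsIsLog:
    """Hypothetical as-is log: keeps every version of every write, in arrival order."""

    def __init__(self):
        self.entries = []

    def append(self, block, data):
        self.entries.append((block, data))

    def pending(self):
        return list(self.entries)


class CoalescingLog:
    """Hypothetical coalescing log: keeps only the latest version of each block."""

    def __init__(self):
        self.latest = OrderedDict()

    def append(self, block, data):
        # Drop any older version of this block, then record the new one last.
        self.latest.pop(block, None)
        self.latest[block] = data

    def pending(self):
        return list(self.latest.items())


# Four writes, three of them to the same block (block 7).
writes = [(7, b"a"), (9, b"b"), (7, b"c"), (7, b"d")]
as_is, coalesced = AsIsLog(), CoalescingLog()
for block, data in writes:
    as_is.append(block, data)
    coalesced.append(block, data)

print(len(as_is.pending()), "entries to send with as-is logging")
print(len(coalesced.pending()), "entries to send with write coalescing")
```

In this sketch the coalescing log holds two entries instead of four, which is the source of the log-space and traffic savings. The price is that a coalesced batch no longer records the original write order, so a receiver would presumably have to apply a batch as a single unit.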
6
Seneca’s Choices
Synchronous vs. asynchronous
  – Low data loss vs. smooth traffic and high performance
Divergence: zero bounded, op/byte/time bounded, resource bounded, unbounded
  – Low data loss vs. smooth traffic and high availability
As-is logging vs. write coalescing
  – Little secondary log space vs. low primary log space and low traffic
7
A Taxonomy
[Figure: four design points: 1 Sync, 2 Async, 3 Async-Coalesce, 4 Async-Bitmap. Transition labels: "Perf +, Cost –, Loss +" and "Avail +, Cost –, Loss +"]
8
Evaluation of Seneca’s Choices
Metrics: Impact of asynchronous propagation and write coalescing on WAN traffic and log space
Traces      Capacity   Length      Mean Write Rate
Cello2002   1.44 TB    24 hours    0.78 MB/s
SAP         4 TB       15 mins     1.95 MB/s
RDW         500 GB     1.4 hours   0.34 MB/s
OpenMail    640 GB     1 hour      1.70 MB/s
9
Simulation Results
Mean traffic: 5-40% reduction with write coalescing allowed within 30 sec intervals
95th percentile usage: reduced from 93% of 4 T3 lines to 85% of 3 T3 lines
Log space: 100 GB log will cover a network outage for 14-81 hours
[Figures: Mean Traffic vs. Coalescing Interval; Traffic CDF w/ Coalescing On/Off; Log Size vs. Coalescing Interval]
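The log-space figure can be sanity-checked with back-of-envelope arithmetic. The slides do not say how the 14-81 hour range was derived, so the sketch below rests on stated assumptions: the log fills at each trace's mean write rate, no coalescing savings are counted, and units are decimal (1 GB = 1000 MB). The function name `outage_hours` is hypothetical.

```python
# Mean write rates (MB/s) from the trace table in the evaluation slide.
MEAN_WRITE_RATE_MB_S = {
    "Cello2002": 0.78,
    "SAP": 1.95,
    "RDW": 0.34,
    "OpenMail": 1.70,
}
LOG_SIZE_GB = 100


def outage_hours(log_gb, rate_mb_s):
    """Hours until a log of log_gb fills at a constant rate_mb_s.

    Assumes decimal units (1 GB = 1000 MB) and no write coalescing.
    """
    seconds = (log_gb * 1000) / rate_mb_s
    return seconds / 3600


coverage = {
    name: outage_hours(LOG_SIZE_GB, rate)
    for name, rate in MEAN_WRITE_RATE_MB_S.items()
}
for name, hours in sorted(coverage.items(), key=lambda item: item[1]):
    print(f"{name:10s} {hours:5.1f} hours")
```

Under these assumptions the fastest trace (SAP) fills the log in about 14.2 hours and the slowest (RDW) in about 81.7 hours, which is consistent with the 14-81 hour range on the slide. Coalescing would extend these times, since overwritten blocks would not consume additional log space.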
10
How To Get Things Right
Hard problems:
  – Rolling disasters
      Primary fails => secondary inconsistent => system inaccessible
  – Failover dilemmas
      Primary fails before propagation
      Secondary takes over and continues to update
      Old primary returns
Our approach:
  – Finite state machines
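To show why a finite state machine helps with the failover dilemma, here is a toy sketch. The states, events, and transition table are invented for illustration and are much simpler than Seneca's actual state machines. The idea is that every legal transition is listed explicitly, so a returning old primary cannot silently resume its former role.

```python
from enum import Enum, auto


class Role(Enum):
    """Hypothetical site roles; not Seneca's real state names."""
    PRIMARY = auto()
    SECONDARY = auto()
    FAILED = auto()
    RESYNCING = auto()


# Every legal (state, event) pair is listed; anything else is rejected.
TRANSITIONS = {
    (Role.PRIMARY, "fail"): Role.FAILED,
    (Role.SECONDARY, "fail"): Role.FAILED,
    (Role.SECONDARY, "takeover"): Role.PRIMARY,
    (Role.FAILED, "recover"): Role.RESYNCING,
    (Role.RESYNCING, "resync_done"): Role.SECONDARY,
    (Role.RESYNCING, "fail"): Role.FAILED,
}


def step(state, event):
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event!r} in state {state.name}"
        ) from None


def run(state, events):
    """Apply a sequence of events to a starting state."""
    for event in events:
        state = step(state, event)
    return state


# Failover scenario from the slide: the primary fails, the secondary takes
# over, and the old primary later returns.
new_primary = run(Role.SECONDARY, ["takeover"])
old_primary = run(Role.PRIMARY, ["fail", "recover", "resync_done"])
print(new_primary.name, old_primary.name)
```

In this toy table the old primary can only come back through `RESYNCING` and ends up as a secondary, so the two sites never both act as primary. An explicit table of this kind is also convenient to explore mechanically, for example with random walks.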
11
[Figure panels: Local Seneca State, Remote Seneca State, Primary State, Secondary State]
12
Checking Correctness
Simulation
  – Started with Input/Output Automata (a model checking language)
  – Constrained random walks in the state space
  – Implemented in C
Correctness criteria
  – Coverage, safety and liveness
Latest results
  – Detected and fixed many non-trivial implementation bugs in a relatively short time
  – Average failure injections before a bug is detected: 16435
  – Mean Time Between Failures for the protocol proper: 4100 years
  – The latest bug took 1.77M writes, 75.9K failures, 22.4K recoveries and 6.6M internal events to detect
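The simulation approach above can be sketched as a constrained random walk with failure injection. The slides say the real checker was implemented in C; the Python below is only an illustration. The model `ToyMirror`, its events, the seeded bug, and the safety condition are all assumptions, not the Seneca protocol. "Constrained" here means that each step picks only among events that are enabled in the current state.

```python
import random


class ToyMirror:
    """Hypothetical asynchronous mirror: primary, send log, secondary, one link."""

    def __init__(self, buggy=False):
        self.buggy = buggy
        self.primary = []    # sequence numbers written at the primary
        self.log = []        # written but not yet propagated
        self.secondary = []  # sequence numbers applied at the secondary
        self.link_up = True
        self.next_seq = 0

    def enabled_events(self):
        """Events that are legal in the current state."""
        events = ["write", "link_fail" if self.link_up else "link_recover"]
        if self.link_up and self.log:
            events.append("propagate")
        return events

    def apply(self, event):
        if event == "write":
            self.primary.append(self.next_seq)
            self.log.append(self.next_seq)
            self.next_seq += 1
        elif event == "propagate":
            self.secondary.append(self.log.pop(0))
        elif event == "link_fail":
            self.link_up = False
            if self.buggy and self.log:
                self.log.pop(0)  # seeded bug: oldest unsent update is lost
        elif event == "link_recover":
            self.link_up = True

    def safe(self):
        """Safety: the secondary holds a gap-free prefix of the primary's writes."""
        return self.secondary == self.primary[: len(self.secondary)]


def random_walk(model, steps, rng):
    """Return the step at which safety is first violated, or None if it never is."""
    for i in range(steps):
        model.apply(rng.choice(model.enabled_events()))
        if not model.safe():
            return i + 1
    return None


correct_result = random_walk(ToyMirror(buggy=False), 2000, random.Random(1))
buggy_result = random_walk(ToyMirror(buggy=True), 2000, random.Random(1))
print("correct model:", correct_result)
print("buggy model violation found:", buggy_result is not None)
```

This sketch checks only a safety property. Coverage and liveness, the other two criteria on the slide, would need additional bookkeeping, such as recording which states were visited and whether every logged write is eventually propagated.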
13
Summary
A taxonomy of design space for remote mirroring
Evaluation of Seneca’s design choices
A finite state machine description of the Seneca remote mirroring protocol
Checking the correctness of Seneca with simulations