
Slide 1: Porcupine: A Highly Available Cluster-based Mail Service
Y. Saito, B. Bershad, H. Levy – U. Washington, SOSP 1999
CS 443 Advanced OS, Spring 2005. Presented by Fabián E. Bustamante.

Slide 2: Porcupine – Goals & Requirements
Use commodity hardware to build a large, scalable mail service.
Main goal – scalability, in terms of:
–Manageability – large but easy to manage
  Self-configures with respect to load and data distribution
  Self-heals with respect to failure & recovery
–Availability – survive failures gracefully
  A failure may prevent some users from accessing their email
–Performance – scales linearly with cluster size
  Target – hundreds of machines, ~ billions of mail messages/day

Slide 3: Key Techniques and Relationships
Framework: functional homogeneity – "any node can perform any task"
Techniques: automatic reconfiguration, dynamic scheduling, replication
Goals: manageability, performance, availability

Slide 4: Why Email?
Mail is important:
–Real demand (Saito now works for Google)
Mail is hard:
–Write intensive
–Low locality
Mail is easy:
–Well-defined API
–Large parallelism
–Weak consistency

Slide 5: Conventional Mail Solution
Static partitioning: each user's mailbox is manually assigned to one server.
(Diagram: SMTP/IMAP/POP front ends over NFS servers, each holding a fixed set of mailboxes – Jeanine's, Luca's, Joe's, Suzy's mbox.)
Performance problems:
–No dynamic load balancing
Manageability problems:
–Manual data partitioning
Availability problems:
–Limited fault tolerance
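To make "static partitioning" concrete, here is a minimal sketch (not from the slides; the server and user names are illustrative): a fixed user-to-server table that an administrator maintains by hand, so load never moves.

    #include <iostream>
    #include <map>
    #include <string>

    // Conventional scheme: a hand-maintained, fixed user -> server table.
    // If nfs2 is overloaded or down, joe's and suzy's mail is stuck there:
    // nothing rebalances, and recovery is manual.
    std::map<std::string, std::string> staticPartition = {
        {"jeanine", "nfs1"}, {"luca", "nfs1"},
        {"joe",     "nfs2"}, {"suzy", "nfs2"},
    };

    int main() {
        std::cout << "luca's mbox is on " << staticPartition["luca"] << "\n";
    }

Porcupine replaces this fixed table with the hashed user map of slide 8, which can be recomputed automatically when nodes come and go.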

Slide 6: Porcupine Architecture
Every node (A, B, ..., Z) runs the same set of components, connected by RPC:
–Protocol handlers: SMTP server, POP server, IMAP server
–State: user map, mail map, user profile, mailbox storage
–Managers: replication manager, membership manager, load balancer

Slide 7: Porcupine Operations
Message delivery (protocol handling, user lookup, load balancing, message store):
1. DNS-RR selects a front-end node; the Internet sender says "send mail to luca".
2. The front end asks its user map: who manages luca? → A.
3. It asks node A to verify luca.
4. A replies: "OK, luca has msgs on C and D".
5. The load balancer picks the best node to store the new msg → C.
6. "Store msg" is sent to node C.
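A toy end-to-end version of that path, with invented stub names standing in for what are really RPCs between nodes:

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for Porcupine's RPC interfaces (names invented).
    using Node = std::string;
    static const std::vector<Node> cluster = {"A", "B", "C"};

    // Step 2: hash the user name into the user map to find the manager node.
    Node userMapLookup(const std::string& user) {
        return cluster[std::hash<std::string>{}(user) % cluster.size()];
    }
    // Steps 3-4: the manager verifies the user and returns the mail map (stubbed).
    bool verifyUser(const Node&, const std::string&) { return true; }
    // Step 5: the load balancer picks the best node for the new message (stubbed).
    Node pickBestNode() { return "C"; }
    // Step 6: forward the message to the chosen node.
    void storeMessage(const Node& n, const std::string& user, const std::string& msg) {
        std::cout << "node " << n << " stores for " << user << ": " << msg << "\n";
    }

    // The delivery path from the slide, run on whichever node DNS-RR selected.
    void deliver(const std::string& user, const std::string& msg) {
        Node mgr = userMapLookup(user);
        if (!verifyUser(mgr, user)) return;
        storeMessage(pickBestNode(), user, msg);
    }

    int main() { deliver("luca", "hello"); }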

Slide 8: Basic Data Structures
–User map: a small hash table replicated on every node, mapping hash buckets to nodes (e.g. B C A C A B A C). Applying the hash function to "luca" selects a bucket whose entry names luca's manager node.
–Mail map / user info, kept by each manager for its users: luca: {A,C}, ann: {B}, suzy: {A,C}, joe: {B} – the set of nodes holding each user's messages.
–Mailbox storage: the per-node message store; each node holds fragments for many users (Luca's, Suzy's, Bob's, Joe's, Ann's msgs spread across nodes A, B, C).
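A minimal sketch of the user-map lookup (the bucket count and assignments are illustrative, not from the paper):

    #include <array>
    #include <functional>
    #include <iostream>
    #include <string>

    // The user map is a small table, identical on every node, that maps hash
    // buckets to node names; reassigning buckets is how management duties move
    // when cluster membership changes.
    std::array<char, 8> userMap = {'B','C','A','C','A','B','A','C'};

    char managerOf(const std::string& user) {
        std::size_t bucket = std::hash<std::string>{}(user) % userMap.size();
        return userMap[bucket];  // node currently managing this user
    }

    int main() {
        std::cout << "luca is managed by node " << managerOf("luca") << "\n";
    }

Because every node holds an identical copy of this small table, any node can route any request – which is what makes functional homogeneity work.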

Slide 9: Porcupine Advantages
Advantages:
–Optimal resource utilization
–Automatic reconfiguration and task redistribution upon node failure/recovery
–Fine-grain load balancing
Results: better availability, better manageability, better performance

Slide 10: Performance
Goal: scale performance linearly with cluster size.
Strategy: avoid creating hot spots.
–Partition data uniformly among nodes
–Fine-grain data partitioning

Slide 11: Measurement Environment
–30-node cluster of not-quite-identical PCs
–100 Mb/s Ethernet + 1 Gb/s hubs
–Linux 2.2.7
–42,000 lines of C++ code
–Synthetic load
–Compared against sendmail+popd

Slide 12: How Does Performance Scale?
(Graph: messages/day vs. cluster size – Porcupine reaches 68M msgs/day at 30 nodes, versus 25M msgs/day for sendmail+popd.)

Slide 13: Availability
Goals:
–Maintain function after failures
–React quickly to changes, regardless of cluster size
–Graceful performance degradation / improvement
Strategy: two complementary mechanisms
–Hard state (email messages, user profile) → optimistic fine-grain replication
–Soft state (user map, mail map) → reconstruction after membership change

Slide 14: Soft-state Reconstruction
Timeline after a membership change (the diagram walks nodes A, B, C through a failure):
1. The membership protocol detects the change and the user map is recomputed: buckets owned by the failed node are reassigned to survivors (e.g. BCABABAC → BAABABAB when C fails).
2. Distributed disk scan: every node scans its mailbox storage and reports to the new managers, rebuilding mail-map entries such as luca: {A,C}, joe: {C}, ann: {B}, suzy: {A,B}.
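A toy sketch of step 1, the user-map recomputation (the helper is invented; the real system agrees on the new map through the membership protocol):

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // Step 1: recompute the user map over the live nodes only. Buckets that
    // pointed at a failed node are reassigned round-robin to the survivors.
    std::vector<char> recomputeUserMap(std::vector<char> userMap,
                                       const std::set<char>& live) {
        std::vector<char> alive(live.begin(), live.end());
        std::size_t next = 0;
        for (char& owner : userMap)
            if (!live.count(owner)) owner = alive[next++ % alive.size()];
        return userMap;
    }

    int main() {
        std::vector<char> before = {'B','C','A','B','A','B','A','C'};  // BCABABAC
        for (char n : recomputeUserMap(before, {'A', 'B'}))            // C failed
            std::cout << n;                                            // BAABABAB
        std::cout << "\n";
        // Step 2 (not shown): each node scans its disk and tells the new
        // managers which users' messages it holds, rebuilding the mail map.
    }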

Slide 15: Reaction to Configuration Changes

Slide 16: Hard-state Replication
Goals:
–Keep serving hard state after failures
–Handle unusual failure modes
Strategy: exploit Internet semantics
–Optimistic, eventually consistent replication
–Per-message, per-user-profile replication
–Efficient during normal operation
–Small window of inconsistency
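A simplified sketch of the optimistic, eventually consistent style named on the slide (the types are invented; real Porcupine logs updates and retries pushes until every replica acknowledges):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // One replica of a user's mailbox fragment. Updates are keyed by message
    // ID, so repeated pushes are idempotent and replicas converge.
    struct Replica {
        std::string node;
        std::map<std::string, std::string> msgs;  // msgId -> body
        void apply(const std::string& id, const std::string& body) {
            msgs.emplace(id, body);  // a duplicate push is a no-op
        }
    };

    // Optimistic update: apply locally and ack the client right away, then
    // push to the other replicas (in reality asynchronously, retried until
    // acked -- the "small window of inconsistency" on the slide).
    void storeReplicated(const std::string& id, const std::string& body,
                         std::vector<Replica*>& replicas) {
        for (Replica* r : replicas) r->apply(id, body);
    }

    int main() {
        Replica a{"A"}, c{"C"};
        std::vector<Replica*> rs = {&a, &c};
        storeReplicated("msg-1", "hello luca", rs);
        std::cout << "A holds " << a.msgs.size() << ", C holds "
                  << c.msgs.size() << "\n";
    }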

Slide 17: Replication Efficiency
(Graph: 68M msgs/day without replication vs. 24M msgs/day with replication.)

Slide 18: Replication Efficiency
(Graph: 68M msgs/day without replication, 24M msgs/day with replication, and 33M msgs/day with "pretend" replication – disk flushing removed from the disk-logging routines.)

Slide 19: Load Balancing: Storing Messages
Goals:
–Handle skewed workloads well
–Support hardware heterogeneity
–No voodoo parameter tuning
Strategy: spread-based load balancing (sketched below)
–Spread: a soft limit on the number of nodes per mailbox
  Large spread → better load balance; small spread → better affinity
–Load is balanced within the spread
–The number of pending I/O requests is the load measure
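A minimal sketch of spread-based selection (the structures are invented, and widening the candidate set in fixed order is a simplification of the real policy):

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <vector>

    // Pending I/O requests per node -- the load measure named on the slide.
    std::map<char, int> pendingIO = {{'A', 3}, {'B', 9}, {'C', 1}, {'D', 5}};

    // Pick a node for a new message. Prefer nodes already holding the user's
    // mailbox (affinity); widen the candidate set up to `spread` nodes, then
    // take the least-loaded candidate.
    char pickNode(std::vector<char> candidates, int spread,
                  const std::vector<char>& allNodes) {
        for (char n : allNodes) {
            if ((int)candidates.size() >= spread) break;
            if (std::find(candidates.begin(), candidates.end(), n) ==
                candidates.end())
                candidates.push_back(n);
        }
        return *std::min_element(candidates.begin(), candidates.end(),
            [](char x, char y) { return pendingIO[x] < pendingIO[y]; });
    }

    int main() {
        std::vector<char> all = {'A', 'B', 'C', 'D'};
        // luca's mailbox is on {A, B}; a spread of 3 lets us also consider C.
        std::cout << "store on node " << pickNode({'A', 'B'}, 3, all) << "\n";
    }

The spread knob trades affinity (fewer nodes to contact on reads) against balance, without per-workload tuning.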

Slide 20: Support of Heterogeneous Clusters
(Graph: relative performance improvement vs. node heterogeneity. 0% means all nodes run at about the same speed; 3%, 7%, and 10% give the percentage of nodes with very fast disks. Dynamic load balancing exploits the fast disks, gaining up to +16.8M msgs/day (+25%), versus only +0.5M msgs/day (+0.8%) without it.)

Slide 21: Conclusions
Fast, available, and manageable clusters can be built for write-intensive services.
Key ideas can be extended beyond mail:
–Functional homogeneity
–Automatic reconfiguration
–Replication
–Load balancing
Ongoing work:
–A more efficient membership protocol
–Extending Porcupine beyond mail: Usenet, calendar, etc.
–A more generic replication mechanism

