Logically Centralized, Physically Distributed
Mark Stuart Day
Cisco Systems
Standard disclaimer
No matter what I say in this talk, I’m not making any Lotus product commitments. Cisco
Outline
What people want
What people can have
An ancient example:
  –Replicated mail repository
A recent example:
  –Content distribution network
Conclusions
What people want
Single name/location for single logical service
Service never goes down
Service grows/shrinks smoothly
What people can have
Single name/location for single logical service
Service never goes down
Service grows/shrinks smoothly
Occasional weird errors that violate user expectations
Some ancient history
MIT-LCS-TR-376
Date: May 1987
REPLICATION AND RECONFIGURATION IN A DISTRIBUTED MAIL REPOSITORY
Author(s): Day, M.S.
Pages: 110
Price: $18.00
AD Number: A
Keywords: data replication, software reconfiguration, availability, reliability, scalable systems, distributed programs, electronic mail repositories, programming languages
Mail system architecture (think of Grapevine)
[Diagram: Client, Directory, Mailbox 1, Mailbox 2, Mailbox 3, Mailbox 4]
Highly available
[Diagram: Client, Directory, and three copies each of Mailbox 1 and Mailbox 2]
How did it work?
Systems success
  –Nice capability for quorum adjustment
  –New directory algorithm for deletions
  –Cool dynamic reconfiguration
User failure
  –“What do you mean I can’t delete that message?”
  –“Where’s that message gone?”
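The user-visible failure above can be illustrated with a generic read/write quorum scheme. This is a minimal sketch, not the algorithm from the technical report: the class names, the version-number scheme, and the tombstone used for deletion are all assumptions made for illustration, and the report's quorum adjustment and directory algorithm are not reproduced.

```python
from dataclasses import dataclass, field


class QuorumUnavailable(Exception):
    """Raised when too few replicas are reachable to complete an operation."""


@dataclass
class Replica:
    # Hypothetical in-memory replica: message id -> (version, body or None).
    up: bool = True
    messages: dict = field(default_factory=dict)


@dataclass
class ReplicatedMailbox:
    replicas: list
    read_quorum: int
    write_quorum: int

    def __post_init__(self):
        # Overlap rule: every read quorum must intersect every write quorum.
        if self.read_quorum + self.write_quorum <= len(self.replicas):
            raise ValueError("need read_quorum + write_quorum > number of replicas")

    def _reachable(self):
        return [rep for rep in self.replicas if rep.up]

    def _latest(self, msg_id):
        live = self._reachable()
        if len(live) < self.read_quorum:
            raise QuorumUnavailable("read quorum not reachable")
        answers = [rep.messages.get(msg_id, (0, None)) for rep in live[: self.read_quorum]]
        # The highest version wins; version 0 means "never seen".
        return max(answers, key=lambda answer: answer[0])

    def _write(self, msg_id, body):
        live = self._reachable()
        if len(live) < self.write_quorum:
            raise QuorumUnavailable("write quorum not reachable")
        version = self._latest(msg_id)[0] + 1
        for rep in live[: self.write_quorum]:
            rep.messages[msg_id] = (version, body)

    def deliver(self, msg_id, body):
        self._write(msg_id, body)

    def delete(self, msg_id):
        # A deletion is written as a tombstone (body None) with a newer version.
        self._write(msg_id, None)

    def read(self, msg_id):
        return self._latest(msg_id)[1]
```

With three replicas and quorums of two, the mailbox keeps working when one replica is down. When two are down, `delete` raises `QuorumUnavailable`, which is the system-level event behind a user asking why a message cannot be deleted.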
A recent example: Content distribution networks
Akamai, Digital Island, Mirror Image, Adero, …
$Millions in revenue
$Billions in market capitalization
Might be worth knowing something about
The bad old days (without content distribution)
[Diagram: Client, Origin Server; request: GET some/piece/o/content]
New and improved (with content distribution)
[Diagram: Client, Origin Server, three Delivery Nodes, two Request Routers, two Content Routers; request: GET]
Virtues
Client unchanged
Origin server mostly unchanged
  –Content URLs may be modified
Add delivery nodes transparently
Move content around transparently
Caveats
Lots of detail missing
  –Request routing: HTTP redirection, DNS interception, IP hijacking
  –Content routing: application-level multicast, IP multicast
Both request routing and content routing are nontrivial problems
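Of the request-routing options listed above, HTTP redirection is the simplest to sketch. The example below is illustrative only: the `DeliveryNode` type, the hostnames, and the "same region first, then any healthy node, then the origin" policy are assumptions, not how any particular content distribution network chooses a node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryNode:
    host: str      # hypothetical hostname of a delivery node
    region: str    # hypothetical coarse location label
    healthy: bool = True


ORIGIN_HOST = "origin.example"  # assumed fallback when no delivery node is usable


def route_request(path, client_region, nodes):
    """Return (status, headers) for an HTTP redirect to the chosen host."""
    healthy = [node for node in nodes if node.healthy]
    nearby = [node for node in healthy if node.region == client_region]
    # Prefer a healthy node in the client's region, then any healthy node.
    candidates = nearby or healthy
    host = candidates[0].host if candidates else ORIGIN_HOST
    return 302, {"Location": f"http://{host}/{path.lstrip('/')}"}


if __name__ == "__main__":
    nodes = [
        DeliveryNode("node-east.cdn.example", "east"),
        DeliveryNode("node-west.cdn.example", "west"),
    ]
    print(route_request("/some/piece/o/content", "west", nodes))
```

The client stays unchanged because following a 302 redirect is ordinary browser behavior. The hard part, which this sketch skips, is knowing which nodes are healthy and which already hold the content.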
Weird user-visible errors
Routed to failed box
  –Content fails to appear
  –Depending on routing/caching, maybe no content from that domain ever appears again for that client
Making weird errors into not-so-weird errors
Deploy “next-click failover”
  –Delivery nodes clustered into “supernodes” with switch
  –Supernode monitors failures
  –IP addresses of failed nodes remapped onto live nodes
Result is similar to common Web behavior
  –“What the hey?” [click] “Oh, OK.”
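The remapping step in next-click failover can be sketched as a small bookkeeping structure. This is a rough illustration under stated assumptions: the `Supernode` class, the node names, and the "give each orphaned address to the least-loaded live node" rule are hypothetical, and failure detection itself is left out (a failure is simply reported to the supernode).

```python
class Supernode:
    """Tracks which delivery node in a cluster answers for each IP address."""

    def __init__(self, nodes_to_ips):
        # node name -> set of IP addresses that node currently serves
        self.assignments = {node: set(ips) for node, ips in nodes_to_ips.items()}
        self.live = set(nodes_to_ips)

    def owner_of(self, ip):
        for node, ips in self.assignments.items():
            if ip in ips:
                return node
        return None

    def report_failure(self, node):
        if node not in self.live:
            return  # unknown node, or failure already handled
        self.live.discard(node)
        if not self.live:
            return  # no live node left; the addresses stay unserved
        orphaned = self.assignments.pop(node)
        for ip in sorted(orphaned):
            # Assumed policy: hand each address to the least-loaded live node,
            # breaking ties by node name so the result is deterministic.
            target = min(self.live, key=lambda n: (len(self.assignments[n]), n))
            self.assignments[target].add(ip)


if __name__ == "__main__":
    cluster = Supernode({"a": ["192.0.2.1"], "b": ["192.0.2.2"], "c": ["192.0.2.3"]})
    cluster.report_failure("a")
    print(cluster.owner_of("192.0.2.1"))
```

A request in flight when node `a` fails is still lost, but the next click to the same address reaches a live node, which matches the "What the hey?" [click] "Oh, OK." experience described above.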
Conclusion
People want something that’s logically centralized, physically distributed
But they don’t want the weird errors that come with distribution
A great thing about the Web:
  –People are already used to some weird errors