1
© 2016 A. Haeberlen, Z. Ives
University of Pennsylvania
CIS 455/555: Internet and Web Systems
Decentralized systems
February 15, 2016
2
Announcements
- HW1 MS2 is due February 19
  - Try to finish a few days early (testing/debugging...)
- Another Basic Testing Guide will be available
  - NOT an exhaustive list of all the features you need to implement!
  - Some MS1 features will be tested again
  - Please use the feedback from your MS1 grade report to improve your server (grade reports should be available later this week)
- Reading: Stoica et al., "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications", SIGCOMM 2001
  http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
3
The road ahead
- Remember our goal: Understand large web systems like Google, Facebook,...
- So far, we have seen:
  - "Frontend" technology
  - Data representation, indexing
  - Focus was on a single machine
- Coming up next:
  - How to build large services with lots of machines... such as a crawler
  - Main challenges: Scalability, robustness
Salil S. (F0t0Synth), http://www.flickr.com/photos/ss2001/4531189792/
4
Plan for the next two lectures
- A few words on Java servlets
- Decentralization (NEXT)
- Partly centralized systems
  - Example: BitTorrent
- Consistent hashing
- Distributed hashtables
- Fully decentralized systems
  - KBR; Chord
  - Pastry
  - Attacks on KBR
5
How do we distribute a B+ tree?
- We need to host the root at one machine and distribute the rest
- Implications for scalability?
  - Consider building the index as well as searching
  - What limits scalability?
- Implications for robustness?
  - Consider benign faults, rational behavior, and malicious attacks
6
Problem: Centralized structure
- Some systems are fully or partly centralized
  - Some nodes maintain important state that only they know
  - Some nodes perform functions only they can perform
  - Some 'players' contribute most or all of the resources
- Is this a good thing or a bad thing?
  - Good: Simple, easier to get consistency,...
  - Bad: Single point of failure, load imbalance,...
- Are there alternatives?
  - Sometimes centralization is inherent (Example?)
  - Sometimes it is a consequence of the system design
  - If the latter, we can do something about it!
7
Approach: Decentralization
- How can we make systems less centralized?
- Idea #1: Utilize resources of ALL nodes
  - ALL the nodes can help the system by contributing storage, bandwidth, computation,...
- Idea #2: Remove centralized components
  - Avoid having individual nodes or systems that are crucial to the operation of the system
8
Spectrum of approaches
[Figure: a spectrum running from centralized to decentralized]
- Centralized: Client/server
- Partly centralized: BitTorrent, Skype,...
- Decentralized: Pastry, Gnutella
9
Examples of deployed systems
- Examples of partly centralized systems:
  - Skype (telephony)
  - Akamai NetSession (content distribution)
  - BitTorrent (content distribution)
  - SETI@home/BOINC (volunteer computing)
  - Amazon Dynamo (key-value store)
- Examples of decentralized systems:
  - Freenet (censorship-resistant data store)
  - Gnutella (file sharing)
  - CoralCDN (content distribution)
  - BGP (the Internet's interdomain routing system)
  - NNTP and SMTP (news and mail distribution)
10
"P2P = File sharing"
- Some of the early P2P applications were used for file sharing (Napster, Gnutella,...)
  - Some people even believe they are the same
- But P2P is not the same as file sharing!
  - File sharing: A specific application
  - P2P: A design principle for distributed systems
- And file sharing is not the only application!
  - Other examples: Streaming media, telephony, content distribution, routing, volunteer computing,...
11
Recap: Decentralization
- Sometimes a single machine is not enough
  - Several machines must work together to implement the service
- Systems can be centralized to various degrees
  - Is there a single machine, or a small set of machines, that does most of the work or is involved in every single operation?
- Centralized or decentralized?
  - Pro centralized: Simpler, easier to get consistency,...
  - Pro decentralized: No single point of failure, load balance, scalability,...
12
Plan for the next two lectures
- A few words on Java servlets
- Decentralization
- Partly centralized systems (NEXT)
  - Example: BitTorrent
- Consistent hashing
- Distributed hashtables
- Fully decentralized systems
  - KBR; Chord
  - Pastry
  - Attacks on KBR
13
Characteristics of partly centralized systems
- Contain some centralized components
  - Example: Central controller that maintains a list of participating nodes
- But: The centralized component is not involved in resource-intensive operations
  - Example: Data is downloaded directly from, or uploaded directly to, peers
14
An example
- Suppose we want to ship a DVD image to 10,000 clients. How do we do this?
- Option #1: Server does all the work
  - Example: 1 Gbps upstream
  - Need about 190 hours
- Option #2: Let the clients help
  - 1 Mbps upstream x 10,000 = 10 Gbps!
  - Even if the server has only 1 Mbps, can finish in 19 hours!
[Figure: one server with 1 Gbps upstream (x 1) next to clients with 1 Mbps upstream each (x 10,000)]
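The two figures above can be checked with a few lines of arithmetic. The slide does not say how large the DVD image is; the sketch below assumes a dual-layer image of about 8.5 GB, which is the size that makes the numbers come out near 190 and 19 hours. The names `DVD_BYTES` and `hours_to_ship` are made up for this illustration, and the swarm figure is an idealized lower bound that ignores the time needed to get the first copy out of the server.

```python
# Back-of-the-envelope check of the two distribution options.
# ASSUMPTION: the image is a dual-layer DVD of about 8.5 GB;
# the slide does not state the image size.
DVD_BYTES = 8.5e9
CLIENTS = 10_000


def hours_to_ship(total_bits: float, upstream_bps: float) -> float:
    """Hours needed to push total_bits through an aggregate upstream capacity."""
    return total_bits / upstream_bps / 3600


# Every client needs its own full copy of the image.
total_bits = DVD_BYTES * 8 * CLIENTS

# Option #1: a single server with 1 Gbps upstream sends every copy.
server_only = hours_to_ship(total_bits, 1e9)

# Option #2: 10,000 clients contribute 1 Mbps upstream each (10 Gbps in total).
swarm = hours_to_ship(total_bits, CLIENTS * 1e6)

print(f"server only: {server_only:.0f} h, swarm: {swarm:.0f} h")
```

Under that size assumption the script reports roughly 189 hours for the server-only option and roughly 19 hours for the swarm.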
15
Swarming
[Figure; labels:]
- Node that originally has the file
- Fixed-size pieces
- Client now has entire file, turns into a 'seeder'
16
Trackers and torrent files
- How do clients find peers to connect to?
  - Clients connect to a special tracker node
  - Tracker responds with the IP+port of a few other peers who are downloading the same file
  - Modern BitTorrent clients can also operate without a tracker ('trackerless') and use a DHT instead (more about this later)
- How do clients find the tracker?
  - Clients begin by downloading a 'torrent file' (e.g., from a web server), which has the URL of the tracker
  - Torrent file also contains a SHA-1 hash of each file block
  - Why is this needed?
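One way to think about the per-block hashes: blocks arrive from arbitrary peers, so a client needs some way to tell a correct block from a corrupted or forged one. The sketch below shows the idea with Python's `hashlib`. It is not the real torrent file format; `BLOCK_SIZE`, `split_blocks`, `block_hashes`, and `verify_block` are hypothetical names, and the block size is kept tiny so the example is easy to follow.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; real block sizes are far larger


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data: bytes) -> list[bytes]:
    """What a publisher would put in the torrent file: one SHA-1 digest per block."""
    return [hashlib.sha1(block).digest() for block in split_blocks(data)]


def verify_block(index: int, block: bytes, expected: list[bytes]) -> bool:
    """Accept a downloaded block only if its digest matches the published one."""
    return hashlib.sha1(block).digest() == expected[index]


original = b"hello swarm!"
expected = block_hashes(original)      # obtained from the trusted torrent file

good = split_blocks(original)[1]       # block 1 as sent by an honest peer
bad = b"EVIL"                          # block 1 as sent by a faulty or malicious peer

print(verify_block(1, good, expected))  # True
print(verify_block(1, bad, expected))   # False
```

The check only helps if the torrent file itself comes from a source the client trusts, since the hashes are the reference everything else is compared against.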
17
BitTorrent
Simplified BitTorrent session:
1. Download the 'torrent file'
2. Connect to the tracker and get a list of peers
3. Connect to the peers - initially as a 'leecher'
4. While file is not yet fully downloaded:
   - Advertise to peers which blocks are available locally
   - Request blocks from peers
   - Compare hash of downloaded blocks to hash in torrent file (why?)
5. Turn into a 'seeder', i.e., continue uploading to peers without downloading
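The session steps above can be sketched as a small in-memory simulation. Nothing here talks to a network or follows the real wire protocol: the `Peer` class and `download` function are invented for illustration, steps 1 and 2 are stood in for by passing the block hashes and the peer list as arguments, and peers are picked at random rather than by any real selection policy.

```python
import hashlib
import random


def sha1(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


class Peer:
    """Hypothetical in-memory peer that holds some subset of the blocks."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)

    def available(self) -> set[int]:
        # Advertise which blocks are available locally.
        return set(self.blocks)

    def request(self, index: int) -> bytes:
        return self.blocks[index]


def download(torrent_hashes: list[bytes], peers: list[Peer],
             rng: random.Random) -> Peer:
    me = Peer({})  # step 3: join as a leecher with no blocks
    # Step 4: loop until every block is present (assumes at least one
    # peer holds a valid copy of each block, otherwise this never ends).
    while len(me.blocks) < len(torrent_hashes):
        missing = set(range(len(torrent_hashes))) - me.available()
        peer = rng.choice(peers)
        offered = sorted(missing & peer.available())
        if not offered:
            continue  # this peer has nothing we still need
        index = rng.choice(offered)
        block = peer.request(index)
        # Keep the block only if it matches the hash from the torrent file.
        if sha1(block) == torrent_hashes[index]:
            me.blocks[index] = block
    return me  # step 5: complete, can now act as a seeder


data = [b"AAAA", b"BBBB", b"CCCC"]
hashes = [sha1(block) for block in data]          # stand-in for the torrent file
seeder = Peer(dict(enumerate(data)))              # has the entire file
flaky = Peer({0: b"XXXX", 1: data[1]})            # serves a corrupted block 0
me = download(hashes, [seeder, flaky], random.Random(0))
print(b"".join(me.blocks[i] for i in range(len(data))))  # b'AAAABBBBCCCC'
```

Because corrupted blocks are discarded and requested again, the result is the same whichever peers happen to be chosen.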
18
Incentives
- Many users would rather not upload content
  - Some users pay per byte (e.g., cellular networks)
  - Uploading may take bandwidth from other applications
  - Upload traffic may introduce jitter or queueing delay (VoIP!)
- Danger: Tragedy of the commons
  - Everyone wants to download, but nobody uploads
- Idea: Provide an incentive for uploading
  - Many possible incentives (name a few!)
  - BitTorrent's approach is based on reciprocity
19
Tit for tat
- Idea: Upload to the peers that currently give you the best download rate
  - Result: Everyone has an incentive to upload
- Instance of an old, successful idea
  - Goes back to Axelrod's tournament (iterated prisoner's dilemma)
  - Attempts to achieve Pareto optimality
- How this is used in BitTorrent:
  - At any given time, a peer uploads to a fixed number of other peers
  - Peers are chosen based on current download rate
  - All other peers are 'choked' (no uploads)
  - Additionally, one peer is optimistically 'unchoked' (why?)
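A minimal sketch of the choking decision described above, assuming the client keeps a table of how fast it is currently downloading from each peer. The function name `choose_unchoked`, the `slots` parameter, and the example rates are made up for illustration; real clients also re-evaluate this choice periodically, which is left out here.

```python
import random


def choose_unchoked(download_rates: dict[str, float], slots: int,
                    rng: random.Random) -> set[str]:
    """Pick the peers to upload to in this round.

    download_rates maps a peer id to the rate at which we are currently
    downloading from that peer.
    """
    # Reciprocity: rank peers by how much they are giving us.
    ranked = sorted(download_rates, key=lambda p: download_rates[p], reverse=True)
    unchoked = set(ranked[:slots])
    choked = ranked[slots:]
    if choked:
        # Optimistic unchoke: give one otherwise-choked peer a chance.
        unchoked.add(rng.choice(choked))
    return unchoked


rates = {"a": 50.0, "b": 5.0, "c": 120.0, "d": 0.0, "e": 80.0}
unchoked = choose_unchoked(rates, slots=2, rng=random.Random(1))
print(sorted(unchoked))
```

With two regular slots, peers `c` and `e` are always selected, plus one randomly chosen peer from the rest. A commonly given reason for the optimistic unchoke is that it lets a client discover partners that might turn out to be faster, and gives newcomers with nothing to offer yet a way to obtain their first blocks.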
20
Other examples
- Akamai NetSession
- SETI@home
- End-system multicast
21
Recap: Partly centralized systems
- Contain a few centralized components
  - Example: HDFS namenode, BitTorrent tracker,...
  - However, most of the actual work is done by the peers
- Some pros and cons:
  - More scalable than centralized systems
  - 'Organic growth': More peers potentially means more demand, but also more resources
  - But: Centralized component is single point of failure and eventually becomes a bottleneck