Matei Ripeanu http://www.ece.ubc.ca/~matei EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)
Today’s Objectives. Class mechanics (http://www.ece.ubc.ca/~matei/EECE411/). Understand real-world applications in terms of: motivation and objectives; resource requirements (compute/storage/network resources); architecture (the “distributed systems” part). Examples: recent P2P applications. Start thinking of computer networks from the perspective of a networked application. Why? It is more intuitive.
P2P Definition(s)
Def 1: “A class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet.” Edges are often turned on/off and lack permanent IP addresses.
Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”
Lots of other definitions fit in between, and lots of (P2P?) systems fit nowhere …
Def 1 emphasizes what resources are integrated. Problem: it is vague what one means by ‘edges’. A core-network person would consider everything except routers and wires to be sitting on the edges of the network. Example: SETI@home.
Def 2 emphasizes how resources are integrated, i.e., the architectural/organizational solution chosen to integrate them. Problem: it is quite restrictive: (1) most people use the term P2P for many applications that do not fit this definition; (2) applications like Gnutella that used to fit this definition are moving away from it. It is also vague: what counts as “most communication is symmetric”? Examples: Gnutella, DHTs (CAN, Chord, Tapestry, Pastry).
P2P Impact: Widespread adoption
Skype: 560M registered users (Q2 ’10); 120M active, 8M paying; 15M users online.
Number of users for file-sharing applications (estimate from www.slyck.com, Sept ’06):
eDonkey: 3,108,909
FastTrack (Kazaa): 2,114,120
Gnutella: 2,899,788
Overnet: 691,750
Filetopia: 3,405
P2P design techniques are now mainstream!
P2P Impact (2): Huge resource users. P2P-generated traffic now dominates the Internet load (30-50% of the traffic). Internet2 traffic statistics. Cornell.edu (March ’02): 60% P2P.
P2P Impact (3): Demonstrates that volatile, small, non-proprietary resources can be efficiently harnessed. Resources: CPU and storage space, but also network bandwidth, availability, user attention and expertise. BOINC statistics.
P2P Impact (4): Social / Business. Data distribution at (almost) zero cost. Forces companies to change their business models: digital content production and distribution; telecommunications companies. New collaboration models: crowd-sourcing!
Roadmap: Definitions; Impact; Applications; Mechanisms; A case study
Applications: Number crunching. Examples: Folding@Home, UnitedDevices, etc. Characteristics (e.g., Folding@Home): massive parallelism; low bandwidth/computation ratio; error tolerance. Users do donate *real* resources: $1.5M per year in extra consumed power. Problems: Centralized. Does it scale? Cheating! The approach suits only a particular class of problems. How can the model be extended to problems that are not massively parallel?
Applications: Content distribution (files, video). The ‘killer application’ to date. Too many to list them all: BitTorrent, FastTrack (KaZaA, KazaaLite, iMesh), Gnutella (LimeWire, BearShare). Two independent problems: distributed index; fast content download. Environment: unreliable, non-cooperative.
Applications: Performance evaluation. Poor online performance costs businesses $25 billion per year (Zone Research). 28% of attempted online purchases fail (BCG). Slow page download is the primary reason for transaction abandonment. Business transactions are at particular risk. User expectations for page download are around 4 seconds. Performance evaluation and monitoring require multiple vantage points: connectivity statistics; routing errors; evaluating Web-site performance from the end-user perspective.
Measurements: The Performance “Blind Spot”
[Figure labels: Back-end Infrastructure (Web server, App server, Database, Firewall, 3rd party content); Network Landscape (Backbone, Major Provider, Enterprise Provider, Regional Network, ISP, Local ISP, T1, Corporate Network); Corporate User; Consumer User; Last-mile “Blind Spot”; Component Testing; Datacenter Testing “Beacon”; Datacenter Monitoring. Vendors listed: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates; Keynote Systems, Mercury Interactive, BMC/SiteAngel, Service Metrics.]
Critical to estimate end-to-end performance.
Internet latency exists, and online businesses must measure it. Forrester, IDC, and Gartner Research all say that many online customers will click out of your site or off a page if its content does not download within 5-8 seconds. 50-80% of the Internet latency impacting your customers occurs in the “last mile.” Research and IDC. End-to-end web performance testing measures your site's quality of service from your customer's browser to your origin server. Porivo’s distributed technology provides an active performance testing service that delivers true end-to-end web performance testing.
Slide source: www.porivo.com
Measurements: End-to-end Performance
[Figure labels: Back-end Infrastructure (Web server, App server, Database, Firewall, 3rd party content); Network Landscape (Backbone, Major Provider, Enterprise Provider, Regional Network, ISP, Local ISP, T1, Corporate Network); Corporate User; Consumer User; Component Testing; Datacenter Monitoring; End-to-end Web Performance Testing.]
Internet latency exists, and online businesses must measure it. Forrester, IDC, and Gartner Research all say that many online customers will click out of your site or off a page if its content does not download within 5-8 seconds. 50-80% of the Internet latency impacting your customers occurs in the “last mile.” Research and IDC. End-to-end web performance testing measures your site's quality of service from your customer's browser to your origin server. Porivo’s distributed technology provides an active performance testing service that delivers true end-to-end web performance testing.
Slide source: www.porivo.com
More applications …: Backup storage (HiveNet, OceanStore); collaborative environments; spam filtering; anonymous email; censorship-resistant publishing systems (Eternity, Freenet).
Roadmap: Definitions; Impact; Applications; Mechanisms; A case study
Mechanisms (I). To obtain a resilient system: use redundancy for data and services; integrate multiple components with uncorrelated failure curves. To reduce cost and improve the QoS delivered: move service delivery closer to the user; integrate multiple clients with uncorrelated demand curves (lower over-provisioning at resource providers).
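A rough numerical sketch of the two arguments on this slide. It assumes that component failures are fully independent and uses made-up demand curves; the function names and all numbers are illustrative, not taken from the course material.

```python
def availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - p_fail ** replicas


def peak_provisioning(demands):
    """Return (sum of per-client peaks, peak of the aggregate demand).

    `demands` is a list of equally long demand curves, one per client.
    """
    separate = sum(max(curve) for curve in demands)
    shared = max(sum(slot) for slot in zip(*demands))
    return separate, shared


# Hypothetical demand curves for two clients whose peaks do not coincide.
day_client = [10, 80, 90, 20]
night_client = [85, 15, 10, 70]
separate, shared = peak_provisioning([day_client, night_client])

print(availability(0.1, 1), availability(0.1, 3))
print(separate, shared)
```

With these invented numbers, a provider serving both clients from one shared pool needs capacity for the aggregate peak only, which is lower than the sum of the two individual peaks. The gain shrinks as the demand curves become more correlated.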
Example (I): Cooperative Web serving. Problem: flash crowds! [Figure labels: Browser; Resolver; DNS Query; dnssrv; Origin Server; Other Server; www.matei.com; 216.165.108.10.]
Example (I): Cooperative Web serving. DNS redirection: return a proxy, preferably one near the client. Cooperative Web caching: fetch data from nearby. [Figure labels: Browser; Resolver; dnssrv; httpprx (two instances); Origin Server; akamai.cnn.com; 216.165.108.10.]
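A minimal sketch of the DNS-redirection idea, assuming the redirector already holds a latency estimate from each proxy to each client region. The proxy addresses, region names, and latency values are hypothetical; only the origin address comes from the slide.

```python
ORIGIN = "216.165.108.10"  # origin server address shown on the slide


def redirect(client_region, proxy_latency_ms, alive, origin=ORIGIN):
    """Return the live proxy with the lowest estimated latency to the client.

    Falls back to the origin server when no live proxy covers the region.
    """
    candidates = {
        proxy: latency[client_region]
        for proxy, latency in proxy_latency_ms.items()
        if proxy in alive and client_region in latency
    }
    if not candidates:
        return origin
    return min(candidates, key=candidates.get)


# Hypothetical proxies (documentation address range) and latency estimates.
proxy_latency_ms = {
    "192.0.2.10": {"vancouver": 8, "boston": 70},
    "192.0.2.20": {"vancouver": 65, "boston": 6},
}

print(redirect("vancouver", proxy_latency_ms, alive={"192.0.2.10", "192.0.2.20"}))
print(redirect("vancouver", proxy_latency_ms, alive={"192.0.2.20"}))
print(redirect("vancouver", proxy_latency_ms, alive=set()))
```

The fallback to the origin server is a design choice of this sketch; a real deployment could instead return several candidates and let the client retry.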
Example (II): Server consolidation. ibm.com external site (2001): daily fluctuations (3x); workday cycle; weekends off. [Figure: load over one week, M T W Th F S S.] Light load: concentrate load on a minimal set of servers; step down surplus servers to a low-power state; activate surplus servers on demand. Optimization: place workload to optimize cooling efficiency.
Power draw by state (work: watts): CPU idle: 93 W; CPU max: 120 W; boot: 136 W; disk spin: 6-10 W; off/hibernate: 2-3 W.
Idling consumes 60% to 70% of peak power demand.
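A back-of-the-envelope sketch of why consolidation saves power, using the idle, max, and off/hibernate figures from the slide. The linear interpolation between idle and max power, the 2.5 W midpoint for the off state, and the function names are assumptions made for illustration.

```python
import math

IDLE_W = 93.0   # CPU idle, from the slide
MAX_W = 120.0   # CPU max, from the slide
OFF_W = 2.5     # assumed midpoint of the slide's 2-3 W off/hibernate range


def server_power(utilization: float) -> float:
    """Assumed linear power model between idle and fully loaded."""
    return IDLE_W + (MAX_W - IDLE_W) * utilization


def power_all_on(n_servers: int, load: float) -> float:
    """Spread `load` (in units of one server's capacity) evenly over all servers."""
    return n_servers * server_power(load / n_servers)


def power_consolidated(n_servers: int, load: float) -> float:
    """Pack the load onto as few servers as possible and power down the rest."""
    active = max(1, math.ceil(load))
    return active * server_power(load / active) + (n_servers - active) * OFF_W


print(power_all_on(10, 2.0))
print(power_consolidated(10, 2.0))
```

For a light load worth two servers on a ten-server cluster, this model gives roughly 984 W with every server on against 260 W when consolidated. The sketch ignores boot cost (136 W on the slide) and the latency of reactivating servers.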
Dynamic Provisioning. Static provisioning dedicates resources: typical of “co-lo” hosting; reprovision manually as needed. But load is dynamic: must overprovision for surges; high variable cost of capacity. Need dynamic provisioning to achieve true economies of scale: load multiplexing; tradeoff of cost vs. quality; service level agreements; dynamic resource acquisition.
Power Management via MUSE: IBM Trace Run (Before). [Figure: power draw (watts), latency (ms*50), and throughput (requests/s) over the trace run; annotation: 1 ms.] MUSE: Jeff Chase et al., Duke University (SOSP 2001)
Power Management via MUSE: IBM Trace Run (After). [Figure: trace-run plot; annotation: 1 ms.] MUSE: Jeff Chase et al., Duke University (SOSP 2001)
Mechanisms (II). To detect anomalies and to generate good statistics: use multiple views. Example: Web-server performance characterization. To provide anonymity: use a large number of independent components (“hide in the crowd”) and make search impossible (or at least costly). Example: onion routing.
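One possible reading of "use multiple views", sketched for the Web-server example: collect the same measurement from several vantage points, summarize it with the median, and flag vantage points that deviate strongly. The vantage-point names, the sample values, and the 3x threshold are all hypothetical.

```python
from statistics import median


def characterize(samples_ms, factor=3.0):
    """Summarize page-load times seen from several vantage points.

    Returns the median and the vantage points whose measurement exceeds
    `factor` times the median (candidates for a local anomaly).
    """
    typical = median(samples_ms.values())
    outliers = sorted(v for v, t in samples_ms.items() if t > factor * typical)
    return typical, outliers


# Hypothetical page-load times (ms) for one site, seen from five locations.
samples_ms = {
    "vancouver": 420,
    "toronto": 450,
    "london": 480,
    "tokyo": 510,
    "sydney": 2900,
}

typical, outliers = characterize(samples_ms)
print(typical, outliers)
```

A single vantage point in "sydney" would report a badly performing site, and a single one in "vancouver" would miss the problem entirely; only the combined view separates a site-wide slowdown from a regional one.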
Roadmap: Definitions; Impact; Uses and Examples; Mechanisms; A case study: File sharing: The Gnutella Network & BitTorrent
Basic Primitives for File Sharing Join: How do I begin participating? Publish: How do I advertise my file(s)? Search: How do I find a file? Fetch: How do I retrieve a file? Lots of different solutions for each of these four primitives.
What makes these systems interesting? Large scale. Self-organizing networks. Fast growth: Gnutella grew more than 50x during the first half of 2001, and 50x again from 2001 to 2006. Open architecture; simple and flexible protocols. Interesting mix of social and technical issues.
Gnutella search mechanism
[Figure labels: Gnutella nodes (UBC, Calgary, Chicago, Boston, MIT); TCP overlay tunnels; Routers; Q: Beatles; Beatles: Yellow Submarine.]
Search steps: (1) A node initiates a search for “Yellow Submarine”. (2) It sends the message to all its neighbors. (3) Neighbors forward the message. (4) A node that has the file initiates a reply message. (5) The reply message is back-propagated. (6) File download.
A brief explanation of how the Gnutella network works: Gnutella nodes set up TCP tunnels to other existing Gnutella nodes, and all messages are forwarded on this overlay. If a node at UBC is looking for a Beatles album, it creates a query message, and the query is “flooded” into the network. We have built tools to extract the topology of the Gnutella overlay and to intercept the traffic.
Gnutella: Overview. Join: on startup, the client contacts a few other nodes; these become its “neighbors”. Publish: no need. Search: flooding: pass the query to neighbors, who in turn pass it to their own neighbors, and so on; back-propagation in case of success. Fetch: get the file directly from the peer (HTTP). [Note: this was the original design. Later the network moved to a two-layer structure.]
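The flooding search with back-propagation can be sketched as a breadth-first traversal of the overlay. This is a simplified simulation, not the Gnutella wire protocol: the overlay links between the five sites, the hop limit `ttl`, and the function name are assumptions made for illustration.

```python
from collections import deque


def flood_search(neighbors, shared, origin, keyword, ttl=4):
    """Flood a query over the overlay.

    Returns {node_with_match: reply path from that node back to the origin}.
    """
    parent = {origin: None}      # who first delivered the query to each node
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        node, hops_left = queue.popleft()
        files = shared.get(node, [])
        if node != origin and any(keyword.lower() in f.lower() for f in files):
            # Back-propagate the reply along the reverse of the query path.
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            hits[node] = path
        if hops_left == 0:
            continue             # hop limit reached: do not forward further
        for nb in neighbors[node]:
            if nb not in parent:  # drop duplicates of an already-seen query
                parent[nb] = node
                queue.append((nb, hops_left - 1))
    return hits


# Hypothetical overlay links between the sites named in the slide.
overlay = {
    "UBC": ["Calgary", "Chicago"],
    "Calgary": ["UBC", "Chicago"],
    "Chicago": ["UBC", "Calgary", "Boston"],
    "Boston": ["Chicago", "MIT"],
    "MIT": ["Boston"],
}
shared = {"MIT": ["Beatles: Yellow Submarine"]}

found = flood_search(overlay, shared, "UBC", "Yellow Submarine", ttl=4)
missed = flood_search(overlay, shared, "UBC", "Yellow Submarine", ttl=2)
print(found, missed)
```

The second call shows the main weakness of flooding with a hop limit: content that exists in the network but sits beyond the limit is never found, while raising the limit multiplies the message load.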
BitTorrent Ingredients. A “seed” node that has the file. A “.torrent” meta-file is built for the file. A web server (usually) to index torrents. A “tracker” node is associated with each file; it is identified in the .torrent. The file is split into fixed-size segments (e.g., 256KB).
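A minimal sketch of how a file could be split into fixed-size segments and described by a meta-file that names the tracker. The dictionary layout, the field names, and the tracker URL are hypothetical and much simpler than a real .torrent file; the per-segment SHA-1 digest is included so a downloader can check each segment independently.

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # 256KB, the example size from the slide


def split_into_segments(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Cut `data` into fixed-size segments; the last one may be shorter."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_metafile(name, data, tracker_url, segment_size=SEGMENT_SIZE):
    """Build a simplified, hypothetical stand-in for a .torrent meta-file."""
    segments = split_into_segments(data, segment_size)
    return {
        "name": name,
        "tracker": tracker_url,
        "length": len(data),
        "segment_size": segment_size,
        "hashes": [hashlib.sha1(s).hexdigest() for s in segments],
    }


def verify_segment(meta, index, segment: bytes) -> bool:
    """Check a downloaded segment against the digest in the meta-file."""
    return hashlib.sha1(segment).hexdigest() == meta["hashes"][index]


data = bytes(600 * 1024)  # 600KB of zero bytes standing in for a real file
segments = split_into_segments(data)
meta = build_metafile("demo.bin", data, "http://tracker.example/announce")
print(len(segments), len(segments[-1]), verify_segment(meta, 2, segments[2]))
```

Because each segment is verified on its own, a downloader can accept segments from untrusted peers in any order and discard only the ones that fail the check.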
How does it work / Overview – system components
[Figure, repeated across the animation frames: a web page with a link to the .torrent on a Web Server; a Tracker; peers A, B, and C (labelled Peer, [Seed], [Leech]); and the Downloader (“US”).]
Messages highlighted frame by frame: .torrent; Get-announce; Response-peer list; Shake-hand; pieces.
BitTorrent: Overview. Join: nothing; just find a server/community. Publish: create a ‘tracker’ and spread the .torrent file. Search: for a file: not included in the protocol; the community is supposed to provide search tools. For segments: exchange segment-ID maps with other peers. Fetch: exchange segments with other peers (HTTP).
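The announce, peer-list, and segment-exchange steps can be sketched as a small in-memory simulation. Everything here is a simplification under stated assumptions: the `Tracker` class, the one-segment-per-peer-per-round rule, and the initial segment maps are invented for illustration and do not reproduce the real protocol's messages or piece-selection policy.

```python
class Tracker:
    """Hypothetical tracker: remembers who announced and returns the others."""

    def __init__(self):
        self.peers = []

    def announce(self, peer):
        others = [p for p in self.peers if p != peer]
        if peer not in self.peers:
            self.peers.append(peer)
        return others


def segments_to_request(my_map, peer_map):
    """Indices of segments the other peer has and we still lack."""
    return [i for i, (mine, theirs) in enumerate(zip(my_map, peer_map))
            if theirs and not mine]


def exchange_round(maps):
    """One round: every peer fetches at most one segment from each other peer.

    Availability is judged from a snapshot taken at the start of the round.
    """
    snapshot = {peer: list(m) for peer, m in maps.items()}
    for peer, my_map in maps.items():
        for other, other_map in snapshot.items():
            if other == peer:
                continue
            wanted = segments_to_request(my_map, other_map)
            if wanted:
                my_map[wanted[0]] = True
    return maps


tracker = Tracker()
maps = {
    "seed": [True, True, True, True],
    "A": [True, False, False, False],
    "B": [False, True, False, False],
}
for name in maps:
    tracker.announce(name)

peer_list = tracker.announce("US")       # Get-announce / Response-peer list
maps["US"] = [False, False, False, False]

rounds = 0
while not all(all(m) for m in maps.values()):
    exchange_round(maps)
    rounds += 1
print(peer_list, rounds)
```

In the first round the new downloader obtains one segment from the seed and another from leech B, which illustrates the point of the design: leeches serve each other, so the seed is not the only source of data.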
Gnutella vs. BitTorrent: Discussion. System properties: Reliability? Scalability? Fairness? Overheads? Quality of service: Search coverage for content? Ability to download content fast? Ability to survive flash crowds? The rest of this course: how to build (distributed) systems with desirable characteristics.
Assignment 0. To do: subscribe to the mailing list.
Gnutella: Network Resilience. [Figure panels: Topology; Random, 30% die; Targeted, 4% die.] From Saroiu et al., MMCN 2002
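The contrast between random and targeted failures can be illustrated on a toy overlay. The hub-and-leaf graph below is a deliberately extreme, hypothetical topology and not the measured Gnutella graph; it only shows how one could compare the largest connected component after removing 30% of the nodes at random with removing the 4% best-connected nodes.

```python
import random
from collections import deque


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def hub_and_leaf_overlay(hubs=4, leaves_per_hub=24):
    """Hypothetical overlay: fully meshed hubs, each serving its own leaves."""
    hub_ids = [f"hub{i}" for i in range(hubs)]
    adj = {h: {x for x in hub_ids if x != h} for h in hub_ids}
    for h in hub_ids:
        for j in range(leaves_per_hub):
            leaf = f"{h}-leaf{j}"
            adj[leaf] = {h}
            adj[h].add(leaf)
    return adj


adj = hub_and_leaf_overlay()                      # 100 nodes in total
random_victims = random.Random(0).sample(sorted(adj), 30)
targeted_victims = sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:4]

random_lcc = largest_component(adj, random_victims)
targeted_lcc = largest_component(adj, targeted_victims)
print(random_lcc, targeted_lcc)
```

In this toy graph, removing the four hubs leaves only isolated nodes, whereas a random 30% removal typically keeps most surviving nodes connected. Real overlays are less extreme, so the numbers here should not be read as measurements.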
Gnutella: Query distribution. Highly heterogeneous distribution of query popularity, similar to Web-page popularity, so caching will work well. From Kunwadee et al., 2002
Gnutella: Topology issues (1). [Figure labels: 56 kbps modem; 10 Mbps LAN; 1.5 Mbps DSL.]
Gnutella Topology Mismatch