EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)
Matei Ripeanu, http://www.ece.ubc.ca/~matei



Today’s Objectives
- Class mechanics: http://www.ece.ubc.ca/~matei/EECE411/
- Understand real-world applications in terms of:
  - Motivation and objectives
  - Resource requirements: compute/storage/network resources
  - Architecture (the “distributed systems” part)
- Examples: recent P2P applications
- Start thinking of computer networks from the perspective of a networked application. Why? It is more intuitive.

P2P Definition(s)
- Def 1: “A class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet.”
  - Edges are often turned on/off and have no permanent IP addresses.
- Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”
- Lots of other definitions fit in between.
- Lots of (P2P?) systems fit nowhere …

Notes on Def 1:
- Emphasis: what resources are integrated.
- Problem: it is vague what one means by “edges”. A core-network person would consider everything except routers and wires to be sitting at the edges of the network.
- Example: SETI@home

Notes on Def 2:
- Emphasis: how resources are integrated, i.e., the architectural/organizational solution chosen to integrate them.
- Problem: quite restrictive. (1) Most people use the term P2P for many applications that do not fit this definition; (2) applications such as Gnutella that used to fit this definition are moving away from it. It is also vague: what counts as “most communication is symmetric”?
- Example: Gnutella, DHTs (CAN, Chord, Tapestry, Pastry)

P2P Impact: Widespread adoption
Skype (Q2 ’10): 560M registered users, 120M active, 8M paying, 15M users online.
Number of users for file-sharing applications (estimate: www.slyck.com, Sept ’06):
  eDonkey            3,108,909
  FastTrack (Kazaa)  2,114,120
  Gnutella           2,899,788
  Overnet              691,750
  Filetopia              3,405
P2P design techniques are now mainstream!

P2P Impact (2): Huge resource users
- P2P-generated traffic now dominates the Internet load (30-50% of the traffic)
- Internet2 traffic statistics
- Cornell.edu (March ’02): 60% P2P

P2P Impact (3): Demonstrate that volatile, small, non-proprietary resources can be efficiently harnessed
- Resources: CPU, storage space
- But also: network bandwidth, availability, user attention and expertise
[Figure: BOINC statistics]

P2P Impact (4): Social / Business
- Data distribution at (almost) zero cost
- Forces companies to change their business models
  - Digital content production and distribution
  - Telecommunications companies
- New collaboration models
  - Crowd-sourcing!

Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A case study

Applications: Number crunching
Examples: Folding@Home, UnitedDevices, etc.
Characteristics (e.g., Folding@Home):
- Massive parallelism
- Low bandwidth/computation ratio
- Error tolerance
- Users do donate *real* resources
Problems:
- Centralized. Does it scale?
- Cheating!
- The approach suits a particular class of problems. How can the model be extended to problems that are not massively parallel?
Cost to donors: $1.5M per year in extra consumed power.

Applications: Content distribution (files, video)
- The “killer application” to date
- Too many to list them all: BitTorrent, FastTrack (KaZaA, KazaaLite, iMesh), Gnutella (LimeWire, BearShare)
- Two independent problems:
  - Distributed index
  - Fast content download
- Environment: unreliable, non-cooperative

Applications: Performance evaluation
- Poor online performance costs businesses $25 billion per year (Zona Research)
- 28% of attempted online purchases fail (BCG)
- Slow page download is the primary reason for transaction abandonment
- Business transactions are at particular risk
- User expectations for page download are around 4 seconds
- Performance evaluation and monitoring require multiple vantage points:
  - Connectivity statistics
  - Routing errors
  - Evaluating Web-site performance from the end-user perspective

Measurements: The Performance “Blind Spot”
[Diagram labels: Back-end Infrastructure (Web server, App server, Database, Firewall, 3rd party content); Network Landscape (ISP, Backbone, Major Provider, Regional Network, Local ISP, Enterprise Provider); Last-mile “Blind Spot” (Corporate User on a T1 Corporate Network, Consumer User); Component Testing; Datacenter Testing “Beacon”.]
Datacenter monitoring vendors shown: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates.
Other vendors shown: Keynote Systems, Mercury Interactive, BMC/SiteAngel, Service Metrics.
Critical to estimate end-to-end performance.

Notes: Internet latency exists, and online businesses must measure it. Forrester, IDC, and Gartner Research all say that many online customers will click out of your site or off a page if its content does not download within 5-8 seconds. 50-80% of the Internet latency affecting your customers occurs in the “last mile” (Research and IDC). End-to-end web performance testing measures your site's quality of service from your customer's browser to your origin server. Porivo’s distributed technology provides an active performance testing service that delivers true end-to-end web performance testing.
Slide source: www.porivo.com

Measurements: End-to-end Performance
[Diagram labels: Back-end Infrastructure (Web server, App server, Database, Firewall, 3rd party content); Network Landscape (ISP, Backbone, Major Provider, Regional Network, Local ISP, Enterprise Provider); Corporate User on a T1 Corporate Network; Consumer User; Component Testing; Datacenter Monitoring; End-to-end Web Performance Testing.]
Slide source: www.porivo.com

More applications …
- Backup storage (HiveNet, OceanStore)
- Collaborative environments
- Spam filtering
- Anonymous email
- Censorship-resistant publishing systems (Eternity, Freenet)

Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A case study

Mechanisms (I)
To obtain a resilient system:
- use redundancy for data and services
- integrate multiple components with uncorrelated failure curves
To reduce cost and improve the QoS delivered:
- move service delivery closer to the user
- integrate multiple clients with uncorrelated demand curves (lower over-provisioning at resource providers)
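The redundancy argument can be made concrete with a little arithmetic. This is a minimal sketch, assuming that replicas fail independently (the "uncorrelated failure curves" condition) and that each one is up with the same probability; the function name and the sample numbers are illustrative and do not come from the slides.

```python
def availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # All replicas are down at once with probability (1 - a) ** n,
    # but only if failures are uncorrelated (the assumption made here).
    return 1.0 - (1.0 - node_availability) ** replicas


# Three replicas on nodes that are each up only half of the time.
three_flaky_copies = availability(0.5, 3)
```

With correlated failures (for example, all replicas in one machine room) the product rule no longer holds, which is why the slide stresses uncorrelated components.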

Example (I): Cooperative Web serving
Problem: flash crowds!
[Diagram labels: Browser, Resolver, DNS Query, dnssrv, Origin Server (www.matei.com), Other Server; the name www.matei.com resolves to 216.165.108.10.]

Example (I): Cooperative Web serving
Cooperative Web caching:
- DNS redirection: return a proxy, preferably one near the client
- Fetch data from nearby
[Diagram labels: Browser, Resolver, dnssrv, httpprx (two proxies), Origin Server; the name akamai.cnn.com resolves to 216.165.108.10.]
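The DNS redirection step can be sketched as a selection function. This is only an illustration under stated assumptions: "near" is approximated by a region label, load is a number in [0, 1], and every name and address below (`pick_proxy`, `PROXIES`, `ORIGIN`, the 192.0.2.x documentation addresses) is hypothetical rather than taken from any real CDN.

```python
def pick_proxy(client_region, proxies, origin):
    """Return the address a redirecting DNS server might hand to a client.

    proxies: dict mapping proxy address -> {"region": str, "load": float}
    origin:  address of the origin server, used as a fallback
    """
    in_region = [
        (info["load"], address)
        for address, info in proxies.items()
        if info["region"] == client_region and info["load"] < 1.0
    ]
    if in_region:
        return min(in_region)[1]  # least-loaded proxy in the client's region
    return origin                 # no usable nearby proxy: go to the origin


PROXIES = {
    "192.0.2.10": {"region": "west", "load": 0.7},
    "192.0.2.11": {"region": "west", "load": 0.2},
    "192.0.2.20": {"region": "east", "load": 0.1},
}
ORIGIN = "192.0.2.1"
```

During a flash crowd the origin only sees requests from clients that have no usable proxy nearby, which is the point of the cooperative scheme.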

Example (II): Server consolidation
ibm.com external site (2001) [Figure: load over one week, M T W Th F S S]
- Daily fluctuations (3x)
- Workday cycle
- Weekends off
Approach:
- Light load: concentrate load on a minimal set of servers
- Step down surplus servers to a low-power state
- Activate surplus servers on demand
- Optimization: place workload to optimize cooling efficiency
Power draw per server:
  work        watts
  CPU idle    93 W
  CPU max     120 W
  boot        136 W
  disk spin   6-10 W
  off/hib     2-3 W
Idling consumes 60% to 70% of peak power demand.
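A back-of-the-envelope calculation shows why concentrating load pays off. The sketch below uses the idle, max, and off/hibernate figures from the slide; the linear interpolation between idle and max power, the function names, and the ten-server scenario are assumptions made for illustration.

```python
import math

IDLE_W = 93.0   # per-server draw at idle (from the slide)
MAX_W = 120.0   # per-server draw at full CPU load (from the slide)
OFF_W = 3.0     # off/hibernate draw; the slide gives 2-3 W, upper bound used


def server_power(utilization: float) -> float:
    """Assumed linear power model between idle and fully loaded."""
    return IDLE_W + (MAX_W - IDLE_W) * utilization


def spread_power(servers: int, load: float) -> float:
    """All servers stay on; load (in units of one server's capacity) is spread evenly."""
    if load > servers:
        raise ValueError("load exceeds total capacity")
    return servers * server_power(load / servers)


def consolidated_power(servers: int, load: float) -> float:
    """Load is packed onto the fewest servers; the surplus servers hibernate."""
    if load > servers:
        raise ValueError("load exceeds total capacity")
    active = max(1, math.ceil(load))
    return active * server_power(load / active) + (servers - active) * OFF_W
```

For ten servers carrying two servers' worth of load, `spread_power(10, 2.0)` is about 984 W while `consolidated_power(10, 2.0)` is about 264 W under these assumptions. The model ignores the 136 W boot spike, so it overstates savings when servers are switched on and off frequently.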

Dynamic Provisioning
- Static provisioning dedicates resources
  - Typical of “co-lo” hosting
  - Reprovision manually as needed
- But load is dynamic
  - Must overprovision for surges
  - High variable cost of capacity
- Need dynamic provisioning to achieve true economies of scale
  - Load multiplexing
  - Tradeoff cost vs. quality
  - Service level agreements
  - Dynamic resource acquisition

Power Management via MUSE: IBM Trace Run (Before)
[Figure: power draw (watts), latency (ms*50), and throughput (requests/s) over the trace; annotation: 1 ms]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

Power Management via MUSE: IBM Trace Run (After)
[Figure: trace run with MUSE power management enabled; annotation: 1 ms]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

Mechanisms (II)
To detect anomalies and to generate good statistics:
- use multiple views
- Example: Web-server performance characterization
To provide anonymity:
- use a large number of independent components (“hide in the crowd”) and make search impossible (or at least costly)
- Example: onion routing
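To illustrate the "multiple views" idea, here is a minimal sketch that compares page-load times reported by several vantage points and flags the ones that disagree with the rest. The median/MAD rule, the threshold factor, and the vantage-point names are assumptions chosen for the example, not a method given in the course.

```python
import statistics


def flag_anomalies(samples, factor=3.0):
    """Return vantage points whose measurement is far from the group median.

    samples: dict mapping vantage point -> measured page-load time (seconds)
    A point is flagged when it deviates from the median by more than
    `factor` times the median absolute deviation (MAD).
    """
    values = list(samples.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return [vp for vp, v in samples.items() if v != median]
    return [vp for vp, v in samples.items() if abs(v - median) > factor * mad]


# Hypothetical measurements of the same page from four locations.
SAMPLES = {"vancouver": 1.1, "chicago": 1.3, "boston": 1.2, "calgary": 9.0}
```

A single vantage point could not tell whether a 9-second load time is the server's fault or a local network problem; with several views the outlier stands out.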

Roadmap
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- A case study: file sharing with the Gnutella network and BitTorrent

Basic Primitives for File Sharing
- Join: How do I begin participating?
- Publish: How do I advertise my file(s)?
- Search: How do I find a file?
- Fetch: How do I retrieve a file?
There are many different solutions for each of these four primitives.

What makes these systems interesting?
- Large-scale, self-organizing networks
- Fast growth: Gnutella grew more than 50x during the first half of 2001, and 50x again from 2001 to 2006
- Open architecture, simple and flexible protocols
- Interesting mix of social and technical issues

Gnutella search mechanism
[Diagram labels: Gnutella nodes (UBC, Calgary, Chicago, Boston, MIT), TCP overlay tunnels, Routers; query “Q: Beatles”; file “Beatles: Yellow Submarine”.]
Search steps:
1. A node initiates a search for “Yellow Submarine”
2. It sends the query message to all its neighbors
3. Neighbors forward the message
4. A node that has the file initiates a reply message
5. The reply message is back-propagated
6. File download

Notes: Briefly, this is how the Gnutella network works. Gnutella nodes set up TCP tunnels to other existing Gnutella nodes, and all messages are forwarded over this overlay. If a node at UBC is looking for a Beatles album, it creates a query message, and the query is “flooded” into the network. We have built tools to extract the topology of the Gnutella overlay and to intercept the traffic.

Gnutella: Overview
- Join: on startup, the client contacts a few other nodes; these become its “neighbors”
- Publish: no need
- Search: flooding. Pass the query to neighbors, who in turn pass it to their own neighbors, and so on. Back-propagation in case of success.
- Fetch: get the file directly from the peer (HTTP)
[Note: this was the original design. Later the network moved to a two-layer structure.]
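The flooding search with back-propagation can be simulated in a few lines. This is a simplified sketch of the original flat design, not the Gnutella wire protocol: the overlay edges, the shared-file table, and the default TTL of 7 are assumptions made for the example, and messages are modeled as a breadth-first traversal rather than real TCP traffic.

```python
from collections import deque


def flood_search(overlay, files, origin, keyword, ttl=7):
    """Flood a query over the overlay and return {hit_node: reply_path}.

    overlay: dict mapping node -> list of neighbor nodes
    files:   dict mapping node -> set of file names the node shares
    Each reply path runs from the hit node back to the origin, following
    in reverse the links over which the query first arrived.
    """
    parent = {origin: None}            # who first forwarded the query to each node
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        shared = files.get(node, ())
        if node != origin and any(keyword.lower() in f.lower() for f in shared):
            path, hop = [], node
            while hop is not None:     # back-propagate along the query path
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue                   # TTL exhausted: do not forward further
        for neighbor in overlay.get(node, ()):
            if neighbor not in parent: # a node drops queries it has already seen
                parent[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay using the node names from the slide's diagram.
EXAMPLE_OVERLAY = {
    "UBC": ["Calgary"],
    "Calgary": ["UBC", "Chicago"],
    "Chicago": ["Calgary", "Boston", "MIT"],
    "Boston": ["Chicago", "MIT"],
    "MIT": ["Chicago", "Boston"],
}
EXAMPLE_FILES = {"MIT": {"Beatles - Yellow Submarine.mp3"}}
```

In this overlay, a query from UBC reaches MIT after three hops, so a TTL of 2 finds nothing. That is the trade-off flooding makes: a larger TTL improves search coverage at the price of more messages.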

BitTorrent Ingredients
- A “seed” node that has the file
- A “.torrent” meta-file is built for the file
- A web server (usually) to index torrents
- A “tracker” node is associated with each file; it is identified in the .torrent
- The file is split into fixed-size segments (e.g., 256 KB)
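Splitting a file into fixed-size segments is easy to sketch. The example below also computes a SHA-1 digest per segment, since the BitTorrent metafile stores per-piece SHA-1 hashes so that downloaders can verify each segment; the function name is hypothetical and this is not a .torrent encoder.

```python
import hashlib
import io

SEGMENT_SIZE = 256 * 1024  # 256 KB, the example segment size from the slide


def segment_hashes(stream, segment_size=SEGMENT_SIZE):
    """Return one SHA-1 hex digest per fixed-size segment of a binary stream.

    The last segment may be shorter than `segment_size`.
    """
    digests = []
    while True:
        chunk = stream.read(segment_size)
        if not chunk:
            break
        digests.append(hashlib.sha1(chunk).hexdigest())
    return digests


# Usage with an in-memory "file" of two full segments plus ten bytes.
example_digests = segment_hashes(io.BytesIO(b"a" * (2 * SEGMENT_SIZE + 10)))
```

For a file on disk, the same function can be called with `open(path, "rb")`.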

How does it work
[Diagram labels: Web Server, Web page with link to .torrent, .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech].]

Overview – system components
[Diagram labels: Web Server, Web page with link to .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech]. Message shown: Get-announce.]

Overview – system components
[Diagram labels: Web Server, Web page with link to .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech]. Message shown: Response-peer list.]

Overview – system components
[Diagram labels: Web Server, Web page with link to .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech]. Message shown: Shake-hand.]

Overview – system components
[Diagram labels: Web Server, Web page with link to .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech]. Message shown: pieces.]


Overview – system components
[Diagram labels: Web Server, Web page with link to .torrent, Tracker, Peers A, B, C, Downloader “US”, [Seed], [Leech]. Messages shown: Get-announce, Response-peer list, pieces.]

BitTorrent: Overview
- Join: nothing; just find a server/community
- Publish: create a “tracker” and spread the .torrent file
- Search:
  - for a file: not included in the protocol; the community is supposed to provide search tools
  - for segments: exchange segment-ID maps with other peers
- Fetch: exchange segments directly with other peers (BitTorrent peer protocol over TCP; HTTP is used to contact the tracker)
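The segment-map exchange can be illustrated with a small planning function: given the segment IDs we hold and the maps advertised by other peers, decide which segment to request from whom. The rarest-first ordering is an assumption added for the example (the slide only says that peers exchange segment-ID maps), and the function and peer names are hypothetical.

```python
def plan_requests(have, peer_maps):
    """Return (segment_id, peer) pairs for missing segments, rarest first.

    have:      set of segment IDs we already hold
    peer_maps: dict mapping peer -> set of segment IDs that peer advertises
    """
    holders = {}
    for peer, segments in peer_maps.items():
        for seg in segments - have:
            holders.setdefault(seg, []).append(peer)
    # Fewest holders first; ties are broken by segment ID for determinism.
    order = sorted(holders, key=lambda seg: (len(holders[seg]), seg))
    # Pick the alphabetically first holder; a real client would also weigh
    # peer bandwidth and reciprocity, which this sketch ignores.
    return [(seg, sorted(holders[seg])[0]) for seg in order]


# We hold segment 0; peers A, B, C advertise these segment maps.
EXAMPLE_PLAN = plan_requests({0}, {"A": {0, 1, 2}, "B": {1}, "C": {1, 3}})
```

Segments held by a single peer are requested before the widely replicated segment 1, so scarce segments spread before their only holder leaves.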

Gnutella vs. BitTorrent: Discussion
System properties:
- Reliability?
- Scalability?
- Fairness?
- Overheads?
Quality of service:
- Search coverage for content?
- Ability to download content fast?
- Ability to survive flash crowds?
The rest of this course: how to build (distributed) systems with desirable characteristics.

Assignment 0
To do: subscribe to the mailing list


Gnutella -- Network Resilience
[Figure panels: Topology; Random (30% die); Targeted (4% die). From Saroiu et al., MMCN 2002.]

Gnutella: Query distribution
- Highly heterogeneous distribution of query popularity, similar to Web page popularity, so caching will work well
(from Kunwadee et al., 2002)
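Why a skewed popularity distribution favors caching can be shown with a small calculation. The sketch assumes a Zipf-like model in which the i-th most popular query has weight 1 / i^alpha; this model, the parameter values, and the function name are illustrative assumptions, not measurements from the cited study.

```python
def zipf_hit_ratio(distinct_queries: int, cache_size: int, alpha: float = 1.0) -> float:
    """Fraction of requests answered by caching the `cache_size` most popular queries.

    Assumes the i-th most popular query is requested with weight 1 / i**alpha.
    """
    if not 0 <= cache_size <= distinct_queries:
        raise ValueError("cache_size must be between 0 and distinct_queries")
    weights = [1.0 / (rank ** alpha) for rank in range(1, distinct_queries + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Caching the top 10% of 1000 distinct queries under the assumed model.
skewed = zipf_hit_ratio(1000, 100, alpha=1.0)
# The same cache when every query is equally popular (alpha = 0).
uniform = zipf_hit_ratio(1000, 100, alpha=0.0)
```

Under the skewed model the top 10% of queries account for well over half of all requests, while under uniform popularity the same cache answers only 10% of them.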

Gnutella: Topology issues (1)
[Diagram labels: 56 kbps modem, 10 Mbps LAN, 1.5 Mbps DSL]

Gnutella Topology Mismatch