Peer to Peer Networking and Application


1 Peer to Peer Networking and Application
Hongfei Yan. Question: What is (wireless/computer) networking? Answer: In the world of computers, networking is the practice of linking two or more computing devices together to share data. Networks are built with a mix of computer hardware and computer software. (Refer to Matei Ripeanu’s slides.)

2 Today’s Objectives: Understand real-world applications in terms of:
motivation and objectives; requirements (compute, storage, and network resources); architecture (the “distributed systems” part); examples (recent P2P applications). Start thinking of computer networks from the perspective of a networked application: it is more intuitive.

3 P2P Definition(s). Def 1: “A class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet.” Edge nodes are often turned on and off and lack permanent IP addresses. Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” Lots of other definitions fit in between, and lots of (P2P?) systems fit nowhere.
Def 1 emphasizes which resources are integrated. Problem: it is vague what one means by ‘edges’. A core-network person would consider everything except routers and wires to sit on the edges of the network. Def 2 emphasizes how resources are integrated, i.e., the architectural/organizational solution chosen to integrate them. Problems: it is quite restrictive, because (1) most people use the term P2P for many applications that do not fit this definition, and (2) applications like Gnutella that used to fit it are moving away from it. It is also vague: what does “most communication is symmetric” mean? Examples: Gnutella, DHTs (CAN, Chord, Tapestry, Pastry).

4 P2P Impact: Widespread adoption
Skype: >300M users. KaZaA: 389 million downloads (1M/week), one of the most popular applications ever! Number of users of file-sharing networks (estimate, Sept. ’06):
eDonkey             3,108,909
FastTrack (Kazaa)   2,114,120
Gnutella            2,899,788
Overnet               691,750
Filetopia               3,405

5 P2P Impact (2): Huge resource users
P2P-generated traffic now dominates the Internet load. [Figure: Internet2 traffic statistics.] Cornell.edu (March ’02): 60% P2P. UChicago estimate (March ’01): Gnutella control traffic accounts for about 1% of all Internet traffic.

6 P2P Impact (3): Demonstrate that underused resources can be efficiently harnessed
Resources: CPU and storage space, but also network bandwidth, availability, user attention, and expertise. Statistics:
                            Total          Last 24 Hours
Users                       4,236,090      2,365
Results received            764M           1.13M
Total CPU time              1.3M years     1.3K years
Floating point operations   e+21           e+18 (51.40 TeraFLOPs/sec)
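A quick sanity check on the “Last 24 Hours” column shows what “efficiently harnessed” means in practice. If 1.3K CPU-years of work arrive in a single day, that is roughly the output of 475,000 CPUs running non-stop, and each contributes on average about 108 MFLOP/s. This sketch only redoes the arithmetic on the slide’s rounded figures. It assumes that the 51.40 TeraFLOPs/sec is the 24-hour average rate and that all donated CPU time counts toward it.

```python
# Rounded figures copied from the "Last 24 Hours" column of the table above.
CPU_YEARS_PER_DAY = 1.3e3  # "1.3 K years" of CPU time received in 24 hours
TOTAL_FLOPS = 51.40e12     # "51.40 TeraFLOPs/sec", assumed to be the 24 h average
DAYS_PER_YEAR = 365.25


def equivalent_machines(cpu_years_per_day: float) -> float:
    """CPUs that would have to run flat out for one day to deliver this CPU time."""
    return cpu_years_per_day * DAYS_PER_YEAR


def flops_per_machine(total_flops: float, machines: float) -> float:
    """Average sustained rate contributed by each equivalent machine."""
    return total_flops / machines


machines = equivalent_machines(CPU_YEARS_PER_DAY)
per_machine = flops_per_machine(TOTAL_FLOPS, machines)
print(f"~{machines:,.0f} always-on CPUs, ~{per_machine / 1e6:.0f} MFLOP/s each")
# prints: ~474,825 always-on CPUs, ~108 MFLOP/s each
```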

7 P2P Impact (4) – Social / Business
Data copying and distribution at (almost) zero cost might force companies to change their business models, notably in digital content production and distribution and among telecommunications companies. But what business models? P2P is great for ‘volunteer’ computing, yet its applicability to infrastructures is unclear.

8 Roadmap: Definitions; Impact; Applications; Mechanisms; An In-depth Case Study

9 Applications: Number crunching
Examples: Entropia, United Devices, etc. Characteristics: massive parallelism; low bandwidth/computation ratio; fixed-rate data processing task; error tolerance. Users do donate *real* resources: about $1.5M per year in extra consumed power. Problems: the approach is centralized (does it scale?); how do you prevent cheating?; it suits only a particular class of problems (how do you extend the model to problems that are not massively parallel?).
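The slide leaves “How to prevent cheating?” open. One commonly used answer is redundant computation: the server hands each work unit to several volunteers and accepts a result only when enough of them agree. The sketch below shows that idea under hypothetical assumptions (a toy “square the number” task, 3 replicas, a quorum of 2). It is not how Entropia or United Devices actually worked. Note the costs: the work is done several times over, and colluding cheaters who form a majority on a unit still win.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def assign_redundantly(units: List[int], volunteers: List[str], replicas: int) -> Dict[int, List[str]]:
    """Give every work unit to `replicas` distinct volunteers, round-robin."""
    if replicas > len(volunteers):
        raise ValueError("need at least as many volunteers as replicas")
    return {unit: [volunteers[(i + r) % len(volunteers)] for r in range(replicas)]
            for i, unit in enumerate(units)}


def validate(results: Dict[str, int], quorum: int) -> Optional[int]:
    """Accept a value only if at least `quorum` volunteers returned it."""
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes >= quorum else None


def run(units: List[int], volunteers: List[str], cheaters: Set[str],
        replicas: int = 3, quorum: int = 2) -> Dict[int, Optional[int]]:
    accepted: Dict[int, Optional[int]] = {}
    for unit, workers in assign_redundantly(units, volunteers, replicas).items():
        # Hypothetical task: square the unit. Cheaters skip the work and return 0.
        results = {w: (0 if w in cheaters else unit * unit) for w in workers}
        accepted[unit] = validate(results, quorum)
    return accepted


print(run([1, 2, 3, 4], ["a", "b", "c", "d", "e"], cheaters={"c"}))
# prints: {1: 1, 2: 4, 3: 9, 4: 16}
```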

10 Applications: File sharing & content distribution
The ‘killer application’ to date. There are too many to list them all: BitTorrent, Napster, FastTrack (KaZaA, KazaaLite, iMesh), Gnutella (LimeWire, Morpheus, BearShare). Two independent problems: a distributed index and fast content download. Environment: unreliable and non-cooperative.
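The two problems can be separated in code. The index answers “who holds file X?”. The download then spreads chunk requests over those holders and checks each chunk against a published digest, because in a non-cooperative environment some peers serve garbage. This is a minimal sketch: the index lives in one object instead of being distributed, chunks are fetched sequentially rather than in parallel, and the `Index` class, chunk size, and peer names are all hypothetical. A real system also needs a trustworthy source for the digests. BitTorrent, for example, puts piece hashes in its metadata file.

```python
import hashlib
from typing import Dict, List, Set

CHUNK_SIZE = 4  # bytes; tiny so the example stays readable


def chunk_hashes(data: bytes) -> List[str]:
    """SHA-256 digest of every fixed-size chunk of a file."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


class Index:
    """Problem 1, the index: file name -> chunk digests and peers claiming to hold it.

    Collapsed into one object here; a real system distributes this structure.
    """

    def __init__(self) -> None:
        self.holders: Dict[str, Set[str]] = {}
        self.digests: Dict[str, List[str]] = {}

    def publish(self, name: str, data: bytes, peer: str) -> None:
        self.holders.setdefault(name, set()).add(peer)
        self.digests[name] = chunk_hashes(data)


def download(name: str, index: Index, peers: Dict[str, Dict[str, bytes]]) -> bytes:
    """Problem 2, the download: spread chunk requests over holders, verify every chunk."""
    holders = sorted(index.holders[name])
    chunks = []
    for i, digest in enumerate(index.digests[name]):
        for k in range(len(holders)):  # rotate the starting peer to spread the load
            peer = holders[(i + k) % len(holders)]
            piece = peers[peer][name][i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
            if hashlib.sha256(piece).hexdigest() == digest:
                chunks.append(piece)
                break
        else:
            raise RuntimeError(f"no holder served a valid copy of chunk {i}")
    return b"".join(chunks)


original = b"yellow submarine"
idx = Index()
idx.publish("song.mp3", original, "p1")
idx.holders["song.mp3"].add("p2")                   # p2 claims to hold the file ...
peers = {"p1": {"song.mp3": original},
         "p2": {"song.mp3": b"X" * len(original)}}  # ... but serves garbage
print(download("song.mp3", idx, peers) == original)  # prints True
```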

11 Applications: Performance evaluation
Performance evaluation and monitoring require multiple measurement points: connectivity statistics, routing errors, and evaluating Web-site performance from the end-user perspective. Poor online performance costs businesses $25 billion per year (Zone Research); 28% of attempted online purchases fail (BCG). Slow page performance is the primary reason for transaction abandonment, and business transactions are at particular risk. “Eight seconds was kind of the threshold, but that’s a ridiculous notion now. The real expectations are around 4 seconds.”

12 Measurements: The Performance “Blind Spot”
[Figure: the path from the back-end infrastructure (web server, app server, database, 3rd-party content) across the network landscape (major-provider and ISP backbones, regional networks, local ISPs, an enterprise firewall and T1 link into a corporate network) to corporate and consumer users. Datacenter testing and component testing cover the back end; a testing “beacon” sits with the user; the last mile is a performance “blind spot”. Datacenter monitoring vendors: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates. Consumer-user measurement vendors: Keynote Systems, Mercury Interactive, BMC/SiteAngel, Service Metrics.]
Internet latency exists, and online businesses must measure it. Forrester, IDC, and Gartner Research all say that many online customers will click away from a site or page if its content does not download within 5-8 seconds. 50-80% of the Internet latency affecting customers occurs in the “last mile” (Research and IDC). End-to-end web performance testing measures a site’s quality of service from the customer’s browser to the origin server. Porivo’s distributed technology provides an active performance testing service that delivers true end-to-end web performance testing. Estimating end-to-end performance is critical. Slide source:

13 Measurements: End-to-end Performance
[Figure: the same landscape, with end-to-end web performance testing spanning from the consumer user across the network to the back-end infrastructure, alongside datacenter monitoring and component testing.] Slide source:

14 More applications …
Backup storage (HiveNet, OceanStore); instant messaging (Yahoo, AOL); collaborative environments; spam filtering; anonymous, censorship-resistant publishing systems (Eternity, Freenet).

15 Roadmap: Definitions; Impact; Uses and Examples; Mechanisms; An In-depth Case Study

16 Mechanisms (I) To obtain a resilient system:
integrate multiple components with uncorrelated failure curves; use replication for data and services; move the service ‘closer’ to the user.
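The reason the slide insists on *uncorrelated* failure curves is that the availability arithmetic only works under independence. If each replica is up with probability p and failures are independent, at least one of k replicas is up with probability 1 - (1 - p)^k. The sketch below computes that and the smallest k for a target availability. The 30% peer uptime is a hypothetical value. Correlated failures (a shared power feed, the same ISP) break this formula and make the real number worse.

```python
def availability(p_up: float, replicas: int) -> float:
    """P(at least one replica is up), assuming replicas fail independently."""
    return 1 - (1 - p_up) ** replicas


def replicas_needed(p_up: float, target: float) -> int:
    """Smallest number of independent replicas that reaches `target` availability."""
    if not 0 < p_up <= 1 or not 0 < target < 1:
        raise ValueError("need 0 < p_up <= 1 and 0 < target < 1")
    k = 1
    while availability(p_up, k) < target:
        k += 1
    return k


# Hypothetical edge peers that are online only 30% of the time.
print(replicas_needed(0.3, 0.99))  # prints 13
```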

17 Example: Content Delivery Networks
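[Figure: a browser asks its resolver for akamai.cnn.com; the DNS server performs DNS redirection and returns a proxy, preferably one near the client; the browser then fetches data from that nearby HTTP proxy. The proxies cooperate (cooperative Web caching), with the origin server behind them.]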
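The DNS-redirection step can be pictured as a lookup. Given the client that is asking, the DNS server answers with the closest proxy that is currently healthy, and falls back to the origin server if none is usable. This is a minimal sketch under assumptions. The RTT table, proxy names, and `redirect` function are hypothetical, and real CDNs combine many more signals (load, cost, network maps) than a single round-trip time.

```python
from typing import Dict, Set

# Hypothetical round-trip times (ms) from two client networks to three HTTP proxies.
RTT_MS: Dict[str, Dict[str, float]] = {
    "client-vancouver": {"proxy-seattle": 12.0, "proxy-chicago": 55.0, "proxy-boston": 80.0},
    "client-toronto": {"proxy-seattle": 70.0, "proxy-chicago": 20.0, "proxy-boston": 25.0},
}


def redirect(client: str, healthy: Set[str]) -> str:
    """DNS-redirection step: answer with the nearest healthy proxy for this client."""
    candidates = {p: rtt for p, rtt in RTT_MS.get(client, {}).items() if p in healthy}
    if not candidates:
        return "origin-server"  # no usable proxy: send the client to the origin
    return min(candidates, key=candidates.get)


all_up = {"proxy-seattle", "proxy-chicago", "proxy-boston"}
print(redirect("client-vancouver", all_up))                     # proxy-seattle
print(redirect("client-vancouver", all_up - {"proxy-seattle"}))  # proxy-chicago
print(redirect("client-toronto", set()))                        # origin-server
```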

18 Mechanisms (II): To improve the quality of the delivered service and reduce costs: integrate multiple providers with uncorrelated demand curves (which lowers over-provisioning at each of them); move service delivery closer to the user.
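The saving comes from a simple inequality. The peak of the combined load is never larger than the sum of the individual peaks, and it is much smaller when the peaks do not coincide. The sketch compares the capacity needed when two providers each provision for their own peak with the capacity needed when they pool resources. The two demand curves are hypothetical.

```python
from typing import List, Tuple


def capacity_needed(demands: List[List[int]]) -> Tuple[int, int]:
    """Return (sum of individual peaks, peak of the combined load)."""
    separate = sum(max(d) for d in demands)
    pooled = max(sum(slot) for slot in zip(*demands))
    return separate, pooled


# Hypothetical load (servers needed) over six time slots for two providers.
office_app = [2, 2, 8, 10, 9, 3]   # peaks during the workday
home_app = [9, 8, 3, 2, 3, 10]     # peaks in the evening
print(capacity_needed([office_app, home_app]))  # prints (20, 13)
```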

19 Example: Server consolidation
Average utilization: mainframes are 40% idle, Unix servers 90% idle, and PC servers 95% idle. But peak utilizations are … mainframes are 0-15% idle in the peak hour, and PC servers are 70% idle in the peak hour. Source: “Grid Computing”, Dr Daron G Green.

20 Load Is Dynamic
[Figure: two load traces. ibm.com external site, February 2001, one week (M T W Th F S S): daily fluctuations (3x), a workday cycle, weekends off. World Cup soccer site, May-June 1998: seasonal fluctuations and event surges (11x).]

21 For Example: Energy-Conscious Provisioning
[Figure: power draw (watts) versus work: CPU idle 93 W; CPU max 120 W; boot 136 W; disk spin 6-10 W; off/hibernate 2-3 W.] Idling consumes 60% to 70% of peak power demand. Under light load: concentrate traffic on a minimal set of servers; step surplus servers down to a low-power state; activate surplus servers on demand. Even smarter: also manage air conditioning.
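The following compares two ways of serving a light load with the slide’s power figures: keeping every server on, or concentrating the load and hibernating the rest. Several assumptions are made here. Power is taken to rise linearly from 93 W (idle) to 120 W (max). A hibernating server is taken to draw 2.5 W, the midpoint of the slide’s 2-3 W. Active servers are kept at most 80% busy. Boot energy (136 W while starting) and disk power are ignored. The function names are hypothetical.

```python
import math

IDLE_W, MAX_W = 93.0, 120.0  # "CPU idle" and "CPU max" from the slide
OFF_W = 2.5                  # assumed midpoint of the slide's 2-3 W "off/hib" range


def server_power(utilization: float) -> float:
    """Assumed linear power model between idle (0.0) and full load (1.0)."""
    return IDLE_W + (MAX_W - IDLE_W) * utilization


def spread_power(load: float, servers: int) -> float:
    """Every server stays on and carries an equal share of `load` (in servers' worth)."""
    return servers * server_power(load / servers)


def concentrated_power(load: float, servers: int, headroom: float = 0.8) -> float:
    """Pack `load` onto the fewest servers kept at most `headroom` busy; hibernate the rest."""
    if load > servers:
        raise ValueError("load exceeds cluster capacity")
    active = min(servers, max(1, math.ceil(load / headroom)))
    return active * server_power(load / active) + (servers - active) * OFF_W


# Light load: 10 servers, total work equal to 2 fully busy servers (20% average).
print(f"{spread_power(2.0, 10):.1f} W vs {concentrated_power(2.0, 10):.1f} W")
# prints: 984.0 W vs 350.5 W
```

Under these assumptions, concentrating the load cuts power by roughly two thirds, because idle servers still draw most of their peak power.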

22 EECE 411: Design of Distributed Software Applications
Mechanisms (III): To provide anonymity, use a large number of independent components (“hide in the crowd”) and make search impossible (or at least costly). Example: onion routing.
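Onion routing combines both ideas: many relays, each of which sees only part of the route. The sender wraps the message in one encryption layer per relay. Each relay peels off its own layer and learns only the next hop, and only the last relay sees the destination. The sketch below shows the layering with a deliberately toy cipher (a SHA-256 keystream XORed with the data). It is **not** secure. The pre-shared keys, relay names, and 16-byte next-hop header are assumptions for illustration; real onion routing negotiates per-relay keys with public-key cryptography.

```python
import hashlib
from typing import Dict, List, Tuple

HOP_FIELD = 16  # fixed-width "next hop" header at the front of every layer


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. NOT secure; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream, so the same call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def build_onion(path: List[str], keys: Dict[str, bytes],
                destination: str, payload: bytes) -> bytes:
    """Sender wraps one layer per relay, starting with the last relay's layer."""
    next_hops = path[1:] + [destination]
    onion = payload
    for relay, nxt in zip(reversed(path), reversed(next_hops)):
        header = nxt.encode()
        if len(header) > HOP_FIELD:
            raise ValueError(f"hop name {nxt!r} does not fit the toy header")
        onion = toy_crypt(keys[relay], header.ljust(HOP_FIELD) + onion)
    return onion


def peel(relay: str, keys: Dict[str, bytes], onion: bytes) -> Tuple[str, bytes]:
    """A relay removes only its own layer and learns only where to send the rest."""
    plain = toy_crypt(keys[relay], onion)
    return plain[:HOP_FIELD].decode().rstrip(), plain[HOP_FIELD:]


keys = {"relayA": b"key-A", "relayB": b"key-B", "relayC": b"key-C"}  # hypothetical
onion = build_onion(["relayA", "relayB", "relayC"], keys, "web-server", b"GET /")
hop1, onion = peel("relayA", keys, onion)    # relayA learns only "relayB"
hop2, onion = peel("relayB", keys, onion)    # relayB learns only "relayC"
hop3, payload = peel("relayC", keys, onion)  # exit relay learns destination and payload
print(hop1, hop2, hop3, payload)  # prints: relayB relayC web-server b'GET /'
```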

23 Mechanisms (IV) To detect anomalies, to generate good statistics:
Use multiple views. Example: Web-server performance characterization.
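Multiple views help because they let you tell apart “the server is slow” and “one path to the server is slow”. A simple way to combine views is to compare each vantage point with the consensus using a robust statistic. The sketch below uses the median and the median absolute deviation (MAD), so a single bad view cannot drag the baseline. The vantage points, latencies, and the threshold of 3 MADs are hypothetical choices.

```python
from statistics import median
from typing import Dict, List


def flag_anomalies(latency_ms: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Return vantage points whose view deviates strongly from the consensus.

    Uses the median absolute deviation (MAD), which a few bad views cannot drag far.
    """
    values = list(latency_ms.values())
    center = median(values)
    mad = median([abs(v - center) for v in values]) or 1e-9  # avoid dividing by zero
    return [site for site, v in latency_ms.items() if abs(v - center) / mad > threshold]


# Hypothetical page-load times for the same Web server, seen from five vantage points.
views = {"vancouver": 120, "chicago": 112, "boston": 130, "london": 125, "tokyo": 900}
print(flag_anomalies(views))  # prints ['tokyo']
```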

24 Roadmap: Definitions; Impact; Uses and Examples; Mechanisms; An In-depth Case Study: File sharing – The Gnutella Network

25 Basic Primitives for File Sharing
Join: How do I begin participating? Publish: How do I advertise my file(s)? Search: How do I find a file? Fetch: How do I retrieve a file? There are many different solutions for each of these four primitives. We’ll look at the Gnutella network.

26 What makes Gnutella network interesting?
Large scale: today up to 2M nodes, 1000 TB of data, 100M files. Self-organizing network. Fast growth in its early stages: more than 50 times during the first half of 2001 (and 50 times again from 2001 to 2006). Open architecture, simple and flexible protocol. Interesting mix of social and technical issues.

27 Gnutella search mechanism
[Figure: Gnutella nodes (UBC, Calgary, Chicago, Boston, MIT) connected by TCP overlay tunnels on top of the routers of the physical network; a query “Q: Beatles” from UBC travels through the overlay toward the node holding “Beatles: Yellow Submarine”.] Search steps: (1) a node initiates a search for “Yellow Submarine”; (2) it sends the message to all its neighbors; (3) the neighbors forward the message; (4) a node that has the file initiates a reply message; (5) the reply message is back-propagated; (6) the file is downloaded. Speaker notes: Let me briefly explain how the Gnutella network works. Gnutella nodes set up TCP tunnels to other existing Gnutella nodes, and all messages are forwarded on this overlay. If a node at UBC is looking for a Beatles album, it creates a query message, and the query is “flooded” into the network. We have built tools to extract the topology of the Gnutella overlay and to intercept the traffic.
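The search steps above can be simulated in a few lines. The query spreads hop by hop with a TTL. Each node remembers the neighbor it first heard the query from, and a hit travels back along that reverse path. This is a simplified sketch, not the Gnutella wire protocol. The overlay and file placement are hypothetical (loosely based on the figure’s node names), queries match file names exactly, and duplicate copies of a query are never enqueued instead of being sent and then dropped.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def flood_search(overlay: Dict[str, List[str]], files: Dict[str, Set[str]],
                 origin: str, query: str, ttl: int) -> Dict[str, List[str]]:
    """Flood `query` from `origin` for up to `ttl` hops.

    Returns {node_with_hit: reply_path}, where reply_path runs from the hit back to
    `origin` along the reverse of the path the query took (back-propagation).
    """
    came_from: Dict[str, Optional[str]] = {origin: None}  # first neighbor each node heard from
    frontier = deque([(origin, ttl)])
    hits: Dict[str, List[str]] = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node != origin and query in files.get(node, set()):
            path: List[str] = []
            hop: Optional[str] = node
            while hop is not None:           # walk the reverse path back to the origin
                path.append(hop)
                hop = came_from[hop]
            hits[node] = path
        if remaining == 0:
            continue                         # TTL exhausted: stop forwarding
        for neighbor in overlay.get(node, []):
            # A real node drops copies of a query ID it has already seen;
            # here such copies are simply never enqueued.
            if neighbor not in came_from:
                came_from[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay loosely based on the slide's node names.
overlay = {
    "UBC": ["Calgary", "Chicago"],
    "Calgary": ["UBC", "Chicago"],
    "Chicago": ["UBC", "Calgary", "Boston"],
    "Boston": ["Chicago", "MIT"],
    "MIT": ["Boston"],
}
files = {"MIT": {"Beatles: Yellow Submarine"}}
print(flood_search(overlay, files, "UBC", "Beatles: Yellow Submarine", ttl=3))
# prints: {'MIT': ['MIT', 'Boston', 'Chicago', 'UBC']}
```

With `ttl=2`, the query stops at Boston and the search finds nothing. The TTL trades search coverage for traffic.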

28 Gnutella: Overview
Join: on startup, a client contacts a few other nodes; these become its “neighbors”. Publish: no need. Search: flooding: ask the neighbors, who ask their neighbors, and so on; when/if the file is found, reply to the sender (back-propagation in case of success). Fetch: get the file directly from the peer (HTTP). [Note: this was the original design. Later the network moved to a two-layer structure.]

29 Gnutella: Summary
Gnutella is a self-organizing, large-scale P2P application based on a hierarchical overlay network. It works! But its growth is hindered by inefficient resource use. To summarize this presentation, some solutions to help the network scale: organize the overlay network to match the underlying infrastructure topology; investigate methods for reducing traffic (query routing/filtering, better information organization); exploit locality in user interest → small-world network (speaker note: talk about our project at Chicago); exploit caches; and do all this while maintaining the self-organizing characteristics.

30 P2P: Summary
Huge impact. Many architectural styles, each with pros and cons: centralized, flooding, swarming. Lessons learned: single points of failure are bad; flooding messages to everyone does not scale; the underlying network topology is important; not all nodes are equal; incentives are needed to discourage freeloading; privacy and security are important.
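A short calculation shows why flooding does not scale. The originator sends one copy to each of its d neighbors, and every other node forwards to its d - 1 remaining neighbors, so one search costs d(d-1)^(i-1) messages at hop i. The total therefore grows roughly by a factor of d - 1 with each extra hop of TTL. The sketch assumes every node has exactly d neighbors and that the overlay is tree-like within the TTL radius. The degree of 4 is a hypothetical value.

```python
def flood_messages(degree: int, ttl: int) -> int:
    """Query messages sent by one flooded search.

    Assumes every node has exactly `degree` neighbors and the overlay looks like a
    tree out to `ttl` hops (no cycles): the originator sends `degree` copies and every
    other node forwards to its `degree - 1` remaining neighbors. Under that assumption
    the count also equals the number of distinct nodes reached.
    """
    return sum(degree * (degree - 1) ** (hop - 1) for hop in range(1, ttl + 1))


# Hypothetical overlay where every node keeps 4 neighbors.
print([flood_messages(4, ttl) for ttl in range(1, 8)])
# prints: [4, 16, 52, 160, 484, 1456, 4372]
```

Each of those messages also crosses a TCP overlay tunnel, and therefore the underlying network. This is why query traffic, not downloads, became the bottleneck.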

31 References
Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, and Steven Lim, “A survey and comparison of peer-to-peer overlay network schemes,” IEEE Communications Surveys & Tutorials, 7(2): 22-73, Apr. 2005. An excellent survey of modern peer-to-peer systems, covering structured as well as unstructured networks. It is a good introduction for readers who want to go deeper into the subject but do not know where to start.

