cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
2 Introduction
In this lecture, we will discuss:
–Content Distribution Networks (CDNs)
–Peer-to-Peer (P2P) networks.
These are alternatives to the Client-Server architecture discussed in the last lecture.
Some slides: Copyright James Kurose & Keith Ross.
3 Client-Server architecture
Server provides content to clients on an always-on basis.
Content consists of objects with names (URLs)
–Content transferred includes both static and dynamic objects.
Scalability problems:
–We can only scale up by centralizing content storage
–While possible, there are costs to this.
4 Web caches (proxy server)
User sets browser: Web accesses via cache.
Browser sends all HTTP requests to cache
–object in cache: cache returns object
–else cache requests object from origin server, then returns object to client.
Goal: To satisfy client request without involving origin server.
[Figure: clients exchange HTTP requests and responses with the proxy server, which exchanges HTTP requests and responses with the origin servers.]
5 Web Caching (2)
Cache acts as both client and server.
Cache can check that an object is up to date using the If-Modified-Since HTTP header
–Issue: should cache take risk and deliver cached object without checking?
–Heuristics are used.
Typically cache is installed by ISP (university, company, residential ISP).
Why Web caching?
–Reduce response time for client requests.
–Reduce traffic on an institution’s access link.
–An Internet dense with caches enables “poor” content providers to deliver content effectively.
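The cache behaviour on the last two slides can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy: `OriginServer`, `WebCache`, and the `fresh_for` freshness window are hypothetical names, and the 60-second window is an assumed stand-in for the heuristics mentioned above. Time is passed in explicitly as `now` so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: bytes
    last_modified: float  # modification time reported by the origin
    checked_at: float     # when the cache last fetched or validated the copy


class OriginServer:
    """Hypothetical stand-in for an origin server (no real networking)."""

    def __init__(self):
        self.objects = {}        # url -> (body, last_modified)
        self.full_responses = 0  # how many times a full body was sent

    def put(self, url, body, modified):
        self.objects[url] = (body, modified)

    def get(self, url, if_modified_since=None):
        body, modified = self.objects[url]
        if if_modified_since is not None and modified <= if_modified_since:
            return 304, None, modified  # Not Modified: no body is sent
        self.full_responses += 1
        return 200, body, modified


class WebCache:
    """Acts as a server to browsers and as a client to the origin."""

    def __init__(self, origin, fresh_for=60.0):
        self.origin = origin
        self.fresh_for = fresh_for  # assumed heuristic freshness window
        self.store = {}

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is None:
            # Cache miss: fetch from the origin and keep a copy.
            _, body, modified = self.origin.get(url)
            self.store[url] = CachedObject(body, modified, now)
            return body
        if now - entry.checked_at < self.fresh_for:
            # Heuristic: take the risk and serve without checking.
            return entry.body
        # Up-to-date check with If-Modified-Since (a conditional GET).
        status, body, modified = self.origin.get(
            url, if_modified_since=entry.last_modified
        )
        if status == 304:
            entry.checked_at = now
            return entry.body
        self.store[url] = CachedObject(body, modified, now)
        return body
```

The freshness window is where the risk on the slide shows up: within it, the cache may hand out a stale copy; outside it, the cache pays one round trip to the origin but avoids transferring the body when nothing has changed.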
6 Content Distribution Networks (CDNs)
Dedicated collections of servers located strategically across a wide area.
Hold content on behalf of Content Providers.
Content usually comprises only static objects.
Proprietary architectures (closed systems).
7 Content Distribution Networks (CDNs)
The content providers are the CDN customers.
Content replication
CDN company installs hundreds of CDN servers throughout Internet
–In lower-tier ISPs, close to users.
CDN replicates its customers’ content in CDN servers.
When provider updates content, CDN updates servers.
[Figure: origin server in UK, CDN distribution node, and CDN servers in South America, USA and Asia.]
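8 CDN Example
Origin server
–Distributes HTML
–Replaces: (URL of an object on the origin server) with (URL of the same object under the CDN’s domain).
CDN company (cdn.com)
–Distributes gif files
–Uses its authoritative DNS server to route redirect requests.
[Figure: client sends an HTTP request to the origin server, a DNS query to the CDN’s authoritative DNS server, and an HTTP request to a nearby CDN server.]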
9 More about CDNs
Routing requests
CDN creates a “map”, indicating distances between leaf ISPs and CDN nodes.
When query arrives at authoritative DNS server:
–server determines ISP from which query originates
–uses “map” to determine best CDN server.
Not only Web pages
Streaming stored audio/video
Streaming real-time audio/video
–CDN nodes create application-layer overlay network.
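The routing step above can be sketched as a lookup in the “map”. The ISP names, server names, distance values, and the fallback server below are all made up for illustration; a real CDN builds its map from measurements and its architecture is proprietary.

```python
# Hypothetical "map": estimated distance from each leaf ISP to each CDN node.
DISTANCE_MAP = {
    "isp-london": {"cdn-uk": 5, "cdn-usa": 80, "cdn-asia": 150},
    "isp-tokyo": {"cdn-uk": 160, "cdn-usa": 110, "cdn-asia": 10},
}

DEFAULT_SERVER = "cdn-usa"  # assumed fallback when the ISP is not in the map


def best_cdn_server(isp, distance_map=DISTANCE_MAP, available=None):
    """Return the closest CDN server for the ISP a DNS query came from.

    `available`, if given, restricts the choice to servers that are up.
    """
    distances = distance_map.get(isp)
    if not distances:
        return DEFAULT_SERVER
    candidates = {
        server: dist
        for server, dist in distances.items()
        if available is None or server in available
    }
    if not candidates:
        return DEFAULT_SERVER
    # The server with the smallest distance is treated as "best".
    return min(candidates, key=candidates.get)
```

For example, `best_cdn_server("isp-tokyo")` picks `"cdn-asia"`, and the authoritative DNS server would answer the query with that server's address.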
10 Peer-to-Peer (P2P) Networks
In these networks, any node may perform any function.
A node may be both a client (requesting a service) and a server (providing that same service).
Have recently arisen as models for distribution of music and video, and are increasingly used for content distribution.
Usually voluntary participation.
Content usually static objects (e.g., music files, video files).
11 P2P File Sharing: Example
Alice runs P2P client application on her notebook computer.
Intermittently connects to Internet; gets new IP address for each connection.
Asks for “Hey Jude”.
Application displays other peers that have a copy of “Hey Jude”.
Alice chooses one of the peers, Bob.
File is copied from Bob’s PC to Alice’s notebook.
While Alice downloads, other users can be copying files from Alice.
Alice’s peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!
12 P2P: Locating content
How does Alice know where to look in the P2P network for “Hey Jude”?
Several approaches:
–Centralized Directory
–Decentralized Directory
–Query Flooding.
13 P2P: Centralized Directory
Example: Napster (original)
1) When peer connects, it informs central server:
–IP address
–content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
[Figure: peers, including Alice and Bob, connected to the centralized directory server.]
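The three steps above can be sketched as follows. `DirectoryServer` and its methods are hypothetical names chosen for this illustration and are not Napster's actual protocol; the addresses are example values.

```python
from collections import defaultdict


class DirectoryServer:
    """Central index: knows who has what, but never stores the files."""

    def __init__(self):
        self.holders = defaultdict(set)  # title -> addresses of peers holding it

    def register(self, address, titles):
        # Step 1: a connecting peer reports its IP address and content.
        for title in titles:
            self.holders[title].add(address)

    def unregister(self, address):
        # Peers connect intermittently, so entries must be removed on leave.
        for peers in self.holders.values():
            peers.discard(address)

    def query(self, title):
        # Step 2: return the peers that currently hold the title.
        return sorted(self.holders.get(title, ()))


directory = DirectoryServer()
directory.register("10.0.0.7", ["Hey Jude", "Let It Be"])  # Bob
directory.register("10.0.0.9", ["Yesterday"])              # another peer
peers_with_song = directory.query("Hey Jude")
# Step 3 happens outside the directory: Alice contacts one of
# `peers_with_song` directly and copies the file from that peer.
```

Only the lookup goes through the server; the file transfer itself is peer to peer.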
14 P2P: Problems with centralized directory
Single point of failure
Performance bottleneck
Copyright infringement
Legal vulnerability
File transfer is decentralized, but locating content is highly centralized.
15 P2P: Decentralized Directory
Example: Kazaa
Each peer is either a group leader or assigned to a group leader.
Group leader tracks the content in all its children.
Peer queries group leader; group leader may query other group leaders.
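A minimal sketch of this two-level lookup, assuming a group leader asks its neighbouring leaders only when none of its own children has the content. `GroupLeader` and its methods are hypothetical names, and the one-hop limit is an assumption made to keep the example short; this is not Kazaa's actual protocol.

```python
class GroupLeader:
    """Tracks the content held by its child peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # title -> set of child peers holding it
        self.neighbors = []  # other group leaders this leader can query

    def add_child(self, peer, titles):
        # A child reports its content to its group leader when it joins.
        for title in titles:
            self.index.setdefault(title, set()).add(peer)

    def query(self, title, ask_neighbors=True):
        hits = set(self.index.get(title, ()))
        if not hits and ask_neighbors:
            # Not found locally: ask neighbouring leaders, one hop only.
            for leader in self.neighbors:
                hits |= leader.query(title, ask_neighbors=False)
        return hits


leader_a = GroupLeader("A")
leader_b = GroupLeader("B")
leader_a.neighbors.append(leader_b)
leader_b.neighbors.append(leader_a)

leader_a.add_child("alice", ["Yesterday"])
leader_b.add_child("bob", ["Hey Jude"])

result = leader_a.query("Hey Jude")  # Alice asks her own group leader
```

Here leader A has no local match, so the answer comes from leader B's index.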
16 Decentralized directories
Overlay network
–Peers are nodes
–Edges between peers and their group leaders
–Edges between some pairs of group leaders
–Virtual neighbors
Bootstrap node
–A new peer connecting to the P2P network is either assigned to a group leader or designated as a group leader.
17 Decentralized directories (2)
Advantages
No centralized directory server
–Location service distributed over peers
–More difficult to shut down
Disadvantages
Bootstrap node needed
–Vulnerable to shut-down
Group leaders can get overloaded
18 P2P: Query Flooding
Example: Gnutella
No hierarchy
Use bootstrap node to learn about other peers (join)
Send query to neighbors
Neighbors forward query
If queried peer has object, it sends message back to querying peer
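Query flooding can be sketched as a breadth-first traversal of the overlay network. The `ttl` hop limit below is an assumption standing in for the query radius; the function name, peer names, and overlay are made up, and real Gnutella messages are not modelled.

```python
from collections import deque


def flood_query(overlay, content, start, title, ttl):
    """Return the peers holding `title` within `ttl` hops of `start`.

    overlay: peer -> list of neighbouring peers
    content: peer -> set of titles that peer holds
    """
    seen = {start}  # peers that have already received the query
    hits = set()
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if peer != start and title in content.get(peer, ()):
            hits.add(peer)  # this peer would send a message back
        if hops == ttl:
            continue        # query radius reached: do not forward further
        for neighbor in overlay.get(peer, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits


# A chain of peers: alice - bob - carol - dave
overlay = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
content = {"dave": {"Hey Jude"}}

near = flood_query(overlay, content, "alice", "Hey Jude", ttl=2)
far = flood_query(overlay, content, "alice", "Hey Jude", ttl=3)
```

With `ttl=2` the query never reaches dave, so `near` is empty even though the content exists in the network; with `ttl=3` it is found. A larger radius finds more content but generates more query traffic.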
19 P2P: Query flooding (2)
Advantages
–Peers have similar responsibilities: no group leaders
–Highly decentralized
–No peer maintains directory information
Disadvantages
–Excessive query traffic
–Query radius: a query with a limited radius may fail to find content that is present in the network
–Bootstrap node still needed
–Overlay network must be maintained.
20 Intelligent Query Flooding
Research at University of Liverpool has explored intelligent query flooding methods.
Queries are sent only to nodes believed to know the answer or believed to be close to a node which knows the answer
–This requires some notion of semantic distance.
This can result in faster and more efficient searches.
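One way to picture the selection step is shown below. This is only an illustrative sketch and not the method used in the Liverpool research: it assumes each peer keeps a set of keywords describing each neighbour, and it uses Jaccard distance between keyword sets as a stand-in for semantic distance. The function names, `max_distance` threshold, and `fanout` limit are all hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus (shared keywords / all keywords); 0 means identical sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def select_neighbors(query_terms, neighbor_profiles, max_distance=0.8, fanout=2):
    """Pick the neighbours that look closest to the query.

    neighbor_profiles: neighbour name -> keywords believed to describe it.
    Only neighbours within `max_distance` are considered, and at most
    `fanout` of them receive the query.
    """
    scored = sorted(
        (jaccard_distance(query_terms, profile), name)
        for name, profile in neighbor_profiles.items()
    )
    return [name for dist, name in scored if dist <= max_distance][:fanout]


profiles = {
    "bob": {"beatles", "jude", "rock"},
    "carol": {"jazz", "miles"},
    "dave": {"beatles", "help"},
}
targets = select_neighbors({"beatles", "jude"}, profiles)
```

Compared with plain flooding, the query goes to bob and dave but not to carol, whose profile shares nothing with the query.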
21 Comparison
One study at University of Washington, USA, found:
–P2P accounts for more than 75% of all internet traffic
–Median size of P2P object is 3 orders of magnitude larger than median size of client-server object
–The fact that P2P is used to transfer large objects is the main reason that P2P accounts for such a great proportion of traffic.
The study of P2P networks is still in its infancy; many issues remain open.