
1 Chapter 2 Application Layer - 2
CMPT 371 Data Communications and Networking

2 Chapter 2 outline
2.1 Principles of app layer protocols
2.2 Web and HTTP
2.3 FTP
2.4 Electronic mail: SMTP, POP3, IMAP
2.5 DNS
2.6 Content distribution: Web caching, content distribution networks, P2P file sharing
2.7 Cloud

3 Content Distribution
Problem of a single server: bottleneck, single point of failure, …
Content distribution: distribute (replicate) content at different places; direct requests to the appropriate place.

4 Client-side Caching

5 Limit of Client-side Caching
A client-side cache is not shared with other clients!

6 Web caches (proxy server)
Goal: satisfy client requests without involving the origin server.
User sets browser so that Web accesses go via the proxy; the browser sends all HTTP requests to the proxy.
If the object is in the cache: the cache returns the object.
Else: the cache requests the object from the origin server, then returns it to the client.
The proxy acts as both client and server; it is typically installed by an ISP (university, company, residential ISP). A sketch of the hit-or-fetch logic follows.
Figure: clients send HTTP requests to the proxy server, which forwards cache misses to the origin server and relays the HTTP responses back.
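A minimal sketch of that hit-or-fetch logic in Python; the in-memory dict cache and the urllib fetch are simplifying assumptions, not how a production proxy is built:

```python
# Hedged sketch: the proxy's cache decision only. A real proxy also
# speaks HTTP to its clients; here the "client request" is a function call.
from urllib.request import urlopen

cache: dict[str, bytes] = {}   # in-memory cache (assumption: fits in RAM)

def handle_request(url: str) -> bytes:
    if url in cache:                 # hit: satisfied without origin server
        return cache[url]
    with urlopen(url) as resp:       # miss: proxy acts as a client to origin
        body = resp.read()
    cache[url] = body                # store for future clients
    return body                      # proxy acts as a server to its client
```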

7 More about Web caching
Why Web caching?
Reduces response time for client requests.
Reduces traffic on an institution's access link.
An Internet dense with caches enables "poor" (low-bandwidth) content providers to deliver content effectively.

8 More about Web caching
Why is caching effective for the Web even when cache space is quite limited? Object popularity follows a Zipf distribution: a few very popular objects account for most requests, so caching only those objects already yields a high hit ratio, as the sketch below illustrates.
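A minimal sketch of this effect; the catalog size, cache size, and Zipf exponent below are illustrative assumptions, not figures from the course:

```python
# Estimated hit ratio when the cache holds only the most popular objects
# and popularity is Zipf-distributed (the object at rank r is requested
# with probability proportional to 1/r**s).
def zipf_hit_ratio(catalog_size: int, cache_size: int, s: float = 1.0) -> float:
    weights = [1 / (rank ** s) for rank in range(1, catalog_size + 1)]
    return sum(weights[:cache_size]) / sum(weights)

# Caching just 1% of a 1,000,000-object catalog serves ~68% of requests.
print(f"hit ratio: {zipf_hit_ratio(1_000_000, 10_000):.2f}")
```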

9 Caching example (1)
Assumptions:
average object size = 100,000 bits
average request rate from the institution's browsers to origin servers = 15/sec
delay from the institutional router to any origin server and back = 2 sec
access link = 1.5 Mbps; institutional network = 10 Mbps LAN, with an institutional cache
Consequences:
utilization on LAN = 15%
utilization on access link = 100%!
total delay = Internet delay + access delay + LAN delay = 2 sec + some minutes + some milliseconds

10 Caching example (2)
Possible solution: increase bandwidth of the access link from 1.5 to, say, 10 Mbps (often a costly upgrade).
Consequences (at 10 Mbps):
utilization on LAN = 15%
utilization on access link = 15%
total delay = Internet delay + access delay + LAN delay = 2 sec + some milliseconds + some milliseconds

11 Caching example (3)
Possible solution: install a cache (the access link stays at 1.5 Mbps); suppose the hit rate is 0.4.
Consequences:
40% of requests are satisfied almost immediately
60% of requests are satisfied by the origin server
utilization of the access link is reduced to 60%, resulting in negligible queueing delays (say 10 msec)
total delay = Internet delay + access delay + LAN delay = 60%*(2 sec + 0.01 sec + some milliseconds) + 40%*(some milliseconds) ≈ 1.2 sec < 1.3 sec
(the sketch below reproduces this arithmetic)
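A small Python check of the arithmetic in caching examples (1) and (3); the "negligible" LAN and cache delays are approximated as zero:

```python
object_size = 100_000     # bits, from example (1)
request_rate = 15         # requests/sec
internet_delay = 2.0      # sec, institutional router <-> origin server

demand = object_size * request_rate            # 1,500,000 bits/sec

# Example (1): 1.5 Mbps access link, no cache.
print("utilization without cache:", demand / 1.5e6)   # 1.0 -> minutes of queueing

# Example (3): add a cache with hit rate 0.4.
hit_rate = 0.4
print("utilization with cache:", (1 - hit_rate) * demand / 1.5e6)  # 0.6
access_delay = 0.01       # sec, negligible at 60% utilization
avg_delay = (1 - hit_rate) * (internet_delay + access_delay)       # hits ~ 0 sec
print(f"average delay: {avg_delay:.2f} sec")                       # ~1.21 < 1.3 sec
```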

12 More about Web caching
Problems of Web caching:
extra space and an extra machine (the proxy)
inconsistency (out-of-date objects, …)

13 Consistency of Cached Objects
Solution 1: no caching

14 Consistency of Cached Objects
Solution 2: manually update

15 Conditional GET
Goal: don't send the object if the client has an up-to-date cached version.
Client: specifies the date of its cached copy in the HTTP request:
If-modified-since: <date>
Server: the response contains no object if the cached copy is up-to-date:
HTTP/1.0 304 Not Modified
If the object was not modified: the request carries If-modified-since: <date>, and the response is HTTP/1.0 304 Not Modified, with no body.
If the object was modified: the response is HTTP/1.0 200 OK followed by <data>.
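A minimal sketch of a conditional GET using Python's standard library; the host and the cached-copy date are placeholder assumptions:

```python
import http.client

conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/index.html", headers={
    # date of our cached copy, in HTTP date format
    "If-Modified-Since": "Sat, 01 Jan 2022 00:00:00 GMT",
})
resp = conn.getresponse()

if resp.status == 304:                 # Not Modified: no object in response
    print("cached copy is up-to-date")
else:                                  # 200 OK: fresh object, update cache
    body = resp.read()
    print(f"{resp.status} {resp.reason}: {len(body)} bytes")
conn.close()
```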

16 Hierarchical cache
How to measure a cache? Hit ratio (HR).
Calculation: clients sit behind Proxy Server 1, which sits behind Proxy Server 2, which sits behind the origin server. HR of Proxy 1 = 90%, HR of Proxy 2 = 95%. If the two are independent, the joint HR = 1 - (1 - 0.9)(1 - 0.95) = 99.5%.

17 Hierarchical cache
Same setting: HR of Proxy 1 = 90%, HR of Proxy 2 = 95%, independent, joint HR = 99.5%.
How about the average delay? Weight each level's delay by the probability that the request is resolved there, as in the sketch below.
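A minimal sketch for both questions; the hit ratios come from the slide, while the per-level delays are made-up assumptions for illustration:

```python
hr1, hr2 = 0.90, 0.95                  # hit ratios from the slide
d1, d2, d_origin = 0.002, 0.010, 2.0   # sec per level (assumed values)

joint_hr = 1 - (1 - hr1) * (1 - hr2)
print(f"joint hit ratio: {joint_hr:.3f}")          # 0.995

# Every request pays d1; Proxy 1 misses also pay d2; double misses pay d_origin.
avg_delay = d1 + (1 - hr1) * (d2 + (1 - hr2) * d_origin)
print(f"average delay: {avg_delay * 1000:.1f} ms") # ~13 ms
```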

18 Content distribution networks (CDNs)
Try ping against a CDN-hosted name: what did you see?
Figure: an origin server in North America pushes content through a CDN distribution node to CDN servers in S. America, Asia, and Europe.

19 Content distribution networks (CDNs)
Content replication:
The CDN company (e.g., Akamai) installs hundreds of CDN servers throughout the Internet, in lower-tier ISPs close to users.
Content providers (e.g., Netflix) are the CDN company's customers.
The CDN replicates its customers' content on its CDN servers; when a provider updates content, the CDN updates its servers.

20 CDN example
Origin server (www.foo.com): distributes the HTML; in the HTML it replaces references to its gif files with references that point into the CDN company's domain (cdn.com).
CDN company (cdn.com): distributes the gif files; uses its authoritative DNS server to redirect requests to a nearby CDN server.
Steps: (1) HTTP request for the HTML to the origin server; (2) DNS query for the cdn.com name to the CDN's authoritative DNS server; (3) HTTP request for the object to the nearby CDN server it returns.

21 More about CDNs
Routing requests:
The CDN creates a "map" indicating distances between leaf ISPs and CDN nodes.
When a query arrives at the authoritative DNS server, the server determines the ISP from which the query originates and uses the "map" to determine the best CDN server.
Caching vs. CDN: caching is pull (passive); a CDN is push (active).

22 Client-server architecture
Server: always-on host, permanent IP address, server farms for scaling.
Clients: communicate with the server; may be intermittently connected; may have dynamic IP addresses; do not communicate directly with each other.

23 Pure P2P architecture
No always-on server; arbitrary end systems communicate directly.
Peers are intermittently connected and change IP addresses.
Self-scalability: new peers bring new resources.
Three topics: file distribution; searching for information; case studies: BitTorrent, Skype.

24 P2P file sharing
Example:
Alice runs a P2P client application on her notebook computer, intermittently connects to the Internet, and asks for "X.mp3".
The application displays other peers that have a copy of X.mp3; Alice chooses one of them, Bob.
The file is copied from Bob's PC to Alice's notebook via HTTP.
While Alice downloads, other users upload from Alice.
Alice's peer is both a Web client and a transient Web server.
All peers are servers = highly scalable!

25 File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file of size F from one server to N peers?
u_s: server upload bandwidth
u_i: peer i's upload bandwidth
d_i: peer i's download bandwidth
(the network core is assumed to have abundant bandwidth)

26 File distribution time: server-client
Server transmission: must sequentially send (upload) N copies: NF/u_s time.
Client: each must download the file; client i takes F/d_i time.
D_cs = max { NF/u_s , F/min_i(d_i) }
Time to distribute F to N clients using the client/server approach increases linearly in N (for large N).

27 File distribution time: P2P
Server transmission: must upload at least one copy: F/u_s time.
Client: each must download a copy; client i takes F/d_i time, but also shares (uploads).
Clients (peers) as a whole must download NF bits; the fastest possible overall download rate is u_s + Σ_i u_i.
D_P2P = max { F/u_s , F/min_i(d_i) , NF/(u_s + Σ_i u_i) }
Does this still increase linearly in N (for large N)? No: each new peer also adds upload capacity, so the last term stays bounded.

28 Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s. The sketch below evaluates both formulas for these parameters.
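A minimal sketch evaluating D_cs and D_P2P for these parameters (assuming all N peers upload at rate u; d_min ≥ u_s makes the download terms at most F/(10u)):

```python
F_over_u = 1.0   # hours: downloading the file at rate u takes 1 hour

def d_cs(n: int) -> float:
    # D_cs = max{ N*F/u_s, F/d_min } with u_s = 10u and d_min >= u_s
    return max(n * F_over_u / 10, F_over_u / 10)

def d_p2p(n: int) -> float:
    # D_P2P = max{ F/u_s, F/d_min, N*F/(u_s + N*u) }
    return max(F_over_u / 10, n * F_over_u / (10 + n))

for n in (1, 5, 10, 30):
    print(f"N={n:3d}  client-server: {d_cs(n):5.2f} h   P2P: {d_p2p(n):4.2f} h")
# client-server time grows linearly with N; P2P time stays below F/u = 1 hour
```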

29 File distribution: BitTorrent
P2P file distribution.
torrent: group of peers exchanging chunks of a file
tracker: tracks peers participating in the torrent
A peer (Alice) arrives, obtains a list of peers from the tracker, and begins exchanging file chunks with peers in the torrent.

30 BitTorrent (1)
The file is divided into 256 KB chunks.
A peer joining the torrent has no chunks, but will accumulate them over time; it registers with the tracker to get a list of peers and connects to a subset of them ("neighbors").
While downloading, a peer uploads chunks to other peers.
Peers may come and go: churn.
Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain.

31 BitTorrent (2)
Requesting chunks:
At any given time, different peers have different subsets of the file's chunks.
Periodically, a peer (Alice) asks each neighbor for the list of chunks it has.
Alice requests her missing chunks, rarest first.
Sending chunks: tit-for-tat.
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate; she re-evaluates the top 4 every 10 secs.
Every 30 secs she randomly selects another peer and starts sending it chunks ("optimistic unchoke"); the newly chosen peer may join the top 4. A sketch of this selection follows.
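A minimal sketch of the neighbor-selection rule just described; the peer names and rates are illustrative assumptions:

```python
import random

def choose_unchoked(download_rate: dict) -> set:
    """Top 4 neighbors by the rate at which they send chunks to us,
    plus one randomly chosen optimistic unchoke."""
    top4 = sorted(download_rate, key=download_rate.get, reverse=True)[:4]
    unchoked = set(top4)
    others = [p for p in download_rate if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))   # "optimistic unchoke"
    return unchoked

rates = {"bob": 50.0, "carol": 80.0, "dave": 10.0, "eve": 65.0,
         "frank": 30.0, "grace": 5.0}
# real BitTorrent re-evaluates the top 4 every 10 s and the optimistic
# unchoke every 30 s; this computes a single round
print(choose_unchoked(rates))
```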

32 BitTorrent: Tit-for-tat
(1) Alice "optimistically unchokes" Bob.
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates.
(3) Bob becomes one of Alice's top-four providers.
With a higher upload rate, a peer can find better trading partners and get the file faster!

33 P2P Case study: Skype
Inherently P2P: pairs of users communicate.
Proprietary application-layer protocol (inferred via reverse engineering).
Hierarchical overlay with supernodes (SNs): an index mapping usernames to IP addresses is distributed over the SNs.
Components: Skype clients (SC), supernodes (SN), and a Skype login server.

34 Peers as relays
Problem: both Alice and Bob are behind NATs. A NAT prevents an outside peer from initiating a connection to an inside peer (see later).
Solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay, and the peers can then communicate through their NATs via the relay.

35 P2P: searching for information
So many files, but where are they?
Index in a P2P system: maps information to peer location (location = IP address & port number).

36 P2P: centralized directory
The original "Napster" design:
1) when a peer connects, it informs the central directory server of its IP address and its content
2) Alice queries the server for "X.mp3"
3) Alice requests the file directly from Bob

37 P2P: problems with centralized directory
Single point of failure.
Performance bottleneck.
Copyright infringement.
File transfer is decentralized, but locating content is highly centralized.

38 P2P: decentralized directory
Each peer is either a group leader or assigned to a group leader.
A group leader tracks the content of all its children.
A peer queries its group leader; the group leader may query other group leaders.

39 More about decentralized directory
Advantages: no centralized directory server; the location service is distributed over peers; more difficult to shut down.
Disadvantages: a bootstrap node is needed; group leaders can get overloaded.

40 P2P: Query flooding
Gnutella: no hierarchy.
Use a bootstrap node to learn about other peers (join message).
Send a query to neighbors; neighbors forward the query.
If a queried peer has the object, it sends a message back to the querying peer.

41 P2P: more on query flooding
Pros: peers have similar responsibilities (no group leaders); highly decentralized; no peer maintains directory info.
Cons: excessive query traffic; limited query radius (may not find content even when it is present); a bootstrap node is still needed; maintenance of the overlay network.

42 DHT: A New Story…
Motivation: frustrated by the popularity of all these "half-baked" P2P apps. We can do better!
Guaranteed lookup success for files in the system.
Provable bounds on search time.
Provable scalability to millions of nodes.

43 P2P: Content Addressing (Hash Routing)
Given an object identifier I, calculate its hash value H = hash(I) and (hopefully) find the object (or its location info) at peer H.
Not a new idea: in load balancing, a server hashes the IP address and redirects to different servers.
Interface of a (distributed) hash table: put(key, data) stores data, get(key) returns it; the hash table is spread over the participating nodes.

44 Hash Routing
Two alternatives:
a node can cache each (existing) object that hashes within its range
pointer-based (a level of indirection): a node caches a pointer to the location(s) of the object
What's new in P2P? The overlay is dynamic: peers join and leave, so the number of peers is not fixed, and a traditional hash function (e.g., SHA-1 mod N over a range such as 0-999) doesn't work, as the sketch below shows.
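A minimal sketch of why a traditional "hash mod N" placement breaks under churn; the key names are illustrative:

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_nodes

keys = [f"file-{i}.mp3" for i in range(1000)]
before = {k: node_for(k, 10) for k in keys}   # 10 nodes in the overlay
after = {k: node_for(k, 11) for k in keys}    # one node joins

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys changed nodes")  # ~91%, not the ideal ~9%
```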

45 Distributed Hash Table (DHT)
Challenges:
For each object, the node(s) whose range(s) cover that object must be reachable via a "short" path.
The number of neighbors per node should scale well (e.g., should not be O(N)).
Fully distributed: no centralized bottleneck or single point of failure.
The DHT mechanism should gracefully handle nodes joining/leaving:
need to repartition the range space over existing nodes
need to reorganize the neighbor set
need a bootstrap mechanism to connect new nodes into the existing DHT infrastructure

46 Case Studies
Structured overlay (P2P) systems based on consistent hashing: Chord, CAN (Content Addressable Network).
Key questions:
Q1: How is the hash space divided "evenly" among existing nodes?
Q2: How is routing implemented so that an arbitrary node can reach the node responsible for a given object?
Q3: How is the hash space repartitioned when nodes join/leave?
Notation: let N be the number of nodes in the overlay, and H the size of the range of the hash function (when applicable).

47 Chord (from MIT, 2001)
Associate to each node and file a unique id in a one-dimensional space (a ring), e.g., picked from the range [0 .. 2^m - 1]; usually the hash of the file or of the node's IP address.
Properties:
routing table size is O(log N), where N is the total number of nodes
guarantees that a file is found in O(log N) hops

48 Consistent Hashing
A key is stored at its successor: the node with the next-higher ID on the circular ID space (a key is the hashed value of a file identifier).
Figure: on a circular ID space with nodes N32, N90, and N105, keys K5 and K20 are stored at N32, and K80 is stored at N90. The sketch below reproduces this rule.
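A minimal sketch of the store-at-successor rule, using the node and key IDs from the figure:

```python
import bisect

nodes = sorted([32, 90, 105])      # node IDs on the circular ID space

def successor(key: int) -> int:
    """First node clockwise whose ID is >= key, wrapping past 0."""
    i = bisect.bisect_left(nodes, key)
    return nodes[i % len(nodes)]

for k in (5, 20, 80, 110):
    print(f"K{k} -> N{successor(k)}")
# K5 -> N32, K20 -> N32, K80 -> N90, K110 -> N32 (wraps around the ring)
```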

49 Chord Basic Lookup
"Where is key 80?": the query is forwarded around the ring (e.g., N10 → N32 → N60 → N90) until it reaches the key's successor, which replies "N90 has K80".
Each hop just needs to make progress toward the key without overshooting it.
(Initialization and robustness are discussed later; next, how to make lookup fast.)

50 Chord "Finger Table"
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i; the top finger points halfway around the ring, the next one a quarter of the way, then 1/8, 1/16, and so on.
Table entries hold the neighbor's IP address and Chord ID.
Small tables, multi-hop lookup: log(N) table entries and log(N) hops; navigate in ID space, routing queries closer and closer to the key's successor. A sketch of table construction follows.
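A minimal sketch of finger-table construction for a small ring with m = 6 (IDs 0..63); the node IDs are illustrative assumptions:

```python
import bisect

M = 6
RING = 2 ** M
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(ident: int) -> int:
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def finger_table(n: int) -> list:
    # finger[i] = successor(n + 2^i), for i = 0 .. m-1
    return [successor(n + 2 ** i) for i in range(M)]

print(finger_table(8))   # [14, 14, 14, 21, 32, 42]: log-spaced shortcuts
```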

51 Chord Join
Assume a hash space [0..7] (m = 3); node n1 joins.
Figure: ring positions 0-7 with n1 at position 1; all entries of its successor table (entry i: id + 2^i → succ) point back to n1, the only node.

52 Chord Join
Node n2 joins.
Figure: n1 and n2 on the ring; the successor tables are updated (e.g., n1's entry for id 2 now points to n2).

53 Chord Join
Nodes n0 and n6 join.
Figure: ring with n0, n1, n2, and n6; each node's successor table is updated accordingly.

54 Chord Join
Nodes: n1, n2, n0, n6; keys: f7, f1.
Figure: each key is stored at its successor (key 7 at n0, wrapping around; key 1 at n1), and each node's successor table is shown.

55 Chord Routing
Upon receiving a query for a file id, a node first calculates the key (the hash of the id).
It checks whether it stores the key locally.
If not, it forwards the query to the largest node in its successor table that does not exceed the key.
Figure: query(7) is routed around the ring to n0, the successor of key 7. A sketch of this lookup over the finger tables built earlier follows.
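A minimal sketch of the lookup loop over the finger tables from the previous sketch: forward to the closest preceding finger until the node just before the key's successor is reached (same illustrative m = 6 ring):

```python
import bisect

M = 6
RING = 2 ** M
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(ident: int) -> int:
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(start: int, key: int) -> tuple:
    """Route from node `start` toward `key`; return (owner, hops)."""
    n, hops = start, 0
    while successor((n + 1) % RING) != successor(key):
        fingers = [successor(n + 2 ** i) for i in range(M)]
        preceding = [f for f in fingers if between(f, n, key)]
        # closest preceding finger: the one that makes the most progress
        n = max(preceding, key=lambda f: (f - n) % RING)
        hops += 1
    return successor(key), hops

print(lookup(8, 54))   # (56, 2): key 54 is owned by node 56, two hops away
```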

56 Chord Summary
Routing table size? log N fingers.
Routing time? Each hop is expected to halve the distance to the desired key, so expect O(log N) hops.
Note: this is only basic Chord; many practical issues remain (not covered in this course, though …).

57 A few words about BitCoin (and other digital/virtual currencies)
Two key issues for a currency:
generation (where does it come from?)
distribution (how is it used, i.e., buy/sell transactions?)
BitCoin, an open-source P2P currency, answers them with mining (hashing) and verification (the blockchain).

58 Chapter 2: Summary
Our study of network apps is now complete!
Application service requirements: reliability, bandwidth, delay.
Client-server paradigm.
Internet transport service model: connection-oriented, reliable (TCP); unreliable datagrams (UDP).
Specific protocols: HTTP, FTP, SMTP/POP3/IMAP, DNS.
Content distribution: caches, CDNs, P2P, cloud.

59 Chapter 2: Summary
More importantly, we learned about protocols:
Typical request/reply message exchange: the client requests info or a service; the server responds with data and a status code.
Message formats: headers (fields giving info about the data) and data (the info being communicated).
Recurring themes: control vs. data messages (in-band vs. out-of-band); centralized vs. decentralized; stateless vs. stateful; reliable vs. unreliable message transfer; "complexity at the network edge" (many protocols).
Security: authentication.

