CMPE 252A : Computer Networks


1 CMPE 252A : Computer Networks
Chen Qian, Computer Engineering, UCSC Baskin Engineering
Application layer

2 Submission site
Reading reports: Canvas
Survey: Canvas
Project: GitLab
Introduction

3 Internet protocol stack
application: supporting network applications (FTP, SMTP, HTTP)
transport: process-to-process data transfer (TCP, UDP)
network: routing of datagrams from source to destination (IP, routing protocols)
link: data transfer between neighboring network elements (Ethernet, WiFi, PPP)
physical: bits "on the wire"

4 Encapsulation
[Diagram: from source to destination, each layer wraps the data from the layer above with its own header. The application message M becomes a transport segment Ht|M, then a network datagram Hn|Ht|M, then a link frame Hl|Hn|Ht|M. A switch processes frames up through the link layer; a router processes packets up through the network layer, stripping and re-adding headers as it forwards.]
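The header nesting can be sketched in a few lines of Python; this toy model (the function and the "|" format are mine, only the Ht/Hn/Hl notation comes from the slide) just shows the order in which headers wrap the message:

```python
# Toy model of layered encapsulation: each layer prepends its header
# (Ht, Hn, Hl follow the slide's notation) to the payload from above.
def encapsulate(message: str) -> str:
    segment = f"Ht|{message}"     # transport layer adds Ht -> segment
    datagram = f"Hn|{segment}"    # network layer adds Hn -> datagram
    frame = f"Hl|{datagram}"      # link layer adds Hl -> frame
    return frame

print(encapsulate("M"))  # Hl|Hn|Ht|M
```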

5 Application layer
Application-layer architectures
Case study: BitTorrent
P2P search and Distributed Hash Table (DHT)

6 Some network apps
e-mail, web, instant messaging, remote login, P2P file sharing, multi-user network games, streaming stored video clips, voice over IP, real-time video conferencing, VoD (YouTube), Facebook, Twitter, cloud, …

7 Application architectures
Client-server
Peer-to-peer (P2P)
Hybrid of client-server and P2P

8 Client-server architecture
server: always-on host, permanent IP address, server farms for scaling
clients: communicate with the server (speak first), may have dynamic IP addresses, do not communicate directly with each other
Notes: An asymmetric client/server protocol is easier to design than a symmetric one. The server usually has a well-known port number for a particular service, so it is easy (and necessary) for the client to speak first; port numbers 0 to 1023 are reserved for well-known services. In P2P, a peer has to get the address and port number of another peer from a super node or a tracker, i.e., a bootstrap protocol (perhaps based on the client-server paradigm) is needed to get started. The sketch below illustrates "the client speaks first."
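A minimal socket sketch of the asymmetry, using an arbitrary example port (5000) rather than a reserved well-known one; the server binds a known port and waits, while the client must initiate:

```python
import socket

def run_server(port: int = 5000) -> None:
    # Server: always-on, bound to an agreed-upon port, knows nothing
    # about clients in advance -- it can only wait to be spoken to.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("0.0.0.0", port))
        srv.listen()
        conn, _addr = srv.accept()      # blocks until a client connects
        with conn:
            data = conn.recv(1024)      # the client speaks first
            conn.sendall(b"echo: " + data)

def run_client(host: str, port: int = 5000) -> None:
    # Client: may have a dynamic address; it initiates the conversation.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((host, port))
        cli.sendall(b"hello")
        print(cli.recv(1024).decode())
```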

9 Pure P2P architecture
no always-on server
arbitrary end systems communicate directly
peers are intermittently connected and change IP addresses
highly scalable but difficult to manage
Notes: The NAT traversal problem (see Wikipedia) will be discussed later. Each peer is like one of today's client machines: it may not have a permanent IP address and is not always on. Routing is hard (flooding); even finding the IP address of another peer is hard. Client-server interaction is still used, but each peer runs both client and server software. The architecture is scalable in the sense that adding a new peer adds new demand but also new capacity; the management complexity, however, may not scale (e.g., the use of flooding).

10 Hybrid of client-server and P2P
Skype: a voice-over-IP P2P application
centralized server: finds the address of the remote party
client-client connection: direct (not through the server)
Notes: Direct P2P can be difficult because of NAT in firewalls; a workaround is needed (the NAT traversal problem, see Wikipedia). While chatting between two users can be P2P, most IM services use a server to relay messages; messages can therefore be stored and are not confidential. Chatting P2P without a server also has difficulty crossing a firewall that disallows high-numbered ports. From Wikipedia: Skype also routes calls through other Skype peers on the network to ease the crossing of symmetric NATs and firewalls. This, however, puts an extra burden on those who connect to the Internet without NAT, as their computers and network bandwidth may be used to route the calls of other users.

11 File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file of size F from one server to N peers?
u_s: server upload capacity (bps)
u_i: peer i upload capacity (bps)
d_i: peer i download capacity (bps)
[Diagram: server with upload capacity u_s and peers (u_1, d_1) … (u_N, d_N) connected through a network with abundant bandwidth.]

12 File distribution time: server-client
The server sequentially sends N copies, taking NF/u_s time; client i takes F/d_i time to download. The time to distribute F to N clients using the client/server approach is therefore

d_cs >= max { NF/u_s , F/min_i(d_i) }

For the second term, assume the server uploads to clients in round-robin fashion so that clients can download concurrently. The bound increases linearly in N (for large N).
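The bound is easy to state as code; this one-function sketch (the function name is mine) is reused in the comparison a few slides below:

```python
def d_cs(F: float, u_s: float, d: list[float]) -> float:
    """Client-server lower bound on distribution time: the larger of
    the server's total upload time and the slowest client's download."""
    N = len(d)
    return max(N * F / u_s, F / min(d))
```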

13 File distribution time: P2P
The server must send one copy, taking F/u_s time; client i takes F/d_i time to download; NF bits must be downloaded in aggregate, and the fastest possible aggregate upload rate is u_s + sum_{i=1..N} u_i. Hence

d_P2P >= max { F/u_s , F/min_i(d_i) , NF/(u_s + sum_i u_i) }

Notes: Think of the network as a large buffer. For the third term, all clients need to be uploading at the same time as the server, which can be accomplished in a manner similar to pipelining in our datagram packet-switching diagram. The last term is small when N is small, but as N increases it converges to F divided by the average upload rate, which is smaller than u_s and (because upload is usually slower than download) smaller than d_min. In the limit as N becomes large, the last term determines the value of d_P2P.
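The companion sketch for the P2P bound, matching d_cs above:

```python
def d_p2p(F: float, u_s: float, u: list[float], d: list[float]) -> float:
    """P2P lower bound: the server's single copy, the slowest download,
    and the aggregate upload capacity of server plus all peers."""
    N = len(d)
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))
```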

14 Client-server vs. P2P: example
Client upload rate u, F/u = 1 hour, u_s = 10u, d_min >= u_s.
Notes: In the limit as N goes to infinity, the average upload rate decreases to u, so the minimum P2P distribution time approaches F/u = 1 hour. This assumes just one server; if we used N servers the distribution time would not go up, but the cost would.
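Plugging the slide's parameters into the two bound functions above (with u = 1 and F = 1 so that F/u is one hour) reproduces the comparison numerically:

```python
u, F, u_s = 1.0, 1.0, 10.0          # F/u = 1 hour, u_s = 10u
for N in (10, 100, 1000):
    d = [u_s] * N                    # d_min = u_s, downloads never bind
    up = [u] * N
    print(N, d_cs(F, u_s, d), d_p2p(F, u_s, up, d))
# d_cs grows linearly (N/10 hours), while d_p2p = N/(10+N) hours
# stays below, and converges to, F/u = 1 hour as N grows.
```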

15 File distribution: BitTorrent
P2P file distribution
torrent: group of peers exchanging chunks of a file
tracker: tracks peers participating in the torrent
a joining peer obtains a list of peers and begins trading chunks
Notes: A peer initially downloads a metafile from a content provider in order to connect to a torrent. The metafile specifies the name and size of the file, fingerprints of the data blocks in the file, and a tracker server for the torrent (this is not in the Kurose and Ross book). Later, in the DHT section, KR mention that BitTorrent uses the Kademlia DHT to create a distributed tracker: the key is the torrent identifier, and by querying the DHT with that key a new peer can determine the peer responsible for tracking the peers in the torrent. Each torrent has an infrastructure node called a tracker. When a peer joins a torrent, it registers itself with the tracker and periodically informs the tracker that it is still in the torrent. A given torrent may have tens of peers or more than a thousand. When Alice joins, the tracker selects a subset of peers (say 50) from the set of participating peers and sends their IP addresses to Alice. Alice then attempts to open concurrent TCP connections to all the peers on the list; those with which she succeeds become her neighboring peers. At any time each peer has a subset of the chunks of the file, and Alice asks each of her neighboring peers for a list of the chunks they have.

16 BitTorrent (1)
file divided into 256 KB chunks
peer joining torrent: registers with the tracker to get a list of peers, connects to a subset of peers ("neighbors"); it has no chunks, but will accumulate them over time
while downloading, a peer uploads chunks to other peers
peers may come and go
once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
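A toy sketch of the tracker's bookkeeping, assuming the "subset of about 50 peers" behavior described in the previous slide's notes (the class and method names are mine, not BitTorrent's protocol):

```python
import random

class Tracker:
    """Toy tracker: records which peers are in each torrent and hands a
    random subset (up to 50, per the notes) to every joining peer."""
    def __init__(self) -> None:
        self.torrents: dict[str, set[tuple[str, int]]] = {}

    def join(self, torrent_id: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        peers = self.torrents.setdefault(torrent_id, set())
        others = list(peers - {peer})          # candidate neighbors
        peers.add(peer)                        # register the newcomer
        return random.sample(others, min(50, len(others)))

tracker = Tracker()
tracker.join("torrent-1", ("10.0.0.1", 6881))         # first peer: [] back
print(tracker.join("torrent-1", ("10.0.0.2", 6881)))  # [('10.0.0.1', 6881)]
```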

17 BitTorrent (2)
Sending chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate; she re-evaluates the top 4 every 10 secs
every 30 secs she randomly selects another peer and starts sending it chunks ("optimistic unchoke")
Pulling chunks
at any given time, different peers have different subsets of the file's chunks
periodically, a peer (Alice) asks each neighbor for the list of chunks it has
Alice requests her missing chunks, rarest first
Notes (see the sketch below): In deciding which chunks to request, Alice determines, from among the chunks she does not have, those that are rarest among her neighbors (the chunks with the fewest copies among her neighbors) and requests those first. Peers capable of uploading at compatible rates tend to find each other. All neighbors besides the top four and the one randomly selected neighbor are said to be choked. Tit-for-tat eliminates free-riding nodes. There are different BitTorrent clients, some of which try to game the system. Modulo TCP effects, and assuming last-hop bottleneck links, each peer provides an equal share of its available upload capacity to the peers to which it is actively sending data; this rate is called the peer's equal split rate. It is determined by the peer's upload capacity and the size of its active set. In the official reference implementation of BitTorrent, the active set size is proportional to upload capacity, although in other popular BitTorrent clients this size is static.
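The two selection rules can be sketched in a few lines; the function names and tie-breaking are mine, and only the "rarest first" and "top four plus one optimistic unchoke" policies come from the slide:

```python
from collections import Counter

def rarest_first(missing: set[int], neighbor_chunks: list[set[int]]) -> list[int]:
    """Order Alice's missing chunks by how few neighbors hold each one;
    chunks no neighbor has are skipped (they cannot be requested yet)."""
    counts = Counter(c for chunks in neighbor_chunks for c in chunks)
    return sorted((c for c in missing if counts[c] > 0), key=lambda c: counts[c])

def unchoked(rates: dict[str, float], optimistic: str) -> set[str]:
    """The four neighbors sending at the highest rates, plus one
    randomly chosen 'optimistically unchoked' peer; the rest are choked."""
    top4 = sorted(rates, key=rates.get, reverse=True)[:4]
    return set(top4) | {optimistic}

# Chunk 7 is held by one neighbor, chunk 2 by two -> request 7 first.
print(rarest_first({2, 7}, [{1, 2, 7}, {1, 2}, {1}]))  # [7, 2]
```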

18 Finding a better trading partner
(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates
(3) Bob becomes one of Alice's top-four providers
Notes: The optimistic unchoke serves to bootstrap new peers into the tit-for-tat game as well as to facilitate the discovery of new, potentially better sources of data. With a higher upload rate, a peer can find better trading partners and get the file faster!

19 P2P: searching for information
"Index" in a P2P system: maps information to peer location (location = IP address & port number)
Notes: The "index" may be a server or a distributed system; on this slide it is an abstraction. As an example, each file shared using eMule is hashed using the MD4 algorithm. The top-level MD4 hash, file size, filename, and several secondary search attributes such as bit rate and codec are stored on eD2k servers and on the serverless Kad network. The Kad network is a peer-to-peer (P2P) network that implements the Kademlia P2P overlay protocol. The majority of users on the Kad network are also connected to servers on the eDonkey network, and Kad network clients typically query known nodes on the eDonkey network in order to find an initial node on the Kad network.

20 P2P: centralized index
original "Napster" design:
1) when a peer connects, it informs the central directory server of its IP address and its content
2) Alice queries the server for "Hey Jude"
3) Alice requests the file directly from Bob
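A toy version of the three steps, showing that only the index is centralized while the transfer itself is peer-to-peer (the class and names are illustrative, not Napster's actual protocol):

```python
class CentralIndex:
    """Napster-style directory: maps content names to peer locations."""
    def __init__(self) -> None:
        self.index: dict[str, list[tuple[str, int]]] = {}

    def register(self, peer: tuple[str, int], files: list[str]) -> None:
        for name in files:                       # step 1: peer reports content
            self.index.setdefault(name, []).append(peer)

    def query(self, name: str) -> list[tuple[str, int]]:
        return self.index.get(name, [])          # step 2: lookup only

idx = CentralIndex()
idx.register(("10.0.0.2", 7070), ["Hey Jude"])   # Bob connects and registers
print(idx.query("Hey Jude"))  # step 3: Alice fetches the file from Bob directly
```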

21 P2P: problems with centralized directory
single point of failure
performance bottleneck
scalability
copyright infringement: the "target" of a lawsuit is obvious
File transfer is decentralized, but locating content is highly centralized. Google shows that a single server farm can be reliable and scalable, so the main issue is legal.

22 Query flooding
overlay network: a graph with an edge between peers X and Y if there is a TCP connection between them
all active peers and edges form the overlay network; an edge is a virtual (not physical) link
a given peer is typically connected to fewer than 10 overlay neighbors
fully distributed: no central server
used by Gnutella
Notes: Each peer indexes the files it makes available for sharing (and no other files). Gnutella is a public-domain file-sharing application; the protocol spec is minimal, leaving significant flexibility in how the client is implemented.

23 Query flooding
Query messages are sent over existing TCP connections; peers forward Query messages; a QueryHit is sent over the reverse path; the file transfer itself uses HTTP.
Notes: A peer-count (TTL) field in the Query message is set to a specific limit (say, 7), giving limited-scope flooding and hence scalability. The QueryHit message follows the reverse path of the Query message over preexisting TCP connections; this requires the Query message ID and per-peer state. The client later sets up a direct TCP connection for the download using an HTTP GET; note that this path is outside the overlay network and may run into trouble if the node to download from is behind a firewall. The forwarding rule is sketched below.
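A sketch of limited-scope flooding under an assumed peer interface (the .neighbors, .has_file, .send_queryhit, and .forward_query members are hypothetical, not Gnutella's real message formats):

```python
def flood_query(peer, query_id: str, keyword: str, ttl: int,
                seen: set[str]) -> None:
    """Forward a Query to all overlay neighbors, decrementing the
    peer-count/TTL field and suppressing duplicates by query ID."""
    if ttl == 0 or query_id in seen:
        return                              # scope limit or already seen
    seen.add(query_id)                      # state needed for the reverse path
    if peer.has_file(keyword):
        peer.send_queryhit(query_id)        # QueryHit retraces the Query path
    for n in peer.neighbors:
        n.forward_query(query_id, keyword, ttl - 1)
```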

24 Hierarchical Overlay
hybrid of the centralized-index and query-flooding approaches
each peer is either a super node or assigned to a super node
TCP connection between a peer and its super node; TCP connections between some pairs of super nodes
a super node tracks the content of its children
Notes: Typically a group leader has a few hundred child peers. Each group leader is an ordinary peer, but with higher bandwidth and connectivity than other peers. There is an overlay network among the group leaders; it is similar to Gnutella's network in that a group leader can forward queries from its children to other group leaders (perhaps also with limited-scope flooding). Who wants to be the super node? A sketch of the two tiers follows below.
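A minimal sketch of the two-tier idea, assuming local indexing at each super node and shallow flooding among leaders (all names are mine, not any real client's API):

```python
class SuperNode:
    """Toy super node: indexes its children's content locally and
    forwards unresolved queries to neighboring super nodes."""
    def __init__(self) -> None:
        self.local_index: dict[str, list[str]] = {}   # file -> child peer IDs
        self.leaders: list["SuperNode"] = []          # overlay among leaders

    def register_child(self, child_id: str, files: list[str]) -> None:
        for f in files:
            self.local_index.setdefault(f, []).append(child_id)

    def query(self, name: str, ttl: int = 2) -> list[str]:
        hits = list(self.local_index.get(name, []))
        if not hits and ttl > 0:             # limited-scope flooding of leaders
            for sn in self.leaders:
                hits += sn.query(name, ttl - 1)
        return hits
```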

25 Distributed Hash Table (DHT)
Next lecture

