Part III: Overlays, peer-to-peer


1 Part III: Overlays, peer-to-peer
Jinyang Li. In addition to my own contributions, many of the slides are borrowed liberally from networking class notes by Robert Morris, Hari Balakrishnan, David Andersen, and Nick Feamster.

2 Overlays are everywhere
The Internet is an overlay on top of telephone networks
Overlays: a network on top of the Internet
- Endpoints (instead of routers) are nodes
- Multi-hop paths among routers are links
- Instant deployment!

3 What can overlays do?
Routing
- Improve routing robustness (e.g., convergence speed)
- Multicast
- Anonymous communication
New applications
- Peer-to-peer file sharing and lookup
- Content distribution networks
- Peer-to-peer live streaming
- Your imagination is the limit

4 Why overlays?
The Internet is ossified
- IPv6 proposed in 1992, still not widely deployed
- Multicast (1988), QoS (early 90s), etc.
Avoid burdening routers with new features
End hosts are cheap and capable
- Copy and store files
- Perform expensive cryptographic operations
- Perform expensive coding/decoding operations

5 Today's class
Overlays that take over routers' jobs
- Resilient Overlay Networks (RON)
- Application-level multicast (NICE)

6 RON's motivation
Internet routing is not reliable
Paxson 95-97 (measured 40,000 end-to-end routes)
- 3.3% of all routes had serious problems
Labovitz 97-00
- 10% of routes available < 95% of the time
- 65% of routes available < 99.9% of the time
- 3-minute minimum detection + recovery time; often 15 minutes
- 40% of outages took 30+ minutes to repair
Chandra 01
- 5% of faults last more than 2.75 hours

7 Internet routing is unsatisfactory
- Slow to detect outages and recoveries
- Unable to use multiple redundant paths
- Unable to detect badly performing paths
- Applications have no control over paths
Q: Why can't we fix BGP? BGP must be scalable:
- Topology information is highly summarized (due to policy and scalability requirements)
- Routing updates must be damped to prevent oscillation
- BGP does not respond to traffic conditions (to prevent oscillation)
Q2: Hasn't multi-homing already solved the fault tolerance problem?
- Multi-homing also recovers only slowly

8 BGP converges slowly
Given a failure, it can take up to 15 minutes for BGP to converge; sometimes it never does. [Feamster]

9 RON in a nutshell
A small set of (<100) nodes forming an overlay on top of the scalable BGP-based IP routing substrate
What failures does RON target?
- Outages: configuration/software errors, broken links
- Performance failures: severe congestion, DoS attacks

10 RON's goals
Fast failure detection and recovery
- Detect & fail over within seconds
Applications influence path selection
- Applications define failures
- Applications define path metrics
Expressive and fine-grained policies
- Who and what applications are allowed to use what paths

11 Why would RON work?
RON testbed study (2003): about 60% of failures occur within two hops of the edge
RON routes around many link "failures"
- It can do so whenever there exists a node whose paths to S and D do not contain the failed link
- RON cannot route around an access link failure

12 RON design
[Architecture diagram: RON nodes in different ASes; each node runs the RON library with a conduit, forwarder, router, prober, performance database, and policy routing module]
- Link-state routing protocol that disseminates its info using RON itself!
- Application-specific routing tables
- Policy routing module
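To make the division of labor concrete, here is a minimal sketch with invented names (RonNode, probe_fn, metric_fn are illustrative, not the paper's interfaces) of how a node could tie the prober, performance database, and forwarder together: probe every peer, merge peers' link-state announcements, and choose either the direct path or a one-hop indirect path under an application-defined metric.

```python
class RonNode:
    def __init__(self, my_addr, peers, probe_fn, metric_fn):
        self.my_addr = my_addr
        self.peers = peers              # the other RON nodes (< 100 of them)
        self.probe_fn = probe_fn        # probe_fn(dst) -> {"latency": ..., "loss": ...}
        self.metric_fn = metric_fn      # application-defined path metric over leg measurements
        self.perf_db = {}               # performance database: (src, dst) -> latest measurement

    def probe_all(self):
        # Prober: measure the direct Internet path to every other node.
        for dst in self.peers:
            self.perf_db[(self.my_addr, dst)] = self.probe_fn(dst)

    def merge_announcement(self, measurements):
        # Router: link-state style dissemination -- peers send their own
        # (src, dst) measurements over the overlay itself.
        self.perf_db.update(measurements)

    def best_path(self, dst):
        # Forwarder policy: compare the direct path against every one-hop
        # indirect path, scored by the application's own metric.
        direct = self.perf_db[(self.my_addr, dst)]
        candidates = [(self.metric_fn([direct]), [dst])]
        for mid in self.peers:
            if mid == dst:
                continue
            legs = [self.perf_db.get((self.my_addr, mid)), self.perf_db.get((mid, dst))]
            if all(legs):
                candidates.append((self.metric_fn(legs), [mid, dst]))
        return min(candidates, key=lambda c: c[0])[1]
```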

13 RON reduces loss rate
[Scatter plot: 30-min average loss rate on the Internet vs. 30-min average loss rate with RON]
RON's loss rate is never more than 30%

14 RON routes around failures
30-minute average loss rates (number of samples):

Loss Rate   RON Better   No Change   RON Worse
10%         479          57          47
20%         127          4           15
30%         32
50%         20
80%         14
100%        10

(Show as hours, not samples?)
- 6,825 "path hours" represented here
- 5 "path hours" of 100% loss (complete outage)
- 38 "path hours" of TCP outage (>= 30% loss)
- RON routed around all of these!
One indirection hop provides almost all the benefit!

15 Resilience Against DoS Attacks

16 Throughput Improvement

17 Lessons of RON
- End hosts know more about performance and outages than routers do
- Internet routing trades away performance and fast failover for scalability
- A small amount of redundancy goes a long way

18 RON's tradeoff
[Diagram: where BGP and routing overlays fall along three axes]
- Scalability
- Performance (fast convergence etc.)
- Flexibility (application-specific metrics & policy)
BGP sits at the scalability end; routing overlays (e.g., RON) trade scalability for performance and flexibility

19 Open questions
Efficiency
- Generates redundant traffic on access links
Scaling
- Probing traffic is O(N^2)
- Can a RON be made to scale to >50 nodes?
- Is a 1000-node RON much better than a 50-node one?
Interaction of overlays and the IP network
Interaction of multiple overlays
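A back-of-the-envelope sketch of the scaling concern; the probe size and interval below are illustrative assumptions, not RON's actual parameters:

```python
# Aggregate probe traffic under all-pairs probing grows quadratically in N.
# probe_bytes and probe_interval_s are assumed values for illustration.

def aggregate_probe_kbps(n_nodes, probe_bytes=100, probe_interval_s=10):
    pairs = n_nodes * (n_nodes - 1)            # every node probes every other node
    return pairs * probe_bytes * 8 / probe_interval_s / 1000

for n in (10, 50, 1000):
    print(f"{n:5d} nodes: {aggregate_probe_kbps(n):10.1f} Kbps of aggregate probe traffic")
```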

20 Application-level multicast
A.k.a. overlay multicast or end-host multicast

21 Why multicast?
Send the same stream of data to many hosts
- Internet radio/TV/conferencing
- Stock quote dissemination
- Multiplayer network games
An efficient way to send data to many hosts
Multicast operates at packet granularity

22 Naïve approach is wasteful
Unicasting to each receiver means the sender's outgoing link carries n copies of the data
- A 128 Kbps mp3 stream with 10,000 listeners = 1.28 Gbps

23 IP multicast service model
Mimics LAN broadcast: anyone can send, everyone hears
Uses multicast addresses (2^28 of them)
- Each address is called a "group"
End hosts register with routers to receive packets
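On the receiver side, "registering with routers" is just a socket option: joining a group triggers IGMP toward the local router, which then delivers that group's packets to the host. A minimal sketch (the group address 239.1.2.3 and port are arbitrary choices for illustration):

```python
import socket
import struct

GROUP = "239.1.2.3"   # any address in 224.0.0.0/4 (the 2^28 group addresses)
PORT = 5004

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# IP_ADD_MEMBERSHIP takes a struct ip_mreq: {group address, local interface}.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, sender = sock.recvfrom(1500)   # anyone can send; every group member hears
```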

24 Basic multicast techniques
Construct trees
- Why trees? (Why not meshes?)
- How many trees? Shared vs. source-specific trees
- What are the criteria for a "good" tree?
- Who builds the trees? Routers vs. end hosts

25 IP multicast
Routers construct multicast trees for packet replication and forwarding
Efficient (low latency, no duplicate packets on links)

26 IP multicast: augmenting DV
How can we broadcast using DV routing tables without creating loops?
Idea: the shortest paths from S to all nodes form a tree
RPF protocol: a router duplicates and forwards a packet only if it arrived via the router's shortest path back to S
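A minimal sketch of that RPF forwarding decision, with illustrative data structures rather than a real router's:

```python
def rpf_forward(packet_src, arrival_iface, routing_table, interfaces):
    # routing_table maps destination -> (next_hop_iface, cost), as built by DV.
    shortest_path_iface, _ = routing_table[packet_src]
    if arrival_iface != shortest_path_iface:
        return []                      # off-tree copy: drop it, so no loop forms
    # On-tree copy: flood it out of every interface except the one it arrived on.
    return [i for i in interfaces if i != arrival_iface]
```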

27 Reverse path flooding (RPF)
[Figure: four routers a, b, c, d with their DV routing tables; the a <--> c link has cost 10, the other links cost 1]
C does not forward packets from A (they do not arrive via C's shortest path to A), and vice versa
However, link a <--> c still sees two copies of each packet

28 Reverse path broadcast (RPB)
RPF causes every 'upstream' router on a LAN (link) to send a copy
RPB: only one router sends a copy
- Routers listen to each other's DV advertisements
- Only the router with the lowest hop count to the source sends
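The RPB refinement can be sketched as a purely local election among the routers attached to a LAN (the names below are illustrative):

```python
def should_forward_onto_lan(my_id, my_cost_to_src, neighbor_ads):
    # neighbor_ads: {router_id: cost_to_src} heard in DV advertisements on this LAN.
    candidates = dict(neighbor_ads)
    candidates[my_id] = my_cost_to_src
    # Designated forwarder: lowest cost to the source, ties broken by router ID.
    winner = min(candidates, key=lambda rid: (candidates[rid], rid))
    return winner == my_id
```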

29 IP multicast: augmenting DV
- Requires symmetric paths
- Needs to prune unnecessary broadcast packets to achieve multicast
[Deering et al., SIGCOMM 1988; TOCS 1990]

30 IP multicast: augmenting LS
Basic LS: each router floods changes in link state
LS with multicast: routers monitor local multicast group membership, and membership changes are also flooded
Routers use Dijkstra to compute shortest-path trees
How expensive is it to compute trees for N nodes, E edges, and G groups?
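A rough answer: one Dijkstra run per source, so with a binary heap the total work is on the order of G times O(E log N). A minimal sketch of the per-source computation (illustrative, not a router implementation):

```python
import heapq

def shortest_path_tree(graph, source):
    # graph: {node: [(neighbor, cost), ...]}; returns child -> parent edges of the tree.
    dist, parent, heap = {source: 0}, {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, cost in graph.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return parent
```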

31 IP multicast has not taken off
Requires support from routers
- Do ISPs have incentives to support multicast?
Not scalable
- Routers keep state for every active group!
- Multicast group addresses cannot be aggregated
- Group membership changes much more frequently than links going up and down
Difficult to provide congestion/flow control, reliability, and security

32 Overlay multicast
Multicast code runs on end hosts
- End hosts can copy & store data
No change to the IP infrastructure needed
Easy to implement complex functionality: flow control, security, layered multicast, etc.
Less efficient: higher delay, duplicate packets per link

33 Overlay multicast challenge
How can hosts form an efficient tree?
- Hosts do not know all that routers know
What's wrong with a random tree?
- Stretch: packets travel farther than they have to
- Stress: packets traverse the same link multiple times
- A particular concern for access links and cross-country links
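A minimal sketch of how the two metrics could be computed for a candidate overlay tree, assuming we can measure latencies and map overlay hops onto underlying IP links (ip_path below is a hypothetical helper, not part of any standard API):

```python
from collections import Counter

def stretch(overlay_path_latency, direct_latency):
    # e.g. source -> relay -> receiver latency vs. direct source -> receiver latency.
    return overlay_path_latency / direct_latency

def link_stress(overlay_edges, ip_path):
    # ip_path(a, b) -> list of underlying IP links that overlay hop (a, b) traverses.
    counts = Counter()
    for a, b in overlay_edges:
        counts.update(ip_path(a, b))
    return counts          # IP link -> number of copies of each packet it carries
```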

34 Bad tree vs good tree

35 Cluster-based trees (NICE)
[Figure: a hierarchy of clusters; ordinary members reside in 1 cluster, cluster heads in 2, and higher-level heads in 3]
- A hierarchy of clusters
- Each cluster consists of [k, 3k-1] members
- O(log N) depth

36 Cluster-based trees (NICE)
Each node knows all members of its cluster(s)

37 Cluster-based trees
Cluster nodes according to latency
- Packets do not travel too far out of the way
Not perfect
- Packets are sent to cluster heads (who sit in the middle), so they might overshoot

38 NICE in action
How does a node join the hierarchy?
- Which is the right cluster?
- How long does a join take?
How to split/merge clusters?
What if a cluster head fails?
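A minimal sketch of the join procedure as these slides describe it, with invented data structures rather than the NICE implementation's: the new node starts at the top layer and descends, at each layer probing the candidate cluster heads and moving toward the closest one.

```python
def nice_join(new_node, layers, rtt):
    # layers[i]: list of clusters at layer i (bottom layer first, top layer last);
    # each cluster is {"head": node, "members": [...]}; rtt(a, b) measures latency.
    candidates = layers[-1]                      # top layer: a single cluster
    for layer in reversed(range(len(layers))):
        closest = min(candidates, key=lambda c: rtt(new_node, c["head"]))
        if layer == 0:                           # bottom layer: join this cluster
            closest["members"].append(new_node)
            return closest
        # Descend: the next candidates are the layer-below clusters headed by
        # members of the chosen cluster (this assumes low latency is transitive).
        heads = set(closest["members"])
        candidates = [c for c in layers[layer - 1] if c["head"] in heads]
```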

39 When does clustering not work well?
[Example topology: MIT, Harvard, and Boston U reached via Cogent and MCI; MIT & Harvard peer with each other]
Key assumption: low latency is transitive
- As a node descends the tree to join, it assumes the children of a close-by cluster head are also close-by

40 What did you learn today?

41 Lessons
Where should a piece of functionality reside? Routers vs. end hosts
- Routers: scalability, efficiency
- End hosts: performance, flexibility, instant deployment!

42 Project draft report
You should be able to reuse your draft for the final report
- You should have complete related work by now
- You should have a complete plan
- Most of the system design
- Most of the experiment designs
- If you have preliminary graphs, use them and try to explain them

43 The sandwich method for explanation
1. An easy example illustrating the basic idea
2. Detailed explanations of the challenges and how your system addresses them
3. Does it work in general environments?
Projector problem: contact Andrew Case, WWH 1022

