What about the Network? CS 525 Spring 2009 Advanced Distributed Systems
End-to-End Arguments in System Design J.H. Saltzer, D.P. Reed and D.D. Clark M.I.T. Laboratory for Computer Science Presented by: Abdullah Al-Nayeem
Where to Place Functionalities? Example: reliable file transfer. Should reliability be implemented per hop by the communication subsystem, or end-to-end by the host applications? 4/14/2009 Department of Computer Science, UIUC
Where to Place Functionalities? Possible failures in file transfer: – Disk access failure (hardware) – Packet drop or duplicated packet (communication) – File system error (software) The communication subsystem cannot by itself guarantee reliability. – Implementing reliability there also increases network complexity – It adds overhead for applications that do not require reliability The application layer can provide full reliability, even without any support from the lower layers of the network. – End-to-end checksum and retry
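A minimal sketch of the end-to-end check, assuming a hypothetical `unreliable_transfer` that stands in for everything between the two applications (disk, network, file system) and occasionally corrupts a byte. Only the end points compare a SHA-256 digest of the whole file, so a fault anywhere along the way is caught by one test, and the remedy is to retry the whole transfer.

```python
import hashlib
import random
from typing import Tuple

def unreliable_transfer(data: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical stand-in for disk + network + file system: may flip one byte."""
    if data and rng.random() < error_rate:
        buf = bytearray(data)
        buf[rng.randrange(len(buf))] ^= 0xFF        # corrupt a single byte
        return bytes(buf)
    return data

def end_to_end_transfer(data: bytes, rng: random.Random,
                        error_rate: float = 0.3, max_tries: int = 20) -> Tuple[bytes, int]:
    """Retry the whole transfer until the receiver's digest matches the sender's."""
    expected = hashlib.sha256(data).hexdigest()      # computed at the sending end
    for attempt in range(1, max_tries + 1):
        received = unreliable_transfer(data, rng, error_rate)
        if hashlib.sha256(received).hexdigest() == expected:   # checked at the receiving end
            return received, attempt
    raise OSError(f"transfer failed after {max_tries} attempts")
```

The names `error_rate` and `max_tries` are illustrative; a real application would send the digest alongside the file rather than share it in memory.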
End-to-End Argument (E2EA) The lower layers of the network are not the right place to implement application-specific functions – Move functions “up and out” “The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible.”
Typical Examples – Bit error recovery – Security using encryption – Duplicate message suppression – Recovery from system crashes – Delivery acknowledgement
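Duplicate message suppression illustrates why the check belongs at the end points: duplicates can originate above the network (for example, an application that retries after a timeout or a crash), where no per-link filter can see them. A minimal sketch, assuming each message carries a hypothetical `(sender, msg_id)` pair assigned by the sending application:

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Message:
    sender: str
    msg_id: int      # hypothetical: unique per sender, assigned by the sending application
    payload: str

@dataclass
class DedupReceiver:
    seen: Set[Tuple[str, int]] = field(default_factory=set)   # (sender, msg_id) already delivered
    delivered: List[str] = field(default_factory=list)

    def receive(self, msg: Message) -> bool:
        """Deliver msg once; return False if it is a duplicate."""
        key = (msg.sender, msg.msg_id)
        if key in self.seen:
            return False          # suppress; a real receiver might still re-send the ack
        self.seen.add(key)
        self.delivered.append(msg.payload)
        return True
```

A practical receiver would bound `seen` (for example with a sliding window of sequence numbers) instead of letting it grow forever.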
Benefits of E2EA The core network can be simpler and faster Fewer assumptions are required about the network More flexibility in developing new network technologies and applications – Helped the proliferation of the Internet Dumb networks, intelligent hosts
Extension of E2EA Lower layers may implement partial application-specific functions, but only as performance improvements. – e.g. reducing retries in data transmissions Should the level of reliability in the network be higher than the reliability the application expects? What are the possible tradeoffs? – Short-term performance vs. long-term flexibility – Performance vs. cost
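A back-of-the-envelope model makes the performance tradeoff concrete. Assume, purely for illustration, a path of `n` hops, each independently losing or corrupting a packet with probability `p`. With end-to-end retry only, a whole attempt succeeds with probability `(1 - p)^n`, so the expected number of attempts is `1 / (1 - p)^n`. With unlimited per-hop retries, each hop costs `1 / (1 - p)` transmissions on average and, in this model, the first end-to-end attempt always succeeds (the application still keeps its end-to-end check for failures outside the links). The sketch counts expected hop transmissions under both strategies:

```python
def end_to_end_only(p: float, n: int):
    """Expected (end-to-end attempts, hop transmissions) when only the ends retry."""
    success = (1 - p) ** n                                   # whole path survives one attempt
    hops_per_attempt = sum((1 - p) ** i for i in range(n))   # hop i+1 is used only if hops 1..i succeeded
    attempts = 1 / success                                   # geometric number of attempts
    return attempts, attempts * hops_per_attempt             # Wald's identity

def with_hop_retry(p: float, n: int):
    """Expected (end-to-end attempts, hop transmissions) when every hop retries until success."""
    return 1.0, n / (1 - p)
```

For `p = 0.1` and `n = 10`, end-to-end retry alone needs about 2.87 attempts and roughly 18.7 hop transmissions, versus about 11.1 with per-hop retry: the partial lower-layer function pays off as a performance improvement, not as a replacement for the end-to-end check.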
Identifying the Ends VoIP: the human user is the end point File Transfer: the application is the end point Only the end points know how to guarantee the required reliability [Figure: Voice over IP (voice) and File Transfer (files)]
Moving Away from E2EA Hosts are not always trustworthy – Security attacks, e.g. denial of service E2EA does not guarantee congestion control – Unfriendly hosts Communication is not always between two end points – Multicast, broadcast How does the network handle these circumstances?
Other Issues ISP control, filtering, network monitoring Government interventions More subtle end points – Anonymous users using third-party services – Cloud computing entities (SaaS user, SaaS provider, Cloud provider) Do these factors imply the end of E2EA?
Summary The end-to-end argument is not an absolute rule, but a design tool The end-to-end argument can help in organizing “layered” communication systems
Consensus Routing: The Internet as a Distributed System John P. John¹, Ethan Katz-Bassett¹, Arvind Krishnamurthy¹, Thomas Anderson¹, Arun Venkataramani² ¹Dept. of Computer Science, Univ. of Washington, Seattle ²University of Massachusetts Amherst 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2008 Presented by: Ahmed Khurshid
Motivation Internet routing protocols (both intra- and inter-domain) usually favor responsiveness over consistency – A new route is incorporated into the forwarding table before it is propagated to neighbors This results in routing loops and blackholes Usually no extra effort is made to ensure consensus – Solutions have been proposed for intra-domain routing
Motivation – Routing loop [Figure: Left, a link failure causing BGP loops at ASes 2 and 3. Right, a policy change causing BGP loops at 2 and 3 when AS 4 withdraws a prefix from 2 and 3 but not from 6; 2 and 3 each prefer the other over 6. Also labeled: Minimum Route Advertisement Interval (MRAI) timer.]
Motivation – Blackhole [Figure: iBGP link recovery causing blackholes; route AP is preferred over CD; recovered link.]
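Both failure modes can be stated precisely over a snapshot of forwarding state. A small sketch, assuming a hypothetical map from each AS to its current next hop for a single destination prefix (`None` or a missing entry meaning no route): following next hops either reaches the destination, revisits an AS (a loop), or stops at an AS with no route (a blackhole).

```python
from typing import Dict, Optional

def trace(fib: Dict[str, Optional[str]], src: str, dest: str) -> str:
    """Follow next hops from src toward dest and classify the outcome."""
    visited = set()
    node = src
    while node != dest:
        if node in visited:
            return "loop"            # packet would circulate until its TTL expires
        visited.add(node)
        nxt = fib.get(node)          # None or missing entry: no route at this AS
        if nxt is None:
            return "blackhole"       # packet is dropped here
        node = nxt
    return "delivered"
```

For example, `trace({"2": "3", "3": "2"}, "2", "D")` reports a loop, mirroring ASes 2 and 3 each preferring the other.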
Consensus Routing A consistency-first approach that cleanly separates the safety and liveness of routing – Safety: all routers use a consistent route towards a destination (i.e. no loops) – Liveness: quick reaction to failures and policy changes Uses two simple ideas to ensure both consistent behavior and quick reaction: 1. Runs a distributed coordination algorithm to ensure a globally consistent view of routing state 2. Forwards packets using one of two logically distinct modes
Stable Mode Unlike BGP, consensus routing does not immediately incorporate a newly learned route into the forwarding table Periodically, all routers engage in a distributed coordination algorithm that determines the most recent set of complete updates The coordination is based on classical distributed snapshot and consensus algorithms – Chandy-Lamport snapshot algorithm – Paxos The output of the coordination is used to compute a set of stable forwarding tables (SFTs) that are guaranteed to be consistent SFTs replace traditional FIBs (Forwarding Information Bases)
Stable Mode – Update Log Routers store route advertisements/withdrawals in the update log without modifying the SFT [Figure: example AS hierarchy with ASes A to K in Tier-1, Tier-2, and Tier-3 (stub) levels, plus users.]
Stable Mode – Distributed Snapshot Updates in the snapshot may be complete or incomplete [Figure: marker messages propagating through the AS hierarchy (Tier-1, Tier-2, Tier-3 stub ASes, users).]
Stable Mode – Aggregation Snapshots are sent to the consolidators Tier-1 ASes are good candidates for being consolidators. Why? – Better reachability – Longevity – Full-mesh topology among the ASes [Figure: snapshots flowing from the AS hierarchy to the Tier-1 consolidators.]
Stable Mode – Consensus Consolidators run Paxos to agree upon a global view by extracting the incomplete updates from the reported snapshots [Figure: Paxos messages exchanged among the consolidators.]
Stable Mode – Flood The flooded message contains the set of incomplete updates (I) and the set of ASes (S) that successfully responded to the snapshot [Figure: flooding messages from the consolidators through the AS hierarchy.]
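A simplified sketch of what the consolidators hand back, assuming the Paxos round has already fixed which snapshot reports are included: S is the set of ASes whose report arrived, and I is taken here as the union of their reported incomplete sets I_A. Any stricter rule the real protocol applies (for example, also marking updates that depend on incomplete ones) is not modeled, and the report layout is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Trigger = Tuple[int, int]   # (AS number, trigger number)

def aggregate(reports: Dict[int, Tuple[List[Trigger], Set[Trigger]]]
              ) -> Tuple[FrozenSet[Trigger], FrozenSet[int]]:
    """reports maps AS number -> (H_A, I_A) for every AS whose snapshot arrived."""
    responded = frozenset(reports)              # S: ASes that answered the snapshot
    incomplete: Set[Trigger] = set()
    for _history, pending in reports.values():
        incomplete |= pending                   # I: union of the reported I_A sets
    return frozenset(incomplete), responded
```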
Stable Mode SFT Computation – The SFT is computed using the global set of incomplete updates (I) and the local logs – Routes involving ASes not present in S are not placed in the SFT What happens to those ASes? How does this strategy achieve consensus in an asynchronous system?
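The two SFT rules above (skip incomplete updates, skip routes through ASes outside S) can be sketched as a filter over the local log. As a simplification, this takes the newest qualifying logged route per prefix rather than re-running full route selection over only the complete updates; the log layout (trigger, AS path) is a hypothetical assumption.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Trigger = Tuple[int, int]                    # (AS number, trigger number)
LogEntry = Tuple[Trigger, Tuple[int, ...]]   # (trigger, AS path toward the prefix)

def compute_sft(log: Dict[str, List[LogEntry]],
                incomplete: FrozenSet[Trigger],
                responded: FrozenSet[int]) -> Dict[str, Optional[int]]:
    """Map each prefix to a stable next-hop AS, or None if no stable route exists."""
    sft: Dict[str, Optional[int]] = {}
    for prefix, entries in log.items():
        sft[prefix] = None                           # no stable route found yet
        for trigger, path in reversed(entries):      # newest logged route first
            if trigger in incomplete:
                continue                             # update is still incomplete somewhere
            if not path or any(asn not in responded for asn in path):
                continue                             # empty path, or it crosses an AS outside S
            sft[prefix] = path[0]                    # first AS on the path is the next hop
            break
    return sft
```

A prefix left at `None` has no stable route, which is one of the situations in which transient-mode forwarding takes over.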
Router State Routing Information Base (RIB) – Stores, for each prefix: the most recent route update received from each neighbor; the locally selected best route; the route advertised to each neighbor History – Stores, for each prefix, a chronological list of received and selected routes in the RIB Stable Forwarding Table (SFT) – Stores the next-hop interfaces corresponding to stable routes
Triggers Each update carries a trigger A trigger is a globally unique identifier for a set of causally related events propagating through the network – It is a two-tuple: (AS number, trigger number) Triggers make it easier to track updates and reduce control overhead in consensus routing A router ‘A’ stores all received triggers in its local History Triggers still being processed are temporarily stored in a local set I_A
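The trigger bookkeeping described above maps onto two small data structures. A sketch with hypothetical names (`Trigger`, `RouterState`, `new_trigger`, `receive`, `finish`); the RIB and SFT are omitted to keep it short.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Set

@dataclass(frozen=True)
class Trigger:
    asn: int      # AS where the event originated
    number: int   # per-AS counter, so (asn, number) is globally unique

@dataclass
class RouterState:
    asn: int
    history: List[Trigger] = field(default_factory=list)  # every trigger received, in order
    pending: Set[Trigger] = field(default_factory=set)    # I_A: triggers still being processed
    _counter: Iterator[int] = field(default_factory=lambda: itertools.count(1),
                                    repr=False, compare=False)

    def new_trigger(self) -> Trigger:
        """Label a locally originated event (e.g. a link failure or policy change)."""
        return Trigger(self.asn, next(self._counter))

    def receive(self, t: Trigger) -> None:
        self.history.append(t)
        self.pending.add(t)

    def finish(self, t: Trigger) -> None:
        """Processing done (e.g. the resulting update was sent to neighbors)."""
        self.pending.discard(t)
```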
Distributed Coordination During a snapshot, router ‘A’ saves the sequence of triggers in its local History as H_A and prepares a set of incomplete triggers (I_A) that contains – All the triggers currently in I_A – Triggers waiting in the outgoing queues – Logged triggers received over incoming channels (after the start of the current snapshot round) H_A and I_A are sent to the consolidators
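The three sources of I_A listed above translate directly into a set union. A sketch with hypothetical inputs: `outgoing_queues` holds the triggers of updates not yet sent to each neighbor, and `logged_incoming` holds triggers that arrived on incoming channels after the current snapshot round started (the channel state recorded by a Chandy-Lamport snapshot).

```python
from typing import Iterable, List, Set, Tuple

Trigger = Tuple[int, int]   # (AS number, trigger number)

def snapshot_report(history: List[Trigger], pending: Set[Trigger],
                    outgoing_queues: Iterable[Iterable[Trigger]],
                    logged_incoming: Iterable[Trigger]) -> Tuple[List[Trigger], Set[Trigger]]:
    """Build (H_A, I_A) for the consolidators."""
    h_a = list(history)                # H_A: copy of the local History, in order
    i_a = set(pending)                 # triggers still being processed
    for queue in outgoing_queues:
        i_a.update(queue)              # updates waiting to be sent to a neighbor
    i_a.update(logged_incoming)        # arrived on a channel after the snapshot began
    return h_a, i_a
```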
View Change [Figure: source X sends a packet to destination Y through routers A, B, C, D, E. Routes to prefix Y in each router's SFT: k-th SFT: A: B->C->D, B: C->D, C: D, D: Y; (k+1)-th SFT: A: B->C->E, B: C->E, C: E, D: Y, E: Y. Routers that have the (k+1)-th SFT use it; a router that hasn't finished computing the (k+1)-th SFT yet uses the k-th SFT.]
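The figure shows routers holding different SFT versions during a view change: a router that has not finished computing the (k+1)th SFT keeps forwarding with the kth. A sketch of that fallback, assuming (this is not stated on the slide) that each packet is tagged with an SFT version and a router uses the newest SFT it holds that is not newer than the tag; `Router`, `install`, and `next_hop` are hypothetical names.

```python
from typing import Dict, Optional

class Router:
    """Hypothetical router that keeps several SFT versions during a view change."""

    def __init__(self, name: str):
        self.name = name
        self.sfts: Dict[int, Dict[str, str]] = {}    # SFT version -> {prefix: next hop}

    def install(self, version: int, table: Dict[str, str]) -> None:
        """Called once this router has finished computing SFT `version`."""
        self.sfts[version] = table

    def next_hop(self, prefix: str, tag: int) -> Optional[str]:
        """Use SFT `tag` if present, else the newest older SFT (assumed rule)."""
        usable = [v for v in self.sfts if v <= tag]
        if not usable:
            return None    # no usable SFT; transient-mode handling is not modeled
        return self.sfts[max(usable)].get(prefix)
```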
Transient Mode Consensus routing switches to this mode when – The next-hop router along a stable route is unreachable – No stable route is available Uses several known schemes – Route deflection – Detour routing – Backup routes
Route Deflection After encountering a failed link, deflect the packet to a neighboring AS after consulting the RIB If no neighbor can be chosen, deflect the packet back to the sending AS (backtracking) – However, backtracking alone is not sufficient to guarantee reachability (see figure) [Figure: Limitations of backtracking; routes toward destination D through AS 5 (5-D; 1-5-D, 2-5-D, 3-5-D).]
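One forwarding decision in transient mode, combining the steps above: keep the stable next hop if its link is up, otherwise deflect to another neighbor found in the RIB, otherwise backtrack to the AS the packet came from. A sketch in which the RIB layout (prefix mapped to advertising neighbors in preference order) and the directed `failed` link set are assumptions for illustration; as the slide notes, this rule alone does not guarantee reachability.

```python
from typing import Dict, List, Optional, Set, Tuple

def forward_transient(node: str, dest_prefix: str, stable_next: Optional[str],
                      rib: Dict[str, List[str]], failed: Set[Tuple[str, str]],
                      came_from: Optional[str]) -> Optional[str]:
    """Return the neighbor to send the packet to, or None to drop it."""
    if stable_next is not None and (node, stable_next) not in failed:
        return stable_next                        # stable route still usable
    for neighbor in rib.get(dest_prefix, []):
        if neighbor != came_from and (node, neighbor) not in failed:
            return neighbor                       # deflect to an alternate neighbor
    if came_from is not None and (node, came_from) not in failed:
        return came_from                          # backtrack toward the sender
    return None                                   # nowhere to send it
```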
Other Transient Schemes Detour Routing – After encountering a failed link, select a neighboring AS (arbitrarily) and tunnel transient packets to it – Tier-1 ASes are good choices in this selection Backup Routes – Use pre-computed backup routes to forward packets during failure
Evaluation Simulation methodology – CAIDA AS-level graphs gathered from RouteViews BGP tables, including 23,390 ASes and 46,095 links annotated with the inferred business relationships of the linked ASes An XORP prototype is used to measure implementation overhead PlanetLab nodes are used to measure the cost of consensus
Link Failure In each experiment, one of the links of a multi-homed stub AS fails Consensus routing provides significantly higher levels of connectivity than BGP
Effect of Traffic Engineering Withdraw a subprefix from all but one of the providers (3 or more) of a multi-homed AS Consensus routing does not affect routing in case of policy changes
Overhead In terms of bandwidth and time, consensus routing incurs little overhead [Figures: control traffic required by consensus routing; delay incurred by consensus routing.]
Discussion Points Selection of consolidators – Will Tier-1 ASes (or other ASes) agree to perform this additional duty? Slow ASes may face periods of disconnection – How should this situation be handled? What can we say about the completeness and accuracy of this strategy? Will ASes readily cooperate to handle transient packets?
CAIDA Tools Presented by: Abdullah Al-Nayeem
CAIDA The Cooperative Association for Internet Data Analysis (CAIDA) – San Diego Supercomputer Center (SDSC), UCSD CAIDA provides data, tools and analyses of Internet traffic for a better understanding of – current and future network topology, routing, security, performance and economic issues.
CAIDA Tools Measurement – Tools for active or passive measurement of Internet traffic and flow patterns Utilities – Utilities to aid analysis of Internet traffic and flow patterns Visualization – Tools to visualize Internet data
Internet Measurement Infrastructure Archipelago (Ark): CAIDA’s next-generation active measurement infrastructure – An evolution of the skitter infrastructure 33 active monitors in different countries.
Scamper Measurement tool used at Ark monitors Teams of Scamper probers probe all routed /24s in a short period of time: – a random address in each /24 prefix is probed approximately every 48 hours (one probing cycle) – Supports ICMP-Paris, TCP, and UDP traceroute Features: – Measures forward IP paths – Measures round-trip time – Discovers the path maximum transmission unit (MTU)
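The target-selection step described above (one random address per routed /24 per cycle, with the work shared across a team of monitors) can be sketched with Python's standard `ipaddress` module. This illustrates the idea only and is not scamper's or Ark's code; `pick_targets` and `split_among_team` are hypothetical helpers, and the round-robin split is an assumption.

```python
import ipaddress
import random
from typing import List, Sequence

def pick_targets(prefixes: Sequence[str], rng: random.Random) -> List[ipaddress.IPv4Address]:
    """Pick one random host address inside each routed /24."""
    targets = []
    for p in prefixes:
        net = ipaddress.IPv4Network(p)
        if net.prefixlen != 24:
            raise ValueError(f"{p} is not a /24 prefix")
        offset = rng.randrange(1, net.num_addresses - 1)   # skip .0 and .255
        targets.append(net.network_address + offset)
    return targets

def split_among_team(targets: List, n_monitors: int) -> List[List]:
    """Deal targets round-robin so a team of monitors shares one probing cycle."""
    return [targets[i::n_monitors] for i in range(n_monitors)]
```

Re-running `pick_targets` each cycle with a fresh random state probes a different address in each /24, which spreads coverage over time.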
Scamper Datasets IPv4 Routed /24 Topology Dataset – Useful for understanding the topology of the Internet IPv4 Routed /24 AS Links Dataset – Contains Autonomous System (AS) links derived from the IP paths of the Topology Dataset – RouteViews BGP data is used to map IP addresses to ASes
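Deriving AS links from IP paths amounts to mapping each hop to an AS by longest-prefix match against a BGP-derived prefix table, then recording a link wherever consecutive hops fall in different ASes. A simplified sketch, assuming a hypothetical in-memory `prefix_table` (prefix to origin AS) of the kind one might build from RouteViews dumps; real pipelines also handle multi-origin prefixes and IXP addresses, and the treatment of unmapped hops here is a crude assumption.

```python
import ipaddress
from typing import Dict, List, Optional, Set, Tuple

def ip_to_as(addr: str, prefix_table: Dict[str, int]) -> Optional[int]:
    """Longest-prefix match of addr against a prefix -> origin AS table."""
    ip = ipaddress.ip_address(addr)
    best_len, best_as = -1, None
    for prefix, asn in prefix_table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as

def as_links(ip_path: List[str], prefix_table: Dict[str, int]) -> Set[Tuple[int, int]]:
    """Directed AS links implied by one forward IP path."""
    links: Set[Tuple[int, int]] = set()
    prev: Optional[int] = None
    for hop in ip_path:
        asn = ip_to_as(hop, prefix_table)
        if asn is None:
            prev = None            # unmapped hop: do not infer a link across it
            continue
        if prev is not None and asn != prev:
            links.add((prev, asn))
        prev = asn
    return links
```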
Visualization of IPv4 Internet Topology 1-17 Jan, ,853,991 IPv4 addresses, 5,682,419 IP links, 17,791 ASes The outdegree of an AS is the number of next-hop ASes that were observed accepting traffic from this AS
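Given directed AS links of the form (from, to), the outdegree defined above is the number of distinct next-hop ASes per AS. A sketch, assuming the link list comes from a dataset such as the AS links described earlier; ASes that never appear as a link source are simply absent (outdegree 0).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def outdegree(links: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Count distinct next-hop ASes observed for each source AS."""
    next_hops: Dict[int, Set[int]] = defaultdict(set)
    for a, b in links:
        if a != b:                 # ignore self-links
            next_hops[a].add(b)    # duplicates collapse in the set
    return {asn: len(hops) for asn, hops in next_hops.items()}
```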
RRDTool Round Robin Database tool – A system to store and display time-series data – e.g. network bandwidth, machine-room temperature, server load average Features: – Fixed-size archives, no matter how long data is collected – The oldest entries are overwritten when an archive is full Limitations: – Cannot add data for past events – Cannot add data twice for the same timestamp
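The fixed-size, overwrite-when-full behavior and the two update restrictions listed above can be illustrated with a ring buffer. A minimal sketch; `RoundRobinArchive` is a hypothetical class, not RRDtool's API, and it omits RRDtool's consolidation of samples into fixed time steps (AVERAGE, MAX, and so on).

```python
from typing import List, Optional, Tuple

class RoundRobinArchive:
    """Fixed-size time-series store in the spirit of an RRD archive."""

    def __init__(self, size: int):
        self.slots: List[Optional[Tuple[int, float]]] = [None] * size
        self.next = 0                         # slot to write next (oldest once full)
        self.last_ts: Optional[int] = None

    def update(self, ts: int, value: float) -> None:
        # Mirrors the limitations: no past timestamps, no second value at the same one.
        if self.last_ts is not None and ts <= self.last_ts:
            raise ValueError("timestamp must be newer than the last update")
        self.slots[self.next] = (ts, value)   # overwrites the oldest entry once full
        self.next = (self.next + 1) % len(self.slots)
        self.last_ts = ts

    def values(self) -> List[Tuple[int, float]]:
        ordered = self.slots[self.next:] + self.slots[:self.next]   # oldest first
        return [s for s in ordered if s is not None]
```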
RRDTool (2) Example: Statistics for network interfaces
Beluga Provides a real-time graph of RTTs and packet loss to an end host [Figure: Stanford to m-root-server (Tokyo)]
Walrus Directed-graph visualization tool in 3D space A meaningful spanning tree of the graph is required for good visualization.
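As the slide notes, Walrus needs a spanning tree of the graph. One simple way to obtain a meaningful tree for measured topology is breadth-first search from the measurement source, so tree edges follow the order in which paths fan out from the monitor. A sketch, assuming the graph is a hypothetical adjacency dictionary; producing Walrus's input file format is not shown.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional

def bfs_spanning_tree(adj: Dict[Hashable, List[Hashable]],
                      root: Hashable) -> Dict[Hashable, Optional[Hashable]]:
    """Return parent pointers of a breadth-first spanning tree rooted at root."""
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in parent:      # first discovery fixes the tree edge
                parent[nbr] = node
                queue.append(nbr)
    return parent
```

Nodes unreachable from the root do not appear in the result, so a disconnected graph needs one tree per component.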
Thanks Questions and Comments?