CS 4700 / CS 5700 Network Fundamentals

Slides:



Advertisements
Similar presentations
CS 4700 / CS 5700 Network Fundamentals Lecture 15: Content Delivery Networks (Over 1 billion served … each day) Revised 10/22/2014.
Advertisements

1 Server Selection & Content Distribution Networks (slides by Srini Seshan, CS CMU)
CS 4700 / CS 5700 Network Fundamentals Lecture 13: Middleboxes and NAT (Duct tape for IPv4) Revised 3/9/2013.
On the Effectiveness of Measurement Reuse for Performance-Based Detouring David Choffnes Fabian Bustamante Fabian Bustamante Northwestern University INFOCOM.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
CDNs & Replication Prof. Vern Paxson EE122 Fall 2007 TAs: Lisa Fowler, Daniel Killebrew, Jorge Ortiz.
Anycast Jennifer Rexford Advanced Computer Networks Tuesdays/Thursdays 1:30pm-2:50pm.
1 Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2007 (MW 1:30-2:50 in Friend 004) Ioannis Avramopoulos Instructor:
Web Content Delivery Networks Yogesh Bhumralkar. CDN: Motivations zCongestion in the Internet. zWeb Servers sometimes become overloaded due to too many.
Tradeoffs in CDN Designs for Throughput Oriented Traffic Minlan Yu University of Southern California 1 Joint work with Wenjie Jiang, Haoyuan Li, and Ion.
Caching and Content Distribution Networks. Web Caching r As an example, we use the web to illustrate caching and other related issues browser Web Proxy.
Content Delivery Networks (CDN) Dr. Yingwu Zhu Reverse Proxy Reverse Proxy Reverse Proxy Intranet Web Cache Architecure Browser Local ISP cache L4 Switch.
Content Distribution Networks (CDNs) Mike Freedman COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
CSE 534 – Fundamentals of Computer Networks Lecture 11: Content Delivery Networks (Over 1 billion served … each day) Based on slides by D. NEU.
1. 1.Charting the CDNs(locating all their content and DNS servers). 2.Assessing their server availability. 3.Quantifying their world-wide delay performance.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
Ao-Jan Su, David R. Choffnes, Fabián E. Bustamante and Aleksandar Kuzmanovic Department of EECS Northwestern University Relative Network Positioning via.
Infrastructure for Better Quality Internet Access & Web Publishing without Increasing Bandwidth Prof. Chi Chi Hung School of Computing, National University.
Akamai vs. Flash Crowds and Distributed Denial of Service Akamai Technologies & Carnegie Mellon Bruce Maggs.
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
How Akamai Handles Large Events Bruce Maggs Carnegie Mellon Duke Akamai Technologies.
CDN: Content Distribution Networks  References:  CS613 textbook, “Computer Networking – A Top-Down Approach”, 6 th edition. Chapter  The text.
CS 6401 Overlay Networks Outline Overlay networks overview Routing overlays Resilient Overlay Networks Content Distribution Networks.
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
Content Distribution Networks (CDNs)
Engineering a Content Delivery Network Bruce Maggs.
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS.
Drafting Behind Akamai (Travelocity-Based Detouring) Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic and Fabián E. Bustamante Department of EECS Northwestern.
By: Hunza, Omar and Anum Chapter 4 pg(76-79).
CS 4700 / CS 5700 Network Fundamentals
Lecture 13 – Network Mapping
Lab A: Planning an Installation
Content Distribution Networks
REPLICATION & LOAD BALANCING
Coral: A Peer-to-peer Content Distribution Network
Engineering a Content Delivery Network
Content Distribution Networks
Caching Temporary storage of frequently accessed data (duplicating original data stored somewhere else) Reduces access time/latency for clients Reduces.
Semester 4 - Chapter 3 – WAN Design
CS 4700 / CS 5700 Network Fundamentals
Large Distributed Systems
LECTURE 34: WEB PROGRAMMING FOR SCALE
Utilization of Azure CDN for the large file distribution
ECE 671 – Lecture 16 Content Distribution Networks
Distributed Content in the Network: A Backbone View
Managing Online Services
Overlay Networking Overview.
Distributed Systems CS
LECTURE 32: WEB PROGRAMMING FOR SCALE
LECTURE 33: WEB PROGRAMMING FOR SCALE
Edge computing (1) Content Distribution Networks
NTHU CS5421 Cloud Computing
Content Distribution Networks
AWS Cloud Computing Masaki.
HTTP and Abstraction on the Internet / The Need for DNS
It Followed Me Home: Exploring Strong Last Hop Devices and CDNs
CSCI-351 Data communication and Networks
Engineering a Content Delivery Network
Content Delivery and Remote DNS services
EE 122: Lecture 22 (Overlay Networks)
Mobile IP Outline Homework #4 Solutions Intro to mobile IP Operation
Mobile IP Outline Intro to mobile IP Operation Problems with mobility.
LECTURE 33: WEB PROGRAMMING FOR SCALE
AKAMAI Content Delivery Services
CSCI-351 Data communication and Networks
Engineering a Content Delivery Network
The Evolution of a Content Delivery Network: A 21-Year Perspective
Presentation transcript:

CS 4700 / CS 5700 Network Fundamentals Christo Wilson 8/22/2012 CS 4700 / CS 5700 Network Fundamentals Lecture 15: Content Delivery Networks (Over 1 billion served … each day) Revised 10/19/2016 Defense

Outline Motivation CDN basics Prominent example: Akamai

Content in today’s Internet Most flows are HTTP Web is at least 52% of traffic Median object size is 2.7K, average is 85K (as of 2007) HTTP uses TCP, so it will Be ACK clocked For Web, likely never leave slow start Is the Internet designed for this common case? Why?

Evolution of Serving Web Content In the beginning… …there was a single server Probably located in a closet And it probably served blinking text Issues with this model Site reliability Unplugging cable, hardware failure, natural disaster Scalability Flash crowds (aka Slashdotting, Reddit hug, …)

Replicated Web service Christo Wilson 8/22/2012 Replicated Web service Internet Use multiple servers Advantages Better scalability Better reliability Disadvantages How do you decide which server to use? How to do synchronize state among servers? Defense

Christo Wilson 8/22/2012 Load Balancers Internet Device that multiplexes requests across a collection of servers All servers share one public IP Balancer transparently directs requests to different servers How should the balancer assign clients to servers? Random / round-robin When is this a good idea? Load-based When might this fail? Challenges Scalability (must support traffic for n hosts) State (must keep track of previous decisions) RESTful APIs reduce this limitation Defense

Load balancing: Are we done? Advantages Allows scaling of hardware independent of IPs Relatively easy to maintain Disadvantages Expensive Still a single point of failure Location! Where do we place the load balancer for Wikipedia?

Popping up: HTTP performance For Web pages RTT matters most Where should the server go? For video Available bandwidth matters most Is there one location that is best for everyone?

Server placement

Why speed matters Impact on user experience Users navigating away from pages Video startup delay

Why speed matters Impact on user experience Impact on revenue Users navigating away from pages Video startup delay Impact on revenue Amazon: increased revenue 1% for every 100ms reduction in PLT Shopzilla:12% increase in revenue by reducing PLT from 6 seconds to 1.2 seconds Ping from BOS to LAX: ~100ms

Strawman solution: Web caches ISP uses a middlebox that caches Web content Better performance – content is closer to users Lower cost – content traverses network boundary once Does this solve the problem? No! Size of all Web content is too large Zipf distribution limits cache hit rate Web content is dynamic and customized Can’t cache banking content What does it mean to cache search results?

Outline Motivation CDN basics Prominent example: Akamai

What is a CDN? Content Delivery Network Primary Goals Also sometimes called Content Distribution Network At least half of the world’s bits are delivered by a CDN Probably closer to 80/90% Primary Goals Create replicas of content throughout the Internet Ensure that replicas are always available Directly clients to replicas that will give good performance

Key Components of a CDN Distributed servers Usually located inside of other ISPs Often located in IXPs (coming up next) High-speed network connecting them Clients (eyeballs) Can be located anywhere in the world They want fast Web performance Glue Something that binds clients to “nearby” replica servers

Key CDN Components 16

Examples of CDNs Akamai Limelight Edgecast Others 147K+ servers, 1200+ networks, 650+ cities, 92 countries Limelight Well provisioned delivery centers, interconnected via a private fiber-optic connected to 700+ access networks Edgecast 30+ PoPs, 5 continents, 2000+ direct connections Others Google, Facebook, AWS, AT&T, Level3, Brokers

Inside a CDN Servers are deployed in clusters for reliability Some may be offline Could be due to failure Also could be “suspended” (e.g., to save power or for upgrade) Could be multiple clusters per location (e.g., in multiple racks) Server locations Well-connected points of presence (PoPs) Inside of ISPs

Mapping clients to servers CDNs need a way to send clients to the “best” server The best server can change over time And this depends on client location, network conditions, server load, … What existing technology can we use for this? DNS-based redirection Clients request www.foo.com DNS server directs client to one or more IPs based on request IP Use short TTL to limit the effect of caching

CDN redirection example choffnes$ dig a2.espncdn.com ;; QUESTION SECTION: ;a2.espncdn.com. IN A ;; ANSWER SECTION: a2.espncdn.com. 278 IN CNAME a.espncdn.com.edgesuite.net. a.espncdn.com.edgesuite.net. 3120 IN CNAME a1589.g1.akamai.net. a1589.g1.akamai.net. 20 IN A 207.210.142.16 a1589.g1.akamai.net. 20 IN A 207.210.142.11

DNS Redirection Considerations Advantages Uses existing, scalable DNS infrastructure URLs can stay essentially the same TTLs can control “freshness” Limitations DNS servers see only the DNS server IP Assumes that client and DNS server are close. Is this accurate? Small TTLs are often ignored Content owner must give up control Unicast addresses can limit reliability

CDN Using Anycast ? Anycast address 120.10.0.0/16 120.10.0.0/16 An IP address in a prefix announced from multiple locations 120.10.0.0/16 AS 41 AS 32 AS 31 AS 20 120.10.0.0/16 AS 3 AS 1 AS 2 ?

Anycasting Considerations Why do anycast? Simplifies network management Replica servers can be in the same network domain Uses best BGP path Disadvantages BGP path may not be optimal Stateful services can be complicated

Optimizing Performance Key goal Send clients to server with best end-to-end performance Performance depends on Server load Content at that server Network conditions Optimizing for server load Load balancing, monitoring at servers Generally solved

Optimizing performance: caching Where to cache content? Popularity of Web objects is Zipf-like Also called heavy-tailed and power law Nr ~ r-1 Small number of sites cover large fraction of requests Given this observation, how should cache-replacement work?

Optimizing performance: Network There are good solutions to server load and content What about network performance? Key challenges for network performance Measuring paths is hard Traceroute gives us only the forward path Shortest path != best path RTT estimation is hard Variable network conditions May not represent end-to-end performance No access to client-perceived performance

Optimizing performance: Network Example approximation strategies Geographic mapping Hard to map IP to location Internet paths do not take shortest distance Active measurement Ping from all replicas to all routable prefixes 56B * 100 servers * 500k prefixes = 500+MB of traffic per round Passive measurement Send fraction of clients to different servers, observe performance Downside: Some clients get bad performance

Outline Motivation CDN basics Prominent example: Akamai

Akamai case study Deployment Customers Overall stats 147K+ servers, 1200+ networks, 650+ cities, 92 countries highly hierarchical, caching depends on popularity 4 yr depreciation of servers Many servers inside ISPs, who are thrilled to have them Deployed inside100 new networks in last few years Customers 250K+ domains: all top 60 eCommerce sites, all top 30 M&E companies, 9 of 10 top banks, 13 of top 15 auto manufacturers Overall stats 5+ terabits/second, 30+ million hits/second, 2+ trillion deliveries/day, 100+ PB/day, 10+ million concurrent streams 15-30% of Web traffic

Somewhat old network map

Akamizing Links Embedded URLs are Converted to ARLs <html> <head> <title>Welcome to xyz.com!</title> </head> <body> <img src=“http://www.xyz.com/logos/logo.gif”> <img src=“http://www.xyz.com/jpgs/navbar1.jpg”> <h1>Welcome to our Web site!</h1> <a href=“page2.html”>Click here to enter</a> </body> </html> AK

DNS Redirection Web client’s request redirected to ‘close’ by server Client gets web site’s DNS CNAME entry with domain name in CDN network Hierarchy of CDN’s DNS servers direct client to 2 nearby servers Hierarchy of CDN DNS servers Internet Customer DNS servers Multiple redirections to find nearby edge servers Web replica servers (3) (4) Client is given 2 nearby web replica servers (fault tolerance) (5) Client gets CNAME entry with domain name in Akamai (2) (6) Client requests translation for yahoo LDNS (1) Web client

Mapping Clients to Servers Maps IP address of client’s name server and type of content being requested (e.g., “g” in a212.g.akamai.net) to an Akamai cluster. Special cases: Akamai Accelerated Network Partners (AANPs) Probably uses internal network paths Also may require special “compute” nodes General case: “Core Point” analysis

Do we need to measure every path? Core points Do we need to measure every path?

Core points Core point is the first router at which all paths to nameservers intersect Traceroute once per day from 300 clusters to 280,000 core points R1 R3 R2 R2

Core points 280,000 nameservers (98.8% of requests) reduced to 30,000 core points Nice order-of-magnitude reduction ping core points every 6 minutes More than 200 probes per second per server! Note the use of low-rate expensive measurements with high-rate cheap measurements

Key future challenges Mobile networks Video China Latency in cell networks is higher Internal network structure is more opaque Video 4k/8k UHD = 16-30K Kbps compressed 25K Tbps projected Big data center networks not enough (5 Tbps each) Multicast (from end systems) potential solution China Local laws prohibit content on servers in China w/o license Need to work with local CDNs to establish presence