Akamai vs. Flash Crowds and Distributed Denial of Service
  Akamai Technologies & Carnegie Mellon
  Bruce Maggs
Outline
  Akamai
  Content Delivery on 9/11
  Impact of the “Slammer” Worm
  FirstPoint
  SiteShield
Akamai Services and Products windowsupdate.microsoft.com/
Akamai’s Platform for Delivering Content and Applications Akamai Servers at Network Edge Content Providers End Users NAP
Current Installations Network Deployment Servers Networks 65+ Countries
Content Delivery Using Akamai
  <html><head> Welcome to xyz.com! </head><body> <img src=“ Welcome to our Web site! Click here to enter </body></html>
  Embedded URLs are Converted to ARLs ak
End User Akamai DNS Resolution Akamai High-Level DNS Servers 10 g.akamai.net 1 Browser’s Cache OS 2 Local Name Server 3 xyz.com’s nameserver 6 ak.xyz.com 7 a212.g.akamai.net Akamai Low-Level DNS Servers 12 a212.g.akamai.net xyz.com .com .net Root (InterNIC) akamai.net 8 select cluster select servers within cluster
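The two-level lookup in this diagram (follow the CNAME from ak.xyz.com to a212.g.akamai.net, let the high-level DNS select a cluster, then let the low-level DNS select servers within that cluster) can be sketched as a toy model. This is only an illustration: the cluster names, the region map, the IP addresses (taken from documentation ranges) and the hash-based server ordering are all assumptions, not Akamai's actual data or selection logic.

```python
# Toy model of the two-level lookup; every table below is made-up sample data.

# xyz.com's nameserver: aliases the content provider's hostname to Akamai.
CNAMES = {"ak.xyz.com": "a212.g.akamai.net"}

# Hypothetical clusters, each listing the servers its low-level DNS can return.
CLUSTERS = {
    "cluster-east": ["192.0.2.10", "192.0.2.11"],
    "cluster-west": ["198.51.100.20", "198.51.100.21"],
}

# Assumed map from the local name server's region to a preferred cluster.
CLUSTER_FOR_REGION = {"east": "cluster-east", "west": "cluster-west"}


def high_level_dns(resolver_region):
    """Select a cluster (stands in for the high-level DNS servers)."""
    return CLUSTER_FOR_REGION[resolver_region]


def low_level_dns(cluster, hostname):
    """Select servers within the cluster (stands in for the low-level DNS servers)."""
    servers = CLUSTERS[cluster]
    # Assumption: a stable hash of the hostname decides which server is listed first.
    start = sum(hostname.encode()) % len(servers)
    return servers[start:] + servers[:start]


def resolve(hostname, resolver_region):
    """Follow the CNAME, pick a cluster, then pick servers within it."""
    target = CNAMES.get(hostname, hostname)
    cluster = high_level_dns(resolver_region)
    return low_level_dns(cluster, target)


print(resolve("ak.xyz.com", "east"))
```

The split mirrors the slide: the cluster choice and the per-cluster server choice are separate functions, so each could change on its own schedule.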
Content Delivery on 9/11
  Akamai’s network had capacity for all content providers requesting service
  Total bits served on September 11 was approximately 3.5 times normal
  Traffic was higher on September 12
  (But not as high as January 7, 2002)
News Site A – FreeFlow Traffic
News Site A – FreeFlow Streaming
News Site B – EdgeSuite Traffic
News Site B – FreeFlow Traffic
News Site B – FreeFlow Streaming
Portal A – FreeFlow traffic
Sports Site A – FreeFlow traffic
Steve Jobs Keynote
Impact of Sapphire/Slammer Worm
  Web site performance severely impacted
  Congestion in core of Internet
  Significant route flapping
Military Web Site - Performance
71 content providers; 17 agents
Military Web Site - Reliability
Video
Aggregate Routing Activity 11:30 PM EST Friday
Routing Activity by Network 11:30 PM EST Friday
DOS attacks
  Coordinated attacks
    -From multiple compromised machines
    -On website or upstream
    -Goal: to overwhelm
  Hacker-based
    -e.g., Microsoft, Yahoo!
  Voluntary sit-ins
    -e.g., World Economic Forum
Microsoft
What is FirstPoint
  Traffic management system for mirrored websites
  Directs browser to the optimal mirror
  DNS based
  Application level anycast
Why FirstPoint
  Content providers have mirrored websites
  Content providers only want to offload embedded content
    -Control
    -Security
    -Performance
Mapping Problem How to improve user experience?
What is the Mapping Problem
  Problem of directing requests to servers so as to optimize end-user experience
    -reduce latency
    -reduce loss
    -reduce jitter
  Assumption - servers are fine
  Applicable to 2 mirrors or 1500 Akamai locations
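One way to make the objective concrete is to score each candidate mirror by its measured latency, loss, and jitter and pick the lowest score. The sketch below is only an illustration: the weighted-sum cost, the weights, the `PathStats` type, and the mirror names are assumptions and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    latency_ms: float
    loss_rate: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float


def path_cost(stats, w_latency=1.0, w_loss=1000.0, w_jitter=2.0):
    """Hypothetical weighted sum; lower is better for the end user."""
    return (w_latency * stats.latency_ms
            + w_loss * stats.loss_rate
            + w_jitter * stats.jitter_ms)


def best_mirror(measurements):
    """Return the name of the mirror with the lowest path cost."""
    return min(measurements, key=lambda name: path_cost(measurements[name]))


measurements = {
    "mirror-a": PathStats(latency_ms=40, loss_rate=0.0, jitter_ms=2),
    "mirror-b": PathStats(latency_ms=20, loss_rate=0.05, jitter_ms=5),
}
print(best_mirror(measurements))  # mirror-a: lower cost despite higher latency
```

The same function works unchanged for 2 mirrors or for many locations, since it only ranks whatever candidates it is given.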
Attempt
  Measure which is closer
    -Closeness changes over time
  Measure frequently
    -Bothers people
    -Too many to do
  ~500,000 unique nameservers on any given day
  10 sec per measurement cycle
Idea
  Topology
    -relatively static
    -changes in BGP time
    -order of hours if not days
  Congestion
    -dynamic
    -changes in round-trip time
    -order of milliseconds
Topology Discovery - Proxy points Data exchange
Topology Discovery
  500,000 nameservers reduced to 90,000 proxy points (clusters)
Congestion Measurement
  Problem - Still too many measurements to do. 90,000 measurements every 10s with 32B packets requires a few Mbps per mirror.
  Solution - Importance based sampling
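The "few Mbps per mirror" figure can be checked with a short calculation. The sketch below assumes one 32-byte packet per measurement and counts only those bytes in one direction; header overhead and replies, which the slide does not break down, would push the number higher.

```python
def probe_bandwidth_bps(targets, packet_bytes, cycle_seconds):
    """Average bits per second to send one packet to every target each cycle."""
    return targets * packet_bytes * 8 / cycle_seconds


rate = probe_bandwidth_bps(targets=90_000, packet_bytes=32, cycle_seconds=10)
print(f"{rate / 1e6:.2f} Mbps")  # 2.30 Mbps
```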
CDF of End-user Load
Load Estimation
  500,000 nameservers reduced to 90,000 clusters
  Of the 90,000 clusters, 7,000 account for 95% of end-user load!
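The load skew is what makes importance-based sampling pay off: measure the heavy clusters often and the rest rarely. A minimal sketch of finding the heavy set, assuming a per-cluster load estimate is available as a dictionary (the `clusters_covering` helper and the sample numbers are hypothetical, not from the talk):

```python
def clusters_covering(load_by_cluster, fraction=0.95):
    """Smallest set of clusters, heaviest first, that covers `fraction` of total load."""
    total = sum(load_by_cluster.values())
    covered = 0.0
    chosen = []
    ranked = sorted(load_by_cluster.items(), key=lambda kv: kv[1], reverse=True)
    for cluster, load in ranked:
        if covered >= fraction * total:
            break
        chosen.append(cluster)
        covered += load
    return chosen


# Made-up load estimates with a heavy skew toward a few clusters.
sample_load = {"c1": 90, "c2": 6, "c3": 3, "c4": 1}
print(clusters_covering(sample_load))  # ['c1', 'c2']
```

Clusters outside the returned set could still be probed, just on a much slower cycle.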
Mapping Problem – Solved? Maps built every 10s
FirstPoint
  Customers - how to tell?
    -look for CNAME to akadns.net
  Customers - who?
    -High traffic content providers
    -Yahoo!, Microsoft, TicketMaster etc.
  Price - don’t ask :)
  Competitors - who?
    -one-of-a-kind service
    -boxes: Cisco, F5, Foundry
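The "look for CNAME to akadns.net" check can be sketched as a small helper. It assumes the CNAME chain for a hostname has already been collected (for example with a DNS lookup tool) and is passed in as a list of names; the `uses_akadns` function and the example hostnames are hypothetical.

```python
def uses_akadns(cname_chain):
    """True if any name in the CNAME chain is akadns.net or a name under it."""
    for name in cname_chain:
        # Drop the trailing root dot and compare whole labels, so that
        # a name like "notakadns.net" does not match by accident.
        labels = name.rstrip(".").lower().split(".")
        if labels[-2:] == ["akadns", "net"]:
            return True
    return False


print(uses_akadns(["www.example.com", "www.example.akadns.net."]))  # True
print(uses_akadns(["www.example.com", "www.notakadns.net"]))        # False
```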
FirstPoint - other aspects
  Load-balancing
    -estimate-based
    -feedback-based: https, snmp
    -cost-based: 95/5
  Fast cutout in case of failover
  Highly fault-tolerant
    -hardware duplication, leader election
    -overlay routing, BGP-based anycast
  Integration with other services
    -DOS/Load failover
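The slide does not expand "95/5". Assuming it refers to the common 95th-percentile bandwidth billing scheme (the top 5% of traffic samples in a billing period are discarded and the highest remaining sample is billed), the rate a cost-based balancer would try to keep low can be sketched as follows. The `billable_rate` helper and the sample values are hypothetical.

```python
def billable_rate(samples_mbps):
    """95th-percentile rate: drop the top 5% of samples, bill the highest one left."""
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # Integer form of ceil(0.95 * n) - 1, which avoids floating-point rounding.
    index = (95 * n + 99) // 100 - 1
    return ordered[index]


# 100 samples: bursts that fit inside the top 5% do not raise the bill.
print(billable_rate([10] * 95 + [1000] * 5))  # 10
print(billable_rate([10] * 94 + [1000] * 6))  # 1000
```

Under this assumption, a cost-aware balancer has an incentive to steer bursts toward mirrors that still have unused headroom in their top 5% of samples.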
SiteShield Content provider’s website Hacker! AKAMAI AKAMAI AKAMAI
SiteShield
  IP address of origin shielded
  Akamai can be attacked
  But Akamai will respond by
    -Diffusion: load balancing
    -Resurrection: reviving unpinned servers