Decoding Major Internet Outages in 2017


Decoding Major Internet Outages in 2017 Nitin Nayar Senior Solutions Engineer

AGENDA
Three major outages from 2017:
Marketo DNS
AWS S3 outage
Rostelecom route leak

What happens when a domain name expires? The Marketo Story

Marketo’s Domain Name Expiry
On July 25th at 4:25am PST, Marketo’s main domain started experiencing an outage. HTTP availability dipped to 60-70%, and network packet loss was ~20%.
ShareLink: https://ciuvrxmw.share.thousandeyes.com

Marketo’s Domain Name Expiry
Why is traffic being sent to AS 40034 (Confluence Networks)?

Marketo’s Domain Name Expiry: DNS Network Topology
Marketo has 2 DNS servers: ns1.marketo.com and ns2.marketo.com. Why is there a NEW DNS server, 208.91.197.132?
DNS Server tests were set up against the authoritative servers for the app.marketo.com CNAME, ns1.marketo.com and ns2.marketo.com. Both were resolving to the same bogus IP address we saw in the previous slide (there it was for app.marketo.com).
ShareLink: https://gmhux.share.thousandeyes.com

Marketo’s Domain Name Expiry: WHOIS Lookup
Nameservers used by “Network Solutions” for expired domains.
At the beginning of the Marketo outage, we performed a WHOIS lookup for marketo.com and got some interesting results. First, note that the creation and expiration dates both fall on July 23rd. The outage occurred on July 25th, which is suspiciously close to that date. Because domain name renewals generally occur on annual or multi-year cycles, it’s likely that the Marketo domain expired on July 23, 2017.
As further evidence of expiration, Marketo’s nameservers are listed as ns1.pendingrenewaldeletion.com and ns2.pendingrenewaldeletion.com. These are the nameservers that the registrar, Network Solutions, uses for domains that have expired. After querying these two nameservers for random domain names, we found that they always return the same IP address in Confluence Networks (208.91.197.132) that we saw previously, regardless of the domain name.
According to accounts of a similar event in 2013, Network Solutions hands expired domains to its partner, Confluence Networks, in order to monetize the traffic sent to those domain names. This is exactly what we saw: when Marketo’s domain expired, Network Solutions changed its nameservers to ns1.pendingrenewaldeletion.com and ns2.pendingrenewaldeletion.com, which direct traffic to one specific IP address in Confluence Networks (208.91.197.132).
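The two WHOIS signals described above (an expiration date at or near today, and nameservers under pendingrenewaldeletion.com) can be checked mechanically. The following is a minimal sketch, not a tool used in the talk: the simplified "Key: value" WHOIS layout, the function names, the 30-day window and the sample record are all assumptions made for illustration, and real WHOIS output varies by registry and registrar.

```python
from datetime import date

# Nameserver suffix the talk reports Network Solutions using for expired domains.
PARKING_SUFFIX = "pendingrenewaldeletion.com"


def parse_whois(text):
    """Collect the expiry date and nameservers from simplified 'Key: value' WHOIS text."""
    expiry, nameservers = None, []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        key, value = key.strip().lower(), value.strip()
        if key == "expiration date":
            expiry = date.fromisoformat(value[:10])  # keep only YYYY-MM-DD
        elif key == "name server":
            nameservers.append(value.lower())
    return expiry, nameservers


def expiry_warnings(text, today, renew_window_days=30):
    """Return human-readable warnings for a WHOIS record (hypothetical helper)."""
    expiry, nameservers = parse_whois(text)
    warnings = []
    if expiry is not None:
        days_left = (expiry - today).days
        if days_left < 0:
            warnings.append(f"expired {-days_left} day(s) ago")
        elif days_left <= renew_window_days:
            warnings.append(f"expires in {days_left} day(s)")
    parked = [ns for ns in nameservers if ns.endswith(PARKING_SUFFIX)]
    if parked:
        warnings.append("nameservers point at registrar parking: " + ", ".join(parked))
    return warnings


# Hand-written sample record for illustration, not actual WHOIS output.
SAMPLE = """\
Domain Name: marketo.com
Expiration Date: 2017-07-23
Name Server: ns1.pendingrenewaldeletion.com
Name Server: ns2.pendingrenewaldeletion.com
"""

if __name__ == "__main__":
    for warning in expiry_warnings(SAMPLE, today=date(2017, 7, 25)):
        print(warning)
```

Run on a schedule against a list of owned domains, a check along these lines would flag an upcoming expiry before the registrar swaps the nameservers.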

Marketo Outage Root Cause Summary
The outage was a direct result of the “marketo.com” domain name expiring. On expiry, traffic to Marketo was black-holed in a new network belonging to “Confluence Networks”.

AWS S3 Outage

AWS S3 Outage
AWS S3 (US-East region) experienced a massive outage on Feb 28th between 9:40am and 12:36pm PST. The impact was widespread, disrupting multiple services such as Quora, Coursera, Docker and Down Detector.
AWS S3 (Simple Storage Service) is a cloud object storage solution that many services rely on to store and retrieve files from anywhere on the web. In addition, many other AWS services that depend on S3 (Elastic Load Balancers, the Redshift data warehouse, Relational Database Service and others) also had limited to no functionality, which highlighted how closely AWS services depend on one another.
ShareLink: https://gokahptkc.share.thousandeyes.com

AWS S3 Outage Root Cause Analysis
100% packet loss / complete loss of TCP connectivity.
Root cause: human error that mistakenly took down more servers than intended.
Services may rely on S3 in a variety of ways:
A service may be directly hosted on S3, in the case of static websites. These services’ fates are tied to S3’s: during the outage, they would have suffered a complete outage.
A service may have objects on its web pages that are hosted on S3. In this case, the service would not be completely unavailable, but certain objects served out of S3 may be unable to load, and page load times may be impacted.
A service may have critical sub-services (such as user session management, customer records or media files) that depend on S3 or other impacted AWS services. This might manifest itself as a complete outage or reduced functionality.
As a result, the many affected services’ failure modes during the AWS S3 outage turned out to be telling indicators of the ways in which they rely on Amazon’s services.
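The first two dependency types above are visible from the outside, so a rough classification can be made from a page's URL and the URLs of the objects it loads. This is a minimal sketch under stated assumptions: the function names and category labels are hypothetical, the hostname check is a heuristic for common S3 endpoint names, and it cannot see a site that fronts S3 with its own domain name or the third type (back-end sub-services that use S3).

```python
from urllib.parse import urlparse

AWS_SUFFIX = ".amazonaws.com"


def is_s3_host(host):
    """True for hostnames that look like S3 endpoints (heuristic, not exhaustive)."""
    host = host.lower()
    if not host.endswith(AWS_SUFFIX):
        return False
    # Examples that match: s3.amazonaws.com, bucket.s3.amazonaws.com,
    # bucket.s3-website-us-east-1.amazonaws.com
    labels = host[: -len(AWS_SUFFIX)].split(".")
    return any(label == "s3" or label.startswith("s3-") for label in labels)


def classify_s3_dependency(page_url, object_urls):
    """Classify how a page visibly depends on S3 (labels are hypothetical)."""
    if is_s3_host(urlparse(page_url).hostname or ""):
        return "hosted-on-s3"  # fate tied to S3: complete outage expected
    if any(is_s3_host(urlparse(u).hostname or "") for u in object_urls):
        return "embeds-s3-objects"  # page loads, some objects fail or slow down
    return "no-visible-s3-dependency"  # back-end dependencies may still exist


if __name__ == "__main__":
    print(classify_s3_dependency(
        "https://www.example.com/",
        ["https://example-bucket.s3.amazonaws.com/logo.png"],
    ))
```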

Rostelecom BGP Route Leak

Rostelecom BGP Route Leak
On April 26th between 22:36 and 22:43 UTC, Rostelecom (Russia’s largest ISP) leaked dozens of routes. The affected IP prefixes belonged to financial services firms, e-commerce and payment services. 136 prefixes were affected (36 belonged to financial companies). Affected services included:
Mastercard SecureCode, Smart Data and MasterPass
Verified by Visa and Visa-owned CardinalCommerce
Symantec WebSecurity and Geotrust
RSA’s email servers
Online banking sites for French banks BNP Paribas and CIT, and Polish Bank Zachodni, owned by Santander
Traffic to the intended destinations was steered through Rostelecom’s network.
Notes: Of these 137 prefixes, approximately 100 belong to organizations in Russia and, as such, were likely announced as part of normal network operations. However, 36 prefixes belonged to companies such as Symantec, Visa, Mastercard, BNP Paribas and EMC.

Rostelecom BGP Route Leak
Rostelecom (AS 12389) advertised and withdrew routes to its neighbors. Peers such as Cogent (AS 174), Hurricane Electric (AS 6939) and Tata (AS 6453) accepted these routes and propagated them across the Internet.
Rostelecom (blue dotted circle) advertised and then withdrew routes (red dotted lines). Route Monitors (diamonds) in orange and red were impacted, while those in green continued sending traffic to the legitimate origin network (green circle).
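One way to notice an event like this is to compare the origin AS of each observed announcement with the origin you expect for your own prefixes. The sketch below illustrates that idea only; it is not how ThousandEyes route monitors work. The prefixes and the expected-origin table use documentation-reserved ranges and AS numbers as hypothetical stand-ins, and the update format (prefix plus AS path, origin last) is an assumption.

```python
import ipaddress

ROSTELECOM_ASN = 12389  # AS number given in the talk

# Hypothetical table of monitored prefixes and their legitimate origin AS
# (documentation prefixes and documentation AS numbers, not real assignments).
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def expected_origin(prefix):
    """Expected origin AS for the most specific monitored prefix covering `prefix`, or None."""
    net = ipaddress.ip_network(prefix)
    covering = [
        p for p in EXPECTED_ORIGINS
        if p.version == net.version and net.subnet_of(p)
    ]
    if not covering:
        return None
    return EXPECTED_ORIGINS[max(covering, key=lambda p: p.prefixlen)]


def suspicious_announcements(updates):
    """Flag (prefix, seen_origin, expected_origin) where the path's last AS is unexpected."""
    flagged = []
    for prefix, as_path in updates:
        want = expected_origin(prefix)
        if want is not None and as_path and as_path[-1] != want:
            flagged.append((prefix, as_path[-1], want))
    return flagged


# Illustrative updates: (prefix, AS path with the origin AS last).
UPDATES = [
    ("192.0.2.0/24", [174, ROSTELECOM_ASN]),     # unexpected origin
    ("198.51.100.0/24", [6939, 64501]),          # legitimate origin
    ("203.0.113.0/24", [6453, ROSTELECOM_ASN]),  # not a monitored prefix
]

if __name__ == "__main__":
    for prefix, seen, want in suspicious_announcements(UPDATES):
        print(f"{prefix}: origin AS{seen}, expected AS{want}")
```

Because the lookup also matches more specific subnets of a monitored prefix, an announcement of a /25 inside a monitored /24 would be flagged as well.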

Rostelecom BGP Route Leak
Traffic from Canada was steered through Rostelecom’s network and crossed 60+ intermediate hops!
Correlating Traffic with Routes: ThousandEyes data includes not just routing information but also the traffic path taken by IP packets across the Internet, in a view called Path Visualization. You can see in Figure 2 that traffic entered the Rostelecom network and then returned to the destination network, in this case via Cogent in Stockholm. But Rostelecom took the traffic for a ride through 60+ interfaces, in what was either an unintentional routing loop or a very intentional series of devices to inspect the traffic. You can see these as the white, unresponsive interfaces in the visualization.
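The two features called out here, a long run of unresponsive interfaces and the possibility of a loop, can be pulled out of any hop-by-hop path trace. This is a minimal sketch, assuming a path is represented as a list of hop addresses with None for hops that did not respond; the representation, function names and sample path are hypothetical and are not the Path Visualization data format.

```python
def longest_unresponsive_run(hops):
    """Length of the longest consecutive run of unresponsive (None) hops."""
    longest = current = 0
    for hop in hops:
        current = current + 1 if hop is None else 0
        longest = max(longest, current)
    return longest


def repeated_interfaces(hops):
    """Responsive addresses that appear more than once, a hint of a routing loop."""
    seen, repeats = set(), []
    for hop in hops:
        if hop is None:
            continue
        if hop in seen and hop not in repeats:
            repeats.append(hop)
        seen.add(hop)
    return repeats


# Illustrative path: two responsive hops, 60 silent ones, then the destination.
SAMPLE_PATH = ["192.0.2.1", "198.51.100.7"] + [None] * 60 + ["203.0.113.9"]

if __name__ == "__main__":
    print("longest silent run:", longest_unresponsive_run(SAMPLE_PATH))
    print("repeated interfaces:", repeated_interfaces(SAMPLE_PATH))
```

When the extra hops are all unresponsive, as in the sample, no repeated address can be observed, which is why the trace alone cannot tell a loop apart from a chain of inspection devices.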

References
AWS S3 Outage:
ShareLink: https://gokahptkc.share.thousandeyes.com
AWS Root Cause Analysis, ThousandEyes Blog: https://blog.thousandeyes.com/aws-s3-outage-likely-caused-by-internal-network-issue/
Marketo:
ShareLink-HTTP: https://ciuvrxmw.share.thousandeyes.com
ShareLink-DNS: https://gmhux.share.thousandeyes.com
Marketo Root Cause Analysis, ThousandEyes Blog: https://blog.thousandeyes.com/what-happened-when-marketos-domain-name-expired/

Thank You