Improving Resilience and Performance in Light of Recent Internet Outages
Troy Whitney – Manager, Solutions Engineering
We now live in an Internet-centric IT world: Employee Productivity, Business Operations, DDoS Attack, Cloud Outages, Customer Experience
So what’s changed? Everything. ‘Private’ circuits connect the corporate WAN through Internet Service Providers. (Diagram: Internet, Data Center, Branch)
Cloud data centers host business-critical apps. Applications are hosted in the cloud or in remote data centers. (Diagram: Internet, Data Center, Apps, Branch)
Direct Internet Access connects branches. Direct Internet connectivity to cloud services and software-defined routing between branches. (Diagram: Data Center, Apps, Branch)
Wireless is everywhere. Wireless is the primary connection at the branch. (Diagram: Internet, Data Center, Apps, Branch)
And employees work where convenient. Employees access applications from home and on the road. (Diagram: Internet, Data Center, Apps, Home, Branch)
Managed DNS is a linchpin of service delivery. DNS services are managed by external providers. (Diagram: DNS, Internet, Data Center, Apps, Home, Branch)
CDNs and DDoS mitigation act as intermediaries. CDNs offload traffic, filter attacks and reduce latency. (Diagram: CDN / DDoS Mitigation, DNS, Internet, Data Center, Apps, Home, Branch)
IaaS has become your additional data center. IaaS providers host services and entire applications. (Diagram: CDN / DDoS Mitigation, IaaS, DNS, Internet, Data Center, Apps, Home, Branch)
Internet Outages Happen All the Time: ~170 affected interfaces / hour, ~1.6K prefixes / hour
Internet Outage: AWS S3
IaaS outages. As businesses move critical apps and services to IaaS clouds, outages can be very damaging. Despite fault-isolated regions, many apps aren’t multi-region. Even those that are multi-region focus on compute, not on the resiliency of other services. Impacts can be complex, correlated and externally introduced. AWS S3 outage, Feb 2017: 4 hours; 1000s of apps and sites; estimated $150M impact; 50% of major retailers affected.
Broad impact on sites and apps. Impacted file storage, often not replicated across regions. Impacted other dependent AWS services (Redshift, ELB, RDS, etc.). Impacted AWS monitoring services (CloudWatch, status page). Impacted commonly used third-party services (BlueKai, etc.).
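One way to reduce exposure to a single-region storage outage is to keep a replica in a second region and fall back to it on read. The sketch below is illustrative only: the region names, the `fetch_with_failover` helper and the stand-in fetchers are assumptions made for this example, not part of the talk or of any AWS SDK.

```python
from typing import Callable, Dict, Iterable, Tuple

Fetcher = Callable[[str], bytes]


class AllRegionsFailed(Exception):
    """Raised when no configured region could return the object."""


def fetch_with_failover(key: str, regions: Iterable[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Try each (region_name, fetcher) pair in order and return the first success."""
    errors: Dict[str, Exception] = {}
    for name, fetch in regions:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real client would catch narrower errors
            errors[name] = exc
    raise AllRegionsFailed(f"{key!r} unavailable in: {sorted(errors)}")


# Hypothetical stand-ins for per-region storage clients.
def primary_down(key: str) -> bytes:
    raise ConnectionError("primary region unavailable")


def replica_ok(key: str) -> bytes:
    return b"contents of " + key.encode()


region, body = fetch_with_failover(
    "logo.png", [("us-east-1", primary_down), ("us-west-2", replica_ok)]
)
print(region, body)
```

The fallback only helps if the object was actually replicated to the second region beforehand, which is the gap the slide points out.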
A large-scale operations error. AWS unintentionally removed servers and had to restart the file storage systems. Issue identification, system restart and recovery took hours. This showed up as completely unavailable services.
Internet Outage: Dyn DNS DDoS
DDoS attacks. Attackers attempt to prevent users from reaching a service with a denial-of-service attack. DDoS attacks overwhelm networks, network equipment or applications with traffic. They happen with alarming frequency and scale, causing business interruption and covering traces of other attack types. Largest attacks now exceed 500 Gbps. Costs in excess of $40K per hour per company. One attack cost a firm 8% of customers.
Dyn DNS DDoS. DNS matters! You can’t send a message if you don’t know the address. An example from October 21, 2016.
Service availability impacted for 24 hours. DNS is application traffic too. It needs the network to run. A DDoS attack prevents that.
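The dependency described here can be made concrete with a small sketch: a client resolves a name before it connects, so a failed lookup leaves it with nothing to connect to. The `lookup` and `connect_target` helpers and the two fake resolvers are hypothetical names introduced for illustration; only `socket.getaddrinfo` and `socket.gaierror` come from the Python standard library, and the address is from a documentation range.

```python
import socket


def lookup(hostname, resolve=socket.getaddrinfo):
    """Return the unique IP addresses for hostname, or [] if resolution fails."""
    try:
        infos = resolve(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def connect_target(hostname, resolve=socket.getaddrinfo):
    """Return an address to connect to, or raise if the name cannot be resolved."""
    addresses = lookup(hostname, resolve)
    if not addresses:
        raise LookupError(f"no address for {hostname}")
    return addresses[0]


# Hypothetical resolvers standing in for a healthy and an unreachable DNS provider.
def healthy_dns(host, port):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]


def unreachable_dns(host, port):
    raise socket.gaierror("name resolution failed")


print(lookup("www.example.com", healthy_dns))
print(lookup("www.example.com", unreachable_dns))
```

With the healthy resolver the client gets an address; with the unreachable one it gets an empty list, even though the application servers themselves may be running normally.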
Network connectivity to Dyn during the attack
Clogging the Pipes
Internet Outage: Rostelecom Route Leak
Route leaks. Networks around the world exchange routes, data on how traffic can move to its destination. But these routes can leak accidentally, or another network can intentionally hijack them. This causes Internet traffic to move to an incorrect destination, denying service or allowing traffic inspection. Dozens of large-scale routing leaks occur each year, lasting from seconds to days.
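One mechanism by which a leaked or hijacked route can attract traffic is longest-prefix matching: routers prefer the most specific prefix that covers a destination. The sketch below shows only that mechanism, under simplifying assumptions. Real BGP route selection weighs many more attributes, the prefixes are documentation ranges, the network names are made up, and the slides do not say which mechanism applied in any particular incident.

```python
import ipaddress


def best_route(destination, routes):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, next_hop))
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


# Documentation prefixes (RFC 5737); the network names are illustrative.
normal = [("203.0.113.0/24", "legitimate origin")]
leaked = normal + [("203.0.113.0/25", "leaking network")]

print(best_route("203.0.113.10", normal))   # legitimate origin
print(best_route("203.0.113.10", leaked))   # leaking network
print(best_route("203.0.113.200", leaked))  # legitimate origin
```

Once the more specific /25 appears, traffic for addresses it covers follows the leaked route, while addresses outside it are unaffected.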
Rostelecom route leak. On April 27th, Rostelecom, a Russian state-owned ISP, leaked routes for dozens of networks, including major payments infrastructure: Visa, Mastercard, BNP Paribas, HSBC, MUFG, UBS, Santander. Traffic flowed through Russian networks for over 7 minutes.
Taking financial traffic for a ride. Traffic entered the Rostelecom network, traversed 60+ interfaces either in a loop or as it was inspected, then returned to the payment card network.
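Whether a long path contains a forwarding loop can be checked from traceroute-style hop data by looking for an interface that appears more than once. This is a minimal sketch under that assumption; the `find_loop` helper and the hop addresses (documentation ranges) are hypothetical and do not reproduce the measured path from the incident.

```python
def find_loop(hops):
    """Return (first_index, repeat_index) of the first repeated hop, or None."""
    seen = {}
    for index, hop in enumerate(hops):
        if hop in seen:
            return seen[hop], index
        seen[hop] = index
    return None


# Illustrative paths using documentation addresses.
looping_path = ["198.51.100.1", "198.51.100.2", "198.51.100.3", "198.51.100.2"]
clean_path = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]

print(find_loop(looping_path))  # (1, 3)
print(find_loop(clean_path))    # None
```

A path with no repeated interface but an unusually high hop count would instead point toward the other explanation on the slide, a detour through additional equipment.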
A New Approach to Managing Internet Outages
Collect performance data from every perspective. (Diagram: NY Branch, HK Branch, Home, Internet, Data Center, Apps; Enterprise Agents, Cloud Agents, Endpoint Agents)
A unified view of performance from user to app. End-to-End Performance Data. (Diagram: User, App Performance, User Experience, Network Connectivity, Network Topology, Routing Topology, App Routing; Enterprise, Endpoint and Cloud Agents)
See every network like it’s your own. Visualize your network topologies the way that critical services flow over them. See faults and dependencies in context. (Diagram: Washington, DC; San Francisco, CA; Boston, MA; Hong Kong; Dallas, TX; Vancouver, Canada; 182.50.78.169; 182.50.78.41) © 2017 ThousandEyes Inc. All Rights Reserved.
Quickly surface insights from a global data set. Immediately identify issues from complex behaviors. Algorithms sort through the data of all ThousandEyes users to find the answer. (Diagram: NTT in Virginia, New York Cloud Agent, Salesforce, Customer 1, Boston Enterprise Agent, Google, Los Angeles Cloud Agent, Customer 2, Comcast in Denver, AWS)
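The idea of correlating many vantage points can be sketched as follows: hops that appear in failing paths but in no healthy path are ranked by how many failing paths share them. This is a simplified illustration, not ThousandEyes's actual algorithm. The `suspect_hops` helper is hypothetical, and the paths are invented using labels from the slide's diagram.

```python
from collections import Counter


def suspect_hops(failing_paths, healthy_paths):
    """Rank hops unique to failing paths by the number of failing paths sharing them."""
    healthy = set().union(*healthy_paths)
    counts = Counter(
        hop for path in failing_paths for hop in set(path) if hop not in healthy
    )
    # Most widely shared first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented paths: two tests fail through the same provider, one succeeds elsewhere.
failing = [
    ["New York Cloud Agent", "NTT in Virginia", "Salesforce"],
    ["Boston Enterprise Agent", "NTT in Virginia", "Salesforce"],
]
healthy = [["Los Angeles Cloud Agent", "Comcast in Denver", "Salesforce"]]

print(suspect_hops(failing, healthy)[0])  # ('NTT in Virginia', 2)
```

The destination is excluded because a healthy path also reaches it, which leaves the shared intermediate network as the leading suspect.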
Solve issues across shared infrastructure. Dashboards / Reports, Alerts, Snapshots. (Diagram: Washington, DC; Your Network, Your ISP, Cloud or CDN)
About Us. We’re a team of network experts, committed to helping you best connect your business. Founded in: 2010. Headquarters in: San Francisco. Offices in: New York | London | Austin.
Thank You