Mean Time to Innocence
Your Dashboards are Green – but your end users are still complaining. Now What?
Phil Stanhope, October 2015

1 Mean Time to Innocence
Your Dashboards are Green – but your end users are still complaining. Now What?
Phil Stanhope, October 2015

2 Background
Phil Stanhope
Dyn Fellow, Office of the CTO
I help oversee architecture, engineering & operations of our internet performance products.
Huge focus on real-time protocol and performance telemetry: BGP, ICMP, DNS, HTTP[S], high performance web services & systems, real-time stream processing, kernel tweaks, cache-friendly APIs, cache-busting techniques …
Still programming in C (and now Go).
Equal opportunity protocol abuser with a focus on a need for speed.
phil@dyn.com and @componentry

3 Measure a lot of things about the internet…

4 "At scale, everything breaks"
Urs Hölzle, Google's first vice president of engineering

5 What is Internet Intelligence?
● Consolidated view across your Internet Infrastructure
● Determine the impact to Cloud, CDN and Hosting Infrastructure globally
● Immediate time to information
Lots of tools can tell you what’s happening in your page – use them. Others can tell you what’s happening in your infrastructure (servers, caches, databases, …) – use them too.

6 How is it Done?
Leverage Currently Deployed Dyn Assets
● Global Monitoring Infrastructure
● Custom Cloud Monitoring Infrastructure
● Real User Monitoring data
● Global Routing Infrastructure Monitors
● Multi-Layer “looking glass” (backbone, far-edge, end-user)

7 In the Internet – Trace Routes & BGP
● Average >13 hops per traceroute
● RTT sum of traceroute hops over one day (ms): 596,538,299,434
● Distance traversed: 6.03E+13 km
● Light years measured: > 6 per day
● 3500 passes through internet pathways per day: one every 25 seconds
● 6B+ probes, 243 countries, 26,389 cities (as of two days ago)
● 52K ASN monitored per day
● > 200M BGP updates per day
● Alarms & Alerts
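The unit conversions behind two of the figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the distance and pass count are the values quoted on the slide, while the length of a light year (about 9.4607E+12 km) and the number of seconds in a day are standard constants that the slide does not state. How the distance itself was derived from the RTT sum is not given in the talk, so it is not reproduced here.

```python
# Standard constants (not from the slide).
SECONDS_PER_DAY = 24 * 60 * 60
KM_PER_LIGHT_YEAR = 9.4607e12  # approximate length of one light year in km

# Figures quoted on the slide.
distance_km_per_day = 6.03e13
passes_per_day = 3500

# Convert the daily distance to light years and the pass count to an interval.
light_years_per_day = distance_km_per_day / KM_PER_LIGHT_YEAR
seconds_per_pass = SECONDS_PER_DAY / passes_per_day

print(f"light years per day: {light_years_per_day:.2f}")  # 6.37
print(f"seconds per pass: {seconds_per_pass:.1f}")        # 24.7
```

Both results agree with the slide's rounded claims of more than 6 light years per day and one pass roughly every 25 seconds.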

8 On the Internet – Real Time DNS Traffic Mgmt
● 30B real-time traffic steering decisions per day
● 150B+ data points analyzed per day
● Real-time protocol analysis as well as traditional log processing
– Situational awareness
– Billing & Reporting
● > 30TB logs per day and growing
[Diagram: Browser, Recursive, Authoritative; HTTP and DNS records feed LOG & ANALYZE; steps numbered 1 to 5]
Injector & Beacon
GET - http://dyninsight.com/inject/CUST_ID/CUST_DATA
beacon = HMAC(secret, token)
Javascript “injection” – just like injecting an advertisement into a page
Writes a transparent iframe into the page
Loading the iframe requires resolving beacon
Guaranteed to cause recursive DNS cache miss
HTTP log record: time, client_ip, beacon
DNS log record: time, recursive_ip, beacon
Collect
GET - http://beacon.dyninsight.com/CUST_ID/CUST_DATA/token
time, client_ip
Dynamic HTML - containing customer resources to test
Resources 1..N @ target origins tested
Resource timing information sent to collector
Per resource timing info
KEY:
Gatherer
token = encode(cust_id, client_ip, time, nodeid, referer)
Time – 2 - Authoritative
Time – 2 – Recursive (inferred)
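The two formulas on this slide, token = encode(cust_id, client_ip, time, nodeid, referer) and beacon = HMAC(secret, token), can be sketched as follows. The slide does not say how the token is encoded, which hash the HMAC uses, or how the beacon becomes a hostname, so the pipe-joined base64 encoding, SHA-256, the 32-character truncation, and the `beacon.example.com` zone are all assumptions made for illustration. The point the sketch tries to show is that a beacon derived from per-request fields yields a hostname a recursive resolver is unlikely to have seen before, which is presumably why resolving it forces a cache miss.

```python
import base64
import hashlib
import hmac


def encode_token(cust_id, client_ip, ts, node_id, referer):
    """Pack the token fields into a URL-safe string (hypothetical encoding)."""
    raw = "|".join([cust_id, client_ip, str(ts), node_id, referer])
    encoded = base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def make_beacon(secret, token):
    """beacon = HMAC(secret, token). SHA-256 is an assumption.

    A DNS label is limited to 63 characters and a SHA-256 hex digest is 64,
    so the digest is truncated to 32 hex characters here.
    """
    digest = hmac.new(secret, token.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:32]


def beacon_hostname(beacon, zone="beacon.example.com"):
    """Build the hostname the iframe would load (zone name is hypothetical)."""
    return f"{beacon}.{zone}"


secret = b"not-a-real-secret"
token = encode_token("cust42", "203.0.113.7", 1444000000, "node1",
                     "https://example.com/")
print(beacon_hostname(make_beacon(secret, token)))
```

Because the same beacon value appears in both the HTTP log record and the DNS log record listed on the slide, a log processor could join the two on that value to pair a client IP with the recursive resolver it used.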

9 In the App – W3C Resource Timing
● 1B RUM latency measurements per day
● Traffic originating from 80% of ASNs on the internet every minute
● Real-Time & Historical APIs
● Real-Time Event & Alerting
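As a rough illustration of what a collector might do with one W3C Resource Timing entry, the sketch below splits an entry into per-phase latencies. The attribute names (`startTime`, `domainLookupStart`, `domainLookupEnd`, `connectStart`, `connectEnd`, `requestStart`, `responseStart`, `responseEnd`) are those of the `PerformanceResourceTiming` interface; the dictionary shape, the `phase_durations` helper, and the output key names are hypothetical and not taken from the talk. Detailed timestamps are reported as zero for cross-origin resources that do not send a `Timing-Allow-Origin` header, so the sketch treats a zero as "not available".

```python
def phase_durations(entry):
    """Split one resource timing entry (milliseconds) into phase latencies."""

    def span(start_key, end_key):
        start = entry.get(start_key, 0)
        end = entry.get(end_key, 0)
        # A zero timestamp is treated as withheld or missing.
        if not start or not end:
            return None
        return end - start

    return {
        "dns_ms": span("domainLookupStart", "domainLookupEnd"),
        "connect_ms": span("connectStart", "connectEnd"),
        "ttfb_ms": span("requestStart", "responseStart"),
        "download_ms": span("responseStart", "responseEnd"),
        "total_ms": entry["responseEnd"] - entry["startTime"],
    }


# Made-up sample entry, as a browser beacon might report it.
sample = {
    "startTime": 100.0,
    "domainLookupStart": 100.0,
    "domainLookupEnd": 120.0,
    "connectStart": 120.0,
    "connectEnd": 150.0,
    "requestStart": 150.0,
    "responseStart": 230.0,
    "responseEnd": 250.0,
}
print(phase_durations(sample))
```

For the sample entry this gives 20 ms of DNS time, 30 ms to connect, 80 ms to first byte, 20 ms of download, and 150 ms in total.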

10 What is under our microscope?

11 It’s always cloudy on the internet…
You’ll need a lot of weather maps to see what’s going on in the internet…
– By provider set, by target customers, by competitor, by target market
What impacts you may not be impacting others. And vice-versa…
Look Now, Look Back, See the Future

12 Waterfalls & RUM – Where do you start?

13 Micro Outages – Happening all the Time

14 Persistent, Forecastable Geo Failures

15 Let’s go Dive into Real-Time Maps…
A single web page that shows a combination of real-time, near-real-time forensic data, and provides some hints at forecasting what will happen soon somewhere in the internet
Intentionally unbranded – what can you do with our datasets?
Covers the internal APIs that we use – they are all in the process of becoming available to you. Ask me more about that!
A common set of UX controls can be applied to a variety of real-time and batch data: GeoViews, Weather Maps, Sunburst Drill Down, Matrix & Long-Term Trending
Under the covers: ReactJS, D3, GeoJSON/TopoJSON, Go, Varnish, WebSockets

16 Consistent Differences, Predictable Outcomes

17 Break / Fix – Is it impacting your users?

18 Working [but slow] – Should you take action?

19 QUESTIONS?

