Measuring the End User Geoff Huston APNIC Labs, February 2014.

Slides:



Advertisements
Similar presentations
Cross-Site Scripting Issues and Defenses Ed Skoudis Predictive Systems © 2002, Predictive Systems.
Advertisements

Stacking it Up Experimental Observations on the operation of Dual Stack Services in todays Network Geoff Huston APNIC R&D February
Stacking it Up Experimental Observations on the operation of Dual Stack Services Geoff Huston, APNIC Labs 1.
Stacking it Up Experimental Observations on the operation of Dual Stack Services Geoff Huston IETF-80 March
Measuring IP Performance Geoff Huston Telstra. What are you trying to measure? User experience –Responsiveness –Sustained Throughput –Application performance.
Measuring IPv6 Deployment Geoff Huston George Michaelson
1 An Update on Multihoming in IPv6 Report on IETF Activity IPv6 Technical SIG 1 Sept 2004 APNIC18, Nadi, Fiji Geoff Huston.
Measuring IPv6 Deployment Geoff Huston APNIC R&D August
IPv4 Unallocated Address Space Exhaustion Geoff Huston Chief Scientist APNIC APNIC 24, September 2007.
Stacking it Up Experimental Observations on the operation of Dual Stack Services Geoff Huston IETF-80 March
Measuring IPv6 Geoff Huston APNIC Labs, February 2014.
Sergei Komarov. DNS  Mechanism for IP hostname resolution  Globally distributed database  Hierarchical structure  Comprised of three components.
Measuring IPv6 Geoff Huston George Michaleson APNIC Labs, May 2013.
Measuring the DNS from the Users’ perspective Geoff Huston APNIC Labs, May 2014.
Authored by: Rachit Rastogi Computer Science & Engineering Deptt., College of Technology, G.B.P.U.A. & T., Pantnagar.
Technical Architectures
1 Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2007 (MW 1:30-2:50 in Friend 004) Ioannis Avramopoulos Instructor:
7DS Seven Degrees of Separation Suman Srinivasan IRT Lab Columbia University.
IPv6 end client measurement George Michaelson
Internet Basics.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
INTRODUCTION TO WEB DATABASE PROGRAMMING
IT 210 The Internet & World Wide Web introduction.
Chapter 33 CGI Technology for Dynamic Web Documents There are two alternative forms of retrieving web documents. Instead of retrieving static HTML documents,
Geoff Huston APNIC Labs
Chapter 6 The World Wide Web. Web Pages Each page is an interactive multimedia publication It can include: text, graphics, music and videos Pages are.
Implementing ISA Server Publishing. Introduction What Are Web Publishing Rules? ISA Server uses Web publishing rules to make Web sites on protected networks.
XHTML Introductory1 Linking and Publishing Basic Web Pages Chapter 3.
Tyre Kicking the DNS Testing Transport Considerations of Rolling Roots Geoff Huston APNIC.
Enabling Embedded Systems to access Internet Resources.
1 The Firewall Menu. 2 Firewall Overview The GD eSeries appliance provides multiple pre-defined firewall components/sections which you can configure uniquely.
HOW WEB SERVER WORKS? By- PUSHPENDU MONDAL RAJAT CHAUHAN RAHUL YADAV RANJIT MEENA RAHUL TYAGI.
1 Apache. 2 Module - Apache ♦ Overview This module focuses on configuring and customizing Apache web server. Apache is a commonly used Hypertext Transfer.
APNIC Update The state of IP address distribution and IPv6 deployment status Miwa Fujii Senior IPv6 Program Specialist APNIC.
Web Engineering we define Web Engineering as follows: 1) Web Engineering is the application of systematic and proven approaches (concepts, methods, techniques,
Hour 7 The Application Layer 1. What Is the Application Layer? The Application layer is the top layer in TCP/IP's protocol suite Some of the components.
Measuring DNSSEC Geoff Huston APNIC Labs, June 2014.
Application Block Diagram III. SOFTWARE PLATFORM Figure above shows a network protocol stack for a computer that connects to an Ethernet network and.
Overview Web Session 3 Matakuliah: Web Database Tahun: 2008.
Jan 30, 2001CSCI {4,6}900: Ubiquitous Computing1 Announcements Project Milestone 2 due today. Undergraduate projects should have 3 students per project.
Measuring DNSSEC Use Geoff Huston APNIC Labs. We all know…
Switch Features Most enterprise-capable switches have a number of features that make the switch attractive for large organizations. The following is a.
1 Web Servers (Chapter 21 – Pages( ) Outline 21.1 Introduction 21.2 HTTP Request Types 21.3 System Architecture.
1 WWW. 2 World Wide Web Major application protocol used on the Internet Simple interface Two concepts –Point –Click.
The Module Road Map Assignment 1 Road Map We will look at… Internet / World Wide Web Aspects of their operation The role of clients and servers ASPX.
Happy Eyeballs for the DNS Geoff Huston, George Michaelson APNIC Labs October 2015.
Module: Software Engineering of Web Applications Chapter 2: Technologies 1.
DNSSEC – Issues and Achievements Geoff Huston APNIC Labs.
CSI 3125, Preliminaries, page 1 Networking. CSI 3125, Preliminaries, page 2 Networking A network represents interconnection of computers that is capable.
Website Design, Development and Maintenance ONLY TAKE DOWN NOTES ON INDICATED SLIDES.
What if Everyone Did It? Geoff Huston APNIC Labs.
By Nathaniel Dias, Benton Le Ics4U Mr.Krnic. The beginning of the internet started as a result of the Cold War. After the launch of the Russian space.
1 Introductory material. This module illustrates the interactions of the protocols of the TCP/IP protocol suite with the help of an example. The example.
Ch 2. Application Layer Myungchul Kim
COMPUTER NETWORKS Hwajung Lee. Image Source:
Chapter 11 – Cloud Application Development. Contents Motivation. Connecting clients to instances through firewalls. Cloud Computing: Theory and Practice.
1 Chapter 1 INTRODUCTION TO WEB. 2 Objectives In this chapter, you will: Become familiar with the architecture of the World Wide Web Learn about communication.
Geoff Huston APNIC March 2017
George Michaelson Geoff Huston Joao Damas APNIC Labs
Web Development Web Servers.
Geoff Huston APNIC Labs
Geoff Huston APNIC Labs
Geoff Huston APNIC Labs September 2017
Geoff Huston APNIC Labs
TCP/IP Networking An Example
Measuring KSK Roll Readiness
Web Server Technology Unit 10 Website Design and Development.
Measuring KSK Roll Readiness
The Internet and Electronic mail
What part of “NO” is so hard for the DNS to understand?
Presentation transcript:

Measuring the End User Geoff Huston APNIC Labs, February 2014

What’s the question? How many users do ? How many users can retrieve a URL using IPv6? How many users perform DNSSEC validation when they resolve a domain name? How many users follow DNAME chains in the DNS etc

“Measurable” Questions How much traffic uses IPv6? How many connections use IPv6? How many routes are IPv6 routes? How many service providers offer IPv6? How many domain names have AAAA RRs? How many domains are DNSSEC signed? How many DNS queries are made over IPv6? …

Users vs Infrastructure None of these specific measurement questions really embrace the larger questions about the end user behaviour They are all aimed at measuring an aspect of of behaviour within particular parameters of the network infrastructure, but they don’t encompass how the end user assembles a coherent view of the network

For example… IPv6 To make an IPv6 connection everything else (routing, forwarding, DNS, transport) has to work with IPv6 So can we measure how many connected devices on today’s Internet are capable of making IPv6 connections?

How to measure a million end users for their IPv6 capability

Be Google (or any other massively popular web service provider)

How to measure a million end users for their IPv6 capability Be Google (or any other massively popular web service provider) – And insert measurement code on the web page that is executed as part of the page load _uacct = "UA "; urchinTracker();

How to measure a million end users for their IPv6 capability Be Google (or any other massively popular web service provider) or

How to measure a million end users for their IPv6 capability Be Google (or any other massively popular web service provider) or Get your code to run on a million users’ machines through another delivery channel

Ads are ubiquitous

Ads are implemented in Adobe Flash Advertising channels use Flash to make ads interactive – This is not just an ‘animated gif’

Flash makes ads interactive [Apply Now] hover-over is interactive, and responds when selected.

Flash and the network Flash includes primitives in ‘actionscript’ to fetch ‘network assets’ – Typically used to load alternate images, sequences – Not a generalized network stack, subject to constraints: Port 80 crossdomain.xml on hosting site must match source name (wildcard syntax) Flash has asynchronous ‘threads’ model for event driven, sprite animation

APNIC’s measurement technique Craft flash/actionscript which fetches network assets to measure. Assets are reduced to a notional ‘1x1’ image which is not added to the DOM and is not displayed Assets can be named (DNS resolution via local gethostbyname() styled API within the browser’s Flash engine) or use literals (bypass DNS resolution) Encode data transfer in the name of fetched assets – Use the DNS as the information conduit: Result is returned by DNS name with wildcard – Use HTTP as the information conduit Result is returned via parameters attached to an HTTP GET command

Advertising placement logic Fresh Eyeballs == Unique IPs – We have good evidence the advertising channel is able to sustain a constant supply of unique IP addresses Pay by click, or pay by impression – If you select a preference for impressions, then the channel tries hard to present your ad to as many unique IPs as possible Time/Location/Context tuned – Can select for time of day, physical location or keyword contexts (for search-related ads) – But if you don’t select, then placement is generalized Aim to fill budget – If you request $100 of placement a day, then inside 24h algorithm tries hard to even placement but in the end, will ‘soak’ place your ad to achieve enough views, to bill you $100

Advertising placement logic Budget: $100 per day, at $1.00 ‘CPM’ max – Clicks per millepressions: aim to pay no more than $1 per click but pay up to $1 for a thousand impressions Even distribution of ads throughout the day No constraint on location, time Outcome: 350,000 placements per day, on a mostly even placement model with end of day ‘soak’ to achieve budget goal

20 Ad Placement Training – Day 1

21 Ad Placement Training – Day 2

22 Ad Placement Training – Day 3

23 Ad Placement Training – Day 4

24 Ad Placement Training – Days 5, 6 & 7

Measurement Control Channel Use Flash code that is executed on ad impression that retrieves the actual measurement script – Ad carries code to send the client to retrieve an ad-controller URL – Client retrieves set of “tests” from the ad-controller as a sequence of URLs to fetch and a “result” URL to use to pass the results to the ad-server This allows us to vary the measurement experiment without necessarily altering the ad campaign itself – the ad, and its approval to run, remain unchanged so that measurements can be activated and deactivated in real time.

Experiment Server config There are currently three servers, identically configured (US, Europe, Australia) Server runs Bind, Apache and tcpdump Experiment directs the client to the “closest” server (to reduce rtt-related timeouts) based on simple /8 map of client address to region

Measuring IPv6 via Ads Client is given 5 URLs to load: Dual Stack object V4-only object V6-only object V6 literal address (no DNS needed) Result reporting URL (10 second timer) All DNS is dual stack

Measuring DNSSEC via Ads Client is given 4 URLs to load: DNSSEC-validly signed DNS name DNSSEC-invalidly signed DNS name Unsigned DNS name (control) Result reporting URL (10 second timer) All DNS is IPv4

Discovering Routing Filters via Ads Client is given 3 URLs to load: DNS name that resolves into the test prefix DNS name the resolves to a control prefix Result reporting URL (10 second timer)

Caching Caching (generally) defeats the intent of the measurement – Although some measurements are intended to measure the effects of caching We use unique DNS labels and unique URL GET parameters – Ensures that all DNS resolution requests and HTTP fetch requests end up at the experiment’s servers We use a common “tag” across all URLs in a single experiment – Allows us to join the individual fetches to create the per- user view of capability

Collected Data Per Server, Per Day: – http-access log (successfully completed fetches) – dns.log (incoming DNS queries) – Packet capture All packets

Collected Data Web Logs: h.labs.apnic.net 2002:524d:xxxx::524d:xxxx [29/Apr/2013:05:55: ] "GET /1x1.png? t10000.u s i888.v1794.v6lit h.labs.apnic.net 2002:524d:xxxx::524d:xxxx [29/Apr/2013:05:55: ] "GET /1x1.png? t10000.u s i888.v1794.r6.td h.labs.apnic.net xxx.xxx [29/Apr/2013:05:55: ] "GET /1x1.png? t10000.u s i888.v1794.rd.td h.labs.apnic.net xxx.xxx [29/Apr/2013:05:55: ] "GET /1x1.png? t10000.u s i888.v1794.r4.td h.labs.apnic.net xxx.xxx [29/Apr/2013:05:55: ] "GET /1x1.png? t10000.u s i888.v1794&r=zrdtd-348.zr4td-376.zr6td-316.zv6lit- 228 (In this case the client is using 6to4 to access IPv6, and prefers to use IPv4 in a dual stack context)

Collected Data DNS Logs: 27-Feb :00: queries: client #54311 query: f.t10000.u s i1022.v c34.z.dotnxdomain.net IN A -EDC ( ) 27-Feb :00: queries: client #30544 query: e.t10000.u s i1022.v c33.z.dashnxdomain.net IN A -EDC ( ) 27-Feb :00: queries: client #55619 query: d.t10000.u s i1022.v c33.z.dotnxdomain.net IN A -EDC ( )

What does this allow? In providing an end user with a set of URLs to retrieve we can examine: – Protocol behaviour e.g.: V4 vs V6, protocol performance, connection failure rate – DNS behaviours e.g.: DNSSEC use, DNS resolution performance…

The generic approach Seed a user with a set of tasks that cause identifiable traffic at an instrumented server The user does not contribute measurements The server performs the data collection

Collision detection? There was a thought that this approach could be used to perform collision detection: Test: -a.TestName.CandidateTLD/1x1.png? -a -a.TestName.ExistingTLD/1x1.png? -b ?za= &zb= Result Analysis: If the server sees a query for B and NOT A, then we can infer that there is possibly a collision for the use of CandidateTLD between local and globally scoped contexts

Really? But is this collision or the opposite? This shows the extent of local zone instances occluding a global zone But I thought we were looking for the possibility of global zone delegation altering the behaviour of client applications using / assuming a local zone resolution Which looks like the opposite

Furthermore … Is it the use of a local name or the content of local name search lists that is critical here? And what name forms trigger the local name resolution function to invoke the local search list to apply to the given name? Are we measuring the extent of name collision itself or the extent of deployment of various forms of name resolution with search lists?

What about… Test: -single-label-name/1x1.png? -a -single-label-name/1x1.png? -b -single-label-name.Existing.domain.name/1x1.png? -c ?za= &zb= &zc= Question: If we launched a high volume of ads, then what would we see at a root server?

A few observations Measuring what happens at the user level by measuring some artifact or behaviour in the infrastructure and inferring some form of user behaviour is going to be a guess of some form If you really want to measure user behaviour then its useful to trigger the user to behave in the way you want to study or measure The technique of embedding code behind ads is one way of achieving this objective, for certain kinds of behaviours relating to the DNS and to URL fetching

Questions? APNIC Labs: Geoff Huston