NANO: Network Access Neutrality Observatory Mukarram Bin Tariq, Murtaza Motiwala, Nick Feamster, Mostafa Ammar Georgia Tech.



Net Neutrality

Example: BitTorrent Blocking

Many Forms of Discrimination
- Throttling and prioritizing based on destination or service: target domains, applications, or content
- Discriminatory peering: resist peering with certain content providers
- ...

Problem Statement
- Identify whether a degradation in service performance is caused by discrimination by an ISP
- Quantify the causal effect
- Existing techniques detect specific ISP methods: TCP RST injection (Glasnost), ToS-bit based de-prioritization (NVLens)
- Goal: establish a causal relationship in the general case, without assuming anything about the ISP's methods

Causality: An Analogy from Health
- Epidemiology: studies causal relationships between risk factors and health outcomes
- NANO: infers causal relationships between an ISP and service performance

Does Aspirin Make You Healthy?
- A sample of patients shows a positive correlation between health and treatment:

                Aspirin   No Aspirin
  Healthy         40%        15%
  Not Healthy     10%        35%

- Can we say that aspirin causes better health?
- Confounding variables (e.g., sleep duration, diet, other drugs, gender) correlate with both the cause and the outcome variables and confuse the causal inference.
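The confounding problem on this slide can be made concrete in a few lines. This sketch computes the association between aspirin and health from the slide's 2x2 table (the percentages are joint frequencies over the sample); the point is that this number alone cannot be read as a causal effect.

```python
# Joint-frequency table from the slide (illustrative example data).
table = {
    ("aspirin", "healthy"): 0.40,
    ("aspirin", "not_healthy"): 0.10,
    ("no_aspirin", "healthy"): 0.15,
    ("no_aspirin", "not_healthy"): 0.35,
}

def p_healthy(treatment):
    """P(healthy | treatment), computed from the joint-frequency table."""
    h = table[(treatment, "healthy")]
    nh = table[(treatment, "not_healthy")]
    return h / (h + nh)

# Association: difference in conditional probabilities of being healthy.
association = p_healthy("aspirin") - p_healthy("no_aspirin")
# The association is large (about 0.5), but confounders such as diet or
# other drugs could explain it, so it is not yet a causal effect.
```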

Does an ISP Cause Service Degradation?
- A sample of client performance shows some correlation between ISP and service performance:

                              Comcast   No Comcast
  BitTorrent Download Time     5 sec       2 sec

- Can we say that Comcast is discriminating?
- Many confounding variables (e.g., client setup, time of day, content location) can confuse the inference.

Causation vs. Association (1)
- G1: performance with the ISP; G0: baseline performance. These are the ground-truth (counterfactual) values.
- Causal effect: θ = E(G1) − E(G0), e.g., E(real download time using Comcast) − E(real download time not using Comcast)
- Problem: generally, we cannot observe both ground-truth values for the same client. Consequently, in situ data sets are not sufficient to directly estimate the causal effect.

Causation vs. Association (2)
- Association: α = E(observed download time using Comcast) − E(observed download time not using Comcast), i.e., observed performance with the ISP minus the observed baseline performance
- We can observe the association α in an in situ data set.
- In general, α ≠ θ. How do we estimate the causal effect θ?

Estimating the Causal Effect
Two common approaches:
a. Random treatment
b. Adjusting for confounding variables

Random Treatment
Given a population:
1. Treat subjects with aspirin randomly, irrespective of their health
2. Observe the new outcome and measure the association
3. For large samples, the association converges to the causal effect, provided the confounding variables do not change (diet, other drugs, etc. should not change)
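The convergence claim in step 3 can be checked with a toy Monte Carlo simulation. In the sketch below, all numbers are synthetic: the outcome depends on both the treatment and a hidden confounder, but because the treatment is assigned randomly (independently of the confounder), the measured association approaches the model's true causal effect of 0.3.

```python
import random

# Synthetic model: P(outcome) = 0.2 + 0.3 * treatment + 0.3 * confounder.
# The true causal effect of the treatment is therefore 0.3.
random.seed(0)
n = 100_000
treated_outcomes, control_outcomes = [], []
for _ in range(n):
    z = random.random() < 0.5            # hidden confounder
    x = random.random() < 0.5            # randomized treatment (independent of z)
    y = random.random() < 0.2 + 0.3 * x + 0.3 * z
    (treated_outcomes if x else control_outcomes).append(y)

association = (sum(treated_outcomes) / len(treated_outcomes)
               - sum(control_outcomes) / len(control_outcomes))
# Randomization breaks the treatment-confounder link, so the association
# is close to 0.3, the causal effect.
```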

Random Treatment (How to Apply to the ISP Case?)
- Ask clients to change their ISP to an arbitrary one
- Difficult to achieve on the Internet:
  - Changing ISPs is cumbersome for users
  - Changing ISPs may change other confounding variables, e.g., the ISP's network differs

Adjusting for Confounding Variables
Given an in situ data set:
1. List the confounders (e.g., gender)
2. Collect a data set
3. Stratify along the confounding variables' values
4. Measure the association within each stratum
5. If there still is an association, then it must be causation

In the example: treatment = aspirin, outcome = healthy (H) or not healthy (!H), stratum = subject type {circle, square}.
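Steps 3-5 can be sketched in a few lines. The record layout and the stratum names (circle/square) below are illustrative toy data, not NANO's actual data format.

```python
# Toy records: (treated, stratum, outcome). Values are illustrative only.
records = [
    (1, "circle", 1), (1, "circle", 1), (0, "circle", 0), (0, "circle", 1),
    (1, "square", 1), (1, "square", 0), (0, "square", 0), (0, "square", 0),
]

def stratum_effect(records, stratum):
    """E[Y | X=1, Z=z] - E[Y | X=0, Z=z] within a single stratum."""
    treated = [y for x, z, y in records if z == stratum and x == 1]
    control = [y for x, z, y in records if z == stratum and x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjusted_effect(records):
    """Average of per-stratum effects, weighted by stratum size."""
    strata = {z for _, z, _ in records}
    n = len(records)
    return sum(
        stratum_effect(records, z)
        * sum(1 for _, s, _ in records if s == z) / n
        for z in strata
    )
```

Within each stratum the confounder is held fixed, so any association that survives is attributed to the treatment; the weighted average then pools the strata into a single adjusted effect.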

Adjusting for Confounding (How to Apply to the ISP Case?)
Challenges:
- What is the baseline?
- What are the confounding variables? Is the list of confounders sufficient?
- How do we collect the data?
- Can we infer more than the effect, e.g., the discrimination criteria?

What is the Baseline?
- Baseline: service performance when the ISP is NOT used
- We need to use some ISP for comparison; what if the one we use is not neutral?
- Solutions:
  a. Use the average performance over all other ISPs
  b. Use a lab model
  c. Use the service provider's model

Determining Confounding Variables (Using Domain Knowledge)
- Client side: client setup (network setup), application (browser, BitTorrent client, VoIP client), resources (memory, CPU, utilization)
- ISP related: not all ISPs are equal, e.g., in location
- Temporal: diurnal cycles, transient failures

How to Collect the Data?
Requirements:
- Quantify the confounders and the effect
- Identify the treatment variable
- Unbiased
Approach: passive measurements at the client end

Inferring the Criteria
- Label the data in two classes: discriminated (−) and non-discriminated (+)
- Train a decision tree for classification; its rules provide hints about the discrimination criteria (e.g., a rule splitting on domain = youtube.com and content size in MB)
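As a sketch of how classification rules can hint at a discrimination criterion, the snippet below learns a single-feature rule (a decision stump) rather than a full decision tree; the labeled samples and feature names (domain, size_mb) are made up for illustration.

```python
# Toy labeled flows: ({features}, label), where "-" = discriminated.
# All samples and feature names are hypothetical.
samples = [
    ({"domain": "youtube.com", "size_mb": 120}, "-"),
    ({"domain": "youtube.com", "size_mb": 80},  "-"),
    ({"domain": "example.com", "size_mb": 150}, "+"),
    ({"domain": "example.com", "size_mb": 10},  "+"),
]

def best_domain_rule(samples):
    """Pick the domain whose 'domain == d -> discriminated' rule is most accurate."""
    best, best_acc = None, 0.0
    for d in {f["domain"] for f, _ in samples}:
        correct = sum(
            (label == "-") == (f["domain"] == d) for f, label in samples
        )
        acc = correct / len(samples)
        if acc > best_acc:
            best, best_acc = d, acc
    return best, best_acc

rule, acc = best_domain_rule(samples)
# The learned rule points at the domain most associated with the
# discriminated class, which is the kind of hint the slide describes.
```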

A Simple Simulation
- Clients use two applications, App 1 and App 2, to access services
- Service 1 is slower using App 2, so the application is confounding
- ISP B throttles access to Service 1
- Association of ISP B relative to baseline: Service 1: 0.92 (10%); Service 2: 0.04 (1%)
- Causation of ISP B relative to baseline: Service 1: 2.05 (20%) with App 1, 5.18 (187%) with App 2; Service 2: 0.06 (2%) with App 1, 0.12 (4%) with App 2

Conclusions
- NANO: a black-box approach to infer and quantify discrimination; generically applicable
- Ongoing work and open issues:
  - Privacy: can we do local inference?
  - Deployment: PlanetLab, CPR, real users
  - How much data? Depends on the variance
Contact:

Backup

Variables
- Causal variables (X): the brand of an ISP (IP address to ISP name mapping)
- Outcome variables (Y): identify the service; the performance of a service (throughput, delay, jitter, loss)
- Confounding variables (Z): those that correlate with both

Adjusting for Confounding Variables
- Treatment (whether using ISP i): Xi
- Outcome (performance of service j): Yj
- Causal effect within a stratum z, whose boundaries are chosen so that all other things are equal: θ(z) = E(Yj | Xi = 1, Z = z) − E(Yj | Xi = 0, Z = z)

Sufficient Confounders?
- If we have enough confounding variables, then we should be able to predict the performance
- Suppose: confounding variables Z, treatment X, outcome Y
- Predict: ŷ = f(X, Z)
- Test whether the prediction error E[(Y − ŷ)²] is small
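One simple instantiation of this check (not necessarily the one NANO uses): fit a predictor of Y from (X, Z), here just a table of group means, and test whether the held-out prediction error is small. All data below are synthetic.

```python
# Synthetic data: ((treatment, confounder), outcome). Illustrative only.
train = [((1, "a"), 5.0), ((1, "a"), 5.2), ((0, "a"), 2.0),
         ((0, "b"), 2.1), ((1, "b"), 5.1), ((0, "a"), 1.9)]
held_out = [((1, "a"), 5.1), ((0, "b"), 2.0)]

def fit_group_means(data):
    """Predictor f(X, Z): the mean outcome observed for each (X, Z) group."""
    sums, counts = {}, {}
    for key, y in data:
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def mse(model, data):
    """Mean squared prediction error E[(Y - f(X, Z))^2] on a data set."""
    return sum((model[key] - y) ** 2 for key, y in data) / len(data)

model = fit_group_means(train)
error = mse(model, held_out)
# A small held-out error suggests (X, Z) capture what drives Y, i.e.,
# the listed confounders are plausibly sufficient.
```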

Incentivizing
- Useful diagnostics get a piggybacked ride: NANO will identify transient failures in the elimination process
- A useful diagnostics dashboard for the users

P2P-ized Architecture Coordination to speed up inference and active learning

The Cross-Disciplinary Aspect
- Epidemiology: the study of what is "upon the people": factors affecting the health of human populations; causal inference among factors and health
- Epi`net`iology**: the study of what is "upon the network" (Internet): what factors affect network health and service performance; net neutrality: is the ISP causing it?
** you won't find this term in a dictionary