Towards Street-Level Client-Independent IP Geolocation
Yong Wang, UESTC/Northwestern; Daniel Burgener, Northwestern; Marcel Flores, Northwestern; Aleksandar Kuzmanovic, Northwestern; Cheng Huang, Microsoft Research
Aleksandar Kuzmanovic Towards Street-Level Client-Independent IP Geolocation
Problem and Motivation
How can we accurately locate IP addresses on the Internet?
Host-dependent solutions:
– GPS
– WiFi (e.g., Google My Location, Skyhook)
Host-independent solutions:
– A server cannot always expect clients’ cooperation
– Applications: security and access restrictions, online service access analytics, location-based online advertising
A Scenario of Street-Level Online Advertising
[Figure: the user’s location and nearby local businesses]
Prior Work
Constraint-Based Geolocation (CBG) [ToN 06]: median error distance = 228 km
– Measures delays from active vantage points
Topology-Based Geolocation [IMC 06]: median error distance = 67 km
– CBG plus network topology information
Octant [NSDI 07]: median error distance = 35.2 km
– CBG plus router locations and geographic and demographic information
Methodology Highlights
Our methodology is based on two insights:
– Websites often provide the actual geographical location of their associated entities (e.g., universities, businesses, government offices)
  We develop methods to determine whether the Web or mail servers actually reside at the corresponding locations
– Relative network delays correlate highly with geographical distances
  Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results
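To illustrate the second insight, the Python sketch below checks whether delays and distances agree in order rather than in absolute value, using Spearman's rank correlation. The delay and distance values are hypothetical, the helper names (`ranks`, `spearman`) are illustrative rather than part of the authors' system, and the tie-free formula assumes no repeated values.

```python
def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical RTTs (ms) and true distances (km) from one target to five landmarks.
delays_ms = [1.2, 3.5, 0.8, 7.9, 2.1]
distances_km = [4.0, 11.0, 2.5, 30.0, 6.0]

# The orderings agree perfectly, even though a speed-of-light-in-fiber bound on
# 7.9 ms would only place that landmark within several hundred kilometers.
print(spearman(delays_ms, distances_km))  # 1.0
```

A correlation near 1 means the nearest landmark by delay is also the nearest by distance, which is the property the later tiers rely on.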
Institutional Network Example
[Figure: a router connecting an institutional IP subnet (mail server, web server) to the external network; labels: “Web cloud-sourcing”, “550 South Hill Street Suite 890, Los Angeles, CA 90013”]
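A first step in mining such landmarks is pulling a postal address out of a page's text. The sketch below is a hypothetical helper (`extract_addresses`) that handles only one simple US format (street, city, two-letter state, five-digit ZIP). It is an assumption for illustration, not the authors' extractor, and it does not check whether the server actually sits at the extracted address.

```python
import re

# Assumed US address shape: "<number> <street...>, <city>, <ST> <ZIP>[-<ZIP4>]".
ADDRESS_RE = re.compile(
    r"(\d+[^,\n]*),\s*([A-Za-z .]+),\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?"
)


def extract_addresses(text):
    """Return (street, city, state, zip) tuples for each address-like match in text."""
    return [tuple(part.strip() for part in m.groups()) for m in ADDRESS_RE.finditer(text)]


page = "Contact us: 550 South Hill Street Suite 890, Los Angeles, CA 90013."
print(extract_addresses(page))
# [('550 South Hill Street Suite 890', 'Los Angeles', 'CA', '90013')]
```

A real crawler would need far more robust parsing (and geocoding of the result); the regex only shows where the location information comes from.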
The Role of Relative Network Delays
[Figure: measured delays compared by their relative order (<, <<)]
A Case Study
Target IP address:
Target postal address: 1850 K Street NW, Washington, DC
Three-Tier Geolocation System: Tier 1
Goal: find the coarse-grained region for the targeted IP
– Measure delays and map them to geographical distances
– Create the intersection of the resulting regions
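Tier 1 can be sketched in the style of constraint-based geolocation: each RTT from a vantage point bounds how far away the target can be, and the target must lie in the intersection of those disks. The sketch assumes a fixed propagation speed of about 200 km per ms (roughly two-thirds of the speed of light, as in fiber) and searches a latitude/longitude grid. The function names, the speed constant, and the grid search are illustrative assumptions, not the system's actual calibration.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: ~2/3 of the speed of light


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on distance: one-way delay (RTT / 2) times propagation speed."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def grid(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive (avoids float drift in a loop)."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def feasible_region(vantage_points, lat_range, lon_range, step=0.1):
    """vantage_points: (lat, lon, rtt_ms) tuples. Return grid points inside every disk."""
    disks = [(lat, lon, max_distance_km(rtt)) for lat, lon, rtt in vantage_points]
    return [
        (lat, lon)
        for lat in grid(*lat_range, step)
        for lon in grid(*lon_range, step)
        if all(haversine_km(lat, lon, vlat, vlon) <= radius for vlat, vlon, radius in disks)
    ]


# Toy example: two hypothetical vantage points 2 degrees apart, each with a 2 ms RTT.
vps = [(0.0, 0.0, 2.0), (0.0, 2.0, 2.0)]
print(feasible_region(vps, (-3, 3), (-3, 3), step=1))  # [(-1, 1), (0, 1), (1, 1)]
```

The surviving grid points approximate the coarse intersection region that Tier 2 then populates with landmarks.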
Three-Tier Geolocation System: Tier 2
Goal: use passive landmarks to determine a finer-grained region for the targeted IP
– Populate the intersection with landmarks
– Estimate the delay between the landmarks and the target: D1 + D2 < D3 + D4
– Create a new intersection
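One way to obtain D1 + D2 style estimates, assumed here for illustration, is to traceroute from the same vantage point to both the target and a landmark, find the last router the two paths share, and add the two delays measured beyond that router. The traceroute data and helper names below are hypothetical; the sketch keeps only the shared path prefix and ignores paths that diverge and later reconverge.

```python
def last_common_router(path_a, path_b):
    """Paths are lists of (router_ip, rtt_ms) from the same vantage point.
    Return (router_ip, rtt_on_a, rtt_on_b) for the last hop of the shared prefix, or None."""
    common = None
    for (ip_a, rtt_a), (ip_b, rtt_b) in zip(path_a, path_b):
        if ip_a != ip_b:
            break
        common = (ip_a, rtt_a, rtt_b)
    return common


def estimate_delay(path_to_target, rtt_target, path_to_landmark, rtt_landmark):
    """Estimate target-landmark delay as D1 + D2 beyond the last common router."""
    common = last_common_router(path_to_target, path_to_landmark)
    if common is None:
        return None
    _, router_rtt_t, router_rtt_l = common
    d1 = max(rtt_target - router_rtt_t, 0.0)    # common router -> target
    d2 = max(rtt_landmark - router_rtt_l, 0.0)  # common router -> landmark
    return d1 + d2


# Hypothetical traceroutes (documentation-range IPs) from one vantage point.
to_target = [("192.0.2.1", 1.0), ("192.0.2.9", 5.0), ("198.51.100.7", 6.0)]
to_landmark = [("192.0.2.1", 1.0), ("192.0.2.9", 5.2), ("203.0.113.4", 6.5)]
print(estimate_delay(to_target, 8.0, to_landmark, 7.0))  # about 4.8 ms
```

Under this reading, a landmark whose D1 + D2 is smaller than another landmark's D3 + D4 is treated as closer to the target, and the closer landmarks bound the new intersection.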
Three-Tier Geolocation System: Tier 3
Goal: geolocate the target IP using passive landmarks
– Select the landmark with the minimum delay to the target, and associate the target’s location with it
– Measured distance ∝ geographical distance
[Figure: distances (km) from the target to candidate landmarks]
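The Tier 3 rule can be stated almost directly as code: pick the landmark with the smallest estimated delay and report its location. In the sketch below, the landmark names, coordinates, and delays are hypothetical, and landmarks without a delay estimate are skipped.

```python
def geolocate(landmarks):
    """landmarks: (name, lat, lon, delay_ms or None) tuples.
    Return (name, lat, lon) of the minimum-delay landmark, or None if none were measured."""
    measured = [lm for lm in landmarks if lm[3] is not None]
    if not measured:
        return None
    name, lat, lon, _ = min(measured, key=lambda lm: lm[3])
    return name, lat, lon


candidates = [
    ("landmark-A", 41.90, -87.62, 2.4),
    ("landmark-B", 41.88, -87.63, 0.9),
    ("landmark-C", 41.95, -87.70, None),  # no delay estimate available
]
print(geolocate(candidates))  # ('landmark-B', 41.88, -87.63)
```

Because only the ordering of delays matters, adding the same bias to every estimate does not change the answer, which is one reason relative delays are more useful here than absolute ones.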
Remaining Issues
Verifying landmarks
– Sweep out most of the erroneous landmarks
– Errors are still possible!
Resilience to errors
– The larger a landmark’s location error, the more resilient our method is to it
– We prove that the likelihood that an erroneous landmark will affect accuracy is small
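The slide does not list the specific checks used to sweep out erroneous landmarks. As one plausible heuristic, assumed here rather than taken from the slides, a landmark can be discarded when its web server's IP address is shared by websites listing different postal addresses, which suggests shared hosting or a content delivery network rather than a locally hosted server. The function name and data are hypothetical.

```python
from collections import defaultdict


def filter_shared_hosting(landmarks, max_addresses=1):
    """landmarks: (domain, server_ip, postal_address) tuples.
    Keep only landmarks whose IP is associated with at most max_addresses distinct addresses."""
    addresses_by_ip = defaultdict(set)
    for _, ip, address in landmarks:
        addresses_by_ip[ip].add(address)
    return [lm for lm in landmarks if len(addresses_by_ip[lm[1]]) <= max_addresses]


# Hypothetical crawl results (documentation-range IPs).
crawled = [
    ("uni.example.edu", "192.0.2.10", "1 College Ave"),
    ("shop.example.com", "198.51.100.5", "12 Main St"),
    ("cafe.example.com", "198.51.100.5", "99 Oak St"),         # same IP, different address
    ("office.example.org", "203.0.113.7", "5 Elm St"),
    ("office-mail.example.org", "203.0.113.7", "5 Elm St"),   # same IP, same address
]
for domain, ip, _ in filter_shared_hosting(crawled):
    print(domain, ip)
```

A filter like this cannot catch every error, which matches the caveat above; the resilience property is what limits the damage from landmarks that slip through.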
Evaluation
Three datasets:
– PlanetLab dataset (academic)
– Collected dataset (residential)
– Online Maps dataset (in the wild)
Factors that impact accuracy:
– Landmark density
– Population density
– Access networks
Dataset Characteristics
The three datasets cover both urban and rural areas.
[Figure: urban areas vs. rural areas]
Baseline Results
[Figure: median and maximum error distance (km) for the PlanetLab, Residential, and Online Maps datasets, compared with the best previous result]
Landmark Density
The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.
Landmark density, from highest to lowest: PlanetLab > Residential > Online Maps
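Landmark density around a target can be quantified by counting known landmarks within some radius of it. The sketch below assumes a hypothetical 10 km radius and made-up coordinates; the slides do not define "vicinity" numerically.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def landmark_density(target, landmarks, radius_km=10.0):
    """Count landmarks (lat, lon) within radius_km of target (lat, lon)."""
    return sum(1 for lat, lon in landmarks if haversine_km(*target, lat, lon) <= radius_km)


# Hypothetical landmarks around a target at (0, 0): about 5.6 km, 5.6 km, 22 km, 157 km away.
landmarks = [(0.0, 0.05), (0.05, 0.0), (0.0, 0.2), (1.0, 1.0)]
print(landmark_density((0.0, 0.0), landmarks))  # 2
```

Comparing such counts across targets is one way to relate each dataset's landmark density to its error distance.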
The Role of Population Density
– The error distance is smallest in densely populated areas
– The error grows as population density decreases
[Figure annotation: Middle of “nowhere”]
The Role of Access Networks
[Figure: error distance (km) for AT&T, Comcast, and Verizon targets, with median annotations including 700 meters]
Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)
Conclusions
A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method
Our methodology consists of two components:
– Mining landmarks from the Web and using Web or mail servers as landmarks
– Using relative network distances as opposed to absolute network distances
Thank You