Determining the Geographic Location of Internet Hosts

Venkata N. Padmanabhan, Microsoft Research
Lakshminarayanan Subramanian, University of California at Berkeley

SIGMETRICS 2001

Background

Location-aware services are relevant in the Internet context too:
  - targeted advertising
  - event notification
  - territorial rights management

Existing approaches:
  - user input: burdensome, error-prone
  - whois: manual updates; host may not be at registered location

Goal: estimate location based on client IP address
  - challenging problem because an IP address does not inherently indicate location

IP2Geo

Multi-pronged approach that exploits various “properties” of the Internet:
  - DNS names of router interfaces often indicate location
  - Network delay tends to correlate with geographic distance
  - Hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically

GeoTrack
  - determine location of closest router with recognizable DNS name

GeoPing
  - use delay measurements to triangulate location

GeoCluster
  - extrapolate partial IP-to-location mapping information using cluster information derived from BGP routing data

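The GeoTrack idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the code table `CITY_CODES`, the helper names, and the `example.net` router names are all hypothetical, and the router path is assumed to have been collected already (for example with a traceroute-style tool).

```python
import re

# Hypothetical code table: city codes that may appear in router DNS names.
CITY_CODES = {
    "sea": "Seattle, WA",
    "chi": "Chicago, IL",
    "nyc": "New York, NY",
}

def locate_label(dns_name):
    """Return a location if any alphabetic token of the name is a known code."""
    for token in re.split(r"[^a-z]+", dns_name.lower()):
        if token in CITY_CODES:
            return CITY_CODES[token]
    return None

def geotrack(path_names):
    """path_names: router DNS names ordered from the probe toward the target
    (None where a hop has no DNS name). Returns the location of the
    recognizable router closest to the target, or None."""
    for name in reversed(path_names):
        if name is None:
            continue
        location = locate_label(name)
        if location is not None:
            return location
    return None

# Made-up path; the last hop (the target) has no recognizable code.
path = [
    "gw1.sea.example.net",
    "core2-chi.example.net",
    None,
    "edge7.nyc.example.net",
    "host-42.example.net",
]
estimate = geotrack(path)
```

Walking the path backwards reflects the "closest router" rule on the slide: the last recognizable router before the target supplies the estimate.
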
GeoPing

Delay-based triangulation is conceptually simple:
  - delay -> distance
  - distance from 3 or more non-collinear points -> location

But there are practical difficulties:
  - network path may be circuitous
  - transmission and queuing delays may corrupt delay estimate
  - one-way delay is hard to measure

GeoPing
  - delay is measured from several distributed probes
  - minimum delay among several samples is picked

Nearest Neighbor in Delay Space (NNDS) algorithm
  - construct a delay map containing (delay vector, location) tuples
  - given a delay vector, search through the delay map for closest match
  - location corresponding to the closest match is our location estimate

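The NNDS steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the slide only says "closest match", so Euclidean distance between delay vectors is an assumed metric here, and the delay map, probe count, and sample values are made up.

```python
import math

def delay_distance(a, b):
    """Euclidean distance between two delay vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_delay_vector(samples_per_probe):
    """Keep the minimum of several delay samples for each probe."""
    return [min(samples) for samples in samples_per_probe]

def nnds(delay_map, target_vector):
    """delay_map: list of (delay_vector, location) tuples.
    Returns the location whose delay vector is closest to target_vector."""
    _, location = min(
        delay_map, key=lambda entry: delay_distance(entry[0], target_vector)
    )
    return location

# Hypothetical delay map: delays in ms from three probes to known hosts.
delay_map = [
    ([10.0, 60.0, 80.0], "Seattle"),
    ([70.0, 12.0, 35.0], "Chicago"),
    ([85.0, 30.0, 9.0], "New York"),
]

# Several samples per probe for the target host; the minimum is kept.
samples = [[82.0, 95.0, 88.0], [41.0, 33.0, 36.0], [15.0, 11.0, 19.0]]
target = min_delay_vector(samples)
estimate = nnds(delay_map, target)
```

Taking the per-probe minimum is the slide's way of filtering out queuing delay; the linear search over the delay map is the simplest possible lookup and would be replaced by an index for a large map.
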
Validation of Delay-based Approach
  - Delay tends to increase with geographic distance

Impact of the Number of Probes
  - Highest accuracy when 7-9 probes are used

GeoCluster

Basic idea
  - divide up the space of IP addresses into clusters using BGP prefixes
  - use partial IP-to-location mapping data to infer location of each cluster
  - given target IP address, find matching cluster via longest-prefix match
  - location of the matching cluster is our estimate of host location

Issues
  - partial IP-to-location mapping information may not be entirely accurate
  - BGP prefixes might not correspond to geographic clusters

Sub-clustering algorithm
  - use partial IP-to-location mapping information to test whether a BGP prefix is likely to correspond to a geographic cluster
  - if the test is negative, divide the prefix into two and recursively apply the test to each half
  - in the end we are only left with geographically clustered prefixes
  - dispersion offers an indication of the accuracy of a location estimate

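The sub-clustering and longest-prefix-match steps can be sketched as follows. The slide does not specify the clustering test, so the one used here (a majority location must account for at least `min_agreement` of the known points) is an assumption made for illustration, as are the parameter names, the thresholds, and the sample addresses.

```python
import ipaddress

def subcluster(prefix, known, min_points=2, min_agreement=0.7, max_prefixlen=24):
    """Recursively split a prefix until the known locations inside it mostly agree.
    known: dict of IP string -> location (partial IP-to-location data).
    Returns a dict of network -> location for prefixes that pass the test.
    The agreement test is an assumed stand-in for the test on the slide."""
    inside = [loc for ip, loc in known.items()
              if ipaddress.ip_address(ip) in prefix]
    if len(inside) < min_points:
        return {}  # too little data to say anything about this prefix
    top = max(set(inside), key=inside.count)
    if inside.count(top) / len(inside) >= min_agreement:
        return {prefix: top}  # test positive: geographically clustered
    if prefix.prefixlen >= max_prefixlen:
        return {}  # stop splitting very small prefixes
    result = {}
    for half in prefix.subnets(prefixlen_diff=1):  # divide the prefix into two
        result.update(
            subcluster(half, known, min_points, min_agreement, max_prefixlen)
        )
    return result

def longest_prefix_match(clusters, ip):
    """Return (network, location) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in clusters if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, clusters[best]

# Hypothetical partial IP-to-location data inside one BGP prefix.
known = {
    "10.0.1.5": "Seattle", "10.0.2.9": "Seattle", "10.0.3.7": "Seattle",
    "10.0.130.4": "Boston", "10.0.140.8": "Boston", "10.0.200.1": "Boston",
}
clusters = subcluster(ipaddress.ip_network("10.0.0.0/16"), known)
match = longest_prefix_match(clusters, "10.0.135.20")
```

In this made-up data the /16 fails the test because its known points split evenly between two cities, while each /17 half passes, so the target address is assigned to the Boston half.
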
Performance of IP2Geo

Median error:
  - GeoCluster: 28 km
  - GeoTrack: 102 km
  - GeoPing: 382 km

Summary

IP2Geo combines several techniques that leverage different sources of information:
  - GeoTrack: DNS names
  - GeoPing: network delay
  - GeoCluster: address aggregates used for routing

Median error varies between 20 and 400 km

Even a 30% success rate is useful
  - especially since we can tell when the estimate is likely to be accurate

Forthcoming paper at SIGCOMM 2001

For more information visit: