Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sándor Laki (C) Geolocation by IP address 1 Geolocation by IP address Locating Internet hosts Sándor Laki

Similar presentations


Presentation on theme: "Sándor Laki (C) Geolocation by IP address 1 Geolocation by IP address Locating Internet hosts Sándor Laki"— Presentation transcript:

1 Sándor Laki (C) Geolocation by IP address 1 Geolocation by IP address Locating Internet hosts Sándor Laki lakis@inf.elte.hu http://lakis.web.elte.hu

2 Sándor Laki (C) Geolocation by IP address 2 Outline  RADAR a wireless solution  IP2Geo on the Internet  Constraint-Based Geolocation (GeoLim) OCTANT framework  Topology-Based Geolocation

3 Sándor Laki (C) Geolocation by IP address 3 RADAR

4 Sándor Laki (C) Geolocation by IP address 4 RADAR, a wireless approach  Focus on the indoor environment GPS does not work indoors Dedicated technologies  Goals Leverage existing infrastucture Use wireless LAN Software solution Better scalability and lower cost than dedicated technology

5 Sándor Laki (C) Geolocation by IP address 5 RADAR  Key idea: Signal strength matching  Offline calibration Construct radio map ( )  Real-time location and tracking Extract SStr from beacons Find table entry that best matches the measured SStr

6 Sándor Laki (C) Geolocation by IP address 6 RADAR: Determine location  Find nearest neighbor in signal space (NNSS)  1st solution Physical position of NNSS gives the user location  2nd solution K-NNSS Average the coordinates of k nearest neighbor gives the wanted position

7 Sándor Laki (C) Geolocation by IP address 7 Correlation between physical location and signal strength Base system: INFOCOM 2000 paper Enhanced system: Microsoft Technical Report MSR-TR-2000-12

8 Sándor Laki (C) Geolocation by IP address 8 IP2Geo Single-point localization

9 Sándor Laki (C) Geolocation by IP address 9 IP2Geo - Motivation  Much focus on location-aware services in wireless and mobile contexts  Such services are relevant in the Internet context too targeted advertising event notification territorial rights management network diagnostics  It is a challenging problem IP address does not inherently contain an indication of location

10 Sándor Laki (C) Geolocation by IP address 10 IP2Geo  Multi-pronged approach that exploits various “properties” of the Internet DNS names of router interfaces often indicate location network delay tends to correlate with geographic distance hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically  GeoTrack determine location of closest router with a recognizable DNS name  GeoPing use delay measurements to estimate location  GeoCluster extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters

11 Sándor Laki (C) Geolocation by IP address 11 GeoTrack – main idea  Extract geographical information from DNS names of routers on the path  Localizes the target to the last router whose position is known  Example ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle 184.atm6-0.xr2.ewr1.alter.net => New York dnvr-scrm.abilene.ucaid.edu => Denver

12 Sándor Laki (C) Geolocation by IP address 12 GeoTrack  GeoTrack operation do a traceroute to the target IP address determine location of last recognizable router along the path  Key ideas in GeoTrack partitioned city code database to minimize chance of false match ISP-specific parsing rules delay-based correction  Limitations routers may not respond to traceroute DNS name may not contain location information or lookup may fail target host may be behind a proxy or a firewall

13 Sándor Laki (C) Geolocation by IP address 13 GeoPing - Delay based localization  Delay-based triangulation is conceptually simple delay to distance distance from 3 or more non-colinear points => target location  But there are practical difficulties network path may be circuitous transmission & queuing delays may corrupt delay estimate OWD is hard to measure OWD ≠ RTT/2 because of routing asymmetry

14 Sándor Laki (C) Geolocation by IP address 14 GeoPing - details  Measure the network delay to the target host from several geographically distributed probes typically more than 3 probes are used round-trip delay measured using ping utility small-sized packets => transmission delay is negligible pick minimum among several delay samples  Nearest Neighbor in Delay Space (NNDS) akin to Nearest Neighbor in Signal Space (NNSS) in RADAR construct a delay map containing (delay vector,location) tuples given a vector of delay measurements, search through the delay map for the NNDS location of the NNDS is our estimate for the location of the target host  More robust that directly trying to map from delay to distance

15 Sándor Laki (C) Geolocation by IP address 15 GeoPing – Delay tends to increase with geographic distance

16 Sándor Laki (C) Geolocation by IP address 16 GeoPing – Estimation error

17 Sándor Laki (C) Geolocation by IP address 17 GeoCluster A passive method

18 Sándor Laki (C) Geolocation by IP address 18 GeoCluster  A passive technique unlike GeoTrack and GeoPing  Basic idea: breaks the IP address space into clusters assign a geographical location to each cluster based on IP-to-location third party databases given a target IP address, first find the matching cluster using longest-prefix match. location of matching cluster is our estimate of host location

19 Sándor Laki (C) Geolocation by IP address 19 GeoCluster  Example: consider the cluster 128.95.0.0/16 (containing 65536 IP addresses) suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too

20 Sándor Laki (C) Geolocation by IP address 20 GeoCluster – Clustering IP addresses  Exploit the hierarchical nature of Internet routing inter-domain routing in the Internet uses the Border Gateway Protocol (BGP)  BGP operates on address aggregates  we treat these aggregates as clusters  in all we had about 100,000 clusters of different sizes

21 Sándor Laki (C) Geolocation by IP address 21 IP-to-location mapping  Data sources e-mail service, business web-hosting companies, etc. requires a large, fine-grain and fresh database!  Information partial information (i.e., only for a small subset of addresses) possibly inaccurate (e.g., manual input from user)

22 Sándor Laki (C) Geolocation by IP address 22 Extrapolating IP-to-location mapping  Determine location most likely to correspond to a cluster majority polling “average” location dispersion is an indicator of our confidence in the location estimate  What if there is a large geographic spread in locations? some clusters correspond to large ISPs and the internal subdivisions are not visible at the BGP level sub-clustering algorithm: keep sub-dividing clusters until there is sufficient consensus in the individual sub-clusters some clients connect via proxies or firewalls (e.g., AOL clients) sub-clustering may help if there are local or regional proxies otherwise large dispersion => no location estimate made many tools fail in this regard

23 Sándor Laki (C) Geolocation by IP address 23 Performance of GeoCluster Median errors: GeoCluster ~30km GeoPing ~300km GeoTrack ~100km

24 Sándor Laki (C) Geolocation by IP address 24 Other database-oriented applications  NetGeo and IP2LL based on WHOIS DB  not closely regulated  the address information often indicates the head office of the owner which may be far from the actual target  Quova Commercial service with thier own database  Gtrace using DNS LOC entries

25 Sándor Laki (C) Geolocation by IP address 25 Octant framework A very impressive solution

26 Sándor Laki (C) Geolocation by IP address 26 Octant overview  Combine very different techniques Active and passive  Constraint-based Weighted positive and negative constraints Constraint => region  Using Bézier-regions Efficient implementations of clipping and union operations are available

27 Sándor Laki (C) Geolocation by IP address 27 Octant - Notations   i : the region in which the target node is located   j : a constraint It is a region where the node might be reside associated with weight  Set of nodes Landmarks: physical locations are at least partially known (L j ) Every L j has an „estimated” location  L j

28 Sándor Laki (C) Geolocation by IP address 28 Octant – Landmarks and constraints  Primary landmark GPS, street address Low error  Secondary landmark Position computed by Octant itself  Positive constraints ( set:  ) Node A is within d miles of L k  =  (x,y) in  k c(x,y,d), where c(x,y,d) is a disc.  Negative constraints( set:  ) Node A is further than d miles from L k  =  (x,y) in  k c(x,y,d)

29 Sándor Laki (C) Geolocation by IP address 29 Estimated location   i =  X i  X i \  X i  X i

30 Sándor Laki (C) Geolocation by IP address 30 Mapping latencies to distances  Latency between a target and a landmark bounds thier maximum distance Calculate with speed of light  delay*2/3*c  Low precision Octant’s way  Dynamic calibration  For each L landmark compute two bounds R L (d) and r L (d)  where d is the ping time of node i r L (d)  || loc(L) – loc(i) ||  R L (d) When queuing delays are dominant then r L (d) = 0.

31 Sándor Laki (C) Geolocation by IP address 31 Mapping latencies to distances  Each landmark periodically pings all other landmarks => creating a correlation table  Determines the convex hull around the points => R(d) and r(d)  It is sufficient when the target has a direct and congestion-free path to the landmark Octant introduce a cut off at latency p  a tunable percentage of landmark lie to the left of p  discard the others  (z is a fictitious datapoint,  placed far away)

32 Sándor Laki (C) Geolocation by IP address 32 Mapping latencies to distances

33 Sándor Laki (C) Geolocation by IP address 33 Last hop delays  Mapping is further complicated by queuing and transmission delays associated with the last hop Cable and DSL connections Overloaded PlanetLAB nodes  Goal: isolate the delay components which artificially inflate latencies Detailed maps of the underlying physical network, as in network tomography (not in Octant) Octant introduce a simple metric called height

34 Sándor Laki (C) Geolocation by IP address 34 Last hop delays in Octant  Based on pair-wise latency measurements between landmarks  Primary landmarks: a, b, c  Measure thier latencies(RTT): [a,b], [a,c], [b,c]  The positions of primary landmarks are known -> we can estimate the transmission delays: (a,b), (a,c), (b,c)  Lasthop delay(a,b) = [a,b] - (a,b)  Landmark coordinates (a lon, a lat ),…

35 Sándor Laki (C) Geolocation by IP address 35 Last hop delays in Octant  How much of the delays can be attributed to each landmark? Denoted by a’, b’ and c’ // height  Similarly, for a target t, we can compute t’, as an estimation…  We can solve for  t’, t lon, t lat

36 Sándor Laki (C) Geolocation by IP address 36 Last hop delays  t lon and t lat has relatively high error not used in the later stages  Given the target and landmark heights  Each landmark can shift its R L up if t’ < heights of the other landmarks r L down if t’ > heights of the other landmarks

37 Sándor Laki (C) Geolocation by IP address 37 Indirect routes  The preceding assumption Route lengths are proportional to great circle distances  not the case in practise, due to policy routing  Example: a subscriber Ithaca, NY -> Cornell Univ. (Ithaca) Syracuse, NY -> Brockport, IL -> New York City -> Cornell Univ. ~1 mile physical distance VS. ~800 miles length path

38 Sándor Laki (C) Geolocation by IP address 38 Indirect routes discovery  Landmark’s heigth can indicate…  Localizing routers on the network path Secondary landmarks Localization by latencies Extract location from router names Reverse DNS lookup – undns tool Using ZIP code to determine geographical location

39 Sándor Laki (C) Geolocation by IP address 39 Handling uncertainty  Filter out errorneous constraint  Latency based constraints Weight system that decreases exponentially with increasing latency Weight threshold

40 Sándor Laki (C) Geolocation by IP address 40 Iterative refinement  Two phase: First, we use accurate and mostly conservative constraints Second, less acurate and more aggressive constraints to obtain a better estimation (inside the initial estimated region) …

41 Sándor Laki (C) Geolocation by IP address 41 Results

42 Sándor Laki (C) Geolocation by IP address 42 Results

43 Sándor Laki (C) Geolocation by IP address 43 Topology-based Geolocation

44 Sándor Laki (C) Geolocation by IP address 44 Motivations  Problems with CBG Use constraints that are less than speed of light  Risk of underestimates When an underestimate occurs, the final region does not contain the true location  Topology based geolocation using the speed of light to generate constraints inspired by Sensor Network Localization

45 Sándor Laki (C) Geolocation by IP address 45 Summary of techniques  Traceroute from landmarks Map topology  Estimate hop latency Improve accuracy  Cluster network interfaces Increase structuring  Validate location hints Incorporate location hints  Constraint optimization Geolocate targets

46 Sándor Laki (C) Geolocation by IP address 46 Estimate hop latencies  Using traceroute tool to infer link latency  Estimate hop latency from the difference in RTT to adjacent routers Accurate only if the link is traversed both directions (symmetric routing) How can we discover this property?  Three different techniques

47 Sándor Laki (C) Geolocation by IP address 47 Estimate hop latencies  First, observing the reverse TTL values Most routers initialize the TTL values for thier packets from a small set.  30,32,64,128,150,255 If TTL values changes significantly from one node to the next => discard the link estimate  Second, measuring paths in both direction between pairs of landmarks If both paths traverse a particular link => taking the differences of measurements to the two endpoints This estimation has high confidence  Third, increasing vantage points from which we probe a certain link For every link on a path from a landmark we probes to both endpoints from all other landmarks… If these probes pass over the link => estimate for the link

48 Sándor Laki (C) Geolocation by IP address 48 Clustering interfaces  Clustering interfaces that belong to the same router (IP aliases)

49 Sándor Laki (C) Geolocation by IP address 49 Clustering interfaces  Two IP-aliases techniques Mercator  UDP probes are send to high-numbered ports on a set of interfaces  Routers send back a port-unreachable ICMP message with the source address  If two diff. interfaces replie with the same source address => aliases Ally  Used on pairs of interfaces  Sends probes to the two if.  Examines the IP-ID  Most routers generate the IP-ID using a single counter that has incremented after each packet has been created

50 Sándor Laki (C) Geolocation by IP address 50 Validating location hints  DNS names -> locations  Some names are incorrect… Missnamed, reconfig, reassignment of IP addresses  Topology constraints can be used to verify location hints RTT measurements -> upper bounds… Clustering -> aliases Hop latencies

51 Sándor Laki (C) Geolocation by IP address 51 Constraint optimization  Targets:X = {x i } Landmarks:L = {l i }  Distance bw i and j: d(i,j)  Hard delay constraint D(l i x j ) <= c ij // c ij Speed of light Set of hdc = C d  Soft Link Latency Constraints Hop latency bw i and j: h ij D(x i,x j ) = h ij + e ij Where e ij is some error Set of SLLC = C l

52 Sándor Laki (C) Geolocation by IP address 52 Constraint optimization  Minimize:  i,j in C l |e ij |  Subject to: C d, C l  Not a convex optimization problem But we can recast it as a semidefinite program  Using fast solvers SeDuMi Vivaldi

53 Sándor Laki (C) Geolocation by IP address 53 Results

54 Sándor Laki (C) Geolocation by IP address 54 Results

55 Sándor Laki (C) Geolocation by IP address 55 References  RADAR and IP2Geo http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf  Bernard Wong, Ivan Stoyanov and Emin Gün Sirer. Octant: A Comprehensive Framework for the Geolocalization of Internet Hosts. In Proceedings of the Symposium on Networked System Design and Implementation, Cambridge, Massachusetts, April 2007.  Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and Chawathe, Y. Towards IP geolocation using delay and topology measurements. In Proceedings of the 6th ACM SIGCOMM on internet Measurement (Rio de Janeriro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 71- 84.


Download ppt "Sándor Laki (C) Geolocation by IP address 1 Geolocation by IP address Locating Internet hosts Sándor Laki"

Similar presentations


Ads by Google