RandPing: A Randomized Algorithm for IP Mapping
Michelle Liu, Yuhan Cai
11/16/2018
Outline
- Introduction
- Related Work
- Background
- Algorithm Overview
- Experimental Evaluation
- Conclusions and Future Work
Introduction
- Motivations
  - Collection of personalized information
  - Authorities of transactions
- Problem statement
  - IP mapping is the problem of finding, given an IP address p, the geographic location of the Internet host with IP address p.
- Challenges
  - No authoritative database
  - IP addresses do not contain geographic information
Related Work
- DNS-based approach
  - Uses DNS records from databases: IP2LL, NetGeo, and GeoTrack
  - Limitation: DNS names might not be related to locations
- Delay-based approach
  - Exploits the relationship between geographic distance and network delay
  - Examples: GeoPing and CBG
- Clustering-based approach
  - Splits the IP address space into clusters
  - Assumption: all hosts within the same cluster are co-located
Background
- Best line bound
  - Above the baseline
  - Below all data points
  - Closest to all data points
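A minimal sketch of fitting the best line, assuming each data point is a (distance in km, RTT in ms) pair and the baseline is RTT = 0.01 ms/km × distance (light in fiber at roughly 2/3 c, covering the distance twice). The slides do not say how the line is computed; here the fit is treated as a two-variable linear program and solved by enumerating the vertices of its feasible region rather than calling an LP solver. `best_line` and `distance_upper_bound` are hypothetical names.

```python
from itertools import combinations


def best_line(points, baseline_slope):
    """Fit RTT = m * distance + b under the best-line conditions.

    Constraints: m >= baseline_slope and b >= 0 keep the line above the
    baseline RTT = baseline_slope * distance; m * x + b <= y keeps it below
    every (x, y) = (distance, rtt) point.
    Objective: minimize the total vertical gap sum(y - m * x - b).
    With two variables the optimum lies at a vertex of the feasible region,
    so we try every pairwise intersection of constraint boundaries.
    """
    # Each boundary is written as a * m + c * b = r.
    boundaries = [(1.0, 0.0, baseline_slope), (0.0, 1.0, 0.0)]
    boundaries += [(x, 1.0, y) for x, y in points]

    def feasible(m, b, eps=1e-9):
        return (m >= baseline_slope - eps and b >= -eps
                and all(m * x + b <= y + eps for x, y in points))

    best, best_gap = None, float("inf")
    for (a1, c1, r1), (a2, c2, r2) in combinations(boundaries, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel boundaries have no unique intersection
        m = (r1 * c2 - r2 * c1) / det  # Cramer's rule
        b = (a1 * r2 - a2 * r1) / det
        if feasible(m, b):
            gap = sum(y - m * x - b for x, y in points)
            if gap < best_gap:
                best, best_gap = (m, b), gap
    return best  # None if no line satisfies the constraints


def distance_upper_bound(rtt, m, b):
    """Invert the best line: every point lies on or above it, so the true
    distance for this RTT can be at most (rtt - b) / m."""
    return (rtt - b) / m


m, b = best_line([(100, 5), (500, 10), (1000, 20)], baseline_slope=0.01)
print(m, b, distance_upper_bound(10, m, b))
```

Because the line sits below every measurement, inverting it yields an upper bound on distance rather than a point estimate.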
Background (cont.)
- Clustering
  - Partitioning Around Medoids (PAM)
  - Quality of a clustering = average distance of an object to the medoid of its cluster
- Outlier detection
  - An object O in a dataset T is a DB(p, D)-outlier if at least a fraction p of the objects in T lie at a distance greater than D from O.
- Scriptroute system
  - A system for conducting network measurements from remote vantage points
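A minimal PAM sketch, assuming the objects are 2-D points (for example, projected probe locations) and Euclidean distance via `math.dist`. It uses the quality measure above as its cost: a greedy BUILD phase picks k initial medoids, then a SWAP phase exchanges a medoid for a non-medoid while the average distance keeps dropping. `pam` and `clustering_cost` are hypothetical names, and this is a plain textbook variant, not necessarily the one used in RandPing.

```python
import math
from itertools import product


def clustering_cost(points, medoids, dist=math.dist):
    """Quality of a clustering: average distance from each object to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points) / len(points)


def pam(points, k, dist=math.dist):
    """Partitioning Around Medoids: greedy BUILD, then SWAP until no improvement."""
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the object that lowers the cost the most.
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates,
                           key=lambda i: clustering_cost(points, medoids + [i], dist)))
    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    current = clustering_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for slot, h in product(range(k), range(n)):
            if h in medoids:
                continue
            trial = medoids[:slot] + [h] + medoids[slot + 1:]
            cost = clustering_cost(points, trial, dist)
            if cost < current - 1e-12:
                medoids, current, improved = trial, cost, True
    # Label each object with the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return medoids, labels


medoids, labels = pam([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
print(medoids, labels)
```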
Algorithm Overview
- Overall idea
  - Clustering probing machines
  - Random selection of a small set of probing machines
  - Reduction of the search space by pruning
- Major steps
  - Preprocessing stage
  - Randomized pinging
  - Location estimation
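Preprocessing Stage
- Construction of an RTT table and a distance table for the probing machines
- Computation of the best line for each probing machine, subject to the constraints that the line lies above the baseline and below all of that machine's (distance, RTT) data points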
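A sketch of the two preprocessing tables, assuming each probing machine's location is known as (longitude, latitude) in degrees, the convention of the results table, and that distances are great-circle (haversine) distances on a spherical Earth of radius 6371 km. The dictionary layouts and function names are hypothetical; the RTT table itself would come from real measurements between probing machines, which are not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_table(locations):
    """locations: {name: (lon, lat)} -> {(a, b): km} for every ordered pair a != b."""
    return {(a, b): great_circle_km(*locations[a], *locations[b])
            for a in locations for b in locations if a != b}


def calibration_points(probe, dist_table, rtt_table):
    """(distance, RTT) pairs measured from one probe; the input to its best-line fit."""
    return [(dist_table[(probe, other)], rtt)
            for (src, other), rtt in rtt_table.items() if src == probe]


table = distance_table({"a": (0.0, 0.0), "b": (0.0, 1.0)})
print(calibration_points("a", table, {("a", "b"): 3.0, ("b", "a"): 3.2}))
```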
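Preprocessing (cont.)
- Clustering of probing machines based on their geographic locations
- Transformation of geographic coordinates to a Cartesian coordinate system, where G and T are the longitude and latitude of a point in degrees, (G_0, T_0) is the reference point, and R is the Earth's radius:
  $x = \frac{2\pi R \cos(T_0)\,(G - G_0)}{360}$
  $y = \frac{2\pi R\,(T - T_0)}{360}$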
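A sketch of the coordinate transformation, using 2πR/360 km per degree of arc (an equirectangular projection around the reference point) with R = 6371 km as an assumed Earth radius. The inverse is included because an estimated (x, y) location has to be mapped back to longitude and latitude before comparing it with a real location. Function names are hypothetical.

```python
import math


def to_cartesian(lon, lat, lon0, lat0, radius_km=6371.0):
    """Project (lon, lat) in degrees to (x, y) in km around the reference point (lon0, lat0)."""
    x = 2 * math.pi * radius_km * math.cos(math.radians(lat0)) * (lon - lon0) / 360
    y = 2 * math.pi * radius_km * (lat - lat0) / 360
    return x, y


def to_geographic(x, y, lon0, lat0, radius_km=6371.0):
    """Inverse of to_cartesian: (x, y) in km back to (lon, lat) in degrees."""
    lon = lon0 + 360 * x / (2 * math.pi * radius_km * math.cos(math.radians(lat0)))
    lat = lat0 + 360 * y / (2 * math.pi * radius_km)
    return lon, lat


xy = to_cartesian(-76.476, 42.4478, lon0=-80.0, lat0=40.0)
print(xy, to_geographic(*xy, lon0=-80.0, lat0=40.0))
```

The projection distorts distances more the farther a point is from the reference latitude, so a reference point near the middle of the measured region is the natural choice.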
Randomized Pinging
- Random selection of m clusters
- Random selection of k probing machines within each cluster
- Pinging the target machine from the selected machines to get n = m*k RTT measurements
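A sketch of the random selection, assuming clusters are given as lists of probe identifiers. To guarantee exactly n = m*k machines, only clusters with at least k members are eligible; that restriction is an assumption, since the slides do not say how small clusters are handled. `pick_probes` is a hypothetical name, and the actual pinging of the target is not shown.

```python
import random


def pick_probes(clusters, m, k, rng=random):
    """Pick m clusters at random, then k probing machines at random from each.

    Only clusters with at least k machines are eligible, so exactly m * k
    machines are returned; rng.sample raises ValueError if fewer than m
    clusters qualify.
    """
    eligible = [c for c in clusters if len(c) >= k]
    chosen = rng.sample(eligible, m)
    return [probe for cluster in chosen for probe in rng.sample(cluster, k)]


probes = pick_probes([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]], m=2, k=2, rng=random.Random(0))
print(probes)
```

Passing a seeded `random.Random` makes a run reproducible; repeating the selection with fresh draws gives the independent trials used in the later estimation step.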
Location Estimation
- Computation of estimated distances
- Determination of the best group of circles by dynamic programming
  - Keep track of groups of circles
  - Incrementally build up each group
  - Pick the biggest group
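A sketch of the circle step, assuming each selected probe becomes a circle (x, y, radius) in the projected plane, with the radius obtained by inverting that probe's best line rtt = m·d + b. The slides describe the grouping only as incrementally building groups and keeping the biggest; the version here is one simple reading of that description, not the authors' exact recurrence, and it uses pairwise overlap of circles as a proxy for a common intersection region (a necessary but not sufficient condition). All names are hypothetical.

```python
import math


def estimated_radius(rtt, m, b):
    """Distance bound from a probe's best line rtt = m * d + b (clamped at 0)."""
    return max(rtt - b, 0.0) / m


def overlap(c1, c2):
    """True if two circles (x, y, r) intersect or touch."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return math.hypot(x1 - x2, y1 - y2) <= r1 + r2


def biggest_group(circles):
    """Grow groups one circle at a time; a circle joins every group whose
    members it all overlaps, and also starts a group of its own."""
    groups = []
    for i, c in enumerate(circles):
        for g in groups:
            if all(overlap(c, circles[j]) for j in g):
                g.append(i)
        groups.append([i])
    return max(groups, key=len)  # indices of the circles in the biggest group


circles = [(0, 0, 5), (6, 0, 5), (3, 4, 5), (100, 100, 1)]
print(biggest_group(circles))
```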
Location Estimation (cont.)
- Locating the target machine by non-linear programming, subject to the constraints:
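The slides' non-linear program is not recoverable here, so this sketch swaps in a different, simpler technique: alternating projections onto the disks of the chosen group (projection onto convex sets). Assuming circles are (x, y, r) in the projected plane and their intersection is nonempty, repeatedly projecting onto each disk converges to some point inside all of them. It finds a feasible point, not the optimum of any objective. Function names are hypothetical.

```python
import math


def project_onto_disk(p, disk):
    """Closest point to p inside the disk (cx, cy, r)."""
    x, y = p
    cx, cy, r = disk
    d = math.hypot(x - cx, y - cy)
    if d <= r:
        return p  # already inside
    return (cx + r * (x - cx) / d, cy + r * (y - cy) / d)


def point_in_intersection(disks, iters=1000, tol=1e-9):
    """Cycle through the disks, projecting onto each, until the point stops moving."""
    # Start from the centroid of the centres.
    p = (sum(d[0] for d in disks) / len(disks), sum(d[1] for d in disks) / len(disks))
    for _ in range(iters):
        prev = p
        for disk in disks:
            p = project_onto_disk(p, disk)
        if math.hypot(p[0] - prev[0], p[1] - prev[1]) < tol:
            break
    return p


print(point_in_intersection([(0, 0, 10), (3, 0, 1)]))
```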
Location Estimation (cont.)
- Repeat the process r times
- Computation of the centroid of the r estimated locations
  - Prune out distance-based outliers
  - Compute the centroid of the remaining points
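A sketch of the final step, applying the DB(p, D)-outlier definition from the Background slide to the r estimated locations and averaging what is left. It assumes estimates are (x, y) points in km in the projected plane; the default values of p and D are hypothetical, and falling back to all estimates when everything is pruned is an added safeguard.

```python
import math


def is_db_outlier(o, points, p, D):
    """DB(p, D)-outlier: at least a fraction p of the points lie farther than D from o."""
    far = sum(1 for q in points if math.dist(o, q) > D)
    return far >= p * len(points)


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def robust_centroid(estimates, p=0.9, D=300.0):
    """Centroid of the r estimates after pruning DB(p, D)-outliers."""
    kept = [q for q in estimates if not is_db_outlier(q, estimates, p, D)]
    return centroid(kept or estimates)  # fall back if every estimate was pruned


estimates = [(0, 0), (10, 0), (0, 10), (10, 10), (1000, 1000)]
print(robust_centroid(estimates, p=0.7, D=300.0))
```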
Experimental Results
- Setup
  - Machines selected from PlanetLab in the US
  - A small set of machines serves as target machines; the rest serve as probing machines
- Results
  - Error distance: the distance between the real location of the target machine and the estimated one
Experimental Results (cont.)
Locations are given as (longitude, latitude) in degrees.

Site | Actual location | Estimated location | Error distance (km)
-----|-----------------|--------------------|--------------------
Cornell (NY) | (-76.476, 42.4478) | (-72.3764, 43.2691) | 345.9
Duke | (-78.9427, 36.0088) | (-73.9713, 39.6992) | 633
Intel (Seattle) | (-122.316, 47.6614) | (-122.2084, 45.5088) | 250.1
Northwestern | (-87.69, 42.05) | (-89.9477, 40.2735) | 272.2
Stanford | (-122.172, 37.4294) | (-114.5750, 35.8964) | 663
Dartmouth | (-70.9667, 41.6167) | (-77.3431, 40.9380) | 496.3
UCSC | (-122.06, 37.0) | (-119.1027, 37.4213) | 270.2
UGA | (-83.36, 33.98) | (-76.7415, 33.5117) | 591.1
UMASS | (-72.5249, 42.3881) | (-68.5706, 41.5383) | 333.7
UOregon | (-123.06, 44.04) | (-111.4846, 39.1779) | 1075
UVirginia | (-78.4749, 38.0613) | (-72.8606, 39.9402) | 536.2
CalTech | (-118.15, 34.1358) | (-114.4350, 35.5736) | 373.3
Pittsburgh | (-79.9486, 40.4451) | (-80.2406, 39.7665) | 53.88
Rutgers | (-70.4313, 40.5228) | (-74.4294, 40.5492) | 336.7
UMich | (-83.7126, 42.2944) | (-82.4517, 42.6130) | 131.2
Wisc | (-89.3867, 43.0757) | (-88.2059, 42.1028) | 150.6
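To summarize the table, the reported error distances can be reduced to a median, a mean, and the fraction of targets estimated within a given radius. The values below are copied from the table in row order; the 500 km threshold is an arbitrary illustration, not a figure from the slides.

```python
import statistics

# Error distances (km) from the results table, in row order.
errors_km = [345.9, 633, 250.1, 272.2, 663, 496.3, 270.2, 591.1,
             333.7, 1075, 536.2, 373.3, 53.88, 336.7, 131.2, 150.6]

median_km = statistics.median(errors_km)
mean_km = statistics.mean(errors_km)
within_500 = sum(e <= 500 for e in errors_km) / len(errors_km)  # illustrative threshold

print(f"median {median_km:.1f} km, mean {mean_km:.1f} km, within 500 km: {within_500:.0%}")
```

The single 1075 km outlier (UOregon) pulls the mean above the median, so the median is the more representative single number here.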
Experimental Results (cont.)
Experimental Analysis
- Limited number of probing machines
- The effect of randomization is not obvious
- The best line estimation is too conservative
- The intersection region of the circles is too big
Conclusions
- A randomized approach to IP mapping using clustering and outlier detection
- Location estimation based on dynamic programming and non-linear programming
Future Work
- Adjusting the algorithm parameters: number of clusters, number of trials, and number of picked machines
- Proving a lower bound on the difference between the accuracy of the randomized algorithm and that of a deterministic algorithm