Automatically Annotating and Integrating Spatial Datasets
Ching-Chien Chen, Snehal Thakkar, Craig Knoblock, Cyrus Shahabi
Department of Computer Science & Information Technologies, University of Southern California
Discussant: Oncel Tuzel
Outline
–Problem Definition
–Finding Control Points
–Filtering Control Points
–Integration of Data Sources
–Performance Evaluation
–Conclusion
Problem Definition
Automatic integration of data sources having:
–Different projections
–Different accuracy
–Different formats
Applications
–Building finder
–Road extraction
–Etc.
Data Sources
Microsoft TerraService
–Satellite image
–Feature points: feature name, type, latitude/longitude
TIGER/Line files (a digital database of geographic features, such as roads, railroads, rivers, lakes, legal boundaries, and census statistical boundaries, covering the entire United States)
–Name
–Type of feature
–Latitude/longitude
–Address, etc.
Data Sources
Online data / Yellow pages
–Type
–Name
–Address
Figure: MS TerraService satellite image; the white lines are roads from the TIGER/Line data source
Finding Control Points
A control point pair consists of a point in one dataset and the corresponding point in the other dataset.
The quality of the control points determines the accuracy of the algorithm.
Control points are used to transform arbitrary points from one dataset to the other.
Methods:
–Using online data
–Analyzing imagery using vector data
Control Points Using Online Data
Method
–For a given location, the TerraService dataset has accurate control points (churches, libraries, hospitals, etc.)
–Find the corresponding control points in the TIGER/Line dataset
–Search landmark categories in yellow page sources
–Get the address of each landmark and look the address up in the TIGER/Line DB
–Match the names of the landmarks to find matching control points
Problems
–Inaccuracies in yellow pages
–Landmarks are not uniformly distributed
–Landmarks may cover large areas
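The name-matching step can be sketched as below. This is only an illustration: the function names, the (name, lat, lon) record layout, and the normalization rule are assumptions, not the authors' implementation, and the vector-side coordinates are assumed to have already been obtained by looking up the yellow-page address in the TIGER/Line DB.

```python
import re

def normalize(name):
    """Lowercase, delete punctuation, collapse whitespace (assumed rule)."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", name.lower()).split())

def match_landmarks(image_points, vector_points):
    """Pair landmarks whose normalized names agree.

    image_points:  (name, lat, lon) records from the imagery side.
    vector_points: (name, lat, lon) records geocoded on the vector side.
    Returns a list of ((lat, lon), (lat, lon)) control point pairs.
    """
    by_name = {}
    for name, lat, lon in vector_points:
        by_name.setdefault(normalize(name), []).append((lat, lon))
    pairs = []
    for name, lat, lon in image_points:
        candidates = by_name.get(normalize(name), [])
        if len(candidates) == 1:  # ambiguous or missing names are skipped
            pairs.append(((lat, lon), candidates[0]))
    return pairs
```

Skipping ambiguous names trades coverage for precision; pairs that are still wrong are left for the filtering stage to remove.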
Control Points Using Online Data
Figure: TerraService DB; yellow pages and TIGER/Line DB integrated
Control Points Analyzing Imagery Using Vector Data
Road intersections may be good control points
–Use computer vision techniques to find the road intersections in the satellite image
–Find the intersections in the TIGER/Line files
–Match the control points
Automatically extracting road intersections from large images is:
–Time consuming
–Inaccurate
Proposed method: localized image processing
Localized Image Processing
Mark the locations of the intersection points found in the TIGER/Line DB on the satellite image
Define the area size parameter
–Start with a small area size and increase it until some clear features are found
Search the region of the given area size centered at the marked point
Find the edges in that region
Mark the intersection of the detected lines
A smaller search region is
–easier
–faster
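The growing-window search can be sketched as below. The window sizes, the `detect` callback, and the toy detector are hypothetical stand-ins; a real pipeline would run edge and line detection where the toy detector sits. The image is assumed to be a 2D list of 0/1 edge pixels.

```python
def crop(image, row, col, half):
    """Square window of side 2*half+1 around (row, col), clipped to the image."""
    r0, r1 = max(0, row - half), min(len(image), row + half + 1)
    c0, c1 = max(0, col - half), min(len(image[0]), col + half + 1)
    return [line[c0:c1] for line in image[r0:r1]], (r0, c0)

def localized_search(image, row, col, detect, start=2, step=2, limit=10):
    """Grow the window around the predicted intersection until detect() hits.

    detect(window) returns (r, c) inside the window or None.
    Returns the hit in full-image coordinates, or None if limit is exceeded.
    """
    half = start
    while half <= limit:
        window, (r0, c0) = crop(image, row, col, half)
        hit = detect(window)
        if hit is not None:
            return (r0 + hit[0], c0 + hit[1])
        half += step
    return None

def toy_cross_detector(window):
    """Toy stand-in: first pixel whose four neighbours are all edge pixels."""
    for r in range(1, len(window) - 1):
        for c in range(1, len(window[r]) - 1):
            if (window[r][c] and window[r][c - 1] and window[r][c + 1]
                    and window[r - 1][c] and window[r + 1][c]):
                return (r, c)
    return None
```

Only the pixels near each predicted intersection are ever examined, which is where the speed and accuracy gain over whole-image extraction comes from.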
Filtering Control Points
Both methods may generate inaccurate points
Inaccurate points reduce the accuracy of the alignment of the data sets
Inaccurate control points are detected by identifying pairs whose relationship differs significantly from that of the other pairs
Vector Median Filter
–Represent each control point pair by a 2D displacement vector
–The median vector is the vector that has the least summed distance to the other vectors
–Finds the correct median if the pairs are accurate
–Modified to return the k vectors nearest to the median
Vector Median Filter
As k increases, the filter provides more control points, but more of them may be inaccurate pairs
A natural choice is to select
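The modified vector median filter can be sketched as below. The function name and the pair layout are assumptions, and k is left as a caller-supplied parameter because these slides do not preserve the value the authors chose.

```python
import math

def vector_median_filter(pairs, k):
    """Keep the k control point pairs closest to the vector median.

    pairs: list of ((x1, y1), (x2, y2)) control point pairs.
    """
    # Each pair becomes a 2D displacement vector.
    disp = [(q[0] - p[0], q[1] - p[1]) for p, q in pairs]

    def summed_distance(v):
        return sum(math.dist(v, w) for w in disp)

    # The vector median minimizes the summed distance to all other vectors.
    median = min(disp, key=summed_distance)

    # Rank pairs by distance to the median and keep the k nearest,
    # returned in their original order.
    ranked = sorted(range(len(pairs)), key=lambda i: math.dist(disp[i], median))
    return [pairs[i] for i in sorted(ranked[:k])]
```

The median is always one of the input vectors, so a single wildly wrong pair cannot drag it away the way it would drag a mean.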
Conflating Imagery And Vector Data
Arbitrary points in one data set are transferred to the other using the extracted control points
Delaunay triangulation and piecewise linear rubber sheeting are used for the transformation
Triangulation
–Alignment according to local adjustments is proposed
–The domain is partitioned into small pieces (triangles)
Delaunay triangulation is used
–Maximizes the minimum angle over all the angles in the triangulation
–Avoids triangles with small angles
–Built in O(n log n) time
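The max-min-angle property is equivalent to the empty circumcircle property: no control point lies strictly inside the circumcircle of any triangle. The sketch below tests that property by brute force. It is only an illustration for small point sets in general position, not the O(n log n) construction mentioned above, and all names are assumptions.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:
        b, c = c, b  # the determinant test assumes counterclockwise order
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def delaunay_brute_force(points):
    """Index triples of all triangles whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            triangles.append((i, j, k))
    return triangles
```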
Conflating Imagery And Vector Data
Piecewise Linear Rubber Sheeting
–Find the transformation coefficients that map the triangulation of the vector data to the imagery
–Apply the same coefficients to the endpoints of the road segments in the vector data
–Construct the road network on the satellite image
–Since Delaunay triangulation avoids triangles with small angles, there is less distortion
Region Growing
–Used to find control points where there are no landmarks or intersections
–Extrapolation from the current control points is performed
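Within one triangle, piecewise linear rubber sheeting amounts to an affine map fixed by the three control point pairs at the vertices. A minimal sketch using barycentric coordinates follows; the function names are assumptions, and locating the triangle that contains a given point is left out.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w1, w2, 1.0 - w1 - w2

def rubber_sheet(p, src_tri, dst_tri):
    """Map p from the source triangle (vector data) to the target triangle (imagery).

    src_tri and dst_tri are 3-tuples of (x, y) vertices in matching order.
    """
    weights = barycentric(p, *src_tri)
    x = sum(w * v[0] for w, v in zip(weights, dst_tri))
    y = sum(w * v[1] for w, v in zip(weights, dst_tri))
    return (x, y)
```

Each vertex maps exactly onto its control point partner, and the map is continuous across shared triangle edges, so road segments stay connected after the transformation.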
Performance Evaluation
Tests integrate vector data with satellite imagery
The evaluation considers how the control points are generated and the effect of filtering
Hypothesis
–Automated conflation using automatically generated control points, even without filtering, improves the accuracy of road identification
–The filtering technique further improves the results
–The best results are achieved with localized image processing combined with vector median filtering
Experimental Setup
The Microsoft TerraService web server was used to query satellite images
TIGER/Line files were used as the vector data
There are spatial inconsistencies between the data sets
Accurate roads are generated by conflating the vector data with manually selected control point pairs
The experiments measure the displacement between the conflated road endpoints and the accurate road endpoints
Results are given for both control point generation methods, with and without filtering
Tests are performed on two different locations having 300 and 500 road endpoints
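The displacement measurement can be sketched as below. The function name, the planar coordinates, and the threshold parameter are assumptions for illustration; the slides do not state which summary statistics were reported.

```python
import math

def displacement_stats(conflated, reference, threshold):
    """Summarize endpoint displacement against the reference roads.

    conflated, reference: equal-length lists of (x, y) road endpoints,
    matched by position. Returns (mean displacement, fraction of
    endpoints displaced by at most `threshold`).
    """
    distances = [math.dist(p, q) for p, q in zip(conflated, reference)]
    mean = sum(distances) / len(distances)
    within = sum(d <= threshold for d in distances) / len(distances)
    return mean, within
```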
Online Data vs. Intersection Points
Filtered Vs. Unfiltered Control Points
Results
Figure panels: VMF Filtered Online Data; VMF Filtered Intersection Pts.
Conclusion
An automated integration approach is designed and implemented
Results show improvement in road identification
Does not offer a general mechanism
–Accurate roads may be marked manually on the satellite image?
–Different transformations may be applied to arbitrary points?
The End