Nearest Neighbor Searching Under Uncertainty Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University
Nearest Neighbor Searching (NNS) Applications: Pattern Recognition, Data Compression, Statistical Classification, Clustering, Databases, Information Retrieval, Computer Vision, etc. Source: http://en.wikipedia.org/wiki/Nearest_neighbor_search
Nearest Neighbor Searching Under Uncertainty: Discrete pdf, Continuous pdf
Nearest Neighbor In Expectation
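A minimal sketch of the expected nearest neighbor for discrete pdfs, assuming each uncertain point is stored as a pair (locations, probabilities) and that the expected NN of a query q is the uncertain point minimizing E[d(q, P)]. The names `expected_distance` and `expected_nn` are illustrative, and the linear scan is only a baseline, not the data structure from the talk.

```python
import math

def expected_distance(q, locations, probs, dist=math.dist):
    """E[d(q, P)] for an uncertain point P given as a discrete pdf."""
    return sum(w * dist(q, p) for p, w in zip(locations, probs))

def expected_nn(q, uncertain_points, dist=math.dist):
    """Index of the uncertain point with the smallest expected distance to q.

    Brute force: one pass over all uncertain points and all their locations.
    """
    return min(
        range(len(uncertain_points)),
        key=lambda i: expected_distance(q, *uncertain_points[i], dist=dist),
    )

# Two uncertain points in the plane (hypothetical data).
points = [
    ([(0.0, 0.0), (4.0, 0.0)], [0.5, 0.5]),  # two equally likely locations
    ([(3.0, 0.0)], [1.0]),                   # a certain point
]
near_left = expected_nn((0.5, 0.0), points)
near_right = expected_nn((2.5, 0.0), points)
```

Here the first point has expected distance 2 from both queries, so it wins at (0.5, 0) but loses to the certain point at (2.5, 0).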
Bisector In Case Of Gaussian For Gaussian distribution, bisector is a line! Hard to get explicit formula! Figure: http://www.cs.utah.edu/~hal/courses/2009S_AI/Walkthrough/KalmanFilters/
Squared Distance Function: bisector is simple and beautiful! In case of discrete pdf, bisector is also a line! In both cases, compute the Voronoi diagram, solve it optimally! However, squared distance is not a metric!
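Why the bisector is a line: for any distribution with mean mu and total variance s2, E[|q - P|^2] = |q - mu|^2 + s2, so equating two such expressions cancels the |q|^2 term and leaves a linear equation in q. The sketch below checks this for discrete pdfs; the function names and the (locations, probabilities) representation are assumptions made for illustration.

```python
def mean_and_variance(locations, probs):
    """Mean vector and total variance (trace of the covariance) of a discrete pdf."""
    d = len(locations[0])
    mu = tuple(sum(w * p[k] for p, w in zip(locations, probs)) for k in range(d))
    var = sum(w * sum((p[k] - mu[k]) ** 2 for k in range(d))
              for p, w in zip(locations, probs))
    return mu, var

def expected_sq_dist(q, locations, probs):
    """E[|q - P|^2], computed directly from the definition."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(q, p))
               for p, w in zip(locations, probs))

def bisector(pi, pj):
    """Coefficients (a, c) of the bisector {q : a . q = c} of two uncertain points.

    From |q - mu_i|^2 + var_i = |q - mu_j|^2 + var_j:
    2 q . (mu_j - mu_i) = |mu_j|^2 - |mu_i|^2 + var_j - var_i.
    """
    (mi, vi), (mj, vj) = mean_and_variance(*pi), mean_and_variance(*pj)
    a = tuple(2 * (y - x) for x, y in zip(mi, mj))
    c = sum(y * y for y in mj) - sum(x * x for x in mi) + vj - vi
    return a, c

Pi = ([(0.0, 0.0), (2.0, 0.0)], [0.5, 0.5])  # mean (1, 0), variance 1
Pj = ([(5.0, 0.0)], [1.0])                   # mean (5, 0), variance 0
a, c = bisector(Pi, Pj)                      # the vertical line 8x = 23
```

The variance terms only shift the line, which is why the resulting diagram is an additively weighted (power) diagram of the means rather than their ordinary Voronoi diagram.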
Sampling Continuous Distributions: Sometimes working on continuous distributions is hard... Lower bounds on other metrics and distributions are also possible... Let's focus on discrete pdf then...
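One way to picture the move from a continuous to a discrete pdf is to draw k samples and give each weight 1/k. The sketch below does this for an isotropic Gaussian; the function name, the choice of k, and the fixed seed are assumptions, and no approximation guarantee is claimed here.

```python
import random

def sample_gaussian(mean, sigma, k, seed=0):
    """Replace an isotropic Gaussian by a uniform discrete pdf on k samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    locations = [tuple(rng.gauss(m, sigma) for m in mean) for _ in range(k)]
    probs = [1.0 / k] * k
    return locations, probs

locations, probs = sample_gaussian(mean=(1.0, 2.0), sigma=0.5, k=200)
```

The result has the same (locations, probabilities) shape as any other discrete pdf, so later steps do not need to know the point was originally continuous.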
Expected Nearest Neighbor In L1 Metric (Manhattan metric)
Expected Nearest Neighbor In L1 Metric (cont.) Source: Range Searching on Uncertain Data [P. K. Agarwal et al. 2009]
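A useful property of the L1 metric with discrete pdfs: the expected distance splits into one term per coordinate, and each term sum_j w_j |x - v_j| is piecewise linear in x with breakpoints at the sample coordinates. The sketch below precomputes those linear pieces; it is an illustration under assumed names (`coordinate_pieces`, `eval_pieces`), not necessarily the construction used in the talk.

```python
from bisect import bisect_right

def coordinate_pieces(values, probs):
    """Breakpoints and (slope, intercept) pieces of f(x) = sum_j w_j |x - v_j|."""
    pairs = sorted(zip(values, probs))
    xs = [v for v, _ in pairs]
    # Left of every sample each term is w (v - x).
    slope = -sum(w for _, w in pairs)
    intercept = sum(v * w for v, w in pairs)
    pieces = [(slope, intercept)]
    for v, w in pairs:
        # Crossing v flips that term from w (v - x) to w (x - v).
        slope += 2 * w
        intercept -= 2 * w * v
        pieces.append((slope, intercept))
    return xs, pieces

def eval_pieces(xs, pieces, x):
    """Evaluate f(x) by locating the linear piece that contains x."""
    slope, intercept = pieces[bisect_right(xs, x)]
    return slope * x + intercept

def expected_l1(q, locations, probs):
    """Expected L1 distance straight from the definition, for comparison."""
    return sum(w * sum(abs(a - b) for a, b in zip(q, p))
               for p, w in zip(locations, probs))

locs = [(0.0, 1.0), (4.0, 3.0), (2.0, -1.0)]
probs = [0.5, 0.25, 0.25]
tables = [coordinate_pieces([p[k] for p in locs], probs) for k in range(2)]

def expected_l1_fast(q):
    return sum(eval_pieces(xs, pcs, q[k]) for k, (xs, pcs) in enumerate(tables))
```

Inside any cell of the grid induced by the breakpoints, the expected L1 distance to a fixed uncertain point is a single linear function of q.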
Geometric Reduction
Building Block: Half-Space Intersection and Convex Hulls. Upper hulls correspond to lower envelopes, an example in 2D. Source: pages 252-253, Computational Geometry: Algorithms and Applications, 3rd Edition [Mark de Berg et al.]
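A small 2D sketch of the lower-envelope building block: given lines y = m x + b, keep only those that attain the minimum somewhere, then answer "which line is lowest at x?" by binary search. The scan is the same stack-based scan used for a convex hull chain, which reflects the hull/envelope correspondence mentioned above. Function names are illustrative, and this standard construction is offered as an assumption about how the building block could look.

```python
from bisect import bisect_left

def _useless(l1, l2, l3):
    """True if l2 never lies strictly below both l1 and l3 (slopes m1 > m2 > m3)."""
    (m1, b1), (m2, b2), (m3, b3) = l1, l2, l3
    # x(l1, l3) <= x(l1, l2), cross-multiplied with positive denominators.
    return (b3 - b1) * (m1 - m2) <= (b2 - b1) * (m1 - m3)

def lower_envelope(lines):
    """Lines on the lower envelope, left to right, plus their breakpoints."""
    hull = []
    for m, b in sorted(set(lines), key=lambda l: (-l[0], l[1])):
        if hull and hull[-1][0] == m:
            continue  # same slope, larger intercept: never the minimum
        while len(hull) >= 2 and _useless(hull[-2], hull[-1], (m, b)):
            hull.pop()
        hull.append((m, b))
    xs = [(b2 - b1) / (m1 - m2) for (m1, b1), (m2, b2) in zip(hull, hull[1:])]
    return hull, xs

def lowest_line(hull, xs, x):
    """The envelope line that is minimal at x (O(log n) per query)."""
    return hull[bisect_left(xs, x)]

lines = [(1, 0), (-1, 0), (0, -1), (0, 5)]
hull, xs = lower_envelope(lines)
```

In this example the line y = 5 is discarded, and the envelope switches from y = x to y = -1 to y = -x at x = -1 and x = 1.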
Segment-tree Based Data Structures for Expected-NN In L1 Metric
Segment-tree Based Data Structures for Expected-NN In L1 Metric (cont.)
Segment-tree Based Data Structures for Expected-NN In L1 Metric (cont.) Summary of the result: size of data structure, preprocessing time, query time
Approximate L2 Metric It’s a metric when P is centrally symmetric!
Approximate L2 Metric (cont.) More complex!
Future Work: Approximate the expected NN in L2 metric. Study the complexity of expected Voronoi diagram. Study the probability case. Work harder in the near future!
Questions? Thanks! Main References: [1] Pankaj K. Agarwal, Siu-Wing Cheng, Yufei Tao, Ke Yi: Indexing uncertain data. PODS 2009: 137-146. [2] Pankaj K. Agarwal, Lars Arge, Jeff Erickson: Indexing Moving Points. J. Comput. Syst. Sci. 66(1): 207-243 (2003).