UMass Lowell Computer Science 91.504 Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2001 Lecture 8 Approximate Nearest Neighbor.


1 UMass Lowell Computer Science 91.504 Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2001 Lecture 8 Approximate Nearest Neighbor Searching Derandomization for Efficient Geometric Partitioning Monday, 4/30/01

2 Part 2: Advanced Topics
- Applications: Manufacturing, Modeling/Graphics, Wireless Networks, Visualization
- Techniques: (de)Randomization, Approximation, Robustness, Representations
- Epsilon-net
- Decomposition tree

3 Literature for Part II

4

5 Approximate Nearest Neighbor Searching “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions” Arya, Mount, Netanyahu, Silverman, Wu

6 Goals
- Fast nearest neighbor query in a d-dimensional set of n points:
  - approximate nearest neighbor: distance within a factor of (1+ε) of the true closest neighbor
  - preprocess using O(dn log n) time, O(dn) space
    - Balanced-Box Decomposition (BBD) tree
    - note that space and time are independent of ε
  - query in O(c_{d,ε} log n) time
C++ code for simplified version is at http://www.cs.umd.edu/~mount/ANN

7 Approach: Distance Assumptions
- Use L_p (also called Minkowski) metric
  - assume it can be computed in O(d) time
  - pth root need not be computed when comparing distances
- Approximate nearest neighbor: distance within a factor of (1+ε) of the true closest neighbor p*
- Can change ε or metric without rebuilding data structure
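The remark that the pth root can be skipped when comparing distances can be sketched as follows. This is a minimal illustration, not code from the ANN library; the function names `lp_power_sum`, `closer_of`, and `within_factor` are hypothetical.

```python
from typing import Sequence


def lp_power_sum(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Sum of |a_i - b_i|^p, i.e. the L_p distance raised to the p-th power (O(d) time)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))


def closer_of(q, u, v, p=2.0):
    """Return whichever of u, v is closer to q under L_p; no root is taken,
    because x -> x^(1/p) is monotone and cannot change the comparison."""
    return u if lp_power_sum(q, u, p) <= lp_power_sum(q, v, p) else v


def within_factor(q, cand, best, eps, p=2.0):
    """True if dist(q, cand) <= (1 + eps) * dist(q, best), tested on p-th powers:
    both sides of the inequality are raised to the power p."""
    return lp_power_sum(q, cand, p) <= (1.0 + eps) ** p * lp_power_sum(q, best, p)
```

Because only the comparison changes, ε and p can be passed per query, which is consistent with the slide's note that neither requires rebuilding the data structure.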

8 Approach: Overview
- Preprocess points to create:
  - Balanced-Box Decomposition (BBD) tree
- Query algorithm: for query point q
  - Locate leaf cell containing q in O(log n) time
  - Priority search: enumerate leaf cells in increasing distance order from q
    - For each leaf cell, calculate distance from q to cell’s point
    - Keep track of closest point p seen so far
    - Stop when distance from q to leaf > dist(q,p)/(1+ε)
  - Return p as approximate nearest neighbor to q.

9 Balanced Box Decomposition (BBD) Tree
- Similar to kd-tree [Samet handout]
  - Binary tree
  - Tree structure stored in main memory
  - Cutting planes orthogonal to axes
  - “Alternating” dimensions
  - O(log n) height
  - Subdivides space into regions of O(d) complexity using d-dimensional rectangles
  - Can be built in O(dn log n) time
[Figure: point set cut by lines x_1..x_4 and y_1..y_3, with the corresponding tree over points 1-9 (branches labeled < and >=). Caption: One possible kd-like tree for the above points (not a BBD tree, though)]

10 Balanced Box Decomposition (BBD) Tree (continued)
- Distinguishing features of BBD tree:
  - Cell is either
    - d-dimensional rectangle or
    - difference of 2 d-dimensional nested rectangles
  - In this sense, BBD tree is like:
    - Optimized kd-tree: partition points into roughly equal-sized sets [inner box shrink]
      - While descending in tree, number of points on path decreases exponentially
    - Specialized quadtree: aspect ratio of box is bounded by a constant [hyperplane split]
      - While descending in tree, size of region on path decreases exponentially
  - Leaf may be associated with more than 1 point in/on cell: O(n) nodes
  - Inner boxes are “sticky”: if an inner box is close to an edge of its outer box, it “sticks” to that edge
[Figure: a subdivision and its tree, with split and shrink nodes labeled]

11 Midpoint Algorithm for Splitting/Shrinking
- Split box b using hyperplane through center of b and orthogonal to ith coordinate axis (i = longest dimension)
  - Bounds aspect ratio
- Centroid shrink: produce O(1) subcells, each with <= 2n_c/3 points [n_c = # pts in current cell]
  - 3-stage: shrink, split, shrink
[Figure panels: single-stage simplified shrink; 3-stage shrink, split, shrink. Question posed on the slide: what’s wrong with this approach?]
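The split step of the midpoint algorithm can be sketched as below. This covers only the hyperplane split, not the centroid shrink; the box representation (a pair of corner tuples) and the names `midpoint_split` and `aspect_ratio` are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner); corners must be tuples


def midpoint_split(box: Box, points: List[Point]):
    """Split `box` through its center, orthogonal to its longest side.

    Returns (i, mid, (low_box, low_points), (high_box, high_points)).
    Points with coordinate < mid go low, >= mid go high.
    """
    lo, hi = box
    i = max(range(len(lo)), key=lambda k: hi[k] - lo[k])  # longest dimension
    mid = (lo[i] + hi[i]) / 2.0
    low_box = (lo, hi[:i] + (mid,) + hi[i + 1:])
    high_box = (lo[:i] + (mid,) + lo[i + 1:], hi)
    low_pts = [p for p in points if p[i] < mid]
    high_pts = [p for p in points if p[i] >= mid]
    return i, mid, (low_box, low_pts), (high_box, high_pts)


def aspect_ratio(box: Box) -> float:
    """Longest side divided by shortest side."""
    lo, hi = box
    sides = [h - l for l, h in zip(lo, hi)]
    return max(sides) / min(sides)
```

Halving the longest side is what keeps the aspect ratio bounded: a 4 x 2 box splits into two 2 x 2 boxes rather than two 4 x 1 slabs.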

12 Middle-Interval Algorithm for Splitting/Shrinking
- Flexibility for splitting plane choice
- Choose plane from a central strip of current outer box

13 Packing Constraint
Given a BBD-tree for a set of data points in R^d, the number of leaf cells of size at least s > 0 intersecting a (Minkowski L_m) open ball of radius r > 0 is at most ⌈1 + 6r/s⌉^d.
- Each subdivision cell satisfies this packing constraint
- Proof has 2 cases:
  - Overlapping boxes
  - Disjoint boxes:
    - Box of side 2r encloses ball of radius r
    - Aspect ratio 3:1 implies smallest side length >= s/3
    - Densest packing given by regular grid of boxes of side length s/3
    - Interval of length 2r can intersect no more than ⌈1 + 6r/s⌉ intervals of length s/3
    - Account for all dimensions by raising to power d

14 Priority Search from Query Point
- Visit boxes in increasing order of distance from q
- Similar to kd-tree priority search
- Maintain priority queue of tree nodes
  - Node priority inversely related to dist(q,cell)
- Search repeats:
  - Extract highest priority node
  - Descend subtree
    - visit leaf closest to q
    - add siblings to queue
[Figure labels: node closest to query point; at start, root + v_1, v_2, v_3, v_4 are in priority queue]
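The priority search, together with the (1+ε) stopping rule from the overview slide, can be sketched as follows. This is a simplified sketch under stated assumptions: it uses a plain midpoint-split tree with no shrink nodes (so it is not a BBD tree and has no O(log n) height guarantee), it assumes the Euclidean metric and distinct input points, and the names `Node`, `build`, and `ann_query` are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point
    hi: Point
    point: Optional[Point] = None  # set only on non-empty leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, lo, hi):
    """Midpoint-split tree; each leaf holds at most one point (points must be distinct)."""
    if len(points) <= 1:
        return Node(lo, hi, point=points[0] if points else None)
    i = max(range(len(lo)), key=lambda k: hi[k] - lo[k])
    mid = (lo[i] + hi[i]) / 2.0
    low = [p for p in points if p[i] < mid]
    high = [p for p in points if p[i] >= mid]
    return Node(lo, hi,
                left=build(low, lo, hi[:i] + (mid,) + hi[i + 1:]),
                right=build(high, lo[:i] + (mid,) + lo[i + 1:], hi))


def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dist_to_box(q, lo, hi):
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)) ** 0.5


def ann_query(root, q, eps):
    """Return (point, distance) with distance <= (1 + eps) * true nearest distance."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(dist_to_box(q, root.lo, root.hi), next(tie), root)]
    best, best_d = None, float("inf")
    while heap:
        d_cell, _, node = heapq.heappop(heap)  # smallest distance = highest priority
        if d_cell > best_d / (1.0 + eps):
            break  # every remaining cell is too far to matter
        while node.left is not None:  # descend toward the leaf closest to q
            near, far = sorted((node.left, node.right),
                               key=lambda c: dist_to_box(q, c.lo, c.hi))
            heapq.heappush(heap, (dist_to_box(q, far.lo, far.hi), next(tie), far))
            node = near
        if node.point is not None:
            d = dist(q, node.point)
            if d < best_d:
                best, best_d = node.point, d
    return best, best_d
```

With `eps = 0` the stopping rule only fires once no remaining cell can contain a closer point, so the sketch returns the exact nearest neighbor; larger ε lets the search stop earlier.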

15 Incremental, Relative Distance [Arya, Mount93]
- Maintain sum of appropriate powers of coordinate differences between query point and nearest point of outer box
- Incrementally update distance from parent box to each child when split is performed:
  - Closer child has same distance as parent
  - Further child’s distance needs only 1-coordinate update (along splitting dimension)
- Can make a difference in higher dimensions!
[Figure: L_1 distance example showing a box with sides x_L, x_R, y_B, y_T before and after a split at x_M]
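The incremental update can be sketched as below: the powered distance to the farther child is obtained from the parent's value by swapping one coordinate's contribution, which is O(1) instead of O(d). The helper names `box_offsets`, `powered`, and `child_distances` are hypothetical, and the split value is assumed to lie strictly inside the parent box along the splitting dimension.

```python
def box_offsets(q, lo, hi):
    """Per-coordinate gap between q and the box (0 where q lies within the box's extent)."""
    return [max(l - x, 0.0, x - h) for x, l, h in zip(q, lo, hi)]


def powered(offsets, p=2.0):
    """Sum of p-th powers of the offsets: the L_p distance to the box, raised to the power p."""
    return sum(o ** p for o in offsets)


def child_distances(q, parent_pow, offsets, i, mid, p=2.0):
    """Powered distances (near_child, far_child) after splitting at coordinate i, value mid.

    The near child keeps the parent's distance. For the far child only
    coordinate i changes: its new offset is |q[i] - mid|.
    """
    old = offsets[i]
    new = abs(q[i] - mid)
    return parent_pow, parent_pow - old ** p + new ** p
```

For example, with q = (5, 1) and the box [0,4] x [0,4], the parent's squared distance is 1; after a split at x = 2 the far child [0,2] x [0,4] is at squared distance 1 - 1 + 9 = 9, matching a direct computation.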

16 Experiments
Experiments generated points from a variety of probability distributions: Uniform, Gaussian, Laplace, Correlated Gaussian, Correlated Laplacian, Clustered Gaussian, Clustered Segments

17 Experiments

18

19

20 Conclusions
- Algorithm is not necessarily practical for large dimensions
  - But, for dimensions <= ~20, does well
- Shrinking helps with highly clustered datasets, but was not often needed in their experiments
  - Only needed for 5-20% of tree nodes
- BBD tree (in paper’s form) is primarily for static point set
  - But, auxiliary data structure could maintain changes

21 Derandomization for Efficient Geometric Partitioning “ Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming ” Goodrich, Ramos

22 Overview
- Paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Partitioning Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
- Previous Work:
  - Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = log r / r
  - Derandomization can be used for deterministic construction
- Current Work:
  - Assume set is a special space with a special property
  - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the space
  - Apply to efficiently & deterministically solve parallel fixed-dimensional linear programming
For other Goodrich papers, see http://www.cs.jhu.edu/~goodrich/cgc/pubs/

23 Background: Derandomization
- Common approach for randomized geometric algorithms:
  - use small-sized random samples
- Derandomize:
  - quantify combinatorial properties of the random samples
  - show that sets with these properties can be constructed efficiently without randomization
- Combinatorial properties often characterized by what the next long series of slides is about...

24 Background: Configuration
- Given an abstract set (universe) N of geometric objects
- A configuration σ over N is a pair (D,L) = (D(σ),L(σ)), where D, L are disjoint subsets of N
- Objects in D are:
  - triggers associated with σ
  - objects that define σ
  - d(σ) = cardinality of D(σ) = degree
- Objects in L are:
  - stoppers associated with σ
  - objects that conflict with σ
  - l(σ) = cardinality of L(σ) = level = (absolute) conflict size
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

25 Background: Configuration Example
- N = {h_1, h_2, h_3, h_4}: set of line segments in the plane
- σ is feasible if σ occurs in trapezoidal decomposition H(R) for some subset R of N
  - trapezoids arising in incremental computation of H(N)
- Here, R = {h_3, h_4}
[Figure: N with segments h_1, ..., h_4, and H(R) with trapezoid σ; legend: segments in R, segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

26 Background: Configuration Example
- For a feasible trapezoid σ define its:
  - trigger set D(σ) = segments of N adjacent to boundary of σ
  - conflict set L(σ) = segments of N \ D(σ) intersecting σ
- Configuration (D(σ), L(σ)) where D(σ) = {h_3, h_4} and L(σ) = {h_1, h_2}
[Figure: H(R) with trapezoid σ and segments h_1, ..., h_4; legend: segments in R, segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
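The definitions of trigger set, conflict set, degree, and level can be written down as a small data type. This is only an illustrative sketch: the class name `Configuration`, its field names, and the use of string labels for segments are assumptions, not notation from Mulmuley's book.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Configuration:
    triggers: FrozenSet[str]  # D(sigma): objects that define sigma
    stoppers: FrozenSet[str]  # L(sigma): objects that conflict with sigma

    def __post_init__(self):
        # D and L must be disjoint subsets of the universe N
        if self.triggers & self.stoppers:
            raise ValueError("trigger set and conflict set must be disjoint")

    @property
    def degree(self) -> int:
        """d(sigma) = |D(sigma)|"""
        return len(self.triggers)

    @property
    def level(self) -> int:
        """l(sigma) = |L(sigma)|, the (absolute) conflict size"""
        return len(self.stoppers)


# The trapezoid from the example: defined by h3, h4 and cut by h1, h2.
sigma = Configuration(frozenset({"h3", "h4"}), frozenset({"h1", "h2"}))
```

Here `sigma.degree` and `sigma.level` are both 2. Making the class frozen lets configurations be collected in sets, which is convenient when duplicates need to be counted or removed.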

27 Background: Configuration Space
- A configuration space Π(N) over N is a (multi)set of configurations with the
  - Bounded Degree Property:
    - The degree of each configuration in Π(N) is bounded (by a constant -- something independent of N)
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Note: The term configuration space is also used in motion planning. In that context, it refers to the motion planning search space.

28 Background: Configuration Example
- Associate with each feasible σ a configuration (D(σ), L(σ))
- If N in general position, d(σ) = cardinality of D(σ) <= 4
  - since σ is a trapezoid
- Due to bounded degree d(σ), result Π(N) is a configuration space of all feasible trapezoids over N
- Configurations for feasible trapezoids σ_1 and σ_2:
  - D(σ_1) = {h_3, h_4}, L(σ_1) = {h_1, h_2}
  - D(σ_2) = {h_3, h_4}, L(σ_2) = ∅
[Figure: H(R) with segments h_1, ..., h_4; legend: segments in R, segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

29 Background: Configuration Example
- If we restrict N to be {h_3, h_4} then σ_1, σ_2 are 2 feasible trapezoids
  - D(σ_1) = D(σ_2) = {h_3, h_4}
  - L(σ_1) = L(σ_2) = ∅
- 2 “distinct” configurations:
  - (D(σ_1), L(σ_1)) = (D(σ_2), L(σ_2))
- Size of Π(N) includes such “duplicate” configurations
- Reduced Size of Π(N) excludes “duplicates”
[Figure: 2 feasible trapezoids σ_1, σ_2 for N = {h_3, h_4}]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

30 Background: Configuration Example
- Note that not every arrangement of line segments (before overlaying a trapezoidal decomposition on it) has the bounded degree property.
  - In general, it can have d(σ) = O(n), where n = size of N
- Can you think of another type of decomposition that has bounded degree?
[Figure: segments h_1, ..., h_5 and a region σ]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

31 Background: Configuration Example
- Definition: Π^i(N) is set of configurations in Π(N) with level i
  - [recall level is size of L(σ), the conflict set]
  - A configuration in Π^0(N) is active over N.
- Example: Π^0(N) = {(D(σ_1), L(σ_1)), (D(σ_2), L(σ_2))}
[Figure: 2 feasible trapezoids σ_1, σ_2 for N = {h_3, h_4}]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

32 Background: Configuration Example
- Definition: A configuration space Π(N) has bounded valence if the number of configurations in Π(N) sharing the same trigger set is bounded (by a constant).
- Example: Π(N) = our configuration space of all feasible trapezoids over N has bounded valence
  - all feasible trapezoids with same trigger set can be identified with trapezoids in trapezoidal decomposition formed by that trigger set
  - size of that trigger set is bounded by a constant, so number of such trapezoids is also bounded by a constant
[Figure: trapezoidal decomposition induced by trigger set = {h_3, h_4}]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

33 Background: Configuration Example
- Theorem:
  - Let:
    - Π(N) be a configuration space of bounded valence
    - n = size of N
    - d = maximum degree of a configuration in Π(N)
    - R = a random sample of N of size r
  - Then:
    - For each active configuration σ in Π^0(R)
      - with probability > 1/2
      - the conflict size of σ relative to N is <= c(n/r) log r for large enough c
    - Expected reduced size: E[reduced size of Π(R)] is in O(r^d)
- Example: For any random sample R of N of size r
  - each trapezoid in the trapezoidal decomposition H(R) has O([n/r] log r) conflict size with high probability
  - size of Π(R) is in O(r^d)
    - for Π(N) of bounded valence, size and reduced size differ only by a constant factor
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
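A one-dimensional analog may help make conflict sizes concrete. It is a deliberately simplified stand-in for the trapezoid example, not taken from the source: N is a set of distinct points on a line, a sample R cuts the line into intervals (the active configurations over R), each interval's triggers are its endpoints, and its conflict size is the number of points of N strictly inside it. The names `conflict_sizes` and `sampling_bound`, and the constant `c`, are hypothetical.

```python
import bisect
import math
import random


def conflict_sizes(N, R):
    """Conflict size of each interval induced by R (including the two unbounded ends)."""
    cuts = sorted(R)
    sizes = [0] * (len(cuts) + 1)
    for x in N:
        j = bisect.bisect_left(cuts, x)
        if j < len(cuts) and cuts[j] == x:
            continue  # x is a sample point: a trigger, not a stopper
        sizes[j] += 1
    return sizes


def sampling_bound(n, r, c=1.0):
    """The quantity c * (n / r) * log r from the theorem; c is an unspecified constant."""
    return c * (n / r) * math.log(r)


if __name__ == "__main__":
    rng = random.Random(0)
    N = list(range(10_000))
    R = rng.sample(N, 100)
    sizes = conflict_sizes(N, R)
    print("max conflict size:", max(sizes))
    print("(n/r) log r      :", sampling_bound(len(N), len(R)))
```

The conflict sizes always sum to n - r, so the average interval holds about n/r points; the theorem's claim is that the largest one exceeds this average by no more than a logarithmic factor, with the stated probability.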

34 Background: Range Space
- Definition:
  - Let:
    - Π(N) be a configuration space
    - n = size of N
    - π’(r) be maximum reduced size function of Π(N) for r <= n
  - Π(N) has bounded dimension if there is a constant d such that π’(r) is in O(r^d) for all r <= n
  - In this case, d is the dimension of Π(N)
- Bounded valence -> bounded dimension
- Some important types of configuration spaces don’t have bounded valence but have bounded dimension
- Range space: configuration space for which trigger set of every configuration is empty. In this case, a configuration is a range.
  - Half-space Range: Range = points in halfspace. Π(N) = set of distinct ranges induced by (upper) halfspaces. Dualize -> line arrangement [it has bounded dimension]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley

35 Background: ε-net of a Range Space
- Theorem:
  - If:
    - Π(N) is a configuration space of bounded dimension d
    - ε > 0
    - R is a random subset of N
      - formed via r independent draws from N with replacement
    - r >= 8/ε
  - then:
    - conflict size (relative to N) of each configuration σ in Π^0(R) is <= εn
    - with probability at least 1 - 2π’(2r) 2^(-εr/2)
- For a range space Π(N):
  - the conflict size (relative to N) of every range in Π^0(R) is <= εn, i.e., R is an ε-net of the range space Π(N)
  - for large enough r, a random sample of size r is an ε-net with high probability
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
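The ε-net property can be checked by brute force on a toy range space. The sketch below assumes 1D points with ranges induced by rightward half-lines {x : x >= t}, and it uses the usual formulation that every range containing more than εn points of N must contain a sample point (equivalently, every range that misses R has conflict size at most εn). The function name `is_eps_net` is hypothetical.

```python
def is_eps_net(N, R, eps):
    """True if every rightward half-line range with more than eps*|N| points of N hits R."""
    pts = sorted(N)
    n = len(pts)
    sample = set(R)
    for k in range(n):
        members = pts[k:]  # the range {x in N : x >= pts[k]}
        if len(members) > eps * n and not any(x in sample for x in members):
            return False
    return True
```

For N = {0, ..., 7} and R = {4}, the range {5, 6, 7} misses R and has 3 points, so R fails as a 1/4-net (3 > 2) but passes as a 1/2-net (3 <= 4).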

36 Background: VC-Dimension of a Range Space
- Use to bound dimension when direct argument fails
- Let Π(N) be a range space
- A subset M of N is shattered by N if every subset of M occurs as a range in Π(M)
  - Reduced size of Π(M) is 2^m, where m = size of M
- VC-Dimension of Π(N) is maximum size of a shattered subset of N.
- Example: N = set of 1D points, Π(N) = space of ranges induced by rightwards half-spaces
[Figure: points p_1, ..., p_7 on a line with half-spaces h_1, ..., h_8]
What is the VC-Dimension of Π(N)?
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
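For small inputs the shattering definition can be tested exhaustively. The sketch below handles only the example's range space (distinct 1D points, ranges cut out by rightward half-lines {x : x >= t}); the function names are hypothetical, and the search is exponential, so it is meant for toy inputs only.

```python
from itertools import combinations


def halfline_ranges(M):
    """All distinct subsets of M cut out by rightward half-lines, including the empty range."""
    pts = sorted(M)
    return {frozenset(pts[k:]) for k in range(len(pts) + 1)}


def is_shattered(M):
    """M is shattered if all 2^|M| of its subsets occur as ranges."""
    return len(halfline_ranges(M)) == 2 ** len(M)


def vc_dimension(N):
    """Largest size of a shattered subset of N (brute force)."""
    best = 0
    for m in range(1, len(N) + 1):
        if any(is_shattered(M) for M in combinations(N, m)):
            best = m
        else:
            break  # if no m-subset is shattered, no larger subset can be
    return best
```

A set of m points admits only m + 1 such ranges (the suffixes of the sorted order), and m + 1 = 2^m holds only for m <= 1. Under these assumptions the brute-force search therefore reports a VC-dimension of 1 for this range space.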

37 What does all this have to do with the paper?
- Paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
- Previous Work:
  - Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = log r / r
  - Derandomization can be used for deterministic construction
- Current Work:
  - Assume set is a range space with bounded VC-exponent
    - VC-exponent is more general concept than VC-dimension
  - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the range space that is a variation on the ε-net concept.

38 Additional Handouts
- Parallel programming
  - PRAM CREW, EREW models
- Parallel geometric algorithms

39 Project Update

40
Deliverable          Due Date        Grade %
Project Proposal     Monday, 4/9     2%
Interim Report       Monday, 4/23    5%
Final Presentation   Monday, 5/7     8%
Final Submission     Monday, 5/14    10%
Total: 25% of course grade

41 Guidelines: Presentation
- 1/2 hour class presentation
- Explain to the class what you did
- Structure it any way you like!
- Some ideas:
  - slides (electronic or transparency)
  - demo
  - handouts

42 Guidelines: Final Submission
- Abstract: Concise overview (at most 1 page)
- Introduction:
  - Motivation: Why did you choose this project?
  - Related Work: Context with respect to CG literature
  - Summary of Results
- Main Body of Paper: (one or more sections)
- Conclusion:
  - Summary: What did you accomplish?
  - Future Work: What would you do if you had more time?
- References: Bibliography (papers, books that you used)
Well-written final submissions with research content may be eligible for publishing as UMass Lowell CS technical reports.

43 Guidelines: Final Submission
- Main Body of Paper:
  - If your project involves Theory/Algorithm:
    - Informal algorithm description (& example)
    - Pseudocode
    - Analysis:
      - Correctness
        - Solutions generated by algorithm are correct
          - account for degenerate/boundary/special cases
        - If a correct solution exists, algorithm finds it
        - Control structures (loops, recursions,...) terminate correctly
      - Asymptotic Running Time and/or Space Usage

44 Guidelines: Final Submission
- Main Body of Paper:
  - If your project involves Implementation:
    - Informal description
    - Resources & Environment:
      - what language did you code in?
      - what existing code did you use? (software libraries, etc.)
      - what equipment did you use? (machine, OS, compiler)
    - Assumptions
      - parameter values
    - Test cases
      - tables, figures
      - representative examples

45 Final Exam

46 Final Exam: Date, Format
- Date Choices
  - Friday, 18 May at
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
  - Wednesday, 23 May at
    - 9:00 am-12:00 pm or
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
- Format:
  - in class
  - open book, notes
  - similar to midterm:
    - 50% calculate/manipulate
    - 50% design, analyze

47 Final Exam: Part I Material
- O’Rourke CH 1-8: emphasis on chapters omitted from midterm (CH 7-8)
- Some key themes
  - Common geometric/combinatorial structures:
    - Decomposition/Partition:
      - Triangulation
      - Trapezoidalization
      - Delaunay Triangulation
      - Voronoi Diagram
      - Arrangement (level, zone)
    - Enclosure:
      - Convex Hull
      - Nested Polytope Hierarchy
    - Visibility Polygon & Kernel of Star Polygon

48 Final Exam: Part I Material
- Some key themes (continued)
  - Algorithmic Paradigms
    - Sweep: sort, then sweep a line, parabolic front
    - Divide-and-Conquer
    - Incremental
    - Randomized
    - Output-Sensitive
    - Preprocessing for fast queries
  - Representations:
    - Quad-edge
    - O’Rourke
  - Geometric Primitives

49 Final Exam: Part I Material
- Some key themes (continued)
  - Math:
    - Convexity
    - Monotonicity
    - Distance Metrics
    - Visibility/Star-shapedness
    - Euler’s Formula
    - Duality
      - Graphs
      - Point-Line
      - Parabolic
    - Minkowski Sum
    - Randomness
    - Graph Theory: Independent Set

50 Final Exam: Part II Material
- Part II
  - Translational Polygon Containment
  - Connected Dominating Sets for Wireless Networks
  - Mesh Generation using Delaunay Triangulation
  - Approximate Nearest Neighbor Searching
  - Derandomization for Efficient Geometric Partitioning

51 Final Exam: Part II Material
- Translational Polygon Containment
  - Minkowski Sum properties & use for intersection
  - Distance metrics: L_p, convex
  - Linear programming: Basic model formulation
    - Linear equations representing constraints & objective function
    - Convex feasible region
    - Types of variables
- Connected Dominating Sets for Wireless Networks
  - Dominating set of an undirected graph
  - Basic algorithm + 2 rules

52 Final Exam: Part II Material
- Mesh Generation using Delaunay Triangulation
  - Incremental Delaunay Triangulation Algorithm
  - Edge flipping
  - Constrained Delaunay Triangulation
  - Mesh refinement
  - Robustness: awareness of general issues
- Derandomization for Efficient Geometric Partitioning
  - Configurations, configuration spaces, conflict and trigger sets

53 Final Exam: Part II Material
- Approximate Nearest Neighbor Searching
  - Distance computations for high dimensions:
    - approximate distance & stopping criterion
    - compare without taking roots
    - incremental for subdivision
  - Balanced Box Decomposition (“kd-like”) tree representation for fast ~O(log n) queries:
    - Efficiency by eliminating at each step/level a constant fraction of:
      - Area (geometric consideration)
      - Vertices (combinatorial consideration)

