UMass Lowell Computer Science Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2001 Lecture 8 Approximate Nearest Neighbor Searching Derandomization for Efficient Geometric Partitioning Monday, 4/30/01
Part 2: Advanced Topics
Applications: Manufacturing, Modeling/Graphics, Wireless Networks, Visualization
Techniques: (de)Randomization, Approximation, Robustness, Representations, Epsilon-net, Decomposition tree
Literature for Part II
Approximate Nearest Neighbor Searching “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions” Arya, Mount, Netanyahu, Silverman, Wu
Goals
- Fast nearest neighbor queries in a set of n points in d dimensions:
  - approximate: the reported neighbor's distance is within a factor of (1+ε) of the distance to the true closest neighbor
  - preprocess using O(dn log n) time, O(dn) space
    - Balanced-Box Decomposition (BBD) tree
    - note that space, time are independent of ε
  - query in O(c_{d,ε} log n) time, where c_{d,ε} is a constant depending only on d and ε
- C++ code for simplified version is at
Approach: Distance Assumptions
- Use the L_p (also called Minkowski) metric
  - assume it can be computed in O(d) time
  - the pth root need not be computed when comparing distances
- Approximate nearest neighbor: distance within a factor of (1+ε) of the distance to the true closest neighbor p*
- Can change ε or the metric without rebuilding the data structure
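The "no pth root" remark can be illustrated with a minimal sketch. It assumes a finite p >= 1, so that x -> x^p is monotone on non-negative values. The function names are hypothetical and are not taken from the paper or its C++ code.

```python
from typing import Sequence


def minkowski_power_sum(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Sum of |a_i - b_i|^p, i.e. the L_p distance raised to the p-th power (O(d) time)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))


def is_closer(q: Sequence[float], a: Sequence[float], b: Sequence[float], p: float = 2.0) -> bool:
    """True if a is strictly closer to q than b under L_p; no p-th root is taken."""
    return minkowski_power_sum(q, a, p) < minkowski_power_sum(q, b, p)


def within_factor(q, candidate, true_nn, eps: float, p: float = 2.0) -> bool:
    """Check dist(q, candidate) <= (1 + eps) * dist(q, true_nn) by comparing p-th powers."""
    lhs = minkowski_power_sum(q, candidate, p)
    rhs = (1.0 + eps) ** p * minkowski_power_sum(q, true_nn, p)
    return lhs <= rhs
```

Because only powers are compared, ε can be changed per query without touching the stored points.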
Approach: Overview
- Preprocess points to create:
  - Balanced-Box Decomposition (BBD) tree
- Query algorithm: for query point q
  - Locate leaf cell containing q in O(log n) time
  - Priority search: enumerate leaf cells in increasing order of distance from q
    - For each leaf cell, calculate distance from q to cell's point
    - Keep track of closest point p seen so far
  - Stop when distance from q to leaf > dist(q,p)/(1+ε)
  - Return p as approximate nearest neighbor to q.
Balanced Box Decomposition (BBD) Tree
- Similar to kd-tree [Samet handout]
- Binary tree
- Tree structure stored in main memory
- Cutting planes orthogonal to axes
- "Alternating" dimensions
- O(log n) height
- Subdivides space into regions of O(d) complexity using d-dimensional rectangles
- Can be built in O(dn log n) time
[Figure: a point set cut by lines x_1..x_4 and y_1..y_3, with one possible kd-like tree for the points (not a BBD tree, though); branches are labeled < and >=]
Balanced Box Decomposition (BBD) Tree (continued)
- Distinguishing features of BBD tree:
  - Cell is either
    - a d-dimensional rectangle, or
    - the difference of 2 nested d-dimensional rectangles
  - In this sense, BBD tree is like:
    - Optimized kd-tree: partition points into roughly equal-sized sets [inner box shrink]
      - While descending in tree, number of points on path decreases exponentially
    - Specialized quadtree: aspect ratio of box is bounded by a constant [hyperplane split]
      - While descending in tree, size of region on path decreases exponentially
  - Leaf may be associated with more than 1 point in/on cell: O(n) nodes
  - Inner boxes are "sticky": if an inner box is close to an edge of the outer box, it "sticks" to that edge
[Figure: a subdivision and its tree, showing split and shrink nodes]
Midpoint Algorithm for Splitting/Shrinking
- Split box b using hyperplane through center of b and orthogonal to the ith coordinate axis (longest dimension)
  - Bounds aspect ratio
- Centroid shrink: produce O(1) subcells, each with <= 2n_c/3 points [n_c = number of points in current cell]
  - 3-stage: shrink, split, shrink
[Figure panels: single-stage simplified shrink; 3-stage shrink, split, shrink. What's wrong with this approach?]
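The split step alone (not the centroid shrink) can be sketched as follows. This is an illustrative sketch, assuming a box is stored as a pair of corner lists. The names `midpoint_split` and `aspect_ratio` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def midpoint_split(box: Box) -> Tuple[int, float, Box, Box]:
    """Cut the box through its center, orthogonal to its longest side."""
    lo, hi = box
    sides = [h - l for l, h in zip(lo, hi)]
    axis = max(range(len(sides)), key=sides.__getitem__)  # longest dimension
    cut = (lo[axis] + hi[axis]) / 2.0
    left_hi = list(hi)
    left_hi[axis] = cut
    right_lo = list(lo)
    right_lo[axis] = cut
    return axis, cut, (list(lo), left_hi), (right_lo, list(hi))


def aspect_ratio(box: Box) -> float:
    """Ratio of the longest side to the shortest side."""
    lo, hi = box
    sides = [h - l for l, h in zip(lo, hi)]
    return max(sides) / min(sides)
```

Always halving the longest side keeps a box from becoming arbitrarily thin: starting from a cube, repeated midpoint splits alternate between aspect ratios 2 and 1 in two dimensions.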
Middle-Interval Algorithm for Splitting/Shrinking
- Flexibility for splitting plane choice
- Choose plane from a central strip of current outer box
Packing Constraint
Given a BBD-tree for a set of data points in R^d, the number of leaf cells of size at least s > 0 intersecting a (Minkowski L_m) open ball of radius r > 0 is at most ⌈1 + 6r/s⌉^d.
- Each subdivision cell satisfies this packing constraint
- Proof has 2 cases:
  - Overlapping boxes
  - Disjoint boxes:
    - Box of side 2r encloses ball of radius r
    - Aspect ratio 3:1 implies smallest side length >= s/3
    - Densest packing given by regular grid of boxes of side length s/3
    - Interval of length 2r can intersect no more than ⌈1 + 2r/(s/3)⌉ = ⌈1 + 6r/s⌉ intervals
    - Account for all dimensions by raising to power d
Priority Search from Query Point
- Visit boxes in increasing order of distance from q
- Similar to kd-tree priority search
- Maintain priority queue of tree nodes
  - Node priority inversely related to dist(q,cell)
- Search repeats:
  - Extract highest priority node
  - Descend subtree
    - visit leaf closest to q
    - add siblings to queue
[Figure: the highest-priority node is the node closest to the query point. At start, root + v_1, v_2, v_3, v_4 are in priority queue]
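The search loop can be sketched as below. This is a simplified illustration, not the paper's algorithm: it assumes a plain median-split kd-tree as a stand-in for the BBD tree (no shrink cells), Euclidean distance, and Python 3.8+ for `math.dist`. The names `Node`, `build`, `box_dist`, and `approx_nn` are hypothetical.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """Plain kd-tree node with its bounding box (stand-in for a BBD-tree node)."""

    def __init__(self, points: List[Point], lo: List[float], hi: List[float]):
        self.lo, self.hi = lo, hi
        self.points: List[Point] = []
        self.children: List["Node"] = []
        if len(points) <= 1:
            self.points = points  # leaf cell
            return
        axis = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        cut = points[mid][axis]
        left_hi = hi[:]
        left_hi[axis] = cut
        right_lo = lo[:]
        right_lo[axis] = cut
        self.children = [Node(points[:mid], lo[:], left_hi),
                         Node(points[mid:], right_lo, hi[:])]


def build(points: List[Point]) -> Node:
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    return Node(list(points), lo, hi)


def box_dist(q: Point, lo: List[float], hi: List[float]) -> float:
    """Euclidean distance from q to the nearest point of the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def approx_nn(root: Node, q: Point, eps: float) -> Optional[Point]:
    best, best_d = None, math.inf
    counter = 0  # tie-breaker so the heap never compares Node objects
    heap = [(box_dist(q, root.lo, root.hi), counter, root)]
    while heap:
        d, _, node = heapq.heappop(heap)  # smallest distance = highest priority
        if d > best_d / (1.0 + eps):
            break  # every remaining cell is too far to beat best by a (1+eps) factor
        while node.children:  # descend toward the child closer to q
            a, b = node.children
            da, db = box_dist(q, a.lo, a.hi), box_dist(q, b.lo, b.hi)
            if db < da:
                a, b, da, db = b, a, db, da
            counter += 1
            heapq.heappush(heap, (db, counter, b))  # sibling waits in the queue
            node = a
        for p in node.points:
            dp = math.dist(q, p)
            if dp < best_d:
                best, best_d = p, dp
    return best
```

With eps = 0 the loop degenerates to an exact nearest neighbor search. Larger eps lets the search stop earlier, at the cost of returning a point up to (1+eps) times farther than the true nearest neighbor.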
Incremental, Relative Distance [Arya, Mount93]
- Maintain sum of appropriate powers of coordinate differences between query point and nearest point of outer box
- Incrementally update distance from parent box to each child when split is performed:
  - Closer child has same distance as parent
  - Further child's distance needs only 1-coordinate update (along splitting dimension)
- Can make a difference in higher dimensions!
[Figure: L_1 distance example. Four points are expressed relative to the sides x_L, x_R, y_B, y_T of a box: (x_L + x_1, y_T - y_1), (x_L + x_2, y_B + y_2), (x_R - x_3, y_T - y_3), (x_R - x_4, y_B + y_4). After a split at x_M, two of them become (x_M - x'_1, y_T - y_1) and (x_M + x'_4, y_B + y_4); only their x-offsets change.]
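The one-coordinate update can be sketched as follows. This is an illustrative sketch assuming axis-aligned boxes given by corner lists and a finite p. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def axis_offset(x: float, lo: float, hi: float) -> float:
    """Distance from coordinate x to the interval [lo, hi] (0 if x lies inside)."""
    return max(lo - x, 0.0, x - hi)


def box_power_sum(q: Sequence[float], lo: Sequence[float], hi: Sequence[float],
                  p: float = 2.0) -> float:
    """Full O(d) computation: sum over axes of (offset to box)^p."""
    return sum(axis_offset(x, l, h) ** p for x, l, h in zip(q, lo, hi))


def child_power_sums(q: Sequence[float], lo: Sequence[float], hi: Sequence[float],
                     parent_sum: float, axis: int, cut: float,
                     p: float = 2.0) -> Tuple[float, float]:
    """O(1) update: only the term for the splitting axis is replaced.

    Returns (sum for child [lo, cut], sum for child [cut, hi]) along `axis`.
    """
    old = axis_offset(q[axis], lo[axis], hi[axis]) ** p
    left = parent_sum - old + axis_offset(q[axis], lo[axis], cut) ** p
    right = parent_sum - old + axis_offset(q[axis], cut, hi[axis]) ** p
    return left, right
```

For the child nearer to q the replaced term equals the old one, so its sum matches the parent's. The farther child differs in exactly one term. In d dimensions this replaces an O(d) recomputation per split with O(1) work.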
Experiments
Experiments generated points from a variety of probability distributions: Uniform, Gaussian, Laplace, Correlated Gaussian, Correlated Laplacian, Clustered Gaussian, Clustered Segments
Experiments
Conclusions
- Algorithm is not necessarily practical for large dimensions
  - But, for dimensions <= ~20, does well
- Shrinking helps with highly clustered datasets, but was not often needed in their experiments
  - Only needed for 5-20% of tree nodes
- BBD tree (in paper's form) is primarily for static point set
  - But, auxiliary data structure could maintain changes
Derandomization for Efficient Geometric Partitioning “Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming” Goodrich, Ramos
Overview
- Paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Partitioning Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
- Previous Work: Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = log r / r
  - Derandomization can be used for deterministic construction
- Current Work:
  - Assume set is a special space with a special property
  - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the space
  - Apply to efficiently & deterministically solve parallel fixed-dimensional linear programming
For other Goodrich papers, see
Background: Derandomization
- Common approach for randomized geometric algorithms:
  - use small-sized random samples
- Derandomize:
  - quantify combinatorial properties of the random samples
  - show that sets with these properties can be constructed efficiently without randomization
- Combinatorial properties often characterized by what the next long series of slides is about...
Background: Configuration
- Given an abstract set (universe) N of geometric objects
- A configuration σ over N is a pair (D,L) = (D(σ),L(σ)), where D, L are disjoint subsets of N
  - Objects in D are triggers: associated with σ, the objects that define σ
    - d(σ) = cardinality of D(σ) = degree
  - Objects in L are stoppers: associated with σ, the objects that conflict with σ
    - l(σ) = cardinality of L(σ) = level = (absolute) conflict size
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
N = {h_1, h_2, h_3, h_4}: set of line segments in the plane
A trapezoid σ is feasible if σ occurs in the trapezoidal decomposition H(R) for some subset R of N
- i.e., the trapezoids arising in incremental computation of H(N)
Here, R = {h_3, h_4}
[Figure: N with segments h_1..h_4, and H(R) with segments in R distinguished from segments in N \ R]
Background: Configuration Example
For a feasible trapezoid σ define its:
- trigger set D(σ) = segments of N adjacent to boundary of σ
- conflict set L(σ) = segments of N \ D(σ) intersecting σ
[Figure: H(R) with segments in R and segments in N \ R; configuration (D(σ), L(σ)) where D(σ) = {h_3, h_4} and L(σ) = {h_1, h_2}]
Background: Configuration Space
A configuration space Π(N) over N is a (multi)set of configurations with the
- Bounded Degree Property: The degree of each configuration in Π(N) is bounded (by a constant, i.e., something independent of N)
Note: The term configuration space is also used in motion planning. In that context, it refers to the motion planning search space.
Background: Configuration Example
Associate with each feasible σ a configuration (D(σ), L(σ))
If N is in general position, d(σ) = cardinality of D(σ) <= 4 since σ is a trapezoid
Due to bounded degree d(σ), the result Π(N) is a configuration space of all feasible trapezoids over N
[Figure: H(R) with segments in R and segments in N \ R; configurations for two feasible trapezoids, one with D = {h_3, h_4}, L = {h_1, h_2} and one with D = {h_3, h_4}, L = ∅]
Background: Configuration Example
If we restrict N to be {h_3, h_4}, then σ_1, σ_2 are 2 feasible trapezoids with
  D(σ_1) = D(σ_2) = {h_3, h_4}
  L(σ_1) = L(σ_2) = ∅
- 2 "distinct" configurations: (D(σ_1), L(σ_1)) = (D(σ_2), L(σ_2))
Size of Π(N) includes such "duplicate" configurations
Reduced size of Π(N) excludes "duplicates"
[Figure: 2 feasible trapezoids for N = {h_3, h_4}]
Background: Configuration Example
- Note that not every arrangement of line segments (before overlaying a trapezoidal decomposition on it) has the bounded degree property. In general, a face σ can have d(σ) = O(n), where n = size of N
- Can you think of another type of decomposition that has bounded degree?
[Figure: arrangement of segments h_1, ..., h_5]
Background: Configuration Example
Definition: Π^i(N) is the set of configurations in Π(N) with level i [recall level is size of L(σ), the conflict set]
Configurations in Π^0(N) are active over N.
Example: Π^0(N) = {(D(σ_1), L(σ_1)), (D(σ_2), L(σ_2))} for the 2 feasible trapezoids σ_1, σ_2 of N = {h_3, h_4}
[Figure: 2 feasible trapezoids for N = {h_3, h_4}]
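The definitions of degree, level, size, and reduced size can be made concrete with a small sketch. It assumes geometric objects are represented only by string labels such as "h3". The class and function names are hypothetical, and no geometry is computed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Configuration:
    triggers: FrozenSet[str]  # D(sigma): objects that define sigma
    stoppers: FrozenSet[str]  # L(sigma): objects that conflict with sigma

    def __post_init__(self) -> None:
        if self.triggers & self.stoppers:
            raise ValueError("D and L must be disjoint")

    @property
    def degree(self) -> int:  # d(sigma)
        return len(self.triggers)

    @property
    def level(self) -> int:  # l(sigma), the conflict size
        return len(self.stoppers)


def level_set(space: Sequence[Configuration], i: int) -> List[Configuration]:
    """Configurations of level i; i = 0 gives the active configurations."""
    return [c for c in space if c.level == i]


def reduced_size(space: Sequence[Configuration]) -> int:
    """Number of distinct (D, L) pairs, i.e. the size with duplicates removed."""
    return len(set(space))
```

For N = {h_3, h_4}, two trapezoids with the same trigger set {h_3, h_4} and empty conflict set give a multiset of size 2 whose reduced size is 1.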
Background: Configuration Example
Definition: A configuration space Π(N) has bounded valence if the number of configurations in Π(N) sharing the same trigger set is bounded (by a constant).
Example: Π(N) = our configuration space of all feasible trapezoids over N has bounded valence
- all feasible trapezoids with the same trigger set can be identified with trapezoids in the trapezoidal decomposition formed by that trigger set
- size of that trigger set is bounded by a constant, so the number of such trapezoids is also bounded by a constant
[Figure: trapezoidal decomposition induced by trigger set {h_3, h_4}]
Background: Configuration Example
- Theorem:
  - Let:
    - Π(N) be a configuration space of bounded valence
    - n = size of N
    - d = maximum degree of a configuration in Π(N)
    - R = a random sample of N of size r
  - Then:
    - With probability > 1/2, every active configuration σ in Π^0(R) has conflict size relative to N <= c(n/r) log r, for large enough c
    - Expected reduced size: E[reduced size of Π(R)] is in O(r^d)
- Example: For any random sample R of N of size r
  - each trapezoid in the trapezoidal decomposition H(R) has O([n/r] log r) conflict size with high probability
  - size of Π(R) is in O(r^d): for Π(N) of bounded valence, size and reduced size only differ by a constant factor
Background: Range Space
- Definition:
  - Let:
    - Π(N) be a configuration space
    - n = size of N
    - π̃(r) be the maximum reduced size function of Π(N) for r <= n
  - Π(N) has bounded dimension if there is a constant d such that π̃(r) is in O(r^d) for all r <= n
  - In this case, d is the dimension of Π(N)
- Bounded valence -> bounded dimension
- Some important types of configuration spaces don't have bounded valence but have bounded dimension
  - Range space: configuration space for which the trigger set of every configuration is empty. In this case, a configuration is a range.
  - Half-space range: Range = points in halfspace. Π(N) = set of distinct ranges induced by (upper) halfspaces. Dualize -> line arrangement [it has bounded dimension]
Background: ε-net of a Range Space
- Theorem:
  - If:
    - Π(N) is a configuration space of bounded dimension d
    - ε >= 0
    - R is a random subset of N formed via r independent draws from N with replacement
    - r >= 8/ε
  - then: the conflict size (relative to N) of each configuration in Π^0(R) is <= εn with probability at least 1 - 2 π̃(2r) 2^(-εr/2)
- For a range space Π(N):
  - R is an ε-net of the range space Π(N) if the conflict size of every range in Π^0(R) is <= εn
  - for large enough r, a random sample of size r is an ε-net with high probability
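The ε-net property is easy to test empirically on a toy range space. The sketch below assumes N is a set of points on the real line and that ranges are rightward half-lines {x in N : x >= t}. This choice is for illustration only and the function names are hypothetical.

```python
import random
from typing import Sequence


def largest_missed_range(N: Sequence[float], R: Sequence[float]) -> int:
    """Size of the largest range {x in N : x >= t} that contains no point of R."""
    if not R:
        return len(N)  # with an empty sample, the range holding all of N is missed
    top = max(R)
    return sum(1 for x in N if x > top)


def is_eps_net(N: Sequence[float], R: Sequence[float], eps: float) -> bool:
    """R is an eps-net if every range that misses R has at most eps * |N| points."""
    return largest_missed_range(N, R) <= eps * len(N)


def success_rate(N: Sequence[float], r: int, eps: float,
                 trials: int = 1000, seed: int = 0) -> float:
    """Fraction of trials in which r draws with replacement form an eps-net."""
    rng = random.Random(seed)
    hits = sum(is_eps_net(N, rng.choices(N, k=r), eps) for _ in range(trials))
    return hits / trials
```

Calling `success_rate(N, r, eps)` for increasing r shows the success fraction rising toward 1, which is the qualitative content of the theorem. The threshold r >= 8/ε from the slide can be used as a starting sample size.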
Background: VC-Dimension of a Range Space
- Use to bound dimension when direct argument fails
- Let Π(N) be a range space
- A subset M of N is shattered by N if every subset of M occurs as a range in Π(M)
  - Reduced size of Π(M) is 2^m, where m = size of M
- VC-Dimension of Π(N) is the maximum size of a shattered subset of N.
Example: N = set of 1D points, Π(N) = space of ranges induced by rightwards half-spaces
[Figure: points p_1..p_7 on a line with rightwards half-spaces h_1..h_8]
What is the VC-Dimension of Π(N)?
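For small point sets the definition can be checked by brute force. The sketch below assumes distinct 1D points and rightward half-lines {x >= t} as ranges, matching the example. The function names are hypothetical, and the search is exponential, so it is only meant for checking an answer on tiny inputs.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Set


def halfline_ranges(M: Sequence[float]) -> Set[FrozenSet[float]]:
    """All distinct subsets of M cut out by rightward half-lines {x >= t}."""
    pts = sorted(M)
    ranges = {frozenset()}  # threshold to the right of every point
    for i in range(len(pts)):
        ranges.add(frozenset(pts[i:]))  # threshold at pts[i]
    return ranges


def is_shattered(M: Sequence[float]) -> bool:
    """M is shattered if all 2^|M| subsets of M occur as ranges."""
    return len(halfline_ranges(M)) == 2 ** len(M)


def vc_dimension(N: Sequence[float]) -> int:
    """Largest size of a shattered subset of N (brute force over subsets)."""
    best = 0
    for m in range(1, len(N) + 1):
        if any(is_shattered(c) for c in combinations(N, m)):
            best = m
        else:
            break  # subsets of shattered sets are shattered, so larger sizes fail too
    return best
```

Running `vc_dimension` on a handful of points, and comparing `len(halfline_ranges(M))` with `2 ** len(M)` for a few subsets M, is one way to check an answer to the question above.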
What does all this have to do with the paper????
- Paper concerns geometric partitioning:
  - Given:
    - a collection X of n hyperplanes in R^d
    - a parameter r
  - Goal:
    - partition R^d into O(r^d) constant-sized cells
    - so that each cell intersects few hyperplanes
- Previous Work: Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = log r / r
  - Derandomization can be used for deterministic construction
- Current Work:
  - Assume set is a range space with bounded VC-exponent
    - VC-exponent is a more general concept than VC-dimension
  - For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the range space that is a variation on the ε-net concept.
Additional Handouts
- Parallel programming
  - PRAM CREW, EREW models
- Parallel geometric algorithms
Project Update
Deliverable          Due Date        Grade %
Project Proposal     Monday, 4/9     2%
Interim Report       Monday, 4/23    5%
Final Presentation   Monday, 5/7     8%
Final Submission     Monday, 5/14    10%
Total: 25% of course grade
Guidelines: Presentation
- 1/2 hour class presentation
- Explain to the class what you did
- Structure it any way you like!
- Some ideas:
  - slides (electronic or transparency)
  - demo
  - handouts
Guidelines: Final Submission
- Abstract: Concise overview (at most 1 page)
- Introduction:
  - Motivation: Why did you choose this project?
  - Related Work: Context with respect to CG literature
  - Summary of Results
- Main Body of Paper: (one or more sections)
- Conclusion:
  - Summary: What did you accomplish?
  - Future Work: What would you do if you had more time?
- References: Bibliography (papers, books that you used)
Well-written final submissions with research content may be eligible for publishing as UMass Lowell CS technical reports.
Guidelines: Final Submission
- Main Body of Paper:
  - If your project involves Theory/Algorithm:
    - Informal algorithm description (& example)
    - Pseudocode
    - Analysis:
      - Correctness
        - Solutions generated by algorithm are correct
          - account for degenerate/boundary/special cases
        - If a correct solution exists, algorithm finds it
        - Control structures (loops, recursions, ...) terminate correctly
      - Asymptotic Running Time and/or Space Usage
Guidelines: Final Submission
- Main Body of Paper:
  - If your project involves Implementation:
    - Informal description
    - Resources & Environment:
      - what language did you code in?
      - what existing code did you use? (software libraries, etc.)
      - what equipment did you use? (machine, OS, compiler)
    - Assumptions
      - parameter values
    - Test cases
      - tables, figures
      - representative examples
Final Exam
Final Exam: Date, Format
- Date Choices
  - Friday, 18 May at
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
  - Wednesday, 23 May at
    - 9:00 am-12:00 noon or
    - 1:00-4:00 pm or
    - 5:30-8:30 pm
- Format:
  - in class
  - open book, notes
  - similar to midterm:
    - 50% calculate/manipulate
    - 50% design, analyze
Final Exam: Part I Material
- O'Rourke CH 1-8: emphasis on chapters omitted from midterm (CH 7-8)
- Some key themes
  - Common geometric/combinatorial structures:
    - Decomposition/Partition:
      - Triangulation
      - Trapezoidalization
      - Delaunay Triangulation
      - Voronoi Diagram
      - Arrangement (level, zone)
    - Enclosure:
      - Convex Hull
      - Nested Polytope Hierarchy
    - Visibility Polygon & Kernel of Star Polygon
Final Exam: Part I Material
- Some key themes (continued)
  - Algorithmic Paradigms
    - Sweep: sort, then sweep a line, parabolic front
    - Divide-and-Conquer
    - Incremental
    - Randomized
    - Output-Sensitive
    - Preprocessing for fast queries
  - Representations:
    - Quad-edge
    - O'Rourke
  - Geometric Primitives
Final Exam: Part I Material
- Some key themes (continued)
  - Math:
    - Convexity
    - Monotonicity
    - Distance Metrics
    - Visibility/Star-shapedness
    - Euler's Formula
    - Duality
      - Graphs
      - Point-Line
      - Parabolic
    - Minkowski Sum
    - Randomness
    - Graph Theory: Independent Set
Final Exam: Part II Material
- Part II
  - Translational Polygon Containment
  - Connected Dominating Sets for Wireless Networks
  - Mesh Generation using Delaunay Triangulation
  - Approximate Nearest Neighbor Searching
  - Derandomization for Efficient Geometric Partitioning
Final Exam: Part II Material
- Translational Polygon Containment
  - Minkowski Sum properties & use for intersection
  - Distance metrics: L_p, convex
  - Linear programming: Basic model formulation
    - Linear equations representing constraints & objective function
    - Convex feasible region
    - Types of variables
- Connected Dominating Sets for Wireless Networks
  - Dominating set of an undirected graph
  - Basic algorithm + 2 rules
Final Exam: Part II Material
- Mesh Generation using Delaunay Triangulation
  - Incremental Delaunay Triangulation Algorithm
  - Edge flipping
  - Constrained Delaunay Triangulation
  - Mesh refinement
  - Robustness: awareness of general issues
- Derandomization for Efficient Geometric Partitioning
  - Configurations, configuration spaces, conflict and trigger sets
Final Exam: Part II Material
- Approximate Nearest Neighbor Searching
  - Distance computations for high dimensions:
    - approximate distance & stopping criterion
    - compare without taking roots
    - incremental for subdivision
  - Balanced Box Decomposition ("kd-like") tree representation for fast ~O(log n) queries:
    - Efficiency by eliminating at each step/level a constant fraction of:
      - Area (geometric consideration)
      - Vertices (combinatorial consideration)