2IMA20 Algorithms for Geographic Data Spring 2016 Lecture 7: Labeling
Label placement: Positioning text with point, line, and area features on a map. [Figure: map with the labels Utrecht, Nieuwegein, Houten, Culemborg, Zeist]
Cartographic criteria Placement: no overlap, no ambiguity, readable, non-obscuring. Other criteria: harmony in fonts, good choice of color, size of text corresponds to importance. [Figure: map with the labels Utrecht, Zeist, Bunnik, De Bilt, Odijk, Driebergen, Doorn]
The label placement problem: Compute where the labels should be placed. The label placement problem assumes that: font, text, typeface, color, etc., are given; all map features other than labels are given; only the position of the labels is not given.
Point label placement: Set of n points to be labeled, each with a text, where a label is a rectangle (bounding box). Objective: label as many points as possible without overlap, where the label must have a corner at its point (NP-hard). [Figure: labeled points Utrecht and Zeist]
Point label placement Common: the 4-position or the 8-position model. Solution methods: e.g. greedy, dynamic programming, simulated annealing, genetic, LP-based branch & cut. [Figure: the 4-position and the 8-position model]
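The 4-position model can be made concrete with a small sketch. The tuple layout `(x1, y1, x2, y2)` and the names `four_positions` and `overlaps` are assumptions made for this illustration, not part of the lecture; labels that only touch along a boundary are treated as non-overlapping.

```python
def four_positions(px, py, w, h):
    """Return the four candidate labels (x1, y1, x2, y2) of size w x h
    that have one corner at the point (px, py)."""
    return [
        (px, py, px + w, py + h),        # point is the lower-left corner
        (px - w, py, px, py + h),        # point is the lower-right corner
        (px, py - h, px + w, py),        # point is the upper-left corner
        (px - w, py - h, px, py),        # point is the upper-right corner
    ]


def overlaps(a, b):
    """True if the interiors of rectangles a and b intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


candidates = four_positions(0, 0, 2, 1)
```

Each point contributes four candidates; a labeling picks at most one candidate per point such that no two picked candidates overlap.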
Sliding labels: 2-slider model and 4-slider model. In the 2-slider model the label may touch the point anywhere on its (top or bottom) boundary. Not yet discretized.
Example
Sliding labels vs. fixed position labels: How many more points can potentially be labeled? Is there an efficient (heuristic) algorithm for label placement in the slider model? [Figure: 4-position: 4 labels is optimum; 2-slider or 4-slider: 6 labels is optimum]
Sliding labels vs. fixed position labels Lemma: For unit-size squares, the 2-slider model sometimes allows twice as many labels as the 4-position model, but never more than twice. Proof: Consider an optimal labeling in the 2-slider model. At least half of the labels intersect either the odd or the even lines. Slide these into corner position. ➨ Never more than twice as many
Sliding labels vs. fixed position labels [Figure: a 2-slider solution next to a 4-position solution for the same points] ➨ Sometimes twice as many
Sliding labels vs. fixed position labels: The 2-slider model can sometimes label twice as many points as the 4-position model, but never more than twice. The 4-slider model can sometimes label twice as many points as the 2-slider model, but never more than twice. The 4-slider model can sometimes label 1½ times as many points as any fixed position model. (… but we cannot label optimally in any model …)
Maximum non-intersecting subset 4-position: if the four label positions of each point are treated as mutually intersecting, then label placement becomes the problem of finding a maximum non-intersecting subset of rectangles. Also known as: maximum independent set (MIS) in rectangle intersection graphs
Simple heuristics Assume labels have equal height but varying width Heuristic 1: choose any label, eliminate the intersecting candidates and repeat Approximation factor?
First heuristic Approximation factor: Θ(1/n)
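A small sketch of why an arbitrary choice can be this bad. The instance below (one wide label crossing n narrow labels of the same height) is an assumed example constructed for illustration, and `arbitrary_greedy` is a hypothetical name; the function simply takes labels in the order given.

```python
def overlaps(a, b):
    """True if the interiors of rectangles (x1, y1, x2, y2) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def arbitrary_greedy(rects):
    """Heuristic 1: take labels in the given order, skipping any label
    that intersects one that was already chosen."""
    chosen = []
    for r in rects:
        if all(not overlaps(r, c) for c in chosen):
            chosen.append(r)
    return chosen


n = 10
wide = (0, 0, n, 1)                                   # one wide label of height 1
narrow = [(i, 0.5, i + 0.5, 1.5) for i in range(n)]   # n disjoint labels of height 1

bad = arbitrary_greedy([wide] + narrow)    # wide label first: it blocks all others
good = arbitrary_greedy(narrow + [wide])   # narrow labels first: all n are placed
```

With the unlucky order the heuristic places 1 label where n are possible, which matches the Θ(1/n) factor.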
Second heuristic Choose shortest label, eliminate the intersecting candidates and repeat Approximation factor?
Second heuristic: Any chosen label can eliminate many candidate labels, but every eliminated label contains a corner of the chosen label! The chosen label together with the intersected (eliminated) rectangles therefore has an MIS of size at most 4 ➨ ¼-approximation (tight)
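The second heuristic can be sketched as follows. The candidate format `(point_id, (x1, y1, x2, y2))` and the name `shortest_first` are assumptions for this illustration; candidates of the same point are also eliminated once that point is labeled.

```python
def overlaps(a, b):
    """True if the interiors of rectangles (x1, y1, x2, y2) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def shortest_first(candidates):
    """Repeatedly choose the candidate with the smallest width, then drop
    every candidate that intersects it or belongs to the same point."""
    remaining = sorted(candidates, key=lambda c: c[1][2] - c[1][0])
    chosen = []
    while remaining:
        pid, rect = remaining.pop(0)
        chosen.append((pid, rect))
        remaining = [(q, r) for q, r in remaining
                     if q != pid and not overlaps(rect, r)]
    return chosen


# Three equal-height candidates of three different points.
example = [(0, (0, 0, 5, 1)), (1, (4, 0, 6, 1)), (2, (5.5, 0, 9, 1))]
result = shortest_first(example)
```

In this assumed instance the shortest label (point 1) intersects both others, so the heuristic places one label although the labels of points 0 and 2 could be placed together.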
Factor- ½ approximation algorithm Assume labels have equal height but varying width A greedy, left to right placement gives a ½-approximation (to follow)
Greedy algorithm (4-position): 1. Always choose the label with the leftmost right side. 2. Remove labels that cannot be placed anymore. 3. Repeat. [Figure: 4-position example; legend: placed label, not place-able, not yet placed, not yet placed with leftmost right edge]
Greedy algorithm (4-position) Reference point: lower left corner of label; each point feature has 4 reference points. Use efficient data structures to maintain candidate reference points; only maintain candidates for which the label doesn't intersect any placed label. [Figure: reference points]
Greedy algorithm (4-position): The algorithm discards all useless candidates immediately after a new label is placed. [Figure: leftmost label and the regions where no reference points can lie]
Greedy algorithm (4-position) Heap: data structure that stores all reference points sorted by "x-coordinate + label width" ➨ to find the leftmost candidate. Priority search tree: data structure that contains all reference points of candidates ➨ to find useless candidates after a placement
Greedy algorithm (4-position): 1. Get reference point p with minimum "x-coordinate + label width" from the heap. 2. Place the label at reference point p. 3. Search in the priority search tree for all reference points at which the label cannot be placed anymore. 4. Delete these from the heap and the priority search tree. [Figure: region with no candidates after a placement]
Greedy (4-position), efficiency The data structures allow each candidate label position to be handled in O(log n) time Overall running time: O(n log n)
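A minimal sketch of the greedy rule itself. Note the swap: it replaces the heap and the priority search tree by plain linear scans, so it runs in quadratic time rather than O(n log n). The candidate format `(point_id, (x1, y1, x2, y2))` and the name `greedy_leftmost` are assumptions for this illustration.

```python
def overlaps(a, b):
    """True if the interiors of rectangles (x1, y1, x2, y2) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def greedy_leftmost(candidates):
    """Place the candidate with the leftmost right edge, discard every
    candidate that can no longer be placed, and repeat."""
    remaining = list(candidates)
    placed = []
    while remaining:
        pid, rect = min(remaining, key=lambda c: c[1][2])  # leftmost right edge
        placed.append((pid, rect))
        # Linear scan instead of a priority search tree query.
        remaining = [(q, r) for q, r in remaining
                     if q != pid and not overlaps(rect, r)]
    return placed


example = [(0, (0, 0, 5, 1)), (1, (4, 0, 6, 1)), (2, (5.5, 0, 9, 1))]
placed = greedy_leftmost(example)
```

On this assumed instance the rule places the labels of points 0 and 2.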
Greedy (4-position), approximation Factor ½-approximation: What's the maximum number of labels we could have placed among R and the intersecting label candidates? [Figure: chosen label R with intersecting candidates]
Greedy (4-position), approximation: A candidate that ends to the left of R cannot exist, because R has the leftmost right edge among the non-chosen candidates. Therefore all candidates that intersect R contain the upper right or lower right corner of R. Hence, the max. non-intersecting subset of R and the intersected candidates has size at most 2. We choose 1, so the approximation factor is ½
Maximum non-intersecting subset in a set of axis-parallel rectangles Labels with varying heights Leftmost rectangle can intersect a large independent set ➨ heuristic gives approximation factor Θ(1/n)
A PTAS for label placement Fixed height rectangles; maximum independent set: polynomial time approximation scheme. More precisely: for any integer k > 1, a (k/(k+1))-approximation in time O(n log n + n^{2k-1}): 2/3-approx. in O(n^3) time, 3/4-approx. in O(n^5) time, 4/5-approx. in O(n^7) time, etc.
A PTAS for label placement 1. Optimal algorithm if all rectangles intersect one horizontal line 2. New ½-approximation algorithm 3. Dynamic programming for optimal sub-solutions 4. Shifting lemma to combine sub-solutions into a PTAS
PTAS for labels: one line Assume all rectangles intersect a horizontal line Greedy left to right (first one ending) gives optimal solution Note: equal height is not needed (height is irrelevant)
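A sketch of the one-line case, which is interval scheduling on the x-extents. The tuple layout `(x1, y1, x2, y2)` and the name `mis_on_line` are assumptions for this illustration; rectangles that only touch are treated as non-intersecting.

```python
def mis_on_line(rects):
    """Maximum independent set of rectangles (x1, y1, x2, y2) that all
    intersect one common horizontal line: greedy by leftmost right edge."""
    chosen = []
    last_right = float("-inf")
    for r in sorted(rects, key=lambda r: r[2]):   # first one ending comes first
        if r[0] >= last_right:                    # starts after the last chosen one ends
            chosen.append(r)
            last_right = r[2]
    return chosen


# All four rectangles cross the line y = 0.25; heights differ on purpose.
stabbed = [(0, 0, 3, 1), (2, -0.5, 4, 0.5), (3, 0, 5, 2), (4.5, -1, 6, 0.5)]
best = mis_on_line(stabbed)
```

Because every rectangle crosses the same line, two rectangles intersect exactly when their x-intervals overlap, which is why the heights play no role.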
PTAS for labels: ½-approx. Assume labels have unit height Draw horizontal lines such that: Separation between any two lines is >1 Each line intersects at least one rectangle Each rectangle is intersected by some line
PTAS for labels: ½-approx. 1. Compute the MIS for the rectangles of each line. 2. Take the union of the MISs for lines L_1, L_3, L_5, … 3. Take the union of the MISs for lines L_2, L_4, L_6, … 4. Return the larger of the two.
PTAS for labels: ½-approx. Why a ½-approximation? Lines L_i and L_{i+2} are more than 2 apart, so unit-height rectangles on them cannot intersect. Hence the combined MIS for lines L_1, L_3, L_5, … is optimal for the rectangles on those lines, and the combined MIS for lines L_2, L_4, L_6, … is also optimal. The pigeon-hole principle says that the lines L_1, L_3, L_5, … or the lines L_2, L_4, L_6, … must contain at least half of the optimal MIS
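The ½-approximation can be sketched as below, assuming unit-height rectangles given as `(x1, y1, x2, y2)`. The line construction (a new line at the top edge of the lowest rectangle not yet stabbed) is one possible choice made for this sketch, and the function names are hypothetical.

```python
def mis_on_line(rects):
    """Optimal independent set for rectangles stabbed by one line."""
    chosen, last_right = [], float("-inf")
    for r in sorted(rects, key=lambda r: r[2]):
        if r[0] >= last_right:
            chosen.append(r)
            last_right = r[2]
    return chosen


def half_approximation(rects):
    """Group unit-height rectangles by stabbing line, solve each line
    optimally, and return the larger of the odd-line and even-line unions."""
    groups, line = [], None
    for r in sorted(rects, key=lambda r: r[1]):   # by bottom edge, lowest first
        if line is None or r[1] >= line:          # not stabbed by the current line
            line = r[3]                           # new line through this top edge
            groups.append([])
        groups[-1].append(r)
    per_line = [mis_on_line(g) for g in groups]
    odd = [r for g in per_line[0::2] for r in g]   # lines L_1, L_3, L_5, ...
    even = [r for g in per_line[1::2] for r in g]  # lines L_2, L_4, L_6, ...
    return odd if len(odd) >= len(even) else even


rects = [(0, 0, 2, 1), (1, 0.5, 3, 1.5), (0, 1.2, 2, 2.2), (5, 0.2, 6, 1.2)]
solution = half_approximation(rects)
```

In this assumed instance the first three rectangles and the last one share line L_1, the remaining one lies on L_2, and the odd-line union with two rectangles is returned.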
PTAS for labels: two lines Can we compute a MIS for a set of rectangles intersected by two horizontal lines? Yes, using dynamic programming in O(n^3) time
PTAS for labels: two lines Assuming this result, we compute OPT-MIS of L_1 L_2 and L_2 L_3 and L_3 L_4 and … [Figure: rectangles with horizontal lines L_1, …, L_6]
PTAS for labels: two lines Note that the solution for L_1 L_2 and the solution for L_4 L_5 cannot have intersections. [Figure: rectangles with horizontal lines L_1, …, L_6]
PTAS for labels: two lines 1. Compute the OPT-MIS for L_1 L_2, L_4 L_5, L_7 L_8, … 2. Compute the OPT-MIS for L_2 L_3, L_5 L_6, L_8 L_9, … 3. Compute the OPT-MIS for L_1, L_3 L_4, L_6 L_7, … 4. Choose the largest solution. [Figure: line numbers 1, 2, …, 9, … grouped per sub-problem]
PTAS for labels: two lines Consider the real MIS M. Claim: at least one of the 3 solutions has size at least 2/3 of |M|. Why? Let M_i ⊆ M be those rectangles of M that intersect line L_i. The rectangles of any M_i are considered in 2 out of 3 sub-problems
PTAS for labels: two lines Solution 1 is optimal for L_1 L_2, L_4 L_5, L_7 L_8, …, so it is at least as large as |M_1| + |M_2| + |M_4| + |M_5| + |M_7| + …; likewise for solutions 2 and 3:
|M_1| + |M_2| + |M_4| + |M_5| + |M_7| + … ≤ |solution 1|
|M_2| + |M_3| + |M_5| + |M_6| + |M_8| + … ≤ |solution 2|
|M_1| + |M_3| + |M_4| + |M_6| + |M_7| + … ≤ |solution 3|
Summing the three lines: 2|M_1| + 2|M_2| + 2|M_3| + 2|M_4| + 2|M_5| + 2|M_6| + 2|M_7| + … = 2|M| ≤ |solution 1| + |solution 2| + |solution 3|
PTAS for labels: two lines 2 |M| ≤ |solution 1| + |solution 2| + |solution 3| ➨ at least one of solutions 1, 2, and 3 must have size |solution i| ≥ 2 |M| / 3 (pigeon-hole principle) ➨ 2/3-approximation
PTAS for labels: k lines Each sub-problem groups consecutive lines into blocks of k and skips every (k+1)-th line; shifting the skipped lines by one gives k+1 sub-problems. The rectangles of every line L_i are considered in k out of k+1 sub-problems. [Figure: line numbers 1, 2, …, k, k+1, k+2, …, 2k+1, 2k+2, 2k+3, … grouped per sub-problem]
PTAS for labels: k lines We get k+1 solutions from k+1 sub-problems whose summed size is at least k |M| (= k times OPT). So one of the sub-problems gives a solution of size at least (k/(k+1)) |M| ➨ (k/(k+1))-approximation for any integer k > 0 ➨ (1-ε)-approximation for any real ε > 0
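The counting argument can be checked with a short sketch that only enumerates which lines each sub-problem groups together; it does not solve the sub-problems. The name `shifted_groupings` and the choice to skip lines by residue modulo k+1 are assumptions for this illustration.

```python
def shifted_groupings(num_lines, k):
    """Return k+1 sub-problems. Sub-problem `skip` leaves out every line i
    with i % (k+1) == skip and groups the remaining consecutive lines."""
    schemes = []
    for skip in range(k + 1):
        groups, current = [], []
        for i in range(1, num_lines + 1):
            if i % (k + 1) == skip:
                if current:
                    groups.append(current)   # a skipped line closes the block
                current = []
            else:
                current.append(i)
        if current:
            groups.append(current)
        schemes.append(groups)
    return schemes


schemes = shifted_groupings(9, 2)
# Number of sub-problems in which each of the 9 lines takes part.
counts = {i: sum(i in g for s in schemes for g in s) for i in range(1, 10)}
```

For k = 2 and nine lines, the first sub-problem groups lines (1, 2), (4, 5), (7, 8), and every line takes part in exactly 2 of the 3 sub-problems.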
PTAS by optimal sub-solutions Shifting strategy (Hochbaum & Maass, 1985), introduced for covering and packing for VLSI-design. Choose an integer k (for a (1 - 1/k)-approximation). Partition the problem into "narrow" sub-problems that can be solved optimally in time O(f(n, k)) (polynomial in n) and can be combined into one optimal solution to a "large" sub-problem. Use a scheme of partitions into "narrow" sub-problems (each solution part must occur as candidate solution part in many of the partitions in the scheme).
Optimal labeling: 2 lines 1. Normalize to integer coordinates 2. Set up recurrence for optimal solution 3. Use arrays to compute recurrence (to re-use solutions to sub-problems) ➨ dynamic programming
Optimal labeling: 2 lines Normalization 1. Sort all left and right sides by x-coordinate. 2. Normalize them to 0, 1, 2, … 3. Sort all bottom and top sides by y-coordinate. 4. Normalize them to 0, 1, 2, …
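The normalization step is a coordinate compression and can be sketched as follows. The tuple layout `(x1, y1, x2, y2)` and the name `normalize` are assumptions for this illustration; in this sketch, sides with equal coordinates receive the same rank.

```python
def normalize(rects):
    """Replace the coordinates of rectangles (x1, y1, x2, y2) by their
    ranks 0, 1, 2, ... among all x-sides and all y-sides respectively."""
    xs = sorted({v for r in rects for v in (r[0], r[2])})   # left and right sides
    ys = sorted({v for r in rects for v in (r[1], r[3])})   # bottom and top sides
    xrank = {v: i for i, v in enumerate(xs)}
    yrank = {v: i for i, v in enumerate(ys)}
    return [(xrank[r[0]], yrank[r[1]], xrank[r[2]], yrank[r[3]]) for r in rects]


normalized = normalize([(0.5, 0, 3, 1), (2, 0.5, 7.5, 1.5)])
```

The ranks preserve the order of all sides, so rectangles intersect after normalization exactly when they did before, and the coordinates can serve as array indices.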
Optimal labeling: 2 lines Set up recurrence: Define A(p, q, t) to be the optimal number of rectangles in a certain sub-region: left of the green polyline. [Figure: sub-region determined by p, q, and t; in the example A(p, q, t) = 2]
Optimal labeling: 2 lines Set up recurrence: How to define A(p, q, t) expressed in A(.,.,.) with smaller indices? Case 1: no rectangle ends at q and is below t ➨ A(p, q, t) = A(p, q-1, t). [Figure: p, q, t, and q-1 marked]
Optimal labeling: 2 lines Set up recurrence Case 2: a rectangle ends at q and is below t and is right of p ➨ A(p, q, t) = max { A(p, q-1, t), 1 + A(p, r, t) }. [Figure: p, q, t, r, and q-1 marked]
Optimal labeling: 2 lines Set up recurrence Case 3: a rectangle ends at q and is below t and is not right of p ➨ A(p, q, t) = max { A(p, q-1, t), 1 + A(p, r, u) }. [Figure: p, q, t, r, u, and q-1 marked]
Optimal labeling: 2 lines Set up recurrence: A(p, q, t) depends on A(…) with smaller indices only, and the value is determined by 1 of 3 cases (maximizing a choice in 2) A(p, q, t) can be determined in O(1) time if we know all A(…) with smaller indices Note: If several rectangles end at q we must be a bit more careful
Optimal labeling: 2 lines 1. Make array A[max-p, max-q, max-t] with ≤ n^3 entries. 2. Fill A[…] bottom up in O(1) time per entry ➨ the optimal solution for 2 lines is computed in O(n^3) time. Total for 2/3-approximation is also O(n^3) time
Optimal labeling: k lines Similar, but need a (2k-1)-dimensional array A(p_1, …, p_k, t_1, …, t_{k-1}). [Figure: k lines with p_1, …, p_k and t_1, …, t_{k-1} marked]
Optimal labeling: k lines The (2k-1)-dimensional array has O(n^{2k-1}) entries. Each takes O(1) time to fill. The approximation factor is k/(k+1) by the shifting strategy. Running time is O(k n^{2k-1})
Literature: P. Agarwal, M. van Kreveld, and S. Suri. Label placement by maximum independent set in rectangles. Computational Geometry: Theory and Applications, 11, 1998.