2IMA20 Algorithms for Geographic Data


2IMA20 Algorithms for Geographic Data Fall 2017 Lecture 5: Static Map Labeling

Classic Map Labeling “Poor, sloppy, amateurish type placement is irresponsible; it spoils even the best image and impedes reading.” (E. Imhof ‘75) There is great experience with manual map labeling in cartography. Some guidelines: Next to or above/below the object Best: right or above Avoid overlaps and occlusion Unique graphical assignment … Many other properties contribute to the quality of a labeling: Font, color Bold/italics etc. Size, distance between characters Here: only geometric placement => Automatic label placement can naturally be modeled as a computational geometry problem.

Geometric Labeling Models Given: a set of n points in the plane and, for each point, a label represented by a rectangle (its bounding box) Objective: find a valid* labeling for a maximum-size subset of the points such that no two labels intersect (MaxNumber), or find a valid* labeling with all labels such that no two labels intersect and the font size is maximized (MaxSize) *What is a valid labeling? Discrete models (1P, 2P, 4P) fix a finite set of candidate label positions per point; slider models (1S, 2S, 4S) let the label slide continuously along the point. [Figure: labelings of settlement, town, city, hill and village under MaxNumber and MaxSize]

Example Sliding labels are more powerful: the top instance uses 4-position (4P) labels, the bottom one sliding (4S) labels. Slider models allow up to 15% more labels than a 4-position model in realistic instances.

Part I Complexity

Complexity of Sliding Labels Goal: Show that MaxNumber is NP-complete in the 4-slider model. Idea: Reduction from a special geometric 3-Sat variant, Planar 3-Sat. Reminder 3-Sat: Given a Boolean formula 𝜑 = 𝑐 1 ∧ … ∧ 𝑐 𝑚 with three literals per clause 𝑐 𝑖 = 𝑥 ∨ 𝑦 ∨ 𝑧, it is NP-complete to decide whether 𝜑 has a satisfying truth assignment. Def.: A 3-Sat formula 𝜑 is planar if the induced variable-clause graph H 𝜑 = (𝑉, 𝐸) is planar: 𝑉: set of clauses and variables 𝐸: occurrences of variables in clauses Theorem (Lichtenstein ‘82) The 3-Sat problem for planar Boolean formulas is NP-complete.

Drawing of 𝐻 𝜑 𝐻 𝜑 has a planar drawing where variables lie on the x-axis and clauses are E-shapes connecting to the variables from above or below. Idea for the reduction: Construct special point and label sets as gadgets for variables and clauses in the drawing such that 𝜑 is satisfiable ⇔ all points in the instance can be labeled. [Figure: variables 𝑥 1 , 𝑥 2 , 𝑥 3 , 𝑥 4 on the x-axis with clauses 𝑥 1 ∨ 𝑥 2 ∨ 𝑥 3 and 𝑥 2 ∨ 𝑥 3 ∨ 𝑥 4 attached as E-shapes]

Variable gadget 4 points as the corners of a square, slightly rotated, side length ½ + 𝜖 Labels are unit squares How could we label these four points?

Variable gadget 4 points as the corners of a square, slightly rotated, side length ½ + 𝜖 Labels are unit squares Two fundamentally different possible labelings Encode truth values true (with some slack)

Variable gadget 4 points as the corners of a square, slightly rotated, side length ½ + 𝜖 Labels are unit squares Two fundamentally different possible labelings Encode truth values Variable gadgets form a zig-zag pattern false (almost tight)

Variable gadget 4 points as the corners of a square, slightly rotated, side length ½ + 𝜖 Labels are unit squares Two fundamentally different possible labelings Encode truth values Variable gadgets form a zig-zag pattern true (with some slack)

Clause gadget Mimic the E-shapes of clauses from the drawing of 𝐻 𝜑 Working principle: False literals “push” the label chain upwards True literals produce no pressure The labels in the center of the clause select a satisfied literal Selectors intersect ⇔ all literals are false

Clause gadget Mimic the E-shapes of clauses from the drawing of 𝐻 𝜑 Attaching variables: 3 adapter points 𝑎 𝑖 , 𝑏 𝑖 , 𝑐 𝑖 per literal 𝑖, depending on the sign of literal 𝑖: True: label for 𝑏 𝑖 slides down False: label for 𝑏 𝑖 slides up [Figure: adapter points 𝑎 𝑥 , 𝑏 𝑥 , 𝑐 𝑥 and 𝑎 𝑧 , 𝑏 𝑧 , 𝑐 𝑧 for literals 𝑥 and 𝑧]

Complexity Theorem: (van Kreveld, Strijk, Wolff ‘99) It is NP-complete to decide whether in a given instance I in the 4-slider model it is possible to label all points, even if all labels are unit squares. Proof: By construction, formula 𝜑 is satisfiable ⇔ instance 𝐼 𝜑 can be labeled The size of 𝐼 𝜑 is quadratic in the number of clauses Why is this problem in NP?

Sliding vs. Fixed Positions Part II Sliding vs. Fixed Positions

Sliding labels vs. fixed position labels How many more points can potentially be labeled? Is there an efficient (heuristic) algorithm for label placement in the slider model? In the example instance: 4 labels is the optimum in the 4-position model; 6 labels is the optimum in the 2-slider or 4-slider model.

Sliding labels vs. fixed position labels Lemma For unit-size squares, the 2-slider model sometimes allows twice as many labels as the 4-position model, but never more than twice.

Sliding labels vs. fixed position labels Lemma For unit-size squares, the 2-slider model sometimes allows twice as many labels as the 4-position model, but never more than twice. Proof Consider an optimal labeling in the 2-slider model: at least half of the labels intersect the odd lines, or at least half intersect the even lines. Slide these labels into corner positions. ➨ Never more than twice as many

Sliding labels vs. fixed position labels [Figure: an instance where the 2-slider solution places twice as many labels as the 4-position solution] ➨ Sometimes twice as many

Sliding labels vs. fixed position labels The 2-slider model can sometimes label twice as many points as the 4-position model, but never more than twice The 4-slider model can sometimes label twice as many points as the 2-slider model, but never more than twice The 4-slider model can sometimes label 1½ times as many points as any fixed position model (… but we cannot label optimally in any model …)

Approximation Algorithms Part III Approximation Algorithms

Maximum non-intersecting subset 4-position: since the four label positions of a point mutually intersect, at most one of them can be placed, so MaxNumber becomes computing a maximum non-intersecting subset of rectangles Equivalently: a maximum independent set in a rectangle intersection graph

Simple heuristics Assume labels have equal height but varying width Heuristic 1: choose any label, eliminate the intersecting candidates and repeat Approximation factor?

First heuristic Approximation factor: Θ(1/n)
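The Θ(1/n) lower bound is easy to reproduce. Below is a minimal Python sketch (the names `greedy_any` and `intersects` are illustrative, not from the slides) of Heuristic 1, together with a worst-case instance: one wide label that overlaps n − 1 pairwise disjoint narrow labels.

```python
def intersects(a, b):
    # open rectangles (x1, y1, x2, y2); touching edges do not count
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def greedy_any(rects):
    """Heuristic 1: take candidates in arbitrary (here: input) order,
    discard everything the chosen label intersects, repeat."""
    chosen = []
    remaining = list(rects)
    while remaining:
        r = remaining.pop(0)
        chosen.append(r)
        remaining = [s for s in remaining if not intersects(r, s)]
    return chosen

# Bad instance: one wide label overlapping n - 1 disjoint narrow ones.
# The heuristic may pick the wide label first and keep just 1 label,
# while the optimum keeps all n - 1 narrow ones.
n = 10
wide = (0.0, 0.0, float(n), 1.0)
narrow = [(i + 0.1, 0.0, i + 0.9, 1.0) for i in range(n - 1)]
```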

Second heuristic Choose shortest label, eliminate the intersecting candidates and repeat Approximation factor?

Second heuristic Any chosen label can eliminate many candidate labels, but every eliminated label contains a corner of the chosen (shortest) label! The chosen label together with the intersected (eliminated) rectangles therefore has an MIS of size at most 4 ➨ ¼-approximation (tight)
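A sketch of Heuristic 2 (illustrative names; rectangles are (x1, y1, x2, y2) tuples): choosing the shortest remaining label first avoids the worst case of Heuristic 1, because every eliminated candidate must contain a corner of the chosen label.

```python
def intersects(a, b):
    # open rectangles (x1, y1, x2, y2); touching edges do not count
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def greedy_shortest(rects):
    """Heuristic 2: repeatedly choose the remaining label of smallest
    width and discard the candidates it intersects; for equal-height
    labels this is a 1/4-approximation."""
    chosen = []
    remaining = sorted(rects, key=lambda r: r[2] - r[0])  # by width
    while remaining:
        r = remaining.pop(0)
        chosen.append(r)
        remaining = [s for s in remaining if not intersects(r, s)]
    return chosen

# On the bad instance for Heuristic 1 (one wide label over n - 1
# disjoint narrow ones) this heuristic recovers the full optimum.
n = 10
wide = (0.0, 0.0, float(n), 1.0)
narrow = [(i + 0.1, 0.0, i + 0.9, 1.0) for i in range(n - 1)]
```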

Factor- ½ approximation algorithm Assume labels have equal height but varying width A greedy, left to right placement gives a ½-approximation (to follow)

Greedy algorithm (4-position) Always choose the label with the leftmost right side Remove labels that cannot be placed anymore Repeat [Figure: 4-position example; legend: placed label, no longer placeable, not yet placed, not yet placed with leftmost right edge]

Greedy algorithm (4-position) Reference point: the lower left corner of a label; each point feature has 4 reference points Use efficient data structures to maintain candidate reference points; only maintain candidates for which the label doesn’t intersect any placed label [Figure: the four reference points of a point feature]

Greedy algorithm (4-position) The algorithm discards all useless candidates immediately after a new label is placed [Figure: the leftmost label and the regions where no candidate reference points can lie]

Greedy algorithm (4-position) Heap data structure that stores all reference points sorted by: “x-coordinate + label width” ➨ To find leftmost candidate Priority search tree data structure that contains all reference points of candidates ➨ To find useless candidates after a placement

Greedy algorithm (4-position) Get the reference point p with minimum “x-coordinate + label width” from the heap Place the label at reference point p Search the priority search tree for all reference points at which the label cannot be placed anymore Delete these from the heap and the priority search tree [Figure: region that can contain no surviving candidates]

Greedy (4-position), efficiency The data structures allow each candidate label position to be handled in O(log n) time Overall running time: O(n log n)
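The following simplified sketch implements the same greedy rule without the heap and priority search tree, so it runs in quadratic rather than O(n log n) time; all names are illustrative. Processing the candidates in order of increasing right edge is equivalent to repeatedly choosing the still-placeable candidate with the leftmost right side.

```python
def overlaps(a, b):
    # open rectangles (x1, y1, x2, y2); touching edges are allowed
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def greedy_4p(points, w, h):
    """Greedy 4-position labeling: each point gets 4 candidate
    rectangles of width w and height h, with the point at one of the
    four corners. Scan candidates by increasing right edge; place a
    candidate if its point is unlabeled and it misses all placed
    labels."""
    cands = []
    for i, (x, y) in enumerate(points):
        for dx in (0.0, -w):
            for dy in (0.0, -h):
                cands.append((i, (x + dx, y + dy, x + dx + w, y + dy + h)))
    placed, labeled = [], set()
    for i, r in sorted(cands, key=lambda c: c[1][2]):  # by right edge
        if i in labeled:
            continue  # this point already carries a label
        if all(not overlaps(r, s) for s in placed):
            placed.append(r)
            labeled.add(i)
    return placed
```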

Greedy (4-position), approximation Factor-½ approximation: let R be the label just placed. What is the maximum number of labels we could have placed among R and the intersecting label candidates?

Greedy (4-position), approximation A candidate whose right edge lies left of R’s cannot exist, because R has the leftmost right edge among the remaining candidates All candidates that intersect R therefore contain the upper right or lower right corner of R Hence, the maximum non-intersecting subset of R and the intersected candidates has size at most 2 We choose 1, so the approximation factor is ½

Labels with varying heights Maximum non-intersecting subset in a set of axis-parallel rectangles Leftmost rectangle can intersect a large independent set ➨ heuristic gives approximation factor Θ(1/n)

A PTAS for label placement Fixed-height rectangles; maximum independent set: polynomial-time approximation scheme More precisely: for any integer k > 1, a (k/(k+1))-approximation in time O(n log n + n^(2k−1)) 2/3-approx. in O(n³) time 3/4-approx. in O(n⁵) time 4/5-approx. in O(n⁷) time etc.

A PTAS for label placement Optimal algorithm if all rectangles intersect one horizontal line New ½-approximation algorithm Dynamic programming for optimal sub-solutions Shifting lemma to combine sub-solutions into a PTAS

PTAS for labels: one line Assume all rectangles intersect a horizontal line Greedy left to right (first one ending) gives optimal solution Note: equal height is not needed (height is irrelevant)
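This one-line case is exactly interval scheduling, as a short sketch (illustrative names) shows: only the x-intervals matter, so the “first one ending” greedy is optimal.

```python
def max_disjoint(rects):
    """All rectangles intersect one horizontal line, so only the
    x-intervals matter. Greedily take the rectangle whose right edge
    is leftmost ("first one ending"); as in interval scheduling,
    this yields an optimal solution. Heights are irrelevant."""
    chosen, last_right = [], float("-inf")
    for r in sorted(rects, key=lambda r: r[2]):  # by right edge
        if r[0] >= last_right:                   # no x-overlap
            chosen.append(r)
            last_right = r[2]
    return chosen
```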

PTAS for labels: ½-approx. Assume labels have unit height Draw horizontal lines such that: Separation between any two lines is >1 Each line intersects at least one rectangle Each rectangle is intersected by some line Top to bottom, draw line before we “lose” a rectangle. Easy by sorting in O(n log n) time

PTAS for labels: ½-approx. Compute the MIS for the rectangles for each line Add the MIS for lines L1, L3, L5, … Add the MIS for lines L2, L4, L6, … Return the larger of the two MIS’s Top to bottom, draw line before we “lose” a rectangle. Easy by sorting in O(n log n) time. Observe that a rectangle in L1 and one in L3 cannot intersect.

PTAS for labels: ½-approx. Why a ½-approximation? The MIS for lines L1, L3, L5, … is optimal The MIS for lines L2, L4, L6, … is also optimal By the pigeon-hole principle, the real MIS must have at least half of its rectangles on the odd lines or at least half on the even lines Our solution for the odd/even lines is at least as good
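A compact sketch of the whole ½-approximation (illustrative names; assumes unit-height rectangles given as (x1, y1, x2, y2) tuples): draw the stabbing lines greedily from top to bottom, solve each line optimally with the one-line greedy, and keep the better of the odd-line and even-line unions.

```python
def max_disjoint(rects):
    # optimal along one stabbing line: greedy by leftmost right edge
    chosen, last_right = [], float("-inf")
    for r in sorted(rects, key=lambda r: r[2]):
        if r[0] >= last_right:
            chosen.append(r)
            last_right = r[2]
    return chosen

def half_approx(rects):
    """Group unit-height rectangles by greedy stabbing lines drawn
    top to bottom (each new line goes through the highest bottom edge
    not yet stabbed), solve each line optimally, and return the
    larger of the odd-line and even-line unions."""
    remaining = sorted(rects, key=lambda r: r[1], reverse=True)
    groups = []
    while remaining:
        y = remaining[0][1]  # highest remaining bottom edge
        groups.append([r for r in remaining if r[1] <= y <= r[3]])
        remaining = [r for r in remaining if not (r[1] <= y <= r[3])]
    odd = [r for g in groups[0::2] for r in max_disjoint(g)]
    even = [r for g in groups[1::2] for r in max_disjoint(g)]
    return odd if len(odd) >= len(even) else even
```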

PTAS for labels: two lines Can we compute a MIS for a set of rectangles intersected by two horizontal lines? Yes, using dynamic programming in O(n³) time

PTAS for labels: two lines Assuming this result, we compute the OPT-MIS of L1 L2, of L2 L3, of L3 L4, and so on

PTAS for labels: two lines Note that the solution for L1 L2 and the solution for L4 L5 cannot have intersections

PTAS for labels: two lines Compute the OPT-MIS for L1 L2 , L4 L5 , L7 L8 , … Compute the OPT-MIS for L2 L3 , L5 L6 , L8 L9 , … Compute the OPT-MIS for L1 , L3 L4 , L6 L7 , … Choose the largest solution

PTAS for labels: two lines Consider the real MIS M Claim: M has at least 2/3 of its rectangles in one of the three solutions Why? Let Mi ⊆ M be those rectangles of M that intersect line Li The rectangles of any Mi are considered in two out of the three sub-problems

PTAS for labels: two lines Our three solutions are optimal for their sub-problems, so: |M1| + |M2| + |M4| + |M5| + |M7| + … ≤ |solution 1| |M2| + |M3| + |M5| + |M6| + |M8| + … ≤ |solution 2| |M1| + |M3| + |M4| + |M6| + |M7| + … ≤ |solution 3| Summing: 2|M1| + 2|M2| + 2|M3| + 2|M4| + 2|M5| + 2|M6| + 2|M7| + … = 2|M| ≤ |solution 1| + |solution 2| + |solution 3|

PTAS for labels: two lines 2 |M| ≤ |solution 1| + |solution 2| + |solution 3| ➨ at least one of solutions 1, 2, and 3 must have size |solution i| ≥ 2 |M| / 3 (pigeon-hole principle) ➨ 2/3-approximation Recall we had a ½-approximation as the best one until now

PTAS for labels: k lines Assume we can solve k consecutive lines optimally Form k+1 shifted sub-problems: each leaves out every (k+1)-st line, with a different offset The rectangles of every line Li are then considered in k out of the k+1 sub-problems

PTAS for labels: k lines We get k+1 solutions from the k+1 sub-problems whose summed size is at least k|M| (= k times OPT) So one of the sub-problems gives a solution of size at least k|M| / (k+1) ➨ (k / (k+1))-approximation for any integer k > 0 ➨ (1 − ε)-approximation for any real ε > 0

PTAS by optimal sub-solutions Shifting strategy (Hochbaum & Maass, 1985), introduced for covering and packing problems in VLSI design Choose an integer k (for a (1 − 1/k)-approximation) Partition the problem into “narrow” sub-problems that can be solved optimally in time O(f(n, k)) (polynomial in n) and can be combined into one optimal solution to a “large” sub-problem Use a scheme of partitions into “narrow” sub-problems (each solution part must occur as a candidate solution part in many of the partitions in the scheme)

Optimal labeling: 2 lines Normalize to integer coordinates Set up recurrence for optimal solution Use arrays to compute recurrence (to re-use solutions to sub-problems) ➨ dynamic programming

Optimal labeling: 2 lines Normalization Sort all left and right sides by x-coordinate and normalize them to 0, 1, 2, … Do the same for all bottom and top sides by y-coordinate ➨ Note that normalization maintains intersections among rectangles: no new ones and none removed. Also note that normalization by y-coordinate destroys unit height.
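Normalization is plain coordinate compression, as in this sketch (the name `normalize` is illustrative):

```python
def normalize(rects):
    """Coordinate compression: replace x- and y-coordinates by their
    ranks 0, 1, 2, ... Order is preserved, so intersections are
    preserved exactly; unit height is generally destroyed."""
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    xi = {v: i for i, v in enumerate(xs)}
    yi = {v: i for i, v in enumerate(ys)}
    return [(xi[x1], yi[y1], xi[x2], yi[y2]) for (x1, y1, x2, y2) in rects]
```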

Optimal labeling: 2 lines Set up recurrence Define A(p, q, t) to be the optimal number of rectangles in the sub-region left of the polyline determined by p, q and t; in the figure, A(p, q, t) = 2

Optimal labeling: 2 lines Set up recurrence How can A(p, q, t) be expressed in terms of A(., ., .) with smaller indices? Case 1 No rectangle ends at q below t ➨ A(p, q, t) = A(p, q − 1, t)

Optimal labeling: 2 lines Set up recurrence Case 2 A rectangle ends at q below t and right of p ➨ A(p, q, t) = max { A(p, q − 1, t) , 1 + A(p, r, t) } Coordinate r is chosen to be just left of the rectangle

Optimal labeling: 2 lines Set up recurrence Case 3 A rectangle ends at q below t and not right of p ➨ A(p, q, t) = max { A(p, q − 1, t) , 1 + A(p, r, u) } Coordinate r is chosen just left of the rectangle and coordinate u just above its top

Optimal labeling: 2 lines Set up recurrence: A(p, q, t) depends only on A(…) with smaller indices, and its value is determined by one of the 3 cases (maximizing over a choice in Cases 2 and 3) A(p, q, t) can be determined in O(1) time if we know all A(…) with smaller indices Note: If several rectangles end at q we must be a bit more careful

Optimal labeling: 2 lines Make an array A[max-p, max-q, max-t] with at most n³ entries Fill A[…] bottom-up in O(1) time per entry ➨ the optimal solution for 2 lines is computed in O(n³) time Total for the 2/3-approximation is also O(n³) time

Optimal labeling: k lines Similar, but we need a (2k − 1)-dimensional array A(p1, …, pk, t1, …, tk−1)

Optimal labeling: k lines The (2k − 1)-dimensional array has O(n^(2k−1)) entries Each takes O(1) time to fill The approximation factor is k / (k+1) by the shifting strategy Running time is O(k n^(2k−1)) Literature Label placement by maximum independent set in rectangles. P. Agarwal, M. van Kreveld, and S. Suri. Computational Geometry: Theory and Applications, 11:209–218, 1998.