Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal) Shuchi Chawla Carnegie Mellon University.

1 Approximation algorithms for Path-Planning and Clustering problems on graphs (Thesis Proposal) Shuchi Chawla Carnegie Mellon University

2 Two classes of Graph Optimization problems
• Optimization problems on graphs arise in many fields
• Typically NP-hard
• We consider two classes of problems motivated by machine learning and AI:
  - Path-planning: construct a “good” path, given a map
  - Clustering: divide objects into groups based on similarity

3 Path-planning Problems

4 A Robot Navigation Problem
• Task: deliver packages to certain locations
  - Faster delivery => greater happiness; “reward”
  - Want a path with short length and large reward
• Classic formulation: Traveling Salesman
  - Find the shortest tour covering all locations
• Some complicating constraints
  - Limited battery power: the robot may die before finishing the task
  - Packages have different deadlines for delivery
  - Preference for the larger-reward packages
• An alternate formulation: Orienteering
  - Construct a path of length ≤ D
  - Visit as many locations (collect as much reward) as possible

5 Path-planning in the real world: Motivation
• Given a graph (metric) G, construct a path satisfying some constraints and optimizing some function.
• Some applications: Robotics, Assembly analysis, Manufacturing, Production planning
• A trade-off between time and reward
  - maximize reward with bounded length
  - minimize length with a reward quota
  - some combination of both

6 A time-reward trade-off
• Impose a reward quota and minimize length
  - Metric TSP: collect all points
  - k-Path: collect at least k reward
• Budget the path length and maximize reward
  - Orienteering: hard bound on path length
  - Time Window: visit node v within [R_v, D_v]
• Optimize a combination of reward and length
  - Prize Collecting TSP: min (length + reward left)
  - Discounted Reward TSP: max reward; reward decreases with time
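To make the three kinds of objective concrete, here is a minimal Python sketch that evaluates a fixed path under each of them. The distance table, the rewards, and the exponential discount `gamma ** t` are illustrative assumptions; the slide only says that reward decreases with time.

```python
def path_length(dist, path):
    """Total length of a path, given pairwise distances dist[a][b]."""
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def prize_collecting_cost(dist, rewards, path):
    """Prize Collecting TSP objective as stated on the slide: length + reward left."""
    missed = sum(r for v, r in rewards.items() if v not in path)
    return path_length(dist, path) + missed


def discounted_reward(dist, rewards, path, gamma=0.5):
    """Reward that shrinks with arrival time t; the gamma ** t form is an assumption."""
    clock = 0
    total = rewards.get(path[0], 0)  # start node is reached at time 0
    for prev, node in zip(path, path[1:]):
        clock += dist[prev][node]
        total += rewards.get(node, 0) * gamma ** clock
    return total


# Hypothetical three-node metric: s, a, b on a line at positions 0, 1, 2.
dist = {"s": {"a": 1, "b": 2}, "a": {"s": 1, "b": 1}, "b": {"s": 2, "a": 1}}
rewards = {"a": 4, "b": 2}

print(path_length(dist, ["s", "a", "b"]))
print(prize_collecting_cost(dist, rewards, ["s", "a"]))
print(discounted_reward(dist, rewards, ["s", "a", "b"]))
```

Visiting the high-reward node first matters only for the discounted objective; the other two depend on which nodes are covered and on the total length.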

7 A time-reward trade-off
• Impose a reward quota and minimize length
  - Metric TSP: 1.5 [Christofides 76]
  - k-Path: 2 + ε [Chaudhuri Godfrey Rao+ 03]
• Budget the path length and maximize reward
  - Orienteering: 3 [Blum C Karger+ 03]
  - Time Window: 3 log² n [Bansal Blum C Meyerson 04]
• Optimize a combination of reward and length
  - Prize Collecting TSP: 2 [Goemans Williamson 95]
  - Discounted Reward TSP: 6.75 + ε [Blum C Karger+ 03]
[Blum C Karger+ 03] = [Blum C Karger Meyerson Minkoff Lane 03]

8 Orienteering and k-Path
• Orienteering: length ≤ D; maximize reward
• k-Path: reward ≥ k; minimize length
• Complementary problems
• Series of results on k-TSP (related to k-Path): [BRV99] [Garg99] [AK00] [CGRT03] … best approx: (2 + ε)
• None for Orienteering until recently!

9 Why is Orienteering difficult?
• First attempt: use distance-based approximations to approximate reward
• Let OPT(d) = max achievable reward with length d
• A 2-approx for distance implies that ALG(d) ≥ OPT(d/2)
• However, we may have OPT(d/2) << OPT(d)
• Bad trade-off between distance and reward!
[Figure: start node s with the paths OPT(d) and APPROX]
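A tiny instance shows how large the gap between OPT(d/2) and OPT(d) can be. The sketch below is an exhaustive search over points on a line, for illustration only; the node names, positions and rewards are made up, and the search is exponential, so it is not the algorithm discussed in the talk.

```python
from itertools import permutations


def best_reward(position, reward, start, budget):
    """Exhaustive OPT(budget) for points on a line (illustration only)."""
    others = [v for v in position if v != start]
    best = reward.get(start, 0)
    for r in range(1, len(others) + 1):
        for order in permutations(others, r):
            path = (start,) + order
            length = sum(abs(position[a] - position[b])
                         for a, b in zip(path, path[1:]))
            if length <= budget:
                best = max(best, sum(reward.get(v, 0) for v in path))
    return best


# Hypothetical instance: a small reward nearby, a large reward far away.
position = {"s": 0, "near": 2, "far": 10}
reward = {"near": 1, "far": 10}
d = 10

print(best_reward(position, reward, "s", d))      # OPT(d)
print(best_reward(position, reward, "s", d / 2))  # OPT(d/2)
```

With budget d the path s, near, far collects everything, while with budget d/2 only the nearby node is reachable, so a guarantee of the form ALG(d) ≥ OPT(d/2) says little about reward.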

10 Why is Orienteering difficult?
• Second attempt: approximate subparts of the optimal path and shortcut other parts
• If we stray from the optimal path by a lot, we may not be able to cover reward that’s far away
• Approximate the “extra” length taken by a path over the shortest path length: the Min-Excess Path Problem
[Figure: paths OPT and APPROX from s to t]

11 The Min-Excess Problem
• Given graph G, start and end nodes s, t, reward π_v on nodes, and target reward k, find a path P that collects reward at least k and minimizes the excess, excess(P) = ℓ(P) − d(s,t)
• At optimality, this is exactly the same as the k-Path objective of minimizing ℓ(P)
• However, approximation is different: Min-Excess is strictly harder than k-Path (an approximation for Min-Excess gives the same factor for k-Path, but not the other way around)
• We give a (2 + ε)-approximation for Min-Excess [Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03]
• Our algorithm returns a path of length at most d(s,t) + (2 + ε)·excess(P*), where P* is the optimal path
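The difference between approximating length and approximating excess is easy to see with numbers. The following sketch uses made-up lengths (d(s,t) = 100, an optimal path of length 101) and is only arithmetic, not an algorithm.

```python
def excess(length, shortest):
    """Excess of a path: its length minus the shortest s-t distance."""
    return length - shortest


shortest_st = 100  # d(s, t), hypothetical
opt_length = 101   # optimal path collecting reward k, hypothetical
alg_length = 202   # a path that is a 2-approximation in length

length_ratio = alg_length / opt_length
excess_ratio = excess(alg_length, shortest_st) / excess(opt_length, shortest_st)
print(length_ratio, excess_ratio)

# Conversely, a 2-approximation in excess is also a 2-approximation in length.
excess_bound = shortest_st + 2 * excess(opt_length, shortest_st)
print(excess_bound, 2 * opt_length)
```

A path that is within a factor 2 in length can be a factor 102 off in excess, while a path within a factor 2 in excess has length at most 102 here, well within twice the optimal length.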

12 A 3-approximation to Orienteering
• There exists a path from s to t that collects reward at least Π and has length ≤ D
• Given a 3-approximation to Min-Excess:
  1. Divide the path into 3 “equal-reward” parts (hypothetically)
  2. Approximate the part with the smallest excess
• ⇒ 3-approximation to Orienteering
• Excess of path P between u and v: ε_P(u,v) = d_P(u,v) − d(u,v)
• With part excesses ε_1, ε_2, ε_3: the excess of one part is ≤ (ε_1 + ε_2 + ε_3)/3, so we can afford an excess of up to (ε_1 + ε_2 + ε_3)
• Using an r-approx for Min-Excess (r ∈ Z⁺), we get an r-approximation for s-t Orienteering
• Open: given an r-approx for Min-Excess (r ∈ R⁺), can we get an r-approx to Orienteering?
[Blum C Karger+ 03]
[Figure: path OPT from s to t split into three parts with excesses ε_1, ε_2, ε_3; APPROX covers one part, between v_1 and v_2]
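The length calculation behind this slide can be written out as follows. This is a sketch of the standard argument, not a quotation from the talk: the breakpoints u_0 = s, u_1, u_2, u_3 = t are hypothetical labels for the ends of the three equal-reward parts, ε_j is the excess of part j, and i is the part with the smallest excess. The block assumes the `amsmath` package.

```latex
% u_0 = s, u_1, u_2, u_3 = t: hypothetical labels for the ends of the three parts.
\begin{align*}
\ell(\mathrm{OPT}) &= \sum_{j=1}^{3} \bigl( d(u_{j-1}, u_j) + \varepsilon_j \bigr) \le D, \\
\varepsilon_i &= \min_j \varepsilon_j \le \tfrac{1}{3}\,(\varepsilon_1 + \varepsilon_2 + \varepsilon_3), \\
\ell(\mathrm{APPROX}) &\le d(s, u_{i-1}) + \bigl( d(u_{i-1}, u_i) + 3\varepsilon_i \bigr) + d(u_i, t) \\
&\le \sum_{j=1}^{3} d(u_{j-1}, u_j) + (\varepsilon_1 + \varepsilon_2 + \varepsilon_3) \le D .
\end{align*}
```

The third line uses the 3-approximation for Min-Excess on part i, and the last line uses the triangle inequality to bound d(s, u_{i-1}) and d(u_i, t) by the sums of the corresponding shortest distances along OPT. The resulting path respects the length bound D and collects the reward of one part, that is, at least a third of the optimum.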

13 The next step: Deadline-TSP
• Every vertex v has a deadline D(v); find a path that maximizes the number of nodes v visited before D(v)
• Arises in scheduling, production planning
• If the last node on the path has the minimum deadline, use Orienteering to approximate the reward
  - No need to worry about the deadlines of the other nodes
• Does OPT always have a large subpath with the above property? NO!
• There are many subpaths of OPT with the above property that together contain all the reward
[Bansal Blum C Meyerson 04]
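The Deadline-TSP objective, and the observation about the last node, can be illustrated with a short Python sketch. The instance is made up, and treating "before D(v)" as arrival time ≤ D(v) is an assumption.

```python
def on_time_count(dist, path, deadline):
    """Number of nodes on the path reached no later than their deadline."""
    clock, count = 0, 0
    for i, node in enumerate(path):
        if i > 0:
            clock += dist[path[i - 1]][node]
        if node in deadline and clock <= deadline[node]:
            count += 1
    return count


def last_has_min_deadline(path, deadline):
    """True if the final node has the smallest deadline among the path's nodes."""
    on_path = [deadline[v] for v in path if v in deadline]
    return bool(on_path) and deadline.get(path[-1]) == min(on_path)


# Hypothetical instance: s, a, b on a line at positions 0, 1, 2.
dist = {"s": {"a": 1, "b": 2}, "a": {"s": 1, "b": 1}, "b": {"s": 2, "a": 1}}
path = ["s", "a", "b"]

easy = {"a": 5, "b": 2}    # last node b has the minimum deadline
print(last_has_min_deadline(path, easy), on_time_count(dist, path, easy))

hard = {"a": 0.5, "b": 5}  # a's deadline passes before the path reaches it
print(last_has_min_deadline(path, hard), on_time_count(dist, path, hard))
```

In the first case the path reaches b by its deadline, so every earlier node is on time as well and only the total length matters, which is exactly an Orienteering constraint.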

14 A segmentation of OPT
[Figure: segmentation of OPT, plotted with axes Time and Deadline]

15 Deadline-TSP
• Segment the graph into many parts, approximate each using Orienteering, and patch them together
  - How do we find such a segmentation without knowing the optimal path?
  - To avoid double-counting reward, segments should be node-disjoint
• Our result: there exists a segmentation based only on deadlines such that the resulting solution is a (3 log n)-approximation
• Open: is there a segmentation based on other properties (e.g. distance from the root) that gives a constant approximation?

16 An overview of our results
Problem                           Approximation                        Reference
Min-Excess                        2 + ε                                [FOCS 03]
Orienteering                      3                                    [FOCS 03]
Discounted-Reward TSP             6.75 + ε                             [FOCS 03]
Deadline TSP                      3 log n                              [STOC 04]
Time-Window Problem               3 log² n                             [STOC 04]
Time-Window Problem, bicriteria   reward: log 1/ε; deadlines: 1 + ε    [STOC 04]

17 Future Directions
• Better approximations
  - can we get a constant factor for Time-Windows?
  - special metrics such as trees or planar graphs
  - hardness of approximation?
• Asymmetric Path-planning
  - the graph is directed; still obeys the triangle inequality
  - polylog-approximations and lower bounds for distance
  - need entirely different ideas for asymmetric Orienteering
  - is it log-hard?
• Group Path-planning
  - reward is associated with “groups” of nodes
  - visit at least one node in a group to obtain its reward

18 Future Directions
• Stochastic Path-planning
  - Closer to home for Robot Navigation; the graph is a Markov Decision Process
  - Each edge is an “action” associated with a probability distribution
• The goal: give a “strategy” to accomplish a given task as fast as possible
  - The best action could be history-dependent
  - Can we write down the best strategy in polynomial time?
  - Can we approximate it in poly-time, or even in NP?
[Figure: actions labeled with transition probabilities 0.2, 0.7, 0.1 and 0.3, 0.2, 0.5]

19 Coming up next: Correlation Clustering

20 Natural Language Processing
• To understand the article automatically, we need to figure out which entities are one and the same
• Is “his” in the second line the same person as “The secretary” in the first line?

21 Real-World Clustering Problems
• A wide variety of clustering problems
  - Co-reference analysis
  - Web document clustering
  - Co-authorship (Citeseer/DBLP)
  - Computer vision
• Typical characteristics:
  - No well-defined “similarity metric”
  - Number of clusters is unknown
  - No predefined topics: desirable to figure them out as part of the algorithm

22 Cohen, McCallum & Richman’s idea
• “Learn” a similarity measure based on context
[Figure: graph on the mentions “Mr. Rumsfeld”, “The secretary”, “his”, “he” and “Saddam Hussein”; edges marked as strong similarity or strong dissimilarity]

23 A good clustering
• Consistent clustering: similarity (+) edges inside clusters, dissimilarity (−) edges between clusters
[Figure: the mentions “Mr. Rumsfeld”, “The secretary”, “his”, “he” and “Saddam Hussein” grouped into clusters; edges marked as strong similarity or strong dissimilarity]

24 A good clustering
• Consistent clustering: similarity (+) edges inside clusters, dissimilarity (−) edges between clusters
• Inconsistencies or “mistakes”
[Figure: the same mention graph, with edges that contradict the clustering marked as mistakes]

25 A good clustering
• No consistent clustering exists!
• Goal: find the most consistent clustering
[Figure: the same mention graph with the mistakes marked]

26 Correlation Clustering
• Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering
• NP-hard [Bansal, Blum, C, FOCS’02]
• Two natural objectives:
  - Maximize agreements: (# of +ve edges inside clusters) + (# of −ve edges between clusters)
  - Minimize disagreements: (# of +ve edges between clusters) + (# of −ve edges inside clusters)
• Equivalent at optimality, but different in terms of approximation
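Both objectives can be computed with one pass over the edges. The sketch below is a minimal Python illustration; the three-node signed graph and the cluster labels are hypothetical.

```python
def agreements_and_disagreements(signs, cluster_of):
    """Count edges that agree / disagree with a clustering.

    signs maps an edge (u, v) to +1 (similar) or -1 (dissimilar);
    cluster_of maps each node to a cluster label.
    """
    agree = disagree = 0
    for (u, v), s in signs.items():
        same = cluster_of[u] == cluster_of[v]
        if (s > 0) == same:
            agree += 1      # + edge inside a cluster, or - edge between clusters
        else:
            disagree += 1   # + edge between clusters, or - edge inside a cluster
    return agree, disagree


# Hypothetical signed triangle: a~b and b~c are similar, a and c are dissimilar.
signs = {("a", "b"): +1, ("b", "c"): +1, ("a", "c"): -1}

print(agreements_and_disagreements(signs, {"a": 0, "b": 0, "c": 0}))
print(agreements_and_disagreements(signs, {"a": 0, "b": 0, "c": 1}))
print(agreements_and_disagreements(signs, {"a": 0, "b": 1, "c": 2}))
```

Since agreements plus disagreements always equals the number of edges, the two objectives have the same optimal clusterings. The approximation ratios differ because a solution close to optimal in one count can be far off in the other when the optimal number of disagreements is small.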

27 Overview of results
Unweighted (complete) graphs
  - Max Agree: PTAS [Bansal Blum C 02]
  - Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03]
Weighted graphs
  - Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hardness 116/115 [CGW 03]
  - Min Disagree: O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]; hardness 29/28 [CGW 03]
[CGW 03] = [Charikar Guruswami Wirth 03]

28 Minimizing Disagreements [Bansal, Blum, C, FOCS’02]
• Goal: approximately minimize the number of “mistakes”
• Assumption: the graph is unweighted and complete
• A lower bound on OPT: Erroneous Triangles
  - Consider an “erroneous triangle”: two + edges and one − edge
  - Any clustering disagrees with at least one of these edges
  - If there are several edge-disjoint erroneous triangles, then any clustering makes a mistake on each one
• D_opt ≥ maximum fractional packing of erroneous triangles
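The lower bound can be checked on a small example. The Python sketch below lists erroneous triangles, packs edge-disjoint ones greedily, and compares the count with the true optimum found by exhaustive search. Greedy integral packing is a simplification swapped in for the fractional packing named on the slide, and the five-node sign pattern is an assumption.

```python
from itertools import combinations, product


def sign(signs, u, v):
    """Look up the sign of edge {u, v} regardless of orientation."""
    return signs[(u, v)] if (u, v) in signs else signs[(v, u)]


def erroneous_triangles(nodes, signs):
    """Triangles with exactly two + edges and one - edge."""
    found = []
    for a, b, c in combinations(sorted(nodes), 3):
        edge_signs = [sign(signs, a, b), sign(signs, a, c), sign(signs, b, c)]
        if edge_signs.count(+1) == 2:
            found.append((a, b, c))
    return found


def greedy_edge_disjoint(triangles):
    """Greedily keep triangles that share no edge with those already kept."""
    used, packed = set(), []
    for tri in triangles:
        edges = {frozenset(pair) for pair in combinations(tri, 2)}
        if not edges & used:
            packed.append(tri)
            used |= edges
    return packed


def min_disagreements(nodes, signs):
    """Exhaustive optimum over all labelings; only viable for tiny graphs."""
    nodes = sorted(nodes)
    best = None
    for labels in product(range(len(nodes)), repeat=len(nodes)):
        cluster = dict(zip(nodes, labels))
        cost = sum(1 for u, v in combinations(nodes, 2)
                   if (sign(signs, u, v) > 0) != (cluster[u] == cluster[v]))
        best = cost if best is None else min(best, cost)
    return best


# Assumed signs: vertex 1 is similar to everyone, all other pairs are dissimilar.
nodes = [1, 2, 3, 4, 5]
signs = {(u, v): (+1 if u == 1 else -1) for u, v in combinations(nodes, 2)}

packing = greedy_edge_disjoint(erroneous_triangles(nodes, signs))
print(packing, min_disagreements(nodes, signs))
```

Any set of edge-disjoint erroneous triangles gives a valid lower bound, so the greedy packing never exceeds the optimum, although a fractional packing can be larger.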

29 Using the lower bound: δ-clean clusters
• Relating erroneous triangles to mistakes
  - In special cases, we can “charge off” disagreements to erroneous triangles
• “Clean” clusters
  - each vertex has few disagreements incident on it
  - “few” is relative to the size of the cluster
  - # of disagreements ≲ # of erroneous triangles
• Clean cluster: all vertices are good
[Figure: a cluster with a “good” vertex and a “bad” vertex]

30 Using the lower bound: δ-clean clusters
• Relating erroneous triangles to mistakes
  - In special cases, we can “charge off” disagreements to erroneous triangles
• δ-clean clusters
  - each vertex in cluster C has fewer than δ|C| positive and δ|C| negative mistakes
  - # of disagreements ≲ # of erroneous triangles
  - a high density of positive edges: we can easily spot them in the graph
• Possible solution: find a δ-clean clustering, and charge disagreements to erroneous triangles
• Caveat: it may not exist
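The δ-clean condition is straightforward to test for a given cluster. The sketch below follows the wording of the slide: a positive mistake is read as a + edge leaving the cluster and a negative mistake as a − edge inside it. That reading, the strict inequality, and the example graph are assumptions.

```python
from itertools import combinations


def sign(signs, u, v):
    """Look up the sign of edge {u, v} regardless of orientation."""
    return signs[(u, v)] if (u, v) in signs else signs[(v, u)]


def is_delta_clean(cluster, nodes, signs, delta):
    """Check that every vertex of the cluster has fewer than delta * |C|
    positive mistakes (+ edges leaving C) and negative mistakes (- edges inside C)."""
    limit = delta * len(cluster)
    for v in cluster:
        neg_inside = sum(1 for u in cluster
                         if u != v and sign(signs, u, v) < 0)
        pos_outside = sum(1 for u in nodes
                          if u not in cluster and sign(signs, u, v) > 0)
        if neg_inside >= limit or pos_outside >= limit:
            return False
    return True


# Hypothetical graph: {1, 2, 3, 4} is a clique of + edges, node 5 is dissimilar to all.
nodes = [1, 2, 3, 4, 5]
cluster = {1, 2, 3, 4}
signs = {(u, v): (+1 if u in cluster and v in cluster else -1)
         for u, v in combinations(nodes, 2)}
print(is_delta_clean(cluster, nodes, signs, 0.25))

# Flip one edge inside the cluster: vertices 1 and 2 now have one negative mistake each.
noisy = dict(signs)
noisy[(1, 2)] = -1
print(is_delta_clean(cluster, nodes, noisy, 0.25), is_delta_clean(cluster, nodes, noisy, 0.5))
```

A single flipped edge already breaks cleanliness at δ = 1/4 for a cluster of size 4, which illustrates why a fully δ-clean clustering may not exist.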

31 Using the lower bound: δ-clean clusters
• We show: there exists an almost-δ-clean clustering, OPT(δ), that is almost as good as OPT
  - Its nice structure helps us find it easily
• Caveat: a δ-clean clustering may not exist
• An almost-δ-clean clustering: all clusters are either δ-clean or contain a single node
• An almost-δ-clean clustering always exists, trivially: put every node in its own cluster

32 OPT(δ): clean or singleton
• OPT(δ): all clusters are δ-clean or singletons
• Few new mistakes
[Figure: the optimal clustering is turned into OPT(δ) by an imaginary procedure; “bad” vertices are marked]

33 Finding clean clusters
• Charging off mistakes:
  1. Mistakes among clean clusters: charge to erroneous triangles
  2. Mistakes among singletons: no more than the corresponding mistakes in OPT(δ)
[Figure: OPT(δ) compared with ALG; clean clusters]

34 A summary of results
Unweighted (complete) graphs
  - Max Agree: PTAS [Bansal Blum C 02]
  - Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03]
Weighted graphs
  - Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hardness 116/115 [CGW 03]
  - Min Disagree: O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]; hardness 29/28 [CGW 03]
[CGW 03] = [Charikar Guruswami Wirth 03]

35 Future Directions
• Better combinatorial approximation
  - The current best algorithms have a large running time: they employ an LP with O(n²) variables
• Improving the lower bound:
  - Erroneous cycles: one negative edge, the remaining edges positive
  - The gap of this lower bound is between 2 and 4 [Charikar Guruswami Wirth 03]
  - Can we obtain a 2-approximation?
• A good “iterative” approximation
  - on few changes to the graph, quickly recompute a good clustering

36 Future Directions
• Clustering with small clusters
  - Given that all clusters in OPT have size at most k, find a good approximation
  - Is this NP-hard?
  - Different from finding the best clustering with small clusters, without a guarantee on OPT
• Clustering with few clusters
  - Given that OPT has at most k clusters, find an approximation
• Maximizing Correlation
  - (number of agreements) − (number of disagreements)
  - Can we get a constant factor approximation?

37 Timeline
• Plan to finish in a year
  - Summer 04: Stochastic/Time-dependent path-planning; Clustering with constraints
  - Fall 04: Asymmetric/group path-planning; Combinatorial/streaming algo for clustering
  - Spring 05: Wrap-up; writing; job search!

38 Questions?

39 Lower Bounding Idea: Erroneous Triangles
• If there are several edge-disjoint erroneous triangles, then any clustering makes a mistake on each one
• D_opt ≥ maximum fractional packing of erroneous triangles
[Figure: graph on vertices 1 to 5 with 2 edge-disjoint erroneous triangles, (1,2,3) and (1,4,5); 3 mistakes]

40 Open Problems
• Clustering with small clusters
  - In most applications, clusters are very small
  - Given that all clusters in OPT have size at most k, find a good approximation
  - Different from finding the best clustering with small clusters, without a guarantee on OPT
• Optimal solution for unweighted graphs? A possible approach…
  - Any two vertices in the same cluster in OPT are neighbors or share a common neighbor
  - We can find a list of O(n2 k ) clusters such that all of OPT’s clusters are in this list
  - When k is small, there are only polynomially many choices to pick from

41 Open Problems
• Clustering with few clusters
  - Given that OPT has at most k clusters, find an approximation
• Consensus clustering
  - Given a “sum” of k clusterings, find the best “consensus” clustering
  - Easy 2-approximation; can we get a PTAS?
• Maximizing Correlation
  - (number of agreements) − (number of disagreements)
  - Bad case: # of disagreements = constant fraction of total weight
  - Charikar & Wirth obtained a constant factor approximation
  - Can we get a PTAS in unweighted graphs?

42 An overview of results
Problem                  Approx       Reference
Metric TSP               1.5          [Christofides 76]
k-Path                   2 + ε        [Chaudhuri Godfrey Rao+ 03]
Orienteering             3            [Blum C Karger+ 03]
Time Window              3 log² n     [Bansal Blum C Meyerson 04]
Prize Collecting TSP     2            [Goemans Williamson 95]
Discounted Reward TSP    6.75 + ε     [Blum C Karger+ 03]

43 Why is Orienteering difficult?
• First attempt: use distance-based approximations to approximate reward
• Idea: modify the algorithm itself
• Doesn’t help: moat-growing goes for shallow fruit
• Orienteering is inherently harder; a perturbation of the input changes the output widely
[Figure: start node s with the paths OPT(d) and APPROX]

44 Path-planning in the real world: Motivations
• Given a graph (metric) G, construct a path satisfying some constraints and optimizing some function.
• Possible constraints:
  - root, destination: s, t
  - rewards on nodes: π_v; reward quota: Π
  - time windows for nodes: (R_v, D_v)
• Objective functions: maximize reward; minimize length; some combination of both
• Some applications: Robotics, Assembly analysis, Manufacturing, Production planning


46 From Deadlines to Time-Windows
• Nodes have deadlines as well as release times
• Note that release times are dual to deadlines: if we look at the reverse path from the end to the start, release times become deadlines!
  - With ℓ(OPT) = L, requiring ℓ(s,v) ≥ R(v) is the same as requiring ℓ(t,v) ≤ L − R(v), i.e. a deadline D(v) = L − R(v) on the reversed path
• Log-approx for deadlines ⇒ log-approx for release dates
• Algorithm for Time-Windows:
  - Run the approximation for Deadline-TSP
  - Replace Orienteering by Orienteering with release dates
• ⇒ O(log² n)-approximation for the Time-Window problem
• Open: log-approx for Time-Windows based on log-approx for Deadlines
[Bansal Blum C Meyerson 04]
[Figure: path OPT from s to t through a node v]
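The duality between release times and deadlines can be checked numerically. The sketch below walks a path on a line without waiting, then reverses it and converts each release time R(v) into a deadline L − R(v). The instance and helper names are hypothetical.

```python
def arrival_times(position, path):
    """Arrival time at each node when walking the path without waiting."""
    clock, times = 0, {}
    for i, node in enumerate(path):
        if i > 0:
            clock += abs(position[node] - position[path[i - 1]])
        times[node] = clock
    return times


def respects_release(position, path, release):
    """Every node with a release time is reached at or after it."""
    t = arrival_times(position, path)
    return all(t[v] >= release[v] for v in path if v in release)


def respects_deadline(position, path, deadline):
    """Every node with a deadline is reached at or before it."""
    t = arrival_times(position, path)
    return all(t[v] <= deadline[v] for v in path if v in deadline)


def reversed_instance(position, path, release):
    """Reverse the path and turn each release time R(v) into a deadline L - R(v)."""
    total = arrival_times(position, path)[path[-1]]  # L = length of the path
    return path[::-1], {v: total - r for v, r in release.items()}


# Hypothetical instance on a line.
position = {"s": 0, "a": 3, "b": 5, "t": 9}
path = ["s", "a", "b", "t"]

for release in ({"a": 2, "b": 5}, {"a": 4, "b": 5}):
    rev_path, deadline = reversed_instance(position, path, release)
    print(respects_release(position, path, release),
          respects_deadline(position, rev_path, deadline))
```

In both cases the forward check against release times and the reversed check against deadlines give the same answer, which is the property the reduction relies on.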

47 Open Problems
• Asymmetric Path-planning
  - The graph is directed; still obeys the triangle inequality
  - Directed TSP studied extensively: O(log n) approx
  - polylog-approx for directed Steiner tree? Only a quasi-poly-time log-approx is known [Charikar Chekuri+ 99]
  - need entirely different ideas for asymmetric Orienteering
  - transformations from general graphs to trees don’t work
  - Is this log-hard?
• Group Path-planning
  - Reward is associated with “groups” of nodes
  - visit at least one node in a group to obtain its reward
  - Unlike group-tree problems, group-path problems do not reduce to directed versions

48 Future Directions
• Stochastic Path-planning
  - closer to home for Robot Navigation; the graph is a Markov Decision Process
  - Each edge is an “action” associated with a probability distribution
• Path-planning on time-varying topologies
  - travel costs change over time, e.g. traffic patterns; changing environment
  - assume full knowledge: we know the metric at any point of time in the future

49 Future Directions
• Path-planning on time-varying topologies
  - travel costs change over time, e.g. traffic patterns; changing environment
  - assume full knowledge: we know the metric at any point of time in the future
• Time-dependent Orienteering: special cases
  - Bounded ratio: the length of every edge changes by at most a constant factor over time
  - Few different metrics: there are at most k different metrics; the topology cycles between these. Can we get a 3k-approximation?
  - Bounded number of changes: the topology changes at most k times. Easy to get a 3k-approximation

