Published by Elvin Walsh. Modified over 8 years ago.
1
Algorithms for geometric data streams
Christian Sohler, TU Dortmund
Komplexitätstheorie und effiziente Algorithmen
2
Section: Introduction
Data streams
- Massive data set arriving sequentially
- Different ways of "arriving"
Examples
- Network traffic
- Query logs
- …
Approach
- Find algorithms that make a single pass (or a few passes) and process the data sequentially
3
Geometric data streams
- Massive sets of geometric objects arriving sequentially
- Objects are typically points
- Different forms of arrival:
  - sequence of points
  - sequence of updates
Questions
- Find ways to analyze the geometric structure of the input data using small space
4
Motivation
- Many computational tasks can be interpreted geometrically
- Geometric features may be useful in learning and classification
- Geometry plays an important role in the application
Examples
- Learning
- Clustering
- How "clusterable" is a data set?
- Road traffic prediction
11
A basic learning problem
- We have two classes of objects
- We are given examples from both classes
- Learn from the examples to which class future objects belong
- Map each object's description to Euclidean space
SVM approach
- Compute the maximum-margin hyperplane
- Classify points according to the side of the hyperplane on which they lie
12
SVM and SEB (smallest enclosing balls)
- The dual of a certain SVM formulation is SEB [Tax, Duin, Pattern Recognition Letters, '99]
- Geometric streaming SEB can be used as an SVM heuristic [Rai, Daume III, Venkatasubramanian, IJCAI'09]
- Also: coresets have been used to construct CSVMs [Tsang, Kwok, Cheung, Journal of Machine Learning Research, '05]
13
Outline
- Merge & Reduce
- Embeddings into tree metrics
- Estimation of distribution of local neighborhoods
- Balanced partitions
- Approximating properties of balanced partitions
14
Section: Merge & Reduce
Insertion-only streams
- Sequence of points $p_1, \dots, p_n$ from $\mathbb{R}^d$
15
Definition [k-median clustering]
Given a weighted set P of points in $\mathbb{R}^d$, the k-median problem is to find a set $C \subseteq \mathbb{R}^d$ of k points (centers) such that
$\mathrm{cost}(P,C) = \sum_{p \in P} w_p \cdot \min_{c \in C} \|p-c\|$
is minimized, where $w_p > 0$ is the weight of point p.
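As a small illustration of the definition above, the following sketch evaluates the weighted k-median cost for a fixed set of centers. The function name `kmedian_cost` and the sample data are made up for this example; it only evaluates the objective and does not search for optimal centers.

```python
import math

def kmedian_cost(points, weights, centers):
    """Weighted k-median cost: sum over p of w_p * min over c of ||p - c||."""
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )

# Hypothetical example data: three weighted points and k = 2 centers.
P = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
w = [1.0, 2.0, 1.0]
C = [(0.0, 0.0), (10.0, 0.0)]
cost = kmedian_cost(P, w, C)
```

Here only the point (3, 4) contributes: its nearest center is the origin at distance 5, and it carries weight 2.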
16
Coreset [Har-Peled, Mazumdar, STOC'04]
A weighted point set S is a $(k,\varepsilon)$-coreset of a weighted point set P if, for every set C of k centers,
$|\mathrm{cost}(P,C) - \mathrm{cost}(S,C)| \le \varepsilon \cdot \mathrm{cost}(P,C)$.
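The coreset condition quantifies over every set of k centers, which a spot check for individual center sets can make tangible. The sketch below computes the relative error for one given C; the helper names and the tiny 1D example are assumptions for illustration. Passing the check for a few center sets does not prove that S is a coreset.

```python
import math

def cost(points, weights, centers):
    # Weighted k-median cost of `centers` on the weighted point set.
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def relative_error(P, wP, S, wS, centers):
    """|cost(P,C) - cost(S,C)| / cost(P,C) for one center set C."""
    cp = cost(P, wP, centers)
    return abs(cp - cost(S, wS, centers)) / cp

# Hypothetical example: two nearby points summarized by their midpoint.
P, wP = [(0.0,), (0.1,)], [1.0, 1.0]
S, wS = [(0.05,)], [2.0]

far = relative_error(P, wP, S, wS, [(1.0,)])    # center far from the data
near = relative_error(P, wP, S, wS, [(0.05,)])  # center between the points
```

The far center sees almost no difference between P and S, while the center placed at the midpoint gives S cost 0 and hence relative error 1. This is why the definition has to hold for all center sets, not only for typical ones.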
17
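Observation
- The union of two $(k,\varepsilon)$-coresets is a $(k,\varepsilon)$-coreset of the union of the underlying point sets
- One can compute a coreset of a coreset
[Figure: the input stream is split into blocks; a coreset is computed per block, and two coresets are replaced by a coreset of their union.]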
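The two observations combine into the Merge & Reduce scheme, sketched below under stated assumptions. `reduce_summary` is only a stand-in that snaps points to a grid and adds up weights; it is not one of the coreset constructions cited in this talk and carries no $(k,\varepsilon)$ guarantee. The class name, `block_size`, and the binary-counter layout of `levels` are illustrative choices.

```python
from collections import defaultdict

def reduce_summary(weighted_points, cell=1.0):
    # Stand-in for a coreset construction (assumption): merge all points
    # that fall into the same grid cell into one weighted representative.
    acc = defaultdict(float)
    for p, w in weighted_points:
        acc[tuple(int(x // cell) for x in p)] += w
    return [(tuple((k + 0.5) * cell for k in key), w) for key, w in acc.items()]

class MergeReduce:
    def __init__(self, block_size, reduce_fn):
        self.block_size = block_size
        self.reduce_fn = reduce_fn
        self.buffer = []   # raw points of the current block
        self.levels = []   # levels[i]: None or a summary of 2^i blocks

    def insert(self, p):
        self.buffer.append((p, 1.0))
        if len(self.buffer) == self.block_size:
            summary = self.reduce_fn(self.buffer)
            self.buffer = []
            self._carry(summary)

    def _carry(self, summary):
        # Like binary addition: two summaries on one level are merged
        # (union) and reduced again, then moved one level up.
        i = 0
        while i < len(self.levels) and self.levels[i] is not None:
            summary = self.reduce_fn(self.levels[i] + summary)
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = summary

    def summary(self):
        out = list(self.buffer)
        for s in self.levels:
            if s is not None:
                out += s
        return out

mr = MergeReduce(block_size=2, reduce_fn=reduce_summary)
for i in range(10):
    mr.insert((float(i), 0.0))
```

At most one summary is stored per level, so a stream of n points needs about log(n / block_size) summaries. Each reduction step adds its own error, so an analysis typically runs the reduction with a smaller per-level error parameter.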
28
Coresets by pre-clustering [Guha, Mishra, Motwani, O'Callaghan, FOCS'00; Har-Peled, Mazumdar, STOC'04; Frahling, S., STOC'05]
- Compute a pre-clustering S with more than k centers and $\mathrm{cost}(P,S) \le \varepsilon \cdot \mathrm{Opt}_k$
- Size exponential in d
29
Coresets by sampling [Chen, SICOMP'09; Feldman, Monemizadeh, S., SoCG'07]
- Compute a random non-uniform sample
- Show that the sample approximates all solutions from a net
- Size polynomial in d
30
Coresets by reduction to 1D [Har-Peled, Kushal, DCG'07; Feldman, Fiat, Sharir, FOCS'06]
- Uses geometric arguments to solve the 1D case
- Combine with pre-clustering using line centers
- For k-median: size independent of n (but exponential in d)
31
Open problems
- Coresets for k-median of size independent of n and d? (Partial result in [Feldman, Monemizadeh, S., SoCG'07])
- Coresets for k-median of size $O(d/\varepsilon^2)$
- Coresets for k-median of size $\mathrm{poly}(d, \log n)/\varepsilon^{2-c}$ for a constant $c = c(d) > 0$
- Coresets for j-subspace 1-median of size $\mathrm{poly}(\varepsilon, d, j, \log n)$?
- Same questions for the k-means objective function
Remark: The open questions refer to the definition of coresets from this talk.
32
Section: Geometric update streams
Insertion/deletion model
- The stream consists of Insert(p) and Delete(p) operations
- Points are from $\{1, \dots, \Delta\}^d$
- The stream is consistent: there is no Delete(p) if p is not present, and no Insert(p) if p is already present in the current set
33
Section: Embeddings in tree metrics
Streaming algorithms via embeddings into tree metrics
[Figure: five points p, q, r, s, t in the plane.]
39
Streaming algorithms via embeddings into tree metrics
- Tree metric $D(\cdot,\cdot)$ with $\|p-q\| \le D(p,q)$ and $\mathbb{E}[D(p,q)] = O(\log \Delta) \cdot \|p-q\|$ [Bartal, FOCS'96; Charikar, Chekuri, Goel, Guha, Plotkin, FOCS'98]
[Figure: the points p, q, r, s, t in nested grids and the corresponding tree, with edge lengths $2^i$, $2^{i-1}$, $2^{i-2}$, … from the top level downwards.]
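One way to obtain such a tree metric is a randomly shifted hierarchy of nested grids, sketched below for integer points in $\{1,\dots,\Delta\}^d$. The function name, the integer shift, and the edge length $\sqrt{d} \cdot 2^j$ for an edge leaving a level-j cell are assumptions of this example, chosen so that $D(p,q) \ge \|p-q\|$ always holds. The constants in the cited papers may differ.

```python
import math
import random

def make_tree_metric(delta, d, seed=0):
    rng = random.Random(seed)
    # One random shift shared by all levels keeps the grids nested.
    shift = [rng.randrange(delta) for _ in range(d)]
    # Smallest level whose cells (side 2^top) contain all shifted points.
    top = (2 * delta - 1).bit_length()

    def cell(p, i):
        # Index of the level-i cell (side length 2^i) containing p.
        return tuple((x + s) >> i for x, s in zip(p, shift))

    def D(p, q):
        if p == q:
            return 0.0
        # Coarsest level at which p and q lie in different cells.
        i = max(j for j in range(top) if cell(p, j) != cell(q, j))
        # Path p -> common ancestor -> q: two edges per level 0..i,
        # where an edge leaving a level-j cell has length sqrt(d) * 2^j.
        return 2 * math.sqrt(d) * (2 ** (i + 1) - 1)

    return D

D = make_tree_metric(delta=16, d=2, seed=1)
pts = [(1, 1), (2, 1), (16, 16), (9, 4)]
```

Two points that are separated only at fine levels get a small tree distance. The random shift is what makes it unlikely that two close points are separated at a coarse level, which is where the bound on the expected distortion comes from.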
40
Estimator for the cost of the Euclidean minimum spanning tree (EMST) [Indyk, STOC'04]
- Write EMST for the cost of the EMST
- Write $\mathrm{MST}_D$ for the cost of the minimum spanning tree of the tree metric D
- $\mathbb{E}[\mathrm{MST}_D] = O(\log \Delta) \cdot \mathrm{EMST}$ (linearity of expectation)
- Use the cost of the MST of D as estimator
41
Observation [Indyk, STOC'04]
- The MST of $D(\cdot,\cdot)$ is given by the tree defining the tree metric
- #edges of length $2^i$ = #non-empty cells in the corresponding grid
[Figure: the points p, q, r, s, t in a grid and the tree edges of length $2^i$.]
42
Euclidean minimum spanning tree
1. Use $O(\log \Delta)$ nested grids G(i) with side length $2^i$
2. for each grid
3.     approximate |G(i)| := #nonempty cells in G(i) using an $F_0$ sketch
4. return $\sum_i 2^i \cdot |G(i)|$
Theorem [Indyk, STOC'04]
The above algorithm computes an $O(\log \Delta)$-approximation to the cost of the minimum spanning tree.
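The four steps above can be sketched as follows, with one deliberate substitution: instead of an $F_0$ sketch, each grid keeps exact per-cell counters, so this version does not use small space. The class name, the random integer shift, and the number of levels are assumptions of the example, and the normalization needed for the stated approximation guarantee is not reproduced.

```python
import random
from collections import Counter

class GridEMSTEstimator:
    def __init__(self, delta, d, seed=0):
        rng = random.Random(seed)
        self.shift = [rng.randrange(delta) for _ in range(d)]
        # Levels 0..top, where a single level-top cell holds all points.
        self.num_levels = (2 * delta - 1).bit_length() + 1
        # Exact counters stand in for the F_0 sketches (assumption).
        self.cells = [Counter() for _ in range(self.num_levels)]

    def _update(self, p, sign):
        for i, counts in enumerate(self.cells):
            key = tuple((x + s) >> i for x, s in zip(p, self.shift))
            counts[key] += sign
            if counts[key] == 0:
                del counts[key]   # cell became empty again

    def insert(self, p):
        self._update(p, +1)

    def delete(self, p):
        self._update(p, -1)

    def estimate(self):
        # Step 4: sum over all grids of 2^i * (#non-empty cells in G(i)).
        return sum((2 ** i) * len(counts) for i, counts in enumerate(self.cells))

est = GridEMSTEstimator(delta=8, d=2)
est.insert((1, 1))
one_point = est.estimate()
est.insert((8, 8))
est.delete((8, 8))
after_undo = est.estimate()
```

Because every update only increments or decrements cell counters, insertions and deletions are handled symmetrically. The same structure is what allows a distinct-elements sketch to replace the counters in the update-stream model.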
43
Results using a similar approach [Indyk, STOC'04]

Problem                  Approx. factor
Earth mover's distance   $O(\log \Delta)$
Facility location        $O(\log^2 \Delta)$
Matching                 $O(\log \Delta)$
k-Median                 $O(1)$; $1+\varepsilon$ with huge extraction time
44
Section: Streaming algorithms via estimating the distribution of local neighborhoods
Distribution of neighborhoods
- Grids G(i) as before
- R-neighborhood of C: cells within distance at most R from C
- $m_{C,R}(i)$ is the number of points in the i-th cell of the R-neighborhood of C
[Figure: a cell and its 2-neighborhood, with the 21 cells numbered 1 to 21 in rows of 3, 5, 5, 5, and 3.]
45
EMST estimator
- Define $Z_{C,R}(i) = (m_{C,R}(i) > 0)$, i.e. 1 if the i-th cell of the neighborhood is non-empty and 0 otherwise
- EMST can be approximated from the $Z_{C,R}(i)$
- The approximation ratio goes to 1 as R goes to $\infty$
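To make the indicator vector concrete, the sketch below computes $Z_{C,R}$ for one cell of a 2D grid. The inclusion rule `dx*dx + dy*dy <= R*R + 1` is an assumption picked only because it reproduces the 21-cell shape of the 2-neighborhood figure; the talk does not state the exact distance rule. Function names and the row-by-row cell order are likewise illustrative.

```python
def neighborhood_offsets(R):
    # Cell offsets of the R-neighborhood, listed row by row.
    # Assumed inclusion rule (see lead-in): dx^2 + dy^2 <= R^2 + 1.
    return [(dx, dy)
            for dy in range(-R, R + 1)
            for dx in range(-R, R + 1)
            if dx * dx + dy * dy <= R * R + 1]

def z_vector(points, side, cell, R):
    """Z_{C,R}(i) = 1 if the i-th cell of the R-neighborhood of C is non-empty."""
    occupied = {(x // side, y // side) for x, y in points}
    cx, cy = cell
    return tuple(int((cx + dx, cy + dy) in occupied)
                 for dx, dy in neighborhood_offsets(R))

# Hypothetical example: three points in a row, grid side 1, R = 2.
points = [(4, 4), (5, 4), (6, 4)]
z = z_vector(points, side=1, cell=(4, 4), R=2)
```

The vector only records which neighboring cells are occupied, not how many points they hold. The algorithm estimates how often each such 0/1 pattern occurs among the non-empty cells.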
46
EMST estimator
- K: size of the R-neighborhood
- The $Z_{C,R}$ are functions from $\{1,\dots,K\}$ to $\{0,1\}$
- A random (non-empty) cell C defines a distribution over neighborhoods, i.e. over functions $Z: \{1,\dots,K\} \to \{0,1\}$
- EMST can still be estimated from this distribution
47
Algorithm
- Sample a certain number of non-empty grid cells and maintain the number of points for each cell in their neighborhoods
- The sample gives an estimate of the distribution of the $Z_{C,R}(\cdot)$
- Obtain an estimate for EMST from the estimated distribution
Theorem [Frahling, Indyk, S., IJCGA'07]
Let $\varepsilon > 0$ and d be constants. The cost of a Euclidean minimum spanning tree of a point set in $\mathbb{R}^d$ given as an update stream can be estimated within a factor of $1+\varepsilon$ using $\mathrm{polylog}(\Delta)$ space.
48
Open problems
- $(1+\varepsilon)$-approximation for matching and/or earth mover's distance
- Other problems?
- The approach is not very well understood
- General characterization of problems solvable via approximation of the distribution of local neighborhoods
49
Section: Streaming algorithms via balanced partitions
Estimating the distribution [Frahling, S., STOC'05]
- Divide space into regions
- For each region maintain #points inside
- Balance the "error" among regions
- The notion of error depends on the problem
Example: 1-median in 1D
- Error: cell width × #points in cell
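For the 1-median example in 1D, the balancing idea can be illustrated offline: keep splitting an interval while its error, measured as cell width times number of points, exceeds a threshold. The function name, the halving rule, and the `threshold` parameter are assumptions of this sketch. It sees all points at once and is not the streaming algorithm from the cited paper.

```python
def balanced_partition(points, lo, hi, threshold):
    """Split [lo, hi) until width * #points <= threshold or the width is 1."""
    inside = [x for x in points if lo <= x < hi]
    width = hi - lo
    if width * len(inside) <= threshold or width <= 1:
        return [(lo, hi, len(inside))]
    mid = (lo + hi) // 2
    return (balanced_partition(inside, lo, mid, threshold)
            + balanced_partition(inside, mid, hi, threshold))

# Hypothetical example: six integer points in [0, 1024).
pts = [1, 2, 3, 50, 51, 900]
cells = balanced_partition(pts, 0, 1024, threshold=64)
```

Dense areas end up with narrow cells and sparse areas with wide ones, so every region contributes a similar amount of error. Replacing the points of a cell by one weighted representative then moves them by at most the cell width each.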
50
Small space?
- Problem dependent
- Need to show that a decomposition into few regions with sufficiently small error exists
51
One approach [Frahling, S., STOC'05]
- Nested grids G(i)
- For each grid maintain the cells intersected by a random sample (sample sizes differ between grids)
- Use the number of sample points inside a cell to estimate the number of points inside the cell
- Combine cells from different grids into a space decomposition
52
Works for
- k-median
- k-means
- MaxTSP, MaxMatching, Maximum spanning tree, Average distance, MaxCut
Why?
- Requires a proof for k-median and k-means
- The last 5 problems can be reduced to 1-median
53
Section: Streaming algorithms via approximation of balanced partitions
Approximating properties of balanced partitions [Lammersen, S., ESA'08]
- The previous approach may lead to many regions
- Example: facility location
- Can approximate properties of balanced partitions, e.g. #regions
- Only gives an approximation of the cost of the solution
- More details in Christiane's talk
54
Open problems
- Min-sum-k-clustering
- Other problems?
55
Section: Summary
(Some) techniques in geometric streaming:
- Merge & Reduce
- Embeddings into tree metrics
- Estimation of distribution of local neighborhoods
- Balanced partitions
- Approximating properties of balanced partitions
And lots of open problems to work on…
56
Thank you!