1
Coresets and Sketches for High Dimensional Subspace Approximation Problems Morteza Monemizadeh TU Dortmund Joint work with: D. Feldman, C. Sohler, D. Woodruff SODA 2010
2
Unbounded Precision Insertion-only Streaming: Head of stream Seen points Unseen points Input:
3
Subspace Problem Find a j-subspace F : Euclidean Distance
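The objective on this slide did not survive extraction. As a minimal sketch, assuming the cost of a j-subspace F is the sum of Euclidean distances from the input points to F, and assuming F passes through the origin and is given by an orthonormal basis, the cost could be evaluated as below. The names `dist_to_subspace` and `subspace_cost` are hypothetical and not taken from the talk.

```python
import math

def dist_to_subspace(p, basis):
    """Euclidean distance from point p to span(basis).

    Assumes `basis` is a list of orthonormal vectors (an assumption of
    this sketch), so the subspace passes through the origin.
    """
    sq_norm = sum(x * x for x in p)
    # Squared length of the projection of p onto span(basis).
    sq_proj = sum(sum(x * b for x, b in zip(p, vec)) ** 2 for vec in basis)
    # Clamp at 0 to absorb floating-point round-off.
    return math.sqrt(max(sq_norm - sq_proj, 0.0))

def subspace_cost(points, basis):
    """Sum of Euclidean distances from all points to span(basis)."""
    return sum(dist_to_subspace(p, basis) for p in points)

points = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0), (0.0, 5.0, 0.0)]
xy_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # a 2-subspace of R^3
total = subspace_cost(points, xy_plane)  # distances are 3, 1 and 0
```

With an empty basis (j = 0) the same function returns the sum of distances to the origin, which is the cost of the origin as a candidate 1-median center.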
4
Subspace Approximation Find a j-subspace such that PTAS:
5
Simple Cases 1-median PCA/SVD Machine Learning LSI, PageRank, HITS Collaborative Filtering, Recommendation Systems Clustering k-median
6
Simple Cases Linear regression Nonlinear regression Shape-fitting
7
Known Before Coresets (Har-Peled) Dynamic Programming (Arora, Mitchell) d = O(1): Low-dimensions d = O(n): High-dimensions Dimensionality Reduction (Indyk, Rabani, …) d = O(1): Low-dimensions
8
Simple PTAS PTAS: Centroid Set:
9
PTAS Weak Coreset: PTAS:
10
Tools Weak Coreset Centroid Set
11
Coreset Construction Assumptions: d=2, j=1 Fix a 1-subspace (line): Have a 1-subspace (line): GOAL:
12
1st Try Sampling u.a.r. or even non-uniformly:
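The slide's formulas are missing, so the following is only a sketch of what uniform sampling could look like in the d=2, j=1 setting: draw m points uniformly at random with replacement, weight each by n/m, and use the weighted sample to estimate the cost of a fixed line through the origin. The point set, the seed, and the function names are assumptions made for illustration. The example also shows one plausible reason such a first attempt can fall short: a single far-away point dominates the cost but is rarely sampled.

```python
import math
import random

def dist_to_line(p, direction):
    """Distance from 2-D point p to the line through the origin along `direction`."""
    dx, dy = direction
    # |cross product| / |direction| is the perpendicular distance.
    return abs(p[0] * dy - p[1] * dx) / math.hypot(dx, dy)

def exact_cost(points, direction):
    """Sum of distances from all points to the line."""
    return sum(dist_to_line(p, direction) for p in points)

def sampled_cost(points, direction, m, rng):
    """Estimate the cost from m points drawn u.a.r. with replacement.

    Each sampled point gets weight n/m, so the estimate is unbiased.
    """
    n = len(points)
    sample = [rng.choice(points) for _ in range(m)]
    return (n / m) * sum(dist_to_line(p, direction) for p in sample)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
# 999 points on the x-axis plus one far-away outlier.
points = [(float(i), 0.0) for i in range(999)] + [(0.0, 1000.0)]
x_axis = (1.0, 0.0)
exact = exact_cost(points, x_axis)               # only the outlier contributes
estimate = sampled_cost(points, x_axis, 20, rng)
```

Here the estimate is either 0 (the outlier was missed) or a multiple of 50000 (it was hit at least once), so no single run of 20 samples comes close to the exact cost of 1000.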
13
2nd Try
16
Chernoff Bounds
17
Recursion [figure; surviving labels: 0, (0,n)]
18
19
20
22
Strong Coreset Stream:
23
Centroid Set In time Centroid Set:
24
Centroid Set Construction
25
Bounded Precision Stream S: …, (i,j, -5), …, (i,j, +10), …: |S| = poly(n,M). Each update (i,j,c) adds c to entry A[i,j] of the n × d matrix A: A[i,j] ← A[i,j] - 5, then later A[i,j] ← A[i,j] + 10.
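To make the update model concrete, here is a small sketch that replays a stream of (i, j, delta) triples on an explicit n × d matrix. It assumes zero-based indices and a matrix that starts at zero, neither of which is stated on the slide, and `apply_stream` is a hypothetical helper. A real streaming algorithm would keep a small sketch of A rather than A itself; storing the full matrix here only illustrates what the stream encodes.

```python
def apply_stream(n, d, stream):
    """Replay a stream of (i, j, delta) updates on an n x d zero matrix."""
    A = [[0] * d for _ in range(n)]
    for i, j, delta in stream:
        # Each update adds delta (possibly negative) to a single entry.
        A[i][j] += delta
    return A

# Two updates hit the same entry: -5 and later +10, for a net change of +5.
stream = [(1, 2, -5), (0, 0, 3), (1, 2, +10)]
A = apply_stream(2, 3, stream)
```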
26
Bounded Precision 1-pass streaming algorithm Space: Time:
27
Open Problems Coreset size: Stream: PTAS: What other classes of Clustering?
28
Thanks!