
1 Fitting a Graph to Vector Data. Samuel I. Daitch (Yale), Jonathan A. Kelner (MIT), Daniel A. Spielman (Yale). Machine Learning Summer School, June 2009.

2 Given a collection of vectors x_1, …, x_n ∈ R^d, find a natural weighted graph on V = {1, …, n} that:
1. Is theoretically well-motivated
2. Helps solve ML problems on the vectors
3. Has interesting combinatorial properties
4. Is efficiently computable
5. Is sparse

3 Given a collection of vectors x_1, …, x_n ∈ R^d, find a natural weighted graph on V = {1, …, n} that:
1. Is theoretically well-motivated
2. Helps solve ML problems on the vectors
2′. Helps solve TCS problems on the vectors
3. Has interesting combinatorial properties
4. Is efficiently computable
5. Is sparse

4 Outline: standard graphs; motivation for ours; the Laplacian view; combinatorial properties (sparsity and planarity); use in classification, regression, and clustering; efficient computation; approximate sparsity; Lovász's graphs; open questions.

5 Standard ways to make graphs. Choose one from each column:
Choice of edges: k nearest neighbors, or neighbors if distance is less than a threshold δ.
Weights of edges: unweighted, or exponential weights exp(−‖x_i − x_j‖²/σ²) (the option varied over σ in the experiments later).
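
A minimal sketch of these two baseline constructions in plain numpy (function names are ours; the exponential-weight option is the one the classification slides later vary σ over):

```python
# Hedged sketch of the standard constructions: pick edges by k-NN or by a
# distance threshold delta, and weight them uniformly or exponentially.
import numpy as np

def knn_edges(X, k):
    """Symmetrized k-nearest-neighbor adjacency (0/1), n x n."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # no self-edges
    nbrs = np.argsort(D, axis=1)[:, :k]    # k closest points per row
    A = np.zeros_like(D)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)              # make the graph undirected

def threshold_edges(X, delta):
    """Adjacency (0/1): edge if distance < delta."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = (D < delta).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def exponential_weights(X, A, sigma):
    """Replace 0/1 edges by exp(-||x_i - x_j||^2 / sigma^2) weights."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return A * np.exp(-D2 / sigma**2)
```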

6 Our first proposal. Choose weights w_ij ≥ 0 to minimize
f(w) = Σ_i ‖ d_i x_i − Σ_j w_ij x_j ‖²,
where the weighted degrees d_i = Σ_j w_ij satisfy d_i ≥ 1 for every i.

7 Our first proposal. Choose weights w_ij ≥ 0 to minimize
f(w) = Σ_i ‖ d_i x_i − Σ_j w_ij x_j ‖²,
where the weighted degrees d_i = Σ_j w_ij satisfy d_i ≥ 1 for every i. (Figure: a small example graph with edge weights 0.67, 0.33, 0.16, 0.17.)
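
As a reference point, a short numpy sketch (our naming) of the proposed objective, with W a symmetric n × n weight matrix and X the n × d data matrix:

```python
# f(w) = sum_i || d_i x_i - sum_j w_ij x_j ||^2, with d_i = sum_j w_ij.
import numpy as np

def fit_objective(W, X):
    d = W.sum(axis=1)                   # weighted degrees d_i
    residual = d[:, None] * X - W @ X   # row i: d_i x_i - sum_j w_ij x_j
    return (residual ** 2).sum()
```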

8 Motivation. Consider the regression problem: solve for y_i given x_i. Given a weighted graph, the natural guess is ŷ_i = (1/d_i) Σ_j w_ij y_j. The sum of squares of errors when we leave out each i is Σ_i ( y_i − (1/d_i) Σ_j w_ij y_j )².

9–11 Motivation. Try to get the regression right on the coordinate vectors: the sum of squares of errors over the d coordinate vectors is
Σ_i ‖ x_i − (1/d_i) Σ_j w_ij x_j ‖².
Choose the edge weights to minimize this. But the 1/d_i factors make the problem non-convex, and…

12–16 Motivation, tricky example. Consider minimizing Σ_i ‖ x_i − (1/d_i) Σ_j w_ij x_j ‖². (Figures: a point configuration with unit weights on one group of edges and weight ε on the rest.) The objective just gets better as ε goes to 0, so the minimizer degenerates.

17 Motivation, fixing the bad example. Sum the squares of the errors, each scaled by the weighted degree:
f(w) = Σ_i ‖ d_i x_i − Σ_j w_ij x_j ‖².

18 Motivation, fixing the bad example. To avoid the degenerate solution w = 0, require either
d_i ≥ 1 for all i (hard graph), or
Σ_i max(0, 1 − d_i)² ≤ α n (α-soft graph).
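
A small companion sketch (our naming) of the two degree conditions, useful for checking a candidate W:

```python
# hard:       d_i >= 1 for every i
# alpha-soft: sum_i max(0, 1 - d_i)^2 <= alpha * n
import numpy as np

def is_hard(W, tol=1e-9):
    return np.all(W.sum(axis=1) >= 1 - tol)

def is_alpha_soft(W, alpha):
    d = W.sum(axis=1)
    return np.sum(np.maximum(0.0, 1.0 - d) ** 2) <= alpha * len(d)
```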

19 Example: random points in 2 dimensions (0.1-soft).

20 Example: two clusters (0.1-soft).

21 Example (0.1-soft). (Figure: graph with edge weights 0.21, 0.06, 0.37, 0.06, 0.45.)

22 Unique? Not necessarily: the graph is not unique on highly symmetric point sets (we will see why later). But it is almost always unique. Conjecture: unique with probability 1 after an infinitesimally small random perturbation.

23 Related work. Locally linear embeddings [Roweis-Saul '00]: choose allowable neighbors, then choose asymmetric weights to minimize the same objective function. K-means: if we restrict the graph to be a union of complete graphs, one for each cluster, we are minimizing the same objective function.

24 In terms of the graph Laplacian: f(w) = ‖ L X ‖_F², where L = D − A, A is the weighted adjacency matrix, and D is the diagonal matrix of weighted degrees. (Figure: the example graph on vertices 1–5 with edge weights 0.67, 0.33, 0.16, 0.17.)

25 In terms of the graph Laplacian: f(w) = ‖ L X ‖_F², where L = D − A, A is the weighted adjacency matrix, and D is the diagonal matrix of weighted degrees. (The rejected formulation was ‖ D⁻¹ L X ‖_F².) L is linear in the edge weights, so f(w) is quadratic in w.
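
A quick numerical check of the identity f(w) = ‖LX‖_F² against the sum-of-squares form above (random data; entirely our test, not from the slides):

```python
# Check: ||L X||_F^2 with L = D - A equals sum_i ||d_i x_i - sum_j w_ij x_j||^2.
import numpy as np

def laplacian(W):
    """L = D - A for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
W = np.abs(rng.standard_normal((6, 6)))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

L = laplacian(W)
direct = ((W.sum(axis=1)[:, None] * X - W @ X) ** 2).sum()
assert np.isclose(np.linalg.norm(L @ X, "fro") ** 2, direct)
```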

26 Sparse solutions. We can fix LX, or equivalently the difference vectors and the weighted degrees; this imposes only n(d+1) constraints on w. If w has more than n(d+1) non-zero entries, we can drive one of them to zero while holding all these constraints fixed and keeping w non-negative.

27 Sparsity. Theorem: for n vectors in d dimensions, there is always an optimal graph with at most (d+1)n edges, i.e., average degree 2(d+1). Theorem (later): for n vectors in any dimension, there is an ε-approximate solution with average degree O(1/ε²).

28 Degrees on real data:

Data set     Type         n     dim   ave deg (hard)   ave deg (soft)
abalone      regression   4177  8     13.1             12.7
glass        6 classes    214   9     8.9              8.6
heart        2 classes    270   13    11.1             11
housing      regression   506   13    8.8              10.1
ionosphere   2 classes    351   34    13               11.9
iris         3 classes    150   4     7                7
machine      regression   209   6     8.9              8
mpg          regression   392   7     7.6              8.8
pima         2 classes    768   8     11.2             10.9
sonar        2 classes    208   60    12.7             13
vehicle      4 classes    846   18    11.8             11.5
vowel        11 classes   990   10    10.4             10.1
wine         3 classes    178   13    9.1              9.9


30 Example for which graphs are not unique. Consider the set of vectors in {0,1}^d of even parity. The sum of any two optimal solutions is optimal, so we can sum over all isomorphisms of the point set; in the resulting symmetric solution, every vertex gets large degree. But by the previous theorem there is an optimal solution of average degree at most 2(d+1). One cannot have both symmetry and sparsity, so the solution is not unique.

31 Planarity. For every set of vectors in 2 dimensions, there is a hard and an α-soft graph that is planar: no crossings.

32 Planarity. For every set of vectors in 2 dimensions, there is a hard and an α-soft graph that is planar (no crossings), and with no point contained in a triangle. In general dimension, the graph is a simplicial complex.

33 Planarity, proof. Among optimal graphs, one maximizing the sum of the weighted degrees has no crossings. (Figure: given two crossing edges, increase the weights on the four outside edges (+) and decrease the weights on the crossing edges (−).) This fixes LX, and thus f(w), but increases all the weighted degrees.

34 Planarity, proof. (Figure: crossing edges (x_0, x_1) and (y_0, y_1) with crossing point z.) The rerouting fixes LX, and thus f(w), but increases all the weighted degrees.

35 Classification and regression experiments. Given a set S of vectors with labels y_i, compute the z minimizing z^T L z subject to z(i) = y_i for i ∈ S [ZGL '03]. With 2 classes, use an indicator vector for one of them. With k classes, use k indicator vectors.
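
A hedged sketch of the [ZGL '03] step: minimizing z^T L z with z fixed on S gives the harmonic system L_UU z_U = −L_US y_S on the unlabeled set U. Names and the dense solve are ours:

```python
# Harmonic label propagation: fix z on the labeled set S, solve for the rest.
import numpy as np

def harmonic_solution(L, labeled_idx, y_S):
    """L: n x n graph Laplacian; labeled_idx: indices of S; y_S: their labels."""
    n = L.shape[0]
    U = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled vertices
    z = np.zeros(n)
    z[labeled_idx] = y_S
    # First-order conditions of min z^T L z with z_S fixed:
    #   L_UU z_U = -L_US y_S  (L_UU invertible if every component touches S)
    z[U] = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, labeled_idx)] @ y_S)
    return z
```

For k classes, a natural decision rule (not spelled out on the slide) is to solve this for each of the k indicator vectors and assign each point to the class whose harmonic value is largest.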

36 Classification experiments. We used 10-fold cross-validation, repeated 100 times. Our method had no parameters to train. For the other ways of choosing graphs, we chose their parameters by gridding over choices and evaluating by leave-one-out cross-validation.

37 Classification experiments:

Data set     n    dim  classes  0.1-soft  knn    thresh
glass        214  9    6        28.3      26.92  33.3
heart        270  13   2        17.81     16.05  16.1
ionosphere   351  34   2        5.57      18.5   6.34
iris         150  4    3        4.21      4.46   6.2
pima         768  8    2        26.61     24.54  26.45
sonar        208  60   2        8.64      13.8   14.94
vehicle      846  18   4        22.47     27.7   29.98
vowel        990  10   11       0.95      2.62   0.98
wine         178  13   3        2.62      2.86   3.64

For knn and thresh we tried unweighted and exponential weights (varying σ), and varied k for knn and δ for the threshold.

38 Classification experiments:

Data set     n    dim  classes  0.1-soft  knn    thresh  libsvm
glass        214  9    6        28.3      26.92  33.3    31.44
heart        270  13   2        17.81     16.05  16.1    17.01
ionosphere   351  34   2        5.57      18.5   6.34    6.2
iris         150  4    3        4.21      4.46   6.2     3.87
pima         768  8    2        26.61     24.54  26.45   23.24
sonar        208  60   2        8.64      13.8   14.94   11.71
vehicle      846  18   4        22.47     27.7   29.98   14.87
vowel        990  10   11       0.95      2.62   0.98    0.64
wine         178  13   3        2.62      2.86   3.64    2.57

For libsvm [Chang-Lin], parameters were chosen by 10-fold CV on the training data.

39 Regression error:

Data set  n     dim  hard   0.1-soft  knn    thresh  epsilon-svr
abalone   4177  8    0.479  0.482     0.492  0.657   Too long
housing   506   13   0.136  0.138     0.224  0.507   0.138
machine   209   6    0.17   0.185     0.164  0.608   0.394
mpg       392   7    0.12   0.118     0.137  0.145   0.128

Data normalized to have variance 1. 2-fold CV on abalone, 10-fold otherwise.

40 Clustering experiments (0.1-soft). NJW: Ng-Jordan-Weiss, as implemented in Spider; use the graph, then run k-means on Nvec eigenvectors.

Data set     n    dim  k   k-means  NJW   Nvec = k
glass        214  9    6   0.41     0.43  0.45
heart        270  13   2   0.41     0.42  0.19
ionosphere   351  34   2   0.29     0.33  0.33
iris         150  4    3   0.11     0.33  0.17
pima         768  8    2   0.34     0.35  0.35
sonar        208  60   2   0.45     0.47  0.40
vehicle      846  18   4   0.55     0.62  0.56
vowel        990  10   11  0.66     0.76  0.65
wine         178  13   3   0.30     0.33  0.03

41 Clustering experiments (0.1-soft). NJW: Ng-Jordan-Weiss, as implemented in Spider; use the graph, then run k-means on Nvec eigenvectors.

Data set     n    dim  k   k-means  NJW   Nvec = k  Nvec chosen
glass        214  9    6   0.41     0.43  0.45      0.45 (12)
heart        270  13   2   0.41     0.42  0.19      0.19 (2)
ionosphere   351  34   2   0.29     0.33  0.33      0.09 (15)
iris         150  4    3   0.11     0.33  0.17      0.09 (8)
pima         768  8    2   0.34     0.35  0.35      0.35 (12)
sonar        208  60   2   0.45     0.47  0.40      0.35 (35)
vehicle      846  18   4   0.55     0.62  0.56      0.53 (6)
vowel        990  10   11  0.66     0.76  0.65      0.60 (30)
wine         178  13   3   0.30     0.33  0.03      0.03 (3)

42 Choosing the number of vectors for spectral clustering. Project X onto the Laplacian eigenvectors; take eigenvectors, ordered by λ, until all remaining coefficients are small. (Still fuzzy.) (Figure: projection size vs. eigenvector number, ordered by λ.)
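
A sketch of this heuristic with an explicit (and therefore somewhat arbitrary) cutoff rule, since the slide itself leaves it fuzzy:

```python
# Keep eigenvectors (ascending eigenvalue) through the last one onto which
# X still projects "big"; the relative cutoff is our guess.
import numpy as np

def choose_num_vectors(L, X, small=0.1):
    lam, V = np.linalg.eigh(L)                # eigenvalues ascending
    coeff = np.linalg.norm(V.T @ X, axis=1)   # projection size per eigenvector
    big = np.nonzero(coeff > small * coeff.max())[0]
    return big.max() + 1 if big.size else 1   # keep through the last big one
```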

43 Quadratic program for edge weights. Let U be the signed vertex-edge incidence matrix (V rows, E columns, entries ±1) and W the diagonal matrix of edge weights, so that L = U W U^T.

44 The objective function f(w) = ‖ L X ‖_F² is quadratic in w, the vector of edge weights: each column of LX is linear in w (column k is U diag(U^T x^(k)) w), so f(w) = ‖ M w ‖² for a matrix M built from U and X.

45 Soft graph program. Minimize ‖ M w ‖² subject to Σ_i max(0, 1 − d_i)² ≤ α n and w ≥ 0.

46 Soft graph program. Minimize ‖ M w ‖² subject to Σ_i max(0, 1 − d_i)² ≤ α n and w ≥ 0. Equivalently, minimize ‖ M w ‖² + μ Σ_i max(0, 1 − d_i)², searching over μ until the constraint is tight.

47 Soft graph program, as NNLSQ. Writing max(0, 1 − d_i)² = min over s_i ≥ 0 of (d_i − s_i − 1)², the penalized program becomes: minimize ‖ M w ‖² + μ ‖ B w − s − 1 ‖² over w, s ≥ 0, where B maps edge weights to weighted degrees. A non-negative least-squares problem.
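
A sketch of that NNLS formulation using scipy's solver. The stacking below follows our reconstruction of the slide's formulas (M, B, and the slack identity above), so treat it as illustrative rather than the authors' code:

```python
# min_{w,s >= 0} ||M w||^2 + mu * ||B w - s - 1||^2, as one NNLS system.
import numpy as np
from scipy.optimize import nnls

def soft_graph_nnls(U, X, mu):
    """U: n x m signed incidence matrix, X: n x d data, mu: penalty weight."""
    n, m = U.shape
    # M stacks U @ diag(U.T @ x_k) for each coordinate vector x_k, so that
    # the k-th column of L X equals M_k @ w.
    M = np.vstack([U @ np.diag(U.T @ X[:, k]) for k in range(X.shape[1])])
    B = np.abs(U)                     # (B @ w)_i = d_i, the weighted degree
    sqmu = np.sqrt(mu)
    # One NNLS system over the non-negative variable v = (w, s).
    A = np.block([[M, np.zeros((M.shape[0], n))],
                  [sqmu * B, -sqmu * np.eye(n)]])
    b = np.concatenate([np.zeros(M.shape[0]), sqmu * np.ones(n)])
    v, _ = nnls(A, b)
    return v[:m]                      # the edge weights w
```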

48 Computing the soft graph quickly. Minimize over a working subset of edges: given the solution restricted to a subset of edges, check the gradient of the objective function with respect to the unused edges; if all are non-negative, we are finished; otherwise, include some edges with negative gradient and solve again.
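
A generic active-set sketch of this idea for an NNLS system min ‖Av − b‖² over v ≥ 0 (batch size and initialization are our choices, not from the slides):

```python
# Grow a working set of columns; at a restricted optimum, columns whose
# gradient at zero is negative would improve the objective, so add them.
import numpy as np
from scipy.optimize import nnls

def active_set_nnls(A, b, batch=50, max_rounds=100):
    """min ||A v - b||^2 over v >= 0, solving on a growing column subset."""
    m = A.shape[1]
    S = list(range(min(batch, m)))            # initial working set
    for _ in range(max_rounds):
        vS, _ = nnls(A[:, S], b)
        v = np.zeros(m)
        v[S] = vS
        grad = 2 * A.T @ (A @ v - b)          # gradient of the objective
        violating = np.setdiff1d(np.nonzero(grad < -1e-9)[0], S)
        if violating.size == 0:
            return v                          # KKT conditions hold: done
        # Add the most negative-gradient columns and re-solve.
        S += list(violating[np.argsort(grad[violating])][:batch])
    return v
```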

49 Computation time (secs):

Data set     Type         n     dim  soft time  hard time
abalone      regression   4177  8    1582       49986
glass        6 classes    214   9    8          22
heart        2 classes    270   13   12         36
housing      regression   506   13   23         114
ionosphere   2 classes    351   34   72         148
iris         3 classes    150   4    4          17
machine      regression   209   6    11         26
mpg          regression   392   7    16         43
pima         2 classes    768   8    111        529
sonar        2 classes    208   60   38         50
vehicle      4 classes    846   18   159        889
vowel        11 classes   990   10   85         706
wine         3 classes    178   13   6          15

50 Approximate sparse solutions. For every graph G there exists a sparse graph G̃ with average degree at most 16/ε², so that for all x, x^T L̃ x approximates x^T L x to within a 1 ± ε factor. When the quadratic forms are not too small, the leave-one-out regression scores are similar.

51 Approximate sparse solutions. For every graph G there exists a sparse graph G̃ with average degree at most 16/ε², so that for all x, x^T L̃ x approximates x^T L x to within a 1 ± ε factor. We want the objective value ‖ L̃ X ‖_F² to be close to ‖ L X ‖_F².

52 Sparsification theorem [Batson-S-Srivastava '09]. For every graph G there exists a sparse graph G̃ with at most 4n/ε² edges such that for all x,
(1 − ε) x^T L x ≤ x^T L̃ x ≤ (1 + ε) x^T L x.

53 Sparsification theorem [Batson-S-Srivastava '09]: at most 4n/ε² edges, as above. Trade-offs between edges and running time: [S-Srivastava '08] gets O(n log n / ε²) edges in nearly-linear time; [S-Teng '04] gets O(n polylog n / ε²) edges, also in nearly-linear time.

54 Lovász's graphs. Form a matrix M = P − A, where P is positive diagonal and A is non-negative (an adjacency matrix), such that MX = 0 (in fact, null(M) = span(X)) and M has exactly one negative eigenvalue. This works if the points are the vertices of a convex polytope; A is non-zero only on the edges of the polytope. ("Steinitz representations and the Colin de Verdière number", 2000.)

55 Open questions:
– Are there other natural and useful graphs?
– Are there interesting ways to modify or parameterize our construction?
– Are our graphs connected?
– Are our graphs unique (under perturbation)?
– Do there exist sparse approximate solutions in all dimensions?
– Can one sparsify for the square of the Laplacian?
– How to interpolate off the graph, or add points?

