Approximation algorithms for large-scale kernel methods
Taher Dameh
School of Computing Science, Simon Fraser University
tdameh@cs.sfu.ca
March 29th, 2010
Joint work with F. Gao, M. Hefeeda and W. Abdel-Majeed
Outline
Introduction
Motivation
Locality Sensitive Hashing
Z and H Curves
Affinity Propagation
Results
Introduction
Kernel-based machine learning methods require O(N²) time and space to compute and store non-sparse Gram matrices. We are developing methods to approximate the Gram matrix with a band matrix.
[Figure: N points mapped to an N x N Gram matrix]
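To make the O(N²) cost concrete, here is a minimal NumPy sketch (not from the slides) that builds a full Gaussian (RBF) kernel Gram matrix; the function name rbf_gram_matrix and the gamma parameter are illustrative choices, not part of the original work.

```python
import numpy as np

def rbf_gram_matrix(X, gamma=1.0):
    """Full N x N Gaussian (RBF) kernel Gram matrix: O(N^2) time and space."""
    # Pairwise squared Euclidean distances via ||x-y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

# Example: 1,000 points in 10 dimensions -> a dense 1,000 x 1,000 matrix.
X = np.random.rand(1000, 10)
K = rbf_gram_matrix(X)
print(K.shape)  # (1000, 1000)
```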
Motivation
Exact vs. approximate answer: an approximate answer may be good enough and much faster.
Time-quality and memory trade-off.
From a machine learning point of view, we can live with a bounded (controlled) error as long as it lets us run on large-scale data that we cannot handle at all in the usual way because of the memory usage.
Ideas of approximation
To construct the approximated band matrix we evaluate the kernel function only within a fixed neighborhood around each point.
This low-rank method relies on the observation that the kernel is a Radial Basis Function (a real-valued function whose value depends only on the Euclidean distance between its inputs), so the eigen-spectrum of the Gram matrix decays quickly and most of the information is stored in the first few eigenvectors.
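As a hedged illustration of the spectrum-decay observation (not from the slides), the sketch below builds an RBF Gram matrix on random data and reports how much of the total spectrum the leading eigenvalues capture; the data, gamma, and the chosen values of k are all illustrative.

```python
import numpy as np

# Random data and an RBF Gram matrix (same construction as the sketch above).
X = np.random.rand(500, 5)
sq = np.sum(X ** 2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

# Eigenvalues of the symmetric Gram matrix, sorted in decreasing order.
eigvals = np.linalg.eigvalsh(K)[::-1]

# Fraction of the total spectrum captured by the leading k eigenvalues.
for k in (5, 20, 50):
    print(k, eigvals[:k].sum() / eigvals.sum())
```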
How to choose this neighborhood window?
Since the kernel function decreases monotonically with the Euclidean distance between the input points, we can compute the kernel function only between close points.
We need a fast and reliable technique to order the points:
Space-filling curves: Z-curve and H-curve (see the sketch below)
Locality Sensitive Hashing
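The slides do not show how the Z-curve ordering is computed; the following is a minimal, assumed sketch of a 2-D Morton (Z-order) key via bit interleaving. The helper names part1by1 and morton2d are hypothetical, and it assumes coordinates quantized to non-negative 16-bit integers; sorting points by this key orders them along the Z-curve, so nearby points tend to stay nearby in the ordering.

```python
def part1by1(x):
    """Spread the lower 16 bits of x so there is a zero bit between each bit."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x

def morton2d(x, y):
    """Z-order (Morton) key for 2-D integer coordinates."""
    return part1by1(x) | (part1by1(y) << 1)

# Sorting by the Morton key orders the points along the Z-curve.
points = [(3, 5), (3, 6), (10, 1), (2, 4)]
points.sort(key=lambda p: morton2d(*p))
```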
LSH: Motivation
Similarity search over large-scale, high-dimensional data.
Exact vs. approximate answer: an approximate answer may be good enough and much faster.
Time-quality trade-off.
LSH: Key idea
Hash the data points using several LSH functions so that the probability of collision is higher for closer objects.
Algorithm (a minimal sketch follows below):
Input:
  Set of N points {p_1, ..., p_n}
  L (number of hash tables)
Output:
  Hash tables T_i, i = 1, 2, ..., L
For each i = 1, 2, ..., L:
  Initialize T_i with a random hash function g_i(.)
For each i = 1, 2, ..., L:
  For each j = 1, 2, ..., N:
    Store point p_j in bucket g_i(p_j) of hash table T_i
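A minimal sketch of the table-building step above, assuming random-hyperplane sign hashes as the concrete LSH family (the slides do not fix a family); build_lsh_tables and its parameters L and k are illustrative names, not part of the original work.

```python
import numpy as np

def build_lsh_tables(points, L=10, k=8, seed=0):
    """Build L hash tables; table i stores point p_j in bucket g_i(p_j).

    Here g_i concatenates k random-hyperplane sign hashes, one concrete
    (assumed) choice of LSH family."""
    rng = np.random.default_rng(seed)
    dim = points.shape[1]
    projections = [rng.standard_normal((k, dim)) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for j, p in enumerate(points):
        for i in range(L):
            key = tuple((projections[i] @ p > 0).astype(int))  # g_i(p_j)
            tables[i].setdefault(key, []).append(j)
    return tables, projections

points = np.random.rand(1000, 10)
tables, projections = build_lsh_tables(points)
print(len(tables[0]))  # number of non-empty buckets in the first table
```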
LSH: Algorithm
[Figure: a point p_i is hashed by g_1(p_i), g_2(p_i), ..., g_L(p_i) into buckets of the hash tables T_1, T_2, ..., T_L]
LSH: Analysis
A family H of (r_1, r_2, p_1, p_2)-sensitive functions {h_i(.)} satisfies:
  if dist(p, q) < r_1 then Pr_H[h(q) = h(p)] ≥ p_1
  if dist(p, q) ≥ r_2 then Pr_H[h(q) = h(p)] ≤ p_2
  with p_1 > p_2 and r_1 < r_2
LSH functions: g_i(.) = (h_1(.), ..., h_k(.)) (a sketch of one such family follows below)
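For Euclidean distance, one commonly used sensitive family is the p-stable (Gaussian projection) hash h(v) = floor((a·v + b)/w). The sketch below is an assumed illustration of that family, not necessarily the one used in the original work; make_pstable_hash and the bucket width w are hypothetical names and parameters.

```python
import numpy as np

def make_pstable_hash(dim, w=4.0, seed=0):
    """One elementary hash h(v) = floor((a.v + b) / w), with a drawn from a
    Gaussian and b uniform in [0, w); closer points collide more often."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(dim)
    b = rng.uniform(0.0, w)
    return lambda v: int(np.floor((a @ v + b) / w))

h = make_pstable_hash(dim=10)
p = np.random.rand(10)
q = p + 0.01 * np.random.rand(10)  # a nearby point
print(h(p), h(q))                  # likely the same bucket
```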
Our approach
Given N points, hash the points using a family of LSH functions.
Compute the kernel function only between points in the same bucket (entries between points in different buckets are set to 0).
With a hash table of size m, in the best case we achieve O(N²/m) memory and computation. A minimal sketch follows below.
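A hedged sketch of this step: evaluate the RBF kernel only inside each bucket and leave cross-bucket entries implicitly zero. The function bucketed_kernel_blocks and the toy bucket assignment are illustrative; in practice the buckets would come from the LSH tables sketched earlier.

```python
import numpy as np

def bucketed_kernel_blocks(points, buckets, gamma=1.0):
    """Evaluate the RBF kernel only within each bucket.

    `buckets` maps a bucket key to the indices of the points hashed to it.
    Entries between points in different buckets are implicitly 0, so with m
    similarly sized buckets the memory is roughly O(N^2 / m)."""
    blocks = {}
    for key, idx in buckets.items():
        sub = points[np.asarray(idx)]
        sq = np.sum(sub ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * sub @ sub.T
        blocks[key] = np.exp(-gamma * d2)
    return blocks

# Toy bucket assignment standing in for real LSH output.
points = np.random.rand(1000, 10)
buckets = {0: list(range(0, 500)), 1: list(range(500, 1000))}
blocks = bucketed_kernel_blocks(points, buckets)
print(blocks[0].shape, blocks[1].shape)  # (500, 500) (500, 500)
```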
Validation Methods
Low level (matrix level):
  Frobenius norm (see the sketch below)
  Eigen-spectrum
High level (application level):
  Affinity Propagation
  Support Vector Machines
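For the matrix-level validation, a small assumed sketch comparing the Frobenius norm of the full Gram matrix with that of the within-bucket block approximation, in the spirit of the worked example on the next slide; frobenius_compare and the tiny test matrix are hypothetical.

```python
import numpy as np

def frobenius_compare(K_full, blocks):
    """Frobenius norm of the full Gram matrix vs. its block approximation.

    The approximation keeps only the within-bucket blocks, so its Frobenius
    norm is the square root of the summed squared block norms."""
    full = np.linalg.norm(K_full)
    approx = np.sqrt(sum(np.linalg.norm(B) ** 2 for B in blocks.values()))
    return full, approx

# Example with a tiny symmetric matrix and two blocks cut from its diagonal.
K_full = np.random.rand(6, 6)
K_full = (K_full + K_full.T) / 2
blocks = {0: K_full[:3, :3], 1: K_full[3:, 3:]}
print(frobenius_compare(K_full, blocks))
```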
Example
A 6x6 similarity matrix S(i,k) is built for points P0-P5. LSH hashes P0-P2 into bucket 0 and P3-P5 into bucket 1, so the kernel is evaluated only within each bucket, giving two 3x3 block matrices S_0(i,k) and S_1(i,k).
FrobNorm(S) = 230.469
FrobNorm([S0 S1]) = 217.853
Results: Memory Usage

Points    All data    Z512     Z1024    Lsh3000    Lsh5000
64 K      4 G         32 M     64 M     19 M       18 M
128 K     16 G        64 M     128 M    77 M       76 M
256 K     64 G        128 M    256 M    309 M      304 M
512 K     256 G       256 M    512 M    1244 M     1231 M
References
[1] M. Hussein and W. Abd-Almageed, "Efficient band approximation of Gram matrices for large scale kernel methods on GPUs," in Proc. of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SuperComputing'09), Portland, OR, November 2009.
[2] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," Communications of the ACM, vol. 51, no. 1, pp. 117-122, January 2008.
[3] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972-976, 2007.
AP: Motivation
No prior assumption about the number of clusters.
Independence from initialization.
Good performance achieved in reasonable processing time.
Affinity Propagation
Treat each data point as a node in a network.
Consider all data points as potential cluster centers (exemplars).
Start the clustering from the similarities between pairs of data points.
Exchange messages between data points until good cluster centers are found.
Terminology and Notation
Similarity s(i,k): single (non-accumulated) evidence that data point k should be the exemplar for data point i (the kernel function used for the Gram matrix).
Responsibility r(i,k): accumulated evidence that data point k should serve as the exemplar for data point i.
Availability a(i,k): accumulated evidence that data point i should pick data point k as its exemplar.
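The slides define the messages but leave the update equations to the figures; below is a hedged sketch of the standard dense responsibility/availability updates of Frey and Dueck [3] with damping, not the LSH-restricted variant of this work (which would only update within-bucket entries). The function name, iteration count, damping value, and the toy similarity construction are illustrative.

```python
import numpy as np

def affinity_propagation(S, iters=100, damping=0.5):
    """Minimal dense affinity propagation (standard Frey & Dueck updates).

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k, k] holds the exemplar preferences."""
    n = S.shape[0]
    R = np.zeros_like(S)
    A = np.zeros_like(S)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag_A = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag_A)
        A = damping * A + (1 - damping) * A_new

    # Each point i picks the k that maximises a(i,k) + r(i,k) as its exemplar.
    return np.argmax(A + R, axis=1)

# Tiny example: negative squared Euclidean distance as similarity,
# median similarity as the preference (a common default).
X = np.random.rand(50, 2)
sq = np.sum(X ** 2, axis=1)
S = -(sq[:, None] + sq[None, :] - 2.0 * X @ X.T)
np.fill_diagonal(S, np.median(S))
exemplars = affinity_propagation(S)
```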
Flow chart
[Figure: flow chart of the overall approach]