
1 Large Scale Distributed Distance Metric Learning by Pengtao Xie and Eric Xing PRESENTED BY: PRIYANKA

2

3 Which images are most similar?

4 centered / left / right: grouped by where the face is turned

5 male / female: grouped by gender

6 Distance Metric

7 Motivation-1 A distance metric is used in a variety of machine learning algorithms, such as: classification and regression (e.g. k-NN); clustering (e.g. k-means); recommendation systems; document/text retrieval ◦ find the fingerprints in a DB most similar to a given sample ◦ find the web pages most similar to a document/keywords; and nonlinear dimensionality reduction methods ◦ Isomap, Maximum Variance Unfolding, Laplacian Eigenmaps, etc.

8 Motivation-2 Euclidean distance doesn't work well with high-dimensional data, hence the shift to the Mahalanobis distance, parameterized by a matrix M. M is a d-by-d matrix; when the feature dimension d is huge, the size of M quickly becomes intractable for a single machine. E.g., if the data has d = 1 million features, then M contains 1 trillion parameters.

9 DML Problem

10 Why not shift to distributed platforms? DML requires a substantial redesign of the original algorithm, and an optimization-friendly parallel communication strategy is not supported by the bulk synchronous parallelism (BSP) model adopted by Hadoop and Spark. Moreover, BSP is costly for the frequent inter-machine synchronization needed to keep each machine's local view of M consistent with the others: in BSP systems like Hadoop/Spark, workers must wait for each other at the end of every iteration.
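
The cost of those per-iteration barriers can be illustrated with a toy timing simulation (the worker counts and per-iteration times below are hypothetical, not measurements from the paper):

```python
import random

random.seed(0)

# Simulated per-iteration compute times for 4 workers over 10 iterations.
times = [[random.uniform(1.0, 2.0) for _ in range(10)] for _ in range(4)]

# BSP: every iteration ends with a barrier, so each iteration costs as
# much as its slowest worker (stragglers stall everyone).
bsp_total = sum(max(times[w][i] for w in range(4)) for i in range(10))

# Asynchronous: each worker runs at its own pace with no per-iteration
# barrier; the job finishes when the slowest worker finishes its own work.
async_total = max(sum(times[w]) for w in range(4))

print(f"BSP:   {bsp_total:.2f}s")
print(f"async: {async_total:.2f}s")
```

Since the max of per-worker sums can never exceed the sum of per-iteration maxima, the asynchronous schedule is never slower in this model; the gap grows with straggler variance.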

11 Mahalanobis Distance M A Mahalanobis distance metric computes the distance between vectors x and y as: d_M(x, y) = sqrt((x − y)^T M (x − y)). Here x, y are d-dimensional feature vectors, and M (to be learned) is a positive semidefinite matrix of dimensions d × d. When M equals the identity matrix, the equation reduces to the Euclidean distance metric.
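
A minimal sketch of this metric using NumPy; the low-rank factorization M = L^T L at the end is the kind of parameterization that avoids materializing the full d × d matrix (the dimensions k and d here are toy values):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance under metric M: sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

# With M = I the metric reduces to the ordinary Euclidean distance.
assert np.isclose(mahalanobis(x, y, np.eye(3)), np.linalg.norm(x - y))

# A low-rank factorization M = L^T L (k << d) keeps M positive
# semidefinite by construction and shrinks storage from d*d to k*d.
k, dim = 2, 3
L = np.random.default_rng(0).standard_normal((k, dim))
M = L.T @ L
# Same distance, computed directly in the projected k-dimensional space.
assert np.isclose(mahalanobis(x, y, M), np.linalg.norm(L @ (x - y)))
```

The last assertion shows why the factorization helps: distances under M = L^T L equal Euclidean distances after projecting by L, so M never needs to be formed.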

12 Class-Equivalence Side information

13 Pull similarly labeled inputs together under the learned metric M

14 Push differently labeled inputs apart under the learned metric M

15 Solution proposed

16 Distance Metric Learning (Xing)

17 Xing 2002 (contd)

18 Reformulation of DML

19 Slack Variable and Hinge Loss A slack variable ξ is introduced to relax the constraints. Using the hinge loss, the constrained problem above is converted to an unconstrained one, which is optimized with stochastic gradient descent.
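
A sketch of one such hinge-loss SGD update, assuming a low-rank factor L (so M = L^T L) and a common form of the pairwise objective; the paper's exact constants, margin, and pair-sampling scheme may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, lr = 5, 2, 0.005

# Low-rank factor L (k x d); the learned metric is M = L^T L.
L = rng.standard_normal((k, dim)) * 0.1

def sq_dist(L, x, y):
    """Squared distance ||L (x - y)||^2 under the metric M = L^T L."""
    z = L @ (x - y)
    return float(z @ z)

def sgd_step(L, x, y, similar, margin=1.0):
    """One stochastic step on a hinge-loss pairwise objective:
    similar pair    -> pull together, loss = ||L (x - y)||^2
    dissimilar pair -> push apart,    loss = max(0, margin - ||L (x - y)||^2)
    """
    d = x - y
    if similar:
        grad = 2.0 * np.outer(L @ d, d)   # gradient of ||L d||^2 w.r.t. L
    elif sq_dist(L, x, y) < margin:       # hinge active: pair too close
        grad = -2.0 * np.outer(L @ d, d)
    else:
        grad = np.zeros_like(L)           # hinge inactive: no update
    return L - lr * grad

# A toy "similar" pair: repeated updates should shrink its distance.
x, y = rng.standard_normal(dim), rng.standard_normal(dim)
before = sq_dist(L, x, y)
for _ in range(50):
    L = sgd_step(L, x, y, similar=True)
print(before, "->", sq_dist(L, x, y))
```

Because the hinge makes the objective unconstrained, each worker can apply steps like this independently, which is what makes the asynchronous distributed setting workable.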

20 ARCHITECTURE

21

22 DATASETS USED Experiments on MNIST are done on machines with 16 CPUs and 64 GB of main memory each. Experiments on ImageNet-63K and ImageNet-1M are performed on machines with 64 CPUs and 128 GB of main memory each.

23 EXPERIMENTAL RESULTS (Convergence curves on MNIST Dataset)

24 EXPERIMENTAL RESULTS (Convergence curves on ImageNet-63K Dataset)

25 EXPERIMENTAL RESULTS (Convergence curves on ImageNet-1M Dataset)

26 EXPERIMENTAL RESULTS (Speedup on MNIST Dataset)

27 EXPERIMENTAL RESULTS (Speedup on ImageNet-63K Dataset)

28 EXPERIMENTAL RESULTS (Speedup on ImageNet-1M Dataset)

29 EXPERIMENTAL RESULTS (Average Precision vs Running time on MNIST)

30 EXPERIMENTAL RESULTS (Precision-recall curves on MNIST)

31 EXPERIMENTAL RESULTS (Precision-recall curves on ImageNet-1M)

32 CONCLUSIONS DML has been converted to an unconstrained optimization problem, which eases its parallelization. This is a better approach for DML on large data sets, since an eigendecomposition of M, of order O(d^3), no longer needs to be performed; the complexity is now O(dk). Asynchronous stochastic gradient descent is used to update the parameters. Strength: this technique solves the distance metric problem in linear time by making eigenvalue decomposition redundant.

33 My Thoughts The paper doesn't describe how it deals with worker failure: if a worker node fails, there is no way to recover its local value of L. It also doesn't address parameter server failure, and it does not explain how queue overflow is handled.

