Optimistic Concurrency Control for Distributed Learning
Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan
Machine Learning Algorithm
[Diagram: Data, Model Parameters]
Distributed Machine Learning
[Diagram: Data, Model Parameters]
Distributed Machine Learning
[Diagram: Data, Model Parameters]
Concurrency: more machines = less time
Correctness: serial equivalence
Coordination-free
[Diagram: Data, Model Parameters]
Mutual Exclusion
[Diagram: Data, Model Parameters]
[Figure: correctness versus concurrency, each axis running from low to high. Coordination-free and mutual exclusion are placed on the chart; Optimistic Concurrency Control is marked with a question mark.]
Optimistic Concurrency Control
[Diagram: Data, Model Parameters; labels: Concurrency, Correctness]
Optimistic updates
Validation: detect conflict
Resolution: fix conflict
Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226.
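The three steps above can be illustrated with a toy shared counter. This is a minimal sketch, not the method from the talk or from Kung and Robinson: `VersionedCell`, `occ_update`, and `run_demo` are hypothetical names, and resolution here is a plain retry, whereas the talk describes resolution as fixing the conflict.

```python
import threading


class VersionedCell:
    """A shared value with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Held only for the short read and validate-and-commit steps.
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, new_value, seen_version):
        """Validation: commit only if nobody else committed since our read."""
        with self._lock:
            if self.version != seen_version:
                return False  # conflict detected
            self.value = new_value
            self.version += 1
            return True


def occ_update(cell, fn):
    """Apply fn optimistically; return how many retries were needed."""
    retries = 0
    while True:
        value, version = cell.read()
        proposal = fn(value)  # optimistic work, done without holding the lock
        if cell.try_commit(proposal, version):
            return retries
        retries += 1  # resolution (assumed here): recompute and try again


def run_demo(n_threads=4, n_increments=500):
    cell = VersionedCell(0)

    def worker():
        for _ in range(n_increments):
            occ_update(cell, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value
```

Because a commit succeeds only when the version is unchanged, no increment is lost, so the final value matches what a serial execution would produce.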
Optimistic Concurrency Control Application: Clustering
Natural domain for parallelization
K-means: popular algorithm
  Fixed number of clusters: not fit for Big Data
Big Data solution: DP-means + OCC
Example
Example: K-means
Bad!
Example: DP-means
Correct clusters
Sequential!
Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
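To make the "Sequential!" point concrete, here is a minimal serial sketch of DP-means in the spirit of Kulis and Jordan. It is an illustration under assumptions, not the authors' code: the function names, the squared-distance threshold `lam`, and the choice to start from the global mean are made for this example. The key step is that a point farther than `lam` from every center opens a new cluster, so each point's decision depends on the clusters created by the points before it.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(data, lam, max_iters=100):
    """Serial DP-means sketch: returns (centers, assignments)."""
    centers = [mean(data)]  # assumed initialization: one cluster at the global mean
    assignments = [0] * len(data)
    for _ in range(max_iters):
        changed = False
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every center: open a new cluster at x.
                centers.append(x)
                k = len(centers) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True
        # Recompute each center as the mean of its members; empty clusters keep their center.
        for k in range(len(centers)):
            members = [x for x, a in zip(data, assignments) if a == k]
            if members:
                centers[k] = mean(members)
        if not changed:
            break
    # Drop clusters that ended up empty and renumber the rest.
    used = sorted(set(assignments))
    remap = {old: new for new, old in enumerate(used)}
    return [centers[k] for k in used], [remap[a] for a in assignments]
```

For example, `dp_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], lam=9)` finds two clusters without the number of clusters being fixed in advance.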
OCC DP-means
Validation
Resolution
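One way to read the validation and resolution steps for DP-means is sketched below. It is a simplified illustration, not the authors' implementation: the names are hypothetical, the workers are simulated one after another rather than run on separate machines, and only the assignment phase of a single epoch is shown (recomputing the means is omitted). Workers optimistically propose new centers against a snapshot; a serial validator accepts a proposal only if it is still farther than `lam` from every accepted center, and otherwise resolves the conflict by assigning the point to the nearest accepted center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    """Return (index, squared distance) of the closest center, or (None, inf) if there are none."""
    best_k, best_d = None, float("inf")
    for k, c in enumerate(centers):
        d = sq_dist(x, c)
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d


def worker_pass(block, snapshot, lam):
    """Optimistic step: assign points or propose new centers, with no coordination."""
    assigned, proposals = [], []
    for i, x in block:
        k, d = nearest(x, snapshot)
        if d > lam:
            proposals.append((i, x))  # optimistically propose x as a new center
        else:
            assigned.append((i, k))
    return assigned, proposals


def validate(proposals, centers, lam):
    """Serial validation: accept a proposal or resolve it against accepted centers."""
    resolved = []
    for i, x in proposals:
        k, d = nearest(x, centers)
        if d > lam:
            centers.append(x)  # no conflict: accept the new center
            k = len(centers) - 1
        # Otherwise a center accepted earlier already covers x (conflict);
        # resolution: assign x to that center instead.
        resolved.append((i, k))
    return resolved


def occ_assign_epoch(data, centers, lam, n_workers):
    """One assignment epoch; mutates centers and returns the assignments."""
    indexed = list(enumerate(data))
    blocks = [indexed[w::n_workers] for w in range(n_workers)]
    snapshot = list(centers)
    # In a real deployment these passes would run in parallel on separate machines.
    results = [worker_pass(block, snapshot, lam) for block in blocks]
    assignments = [None] * len(data)
    proposals = []
    for assigned, props in results:
        for i, k in assigned:
            assignments[i] = k
        proposals.extend(props)
    proposals.sort()  # fixed validation order (by point index)
    for i, k in validate(proposals, centers, lam):
        assignments[i] = k
    return assignments
```

Only the proposals pass through the serial validator, so the coordinated part of the work stays small whenever new clusters are rare.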
Evaluation: Amazon EC2
[Plot: OCC DP-means runtime versus projected linear scaling]
2x #machines ≈ ½x runtime
~140 million data points; 1, 2, 4, 8 machines
Optimistic Concurrency Control
High concurrency:
  Conflicts rare
  Validation easy
  Resolution cheap
OCCified algorithms:
  Online facility location
  BP-means: feature modeling
Ongoing:
  Stochastic gradient descent
  Collapsed Gibbs sampling
What can OCC do for you?
See poster session!
Optimistic Concurrency Control Big NIPS
Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, and Michael I. Jordan. Optimistic concurrency control for distributed unsupervised learning. ArXiv e-prints arXiv: , 2013.