Concurrency Control for Machine Learning Joseph E. Gonzalez Post-doc, UC Berkeley AMPLab jegonzal@eecs.berkeley.edu In Collaboration with Xinghao Pan, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan
Serial Machine Learning Algorithm Model Parameters Data
Parallel Machine Learning Model Parameters Data
Parallel Machine Learning ! ! Model Parameters Data Concurrency: more machines = less time Correctness: serial equivalence
Coordination-free Model Parameters Data
Concurrency Control Model Parameters Data
Serializability Model Parameters Data
Research Summary Coordination Free (e.g., Hogwild): Provably fast and correct under key assumptions. Concurrency Control (e.g., Mutual Exclusion): Provably correct and fast under key assumptions. Research Focus
? Optimistic Concurrency Control Mechanism for ensuring correctness Coordination- free Optimistic Concurrency Control ? High Conflicts are rare Low Mutual exclusion Stability & Correctness Low High
Optimistic Concurrency Control to parallelize: Non-Parametric Clustering and Sub-modular Maximization
Optimistic Concurrency Control ! ! Model Parameters Data Optimistic updates Validation: detect conflict Resolution: fix conflict Concurrency Correctness Hsiang-Tsung Kung and John T Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
Example: Serial DP-means Clustering Sequential! Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of 23rd International Conference on Machine Learning, 2012.
Example: OCC DP-means Clustering Assumption No new cluster created nearby Validation Resolution First proposal wins
Optimistic Concurrency Control for DP-means Theorem: OCC DP-means is serializable. Corollary: OCC DP-means preserves theoretical properties of DP-means. Theorem: Expected overhead of OCC DP-means, in terms of number of rejected proposals, does not depend on size of data set. Correctness Concurrency
~140 million data points; 1, 2, 4, 8 machines Evaluation: Amazon EC2 ~140 million data points; 1, 2, 4, 8 machines OCC DP-means Runtime Projected Linear Scaling
Sub-modular Maximization Summary Optimistic Concurrency Control to parallelize Non-Parametric Clustering Sub-modular Maximization Next
Motivating Example Bidding on Keywords: Keywords Common Queries Apple iPhone Android Games xBox Samsung Microwave Appliances Keywords “How big is Apple iPhone” “iPhone vs Android” “best Android and iPhone games” “Samsung sues Apple over iPhone” “Samsung Microwaves” “Appliance stores in SF” “Playing games on a Samsung TV” “xBox game of the year” Common Queries
Motivating Example Bidding on Keywords: Keywords Common Queries Apple iPhone Android Games xBox Samsung Microwave Appliances Keywords “How big is Apple iPhone” “iPhone vs Android” “best Android and iPhone games” “Samsung sues Apple over iPhone” “Samsung Microwaves” “Appliance stores in SF” “Playing games on a Samsung TV” “xBox game of the year” Common Queries Keywords Queries A 1 B 2 C 3 D 4 E 5 F 6 G 7 H 8
Motivating Example Bidding on Keywords: Keywords Queries $2 $5 $1 $4 A $3 $6 $5 $1 B 2 C 3 D 4 Costs Value E 5 F 6 G 7 H 8
Motivating Example $12 $5 $7 Bidding on Keywords: Revenue: - Cost: Queries $2 $5 $1 $4 A 1 $2 $4 $3 $6 $5 $1 $12 Revenue: Cover Purchase B 2 $5 - Cost: C 3 D 4 $7 Profit: Costs Value E 5 F 6 G 7 H 8
Motivating Example $12 $5 +1 $6 $7 Bidding on Keywords: Revenue: Queries $2 $5 $1 $4 A 1 $2 $4 $3 $6 $5 $1 $12 Cover Revenue: Purchase B 2 $5 +1 - Cost: Purchase C 3 D 4 $7 $6 Costs Value Profit: E 5 Submodularity = Diminishing Returns F 6 G 7 H 8
Motivating Example $20 $10 $10 Bidding on Keywords: Revenue: - Cost: Queries Purchase $20 $2 A 1 $2 Revenue: Purchase $5 B 2 $2 $10 - Cost: Purchase $1 C 3 $4 Purchase Costs Value $2 D 4 $4 $10 Profit: $5 E 5 $3 $1 F 6 $6 $4 G 7 $5 $2 H 8 $1
Motivating Example $20 +6 $10 - 4 $10 $20 Bidding on Keywords: Queries Purchase $20 +6 $2 A 1 $2 Revenue: $5 B 2 $2 $10 - 4 - Cost: Purchase $1 C 3 $4 Purchase Costs Value $2 D 4 $4 $10 $20 Profit: $5 E 5 $3 Purchase $1 F 6 $6 NP-Hard in General $4 G 7 $5 $2 H 8 $1
Submodular Maximization NP-Hard in General Buchbinder et al. [FOCS’12] proposed the double-greedy randomized algorithm which is provably optimal.
Double Greedy Algorithm Process keywords serially Set X Set Y Keywords Queries A 1 A A f( , X, Y ) = A B 2 B rand C 3 Add X Rem. Y 1 C D 4 D E 5 Keywords to purchase E F 6 F
Double Greedy Algorithm Process keywords serially Set X Set Y Keywords Queries A 1 A A f( , X, Y ) = B B 2 B rand C 3 Add X Rem. Y 1 C D 4 D E 5 Keywords to purchase E F 6 F
Double Greedy Algorithm Process keywords serially Set X Set Y Keywords Queries A 1 A A f( , X, Y ) = C B 2 rand C 3 Add X Rem. Y 1 C C D 4 D E 5 Keywords to purchase E F 6 F
Concurrency Control Double Greedy Algorithm Process keywords in parallel Set X Set Y Within each processor: Keywords Queries f( , Xbnd,Ybnd)= A A 1 A B 2 B Subset of true X Superset of true Y C 3 C Add X Rem. Y 1 Uncertainty D 4 D E 5 Keywords to purchase E F 6 F Sets X and Y are shared by all processors.
Concurrency Control Double Greedy Algorithm Process keywords in parallel Set X Set Y Within each processor: Keywords Queries f( , Xbnd,Ybnd)= A A 1 A A B 2 B Subset of true X Superset of true Y C 3 C rand rand Add X Rem. Y 1 Uncertainty D 4 D E 5 Keywords to purchase E Unsafe Must Validate F 6 F Safe Sets X and Y are shared by all processors.
Concurrency Control Double Greedy Algorithm System Design Implemented in multicore (shared memory): Model Server (Validator) Set X Set Y A C D E F Validation Queue Published Bounds (X,Y) Bound (X,Y) D Thread 1 f( , Xbnd,Ybnd)= Add X Rem. Y 1 D Uncertainty Trx. Add X D Bound (X,Y) E Thread 2 f( , Xbnd,Ybnd)= Add X Rem. Y 1 E Uncertainty Fail E
Provable Properties Theorem: CC double greedy is serializable. Corollary: CC double greedy preserves optimal approximation guarantee of ½OPT. Lemma: CC has bounded overhead. set cover with costs: 2τ sparse max cut: 2cτ/n Correctness Concurrency
Provable Properties – coord free? Theorem: CF double greedy is serializable. Lemma: CF double greedy achieves approximation guarantee of ½OPT – ¼ Lemma: CC has bounded overhead. set cover with costs: 2τ sparse max cut: 2cτ/n Correctness depends on uncertainty region similar order of CC overhead! Concurrency
Provable Properties – coord free? Theorem: CF double greedy is serializable. Lemma: CF double greedy achieves approximation guarantee of ½OPT – ¼ CF: no coordination overhead. Correctness depends on uncertainty region similar order of CC overhead! Concurrency
Early Results
Runtime and Strong-Scaling Concurrency Ctrl. Coordination Free IT-2004: Italian Web-graph (41M Vertices, 1.1B Edges) UK-2005: UK Web-graph (39M, 921M Edges) Arabic-2005: Arabic Web-graph (22M, 631M Edges)
Coordination and Guarantees Increase in Coordination Bad Decrease in Objective IT-2004: Italian Web-graph (41M Vertices, 1.1B Edges) UK-2005: UK Web-graph (39M, 921M Edges) Arabic-2005: Arabic Web-graph (22M, 631M Edges)
Summary New primitives for robust parallel algorithm design Exploit properties in ML algorithms Introduced parallel algorithms for: DP-Means Submodular Maximization Future Work: Integrate with Velox Model Server