
Optimistic Concurrency Control in the Design and Analysis of Parallel Learning Algorithms Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu

Serial Inference Model State Data

Parallel Inference Model State Processor 1 Data Processor 2

Parallel Inference Model State Processor 1 Data Processor 2. Correctness: serial equivalence. Concurrency: more machines = less time.

Coordination Free Parallel Inference Model State Processor 1 Data Processor 2. Correctness? Depends on assumptions ("Keep Calm and Carry On"). Concurrency: (almost) free.

Concurrency vs. Serial Correctness (chart): coordination-free approaches achieve high concurrency but low serial correctness.

Concurrency Control (chart): database mechanisms (mutual exclusion, optimistic CC) guarantee correctness and maximize concurrency, combining high serial correctness with high concurrency.

Optimistic Concurrency Control Model State Processor 1 Data Processor 2 Allow computation to proceed without blocking. Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems 1981

Optimistic Concurrency Control Model State ✔ ? Valid outcome Processor 1 Data Processor 2 Validate potential conflicts.

Optimistic Concurrency Control ✗ Invalid Outcome ✗ Model State Processor 1 Data Processor 2 Validate potential conflicts.

Optimistic Concurrency Control ✗ Rollback and Redo ✗ Model State Processor 1 Data Processor 2 Take a compensating action.

Optimistic Concurrency Control Rollback and Redo Model State Processor 1 Data Processor 2. Concurrency from non-blocking computation; correctness from validation (identify errors) and resolution (correct errors).
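The three ingredients on this slide (non-blocking computation, validation, resolution) can be sketched as a generic optimistic loop. This is a minimal sketch; the `State` class, its version counter, and the function names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the optimistic concurrency control pattern:
# compute without blocking, validate, and redo on conflict.
class State:
    """Toy versioned state; every committed write bumps the version."""
    def __init__(self):
        self.version = 0
        self.value = 0

    def apply(self, delta):
        self.value += delta
        self.version += 1

def run_optimistic(transaction, state, max_retries=10):
    for _ in range(max_retries):
        snapshot = state.version       # remember the version we read
        result = transaction(state)    # non-blocking computation
        if state.version == snapshot:  # validation: detect conflicting writes
            state.apply(result)        # commit
            return result
        # resolution: another writer interfered; roll back and redo
    raise RuntimeError("too many conflicts; giving up")
```

In the common case validation succeeds on the first try and the only cost is one version check; work is redone only when a conflict actually occurred.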

Optimistic Concurrency Control for Machine Learning Non-parametric Clustering Distributed DP-Means [NIPS’13] Submodular Optimization Double Greedy Submodular Maximization [NIPS’14]

Optimistic Concurrency Control for Submodular Maximization. Today's focus: submodular maximization, done in a scalable way while preserving the optimal approximation guarantee. Xinghao Pan, Stefanie Jegelka, Joseph Gonzalez, Joseph Bradley, Michael I. Jordan.

Submodular Set Functions. Diminishing returns property: F: 2^V → ℝ such that for all A ⊆ B ⊆ V and e ∈ V \ B, F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B).
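As a concrete instance of this definition, a coverage function satisfies diminishing returns; the sketch below exhaustively checks the property on a made-up toy example (`covers` is illustrative data, not from the talk).

```python
# Toy coverage function: each element "covers" some items, and
# F(S) is the number of items covered by S. Coverage functions
# are a standard example of submodular set functions.
import itertools

covers = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}

def F(S):
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)

V = set(covers)

def subsets(S):
    return map(set, itertools.chain.from_iterable(
        itertools.combinations(S, r) for r in range(len(S) + 1)))

# Exhaustively check diminishing returns over all nested pairs A ⊆ B.
for B in subsets(V):
    for A in subsets(B):
        for e in V - B:
            assert F(A | {e}) - F(A) >= F(B | {e}) - F(B)
print("coverage function satisfies diminishing returns")
```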

Submodular Examples: Sensing (Krause & Guestrin 2005); Network Analysis (Kempe, Kleinberg, Tardos 2003; Mossel & Roch 2007); Graph Algorithms: clustering, graph partitioning, max cut; Document Summarization (Lin & Bilmes 2011).

Submodular Maximization: max F(A) over A ⊆ V.

Sequential setting:
Monotone (increasing) functions [positive marginal gains]: Greedy (Nemhauser et al., 1978), a (1 − 1/e)-approximation, optimal in polynomial time.
Non-monotone functions (important, e.g., when costs are involved): Double Greedy (Buchbinder et al., 2012), a 1/2-approximation, optimal in polynomial time.
Greedy is inherently sequential: each decision depends on the elements already seen, so parallelization is non-trivial.

Parallel / distributed setting:
GreeDi (Mirzasoleiman et al., 2013): (1 − 1/e)² / p approximation, 1 MapReduce round.
(Kumar et al., 2013): 1/(2+ε) approximation, O(1/ε) MapReduce rounds.
Concurrency Control Double Greedy: optimal 1/2-approximation, bounded overhead.
Coordination-Free Double Greedy: bounded error, minimal overhead.

Double Greedy Algorithm. Maintain set A, initialized empty, and an additional set B, initialized to the full ground set. Sequentially process each element, greedily deciding to either add it to A or remove it from B.

Double Greedy Algorithm. For each element u, compute the marginal gain of adding it to A and of removing it from B: Δ+(u|A) = F(A ∪ {u}) − F(A) and Δ-(u|B) = F(B \ {u}) − F(B). Then make a randomized decision with preference proportional to the two marginal gains: draw a uniform random number, and if it falls below [Δ+(u|A)]+ / ([Δ+(u|A)]+ + [Δ-(u|B)]+) (the positive parts of the gains), add u to A; otherwise remove u from B.

Double Greedy Algorithm. Once every element has been processed, A = B; return A.
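The sequential procedure above can be sketched as follows; this is a minimal sketch of double greedy (Buchbinder et al., 2012), and the toy cut function in the usage example is illustrative, not from the talk.

```python
# Sequential double greedy for unconstrained submodular maximization.
import random

def double_greedy(F, V, seed=0):
    rng = random.Random(seed)
    A, B = set(), set(V)                  # A starts empty, B starts full
    for u in V:
        gain_add = F(A | {u}) - F(A)      # Δ+(u|A)
        gain_rem = F(B - {u}) - F(B)      # Δ-(u|B)
        a, b = max(gain_add, 0.0), max(gain_rem, 0.0)
        # Add u to A with probability a / (a + b); otherwise remove it from B.
        if a + b == 0 or rng.random() < a / (a + b):
            A.add(u)
        else:
            B.remove(u)
    return A                              # A == B at termination

# Toy usage: maximize the cut function of a 4-cycle (non-monotone submodular).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
A = double_greedy(cut, [0, 1, 2, 3])
print("selected:", A, "cut value:", cut(A))
```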

Parallel Double Greedy Algorithm. Elements are partitioned across processors (e.g., CPU 1 processes u, v, w; CPU 2 processes x, y, z).

Parallel Double Greedy Algorithm. The problem: with CPU 1 and CPU 2 processing elements concurrently, the marginal gains Δ+(u | ?) and Δ-(u | ?) are undefined, since each thread does not know which elements the other threads have added to A or removed from B.

Concurrency Control Double Greedy. Maintain bounds on A and B to enable threads to make decisions locally.

Concurrency Control Double Greedy. The bounds on A and B yield interval bounds on each element's marginal gains, Δ+(u|A) ∈ [Δ+min(u|A), Δ+max(u|A)] and Δ-(u|B) ∈ [Δ-min(u|B), Δ-max(u|B)], which induce an uncertainty region in the randomized decision. If the drawn random number falls outside the uncertainty region, the thread decides locally (common, fast). If it falls inside, the thread must compute the exact gains Δ+(u|A) = F(A ∪ {u}) − F(A) and Δ-(u|B) = F(B \ {u}) − F(B) before deciding (rare, slow).
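The fast-path/slow-path logic above can be sketched for a single element. This is a sketch under the assumption that interval bounds on the marginal gains are available; the `decide` function and its arguments are illustrative, not the paper's code.

```python
# Fast/slow path decision for one element, given interval bounds on its
# marginal gains and a pre-drawn uniform random number `rand`.
def decide(rand, add_lo, add_hi, rem_lo, rem_hi, exact_gains):
    def threshold(gain_add, gain_rem):
        # Probability of adding to A, as in sequential double greedy.
        a, b = max(gain_add, 0.0), max(gain_rem, 0.0)
        return 1.0 if a + b == 0 else a / (a + b)

    p_lo = threshold(add_lo, rem_hi)   # most pessimistic add-probability
    p_hi = threshold(add_hi, rem_lo)   # most optimistic add-probability
    if rand < p_lo:
        return "add"      # fast path: add to A whatever the true gains are
    if rand >= p_hi:
        return "remove"   # fast path: remove from B whatever the true gains are
    # Slow path (rare): rand falls in the uncertainty region, so block
    # and compute the exact marginal gains before deciding.
    gain_add, gain_rem = exact_gains()
    return "add" if rand < threshold(gain_add, gain_rem) else "remove"
```

Because the decision threshold is monotone in both gains, evaluating it at the two extreme corners of the interval box brackets the true probability, so a random draw outside that bracket is safe to commit locally.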

Properties of CC Double Greedy. Correctness: Theorem: CC double greedy is serializable. Corollary: CC double greedy preserves the optimal 1/2 OPT approximation guarantee. Concurrency: Lemma: CC has bounded overhead; the expected number of blocked elements is < 2τ for set cover with costs and < 2τ |E| / |V| for sparse max cut.

Change in Analysis. Coordination Free: provably fast (easy scalability proof) and correct under key assumptions (challenging correctness proof). Concurrency Control: provably correct (easy correctness proof) and fast under key assumptions (challenging scalability proof).

Empirical Validation. Multicore, up to 16 threads; set cover and max graph cut; real and synthetic graphs.

Graph       | Description        | Vertices   | Edges
IT-2004     | Italian web-graph  | 41 Million | 1.1 Billion
UK-2005     | UK web-graph       | 39 Million | 0.9 Billion
Arabic-2005 | Arabic web-graph   | 22 Million | 0.6 Billion
Friendster  | Social sub-network | 10 Million |
Erdos-Renyi | Synthetic random   | 20 Million | 2.0 Billion
ZigZag      | Synthetic expander | 25 Million |

CC Double Greedy Coordination (chart: % of elements failed + blocked as coordination increases). Only 0.01% of elements need blocking!

Runtime and Strong Scaling (charts comparing Concurrency Control against Sequential).

Conclusion. Scalability vs. approximation quality: Sequential Double Greedy is always slow but always optimal; Concurrency Control Double Greedy is usually fast and always optimal; Coordination-Free Double Greedy is always fast but only near optimal. These algorithms were developed in the context of broader research on applying concurrency control to sequential machine learning algorithms: concurrency control guarantees an outcome equivalent to the sequential execution while providing scalability. Paper @ NIPS 2014: Parallel Double Greedy Submodular Maximization.

Backup Slides

Concurrency Control vs. Coordination Free (backup). Concurrency Control: Theorem: serializable; preserves the optimal 1/2 OPT approximation bound (correctness). Lemma: bounded overhead for set cover with costs and sparse max cut, arising from the same dependencies (concurrency). Coordination Free: Lemma: approximation bound of 1/2 OPT minus an error term, arising from the same dependencies (the uncertainty region); no coordination overhead.

Adversarial Setting. Processing order and overlapping covers lead to increased coordination.

Adversarial Setting: Ring Set Cover (chart: faster vs. larger error). Coordination Free: always fast, possibly wrong. Concurrency Control: possibly slow, always optimal.