
GraphLab: A New Parallel Framework for Machine Learning
Carnegie Mellon University
Based on slides by Joseph Gonzalez and Mosharaf Chowdhury

The Need for a New Abstraction

Existing abstractions split along two lines:
– Data-Parallel (MapReduce): cross validation, feature extraction, computing sufficient statistics.
– Graph-Parallel (Pregel / Giraph): belief propagation, SVMs, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, Lasso.

GraphLab aims to support:
1. Sparse computational dependencies
2. Asynchronous iterative computation
3. Sequential consistency
4. Prioritized ordering
5. Rapid development

The GraphLab Framework
– Graph-based data representation
– Update functions (user computation)
– Scheduler
– Consistency model

Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

Example (social network graph):
– Vertex data: user profile text, current interest estimates
– Edge data: similarity weights
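A data graph like this can be modeled with ordinary containers. The following is a minimal, self-contained C++ sketch for illustration only; it is not the GraphLab API, and the names (VertexData, EdgeData, DataGraph) are assumptions made for this example.

// Minimal sketch of a GraphLab-style data graph: arbitrary C++ objects on
// vertices and edges. Illustrative only; not the actual GraphLab API.
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct VertexData {                     // per-user data
    std::string profile_text;           // raw profile text
    std::vector<double> interests;      // current interest estimates
};

struct EdgeData {                       // per-friendship data
    double similarity;                  // similarity weight
};

struct Edge {
    std::size_t source, target;
    EdgeData data;
};

class DataGraph {
public:
    std::size_t add_vertex(VertexData v) {
        vertices_.push_back(std::move(v));
        incident_.emplace_back();
        return vertices_.size() - 1;
    }
    void add_edge(std::size_t u, std::size_t v, EdgeData e) {
        edges_.push_back({u, v, e});
        incident_[u].push_back(edges_.size() - 1);
        incident_[v].push_back(edges_.size() - 1);
    }
    VertexData& vertex(std::size_t i) { return vertices_[i]; }
    Edge&       edge(std::size_t e)   { return edges_[e]; }
    const std::vector<std::size_t>& incident(std::size_t i) const {
        return incident_[i];            // ids of edges touching vertex i
    }
private:
    std::vector<VertexData> vertices_;
    std::vector<Edge> edges_;
    std::vector<std::vector<std::size_t>> incident_;
};

int main() {
    DataGraph g;
    std::size_t alice = g.add_vertex({"likes hiking", {0.9, 0.1}});
    std::size_t bob   = g.add_vertex({"likes films",  {0.2, 0.8}});
    g.add_edge(alice, bob, {0.5});      // similarity weight on the edge
    std::cout << g.incident(alice).size() << " edge(s) at alice\n";
}

The per-vertex incident-edge list matters because, on the next slide, update functions only ever read and write data reachable through a vertex and its adjacent edges (the vertex's scope).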

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of that vertex.

    label_prop(i, scope) {
        // Get neighborhood data
        (Likes[i], W_ij, Likes[j]) ← scope;

        // Update the vertex data
        Likes[i] ← Σ_j W_ij × Likes[j];

        // Reschedule neighbors if needed
        if Likes[i] changes then
            reschedule_neighbors_of(i);
    }
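To make the mechanics concrete, here is a runnable C++ sketch of a label-propagation-style update on a simplified graph representation (a vector of per-vertex scores plus weighted neighbor lists). The names, the weighted-average update, and the convergence threshold are illustrative assumptions, not the GraphLab API; the driver loop at the bottom plays the role of a simple FIFO scheduler.

// Minimal sketch of the label_prop update function on a simplified graph:
// likes[i] is recomputed from the neighbors' values, and neighbors are
// rescheduled only if the value changed noticeably.
// Illustrative only; not the actual GraphLab API.
#include <cmath>
#include <cstddef>
#include <deque>
#include <iostream>
#include <utility>
#include <vector>

using Neighbor = std::pair<std::size_t, double>;   // (neighbor id, weight W_ij)

void label_prop(std::size_t i,
                std::vector<double>& likes,
                const std::vector<std::vector<Neighbor>>& nbrs,
                std::deque<std::size_t>& scheduler) {
    // Gather neighborhood data (the "scope" of vertex i).
    double total = 0.0, weight = 0.0;
    for (const auto& [j, w_ij] : nbrs[i]) {
        total  += w_ij * likes[j];
        weight += w_ij;
    }
    // Update the vertex data (weighted average; an illustrative choice).
    double new_value = (weight > 0.0) ? total / weight : likes[i];
    // Reschedule neighbors only if the value actually changed.
    if (std::fabs(new_value - likes[i]) > 1e-6) {
        likes[i] = new_value;
        for (const auto& [j, w_ij] : nbrs[i]) scheduler.push_back(j);
    }
}

int main() {
    // Tiny 3-vertex chain 0 - 1 - 2 with unit edge weights.
    std::vector<std::vector<Neighbor>> nbrs = {
        {{1, 1.0}}, {{0, 1.0}, {2, 1.0}}, {{1, 1.0}}};
    std::vector<double> likes = {1.0, 0.5, 0.0};
    std::deque<std::size_t> scheduler = {0, 1, 2};
    while (!scheduler.empty()) {                 // simple FIFO execution
        std::size_t v = scheduler.front();
        scheduler.pop_front();
        label_prop(v, likes, nbrs, scheduler);
    }
    std::cout << likes[0] << " " << likes[1] << " " << likes[2] << "\n";
}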

The Scheduler

The scheduler determines the order in which vertices are updated. Worker CPUs repeatedly pull the next vertex from the scheduler and apply its update function, which may in turn schedule other vertices. The process repeats until the scheduler is empty.

(Figure: two CPUs, CPU 1 and CPU 2, pulling vertex update tasks from a shared scheduler queue.)
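The scheduler is a pluggable component, and prioritized ordering (item 4 in the earlier list) is one of the supported policies. Below is a minimal, illustrative C++ sketch of a priority scheduler, not the GraphLab implementation: each vertex is kept at its highest requested priority, and stale heap entries are skipped lazily.

// Minimal sketch of a priority scheduler: each vertex appears at most once,
// with the highest priority it has been scheduled with.
// Illustrative only; not the actual GraphLab API.
#include <cstddef>
#include <iostream>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

class PriorityScheduler {
public:
    // Schedule a vertex, keeping only its highest requested priority.
    void add_task(std::size_t vertex, double priority) {
        auto it = best_.find(vertex);
        if (it != best_.end() && it->second >= priority) return;
        best_[vertex] = priority;
        heap_.push({priority, vertex});
    }
    // Pop the highest-priority vertex; returns false when the scheduler is empty.
    bool pop_task(std::size_t& vertex) {
        while (!heap_.empty()) {
            auto [prio, v] = heap_.top();
            heap_.pop();
            auto it = best_.find(v);
            if (it != best_.end() && it->second == prio) {  // entry is not stale
                best_.erase(it);
                vertex = v;
                return true;
            }
        }
        return false;
    }
private:
    std::priority_queue<std::pair<double, std::size_t>> heap_;
    std::unordered_map<std::size_t, double> best_;
};

int main() {
    PriorityScheduler sched;
    sched.add_task(3, 0.1);
    sched.add_task(7, 0.9);
    sched.add_task(3, 0.5);            // reschedule vertex 3 with a higher priority
    for (std::size_t v; sched.pop_task(v); )
        std::cout << "update vertex " << v << "\n";   // prints 7, then 3
}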

Sequential Consistency Models
– Full consistency: the update function gets exclusive read/write access to the vertex, its adjacent edges, and its neighboring vertices.
– Edge consistency: read/write access to the vertex and its adjacent edges, but only read access to neighboring vertices.
Consistency is enforced by acquiring locks on the scope in a canonical lock ordering.

(Figure: read/write regions of the scope under each model.)
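One common way to realize such a canonical lock ordering is to lock the vertices of a scope in ascending vertex-id order, so no two workers can deadlock. The sketch below is an illustrative C++ fragment under that assumption, not the GraphLab implementation.

// Minimal sketch of canonical lock ordering: a worker locks the scope of a
// vertex (the vertex plus its neighbors) in ascending vertex-id order before
// running an update, so concurrent workers can never deadlock.
// Illustrative only; not the actual GraphLab implementation.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

struct Graph {
    std::vector<std::vector<std::size_t>> neighbors;   // adjacency lists
    std::vector<double> data;                           // per-vertex data
    std::vector<std::mutex> locks;                      // one lock per vertex
    explicit Graph(std::size_t n) : neighbors(n), data(n, 0.0), locks(n) {}
};

// Run `update` on vertex v with its full scope locked.
void locked_update(Graph& g, std::size_t v,
                   const std::function<void(Graph&, std::size_t)>& update) {
    // Collect the scope: v and its neighbors, sorted into canonical order.
    std::vector<std::size_t> scope = g.neighbors[v];
    scope.push_back(v);
    std::sort(scope.begin(), scope.end());
    scope.erase(std::unique(scope.begin(), scope.end()), scope.end());

    for (std::size_t u : scope) g.locks[u].lock();      // acquire in order
    update(g, v);                                        // safe: scope is held
    for (auto it = scope.rbegin(); it != scope.rend(); ++it)
        g.locks[*it].unlock();                           // release in reverse
}

int main() {
    Graph g(3);
    g.neighbors = {{1}, {0, 2}, {1}};                    // chain 0 - 1 - 2
    auto bump = [](Graph& gr, std::size_t v) {           // toy update on vertex v
        for (std::size_t u : gr.neighbors[v]) gr.data[v] += gr.data[u] + 1.0;
    };
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) locked_update(g, 0, bump); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) locked_update(g, 2, bump); });
    t1.join(); t2.join();
    std::cout << g.data[0] << " " << g.data[2] << "\n";
}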

Consistency Through Scheduling

Edge consistency model:
– Two vertices can be updated simultaneously if they do not share an edge.
Graph coloring:
– Two vertices can be assigned the same color if they do not share an edge.
So: color the graph, then update all vertices of one color in parallel, with a barrier between color phases (Phase 1, Barrier, Phase 2, Barrier, Phase 3); a sketch of this color-phased execution follows.
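A minimal, self-contained C++ sketch of this color-phased execution, assuming a greedy coloring and using thread joins as the barrier between phases. It is illustrative only, not the GraphLab engine. Because vertices of the same color never share an edge, updating one color class in parallel satisfies the edge consistency model.

// Minimal sketch of consistency through scheduling: greedily color the graph,
// then update each color class in parallel, with a barrier (thread join)
// between color phases. Illustrative only; not the actual GraphLab engine.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// Greedy coloring: give each vertex the smallest color unused by its neighbors.
std::vector<int> greedy_color(const std::vector<std::vector<std::size_t>>& nbrs) {
    std::vector<int> color(nbrs.size(), -1);
    for (std::size_t v = 0; v < nbrs.size(); ++v) {
        std::vector<bool> used(nbrs.size(), false);
        for (std::size_t u : nbrs[v])
            if (color[u] >= 0) used[color[u]] = true;
        int c = 0;
        while (used[c]) ++c;
        color[v] = c;
    }
    return color;
}

int main() {
    // 4-cycle 0-1-2-3-0; a proper coloring needs two colors.
    std::vector<std::vector<std::size_t>> nbrs = {{1, 3}, {0, 2}, {1, 3}, {0, 2}};
    std::vector<double> data = {1.0, 2.0, 3.0, 4.0};
    std::vector<int> color = greedy_color(nbrs);
    int num_colors = 1 + *std::max_element(color.begin(), color.end());

    for (int phase = 0; phase < num_colors; ++phase) {     // one phase per color
        std::vector<std::thread> workers;                   // one thread per vertex, for clarity
        for (std::size_t v = 0; v < nbrs.size(); ++v) {
            if (color[v] != phase) continue;
            workers.emplace_back([&, v] {                   // toy update of vertex v
                double sum = 0.0;
                for (std::size_t u : nbrs[v]) sum += data[u];
                data[v] = 0.5 * data[v] + 0.5 * sum / nbrs[v].size();
            });
        }
        for (auto& w : workers) w.join();                   // barrier between phases
    }
    for (double x : data) std::cout << x << " ";
    std::cout << "\n";
}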

Algorithms Implemented
– PageRank
– Loopy Belief Propagation
– Gibbs Sampling
– CoEM
– Graphical Model Parameter Learning
– Probabilistic Matrix/Tensor Factorization
– Alternating Least Squares
– Lasso with Sparse Features
– Support Vector Machines with Sparse Features
– Label Propagation
– …

The Table