Distributed Computation Framework for Machine Learning

Yuxin Su, Big Data Group, Apr. 25, 2014

Popular Frameworks
Hadoop / MapReduce: data-oriented model; unfriendly to iterative algorithms.
GraphLab: dependence-oriented model.
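To illustrate why a data-oriented model fits iterative algorithms poorly, here is a minimal pure-Python sketch (not Hadoop code) that emulates the MapReduce pattern for gradient descent on a one-parameter least-squares fit. Every iteration becomes a separate map pass plus reduce, so the driver rescans all data and rebroadcasts the parameter each time. The helper names `map_phase`, `reduce_phase`, `run_job`, and `fit` are hypothetical.

```python
from functools import reduce

def map_phase(records, w):
    # Each mapper emits the partial gradient of (w*x - y)^2 / 2
    # for one record (x, y), given the broadcast parameter w.
    return [(w * x - y) * x for x, y in records]

def reduce_phase(partials):
    # The reducer sums the partial gradients into one global gradient.
    return reduce(lambda a, b: a + b, partials, 0.0)

def run_job(records, w, lr):
    # One full "job": scan all data, aggregate, update w.
    grad = reduce_phase(map_phase(records, w))
    return w - lr * grad / len(records)

def fit(records, iters=100, lr=0.1):
    w = 0.0
    for _ in range(iters):
        # A new job (full data scan plus global barrier) per iteration;
        # nothing is cached between jobs except the scalar w.
        w = run_job(records, w, lr)
    return w

data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0]]
print(round(fit(data), 3))  # prints 3.0
```

In a real cluster each `run_job` call would also pay job-startup and disk I/O costs, which is the overhead that iteration-aware frameworks try to avoid.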

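I'm focusing on Petuum, a new extension of Bulk Synchronous Parallel (BSP) execution. Because iterative ML algorithms tolerate small errors in intermediate parameter values, Petuum relaxes BSP's strict per-iteration barrier to reduce communication demand.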
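The slide does not name the mechanism; Petuum's published design uses Stale Synchronous Parallel (SSP), where a worker may run ahead of the slowest worker by at most a staleness bound s, and s = 0 reduces to BSP. Below is a minimal single-process sketch of that admission rule only; the class `SSPClock` and its methods are hypothetical and are not Petuum's API.

```python
class SSPClock:
    """Hypothetical bookkeeping for a bounded-staleness barrier."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers  # iterations completed per worker
        self.staleness = staleness       # s = 0 behaves like BSP

    def tick(self, worker):
        # Worker finished one iteration (and pushed its updates).
        self.clocks[worker] += 1

    def can_proceed(self, worker):
        # A worker at clock c may start its next iteration only if the
        # slowest worker has reached at least clock c - s.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

clock = SSPClock(num_workers=2, staleness=1)
clock.tick(0)
print(clock.can_proceed(0))  # True: 1 clock ahead, within s = 1
clock.tick(0)
print(clock.can_proceed(0))  # False: 2 clocks ahead, must wait for worker 1
```

The design choice is a trade-off: a larger s lets fast workers avoid waiting and batch communication, at the cost of computing with staler parameters.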

Non-negative Matrix Factorization
A commonly used algorithm: approximate a nonnegative matrix V by the product WH of two low-rank factors with W ≥ 0, H ≥ 0.
It is hard to scale up.
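In its most common form, NMF fits the factors by minimizing a Frobenius-norm loss; the slide does not state which loss is used, so this objective is an assumption. For a nonnegative m × n matrix V and target rank k:

```latex
\min_{W \in \mathbb{R}_{\ge 0}^{m \times k},\; H \in \mathbb{R}_{\ge 0}^{k \times n}}
  \; f(W, H) = \tfrac{1}{2}\,\lVert V - WH \rVert_F^2,
  \qquad k \ll \min(m, n)
```

The problem is nonconvex in (W, H) jointly but convex in either factor with the other fixed, which is why alternating and coordinate-wise schemes are the usual solvers.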

NMF on Petuum
My goal: find an efficient approach to handle huge matrices with billions of entries.
Current work:
Design parallel coordinate descent to solve NMF.
Analyze the convergence of my approach on Petuum.
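The slide names parallel coordinate descent but gives no details. As a point of reference, here is a minimal sequential sketch of a standard coordinate-descent scheme for the Frobenius-loss NMF objective (hierarchical alternating least squares, HALS), assuming NumPy; it is not the author's parallel method, and the function name `hals_nmf` is hypothetical. Each column of W (and each row of H) has a closed-form minimizer clipped at zero.

```python
import numpy as np

def hals_nmf(V, k, iters=200, eps=1e-12, seed=0):
    """Sketch: coordinate descent (HALS) for
    min 0.5 * ||V - W H||_F^2  subject to  W >= 0, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Update each column of W with H fixed.
        VHt = V @ H.T   # m x k
        HHt = H @ H.T   # k x k
        for j in range(k):
            step = (VHt[:, j] - W @ HHt[:, j]) / max(HHt[j, j], eps)
            W[:, j] = np.maximum(W[:, j] + step, 0.0)
        # Symmetric update for each row of H with W fixed.
        WtV = W.T @ V   # k x n
        WtW = W.T @ W   # k x k
        for j in range(k):
            step = (WtV[j, :] - WtW[j, :] @ H) / max(WtW[j, j], eps)
            H[j, :] = np.maximum(H[j, :] + step, 0.0)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((30, 4)) @ rng.random((4, 20))  # nonnegative, rank 4
W, H = hals_nmf(V, k=4)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error
```

One natural parallel split (an assumption, not necessarily the author's design): row i of W depends only on row i of V and the shared k × k matrix HHᵀ, so rows of W can be partitioned across workers, and likewise columns of H given WᵀW. Only these small k × k matrices, plus the partial products needed to form them, must be exchanged between passes, which is where a bounded-staleness scheme could reduce communication.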