Distributed Computation Framework for Machine Learning

Distributed Computation Framework for Machine Learning
Yuxin Su, Big Data Group
Apr. 25, 2014

Popular Frameworks
- Hadoop / MapReduce: data-oriented model; unfriendly to iterative algorithms
- GraphLab: dependence-oriented model

I’m focusing on Petuum
- A new extension of Bulk Synchronous Parallel (BSP)
- Error tolerance for reducing communication demand
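The slide does not name the mechanism behind the BSP extension. Petuum's published design relaxes BSP through bounded staleness (stale synchronous parallel), so the toy simulation below assumes that model. The names `may_advance` and `simulate`, and the per-tick "speed" abstraction, are hypothetical and are not part of any Petuum API.

```python
def may_advance(worker_clock, all_clocks, staleness):
    """True if the worker is at most `staleness` clocks ahead of the slowest worker."""
    return worker_clock - min(all_clocks) <= staleness


def simulate(speeds, staleness, ticks):
    """Toy model: in each tick, worker i tries to advance its clock speeds[i] times.

    Returns the final clocks and how many attempts were blocked by the
    staleness bound. staleness=0 behaves like a BSP barrier.
    """
    clocks = [0] * len(speeds)
    blocked = 0
    for _tick in range(ticks):
        for i, speed in enumerate(speeds):
            for _attempt in range(speed):
                if may_advance(clocks[i], clocks, staleness):
                    clocks[i] += 1
                else:
                    blocked += 1  # the worker must wait for the slowest one
    return clocks, blocked


if __name__ == "__main__":
    print(simulate([2, 1], staleness=0, ticks=3))
    print(simulate([2, 1], staleness=2, ticks=3))
```

With a fast and a slow worker, a larger staleness bound lets the fast worker keep computing instead of waiting at every barrier, which is the sense in which tolerating bounded error reduces synchronization and communication demand.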

Non-negative Matrix Factorization
- A commonly used algorithm
- It’s hard to scale up
- Constraints: W ≥ 0, H ≥ 0
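Only the constraints survive in the transcript. For reference, a common way to state the problem is the Frobenius-norm formulation below. The data matrix symbol V, the dimensions m, n, k, and the choice of squared loss are assumptions here; the original slide may have used a different objective or notation.

```latex
\min_{W \ge 0,\; H \ge 0} \; \frac{1}{2}\,\lVert V - W H \rVert_F^2,
\qquad V \in \mathbb{R}^{m \times n},\quad
W \in \mathbb{R}^{m \times k},\quad
H \in \mathbb{R}^{k \times n},\quad k \ll \min(m, n)
```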

NMF on Petuum
- My goal: find an efficient approach to handle huge matrices with billions of items
- Current work:
  - Design a parallel coordinate descent method to solve NMF
  - Analyze the convergence of my approach on Petuum
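The slides do not give the coordinate descent scheme itself. As a rough single-machine illustration only, the sketch below uses a standard block coordinate descent (HALS-style) update for the squared-error objective on a data matrix V ≈ WH. It is not the author's parallel design, and the function name, the symbol V, and all parameters are assumptions.

```python
import numpy as np


def nmf_coordinate_descent(V, rank, n_iters=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (m x n) as W (m x rank) @ H (rank x n).

    Each inner step minimizes the squared error over one row of H or one
    column of W with everything else fixed, then clips at zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iters):
        # Update the rows of H with W fixed.
        WtV = W.T @ V
        WtW = W.T @ W
        for j in range(rank):
            residual = WtV[j] - WtW[j] @ H  # uses the rows of H updated so far
            H[j] = np.maximum(0.0, H[j] + residual / (WtW[j, j] + eps))
        # Update the columns of W with H fixed.
        VHt = V @ H.T
        HHt = H @ H.T
        for j in range(rank):
            residual = VHt[:, j] - W @ HHt[:, j]
            W[:, j] = np.maximum(0.0, W[:, j] + residual / (HHt[j, j] + eps))
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((6, 2)) @ rng.random((2, 5))  # small synthetic rank-2 matrix
    W, H = nmf_coordinate_descent(V, rank=2)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

With W fixed, the columns of H can be solved independently of one another (and likewise the rows of W with H fixed), which is one reason this family of updates is a natural candidate for partitioning a huge matrix across workers.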