1 Machine Learning with Apache Hama Tommaso Teofili tommaso [at] apache [dot] org

2 About me ASF member having fun with: Lucene / Solr, Hama, UIMA, Stanbol, … some others. Software engineer at Adobe R&D

3 Agenda Apache Hama and BSP, why machine learning on BSP, some examples, benchmarks

4 Apache Hama Bulk Synchronous Parallel computing framework on top of HDFS for massive scientific computations. TLP since May 2012, new release out soon. Growing community

5 BSP supersteps A BSP algorithm is composed of a sequence of “supersteps”

6 BSP supersteps Each task runs: Superstep 1: do some computation, communicate with other tasks, synchronize. Superstep 2: do some computation, communicate with other tasks, synchronize. … Superstep N: do some computation, communicate with other tasks, synchronize. (A sketch of the pattern follows.)
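As a generic illustration of that pattern (placeholders only, not Hama API; N and the method bodies are made up):

    // Sketch of the generic BSP superstep pattern; compute /
    // communicate / synchronizeAll are placeholders, not Hama API.
    public class SuperstepPattern {
      static final int N = 3; // number of supersteps (example value)

      public static void main(String[] args) {
        for (int s = 1; s <= N; s++) {
          compute(s);       // local computation on this task's data
          communicate(s);   // send messages to the other tasks
          synchronizeAll(); // global barrier: superstep s ends here
        }
      }

      static void compute(int s) { System.out.println("superstep " + s + ": compute"); }
      static void communicate(int s) { /* placeholder: message passing */ }
      static void synchronizeAll() { /* placeholder: barrier */ }
    }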

7 Why BSP Simple programming model: superstep semantics are easy to reason about. Preserves data locality, which improves performance. Well suited for iterative algorithms

8 Apache Hama architecture BSP Program execution flow

9 Apache Hama architecture

10 Apache Hama Features BSP API, M/R-like I/O API, Graph API, job management / monitoring, checkpoint recovery, local & (pseudo) distributed run modes, pluggable message transfer architecture, YARN supported, running on Apache Whirr

11 Apache Hama BSP API public abstract class BSP<K1, V1, K2, V2, M extends Writable> … K1, V1 are the key / value types for inputs, K2, V2 are the key / value types for outputs, M is the type of the messages used for task communication

12 Apache Hama BSP API public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws.. public void setup(BSPPeer<K1, V1, K2, V2, M> peer) throws.. public void cleanup(BSPPeer<K1, V1, K2, V2, M> peer) throws..
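Putting slides 11-12 together, a minimal sketch of a complete BSP program (assuming Hama's 0.5-era API; HelloBSP is a made-up class name): every peer greets all peers in superstep 1, then writes the received greetings in superstep 2.

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.bsp.BSP;
    import org.apache.hama.bsp.BSPPeer;
    import org.apache.hama.bsp.sync.SyncException;

    public class HelloBSP
        extends BSP<NullWritable, NullWritable, Text, NullWritable, Text> {

      @Override
      public void bsp(
          BSPPeer<NullWritable, NullWritable, Text, NullWritable, Text> peer)
          throws IOException, SyncException, InterruptedException {
        // Superstep 1: computation + communication
        for (String other : peer.getAllPeerNames()) {
          peer.send(other, new Text("hello from " + peer.getPeerName()));
        }
        peer.sync(); // barrier: ends the superstep and delivers messages

        // Superstep 2: consume the delivered messages and emit output
        Text msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          peer.write(msg, NullWritable.get());
        }
      }
    }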

13 Machine learning on BSP Lots (most?) of ML algorithms are inherently iterative. The Hama ML module currently includes: collaborative filtering, clustering, gradient descent

14 Benchmarking architecture (diagram of the setup: HDFS, Solr / Lucene, DBMS, Hama and Mahout nodes)

15 Collaborative filtering Given user preferences on movies, we want to find users “near” to some specific user, so that the user can “follow” them and/or see what they like (which he/she could like too)

16 Collaborative filtering BSP Given a specific user, iteratively (for each task): Superstep 1*i: read a new user preference row; find how near that user is to the current user, i.e. how near their preferences are; since preferences are given as vectors we can use vector distance measures (Euclidean, cosine, etc.); broadcast the measure output to the other peers. Superstep 2*i: aggregate the measure outputs; update the most relevant users. Still to be committed (HAMA-612)

17 Collaborative filtering BSP Given user ratings about movies:
"john" -> 0, 0, 0, 9.5, 4.5, 9.5, 8
"paula" -> 7, 3, 8, 2, 8.5, 0, 0
"jim" -> 4, 5, 0, 5, 8, 0, 1.5
"tom" -> 9, 4, 9, 1, 5, 0, 8
"timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
We ask for the 2 nearest users to “paula” and we get “timothy” and “tom” (user recommendation). We can then extract highly rated movies of “timothy” and “tom” that “paula” didn’t see (item recommendation).
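As a plain-Java illustration of the distance step (not the Hama BSP code; Euclidean distance chosen for concreteness), using the ratings above; it prints “timothy” and “tom”, matching the slide:

    import java.util.Comparator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class NearestUsers {
      public static void main(String[] args) {
        Map<String, double[]> ratings = new LinkedHashMap<>();
        ratings.put("john",    new double[] {0, 0, 0, 9.5, 4.5, 9.5, 8});
        ratings.put("jim",     new double[] {4, 5, 0, 5, 8, 0, 1.5});
        ratings.put("tom",     new double[] {9, 4, 9, 1, 5, 0, 8});
        ratings.put("timothy", new double[] {7, 3, 5.5, 0, 9.5, 6.5, 0});
        double[] paula = {7, 3, 8, 2, 8.5, 0, 0};

        // rank the other users by distance to paula, keep the 2 nearest
        ratings.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> euclidean(paula, e.getValue())))
            .limit(2)
            .forEach(e -> System.out.println(e.getKey())); // timothy, tom
      }

      // Euclidean distance between two preference vectors
      static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
      }
    }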

18 Benchmarks Fairly simple algorithm, highly iterative. Compared to Apache Mahout: behaves better than ALS-WR, behaves similarly to RecommenderJob and ItemSimilarityJob

19 K-Means clustering We have a bunch of data (e.g. documents). We want to group those docs in k homogeneous clusters. Iteratively: assign each doc to the cluster with the nearest center, then recalculate each cluster center as the mean of its assigned docs

20 K-Means clustering

21 K-Means clustering BSP Iteratively: Superstep 1*i (assignment phase): read the vector splits; sum the assigned vectors into temporary centers; broadcast each sum and the count of ingested vectors. Superstep 2*i (update phase): calculate the total sum over all received messages and average it; replace the old centers with the new ones and check for convergence. (A sequential sketch of the per-iteration logic follows.)
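As mentioned above, a sequential sketch of the per-iteration logic that the two supersteps distribute (data, k and the iteration count are made up; in the BSP version each task would compute partial sums over its split and broadcast them before the averaging):

    import java.util.Arrays;

    public class KMeansSketch {
      public static void main(String[] args) {
        double[][] docs = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9.5}, {1, 0.5}};
        double[][] centers = {{0, 0}, {10, 10}}; // k = 2 initial centers

        for (int iter = 0; iter < 10; iter++) {
          // "assignment phase": nearest center per vector, accumulate sums/counts
          double[][] sums = new double[centers.length][docs[0].length];
          int[] counts = new int[centers.length];
          for (double[] doc : docs) {
            int nearest = 0;
            for (int c = 1; c < centers.length; c++) {
              if (dist(doc, centers[c]) < dist(doc, centers[nearest])) nearest = c;
            }
            for (int d = 0; d < doc.length; d++) sums[nearest][d] += doc[d];
            counts[nearest]++;
          }
          // "update phase": average the aggregated sums into the new centers
          for (int c = 0; c < centers.length; c++) {
            if (counts[c] > 0) {
              for (int d = 0; d < centers[c].length; d++) {
                centers[c][d] = sums[c][d] / counts[c];
              }
            }
          }
        }
        System.out.println(Arrays.deepToString(centers));
      }

      // squared Euclidean distance (enough for nearest-center comparisons)
      static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
      }
    }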

22 Benchmarks One-rack cluster (16 nodes, 256 cores), 10G network. On average faster than Mahout’s implementation

23 Gradient descent Optimization algorithm to find a (local) minimum of some function. Used for solving linear systems, solving non-linear systems, and in machine learning tasks: linear regression, logistic regression, neural networks backpropagation, …

24 Gradient descent Minimize a given (cost) function: give the function a starting point (a set of parameters), iteratively change the parameters in order to minimize the function, stop at the (local) minimum. There’s some math, but intuitively: evaluate the derivatives at the current point in order to choose where to “go” next
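In symbols, the standard update (with learning rate \alpha) repeated until convergence is:

    \theta_j \leftarrow \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{simultaneously for every parameter } \theta_j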

25 Gradient descent BSP Iteratively: Superstep 1*i: each task calculates and broadcasts its portion of the cost function with the current parameters. Superstep 2*i: aggregate and update the cost function; check the aggregated cost and the iteration count (the cost should always decrease). Superstep 3*i: each task calculates and broadcasts its portions of the (partial) derivatives. Superstep 4*i: aggregate the portions and update the parameters. (A sketch of the cost-aggregation supersteps follows.)
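A hedged sketch of supersteps 1-2 (cost aggregation) against Hama's API; supersteps 3-4 follow the same broadcast-then-aggregate shape, and localCost() is a made-up hook for the cost of this task's input split:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hama.bsp.BSP;
    import org.apache.hama.bsp.BSPPeer;
    import org.apache.hama.bsp.sync.SyncException;

    public abstract class GradientDescentCostStep extends
        BSP<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> {

      @Override
      public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable,
          NullWritable, DoubleWritable> peer)
          throws IOException, SyncException, InterruptedException {
        // Superstep 1: broadcast this task's portion of the cost
        double portion = localCost();
        for (String other : peer.getAllPeerNames()) {
          peer.send(other, new DoubleWritable(portion));
        }
        peer.sync();

        // Superstep 2: aggregate all portions into the global cost
        double cost = 0;
        DoubleWritable msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          cost += msg.get();
        }
        // here one would check that the cost decreased vs. the previous iteration
      }

      /** Made-up hook: cost of the current parameters on this task's split. */
      protected abstract double localCost();
    }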

26 Gradient descent BSP Simplistic example: linear regression. Given a real estate market dataset, estimate new houses’ prices from known houses’ sizes, geographic regions and prices. Expected output: the actual parameters for the (linear) prediction function
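For reference, the standard hypothesis and squared-error cost for this setup (one model per region; x is the house size, y the price, m the number of training vectors):

    h_\theta(x) = \theta_0 + \theta_1 x
    J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2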

27 Gradient descent BSP Generate a different model for each region. House item vectors map price to size (e.g. 150k -> 80). 2-dimensional space, ~1.3M vectors dataset

28 Gradient descent BSP Dataset and model fit

29 Gradient descent BSP Cost checking

30 Gradient descent BSP Classification: logistic regression with gradient descent. Real estate market dataset: we want to find which estate listings belong to agencies, to avoid buying from them. Same algorithm, with a different cost function and different features: existing items are tagged (or not) as “belonging to agency”, and vectors are created from the items’ text. Sample vector 1 ->
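For reference, the standard logistic regression hypothesis and cost that replace the linear ones above (y^{(i)} ∈ {0, 1} being the “belongs to agency” tag):

    h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}
    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]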

31 Gradient descent BSP Classification

32 Benchmarks Not directly comparable to Mahout’s regression algorithms: both SGD and CGD are inherently better than plain GD. Still, Hama GD had on average the same performance as Mahout’s SGD / CGD. Next step: implementing SGD / CGD on top of Hama

33 Wrap up Even if the ML module is still “young” / work in progress, and tools like Apache Mahout have better “coverage”, Apache Hama can be particularly useful in certain “highly iterative” use cases. Interesting benchmarks so far

34 Thanks!