1 Machine Learning with Apache Hama Tommaso Teofili tommaso [at] apache [dot] org.

2 About me
- ASF member having fun with Lucene / Solr, Hama, UIMA, Stanbol and some others
- SW engineer @ Adobe R&D

3 Agenda
- Apache Hama and BSP
- Why machine learning on BSP
- Some examples
- Benchmarks

4 Apache Hama
- Bulk Synchronous Parallel computing framework on top of HDFS for massive scientific computations
- TLP since May 2012
- 0.6.0 release out soon
- Growing community

5 BSP supersteps
- A BSP algorithm is composed of a sequence of “supersteps”

6 BSP supersteps
Each task runs:
- Superstep 1: do some computation, communicate with other tasks, synchronize
- Superstep 2: do some computation, communicate with other tasks, synchronize
- …
- Superstep N: do some computation, communicate with other tasks, synchronize
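The compute / communicate / synchronize cycle can be sketched in plain Java. This is a single-process simulation under stated assumptions, not the Hama API: the class `SuperstepSketch`, the `run` method and the sample values are hypothetical, and the "computation" (averaging the received values) is only a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-process simulation of BSP supersteps (hypothetical names, not the Hama API). */
final class SuperstepSketch {

    /**
     * Every task holds one local value. In each superstep a task sends its value
     * to every task, waits at the barrier, and then averages what it received.
     * Messages only become visible after the barrier.
     */
    static double[] run(double[] localValues, int supersteps) {
        int tasks = localValues.length;
        double[] state = localValues.clone();
        for (int s = 0; s < supersteps; s++) {
            // One inbox per receiving task, filled during this superstep.
            List<List<Double>> inboxes = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                inboxes.add(new ArrayList<>());
            }
            // Communication: each task sends its current value to all tasks.
            for (int sender = 0; sender < tasks; sender++) {
                for (int receiver = 0; receiver < tasks; receiver++) {
                    inboxes.get(receiver).add(state[sender]);
                }
            }
            // Synchronization: after the barrier each task reads its inbox
            // and replaces its value with the average of the messages.
            for (int t = 0; t < tasks; t++) {
                double sum = 0.0;
                for (double message : inboxes.get(t)) {
                    sum += message;
                }
                state[t] = sum / inboxes.get(t).size();
            }
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new double[] {1.0, 2.0, 3.0, 6.0}, 1)));
    }
}
```

In a real BSP runtime the tasks run in parallel and the barrier is a distributed synchronization point; the sequential loops here only mimic that ordering.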

7 Why BSP
- Simple programming model
- Superstep semantics are easy
- Preserves data locality
- Improves performance
- Well suited for iterative algorithms

8 Apache Hama architecture: BSP program execution flow

9 Apache Hama architecture

10 Apache Hama Features
- BSP API
- M/R like I/O API
- Graph API
- Job management / monitoring
- Checkpoint recovery
- Local & (Pseudo) Distributed run modes
- Pluggable message transfer architecture
- YARN supported
- Running in Apache Whirr

11 Apache Hama BSP API
public abstract class BSP …
- K1, V1 are the key and value types for inputs
- K2, V2 are the key and value types for outputs
- M is the type of the messages used for task communication

12 Apache Hama BSP API
- public void bsp(BSPPeer peer) throws..
- public void setup(BSPPeer peer) throws..
- public void cleanup(BSPPeer peer) throws..

13 Machine learning on BSP
- Lots (most?) of ML algorithms are inherently iterative
- The Hama ML module currently includes:
  - Collaborative filtering
  - Clustering
  - Gradient descent

14 Benchmarking architecture (diagram labels: HDFS, Solr, Lucene, DBMS, Hama, Mahout, Node)

15 Collaborative filtering
- Given user preferences on movies
- We want to find users “near” to some specific user
- So that that user can “follow” them
- And/or see what they like (which he/she could like too)

16 Collaborative filtering BSP
Given a specific user, iteratively (for each task):
- Superstep 1*i
  - Read a new user preference row
  - Find how near that user is to the current user, that is, how near their preferences are
  - Since preferences are given as vectors, we may use vector distance measures such as Euclidean or cosine distance
  - Broadcast the measure output to other peers
- Superstep 2*i
  - Aggregate measure outputs
  - Update most relevant users
Still to be committed (HAMA-612)

17 Collaborative filtering BSP
Given user ratings about movies:
  "john"    -> 0, 0, 0, 9.5, 4.5, 9.5, 8
  "paula"   -> 7, 3, 8, 2, 8.5, 0, 0
  "jim"     -> 4, 5, 0, 5, 8, 0, 1.5
  "tom"     -> 9, 4, 9, 1, 5, 0, 8
  "timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
- We ask for the 2 nearest users to “paula” and we get “timothy” and “tom” (user recommendation)
- We can extract the movies highly rated by “timothy” and “tom” that “paula” didn’t see (item recommendation)
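The nearest-user lookup on the ratings above can be reproduced with a few lines of plain Java. This is a minimal sketch, assuming Euclidean distance as the measure (the slides also mention cosine); the class `NearestUsersSketch` and its methods are hypothetical names and it runs in a single process rather than on Hama.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Nearest-user search over the slide's ratings (hypothetical names, not Hama code). */
final class NearestUsersSketch {

    /** Euclidean distance between two rating vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** The ratings listed on the slide. */
    static Map<String, double[]> slideRatings() {
        Map<String, double[]> ratings = new LinkedHashMap<>();
        ratings.put("john", new double[] {0, 0, 0, 9.5, 4.5, 9.5, 8});
        ratings.put("paula", new double[] {7, 3, 8, 2, 8.5, 0, 0});
        ratings.put("jim", new double[] {4, 5, 0, 5, 8, 0, 1.5});
        ratings.put("tom", new double[] {9, 4, 9, 1, 5, 0, 8});
        ratings.put("timothy", new double[] {7, 3, 5.5, 0, 9.5, 6.5, 0});
        return ratings;
    }

    /** Returns the k users closest to the target user, nearest first. */
    static List<String> nearest(Map<String, double[]> ratings, String target, int k) {
        double[] targetVector = ratings.get(target);
        List<String> others = new ArrayList<>();
        for (String user : ratings.keySet()) {
            if (!user.equals(target)) {
                others.add(user);
            }
        }
        others.sort(Comparator.comparingDouble(
                (String user) -> euclidean(ratings.get(user), targetVector)));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    public static void main(String[] args) {
        System.out.println(nearest(slideRatings(), "paula", 2)); // [timothy, tom]
    }
}
```

With Euclidean distance the squared distances from “paula” are 53.5 (timothy), 83.25 (tom), 88.5 (jim) and 348.5 (john), which matches the result stated on the slide.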

18 Benchmarks
- Fairly simple algorithm
- Highly iterative
- Comparing to Apache Mahout:
  - Behaves better than ALS-WR
  - Behaves similarly to RecommenderJob and ItemSimilarityJob

19 K-Means clustering
- We have a bunch of data (e.g. documents)
- We want to group those docs in k homogeneous clusters
- Iteratively, for each cluster:
  - Calculate the new cluster center
  - Add the doc nearest to the new center to the cluster

20 K-Means clustering

21 K-Means clustering BSP
Iteratively:
- Superstep 1*i (assignment phase)
  - Read vector splits
  - Sum up temporary centers with assigned vectors
  - Broadcast the sum and the ingested vectors count
- Superstep 2*i (update phase)
  - Calculate the total sum over all received messages and average
  - Replace old centers with new centers and check for convergence
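The two phases can be sketched as a single-process simulation in plain Java. The class `KMeansBspSketch`, the in-memory splits and the convergence threshold are assumptions made for illustration; this is not the Hama ML module's code. Each "task" owns one split and produces per-center partial sums and counts, which the update phase then aggregates.

```java
import java.util.Arrays;

/** Simulated BSP k-means: per-task partial sums, then a global update (hypothetical names). */
final class KMeansBspSketch {

    /** Index of the center closest to v, by squared Euclidean distance. */
    static int nearestCenter(double[] v, double[][] centers) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double distance = 0.0;
            for (int i = 0; i < v.length; i++) {
                double diff = v[i] - centers[c][i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    /** splits[task] holds the vectors read by that task. */
    static double[][] run(double[][][] splits, double[][] initialCenters, int maxIterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment phase: each task sums the vectors it assigns to each center.
            double[][][] partialSums = new double[splits.length][k][dim];
            int[][] partialCounts = new int[splits.length][k];
            for (int task = 0; task < splits.length; task++) {
                for (double[] v : splits[task]) {
                    int c = nearestCenter(v, centers);
                    for (int i = 0; i < dim; i++) {
                        partialSums[task][c][i] += v[i];
                    }
                    partialCounts[task][c]++;
                }
            }
            // Update phase: aggregate the "broadcast" partials and average them.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                double[] total = new double[dim];
                int count = 0;
                for (int task = 0; task < splits.length; task++) {
                    for (int i = 0; i < dim; i++) {
                        total[i] += partialSums[task][c][i];
                    }
                    count += partialCounts[task][c];
                }
                if (count == 0) {
                    continue; // an empty cluster keeps its old center
                }
                for (int i = 0; i < dim; i++) {
                    double updated = total[i] / count;
                    if (Math.abs(updated - centers[c][i]) > 1e-9) {
                        converged = false;
                    }
                    centers[c][i] = updated;
                }
            }
            if (converged) {
                break;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][][] splits = {
            {{1, 1}, {1, 3}, {9, 9}, {11, 9}},
            {{3, 1}, {3, 3}, {9, 11}, {11, 11}}
        };
        double[][] centers = run(splits, new double[][] {{0, 0}, {10, 10}}, 10);
        System.out.println(Arrays.deepToString(centers));
    }
}
```

Only the sums and counts cross task boundaries, so the amount of data exchanged per iteration depends on k and the vector dimension rather than on the number of input vectors.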

22 Benchmarks
- One rack (16 nodes, 256 cores) cluster
- 10G network
- On average faster than Mahout’s implementation

23 Gradient descent
- Optimization algorithm
- Finds a (local) minimum of some function
- Used for:
  - solving linear systems
  - solving non-linear systems
  - machine learning tasks: linear regression, logistic regression, neural network backpropagation, …

24 Gradient descent
- Minimize a given (cost) function
- Give the function a starting point (set of parameters)
- Iteratively change the parameters in order to minimize the function
- Stop at the (local) minimum
- There’s some math, but intuitively: evaluate derivatives at a given point in order to choose where to “go” next
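The intuition can be made concrete with a one-dimensional sketch in plain Java. The class `GradientDescentSketch`, the learning rate, the tolerance and the example function f(x) = (x - 3)^2 are all assumptions chosen for illustration.

```java
import java.util.function.DoubleUnaryOperator;

/** One-dimensional gradient descent (hypothetical names, illustrative only). */
final class GradientDescentSketch {

    /**
     * Starts at the given point and repeatedly moves against the derivative.
     * Stops when the step becomes smaller than the tolerance or after maxIterations.
     */
    static double minimize(DoubleUnaryOperator derivative, double start,
                           double learningRate, int maxIterations, double tolerance) {
        double x = start;
        for (int i = 0; i < maxIterations; i++) {
            // The derivative at the current point tells us where to "go" next.
            double step = learningRate * derivative.applyAsDouble(x);
            x -= step;
            if (Math.abs(step) < tolerance) {
                break;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3.
        double minimum = minimize(x -> 2 * (x - 3), 0.0, 0.1, 1000, 1e-12);
        System.out.println(minimum);
    }
}
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 at every step, so the loop reaches the tolerance well before the iteration limit.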

25 Gradient descent BSP
Iteratively:
- Superstep 1*i: each task calculates and broadcasts portions of the cost function with the current parameters
- Superstep 2*i: aggregate and update the cost function; check the aggregated cost and the iteration count (the cost should always decrease)
- Superstep 3*i: each task calculates and broadcasts portions of the (partial) derivatives
- Superstep 4*i: aggregate and update the parameters
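The four supersteps can be sketched as a single-process simulation for a one-feature linear model. Everything here is an assumption made for illustration: the class `LinearRegressionBspSketch`, the squared-error cost, the model h(x) = theta0 + theta1 * x, and the tiny dataset. It is not the Hama implementation.

```java
import java.util.Arrays;

/** Simulated BSP gradient descent for h(x) = theta0 + theta1 * x (hypothetical names). */
final class LinearRegressionBspSketch {

    static double predict(double[] theta, double x) {
        return theta[0] + theta[1] * x;
    }

    /** splits[task] holds rows of the form {x, y} owned by that task. */
    static double[] fit(double[][][] splits, double learningRate, int maxIterations) {
        int m = 0;
        for (double[][] split : splits) {
            m += split.length;
        }
        double[] theta = {0.0, 0.0};
        double previousCost = Double.MAX_VALUE;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Superstep 1: each task computes its portion of the cost.
            double[] partialCosts = new double[splits.length];
            for (int task = 0; task < splits.length; task++) {
                for (double[] row : splits[task]) {
                    double error = predict(theta, row[0]) - row[1];
                    partialCosts[task] += error * error;
                }
            }
            // Superstep 2: aggregate the cost and stop if it no longer decreases.
            double cost = 0.0;
            for (double partial : partialCosts) {
                cost += partial;
            }
            cost /= 2.0 * m;
            if (cost > previousCost) {
                break;
            }
            previousCost = cost;
            // Superstep 3: each task computes its portion of the partial derivatives.
            double[][] partialGradients = new double[splits.length][2];
            for (int task = 0; task < splits.length; task++) {
                for (double[] row : splits[task]) {
                    double error = predict(theta, row[0]) - row[1];
                    partialGradients[task][0] += error;
                    partialGradients[task][1] += error * row[0];
                }
            }
            // Superstep 4: aggregate the derivatives and update the parameters.
            double gradient0 = 0.0;
            double gradient1 = 0.0;
            for (double[] partial : partialGradients) {
                gradient0 += partial[0];
                gradient1 += partial[1];
            }
            theta[0] -= learningRate * gradient0 / m;
            theta[1] -= learningRate * gradient1 / m;
        }
        return theta;
    }

    public static void main(String[] args) {
        // Points on the line y = 1 + 2x, divided between two tasks.
        double[][][] splits = {{{0, 1}, {1, 3}}, {{2, 5}, {3, 7}}};
        System.out.println(Arrays.toString(fit(splits, 0.1, 5000)));
    }
}
```

Each task only ever exchanges a partial cost and a partial gradient, which is what keeps the input data local to the task that read it.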

26 Gradient descent BSP
- Simplistic example: linear regression
- Given a real estate market dataset
- Estimate new houses’ prices given known houses’ size, geographic region and prices
- Expected output: actual parameters for the (linear) prediction function

27 Gradient descent BSP
- Generate a different model for each region
- House item vectors: price -> size, e.g. 150k -> 80
- 2 dimensional space
- ~1.3M vectors dataset

28 Gradient descent BSP: dataset and model fit

29 Gradient descent BSP: cost checking

30 Gradient descent BSP
- Classification: logistic regression with gradient descent
- Real estate market dataset
- We want to find which estate listings belong to agencies, to avoid buying from them
- Same algorithm, with a different cost function and features
- Existing items are tagged or not as “belonging to agency”
- Create vectors from items’ text
- Sample vector: 1 -> 1 3 0 0 5 3 4 1
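Swapping the linear prediction for a sigmoid turns the same update loop into a classifier. The following plain Java sketch assumes the standard logistic (log-loss) gradient; the class `LogisticRegressionSketch` and the two-feature term-count vectors are hypothetical and much smaller than the sample vector on the slide. It runs in one process; in a BSP setting the per-row loop would be divided among tasks.

```java
/** Logistic regression trained with batch gradient descent (hypothetical names). */
final class LogisticRegressionSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability that the item carries the tag; theta[0] is the bias term. */
    static double probability(double[] theta, double[] features) {
        double z = theta[0];
        for (int i = 0; i < features.length; i++) {
            z += theta[i + 1] * features[i];
        }
        return sigmoid(z);
    }

    static double[] fit(double[][] features, int[] labels, double learningRate, int iterations) {
        int m = features.length;
        int n = features[0].length;
        double[] theta = new double[n + 1];
        for (int iteration = 0; iteration < iterations; iteration++) {
            double[] gradient = new double[n + 1];
            for (int row = 0; row < m; row++) {
                // For log-loss the per-row gradient is (prediction - label) * feature.
                double error = probability(theta, features[row]) - labels[row];
                gradient[0] += error;
                for (int i = 0; i < n; i++) {
                    gradient[i + 1] += error * features[row][i];
                }
            }
            for (int j = 0; j <= n; j++) {
                theta[j] -= learningRate * gradient[j] / m;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Hypothetical term counts: label 1 = tagged as belonging to an agency.
        double[][] features = {{3, 0}, {4, 1}, {5, 0}, {0, 3}, {1, 4}, {0, 5}};
        int[] labels = {1, 1, 1, 0, 0, 0};
        double[] theta = fit(features, labels, 0.1, 1000);
        System.out.println(probability(theta, new double[] {4, 0}));
        System.out.println(probability(theta, new double[] {0, 4}));
    }
}
```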

31 Gradient descent BSP: classification

32 Benchmarks
- Not directly comparable to Mahout’s regression algorithms
- Both SGD and CGD are inherently better than plain GD
- But Hama GD had on average the same performance as Mahout’s SGD / CGD
- Next step is implementing SGD / CGD on top of Hama

33 Wrap up
- Even if the ML module is still “young” / work in progress and tools like Apache Mahout have better “coverage”, Apache Hama can be particularly useful in certain “highly iterative” use cases
- Interesting benchmarks

34 Thanks!

