1 Machine Learning with Apache Hama
Tommaso Teofili, tommaso [at] apache [dot] org
2 About me
ASF member having fun with:
- Lucene / Solr
- Hama
- UIMA
- Stanbol
- … some others
SW engineer @ Adobe R&D
3 Agenda
- Apache Hama and BSP
- Why machine learning on BSP
- Some examples
- Benchmarks
4 Apache Hama
- Bulk Synchronous Parallel (BSP) computing framework on top of HDFS for massive scientific computations
- TLP since May 2012
- 0.6.0 release out soon
- Growing community
5 BSP supersteps
A BSP algorithm is composed of a sequence of “supersteps”
6 BSP supersteps
Each task:
- Superstep 1: do some computation, communicate with other tasks, synchronize
- Superstep 2: do some computation, communicate with other tasks, synchronize
- …
- Superstep N: do some computation, communicate with other tasks, synchronize
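The superstep pattern above can be sketched in plain Java (a single-process stand-in for Hama's distributed tasks, not real Hama code): each worker thread computes, "sends" a message via shared state, then waits at a barrier before the next superstep starts.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-process illustration of BSP supersteps:
// compute -> communicate -> synchronize, repeated for each superstep.
public class SuperstepSketch {
    static final int TASKS = 4;
    static final int SUPERSTEPS = 3;

    public static int run() {
        AtomicInteger messages = new AtomicInteger(0);
        CyclicBarrier barrier = new CyclicBarrier(TASKS); // the synchronization point
        Thread[] workers = new Thread[TASKS];
        for (int t = 0; t < TASKS; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int s = 0; s < SUPERSTEPS; s++) {
                        messages.incrementAndGet(); // local computation + "send"
                        barrier.await();            // barrier: ends the superstep
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return messages.get(); // TASKS * SUPERSTEPS messages in total
    }

    public static void main(String[] args) {
        System.out.println("messages sent: " + run()); // 12
    }
}
```

No task enters superstep s+1 before every task has finished superstep s, which is exactly the guarantee BSP gives.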
7 Why BSP
- Simple programming model: the superstep semantics are easy to reason about
- Preserves data locality, which improves performance
- Well suited for iterative algorithms
8 Apache Hama architecture BSP Program execution flow
9 Apache Hama architecture
10 Apache Hama features
- BSP API
- M/R-like I/O API
- Graph API
- Job management / monitoring
- Checkpoint recovery
- Local & (pseudo-)distributed run modes
- Pluggable message transfer architecture
- YARN supported
- Runs in Apache Whirr
11 Apache Hama BSP API
public abstract class BSP<K1, V1, K2, V2, M>
- K1, V1 are the key/value types for inputs
- K2, V2 are the key/value types for outputs
- M is the type of the messages used for task communication
12 Apache Hama BSP API
public void bsp(BSPPeer peer) throws..
public void setup(BSPPeer peer) throws..
public void cleanup(BSPPeer peer) throws..
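The lifecycle above can be mocked in self-contained Java to show how the pieces fit together (MockPeer and MockBSP are hypothetical stand-ins for illustration, not the real org.apache.hama.bsp classes): setup runs once, bsp holds the superstep logic and talks to the peer, cleanup runs at the end.

```java
// Hypothetical mock of the BSP API shape; the real Hama classes take a
// BSPPeer parameterized the same way and provide send/sync among other things.
interface MockPeer<K1, V1, K2, V2, M> {
    void send(String otherPeer, M msg); // communicate with another task
    void sync();                        // barrier synchronization
}

abstract class MockBSP<K1, V1, K2, V2, M> {
    public void setup(MockPeer<K1, V1, K2, V2, M> peer) {}   // before the first superstep
    public abstract void bsp(MockPeer<K1, V1, K2, V2, M> peer);
    public void cleanup(MockPeer<K1, V1, K2, V2, M> peer) {} // after the last superstep
}

public class BspApiSketch extends MockBSP<Long, String, Long, Double, String> {
    @Override
    public void bsp(MockPeer<Long, String, Long, Double, String> peer) {
        peer.send("peer-1", "hello"); // communicate, then…
        peer.sync();                  // …synchronize: this ends the superstep
    }

    // Drive the mock once and record what was sent.
    public static java.util.List<String> demo() {
        final java.util.List<String> sent = new java.util.ArrayList<>();
        new BspApiSketch().bsp(new MockPeer<Long, String, Long, Double, String>() {
            public void send(String p, String m) { sent.add(p + ":" + m); }
            public void sync() {}
        });
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [peer-1:hello]
    }
}
```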
13 Machine learning on BSP
Lots (most?) of ML algorithms are inherently iterative. The Hama ML module currently includes:
- Collaborative filtering
- Clustering
- Gradient descent
14 Benchmarking architecture
(diagram: nodes running Hama and Mahout on top of HDFS, alongside Solr / Lucene and a DBMS)
15 Collaborative filtering
Given user preferences on movies, we want to find users “near” to some specific user, so that this user can “follow” them and/or see what they like (which he/she could like too).
16 Collaborative filtering BSP
Given a specific user, iteratively (for each task):
- Superstep 1*i: read a new user preference row; find how near that user is to the current user, that is, how near their preferences are (since preferences are given as vectors, we can use vector distance measures such as Euclidean or cosine distance); broadcast the measure output to the other peers
- Superstep 2*i: aggregate the measure outputs and update the most relevant users
Still to be committed (HAMA-612)
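The distance measure computed in superstep 1*i could, for instance, be cosine similarity between two preference vectors, as in this minimal sketch (not the Hama implementation):

```java
// Cosine similarity between two preference vectors:
// dot(a, b) / (|a| * |b|); closer to 1 means more similar tastes.
public class CosineSketch {
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] u = {7, 3, 8, 2, 8.5, 0, 0};
        double[] v = {7, 3, 5.5, 0, 9.5, 6.5, 0};
        System.out.println(cosine(u, v)); // a value in [0, 1] for non-negative ratings
    }
}
```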
17 Collaborative filtering BSP
Given user ratings about movies:
"john" -> 0, 0, 0, 9.5, 4.5, 9.5, 8
"paula" -> 7, 3, 8, 2, 8.5, 0, 0
"jim" -> 4, 5, 0, 5, 8, 0, 1.5
"tom" -> 9, 4, 9, 1, 5, 0, 8
"timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
We ask for the 2 nearest users to “paula” and get “timothy” and “tom” (user recommendation). We can then extract movies highly rated by “timothy” and “tom” that “paula” didn’t see (item recommendation).
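A single-process stand-in for the BSP job reproduces this result: ranking the slide's rating vectors by Euclidean distance to "paula" does yield "timothy" and "tom" as the two nearest users.

```java
import java.util.*;

// Rank users by Euclidean distance on their rating vectors and take the
// k nearest to a target user (a local sketch of the distributed version).
public class NearestUsersSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static List<String> nearest(Map<String, double[]> ratings, String target, int k) {
        double[] t = ratings.get(target);
        return ratings.keySet().stream()
                .filter(u -> !u.equals(target))
                .sorted(Comparator.comparingDouble(u -> dist(ratings.get(u), t)))
                .limit(k)
                .collect(java.util.stream.Collectors.toList());
    }

    // The rating vectors from the slide.
    public static Map<String, double[]> sampleRatings() {
        Map<String, double[]> ratings = new HashMap<>();
        ratings.put("john",    new double[]{0, 0, 0, 9.5, 4.5, 9.5, 8});
        ratings.put("paula",   new double[]{7, 3, 8, 2, 8.5, 0, 0});
        ratings.put("jim",     new double[]{4, 5, 0, 5, 8, 0, 1.5});
        ratings.put("tom",     new double[]{9, 4, 9, 1, 5, 0, 8});
        ratings.put("timothy", new double[]{7, 3, 5.5, 0, 9.5, 6.5, 0});
        return ratings;
    }

    public static void main(String[] args) {
        System.out.println(nearest(sampleRatings(), "paula", 2)); // [timothy, tom]
    }
}
```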
18 Benchmarks
- Fairly simple, highly iterative algorithm
- Compared to Apache Mahout:
  - Behaves better than ALS-WR
  - Behaves similarly to RecommenderJob and ItemSimilarityJob
19 K-Means clustering
We have a bunch of data (e.g. documents) and we want to group those docs into k homogeneous clusters. Iteratively, for each cluster:
- Calculate the new cluster center
- Add the docs nearest to the new center to the cluster
20 K-Means clustering
21 K-Means clustering BSP
Iteratively:
- Superstep 1*i (assignment phase): read the vector splits, sum up the temporary centers with the assigned vectors, broadcast the sum and the ingested vector count
- Superstep 2*i (update phase): calculate the total sum over all received messages and average it, replace the old centers with the new centers, and check for convergence
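One k-means iteration, with the two phases above collapsed into a single process (a sketch of the idea, not the Hama code): the assignment phase accumulates per-cluster sums and counts, the update phase averages them into new centers.

```java
import java.util.Arrays;

// One k-means iteration: assignment (sum vectors per nearest center and
// count them) followed by update (average the sums into new centers).
public class KMeansSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s; // squared distance is enough for comparisons
    }

    public static double[][] iterate(double[][] points, double[][] centers) {
        int k = centers.length, d = centers[0].length;
        double[][] sums = new double[k][d];
        int[] counts = new int[k];
        for (double[] p : points) {                  // assignment phase
            int best = 0;
            for (int c = 1; c < k; c++)
                if (dist(p, centers[c]) < dist(p, centers[best])) best = c;
            for (int j = 0; j < d; j++) sums[best][j] += p[j];
            counts[best]++;
        }
        double[][] next = new double[k][d];          // update phase
        for (int c = 0; c < k; c++)
            for (int j = 0; j < d; j++)
                next[c][j] = counts[c] == 0 ? centers[c][j] : sums[c][j] / counts[c];
        return next;
    }

    public static void main(String[] args) {
        double[][] points  = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centers = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(iterate(points, centers)));
        // [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

In the BSP version the sums and counts are exactly what each task broadcasts at the end of superstep 1*i, and the averaging is what superstep 2*i does with the received messages.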
22 Benchmarks
- One-rack cluster (16 nodes, 256 cores), 10G network
- On average faster than Mahout’s implementation
23 Gradient descent
An optimization algorithm: find a (local) minimum of some function. Used for:
- solving linear systems
- solving non-linear systems
- machine learning tasks: linear regression, logistic regression, neural network backpropagation, …
24 Gradient descent
- Minimize a given (cost) function
- Give the function a starting point (a set of parameters)
- Iteratively change the parameters in order to minimize the function
- Stop at the (local) minimum
There’s some math, but intuitively: evaluate the derivatives at a given point in order to choose where to “go” next.
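The intuition above in a few lines of Java, on a toy cost function f(x) = (x - 3)^2 (chosen here for illustration): evaluate the derivative at the current point and step in the opposite direction.

```java
// Plain gradient descent on f(x) = (x - 3)^2, whose derivative is 2(x - 3).
// Each iteration moves "downhill" by learningRate times the gradient.
public class GradientDescentSketch {
    public static double minimize(double start, double learningRate, int iterations) {
        double x = start;
        for (int i = 0; i < iterations; i++) {
            double gradient = 2 * (x - 3); // f'(x) at the current point
            x -= learningRate * gradient;  // step toward the minimum
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(minimize(0, 0.1, 100)); // converges toward 3
    }
}
```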
25 Gradient descent BSP
Iteratively:
- Superstep 1*i: each task calculates and broadcasts portions of the cost function with the current parameters
- Superstep 2*i: aggregate and update the cost function; check the aggregated cost and the iteration count (the cost should always decrease)
- Superstep 3*i: each task calculates and broadcasts portions of the (partial) derivatives
- Superstep 4*i: aggregate the derivatives and update the parameters
26 Gradient descent BSP
Simplistic example: linear regression. Given a real estate market dataset, estimate new house prices from known houses’ size, geographic region and prices. Expected output: the actual parameters for the (linear) prediction function.
27 Gradient descent BSP
- Generate a different model for each region
- House item vectors: price -> size, e.g. 150k -> 80
- 2-dimensional space
- ~1.3M vectors in the dataset
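A single-process sketch of the linear-regression fit (made-up normalized data points, not the slide's real estate dataset): gradient descent on the mean squared error of a model price = theta0 + theta1 * size, where the summed errors correspond to the partial-derivative portions each BSP task would broadcast.

```java
// Linear regression via plain gradient descent on mean squared error.
// The per-point error terms are what the BSP tasks would compute and
// broadcast; the update step is what the aggregation superstep would do.
public class LinearRegressionSketch {
    // returns {theta0, theta1} after iters gradient descent steps
    public static double[] fit(double[] sizes, double[] prices, double lr, int iters) {
        double t0 = 0, t1 = 0;
        int m = sizes.length;
        for (int it = 0; it < iters; it++) {
            double g0 = 0, g1 = 0;
            for (int i = 0; i < m; i++) {
                double err = t0 + t1 * sizes[i] - prices[i]; // prediction error
                g0 += err;             // partial derivative w.r.t. theta0
                g1 += err * sizes[i];  // partial derivative w.r.t. theta1
            }
            t0 -= lr * g0 / m; // aggregate the portions, update the parameters
            t1 -= lr * g1 / m;
        }
        return new double[]{t0, t1};
    }

    public static void main(String[] args) {
        double[] sizes  = {0.5, 0.8, 1.0, 1.2}; // normalized sizes (made up)
        double[] prices = {1.0, 1.6, 2.0, 2.4}; // prices on the line price = 2 * size
        double[] theta = fit(sizes, prices, 0.5, 2000);
        System.out.println(theta[0] + " + " + theta[1] + " * size");
    }
}
```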
28 Gradient descent BSP Dataset and model fit
29 Gradient descent BSP Cost checking
30 Gradient descent BSP
Classification: logistic regression with gradient descent on the real estate market dataset. We want to find which estate listings belong to agencies (to avoid buying from them). Same algorithm, with a different cost function and features:
- Existing items are tagged (or not) as “belonging to agency”
- Create vectors from the items’ text
- Sample vector: 1 -> 1 3 0 0 5 3 4 1
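The "same algorithm, different cost function" point can be seen in a sketch: the gradient descent loop is unchanged, only the prediction goes through a sigmoid and the gradient comes from the log loss. The tiny feature vectors and labels below are invented for illustration; the real features on the slide come from the listings' text.

```java
// Logistic regression via the same gradient descent loop as linear
// regression; only the prediction (sigmoid) and loss gradient differ.
public class LogisticRegressionSketch {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    public static double predict(double[] w, double[] x) {
        double z = 0;
        for (int j = 0; j < w.length; j++) z += w[j] * x[j];
        return sigmoid(z); // probability of the positive class
    }

    public static double[] fit(double[][] x, int[] y, double lr, int iters) {
        double[] w = new double[x[0].length];
        for (int it = 0; it < iters; it++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < x.length; i++) {
                double p = predict(w, x[i]);
                for (int j = 0; j < w.length; j++)
                    grad[j] += (p - y[i]) * x[i][j]; // log-loss gradient term
            }
            for (int j = 0; j < w.length; j++) w[j] -= lr * grad[j] / x.length;
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0.1}, {1, 0.3}, {1, 0.8}, {1, 0.9}}; // bias + one feature
        int[] y = {0, 0, 1, 1};                                  // 1 = agency listing
        double[] w = fit(x, y, 1.0, 5000);
        System.out.println(predict(w, new double[]{1, 0.9})); // probability above 0.5
    }
}
```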
31 Gradient descent BSP Classification
32 Benchmarks
- Not directly comparable to Mahout’s regression algorithms: both SGD and CGD are inherently better than plain GD
- But Hama GD had on average the same performance as Mahout’s SGD / CGD
- Next step: implementing SGD / CGD on top of Hama
33 Wrap up
Even if the ML module is still “young” / work in progress, and tools like Apache Mahout have better “coverage”, Apache Hama can be particularly useful in certain “highly iterative” use cases. Interesting benchmarks.
34 Thanks!