Map-Reduce for Machine Learning on Multicore C. Chu, S.K. Kim, Y. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun (NIPS 2006) Shimin Chen Big Data Reading Group
Motivations Industry-wide shift to multicore No good framework for parallelizing ML algorithms Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms on multicore processors
Idea Statistical Query Model Summation Form Map-Reduce
Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion
Valiant Model [Valiant’84] x is the input; y is a function of x that we want to learn In the Valiant model, the learning algorithm uses randomly drawn examples to learn the target function
Statistical Query Model [Kearns’98] A restriction of the Valiant model: the learning algorithm uses aggregate statistics over the examples, not the individual examples More precisely, the learning algorithm interacts with a statistical query oracle: it asks about a query function f(x,y), and the oracle returns an estimate of the expectation of f(x,y) over the data
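To make the oracle concrete, here is a tiny illustrative sketch (my own, not from the paper or from Kearns' formal definition) in which the "oracle" simply returns the empirical average of a query function f(x, y) over the data set; the learner never touches individual examples directly:

    # Illustrative only: a statistical query returns an estimate of the
    # expectation of f(x, y) over the data, never an individual example.
    def statistical_query(f, examples):
        """Empirical estimate of E[f(x, y)]."""
        return sum(f(x, y) for x, y in examples) / len(examples)

    # Example query: an estimate of P(y = 1).
    examples = [([0.5, 1.2], 1), ([0.1, 0.4], 0), ([0.9, 0.7], 1)]
    p_y1 = statistical_query(lambda x, y: y == 1, examples)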
Summation Form Aggregate over the data: divide the data set into pieces, compute the aggregate on each core, and combine all the partial results at the end
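As a minimal sketch of the idea (my own illustration, with made-up function names): any statistic that is a sum over the examples can be computed as per-chunk partial sums that are added together at the end.

    # Split the data, sum each piece independently (one piece per core in the
    # real system; sequential here for clarity), then combine the partial sums.
    def split(data, num_pieces):
        size = (len(data) + num_pieces - 1) // num_pieces
        return [data[i:i + size] for i in range(0, len(data), size)]

    def parallel_sum(f, data, num_cores=4):
        partials = [sum(f(x) for x in chunk) for chunk in split(data, num_cores)]
        return sum(partials)  # the final combine ("reduce") step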
Example: Linear Regression using Least Squares Model: y = θ^T x Goal: minimize Σ_i (θ^T x_i − y_i)^2 Given m examples (x1, y1), (x2, y2), …, (xm, ym), write the matrix X with x1, …, xm as rows and the vector Y = (y1, y2, …, ym)^T. Then the solution is θ = (X^T X)^{-1} X^T Y Parallel computation: X^T X = Σ_i x_i x_i^T and X^T Y = Σ_i x_i y_i are summations over the examples, so cut the data into num_processors pieces of about m/num_processors examples each and add up the partial sums
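A hedged sketch of that parallel computation (function names are mine): each chunk of examples contributes partial sums of X^T X and X^T Y, and a single reducer adds them before solving the normal equations.

    import numpy as np

    def lr_map(X_chunk, y_chunk):
        # partial sums: sum_i x_i x_i^T and sum_i x_i y_i for this chunk
        return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk

    def lr_reduce(partials):
        A = sum(a for a, _ in partials)
        b = sum(b for _, b in partials)
        return np.linalg.solve(A, b)  # theta = (X^T X)^{-1} X^T Y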
Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion
Lighter Weight Map-Reduce for Multicore
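The original slide is an architecture figure: a master/engine that splits the data, a set of mappers, and a reducer that combines their intermediate outputs. The following is a rough sketch of such a lightweight single-machine engine, assuming Python's multiprocessing module; it is my own illustration, not the authors' implementation.

    from multiprocessing import Pool

    def run_mapreduce(mapper, reducer, data, num_workers=4):
        # The "engine": split the data, run the mapper on each split in a
        # worker process, then hand all intermediate results to one reducer.
        size = (len(data) + num_workers - 1) // num_workers
        splits = [data[i:i + size] for i in range(0, len(data), size)]
        with Pool(num_workers) as pool:
            intermediate = pool.map(mapper, splits)
        return reducer(intermediate)

    if __name__ == "__main__":
        # Toy usage: summing numbers in parallel.
        print(run_mapreduce(sum, sum, list(range(1000))))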
Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion
Locally Weighted Linear Regression (LWLR) Solve A θ = b, where A = Σ_i w_i x_i x_i^T and b = Σ_i w_i x_i y_i Mappers: one set computes partial sums of A, the other set computes partial sums of b Two reducers, one for A and one for b Finally solve the linear system for θ When all w_i = 1, this reduces to ordinary least squares
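A sketch of the two accumulations under that formulation (variable names are mine): one pass builds A, the other builds b, and setting every weight to 1 recovers ordinary least squares.

    import numpy as np

    def lwlr_map(X_chunk, y_chunk, w_chunk):
        A_part = (w_chunk[:, None] * X_chunk).T @ X_chunk  # partial sum_i w_i x_i x_i^T
        b_part = X_chunk.T @ (w_chunk * y_chunk)            # partial sum_i w_i x_i y_i
        return A_part, b_part

    def lwlr_reduce(partials):
        A = sum(a for a, _ in partials)
        b = sum(b for _, b in partials)
        return np.linalg.solve(A, b)  # solve A theta = b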
Naïve Bayes (NB) Goal: estimate P(xj=k|y=1) and P(xj=k|y=0) Computation: count the occurrences of (xj=k, y=1) and (xj=k, y=0), count the occurrences of y=1 and y=0, then compute the ratios Mappers: count over a subgroup of the training samples Reducer: aggregate the intermediate counts and calculate the final estimates
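A small illustrative sketch (assuming discrete features and binary labels; the key layout is my own choice): each mapper builds a Counter over its chunk, the reducer merges them, and the probability estimates are then simple ratios of counts.

    from collections import Counter

    def nb_map(chunk):
        counts = Counter()
        for x, y in chunk:                  # x: feature vector, y in {0, 1}
            counts[("y", y)] += 1
            for j, xj in enumerate(x):
                counts[(j, xj, y)] += 1     # occurrences of (x_j = k, y)
        return counts

    def nb_reduce(partials):
        total = Counter()
        for c in partials:
            total.update(c)
        # P(x_j = k | y) is then total[(j, k, y)] / total[("y", y)]
        return total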
Gaussian Discriminant Analysis (GDA) Goal: classify x into classes of y, assuming p(x|y) is a multivariate Gaussian for each class, with class-specific means but a shared covariance Computation: the class counts, per-class feature sums (for the means), and outer-product sums (for the covariance) are all summations over the examples Mappers: compute these partial sums for a subset of the training samples Reducer: aggregate the intermediate results
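A rough sketch of those sufficient statistics as summations (my own formulation, assuming two classes): per-class counts and feature sums give the means, and the summed outer products feed the shared covariance.

    import numpy as np

    def gda_map(X_chunk, y_chunk):
        stats = {}
        for c in (0, 1):
            Xc = X_chunk[y_chunk == c]
            stats[c] = (len(Xc), Xc.sum(axis=0), Xc.T @ Xc)  # count, sum x, sum x x^T
        return stats

    def gda_reduce(partials):
        n = {c: sum(p[c][0] for p in partials) for c in (0, 1)}
        mu = {c: sum(p[c][1] for p in partials) / n[c] for c in (0, 1)}
        # the shared covariance follows from the summed counts, x-sums and x x^T terms
        return n, mu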
K-means Computing the Euclidean distance between sample vectors and centroids Recalculating the centroids Divide the computation into subgroups of samples handled by map-reduce
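One k-means iteration in map-reduce form, as a sketch (mine, not the paper's code): mappers assign their points to the nearest centroid and emit per-centroid partial sums and counts; the reducer adds them and recomputes the centroids.

    import numpy as np

    def kmeans_map(X_chunk, centroids):
        d = ((X_chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)               # nearest centroid per sample
        sums = np.zeros_like(centroids, dtype=float)
        counts = np.zeros(len(centroids))
        for x, l in zip(X_chunk, labels):
            sums[l] += x
            counts[l] += 1
        return sums, counts

    def kmeans_reduce(partials):
        sums = sum(s for s, _ in partials)
        counts = sum(c for _, c in partials)
        return sums / counts[:, None]           # new centroids (assumes no empty cluster)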
Expectation Maximization (EM) The E-step computes probabilities or expected counts per training example The M-step combines these values to update the parameters Both steps can be parallelized using map-reduce
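As a concrete toy example (my construction, not the paper's): one EM iteration for a two-component 1-D Gaussian mixture with unit variances, where the E-step statistics are per-chunk sums and the M-step is the combine.

    import numpy as np

    def em_map(x_chunk, means, priors):
        # E-step: responsibilities for each example, then partial sums over the chunk
        lik = np.exp(-0.5 * (x_chunk[:, None] - means[None, :]) ** 2) * priors
        resp = lik / lik.sum(axis=1, keepdims=True)
        return resp.sum(axis=0), resp.T @ x_chunk

    def em_reduce(partials):
        # M-step: combine the partial statistics and re-estimate the parameters
        counts = sum(c for c, _ in partials)
        sums = sum(s for _, s in partials)
        return sums / counts, counts / counts.sum()   # new means, new priors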
Neural Network (NN) Back-propagation, 3-layer network: input layer, hidden layer, and 2 output nodes Goal: compute the weights of the NN by back-propagation Mapper: propagates its set of training data through the network and back-propagates the errors to calculate a partial gradient for the weights Reducer: sums the partial gradients and takes a batch gradient descent step to update the weights
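The map/reduce pattern is the same as for the other algorithms; the sketch below (mine) substitutes a linear model with squared loss for the paper's 3-layer back-propagation network, since only the gradient computation changes, not the structure.

    import numpy as np

    def nn_map(X_chunk, y_chunk, weights):
        err = X_chunk @ weights - y_chunk
        return X_chunk.T @ err                # partial gradient over this chunk

    def nn_reduce(partial_grads, weights, lr=1e-3):
        grad = sum(partial_grads)             # sum the partial gradients
        return weights - lr * grad            # one batch gradient descent step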
Principal Components Analysis (PCA) Compute the principal eigenvectors of the covariance matrix The covariance matrix is cov = (1/m) Σ_i x_i x_i^T − μ μ^T, built from summations over the data, so it can be computed with map-reduce before a cheap serial eigendecomposition
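A sketch of that summation form (my own code): accumulate sum_i x_i x_i^T, sum_i x_i, and the example count per chunk, combine them, and only then run the small serial eigendecomposition.

    import numpy as np

    def pca_map(X_chunk):
        return X_chunk.T @ X_chunk, X_chunk.sum(axis=0), len(X_chunk)

    def pca_reduce(partials, num_components=2):
        xxT = sum(p[0] for p in partials)
        xsum = sum(p[1] for p in partials)
        m = sum(p[2] for p in partials)
        mu = xsum / m
        cov = xxT / m - np.outer(mu, mu)
        eigvals, eigvecs = np.linalg.eigh(cov)
        return eigvecs[:, -num_components:]    # top principal components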
Other Algorithms Logistic Regression Independent Component Analysis Support Vector Machine
Time Complexity
Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion
Setup Compare the map-reduce version against the sequential version 10 data sets Machines: dual-processor 700 MHz Pentium III, 1 GB RAM 16-way Sun Enterprise 6000 (these are SMP machines, not multicore)
Dual-Processor Speedups
2 to 16 Processor Speedups More results in the paper
Multicore Simulator Results The paper devotes only a paragraph to this Basically, the speedups are better than on the multiprocessor machines, possibly because of lower communication cost
Conclusion Parallelize summation forms Use map-reduce on a single machine