1
Chao Liu, Internet Services Research Center, Microsoft Research, Redmond
2
Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions
3
Data on the Web
Scale: terabyte-to-petabyte data
▪ Around 20TB of log data per day from Bing
Dynamics: evolving data streams
▪ Click data streams with evolving/emerging topics
Applications: non-traditional ML tasks
▪ Predicting clicks & ads
4
Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions
5
Parallel computing
▪ All processors have access to a shared memory, which can be used to exchange information between processors
Distributed computing
▪ Each processor has its own private memory (distributed memory); processors communicate over the network
▪ Examples: message passing, MapReduce
6
MPI is for task parallelism
▪ Suitable for CPU-intensive jobs
▪ Fine-grained communication control, powerful computation model
MapReduce is for data parallelism
▪ Suitable for data-intensive jobs
▪ A restricted computation model
7
MapReduce word count (diagram):
▪ The Web corpus is stored on multiple machines; each Mapper reads documents as (docId, doc) pairs
▪ Mapper: for each word w in a doc, emit (w, 1)
▪ Intermediate (key, value) pairs are aggregated by key (word)
▪ Reducer is copied to each machine to run over the intermediate data locally and produce the per-word totals, e.g., (w1, 3), (w2, 2), (w3, 3)
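To make the flow concrete, here is a minimal local simulation of that word-count job; the function names and the in-memory shuffle are illustrative, not part of the original deck, and a real job would run the same logic inside a MapReduce framework.

```python
# Minimal local simulation of the word-count MapReduce flow described above.
from collections import defaultdict

def mapper(doc_id, doc):
    # For each word w in the doc, emit (w, 1).
    for word in doc.split():
        yield word, 1

def reducer(word, counts):
    # Aggregate all values for one key (word) into a single count.
    yield word, sum(counts)

def run(docs):
    # Shuffle phase: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for doc_id, doc in docs.items():
        for word, one in mapper(doc_id, doc):
            groups[word].append(one)
    # Reduce phase: one reducer call per key.
    return dict(kv for word, counts in groups.items()
                   for kv in reducer(word, counts))

if __name__ == "__main__":
    docs = {1: "w1 w2 w3", 2: "w1 w3", 3: "w1 w2 w3"}
    print(run(docs))   # {'w1': 3, 'w2': 2, 'w3': 3}
```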
8
A big picture: not omnipotent but good enough

MapReduce friendly, standard ML algorithms:
▪ Classification: Naïve Bayes, logistic regression, MART, etc.
▪ Clustering: k-means, NMF, co-clustering, etc.
▪ Modeling: EM algorithm, Gaussian mixture, Latent Dirichlet Allocation, etc.
MapReduce friendly, customized ML algorithms:
▪ PageRank, Click Models, Behavior Targeting
MapReduce unfriendly, standard ML algorithms:
▪ Classification: SVM
▪ Clustering: Spectral clustering
MapReduce unfriendly, customized ML algorithms:
▪ Learning-to-Rank
9
Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions
10
Naïve Bayes: P(C|X) ∝ P(C) P(X|C) = P(C) ∏_j P(X_j | C)

MapReduce flow (diagram):
▪ Each Mapper reads training examples (x^(i), y^(i)) and emits (j, x_j^(i), y^(i)) for every feature j
▪ Reduce on y^(i) estimates the class priors P(C)
▪ Reduce on j estimates the conditionals P(X_j | C)
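A minimal sketch of this counting flow, assuming discrete features and a local simulation of the two reduces; the helper names are illustrative, not from the deck, and smoothing and log-space scoring are omitted.

```python
# Naïve Bayes parameter estimation via a word-count-style MapReduce, run locally.
from collections import Counter

def mapper(x, y):
    # Emit one keyed count per example and per feature.
    yield ("class", y), 1                      # for the class prior P(C)
    for j, xj in enumerate(x):
        yield ("feat", j, xj, y), 1            # for P(X_j = x_j | C = y)

def train(examples):
    counts = Counter()
    for x, y in examples:                      # map phase
        for key, one in mapper(x, y):
            counts[key] += one                 # reduce phase: sum per key
    n = sum(v for k, v in counts.items() if k[0] == "class")
    prior = {k[1]: v / n for k, v in counts.items() if k[0] == "class"}
    cond = {k[1:]: v / (prior[k[3]] * n)       # k = ("feat", j, xj, y)
            for k, v in counts.items() if k[0] == "feat"}
    return prior, cond

if __name__ == "__main__":
    data = [([1, 0], "spam"), ([1, 1], "spam"), ([0, 1], "ham")]
    prior, cond = train(data)
    print(prior)                   # {'spam': 0.67, 'ham': 0.33} (approximately)
    print(cond[(0, 1, "spam")])    # P(X_0 = 1 | spam) = 1.0
```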
11
NMF: an effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]
▪ Interpretable dimensionality reduction [Lee & Seung, 1999]
▪ Document clustering [Shahnaz et al., 2006; Xu et al., 2006]
Challenge: can we scale NMF to million-by-million matrices?
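For reference, a hedged sketch of the classic multiplicative updates [Lee & Seung, 1999] that a distributed NMF would iterate; the deck's exact update scheme is not shown in this transcript, so take this as the standard formulation for A ≈ WH with nonnegative A, W, H.

```latex
% Standard multiplicative updates for squared-error NMF (Lee & Seung, 1999),
% with multiplication and division applied entry by entry.
H_{ab} \leftarrow H_{ab}\,\frac{(W^{\top}A)_{ab}}{(W^{\top}W\,H)_{ab}},
\qquad
W_{ia} \leftarrow W_{ia}\,\frac{(A\,H^{\top})_{ia}}{(W\,H\,H^{\top})_{ia}}.
```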
13
Data partition: A, W, and H are partitioned across machines (diagram).
15
Overall pipeline for one NMF iteration (diagram): Map-I, Reduce-I, Map-II, Reduce-II, Map-III, Map-IV, Reduce-III, Map-V, Reduce-V.
16
Stages Map-I, Reduce-I, Map-II, Reduce-II (diagram).
17
Stages Map-III, Map-IV, Reduce-III (diagram).
18
Stages Map-V and Reduce-V (diagram).
19
Recap of the full iteration pipeline (diagram): Map-I, Reduce-I, Map-II, Reduce-II, Map-III, Map-IV, Reduce-III, Map-V, Reduce-V.
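The stage diagrams themselves are not reproduced in this transcript. As one illustration of the kind of step such stages perform, below is a sketch of computing X = WᵀA when A's nonzero cells and W's rows are partitioned by row index across machines; the pairing with a specific Map/Reduce stage, and all helper names, are assumptions rather than the deck's exact design.

```python
# Sketch: distributed computation of X = W^T A when the sparse nonnegative
# matrix A and the factor W are both partitioned by row index i across machines.
import numpy as np
from collections import defaultdict

def mapper(i, a_row, w_row):
    # a_row: dict {j: A_ij} of row i's nonzeros; w_row: row i of W, shape (k,).
    for j, a_ij in a_row.items():
        yield j, a_ij * w_row            # partial contribution to column j of X

def reducer(j, partials):
    # Sum the k-dimensional partial vectors to obtain column j of X = W^T A.
    return j, np.sum(partials, axis=0)

def wt_a(a_rows, w_rows):
    groups = defaultdict(list)
    for i, a_row in a_rows.items():      # map phase
        for j, part in mapper(i, a_row, w_rows[i]):
            groups[j].append(part)
    return dict(reducer(j, parts) for j, parts in groups.items())  # reduce phase

if __name__ == "__main__":
    # Tiny example: A is 2x2 with one zero cell, W is 2x1 (k = 1).
    a_rows = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
    w_rows = {0: np.array([0.5]), 1: np.array([2.0])}
    print(wt_a(a_rows, w_rows))          # {0: [0.5], 1: [7.0]}
```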
20
3 hours per iteration; 20 iterations take around 20 × 3 × 0.72 ≈ 43 hours.
Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values.
21
EM algorithm on MapReduce (diagram): Map evaluates, Reduce computes.
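The diagram is not reproduced here; the usual reading of this split, which the sketch below assumes, is that each mapper evaluates E-step statistics for its shard of the data and the reducer computes the M-step parameter update. The example uses a one-dimensional, two-component Gaussian mixture with fixed unit variances; all names are illustrative.

```python
# Sketch: one EM iteration for a 1-D two-component Gaussian mixture (unit
# variances), with the E-step in the mappers and the M-step in the reducer.
import math

def mapper(shard, mu):
    # E-step: compute responsibilities per point and emit sufficient statistics.
    for x in shard:
        w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
        z = sum(w)
        for k, wk in enumerate(w):
            r = wk / z
            yield k, (r, r * x)          # (responsibility, weighted point)

def reducer(stats, n_components):
    # M-step: aggregate sufficient statistics and recompute the means.
    sums = [[0.0, 0.0] for _ in range(n_components)]
    for k, (r, rx) in stats:
        sums[k][0] += r
        sums[k][1] += rx
    return [rx / r for r, rx in sums]

def em_iteration(shards, mu):
    stats = [kv for shard in shards for kv in mapper(shard, mu)]   # map phase
    return reducer(stats, len(mu))                                 # reduce phase

if __name__ == "__main__":
    shards = [[0.1, -0.2, 0.3], [4.8, 5.1, 5.2]]   # two machines' data
    mu = [0.0, 1.0]
    for _ in range(10):
        mu = em_iteration(shards, mu)
    print(mu)   # means converge near 0.07 and 5.0
```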
22
Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions
23
Clicks are good… Are these two clicks equally “good”?
Non-clicks may have excuses:
▪ Not relevant
▪ Not examined
24
25
Graphical model (diagram): for a query and its result URLs 1–4, each position i has a snippet relevance variable S_i, an examination variable E_i, and a click-through variable C_i.
26
Model structure (diagram): each position i carries S_i, E_i, and C_i; the examination E_i depends on the preceding click position before i.
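The deck's equations are not reproduced in this transcript; one common formalization of this structure (assumed here, not taken from the slides) is the examination hypothesis, with the examination probability depending on the preceding click position r and the distance d from it.

```latex
% Examination-hypothesis structure assumed for the diagram above.
% S_i: relevance at position i; E_i: examined; C_i: clicked.
P(C_i = 1 \mid E_i = 0) = 0, \qquad
P(C_i = 1 \mid E_i = 1, S_i) = S_i,
% The chance of examining position i depends on where the previous click was:
P(E_i = 1 \mid \text{preceding click at position } r) = \beta_{r,d},
\qquad d = i - r.
```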
27
Ultimate goal
Observation: conditional independence
28
Likelihood of a search instance
From S to R:
29
Posterior: re-organize by the R_j's
▪ how many times d_j was clicked
▪ how many times d_j was not clicked when it is at position (r + d) and the preceding click is on position r
30
Exact inference, with the joint posterior in closed form
▪ The joint posterior factorizes, so the R_j's are mutually independent
▪ At most M(M+1)/2 + 1 numbers fully characterize each posterior
▪ Count vector:
31
Worked example (diagram): computing the count vector for R_4 from the click counts N_4 and N_{4,r,d}.
32
Map: emit((q, u), idx)
Reduce: construct the count vector
33
Example (diagram): three mappers emit (U1, 0), (U2, 4), (U3, 0); (U1, 1), (U3, 0), (U4, 7); and (U1, 1), (U3, 0), (U4, 0). The reduce phase groups the indices per URL: (U1: 0, 1, 1), (U2: 4), (U3: 0, 0, 0), (U4: 0, 7).
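A minimal sketch of this map/reduce, assuming the mapper turns each search impression into per-(query, URL) index emissions and the reducer tallies them into a count vector; the index encoding and helper names are illustrative, not the deck's exact scheme.

```python
# Sketch: per-(query, URL) count vectors built with one map and one reduce.
from collections import defaultdict

def mapper(impressions):
    # impressions: iterable of (query, url, idx) records, where idx encodes the
    # click/position information the model needs (illustrative encoding).
    for q, u, idx in impressions:
        yield (q, u), idx

def reducer(key, indices, vector_len):
    # Construct the count vector: how often each index value occurred.
    vec = [0] * vector_len
    for idx in indices:
        vec[idx] += 1
    return key, vec

def run(shards, vector_len):
    groups = defaultdict(list)
    for shard in shards:                       # map phase, one shard per machine
        for key, idx in mapper(shard):
            groups[key].append(idx)
    return dict(reducer(k, v, vector_len) for k, v in groups.items())

if __name__ == "__main__":
    shards = [[("q", "U1", 0), ("q", "U2", 4), ("q", "U3", 0)],
              [("q", "U1", 1), ("q", "U3", 0), ("q", "U4", 7)],
              [("q", "U1", 1), ("q", "U3", 0), ("q", "U4", 0)]]
    print(run(shards, vector_len=8))
    # e.g. ('q', 'U1') -> index 0 once and index 1 twice
```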
34
Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data.
Experiment platform: SCOPE (Easy and Efficient Parallel Processing of Massive Data Sets) [Chaiken et al., VLDB'08]
35
Increasing computation load: more queries, more URLs, more impressions.
Near-constant elapsed time on SCOPE: about 3 hours.
Scanned 265 terabytes of data; full posteriors for 1.15 billion (query, URL) pairs.
36
Behavior targeting
▪ Ad serving based on users’ historical behaviors
▪ Complementary to sponsored ads and content ads
37
Goal: given ads in a certain category, locate qualified users based on users’ past behaviors.
Data
▪ A user is identified by a cookie
▪ Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.
Challenges
▪ Scale: e.g., 9TB of ad data with 500B entries in Aug'08
▪ Sparse: e.g., the CTR of automotive display ads is 0.05%
▪ Dynamic: user behavior changes over time
38
CTR = ClickCnt / ViewCnt
▪ A model to predict the expected click count
▪ A model to predict the expected view count
Linear Poisson model; MLE on w
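The slide's formulas are not reproduced in this transcript; a hedged sketch of a linear Poisson model of this kind (an assumed formulation, with separate weight vectors for clicks and views) is:

```latex
% Assumed linear Poisson formulation: counts are Poisson with a linear rate.
\text{ClickCnt} \sim \mathrm{Poisson}(\lambda_c), \quad \lambda_c = w_c^{\top} x,
\qquad
\text{ViewCnt} \sim \mathrm{Poisson}(\lambda_v), \quad \lambda_v = w_v^{\top} x,
\qquad
\widehat{\mathrm{CTR}} = \frac{w_c^{\top} x}{w_v^{\top} x}.
% MLE on w maximizes the Poisson log-likelihood over the training users:
\max_{w}\; \sum_{i} \bigl( y_i \log(w^{\top} x_i) - w^{\top} x_i \bigr).
```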
39
Learning
▪ Map: compute … and …
▪ Reduce: update …
Prediction
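A minimal sketch of how this learning step could distribute, assuming the standard multiplicative update for Poisson regression with nonnegative features and weights: the mappers compute per-feature numerator and denominator statistics, and the reducer applies the update. The names and the update form are assumptions, not the deck's exact recipe.

```python
# Sketch: one distributed iteration of the multiplicative update for a linear
# Poisson model y_i ~ Poisson(w^T x_i) with nonnegative features and weights.
from collections import defaultdict

def mapper(shard, w):
    # shard: list of (x, y), where x is a dict {feature_id: value}, y is a count.
    for x, y in shard:
        rate = sum(w.get(j, 0.0) * v for j, v in x.items()) or 1e-12
        for j, v in x.items():
            yield j, (v * y / rate, v)       # (numerator part, denominator part)

def reducer(w, stats):
    # Multiplicative update per feature: w_j <- w_j * num_j / den_j.
    num, den = defaultdict(float), defaultdict(float)
    for j, (n, d) in stats:
        num[j] += n
        den[j] += d
    return {j: w.get(j, 1.0) * num[j] / den[j] for j in den if den[j] > 0}

def iteration(shards, w):
    stats = [kv for shard in shards for kv in mapper(shard, w)]   # map phase
    return reducer(w, stats)                                      # reduce phase

if __name__ == "__main__":
    shards = [[({0: 1.0, 1: 2.0}, 3), ({0: 2.0}, 4)],
              [({1: 1.0}, 1)]]
    w = {0: 1.0, 1: 1.0}
    for _ in range(20):
        w = iteration(shards, w)
    print(w)   # weights fitted so that w^T x approximates the observed counts
```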
40
Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions
41
Challenges imposed by Web data
▪ Scalability of standard algorithms
▪ Application-driven customized algorithms
Capability to consume huge amounts of data outweighs algorithm sophistication
▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite
MapReduce: a restricted computation model
▪ Not omnipotent but powerful enough
▪ Things we want to do turn out to be things we can do
42
Thank You! SEWM‘10 Keynote, Chengdu, China