Chao Liu Internet Services Research Center Microsoft Research-Redmond

 Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
 ▪ Classification: Naïve Bayes
 ▪ Clustering: Nonnegative Matrix Factorization
 ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
 ▪ Click Modeling
 ▪ Behavior Targeting
 Conclusions

 Data on the Web
 Scale: terabyte-to-petabyte data
 ▪ Around 20TB of log data per day from Bing
 Dynamics: evolving data streams
 ▪ Click data streams with evolving/emerging topics
 Applications: non-traditional ML tasks
 ▪ Predicting clicks & ads

 Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
 ▪ Classification: Naïve Bayes
 ▪ Clustering: Nonnegative Matrix Factorization
 ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
 ▪ Click Modeling
 ▪ Behavior Targeting
 Conclusions

 Parallel computing
 ▪ All processors have access to a shared memory, which can be used to exchange information between processors
 Distributed computing
 ▪ Each processor has its own private memory (distributed memory); processors communicate over the network
 ▪ Message passing
 ▪ MapReduce

 MPI is for task parallelism
 ▪ Suitable for CPU-intensive jobs
 ▪ Fine-grained communication control; a powerful computation model
 MapReduce is for data parallelism
 ▪ Suitable for data-intensive jobs
 ▪ A restricted computation model

 Web corpus resides on multiple machines
 Mapper: for each word w in a doc, emit (w, 1)
 Intermediate (key, value) pairs are aggregated by word
 Reducer: copied to each machine, runs locally over its share of the intermediate data to produce the result
[Figure: word-count dataflow. Mappers turn (docId, doc) pairs into (w, 1) pairs; the pairs are grouped by key and summed by reducers, e.g. (w1, 3), (w2, 2), (w3, 3)]
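The word-count dataflow above can be sketched in a few lines of single-machine Python. The names `mapper`, `reducer`, and `run_mapreduce` are illustrative, and a `defaultdict` stands in for the distributed shuffle:

```python
from collections import defaultdict

def mapper(doc):
    # Map: for each word w in a doc, emit (w, 1).
    for word in doc.split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce: sum all counts that arrived for one word.
    return (word, sum(counts))

def run_mapreduce(docs):
    # The shuffle phase groups intermediate pairs by key;
    # here a defaultdict plays that role on a single machine.
    groups = defaultdict(list)
    for doc in docs:
        for word, one in mapper(doc):
            groups[word].append(one)
    return dict(reducer(w, c) for w, c in groups.items())

print(run_mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

In the real system, each mapper and reducer instance runs on a different machine and sees only its own partition of the data.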

 A big picture: not omnipotent, but good enough

MapReduce friendly
 Standard ML algorithms: classification (Naïve Bayes, logistic regression, MART, etc.), clustering (k-means, NMF, co-clustering, etc.), modeling (EM algorithm, Gaussian mixture, latent Dirichlet allocation, etc.)
 Customized ML algorithms: PageRank, click models, behavior targeting

MapReduce unfriendly
 Standard ML algorithms: classification (SVM), clustering (spectral clustering)
 Customized ML algorithms: learning-to-rank

 Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
 ▪ Classification: Naïve Bayes
 ▪ Clustering: Nonnegative Matrix Factorization
 ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
 ▪ Click Modeling
 ▪ Behavior Targeting
 Conclusions

 P(C|X) ∝ P(C) P(X|C) = P(C) ∏_j P(X_j|C)
 Map: for each example (x^(i), y^(i)), emit (j, x_j^(i), y^(i)) for every feature j
 Reduce on y^(i): estimate P(C)
 Reduce on j: estimate P(X_j|C)
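As a single-machine sketch of this scheme (the key encodings "class" and "feat" are mine, not from the slides), the map phase emits one count per class label and per (feature index, value, class) triple, and the reduce phase is plain summation, from which the probability tables follow:

```python
from collections import Counter

def mapper(x, y):
    # Emit one count for the class label (for P(C)) and one count
    # per (feature index, value, class) triple (for P(X_j | C)).
    yield (("class", y), 1)
    for j, xj in enumerate(x):
        yield (("feat", j, xj, y), 1)

def train(examples):
    # Shuffle + reduce for Naive Bayes is just summation per key.
    counts = Counter()
    for x, y in examples:
        for key, c in mapper(x, y):
            counts[key] += c
    total = sum(c for k, c in counts.items() if k[0] == "class")
    prior = {k[1]: c / total for k, c in counts.items() if k[0] == "class"}
    return prior, counts

prior, counts = train([((1, 0), "+"), ((1, 1), "+"), ((0, 1), "-")])
# prior["+"] = 2/3; counts[("feat", 0, 1, "+")] = 2
```

Only counting happens in the distributed phase, which is why Naïve Bayes needs a single pass over the data.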

 An effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]
 ▪ Interpretable dimensionality reduction [Lee & Seung, 1999]
 ▪ Document clustering [Shahnaz et al., 2006; Xu et al., 2006]
 Challenge: can we scale NMF to million-by-million matrices?

 Data partition: A, W, and H are split across machines
[Figure: block layout of the partitioned matrices]

[Figure: overall dataflow of one NMF update iteration, chained through the stages Map-I/Reduce-I, Map-II/Reduce-II, Map-III and Map-IV feeding Reduce-III, and Map-V/Reduce-V]

[Figure: stages Map-I/Reduce-I and Map-II/Reduce-II]

[Figure: stages Map-III and Map-IV feeding Reduce-III]

[Figure: stage Map-V/Reduce-V]

[Figure: the full dataflow again, chaining Map-I/Reduce-I through Map-V/Reduce-V for one NMF update iteration]
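A minimal single-machine sketch of what the chained stages compute, assuming the standard multiplicative update rules of Lee & Seung (the small epsilon term is mine, added for numerical safety):

```python
import numpy as np

def nmf_step(A, W, H, eps=1e-9):
    # One multiplicative-update iteration. The distributed version
    # computes the same factors (W^T A, W^T W H, A H^T, W H H^T)
    # across the chained Map/Reduce stages over partitioned matrices.
    H = H * (W.T @ A) / (W.T @ W @ H + eps)
    W = W * (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

rng = np.random.default_rng(0)
A = rng.random((20, 30))
W, H = rng.random((20, 5)), rng.random((5, 30))
err0 = np.linalg.norm(A - W @ H)
for _ in range(50):
    W, H = nmf_step(A, W, H)
# the Frobenius reconstruction error is non-increasing across iterations
```

Each matrix product decomposes into per-block partial products and sums, which is exactly what the map and reduce stages distribute.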

 Around 3 hours per iteration; 20 iterations take about 20 × 3 × 0.72 ≈ 43 hours
 Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values

 Map (E-step): for each data point, evaluate the posterior of the latent variables under the current parameters and compute the expected sufficient statistics
 Reduce (M-step): aggregate the sufficient statistics and update the parameters
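As an illustrative sketch of this map/reduce split (the one-dimensional Gaussian-mixture instance, the data, and all names are mine, not from the slides), the map step computes per-point responsibilities and sufficient statistics, and the reduce step aggregates them into new parameters:

```python
import math
from collections import defaultdict

def e_step_map(x, mus, sigmas, pis):
    # Map (E-step): evaluate each component's responsibility for one
    # point and emit its sufficient statistics (r, r*x, r*x^2).
    w = [p * math.exp(-(x - m) ** 2 / (2 * s * s)) / s
         for m, s, p in zip(mus, sigmas, pis)]
    z = sum(w)
    for k, wk in enumerate(w):
        yield (k, (wk / z, wk / z * x, wk / z * x * x))

def m_step_reduce(stats, n):
    # Reduce (M-step): aggregate statistics per component and update
    # the mean, standard deviation, and mixing weight.
    mus, sigmas, pis = [], [], []
    for k in sorted(stats):
        r, rx, rxx = (sum(t) for t in zip(*stats[k]))
        mu = rx / r
        mus.append(mu)
        sigmas.append(max(math.sqrt(max(rxx / r - mu * mu, 0.0)), 1e-3))
        pis.append(r / n)
    return mus, sigmas, pis

data = [0.1, 0.2, 0.0, 5.1, 4.9, 5.2]
mus, sigmas, pis = [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]
for _ in range(30):
    stats = defaultdict(list)   # the shuffle groups by component id
    for x in data:
        for k, s in e_step_map(x, mus, sigmas, pis):
            stats[k].append(s)
    mus, sigmas, pis = m_step_reduce(stats, len(data))
# the two means converge near the cluster centers (~0.1 and ~5.07)
```

One MapReduce job per EM iteration; the driver loops until convergence.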

 Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
 ▪ Classification: Naïve Bayes
 ▪ Clustering: Nonnegative Matrix Factorization
 ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
 ▪ Click Modeling
 ▪ Behavior Targeting
 Conclusions

 Clicks are good…
 Are these two clicks equally “good”?
 Non-clicks may have excuses:
 ▪ Not relevant
 ▪ Not examined


[Figure: graphical model of one search instance. For a query and its returned URLs 1–4, each position i has a snippet relevance variable S_i, an examination variable E_i, and a click-through variable C_i]

[Figure: dependency structure. Each click C_i depends on the examination E_i and snippet relevance S_i, and E_i depends on the preceding click position before i]

 Ultimate goal: the posterior of document relevance given the observed clicks
 Observation: conditional independence

 Likelihood of a search instance
 From S to R:

 Posterior, re-organized by the R_j’s, depends only on:
 ▪ how many times d_j was clicked
 ▪ how many times d_j was not clicked when shown at position (r + d) with the preceding click at position r

 Exact inference, with the joint posterior in closed form
 The joint posterior factorizes, so the R_j’s are mutually independent
 At most M(M+1)/2 + 1 numbers fully characterize each posterior
 Count vector:

 Compute the count vector for each R from the logged sessions
[Figure: worked example of accumulating the counts N_4 for one document, with entries indexed by (r, d)]

 Map: emit ((q, u), idx)
 Reduce: construct the count vector

[Figure: example. Three mappers emit (U1, 0), (U2, 4), (U3, 0); (U1, 1), (U3, 0), (U4, 7); and (U1, 1), (U3, 0), (U4, 0); the reduce phase groups the indices by URL: (U1: 0, 1, 1), (U2: 4), (U3: 0, 0, 0), (U4: 0, 7)]
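A single-machine sketch of this map/reduce pair, using the session data from the example above (the `num_idx` value and the exact index encoding are illustrative assumptions; in BBM the indices cover the click count plus the (position, preceding-click) configurations):

```python
from collections import defaultdict

def mapper(session):
    # Map: for each URL impression in a search session, emit the index
    # of the posterior count it contributes to (here index 0 stands for
    # a click and other indices for unclicked (position, preceding-click)
    # pairs -- this encoding is an assumption for illustration).
    for url, idx in session:
        yield (url, idx)

def reducer(pairs, num_idx):
    # Reduce: build one count vector per URL over the possible indices.
    vectors = defaultdict(lambda: [0] * num_idx)
    for url, idx in pairs:
        vectors[url][idx] += 1
    return dict(vectors)

sessions = [
    [("U1", 0), ("U2", 4), ("U3", 0)],
    [("U1", 1), ("U3", 0), ("U4", 7)],
    [("U1", 1), ("U3", 0), ("U4", 0)],
]
pairs = [p for s in sessions for p in mapper(s)]
vectors = reducer(pairs, num_idx=8)
# vectors["U1"] == [1, 2, 0, 0, 0, 0, 0, 0]
```

Because the reduce step only counts, one pass over the click log yields the full posterior for every (query, URL) pair.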

 Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data
 Experiment platform: SCOPE, Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB’08]

 Increasing computation load: more queries, more URLs, more impressions
 Near-constant elapsed time: about 3 hours per job
 ▪ One scan over 265 terabytes of data
 ▪ Full posteriors for 1.15 billion (query, URL) pairs
[Figure: computation load and elapsed time on SCOPE for the eight jobs]

 Behavior targeting
 ▪ Ad serving based on users’ historical behaviors
 ▪ Complementary to sponsored ads and content ads

 Goal
 ▪ Given ads in a certain category, locate qualified users based on their past behaviors
 Data
 ▪ A user is identified by a cookie
 ▪ Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.
 Challenges
 ▪ Scale: e.g., 9TB of ad data with 500B entries in Aug ’08
 ▪ Sparsity: e.g., the CTR of automotive display ads is 0.05%
 ▪ Dynamics: user behavior changes over time

 CTR = ClickCnt / ViewCnt
 ▪ One model predicts the expected click count
 ▪ Another model predicts the expected view count
 Linear Poisson model
 MLE on w

 Learning
 ▪ Map: compute each user’s terms of the update
 ▪ Reduce: aggregate them and update w
 Prediction
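A minimal sketch of the MLE for a linear Poisson model, using the standard EM-style multiplicative update for nonnegative Poisson regression (the synthetic data, names, and epsilon terms are mine; the slides do not specify the exact update):

```python
import numpy as np

def poisson_mle_step(X, y, w, eps=1e-12):
    # One multiplicative MLE update for E[y_i] = w . x_i with
    # nonnegative w and x. In a MapReduce version each mapper
    # contributes its user's terms x_ij * y_i / lambda_i and x_ij;
    # reducers sum them per feature j and rescale w_j.
    lam = X @ w + eps
    num = X.T @ (y / lam)
    den = X.sum(axis=0) + eps
    return w * num / den

rng = np.random.default_rng(1)
X = rng.random((200, 3))            # per-user behavior features
w_true = np.array([0.5, 1.5, 0.2])
y = rng.poisson(X @ w_true)         # simulated click counts
w = np.ones(3)
for _ in range(200):
    w = poisson_mle_step(X, y, w)
# each update is guaranteed not to decrease the Poisson log-likelihood
```

The same loop, with view counts as y, fits the view-count model; predicted CTR is then the ratio of the two predictions.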

 Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
 ▪ Classification: Naïve Bayes
 ▪ Clustering: Nonnegative Matrix Factorization
 ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
 ▪ Click Modeling
 ▪ Behavior Targeting
 Conclusions

 Challenges imposed by Web data
 ▪ Scalability of standard algorithms
 ▪ Application-driven customized algorithms
 The capability to consume huge amounts of data outweighs algorithmic sophistication
 ▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant, or even infinite
 MapReduce: a restricted computation model
 ▪ Not omnipotent, but powerful enough
 ▪ The things we want to do turn out to be things we can do

Thank You!
SEWM’10 Keynote, Chengdu, China