Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce

Challenge: the scalability of available tools.
Definition: in general, dyadic data are measurements on dyads, i.e., pairs of elements, one from each of two sets.
Features: high-dimensional, sparse, nonnegative, dynamic.

What we do In this paper, by observing that a variety of Web dyadic data conform to different probabilistic distributions, we put forward a probabilistic NMF framework, which not only encompasses the two classic variants, Gaussian NMF (GNMF) and Poisson NMF (PNMF), but also introduces Exponential NMF (ENMF) for modeling Web lifetime dyadic data. Furthermore, we scale NMF up to arbitrarily large matrices on MapReduce clusters.

GNMF Assuming each entry A_ij is Gaussian with mean (WH)_ij, maximizing the likelihood of observing A w.r.t. W and H under the i.i.d. assumption is equivalent to minimizing the squared reconstruction error ||A - WH||_F^2 = sum_ij (A_ij - (WH)_ij)^2.

PNMF Assuming each entry A_ij is Poisson with mean (WH)_ij, maximizing the likelihood of observing A is equivalent to minimizing the generalized KL divergence sum_ij ((WH)_ij - A_ij log (WH)_ij), dropping terms that do not depend on W and H.

ENMF Assuming each entry A_ij is exponentially distributed with mean (WH)_ij, maximizing the likelihood of observing A w.r.t. W and H is equivalent to minimizing sum_ij (log (WH)_ij + A_ij / (WH)_ij).
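As a minimal sketch of the three objectives above (dense numpy, fully observed A assumed; this is illustrative only and not the paper's implementation):

```python
import numpy as np

def gnmf_loss(A, W, H):
    """GNMF: squared Frobenius error between A and its reconstruction WH."""
    return np.sum((A - W @ H) ** 2)

def pnmf_loss(A, W, H, eps=1e-12):
    """PNMF: generalized KL divergence (terms constant in W and H dropped)."""
    R = W @ H + eps
    return np.sum(R - A * np.log(R))

def enmf_loss(A, W, H, eps=1e-12):
    """ENMF: exponential negative log-likelihood with mean (WH)_ij."""
    R = W @ H + eps
    return np.sum(np.log(R) + A / R)
```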

(a) is similar to the CUDA programming model, while (b) is the MapReduce programming model. Scheme (a) places a full row or column in shared memory and computes on it with the multi-core GPU, so if a column is too long to fit in shared memory, (a) cannot work efficiently because the access overhead becomes large.
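To make the contrast concrete, here is a toy simulation of scheme (b): A is streamed one row at a time and per-row partial products are summed, so no full column ever has to reside in memory. The mapper/reducer functions below are a hedged sketch, not the paper's Hadoop code.

```python
import numpy as np

def map_row(i, a_i, W):
    """Mapper: for row i of A, emit the partial product W_i^T * a_i (a k x n slab)."""
    return np.outer(W[i], a_i)

def reduce_partials(partials):
    """Reducer: sum the per-row partial products to obtain X = W^T A."""
    return sum(partials)

# toy data
m, n, k = 6, 4, 2
A = np.random.rand(m, n)
W = np.random.rand(m, k)
X = reduce_partials(map_row(i, A[i], W) for i in range(m))
assert np.allclose(X, W.T @ A)
```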

The updating formula for H (Eqn. 1) is composed of three components: the numerator X = W^T A, the denominator Y = (W^T W) H, and the element-wise update H <- H * X / Y (multiplication and division taken entry-wise).

Then, with the small k x k matrix C = W^T W, we compute Y = CH.
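Putting the components together, a hedged numpy sketch of one multiplicative update of H (dense matrices for readability; in the distributed setting X and C would be produced by MapReduce jobs and Y = CH computed afterwards):

```python
import numpy as np

def update_H(A, W, H, eps=1e-12):
    X = W.T @ A                # numerator component, k x n
    C = W.T @ W                # small k x k matrix, cheap to ship to every worker
    Y = C @ H                  # denominator component, k x n
    return H * X / (Y + eps)   # element-wise multiplicative update

# toy usage
m, n, k = 100, 80, 5
A = np.random.rand(m, n)
W = np.random.rand(m, k)
H = np.random.rand(k, n)
H = update_H(A, W, H)
```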

Experiments The experiments are based on GNMF. Sandbox experiment: designed for a clear understanding of the algorithm rather than for showcasing scalability; all reported times are for a single iteration. Experiment on real Web data: investigates the effectiveness of this approach, with a particular focus on its scalability, because a method that does not scale well will be of limited utility in practice.

Sandbox Experiment We generate sample matrices with m rows and n columns, where m = 2^17, n = 2^16, and the sparsity (fraction of nonzero cells) is 2^-10 or 2^-7.
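A possible way to generate such sandbox matrices (the authors' actual generator is not described, so treat this as an assumption; at full size the 2^-7 setting yields roughly 67 million nonzeros, so scale m and n down to run it locally):

```python
import scipy.sparse as sp

m, n = 2 ** 17, 2 ** 16
for sparsity in (2 ** -10, 2 ** -7):
    # sparsity = fraction of nonzero cells; values drawn uniformly from [0, 1)
    A = sp.random(m, n, density=sparsity, format="csr")
    print(f"sparsity={sparsity}: {A.nnz} nonzero cells")
```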

Performance The time taken for one iteration increases as the number of nonzero cells in A increases.

Performance The time taken increases as k grows. The slope for sparsity 2^-10 is much smaller than that for sparsity 2^-7, and both slopes are smaller than 1.

Performance The ideal speedup lies along the diagonal, which upper-bounds the practical speedup because of Amdahl's law. On the matrix with sparsity 2^-7, the speedup is nearly 5 when 8 workers are enabled. As the sparsity decreases, extra overhead such as the shuffle becomes the dominant factor.

Experiment on real Web data We record a UID-by-website count matrix involving 17.8 million distinct UIDs and 7.24 million distinct websites, which serves as the training set. For the effectiveness test, we take the 1000 UIDs with the largest number of associated websites, which gives us their distinct (UID, website) pairs. To measure effectiveness, we randomly hold out one visited website for each such UID and mix it with another 99 unvisited sites; these 100 websites constitute the test case for that UID. The goal is to check the rank of the held-out visited website among the 100 websites for each UID, and the overall effectiveness is measured by the average rank across the 1000 test cases: the smaller, the better.
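A sketch of that holdout-ranking evaluation, assuming the learned factors W (UID x k) and H (k x website) score each (UID, website) pair by their inner product; the function and variable names here are hypothetical, not from the paper's code:

```python
import numpy as np

def average_holdout_rank(W, H, test_cases):
    """test_cases: list of (uid_index, holdout_site_index, list_of_99_unvisited_site_indices)."""
    ranks = []
    for u, holdout, negatives in test_cases:
        candidates = [holdout] + list(negatives)   # 100 websites per UID
        scores = W[u] @ H[:, candidates]           # predicted affinities
        # rank of the held-out visited site among the 100 candidates (1 = best)
        ranks.append(1 + np.sum(scores > scores[0]))
    return float(np.mean(ranks))                   # smaller is better
```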

Scalability On the one hand, the elapsed time increases linearly with the number of iterations; on the other hand, the average time per iteration becomes smaller when more iterations are executed in one job.

Scalability It reveals the linear relationship between the elapsed time and the dimensionality k.

Scalability It lists how the algorithm scales with increasingly larger data sampled from increasingly longer time periods.

Conclusion