Matrix Factorization Reporter: Sun Yuanshuai

Presentation transcript:

Matrix Factorization
Reporter: Sun Yuanshuai

Content
1. MF Introduction
2. Application Area
3. My Work
4. Difficulty in my work

MF Introduction
Matrix factorization (abbr. MF), just as the name suggests, decomposes a large matrix into a product of several smaller matrices. Mathematically, it is defined as R ≈ U Vᵀ, where we assume the target matrix R ∈ R^(m×n) and the factor matrices U ∈ R^(m×K) and V ∈ R^(n×K), with K << min(m, n), so the factorization is low-rank.
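As a quick illustration of these shapes (toy sizes, not from the slides), a rank-K factorization can be written with NumPy as follows:

```python
import numpy as np

# Illustrative shapes only: R (m x n) is approximated by U (m x K) times V^T (K x n).
m, n, K = 6, 5, 2              # toy sizes; in practice K << min(m, n)
rng = np.random.default_rng(0)

U = rng.normal(size=(m, K))    # left factor matrix
V = rng.normal(size=(n, K))    # right factor matrix
R_hat = U @ V.T                # rank-K approximation of the target matrix R

print(R_hat.shape)             # (6, 5)
```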

MF Introduction
We quantify the quality of the approximation with the Euclidean distance, which gives the objective function f(U, V) = Σ_(i,j) (r_ij − r̂_ij)², where r̂_ij = u_i · v_j is the predicted value.
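A minimal sketch of this objective, assuming the recommender-style setting where only some entries (i, j, r) of R are observed (the names `mf_loss` and `observed` are illustrative, not from the slides):

```python
import numpy as np

def mf_loss(U, V, observed):
    """Squared-error objective f(U, V) = sum over observed (i, j, r) of (r - u_i . v_j)^2."""
    return sum((r - U[i] @ V[j]) ** 2 for i, j, r in observed)

# Toy example: three observed ratings (i, j, r).
U = np.ones((4, 2)) * 0.5
V = np.ones((3, 2)) * 0.5
observed = [(0, 0, 1.0), (1, 2, 0.0), (3, 1, 0.5)]
print(mf_loss(U, V, observed))  # 0.5
```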

MF Introduction
1. Alternating Descent Method
This method only applies when the loss function is the squared Euclidean distance. Fixing V, the subproblem in U becomes an ordinary least-squares problem and can be solved in closed form; the same holds for V with U fixed.
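A minimal sketch of one alternating sweep, assuming a fully observed R and no regularization (the closed forms U = R V (VᵀV)⁻¹ and V = Rᵀ U (UᵀU)⁻¹ are the standard least-squares solutions; the slide's own formulas are not shown in the transcript):

```python
import numpy as np

def als_step(R, U, V):
    """One alternating-least-squares sweep for R ~ U V^T (fully observed, no regularization)."""
    # Fix V, solve min_U ||R - U V^T||_F^2  =>  U = R V (V^T V)^{-1}
    U = R @ V @ np.linalg.inv(V.T @ V)
    # Fix U, solve the symmetric problem    =>  V = R^T U (U^T U)^{-1}
    V = R.T @ U @ np.linalg.inv(U.T @ U)
    return U, V

rng = np.random.default_rng(1)
R = rng.normal(size=(8, 6))
U, V = rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
for _ in range(20):
    U, V = als_step(R, U, V)
print(np.linalg.norm(R - U @ V.T))  # approaches the best rank-2 approximation error
```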

MF Introduction
2. Gradient Descent Method
The update rule for U is U ← U − η ∂f/∂U, where η is the step size and, for the objective above, ∂f/∂u_i = −2 Σ_j (r_ij − u_i · v_j) v_j. The same applies to V.

MF Introduction
Gradient Algorithm and Stochastic Gradient Algorithm
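The batch gradient algorithm takes a step along the full gradient of f in every update, while the stochastic version updates from one observed rating at a time. A minimal sketch of one stochastic epoch (the function and variable names are illustrative, not from the slides):

```python
import random
import numpy as np

def sgd_epoch(observed, U, V, eta=0.01):
    """One stochastic-gradient epoch: take a small step for each observed rating
    (i, j, r) instead of one step along the full gradient of f."""
    order = list(observed)
    random.shuffle(order)                  # visit ratings in random order
    for i, j, r in order:
        err = r - U[i] @ V[j]              # prediction error on this single entry
        ui = U[i].copy()
        U[i] += eta * err * V[j]           # step along -grad of (r - u_i . v_j)^2 / 2 w.r.t. u_i
        V[j] += eta * err * ui             # symmetric step for v_j
    return U, V

# Example: a few epochs on the toy data from the loss sketch above.
U = np.random.default_rng(0).normal(size=(4, 2))
V = np.random.default_rng(1).normal(size=(3, 2))
observed = [(0, 0, 1.0), (1, 2, 0.0), (3, 1, 0.5)]
for _ in range(100):
    U, V = sgd_epoch(observed, U, V, eta=0.05)
```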

MF Introduction
Online Algorithm: "Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems"

MF Introduction
Loss Function
We update the factor V to reduce the objective function f with conventional gradient descent, i.e. V ← V − η ∂f/∂V. Here we choose the step size η so that the objective actually decreases at each step; the same applies to the factor matrix U.

MF Introduction
Here we proceed under the assumption that SSGD (stratified stochastic gradient descent) converges to a set of stationary points.

MF Introduction
The idea of DSGD is to specialize the SSGD algorithm, choosing strata with a special layout such that SGD can be run on each stratum in a distributed manner. Note that there is a dependence between the current solution and the previous one produced by the iteration, i.e. the previous solution has to be known before the current one can be computed. To solve this problem, we introduce the notion of interchangeability:

MF Introduction
From the definition of interchangeability we obtain the following theorem: two blocks that share no rows and no columns are interchangeable. By the theorem, a block-diagonal training matrix can be processed in parallel, i.e. each block can be computed independently.
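A minimal sketch of the condition, assuming a block is described by its row-index and column-index ranges (the helper name is illustrative): two blocks are interchangeable when they share no rows and no columns, so SGD steps on one block never touch the factor rows used by the other.

```python
def interchangeable(block_a, block_b):
    """True if the two blocks share no rows and no columns, so SGD updates
    on one block do not read or write parameters used by the other."""
    rows_a, cols_a = block_a
    rows_b, cols_b = block_b
    return not (set(rows_a) & set(rows_b)) and not (set(cols_a) & set(cols_b))

# Blocks are given as (row indices, column indices).
b1 = (range(0, 100), range(0, 100))
b2 = (range(100, 200), range(100, 200))
b3 = (range(0, 100), range(100, 200))
print(interchangeable(b1, b2))  # True: disjoint rows and columns
print(interchangeable(b1, b3))  # False: they share rows 0..99
```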

MF Introduction
We can therefore compute a block-diagonal matrix in parallel. Our target, however, is to parallelize the decomposition of a general matrix. How can we achieve that? We stratify the input matrix so that each stratum satisfies the interchangeability condition. Assume we cut the input matrix into 3×3 blocks, as in the sketch below.
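One standard way to form such strata for a 3×3 blocking is to take the three "generalized diagonals": each stratum then contains three blocks with pairwise disjoint rows and columns, so three workers can process them simultaneously. (This cyclic layout is an assumption, since the slide's figure is not in the transcript.)

```python
# For a 3x3 blocking, stratum s contains the blocks (b, (b + s) mod 3) for b = 0, 1, 2.
# Blocks within a stratum share no rows and no columns, so their SGD updates
# are interchangeable and can run on different workers at the same time.
B = 3
strata = [[(b, (b + s) % B) for b in range(B)] for s in range(B)]
for s, stratum in enumerate(strata):
    print(f"stratum {s}: {stratum}")
# stratum 0: [(0, 0), (1, 1), (2, 2)]
# stratum 1: [(0, 1), (1, 2), (2, 0)]
# stratum 2: [(0, 2), (1, 0), (2, 1)]
```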

Application Area
Any area where dyadic data can be generated.
Dyadic data: in general, dyadic data are measurements on dyads, i.e. pairs of elements coming from two sets. For example, a rating record (userId, itemId, rating), or a customer buying a product.

My Work

[Figure: the target matrix is approximated as Left Matrix × Right Matrix, with the computation split into parts that are summed.]

Difficulty in my work
DataSet
I use a total of 5 jobs, but the job cannot run, because the data generated during the procedure is too big: about 6000 GB. Analyzed as follows: the left matrix is roughly 300 thousand × 250 thousand, and the right matrix F is roughly 250 thousand × 10, so the intermediate data generated is about 300K × 250K × 10 × 8 bytes ≈ 6000 GB, where the 8 is the number of bytes needed to store a double.
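A one-line check of that estimate (using the sizes above):

```python
# Intermediate data estimate from the slide: left-matrix dimensions (300K x 250K)
# times the factor width (10) times 8 bytes per double.
rows, cols, k, bytes_per_double = 300_000, 250_000, 10, 8
total_bytes = rows * cols * k * bytes_per_double
print(total_bytes / 1e9, "GB")   # prints 6000.0 GB
```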

Difficulty in my work
The techniques I have used: Combiner, Compression.

THANK YOU FOR YOUR TIME!