Matrix Factorization.


Recovering latent factors in a matrix

V is an n × m matrix (n rows, m columns) with entries v11 … vij … vnm.

Recovering latent factors in a matrix

Approximate the n × m matrix V (entries v11 … vij … vnm) by the product of two thin matrices: an n × K matrix of row factors (rows x1 y1, x2 y2, …, xn yn) times a K × m matrix of column factors (columns a1 b1, a2 b2, …, am bm).

What is this for?

Given V (n × m), find the n × K and K × m factor matrices such that row i of the first, dotted with column j of the second, reconstructs each entry vij.
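
A minimal numpy sketch of the shapes involved (n, m, K and all variable names here are illustrative, not from the slides):

```python
import numpy as np

n, m, K = 5, 7, 2          # n rows, m columns, K latent factors

W = np.random.rand(n, K)   # row factors: one K-dim vector (x_i, y_i) per row
H = np.random.rand(K, m)   # column factors: one K-dim vector (a_j, b_j) per column

V_hat = W @ H              # n x m reconstruction of V
assert V_hat.shape == (n, m)
```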

MF for collaborative filtering

What is collaborative filtering?

Recovering latent factors in a matrix

Here V is n users × m movies, with V[i,j] = user i's rating of movie j.

Recovering latent factors in a matrix

Factor the n users × m movies rating matrix (V[i,j] = user i's rating of movie j) into an n × K matrix of user factors (xi, yi) times a K × m matrix of movie factors (aj, bj).
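
Since most ratings are missing, the reconstruction error is usually measured only over the observed entries. A hedged sketch of that masked loss (toy numbers; using 0 to mark "missing" is an assumption for illustration):

```python
import numpy as np

# Toy ratings: 3 users x 3 movies, 0 marks a missing rating.
V = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [0., 2., 5.]])
observed = V > 0                 # mask of known ratings

W = np.random.rand(3, 2)         # user factors (K = 2)
H = np.random.rand(2, 3)         # movie factors

err = (V - W @ H)[observed]      # compare only where a rating exists
loss = np.sum(err ** 2)
```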

MF for image modeling

MF for images

V is 1000 images × 10,000 pixels, with V[i,j] = pixel j in image i. A rank-2 factorization gives 2 prototype images (PC1 and PC2, the rows of the 2 × 10,000 factor), and each image is summarized by just two coefficients (xi, yi).

MF for modeling text

The Neatest Little Guide to Stock Market Investing
Investing For Dummies, 4th Edition
The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
The Little Book of Value Investing
Value Investing: From Graham to Buffett and Beyond
Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
Investing in Real Estate, 5th Edition
Stock Investing For Dummies
Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TFIDF counts would be better
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Recovering latent factors in a matrix

The doc-term matrix: V is n documents × m terms, with V[i,j] = the TFIDF score of term j in doc i. Factor it into an n × K matrix of document factors (xi, yi) times a K × m matrix of term factors (aj, bj).
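
This is the latent semantic analysis setup; as a sketch, the same factorization can be obtained with scikit-learn's TfidfVectorizer plus TruncatedSVD (the toy documents below are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "stock market investing for dummies",
    "value investing from graham to buffett",
    "investing in real estate",
]

tfidf = TfidfVectorizer()
V = tfidf.fit_transform(docs)        # n documents x m terms, TFIDF scores

svd = TruncatedSVD(n_components=2)   # K = 2 latent dimensions
doc_factors = svd.fit_transform(V)   # n x K   (the rows x_i, y_i)
term_factors = svd.components_       # K x m   (the columns a_j, b_j)
```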

Investing for real estate: Rich Dad’s Advisors: The ABCs of Real Estate Investing; …

The Little Book of Common Sense Investing: …; The Neatest Little Guide to Stock Market Investing

MF is like clustering

k-means as MF

The original data set X (n examples × m features) factors as X ≈ Z M, where Z is an n × r matrix of 0/1 indicators for the r clusters (one indicator per example) and M is the r × m matrix of cluster means.
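
To make the analogy concrete, here is a small sketch of writing a k-means solution as Z · M, with Z holding 0/1 cluster indicators and M the cluster means (data and names are illustrative):

```python
import numpy as np

X = np.array([[1.0, 1.1],
              [0.9, 1.0],
              [5.0, 5.2],
              [5.1, 4.9]])          # n = 4 examples, m = 2 features

assign = np.array([0, 0, 1, 1])     # cluster index per example (r = 2 clusters)

Z = np.eye(2)[assign]               # n x r indicator matrix: a single 1 per row
M = np.vstack([X[assign == k].mean(axis=0) for k in range(2)])   # r x m cluster means

X_hat = Z @ M                       # each row is replaced by its cluster's mean
```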

How do you do it?

Given V (n × m), how do we actually find the n × K factors (rows xi, yi) and the K × m factors (columns aj, bj) whose product approximates the entries vij?

KDD 2011 talk pilfered from  …..

Recovering latent factors in a matrix

V ≈ W H: the n users × m movies rating matrix V (V[i,j] = user i's rating of movie j) is approximated by W (n × K user factors) times H (K × m movie factors).

Matrix factorization as SGD

Take a gradient step on one observed entry at a time, scaling each update by the step size. Why does this work?

Matrix factorization as SGD - why does this work? Here’s the key claim:

Checking the claim

Think of SGD for logistic regression: the LR loss compares y against ŷ = dot(w, x), and SGD updates w. MF is similar, except that each step updates both w (the user's weights) and x (the movie's weights).
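
A sketch of the corresponding SGD step for one observed entry under squared loss (the update form is the standard one; constant factors are folded into the step size eta):

```python
import numpy as np

def sgd_step(V, W, H, i, j, eta=0.01):
    """One SGD update on observed entry V[i, j] under squared error."""
    err = V[i, j] - W[i, :] @ H[:, j]   # residual, like y - y_hat in LR
    w_old = W[i, :].copy()
    W[i, :] += eta * err * H[:, j]      # update the user (row) factors ...
    H[:, j] += eta * err * w_old        # ... and the movie (column) factors
```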

What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies. Another option is the "generalized" KL-divergence.
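
As a reference point, the "generalized" KL-divergence between V and its reconstruction WH is usually written as sum(V·log(V/WH) − V + WH); a small sketch (restricting to nonzero entries of V is an assumption here):

```python
import numpy as np

def generalized_kl(V, WH, eps=1e-12):
    """'Generalized' KL-divergence D(V || WH) = sum V*log(V/WH) - V + WH."""
    mask = V > 0                      # skip zero/missing entries of V
    v, r = V[mask], WH[mask] + eps    # eps guards against log(0) / divide-by-zero
    return np.sum(v * np.log(v / r) - v + r)
```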


ALS = alternating least squares
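
A compact sketch of the ALS idea: fix H and solve a ridge-regularized least-squares problem for W, then swap. This fully observed, dense version is a simplification, and the regularization constant is an assumed default:

```python
import numpy as np

def als(V, K, iters=20, reg=0.1):
    """Alternating least squares on a fully observed matrix V (n x m)."""
    n, m = V.shape
    W = np.random.rand(n, K)
    H = np.random.rand(K, m)
    I = reg * np.eye(K)
    for _ in range(iters):
        # Fix H, solve for W: each row of W is a ridge-regression solution.
        W = np.linalg.solve(H @ H.T + I, H @ V.T).T
        # Fix W, solve for H.
        H = np.linalg.solve(W.T @ W + I, W.T @ V)
    return W, H
```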

KDD 2011 talk pilfered from  …..

Similar to McDonnell et al., with perceptron learning.

Slow convergence…..

More detail….

- Randomly permute the rows/cols of the matrix
- Chop V, W, H into blocks of size d × d
  - m/d blocks in W, n/d blocks in H
- Group the data:
  - Pick a set of blocks with no overlapping rows or columns (a stratum)
  - Repeat until all blocks in V are covered
- Train with SGD (see the sketch after this list):
  - Process strata in series
  - Process blocks within a stratum in parallel
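
A toy sketch of the blocking and stratum bookkeeping described above (square blocking with n = m is assumed to keep the example short; the real implementation in the talk ran on Hadoop):

```python
# Toy bookkeeping for the stratified schedule (no actual SGD here).
n, m, d = 6, 6, 2                    # matrix size and block size
nb, mb = n // d, m // d              # number of row / column blocks

covered = set()
for offset in range(mb):             # process strata in series
    # One stratum: row block r is paired with column block (r + offset) % mb,
    # so no two blocks in the stratum share rows or columns.
    stratum = [(r, (r + offset) % mb) for r in range(nb)]
    # The SGD updates for these blocks touch disjoint rows of W and disjoint
    # columns of H, so the blocks in one stratum can be trained in parallel.
    covered.update(stratum)

assert len(covered) == nb * mb       # after all strata, every block of V is visited
```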

More detail…. (In the figures that follow, the data matrix we have been calling V is labeled Z.)

More detail….

- Initialize W, H randomly (not at zero!)
- Choose a random ordering (random sort) of the points in a stratum in each "sub-epoch"
- Pick the strata sequence by permuting the rows and columns of M, and using M'[k,i] as the column index of row i in sub-epoch k
- Use the "bold driver" heuristic to set the step size (see the sketch below):
  - increase the step size when the loss decreases (over an epoch)
  - decrease the step size when the loss increases
- Implemented in Hadoop and R/Snowfall
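
A sketch of the "bold driver" rule above; the 5% increase and 50% cut are common choices assumed here, not values from the slides:

```python
def bold_driver(eta, prev_loss, curr_loss, up=1.05, down=0.5):
    """Grow the step size while the epoch loss keeps falling,
    shrink it sharply as soon as the loss goes up."""
    if curr_loss < prev_loss:
        return eta * up     # loss decreased: be bolder
    return eta * down       # loss increased: back off

# usage inside the training loop (hypothetical names):
# eta = bold_driver(eta, loss_history[-2], loss_history[-1])
```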

[Figure: wall clock time; 8 nodes, 64 cores, R/snow]

[Figure: number of epochs]

[Figure: varying the rank; 100 epochs for all runs]

Hadoop scalability: Hadoop process setup time starts to dominate.

[Figure: Hadoop scalability]