Step-By-Step Instructions for Miniproject 2


Matrix Factorization with Missing Values
- Model: approximate the M x N ratings matrix Y (M users, N movies, with missing values) as $Y \approx U^\top V$, where $U \in \mathbb{R}^{K \times M}$ holds the users' latent factors and $V \in \mathbb{R}^{K \times N}$ holds the movies' latent factors, for K "latent factors".
- Goal #1: Learn a latent factor model U & V.
- Goal #2: Visualize & interpret U & V (mostly V).
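To make the shapes concrete, here is a minimal numpy sketch of the factorization (toy sizes; all names are illustrative, not part of the project spec):

```python
import numpy as np

M, N, K = 5, 7, 3            # users, movies, latent factors (toy sizes)
U = np.random.randn(K, M)    # each column u_i is one user's latent factors
V = np.random.randn(K, N)    # each column v_j is one movie's latent factors
Y_hat = U.T @ V              # M x N matrix of predicted ratings
assert Y_hat.shape == (M, N)
```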

Final Product: Create Something Like This
- See the 2D movie-embedding figure in Koren, Bell & Volinsky: http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf
- Your visualization will probably not be as clean as that one; that is OK.
- You need to create your own visualization (it will project movies/users onto the 2-dimensional plane differently than the example).
- You need to interpret your dimensions and/or clusters of movies in your projection.

Outline
- Step 1: Learn U & V.
- Step 2: Project U & V down to 2 dimensions (basically SVD in Matlab or Python).
- Step 3: Plot the projected U & V, and give your own interpretation of the two projected dimensions.

Step-By-Step Instructions for Miniproject 2 Step 1: Learning U & V You don’t have to solve this exact objective. (many off-the-shelf solve something related.) Choice of regularization doesn’t matter too much S = set of indices (i,j) of observed ratings Use off-the-shelf-software And your own implementation Step-By-Step Instructions for Miniproject 2

Off-the-Shelf Software
- Search for "Collaborative Filtering Matlab", "Collaborative Filtering Python", or "Collaborative Filtering code".
- https://cambridgespark.com/content/tutorials/implementing-your-own-recommender-systems-in-Python/index.html
- https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/
- http://surpriselib.com/
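For example, with the Surprise package linked above, a biased matrix factorization can be fit in a few lines. A sketch, assuming a recent Surprise version; check its docs for the options your version supports:

```python
from surprise import SVD, Dataset

data = Dataset.load_builtin('ml-100k')              # MovieLens 100k ratings
trainset = data.build_full_trainset()
algo = SVD(n_factors=20, biased=True, reg_all=0.1)  # factorization with bias terms
algo.fit(trainset)
U = algo.pu.T                                       # K x M latent user factors
V = algo.qi.T                                       # K x N latent movie factors
```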

Step 1b: Learning U & V (More Advanced) Choice of regularization doesn’t matter too much S = set of indices (i,j) of observed ratings Vector of bias/offset terms One for each user & movie Model the global tendency of a movie’s average rating Model the global tendency of how a user rates on average This keeps U & V more focused on variability between users and movies. Should be an option that you can turn on in many off-the-shelf implementations Step-By-Step Instructions for Miniproject 2

Step 1c: Learning U & V (Even More Advanced) Choice of regularization doesn’t matter too much μ is average of all observations in Y S = set of indices (i,j) of observed ratings Vector of bias/offset terms One for each user & movie Model global bias μ as average over all observed Y Treat a as user-specific deviation from global bias Treat b as movie-specific deviation from global bias Should be an option that you can turn on in many off-the-shelf implementations Step-By-Step Instructions for Miniproject 2

Step 1: Interpretation
- We now have a common K-dimensional representation over users & movies.
- A rating is defined by the dot product (aka un-normalized cosine similarity): $y_{ij} \approx u_i^\top v_j$, or $y_{ij} \approx \mu + a_i + b_j + u_i^\top v_j$ with biases.
- Does our representation make sense (i.e., is it interpretable)? We need to visualize!
- But we can only (easily) visualize 2-dimensional points, not K-dimensional points.
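As a concrete illustration of the dot-product prediction and its relation to cosine similarity (toy vectors, purely illustrative):

```python
import numpy as np

u = np.array([0.5, 1.2, -0.3])   # a user's latent factors (toy values)
v = np.array([0.8, 0.9, 0.1])    # a movie's latent factors (toy values)
pred = u @ v                     # un-normalized dot product = predicted rating
cos = pred / (np.linalg.norm(u) * np.linalg.norm(v))   # normalized cosine similarity
```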

Step 2: Projecting U & V to 2 Dimensions Step 2a: (Optional) mean center V: each row of V has zero mean Compute SVD of V: The first two columns of A correspond to best 2-dimensional projection of movies V Orthogonal Orthogonal Diagonal Step-By-Step Instructions for Miniproject 2

Step 2: Projecting U & V to 2 Dimensions Step 2b: Project every movie & user using A1:2 Now each user & movie is represented using a two dimensional point. Visualize and interpret! If you mean centered V, you need to shift U by same amount first Step-By-Step Instructions for Miniproject 2

Step 2: Projecting U & V to 2 Dimensions Step 2c (optional): Do Steps 2a & 2b: Then rescale dimensions: E.g., each row of Ũ has unit variance. Otherwise, visualization might look stretched: Step-By-Step Instructions for Miniproject 2

Step 2: Interpretation
- The top D columns of A define a D-dimensional projection that best preserves the learned movie features V, minimizing the loss of feature representation:
  $A_{1:D} = \arg\min_{A} \left\| V - A A^\top V \right\|_F^2$
  where $A^\top V$ is the projected representation and $\|V - A A^\top V\|_F^2$ is the preservation loss of the projection.
- We want a 2-dimensional projection for visualization purposes, so we take the top 2 dimensions of the SVD.
- Now we can visualize movies in a 2D plot and see if close-by movies have similarities (e.g., horror, action, etc.).
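A quick numerical sanity check of this claim, reusing V_mean and A2 from the sketch above: the top-2 SVD basis should beat any other rank-2 projection.

```python
# Preservation loss of the SVD basis vs. a random orthonormal rank-2 basis
Vc = V - V_mean
err_svd = np.linalg.norm(Vc - A2 @ A2.T @ Vc)
Q, _ = np.linalg.qr(np.random.randn(V.shape[0], 2))   # random orthonormal 2-D basis
err_rand = np.linalg.norm(Vc - Q @ Q.T @ Vc)
assert err_svd <= err_rand + 1e-9                     # Eckart-Young optimality
```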

Step 2: Alternatives & Core Requirements You don’t have to do it the above way Although the above method should always give you something reasonable to visualize Core requirement: Projection should preserves as much of the original features as possible A dot product in the 2-D representation should approximate the dot product in the K-D representation Step-By-Step Instructions for Miniproject 2

Step-By-Step Instructions for Miniproject 2 Step 3: Plot U & V Plotting V is more important: Pick a few movies and plot their projected 2D representation Verify that distances/angles/axes in your plot can be interpreted Can also plot the genres provided: E.g., where is the average horror movie? E.g., compute the average v for all movies that belong to horror genre (Your visualization will probably not be as clean as this one, that is OK) Example: Step-By-Step Instructions for Miniproject 2

My Own Example
- Trained using Step 1c (λ = 10) with stochastic gradient descent, then took the SVD of the movie matrix and projected onto the top 2 bases.
- Picked a few popular movies and plotted them, then found a few extreme points (e.g., A Clockwork Orange).
- Removed most children's movies (they didn't seem to project well using the first two SVD bases; maybe most ratings are by adults).
- Observations in the resulting plot: Free Willy movies close together; a region of action movies; a region of mostly sci-fi & horror movies; a region of more historical movies (Jurassic Park excepted); Star Wars movies close together.