
Yue Xu, Shu Zhang

• A person has already rated some movies; which other movies might he/she be interested in?  • If we have huge data on users and movies, this problem can be tackled by several high-dimensional data algorithms.

• Dataset
MovieLens (a movie recommendation website): 100,000 ratings, 943 users, 1682 movies, collected between September 1997 and April 1998.
• Information
Ratings range from 1 to 5 as integers, from most disliked to most liked.
The data has been cleaned up: each user has rated at least 20 movies.
User information is given (age, gender, occupation).
Simple movie genres are given for each movie.

• Missing Data
The data can be arranged as a 1682×943 (movie × user) matrix, but not all users have rated all of the movies, so the matrix is sparse.
• Objectives
Use K-means clustering and a sparse-SVD algorithm (fitted by stochastic gradient descent) to find the missing ratings in the sparse rating matrix.
Use the Root Mean Squared Error (RMSE) to evaluate the results.
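As a concrete illustration, here is a minimal sketch (Python with NumPy) of assembling this movie × user matrix, assuming the standard MovieLens 100K u.data layout (tab-separated user id, item id, rating, timestamp); the file path is an assumption:

```python
import numpy as np

# Load MovieLens 100K ratings; u.data is tab-separated:
# user_id, item_id, rating, timestamp (path is an assumption)
raw = np.loadtxt("ml-100k/u.data", dtype=int)

n_movies, n_users = 1682, 943
R = np.zeros((n_movies, n_users))       # movie x user; 0 marks a missing rating
for user, movie, rating, _ in raw:
    R[movie - 1, user - 1] = rating     # IDs in the raw data are 1-based
```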

• Data Preprocessing
Randomly choose 90% of the data as training data and the remaining 10% as test data.
• Evaluation
Compute the full rating matrix from the training data, then calculate the RMSE on the test data to evaluate the results.
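A minimal sketch of the 90/10 split and the RMSE evaluation used throughout, building on the matrix R from the previous sketch; the helper and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

obs = np.argwhere(R > 0)                # (movie, user) indices of observed ratings
rng.shuffle(obs)
split = int(0.9 * len(obs))             # 90% train / 10% test
train_idx, test_idx = obs[:split], obs[split:]

R_train = np.zeros_like(R)
R_train[train_idx[:, 0], train_idx[:, 1]] = R[train_idx[:, 0], train_idx[:, 1]]

def rmse(R_pred, R_true, idx):
    """Root Mean Squared Error over held-out (movie, user) pairs."""
    err = R_pred[idx[:, 0], idx[:, 1]] - R_true[idx[:, 0], idx[:, 1]]
    return float(np.sqrt(np.mean(err ** 2)))
```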

• Movie Data
The raw data gives each movie some simple genres: 19 genres, encoded 1 = yes, 0 = no, forming a 1682×19 matrix with rows of the form
movie id | unknown | Action | Adventure | Animation | ...
• Clustering
Run k-means with k = 4 clusters on this genre matrix (sketched below).
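A sketch of this clustering step with scikit-learn's KMeans, assuming the MovieLens 100K u.item layout in which the last 19 pipe-separated columns are the binary genre flags; the path and encoding are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# u.item is pipe-separated; columns 5..23 are the 19 binary genre flags
# (path and latin-1 encoding are assumptions about the local copy)
genres = np.genfromtxt("ml-100k/u.item", delimiter="|",
                       usecols=range(5, 24), encoding="latin-1")   # 1682 x 19

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(genres)
movie_cluster = kmeans.labels_          # cluster number for movie ID i+1 at index i
```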

• Get missing ratings by clustering
1. Append a movie-cluster-number column to the (user ID, movie ID, rating) table: by looking up the movie ID, this 4th column shows which cluster each movie is in.
2. Form a 1682×943 matrix indexed by movie ID and user ID. Zeros are the missing ratings.

• Get missing ratings by clustering
3. Wherever there is a missing rating (a zero) in the matrix, locate the user ID and the cluster number.
4. Calculate that user's average rating over the cluster and use this value in place of the zero (sketched below).
RMSE = 1.119
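A sketch of this cluster-average imputation, reusing R_train, movie_cluster, test_idx, and the rmse helper from the earlier sketches:

```python
import numpy as np

R_filled = R_train.copy()
n_movies, n_users = R_train.shape

for u in range(n_users):
    for c in range(4):
        in_cluster = movie_cluster == c
        rated = in_cluster & (R_train[:, u] > 0)
        if rated.any():                 # clusters the user never rated stay missing
            avg = R_train[rated, u].mean()
            R_filled[in_cluster & (R_train[:, u] == 0), u] = avg

print("K-means clustering RMSE:", rmse(R_filled, R, test_idx))
```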

• This approach is an extension of the SVD method. It models both users and movies by giving them coordinates in a low-dimensional feature space.
• Set all missing values of the 1682×943 rating matrix to zero for modeling.
• R = UΣVᵀ = MovieFactor · UserFactorᵀ, where MovieFactor = U and UserFactorᵀ = ΣVᵀ.
• This gives a low-rank movie factor and user factor derived from the huge sparse rating matrix; each rating is an inner product of a movie vector and a user vector.
• Therefore, to find the missing values of the rating matrix, we need to calculate the movie-factor and user-factor matrices.
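A sketch of this factorization using NumPy's SVD; the slides do not state the rank k, so the value below is an assumption:

```python
import numpy as np

k = 20                                   # rank of the factorization (an assumption)
U, s, Vt = np.linalg.svd(R_train, full_matrices=False)

MovieFactor = U[:, :k]                   # 1682 x k: one row per movie
UserFactorT = np.diag(s[:k]) @ Vt[:k, :] # k x 943:  Sigma V^T, one column per user
R_approx = MovieFactor @ UserFactorT     # rank-k approximation of the rating matrix
```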

• Calculate MovieFactor (m) & UserFactor (u)
Minimize the sum of squared errors over the known ratings, with a regularization term to counter overfitting:
SSE = Σ (r_ij − m_i·u_j)² + λ (‖m_i‖² + ‖u_j‖²)
• Overfitting
A statistical model that describes random error or noise instead of the underlying relationship.
Sweeping the learning rate from μ = 0.1 down to μ = 0.01 to find the minimum SSE, μ = 0.03 is the optimum.

• Choosing the regularization parameter
λ = 0 (no regularization): the training-set error decreases a lot with increasing k, but the test-set error increases.
λ = 0.2: both the training-set and test-set errors decrease a lot with increasing k.
So we apply λ = 0.2.

• Calculate MovieFactor (m) & UserFactor (u)
1. Initialize from the SVD of r:
m = U (the left singular vectors)
uᵀ = ΣVᵀ (the product of the singular values and the right singular vectors)
2. Stochastic gradient descent, with μ = 0.03 and λ = 0.2:
e_ij = 2 (r_ij − m_i·u_j)
m_i ← m_i + μ (e_ij·u_j − λ·m_i), where m_i is the i-th row of m
u_j ← u_j + μ (e_ij·m_i − λ·u_j), where u_j is the j-th column of uᵀ
3. Iterate to get the final values (sketched below).
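A sketch of this SGD loop with the stated μ = 0.03 and λ = 0.2, initialized from the SVD factors of the previous sketch; the 40 iterations match the count reported on the next slide:

```python
import numpy as np

mu, lam, n_iters = 0.03, 0.2, 40        # learning rate, regularization, iterations
m = MovieFactor.copy()                  # 1682 x k, from the SVD initialization
ut = UserFactorT.copy()                 # k x 943

rng = np.random.default_rng(0)
for _ in range(n_iters):
    rng.shuffle(train_idx)              # visit observed training ratings in random order
    for i, j in train_idx:
        e = 2 * (R_train[i, j] - m[i] @ ut[:, j])   # e_ij = 2 (r_ij - m_i . u_j)
        m_i_old = m[i].copy()           # keep the old row for the u_j update
        m[i] += mu * (e * ut[:, j] - lam * m[i])
        ut[:, j] += mu * (e * m_i_old - lam * ut[:, j])

print("SGD RMSE:", rmse(m @ ut, R, test_idx))
```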

• Get Missing Values
r = m·uᵀ
• Evaluation
RMSE = 0.95 after 40 iterations.

• Evaluation by RMSE
- K-means clustering: 1.15. There are only 4 clusters, so a user gets the same predicted rating for every movie in a cluster, and ratings are still missing when a user has not seen any movie in a cluster.
- Stochastic gradient descent: 0.95, much better! It considers both user and movie factors.

• We computed the missing values of a very sparse matrix using two dimensionality-reduction algorithms.
• Stochastic gradient descent performs much better than k-means clustering: an RMSE of 0.95 is achieved.
• Other methods may also work, such as Bayesian networks.
• This result can be used to recommend movies to a user who has already rated some other movies.

Thank You For Listening! Any Questions?