EMIS 8381 – Spring 2012: Netflix and Your Next Movie Night – Nonlinear Programming – Ron Andrews

Similar presentations
Nonnegative Matrix Factorization with Sparseness Constraints S. Race MA591R.

Item Based Collaborative Filtering Recommendation Algorithms
Self-Organizing Maps Projection of p dimensional observations to a two (or one) dimensional grid space Constraint version of K-means clustering –Prototypes.
Component Analysis (Review)
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Dimensionality reduction. Outline From distances to points : – MultiDimensional Scaling (MDS) Dimensionality Reductions or data projections Random projections.
Dimensionality Reduction PCA -- SVD
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Non-linear Dimensionality Reduction CMPUT 466/551 Nilanjan Ray Prepared on materials from the book Non-linear dimensionality reduction By Lee and Verleysen,
Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.
A shot at Netflix Challenge Hybrid Recommendation System Priyank Chodisetti.
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Machine Learning Motivation for machine learning How to set up a problem How to design a learner Introduce one class of learners (ANN) –Perceptrons –Feed-forward.
Bayesian belief networks 2. PCA and ICA
Recommender systems Ram Akella November 26 th 2008.
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
E.G.M. Petrakis – Dimensionality Reduction: Given N vectors in n dims, find the k most important axes to project them; k is user defined (k < n). Applications.
DATA MINING LECTURE 7 Dimensionality Reduction PCA – SVD
Radial Basis Function (RBF) Networks
Chapter 12 (Section 12.4) : Recommender Systems Second edition of the book, coming soon.
Summarized by Soo-Jin Kim
Principle Component Analysis (PCA) Networks (§ 5.8) PCA: a statistical procedure –Reduce dimensionality of input vectors Too many features, some of them.
NUS CS5247 A dimensionality reduction approach to modeling protein flexibility By, By Miguel L. Teodoro, George N. Phillips J* and Lydia E. Kavraki Rice.
This week: overview on pattern recognition (related to machine learning)
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Feature extraction 1.Introduction 2.T-test 3.Signal Noise Ratio (SNR) 4.Linear Correlation Coefficient (LCC) 5.Principle component analysis (PCA) 6.Linear.
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu – Harbin Institute of Technology, School of Computer Science and Technology.
Authors: Rosario Sotomayor, Joe Carthy and John Dunnion Speaker: Rosario Sotomayor Intelligent Information Retrieval Group (IIRG) UCD School of Computer.
ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
SINGULAR VALUE DECOMPOSITION (SVD)
Computational Intelligence: Methods and Applications Lecture 23 Logistic discrimination and support vectors Włodzisław Duch Dept. of Informatics, UMK Google:
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 6. Dimensionality Reduction.
A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining Farial Shahnaz.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
CSC2515: Lecture 7 (post) Independent Components Analysis, and Autoencoders Geoffrey Hinton.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Speech Lab, ECE, State University of New York at Binghamton  Classification accuracies of neural network (left) and MXL (right) classifiers with various.
PCA vs ICA vs LDA. How to represent images? Why representation methods are needed?? –Curse of dimensionality – width x height x channels –Noise reduction.
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.
Data Mining Course 2007 Eric Postma Clustering. Overview Three approaches to clustering 1.Minimization of reconstruction error PCA, nlPCA, k-means clustering.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 12: Advanced Discriminant Analysis Objectives:
Yue Xu Shu Zhang.  A person has already rated some movies, which movies he/she may be interested, too?  If we have huge data of user and movies, this.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
2D-LDA: A statistical linear discriminant analysis for image matrix
DATA MINING LECTURE 8 Sequence Segmentation Dimensionality Reduction.
A Kernel Approach for Learning From Almost Orthogonal Pattern * CIS 525 Class Presentation Professor: Slobodan Vucetic Presenter: Yilian Qin * B. Scholkopf.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Dimensions of Neural Networks Ali Akbar Darabi Ghassem Mirroshandel Hootan Nokhost.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Principal Component Analysis (PCA)
Statistics 202: Statistical Aspects of Data Mining
LECTURE 11: Advanced Discriminant Analysis
School of Computer Science & Engineering
Lecture 8:Eigenfaces and Shared Features
Machine Learning With Python Sreejith.S Jaganadh.G.
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
PCA vs ICA vs LDA.
Advanced Artificial Intelligence
Q4 : How does Netflix recommend movies?
Collaborative Filtering Matrix Factorization Approach
Descriptive Statistics vs. Factor Analysis
Word Embedding Word2Vec.
Parallelization of Sparse Coding & Dictionary Learning
Biointelligence Laboratory, Seoul National University
Feature Selection Methods
Recommendation Systems
Presentation transcript:

Netflix and Your Next Movie Night – Nonlinear Programming – Ron Andrews – EMIS 8381, Spring 2012

Elevator Pitch
Situation/Problem: You have finished your NLP homework and have some downtime, but you don't know what movie to watch.
Solution: Netflix's collaborative filtering (CF) based movie recommendation engine, which uses nonlinear programming (NLP) methods.
Relevancy:
Customer:
–More relevant movie recommendations than linear methods alone
–Better movie choices
Company:
–Decreased resource utilization (program complexity)
–Increased customer retention

Agenda
–Netflix – the business of movie recommendation
–Collaborative filtering
–Linear aspects
–Nonlinear aspects
–Performance improvement

Netflix – The Business of Movie Recommendation
–Movie recommendation system: a system that seeks to predict or anticipate a user's preference for a film the user has not yet viewed, using an algorithm that takes into account the collaborative nature of the website (user ratings via collaborative filtering).
–Uses linear, nonlinear, and statistical methods.

Collaborative Filtering
Collaborative filtering is a process for making automated recommendations based on crowd-sourced information such as preferences, tastes, and patterns.
Two advantages:
–Wisdom of crowds
–Large numbers
Netflix collects four pieces of information from its users: the user, the movie, the date of the grade, and the grade (rating) itself – together reflecting group unity.

Linear Approach
Computes a prediction for an item i as the weighted sum of the user's ratings on the items similar to i, weighted by the corresponding similarity s(i,j):

P(u,i) = Σ_j s(i,j) · r(u,j) / Σ_j |s(i,j)|,  summing over items j similar to i that user u has rated.

This captures how users rate similar items.
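A minimal Python sketch of this weighted-sum prediction (the toy ratings and similarity values below are illustrative, not Netflix data):

```python
import numpy as np

def predict_rating(user_ratings, similarities):
    """Item-based CF: weighted sum of the user's ratings on similar items,
    normalized by the sum of absolute similarities."""
    rated = ~np.isnan(user_ratings)              # only items the user has rated
    num = np.sum(similarities[rated] * user_ratings[rated])
    den = np.sum(np.abs(similarities[rated]))
    return num / den if den > 0 else np.nan

# Toy example: the user's ratings on 4 other movies (NaN = unrated)
# and each movie's similarity to the target movie.
ratings = np.array([5.0, np.nan, 3.0, 4.0])
sims    = np.array([0.8, 0.5, 0.2, 0.6])
print(predict_rating(ratings, sims))             # weighted-sum prediction
```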

Linear Approach (cont.)
Traditionally the N×M rating matrix (N = users, M = movies) is modeled by a low-rank approximation, e.g. an N×C factor times a C×M factor: rows are user feature vectors and columns are movie feature vectors.
–Low-rank approximations can be found
–The data sets are sparse
Result: a difficult non-convex problem in which the gradient is difficult to approximate.
Need: nonlinear aspects.
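A sketch of the generic low-rank factorization the slide describes, fit by gradient descent on the observed entries only (toy data, illustrative rank and learning rate; this is not Netflix's production algorithm):

```python
import numpy as np

# Low-rank approximation of a sparse N x M rating matrix R ~ U @ V,
# fit only on observed entries by gradient descent.
rng = np.random.default_rng(0)
N, M, C = 6, 5, 2                         # users, movies, latent features
R = rng.integers(1, 6, size=(N, M)).astype(float)
mask = rng.random((N, M)) < 0.6           # ~60% of ratings observed (sparse data)

U = 0.1 * rng.standard_normal((N, C))
V = 0.1 * rng.standard_normal((C, M))
lr = 0.02
for _ in range(3000):
    E = mask * (R - U @ V)                # error on observed entries only
    dU = E @ V.T                          # gradient step direction for user features
    dV = U.T @ E                          # gradient step direction for movie features
    U += lr * dU
    V += lr * dV

rmse = np.sqrt(np.sum((mask * (R - U @ V)) ** 2) / mask.sum())
print("training RMSE on observed entries:", rmse)
```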

Nonlinear Approaches to Movie Recommendation: Hierarchical Nonlinear Principal Component Analysis (h-NLPCA)
PCA – principal component analysis:
–Well-established data analysis technique
–Transforms the recorded observations to produce independent score variables
–Captures linear relationships well
–Not sufficient to capture nonlinear patterns
Introducing the ANN: an artificial neural network (a model defining a function f: X -> Y) is used to approximate the function, extracting the nonlinear components with an associative network.
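For contrast with the nonlinear version that follows, a minimal sketch of plain linear PCA via the SVD (the toy data matrix is an illustrative assumption):

```python
import numpy as np

# Linear PCA: center the observations, take the SVD, project onto the top
# principal components, and report the variance they explain.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 6))  # 100 obs, 6 features
Xc = X - X.mean(axis=0)                       # center the observations

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                        # independent score variables (top 2 PCs)
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print("variance explained by 2 components:", explained)   # ~1.0 for this rank-2 data
```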

Nonlinear Approach: NLPCA
The ANN allows mappings onto a reduced-dimensional space. It relies on the SVD, the singular value decomposition (a matrix factorization that is the first step for CF):

A = U Σ V*

where the columns of U give the principal component directions, Σ holds the uniquely determined singular values (variances), and V* is the conjugate transpose of V.
A hidden layer enables the network to perform nonlinear mappings: an extraction function X -> Z and a generation function Z -> X̂. The associative network performs an identity mapping, reducing the squared reconstruction error ½‖x̂ − x‖².
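A minimal sketch of such an associative (autoencoder) network trained to minimize ½‖x̂ − x‖²; the layer sizes, tanh activation, learning rate, and toy data are illustrative assumptions rather than the network used on the slides:

```python
import numpy as np

# Tiny auto-associative network: extraction X -> Z, generation Z -> X_hat,
# trained by gradient descent on the squared reconstruction error.
rng = np.random.default_rng(0)
n, d, k = 200, 10, 2                           # samples, input dim, bottleneck dim
X = rng.standard_normal((n, 3)) @ rng.standard_normal((3, d))   # low-rank-ish data

W1 = 0.1 * rng.standard_normal((d, k)); b1 = np.zeros(k)   # extraction X -> Z
W2 = 0.1 * rng.standard_normal((k, d)); b2 = np.zeros(d)   # generation Z -> X_hat
lr = 0.1
for _ in range(3000):
    Z = np.tanh(X @ W1 + b1)                   # nonlinear hidden layer
    X_hat = Z @ W2 + b2                        # reconstruction
    D = (X_hat - X) / n                        # d(loss)/d(X_hat), averaged over samples
    W2 -= lr * Z.T @ D;  b2 -= lr * D.sum(axis=0)
    dpre = (D @ W2.T) * (1 - Z**2)             # backprop through tanh
    W1 -= lr * X.T @ dpre;  b1 -= lr * dpre.sum(axis=0)

print("mean reconstruction error:", 0.5 * np.sum((X_hat - X) ** 2) / n)
```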

Nonlinear Approach: NLPCA (cont.)
Nonlinear principal component analysis provides:
–An optimal nonlinear subspace spanned by the components (different groups are formed)
–A constrained, hierarchical ordering of the nonlinear components, analogous to that of the linear components
–A minimum error between groups, using the conjugate gradient descent algorithm
–The first n components explain the maximal variance
–A search for a k-dimensional subspace of minimal mean squared error

Application to this class: conjugate gradient descent is used to find a local minimum (not a global one) and works well when the function is quadratic (twice differentiable). Update iteration:

x_{k+1} = x_k + α_k p_k

where α_k is the step size and p_k is the nonlinear (conjugate) search direction.
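A minimal sketch of conjugate gradient descent with exactly this update x_{k+1} = x_k + α_k p_k, applied to a quadratic test function as the slide suggests (the matrix A and vector b are illustrative toy data, not part of the Netflix problem):

```python
import numpy as np

# Conjugate gradient descent on f(x) = 0.5 x^T A x - b^T x with exact step size
# and Fletcher-Reeves direction updates.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)               # symmetric positive definite
b = rng.standard_normal(5)

x = np.zeros(5)
g = A @ x - b                             # gradient of f at x
p = -g                                    # initial search direction
for k in range(5):                        # converges in at most n = 5 steps
    alpha = -(g @ p) / (p @ A @ p)        # exact step size along p
    x = x + alpha * p                     # update iteration x_{k+1} = x_k + alpha_k p_k
    g_new = A @ x - b
    if np.linalg.norm(g_new) < 1e-12:
        break
    beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves coefficient
    p = -g_new + beta * p                 # next conjugate direction
    g = g_new

print("||A x - b|| after CG:", np.linalg.norm(A @ x - b))   # ~0 at the minimizer
```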

Nonlinear Approach: NLPCA Algorithm
Step 1: Data representation
–Fill in the missing values in the original user-item matrix (we know how to do this)
Step 2: Low-rank representation
–Use the conjugate gradient descent algorithm
–The hierarchical error is minimized
Step 3: Neighborhood formation
–Calculate the similarity between each user and his closest neighbors
–A is the reconstructed matrix; r_ij is the rating of user u_i on item i_j
–The summations over l are computed only when both users (u_a and u_i) have rated a movie
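The slide does not spell out the similarity measure itself; a common choice, assumed here purely for illustration, is cosine similarity computed over the movies two users have both rated:

```python
import numpy as np

# Sketch of Step 3 (neighborhood formation) using cosine similarity on
# co-rated movies; NaN marks a missing rating in the matrix A.
def user_similarity(A, a, i):
    """Cosine similarity between users a and i over movies both have rated."""
    both = ~np.isnan(A[a]) & ~np.isnan(A[i])     # movies l rated by both users
    if not both.any():
        return 0.0
    ra, ri = A[a, both], A[i, both]
    denom = np.linalg.norm(ra) * np.linalg.norm(ri)
    return float(ra @ ri / denom) if denom > 0 else 0.0

def neighborhood(A, a, k=2):
    """Indices of the k users most similar to user a."""
    sims = [(-1.0 if i == a else user_similarity(A, a, i)) for i in range(A.shape[0])]
    return np.argsort(sims)[::-1][:k]

A = np.array([[5, 4, np.nan, 1],
              [4, np.nan, 4, 1],
              [1, 1, np.nan, 5]], dtype=float)
print(neighborhood(A, a=0, k=2))                 # nearest neighbors of user 0
```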

Nonlinear Approach: NLPCA Algorithm (cont.)
Step 4: Prediction generation
–Match the neighborhood to the user: the prediction combines the original item average with the neighbors' ratings, weighted by the neighborhood similarities.
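The exact prediction formula is not reproduced on the slide; the sketch below assumes a common weighted-deviation form that combines the three ingredients the slide labels (original item average, user ratings, neighborhood similarities):

```python
import numpy as np

# Sketch of Step 4 (prediction generation): start from the original item
# average and add the similarity-weighted deviations of the neighbors' ratings.
def predict(A, a, j, neighbors, sims):
    """Predict user a's rating for item j from similar users' ratings."""
    item_avg = np.nanmean(A[:, j])                     # original item average
    num, den = 0.0, 0.0
    for i, s in zip(neighbors, sims):
        if np.isnan(A[i, j]):
            continue                                   # neighbor hasn't rated item j
        user_avg = np.nanmean(A[i])
        num += s * (A[i, j] - user_avg)                # similarity-weighted deviation
        den += abs(s)
    return item_avg + num / den if den > 0 else item_avg

A = np.array([[5, 4, np.nan, 1],
              [4, np.nan, 4, 1],
              [1, 1, np.nan, 5]], dtype=float)
print(predict(A, a=0, j=2, neighbors=[1, 2], sims=[0.99, 0.42]))
```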

Nonlinear Effectiveness
[Chart: prediction accuracy vs. number of dimensions]
Nonlinear methods can account for more variance. True accuracy: 0.7843

Nonlinear Effectiveness: Conclusions
–Faster convergence
–Fewer resources
–For small data sets (i.e., not many film ratings), the nonlinear approach provides better suggestions, faster
–More difficult to implement from a programming standpoint

Questions?