Amanda Lambert Jimmy Bobowski Shi Hui Lim Mentors: Brent Castle, Huijun Wang

What is Machine Learning?
– A scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases, in order to extract patterns or trends from the data
– Studies how to automatically learn to make accurate predictions based on past observations

Preliminary Research
– Learned about data mining/machine learning topics
– Informally reviewed Candès and Recht's Exact Matrix Completion via Convex Optimization
– Familiarized ourselves with MATLAB

Netflix Competition
– Competition to improve Netflix's movie recommendation system
– Released training dataset: 100,480,507 ratings from 480,189 users on 17,770 movies (a loading sketch follows below)
– The competition is over, but the dataset is still available for research purposes
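For concreteness, here is a rough Python/NumPy sketch of how the training data could be loaded into a sparse ratings matrix. It assumes the standard Netflix Prize layout of per-movie text files (first line "<movie id>:", then "CustomerID,Rating,Date" lines); the directory name and file pattern are assumptions, and a loader for the full ~100M ratings would need to be more memory-conscious than this.

```python
# Sketch: load Netflix Prize training files into a SciPy sparse matrix.
# Assumes per-movie files "mv_*.txt" whose first line is "<movie id>:" and whose
# remaining lines are "CustomerID,Rating,Date". Paths/names are assumptions.
import glob
from scipy.sparse import coo_matrix

def load_netflix(training_dir="training_set"):
    rows, cols, vals = [], [], []        # user index, movie index, rating
    user_ids = {}                        # map raw customer IDs to row indices
    for path in sorted(glob.glob(f"{training_dir}/mv_*.txt")):
        with open(path) as f:
            movie_id = int(f.readline().rstrip(":\n")) - 1   # 0-based column
            for line in f:
                customer, rating, _date = line.strip().split(",")
                row = user_ids.setdefault(customer, len(user_ids))
                rows.append(row)
                cols.append(movie_id)
                vals.append(float(rating))
    return coo_matrix((vals, (rows, cols)),
                      shape=(len(user_ids), 17770))   # 17,770 movies in the dataset

# ratings = load_netflix()   # ~100M ratings from ~480K users
```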

Movie Ratings Example

Research Problem Hypothesis
– Hypothesis: rounded user ratings cause an innate amount of error in recommendation systems
– We're not advocating using real numbers, just studying how the numbers affect the systems

Matrix Completion
What does matrix completion have to do with our problem?
– Users rate movies they have seen, but obviously cannot rate movies they have not
– A matrix completion algorithm estimates how users would rate movies that they have never seen, based on the movies they have rated

Matrix Completion (figure: sparse matrix → full matrix)

Matrix Completion
– A (good) matrix completion algorithm can take a sparse matrix with only a small number of known entries and, with high probability, return a completed matrix that recreates the original matrix (a sketch of one such algorithm follows below)
– In our case, only 6% of the entries are known
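As an illustration of the idea only, here is a minimal Python/NumPy sketch of one simple completion heuristic: repeatedly project the current estimate onto the set of rank-r matrices via a truncated SVD, then reset the observed entries. This is a stand-in chosen for clarity, not the nuclear-norm program from Candès and Recht's paper and not necessarily the algorithm used in the project.

```python
# Minimal sketch of a simple matrix completion heuristic (iterative
# truncated-SVD imputation). Illustrative only.
import numpy as np

def complete_matrix(M_obs, mask, rank, n_iters=100):
    """M_obs: dense array with observed entries (zeros elsewhere).
       mask:  boolean array, True where an entry is observed."""
    X = M_obs.copy()
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]   # best rank-`rank` approximation
        X[mask] = M_obs[mask]                          # keep the known ratings fixed
    return X
```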

Our Solution
Our pipeline builds up the following matrices (each step is illustrated in the slides; a code sketch follows the list):
– M_0 = Create the initial matrix from the Netflix training dataset
– M = Add 'noise' to M_0 (i.e. a rating of '3' becomes '3.212')
– M_s = Randomly remove entries from M
– M_r = Round the entries in M_s
– M_R = Round the entries in M
– Run the matrix completion algorithm on M_s and M_r to create M_s~ and M_r~ (figure: sparse matrix → completed matrix)
– Compare each completed matrix against its original, already complete counterpart (figure: completed matrix vs. original matrix)
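A minimal sketch of this pipeline on a small synthetic stand-in for the Netflix matrix; the sizes, rank, and noise scale are assumptions chosen for illustration, and `complete_matrix` is the completion sketch shown earlier.

```python
# Sketch of the M_0 -> M -> M_s / M_r / M_R pipeline on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

m, n, r = 200, 200, 5                              # assumed toy sizes
base = rng.random((m, r)) @ rng.random((r, n))
M0 = np.round(1 + 4 * base / base.max())           # M_0: integer ratings in 1..5

M = M0 + rng.normal(scale=0.2, size=M0.shape)      # M:   add 'noise' (3 -> 3.212, ...)
mask = rng.random(M.shape) < 0.10                  # keep ~10% of entries "observed"
Ms = np.where(mask, M, 0.0)                        # M_s: sparse, unrounded
Mr = np.where(mask, np.round(M), 0.0)              # M_r: sparse, rounded
MR = np.round(M)                                   # M_R: full, rounded

# Complete the two sparse matrices (complete_matrix from the earlier sketch).
Ms_hat = complete_matrix(Ms, mask, rank=r)         # M_s~
Mr_hat = complete_matrix(Mr, mask, rank=r)         # M_r~
```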

Our Solution
– Use Root Mean Squared Error (RMSE) to calculate the amount of error between the completed matrices and their already complete m x n counterparts (a code sketch follows below):

$$\mathrm{RMSE} = \sqrt{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{M}_{ij} - M_{ij}\right)^2}$$
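A small sketch of the RMSE computation matching the formula above; the example comparison at the end assumes the matrix names from the pipeline sketch.

```python
# Sketch: RMSE over all m x n entries between a completed matrix and its
# fully known counterpart.
import numpy as np

def rmse(M_hat, M_full):
    m, n = M_full.shape
    return np.sqrt(np.sum((M_hat - M_full) ** 2) / (m * n))

# e.g. compare the unrounded and rounded pipelines:
# rmse(Ms_hat, M)   vs.   rmse(Mr_hat, MR)
```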

Experiment – Testing Rank
– Fixed matrices to 1000 x 1000
– Sparsity was set to 10%
– Iterated ranks 2 to 20 by 2
– Ran the matrix completion algorithm 20 times for each rank (a sketch of this loop follows below)
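A sketch of the experiment loop under these settings. The random rank-r test matrices are an assumption for illustration (the slides do not say how the test matrices were generated), and `complete_matrix` / `rmse` are the sketches shown earlier.

```python
# Sketch: RMSE as a function of rank, averaged over 20 runs per rank.
# Slow at full 1000 x 1000 size; shrink the dimensions to try it out quickly.
import numpy as np

rng = np.random.default_rng(0)
mean_rmse = {}

for rank in range(2, 21, 2):                       # ranks 2, 4, ..., 20
    errors = []
    for _ in range(20):                            # 20 runs per rank
        A = rng.normal(size=(1000, rank)) @ rng.normal(size=(rank, 1000))
        mask = rng.random(A.shape) < 0.10          # 10% of entries observed
        A_obs = np.where(mask, A, 0.0)
        A_hat = complete_matrix(A_obs, mask, rank, n_iters=50)
        errors.append(rmse(A_hat, A))
    mean_rmse[rank] = float(np.mean(errors))

print(mean_rmse)
```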

Results - Rank

So what? We determined that there is a cap on how much better Netflix's recommendation system can get (as well as other recommendation systems that use rounded user ratings).

So what? This means that until the matrix completion technique is improved upon, Netflix's recommendations can't get much better!