F ACTORIZATION M ACHINE : MODEL, OPTIMIZATION AND APPLICATIONS Yang LIU Supervisors: Prof. Andrew Yao Prof. Shengyu Zhang 1.

Slides:



Advertisements
Similar presentations
Semi-Stochastic Gradient Descent Peter Richtárik ANC/DTC Seminar, School of Informatics, University of Edinburgh Edinburgh - November 4, 2014.
Advertisements

Neural networks Introduction Fitting neural networks
Regularization David Kauchak CS 451 – Fall 2013.
Semi-Stochastic Gradient Descent Methods Jakub Konečný (joint work with Peter Richtárik) University of Edinburgh.
Semi-Stochastic Gradient Descent Methods Jakub Konečný University of Edinburgh BASP Frontiers Workshop January 28, 2014.
Pattern Recognition and Machine Learning
CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:
G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit
Recommender Systems Problem formulation Machine Learning.
Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.
Artificial Intelligence Lecture 2 Dr. Bo Yuan, Professor Department of Computer Science and Engineering Shanghai Jiaotong University
1-norm Support Vector Machines Good for Feature Selection  Solve the quadratic program for some : min s. t.,, denotes where or membership. Equivalent.
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Unconstrained Optimization Problem
Kernel Methods and SVM’s. Predictive Modeling Goal: learn a mapping: y = f(x;  ) Need: 1. A model structure 2. A score function 3. An optimization strategy.
Semi-Stochastic Gradient Descent Methods Jakub Konečný (joint work with Peter Richtárik) University of Edinburgh SIAM Annual Meeting, Chicago July 7, 2014.
Intelligible Models for Classification and Regression
Collaborative Filtering Matrix Factorization Approach
Cao et al. ICML 2010 Presented by Danushka Bollegala.
Matrix Factorization Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Learning with large datasets Machine Learning Large scale machine learning.
Logistic Regression L1, L2 Norm Summary and addition to Andrew Ng’s lectures on machine learning.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Machine Learning Introduction Study on the Coursera All Right Reserved : Andrew Ng Lecturer:Much Database Lab of Xiamen University Aug 12,2014.
Model representation Linear regression with one variable
Andrew Ng Linear regression with one variable Model representation Machine Learning.
GAUSSIAN PROCESS FACTORIZATION MACHINES FOR CONTEXT-AWARE RECOMMENDATIONS Trung V. Nguyen, Alexandros Karatzoglou, Linas Baltrunas SIGIR 2014 Presentation:
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Yang Liu Shengyu Zhang The Chinese University of Hong Kong Fast quantum algorithms for Least Squares Regression and Statistic Leverage Scores.
Online Learning for Collaborative Filtering
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
CS Statistical Machine learning Lecture 18 Yuan (Alan) Qi Purdue CS Oct
CS Statistical Machine learning Lecture 10 Yuan (Alan) Qi Purdue CS Sept
Logistic Regression Week 3 – Soft Computing By Yosi Kristian.
Nonlinear Learning Using Local Coordinate Coding K. Yu, T. Zhang and Y. Gong, NIPS 2009 Improved Local Coordinate Coding Using Local Tangents K. Yu and.
The problem of overfitting
The Summary of My Work In Graduate Grade One Reporter: Yuanshuai Sun
Amanda Lambert Jimmy Bobowski Shi Hui Lim Mentors: Brent Castle, Huijun Wang.
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.
Yue Xu Shu Zhang.  A person has already rated some movies, which movies he/she may be interested, too?  If we have huge data of user and movies, this.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Supervised Random Walks: Predicting and Recommending Links in Social Networks Lars Backstrom (Facebook) & Jure Leskovec (Stanford) Proc. of WSDM 2011 Present.
Factorbird: a Parameter Server Approach to Distributed Matrix Factorization Sebastian Schelter, Venu Satuluri, Reza Zadeh Distributed Machine Learning.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Collaborative filtering applied to real- time bidding.
Strong Supervision From Weak Annotation Interactive Training of Deformable Part Models ICCV /05/23.
Learning by Loss Minimization. Machine learning: Learn a Function from Examples Function: Examples: – Supervised: – Unsupervised: – Semisuprvised:
Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:
Matrix Factorization and Collaborative Filtering
He Xiangnan Research Fellow National University of Singapore
Learning Recommender Systems with Adaptive Regularization
Boosting and Additive Trees (2)
Linear regression project
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
Probabilistic Models for Linear Regression
Pattern Recognition CS479/679 Pattern Recognition Dr. George Bebis
Collaborative Filtering Matrix Factorization Approach
Logistic Regression & Parallel SGD
Support Vector Machine I
CSE 491/891 Lecture 25 (Mahout).
Softmax Classifier.
Multiple features Linear Regression with multiple variables
Multiple features Linear Regression with multiple variables
Semi-Supervised Learning
Jia-Bin Huang Virginia Tech
Linear regression with one variable
Recommender Systems Problem formulation Machine Learning.
Logistic Regression Geoff Hulten.
Patterson: Chap 1 A Review of Machine Learning
Presentation transcript:

F ACTORIZATION M ACHINE : MODEL, OPTIMIZATION AND APPLICATIONS Yang LIU Supervisors: Prof. Andrew Yao Prof. Shengyu Zhang 1

O UTLINE Factorization machine (FM) A generic predictor Auto feature interaction Learning algorithm Stochastic gradient descent (SGD) … Applications Recommendation systems Regression and classification … 2

D OU B AN MOVIE 3

P REDICTION T ASK e.g. Alice rates Titanic 5 at time 13 4 ? ?

P REDICTION T ASK 5

L INEAR M ODEL – F EATURE E NGINEERING Linear SVM Logistic Regression 6

F ACTORIZATION MODEL 7 Interaction between variables

W W I NTERACTION MATRIX 8

W W 9

W W ? ? 10

V V VTVT VTVT k W W I NTERACTION MATRIX = 11

= I NTERACTION MATRIX V V VTVT VTVT W W k 12

= I NTERACTION MATRIX V V VTVT VTVT W W 13

= I NTERACTION MATRIX V V VTVT VTVT W W 14

= I NTERACTION MATRIX V V VTVT VTVT W W Factorization 15

= I NTERACTION MATRIX V V VTVT VTVT W W Factorization Machine 16

FM: PROPERTIES 17

O PTIMIZATION T ARGET 18

S TOCHASTIC G RADIENT D ESCENT (SGD) 19

A PPLICATIONS EMI Music Hackathon 2012 Song recommendation Given: Historical ratings User demographics # features: 51K # items in training: 188K 20 ? ?

R ESULTS FOR EMI MUSIC FM: Root Mean Square Error (RMSE) Target value [0,100] The best (SVD++) is Details Regression Converges in 100 iterations Time for each iteration: < 1 s Win 7, Intel Core 2 Duo CPU 2.53GHz, 6G RAM 21

O THER APPLICATIONS Ads CTR prediction (KDD Cup 2012) Features User_info, Ad_info, Query_info, Position, etc. # features: 7.2M # items in training: 160M Classification Performance: AUC: , the best (SVM) is

O THER APPLICATIONS HiCloud App Recommendation Features App_info, Smartphone model, installed apps, etc. # features: 9.5M # items in training: 16M Classification Performance: Top 5: 8%, Top 10: 18%, Top 20: 32%; AUC:

S UMMARY FM: a general predictor Works under sparsity Linear computation complexity Estimates interactions automatically Works with any real valued feature vector 24 T HANKS !