Collaborative Filtering for Streaming data

颜荣圻

Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

Introduction: Movie Rating

Introduction: Netflix Prize

Introduction: Recommendation

Introduction: Collaborative Filtering
What do I recommend? = How to fill in the blanks?

Introduction: Goal
Predict movies of interest to a user based on the movies watched by that user and by others.
Collectively filter the ratings of a large group of users.
Data: a stream of incoming ratings.

Matrix Factorization: Model (Matrix Completion Problem)
m: number of rows (users)
n: number of columns (items)
R: incomplete $m \times n$ matrix of ratings
$R_{ij}$: the rating user i assigns to item j

Matrix Factorization: Approach (Low-rank Matrix Factorization)
Find a low-rank matrix factorization of the ratings matrix, $R \approx U^T V$, where
$U = [u_1, u_2, \dots, u_i, \dots, u_m]$ is $k \times m$ and $V = [v_1, v_2, \dots, v_j, \dots, v_n]$ is $k \times n$.

Matrix Factorization: Low-rank Matrix Factorization
k: we assume that the ratings users assign to items are determined by k latent features.
Example: $v_{Ave} = [8, 7, 0, 1, \dots, 2]^T$ (fiction: yes; action: yes; romance: no; comedy: a little)

Matrix Factorization: Low-rank Matrix Factorization
Each item j has k weights, $v_j$; correspondingly, each user i has k weights, $u_i$.
The rating user i assigns to item j: $R'_{ij} = u_i^T v_j$
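
The predicted rating on this slide is an inner product of two k-dimensional vectors. A minimal sketch, assuming k = 4 features named as in the latent-feature example; the numeric values of `u_i` and `v_j` are made up for illustration and are not from the slides:

```python
def predict(u_i, v_j):
    """Predicted rating: inner product of a user vector and an item vector."""
    return sum(a * b for a, b in zip(u_i, v_j))

# Hypothetical k = 4 latent features: fiction, action, romance, comedy.
u_i = [0.5, 0.4, 0.0, 0.1]  # how much user i cares about each feature (assumed)
v_j = [8.0, 7.0, 0.0, 1.0]  # how strongly item j expresses each feature (assumed)

rating = predict(u_i, v_j)  # 0.5*8 + 0.4*7 + 0.0*0 + 0.1*1
```

A user who likes fiction and action receives a high predicted rating for an item that scores high on those two features.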

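Matrix Factorization: Optimization problem
Let $\Omega$ denote the set of entries in R for which the rating is known.
We wish to find matrices U and V which minimize the sum
$\min_{U,V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, R'_{ij})$, where $\ell$ is the loss and $R'_{ij} = u_i^T v_j$ is the predicted rating.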

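Matrix Factorization: Optimization problem
To prevent overfitting, we introduce a regularization term with a penalty parameter $\lambda > 0$:
$\min_{U,V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, u_i^T v_j) + \lambda (\|U\|_F^2 + \|V\|_F^2)$
$\Omega$ is the set of known entries of R, and $\|\cdot\|_F$ denotes the Frobenius norm.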

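Matrix Factorization: Optimization problem
$\ell$ is a given loss function.
The regularized square loss: $\sum_{(i,j) \in \Omega} (R_{ij} - u_i^T v_j)^2 + \lambda (\|U\|_F^2 + \|V\|_F^2)$
This objective is non-convex in U and V jointly.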
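
The regularized square loss can be evaluated directly from the known ratings. A minimal sketch, assuming the ratings are stored as a dictionary keyed by (user, item) and that `U` and `V` are lists of k-dimensional vectors (one per user and one per item); these storage choices and the function name are illustrative, not from the slides:

```python
def regularized_square_loss(ratings, U, V, lam):
    """Sum of squared errors over known ratings plus lam * (||U||_F^2 + ||V||_F^2).

    ratings: dict mapping (i, j) -> known rating R_ij
    U: list of m user vectors, V: list of n item vectors (each of length k)
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    squared_error = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    frobenius = sum(x * x for vec in U for x in vec) + sum(x * x for vec in V for x in vec)
    return squared_error + lam * frobenius

# Tiny example: one user, one item, k = 2.
ratings = {(0, 0): 3.0}
U = [[1.0, 0.0]]
V = [[2.0, 0.0]]
loss = regularized_square_loss(ratings, U, V, lam=0.1)  # (3 - 2)^2 + 0.1 * (1 + 4)
```

Only the known entries contribute to the error term, while the penalty is taken over all entries of U and V.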

Outline: Introduction, Matrix Factorization, Distributed SGD

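Distributed SGD: Method (Stochastic Gradient Descent, SGD)
For a known rating $R_{ij}$ chosen at random, with per-entry loss $L_{ij}$:
$u_i \leftarrow u_i - \eta \, \partial L_{ij} / \partial u_i$, $v_j \leftarrow v_j - \eta \, \partial L_{ij} / \partial v_j$
$\eta > 0$ is the step size.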
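
For the regularized square loss, one SGD step on a single known rating can be sketched as follows. This assumes the per-entry loss $(R_{ij} - u_i^T v_j)^2 + \lambda(\|u_i\|^2 + \|v_j\|^2)$, which applies the penalty once per rating; that weighting is a common simplification and an assumption here, as are the names `sgd_step`, `eta`, and `lam`:

```python
def sgd_step(u, v, r, eta, lam):
    """One SGD update of user vector u and item vector v for a known rating r.

    Per-entry loss (assumed): (r - u.v)^2 + lam * (|u|^2 + |v|^2)
    Gradients: dL/du = -2*e*v + 2*lam*u and dL/dv = -2*e*u + 2*lam*v, with e = r - u.v
    """
    e = r - sum(a * b for a, b in zip(u, v))
    new_u = [a + eta * (2 * e * b - 2 * lam * a) for a, b in zip(u, v)]
    new_v = [b + eta * (2 * e * a - 2 * lam * b) for a, b in zip(u, v)]
    return new_u, new_v

# Repeated steps on one rating drive the prediction toward it (no penalty here).
u, v = [0.1, 0.1], [0.1, 0.1]
for _ in range(200):
    u, v = sgd_step(u, v, r=1.0, eta=0.05, lam=0.0)
final_error = 1.0 - sum(a * b for a, b in zip(u, v))
```

Both vectors are updated from their old values, so the order of the two assignments does not matter.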

Distributed SGD

Distributed SGD: Algorithm Design

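Distributed SGD: Distributed Matrix => Parallelized SGD
Divide the ratings matrix R into blocks and distribute the blocks to different processors.
For two blocks $R^{ab}$ and $R^{cd}$ to be processed in parallel: $a \neq c$ and $b \neq d$, i.e. the blocks share no rows and no columns.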

Distributed SGD: Batch Streaming Algorithm
We must distribute the blocks in a way that avoids conflicting updates.
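
One common way to avoid conflicting updates is to group blocks into sets in which no two blocks share a row range or a column range, so each set can be processed in parallel. A minimal sketch, assuming a d x d grid of equally sized blocks and a diagonal grouping; the slides do not specify the exact schedule, and the function names are hypothetical:

```python
def block_of(i, j, m, n, d):
    """Map entry (i, j) of an m x n matrix to its (row_block, col_block) in a d x d grid."""
    return (i * d // m, j * d // n)

def strata(d):
    """Return d groups of d blocks; group s holds blocks (b, (b + s) % d).

    Within a group every row block and every column block appears exactly once,
    so SGD on those blocks touches disjoint user vectors and disjoint item vectors.
    """
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

groups = strata(3)
# groups[0] is the main diagonal, groups[1] and groups[2] are the shifted diagonals.
```

Processing the d groups one after another covers all d x d blocks, with up to d workers busy at a time.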

Streaming DSGD: Batch Streaming Algorithm
[Figure: flow diagram of the batch streaming algorithm. Driver: initialize matrices R, U, and V; make an RDD from the live data stream of (i, j, R) triples; divide the matrix; select blocks. Workers: run SGD on Block 1, Block 2, and Block 3, updating u1 & v1, u2 & v2, and u3 & v3. Then join U & V, compute R = U^T V, and make a prediction.]
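
The batch streaming loop can be imitated in a single process to see how the pieces fit together. The sketch below is plain Python, not Spark: blocks that separate workers would handle are processed one after another, the stream is synthetic (ratings generated from hidden factors), and the diagonal block schedule, the per-entry penalty, and all names and parameter values are assumptions rather than the authors' implementation:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def process_batch(batch, U, V, d, eta=0.05, lam=0.01):
    """Handle one micro-batch of (i, j, r) triples: split into d x d blocks, sweep the groups."""
    m, n = len(U), len(V)
    blocks = {}
    for i, j, r in batch:
        blocks.setdefault((i * d // m, j * d // n), []).append((i, j, r))
    for s in range(d):        # groups of non-conflicting blocks, one group at a time
        for b in range(d):    # each block in a group could go to a separate worker
            for i, j, r in blocks.get((b, (b + s) % d), []):
                u, v = U[i], V[j]
                e = r - dot(u, v)
                U[i] = [a + eta * (2 * e * c - 2 * lam * a) for a, c in zip(u, v)]
                V[j] = [c + eta * (2 * e * a - 2 * lam * c) for a, c in zip(u, v)]

def rmse(U, V, true_U, true_V):
    """Root-mean-square error of U^T V against the hidden ratings, over all entries."""
    errs = [(dot(true_U[i], true_V[j]) - dot(U[i], V[j])) ** 2
            for i in range(len(U)) for j in range(len(V))]
    return (sum(errs) / len(errs)) ** 0.5

random.seed(0)
m, n, k, d = 6, 6, 2, 3
true_U = [[random.uniform(0.5, 1.5) for _ in range(k)] for _ in range(m)]  # hidden factors
true_V = [[random.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n)]
U = [[random.uniform(0.1, 0.5) for _ in range(k)] for _ in range(m)]       # initial guess
V = [[random.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n)]

before = rmse(U, V, true_U, true_V)
for batch_index in range(60):  # 60 micro-batches of 50 incoming ratings each
    batch = []
    for _ in range(50):
        i, j = random.randrange(m), random.randrange(n)
        batch.append((i, j, dot(true_U[i], true_V[j])))
    process_batch(batch, U, V, d)
after = rmse(U, V, true_U, true_V)
```

Comparing `before` and `after` shows whether the factors improved as the stream was consumed. In a Spark Streaming setting, each micro-batch would instead arrive as an RDD and the inner block loop would run on the workers.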
