Collaborative Filtering for Streaming data

Presentation on theme: "Collaborative Filtering for Streaming data"— Presentation transcript:

1 Collaborative Filtering for Streaming data
颜荣圻

2 Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

3 Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

4 Introduction: Movie Rating

5 Introduction: Netflix Prize

6 Introduction: Recommendation

7 Introduction: Collaborative Filtering
What do I recommend? = How to fill in the blanks?

8 Introduction: Goal
Predict movies of interest to a user based on the movies watched by that user and by others.
Collectively filter the ratings of a large group of users.
Data: a stream of incoming ratings.

9 Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

10 Matrix Factorization: Model
Matrix Completion Problem
m: number of rows (users)
n: number of columns (items)
R: incomplete $m \times n$ matrix of ratings
$R_{ij}$: the rating user i assigns to item j

11 Matrix Factorization: Approach
Low-rank Matrix Factorization: find a low-rank matrix factorization of the ratings matrix, $R \approx U^T V$, of rank $k$.
$U = [u_1, u_2, \dots, u_i, \dots, u_m]$ is $k \times m$ and $V = [v_1, v_2, \dots, v_j, \dots, v_n]$ is $k \times n$.

12 Matrix Factorization: Low-rank Matrix Factorization
k: we assume that the ratings users assign to items are determined by k latent features.
Example: $v_{Ave} = [8, 7, 0, 1, \dots, 2]^T$ (fiction: yes, action: yes, romance: no, comedy: a little)

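13 Matrix Factorization: Low-rank Matrix Factorization
Each item j has k weights, $v_j \in \mathbb{R}^k$, and each user i has k weights, $u_i \in \mathbb{R}^k$.
The rating user i assigns to item j: $R_{ij} \approx R'_{ij} = u_i^T v_j$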
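The prediction rule above can be sketched in a few lines of plain Python. This is only an illustration: the names `predict` and `reconstruct` and the toy factor values are assumptions, not taken from the slides.

```python
def predict(u_i, v_j):
    """Predicted rating for one (user, item) pair: the inner product u_i^T v_j."""
    return sum(a * b for a, b in zip(u_i, v_j))


def reconstruct(U, V):
    """Full m x n matrix of predictions, one row per user and one column per item."""
    return [[predict(u_i, v_j) for v_j in V] for u_i in U]


# Toy factors (hypothetical): k = 2 latent features, m = 2 users, n = 3 items.
# U and V are stored as lists of per-user and per-item weight vectors.
U = [[1.0, 0.0], [0.5, 2.0]]
V = [[4.0, 1.0], [2.0, 2.0], [0.0, 1.5]]

R_hat = reconstruct(U, V)
for row in R_hat:
    print(row)
```

Every cell of `R_hat` is filled, including the ones that were blank in the observed ratings matrix, which is what makes the factorization usable for recommendation.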

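14 Matrix Factorization: Optimization problem
Let $\Omega$ denote the set of entries $(i, j)$ in R for which the rating is known.
We wish to find matrices $U$ and $V$ which minimize the sum
$\min_{U, V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, R'_{ij})$, where $R'_{ij} = u_i^T v_j$ is the predicted rating.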

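15 Matrix Factorization: Optimization problem
To prevent overfitting, we introduce a regularization term with a penalty parameter $\lambda > 0$:
$\min_{U, V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, u_i^T v_j) + \lambda (\|U\|_F^2 + \|V\|_F^2)$
where the sum runs over the set $\Omega$ of known entries of R and $\|\cdot\|_F$ denotes the Frobenius norm.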

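16 Matrix Factorization: Optimization problem
$\ell$ is a given loss function.
The regularized square loss:
$L(U, V) = \sum_{(i,j) \in \Omega} (R_{ij} - u_i^T v_j)^2 + \lambda (\|U\|_F^2 + \|V\|_F^2)$
where $\Omega$ is the set of known entries of R. This objective is non-convex in $(U, V)$.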
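As a rough illustration of the regularized square loss, the sketch below sums the squared error over the known ratings and adds the Frobenius-norm penalty. The function name, the dict-of-known-entries layout, and the toy numbers are assumptions made for this example.

```python
def regularized_square_loss(ratings, U, V, lam):
    """Squared error over known entries plus lam * (||U||_F^2 + ||V||_F^2).

    ratings: dict mapping (i, j) -> R_ij for the known entries only.
    U, V:    lists of per-user and per-item weight vectors of length k.
    """
    sq_err = 0.0
    for (i, j), r in ratings.items():
        pred = sum(a * b for a, b in zip(U[i], V[j]))
        sq_err += (r - pred) ** 2
    # The squared Frobenius norm is the sum of squares of all entries.
    frob_u = sum(x * x for u in U for x in u)
    frob_v = sum(x * x for v in V for x in v)
    return sq_err + lam * (frob_u + frob_v)


# Toy check (hypothetical numbers): one user, one item, one known rating.
loss = regularized_square_loss({(0, 0): 3.0}, [[1.0, 0.0]], [[2.0, 0.0]], lam=0.5)
print(loss)
```

Unknown entries never enter the sum, so the cost of one evaluation grows with the number of observed ratings rather than with m times n.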

17 Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

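18 Distributed SGD: Method
Stochastic Gradient Descent (SGD): pick a known entry $(i, j)$ at random and move $u_i$ and $v_j$ against the gradient of the loss at that entry,
$u_i \leftarrow u_i - \eta \nabla_{u_i} L_{ij}$, $v_j \leftarrow v_j - \eta \nabla_{v_j} L_{ij}$,
where $L_{ij}$ is the loss at entry $(i, j)$ and $\eta > 0$ is the step size.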
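A minimal sketch of one SGD update for the regularized square loss follows. It makes two assumptions that the slides do not confirm: the regularizer is applied per sampled entry, and the factor 2 from the gradient is folded into the step size. The names `sgd_step`, `step`, and `lam` are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgd_step(u_i, v_j, r_ij, step, lam):
    """One SGD update for a single known rating; returns the new (u_i, v_j)."""
    err = r_ij - dot(u_i, v_j)  # prediction error at entry (i, j)
    # Both updates are computed from the old u_i and v_j.
    new_u = [a + step * (err * b - lam * a) for a, b in zip(u_i, v_j)]
    new_v = [b + step * (err * a - lam * b) for a, b in zip(u_i, v_j)]
    return new_u, new_v


# Repeatedly fitting one rating (hypothetical values) drives the error down.
u, v = [1.0, 0.0], [1.0, 1.0]
for _ in range(50):
    u, v = sgd_step(u, v, r_ij=3.0, step=0.1, lam=0.0)
print(dot(u, v))
```

Each step touches only one user vector and one item vector, which is the property the distributed variant relies on.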

19 Distributed SGD

20 Distributed SGD: Algorithm Design

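21 Distributed SGD: Distributed Matrix => Parallelized SGD
Divide the ratings matrix R into blocks and distribute the blocks to different processors.
Two blocks that share no rows and no columns of R can be processed at the same time, because their SGD steps update different $u_i$ and $v_j$.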
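One way to picture the blocking step is the sketch below, which assigns each known rating to a cell of a d x d grid of contiguous row and column ranges. The grid layout and the name `partition` are assumptions; the slides do not say how the blocks are cut.

```python
from collections import defaultdict


def partition(ratings, m, n, d):
    """Group (i, j, r) triples into a d x d grid of blocks.

    Rows 0..m-1 and columns 0..n-1 are each split into d contiguous ranges;
    the result maps (row_block, col_block) -> list of ratings in that block.
    """
    blocks = defaultdict(list)
    for i, j, r in ratings:
        a = i * d // m  # row block of user i
        b = j * d // n  # column block of item j
        blocks[(a, b)].append((i, j, r))
    return dict(blocks)


# Toy example (hypothetical): a 4 x 4 ratings matrix cut into 2 x 2 blocks.
blocks = partition([(0, 0, 5.0), (1, 3, 3.0), (3, 2, 1.0)], m=4, n=4, d=2)
print(blocks)
```

Blocks (0, 0) and (1, 1) here cover disjoint users and disjoint items, so SGD on one never reads or writes a vector used by the other.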

22 Distributed SGD: Batch Streaming Algorithm
We must distribute the blocks in a way that avoids conflicting updates.
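A common way to avoid conflicting updates is to process the blocks in groups ("strata") whose members share no row block and no column block. The sketch below uses cyclic shifts to build such groups; this is one possible schedule, assumed for illustration, and not necessarily the one used in the talk.

```python
def strata(d):
    """Return d strata for a d x d grid of blocks.

    Each stratum is a list of d (row_block, col_block) pairs in which every
    row block and every column block appears exactly once, so the d blocks
    can be handed to d workers without two workers touching the same vectors.
    """
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]


for s, stratum in enumerate(strata(3)):
    print(s, stratum)
```

Running the d strata one after another visits every block of the grid exactly once.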

23 Streaming DSGD: Batch Streaming Algorithm
[Diagram] Driver: receive the live data stream of (i, j, R) triples, make an RDD, initialize the matrices R, U and V, divide the matrix, and select blocks.
Workers: run SGD on Block 1, Block 2 and Block 3, updating u1 & v1, u2 & v2 and u3 & v3 respectively.
Driver: join U and V, compute $R = U^T V$, and make the prediction.
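The flow in the diagram can be imitated on a single machine. The sketch below is plain Python, not Spark code: the "workers" run one after another, the stream is a hard-coded list of micro-batches, and all names and hyperparameters (`process_batch`, `step`, `lam`, `sweeps`, the 0.5 initialization) are assumptions chosen for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgd_block(entries, U, V, step, lam):
    """Run SGD over the ratings of one block, updating U and V in place."""
    for i, j, r in entries:
        u, v = U[i], V[j]
        err = r - dot(u, v)
        U[i] = [a + step * (err * b - lam * a) for a, b in zip(u, v)]
        V[j] = [b + step * (err * a - lam * b) for a, b in zip(u, v)]


def process_batch(batch, U, V, d, step=0.05, lam=0.01, sweeps=20):
    """Handle one micro-batch: divide into blocks, then sweep the strata."""
    m, n = len(U), len(V)
    blocks = {}
    for i, j, r in batch:
        blocks.setdefault((i * d // m, j * d // n), []).append((i, j, r))
    for _ in range(sweeps):
        for s in range(d):
            # The d blocks of stratum s share no rows or columns, so a real
            # cluster could give each one to a different worker.
            for a in range(d):
                sgd_block(blocks.get((a, (a + s) % d), []), U, V, step, lam)


def rmse(batch, U, V):
    return (sum((r - dot(U[i], V[j])) ** 2 for i, j, r in batch) / len(batch)) ** 0.5


k, m, n = 2, 4, 4
U = [[0.5] * k for _ in range(m)]  # user factors, one vector per user
V = [[0.5] * k for _ in range(n)]  # item factors, one vector per item
stream = [  # two hypothetical micro-batches of (user, item, rating)
    [(0, 0, 5.0), (1, 2, 3.0), (3, 3, 4.0)],
    [(2, 1, 2.0), (0, 3, 4.0), (1, 0, 1.0)],
]
history = []
for batch in stream:
    before = rmse(batch, U, V)
    process_batch(batch, U, V, d=2)
    history.append((before, rmse(batch, U, V)))
print(history)
```

U and V persist across micro-batches, so later batches start from the factors learned so far instead of retraining from scratch. A production version would also shuffle the entries within each block and decay the step size, both omitted here for brevity.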

24 Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work

