1
Collaborative Filtering for Streaming data
颜荣圻
2
Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work
3
Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work
4
Introduction: Movie Rating
5
Introduction: Netflix Prize
6
Introduction: Recommendation
7
Introduction: Collaborative Filtering
What do I recommend? = How to fill in the blanks?
8
Introduction: Goal
Predict movies of interest to a user based on the movies watched by that user and by others.
Collectively filter the ratings of a large group of users.
Data: a stream of incoming ratings.
9
Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work
10
Matrix Factorization
Model: Matrix Completion Problem
m: number of rows (users)
n: number of columns (items)
R: incomplete m × n matrix of ratings
$R_{ij}$: the rating user i assigns to item j
11
Matrix Factorization
Approach: Low-rank Matrix Factorization
Find a low-rank matrix factorization of the ratings matrix: $R \approx U^T V$, where $U = [u_1, u_2, \dots, u_i, \dots, u_m]$ is $k \times m$ and $V = [v_1, v_2, \dots, v_j, \dots, v_n]$ is $k \times n$.
12
Matrix Factorization
Low-rank Matrix Factorization:
k: we assume that the ratings users assign to items are determined by k latent features.
$v_{\mathrm{Ave}} = [8, 7, 0, 1, \dots, 2]^T$ (fiction: yes, action: yes, romance: no, comedy: a little)
13
Matrix Factorization
Low-rank Matrix Factorization:
Each user i has k weights $u_i$, and each item j has k weights $v_j$. The rating user i assigns to item j: $R'_{ij} = u_i^T v_j$
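The inner-product prediction implied by the factorization R ≈ U^T V can be sketched in a few lines. This is only an illustration: the function name `predict` and the toy weight vectors are made up here and do not come from the slides.

```python
def predict(u_i, v_j):
    """Predicted rating: inner product of the user's and the item's k weights."""
    return sum(a * b for a, b in zip(u_i, v_j))

# Hypothetical factors with k = 3 latent features (values invented for illustration).
u_i = [1.0, 0.5, 0.0]   # how much user i cares about each latent feature
v_j = [4.0, 2.0, 3.0]   # how strongly item j exhibits each latent feature

rating = predict(u_i, v_j)  # 1.0*4.0 + 0.5*2.0 + 0.0*3.0
```

A feature the user does not care about (weight 0.0) contributes nothing to the prediction, however strongly the item exhibits it.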
14
Matrix Factorization
Optimization problem:
Let $\Omega$ denote the set of entries in R for which the rating is known. We wish to find matrices U and V which minimize the sum
$$\min_{U,V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, R'_{ij}), \qquad R'_{ij} = u_i^T v_j$$
15
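Matrix Factorization
Optimization problem:
To prevent overfitting, we introduce a regularization term with a penalty parameter λ > 0:
$$\min_{U,V} \sum_{(i,j) \in \Omega} \ell(R_{ij}, u_i^T v_j) + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right)$$
The sum runs over the set $\Omega$ of known ratings, and $\|\cdot\|_F$ denotes the Frobenius norm.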
16
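Matrix Factorization
Optimization problem:
$\ell$ is a given loss function.
The regularized square loss, summed over the set $\Omega$ of known ratings:
$$\sum_{(i,j) \in \Omega} (R_{ij} - u_i^T v_j)^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right)$$
This objective is non-convex in U and V jointly.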
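As a rough sketch, the regularized square loss can be evaluated directly from the known ratings. The data layout is an assumption made for this example: known ratings are a dict keyed by (i, j), and `U[i]` and `V[j]` hold the k weights of user i and item j. The function name is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def regularized_square_loss(ratings, U, V, lam):
    """Squared error over known ratings plus lam * (||U||_F^2 + ||V||_F^2)."""
    # Fit term: only entries whose rating is known contribute.
    fit = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    # Squared Frobenius norm: sum of squares of every weight.
    frob = sum(x * x for vec in U for x in vec) + sum(x * x for vec in V for x in vec)
    return fit + lam * frob

# Toy data, invented for illustration: 2 users, 2 items, k = 2.
ratings = {(0, 0): 5.0, (1, 1): 1.0}
U = [[1.0, 2.0], [0.0, 1.0]]
V = [[1.0, 2.0], [1.0, 0.0]]
loss = regularized_square_loss(ratings, U, V, lam=0.5)
```

Here the first rating is predicted exactly (1*1 + 2*2 = 5), the second is off by 1, and the penalty adds 0.5 * 12, giving a loss of 7.0.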
17
Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work
18
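Distributed SGD
Method: Stochastic Gradient Descent (SGD)
Pick a known rating $R_{ij}$ at random and move $u_i$ and $v_j$ against the gradient of that rating's loss term $L_{ij}$:
$$u_i \leftarrow u_i - \epsilon \frac{\partial L_{ij}}{\partial u_i}, \qquad v_j \leftarrow v_j - \epsilon \frac{\partial L_{ij}}{\partial v_j}$$
$\epsilon > 0$ is the step size.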
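A single SGD step for one known rating might look like the sketch below. It assumes the regularized square loss with the full penalty λ applied on every step, which is one common simplification and not necessarily what the slides use. The names `sgd_step`, `eps` and `lam` are chosen for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(u_i, v_j, r_ij, eps, lam):
    """One gradient step on (r_ij - u_i.v_j)^2 + lam * (|u_i|^2 + |v_j|^2)."""
    e = r_ij - dot(u_i, v_j)  # prediction error for this rating
    # Gradient w.r.t. u_i is -2*e*v_j + 2*lam*u_i; step against it with size eps.
    new_u = [a + 2 * eps * (e * b - lam * a) for a, b in zip(u_i, v_j)]
    # Both updates read the old u_i and v_j, so the step is symmetric.
    new_v = [b + 2 * eps * (e * a - lam * b) for a, b in zip(u_i, v_j)]
    return new_u, new_v

# Toy example, values invented for illustration.
u, v, r = [1.0, 0.0], [1.0, 1.0], 3.0
new_u, new_v = sgd_step(u, v, r, eps=0.1, lam=0.0)
```

With these values the prediction moves from 1.0 toward the observed rating 3.0. A step size that is too large would overshoot instead.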
19
Distributed SGD
20
Distributed SGD
Algorithm Design:
21
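Distributed SGD
Distributed Matrix => Parallelized SGD:
Divide the ratings matrix R into blocks and distribute the blocks to different processors.
For entries $R_{ij}$ and $R_{i'j'}$ processed at the same time on different processors: $i \neq i'$ and $j \neq j'$.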
22
Distributed SGD
Batch Streaming Algorithm:
We must distribute the blocks in a way that avoids conflicting updates.
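One way to pick non-conflicting blocks is the shifted-diagonal scheme sketched below. It is an assumption borrowed from the general DSGD literature, not something the slides spell out: with a d × d blocking, round s assigns processor b the block in row-group b and column-group (b + s) mod d. The function name `strata` is hypothetical.

```python
def strata(d):
    """Return d rounds; each round lists d blocks (row_group, col_group)
    that share no row-group and no column-group."""
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

# For a 3 x 3 blocking, three processors can each take one block per round.
rounds = strata(3)
# rounds[0] is the main diagonal: [(0, 0), (1, 1), (2, 2)]
# rounds[1] is shifted by one:    [(0, 1), (1, 2), (2, 0)]
```

Blocks in the same round touch disjoint sets of users and items, so their SGD updates never write to the same u_i or v_j. Across the d rounds, every block is visited exactly once.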
23
Streaming DSGD
Batch Streaming Algorithm:
[Figure: flow between Driver and Worker]
Steps shown: initialize matrices R, U and V; make an RDD from the live data stream (i, j, R); divide the matrix; select blocks; run SGD on Block 1, Block 2 and Block 3 (update u1 & v1, u2 & v2, u3 & v3); join U and V, R = U^T V; make prediction.
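The flow above can be imitated without Spark in plain Python, which may help to see how the pieces fit. Everything here is a simplified stand-in under stated assumptions: batches are lists of (i, j, r) triples, blocks are formed by taking indices modulo d, the "workers" run one after another instead of in parallel, and all names and values are invented for this sketch.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def process_batch(batch, U, V, d, eps=0.02, lam=0.01):
    """One DSGD pass over a micro-batch of (i, j, r) triples; updates U, V in place."""
    # Divide: block (I, J) holds the ratings with i % d == I and j % d == J.
    blocks = {}
    for i, j, r in batch:
        blocks.setdefault((i % d, j % d), []).append((i, j, r))
    # Select: in round s, blocks (b, (b + s) % d) share no users and no items.
    for s in range(d):
        for b in range(d):  # each value of b stands in for one worker
            for i, j, r in blocks.get((b, (b + s) % d), []):
                u, v = U[i], V[j]
                e = r - dot(u, v)
                U[i] = [a + 2 * eps * (e * c - lam * a) for a, c in zip(u, v)]
                V[j] = [c + 2 * eps * (e * a - lam * c) for a, c in zip(u, v)]

def squared_error(batch, U, V):
    return sum((r - dot(U[i], V[j])) ** 2 for i, j, r in batch)

# Initialize U and V with small random weights (k = 2, 4 users, 4 items).
rng = random.Random(0)
k, m, n, d = 2, 4, 4, 2
U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]

# A made-up "stream" of two micro-batches, replayed several times.
stream = [[(0, 0, 5.0), (1, 2, 3.0), (2, 1, 4.0)],
          [(3, 3, 1.0), (0, 1, 4.0), (2, 0, 2.0)]]
before = sum(squared_error(b, U, V) for b in stream)
for _ in range(200):
    for batch in stream:
        process_batch(batch, U, V, d)
after = sum(squared_error(b, U, V) for b in stream)

# Join U and V to make a prediction for an unseen (user, item) pair.
prediction = dot(U[1], V[0])
```

In a real Spark Streaming job the inner loop over b would run on separate workers and the updated factors would be joined back on the driver. That part is deliberately left out here.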
24
Outline: Introduction, Matrix Factorization, Distributed SGD, Spark & Spark Streaming, Our Work