Matrix Factorization.


1 Matrix Factorization

2 Recovering latent factors in a matrix
V is an n × m matrix: n rows, m columns, with entries v11 … vij … vnm.

3 Recovering latent factors in a matrix
Approximate the n × m matrix V by a product of two thin matrices, (n × K) · (K × m): row i of the left factor holds row i's K latent values (xi, yi), and column j of the right factor holds column j's latent values (aj, bj); here K = 2.

4 What is this for?
Same picture: V (n × m) ≈ (n × K) · (K × m), with latent rows (xi, yi) on the left and latent columns (aj, bj) on the right.
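A minimal numpy sketch of the shapes involved (the sizes and names here are illustrative, not from the slides): it builds an exactly rank-K matrix and then recovers a rank-K factorization with the SVD.

```python
import numpy as np

n, m, K = 6, 5, 2                 # n rows, m columns, K latent factors

# A rank-K matrix is exactly the product of an (n x K) and a (K x m) matrix.
W = np.random.randn(n, K)         # row i: the K latent values for row i
H = np.random.randn(K, m)         # column j: the K latent values for column j
V = W @ H                         # n x m, V[i, j] = dot(W[i], H[:, j])

# Factorization goes the other way: given V (possibly noisy or partially
# observed), recover factors whose product approximates it.  For a fully
# observed V, the truncated SVD gives the best rank-K approximation.
U, s, Vt = np.linalg.svd(V)
approx = (U[:, :K] * s[:K]) @ Vt[:K]
print(np.allclose(V, approx))     # True here, since V is exactly rank K
```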

5 MF for collaborative filtering

6 What is collaborative filtering?

7 What is collaborative filtering?

8 What is collaborative filtering?

9 What is collaborative filtering?

11 Recovering latent factors in a matrix
V is an n × m matrix with n users as rows and m movies as columns; V[i,j] = user i’s rating of movie j.

12 Recovering latent factors in a matrix
Factor V (n users × m movies) as the product of an n × K user-factor matrix and a K × m movie-factor matrix; V[i,j] = user i’s rating of movie j.
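A small hedged sketch of why the factorization is useful for ratings: each unobserved cell of V gets a prediction from the two latent-factor vectors (the toy sizes and values below are made up):

```python
import numpy as np

n_users, n_movies, K = 4, 3, 2

W = np.random.randn(n_users, K)    # row i: user i's K latent factors
H = np.random.randn(K, n_movies)   # column j: movie j's K latent factors

# The predicted rating of movie j by user i is the dot product of the two
# latent-factor vectors -- this fills in cells of V that were never observed.
i, j = 1, 2
print(W[i] @ H[:, j])
```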

14 MF for image modeling

16 MF for images
V is 1000 images × 10,000 pixels, with V[i,j] = pixel j in image i. Factor V ≈ (1000 × 2) · (2 × 10,000): the two rows of the right factor are prototype images (PC1 and PC2), and each image’s row on the left gives its coordinates on those two prototypes.
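A minimal sketch of the image case at the slide's dimensions, using a plain truncated SVD (the random data stands in for real images, and true PCA would also center each pixel column first):

```python
import numpy as np

n_images, n_pixels, K = 1000, 10_000, 2

V = np.random.rand(n_images, n_pixels)       # V[i, j] = pixel j in image i

# Truncated SVD gives the best rank-2 factorization: the two rows of H are
# prototype images (like PC1 and PC2), and W[i] gives image i's coordinates
# on those prototypes.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :K] * s[:K]                         # 1000 x 2
H = Vt[:K]                                   # 2 x 10,000
print(W.shape, H.shape)
```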

17 MF for modeling text

18 The Neatest Little Guide to Stock Market Investing
Investing For Dummies, 4th Edition
The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
The Little Book of Value Investing
Value Investing: From Graham to Buffett and Beyond
Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
Investing in Real Estate, 5th Edition
Stock Investing For Dummies
Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

19 TFIDF counts would be better

20 Recovering latent factors in a matrix
The doc-term matrix: V has n documents as rows and m terms as columns, with V[i,j] = TFIDF score of term j in doc i; factor it as V ≈ (n × K) · (K × m).
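A hedged sketch of this on a few of the titles from the earlier slide, assuming scikit-learn (the slides don't prescribe a library or a rank):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

titles = [
    "The Neatest Little Guide to Stock Market Investing",
    "Investing For Dummies, 4th Edition",
    "The Little Book of Value Investing",
    "Investing in Real Estate, 5th Edition",
    "Rich Dad's Advisors: The ABC's of Real Estate Investing",
]

V = TfidfVectorizer().fit_transform(titles)         # n docs x m terms, TFIDF scores
W = TruncatedSVD(n_components=2).fit_transform(V)   # n x 2 document factors

# Titles about the same topic (stock-market vs. real-estate investing)
# land near each other in the 2-dimensional latent space.
print(W.round(2))
```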

21 Investing for real estate
Rich Dad’s Advisor’s: The ABCs of Real Estate Investment …

22 The little book of common sense investing: …
Neatest Little Guide to Stock Market Investing

23 MF is like clustering

24 k-means as MF
The original data set X (n examples × m features) factors as X ≈ Z M: Z is the n × r matrix of indicators for r clusters (each row has a single 1 marking that example’s cluster), and M is the r × m matrix of cluster means.
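A toy sketch of the identity the slide is drawing; the cluster assignment is done by hand here (by the sign of the first feature) rather than by actually running k-means:

```python
import numpy as np

n, m, r = 6, 2, 2                            # n examples, m features, r clusters
X = np.vstack([np.random.randn(3, m) + 5,    # a blob near (+5, +5)
               np.random.randn(3, m) - 5])   # a blob near (-5, -5)

assign = (X[:, 0] < 0).astype(int)           # cluster id for each example
Z = np.eye(r)[assign]                        # n x r indicators: one 1 per row
M = np.array([X[assign == k].mean(axis=0)    # r x m matrix of cluster means
              for k in range(r)])

# Z @ M replaces every example by its cluster mean: X ~ Z M is k-means as MF.
print(np.round(Z @ M, 2))
```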

25 How do you do it?
Given V (n × m), how do we actually find the n × K and K × m factors whose product approximates it?

26 KDD 2011 talk pilfered from  …..

28 Recovering latent factors in a matrix
V ≈ W H: V is the n users × m movies ratings matrix (V[i,j] = user i’s rating of movie j), W is n × K with a row of latent factors per user, and H is K × m with a column of latent factors per movie.

32 Matrix factorization as SGD
Take a stochastic gradient step on one observed entry at a time, scaled by a step size. Why does this work?

33 Matrix factorization as SGD - why does this work? Here’s the key claim:

34 Checking the claim
Think of SGD for logistic regression: the LR loss compares y with ŷ = dot(w,x). MF is similar, but now we update both w (the user weights) and x (the movie weights).
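A minimal sketch of those updates for squared loss with L2 regularization; the step size, regularization weight, and initialization scale are illustrative choices, not taken from the slides:

```python
import numpy as np

def sgd_mf(ratings, n, m, K=10, lr=0.01, reg=0.05, epochs=20):
    """ratings: list of (i, j, v) triples for the observed cells of V."""
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((n, K))    # user factors
    H = 0.1 * rng.standard_normal((K, m))    # movie factors
    for _ in range(epochs):
        rng.shuffle(ratings)
        for i, j, v in ratings:
            wi = W[i].copy()
            err = v - wi @ H[:, j]           # compare v with v-hat, as in LR
            # update both sides: like updating w *and* x in logistic regression
            W[i]    += lr * (err * H[:, j] - reg * wi)
            H[:, j] += lr * (err * wi      - reg * H[:, j])
    return W, H

# toy usage: 3 users, 3 movies, a handful of observed ratings
W, H = sgd_mf([(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (2, 2, 3.0)], n=3, m=3)
print(W @ H)                                 # predicted ratings for every cell
```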

35 What loss functions are possible?
N1, N2: diagonal matrices, sort of like IDF factors for the users/movies; also a “generalized” KL-divergence.
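A hedged note on the second item: the “generalized” KL-divergence is the NMF-style loss below (a standard form; the exact expression on the slide isn't in the transcript):

```python
import numpy as np

def generalized_kl(V, WH, eps=1e-12):
    """D(V || WH) = sum_ij V_ij * log(V_ij / (WH)_ij) - V_ij + (WH)_ij,
    the loss minimized by KL-divergence NMF (entries must be non-negative)."""
    V = np.maximum(V, eps)                   # guard the log against zeros
    WH = np.maximum(WH, eps)
    return np.sum(V * np.log(V / WH) - V + WH)
```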

36 What loss functions are possible?

37 What loss functions are possible?

38 ALS = alternating least squares
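A minimal sketch of ALS for a fully observed V; real recommender ALS sums only over observed cells, so each user row and each movie column solves its own small least-squares system, but that bookkeeping is omitted here:

```python
import numpy as np

def als(V, K=10, reg=0.1, iters=20):
    """Alternate ridge-regression solves: fix H and solve for W, then swap."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((n, K))
    H = rng.standard_normal((K, m))
    I = reg * np.eye(K)
    for _ in range(iters):
        W = np.linalg.solve(H @ H.T + I, H @ V.T).T   # best W given H
        H = np.linalg.solve(W.T @ W + I, W.T @ V)     # best H given W
    return W, H
```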

39 KDD 2011 talk pilfered from  …..

43 Similar to McDonnell et al with perceptron learning

44 Slow convergence…..

51 More detail….
Randomly permute rows/cols of the matrix.
Chop V, W, H into blocks of size d × d: n/d blocks in W, m/d blocks in H.
Group the data: pick a set of blocks with no overlapping rows or columns (a stratum), and repeat until all blocks in V are covered (see the sketch below).
Train the SGD: process strata in series; process blocks within a stratum in parallel.
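A tiny sketch of the stratum structure, using the simple “shifted diagonal” choice of strata (the talk also permutes which stratum sequence is used from epoch to epoch):

```python
def strata(d):
    """Yield d strata for a d x d grid of blocks over V.  Each stratum is a
    set of d blocks (row_block, col_block) in which no two blocks share a
    row block or a column block, so their SGD updates touch disjoint pieces
    of W and H and can run in parallel; together the d strata cover the grid."""
    for s in range(d):
        yield [(rb, (rb + s) % d) for rb in range(d)]

for s, blocks in enumerate(strata(3)):
    print("stratum", s, blocks)
# stratum 0 [(0, 0), (1, 1), (2, 2)]
# stratum 1 [(0, 1), (1, 2), (2, 0)]
# stratum 2 [(0, 2), (1, 0), (2, 1)]
```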

52 More detail…. (Z in the following figure is what we have been calling V.)

53 More detail….
Initialize W, H randomly (not at zero).
Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”.
Pick the strata sequence by permuting rows and columns of M, and using M’[k,i] as the column index of row i in sub-epoch k.
Use “bold driver” to set the step size (sketched below): increase the step size when the loss decreases (in an epoch); decrease the step size when the loss increases.
Implemented in Hadoop and R/Snowfall.
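A minimal sketch of the bold-driver rule; the growth and shrink factors are typical illustrative choices, not taken from the slides:

```python
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    """Grow the step size a little after an epoch where the loss decreased,
    cut it sharply after an epoch where the loss increased."""
    return step * grow if loss < prev_loss else step * shrink

# usage inside a training loop (sketch):
#   step = bold_driver(step, prev_loss, epoch_loss)
```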

55 Wall Clock Time (8 nodes, 64 cores, R/snow)

60 Number of Epochs

65 Varying rank (100 epochs for all)

66 Hadoop scalability
Hadoop process setup time starts to dominate.

67 Hadoop scalability
