
1 Principled Regularization for Probabilistic Matrix Factorization
Robert Bell, Suhrid Balakrishnan, AT&T Labs-Research
Duke Workshop on Sensing and Analysis of High-Dimensional Data, July 26-28, 2011

2 Probabilistic Matrix Factorization (PMF)
Approximate a large n-by-m matrix R by $M = P^\top Q$
–P and Q each have k rows, with k << n, m
–$m_{ui} = p_u^\top q_i$
–R may be sparsely populated
Prime tool in the Netflix Prize
–99% of ratings were missing
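As a concrete reference, here is a minimal NumPy sketch of the model above; the sizes, names, and random initialization are illustrative, not from the talk:

```python
import numpy as np

# Illustrative sizes, not from the talk.
n, m, k = 2500, 400, 2
P = np.random.randn(k, n)   # user factors: one column p_u per user
Q = np.random.randn(k, m)   # item factors: one column q_i per item

M = P.T @ Q                 # the full n-by-m approximation M = P'Q

def predict(u, i):
    """Predicted value for cell (u, i): m_ui = p_u' q_i."""
    return P[:, u] @ Q[:, i]
```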

3 Regularization for PMF
Needed to avoid overfitting
–Even after limiting the rank of M
–Critical for sparse, imbalanced data
Penalized least squares
–Minimize $\sum_{(u,i) \in O} (r_{ui} - p_u^\top q_i)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$, where O is the set of observed cells

4 Regularization for PMF
Needed to avoid overfitting
–Even after limiting the rank of M
–Critical for sparse, imbalanced data
Penalized least squares
–Minimize $\sum_{(u,i) \in O} (r_{ui} - p_u^\top q_i)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$
–or $\sum_{(u,i) \in O} (r_{ui} - p_u^\top q_i)^2 + \lambda_P \|P\|_F^2 + \lambda_Q \|Q\|_F^2$

5 Regularization for PMF
Needed to avoid overfitting
–Even after limiting the rank of M
–Critical for sparse, imbalanced data
Penalized least squares
–Minimize $\sum_{(u,i) \in O} (r_{ui} - p_u^\top q_i)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$
–or $\sum_{(u,i) \in O} (r_{ui} - p_u^\top q_i)^2 + \lambda_P \|P\|_F^2 + \lambda_Q \|Q\|_F^2$
–λ's selected by cross validation
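A sketch of this objective as code, under the same notation; the helper name pmf_loss and the (u, i, r) triple layout are assumptions, not from the talk:

```python
import numpy as np

def pmf_loss(P, Q, obs, lam_P, lam_Q):
    """Penalized least squares over the observed cells only.

    obs: iterable of (u, i, r_ui) triples (an assumed data layout);
    setting lam_P == lam_Q recovers the single-lambda objective.
    """
    sq_err = sum((r - P[:, u] @ Q[:, i]) ** 2 for u, i, r in obs)
    return sq_err + lam_P * np.sum(P**2) + lam_Q * np.sum(Q**2)
```

Cross validation then amounts to refitting P and Q on Training for each candidate λ and keeping the λ with the lowest Validation error.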

6 Research Questions
–Should we use separate λ_P and λ_Q?

7 Research Questions
–Should we use separate λ_P and λ_Q?
–Should we use k separate λ's, one for each dimension of P and Q?

8 Matrix Completion with Noise (Candès and Plan, Proc. IEEE, 2010)
Rank reduction without explicit factors
–No pre-specification of k = rank(M)
Regularization applied directly to M
–Trace norm, a.k.a. nuclear norm
–Sum of the singular values of M
Minimize $\|M\|_*$ subject to $\sum_{(u,i) \in O} (r_{ui} - m_{ui})^2 \le \delta$
"Equivalent" to L2 regularization for P, Q
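The trace norm itself is one line of NumPy, shown below. The quoted "equivalence" is the standard variational identity $\|M\|_* = \min_{M = P^\top Q} \tfrac{1}{2}\left( \|P\|_F^2 + \|Q\|_F^2 \right)$, which ties the nuclear-norm penalty on M to the L2 penalties on the factors:

```python
import numpy as np

def trace_norm(M):
    """Nuclear norm ||M||_*: the sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()
```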

9 Research Questions
–Should we use separate λ_P and λ_Q?
–Should we use k separate λ's, one for each dimension of P and Q?
–Should we use the trace norm for regularization?

10 Bayesian Matrix Factorization (BPMF) (Salakhutdinov and Mnih, ICML 2008)
Let $r_{ui} \sim N(p_u^\top q_i, \sigma^2)$
No PMF-type regularization
$p_u \sim N(\mu_P, \Lambda_P^{-1})$ and $q_i \sim N(\mu_Q, \Lambda_Q^{-1})$
Priors for $\sigma^2$, $\mu_P$, $\mu_Q$, $\Lambda_P$, $\Lambda_Q$
Fit by Gibbs sampling
Substantial reduction in prediction error relative to PMF with L2 regularization
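For concreteness, a sketch of one Gibbs update under the model above: the draw of a single user's factor vector from its Gaussian full conditional. The hyperparameter draws for μ_P, Λ_P, σ² and the symmetric item-side updates are omitted, and the function name and data layout are assumptions:

```python
import numpy as np

def sample_user_factor(rng, user_obs, Q, mu_P, Lambda_P, sigma2):
    """Draw p_u from its Gaussian full conditional given everything else.

    user_obs: list of (i, r_ui) pairs observed for this user (assumed layout).
    """
    prec = Lambda_P.copy()              # posterior precision matrix
    lin = Lambda_P @ mu_P               # precision-weighted mean accumulator
    for i, r in user_obs:
        q = Q[:, i]
        prec = prec + np.outer(q, q) / sigma2
        lin = lin + r * q / sigma2
    cov = np.linalg.inv(prec)
    return rng.multivariate_normal(cov @ lin, cov)
```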

11 Research Questions
–Should we use separate λ_P and λ_Q?
–Should we use k separate regularization parameters, one for each dimension of P and Q?
–Should we use the trace norm for regularization?
–Does BPMF "regularize" appropriately?

12 Matrix Factorization with Biases
Let $m_{ui} = \mu + a_u + b_i + p_u^\top q_i$
Regularization similar to before
–Minimize $\sum_{(u,i) \in O} (r_{ui} - m_{ui})^2 + \lambda \left( \|a\|^2 + \|b\|^2 + \|P\|_F^2 + \|Q\|_F^2 \right)$

13 Matrix Factorization with Biases
Let $m_{ui} = \mu + a_u + b_i + p_u^\top q_i$
Regularization similar to before
–Minimize $\sum_{(u,i) \in O} (r_{ui} - m_{ui})^2 + \lambda \left( \|a\|^2 + \|b\|^2 + \|P\|_F^2 + \|Q\|_F^2 \right)$
–or $\sum_{(u,i) \in O} (r_{ui} - m_{ui})^2 + \lambda_a \|a\|^2 + \lambda_b \|b\|^2 + \lambda_{PQ} \left( \|P\|_F^2 + \|Q\|_F^2 \right)$
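A sketch of the biased objective with separate λ's, following the same assumed helper-name and data-layout conventions as before:

```python
import numpy as np

def biased_loss(mu, a, b, P, Q, obs, lam_a, lam_b, lam_PQ):
    """Penalized least squares for m_ui = mu + a_u + b_i + p_u' q_i."""
    sq_err = sum((r - (mu + a[u] + b[i] + P[:, u] @ Q[:, i])) ** 2
                 for u, i, r in obs)
    return (sq_err + lam_a * a @ a + lam_b * b @ b
            + lam_PQ * (np.sum(P**2) + np.sum(Q**2)))
```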

14 Research Questions
–Should we use separate λ_P and λ_Q?
–Should we use k separate regularization parameters, one for each dimension of P and Q?
–Should we use the trace norm for regularization?
–Does BPMF "regularize" appropriately?
–Should we use separate λ's for the biases?

15 Some Things this Talk Will Not Cover
Various extensions of PMF
–Combining explicit and implicit feedback
–Time-varying factors
–Non-negative matrix factorization
–L1 regularization
–λ's depending on user or item sample sizes
Efficiency of optimization algorithms
–Use Newton's method, one coordinate at a time (sketched below)
–Iterate to convergence
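Although the algorithmic details are out of scope for the talk, the coordinate-wise Newton step is short enough to sketch: because the penalized loss is quadratic in any single coordinate, one Newton step lands exactly on that coordinate's minimizer. Names and data layout are assumptions:

```python
def newton_coordinate(p_uk, rated_items, Q, f, resid, lam):
    """Exact coordinate update for p_uk under the L2-penalized loss.

    rated_items: item indices rated by user u; resid[i] is the current
    residual r_ui - m_ui, which already includes p_uk's contribution;
    f is the factor index of the coordinate being updated.
    """
    num = sum((resid[i] + p_uk * Q[f, i]) * Q[f, i] for i in rated_items)
    den = sum(Q[f, i] ** 2 for i in rated_items) + lam
    return num / den
```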

16 No Need for Separate λ_P and λ_Q
$M = (cP)^\top (c^{-1}Q)$ is invariant for any c ≠ 0
For initial P and Q
–Solve for c to minimize $\lambda_P \|cP\|_F^2 + \lambda_Q \|c^{-1}Q\|_F^2$
–$c = \left( \lambda_Q \|Q\|_F^2 / \lambda_P \|P\|_F^2 \right)^{1/4}$
–Gives total penalty $2 \sqrt{\lambda_P \lambda_Q}\, \|P\|_F \|Q\|_F$
Sufficient to let $\lambda_P = \lambda_Q = \lambda_{PQ}$
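A quick numeric check of this argument (sizes, seed, and λ values arbitrary): rescaling leaves M unchanged, and at the optimal c the two penalty terms balance exactly, so a single shared λ loses nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((2, 5))
Q = rng.standard_normal((2, 4))
lam_P, lam_Q = 2.0, 8.0   # arbitrary, unequal on purpose

# Optimal rescaling c minimizing lam_P*||cP||^2 + lam_Q*||Q/c||^2.
c = (lam_Q * np.sum(Q**2) / (lam_P * np.sum(P**2))) ** 0.25

assert np.allclose(P.T @ Q, (c * P).T @ (Q / c))   # M is unchanged
# At the optimum the two penalty terms are equal:
assert np.isclose(lam_P * np.sum((c * P)**2), lam_Q * np.sum((Q / c)**2))
```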

17 Bayesian Motivation for L2 Regularization
Simplest case: only one item
–R is n-by-1
–$r_{u1} = a_1 + \varepsilon_{u1}$, with $a_1 \sim N(0, \tau^2)$ and $\varepsilon_{u1} \sim N(0, \sigma^2)$
Posterior mean (or MAP) of $a_1$ satisfies
–$\hat{a}_1 = \arg\min_a \sum_u (r_{u1} - a)^2 + \lambda_a a^2 = \frac{\sum_u r_{u1}}{n + \lambda_a}$
–$\lambda_a = \sigma^2 / \tau^2$
Best λ is inversely proportional to τ²
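A small simulation makes the claim concrete: with λ_a = σ²/τ², the ridge solution coincides with the posterior mean and shrinks the raw average toward zero (a sketch; sizes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
tau2, sigma2, n = 0.25, 1.0, 20                      # arbitrary settings
a1 = rng.normal(0.0, np.sqrt(tau2))                  # true item bias
r = a1 + rng.normal(0.0, np.sqrt(sigma2), size=n)    # n ratings of the item

lam = sigma2 / tau2                # the "best" lambda from the theory
a_hat = r.sum() / (n + lam)        # ridge solution == posterior mean
print(a_hat, r.mean())             # shrunk toward 0 vs. the raw mean
```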

18 Implications for Regularization of PMF
Allow λ_a ≠ λ_b
–If τ_a² ≠ τ_b²
Allow λ_a ≠ λ_b ≠ λ_PQ
Allow λ_PQ1 ≠ λ_PQ2 ≠ … ≠ λ_PQk?
–Trace norm does not
–BPMF appears to

19 Simulation Experiment Structure
n = 2,500 users, m = 400 items
250,000 observed ratings
–150,000 in Training (to estimate a, b, P, Q)
–50,000 in Validation (to tune λ's)
–50,000 in Test (to estimate MSE)
Substantial imbalance in ratings
–8 to 134 ratings per user in Training data
–33 to 988 ratings per item in Training data

20 Simulation Model
$r_{ui} = a_u + b_i + p_{u1} q_{i1} + p_{u2} q_{i2} + \varepsilon_{ui}$
Elements of a, b, P, Q, and ε
–Independent normals with mean 0
–Var(a_u) = 0.09
–Var(b_i) = 0.16
–Var(p_u1 q_i1) = 0.04
–Var(p_u2 q_i2) = 0.01
–Var(ε_ui) = 1.00
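This generative model is easy to reproduce. One way to realize the stated product variances is to split each one evenly between the two factors (that split is an assumption, as is the seed; the talk's imbalanced observation design is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2011)
n, m = 2500, 400
a = rng.normal(0, np.sqrt(0.09), n)     # Var(a_u) = 0.09
b = rng.normal(0, np.sqrt(0.16), m)     # Var(b_i) = 0.16
# For independent zero-mean factors, Var(p*q) = Var(p)*Var(q);
# an even split gives Var(p) = Var(q) = sqrt of the product variance.
p1 = rng.normal(0, 0.2**0.5, n); q1 = rng.normal(0, 0.2**0.5, m)  # Var = 0.04
p2 = rng.normal(0, 0.1**0.5, n); q2 = rng.normal(0, 0.1**0.5, m)  # Var = 0.01
R = (a[:, None] + b[None, :] + np.outer(p1, q1) + np.outer(p2, q2)
     + rng.normal(0, 1.0, (n, m)))      # Var(eps) = 1.00
```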

21 Evaluation
Test MSE for estimation of $m_{ui} = E(r_{ui})$
–MSE $= \frac{1}{|\mathrm{Test}|} \sum_{(u,i) \in \mathrm{Test}} (\hat{m}_{ui} - m_{ui})^2$
Limitations
–Not real data
–Only one replication
–No standard errors
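The criterion as code, given the simulated true means (a sketch; the cell-list layout is an assumption):

```python
import numpy as np

def test_mse(m_hat, m_true, test_cells):
    """MSE of the estimated means over the held-out Test cells."""
    return float(np.mean([(m_hat[u, i] - m_true[u, i]) ** 2
                          for u, i in test_cells]))
```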

22 PMF Results for k = 0
Restrictions on λ's | Values of λ_a, λ_b | MSE for m | Δ MSE
Grand mean; no (a, b) | NA | .2979 |

23 PMF Results for k = 0
Restrictions on λ's | Values of λ_a, λ_b | MSE for m | Δ MSE
Grand mean; no (a, b) | NA | .2979 |
λ_a = λ_b = 0 | 0 | .0712 | -.2267

24 PMF Results for k = 0
Restrictions on λ's | Values of λ_a, λ_b | MSE for m | Δ MSE
Grand mean; no (a, b) | NA | .2979 |
λ_a = λ_b = 0 | 0 | .0712 | -.2267
λ_a = λ_b | 9.32 | .0678 | -.0034

25 PMF Results for k = 0
Restrictions on λ's | Values of λ_a, λ_b | MSE for m | Δ MSE
Grand mean; no (a, b) | NA | .2979 |
λ_a = λ_b = 0 | 0 | .0712 | -.2267
λ_a = λ_b | 9.32 | .0678 | -.0034
Separate λ_a, λ_b | 9.26, 9.70 | .0678 | .0000

26 PMF Results for k = 1
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1 | MSE for m | Δ MSE
Separate λ_a, λ_b | 9.26, 9.70 | .0678 |

27 PMF Results for k = 1
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1 | MSE for m | Δ MSE
Separate λ_a, λ_b | 9.26, 9.70 | .0678 |
λ_a = λ_b = λ_PQ1 | 11.53 | .0439 | -.0239

28 PMF Results for k = 1
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1 | MSE for m | Δ MSE
Separate λ_a, λ_b | 9.26, 9.70 | .0678 |
λ_a = λ_b = λ_PQ1 | 11.53 | .0439 | -.0239
Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44 | .0439 | .0000

29 PMF Results for k = 2
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1, λ_PQ2 | MSE for m | Δ MSE
Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44, NA | .0439 |

30 PMF Results for k = 2
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1, λ_PQ2 | MSE for m | Δ MSE
Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44, NA | .0439 |
λ_a, λ_b, λ_PQ1 = λ_PQ2 | 8.44, 9.94, 19.84, 19.84 | .0441 | +.0002

31 PMF Results for k = 2
Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1, λ_PQ2 | MSE for m | Δ MSE
Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44, NA | .0439 |
λ_a, λ_b, λ_PQ1 = λ_PQ2 | 8.44, 9.94, 19.84, 19.84 | .0441 | +.0002
Separate λ_a, λ_b, λ_PQ1, λ_PQ2 | 8.43, 10.24, 13.38, 27.30 | .0428 | -.0013

32 Results for Matrix Completion
Performs poorly on raw ratings
–MSE = .0693
–Not designed to estimate biases
Fit to residuals from PMF with k = 0
–MSE = .0477
–"Recovered" rank was 1
–Worse than MSE's from PMF: .0428 to .0439

33 Results for BPMF
Raw ratings
–MSE = .0498, using k = 3
–Early stopping
–Not designed to estimate biases
Fit to residuals from PMF with k = 0
–MSE = .0433, using k = 2
–Near .0428, the best PMF with biases

34 Summary
No need for separate λ_P and λ_Q
Theory suggests using separate λ's for distinct sets of exchangeable parameters
–Biases vs. factors
–For individual factors
Tentative simulation results support the need for separate λ's across factors
–BPMF does so automatically
–PMF requires a way to do efficient tuning

