
G = (  n  SUPu 1 e(u 1,n)FM n,...,  n  SUPu lastu e(u lastu,n)FM n,...,  v  SUPm 1 e(v,m 1 )UF v,...,  v  SUPlastm 1 e(v,m lastm )UF v ) 0 = dsse(t)/dt.



Presentation transcript:

1 Given a relationship matrix, r, between 2 entities, u and m (e.g., the Netflix "Ratings" matrix: r_um is the rating user u gives to movie m; Netflix uses it to recommend movies to users; recommenders are systems that recommend products to users based on past ratings). Recommender research is a hot Data Mining topic and Singular Value Decomposition (SVD) is a hot recommender approach. SVD uses the ratings in r to train 2 smaller matrices, a user-feature matrix, UF, and a feature-movie matrix, FM. Once UF and FM are trained, a prediction of the rating u will give to m can be made very quickly by simply taking the dot product, p_um = UF_u o FM_m.

Starting with a few features (e.g., 40?), each UF_u vector gives the extent to which user u "likes" each feature (e.g., features: genre, length); each FM_m gives the level to which the features characterize movie m. SVD trains UF and FM using gradient descent minimization of sse (sum of squared errors). SVD training finds the best feature vector first (the eigenvector for the largest eigenvalue), then the second best, etc.

Definitions: F = (UF_u, FM_m) = feature_vec; p_um = UF_u o FM_m = prediction; r_um = rating; e_um = r_um - p_um = error; G = Gradient(sse); F(t) ≡ (UF_u + tG_u, FM_m + tG_m); p_um(t) ≡ (UF_u + tG_u) o (FM_m + tG_m).

The gradient, component by component:
G = ( Σ_{n∈SUP(u_1)} e(u_1,n) FM_n, ..., Σ_{n∈SUP(u_lastu)} e(u_lastu,n) FM_n, Σ_{v∈SUP(m_1)} e(v,m_1) UF_v, ..., Σ_{v∈SUP(m_lastm)} e(v,m_lastm) UF_v ).

The line search along F(t) minimizes
sse(t) = Σ_{(u,m)∈Rtg} (p_um(t) - r_um)² = Σ_{(u,m)∈Rtg} ((UF_u + tG_u)(FM_m + tG_m) - r_um)².

Setting the derivative to zero:
0 = dsse(t)/dt = Σ_{(u,m)∈Rtg} 2((UF_u + tG_u)(FM_m + tG_m) - r_um) · d/dt((UF_u + tG_u)(FM_m + tG_m))
  = Σ_{(u,m)∈Rtg} 2(UF_u FM_m - r_um + t(G_m UF_u + G_u FM_m) + t² G_u G_m)(G_m UF_u + G_u FM_m + 2t G_u G_m).

Abbreviating g = G_u G_m, h = G_m UF_u + G_u FM_m, p = UF_u FM_m - r_um, this becomes
0 = Σ_{(u,m)∈Rtg} (p + th + t²g)(h + 2tg) = Σ_{(u,m)∈Rtg} (2g²t³ + 3ght² + (2pg + h²)t + ph),

i.e., t solves the cubic
(2 Σ g²) t³ + (3 Σ hg) t² + (Σ (2pg + h²)) t + Σ ph = 0.

Solving at³ + bt² + ct + d = 0:
t = ( q + [q² + (r - p²)³]^{1/2} )^{1/3} + ( q - [q² + (r - p²)³]^{1/2} )^{1/3} + p, where p = -b/(3a), q = p³ + (bc - 3ad)/(6a²), r = c/(3a); written out,
t = [(-b³/27a³ + bc/6a² - d/2a) + {(-b³/27a³ + bc/6a² - d/2a)² + (c/3a - b²/9a²)³}^{1/2}]^{1/3} + [(-b³/27a³ + bc/6a² - d/2a) - {(-b³/27a³ + bc/6a² - d/2a)² + (c/3a - b²/9a²)³}^{1/2}]^{1/3} - b/3a.

[Worksheet residue: a small example with fv = (1,1), evaluating mset near the computed root: t = -0.4 gives mset 9.1, t = -0.45 gives 8.9, t = -0.5 gives 8.5.]

Here I solved for a root of dsse(t)/dt = 0, getting t = -.45. This must be a critical point, but it is not a minimum point; it must be a saddle point. Note that the quotient quadratic, t² + .5t + .1 = 0, has discriminant .25 - .4 < 0, so the other two roots are complex.
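The cubic step-length computation above can be sketched in code. This is a hypothetical helper (names and array layout are assumptions, not from the slides) that accumulates the per-rating g, h, p terms into the cubic's coefficients, then picks the real critical point with the lowest sse(t):

```python
import numpy as np

def line_search_t(UF, FM, R, G_u, G_m):
    """Hypothetical sketch: pick the step t minimizing sse along F + t*G.

    UF: (n_users, k); FM: (k, n_movies); R: dict (u, m) -> rating;
    G_u, G_m: gradient blocks shaped like UF and FM."""
    a = b = c = d = 0.0
    for (u, m), r_um in R.items():
        g = float(G_u[u] @ G_m[:, m])                     # g = G_u . G_m
        h = float(G_u[u] @ FM[:, m] + UF[u] @ G_m[:, m])  # h = G_m.UF_u + G_u.FM_m
        p = float(UF[u] @ FM[:, m]) - r_um                # p = UF_u.FM_m - r_um
        a += 2 * g * g          # t^3 coefficient
        b += 3 * g * h          # t^2 coefficient
        c += 2 * p * g + h * h  # t coefficient
        d += p * h              # constant term
    def sse(t):
        return sum(((UF[u] + t * G_u[u]) @ (FM[:, m] + t * G_m[:, m]) - r) ** 2
                   for (u, m), r in R.items())
    roots = np.roots([a, b, c, d])             # critical points of sse(t)
    real = roots.real[abs(roots.imag) < 1e-9]  # keep only the real ones
    return min(real, key=sse)                  # real root with the lowest sse
```

Picking the root by evaluating sse(t) directly sidesteps the saddle-point problem the slide ran into: a real root of the cubic need not be a minimum.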

2 A one-rating worked example (fu = fm = 1, r = 2, so e = r - fu·fm = 1):

mset = (r - (fu + e·fm·T)(fm + e·fu·T))²

dmset/dT = 0 = 2(r - (fu + e·fm·T)(fm + e·fu·T)) · ((fu + e·fm·T)(e·fu) + (e·fm)(fm + e·fu·T))

So either 0 = r - (fu + e·fm·T)(fm + e·fu·T), a quadratic in T,
r - fu·fm - T(e·fm² + e·fu²) - T²·e²·fm·fu = 0,

or 0 = (fu + e·fm·T)(e·fu) + (e·fm)(fm + e·fu·T), giving

T = -(e·fu² + e·fm²)/(2e²·fm·fu) = -(fu² + fm²)/(2e·fm·fu) = -(1² + 1²)/(2·1·1·1) = -1.
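The single-rating critical point can be checked numerically. A small sketch using the slide's values (fu = fm = 1, r = 2); the central-difference derivative is an added verification step, not part of the slide:

```python
def mset(T, r=2.0, fu=1.0, fm=1.0):
    """mset(T) = (r - (fu + e*fm*T) * (fm + e*fu*T))^2, with e = r - fu*fm."""
    e = r - fu * fm
    return (r - (fu + e * fm * T) * (fm + e * fu * T)) ** 2

r, fu, fm = 2.0, 1.0, 1.0
e = r - fu * fm
T_star = -(fu ** 2 + fm ** 2) / (2 * e * fm * fu)  # closed-form critical point
# central-difference derivative of mset at T_star should vanish
h = 1e-6
deriv = (mset(T_star + h) - mset(T_star - h)) / (2 * h)
```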

3 [Worksheet residue: a line search along f(t) = f + tG from fv = (1,1) (mse = 5 at t = 0). mset at t = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 is 2.980, 1.193, 0.124, 0.353, 2.562, 7.529, 16.13, so the minimum is near t ≈ 0.3; refining gives t = 0.338 with mset = 0.023. A second round from the updated feature vector, fv = (1.3, 2.0), drives mset from 0.023 down to 0.001 at t = 0.23.]

Since calculus isn't working (to find the min mse along f(t) = f + tG), will this type of binary search be efficient enough? Maybe so! In all dimensions the mse(t) equation is quartic (degree 4), so the general shape is as shown below (where any subset of the local extremes can coalesce). [Figure of the general quartic shape lost in transcription.]
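The coarse-then-refine search described above might look like this in Python. This is a hypothetical sketch, not the author's spreadsheet; `mse_t` stands for the mse along f + tG, here replaced by a stand-in quartic with its minimum at t = 0.3:

```python
def line_search_min(mse_t, t_lo=0.0, t_hi=1.0, coarse=8, rounds=30):
    """Hypothetical sketch of the slide's search: a coarse scan over
    [t_lo, t_hi] to bracket the minimum of mse_t, then repeated
    step-halving around the best t found so far."""
    step = (t_hi - t_lo) / coarse
    ts = [t_lo + i * step for i in range(coarse + 1)]
    best = min(ts, key=mse_t)                 # coarse scan
    for _ in range(rounds):
        step /= 2                             # halve the step each round
        best = min((best - step, best, best + step), key=mse_t)
    return best

# stand-in quartic mse(t) with its minimum at t = 0.3
t_min = line_search_min(lambda t: (t - 0.3) ** 2 * (t ** 2 + 1))
```

Note that this kind of search only finds a local minimum; since mse(t) is quartic, a coarse scan wide enough to see both basins is needed before refining.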

4 [Worksheet residue: the same search on the rating matrix r = (1 3; 4 5) with three features, fv = (1,1,1). From mse = 7 at t = 0, mset at t = 0.1, 0.2, 0.3, 0.25, 0.225, 0.2375 is 3.9362, 1.7848, 2.2170, 1.5761, 1.5878, 1.5571. A second round from the updated feature vector gives mset 0.5542, 3.9363, 0.5582, 0.4258 at t = 0.1, 0.2, 0.05, 0.075.]

5 A 20-user x 8-movie ratings sheet (rows 2-9, columns A-T) trained in a Lotus 1-2-3 style spreadsheet. Row 10 holds the current feature vector, fv; L is the learning rate (LRATE) and omse the previous mse.

The driving macro:

\a: /rvnfv~fv~{goto}L~{edit}+.005~/XImse<omse-.00001~/xg\a~
.001~{goto}se~/rvfv~{end}{down}{down}~
/xg\a~

Key cell formulas:

A22:  +A2-A$10*$U2                                              /* error for u=a, m=1 */
A30:  +A10+$L*(A$22*$U$2+A$24*$U$4+A$26*$U$6+A$29*$U$9)         /* updates f(u=a) */
U29:  +U9+$L*(($A29*$A$30+$K29*$K$30+$N29*$N$30+$P29*$P$30)/4)  /* updates f(m=8) */
AB30: +U29                      /* copies the f(m=8) feature update into the new feature vector, nfv */
W22:  @COUNT(A22..T22)          /* counts the number of actual ratings (users) for m=1 */
X22:  @SUM(W22..W29)            /* adds the rating counts for all 8 movies = training count */
AD30: @SUM(SE)/X22              /* averages the se's, giving the mse */
A52:  +A22^2                    /* squares each individual error */

Macro walk-through:
/rvnfv~fv~ — "value copies" nfv to fv (converting nfv to values first).
{goto}L~{edit}+.005~ — increments L by .005.
/XImse<omse-.00001~/xg\a~ — IF mse is still decreasing, recalculates mse with the new L.
.001~ — resets L = .001 for the next round.
{goto}se~/rvfv~{end}{down}{down}~ — "value copies" fv to the output list.
/xg\a~ — starts over with the next round.

[Worksheet residue: rows 61-72 log the run; mse falls from 0.225073 (L = 0.125) to 0.195211.]

Notes: In 2 rounds the mse is as low as Funk gets it in 2000 rounds. After 5 rounds the mse is lower than ever before (and appears to be bottoming out). I know I shouldn't hardcode parameters! Experiments should be done to optimize this line search (e.g., with some binary search for a low mse). Since we have the resulting individual square errors for each training pair, we could run this, then mask out the pairs with se(u,m) > Threshold, then do it again after masking out those that have already achieved a low se. But what do I do with the two resulting feature vectors? Do I treat it like a two-feature SVD, or do I use some linear combination of the resulting predictions of the two (or it could be more than two)? We need to test which works best (or other modifications) on Netflix data. Maybe on those test pairs for which the training row and column have some high errors, we apply the second feature vector instead of the first? Maybe we invoke CkNN for test pairs in this case (or use all 3 and a linear combination)? This is powerful! We need to optimize the calculations using pTrees!!!
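The spreadsheet loop (compute errors, update both feature vectors, grow L while the mse keeps dropping, then reset L) can be re-expressed in Python. This is a loose sketch under stated assumptions, not the original sheet: the function name and array layout are invented, and the movie update averages over all raters rather than using the sheet's fixed /4:

```python
import numpy as np

def train_rank1(R, mask, rounds=5, L0=0.001, dL=0.005, tol=1e-5):
    """Loose sketch of the slide's spreadsheet loop: rank-1 training where
    each round grows the step L while the mse keeps decreasing, then resets.

    R: (n_movies, n_users) ratings; mask: 1 where a rating exists, else 0."""
    ufv = np.ones(R.shape[1])          # user feature vector (fv row in the sheet)
    mfv = np.ones(R.shape[0])          # movie feature vector
    def mse(u, m):
        E = (R - np.outer(m, u)) * mask
        return (E ** 2).sum() / mask.sum()
    history = [mse(ufv, mfv)]
    for _ in range(rounds):
        L = L0
        while True:
            E = (R - np.outer(mfv, ufv)) * mask                # errors e(u,m)
            nufv = ufv + L * (E * mfv[:, None]).sum(axis=0)    # update each f(u)
            nmfv = mfv + L * (E * nufv[None, :]).mean(axis=1)  # update each f(m)
            if mse(nufv, nmfv) < history[-1] - tol:    # like /XImse<omse-.00001~
                ufv, mfv = nufv, nmfv
                history.append(mse(ufv, mfv))
                L += dL                                # like {edit}+.005~
            else:
                break                                  # reset L, next round
    return ufv, mfv, history
```

As in the sheet (U29 references the already-updated A$30), the movie update here uses the freshly updated user features.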

6 Take a feature vector, fv = (fv_u, fv_m), of SVD and let fv(t) = t·fv (this is for the line search of SVD, along the line through the origin). For the line search:

mse(t) = 1/|Ratings| Σ_{(u,m)∈Ratings} (t·fv_u · t·fv_m - r_um)²
       = 1/|Ratings| Σ_{(u,m)∈Ratings} (t² fv_u fv_m - r_um)²
       = 1/|Ratings| Σ_{(u,m)∈Ratings} (t⁴ fv_u² fv_m² - 2t² fv_u fv_m r_um + r_um²)

∂mse/∂t = 1/|Ratings| Σ_{(u,m)∈Ratings} (4t³ fv_u² fv_m² - 4t fv_u fv_m r_um) = 0
        = 4/|Ratings| Σ_{(u,m)∈Ratings} (t³ fv_u² fv_m² - t fv_u fv_m r_um) = 0
        = t · Σ_{(u,m)∈Ratings} (t² fv_u² fv_m² - fv_u fv_m r_um) = 0

So t = 0 is one solution. The other two involve solving the quadratic equation
Σ_{(u,m)∈Ratings} t² fv_u² fv_m² = Σ_{(u,m)∈Ratings} fv_u fv_m r_um, i.e.,
t² = Σ fv_u fv_m r_um / Σ fv_u² fv_m², so
t = ± ( Σ fv_u fv_m r_um / Σ fv_u² fv_m² )^{1/2} = ± ( Σ p_um r_um / Σ p_um² )^{1/2}.
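The closed-form positive root above can be computed directly. A small sketch (the function name and the dict-of-vectors layout are assumptions for illustration):

```python
import numpy as np

def best_scale(fv_u, fv_m, ratings):
    """Closed-form minimizer of mse(t) for fv(t) = t*fv (line through the
    origin), per the slide: t = (sum p_um*r_um / sum p_um^2)^(1/2),
    where p_um = fv_u . fv_m is the current prediction.
    fv_u, fv_m: dicts of per-user / per-movie feature vectors (assumed layout);
    ratings: dict (u, m) -> r_um."""
    num = den = 0.0
    for (u, m), r in ratings.items():
        p = float(np.dot(fv_u[u], fv_m[m]))  # current prediction p_um
        num += p * r
        den += p * p
    return (num / den) ** 0.5                # positive root; -t is the other
```

For a single rating r = 2 with p = 1, this gives t = sqrt(2), so the rescaled prediction t²·p exactly recovers r.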

7 [Worksheet residue: repeated rounds on r = (1 3; 4 5), alternating the closed-form t with a gradient step of size L, and evaluating both t and -t each round. Round 1: fv = (1,1,1; 1,1), mse = 7.25, t = 1.802, mset = 2.187, mse(-t) = 19.04. With L = 0.1: mse = 0.471, t = 1.024, mset = 0.444, mse(-t) = 7.357. With L = 0.2: mse = 0.330, mset = 0.325. With L = 0.18: mse = 0.255, mset = 0.248. With L = 0.19: mse = 0.201, mset = 0.195. With L = 0.23: mse = 0.162, mset = 0.153. The -t branch stays near mse ≈ 7 throughout.]

8 [Worksheet residue: rows 1181-1200 show the final round; mset reaches 0.000025 while mse(-t) ≈ 6.44.]

After 60 rounds, the mse is .000025 and all individual errors show 2 zero places to the right of the decimal. The resulting feature vector:

ufv1     ufv2     ufv3     mfv1     mfv2
1.21168  1.51538  3.59832  0.83349  3.29954

There are only two unrated pairs, (u2,m1) and (u3,m2), and this vector predicts them as:
p(u2,m1) = 1.51538 * 0.83349 = 1.26306
p(u3,m2) = 3.59832 * 3.29954 = 11.87283

The last one must be truncated into the [0,5] range, at 5. As a final note, we can say we have mined almost every last bit of information from this training set (down to an mse of .000025), so that the fv "models" the training set nearly perfectly. Let's see if we can convince ourselves of that:

       u1       u2       u3        mfv
m1 ->  1        1.26306  3         0.83349
m2 ->  4        5        11.87283  3.29954
ufv->  1.21168  1.51538  3.59832

What does training row m2 tell us about p(u2,m1)? It tells us that it should be ~25% higher than r(u1,m1) = 1, or ~1.25. What does training column u1 tell us about p(u2,m1)? It tells us that it should be 1/4th of r(u2,m2) = 5, or ~1.25. What does training column u1 tell us about p(u3,m2)? It tells us that it should be ~4 times r(u3,m1) = 3, or ~12. What does training row m1 tell us about p(u3,m2)? It tells us that it should be ~3 times r(u1,m2) = 4, or ~12.

I am searching the line through the origin generated by nfv, while I think I should be searching the line through fv generated by the gradient (I believe that's what LRATE does). There should be a closed-form formula for it, obviating the need for a looping search (neither a binary nor a fixed-increment search). The next slide investigates that.
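The prediction-plus-truncation step can be stated in a few lines, using the slide's final feature vector (the `predict` helper and the clamp-to-[0,5] form are illustrative, not from the slides):

```python
# Final feature vector from the slide (3 users, 2 movies)
ufv = [1.21168, 1.51538, 3.59832]
mfv = [0.83349, 3.29954]

def predict(u, m):
    """p(u,m) = ufv[u] * mfv[m], truncated into the [0,5] rating range."""
    return min(5.0, max(0.0, ufv[u] * mfv[m]))
```

With 0-based indices, predict(1, 0) recovers the slide's p(u2,m1) ≈ 1.26306 and predict(2, 1) is clamped to 5.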




