Presentation transcript:

Here is the result after 1 round when using a fixed-increment line search to find the LRATE that minimizes the mse:

a b c d e f g h i j k l m n o p q r s t | 1 2 3 4 5 6 7 8 | LRATE MSE
.52 1.47 .52 1 1.47 1 1.31 .68 1 .52 1.47 1 1.15 1.63 .52 1 .84 1.15 1.63 1.15 | 3.04 3.02 3.06 3.07 2.95 3.00 3.05 3.04 | .0525 0.4470620

Without line search, using Funk's LRATE=.001, it takes 81 rounds to arrive at approximately the same mse (and a nearly identical feature vector):

.76 1.38 .61 .99 1.38 .99 1.34 .74 0.99 .61 1.38 .99 1.16 1.50 .61 .99 .82 1.12 1.50 1.13 | 3.04 3.01 3.04 3.06 2.92 2.98 3.02 3 | .001 0.44721854

Going from the round-1 result (LRATE=.0525) shown above, we do a second round and again do a fixed-increment line search:

.52 1.47 .52 1 1.47 1 1.31 .68 1 .52 1.47 1 1.15 1.63 .52 1 .84 1.15 1.63 1.15 | 3.04 3.02 3.06 3.07 2.95 3.00 3.05 3.04 | .0525 0.447062
.92 1.48 .50 .99 1.47 .98 1.46 .66 .98 0.50 1.47 .98 1.22 1.63 .50 0.98 .75 1.15 1.63 1.17 | 3.07 3.03 3.07 3.09 2.92 2.99 3.04 3.03 | .050 0.387166
.84 1.47 .50 .99 1.47 .99 1.43 .66 .99 0.50 1.47 .99 1.21 1.63 .50 0.98 .77 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .040 0.371007
.76 1.47 .51 .99 1.47 .99 1.40 .67 .99 0.51 1.47 .99 1.19 1.63 .51 0.99 .79 1.15 1.63 1.16 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .030 0.368960
.76 1.47 .51 .99 1.47 .99 1.40 .67 .99 0.51 1.47 .99 1.19 1.63 .51 0.99 .79 1.15 1.63 1.16 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .020 0.380975

We come up with an approximately minimized mse at LRATE=.030. Starting from the LRATE=.03 result of that line search, we do another round:

.76 1.47 .51 .99 1.47 .99 1.40 .67 .99 0.51 1.47 .99 1.19 1.63 .51 0.99 .79 1.15 1.63 1.16 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .030 0.368960
.75 1.47 .50 .99 1.47 .98 1.44 .66 .99 0.51 1.47 .99 1.21 1.63 .50 0.98 .76 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.92 2.99 3.04 3.03 | .020 0.351217
.75 1.47 .50 .99 1.47 .99 1.42 .66 .99 0.51 1.47 .99 1.20 1.63 .50 0.99 .77 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.92 3.00 3.04 3.03 | .010 0.362428

Starting from the LRATE=.02 result of that line search, we do the same for the next round:

.75 1.47 .50 .99 1.47 .98 1.44 .66 .99 0.51 1.47 .99 1.21 1.63 .50 0.98 .76 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.92 2.99 3.04 3.03 | .020 0.351217
.74 1.47 .50 .99 1.47 .98 1.46 .66 .98 0.50 1.47 .98 1.22 1.63 .50 0.98 .75 1.15 1.63 1.17 | 3.07 3.04 3.07 3.09 2.91 2.99 3.04 3.02 | .010 0.351899

LRATE=.02 appears stable and near-optimal, so no further line search is done. After 200 rounds at LRATE=.02 (note: ~2000 rounds without line search versus ~219 rounds with line search):

.83 1.39 .48 .91 1.40 .86 1.52 .61 .98 0.60 1.51 .98 1.15 1.74 .48 0.99 .64 1.10 1.62 1.45 | 3.28 3.28 3.04 3.48 1.69 2.98 3.12 2.65 | .020 0.199358

Comparing this feature vector to the one we got with ~2000 rounds at LRATE=.001 (without line search), we see that we arrive at a very different feature vector:

a b c d e f g h i j k l m n o p q r s t | 1 2 3 4 5 6 7 8 | LRATE
1.48 2.54 .90 1.6 2.6 1.55 2.68 1.15 1.73 1.08 2.67 1.73 2.06 3.08 .90 1.76 1.16 2.07 2.90 2.75 | 1.86 1.74 1.71 1.94 .87 1.61 1.7 1.5 | .001, no ls
.83 1.39 .48 .91 1.40 .86 1.52 .61 .98 .60 1.51 .98 1.15 1.74 .48 0.99 .64 1.10 1.62 1.45 | 3.28 3.28 3.04 3.48 1.69 2.98 3.12 2.65 | .020, w ls

However, the UserFeatureVector portions differ by a constant multiplier, and the MovieFeatureVector portions differ by a different constant. If we divide the LRATE=.001 vector by the LRATE=.020 vector, we get the following multiplier vector (one vector is not a dilation of the other, but if we split the user portion from the movie portion, each portion is!).
What does that mean!?!?!?!

1.77 1.81 1.85 1.75 1.84 1.79 1.76 1.86 1.75 1.78 1.76 1.75 1.79 1.76 1.85 1.76 1.81 1.86 1.78 1.89 | .56 .53 .56 .55 .51 .54 .54 .56 | ".001/.020"
user portion: 1.80 avg, 0.04 std; movie portion: 0.54 avg, 0.01 std

Another interesting observation is that 1 / 1.8 = .55, that is, 1 / AVGufv = AVGmfv. They are reciprocals of one another!!! This makes sense: if you double the ufv, you have to halve the mfv to get the same predictions. The bottom line is that the predictions are the same!

What is the nature of the set of vectors that [nearly] minimize the mse? It is not a subspace (it is not closed under scalar multiplication), but it is clearly closed under "reciprocal scalar multiplication" (multiplying the mfv by the reciprocal of the ufv's multiplier). What else can we say about it?

So, we get an order-of-magnitude speedup from line search. It may be more than that, since we may be able to do all the LRATE calculations in parallel (without recalculating the error matrix or feature vectors?). Or there may be a better search mechanism than fixed-increment search. A binary-type search? Other?
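The reciprocal-multiplier structure follows directly from the form of the predictions. In this one-feature model a prediction is just the product of a user-feature value and a movie-feature value (that is how the error cells on the next slide are computed), so for any nonzero constant c:

$$ \mathrm{prediction}(u,m) \;=\; \mathrm{ufv}_u \,\mathrm{mfv}_m \;=\; \bigl(c\,\mathrm{ufv}_u\bigr)\Bigl(\tfrac{1}{c}\,\mathrm{mfv}_m\Bigr). $$

Scaling every user feature by c and every movie feature by 1/c therefore leaves every prediction, every error, and hence the mse unchanged; with c ≈ 1.80 (so 1/c ≈ 0.55) this is exactly the relation between the LRATE=.001 and LRATE=.020 solutions above.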

This is powerful! We need to optimize the calculations using pTrees!!!

[Spreadsheet screenshot: the 8-movie x 20-user training ratings matrix sits in rows 2-9, columns A-T; row 10 holds the current feature vector fv (user features in A-T, movie features in U-AB, then LRATE in AC and omse in AD); the line-search macro \a is named at Z2.]

The macro \a:
/rvnfv~fv~{goto}L~{edit}+.005~/XImse<omse-.00001~/xg\a~
.001~{goto}se~/rvfv~{end}{down}{down}~
/xg\a~

The key cell formulas:
A22:  +A2-A$10*$U2                                              /* error for u=a, m=1 */
A30:  +A10+$L*(A$22*$U$2+A$24*$U$4+A$26*$U$6+A$29*$U$9)         /* updates f(u=a) */
U29:  +U9+$L*(($A29*$A$30+$K29*$K$30+$N29*$N$30+$P29*$P$30)/4)  /* updates f(m=8) */
AB30: +U29                                  /* copies the f(m=8) update into the new feature vector, nfv */
W22:  @COUNT(A22..T22)                      /* counts the actual ratings (users) for m=1 */
X22:  [W3] @SUM(W22..W29)                   /* adds the rating counts for all 8 movies = training count */
AD30: [W9] @SUM(SE)/X22                     /* averages the se's, giving the mse */
A52:  +A22^2                                /* squares each individual error */

[Spreadsheet screenshot: rows 21-30 hold the working error matrix and the new feature vector (nfv); rows 52-59 hold the square errors (SE); rows 61-72 trace successive passes of the line search, with the mse falling from 0.225073 to 0.195211.]

Notes: In 2 rounds the mse is as low as Funk gets it in 2000 rounds. After 5 rounds the mse is lower than ever before (and appears to be bottoming out). I know I shouldn't hardcode parameters! Experiments should be done to optimize this line search (e.g., with some binary search for a low mse). Since we have the resulting individual square errors for each training pair, we could run this, then mask the pairs with se(u,m) > Threshold and do it again after masking out those that have already achieved a low se. But what do I do with the two resulting feature vectors? Do I treat it like a two-feature SVD, or do I use some linear combination of the resulting predictions of the two (or it could be more than two)? We need to test which works best (or other modifications) on Netflix data.
Maybe on those test pairs for which the training row and column have some high errors, we apply the second feature vector instead of the first? Maybe we invoke CkNN for test pairs in this case (or use all three and a linear combination)?

This is powerful! We need to optimize the calculations using pTrees!!!

The line-search macro, line by line:
/rvnfv~fv~                          value-copies nfv into fv (converting the formulas to values)
{goto}L~{edit}+.005~                increments L by .005
/XImse<omse-.00001~/xg\a~           IF the mse is still decreasing, loop and recalculate the mse with the new L
.001~                               resets L=.001 for the next round
{goto}se~/rvfv~{end}{down}{down}~   "value copies" fv to the output list
/xg\a~                              starts over with the next round
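To make the recipe easier to follow outside the spreadsheet, here is a minimal Python sketch of the same computation: one_round mirrors the cell formulas above (errors, user-feature update, movie-feature update, mse), and line_search_round mirrors the macro's fixed-increment search over the learning rate. The function names, the dense list-of-lists ratings layout with 0 meaning "not rated", and the guard for movies with no ratings are my assumptions, not from the slides.

```python
def one_round(ratings, uf, mf, L):
    """One pass of the spreadsheet update. ratings: movies x users, 0 = not rated.
    uf, mf: current user- and movie-feature values. Returns (new_uf, new_mf, mse)."""
    n_movies, n_users = len(ratings), len(ratings[0])

    # A22-style errors: e(m,u) = rating - uf[u]*mf[m], only where a rating exists.
    err = [[(ratings[m][u] - uf[u] * mf[m]) if ratings[m][u] else None
            for u in range(n_users)] for m in range(n_movies)]

    # A30-style user update: uf[u] += L * sum over that user's rated movies of e*mf.
    new_uf = [uf[u] + L * sum(err[m][u] * mf[m] for m in range(n_movies)
                              if err[m][u] is not None)
              for u in range(n_users)]

    # U29-style movie update: mf[m] += L * average over that movie's raters of e*new_uf.
    new_mf = []
    for m in range(n_movies):
        rated = [u for u in range(n_users) if err[m][u] is not None]
        if not rated:                        # no ratings for this movie: keep it as-is
            new_mf.append(mf[m])
            continue
        new_mf.append(mf[m] + L * sum(err[m][u] * new_uf[u] for u in rated) / len(rated))

    # A52 / AD30: square the errors and average over the training count.
    ses = [e * e for row in err for e in row if e is not None]
    return new_uf, new_mf, sum(ses) / len(ses)


def line_search_round(ratings, uf, mf, start=0.001, step=0.005, tol=0.00001):
    """Fixed-increment line search: accept each new feature vector and keep raising
    L by `step` while the mse keeps dropping by more than `tol` (the macro's loop)."""
    L = start
    uf, mf, mse = one_round(ratings, uf, mf, L)
    while True:
        L += step                                    # {goto}L~{edit}+.005~
        new_uf, new_mf, new_mse = one_round(ratings, uf, mf, L)
        if new_mse < mse - tol:                      # /XImse<omse-.00001~
            uf, mf, mse = new_uf, new_mf, new_mse    # /rvnfv~fv~ (accept the new fv)
        else:
            return uf, mf, mse                       # record fv; next round restarts at L=.001
```

Calling line_search_round repeatedly, one call per "round" in the tables above, is the whole training loop.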

A larger example: 20 movies, 51 users.

[Spreadsheet screenshot: the 20x51 training ratings matrix (rows 1-20, columns A-BA), followed by the feature-vector rows for successive rounds (spreadsheet rows 102-111) with the round's L and mse in the last two columns.]

Square Errors:

[Spreadsheet screenshot: the per-cell square errors for the 20x51 example. Most are well under 2; a minority exceed 2, the largest being about 6.9.]

Square Errors > 2 — retraining under mask:

[Spreadsheet screenshot: the cells of the 20x51 example whose square error exceeds 2 (values from about 2.0 up to 7.35), followed by two (L, mse) traces. Retraining only the se>2 cells settles at an mse of about 1.18; retraining the se<=2 cells settles at an mse of about 0.0645.]
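A sketch of the retraining-under-mask step, in the same hedged spirit and reusing the hypothetical one_round/line_search_round helpers sketched earlier: split the training cells by their squared error under the current feature vector, then retrain a separate feature vector on each masked ratings matrix. The threshold of 2 mirrors the se>2 split shown here.

```python
def split_by_se(ratings, uf, mf, thresh=2.0):
    """Return two masked copies of the ratings matrix: one keeping only the cells
    whose squared error exceeds `thresh`, the other keeping the rest."""
    high = [[0] * len(row) for row in ratings]   # 0 still means "not rated"
    low = [[0] * len(row) for row in ratings]
    for m, row in enumerate(ratings):
        for u, r in enumerate(row):
            if r == 0:
                continue
            se = (r - uf[u] * mf[m]) ** 2
            (high if se > thresh else low)[m][u] = r
    return high, low

# Given the ratings matrix and an already-trained (uf, mf):
# high, low = split_by_se(ratings, uf, mf, thresh=2.0)
# uf_hi, mf_hi, mse_hi = line_search_round(high, list(uf), list(mf))
# uf_lo, mf_lo, mse_lo = line_search_round(low, list(uf), list(mf))
```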

The resulting Square Errors after training those with Square Errors > 2:

[Spreadsheet screenshot: the per-cell square errors after this retraining; most cells are now near 0, but a few are still large (up to about 19). A further (L, mse) trace for retraining only the se>4 cells settles at an mse of about 1.207.]

The resulting Square Errors after training those with Square Errors > 2, then se > 4:

[Spreadsheet screenshot: the per-cell square errors after the second masked retraining; most cells are 0, with a few large errors remaining (up to about 19).]

Where are we now wrt PSVD?

Clearly line search is a good idea. How good? (Speedup? Accuracy comparisons?) What about training a 2nd [3rd?, 4th?, ...] feature vector? How do we generate those? (Probably just a matter of understanding Funk's code.) Which "retraining under mask" steps are breakthroughs? Which improve accuracy markedly? Which improve speed markedly?

What speedup shortcuts can we [as mindless engineers ;-) ] come up with, including shortcuts for executing Md's PTreeSet Algebra procedures? These speedups can be "mindless" or "magic" - we'll take them either way! By "mindless" I mean only that trial and error is probably the best way to find these speedups, unless you can fully understand the mathematics. Maybe Dr. Ubhaya can do the math for us? I will suggest the following: "The more the mathematics is understood, the better the mindless engineering tricks work!"

In RECOMMENDERs, we have people (users, customers, websearchers, ...) and things (products, movies, items, documents, webpages, ...). We also often have text (product descriptions, movie features, item descriptions, document contents, webpage contents, ...), which can be handled as entity description columns or by introducing a third entity, terms (content terms, stems of content terms, ...). So we have three entities and three relationships in a cyclic 2-hop rolodex structure (or what we called the BUP, "Bi-partite, Uni-partite on Part", structure). A lifetime of fruitful research lurks in this arena. We can use one relationship to restrict (mask entity instances in) an adjacent relationship; a sketch of that masking idea follows below. I firmly believe pTree structuring is the way to do this.

We can also add a people-to-people relationship (a la Facebook friends) and enrich the information content significantly. We should add tweets to this somehow. Since I don't tweet, I'm probably not the one to suggest how this should fit in, but I will anyway ;-) Tweets (seem to be) mini-documents describing documents, or mini-documents describing people, or possibly even mini-documents describing terms (e.g., if a buzzword becomes hot in the media, people tweet about it?).

Let's call this research arena the VERTICAL RECOMMENDER arena. It's hot! Who's going to be the Master Chef in this Hell's Kitchen?
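To make the "use one relationship to restrict (mask entity instances in) an adjacent relationship" idea concrete, here is a toy sketch. The bit-mask dictionaries are only a crude stand-in for pTrees, and every name and number below is invented for illustration, not taken from the slides.

```python
# Toy stand-in for vertical (pTree-style) structuring: each relationship is a dict
# mapping a column id to a bit mask over the instances of the adjacent entity.
user_movie = {            # which users rated each movie (bit i = user i)
    "m1": 0b10110,
    "m2": 0b01101,
    "m3": 0b11011,
}
movie_term = {            # which movies mention each term (bit j = movie j: m1, m2, m3)
    "vampire": 0b101,     # m1 and m3
}

def movies_with_term(term):
    """Movies containing `term`, read out of the movie_term bit column."""
    mask, movies = movie_term[term], sorted(user_movie)
    return {m for j, m in enumerate(movies) if mask >> j & 1}

def users_restricted_to(term):
    """Users who rated at least one movie containing `term`:
    OR together the user-movie bit columns selected by the movie mask."""
    acc = 0
    for m in movies_with_term(term):
        acc |= user_movie[m]
    return acc

print(bin(users_restricted_to("vampire")))   # a user bit mask, 0b11111 for this toy data
```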