
Presentation transcript:

[Table: feature vector columns a through t, plus LRATE and MSE columns; the values are not preserved in this transcript.]

Without line search, using Funk's LRATE=.001, it takes 81 rounds to arrive at about the same mse (and a nearly identical feature vector).

Going from the round 1 result (LRATE=.0525) shown here, we do a second round and again do a fixed increment line search. We come up with an approximately minimized mse at LRATE=.030.

Going from the line search result at LRATE=.03, we do another round. Going from the line search result at LRATE=.02, we do the same for the next round. LRATE=.02 stable, near-optimal? (No further line search.)

After 200 rounds at LRATE=.02 (note that it took ~2000 rounds without line search and ~219 with line search):

Comparing this feature vector to the one we got with ~2000 rounds at LRATE=.001 (without line search), we see that we arrive at a very different feature vector:

[Table: two rows over columns a through t plus LRATE, labeled "no ls" and "w ls"; values not preserved.]

However, the UserFeatureVector portions differ by a constant multiplier and the MovieFeatureVector portions differ by a different constant. If we divide the LR=.001 vector by the LR=.020 vector, we get the following multiplier vector (one is not a dilation of the other, but if we split the user portion from the movie portion, they are!!! What does that mean!?!?!?!):

".001/.020": user portion 1.80 avg, 0.04 std; movie portion 0.54 avg, 0.01 std

Another interesting observation is that 1 / 1.8 is about .55, that is, 1 / AVGufv is approximately AVGmfv. They are reciprocals of one another!!! This makes sense, since it means that if you double the ufv you have to halve the mfv to get the same predictions. The bottom line is that the predictions are the same!

What is the nature of the set of vectors that [nearly] minimize the mse? It is not a subspace (not closed under scalar multiplication), but it is clearly closed under "reciprocal scalar multiplication" (multiplying the mfv's by the reciprocal of the ufv's multiplier). What else can we say about it?

So, we get an order of magnitude speedup from line search.
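The reciprocal-scaling observation above can be checked numerically. The following is only a minimal sketch, assuming a single-feature model in which the prediction for user u and movie m is ufv[u] * mfv[m]; the ratings, the rated-pair mask, and the function names are made up for illustration and are not the data from these slides.

```python
import numpy as np

def predict(ufv, mfv):
    """Single-feature prediction matrix: p[u, m] = ufv[u] * mfv[m]."""
    return np.outer(ufv, mfv)

def mse(ratings, mask, ufv, mfv):
    """Mean squared error over the rated (mask == True) user/movie pairs only."""
    err = (ratings - predict(ufv, mfv))[mask]
    return float(np.mean(err ** 2))

# Hypothetical toy data (assumption: NOT the ratings used in the slides).
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(20, 8)).astype(float)  # ratings in 1..5
mask = rng.random((20, 8)) < 0.5                           # which pairs are rated
mask[0, 0] = True                                          # at least one rated pair
ufv = rng.random(20) + 0.5
mfv = rng.random(8) + 0.5

c = 1.8  # a user-portion multiplier like the one observed above
base = mse(ratings, mask, ufv, mfv)
rescaled = mse(ratings, mask, c * ufv, mfv / c)  # reciprocal scaling: same predictions
plain = mse(ratings, mask, c * ufv, c * mfv)     # ordinary scalar multiple: different predictions

print(base, rescaled, plain)
```

Under these assumptions `base` and `rescaled` agree up to rounding, while `plain` does not, which is the "closed under reciprocal scalar multiplication but not a subspace" behavior described above.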
The speedup from line search may be even more than an order of magnitude, since we may be able to do all the LRATE calculations in parallel (without recalculating the error matrix or feature vectors????). Or there may be a better search mechanism than fixed increment search. A binary type search? Others?

Here is the result after 1 round when using a fixed increment line search to minimize mse with respect to the LRATE used:

[Spreadsheet: columns A through AD. The top block holds LRATE, omse and fv; the block starting at row 21 holds the working error and new feature vector (nfv), with L, mse and nfv; the last block holds the square errors (SE). Cell values are not preserved in this transcript.]

Cell formulas:

  A22:  +A2-A$10*$U2   /* error for u=a, m=1 */
  A30:  +A10+$L*(A$22*$U$2+A$24*$U$4+A$26*$U$6+A$29*$U$9)   /* updates f(u=a) */
  U29:  +U9+$L*(($A29*$A$30+$K29*$K$30+$N29*$N$30+$P29*$P$30)/4)   /* updates f(m=8) */
  AB30: +U29   /* copies the f(m=8) feature update into the new feature vector, nfv */
  A52:  +A22^2   /* squares all the individual errors */
  (cell and formula not preserved)   /* counts the number of actual ratings (users) for m=1 */
  X22:  (formula not preserved)   /* adds ratings counts for all 8 movies = training count */
  AD30: (formula not preserved)   /* averages se's giving the mse */

Macro \a, step by step:

  /rvnfv~fv~                          copies nfv into fv as values (formulas converted to values)
  {goto}L~{edit}+.005~                increments L by .005
  /XImse<omse ~/xg\a~                 IF mse is still decreasing, recalc mse with the new L
  .001~                               resets L=.001 for the next round
  {goto}se~/rvfv~{end}{down}{down}~   "value copy" fv to the output list
  /xg\a~                              starts over with the next round

Notes: In 2 rounds mse is as low as Funk gets it in 2000 rounds. After 5 rounds mse is lower than ever before (and appears to be bottoming out). I know I shouldn't hardcode parameters! Experiments should be done to optimize this line search (e.g., with some binary search for a low mse).

Since we have the resulting individual square_errors for each training pair, we could run this, then form a mask of the pairs with se(u,m) > Threshold, and then do it again after masking out those that have already achieved a low se. But what do I do with the two resulting feature vectors? Do I treat it like a two feature SVD, or do I use some linear combo of the resulting predictions of the two (or it could be more than two)?
We need to test out which works best (or other modifications) on Netflix data. Maybe on those test pairs for which the training row and column have some high errors, we apply the second feature vector instead of the first? Maybe we invoke CkNN for test pairs in this case (or use all 3 and a linear combo?) This is powerful! We need to optimize the calculations using pTrees!!!
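The spreadsheet procedure above (compute errors, form a feature update, step LRATE by a fixed increment while mse keeps dropping) can be sketched outside a spreadsheet. This is a minimal sketch under stated assumptions, not a port of the macro: it uses the plain gradient direction for both the user and movie portions (the spreadsheet's movie update averages and uses the already-updated user features), the start value .001 and increment .005 are taken from the macro, and the toy data and function names are hypothetical.

```python
import numpy as np

def mse(ratings, mask, ufv, mfv):
    """Mean squared error over rated pairs for the single-feature model."""
    err = (ratings - np.outer(ufv, mfv))[mask]
    return float(np.mean(err ** 2))

def line_search_round(ratings, mask, ufv, mfv, start=0.001, step=0.005, max_steps=1000):
    """One training round: fix a search direction, then raise LRATE by a fixed
    increment until mse stops decreasing. Returns the best point found."""
    err = np.where(mask, ratings - np.outer(ufv, mfv), 0.0)  # 0 for unrated pairs
    d_u = err @ mfv    # per user: sum over rated movies of e(u,m) * f(m)
    d_m = err.T @ ufv  # per movie: sum over rating users of e(u,m) * f(u)

    best_mse, best_lrate = mse(ratings, mask, ufv, mfv), 0.0
    lrate = start
    for _ in range(max_steps):
        cand = mse(ratings, mask, ufv + lrate * d_u, mfv + lrate * d_m)
        if cand >= best_mse:   # mse no longer decreasing: stop the search
            break
        best_mse, best_lrate = cand, lrate
        lrate += step
    return ufv + best_lrate * d_u, mfv + best_lrate * d_m, best_lrate, best_mse

# Hypothetical toy data (assumption: not the Netflix data or the slide example).
rng = np.random.default_rng(1)
ratings = np.outer(rng.uniform(1.0, 2.0, size=12), rng.uniform(1.0, 2.5, size=8))
mask = rng.random(ratings.shape) < 0.6
mask[0, 0] = True
ufv, mfv = np.ones(12), np.ones(8)

mses = [mse(ratings, mask, ufv, mfv)]
for _ in range(5):
    ufv, mfv, lrate, round_mse = line_search_round(ratings, mask, ufv, mfv)
    mses.append(round_mse)
print(mses)
```

Because every candidate in a round lies on one line through the current feature vector, the round can never return a point with a higher mse than it started with. A binary or bracketing search over LRATE could replace the fixed increment loop without changing the rest of the sketch.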

[Spreadsheet: columns A through BT, with Lrate and MSE columns; values are not preserved in this transcript.]

A larger example: 20 movies, 51 users (same as last time, except that I found errors in my code, which I corrected). The last two red lines are printouts of the two steps in the initial line search (on the way to the first result line at MSE= ). The two vectors should be collinear (generate the same line), or else I am not doing line search!! They are clearly not collinear. Thus I have another code mistake. This is why a C# version is desperately needed!! How is that coming?

Where are we now wrt PSVD?

Clearly line search is a good idea. How good? (Speedup? Accuracy comparisons?)

What about 2nd [3rd?, 4th?, ...] feature vector training? How to generate those? (Probably just a matter of understanding Funk's code.)

Which "retraining under mask" steps are breakthroughs? Improve accuracy markedly? Improve speed markedly?

What speedup shortcuts can we [as mindless engineers ;-)] come up with, for example to execute Md's PTreeSet Algebra procedures? These speedups can be "mindless" or "magic"; we'll take them anyway! By "mindless" I mean only that trial and error is used to find lucky speedups: unless you can fully understand the mathematics, it's mindless ;-) Maybe Dr. Ubhaya can do the math for us? I will suggest the following: "The more the Mathematics is understood, the better the mindless engineering tricks work!"

In RECOMMENDERs, we have people (users, customers, websearchers...) and things (products, movies, items, documents, webpages or?). We also often have text (product descriptions, movie features, item descriptions, document contents, webpage contents...), which can be handled as entity description columns or by introducing a third entity, terms (content terms, stems of content terms,...).
So we have three entities and three relationships in a cyclic 2 hop rolodex structure (or what we called BUP, "Bi-partite, Uni-partite on Part" structure). A lifetime of fruitful research lurks in this arena. We can use one relationship to restrict (mask entity instances in) an adjacent relationship. I firmly believe pTree structuring is the way to do this. We can also add a people-to-people relationship (à la Facebook friends) and enrich the information content significantly. We should add tweets to this somehow. Since I don't tweet, I'm probably not the one to suggest how this should fit in, but I will anyway ;-) Tweets (seem to be) mini-documents describing documents, or mini-documents describing people, or possibly even mini-documents describing terms (e.g., if a buzzword becomes hot in the media, people tweet about it????). Let's call this research arena the VERTICAL RECOMMENDER arena. It's hot! Who's going to be the Master Chef in this Hell's Kitchen?