Using a 3-dim DSR (Document, Sender, Receiver) matrix and 2-dim TD (Term, Doc) and UT (User, Term) matrices

Presentation transcript:

Communications Analytics: prediction and anomaly detection for emails, tweets, phone and text, using a 3-dim DSR (Document, Sender, Receiver) matrix and 2-dim TD (Term, Doc) and UT (User, Term) matrices. The pSVD trick is to replace these massive relationship matrices with small feature matrices. Using just one feature, replace them with vectors, f = fD fT fU fS fR or f = fD fT fU:
Replace DSR with fD, fS and fR.
Replace TD with fT and fD.
Replace UT with fU and fT.
[Slide figure: the DSR cube (rec and sender axes) and the TD and UT matrices, each shown with its one-feature and two-feature replacement vectors fD, fT, fU, fS, fR.]
Use GradientDescent + LineSearch to minimize the sum of square errors, sse, where sse is the sum over all nonblanks in TD, UT and DSR. Should we train the User feature segments separately (train fU with UT only, and train fS and fR with DSR only), giving f = <----fD----> <----fT----> <----fU----> <----fS----> <----fR----> (this will be called 3D f)? Or train the User feature segment just once, with both UT and DSR, letting fS = fR = fU, giving f = <----fD----> <----fT----> <fU=fS=fR> (this will be called 3DTU f)? We do the pTree conversions and train f in the CLOUD; then we download the resulting f to the user's personal devices for predictions and anomaly detections. The same setup should work for phone-record Documents, tweet Documents (in the US Library of Congress), text Documents, etc.
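The training loop this slide describes (one feature value per entity instance, sse taken over nonblank cells only, gradient descent with a line search on the step length) can be sketched in a few lines. This is a minimal illustration assuming numpy and ordinary arrays; the deck itself computes over pTrees, and train_rank1 and the toy TD matrix are illustrative names, not the author's code.

    import numpy as np

    def train_rank1(M, steps=100):
        """One-feature pSVD: fit M ~ outer(fRow, fCol) on nonblank (nonzero)
        cells by gradient descent with a backtracking line search on t."""
        nb = M != 0                                  # blank cells coded as 0 here
        fRow, fCol = np.ones(M.shape[0]), np.ones(M.shape[1])
        sse = lambda r, c: (((M - np.outer(r, c)) * nb) ** 2).sum()
        for _ in range(steps):
            e = (M - np.outer(fRow, fCol)) * nb      # error on nonblanks only
            gR = -2 * e @ fCol                       # d(sse)/d(fRow)
            gC = -2 * e.T @ fRow                     # d(sse)/d(fCol)
            t = 1.0                                  # line search on step length
            while t > 1e-12 and sse(fRow - t*gR, fCol - t*gC) >= sse(fRow, fCol):
                t /= 2                               # halve until sse decreases
            fRow, fCol = fRow - t*gR, fCol - t*gC
        return fRow, fCol

    TD = np.array([[1., 0., 3.],                     # toy Term x Doc matrix,
                   [4., 5., 0.]])                    # 0 = blank
    fT, fD = train_rank1(TD)
    print(np.outer(fT, fD))                          # predictions, incl. blanks

The 3D f versus 3DTU f question is then just whether fU, fS and fR are three separate segments of one such f, or one shared segment updated by both the UT and the DSR errors.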

3DTU: Structure each relationship as a rotatable matrix, then create PTreeSets for each rotation (attaching an entity table's PTreeSet to its rotation). Always treat an entity as an attribute of another entity if possible, rather than adding it as a new dimension of a matrix? E.g., treat Sender as a Document attribute instead of as the 3rd dim of the matrix DSR. The reason: Sender is a candidate key for Doc (while Receiver is not). (Problem to solve: a mechanism for SVD prediction of Sender?)
[Slide figure: a toy Doc table with Sender, CT and LN attributes, plus the DR, DT, TD, TU, UT and RD rotations, each with its bit-slice pTrees, e.g., pDT,T1,2 pDT,T1,1 pDT,T1,0 with a pDT,T1,Mask. A blank-mask pTree is provided only when the column has blanks. pTrees might also be provided for D.ST (SendTime) and D.LN (Length).]
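As a rough illustration of "create PTreeSets for each rotation": each column of a rotation is decomposed into bit-slice vectors plus a blank-mask that is supplied only when the column actually has blanks. A minimal sketch, assuming numpy; ptreeset and the None-for-blank coding are illustrative, not the deck's actual pTree format.

    import numpy as np

    def ptreeset(col, bits=3):
        """Bit-slice a (possibly blank) integer column into pTrees.
        Returns (mask_or_None, [slice for bit bits-1 ... slice for bit 0]);
        the mask is returned only when the column has blanks (None)."""
        mask = np.array([v is not None for v in col])
        vals = np.array([v if v is not None else 0 for v in col], dtype=int)
        slices = [(vals >> b) & 1 for b in range(bits - 1, -1, -1)]
        return (None if mask.all() else mask), slices

    # One rotation of a toy TD matrix: a PTreeSet per Doc column of TD
    # (transposing would give the DT rotation's PTreeSets per Term column).
    TD = [[1, None, 3],
          [4, 5, None]]
    for d in range(3):
        col = [TD[t][d] for t in range(2)]
        print('pTD,D%d:' % (d + 1), ptreeset(col))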

Here we try a comprehensive comparison of the 3 alternatives: 3D (DSR); 2D (DS, DR); DTU (2D) [em9 em10]. Four runs on small toy DT, UT and DSR matrices, each taking a few GradientDescent+LineSearch rounds; the sse and step length t per round for each alternative:

Run 1:
round  sseDTU  tDSU    sse2D   t2D    sse3D   t3D
1      65.198  1.1     85.339  1.2    88.579  1.06
2      59.968  0.028   21.766  0.14   47.721  0.14
3      59.934  -0.001  13.612  0.056  36.011  0.11
4      59.900  -0.002  11.576  0.08   35.266  0.09
5      59.899  -0.001  11.337  0.04   34.936  0.1

Run 2:
round  sseDTU  tDSU    sse2D   t2D     sse3D   t3D
1      65.066  1.12    59.455  1.24    64.380  1.09
2      60.875  0.025   19.197  0.11    13.877  0.129
3      60.841  -0.001  15.334  0.086   6.0480  0.11
4      60.808  -0.002  14.196  0.07    5.1468  0.121
5      60.806  -0.001  14.151  -0.015  5.0888  0.06

Run 3:
round  sseDTU  tDSU    sse2D   t2D    sse3D   t3D
1      65.066  1.12    85.339  1.2    89      1
2      60.875  0.025   21.460  0.13   44.253  0.15
3      60.812  -0.003  13.511  0.084  36.291  0.11
4      60.809  -0.001  12.098  0.082  35.339  0.09

Run 4:
round  sseDTU  tDSU    sse2D   t2D    sse3D   t3D
1      65.066  1.12    42.594  1.37   50.552  1.19
2      60.875  0.025   21.909  0.12   10.604  0.11
3      60.841  -0.001  16.522  0.09   4.3033  0.08
4      60.808  -0.002  15.626  0.04   3.8386  0.052
5      60.806  -0.001  15.599  0.01   3.8098  -0.02

In every run sseDTU stalls near 60 while the 2D and 3D alternatives keep descending; which of those two ends lowest depends on the data.

[Slide figure: the toy DT, UT and DSR matrices for each run, and the per-round feature values T1 T2 T3 D1 D2 U1 U2 S1 S2 R1 R2.]

Comprehensive comparison of the 3 alternatives, DTU [em11]; 2D (DT, UT, DS, DR); 3D (DTD, TDT, TUT, UUT, DDSR, SDSR, RDSR), on the same toy DT, UT and DSR data; sse and step length t per round:

round  sseDTU  tDTU   sse2D  t2D    sse3D  t3D
1      65.06   1.12   42.59  1.37   50.55  1.19
2      60.87   0.025  21.84  0.116  7.033  0.14
3      60.84   -0.00  16.44  0.1    3.377  0.075
4      60.84   0      15.67  0.04   3.314  0.03

Here the fully 3D alternative (a feature segment per rotation of each matrix) descends furthest.

[Slide figure: the toy matrices, the per-round feature segments TD1 TD2 TD3 DT1 DT2 TU1 TU2 TU3 DSR1 DSR2 U1 U2 S1 S2, and the per-matrix error tables e for DT, UT and the two DSR slices.]

pSVD for Communication Analytics: f = (fDTD, fTTD, fTUT, fUUT, fDDSR, fSDSR, fRDSR). Train f as follows: train with the 2D matrix TD, train with the 2D matrix UT, and train over the 3D matrix DSR, in each case minimizing the sse over that matrix's nonblanks:

sseTD  = Σ nb(t,d)   (td − ATDtd)²,     where ATDtd   = fTt fDd
sseUT  = Σ nb(u,t)   (ut − AUTut)²,     where AUTut   = fUu fTt
sseDSR = Σ nb(d,s,r) (dsr − ADSRdsr)²,  where ADSRdsr = fDd fSs fRr

The gradient components for GradientDescent + LineSearch are:

∂sseTD/∂fDd  = −2 Σ nb,t (td − ATDtd) fTt        ∂sseTD/∂fTt = −2 Σ nb,d (td − ATDtd) fDd
∂sseUT/∂fUu  = −2 Σ nb,t (ut − AUTut) fTt        ∂sseUT/∂fTt = −2 Σ nb,u (ut − AUTut) fUu
∂sseDSR/∂fDd = −2 Σ nb,(s,r) (dsr − ADSRdsr) fSs fRr
∂sseDSR/∂fSs = −2 Σ nb,(d,r) (dsr − ADSRdsr) fDd fRr
∂sseDSR/∂fRr = −2 Σ nb,(d,s) (dsr − ADSRdsr) fDd fSs

pSVD classification predicts blank cell values. [Slide figure: the DSR cube with fDDSR, fSDSR, fRDSR, and the TD and UT matrices with fDTD, fTTD, fTUT, fUUT.]

pSVD FAUST Cluster: use pSVD to speed up FAUST clustering by looking for gaps in ATD rather than TD (i.e., using the pSVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR, RDS. E.g., on the T(d1,...,dn) table, the t-th row is pSVD-estimated as (fTt fD1, ..., fTt fDn), and the dot product t∘d is pSVD-estimated as Σ k=1..n dk fTt fDk; so we analyze gaps in this column of values taken over all rows t.

pSVD FAUST Classification: use pSVD to speed up FAUST classification by finding the optimal cutpoints in ATD rather than TD (i.e., using the pSVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR, RDS.
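The DSR partials above can be checked mechanically. A minimal numpy sketch (an assumption; the deck would do this over pTrees): ADSR is the rank-1 cube fD⊗fS⊗fR, and each gradient contracts the error cube against the other two feature vectors. A finite-difference comparison confirms one component.

    import numpy as np

    rng = np.random.default_rng(0)
    DSR = rng.integers(0, 2, size=(2, 2, 2)).astype(float)  # toy 0/1 cube
    fD, fS, fR = np.ones(2), np.ones(2), np.ones(2)

    def sse(fD, fS, fR):
        A = np.einsum('d,s,r->dsr', fD, fS, fR)  # ADSR[d,s,r] = fD[d]*fS[s]*fR[r]
        return ((DSR - A) ** 2).sum()            # all cells nonblank in this toy

    e = DSR - np.einsum('d,s,r->dsr', fD, fS, fR)
    gD = -2 * np.einsum('dsr,s,r->d', e, fS, fR)  # d(sse)/d(fD)
    gS = -2 * np.einsum('dsr,d,r->s', e, fD, fR)  # d(sse)/d(fS)
    gR = -2 * np.einsum('dsr,d,s->r', e, fD, fS)  # d(sse)/d(fR)

    h = 1e-6                                      # numeric check of gD[0]
    fD2 = fD.copy(); fD2[0] += h
    print(gD[0], (sse(fD2, fS, fR) - sse(fD, fS, fR)) / h)  # should agree ~1e-5

These gradients plug straight into the same backtracking line search shown after the first slide.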

Recalling the massive interconnection of relationships between entities, any analysis we do on this web we can do after estimating each matrix using pSVD-trained feature vectors for the entities. On the next slide we display the pSVD1 (one-feature) replacement: a feature vector which approximates the nonblank cell values and predicts the blanks.

[Slide figure: a web of relationship cards over shared entities — DSR (Doc, sender, rec); UT; Customer-Item ("customer rates movie as 5"); People-Item (PI); Author-Doc; Term-Doc; Course Enrollments; Doc-Doc; Gene-Gene (ppi); Term-Term ("share stem?"); Exp-PI; Exp-Gene.]

On this slide we display the pSVD1 (one-feature) replacement by a feature vector which approximates the nonblank cell values and predicts the blanks. Train the following feature vector through gradient descent of sse, with the proviso that each set of matrix feature vectors be trained only on the sse over the nonblank cells of that matrix (train these 2 on GG1; train these 2 on EG; train these on GG2; and the same for the rest of them). Any data mining we can do with the matrices, we can do (estimate) with the feature vectors (e.g., Netflix-like recommenders, prediction of blank cell values, FAUST gap-based classification and clustering, including anomaly detection). A sketch of this per-matrix training schedule follows.

[Slide figure: the full feature vector laid out as per-matrix entity segments — fDSR,D fDSR,S fDSR,R; fUT,U fUT,T; fTD,T fTD,D; fCI,C fCI,I; fE fE2; fG1..fG5; fTT,T1 fTT,T2; fUM,M — over the entities Doc, Sender, Receiver, Term, User (=Customer), Item (=movie), Author, People, Course, Gene, Experiment.]
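A hypothetical driver for that rule, reusing train_rank1 from the sketch after the first slide: every 2-entity matrix gets its own pair of entity feature segments, and each pair sees only its own matrix's nonblank sse. The matrix names are the slide's; the arrays are invented toys.

    import numpy as np

    # toy stand-ins for a few of the slide's relationship matrixes (0 = blank)
    matrices = {
        'TD':  np.array([[1., 0., 3.], [4., 5., 0.]]),
        'UT':  np.array([[3., 5., 4.], [1., 2., 1.]]),
        'GG1': np.array([[0., 1.], [1., 0.]]),
    }

    features = {}
    for name, M in matrices.items():
        # train_rank1 (defined in the earlier sketch) minimizes sse over
        # this matrix's nonblanks only, as the slide requires
        features[name] = train_rank1(M)

Recommenders, blank-cell prediction and FAUST gap hunting then read off features[name] instead of the matrix itself.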

An n-dim vector space: RC(C1,...,Cn) is a matrix or TwoEntityRelationship (with row entity instances R1...RN and column entity instances C1...Cn). ARC will denote the pSVD approximation of RC. An (N+n)-vector f = (fR, fC) defines the prediction pi,j = fRi fCj and the error ei,j = pi,j − RCi,j; then ARCf,i,j ≡ fRi fCj and ARCf,row_i = fRi fC = fRi(fC1,...,fCn) = (fRi fC1, ..., fRi fCn). Use sse gradient descent to train f.

[Slide figure: the RC matrix with its fR and fC vectors, and a two-feature version with f1R, f2R, f1C, f2C.]

Once f is trained, and if d is a unit n-vector, the SPTS ARC∘dᵗ is

( (fR1 fC)∘dᵗ )   ( Σ k=1..n fR1 fCk dk )   ( fR1 Σ k=1..n fCk dk )   ( fR1 (fC∘dᵗ) )
( (fR2 fC)∘dᵗ ) = ( Σ k=1..n fR2 fCk dk ) = ( fR2 Σ k=1..n fCk dk ) = ( fR2 (fC∘dᵗ) )
(      :      )   (          :          )   (          :           )  (      :      )
( (fRN fC)∘dᵗ )   ( Σ k=1..n fRN fCk dk )   ( fRN Σ k=1..n fCk dk )   ( fRN (fC∘dᵗ) )

so: compute fC∘dᵗ = Σ k=1..n fCk dk, form a constant SPTS with it, and multiply that SPTS by the SPTS fR.

Any datamining that can be done on RC can be done using this pSVD approximation of RC, ARC, e.g., FAUST Oblique (because ARC∘dᵗ should show us the large gaps quite faithfully).

Given any K(N+n) feature matrix F = [FR FC], with FRi = (f1Ri,...,fKRi) and FCj = (f1Cj,...,fKCj), the prediction is pi,j = FRi∘FCj = Σ k=1..K fkRi fkCj. Once F is trained, and if d is a unit n-vector, the SPTS ARC∘dᵗ is

( (FR1∘FC)∘dᵗ )   ( Σ k=1..n (f1R1 f1Ck +...+ fKR1 fKCk) dk )   ( FR1∘(FC∘dᵗ) )
( (FR2∘FC)∘dᵗ ) = ( Σ k=1..n (f1R2 f1Ck +...+ fKR2 fKCk) dk ) = ( FR2∘(FC∘dᵗ) )
(      :      )   (                    :                    )   (      :      )
( (FRN∘FC)∘dᵗ )   ( Σ k=1..n (f1RN f1Ck +...+ fKRN fKCk) dk )   ( FRN∘(FC∘dᵗ) )

Keeping in mind that we have decided (tentatively) to approach all matrices as rotatable tables, this is then a universal method of approximation. The big question is: how good is the approximation for data mining? It is known to be good for Netflix-type recommender matrices, but what about others?
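The algebraic collapse the slide derives — ARC∘dᵗ = FR∘(FC∘dᵗ), so the N×n matrix never has to be materialized — is easy to sanity-check. A minimal numpy sketch, with plain arrays standing in for SPTSs and pTrees:

    import numpy as np

    N, n, K = 1000, 500, 2
    rng = np.random.default_rng(0)
    FR = rng.normal(size=(N, K))        # row-entity feature matrix
    FC = rng.normal(size=(n, K))        # column-entity feature matrix
    d = np.ones(n) / np.sqrt(n)         # a unit n-vector

    proj_fast = FR @ (FC.T @ d)         # one K-vector, then one N-vector
    proj_slow = (FR @ FC.T) @ d         # materializes ARC first
    assert np.allclose(proj_fast, proj_slow)
    # FAUST Oblique would now look for large gaps in sorted proj_fast

The fast form costs O((N+n)K) per direction d instead of O(Nn), which is the point of replacing the massive matrices with small feature matrices.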

Of course, if we take the previous data (all nonblanks = 1) and we only count errors in those nonblanks, then the pure-1 f gives error = 0. But if it is an image (a fax-type 0/1 image), then there are no blanks, and the zero positions must be assessed error too. So we change the data.

[Slide figure: a 15-row x 11-column 0/1 image (nonblank cells listed by row) with its trained fR and fC feature vectors; one line-search step at t = .13 gives sse = .2815, with fC ≈ (−.2, .07, .21, −.1, .94, 0.000, 0.262, 0.238, 0.102, 0.081, ...).]
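A tiny numpy check of the point above (an assumption, not the deck's pTree code): with all nonblanks equal to 1 and blanks ignored, the all-ones f is already perfect, but treating the same 0/1 grid as an image, where the 0s count too, the all-ones f is heavily penalized.

    import numpy as np

    img = np.zeros((4, 4))
    img[1, 1:3] = 1; img[2, 2] = 1              # fax-type 0/1 image, three 1s
    nb = img == 1                               # the "nonblank" cells

    fR, fC = np.ones(4), np.ones(4)             # the pure-1 feature vector
    pred = np.outer(fR, fC)

    sse_nonblank = (((img - pred) * nb) ** 2).sum()  # 0.0: perfect on the 1s
    sse_image = ((img - pred) ** 2).sum()            # 13.0: every 0 is an error
    print(sse_nonblank, sse_image)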