Slide 1
Mining Communications Data: prediction and anomaly detection on emails, tweets, phone records and texts

DSR is binary (DSR(d,s,r)=1 means document d was sent by Sender s to Receiver r). Or, Sender can be an attribute of the document table, Doc(Type, Name, Size, HashTag, Sender).

The pSVD trick is to replace these massive relationship matrices with small feature matrices. Using just one feature, replace each matrix with feature vectors: replace TD with fT and fD; replace UT with fU and fT; replace DSR with fD, fS and fR.

[Figure: the TD (Term-Document), UT (User-Term) and DSR (Document x Sender x Receiver) matrices, each bordered by the feature vectors that replace it; a second panel shows the feature matrices with 2 features.]

Use GradientDescent+LineSearch to minimize the sum of squared errors, sse, where sse is the sum over all nonblanks in TD, UT and DSR. Should we train the User feature segments separately (train fU with UT only, and train fS and fR with DSR only), or train the User segment with both UT and DSR and then let fS = fR = fU? With separate segments,
  f = (fD, fT, fU, fS, fR)   -- this will be called 3D f.
With the User feature segment trained just once,
  f = (fD, fT, fU=fS=fR)     -- this will be called 3DTU f.
(A sketch of the training loop follows below.)

We do the pTrees conversions and train f in the CLOUD; then download the resulting f to users' personal devices for predictions and anomaly detections. The same setup should work for phone-record Documents, tweet Documents (in the US Library of Congress), text Documents, etc.
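Here is a minimal sketch of the GradientDescent+LineSearch training described above, in Python with toy data (the matrix sizes, initialization, and backtracking rule are assumptions, not from the slides); it trains a one-feature factorization of TD over its nonblanks only:

```python
import numpy as np

# Minimize sse = sum over nonblank (t,d) of (fT[t]*fD[d] - TD[t,d])^2
# by gradient descent with a simple backtracking line search.

rng = np.random.default_rng(0)
T, D = 5, 6                                  # toy sizes, not from the slides
TD = rng.integers(0, 4, size=(T, D)).astype(float)
nonblank = TD > 0                            # blanks are 0 here (assumption)

fT = rng.normal(0.1, 0.01, T)
fD = rng.normal(0.1, 0.01, D)

def sse(fT, fD):
    E = (np.outer(fT, fD) - TD)[nonblank]
    return float(E @ E)

for rnd in range(100):
    E = np.where(nonblank, np.outer(fT, fD) - TD, 0.0)
    gT = 2 * E @ fD                          # d(sse)/d(fT)
    gD = 2 * E.T @ fT                        # d(sse)/d(fD)
    step, cur = 1.0, sse(fT, fD)
    while sse(fT - step*gT, fD - step*gD) > cur and step > 1e-12:
        step /= 2                            # backtracking line search
    fT, fD = fT - step*gT, fD - step*gD

print(round(sse(fT, fD), 4))
```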
Slide 2
pSVD for Communication Analytics: f = (fD^TD, fT^TD, fT^UT, fU^UT, fD^DSR, fS^DSR, fR^DSR)

Train f as follows: train with the 2-D matrix TD, train with the 2-D matrix UT, and train over the 3-D matrix DSR. Writing td = f_t f_d, ut = f_u f_t and dsr = f_d f_s f_r for the predicted cell values:

  sse_TD  = Σ_{nbTD} (td − TD_td)²
  sse_UT  = Σ_{nbUT} (ut − UT_ut)²
  sse_DSR = Σ_{nbDSR} (dsr − DSR_dsr)²

and the gradient components are:

  ∂sse/∂f_d = 2 Σ_{nbTD} (td − TD_td) f_t        ∂sse/∂f_t = 2 Σ_{nbTD} (td − TD_td) f_d
  ∂sse/∂f_u = 2 Σ_{nbUT} (ut − UT_ut) f_t        ∂sse/∂f_t = 2 Σ_{nbUT} (ut − UT_ut) f_u
  ∂sse/∂f_d = 2 Σ_{nbDSR} (dsr − DSR_dsr) f_s f_r,  and symmetrically ∂sse/∂f_s = 2 Σ_{nbDSR} (dsr − DSR_dsr) f_d f_r,  ∂sse/∂f_r = 2 Σ_{nbDSR} (dsr − DSR_dsr) f_d f_s

pSVD classification predicts blank cell values. (A sketch of the 3-D gradient step follows below.)

[Figure: the TD, UT and DSR matrices bordered by the feature vectors fD^TD, fT^TD, fT^UT, fU^UT, fD^DSR, fS^DSR, fR^DSR.]

pSVD FAUST Cluster: use pSVD to speed up FAUST clustering by looking for gaps in the approximation of TD rather than in TD itself (i.e., using the SVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR and RDS. E.g., on the T(d1,...,dn) table, the t-th row is pSVD-estimated as (f_t f_d1, ..., f_t f_dn), and the dot product v∘t is pSVD-estimated as Σ_{k=1..n} v_k f_t f_dk. So we analyze gaps in this column of values taken over all rows t.

pSVD FAUST Classification: use pSVD to speed up FAUST classification by finding optimal cutpoints in the approximation of TD rather than in TD itself (again using the SVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR and RDS.
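Below is a minimal sketch of one gradient-descent round on the 3-D DSR tensor, matching the ∂sse_DSR formulas above (the toy sizes, initialization, and fixed learning rate are assumptions; the slides use line search):

```python
import numpy as np

# Predict dsr = fD[d]*fS[s]*fR[r]; train on nonblank cells only.

rng = np.random.default_rng(1)
Dn, Sn, Rn = 4, 3, 3                         # toy entity counts
DSR = (rng.random((Dn, Sn, Rn)) < 0.3).astype(float)  # binary: 1 = d sent by s to r
nb = DSR > 0                                 # nonblanks are the 1-cells (assumption)

fD = rng.normal(0.1, 0.01, Dn)
fS = rng.normal(0.1, 0.01, Sn)
fR = rng.normal(0.1, 0.01, Rn)

P = np.einsum('d,s,r->dsr', fD, fS, fR)      # predictions fD[d]*fS[s]*fR[r]
E = np.where(nb, P - DSR, 0.0)               # errors on nonblank cells only

gD = 2 * np.einsum('dsr,s,r->d', E, fS, fR)  # dsse/dfD
gS = 2 * np.einsum('dsr,d,r->s', E, fD, fR)  # dsse/dfS
gR = 2 * np.einsum('dsr,d,s->r', E, fD, fS)  # dsse/dfR

lr = 0.05                                    # fixed step here, for brevity
fD -= lr * gD; fS -= lr * gS; fR -= lr * gR
```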
Slide 3
fT_row_i = fR_i fC = (fR_i fC_1, ..., fR_i fC_n);   fT_col_j = fR^tr fC_j = (fR_1 fC_j, ..., fR_N fC_j)

A real-valued vector space T(C1..Cn) is a 2-entity (R = row entity, C = column entity) labeled relationship over rows R1..RN and columns C1..Cn. Let fT_i,j = fR_i fC_j be the approximation to T, where f = (fR, fC) is an F×(N+n) matrix trained to minimize sse = Σ_{nonblank T_i,j} (fT_i,j − T_i,j)². Assuming one feature (i.e., F = 1):

  fT_row_i = fR_i fC = (fR_i fC_1, ..., fR_i fC_n)
  fT_col_j = fR^tr fC_j = (fR_1 fC_j, ..., fR_N fC_j)

One forms each SPTS by multiplying a SPTS by a number (Md's algorithm), so we only need the two feature SPTSs to get the entire PTreeSet(fT), which approximates PTreeSet(T). A 2-entity matrix can be viewed as a vector space in two ways.

E.g., for the Document entity: we meld the Document table with the DSR matrix and the DT matrix to form an ultrawide Universal Document Table, UD(Name, Time, Sender, Length, Term1,...,TermN, Receiver1,...,Receivern), where N is the number of terms and n ≈ 1,000,000,000. We train two feature vectors to approximate UD: fD and fC, where fC = (fName, fTime, fSender, fLength, fT1,...,fTN, fR1,...,fRn). (We have found it best to train with a minimum of matrices, which means there will be a distinct fD vector for each matrix.)

How many bitslices are in the PTreeSet for UD? Assuming an average bitwidth of 8 for its columns, that would be roughly 8,000,080,024 bitslices. That may be too many to be useful (e.g., for download onto an iPhone). Therefore we can approximate PTreeSetUD with fUD as above. Whenever we need a Scalar PTreeSet representing a column Ck of UD (from PTreeSetUD), we can download that fCk value plus fD and multiply the SPTS fD by the constant fCk to get a "good" approximation to the actual SPTS needed.

We note that the concept of the join (equijoin), which is so central to the relational model, is not necessary when we use the rolodex model and focus on entities (each entity, as a join attribute, is pre-joined).
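As a concrete illustration of the last point, here is a minimal sketch (with made-up numbers) of rebuilding the SPTS for one column Ck of UD from just the scalar fCk and the document feature SPTS fD:

```python
import numpy as np

# Column Ck of the approximation is the constant fC[k] times the SPTS fD.

fD = np.array([0.9, 1.4, 0.2, 1.1])   # one value per document (toy)
fC_k = 2.5                             # trained feature value for column Ck (toy)

approx_column_Ck = fC_k * fD           # SPTS approximating column Ck of UD
print(approx_column_Ck)                # [2.25 3.5  0.5  2.75]
```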
Slide 4
A vector space is closed under addition (adding one vector component-wise to another) and under scalar multiplication (multiplying a vector by a real number, producing another vector). We also need component-wise multiplication of two vectors (the first half of the dot product), but that is not a required vector-space operation. Md and Arjun, do you have code for these?

Some thoughts on scalar multiplication: it's just shifts and additions. E.g., take v = (7,1,6,6,1,2,2,4)^tr and scalar-multiply by 3 = (011)_2. The leftmost 1-bit in 3 shifts each bitslice one position to the left, and those get added to the unshifted bitslices (due to the units 1-bit). The resulting bitslices are:

  r3  r2     r1     r0
  v2  v1     v0         (due to the 1×2^1 in 3)
      v2     v1     v0  (due to the 1×2^0 in 3)
  -----------------------
  v2  v2+v1  v1+v0  v0

Note: vi + vj = (vi XOR vj) with carry (vi AND vj).
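Here is a minimal sketch of this bitslice (pTree-style) scalar multiplication in Python, using the slide's example vector; the slice encoding (Python ints as bitmaps, bit i of slice vk holding bit k of v[i]) is an illustrative assumption, not the actual pTree format:

```python
from itertools import zip_longest

v = [7, 1, 6, 6, 1, 2, 2, 4]                 # the example vector from the slide

def to_slices(vals, width):
    # slice k packs the k-th bit of every value into one integer bitmap
    return [sum(((x >> k) & 1) << i for i, x in enumerate(vals))
            for k in range(width)]

def add_slices(a, b):
    # slicewise full adder: sum = XOR, carry = AND (as noted on the slide)
    out, carry = [], 0
    for x, y in zip_longest(a, b, fillvalue=0):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    if carry:
        out.append(carry)
    return out

slices = to_slices(v, 3)                     # bitslices v0, v1, v2
shifted = [0] + slices                       # the 1 in 3's 2^1 place: shift left
product = add_slices(shifted, slices)        # plus the unshifted copy (units 1)

# decode back to integers to verify: 3*v = [21, 3, 18, 18, 3, 6, 6, 12]
decoded = [sum(((s >> i) & 1) << k for k, s in enumerate(product))
           for i in range(len(v))]
print(decoded)
```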
Slide 5
Recalling the massive interconnection of relationships between entities: any analysis we can do on this structure, we can do after estimating each matrix using pSVD-trained feature vectors for the entities. On the next slide we display the pSVD1 (one-feature) replacement by feature vectors which approximate the non-blank cell values and predict the blanks.

[Figure: the rolodex of relationship cards — DSR (sender/receiver), UT, Customer x Item ("customer rates item as 5"), People x Author, customer-rates-movie, Course Enrollments, term-doc, author-doc, Gene x Gene (ppi), doc-doc, term-term ("share stem?"), Exp x PI and Exp x Gene cards.]
Slide 6
Feature vectors: fE, fE2, fDSR,S, fDSR,D, fDSR,R, fUT,T, fUT,U, fCI,C, fCI,I, fTD,T, fTD,D, fG1, fG2, fG3, fG4, fG5, ...

On this slide we display the pSVD1 (one-feature) replacement by feature vectors which approximate the non-blank cell values and predict the blanks. Train the feature vectors through gradient descent of sse, but require that each set of matrix feature vectors be trained only on the sse over the nonblank cells of that matrix (train one pair on GG1, another pair on EG, another on GG2, and the same for the rest of them; a sketch follows below).

[Figure: the rolodex of matrices — DSR (Doc x Sender x Receiver), UT, CI (Customer x Item), TD, AD (Author x Doc), Enroll, DD, GG1, GG2, ExpG, ExpPI, TermTerm, and UserMovie ratings — each bordered by its one-feature vectors (fDSR,D, fDSR,S, fDSR,R, fUT,U, fUT,T, fCI,C, fCI,I, fTD,T, fTD,D, fD1, fD2, fE1, fE2, fE,S, fE,C, fUM,M, fTT,T1, fTT,T2, fG1 ... fG5).]

Any data mining we can do with the matrices, we can do (estimate) with the feature vectors: e.g., Netflix-like recommenders, prediction of blank cell values, and FAUST gap-based classification and clustering, including anomaly detection.
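Below is a minimal sketch (toy data) of that per-matrix training rule: each matrix contributes to the gradient only through its own nonblank cells. Sharing one gene vector fG across GG1 and EG mirrors the 3DTU option from slide 1 and is an assumption here, not the slide's prescription:

```python
import numpy as np

rng = np.random.default_rng(2)
nG, nE = 5, 3
GG1 = np.where(rng.random((nG, nG)) < 0.3, 1.0, 0.0)   # gene-gene card (toy)
EG  = np.where(rng.random((nE, nG)) < 0.3, 1.0, 0.0)   # experiment-gene card (toy)

fG = rng.normal(0.1, 0.01, nG)     # shared gene feature vector (assumption)
fE = rng.normal(0.1, 0.01, nE)     # experiment feature vector

for rnd in range(50):
    E1 = np.where(GG1 > 0, np.outer(fG, fG) - GG1, 0.0)  # GG1 nonblanks only
    E2 = np.where(EG  > 0, np.outer(fE, fG) - EG,  0.0)  # EG nonblanks only
    gG = 2 * (E1 + E1.T) @ fG + 2 * E2.T @ fE             # fG sees both cards
    gE = 2 * E2 @ fG                                      # fE sees only EG
    fG -= 0.05 * gG
    fE -= 0.05 * gE
```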
Slide 7
An n-dimensional vector space RC(C1,...,Cn) is a matrix, or TwoEntityRelationship, with row-entity instances R1,...,RN and column-entity instances C1,...,Cn. ARC will denote the pSVD approximation of RC. An (N+n)-vector f = (fR, fC) defines the prediction p_i,j = fR_i fC_j and the error e_i,j = p_i,j − RC_i,j; then ARC_f,i,j ≡ fR_i fC_j and ARC_f,row_i = fR_i fC = fR_i (fC_1,...,fC_n) = (fR_i fC_1, ..., fR_i fC_n). Use sse gradient descent to train f.

[Figure: the RC matrix (rows R1..RN, columns C1..Cn) bordered by the one-feature vectors fR and fC, the two-feature vectors (f1R, f2R) and (f1C, f2C), and the unit vector d.]

Once f is trained, and if d is a unit n-vector, the SPTS ARC_f∘d^tr has i-th component

  Σ_{k=1..n} fR_i fC_k d_k = fR_i Σ_{k=1..n} fC_k d_k = (fR_i fC)∘d^tr = fR_i (fC∘d^tr).

So: compute the scalar fC∘d^tr = Σ_{k=1..n} fC_k d_k, form a constant SPTS with it, and multiply that SPTS by the SPTS fR. Any data mining that can be done on RC can be done using this pSVD approximation of RC, ARC, e.g., FAUST Oblique (because ARC∘d^tr should show us the large gaps quite faithfully).

Given any K×(N+n) feature matrix F = [FR FC], with FR_i = (f1R_i,...,fKR_i) and FC_j = (f1C_j,...,fKC_j), the prediction is p_i,j = FR_i∘FC_j = Σ_{k=1..K} fkR_i fkC_j. Once F is trained, and if d is a unit n-vector, the SPTS ARC∘d^tr has i-th component

  Σ_{k=1..n} (f1R_i f1C_k + ... + fKR_i fKC_k) d_k = FR_i∘(FC∘d^tr).

Keeping in mind that we have decided (tentatively) to approach all matrices as rotatable tables, this is then a universal method of approximation. The big question is: how good is the approximation for data mining? It is known to be good for Netflix-type recommender matrices, but what about others? (A sketch of the fast projection follows below.)
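Here is a minimal sketch (toy shapes, made-up data) of the identity above: the projection of the K-feature approximation onto d can be computed as FR∘(FC∘d^tr) without ever materializing the N×n matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, K = 1000, 800, 2
FR = rng.normal(size=(N, K))         # row-entity features
FC = rng.normal(size=(n, K))         # column-entity features
d = rng.normal(size=n)
d /= np.linalg.norm(d)               # unit n-vector, as on the slide

slow = (FR @ FC.T) @ d               # materializes ARC: O(N*n) work
fast = FR @ (FC.T @ d)               # never forms ARC:  O((N+n)*K) work

print(np.allclose(slow, fast))       # True -- FAUST then looks for gaps in this SPTS
```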
Slide 8
Of course, if we take the previous data (all nonblanks = 1) and we only count errors on those nonblanks, then f = pure1 has sse = 0. But if it is a fax-type image (of 0/1s), then there are no blanks: the 0 positions must be assessed for error too. So we change the data.

Next, consider a fax-type image dataset (blanks = zeros; sse summed over all cells). (A sketch of the two sse conventions follows below.)

[Figure: a 15×15 0/1 image (rows and columns labeled 1-9, a-f); the sse-versus-round trace of the gradient descent, reaching minimum sse = 10.154; and the trained feature vector t = (tr1,...,tr9, tra,...,trf, tc1,...,tcf).]

Without any gradient-descent rounds we can knock down column 1 with T = t + (tr1...tcf), but the sse cannot go below its minimum of 10.154.
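Below is a minimal sketch (toy 3×3 image) of the point above: with errors counted only on nonblanks, the all-ones ("pure1") feature vectors give sse = 0, but once the 0 cells of a fax-type image are counted too, they do not:

```python
import numpy as np

img = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 0]], dtype=float)     # fax-type image: the 0s are real data

fR = np.ones(3)                              # pure1 feature vectors
fC = np.ones(3)
P = np.outer(fR, fC)                         # every predicted cell is 1

sse_nonblanks = np.sum(((P - img)[img > 0])**2)   # 0.0: perfect on the 1-cells
sse_all_cells = np.sum((P - img)**2)              # 4.0: each 0 cell costs 1
print(sse_nonblanks, sse_all_cells)
```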