Collaborative filtering (AKA customer preference prediction, AKA Business Intelligence) is critical for on-line retailers (Netflix, Amazon, Yahoo, ...). It is just classical classification: based on a rating-history training set, predict how customer c would rate item i. Use relationships to find "neighbors" with which to predict rating(c=3, i=5).

Find all customers whose rating history is similar to that of c=3. I.e., for each rating k=1,2,3,4,5, find all other customers who give rating k to the movies that c=3 gives rating k to: CustomerSet_k = &_{i in k_3} k_i, where k_i is the customer pTree for item i from the relationship k(C,I) and k_3 is the set of items c=3 rates k. Then intersect those k-CustomerSets, &_k CustomerSet_k, and let the resulting customers vote to predict rating(c=3, i=5).

[Figure: TrainingSet(C, I, Rating) drawn three ways: the Rolodex relationship model, the Binary relationship model (one C x I matrix per rating value 1..5), and the Multihop relationship model.]
The same task in the Multihop model: use MYRRH to predict rating(c=3, i=5). Approach 2: judging that rating=3 means "no opinion", focus the counts on the middle (customer) axis.

[Figure: the rating relationships redrawn in the Rolodex, Binary, and Multihop models, with the hops passing through the customer axis.]
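A minimal sketch of this neighbor-vote scheme: numpy boolean matrices stand in for the per-rating pTrees k(C,I), and the data, sizes, and mean-vote rule are illustrative assumptions, not the slides' actual dataset.

```python
# Sketch of "intersect per-rating neighbor sets, then vote". Bit vectors
# stand in for pTrees; the 5 boolean C x I matrices play the role of the
# Binary relationship model's k(C,I) relations.
import numpy as np

n_customers, n_items = 6, 5
rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(n_customers, n_items))  # 0 = unrated

c, i = 3, 4   # predict rating(c=3, i=5); 0-based item index 4

neighbors = np.ones(n_customers, dtype=bool)
for k in range(1, 6):
    k_mat = ratings == k                       # k(C,I) as a boolean matrix
    for item in np.nonzero(k_mat[c])[0]:       # items that c rates k
        neighbors &= k_mat[:, item]            # AND of customer pTrees k_i
neighbors[c] = False                           # exclude c itself

votes = ratings[neighbors, i]
votes = votes[votes > 0]                       # only neighbors who rated i
prediction = votes.mean() if votes.size else None
print(prediction)
```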
50% Satlog-Landsat, stride=64; classes: redsoil, cotton, greysoil, dampgreysoil, stubble, verydampgreysoil.

[Figure: stride=64 per-class panels of the band pairs (R-G, R-ir1, R-ir2, G-ir1, G-ir2, ir1-ir2) and of each band (R, G, ir1, ir2) vs. class; content lost in extraction.]

A level-0 pVector is a bit string with 1 bit per record. A level-1 pVector is a bit string with 1 bit per record stride, giving the truth of a predicate applied to that stride. An n-level pTree consists of level-k pVectors (k = 0, ..., n-1), all with the same predicate, such that each level-k stride is contained within one level-(k-1) stride.

For 50% Satlog-Landsat with stride=320 we get the table below. Note that for stride=320 the means are way off, so it will probably produce very inaccurate classification.

[Table residue: per-class means and stds of R, G, ir1, ir2 over the 320-bit strides (start, end, cls columns); numeric values lost in extraction.]
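To make the level definitions concrete, here is a minimal sketch, assuming (as the "50%" in these slides suggests) that the level-1 predicate is "at least 50% of the stride's level-0 bits are 1"; the data, threshold, and stride handling are illustrative.

```python
# Sketch of level-0 vs. level-1 pVectors. The 50% predicate is an assumed
# reading of the slides' "50%"; data and stride size are illustrative.
import numpy as np

def level1_pvector(level0: np.ndarray, stride: int, frac: float = 0.5) -> np.ndarray:
    """One bit per stride: true iff >= frac of the stride's level-0 bits are 1."""
    n = len(level0) - len(level0) % stride      # drop a ragged tail for simplicity
    strides = level0[:n].reshape(-1, stride)
    return strides.mean(axis=1) >= frac

values = np.random.default_rng(1).integers(0, 256, size=1280)
level0 = values > 100                           # level-0: 1 bit per record
print(level1_pvector(level0, stride=64))        # level-1: 1 bit per 64-record stride
```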
[Figure: 50% Satlog-Landsat, stride=64; per-class panels for the R, G, ir1, and ir2 bands; numeric content lost in extraction.]
APPENDIX. FAUST Oblique formula: P_{X o d < a}, X any set of vectors (e.g., a training class).

To separate the r's from the v's using the means' midpoint as the cut-point, calculate a as follows. Let D = m_r->m_v = m_v - m_r and d = D/|D|. Viewing m_r, m_v as vectors (e.g., m_r = the vector from the origin to the point m_r),

a = ( m_r + (m_v - m_r)/2 ) o d = ( (m_r + m_v)/2 ) o d.

What if d points away from the intersection of the cut-hyperplane (a cut-line in this 2-D case) and the d-line, as it does for class v, where d = (m_r - m_v)/|m_r - m_v|? Then a is the negative of the distance shown (the angle is obtuse, so its cosine is negative). But each v o d is then a larger negative number than a = ((m_r + m_v)/2) o d, so we still want v o d < (1/2)(m_r + m_v) o d.

[Figure: 2-D scatter of the r and v training points with m_r, m_v, the d-line, and the cut-line.]
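A minimal sketch of the midpoint-of-means cut; plain numpy arrays stand in for the pTree machinery, and the toy classes are illustrative:

```python
# Sketch of the FAUST Oblique midpoint-of-means cut for two classes.
import numpy as np

def midpoint_cut(R: np.ndarray, V: np.ndarray):
    """Return (d, a): unit direction m_r -> m_v and cut a = ((m_r+m_v)/2) o d."""
    m_r, m_v = R.mean(axis=0), V.mean(axis=0)
    D = m_v - m_r
    d = D / np.linalg.norm(D)
    a = (m_r + m_v) / 2 @ d
    return d, a

R = np.array([[1.0, 2.0], [2.0, 2.5], [1.5, 1.0]])   # class r samples (toy)
V = np.array([[5.0, 6.0], [6.5, 5.0], [5.5, 7.0]])   # class v samples (toy)
d, a = midpoint_cut(R, V)
x = np.array([2.0, 2.0])
print("classify as r" if x @ d < a else "classify as v")
```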
FAUST Oblique, vector of stds: P_{X o d < a} = P_{sum_i d_i X_i < a}; D = m_v - m_r, d = D/|D|.

To separate r from v using the vector-of-stds cut-point, calculate a as follows. Viewing m_r, m_v as vectors,

a = ( (m_r*std_v + m_v*std_r) / (std_r + std_v) ) o d,

i.e., blend the means component-wise, each weighted by the other class's std (so the cut sits closer to the mean of the tighter class), then project onto d.

What are the stds? Approach 1: for each coordinate (dimension), calculate the std of that coordinate's values, and use the vector of those stds. Let's remind ourselves that Md's formula does not require looping through the X-values; it requires only one AND program across the pTrees, since P_{X o d < a} = P_{sum_i d_i X_i < a}.

[Figure: the r/v scatter with the std-weighted cut-line.]
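A sketch of approach 1 under the reading above; the direction of the std weighting (each mean weighted by the other class's std, matching the projection formula on the next slide) is an assumption recovered from the garbled fraction.

```python
# Sketch of the vector-of-stds cut (approach 1): blend the class means,
# coordinate by coordinate, weighted by the opposite class's per-coordinate
# stds, then project the blended point onto d. The weighting direction is
# assumed to match a = (pm_r*pstd_v + pm_v*pstd_r)/(pstd_r+pstd_v).
import numpy as np

def std_vector_cut(R: np.ndarray, V: np.ndarray):
    m_r, m_v = R.mean(axis=0), V.mean(axis=0)
    s_r, s_v = R.std(axis=0), V.std(axis=0)            # per-coordinate stds
    d = (m_v - m_r) / np.linalg.norm(m_v - m_r)
    cut_point = (m_r * s_v + m_v * s_r) / (s_r + s_v)  # component-wise blend
    return d, cut_point @ d

R = np.array([[1.0, 2.0], [2.0, 2.5], [1.5, 1.0]])
V = np.array([[5.0, 6.0], [6.5, 5.0], [5.5, 7.0]])
d, a = std_vector_cut(R, V)
print(a)
```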
FAUST Oblique, approach 2: stds of the projections. P_{X o d < a} = P_{sum_i d_i X_i < a}; D = m_v - m_r, d = D/|D|.

By pm_r we mean the distance m_r o d, which is also mean{ r o d | r in R }; by pstd_r, std{ r o d | r in R }. To separate r from v using the stds of the projections, calculate a as follows:

a = pm_r + (pm_v - pm_r) * pstd_r/(pstd_r + pstd_v) = (pm_r*pstd_v + pm_v*pstd_r) / (pstd_r + pstd_v).

Next: double pstd_r:

a = pm_r + (pm_v - pm_r) * 2pstd_r/(2pstd_r + pstd_v) = (pm_r*pstd_v + pm_v*2pstd_r) / (2pstd_r + pstd_v).

In this case the predicted classes will overlap (i.e., a given sample point may be assigned multiple classes), so we will have to order the class predictions.

[Figure: the r/v scatter projected onto the d-line, showing pm_r, pm_v and the projection stds.]
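A sketch of approach 2, with a multiplier parameter so that mult=1 gives the plain formula and mult=2 the doubling variant used on the next slide; arrays again stand in for pTrees:

```python
# Sketch of the projection-stds cut (approach 2) with a multiplier on pstd_r.
import numpy as np

def projection_cut(R: np.ndarray, V: np.ndarray, mult: float = 1.0):
    d = V.mean(axis=0) - R.mean(axis=0)
    d /= np.linalg.norm(d)
    pr, pv = R @ d, V @ d                  # projections of each class onto d
    pm_r, pm_v = pr.mean(), pv.mean()      # pm_r = mean{r o d | r in R}
    ps_r, ps_v = pr.std(), pv.std()        # pstd_r = std{r o d | r in R}
    a = pm_r + (pm_v - pm_r) * mult * ps_r / (mult * ps_r + ps_v)
    return d, a

R = np.array([[1.0, 2.0], [2.0, 2.5], [1.5, 1.0]])
V = np.array([[5.0, 6.0], [6.5, 5.0], [5.5, 7.0]])
for mult in (1.0, 2.0):                    # plain formula, then "doubling"
    print(mult, projection_cut(R, V, mult)[1])
```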
FAUST Satlog evaluation (classes 1, 2, 3, 4, 5, 7). Methods compared, each scored by per-class True Positives and False Positives (the numeric tables, and the per-band R, G, ir1, ir2 class means/stds, did not survive extraction):

- NonOblique, level-0.
- NonOblique, level-1, 50%.
- Oblique, level-0, using the midpoint of means.
- Oblique, level-0, using means and stds of projections (without class elimination).
- Oblique, level-0, means and stds of projections, with class elimination in order (note that "none" occurs).
- Oblique, level-0, means and stds of projections, doubling pstd_r: a = pm_r + (pm_v - pm_r) * 2pstd_r/(2pstd_r + pstd_v) = (pm_r*pstd_v + pm_v*2pstd_r) / (2pstd_r + pstd_v).
- Same, classifying and eliminating in the order 2,3,4,5,7,1.
- Same, eliminating in the order 3,4,7,5,1,2, suggested by the per-band ratios above = (std + std_up)/gap_up and below = (std + std_dn)/gap_dn (averaged over red, green, ir1, ir2).

So the number of FPs is drastically reduced and the TPs somewhat reduced. Is that better? If we parameterize the 2 (the doubling) and adjust it to maximize TPs and minimize FPs, what is the optimal multiplier value? Next: low-to-high std elimination ordering.

[Table residue: per-class TP/FP counts for each method and for the weighting variants s1/(2s1+s2), s1/(s1+s2), 2s1/(2s1+s2), with and without elimination ordering, plus level-1 50%; values lost in extraction.]
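The slide asks for the optimal multiplier; here is a minimal sketch of that sweep, reusing the projection_cut sketch above. The synthetic data and the TP - FP objective are illustrative assumptions:

```python
# Sweep the "doubling" multiplier and keep the value maximizing TP - FP.
# Toy 2-class data; projection_cut is the function sketched earlier.
import numpy as np

rng = np.random.default_rng(2)
R = rng.normal([2, 2], 0.8, size=(100, 2))      # class r
V = rng.normal([6, 6], 2.0, size=(100, 2))      # class v, wider spread

best = None
for mult in np.linspace(0.5, 4.0, 15):
    d, a = projection_cut(R, V, mult)
    tp = int(np.sum(R @ d < a))                 # r's correctly below the cut
    fp = int(np.sum(V @ d < a))                 # v's wrongly below the cut
    score = tp - fp
    if best is None or score > best[0]:
        best = (score, mult, tp, fp)
print("best multiplier:", best[1], "TP:", best[2], "FP:", best[3])
```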
Can MYRRH classify? (Pixel classification?) Try 4-hop using the attributes of IRIS(Cls, SL, SW, PL, PW), stride=10, level-1 values (PW rounded as rnd(PW/10)). The hops are R(SW,PW), S(PW,PL), T(SL,PL) and U(CLS,SL).

Is A => C confident, for A = {3,4} (SW values) and C = {se}?

conf = ct( (&_{pw in &_{sw in A} R_sw} S_pw) & (&_{sl in &_{cls in C} U_cls} T_sl) ) / ct( &_{pw in &_{sw in A} R_sw} S_pw ) = 1/2,

since the denominator pl-set is {1,2} and the numerator pl-set is {1}.

[Relation residue: the 15 stride-level rows (5 each for setosa, versicolor, virginica) of SL, SW, PL, rnd(PW/10), and the CLS axis (se, ve, vi); numeric values lost in extraction.]
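A minimal sketch of this 4-hop confidence, with Python sets standing in for pTrees; the contents of R, S, T, U below are toy assumptions chosen to reproduce the slide's pl-sets ({1,2} and {1}) and conf = 1/2, not the real stride-level IRIS relations.

```python
# 4-hop MYRRH confidence sketch over toy relations R(SW,PW), S(PW,PL),
# T(SL,PL), U(CLS,SL).
from functools import reduce

R = {3: {0, 1}, 4: {0, 2}}            # R_sw: PW-set reachable from each SW
S = {0: {1, 2}, 1: {1}, 2: {1, 2}}    # S_pw: PL-set reachable from each PW
U = {"se": {4, 5}}                    # U_cls: SL-set reachable from each class
T = {4: {1, 3}, 5: {1, 2}}            # T_sl: PL-set reachable from each SL

def conf(A, C):
    pws = reduce(set.__and__, (R[sw] for sw in A))      # & over sw in A
    left = reduce(set.__and__, (S[pw] for pw in pws))   # pl-set from the A side
    sls = reduce(set.__and__, (U[cls] for cls in C))    # & over cls in C
    right = reduce(set.__and__, (T[sl] for sl in sls))  # pl-set from the C side
    return len(left & right) / len(left)

print(conf(A={3, 4}, C={"se"}))       # -> 0.5, matching the slide's 1/2
```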
1-hop: IRIS(Cls, SL, SW, PL, PW), stride=10, level-1 values (rnd(PW/10)). The 1-hop A => C is more confident: with A = {sw=3,4} and C = {se},

ct( R_A & (&_{cls in {se}} R_cls) ) / ct( R_A ) = 1,

where R_A is the stride-level record pTree for SW in A and R_cls the record pTree for each class.

But what about just taking R_{class}? For SW that gives {3,4} => se, {2,3} => ve, {3} => vi, which is not very differentiating of class. Include the other three attributes?

SL: {4,5} => se, {5,6} => ve, {6,7} => vi
SW: {3,4} => se, {2,3} => ve, {3} => vi
PL: {1,2} => se, {3,4,5} => ve, {5,6} => vi
PW: {0} => se, {1,2} => ve, {1,2} => vi

These rules were derived from the binary relationships only. A minimal decision-tree classifier suggested by the rules (written out in the sketch below):

PW = 0?  yes => se
else: PL in {3,4} & SW = 2 & SL = 5?  yes => ve
else: 2 of 3 of { PL in {3,4,5}, SW in {2,3}, SL in {5,6} }?  yes => ve, else => vi

I was hoping for a "Look at that!" but it didn't happen ;-)
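The same tree written out as a function; the branch reading (which tests send a stride to ve vs. vi) is reconstructed from the slide's garbled layout, so treat it as an assumption.

```python
# Minimal decision tree suggested by the 1-hop rules; branch assignments
# for ve vs. vi are a reconstruction, not confirmed by the slide.
def classify(sl: int, sw: int, pl: int, pw: int) -> str:
    if pw == 0:
        return "se"
    if pl in {3, 4} and sw == 2 and sl == 5:
        return "ve"
    votes = sum([pl in {3, 4, 5}, sw in {2, 3}, sl in {5, 6}])
    return "ve" if votes >= 2 else "vi"

print(classify(sl=5, sw=3, pl=1, pw=0))   # -> se
print(classify(sl=7, sw=3, pl=5, pw=2))   # 2 of 3 hold -> ve
print(classify(sl=7, sw=4, pl=6, pw=2))   # 0 of 3 hold -> vi
```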
2-hop, stride=10, level-1 values; hops T(PL,SL) and U(SL,CLS).

conf(A => C) = ct( (OR_{pl in A} T_pl) & (&_{cls in C} U_cls) ) / ct( OR_{pl in A} T_pl ); for A = {1,2}, C = {se} this is 1.

Mine out all confident se-rules with minsup = 3/4 (U_se = sl in {4,5}). Closure: if A => {se} is non-confident and T_A = OR_{pl in A} T_pl already contains U_se (so the numerator is maxed at ct(U_se) and can only be diluted), then B => {se} is non-confident for all B containing A. So, starting with singleton A's:

ct(T_pl=1 & U_se) / ct(T_pl=1) = 2/2  yes.
ct(T_pl=2 & U_se) / ct(T_pl=2) = 1/1  yes.
ct(T_pl=3 & U_se) / ct(T_pl=3) = 0/1  no.
ct(T_pl=4 & U_se) / ct(T_pl=4) = 0/1  no.
ct(T_pl=5 & U_se) / ct(T_pl=5) = 1/2  no.
ct(T_pl=6 & U_se) / ct(T_pl=6) = 0/1  no.
etc.

A = {1,3}, {1,4}, {1,5} or {1,6} will yield non-confidence with T_A containing U_se, so all supersets will yield non-confidence. A = {2,3}, {2,4}, {2,5} or {2,6} will yield non-confidence, but the closure property does not apply. A = {1,2} will yield confidence. I conclude that this closure property is just too weak to be useful. It also appears from this example that trying to use MYRRH to do classification (at least in this way) is not productive.

[Relation residue: the stride-level IRIS rows and the T(PL,SL) and U(SL,CLS) matrices; values lost in extraction.]
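A minimal sketch of this singleton walk-through and the closure check; U_se = {4,5} follows the slide's note, while the T_pl contents are assumptions chosen to reproduce the slide's counts (2/2, 1/1, 0/1, 0/1, 1/2, 0/1):

```python
# 2-hop confidence mining sketch. The T_pl SL-sets are toy assumptions tuned
# to reproduce the slide's singleton counts, not the real IRIS relations.
from functools import reduce

U_se = {4, 5}                                          # from the slide's sl={4,5}
T = {1: {4, 5}, 2: {5}, 3: {6}, 4: {6}, 5: {5, 6}, 6: {6}}

def conf(A):
    T_A = reduce(set.__or__, (T[pl] for pl in A))      # OR_{pl in A} T_pl
    return len(T_A & U_se), len(T_A)

for pl in range(1, 7):                                 # singleton A's
    num, den = conf({pl})
    print(f"pl={pl}: {num}/{den} -> {'yes' if num / den >= 3/4 else 'no'}")

# Closure: {1,5} is non-confident and T_{1,5} contains U_se, so every
# superset of {1,5} is non-confident too.
num, den = conf({1, 5})
print({1, 5}, f"{num}/{den}", "T_A contains U_se:", (T[1] | T[5]) >= U_se)
```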
Level-2, 50%, stride=640; classes: redsoil, cotton, greysoil, dampgreysoil, stubble, verydampgreysoil.

[Figure: per-class panels for the band pairs (R-G, R-ir1, R-ir2, G-ir1, G-ir2, ir1-ir2) and for each band (R, G, ir1, ir2) vs. class; numeric content lost in extraction.]