

Presentation on theme: "Analysis of Affinities and Anomalies through pTrees". Presentation transcript:

1 [Figure: points 0, O, r1, r2, r3, v1, v2, v3, v4 plotted on dim1 x dim2.]

Algorithm-1: Look for the dimension where the clustering is best. In the figure, dimension 1 gives 3 clusters: {r1, r2, r3, O}, {v1, v2, v3, v4} and {0}. How to determine this?
1.a: Take each dimension in turn, working left to right; when d(mean, median) > 1/4 of the width, declare a cluster.
1.b: Next take those clusters one at a time to the next dimension for further sub-clustering via the same algorithm.

Walking the example along dimension 1: when the gap condition first fires, we declare {r1, r2, r3, O} a cluster and start over. Farther along we need to declare a cluster again, but which one, {0, v1} or {v1, v2}? We will always take the one on the median side of the mean, in this case {v1, v2}. And that makes {0} a cluster (actually an outlier, since it is a singleton). Continuing with {v1, v2}, we declare {v1, v2, v3, v4} a cluster. Note that we have to loop; however, rather than stepping by each single projection, the step (delta) can be the next m projections if they are close. Next we would take one of the clusters to the best dimension to sub-cluster, and so on. Doubletons can be skipped, since their mean is always the same as their median. A code sketch of step 1.a follows this slide.

Oblique version: Take a grid of oblique direction vectors, e.g., for a 3D dataset, a direction vector pointing to the center of each PTM triangle. With the projections onto those lines, do 1 or 2 above. Ordering = any sphere-surface grid: S_n ≡ {x ≡ (x_1, ..., x_n) ∈ R^n | Σ x_i^2 = 1}, in polar coordinates {p ≡ (θ_1, ..., θ_{n-1}) | 0 ≤ θ_i ≤ 179}. Use lexicographical polar coordinates? Is 180^n too many? Use, e.g., 30-degree units, giving 6^n vectors for dim = n. Attribute relevance is important.

Algorithm-2: 2.a: Take each dimension in turn, working left to right; when density > Density_Threshold, declare a cluster (density ≡ count/size). 2.b = 1.b.

Algorithm-3: Another variation: calculate the dataset mean and the vector of medians (vom). Then do 1.a or 1.b on the projections of the dataset onto the line connecting the two. Then repeat on each declared cluster, but use a projection line other than the one through the mean and vom this second time, since the mean-vom line would likely point in approximately the same direction as in the first round. Do this until no new clusters appear? Adjust, e.g., the projection lines and the stopping condition?

Algorithm-4: Project onto the line through the dataset mean and vom. Example points (dim1, dim2): (4,9), (2,8), (5,8), (4,6), (3,4), (11,10), (10,5), (9,4), (8,3), (7,2); mean = (6.3, 5.9), vom = (6, 5.5), and (11,10) is an outlier. 4.b: Repeat on any perpendicular line through the mean (mean and vom far apart implies multi-modality).
Algorithm-4.1 (4.b.1): In each cluster, find the 2 points furthest from the line? (Does that require the projection to be done one point at a time, or can we determine those 2 points in one pTree formula?)
Algorithm-4.2 (4.b.2): Use a grid of unit direction lines, {dv_i | i = 1..m}. For each, calculate the mean and vom of the projections of each cluster (except singletons), and take the direction for which the separation is maximal.

[Deck title: Analysis of Affinities and Anomalies through pTrees]
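To make step 1.a concrete, here is a minimal Python sketch (the function name, the restart bookkeeping, and the tie-break handling are my reading of the slide, not code from the deck): it scans one dimension's sorted projections and, when the mean-median gap exceeds a quarter of the current width, declares the points on the median side of the mean a cluster and restarts on the rest.

    import statistics

    def declare_clusters_1d(projs, frac=0.25):
        # Algorithm-1, step 1.a (one reading): scan sorted projections
        # left to right; when |mean - median| of the points so far
        # exceeds frac * width, keep the median side of the mean as a
        # cluster and restart on the remaining points.
        cur, clusters = [], []
        for p in sorted(projs):
            cur.append(p)
            if len(cur) < 3:              # mean == median for 1 or 2 points
                continue
            width = cur[-1] - cur[0]
            m, md = statistics.mean(cur), statistics.median(cur)
            if width and abs(m - md) > frac * width:
                left = [x for x in cur if x <= m]
                right = [x for x in cur if x > m]
                clusters.append(left if md <= m else right)   # median side
                cur = right if md <= m else left
        if cur:
            clusters.append(cur)          # whatever remains is the last cluster
        return clusters

Usage on the slide's setup would be, e.g., declare_clusters_1d([x for (x, y) in points]) for dimension 1, then recursing into each returned cluster on dimension 2 (step 1.b).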

2 [Figure: 3D example points labeled 435, 524, 504, 545, 323, 924, b43, e43, c63, 752, f72 (hex coordinate triples) on axes 1, 2, 3; mean = (8.18, 3.27, 3.73), vom = (7, 4, 3).]

1. No clusters are determined yet.
2. (9,2,4) is determined to be an outlier cluster.
3. Using the red dimension line, (7,5,2) is determined to be an outlier cluster; the maroon points are determined to be a cluster, and the purple points too.
3.a: However, if we continue to use the line connecting the (new) mean and vom of the projections onto this plane, would the same clusters be determined? Other option: use (at some judicious point) a p-Kmeans type approach. This could be done with K = 2 and a divisive top-down approach (using a GA mutation at various times to get us off a non-convergent track)?

Notes: Each round reduces the dimension by one (a lower bound on the loop). Each round we just need a good line (in the remaining hyperplane) onto which to project the cluster so far. Options: 1. pick the line through the projected mean and vom (the vom is dependent on the basis used; is there a better way?); 2. pick the line through the longest diameter (or a diameter ≥ 1/2 the previous diameter?); 3. try a direction vector, then hill-climb it in the direction that increases the diameter of the projected set.

From: Mark Silverman [mailto:msilverman@treeminer.com], April 21, 2012, 8:22 AM. Subject: RE: oblique faust.
I've been doing some tests, so far not so accurate. (I'm still validating the code; I "unhardcoded" it so I can deal with arbitrary datasets and it's possible there's a bug, but so far I think it's ok.) Something rather unique about the test data I am using is that it has four attributes, but for all of the class decisions it is really one of the attributes driving the classification decision (e.g., for classes 2-10, attribute 2 is the dominant decision; for class 11, attribute 1 is dominant; etc.). I have very wide variability in standard deviation in the test data (some very tight, some wider). Thus, I think that placing "a" on the basis of relative deviation makes a lot of sense in my case (and probably in general). My assumption is that all I need to do is to modify as follows:
Now: a[r][v] = (Mr + Mv) · d / 2
Changes to: a[r][v] = (Mr + Mv) · d · std(r) / (std(r) + std(v))
Is this correct?
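Here is a small numpy sketch comparing the cut placements in play (all names are mine): a_mom is the deck's midpoint-of-means cut, a_email is the email's proposed formula taken literally, and a_interp interpolates between the projected means by the same std ratio, which is another plausible reading of "relative deviation".

    import numpy as np

    def cut_points(m_r, m_v, std_r, std_v):
        # three candidate placements of the cut scalar 'a' along d
        D = m_v - m_r
        d = D / np.linalg.norm(D)              # unit vector from mean_r to mean_v
        pr, pv = m_r @ d, m_v @ d              # projected class means
        a_mom = (pr + pv) / 2                                  # midpoint of means
        a_email = (pr + pv) * std_r / (std_r + std_v)          # email, as written
        a_interp = pr + (pv - pr) * std_r / (std_r + std_v)    # interpolated cut
        return d, a_mom, a_email, a_interp

Note the distinction: a_email rescales the midpoint itself, while a_interp moves the cut toward the tighter class along the segment between the projected means; which one the email intends is not settled by the text.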

3 FAUST Oblique (our best classifier?)

[Figure: class R points (r) and class V points (v) separated along the d-line, dim1 x dim2, with means m_R and m_V and the cut between them; the std of the distances v∘d from the origin along the d-line is marked per class.]

P_R = P_{X∘d_R < a}: 1 pass gives the class-R pTree. D ≡ m_R − m_V, d = D/|D|.
Separate class R using the midpoint-of-means (mom) method: calculate a as (m_R + (m_V − m_R)/2) ∘ d = ((m_R + m_V)/2) ∘ d (this works also if D = m_V − m_R).
Training ≡ placing the cut-hyper-plane(s) (CHP), i.e., (n−1)-dimensional hyperplanes cutting the space in two. Note that training (finding a and d) is a one-time process; if we don't have training pTrees, we can use horizontal data for a and d (one time) and then apply the formula to test data (as pTrees). Classification is 1 horizontal program (AND/OR) across the pTrees, giving a mask pTree for each entire predicted class (all unclassified samples at a time); see the sketch after this slide.

Accuracy improvement? Consider the dispersion within classes when placing the CHP. E.g.:
1. vom method: use the vector of medians, vom_V ≡ (median{v_1 | v ∈ V}, median{v_2 | v ∈ V}, ...), to represent each class instead of the mean m_V.
2. mom_std and vom_std methods: project each class onto the d-line; then calculate the std of the projections (one horizontal formula per class, using Md's method); then use the std ratio to place the CHP (no longer at the midpoint between m_R and m_V).
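A minimal numpy sketch of the mom variant, with boolean arrays standing in for pTrees (in the real system the comparison X∘d < a is evaluated as one horizontal AND/OR program across bit-slice pTrees; this only shows the arithmetic, and the sign of D is immaterial per the slide):

    import numpy as np

    def faust_mom_train(X_r, X_v):
        # one-time training: find the unit direction d and the cut a
        m_r, m_v = X_r.mean(axis=0), X_v.mean(axis=0)
        D = m_v - m_r
        d = D / np.linalg.norm(D)
        a = (m_r + m_v) @ d / 2            # a = ((m_R + m_V)/2) . d
        return d, a

    def faust_classify(X, d, a):
        # one "horizontal program": the boolean result is the mask pTree
        return X @ d < a                   # True -> predicted class R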

4 The PTreeSet Genius for Big Data

Big Vertical Data: the PTreeSet (Dr. G. Wettstein's) is perfect for BVD (pTrees both horizontal and vertical). PTreeSets include methods for horizontal querying, vertical data mining, multi-hop query/DM, and XML.

T(A_1, ..., A_n) as a PTreeSet data structure = a bit matrix in which (typically) each numeric attribute is converted to fixed point(?) (negatives?) and bit-sliced (with a pt_pos schema), and each categorical attribute is bitmapped, or coded and then bitmapped, or numerically coded and then bit-sliced (or left as-is; i.e., should a char(25) NAME column be stored outside the PTreeSet?). With A_1..A_k numeric with bitwidths bw_1..bw_k and A_{k+1}..A_n categorical with category counts cc_{k+1}..cc_n, the PTreeSet is the bit matrix whose columns are A_{1,bw_1}, A_{1,bw_1−1}, ..., A_{1,0}, A_{2,bw_2}, ..., A_{k+1,c_1}, ..., A_{n,cc_n}, with one bit row per data row 1..N.

Methods for this data structure can provide fast horizontal row access; e.g., an FPGA could (with zero delay) convert each bit row back to the original data row. Methods already exist to provide vertical (level-0, or raw pTree) access. Any level-1 PTreeSet can be added, given any row partition (e.g., an equiwidth = 64-row intervalization) and a row predicate (e.g., ≥ 50% 1-bits); the level-1 bit matrix then has one row per interval, 1..roof(N/64). Add "level-1 only" DM methods: e.g., an FPGA converts unclassified row sets to equiwidth = 64, ≥ 50% level-1 pTrees, and then the entire batch is FAUST-classified in one horizontal program. Or level-1 pCKNN.

pDGP (pTree Darn Good Protection): permute the column order (the permutation is the key). A random pre-pad for each bit column would make it impossible to break the code by simply focusing on the first bit row. More security? Make all pTrees the same (maximum) depth, with intron-like pads randomly interspersed.

Relationships (rolodex cards) are 2 PTreeSets: AHGPeoplePTreeSet (shown) and AHGBasePairPositionPTreeSet (a rotation of the one shown). Vertical Rule Mining, Vertical Multi-hop Rule Mining, and Classification/Clustering methods apply, viewing AHG either as a People table (columns = BPPs) or as a BPP table (columns = People). MRM and classification done in combination? Any table is a relationship between row and column entities (heterogeneous entities); e.g., an image is a [reflectance-labelled] relationship between a pixel entity and a wavelength-interval entity. Always PTreeSetting both ways facilitates new research and makes horizontal row methods (using FPGAs) instantaneous (1 pass across the row pTree).

Most bioinformatics done so far is not really data mining but is more toward the database-querying side (e.g., a BLAST search). A radical approach: view the whole Human Genome as 4 binary relationships between People and base-pair positions (bpps). AHG [THG/GHG/CHG] is the relationship between People and adenine(A) [thymine(T)/guanine(G)/cytosine(C)] (1/0 for yes/no). How to order the bpps? By chromosome first, then by gene or region (level 2 is chromosome, level 1 is gene within chromosome)? Do it to facilitate cross-organism bioinformatics data mining? Create both the People and BPP PTreeSets, with a human health-records feature table (a training set for classification and multi-hop ARM) and a comprehensive decomposition (ordering of bpps) for cross-species genomic DM. If there are separate PTreeSets for each chromosome (even for each region: gene, intron, exon, ...), then we may be able to data-mine horizontally across all of these vertical pTrees.

[Figure: AHG(P, bpp) bit matrix over people P (1..7B) and base-pair positions bpp (1..3B, grouped by gene and chromosome), with person features pc, bc, lc, cc, pe, age, ht, wt; the red person's features are used to define classes.] AHG pTrees are for data mining: we can look for similarity (near neighbors) in a particular chromosome, in a particular gene sequence, overall, or by anything else.
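A toy illustration of the bit-matrix idea, assuming uncompressed numpy bit columns in place of real pTrees: bit-slicing one numeric attribute into vertical level-0 pTrees, and the inverse horizontal row method (the step the slide imagines an FPGA doing with zero delay):

    import numpy as np

    def bit_slices(col, bw):
        # slice a numeric attribute into bw vertical bit columns (pTrees),
        # most significant bit first
        col = np.asarray(col, dtype=np.int64)
        return [((col >> j) & 1).astype(bool) for j in range(bw - 1, -1, -1)]

    def rows_from_slices(slices):
        # the horizontal row method: fold the bit columns back into values
        vals = np.zeros(slices[0].shape, dtype=np.int64)
        for s in slices:
            vals = (vals << 1) | s
        return vals

For example, rows_from_slices(bit_slices([5, 3, 7], 3)) returns [5, 3, 7] again; a categorical attribute would instead contribute one bitmap column per category value.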

5 Multi-hop Data Mining (MDM): relationship1 (Buys = B(P,I)) ties table1 (People = P) to table2 (Items = I), which is tied by relationship2 (Friends = F(P,P)) to table3 (also P). Item attributes: category, color, size, wt, store, city, state, country; person attributes: pc, bc, lc, cc, pe, age, ht, wt.

[Figure: bit matrices F(P,P) = Friends and B(P,I) = Buys over P = People 2..5 and I = Items 2..5.]

Can we do clustering and/or classification on one of the tables, using the relationships to define "close" or to define the other notions? Define the NearestNeighborVoterSet of {f} using strong R-rules with F in the consequent? A strong cluster based on several self-relationships (different relationships, so it is not just strong implication both ways) strongly implies itself (or strongly implies itself after several hops, or when closing a loop).

Find all strong rules A → C, A ⊆ P, C ⊆ I:
Frequent iff ct(P_A) > minsup; confident iff ct((&_{p∈A} F_p) AND (&_{i∈C} P_i)) / ct(&_{p∈A} F_p) > minconf.
This says: "a friend of all of A will buy C if all of A buy C" (the AND is always AND). Closures: if A is frequent then A+ is frequent; if A → C is not confident then A → C- is not confident.
Variant: ct((|_{p∈A} F_p) AND (&_{i∈C} P_i)) / ct(|_{p∈A} F_p) > minconf says "a friend of anyone in A will buy C if anyone in A buys C".
Variant: ct((|_{p∈A} F_p) AND (|_{i∈C} P_i)) / ct(|_{p∈A} F_p) > minconf changes this to "a friend of anyone in A will buy something in C if anyone in A buys C".
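A sketch of the two-hop confidence computation above, assuming the antecedent pTrees are the friend columns F_p and the consequent pTrees are the item columns B_i (boolean numpy arrays indexed by person; all names are mine):

    import numpy as np

    def mh_confidence(F_cols, B_cols, A, C):
        # conf(A -> C) = ct(AND_{p in A} F_p  AND  AND_{i in C} B_i)
        #                / ct(AND_{p in A} F_p)
        ante = np.logical_and.reduce([F_cols[p] for p in A])
        numer = np.logical_and.reduce([B_cols[i] for i in C] + [ante])
        return numer.sum() / ante.sum() if ante.any() else 0.0

Swapping np.logical_and for np.logical_or on the antecedent (or consequent) side gives the two "anyone in A" variants stated above.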

6 Facebook-Buys: A facebook member, m, purchases item, x, and tells all friends. Let's make everyone a friend of him/herself. Each friend responds back with the items, y, she/he bought and liked.

[Figure: F ≡ Friends(M,M) and P ≡ Purchase(M,I) bit matrices over Members 1..4 and Items 2..5.]

For X ⊆ I, let MX ≡ &_{x∈X} P_x (the people that purchased everything in X) and FX ≡ OR_{m∈MX} F_m (the friends of an MX person). So for X = {x}: "Mx purchases x strongly", where Mx = OR_{m∈P_x} F_m. x is frequent if Mx is large; this is a tractable calculation (take one x at a time and do the OR). x is confident if ct(Mx & P_x) / ct(Mx) > minconf, i.e., ct((OR_{m∈P_x} F_m) & P_x) / ct(OR_{m∈P_x} F_m) > minconf. Example: K_2 = {1,2,4}, P_2 = {2,4}, ct(K_2) = 3, ct(K_2 & P_2)/ct(K_2) = 2/3.

To mine X, start with X = {x}. If it is not confident, then no superset is. Closure: form X = {x,y} only for x and y that form confident rules themselves.

Three-hop version (Kiddos, Buddies, Groupies): with F ≡ Friends(K,B), P ≡ Purchase(B,I) and Others(G,K), let Kx = OR_{g ∈ OR_{b∈P_x} F_b} O_g; x is frequent if Kx is large (tractable: one x at a time and OR). Example: K_2 = {1,2,3,4}, P_2 = {2,4}, ct(K_2) = 4, ct(K_2 & P_2)/ct(K_2) = 2/4.

Variant: a facebook buddy, b, purchases x and tells friends; each friend tells all friends. Is a strong purchase possible? Intersect rather than union (AND rather than OR), i.e., advertise to friends of friends. With Compatriots(G,K): K_2 = {2,4}, P_2 = {2,4}, ct(K_2) = 2, ct(K_2 & P_2)/ct(K_2) = 2/2.
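A sketch of the Mx calculation for one item x, assuming F is the friends bit matrix (row m = friend column of member m, with everyone a friend of him/herself) and P_x is the purchasers-of-x bit vector:

    import numpy as np

    def mx_conf(F, P_x):
        # Mx = OR of the friend columns of everyone who purchased x
        Mx = np.logical_or.reduce(F[P_x])          # F[P_x]: rows of purchasers
        conf = (Mx & P_x).sum() / Mx.sum() if Mx.any() else 0.0
        return Mx, conf
        # slide's first example: Mx covers members {1,2,4}, conf = 2/3

The three-hop Kx is the same pattern applied twice: reduce F over the purchasers, then reduce the Others matrix over the resulting member set.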

7 The Multi-hop Closure Theorem. A hop is a relationship, R, hopping from entities E to F.
Upward closure: if a condition is true of A, then it is true of all supersets D of A. Downward closure: if a condition is true of A, then it is true for all subsets D of A.
For a transitive (a+c)-hop strong-rule mine where the focus (count) entity is a hops from the antecedent and c hops from the consequent: if a (or c) is odd/even, then downward/upward closure applies to frequency (or confidence). Odd implies downward; even implies upward.

[Figure: a 4-hop chain of relationships R(E,F), S(F,G), T(G,H), U(H,I), with antecedent A ⊆ E and consequent C ⊆ I.]

Proof of the theorem. A pTree, X, is said to be "covered by" a pTree, Y, if for every 1-bit in X there is a 1-bit at that same position in Y.
Lemma-0: For any two pTrees X and Y, X & Y is covered by X, and ct(X) ≥ ct(X & Y). Proof-0: ANDing with Y may zero some of X's 1-positions but never turns on any of X's 0-positions.
Lemma-1: Let A ⊆ B; then &_{a∈B} X_a is covered by &_{a∈A} X_a. Proof-1: Let Z = &_{a∈B−A} X_a; then &_{a∈B} X_a = Z & (&_{a∈A} X_a), so the result follows from Lemma-0.
Lemma-2: For a (or c) = 0, frequency and confidence are upward closed. Proof-2: ct(B) ≥ ct(A), so ct(A) > minsup implies ct(B) > minsup, and ct(C & A)/ct(C) > minconf implies ct(C & B)/ct(C) > minconf.
Lemma-3: If at a (or c) hops we have upward/downward closure of frequency or confidence, then at a+1 (or c+1) hops we have downward/upward closure. Proof-3: Taking the case of a with upward closure, going to a+1 hops and D ⊆ A, we are removing ANDs in the numerator for both frequency and confidence; so by Lemma-1 the (a+1)-numerator covers the a-numerator, and therefore the (a+1)-count ≥ the a-count. Therefore the condition (frequency or confidence) holds in the a+1 case and we have downward closure.
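The coverage lemmas are easy to sanity-check on random bit vectors; a minimal sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    X, Y, Z = (rng.integers(0, 2, 32).astype(bool) for _ in range(3))

    # Lemma-0: X & Y is covered by X, hence ct(X & Y) <= ct(X)
    assert ((X & Y) <= X).all() and (X & Y).sum() <= X.sum()

    # Lemma-1 flavor (A = {X,Y} subset of B = {X,Y,Z}): more ANDs never
    # increase the count
    assert ((X & Y & Z) <= (X & Y)).all() and (X & Y & Z).sum() <= (X & Y).sum()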

8 The Multi-hop Closure Theorem (continued; statement and Lemmas 0-1 as on slide 7).

Lemma-2 (general form): Write the n-hop count as nested ANDs,
ct( &_{a_1 ∈ (&_{a_2 ∈ ( ... )} S_{a_2})} T_{a_1} ), with innermost term &_{a_{n-1} ∈ (&_{a_n ∈ A} R_{a_n})} S_{a_{n-1}}.
Then if n is even/odd, a threshold on this count is upward/downward closed on A.
Proof-2: Let A ⊆ D. By Lemma-1, &_{a_n∈D} R_{a_n} is covered by &_{a_n∈A} R_{a_n}. At the next hop the AND ranges over the 1-bits of those results, so the AND over the D-side (fewer indices) covers the AND over the A-side: &_{a_{n-1}∈(&_{a_n∈D} R_{a_n})} S_{a_{n-1}} covers &_{a_{n-1}∈(&_{a_n∈A} R_{a_n})} S_{a_{n-1}}. Each additional hop flips the direction of the coverage again, which is why the closure direction alternates with the parity of the hop count n.

9 Dear Dr. Perrizo and all: I think I found a method to calculate the mode of a dataset using pTrees. Assume we have a dataset that is represented by three pTrees, so the possible data values are 0 to 7. Now do the following operations:
F0 = count(P2' & P1' & P0') gives the frequency of value 0
F1 = count(P2' & P1' & P0 ) gives the frequency of value 1
F2 = count(P2' & P1 & P0') gives the frequency of value 2
...
F7 = count(P2 & P1 & P0 ) gives the frequency of value 7
Then Mode = Max(F0, F1, ..., F7).
The problem with this method: if we have a large number of pTrees, there will be a large number of F operations, and each F operation involves many AND operations. For example, if we have 8 pTrees, then we'll have 2^8 = 256 F's, and each F contains 8 − 1 = 7 AND operations.
I have thought of a solution that may overcome this problem. Assume we have 3 pTrees and value 2 is the mode, so F2 = count(P2' & P1 & P0') gives the maximum F value; call it m. Now if we get the counts of all the individual components of F2, that is, its subsets (P2', P1, P0', P2'&P1, P2'&P0', P1&P0', P2'&P1&P0'), then all of them must be greater than or equal to m (the downward-closure property). So to search for P2'&P1&P0' we can run an Apriori-like algorithm with the singleton itemsets P2, P2', P1, P1', P0, P0', then form doubletons P2P1, P2P1', etc. We need a support value for pruning; obviously the support should be the mode count, but we do not know it ahead of time, so we can set the minimum possible value of the mode count as the support. (Note: there cannot be any PiPi' doubleton, as it is 0.) The minimum value of the mode count is Max(1, floor[Datasize/2^n]), where n is the number of pTrees. Sorry, I cannot give an example now, but I can try to give one at the white board. Thanks. Sincerely, Mohammad
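A sketch of the brute-force version of this mode computation (the Apriori-style pruning Mohammad proposes is left out), assuming boolean numpy arrays for the bit slices, high bit first:

    import numpy as np

    def mode_via_ptrees(slices):
        # brute force over all 2^n values: AND each slice or its
        # complement according to the value's bits, then count
        n = len(slices)
        best_v, best_ct = 0, -1
        for v in range(2 ** n):
            mask = np.ones_like(slices[0])
            for j, s in enumerate(slices):        # slices[0] = high bit
                mask &= s if (v >> (n - 1 - j)) & 1 else ~s
            ct = int(mask.sum())
            if ct > best_ct:
                best_v, best_ct = v, ct
        return best_v, best_ct                    # (mode value, its frequency)

This is exactly the F0..F7 enumeration for n = 3; the pruning idea replaces the outer loop with an Apriori-style lattice walk seeded at the singletons P_i, P_i'.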

10 Given an n-row table, a row predicate (e.g., a bit-slice predicate, or a category map) and a row ordering (e.g., ascending on a key; or, for spatial data, column/row raster, Z, or Hilbert order), the sequence of predicate truth bits is the raw, or level-0, predicate tree (pTree) for that table, row predicate and row order.

Given a raw pTree P, a partition of it, par, and a bit-set predicate, bsp (e.g., pure1, pure0, gte50%One), the level-1 (par, bsp) pTree is the string of truths of bsp on consecutive partitions of par. If the partition is an equiwidth = m intervalization, it is called the level-1 stride = m bsp pTree.

IRIS table:
Name        SL  SW  PL  PW  Color
setosa      38  38  14   2  red
setosa      50  38  15   2  blue
setosa      50  34  16   2  red
setosa      48  42  15   2  white
setosa      50  34  12   2  blue
versicolor  51  24  45  15  red
versicolor  56  30  45  14  red
versicolor  57  28  32  14  white
versicolor  54  26  45  13  blue
versicolor  57  30  42  12  white
virginica   73  29  58  17  white
virginica   64  26  51  22  red
virginica   72  28  49  16  blue
virginica   74  30  48  22  red
virginica   67  26  50  19  red

Examples (all in the given table order): P0_{SL,0} has predicate remainder(SL/2) = 1; P0_{SL,1} has predicate rem(div(SL/2)/2) = 1; P0_{Color=red} has predicate Color = red; P0_{PW<7} has predicate PW < 7. From each raw pTree one can build level-1 pTrees such as gte50% stride=5, pure1 stride=5, gte25% stride=5, and gte75% stride=5; e.g., the gte50% stride=5 pTree of P0_{PW<7} is (1,0,0), which predicts setosa. A level-2 pTree is a level-1 pTree built on a level-1 pTree (a one-column table): e.g., P2_{gte50%,s=4,SL,0} is the level-2 gte50% stride=2 pTree over P1_{gte50%,s=4,SL,0} ≡ the gte50% stride=4 pTree of P0_{SL,0}; gte50% stride=8 and stride=16 pTrees of P0_{SL,0} can be formed the same way.
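A sketch of level-1 pTree construction from a raw pTree, with boolean arrays for level-0 pTrees and predicate names following the slide; a level-2 pTree is then just level1_ptree applied to a level-1 result.

    import numpy as np

    PREDS = {
        "pure1": lambda c: bool(c.all()),
        "pure0": lambda c: bool((~c).all()),
        "gte25": lambda c: 4 * c.sum() >= len(c),
        "gte50": lambda c: 2 * c.sum() >= len(c),
        "gte75": lambda c: 4 * c.sum() >= 3 * len(c),
    }

    def level1_ptree(p0, stride, bsp="gte50"):
        # truth of the bit-set predicate on consecutive stride-length
        # intervals of the raw (level-0) pTree
        p0 = np.asarray(p0, dtype=bool)
        return np.array([PREDS[bsp](p0[i:i + stride])
                         for i in range(0, len(p0), stride)])

For the IRIS table above, level1_ptree on the PW<7 raw pTree with stride 5 yields [True, False, False], matching the "predicts setosa" observation.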

11 FAUST Satlog evaluation. Classes 1, 2, 3, 4, 5, 7; attributes R, G, ir1, ir2; 2000 test samples.

Class means (R, G, ir1, ir2):          Class stds (R, G, ir1, ir2):
1: 62.83  95.29 108.12  89.50          1: 8 15 13  9
2: 48.84  39.91 113.89 118.31          2: 8 13 13 19
3: 87.48 105.50 110.60  87.46          3: 5  7  7  6
4: 77.41  90.94  95.61  75.35          4: 6  8  8  7
5: 59.59  62.27  83.02  69.95          5: 6 12 13 13
7: 69.01  77.42  81.59  64.13          7: 5  8  9  7

Actual class sizes: 461, 224, 397, 211, 237, 470 (classes 1, 2, 3, 4, 5, 7; total 2000).

Results by method (TP/FP per class 1, 2, 3, 4, 5, 7, with totals):
NonOblique level-0 (pure1):
  TP  99 193 325 130 151 257 (1155)
NonOblique level-1, gte50:
  TP 212 183 314 103 157 330 (1037);  FP  14  1  42 103  36 189 (385)
Oblique level-0, midpoint of means:
  TP 322 199 344 145 174 353 (1537);  FP  28  3  80 171 107  74 (463)
Oblique level-0, means and stds of projections, s1/(s1+s2), no class elimination:
  TP 359 205 332 144 175 324 (1539);  FP  29 18  47 156 131  58 (439)
Same, with class elimination in 2,3,4,5,7,1 order (note: no change occurs):
  TP 359 205 332 144 175 324;         FP  29 18  47 156 131  58
Oblique level-0, doubling pstd_r, i.e. 2s1/(2s1+s2), no elimination:
  TP 410 212 277 179 199 324 (1601);  FP 114 40 113 259 235  58 (819)
Same, classify and eliminate in 2,3,4,5,7,1 order:
  TP 309 212 277 154 163 248 (1363);  FP  22 40  65 211 196  27 (561)
Same, eliminate in 3,4,7,5,1,2 order:
  TP 329 189 277 154 164 307 (1420);  FP  25  1 113 211 121  33 (504)
Same, eliminate in 4,2,5,7,1,3 order:
  TP 355 189 277 154 164 307 (1446);  FP  37 18  14 259 121  33 (482)
BandClass rule mining (rules below):
  TP   2 33  56  58   6  18 (173);    FP   0  0  24  46   0 193 (263)

With the doubled std weight (2s_r), the number of FPs is reduced and the TPs are somewhat reduced. Better? Parameterize the 2 to maximize TPs and minimize FPs; what is the best parameter? The cut formula in use is
  a = pm_r + (pm_v − pm_r) · 2·pstd_r / (pstd_v + 2·pstd_r) = (pm_r·pstd_v + pm_v·2·pstd_r) / (pstd_v + 2·pstd_r);
a one-line helper follows this slide.

Choosing the elimination order: above ≡ (std + std_up)/gap_up and below ≡ (std + std_dn)/gap_dn, computed per attribute (R, G, ir1, ir2; values are missing where a class has no neighbor on that side). Per-class values and averages:
  class 1: 4.33, 2.10, 5.29, 2.16, 1.68, 8.09, 13.11, 0.94; avg 4.71
  class 2: 1.30, 1.12, 6.07, 0.94; avg 2.36
  class 3: 1.09, 2.16, 8.09, 6.07, 1.07, 13.11; avg 5.27
  class 4: 1.31, 1.09, 1.18, 5.29, 1.67, 1.68, 3.70, 1.07; avg 2.12
  class 5: 1.30, 4.33, 1.12, 1.32, 15.37, 1.67, 3.43, 3.70; avg 4.03
  class 7: 2.10, 1.31, 1.32, 1.18, 15.37, 3.43; avg 4.12
Sorting the averages (4: 2.12, 2: 2.36, 5: 4.03, 7: 4.12, 1: 4.71, 3: 5.27) suggests elimination order 4,2,5,7,1,3.

BandClass rules mined: G[0,46] → 2; G[47,64] → 5; G[65,81] → 7; G[81,94] → 4; G[94,255] → {1,3}; R[0,48] → {1,2}; R[49,62] → {1,5}; R[82,255] → 3; ir1[0,88] → {5,7}; ir2[0,52] → 5.

Conclusion? MeansMidPoint and Oblique std1/(std1+std2) are best, with the Oblique version slightly better. I wonder how these two methods would work on Netflix?

Two ways to set up the Netflix data: UTbl(User, M_1, ..., M_17770) → (u,m), with umTrainingTbl = SubUTbl(Support(m), Support(u), m); or MTbl(Movie, U_1, ..., U_480189) → (m,u), with muTrainingTbl = SubMTbl(Support(u), Support(m), u).
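A one-line helper for the doubling-pstd_r cut, as reconstructed above (my reading of the slide's formula; the cut sits 2·std_r : std_v of the way from pm_r toward pm_v):

    def cut_doubling_std(pm_r, pm_v, pstd_r, pstd_v):
        # a = pm_r + (pm_v - pm_r) * 2*pstd_r / (pstd_v + 2*pstd_r)
        #   = (pm_r*pstd_v + pm_v*2*pstd_r) / (pstd_v + 2*pstd_r)
        return pm_r + (pm_v - pm_r) * 2 * pstd_r / (pstd_v + 2 * pstd_r)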

12 Netflix data: movies {m_k}, k = 1..17770; users u_1..u_480189 (raw user IDs range up to U_2649429); ~100,480,507 ratings in Main(m, u, r, d), averaging 5655 users per movie and 209 movies per user. Representations:
- per-movie files m_k(u, r, d) with rows (userID, rating, date);
- UserTable(uID, m_1, ..., m_17770), ~47B cells, and the corresponding UPTreeSet, 3 × 17770 bit slices wide;
- MTbl(mID, u_1, ..., u_480189), ~47B cells, and the corresponding MPTreeSet, 3 × 480189 bit slices wide.

[Figure: layouts of Main(m,u,r,d), the per-movie files, UserTable/UPTreeSet and MTbl/MPTreeSet, with the (u,m) to be predicted and the supports Sup(u) and Sup(m) highlighted.]

Given (u, m) to be predicted, form umTrainingTbl = SubUTbl(Support(m), Support(u), m). Of course, the two supports won't be tight together as drawn, but they are put that way for clarity. There are lots of 0s in the vector space umTrainingTbl; we want the largest subtable without zeros. How? SubUTbl(∩_{n∈Sup(u)} Sup(n), Sup(u), m)?

Using coordinate-wise FAUST (not Oblique): in each coordinate n ∈ Sup(u), divide up all users v ∈ Sup(n) ∩ Sup(m) into their rating classes by rating(m, v); then: 1. calculate the class means and stds and sort the means; 2. calculate the gaps; 3. choose the best gap and define the cut point using the stds. This may of course be slow; how can we speed it up? Dually, coordinate-wise FAUST in each coordinate v ∈ Sup(m): divide up all movies n ∈ Sup(v) ∩ Sup(u) into rating classes, then do steps 1-3 as above.

Gaps alone are not the best criterion (especially since the sum of the gaps is no more than 4 and there are 4 gaps). Weighting (correlation(m,n)-based) is useful: the higher the correlation, the more significant the gap(?). The cut points are constructed for just this one prediction, rating(u, m); does it make sense to find all of them? Should we instead just find, e.g., which n-class-mean(s) rating(u, n) is closest to, and make those the votes?
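A sketch of the per-coordinate steps 1-3 (class means and stds, sorted means, widest gap, std-weighted cut), with the projected values grouped by rating class in a plain dict; the correlation weighting and the voting step are left out:

    import statistics

    def best_gap_cut(values_by_class):
        # 1. class means and stds, sorted by mean
        stats = sorted((statistics.mean(vs), statistics.pstdev(vs))
                       for vs in values_by_class.values())
        # 2. widest gap between consecutive class means
        i = max(range(len(stats) - 1),
                key=lambda k: stats[k + 1][0] - stats[k][0])
        # 3. place the cut between the two bracketing means, weighted
        #    by their stds (midpoint if both stds are zero)
        (m1, s1), (m2, s2) = stats[i], stats[i + 1]
        return m1 + (m2 - m1) * (s1 / (s1 + s2) if s1 + s2 else 0.5)

For example, best_gap_cut({1: [1.0, 1.2], 5: [4.8, 5.0]}) places a single cut in the wide gap between the two rating classes; the full scheme would repeat this per coordinate n ∈ Sup(u) and combine the resulting votes.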

