Find f0, f1 and d1 ≡ (f1-f0)/|f1-f0| as before:


1 Find f0, f1 and d1 ≡ (f1-f0)/|f1-f0| as before:
Form SPTS(dis(x,M)) and pick a furthest point [from M], f0.
Form SPTS(dis(x,f0)) and pick a furthest point [from f0], f1.
Next we want to pick an x whose shadow on d1 is longest (a point furthest away from the d1-line). The component of x-f0 perpendicular to d1 is
x' ≡ (x-f0) - ((x-f0)od1)d1.
We could create the whole PTreeSet (all columns) for the x' vectors and repeat the above, but that is a massive construction just to pick a furthest point [from the d1-line], f2.
Instead, by Pythagoras,
x'ox' = |x'|^2 = (x-f0)o(x-f0) - ((x-f0)od1)^2.
Can we get SPTS(x'ox') as SPTS(xox) - SPTS(xod1)*SPTS(xod1) (with x measured from f0)? Mohammad, can we calculate the product and difference of two SPTSs using pTree calculations (no loops)?
If so, in 3-D, SPTS(x''ox'') is SPTS(xox) - SPTS(xod1)*SPTS(xod1) - SPTS(xod2)*SPTS(xod2), where
d2 ≡ (f2 - (f2od1)d1) / |f2 - (f2od1)d1|.
We need to verify that x'' ≡ x - (xod1)d1 - (xod2)d2 is orthogonal to d1 and to d2:
x''od1 = xod1 - (xod1)(d1od1) - (xod2)(d2od1) = xod1 - (xod1)(1) - (xod2)(0) = 0
x''od2 = xod2 - (xod1)(d1od2) - (xod2)(d2od2) = xod2 - (xod1)(0) - (xod2)(1) = 0
[Figure: points x, y, p with f0, f1, f2 and the directions d1, d2 marked.]
Of course, in this Spaeth dataset there are no "extreme" anomalies (all are in between other clusters). Next I add 22 anomalies to IRIS systematically, all of which are extreme outliers in some sense.
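The construction above can be sketched in plain Python. This is only an illustration under stated assumptions: it loops over rows instead of using pTree arithmetic, takes the mean as M, measures every point from f0, and the helper names (`build_frame`, `residual_sq`) and the four sample points are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def unit(v):
    n = sqrt(dot(v, v))
    return [a / n for a in v]

def residual_sq(x, f0, dirs):
    # Pythagoras: |x - f0|^2 minus the squared components along each
    # orthonormal direction gives the squared perpendicular distance.
    y = sub(x, f0)
    return dot(y, y) - sum(dot(y, d) ** 2 for d in dirs)

def build_frame(X):
    M = [sum(col) / len(X) for col in zip(*X)]          # mean as medoid
    f0 = max(X, key=lambda x: dot(sub(x, M), sub(x, M)))    # furthest from M
    f1 = max(X, key=lambda x: dot(sub(x, f0), sub(x, f0)))  # furthest from f0
    d1 = unit(sub(f1, f0))
    f2 = max(X, key=lambda x: residual_sq(x, f0, [d1]))     # furthest from the d1-line
    y2 = sub(f2, f0)
    d2 = unit(sub(y2, [dot(y2, d1) * a for a in d1]))       # remove the d1 component
    return f0, f1, d1, f2, d2

# Hypothetical sample points, not taken from the slides.
X = [[0, 0, 0], [5, 0, 0], [2, 3, 0], [2, 1, 1]]
f0, f1, d1, f2, d2 = build_frame(X)
resid2 = [residual_sq(x, f0, [d1, d2]) for x in X]
```

Here `resid2` plays the role of the SPTS(x''ox'') column: it is built only from dot products, which is what makes a loop-free pTree version plausible if products and differences of SPTSs are available.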

2 [Table: the IRIS samples with columns SL, SW, PL, PW: setosa (set), versicolor (ver) and virginica (vir) rows, followed by the added anomaly tuples (labeled t..., tall and b..., ball). Numeric values lost in extraction.] Before adding the new tuples: MINS, MAXS, MEAN [values lost in extraction]; same after additions.

3 [Tables: the IRIS samples plus the added anomaly tuples with columns SL, SW, PL, PW and D(x,M); then the same points listed by D(x,f0) with columns sDx,f0 and gp>4, and per-class counts. The points f0 and f1 are marked. Most numeric values were lost in extraction.]

4 [Table: the points listed by D(x,f1) with columns sDx,f1 and gp>4; f1 is marked. Numeric values lost in extraction except the last row, ball at 122.]
I think quite clearly, doing gap analysis the same way on individual attributes 1, 2, 3, 4 would isolate all of the other anomalies as singletons or doubletons (I didn't have time to do it).

5 The mathematics of SPTS's:
Every SPTS is a functional on the set X (a function from X to R1).
A vertical functional is a functional expressed vertically as a two-column table, vf(X, R1-values).
A pTree functional is a vertical functional in which the value bitslices are expressed as basic pTrees in a PTreeSet.
The simplest examples are the coordinate projections, ek: X→R1, where ek(x≡(x1,...,xn)) = xk.
Others include:
distance(x,p), where distance is any distance and p is a fixed point (e.g., M or f0 or f1 or ...)
length(x)
length^2(x) ≡ xox
Given any hyperplane, H, with orthonormal basis o1,...,ok, projoh(x) = the oh-component of the projection of x on H (e.g., H is the space perpendicular to the line from f0 to f1).
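A small Python sketch of these definitions may help. It assumes uncompressed bit lists in place of real (compressed) pTrees, and the names `coordinate`, `column` and `bit_slices` are hypothetical, chosen only for this illustration.

```python
def coordinate(k):
    # e_k: the functional x -> x_k (k is 0-based in this sketch).
    return lambda x: x[k]

def length_sq(x):
    # length^2(x) = x o x
    return sum(a * a for a in x)

def column(X, functional):
    # A vertical functional: one scalar value per row of X.
    return [functional(x) for x in X]

def bit_slices(values, nbits):
    # slices[k][i] is bit k of values[i]; the highest-order slice comes first.
    return {k: [(v >> k) & 1 for v in values] for k in range(nbits - 1, -1, -1)}

# Hypothetical sample rows with non-negative integer coordinates.
X = [(1, 2), (3, 0), (2, 2)]
col = column(X, length_sq)
slices = bit_slices(col, 4)
```

Each entry of `slices` is one vertical bit column of the functional's values, which is the uncompressed counterpart of one basic pTree in the PTreeSet.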

6 Round 2 is straightforward. So, 1. Given gaps, find ct=k_intervals.
No gaps (ct=0_intervals) on the furthest-to-Mean line, but 3 ct=1 intervals. Declare p = p12, p16, p18 an anomaly if pofM is far enough from the boundary points of its interval?
[Figure: x-y plot of the points with the Mean, M, and the VOM at (34, 35) marked.]
Round 2 is straightforward. So,
1. Given gaps, find ct=k_intervals.
2. Find good gaps (dot product with a constant vector for linear gaps?). For rounded gaps, use xox?
Note: in this example, vom works better than mean.

7 Length based gapping is dependent.
Using vector lengths. However, if the data happens to be shifted, as it is on the right, using lengths no longer works in this example. That is, the dot product with a fixed vector, like fM, is independent of the placement of the points with respect to the origin. Length-based gapping is dependent on that placement.
A squared pattern does not lend itself to rounded gap boundaries.
[Figure: two copies of a square grid of points x, one shifted, with a distance axis from 5 to 100. Distance from the origin is in red; distance from (7,0) is in blue.]
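The shift dependence can be checked on a tiny made-up example (the three points and the shift below are hypothetical, not the grid from the slide). Translating the data adds a constant to every projection, so projection gaps survive, while the gaps in the length column change.

```python
from math import hypot

def gaps(values):
    # Differences between consecutive values after sorting.
    s = sorted(values)
    return [b - a for a, b in zip(s, s[1:])]

def projections(X, d):
    # Dot product of every point with the fixed vector d.
    return [sum(a * b for a, b in zip(x, d)) for x in X]

def lengths(X):
    # Distance of every 2-D point from the origin.
    return [hypot(*x) for x in X]

X = [(1, 0), (2, 0), (6, 0)]              # hypothetical points
shifted = [(x - 3, y) for x, y in X]      # same pattern, moved left by 3
d = (1, 0)

proj_gaps = gaps(projections(X, d))
proj_gaps_shifted = gaps(projections(shifted, d))
len_gaps = gaps(lengths(X))
len_gaps_shifted = gaps(lengths(shifted))
```

In this example the projection gaps are identical before and after the shift, whereas the large gap in the length column disappears once the origin falls between the points.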

8 Thin interval finder on the fM line using the scalar pTreeSet, PTreeSet(xofM) (the pTree slices of these projection lengths). Looking for Width=2^4_Count=1_ThinIntervals or W16_C1_TIs.
[Figure: grid plot of the points z1, ..., zf with M marked, next to the table X(x1, x2). Surviving coordinates: za (13, 4), zb (10, 9), zc (11, 10), zd (9, 11), ze (11, 11), zf (7, 8).]
xofM values (as extracted): 11 27 23 34 53 80 118 114 125 110 121 109 83
[Figure: bit slices p6, ..., p0 of xofM, their complements p6', ..., p0', and the AND tree of interval counts; surviving counts include p6' C=5 and p6 C=10.]
W=2^4, C=1, interval [0,16): z1ofM=11 is 5 units from 16, so z1 is not declared an anomaly.
W=2^4, C=1, interval [32,48): z4ofM=34 is within 2 of 32, so z4 is not declared an anomaly.
W=2^4, C=1, interval [48,64): z5ofM=53 is 19 from z4ofM=34 (>2^4) but 11 from 64. The next interval, [64,80), is empty and z5ofM is 27 from 80 (>2^4), so z5 is an anomaly and we make a cut through z5.
W=2^4, C=0, interval [64,80): ordinarily we cut through the midpoint of C=0 intervals, but in this case it is unnecessary since it would duplicate the z5 cut just made.
Here we started with xofM distances. The same process works starting with any distance-based ScalarPTreeSet, e.g., xox, etc.
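The interval counting on this slide can be sketched as follows, using the xofM values that survive in the transcript. Two simplifying assumptions: intervals are found by shifting off the low-order bits (`v >> k`) rather than by ANDing pTree slices, and a point is flagged when its actual neighbours on both sides are more than the interval width away, instead of bounding the gaps through interval endpoints as the slide does. The function names are hypothetical.

```python
from collections import Counter

def thin_intervals(values, k, max_count=1):
    # Count the values in each width-2^k interval and keep the sparse ones.
    counts = Counter(v >> k for v in values)
    top = max(values) >> k
    return {(i << k, (i + 1) << k): counts[i]
            for i in range(top + 1) if counts[i] <= max_count}

def isolated_points(values, k):
    # A value alone in its interval is flagged if both neighbouring
    # values are more than 2^k away (a missing neighbour counts as far).
    width = 1 << k
    s = sorted(values)
    counts = Counter(v >> k for v in s)
    flagged = []
    for i, v in enumerate(s):
        if counts[v >> k] != 1:
            continue
        far_below = i == 0 or v - s[i - 1] > width
        far_above = i == len(s) - 1 or s[i + 1] - v > width
        if far_below and far_above:
            flagged.append(v)
    return flagged

# xofM values as they appear in the transcript.
xofM = [11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83]
thin = thin_intervals(xofM, 4)
anomalies = isolated_points(xofM, 4)
```

On these values the sparse intervals are [0,16), [32,48), [48,64) and the empty [64,80), and only 53 (z5ofM) is flagged, which matches the conclusion drawn on the slide.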

9 Defining gaps: Any scalar pTreeSet where the scalar is a distance can be used for gap-based FAUST Clustering / Anomaly_Detection or FAUST Classification.
Certainly the dot product with any fixed vector works (gaps in the projections along the line generated by the vector). E.g., use vectors:
fM; fM/|fM|; or in general, a*fM (a constant), where M is a medoid (mean or vector of medians) and f is a "furthest point" from M.
fF; fF/|fF|; or in general, a*fF (a constant), where F is a "furthest point" from f.
ek ≡ (0, ..., 0, 1, 0, ..., 0) (1 in the kth position).
V1 = (-b, a, 0, ..., 0), where V = (a, b, c, d, e, ...) is any one of the vectors above (this gives a vector orthogonal to V).
V2 = (a, b, C, 0, ..., 0), where C = -(a^2 + b^2)/c (a vector orthogonal to V and to V1); etc. (Vk for all k=1...n, forming an orthogonal basis with V).
But also, if one takes the ScalarPTreeSet of all vector lengths (or squares of lengths, to make it easy), that is also a ScalarPTreeSet, and the gaps are radial gaps as one proceeds out from the origin. One can note that this is just the column of xox values, so it is dot product generated also.
If one takes just the ScalarPTreeSet of all ith coordinate values (V=ei above), that works as well. In this case we get gaps in the value distribution of the ith coordinates. This was used, for instance, in coordinate-wise (non-Oblique) FAUST.
[Figure: the PTreeSet of the xofM values 11 27 23 34 53 80 118 114 125 110 121 109 83, with bit slices p6, ..., p0 and complements p6', ..., p0'.]
Take a fixed vector, y0. The ScalarPTreeSet of all vector lengths (or squares of lengths) of the vectors x-y0 is also a ScalarPTreeSet that works, and the gaps are radial gaps as one proceeds out from the point y0. Note that this column is built from the xox and xoy0 columns, since (x-y0)o(x-y0) = xox - 2(xoy0) + y0oy0.
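The V1 and V2 constructions are easy to verify numerically. The sketch below assumes V has at least three components and a nonzero third component c; the function name `orthogonal_pair` and the sample vector are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_pair(V):
    # V1 = (-b, a, 0, ..., 0) and V2 = (a, b, C, 0, ..., 0) with
    # C = -(a^2 + b^2)/c, following the formulas on the slide.
    a, b, c = V[0], V[1], V[2]
    rest = [0] * (len(V) - 3)
    V1 = [-b, a, 0] + rest
    V2 = [a, b, -(a * a + b * b) / c] + rest
    return V1, V2

V = [1, 2, 5, 4]                  # hypothetical vector
V1, V2 = orthogonal_pair(V)
```

The three pairwise dot products V o V1, V o V2 and V1 o V2 all vanish, so projections onto V, V1 and V2 give gap analyses along mutually perpendicular lines.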

10 pTree Text Mining data Cube layout
[Figure: data cube over Position (1 2 3 4 5 6 7 ...), documents (e.g., JSE, HHS, LMM, ...) and Vocab Terms (a, again, all, always, an, and, apple, April, are, ...), stored as pTree levels. Cell values lost in extraction.]
lev0: corpusP (len=MaxDocLen*DocCt*VocabLen), ordered by term, then document (t=a d=1, t=a d=2, t=a d=3, t=again d=1, ...).
lev1 (len=DocCt*VocabLen): te (tePt=a, tePt=again, tePt=all, ...) and the tf bit slices tfPk (tf0, tf1, tf2, ...); e.g., pred for tfP0: mod(sum(mdl-stride),2)=1.
lev2 (len=VocabLen): df count with bit slices dfP0, ..., dfP3 (hdfP); pred=pure1 on tfP1 -stride 1.
Math book mask and Library of Congress masks (document categories) move us up the document semantic hierarchy.
Reading position masks (pos categories), e.g., d=1 Preface, d=1 commas, d=1 References, move us up the position semantic hierarchy (and allow placement of punctuation, etc.).
ptf: positional term frequency. The frequency of each term in each position across all documents (Is this any good?).

11 tf
[Table: term-frequency (tf) matrix. Rows are the documents 01TBM 02TLP 03DDD 04LMM 05HDS 06SPP 07OMH 08JSC 09HBD 10JAJ 11OMM 12OWF 13RRS 14ASO 15PCD 16PPG 17FEC 18HTP 21LAU 22HLH 23MTB 25WOW 26SBS 27CBC 28BBB 29LFW 30HDD 32JGF 33BFP 35SSS 36LTT 37MBB 38YLS 39LCS 41OKC 42BBC 43HHD 44HLH 45BBB 46TTP 47CCM 48OTB 49WLG 50LJH; columns are the content words, printed vertically in the original. Column headers and cell values lost in extraction.]
60 content words from 44 Mother Goose documents (listed on the next slide). I started with 50 documents, but only documents with at least two content words were kept.

12 Three blind mice. See how they run
Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. Old Mother Hubbard went to the cupboard to give her poor dog a bone. But when she got there the cupboard was bare and so the poor dog had none. She went to the baker to buy him some bread. When she came back the dog was dead. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. Hush baby. Daddy is near. Mamma is a lady and that is very clear. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. One misty moisty morning when cloudy was the weather, I met an old man clothed all in leather. He began to praise and I began to grin. How do you do? And how do you do again? There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. A robin and a robins son once went to town to buy a bun. 
They could not decide on plum or plain. And so they went back home again. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man, what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. I had a little husband no bigger than my thumb. I put him in a pint pot, and I bid him drum. I bought a little hanky to wipe his little nose and a pair of little garters to tie his little hose. How many miles to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. There was an old woman, and what do you think? She lived on nothing but victuals and drink. Victuals and drink were the chief of her diet, yet this old woman could never be quiet. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. 
Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. Cry baby cry. Put your finger in your eye and tell your mother it was not I. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. Jack, come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle, they will think that I have gone mad. For many a joyous day, my fiddle and I have had. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them ? Buttons, a farthing a pair! Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. Here we go round the mulberry bush, the mulberry bush, the mulberry bush. Here we go round the mulberry bush, on a cold and frosty morning. This is the way we wash our hands, wash our hands, wash our hands. This is the way we wash our hands, on a cold and frosty morning. This is the way we wash our clothes, wash our clothes, wash our clothes. This is the way we wash our clothes, on a cold and frosty morning. 
This is the way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. 
Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
Document IDs, in the order of the texts above: 01TBM 02TLP 03DDD 04LMM 05HDS 06SPP 07OMH 08JSC 09HBD 10JAJ 11OMM 12OWF 13RRS 14ASO 15PCD 16PPG 17FEC 18HTP 21LAU 22HLH 23MTB 25WOW 26SBS 27CBC 28BBB 29LFW 30HDD 32JGF 33BFP 35SSS 36LTT 37MBB 38YLS 39LCS 41OKC 42BBC 43HHD 44HLH 45BBB 46TTP 47CCM 48OTB 49WLG 50LJH
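As a rough illustration of how the tf, te and df tables on the neighbouring slides relate to these texts, here is a sketch over three of the documents. Assumptions to note: the four-word vocabulary is hypothetical (the actual 60 content words did not survive extraction), te is read as term existence, the tokenizer is a simple regular expression, and the pairing of IDs with texts assumes the ID list follows the order of the rhymes.

```python
import re
from collections import Counter

docs = {
    "01TBM": ("Three blind mice! See how they run! They all ran after the "
              "farmer's wife, who cut off their tails with a carving knife. "
              "Did you ever see such a thing in your life as three blind mice?"),
    "08JSC": ("Jack Sprat could eat no fat. His wife could eat no lean. And so "
              "between them both they licked the platter clean."),
    "09HBD": "Hush baby. Daddy is near. Mamma is a lady and that is very clear.",
}
vocab = ["baby", "clean", "eat", "wife"]   # hypothetical content words

def term_frequencies(text, vocab):
    # Count lowercase word tokens, then read off the vocabulary terms.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[t] for t in vocab]

# tf: occurrences of each term per document.
tf = {d: term_frequencies(text, vocab) for d, text in docs.items()}
# te: 1 if the term occurs in the document at all, else 0.
te = {d: [1 if n > 0 else 0 for n in row] for d, row in tf.items()}
# df: number of documents containing each term.
df = [sum(te[d][j] for d in docs) for j in range(len(vocab))]
```

Each column of `tf`, `te` and `df` could then be bit-sliced into vertical pTree-style columns in the same way as any other scalar column.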

13 te, df [Table: te matrix and df row in the same document-by-content-word layout as the tf slide (documents 01TBM ... 50LJH). Column headers and cell values lost in extraction.]

14 mtf=10*tf/df [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

15 mtf0 [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

16 mtf1 [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

17 mtf2 [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

18 mtf3 [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

19 mtf4 [Table: same document-by-content-word layout. Column headers and cell values lost in extraction.]

20 APPENDIX: FAUST = Fast, Accurate Unsupervised and Supervised Teaching (Teaching big data to reveal info)
FAUST CLUSTER-fmg (furthest-to-mean gaps for finding round clusters):
C=X (e.g., X≡{p1, ..., pf} = 15 pix dataset.)
While an incomplete cluster, C, remains:
find M ≡ Medoid(C) (Mean or Vector_of_Medians or ?).
Pick f∈C furthest from M from S≡SPTreeSet(D(x,M)). (E.g., HOBbit furthest f: take any from the highest-order S-slice.)
If ct(C)/dis^2(f,M) > DT (DensThresh), C is complete, else split C where P≡PTreeSet(cofM/|fM|) gap > GT (GapThresh).
End While.
Notes: a. Euclidean and HOBbit furthest. b. fM/|fM| and just fM in P. c. Find gaps by sorting P or by an O(log n) pTree method?
Worked example: C2={p5} is complete (singleton = outlier). C3={p6,pf} will split (details omitted), so {p6}, {pf} are complete (outliers). That leaves C1={p1,p2,p3,p4} and C4={p7,p8,p9,pa,pb,pc,pd,pe} still incomplete. C1 is dense (density(C1) = ~4/22 = .5 > DT = .3 ?), thus C1 is complete; f1=p3, C1 doesn't split. Applying the algorithm to C4: {pa} is an outlier. C2 splits into {p9}, {pb,pc,pd}, complete.
In both cases those probably are the best "round" clusters, so the accuracy seems high. The speed will be very high!
[Figure: grid plots of the 15-point dataset p1, ..., pf showing M, f and the clusters C1, ..., C4 with medoids M0, M1, M4 at successive steps; a second dataset, "Interlocking horseshoes with an outlier".]
[Table: X(x1, x2) for p1, ..., pf with D(x,M0); coordinates lost in extraction. D(x,M0) values as extracted: 2.2 3.9 6.3 5.4 3.2 1.4 0.8 2.3 4.9 7.3 3.8 3.3 1.8 1.5]
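The loop above can be sketched in ordinary Python. This is a row-at-a-time illustration, not the pTree implementation: it assumes the mean as medoid, Euclidean (not HOBbit) furthest points, gap finding by sorting, and it declares a cluster complete when no gap exceeds the threshold so that the loop always terminates. The sample points and thresholds are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def mean(C):
    return [sum(col) / len(C) for col in zip(*C)]

def faust_fmg(X, dens_thresh, gap_thresh):
    complete, pending = [], [list(X)]
    while pending:
        C = pending.pop()
        M = mean(C)
        f = max(C, key=lambda x: dot(sub(x, M), sub(x, M)))  # furthest from M
        r2 = dot(sub(f, M), sub(f, M))
        if r2 == 0 or len(C) / r2 > dens_thresh:
            complete.append(C)          # dense enough (or a single location)
            continue
        n = sqrt(r2)
        d = [a / n for a in sub(f, M)]  # unit vector along fM
        scored = sorted((dot(sub(x, M), d), x) for x in C)
        parts, current = [], [scored[0][1]]
        for (pa, _), (pb, xb) in zip(scored, scored[1:]):
            if pb - pa > gap_thresh:    # cut at every large projection gap
                parts.append(current)
                current = []
            current.append(xb)
        parts.append(current)
        if len(parts) == 1:
            complete.append(C)          # assumption: no gap means complete
        else:
            pending.extend(parts)
    return complete

# Hypothetical points: two small groups and one far outlier.
X = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (30, 0)]
clusters = faust_fmg(X, dens_thresh=0.3, gap_thresh=3)
```

On this input the first pass cuts the data at the two large projection gaps, and the three resulting pieces each pass the density test or are singletons.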

21 FAUST Oblique: PR = P(X dot d)<a, where the d-line is along D ≡ mR→mV (the oblique vector) and d = D/|D|.
Separate classR, classV using the midpoints of means (mom) method to calculate a: view mR, mV as vectors (mR ≡ vector from the origin to pt_mR), then
a = (mR + (mV-mR)/2) o d = ((mR+mV)/2) o d
(The very same formula works when D = mV→mR, i.e., points to the left.)
Training ≡ choosing the "cut-hyper-plane" (CHP), which is always an (n-1)-dimensional hyperplane (which cuts space in two).
Classifying is one horizontal program (AND/OR) across pTrees to get a mask pTree for each entire class (bulk classification).
Improve accuracy? E.g., by considering the dispersion within classes when placing the CHP. Use:
1. the vector_of_medians, vom, to represent each class, rather than the mean: vomV ≡ (median{v1|v∈V}, median{v2|v∈V}, ...);
2. project each class onto the d-line (e.g., the R-class below); then calculate the std (one horizontal formula per class, using Md's method); then use the std ratio to place the CHP (no longer at the midpoint between mR [vomR] and mV [vomV]).
[Figure: 2-D scatter (dim 1, dim 2) of class R points (r) and class V points (v) with mR, mV, vomR, vomV, the d-line, the cut value a, and the std of the projected distances from the origin along the d-line.]
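A minimal sketch of the midpoint-of-means cut, assuming plain Python lists instead of pTrees and class means rather than vectors of medians. The names `train_mom` and `in_class_R` and the sample points are hypothetical.

```python
from math import sqrt

def mean(C):
    return [sum(col) / len(C) for col in zip(*C)]

def train_mom(R, V):
    # d is the unit vector from mR to mV; a is the projection of the
    # midpoint (mR + mV)/2 onto d.
    mR, mV = mean(R), mean(V)
    D = [v - r for r, v in zip(mR, mV)]
    n = sqrt(sum(c * c for c in D))
    d = [c / n for c in D]
    a = sum((r + v) / 2 * c for r, v, c in zip(mR, mV, d))
    return d, a

def in_class_R(x, d, a):
    # PR = (x o d) < a
    return sum(xi * di for xi, di in zip(x, d)) < a

# Hypothetical training points for the two classes.
R = [(1, 1), (2, 1)]
V = [(5, 5), (6, 5)]
d, a = train_mom(R, V)
```

Placing the cut by the std ratio, as suggested in item 2, would change only how `a` is computed; the classification test itself stays the same single comparison.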

22 1. MapReduce FAUST. Current_Relevancy_Score=9. Killer_Idea_Score=2
1. MapReduce FAUST: Current_Relevancy_Score=9, Killer_Idea_Score=2. Nothing comes to mind as to what we would do here. MapReduce/Hadoop is a key-value approach to organizing complex BigData. In FAUST PREDICT/CLASSIFY we start with a Training TABLE, and in FAUST CLUSTER/ANOMALIZER we start with a vector space. Mark suggests (my understanding) capturing pTreeBases as Hadoop/MapReduce key-value bases? I suggested to Arjun developing XML to capture Hadoop datasets as pTreeBases. The former is probably wiser. A wish list of great things that might result would be a good start.
2. pTree Text Mining: Current_Relevancy_Score=10, Killer_Idea_Score=9. I think Oblique FAUST is the way to do this. Also there is the very new idea of capturing the reading sequence, not just the term-frequency matrix (lossless capture) of a corpus.
3. FAUST CLUSTER/ANOMALIZER: Current_Relevancy_Score=9, Killer_Idea_Score=9. No one has taken up the proof that this is a breakthrough method. The applications are unlimited!
4. Secure pTreeBases: Current_Relevancy_Score=9, Killer_Idea_Score=10. This seems straightforward and a certainty (to be a killer advance)! It would involve becoming the world expert on what data security really means and how it has been done by others, and then comparing our approach to theirs. Truly a complete career is waiting for someone here!
5. FAUST PREDICTOR/CLASSIFIER: Current_Relevancy_Score=9, Killer_Idea_Score= No one has done a complete analysis showing that this is a breakthrough method. The applications are unlimited here too!
6. pTree Algorithmic Tools: Current_Relevancy_Score=10, Killer_Idea_Score= This is Md's work. Expanding the algorithmic tool set to include quadratic tools and even higher degree tools is very powerful. It helps us all!
7. pTree Alternative Algorithm Impl: Current_Relevancy_Score=9, Killer_Idea_Score= This is Bryan's work. Implementing pTree algorithms in hardware/firmware (e.g., FPGAs): orders of magnitude performance improvement?
8. pTree O/S Infrastructure: Current_Relevancy_Score=10, Killer_Idea_Score= This is Matt's work. I don't yet know the details, but Matt, under the direction of Dr. Wettstein, is finishing up his thesis on this topic (such changes as very large page sizes, cache sizes, prefetching, ...). I give it a 10/10 because I know the people; they always do double-digit work!
From: Sent: Thurs, Aug
Dear Dr. Perrizo, Do you think a MapReduce class of FAUST algorithms could be built into a thesis? If the ultimate aim is to process big data, could modifying the existing P-tree based FAUST algorithms for the Hadoop framework be something to look at? I am myself not sure how far I can go, but if you approve, then I can work on it.
From: Mark to: Arjun, Aug 9
From an industry perspective, Hadoop is king (at least at this point in time). I believe vertical data organization maps really well to a map/reduce approach. These are complementary, as Hadoop is organized more for unstructured data, so these topics are not mutually exclusive. So from the industry side I'd vote Hadoop; from the Treeminer side, text (although we are very interested in both).
From: Sent: Friday, Aug 10
I'm working through a list of what we need to get done. It will include implementing anomaly detection, which has now been on my list for some time. I tried to establish a number of things such that even if we had some difficulties with some parts we could show others (w/o digging us too deep). Once I get this I'll get a call going. I have another programming resource down here who's been working with me on our production code, who will also be picking up some of the work to get this across the finish line, and I also have someone who was previously a director at our customer assisting us in packaging it all up so the customer will perceive value received. I think Dale sounded happy yesterday.

