FAUST Oblique Analytics: X(X_1..X_n) ⊆ R^n, |X| = N; Classes = {C_1..C_K}; d = (d_1..d_n) with |d| = 1; p = (p_1..p_n) ∈ R^n; functionals L, R. FAUST Count Change Clusterer.


FAUST Oblique Analytics: X(X_1..X_n) ⊆ R^n, |X| = N; Classes = {C_1..C_K}; d = (d_1..d_n) with |d| = 1; p = (p_1..p_n) ∈ R^n; functionals L and R.
FAUST Count Change (CC) Clusterer: if DensThres is not reached, cut C at the PCCs (precipitous count changes) in L_{p,d} & C, with the next (p,d) taken from pdSet.
FAUST Top K Outliers (TKO): use D2NN = SqDist(x, X') = rank-2 of S_x for the TopKOutlier slider.
FAUST Piecewise Linear (PL) Classifier: y is C_k iff y ∈ LH_k ≡ {z | ll_{p,d,k} ≤ (z-p) o d ≤ hl_{p,d,k}} for every (p,d) ∈ pdSet. LH_k is the linear hull of class k; pdSet is a chosen set of (p,d) pairs, e.g., (DiagonalStartPt, Diagonal).
X o D is a central computation for FAUST; e.g., X o d is the only SPTS needed in the FAUST CC Clusterer and PL Classifier. For every x ∈ X, D^2(X,x) = (X-x) o (X-x) = X o X + x o x - 2 X o x. X o X is pre-computed once; x o x is then read off from X o X, leaving only X o x to compute. From it, the Rank_i PTR(x, ptr-to-Rank_i D^2(X,x)) SPTS and the Rank_i SD(x, Rank_i D^2(X,x)) valueTree (ordered descending on Rank_i D^2(X,x), i = 2..q) are constructed.
With X o X, -2 X o p and X o d pre-computed, the quantity X o X + p o p - 2 X o p - [X o d - p o d]^2 costs just 2 scalar adds, 1 multiplication, and 2 pTree adds. TKO then uses the SPTS R_{p,d}, which measures the square radial reach of each x ∈ X from the d-line through p: R_{p,d} = (X-p) o (X-p) - [(X-p) o d]^2. Geometrically, (x-p) o d = |x-p| cos θ, so the square radial reach of x is (x-p) o (x-p) - ((x-p) o d)^2.
If X is a high-value classification training set (e.g., Enron emails), what should be pre-computed? 1. column statistics (min, avg, max, std, ...); 2. X o X, and X o p with p = class avg/median; 3. X o d with d = interclass avg/median unit vector; 4. X o x, d^2(X,x), Rank_i d^2(X,x) for all x ∈ X, i = 2, 3, ...; 5. L_{p,d} and R_{p,d} for all p's and d's above.
FAUST Linear And Radial Classifier (LARC): y is C_k iff y ∈ LRH_k ≡ {z | ll_{p,d,k} ≤ (z-p) o d ≤ hl_{p,d,k} AND lr_{p,d,k} ≤ (z-p) o (z-p) - ((z-p) o d)^2 ≤ hr_{p,d,k}, for every (p,d) ∈ pdSet}, where L_{p,d} ≡ (X-p) o d with ll_{p,d,k} = min L_{p,d}&C_k, hl_{p,d,k} = max L_{p,d}&C_k; S_p ≡ (X-p) o (X-p) with ls_{p,k} = min S_p&C_k, hs_{p,k} = max S_p&C_k; and R_{p,d} ≡ S_p - L_{p,d}^2 with lr_{p,d,k} = min R_{p,d}&C_k, hr_{p,d,k} = max R_{p,d}&C_k.
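A minimal sketch of these central computations, assuming ordinary NumPy columns stand in for the SPTSs (the slides do this arithmetic on compressed bit slices instead); all function names here are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 16, size=(8, 4)).astype(float)   # the N x n table X

XoX = np.einsum('ij,ij->i', X, X)                    # X o X: one-time pre-computation

def sq_dist(x):
    """D^2(X, x) = X o X + x o x - 2 X o x, one column per query point x."""
    return XoX + x @ x - 2.0 * (X @ x)

def L(p, d):
    """Linear functional L_{p,d} = (X - p) o d = X o d - p o d."""
    return X @ d - p @ d

def R(p, d):
    """Square radial reach from the d-line through p:
    R_{p,d} = (X - p) o (X - p) - [(X - p) o d]^2."""
    S = XoX + p @ p - 2.0 * (X @ p)                  # S_p = (X - p) o (X - p)
    return S - L(p, d) ** 2

d = np.array([1.0, 0.0, 0.0, 0.0])                   # d must be a unit vector
p = X.mean(axis=0)                                   # e.g. a class average
print(L(p, d), R(p, d), sq_dist(X[0]), sep='\n')
```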

LARC on IRIS150. D_se; the x o D_se ranges per class (S, E, I):
y isa O if y o D ∈ (-∞,-184) ∪ (2725,∞)
y isa O or S(50) if y o D ∈ C_1,1 ≡ [-184,123]
y isa O or I(1) if y o D ∈ C_1,2 ≡ [381,590]
y isa O or E(50) or I(11) if y o D ∈ C_1,3 ≡ [590,1331]
y isa O or I(38) if y o D ∈ C_1,4 ≡ [1331,2046]
SRR(AVGs, d_se) on C_1,1 (all S):
y isa O if y ∈ C_1,1 AND SRR(AVGs, D_se) ∈ (154,∞)
y isa O or S(50) if y ∈ C_1,1 AND SRR(AVGs, D_se) ∈ [0,154]
SRR(AVGs, d_se) on C_1,2: only one such I.
SRR(AVGs, d_se) on C_1,3 (E and I, landmarks 7 and 143):
y isa O if y ∈ C_1,3 AND SRR ∈ (-∞,2) ∪ (392,∞)
y isa O or E(10) if y ∈ C_1,3 AND SRR ∈ [2,7)
y isa O or E(40) or I(10) if y ∈ C_1,3 AND SRR ∈ [7,137) = C_2,1
y isa O or I(1) if y ∈ C_1,3 AND SRR ∈ [137,143]
etc.
We use the radial steps to remove false positives from gaps and ends. We are effectively projecting onto a 2-dimensional range generated by the D-line and the D⊥ line (which measures the perpendicular radial reach from the D-line). In the D⊥ projections, we can attempt to cluster directions into "similar" clusters in some way and limit the domain of our projections to one of these clusters at a time, accommodating oval-shaped or elongated clusters and giving a better hull fit. E.g., in the Enron case the dimensions would be words that have about the same count, reducing false positives.
D_ei; x o D_ei on C_2,1 (E and I, landmarks -2 and 3):
y isa O if y o D ∈ (-∞,-2) ∪ (19,∞)
y isa O or I(8) if y o D ∈ [-2,1.4]
y isa O or E(40) or I(2) if y o D ∈ C_3,1 ≡ [1.4,19]
SRR(AVGe, d_ei) on C_3,1 (E and I, landmarks 8 and 106):
y isa O if y ∈ C_3,1 AND SRR(AVGe, D_ei) ∈ [0,2) ∪ (370,∞)
y isa O or E(4) if y ∈ C_3,1 AND SRR(AVGe, D_ei) ∈ [2,8)
y isa O or E(27) or I(2) if y ∈ C_3,1 AND SRR(AVGe, D_ei) ∈ [8,106)
y isa O or E(9) if y ∈ C_3,1 AND SRR(AVGe, D_ei) ∈ [106,370]
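A hedged sketch of the two-step rule used above: a linear interval test on (y-p) o D, then an SRR interval to prune the false positives left in the linear gap. The endpoints in the usage line are illustrative placeholders, not the slide's IRIS cut values:

```python
import numpy as np

def survives(y, p, D, lin_iv, srr_iv):
    """True iff y passes both the linear cut and the radial (SRR) cut."""
    L = (y - p) @ D                       # (y - p) o D
    if not (lin_iv[0] <= L <= lin_iv[1]):
        return False                      # 'y isa Other' by the linear cut
    srr = (y - p) @ (y - p) - L ** 2      # squared distance from D-line thru p
    return srr_iv[0] <= srr <= srr_iv[1]

y = np.array([5.1, 3.5, 1.4, 0.2])
p = np.zeros(4)
D = np.array([1.0, 0.0, 0.0, 0.0])
print(survives(y, p, D, lin_iv=(4.3, 5.8), srr_iv=(0.0, 154.0)))  # True
```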

LARC on IRIS150-2. We use the diagonals. Also we set a MinGapThreshold = 2, which means we stay 2 units away from any cut.
d = e_1 = 1000; the x o d limits per class (S, E, I):
y isa O if y o D ∈ (-∞,43) ∪ (79,∞)
y isa O or S(9) if y o D ∈ [43,47]
y isa O or S(41) or E(26) or I(7) if y o D ∈ (47,60) (y ∈ C_1,2)
y isa O or E(24) or I(32) if y o D ∈ [60,72] (y ∈ C_1,3)
y isa O or I(11) if y o D ∈ (72,79]
y isa O if y o D ∈ [43,47] & SRR ∈ (-∞,52) ∪ (60,∞)
y isa O if y o D ∈ [72,79] & SRR ∈ (-∞,49) ∪ (78,∞)
d = e_2 = 0100 on C_1,3, x o d limits (E, I): zero differentiation!
d = e_2 = 0100 on C_1,2, x o d limits (S, E, I):
y isa O if y o D ∈ (-∞,18) ∪ (46,∞)
y isa O or E(3) if y o D ∈ [18,23)
y isa O or E(13) or I(4) if y o D ∈ [23,28) (y ∈ C_2,1)
y isa O or S(13) or E(10) or I(3) if y o D ∈ [28,34) (y ∈ C_2,2)
y isa O or S(28) if y o D ∈ [34,46]
y isa O if y o D ∈ [18,23) & SRR ∈ [0,21)
y isa O if y o D ∈ [34,46] & SRR ∈ [0,32] ∪ [46,∞)
d = e_3 = 0010 on C_2,2, x o d limits (S, E, I):
y isa O if y o D ∈ (-∞,28) ∪ (33,∞)
y isa O or S(13) or E(10) or I(3) if y o D ∈ [28,33]
d = e_4 = 0001, x o d limits (E, I):
y isa O if y o D ∈ (-∞,1) ∪ (5,12) ∪ (24,∞)
y isa O or S(13) if y o D ∈ [1,5]
y isa O or E(9) if y o D ∈ [12,16)
y isa O or E(1) or I(3) if y o D ∈ [16,24)
y isa O if y o D ∈ [12,16) & SRR ∈ [0,208) ∪ (558,∞)
y isa O if y o D ∈ [16,24) & SRR ∈ [0,1198) ∪ (1199,1254) ∪ [1424,∞)
y isa O or E(1) if y o D ∈ [16,24) & SRR ∈ [1198,1199]
y isa O or I(3) if y o D ∈ [16,24) & SRR ∈ [1254,1424]
Finally, the SRR refinement on C_1,3 (y o D ∈ [60,72]):
y isa O if SRR ∈ [0,1.2) ∪ (799,∞)
y isa O or E(17) if SRR ∈ [1.2,20]
y isa O or E(7) or I(7) if SRR ∈ [20,66]
y isa O or I(25) if SRR ∈ [66,799]

LARC IRIS150, using each coordinate direction d = e_1..e_4 against each class average p = AvgS, AvgE, AvgI, with L = (X-p) o d. For each (d,p) pair the slide tabulates the L-range of each class (S&L, E&L, I&L) and the overlap counts that remain, e.g.:
d = e_1, p = AvgS: [8,20) holds E = 26, I = 5; 30 ambiguous, 5 errors.
d = e_4, p = AvgS: [11,16) holds E = 22, I = 16; 38 ambiguous, 16 errors.
d = e_3, p = AvgS: [37,55] isolates I = 34; [15,37) holds E = 18, I = 12.
d = e_2, p = AvgS and d = e_2, p = AvgE: almost no separation.
d = e_3, p = AvgE: [9,27] isolates I = 34; [-12,9) holds E = 32, I = 14.
d = e_4, p = AvgE: [1,5) holds E = 22, I = 16.
d = e_1, p = AvgI: [-8,4) holds E = 26, I = 11.
d = e_3, p = AvgI: [9,27] isolates I = 34; [-25,-4) holds E = 32, I = 14.
d = e_4, p = AvgI: same ranges as d = e_4, p = AvgE (E = 22, I = 16).
The class-average difference directions do better once the radial functional R(p,d,X) is added:
d = AvgE→AvgI, p = AvgE: [11,33] isolates I(36); E = 47, I = 12 remain overlapped; R(p,d,X) then peels off [12,17.5) as I(1).
d = AvgS→AvgI, p = AvgS: [17.5,42) holds (50,12); I(37) isolated; E = 45, I = 12 remain.
d = AvgS→AvgE, p = AvgS: R(p,d,X) gives [11,18) I(1), [18,42) (50,11), [42,64] 38; E = 39, I = 11 remain.

LARC on IRIS150, D_se (S, E, I, with L and H bounds per class).
y isa OTHER if y o D_se ∈ (-∞,495) ∪ (802,1061) ∪ (2725,∞)
y isa OTHER or S if y o D_se ∈ C_1,1 ≡ [495,802]
y isa OTHER or I if y o D_se ∈ C_1,2 ≡ [1061,1270]
y isa OTHER or E or I if y o D_se ∈ C_1,3 ≡ [1270,2010]
y isa OTHER or I if y o D_se ∈ C_1,4 ≡ [2010,2725]
C_1,3 (0 S, 49 E, 11 I), D_ei:
y isa O if y o D_ei ∈ (-∞,-117) ∪ (-3,∞)
y isa O or E or I if y o D_ei ∈ C_2,1 ≡ [-62,-44]
y isa O or I if y o D_ei ∈ C_2,2 ≡ [-44,-3]
C_2,1 (2 E, 4 I), D_ei:
y isa O if y o D_ei ∈ (-∞,420) ∪ (459,480) ∪ (501,∞)
y isa O or E if y o D_ei ∈ C_3,1 ≡ [420,459]
y isa O or I if y o D_ei ∈ C_3,2 ≡ [480,501]
Continue this on clusters with OTHER plus one class, so the hull fits tightly (reducing false positives), using diagonals? On C_1,1 the chain of successive cuts (a new diagonal D at each step; at each step y isa O outside the bracketed interval and y isa O|S inside it) runs:
C_1,1: [43,58] → C_2,3; C_2,3: [23,44] → C_3,3; C_3,3: [10,19] → C_4,1; C_4,1: [1,6] → C_5,1; C_5,1: [68,117] → C_6,1; C_6,1: [54,146] → C_7,1; C_7,1: [44,100] → C_8,1; C_8,1: [36,105] → C_9,1; C_9,1: [26,61] → C_a,1; C_a,1: [12,91] → C_b,1; C_b,1: [81,182] → C_c,1; C_c,1: [71,137] → C_d,1; C_d,1: [55,169] → C_e,1; C_e,1: [39,127] → C_f,1; C_f,1: [84,204] → C_g,1; C_g,1: [10,22] → C_h,1; C_h,1: [3,46] → C_i,1.
The amount of work yet to be done, even for only 4 attributes, is immense. For each D, we should fit boundaries for each class, not just one class. For each D, not only cut at min C o D and max C o D, but also limit the radial reach for each class (barrel analytics)? Note that limiting the radial reach limits all other directions (other than the D direction) in one step, and therefore by the same amount; i.e., it limits all directions assuming perfectly round clusters. Think about Enron: some words (columns) have high count and others have low count. A radial-reach threshold based on the highest count would admit many false positives. We can cluster directions (words) by count and limit radial reach differently for different clusters?? For 4 attributes, I count 77 diagonals × 3 classes = 231 cases. How many in the Enron case with 10,000 columns? Too many for sure!!
APPENDIX

Dot Product SPTS computation: X o D = Σ_{k=1..n} X_k D_k.
/* Calculate P_{XoD,i} after P_{XoD,i-1}; CarrySet = CAR_{i-1,i}, RawSet = RS_i */
INPUT: CAR_{i-1,i}, RS_i
ROUTINE: P_{XoD,i} = RS_i ⊕ CAR_{i-1,i}; CAR_{i,i+1} = RS_i & CAR_{i-1,i}
OUTPUT: P_{XoD,i}, CAR_{i,i+1}
The slide walks two small 2-column datasets (e.g., D = (3,3)) through this routine bit position by bit position. We have extended the Galois field GF(2) = {0,1} (XOR = add, AND = mult) to pTrees.
SPTS multiplication (note: pTree multiplication = &): X_1*X_2 = (2^1 p_{1,1} + 2^0 p_{1,0})(2^1 p_{2,1} + 2^0 p_{2,0}) = 2^2 p_{1,1}p_{2,1} + 2^1 (p_{1,1}p_{2,0} + p_{2,1}p_{1,0}) + 2^0 p_{1,0}p_{2,0}.
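A minimal runnable rendering of this idea, assuming boolean NumPy columns stand in for pTrees. add_slices generalizes the slide's half-adder step (XOR for the result slice, AND for the carry) to a full adder so multi-bit operands work; the helper names are mine:

```python
import numpy as np

def slices_of(col, width):
    """Decompose an integer column into boolean bit slices, LSB first."""
    return [((col >> i) & 1).astype(bool) for i in range(width)]

def value_of(slices):
    """Reassemble an integer column from LSB-first bit slices."""
    return sum(s.astype(int) << i for i, s in enumerate(slices))

def add_slices(a, b):
    """Add two bit-sliced nonnegative integer columns, slice by slice."""
    n = max(len(a), len(b))
    z = np.zeros_like((a + b)[0])
    a = list(a) + [z] * (n - len(a))
    b = list(b) + [z] * (n - len(b))
    carry, out = z, []
    for x, y in zip(a, b):                 # full adder per bit position
        out.append(x ^ y ^ carry)          # result slice: XOR
        carry = (x & y) | (carry & (x ^ y))  # carry slice: AND/majority
    return out + [carry]

x1 = np.array([1, 3, 2, 3])
x2 = np.array([1, 1, 2, 3])
print(value_of(add_slices(slices_of(x1, 2), slices_of(x2, 2))))  # [2 4 4 6]
```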

Rank_{N-1}(X o D) = Rank_2(X o D): the slide traces the RankK routine (next slide) on a small 2-column X with D = x_1, x_2 and x_3 in turn, reading off the Rank_2 values (e.g., 3, 3 and 5) and from them the -2 x o X terms (e.g., -6 and -10) needed below.
Example: FAUST Oblique: X o D is used in CCC, TKO, PLC and LARC, and (x-X) o (x-X) = -2 X o x + x o x + X o X is used in TKO. So in FAUST we need to construct lots of SPTSs of the type "X dotted with a fixed vector", a costly pTree calculation. (Note that X o X is costly too, but it is a one-time calculation, a pre-calculation. x o x is calculated for each individual x, but it is a scalar calculation and just a read-off of a row of X o X, once X o X is calculated.) Thus, we should optimize the living he__ out of the X o D calculation!!! The methods on the previous slide seem efficient. Is there a better method? Then, for TKO, we need to compute ranks:

pTree Rank(K) computation (Rank(N-1) gives the 2nd smallest value, which is very useful in outlier analysis):
RankKval = 0; p = K; c = 0; P = Pure1; /* n = bitwidth-1; the RankK points are returned as the resulting pTree, P */
For i = n down to 0 { c = Count(P & P_i); if (c >= p) { RankKval = RankKval + 2^i; P = P & P_i } else { p = p - c; P = P & P'_i } }; return RankKval, P.
Worked on X with bit slices P_4,3, P_4,2, P_4,1, P_4,0 and K = 7-1 = 6 (looking for the Rank-6, i.e. 6th highest, value, which here is also the 2nd lowest), crossing out the 0-positions of P at each step:
(n=3) c = Count(P & P_4,3) = 3 < 6, so p = 6-3 = 3 and P = P & P'_4,3 (masks off the highest 3, val ≥ 8);
(n=2) c = Count(P & P_4,2) = 3 >= 3, so P = P & P_4,2 (masks off the lowest values, val < 4);
(n=1) c = Count(P & P_4,1) = 2 < 3, so p = 3-2 = 1 and P = P & P'_4,1 (masks off the highest 2);
(n=0) c = Count(P & P_4,0) = 1 >= 1, so P = P & P_4,0.
Result: RankKval = 5, P = MapRankKPts, ListRankKPts = {2}. A second trace computes Rank_2(X o D) = 9 for a two-column D.
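A direct transcription of the routine into Python, assuming boolean NumPy bit slices; the algorithm is the slide's, only the data layout is mine:

```python
import numpy as np

def rank_k(P_slices, K, N):
    """Return (value, mask) of the K-th largest entry (K=1 is the max).
    P_slices[i] is the boolean bit slice for bit i of the column."""
    P = np.ones(N, dtype=bool)                   # Pure1
    p, val = K, 0
    for i in range(len(P_slices) - 1, -1, -1):   # high bit down to low bit
        c = int(np.count_nonzero(P & P_slices[i]))
        if c >= p:                               # K-th largest has a 1 here
            val += 1 << i
            P &= P_slices[i]
        else:                                    # it has a 0; drop the 1-rows
            p -= c
            P &= ~P_slices[i]
    return val, P

col = np.array([10, 2, 7, 7, 4, 9, 3])
slices = [((col >> i) & 1).astype(bool) for i in range(4)]
val, mask = rank_k(slices, K=6, N=len(col))      # 6th largest = 2nd smallest
print(val, np.flatnonzero(mask))                 # 3 [6]
```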

UDR: Univariate Distribution Revealer (on Spaeth). Applied to S, a column of numbers in bit-slice format (an SpTS), UDR produces the distribution tree of S, DT(S). depth(DT(S)) ≤ b ≡ BitWidth(S); for a node at depth h with node offset k, node_h,k has a pointer to the pTree {x ∈ S | F(x) ∈ [k·2^(b-h+1), (k+1)·2^(b-h+1))} together with its 1-count. The slide traces this on the 15-point Spaeth set Y (y1..yf) with a functional column f: the top slice p6 splits f's range [0,128) into [0,64) and [64,128) (with 1-counts such as 5/64 on [0,64)), p5 and p4 refine to 32- and 16-wide intervals, and so on down through p3 to 8-wide leaves, e.g. node_2,3 covering [96,128); the root at depth h = 0 has count 15.
Pre-compute and enter into the ToC all DT(Y_k), plus those for selected linear functionals (e.g., d = the main diagonals, ModeVector). Suggestion: in our pTree base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCts should be repeated everywhere (e.g., in every DT). The reason is that these OneCts help us select the pertinent pTrees to access, and in fact are often all we need to know about the pTree to get the answers we are after.
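A minimal sketch of the UDR descent, assuming plain NumPy booleans stand in for pTrees; the function, its return shape, and the sample column are hypothetical, and a real UDR would keep node masks as compressed pTrees with cached 1-counts:

```python
import numpy as np

def udr(col, width):
    """Return {depth h: counts of values in each 2^(width-h)-wide bin},
    splitting the range in half at each level via the next bit slice."""
    masks = [np.ones(len(col), dtype=bool)]      # root covers [0, 2^width)
    tree = {0: [int(len(col))]}
    for h in range(1, width + 1):
        bit = ((col >> (width - h)) & 1).astype(bool)
        nxt = []
        for m in masks:
            nxt += [m & ~bit, m & bit]           # low half, high half
        masks = nxt
        tree[h] = [int(m.sum()) for m in nxt]    # the node 1-counts
    return tree

col = np.array([1, 3, 2, 3, 6, 9, 15, 14, 15, 13, 10, 9, 11, 11, 7, 8])
print(udr(col, 4))  # depth 1 -> counts 6 and 10 in [0,8) and [8,16)
```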

So let us look at ways of doing the work to calculate X o D = Σ_{k=1..n} X_k*D_k. As we recall from the previous slides, the task is to ADD bit slices, giving a result bit slice and a set of carry bit slices to carry forward. I believe we add by successive XORs, and the carry set is the raw set with one 1-bit turned off iff the sum at that bit is a 1-bit. Or we can characterize the carry as the raw set minus the result (always carry forward a set of pTrees plus one negative one). We want a routine that constructs the result pTree from a positive set of pTrees plus a negative set always consisting of one pTree. The routine is: successive XORs across the positive set, then XOR with the negative-set pTree (because the successive XOR across the positive set gives us the odd values, and if you subtract one pTree, its 1-bits change odd to even and vice versa):
/* For P_{XoD,i} (after P_{XoD,i-1}): CarrySetPos = CSP_{i-1,i}, CarrySetNeg = CSN_{i-1,i}, RawSet = RS_i, CSP_{-1} = CSN_{-1} = ∅ */
INPUT: CSP_{i-1}, CSN_{i-1}, RS_i
ROUTINE: P_{XoD,i} = RS_i ⊕ CSP_{i-1,i} ⊕ CSN_{i-1,i}; CSN_{i,i+1} = CSN_{i-1,i} ⊕ P_{XoD,i}; CSP_{i,i+1} = CSP_{i-1,i} ⊕ RS_{i-1}
OUTPUT: P_{XoD,i}, CSN_{i,i+1}, CSP_{i,i+1}
The slide then traces P_{XoD,0} and the first carry sets on a small X o D example.
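The positive/negative carry-set routine above is offered tentatively on the slide, so rather than implement it as stated, here is a hedged cross-check in textbook form: fold the two-operand full adder from the earlier sketch across many operands, where XOR produces each result slice and AND/majority terms produce the carries. All names here are mine:

```python
import numpy as np
from functools import reduce

def add2(a, b):
    """Two-operand bit-sliced add (same full adder as in the earlier sketch)."""
    n = max(len(a), len(b))
    z = np.zeros_like((a + b)[0])
    a = list(a) + [z] * (n - len(a))
    b = list(b) + [z] * (n - len(b))
    carry, out = z, []
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out + [carry]

def add_many(slice_sets):
    """Sum several bit-sliced columns by folding the two-operand adder."""
    return reduce(add2, slice_sets)

cols = [np.array([3, 1, 2]), np.array([2, 2, 2]), np.array([1, 3, 2])]
sets = [[((c >> i) & 1).astype(bool) for i in range(2)] for c in cols]
total = add_many(sets)
print(sum(s.astype(int) << i for i, s in enumerate(total)))  # [6 6 6]
```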

X o D = Σ_{k=1..n} X_k*D_k. For a table X(X_1...X_n), the SPTS X_k*D_k is the column of numbers x_k*D_k, and X o D is the sum of those SPTSs. Since D is fixed,
X_k*D_k = D_k Σ_b 2^b p_{k,b} = D_k (2^B p_{k,B} + ... + 2^0 p_{k,0}) = (2^B D_{k,B} + ... + 2^0 D_{k,0})(2^B p_{k,B} + ... + 2^0 p_{k,0}),
which expands to
2^{2B} D_{k,B} p_{k,B}
+ 2^{2B-1} (D_{k,B} p_{k,B-1} + D_{k,B-1} p_{k,B})
+ 2^{2B-2} (D_{k,B} p_{k,B-2} + D_{k,B-1} p_{k,B-1} + D_{k,B-2} p_{k,B})
+ 2^{2B-3} (D_{k,B} p_{k,B-3} + D_{k,B-1} p_{k,B-2} + D_{k,B-2} p_{k,B-1} + D_{k,B-3} p_{k,B})
+ ...
+ 2^2 (D_{k,2} p_{k,0} + D_{k,1} p_{k,1} + D_{k,0} p_{k,2})
+ 2^1 (D_{k,1} p_{k,0} + D_{k,0} p_{k,1})
+ 2^0 D_{k,0} p_{k,0}.
Since each D_{k,j} is a known bit, every product D_{k,j} p_{k,b} is either the zero pTree or p_{k,b} itself. So the dot product involves just multi-operand pTree addition (no SPTSs and no multiplications). Summing over k = 1..n, the result needs slices q_N..q_0 with N on the order of 2B + roof(log_2 n) + 1. The slide works the B = 1, n = 2 case for D = (1,2) and D = (3,3), tracking q_0, q_1, q_2, q_3 and the carries at each level.
A carryTree is a valueTree (vTree), as is the rawTree at each level (rawTree = valueTree before the carry is included). In what form is it best to carry the carryTree over, for speediest processing? 1. As multiple pTrees added at the next level (since the pTrees at the next level are in that form and need to be added anyway)? 2. As an SPTS, s_1? (The next level's rawTree is an SPTS, s_2; then s_1 + s_2 gives q at the next level and the next carry.)
CCC Clusterer: if DT (and/or DUT) is not exceeded at C, partition C further by cutting at each gap and PCC in C o D.
An engineering shortcut trick here would be huge!!!
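Because every D_{k,j} is a known bit, the expansion above reduces X o D to shifted adds of X's bit slices. A minimal sketch (names are mine; integer accumulation stands in for the slide's slice-level carry bookkeeping):

```python
import numpy as np

def dot_from_slices(X_slices, D, B):
    """X o D from bit slices alone: X_slices[k][b] is the boolean slice of
    column k, bit b (b = 0..B); D is the fixed vector of small integers."""
    N = len(X_slices[0][0])
    acc = np.zeros(N, dtype=np.int64)
    for k, d in enumerate(D):
        for j in range(B + 1):
            if (d >> j) & 1:                     # D_{k,j} = 1, else zero pTree
                for b in range(B + 1):
                    acc += X_slices[k][b].astype(np.int64) << (j + b)
    return acc

X = np.array([[1, 3], [2, 2], [3, 1]])
B = 1
X_slices = [[((X[:, k] >> b) & 1).astype(bool) for b in range(B + 1)]
            for k in range(X.shape[1])]
print(dot_from_slices(X_slices, D=[3, 3], B=B))  # X o (3,3) = [12 12 12]
```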

Question: which primitives are needed and how do we compute them? X(X_1...X_n). D2NN yields a 1.a-type outlier detector (the top k objects, x, by dissimilarity from X-{x}), where D2NN(x) = min over X-{x} of the square distance. Expanding bitwise with a_{k,b} ≡ x_{k,b} - p_{k,b}:
(x-X) o (x-X) = Σ_{k=1..n} (x_k-X_k)(x_k-X_k) = Σ_{k=1..n} (Σ_{b=B..0} 2^b a_{k,b})^2
= Σ_k [ 2^{2B} a_{k,B} a_{k,B} + 2^{2B-1} (a_{k,B} a_{k,B-1} + a_{k,B-1} a_{k,B}) + 2^{2B-2} (a_{k,B} a_{k,B-2} + a_{k,B-1} a_{k,B-1} + a_{k,B-2} a_{k,B}) + ... ].
When x_{k,b} = 1, a_{k,b} = p'_{k,b}; when x_{k,b} = 0, a_{k,b} = -p_{k,b}. So D2NN is just multi-operand pTree multiplies/adds/subtracts, but each D2NN row (each x ∈ X) is a separate calculation. Should we pre-compute all products p_{k,i}p_{k,j}, p'_{k,i}p'_{k,j}, p_{k,i}p'_{k,j}?
ANOTHER TRY! X(X_1...X_n), RKN (Rank-K Neighbor), K = |X|-1, yields a 1.a outlier detector (top y by dissimilarity from X-{x}). Install in RKN each RankK(D2NN(x)) (a one-time construct, but for, e.g., one trillion x's? |X| = N = 1T is slow. Parallelization?)
For every x ∈ X, the square distance from x to its neighbors (near and far) is the column of numbers (vTree or SPTS)
d^2(x,X) = (x-X) o (x-X) = Σ_{k=1..n} |x_k-X_k|^2 = Σ_{k=1..n} (x_k^2 - 2 x_k X_k + X_k^2) = -2 x o X + x o x + X o X.
For the X o X term: Σ_{k=1..n} Σ_{i=B..0, j=B..0} 2^{i+j} p_{k,i} p_{k,j} = Σ_{i,j} 2^{i+j} Σ_k p_{k,i} p_{k,j}; so 1. pre-compute the pTree products within each k; 2. calculate this sum one time (independent of x); 3. for each x, pick x o x from X o X and add it, together with X o X, to -2 x o X.
Costs: -2 x o X is linear in |X| = N; x o x is ~zero; X o X is a one-time cost (amortized over x ∈ X it is 1/N) or pre-computed. The addition -2 x o X + x o x + X o X is linear in N, so overall the cost is linear in |X| = N. Data parallelization? No (we need all of X at each site). Code parallelization? Yes: after replicating X to all sites, each site creates and saves D2NN for its partition of X, then sends the requested number(s) (e.g., RKN(x)) back.
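A hedged sketch of the linear-cost identity, with X o X precomputed once and the second-smallest entry of d^2(x,X) serving as D2NN(x); NumPy stands in for the pTree arithmetic and the names are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 10, size=(6, 3)).astype(float)
XoX = np.einsum('ij,ij->i', X, X)          # one-time pre-computation

def d2nn(x):
    """Distance^2 from x to its nearest *other* row of X."""
    d2 = XoX + x @ x - 2.0 * (X @ x)       # the column d^2(x, X)
    return np.partition(d2, 1)[1]          # 2nd smallest (smallest is x itself)

scores = np.array([d2nn(x) for x in X])    # top-k of these = the outliers
print(np.argsort(-scores)[:2])             # the two most isolated rows
```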

LARC on IRIS150-3. Here we use the diagonals. d = e_1, p = AVGs, L = (X-p) o d, with R(p,d,X) as the radial check:
[43,49): S(16); [49,58): S(34), E(24), I(6); [58,70): E(26), I(32); [70,79]: I(12).
The only remaining overlap after the radial step is L ∈ [58,70) with R ∈ [792,1557] (E(26), I(5)). So with just d = e_1 we get good hulls using LARC.
Algorithm: while there exists an interval I_{p,d} containing more than one class, for the next (d,p) create L(p,d) ≡ X o d - p o d and R(p,d) ≡ X o X + p o p - 2 X o p - L^2, then:
1. From min and max of L on each class, create a linear boundary.
2. From min and max of R on each class, create a radial boundary.
3. Use R & C_k to create intra-C_k radial boundaries; H_k = ∩ {I | L_{p,d} includes C_k}.
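A hedged, one-pass simplification of this loop (the slide recurses only on intervals that remain ambiguous, whereas here every (p,d) in a user-supplied pdSet contributes both boundaries; all names are mine):

```python
import numpy as np

def build_hulls(X, y, pd_set):
    """For each class k and each (p, d), record the linear interval of
    L = (X-p) o d and the radial interval of R = S_p - L^2 over class k."""
    XoX = np.einsum('ij,ij->i', X, X)
    hulls = {}
    for p, d in pd_set:
        L = X @ d - p @ d
        R = XoX + p @ p - 2.0 * (X @ p) - L ** 2
        for k in np.unique(y):
            m = (y == k)
            hulls.setdefault(int(k), []).append(
                (p, d, L[m].min(), L[m].max(),   # linear boundary [ll, hl]
                 R[m].min(), R[m].max()))        # radial boundary [lr, hr]
    return hulls

def is_in_hull(z, bounds):
    """z is class k iff it satisfies every (L, R) interval pair of class k."""
    return all(ll <= z @ d - p @ d <= hl and
               lr <= (z - p) @ (z - p) - (z @ d - p @ d) ** 2 <= hr
               for p, d, ll, hl, lr, hr in bounds)

X = np.array([[1., 1.], [2., 1.], [8., 9.], [9., 8.]])
y = np.array([0, 0, 1, 1])
pd_set = [(X[y == 0].mean(0), np.array([1.0, 0.0]))]  # one (p, d) pair
h = build_hulls(X, y, pd_set)
print(is_in_hull(np.array([1.5, 1.0]), h[0]))         # True: inside class-0 hull
```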