Method-1: Find a furthest point from M, f0 = MaxPt[SpS((x-M)o(x-M))].

Do M round gap analysis using SpS((x-M)o(x-M)).
Do f0 round gap analysis using SpS((x-f0)o(x-f0)). d0 ≡ (M-f0)/|M-f0|. Do d0 linear gap analysis on SpS((x-f0)od0).
Find a furthest point from f0, f1 ∈ MaxPt[SpS((x-f0)o(x-f0))]. d1 ≡ (f1-f0)/|f1-f0|. Do d1 linear gap analysis on SpS((x-f0)od1). Do f1 round gap analysis on SpS((x-f1)o(x-f1)).

Let X' ≡ the space perpendicular to d1. The projection of x-f0 onto d1 is ((x-f0)od1)d1, so
x' ≡ (x-f0) - ((x-f0)od1)d1, and
x'ox' = [(x-f0) - ((x-f0)od1)d1] o [(x-f0) - ((x-f0)od1)d1] = (x-f0)o(x-f0) - ((x-f0)od1)²,
so that SpS(x'ox') = SpS[(x-f0)o(x-f0)] - SpS[((x-f0)od1)²].

Let f2 ∈ MaxPt[SpS(x'ox')] and d2 ≡ f2'/|f2'| = [(f2-f0) - ((f2-f0)od1)d1] / |f2'|. Then
d2od1 = [(f2-f0)od1 - ((f2-f0)od1)(d1od1)] / |f2'| = 0.
Defining x'' ≡ x' - (x'od2)d2:
x''od1 = x'od1 - (x'od2)(d2od1) = (x-f0)od1 - ((x-f0)od1)(d1od1) = 0, and
x''od2 = x'od2 - (x'od2)(d2od2) = 0.

In general, x(k) ≡ x(k-1) - (x(k-1)odk)dk, where fk ∈ MaxPt[SpS(x(k-1)ox(k-1))] and dk ≡ fk'/|fk'| (fk' being the residual of fk). {dk} forms an orthonormal basis. Do fk round gap analysis on SpS[(x-fk)o(x-fk)]. Do dk linear gap analysis on SpS[(x-f0)odk].

Linear gap analysis includes coordinate gap analysis: xo(a1,...,an), or (x-p)o(a1,...,an), or Σi=1..n ai(x-p)i² (squared length is a sub-case), or a truncated Taylor series, Σk=1..N bk (Σi=1..n ai(x-p)i^k); also square gradient-like lengths and the x(k-1) max-length itself, MaxVal(x(k-1)ox(k-1)) - each of these defining dk+1 ≡ (fk+1-f0)/|fk+1-f0| rather than dk+1 ≡ fk+1'/|fk+1'|.
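The deflation loop above is easy to prototype outside the pTree machinery. Below is a minimal NumPy sketch of it, assuming X is held as an ordinary (rows x dims) array rather than as pTrees; the helper name furthest and the bound k_max are illustrative, not from the slides, and the gap analyses themselves are omitted - the sketch only produces M, f0 and the orthonormal directions {dk}.

```python
import numpy as np

def furthest(V):
    """Index of the row of V with the largest squared length (MaxPt of SpS(vov))."""
    return int(np.argmax((V ** 2).sum(axis=1)))

def method1_directions(X, k_max=3):
    M = X.mean(axis=0)
    f0 = X[furthest(X - M)]                 # furthest point from M
    resid = X - f0                          # x - f0, deflated as we go
    dirs = []
    for _ in range(k_max):
        fk_resid = resid[furthest(resid)]   # fk' = residual of the furthest point
        nrm = np.linalg.norm(fk_resid)
        if nrm == 0:
            break
        dk = fk_resid / nrm                 # dk = fk'/|fk'|
        dirs.append(dk)
        resid -= np.outer(resid @ dk, dk)   # x(k) = x(k-1) - (x(k-1)odk)dk
    return M, f0, dirs                      # the dk are orthonormal by construction
```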

[Data slide: the 150 IRIS samples (SL, SW, PL, PW), 50 each of setosa ("set"), versicolor ("ver") and virginica ("vir"), each listed with its bit-sliced pTree columns, followed by 30 synthetic tuples (t1, t2, ..., tall and b1, b2, ..., ball) formed by pushing one or more coordinates to extreme low (t) or high (b) values - "circum-corner" outliers. Before adding the new tuples: MINS = 43 20 10 1, MAXS = 79 44 69 25, MEAN = 58 30 37 12; the mean is the same after the additions.]

Summarizing, the methodology is to:
1. Choose a point, f0 (chosen for high outlier potential? e.g., furthest from the mean, M?).
2. Do f0-round-gap outlier analysis (plus subcluster analysis?).
3. Let f1 be such that no x is further away from f0 (in some direction) than f1 (so that all d1 dot products are ≥ 0).
4. Do f1-round-gap outlier analysis (plus subcluster analysis?).
5. Do d1-linear-gap analysis, where d1 ≡ (f0-f1)/|f0-f1|.
6. Let f2 be such that no x is further away (in some direction) from the d1-line than f2.
7. Do f2-round-gap analysis.
8. Do d2-linear-gap analysis, d2 ≡ [(f0-f2) - ((f0-f2)od1)d1]/length, ...

Run log on IRIS plus the added tuples (each trace line is gap-flag, projection value, sample):
M RndGp>4: 1 53 b13 | 0 58 t123 | 0 59 b234 | 0 59 tal | 0 60 b134 | 1 61 b123 | 0 67 ball

DISTANCES   t123   b234    tal   b134   b123
t123        0.00 106.48  12.00 111.32 118.36
b234      106.48   0.00 110.24  43.86  42.52
tal        12.00 110.24   0.00 114.93 118.97
b134      111.32  43.86 114.93   0.00  41.04
b123      118.36  42.52 118.97  41.04   0.00
All outliers!

f0=t123 RnGp>4: 1 0 t123 | 0 25 t13 | 1 28 t134 | 0 34 set42 ... 1 103 b23 | 0 108 b13
f0=b23 RnGp>4: 1 0 b23 | 0 30 b3 ... 1 84 t34 | 0 95 t23 | 0 96 t234
f0=b124 RnGp>4: 1 0 b124 | 0 28 b12 | 0 30 b14 | 1 32 b24 | 0 41 vir10 ... 1 75 t24 | 1 81 t1 | 1 86 t14 | 1 93 t12 | 0 98 t124

DISTANCES   b12   b14   b24
b12        0.00 41.04 42.52
b14       41.04  0.00 43.86
b24       42.52 43.86  0.00
All are outliers again!

f0=b34 RnGp>4: 1 0 b34 | 0 26 vir1 ... 1 66 vir39 | 0 72 set24 | 1 83 t3 | 0 88 t34
SubClust-1: f0=b2 RnGp>4: 1 0 b2 | 0 28 ver36.  f0=b3 RnGp>4: 1 0 b3 | 0 23 vir8 ... 1 54 b1 | 0 62 vir39.  f0=b1 RnGp>4: 1 0 b1 | 0 23 ver1.  f0=t24 RnGp>4: 1 0 t24 | 1 12 t2 | 0 20 ver13.  f1=ver49 RdGp>4: none.  f1=ver49 LnGp>4: none.  f0=ver19 RnGp>4: none.  f0=ver19 LinGp>4: none.
SubClust-2: f0=t3 RnGp>4: none.  f0=t3 LinGap>4: 1 0 t3 | 0 12 t34.  f0=t34 LinGap>4: 1 0 t34 | 0 13 set36.  f0=set16 LnGp>4: none.  f1=set42 RdGp>4: none.  f1=set42 LnGp>4: none.

Observe that SubClust-2 consists precisely of the 50 setosa iris samples! Likely the f2, f3 and f4 analyses will find nothing further.
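A hedged sketch of one round-gap pass (steps 2, 4 and 7 above), again in plain NumPy rather than pTrees; the gap threshold of 4 matches the RnGp>4 traces, the function name is hypothetical, and sorting stands in for the pTree gap search used in the slides.

```python
import numpy as np

def round_gap_subclusters(X, f0, gap=4.0):
    """Split X into 'rings' around f0 wherever consecutive radii differ by > gap."""
    r = np.sqrt(((X - f0) ** 2).sum(axis=1))   # radii from f0
    order = np.argsort(r)
    cuts = np.where(np.diff(r[order]) > gap)[0]
    bounds = np.r_[0, cuts + 1, len(r)]
    # singleton pieces are outlier candidates; larger pieces are subclusters
    return [order[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
```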

Method-2: f and g are opposite corners of the circumscribing box:
f = MaxVectorX ≡ (maxXx1, ..., maxXxn); g ≡ MinVectorX ≡ (minXx1, ..., minXxn).
Then sequence through the other opposite corner pairs (the d's are orthonormal iff the box is a cube).
Advantages? No calculation is needed to get f and g. Calculate SpS((x-f|g)o(x-f|g)) for round-gap analysis and SpS(do(x-f|g)) for linear gap analysis (if that is deemed productive).

Run log (trace lines are gap-flag, projection value, sample):
f=MiVector RdGp>4: 1 10 tal | 1 16 t123 | 0 25 t124 ... 1 96 b23 | 0 101 b124 | 0 101 b13 | 1 103 b234 | 1 108 b134 | 1 113 b123 | 0 119 ball
g=MaxVector RdGp>4: 1 92 t23 | 0 97 t234 | 0 97 t12 | 0 101 t124 | 0 102 t13 | 0 105 t134

DISTANCES  t234   t12  t124   t13  t134
t234       0.00 53.04 51.66 47.04 45.49
t12       53.04  0.00 12.00 43.01 44.65
t124      51.66 12.00  0.00 44.65 43.01
t13       47.04 43.01 44.65  0.00 12.00
t134      45.49 44.65 43.01 12.00  0.00

d=(f-g)/len LnGp>4: 1 25 t23 | 0 30 set14
f=(mx1,mn2,mn3,mn4) SubC-2 RdGp>4: 1 42 t3 | 0 47 set37 ... 1 54 set14 | 0 82 t14 | 0 83 t1
f=(mx1,mn2,mn3,mn4) SubC-1: 1 77 vir18 | 1 83 b24 | 1 89 b3 | 0 101 b23
g=(mx1,mx2,mx3,mn4) SubC-1: 0 36 vir32 | 1 36 vir18 | 0 42 vir6 ... 1 73 ver44 | 0 78 t2.  SubC-2 RdGp>4: none.
g=(mn1,mx2,mx3,mx4) SubC-1: 1 48 b34 | 0 57 vir1.  SubC-2 RdGp>4: none.
f=(mn1,mn2,mn3,mx4) SubC-1: none.  SubC-2 RdGp>4: none.
f=(mn1,mx2,mn3,mn4) SubC-1: none.
f=(mn1,mn2,mx3,mn4) SubC-1: 1 74 b4 | 1 80 vir39 | 1 87 b1 | 1 95 b14 | 0 100 b12.  SubC-2 RdGp>4: none.
g=(mx1,mn2,mx3,mx4) SubC-1: none.  RdGp>4: 32 ver19 ... 1 77 vir39 | 1 82 b2 | 0 87 set21 ... 97 t34
g=(vmx1,vmx2,vmn3,vmx4) SubC-1: 1 56 b4 | 0 62 ver16 ... 1 78 vir19 | 0 83 t24.  SubC-2 RdGp>4: none.
f=(MnV1,mx2,mn3,mn4) RdGp>4: none.
x4Gp>4 SubCl-1: 1 0 t4 | 0 10 ver18
LnG>4 SubC-1: 0 53 ver49 | 0 54 ver11 | 0 54 ver44 | 1 54 ver8 | 0 59 ver32

DISTANCES ver49 ver11 ver44 ver8
ver49       0.0   7.2   3.9  3.9
ver11       7.2   0.0   3.6  4.6
ver44       3.9   3.6   0.0  1.4
ver8        3.9   4.6   1.4  0.0
None are separated by 4 from all the others.

All other LnGp>4 passes on SubClus-1 and SubClus-2 find none, except LnGp>4 SubClus-2: 1 41 set19 | 0 96 t34.

This finds all 30 added outliers (but they were added as "circum-corners"), plus 4 virginica, 1 setosa and 1 versicolor outliers. In fact, Sub-Cluster-2 consists precisely of the remaining 49 setosa iris samples.

In Method-2 we used, as our projection lines, the diagonals of the circumscribing coordinate rectangle. In Method-3.1 we used a circumscribing rectangle in which the corners are actual points from X and the diagonals are diameters.

DEFINITIONS: Given an aggregate function A: 2^R → R (e.g., max, min, median, mean) and V ⊆ Rn, with Vk ≡ projk(V) ⊆ R1 for k=1..n, define AVector(V) ≡ (A(V1), A(V2), ..., A(Vn)) and call it "the A-Vector or the Vector of As" (e.g., MinVector, MaxVector, MedianVector, MeanVector). Each of the first three is actually a RankVector for the right choice of rank: MinVector(V) = RankVector1(V), MaxVector(V) = RankVector|V|(V), MedianVector(V) = RankVector|V|/2(V), where, as is customary, if |V| is even, Rank|V|/2 ≡ (Rank(|V|-1)/2 + Rank(|V|+1)/2)/2. Other [non-rank] vectors include SumVector, StdVector and DiameterVector.

In the previous Method-2 example, I just picked some diagonals. Ideally we should sequence through the main diagonals first and, after that, possibly the sub-main diagonals, then the sub-sub-main diagonals, etc. What are these? Let b' be the bit complement of b. The [4] main 3D diagonals run from corner b1b2b3 to corner b1'b2'b3' and can be sequenced by b1=0 and b2b3 = 00, 01, 10, 11. The [8] main 4D diagonals: b1=0 and b2b3b4 = 000, 001, 010, 011, 100, 101, 110, 111, etc.

Next, we redo Method-2 using the eight main diagonals in the order given (doing round-gap analysis first, with fk,1=0 and gk,1=1 and the other bits sequencing identically for both fk and gk through k = 000, 001, 010, 011, 100, 101, 110, 111), then linear analysis with dk ≡ (fk-gk)/|fk-gk|. The advantage [may be] that fk and gk are already known (no SpS has to be built and analyzed to get them) and the dk's are [close to] orthogonal. Next we test this revision of Method-2 (called 2.1) against Method-3.1 to see if orienting the rectangle to "fit" X (all corners fk, gk are from X) is worth the extra work (better accuracy? Clearly it is slower processing!).
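For concreteness, here is a small Python sketch of that sequencing: it enumerates the 2^(n-1) main-diagonal corner pairs of the circumscribing box (b1=0 for f, all bits complemented for g) and yields the unit diagonal d for each. The names mins/maxs stand for MinVectorX/MaxVectorX and are assumptions of the sketch, not slide notation.

```python
import itertools
import numpy as np

def main_diagonal_pairs(mins, maxs):
    """Yield (f, g, d) for each main diagonal of the box [mins, maxs]."""
    mins, maxs = np.asarray(mins, float), np.asarray(maxs, float)
    n = len(mins)
    for bits in itertools.product((0, 1), repeat=n - 1):
        b = np.array((0,) + bits)              # b1 = 0; b2..bn sequence 00..0 to 11..1
        f = np.where(b == 0, mins, maxs)       # corner b1 b2 ... bn
        g = np.where(b == 0, maxs, mins)       # opposite corner b1' b2' ... bn'
        d = (g - f) / np.linalg.norm(g - f)    # unit diagonal direction
        yield f, g, d
```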

Method-3: cluster X ⊆ Rn. Calculate M = MeanVector(X) directly, using only the residualized 1-counts of the basic pTrees of X. (And, by the way, use residualized STD calculations to guide the choice of good gap-width thresholds, which define what an outlier is going to be and also determine when we divide into sub-clusters.)

Pick f1 ∈ MxPt(SpS[(M-x)o(M-x)]). d1 ≡ (M-f1)/|M-f1|. If d1k ≠ 0, Gram-Schmidt {d1, e1=(1,0,...,0), ..., ek-1, ek+1, ..., en=(0,...,0,1)}, giving an orthonormal basis {d1, ..., dn}. Assuming k=1:
d2 ≡ (e2 - (e2od1)d1) / |e2 - (e2od1)d1|
d3 ≡ (e3 - (e3od1)d1 - (e3od2)d2) / |e3 - (e3od1)d1 - (e3od2)d2|
...
dh ≡ (eh - (ehod1)d1 - (ehod2)d2 - ... - (ehodh-1)dh-1) / |eh - (ehod1)d1 - (ehod2)d2 - ... - (ehodh-1)dh-1|

Theorem: for a fixed point M, MxPt[SpS((M-x)od)] = MnPt[SpS(xod)]. Since (M-x)od = Mod - xod, the projected values {(M-x)od | x∈X} are just the reflection of {xod | x∈X} shifted by Mod; therefore the extreme values are generated at the same point(s). So we can always use xod in place of (M-x)od when calculating MxPt (and MnPt).

Repick f1 ∈ MnPt[SpS(xod1)]. Pick g1 ∈ MxPt[SpS(xod1)].
Pick f2 ∈ MnPt[SpS(xod2)]. Pick g2 ∈ MxPt[SpS(xod2)].
...
Pick fh ∈ MnPt[SpS(xodh)]. Pick gh ∈ MxPt[SpS(xodh)].
Do some combination of round gap analysis using the f's and g's and linear gap analysis with the d's (possibly only the round gap analysis?).

Notes on implementation speed: for a fixed point p in Rn (e.g., p=M, p=fk or p=gk),
(p-x)o(p-x) = pop + xox - 2xop = pop + Σk=1..n xk² + Σk=1..n (-2pk)xk.
Since loading is the most expensive step in our logical operations (AND/OR/XOR/...), there should be ways to, e.g., load x once and then use it for nearly all the binary operations above (rather than reloading it for each one individually).
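A compact sketch of this Gram-Schmidt construction, assuming plain NumPy vectors: it starts from d1, folds in the standard basis vectors e_h one at a time, and silently skips the one e_k that is dependent on the vectors already collected.

```python
import numpy as np

def gram_schmidt_from(d1, eps=1e-12):
    """Orthonormal basis {d1, d2, ..., dn} built from d1 and the standard basis."""
    n = len(d1)
    basis = [d1 / np.linalg.norm(d1)]
    for h in range(n):
        e = np.zeros(n)
        e[h] = 1.0
        v = e - sum((e @ d) * d for d in basis)   # subtract projections onto basis
        nrm = np.linalg.norm(v)
        if nrm > eps:                             # skip the dependent e_k
            basis.append(v / nrm)
        if len(basis) == n:
            break
    return np.array(basis)                        # rows are d1, ..., dn
```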

In this first attempt with Method-3, I will use SpS((M-x)o(M-x)) to find f1, then do all the linear gap analyses.

f1=ball g1=tall LnGp>4: 1 -137 ball | 0 -126 b123 | 0 -124 b134 | 1 -122 b234 | 0 -112 b13 ... 1 -29 t13 | 1 -24 t134 | 1 -18 t123 | 1 -13 tal

DISTANCES b123 b134 b234
b123       0.0 41.0 42.5
b134      41.0  0.0 43.9
b234      42.5 43.9  0.0

f2=vir11 g2=set16 Ln>4: none
f3=t34 g3=vir18 Ln>4: none
f4=t4 g4=b4 Ln>4: 1 24 vir1 | 0 39 b4 | 0 39 b14
f4=t4 g4=vir1 Ln>4: none

This ends the process. We found only added anomalies, and missed t34, t14, t4, t1, t3, b1, b3.

[Diagram: the data cloud, with Method-2's f1=MinVector and g1=MaxVector corners and Method-3's f1 and g1 marked.]

Further passes:
f1=b13 g1=b2 LnGp>4: none
f2=t2 g2=b2 LnGp>4: 1 21 set16 | 0 26 b2
f2=t2 g2=t234 Ln>4: 0 5 t23 | 0 5 t234 | 0 6 t12 | 0 6 t24 | 0 6 t124 | 1 6 t2 | 0 21 ver11

DISTANCES  t23 t234  t12  t24 t124   t2
t23        0.0 12.0 51.7 37.0 53.0 35.0
t234      12.0  0.0 53.0 35.0 51.7 37.0
t12       51.7 53.0  0.0 39.8 12.0 38.0
t24       37.0 35.0 39.8  0.0 38.0 12.0
t124      53.0 51.7 12.0 38.0  0.0 39.8
t2        35.0 37.0 38.0 12.0 39.8  0.0

f2=vir11 g2=b23 Ln>4: 1 43 b12 | 0 50 b34 | 0 51 b124 | 0 51 b23 | 0 52 t13 | 0 53 b13

DISTANCES  b34 b124  b23   t13   b13
b34        0.0 61.4 41.0  91.2  42.5
b124      61.4  0.0 60.5  88.4  59.4
b23       41.0 60.5  0.0  91.8  43.9
t13       91.2 88.4 91.8   0.0 104.8
b13       42.5 59.4 43.9 104.8   0.0

f2=vir11 g2=b12 Ln>4: 1 45 set16 | 0 61 b24 | 0 61 b2 | 0 61 b12

DISTANCES  b24   b2  b12
b24        0.0 28.0 42.5
b2        28.0  0.0 32.0
b12       42.5 32.0  0.0

Method-3.1 start: f1 ∈ MaxPt(SpS((M-x)o(M-x))); round gaps first, then linear gaps.

f1=ball RnGp>4: 1 0 ball | 0 28 b123 ... 1 73 t4 | 0 78 vir39 ... 1 98 t34 | 0 103 t12 | 0 104 t23 | 0 107 t124 | 1 108 t234 | 0 113 t13 | 1 116 t134 | 0 122 t123 | 0 125 tal  →  SubClus1, SubClus2

DISTANCES  t12  t23 t124 t234
t12        0.0 51.7 12.0 53.0
t23       51.7  0.0 53.0 12.0
t124      12.0 53.0  0.0 51.7
t234      53.0 12.0 51.7  0.0

SubClus1 traces:
f1=b123 Rn>4: 1 0 b123 | 0 30 b13 | 0 30 vir32 | 0 30 vir18 | 1 32 b23 | 0 37 vir6

DISTANCES  b13 vir32 vir18  b23
b13        0.0  22.5  22.4 43.9
vir32     22.5   0.0   4.1 35.3
vir18     22.4   4.1   0.0 33.4
b23       43.9  35.3  33.4  0.0

f1=b134 Rn>4: 1 0 b134 | 0 24 vir19
f1=b234 Rn>4: 1 0 b234 | 1 30 b34 | 0 37 vir10
f1=b124 Rn>4: 1 0 b124 | 0 28 b12 | 0 30 b14 | 1 32 b24 | 0 41 b1 ... 1 59 t4 | 0 68 b3

DISTANCES b124  b12  b14
b124       0.0 28.0 30.0
b12       28.0  0.0 41.0
b14       30.0 41.0  0.0

f1=vir19 Rn>4: 1 44 t4 | 0 52 b2.  g1=b2 Rn>4: 1 0 t4 | 0 28 ver36.
f2=ver13 Rn>4: 1 0 ver13 | 0 5 ver43.  g2=vir10 Rn>4: 1 0 vir10 | 0 6 vir44.
f4=b1 Rn>4: 1 0 b1 | 0 23 ver1.  g4=b4 Rn>4: 1 0 b4 | 0 21 vir15.
SubClus1 now has 91 samples, only versicolor and virginica.

SubClus2 traces:
f1=t14 Rn>4: 0 0 t1 | 1 0 t14 | 0 30 ver8 ... 1 47 set15 | 0 52 t3 | 0 52 t34
f1=set23 Rn>4: 1 17 vir39 | 0 23 ver49 | 0 26 ver8 | 0 27 ver44 | 1 30 ver11 | 0 43 t24 | 0 43 t2

DISTANCES ver49 ver8 ver44 ver11
ver49       0.0  3.9   3.9   7.1
ver8        3.9  0.0   1.4   4.7
ver44       3.9  1.4   0.0   3.7
ver11       7.1  4.7   3.7   0.0
Almost outliers! They form SubCluster2.2. Which type? Must classify.

SbCl_2.1 g1=vir39 Rn>4: 1 0 vir39 | 0 7 set21.  g1=set19 Rn>4: none.  f2=set42 Rn>4: 1 0 set42 | 0 6 set9.  f2=set9 Rn>4: none.  g2=set16 Rn>4: none.  f3=set16 Rn>4: none.  g3=set9 Rn>4: none.  f4 Rn>4: none.  g4 Rn>4: none.  All LnG>4 passes: none.
Note: what remains in SubClus2.1 is exactly the 50 setosa. But we wouldn't know that, so we continue to look for outliers and subclusters.

Finally, we would classify within SubCluster1 using the means of another training set (with FAUST Classify). We would also classify SubCluster2.1 and SubCluster2.2, but we know we would find SubCluster2.1 to be all setosa and SubCluster2.2 to be all versicolor (as we did before). In SubCluster1 we would separate versicolor from virginica perfectly (as we did before).

We could FAUST Classify each outlier (if so desired) to find out which class they are outliers from. However, what about the rogue outliers I added? What would we expect? They are not represented in the training set, so what would happen to them? My thinking: the original 150 are real iris samples, so we should not really do the outlier analysis and subsequent classification on them. We already know (assuming the "other training set" has the same means as these 150 do) that we can separate setosa, versicolor and virginica perfectly using FAUST Classify.

If this is typical (though concluding from one example is definitely "over-fitting"), then we have to conclude that Mark's round gap analysis is more productive than linear dot-product projection gap analysis!

On cost: in 3.1, I computed SpS((M-x)o(M-x)) for f1 (expensive? grab any point? a corner point?), then computed SpS((x-f1)o(x-f1)) for f1-round-gap analysis, then computed SpS(xod1) to get g1 (and for d1 linear gap analysis). Too expensive, since the gk-round-gap analysis and the linear analysis contributed very little? But we need it to get f2, etc. Are there other, cheaper ways to get a good f2? We would also need SpS((x-g1)o(x-g1)) for g1-round-gap analysis (too expensive!).

2.1 start  f1=MnVec RnGp>4 none Meth-2.1 on IRIS: f and g are opposite corners of the X-circumscribing box: f=MinVecX≡(minXx1..minXxn) g1=MxVec RnGp>4 0 7 vir18... 1 47 ver30 0 53 ver49.. 0 74 set14 g≡MaxVecX≡(maxXx1..maxXxn), d≡(g-f)/|g-f| Sub Clus1 Sequence thru main diagonal pairs, {f, g} lexicographically. For each, create d. Sub Clus2 2.1.a Do SpS((x-f)o(x-f)) round gap analysis 2.1.b Do SpS((x-g)o(x-g)) round gap analysis. SubClus1 Lin>4 none SubCluster2 2.1.c Do SpS((xod)) linear gap analysis. Notes: No calculation is required to find f and g (assuming MaxVecX and MinVecX have been calculated and residualized when pTreeSetX was captured.) 2.1.c (and 2.1.b?) may be unproductive in finding new subclusters/anomalies (either because 2.1.a finds almost all or because 2.1.b and/or 2.1.c find the same ones) and could be skipped (very likely if the dimension is high, since the main diagonal corners are typically far from X, in a high dimensional vector space and thus the radii of a round gap is large and large radii round gaps are nearly linear, suggesting 2.1.a will find all the subclusters that 2.1.b and 2.1.c would find. f2=0001 RnGp>4 none This ends SubClus2 = 47 setosa only g2=1110 RnGp>4 none Lin>4 none f1=0000 RnGp>4 none g1=1111 RnGp>4 none Lin>4 none f3=0010 RnGp>4 none f2=0001 RnGp>4 none g2=1110 RnGp>4 none Lin>4 none g3=1101 RnGp>4 none Lin>4 none f3=0010 RnGp>4 none g3=1101 RnGp>4 none Lin>4 none f4=0011 RnGp>4 none f4=0011 RnGp>4 none g4=1100 RnGp>4 none Lin>4 none g4=1100 RnGp>4 none f5=0100 RnGp>4 none g5=1011 RnGp>4 none Lin>4 none Lin>4 none f6=0101 RnGp>4 1 19 set26 0 28 ver49 0 31 set42 0 31 ver8 0 32 set36 0 32 ver44 1 35 ver11 0 41 ver13 f5=0100 RnGp>4 none ver49 set42 ver8 set36 ver44 ver11 0.0 19.8 3.9 21.3 3.9 7.2 19.8 0.0 21.6 10.4 21.8 23.8 3.9 21.6 0.0 23.9 1.4 4.6 21.3 10.4 23.9 0.0 24.2 27.1 3.9 21.8 1.4 24.2 0.0 3.6 7.2 23.8 4.6 27.1 3.6 0.0 ver49 ver8 ver44 ver11 Subc2.1 g5=1011 RnGp>4 none Lin>4 none f6=0101 RnGp>4 none g6=1010 RnGp>4 none g6=1010 RnGp>4 none Lin>4 none Lin>4 none Final Notes: Clearly 2.1.b is very productive in this example! Without it we would not have separated setosa from versicolor+virginica! But 2.1.c was unproductive. This suggest that it is productive to calculate 2.1.a and 2.1.b but having done that 2.1.c will probably not be productive. Next we consider doing only 2.1.c to see if it is as productive as 2.1.a + 2.1.b. f7=0110 RnGp>4 none f7=0110 RnGp>4 1 28 ver13 0 33 vir49 g7=1001 RnGp>4 none Lin>4 none g7=1001 RnGp>4 none Lin>4 none f8=0111 RnGp>4 none f8=0111 RnGp>4 none g8=1000 RnGp>4 none g8=1000 RnGp>4 none Lin>4 none Lin>4 none This ends SubClus1 = 95 ver and vir samples only

2.1.c start f0000=VectoMn Meth-2.1c on IRIS: f and g are opposite corners of the X-circumscribing box: f=MinVecX≡(minXx1..minXxn) g1111=VectoMx LinGp>4 0 38 set14... 1 60 ver11 0 65 ver30... 0 106 vir18 Looks like the same split as before! g≡MaxVecX≡(maxXx1..maxXxn), d≡(g-f)/|g-f| Sub Clus1 Sequence thru main diagonal pairs, {f, g} lexicographically. For each create d. Sub Clus2 2.1.c Do SpS((xod)) linear gap analysis. SubClus1 LinGap>4 Class Means: Mset 50.06 34.18 14.64 2.44 Mver 59.36 27.7 42.6 13.26 Mvir 65.88 29.74 54.92 20.26 f0001 g1110 none SL SW PL PW dMset dMver dMvir FAUSTclass name 49 25 45 17 34.9 11.6 20.4 ver vir7 62 28 48 18 39.2 7.7 8.4 ver vir27 56 28 49 20 39.5 9.9 11.7 ver vir22 70 32 47 14 39.8 12.3 11.1 vir ver1 68 28 48 14 40.1 10.2 9.7 vir ver27 60 22 50 15 40.7 9.5 12.1 ver vir20 60 27 51 16 40.7 8.9 8.7 vir ver34 57 25 50 20 41.1 10.6 11.2 ver vir14 69 31 49 15 41.3 12.2 8.6 vir ver3 67 30 50 17 42.0 11.5 6.0 vir ver28 So FAUST Classifier (midpoint of means version) miss-classifies 5 of the veriscolor and 5 of the virginica of SubCluster-2 (and it miss-classifies vir39 as veriscolor). So it is 93% accurate overall (100% on setosa, 89% on versicolor and 90% on virginica). f0010 g1101 none f0011 g1100 0 12 ver11 0 14 ver44 0 14 ver8 1 17 ver49 0 26 set42... 0 42 set15 SubClus1.1 ver11 ver44 ver8 ver49 Mset Mver Mvir 0.0 3.6 4.6 7.2 25.9 14.7 29.1 3.6 0.0 1.4 3.9 22.8 14.6 29.7 4.6 1.4 0.0 3.9 22.3 15.0 30.1 7.2 3.9 3.9 0.0 19.9 15.5 30.8 25.9 22.8 22.3 19.9 0.0 32.1 47.0 = Mset 14.7 14.6 15.0 15.5 32.1 0.0 15.7 = Mver 29.1 29.7 30.1 30.8 47.0 15.7 0.0 = Mvir Class means show SubClus1.1  ver. None are outliers! SubClus1.2 SubClus1.2 LnGp>4 SubClus2 LinGap>4 f0100 g1011 1 27 set21 0 37 vir39 f0001 g1110 none f0010 g1101 none f0011 g1100 f0101 g1010 none f0100 g1011 none f0101 g1010 none f0110 g1001 none f0110 g1001 none So 2.1.c is as good as the combo of 2.1.a and 2.1.b (projection on d appears to be as accurate as the combination of square length of f and of g). This is probably because the round gaps (centered at the corners) are nearly linear by the time they get to the set X itself. To compare the time costs, we note: f0111 g1000 none f0111 g1000 none SubClus2 is the 46 remaining verisolor and the 49 remaining virginica SubClus1.2 is exactly the 50 setosa The combination of 2.1.a and 2.1.b, (p-x)o(p-x)= pop +xox -2xop= pop + k=1..nxk2 + k=1..n(-2pk)xk has n multiplications in the second term, n scalar multiplications and n additions in the third term. For both p=f and p=g, then, it takes 2n multiplications, 2n scalar multiplications and 2n additions. For 2.1.c, xod = k=1..n(dk)xk involves n scalar multiplications and n additions. So it appears to be much cheaper (timewise)

Thin interval finder on the fM line, using the scalar pTreeSet PTreeSet(xofM) (the pTree slices of these projection lengths), looking for Width=2⁴, Count=1 thin intervals (W16_C1_TIs).

X   x1  x2
z1   1   1
z2   3   1
z3   2   2
z4   3   3
z5   6   2
z6   9   3
z7  15   1
z8  14   2
z9  15   3
za  13   4
zb  10   9
zc  11  10
zd   9  11
ze  11  11
zf   7   8
M is the mean. The xofM values (as listed on the slide): 11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83, with their bit slices p6..p0 and complements p6'..p0'.

W=2⁴, C=1, [000 0000, 000 1111] = [0,16): z1ofM=11 is 5 units from 16, so z1 is not declared an anomaly.
W=2⁴, C=1, [010 0000, 010 1111] = [32,48): z4ofM=34 is within 2 of 32, so z4 is not declared an anomaly.
W=2⁴, C=1, [011 0000, 011 1111] = [48,64): z5ofM=53 is 19 from z4ofM=34 (>2⁴) but 11 from 64. The next interval, [64,80), is empty, and z5 is 27 from 80 (>2⁴), so z5 is an anomaly and we make a cut through z5.
W=2⁴, C=0, [100 0000, 100 1111] = [64,80): ordinarily we cut through the midpoint of C=0 intervals, but in this case it is unnecessary since it would duplicate the z5 cut just made.

Here we started with xofM distances. The same process works starting with any distance-based ScalarPTreeSet, e.g., xox, etc.
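A minimal sketch of the thin-interval bucketing, assuming the projection values xofM are available as plain integers: shifting off the low k bits is exactly reading the high-order pTree slices, and intervals holding a single point are the W=2^k, C=1 thin intervals. The slide's extra check - declining to flag a point that sits close to an occupied neighboring interval, as with z1 and z4 above - is left out for brevity.

```python
import numpy as np

def thin_interval_points(xofM, k=4):
    """Indices of points that sit alone in a width-2^k interval (W=2^k, C=1)."""
    v = np.asarray(xofM, dtype=int)
    bucket = v >> k                       # high-order bits = interval id (pTree slices)
    ids, counts = np.unique(bucket, return_counts=True)
    thin = set(ids[counts == 1])
    return [i for i, b in enumerate(bucket) if b in thin]

# On the xofM values above, this returns the points with values 11 (z1),
# 34 (z4) and 53 (z5): exactly the three candidate intervals the slide examines.
```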

Find a furthest point from M, f0 ∈ MaxPt[SpS((x-M)o(x-M))].

Any distance-dominated functional F: X→R (a ScalarPTreeSet(x, F(x)) such that |F(x)-F(y)| ≤ dis(x,y)) works for gap-based FAUST machine teaching. E.g., the dot product with any fixed vector v (gaps in the projections along the line generated by the vector). E.g., use the vectors:
fM, fM/|fM|, or in general a·fM (a a constant), where M is a medoid (mean or vector of medians) and f is a "furthest point" from M;
fF, fF/|fF|, or in general a·fF, where F is a "furthest point" from f;
ek ≡ (0, ..., 0, 1, 0, ..., 0) (1 in the kth position).
But also, if one takes the ScalarPTreeSet(x, xox) of squared vector lengths (or just lengths), the gaps are round gaps as one proceeds out from the origin. Note that this is just the column of xox values, so it is dot-product generated also.

Find a furthest point from M, f0 ∈ MaxPt[SpS((x-M)o(x-M))]. Do f0 round gap analysis on SpS((x-f0)o(x-f0)) to identify/eliminate anomalies (repeating if f0 itself is eliminated). d0 ≡ (M-f0)/|M-f0|.
Find a furthest point from f0, f1 ∈ MaxPt[SpS((x-f0)o(x-f0))]. d1 ≡ (f1-f0)/|f1-f0|. Do f1 round gap analysis {SpS((x-f1)o(x-f1))} to identify/eliminate anomalies on the f1 end (repeating if f1 is eliminated). Do d1 linear gap analysis (SpS((x-f0)od1)).
Let X' = d1⊥ ≡ the space perpendicular to d1. The projection of x-f0 onto d1 is ((x-f0)od1)d1, so x' ≡ (x-f0) - ((x-f0)od1)d1 and x'ox' = (x-f0)o(x-f0) - ((x-f0)od1)², i.e., SpS(x'ox') = SpS[(x-f0)o(x-f0)] - SpS[((x-f0)od1)²].
For each subcluster, find f2 ∈ MaxPt[SpS_SubCluster(x'ox')] and d2 ≡ f2'/|f2'| = [(f2-f0) - ((f2-f0)od1)d1]/|f2'|.
In general, x(k) ≡ x(k-1) - (x(k-1)odk)dk, where fk ∈ MaxPt_SubCluster[SpS(x(k-1)ox(k-1))] and dk ≡ fk'/|fk'|.
Do fk round gap analysis {SpS[(x-fk)o(x-fk)]} to identify/eliminate anomalies on the fk end (repeating if fk is eliminated). Do dk linear gap analysis {SpS[(x-f0)odk]} to separate sub-clusters.
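The distance-dominated claim for dot-product functionals is just Cauchy-Schwarz: |F(x)-F(y)| = |(x-y)ov| ≤ |x-y| whenever |v| ≤ 1, so a gap in the F values certifies a real gap in the data. A tiny NumPy check of that inequality (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
v = rng.normal(size=4)
v /= np.linalg.norm(v)                      # unit vector => F is 1-Lipschitz
F = X @ v                                   # F(x) = x o v

# |F(x)-F(y)| <= |x-y| for every pair of points
diffs = np.abs(F[:, None] - F[None, :])
dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
assert np.all(diffs <= dists + 1e-9)
```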

APPENDIX: FAUST = Fast, Accurate Unsupervised and Supervised Teaching (teaching big data to reveal information).

FAUST CLUSTER-fmg (furthest-to-mean gaps, for finding round clusters). C = X (e.g., X ≡ {p1, ..., pf}, the 15-point dataset below).
While an incomplete cluster, C, remains:
  find M ≡ Medoid(C) (mean, or vector of medians, or ...?);
  pick f ∈ C furthest from M, from S ≡ SPTreeSet(D(x,M)) (e.g., for the HOBbit-furthest f, take any point from the highest-order S-slice);
  if ct(C)/dis²(f,M) > DT (DensityThreshold), C is complete;
  else split C wherever a gap in P ≡ PTreeSet(xofM/|fM|) exceeds GT (GapThreshold).
End While.
Notes: a. Euclidean or HOBbit furthest? b. fM/|fM| or just fM in P? c. Find gaps by sorting P or by an O(log n) pTree method?

Example (interlocking horseshoes with an outlier):
X   x1  x2
p1   1   1
p2   3   1
p3   2   2
p4   3   3
p5   6   2
p6   9   3
p7  15   1
p8  14   2
p9  15   3
pa  13   4
pb  10   9
pc  11  10
pd   9  11
pe  11  11
pf   7   8
M0 = (8.3, 4.2); M1 = (6.3, 3.5). D(x,M0) values (as listed on the slide): 2.2, 3.9, 6.3, 5.4, 3.2, 1.4, 0.8, 2.3, 4.9, 7.3, 3.8, 3.3, 1.8, 1.5.

C2 = {p5} is complete (a singleton = outlier). C3 = {p6, pf} will split (details omitted), so {p6} and {pf} are complete (outliers). That leaves C1 = {p1,p2,p3,p4} and C4 = {p7,p8,p9,pa,pb,pc,pd,pe} still incomplete. C1 is dense (density(C1) = ~4/2² = .5 > DT = .3?), thus C1 is complete. Applying the algorithm to C4: {pa} is an outlier, and C2 splits into {p9} and {pb,pc,pd}, complete. With f1 = p3, C1 doesn't split (complete). In both cases those are probably the best "round" clusters, so the accuracy seems high. The speed will be very high!
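A minimal sketch of the CLUSTER-fmg loop, assuming Euclidean distance and an ordinary array in place of the pTree/HOBbit machinery; DT and GT are the density and gap thresholds named above, and splitting at the first qualifying gap (re-enqueueing both halves) is a simplification of cutting at every gap.

```python
import numpy as np

def faust_cluster_fmg(X, DT=0.3, GT=2.0):
    """Return a list of index arrays, one per 'complete' round cluster."""
    work, done = [np.arange(len(X))], []
    while work:
        C = work.pop()
        M = X[C].mean(axis=0)                        # medoid taken as the mean here
        dists = np.linalg.norm(X[C] - M, axis=1)
        r = dists.max()                              # dis(f, M) for the furthest f
        if r == 0 or len(C) / r**2 > DT:             # dense enough: complete
            done.append(C)
            continue
        f = X[C][np.argmax(dists)]                   # furthest point from M
        d = (f - M) / r                              # unit vector fM/|fM|
        proj = (X[C] - M) @ d                        # projections onto the fM line
        order = np.argsort(proj)
        gaps = np.where(np.diff(proj[order]) > GT)[0]
        if gaps.size == 0:                           # no gap to cut: complete
            done.append(C)
            continue
        cut = gaps[0] + 1                            # split at the first gap found
        work += [C[order[:cut]], C[order[cut:]]]
    return done
```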

FAUST Oblique: PR = P(Xod < a), where the d-line D ≡ mRmV is the oblique vector and d = D/|D|.

Separate classR and classV using the midpoint-of-means (mom) method: view mR and mV as vectors (mR ≡ the vector from the origin to the point mR) and calculate
a = (mR + (mV-mR)/2) o d = ((mR+mV)/2) o d.
(The very same formula works when D = mVmR, i.e., when D points to the left.)

Training ≡ choosing the "cut hyperplane" (CHP), which is always an (n-1)-dimensional hyperplane (it cuts space in two). Classifying is one horizontal program (AND/OR) across the pTrees to get a mask pTree for each entire class (bulk classification).

Improve accuracy? E.g., by considering the dispersion within classes when placing the CHP. Use:
1. the vector of medians, vom, to represent each class rather than mV: vomV ≡ (median{v1 | v∈V}, median{v2 | v∈V}, ...);
2. the projections of each class onto the d-line (e.g., the R class in the diagram); calculate the std of these distances from the origin along the d-line (one horizontal formula per class, using Md's method), then use the std ratio to place the CHP (no longer at the midpoint between mR [vomR] and mV [vomV]).

[Diagram: R-class and V-class points in the (x1, x2) plane, with vomR and vomV marked and the d-line cut between them.]
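A short sketch of the mom cut, assuming the two classes are given as arrays of row vectors; swapping means for vectors of medians (improvement 1 above) is a one-line change (np.median with axis=0).

```python
import numpy as np

def oblique_cut(R, V):
    """Midpoint-of-means cut: direction d = (mV-mR)/|mV-mR| and threshold a."""
    mR, mV = R.mean(axis=0), V.mean(axis=0)   # or np.median(R, axis=0), etc.
    d = (mV - mR) / np.linalg.norm(mV - mR)
    a = ((mR + mV) / 2) @ d                   # a = (mR + (mV-mR)/2) o d
    return d, a

def classify_R(X, d, a):
    """Mask of samples on the R side of the cut: PR = P(X o d < a)."""
    return X @ d < a
```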

1. MapReduce FAUST. Current_Relevancy_Score = 9, Killer_Idea_Score = 2. Nothing comes to mind as to what we would do here. MapReduce/Hadoop is a key-value approach to organizing complex BigData. In FAUST PREDICT/CLASSIFY we start with a training table, and in FAUST CLUSTER/ANOMALIZER we start with a vector space. Mark suggests (my understanding) capturing pTreeBases as Hadoop/MapReduce key-value bases? I suggested to Arjun developing XML to capture Hadoop datasets as pTreeBases. The former is probably wiser. A wish list of great things that might result would be a good start.

2. pTree Text Mining. Current_Relevancy_Score = 10, Killer_Idea_Score = 9. I think Oblique FAUST is the way to do this. Also there is the very new idea of capturing the reading sequence, not just the term-frequency matrix (lossless capture), of a corpus.

3. FAUST CLUSTER/ANOMALIZER. Current_Relevancy_Score = 9, Killer_Idea_Score = 9. No one has taken up the proof that this is a breakthrough method. The applications are unlimited!

4. Secure pTreeBases. Current_Relevancy_Score = 9, Killer_Idea_Score = 10. This seems straightforward and a certainty (to be a killer advance)! It would involve becoming the world expert on what data security really means and how it has been done by others, and then comparing our approach to theirs. Truly a complete career is waiting for someone here!

5. FAUST PREDICTOR/CLASSIFIER. Current_Relevancy_Score = 9, Killer_Idea_Score = 10. No one has done a complete analysis of this breakthrough method. The applications are unlimited here too!

6. pTree Algorithmic Tools. Current_Relevancy_Score = 10, Killer_Idea_Score = 10. This is Md's work. Expanding the algorithmic tool set to include quadratic tools and even higher-degree tools is very powerful. It helps us all!

7. pTree Alternative Algorithm Implementations. Current_Relevancy_Score = 9, Killer_Idea_Score = 8. This is Bryan's work: implementing pTree algorithms in hardware/firmware (e.g., FPGAs) - orders of magnitude performance improvement?

8. pTree O/S Infrastructure. Current_Relevancy_Score = 10, Killer_Idea_Score = 10. This is Matt's work. I don't yet know the details, but Matt, under the direction of Dr. Wettstein, is finishing up his thesis on this topic - such changes as very large page sizes, cache sizes, prefetching, ... I give it a 10/10 because I know the people - they do double-digit work always!

From: Arjun.Roy@my.ndsu.edu. Sent: Thursday, Aug 09. Dear Dr. Perrizo, do you think a MapReduce class of FAUST algorithms could be built into a thesis? If the ultimate aim is to process big data, could modification of the existing pTree-based FAUST algorithms on the Hadoop framework be something to look at? I am myself not sure how far I can go, but if you approve, then I can work on it.

From: Mark, to Arjun, Aug 9. From an industry perspective, Hadoop is king (at least at this point in time). I believe vertical data organization maps really well to a map/reduce approach - these are complementary, as Hadoop is organized more for unstructured data, so the topics are not mutually exclusive. So from the industry side I'd vote Hadoop; from the Treeminer side, text (although we are very interested in both).

From: msilverman@treeminer.com. Sent: Friday, Aug 10. I'm working through a list of what we need to get done - it will include implementing anomaly detection, which has been on my list for some time. I tried to establish a number of things such that even if we had some difficulties with some parts, we could show others (without digging ourselves in too deep). Once I get this, I'll get a call going. I have another programming resource down here who has been working with me on our production code and who will also be picking up some of the work to get this across the finish line, and I also have someone who was previously a director at our customer assisting us in packaging it all up so the customer will perceive value received. I think Dale sounded happy yesterday.