FAUST Technology for Clustering (includes Anomaly Detection) and Classification (Where are we now?)


FAUST Technology for Clustering (includes Anomaly Detection) and Classification (Where are we now?)

FAUST technology for classification/clustering is built for speed, so that big data can be mined in human time. Oblique FAUST is generalized to CC FAUST, which places cuts at all large Count Changes (CCs). A CC almost always reveals a cluster boundary: almost always, a large Count Decrease (CD) occurs iff we are exiting a cluster somewhere on the cut hyperplane, and a large Count Increase (CI) occurs iff we are entering one. CC FAUST makes a cut at each CC in the y o d values (a gap is a large CD followed by a large CI, so cutting at large CCs subsumes Oblique FAUST's gap-based cuts). CC FAUST is divisive hierarchical clustering which, if continued down to singleton sub-clusters, builds a complete dendrogram. If the problem at hand is outlier (anomaly) detection, any singleton sub-cluster separated by a sufficient gap is an outlier. CC FAUST will scale up, because entering and leaving a cluster "smoothly" (i.e., without a noticeable count change) is no more likely for large datasets than for small ones (it is a measure-zero phenomenon).

Do we need BARREL FAUST at all now? BARREL CC FAUST is still useful for estimating the diameter of a set as SQRT((dot-product width onto a d-line)^2 + (max barrel radius from that d-line)^2).

Density Uniformity (DU) of a sub-cluster might be defined as the reciprocal of the variance of the counts. A sub-cluster dendrogram should have a Density label (DE) and a Density Uniformity label (DU) on every edge (sub-cluster). We can end a dendrogram branch as soon as DE and DU are high enough (exceed the thresholds DET and DUT) to save time. How can we [quickly] estimate DE and DU? DU is easy: just calculate the variance of the point counts. Density, DE = count/volume = count/(c_n r^n). We have the count and n; c_n is a known constant (e.g., c_2 = π for a disk, c_3 = 4π/3 for a ball, ...), and we have the volume once we have the radius. BARREL CC FAUST gives us a good radius estimate.

In advance, we decide on a density threshold, DET, and a density uniformity threshold, DUT. To choose the "best" clustering (partitioning of the set into sub-clusters), we proceed depth first across the dendrogram, left-most branch to right-most branch, going down until the DET and DUT thresholds are exceeded. (See the next slides.)
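A minimal sketch of the cut-placement step, assuming the points sit in a plain numpy array; the function names, histogram binning, and change_frac parameter are illustrative choices of this sketch, not from the FAUST codebase (a pTree implementation would get these counts from bit slices):

    import numpy as np

    def cc_faust_cuts(X, d, change_frac=0.25, nbins=64):
        """Place cuts at large Count Changes in the projection y o d.
        X: (N, n) array of points; d: unit vector of length n."""
        vals = X @ d                                   # y o d for every row y
        counts, edges = np.histogram(vals, bins=nbins)
        cuts = []
        for i in range(1, nbins):
            hi = int(max(counts[i - 1], counts[i]))
            lo = int(min(counts[i - 1], counts[i]))
            if hi > 0 and (hi - lo) > change_frac * hi:   # large CD or CI here
                cuts.append(edges[i])                     # cut at bin boundary
        return vals, cuts

    def sub_cluster_masks(vals, cuts):
        """Boolean masks for the sub-clusters between consecutive cuts."""
        bounds = [-np.inf] + sorted(cuts) + [np.inf]
        return [(vals >= a) & (vals < b) for a, b in zip(bounds, bounds[1:])]

Recursing on each mask with fresh d vectors yields the divisive hierarchy (the dendrogram) described above.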

Oblique FAUST Code Layering? A layer (or object, or black box, or procedure) in the code called the CUTTER:
INPUTS:
I1. An SPTS (Scalar PTreeSet = a bit-sliced column of numbers, presumably coming from a dot-product functional).
I2. The method: cut at?
I2a. p%_Count_Change (e.g., p=25%);
I2b. other, non-uniform count-change thresholds;
I2c. centers of gaps only.
I3. Whether the 1-counts of the sub-cluster mask pTrees should be computed and returned (Y/N), since that is an expensive step.
OUTPUTS:
O1. A pointer to a mask pTree for each new "sub-cluster" (i.e., identifying each set of points separated by consecutive cuts).
O2. The 1-count of each mask.

And the GRAMMER:
INPUTS:
I1. An existing labeled dendrogram (labeled with, e.g., the unit vector that produced it, the density of each edge sub-cluster, ...), including the tree of pointers to a mask pTree for each node (incl. the root, which need not be all of the original set).
I2. The new threshold levels (if, e.g., the new density threshold is lower than that of the existing dendrogram, GRAMMER prunes the dendrogram).
OUTPUTS:
O1. The new labeled dendrogram.

I like the idea of building a custom dendrogram for the user according to specifications. Then the user can examine it while we churn out the next level (as done in the next two slides, i.e., the next higher density threshold). The reason is that the full dendrogram down to singletons is impossibly large, and the information gain with each new level rises from zero to a maximum and then falls steadily back to zero at the singleton level (the bottom of the full dendrogram is huge but worthless).

A thought on the sub-cluster dendrogram in general: the root should be labeled with the PTreeSet of the table involved. The sub-root level should be labeled with the particular SPTS of this branch (the D-line, or unit vector d, of the dot product, ...). Each sub-level after that should be labeled as above.

Hadoop Treeminer principles? Never discard a derived pTree, and never, never discard a computed count (makes catalogue management a serious undertaking?). Or: pTree hoarding is good.
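A sketch of the two layers in plain Python, assuming numpy boolean arrays stand in for SPTSs and mask pTrees; only method I2a is shown, and the dendrogram node type is this sketch's own stand-in:

    from dataclasses import dataclass, field
    from typing import List, Optional
    import numpy as np

    @dataclass
    class SubCluster:                 # CUTTER outputs O1 + O2
        mask: np.ndarray              # stand-in for a pointer to a mask pTree
        one_count: Optional[int]      # the mask's 1-count, if requested (I3)

    def cutter(spts: np.ndarray, p: float = 0.25, nbins: int = 64,
               return_counts: bool = True) -> List[SubCluster]:
        """CUTTER, method I2a (p%_Count_Change) only."""
        counts, edges = np.histogram(spts, bins=nbins)
        cuts = [edges[i] for i in range(1, nbins)
                if abs(int(counts[i]) - int(counts[i - 1]))
                   > p * max(int(counts[i]), int(counts[i - 1]), 1)]
        bounds = [-np.inf] + cuts + [np.inf]
        return [SubCluster(m, int(m.sum()) if return_counts else None)
                for a, b in zip(bounds, bounds[1:])
                for m in [(spts >= a) & (spts < b)]]

    @dataclass
    class DendroNode:                 # GRAMMER input I1: a labeled node
        cluster: SubCluster
        density: float
        children: List["DendroNode"] = field(default_factory=list)

    def grammer(root: DendroNode, density_threshold: float) -> DendroNode:
        """GRAMMER: prune each subtree whose root already meets the threshold."""
        if root.density >= density_threshold:
            return DendroNode(root.cluster, root.density, [])  # drop children
        return DendroNode(root.cluster, root.density,
                          [grammer(c, density_threshold) for c in root.children])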

Choosing a clustering from a DEL- and DUL-labeled dendrogram. The algorithm for choosing the optimal clustering from a labeled dendrogram is as follows: let DET = .4 and DUT = ½; proceed depth first, left-most branch to right-most, and end a branch (accepting its sub-cluster) as soon as its DEL and DUL meet the thresholds.

[Figure: an example dendrogram with sub-clusters A through G; edges carry DEL labels (.1, .2, .3, .4, .5, and ∞ on singletons) and DUL labels (1/6, 1/8, ½, 1, and ∞).]
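A sketch of that selection rule, with the node type and field names invented here for illustration:

    import math
    from dataclasses import dataclass, field
    from typing import List, Set

    @dataclass
    class Node:
        points: Set[str]                 # e.g., {"y1", "y2", "y3"}
        DEL: float = math.inf            # density label (singletons: infinity)
        DUL: float = math.inf            # density-uniformity label
        children: List["Node"] = field(default_factory=list)

    def choose_clustering(node: Node, DET: float = 0.4, DUT: float = 0.5):
        """Depth first, left to right: accept a sub-cluster (end the branch)
        as soon as both labels meet the thresholds; otherwise descend."""
        if (node.DEL >= DET and node.DUL >= DUT) or not node.children:
            return [node.points]
        out = []
        for child in node.children:
            out.extend(choose_clustering(child, DET, DUT))
        return out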

APPLYING CC FAUST TO SPAETH

[Figure: the 15-point Spaeth set y1..yf plotted on a hex-labeled grid (rows 1-f, columns 1-f), with the Average-to-Median cuts at 7 and a marked.]

Density ≈ count/(π r²) labeled dendrogram for LCC FAUST on Spaeth with D = AvgMedian, DET = .3:
Y (.15)
  {y1,y2,y3,y4,y5} (.37)
  {y6,yf} (.08)
    {y6} (∞)
    {yf} (∞)
  {y7,y8,y9,ya,yb,yc,yd,ye} (.07)
    {y7,y8,y9,ya} (.39)
    {yb,yc,yd,ye} (1.01)
Continuing with D = AM at DET = .5: {y1,y2,y3,y4} (.63), {y5} (∞); {y7,y8,y9} (1.27), {ya} (∞). At DET = 1: {y1,y2,y3} (2.54), {y4} (∞).

Density ≈ count/(π r²) labeled dendrogram for LCC FAUST on Spaeth with D cycling through the diagonals nnxx, nxxn, nnxx, nxxn, ..., DET = .3:
Y (.15)
  {y1,y2,y3,y4,y5} (.37)
  {y6,y7,y8,y9,ya,yb,yc,yd,ye,yf} (.09)
    {y6,y7,y8,y9,ya} (.17)
      {y6} (∞)
      {y7,y8,y9,ya} (.39)
    {yb,yc,yd,ye,yf} (.25)
      {yf} (∞)
      {yb,yc,yd,ye} (1.01)

D-line labeled dendrogram for LCC FAUST on Spaeth with D = furthestAvg, DET = .3:
Y (.15)
  {y1,y2,y3,y4,y5} (.37)
  {y6,yf} (.08)
    {y6} (∞)
    {yf} (∞)
  {y7,y8,y9,ya,yb,yc,yd,ye} (.07)
    {y7,y8,y9,ya} (.39)
    {yb,yc,yd,ye} (1.01)
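A quick numeric check of how these density labels arise, assuming the label is count/(π r²) with r the barrel-radius estimate (the radii below are back-solved from the slide's labels, not given in the slides):

    import math

    def density_label(count, r):
        """DEL = count / (pi * r^2): 2-D density of a sub-cluster."""
        return count / (math.pi * r ** 2)

    # Y (all 15 points) with DEL = .15 implies r ~ 5.6 grid units:
    r_Y = math.sqrt(15 / (math.pi * 0.15))
    # {yb,yc,yd,ye} with DEL = 1.01 implies r ~ 1.1 grid units:
    r_b = math.sqrt(4 / (math.pi * 1.01))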

UDR: Univariate Distribution Revealer (on Spaeth)

Pre-compute and enter into the ToC all DT(Y_k), plus those for selected linear functionals (e.g., d = main diagonals, ModeVector). Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCounts should be repeated everywhere (e.g., in every DT). The reason is that these OneCounts help us select the pertinent pTrees to access, and in fact are often all we need to know about a pTree to get the answers we are after.

The DistributionTree of S, DT(S): let b ≡ BitWidth(S), h = depth of a node, and k = node offset. Node (h,k) has a pointer to pTree{x in S | F(x) in [k·2^(b−h), (k+1)·2^(b−h))} and its 1-count. Applied to S, a column of numbers in bit-slice format (an SpTS), UDR produces DT(S).

[Figure: the UDR computation on the Spaeth points (y1=(1,1), y2=(3,1), y3=(2,2), y4=(3,3), y5=(6,2), y6=(9,3), ..., ya=(13,4), yb=(10,9), ..., yd=(9,11), ..., yf=(7,8)) and a column f of functional values: bit slices p6..p0 and their complements p6'..p0' are ANDed level by level, halving [0,128) into [0,64) and [64,128), then into width-32, width-16, and width-8 intervals, recording each node's 1-count; the root count at depth h=0 is 15, and, e.g., node (2,3) covers [96,128).]
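A sketch of UDR under those definitions, assuming b = BitWidth = 7 and using numpy boolean arrays in place of pTrees:

    import numpy as np

    def udr_distribution_tree(values: np.ndarray, b: int = 7) -> dict:
        """Level h splits each node of level h-1 in half by ANDing with bit
        slice p_(b-h) or its complement, so node (h,k) masks
        {x | F(x) in [k*2**(b-h), (k+1)*2**(b-h))}; its 1-count is recorded."""
        n = len(values)
        masks = {(0, 0): np.ones(n, dtype=bool)}       # root covers [0, 2**b)
        tree = {(0, 0): n}                             # (h, k) -> 1-count
        for h in range(1, b + 1):
            p = ((values >> (b - h)) & 1).astype(bool)  # bit slice p_(b-h)
            masks = {child: parent & (p if odd else ~p)
                     for (ph, pk), parent in masks.items()
                     for odd, child in ((0, (h, 2 * pk)), (1, (h, 2 * pk + 1)))}
            tree.update({node: int(m.sum()) for node, m in masks.items()})
        return tree

    # e.g., 15 functional values in [0,128): tree[(0,0)] == 15, while
    # tree[(1,0)] and tree[(1,1)] give the counts on [0,64) and [64,128).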