FAUST Oblique Analytics are based on a linear functional, the dot product (o). Let X(X1...Xn) be a table.

FAUST Oblique Analytics are based on a linear functional, the dot product (o). Let X(X1...Xn) be a table. FAUST Oblique analytics employ the ScalarPTreeSet (SPTS) of a valueTree, XoD ≡ Σk=1..n Xk*Dk, where D=(D1...Dn) is a fixed vector.

FAUST Count Change (FC2) for clustering. Choose a nextD recursion plan to specify which D to use at each recursive step, e.g., if a cluster, C, needs further partitioning:
a. D = the diagonal producing the maximum Standard Deviation, STD(C), or the maximum STD(C)/Spread(C).
b. AM(C) (Average-to-Median).
c. AFFA(C) (Average-to-FurthestFromAverage) [or FFAFFF(C) (FurthestFromAverage-to-FurthestFromFurthest)].
d. Cycle through the diagonals e1, ..., en, e1±e2, ...; or cycle through AM, AFFA, FFAFFF; or cycle through both.
Choose a DensityThreshold (DT), a DensityUniformityThreshold (DUT), and a Precipitous Count Change (PCC) definition (PCCs include gaps).
ALGORITHM: If DT (and DUT) are not exceeded at a cluster, C, partition C by cutting at each PCC in CoD, using the nextD plan.

FAUST Polygon Prediction (FP2) for 1-class or multi-class classification. Let Xn+1 = the class label column, C. For each vector, D, let lD,k ≡ min(CkoD) (or the 1st Precipitous Count Increase, PCI?); hD,k ≡ max(CkoD) (or the last PCD?).
ALGORITHM: y is declared to be class k iff y ∈ Hullk, where Hullk = {z | lD,k ≤ Doz ≤ hD,k for all D}. (If y is in multiple hulls, Hi1..Hih, y is assigned the class Ck for the k maximizing OneCount{PCk & PHi1 & ... & PHih}, or fuzzy-classify using those OneCounts as k-weights.)

Outlier Mining can mean:
1. Given a set of n objects and given a k, find the top k objects in terms of dissimilarity from the rest of the objects.
1.a This could mean the k objects, xh (h=1..k), most dissimilar to [distant from] their individual complements, X-{xh}, or
1.b the top "set of k objects", Sk, for which that set is most dissimilar from its complement, X-Sk.
2. Given a Training Set, identify outliers in each class (correctly classified but noticeably dissimilar to fellow class members).
3. Determine "fuzzy" clusters, i.e., assign a weight for each (object, cluster) pair.
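The nextD plan (a) above can be illustrated with a minimal NumPy sketch. This is not the pTree implementation (FAUST computes over bit-sliced SPTSs, not arrays), and it restricts the candidate diagonals to the unit vectors e1..en for simplicity; the function name `max_std_diagonal` is hypothetical.

```python
import numpy as np

def max_std_diagonal(C):
    """nextD plan (a): among the unit diagonals e1..en, pick the D
    on which the cluster C has maximum standard deviation."""
    n = C.shape[1]
    diags = np.eye(n)                     # e1..en as candidate D vectors
    stds = [np.std(C @ d) for d in diags] # STD of the projection CoD
    return diags[int(np.argmax(stds))]
```

For example, a cluster that varies only in its second coordinate yields D = e2.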
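The FC2 cutting step can be sketched as follows. This is a plain-array stand-in, assuming the simplest PCC definition (a gap in the projection values larger than a threshold); the function names and the `gap_threshold` parameter are illustrative, not part of FAUST.

```python
import numpy as np

def fc2_cut_points(X, D, gap_threshold=2.0):
    """Project rows of X onto D (the XoD values) and return cut values
    at large gaps -- the simplest kind of precipitous count change."""
    proj = np.sort(X @ D)                 # sorted XoD values
    gaps = np.diff(proj)                  # spacing between consecutive values
    cut_idx = np.where(gaps > gap_threshold)[0]
    return [(proj[i] + proj[i + 1]) / 2 for i in cut_idx]  # cut mid-gap

def fc2_partition(X, D, gap_threshold=2.0):
    """Partition X into sub-clusters by cutting XoD at each gap."""
    cuts = fc2_cut_points(X, D, gap_threshold)
    return np.searchsorted(cuts, X @ D)   # sub-cluster index per row
```

In the full algorithm this partitioning is applied recursively, with D supplied by the chosen nextD plan, until DT (and DUT) are satisfied.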
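The FP2 hull test can likewise be sketched with arrays: for each class k and each vector D, record the interval [lD,k, hD,k] = [min(CkoD), max(CkoD)], and declare y to be class k iff every projection of y lands inside class k's intervals. The PCI/PCD refinement and the multi-hull OneCount tie-break are omitted; the function names are illustrative.

```python
import numpy as np

def fp2_train(X, y, Ds):
    """For each class k and each vector D in Ds, record the interval
    [min(CkoD), max(CkoD)] -- the hull bounds for that class."""
    hulls = {}
    for k in np.unique(y):
        Ck = X[y == k]
        hulls[k] = [(float((Ck @ D).min()), float((Ck @ D).max())) for D in Ds]
    return hulls

def fp2_classify(x, hulls, Ds):
    """Return every class k with x in Hull_k (may be empty or multiple;
    ties would be broken by the OneCount rule, not shown here)."""
    return [k for k, bounds in hulls.items()
            if all(lo <= x @ D <= hi for D, (lo, hi) in zip(Ds, bounds))]
```

A point falling in no hull is declared an outlier in the 1-class setting; a point in several hulls would go to the tie-break.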
(A dendrogram does that to some extent.) Note: FC3 (the FAUST Count Change Clusterer) is a good outlier detector, since it identifies and removes large clusters so that small clusters (outliers) appear. FAUST Distance Analytics use the SPTS of a distance valueTree, e.g., SquareDistanceToNearestNeighbor (D2NN). FAUST Outlier Observer (FO2) uses D2NN. (L2, or Euclidean, distance is best, but L∞ (EIN) works too.) D2NN provides an instantaneous k-slider for 1.a (find the k objects, x, most dissimilar from X-{x}); it is useful for the others too.
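The D2NN table and its k-slider can be sketched as follows. This is a brute-force array version (the pTree version computes these counts without materializing a distance matrix); once the D2NN values are computed, answering "top k" for any k is just a sort-prefix, which is the "instantaneous k-slider". Function names are illustrative.

```python
import numpy as np

def d2nn(X):
    """Squared Euclidean distance from each row of X to its nearest
    other row (the D2NN valueTree, computed brute-force)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)          # pairwise squared distances
    np.fill_diagonal(d2, np.inf)          # exclude self-distance
    return d2.min(axis=1)

def top_k_outliers(X, k):
    """k-slider for 1.a: the k rows most dissimilar from X - {x}."""
    return np.argsort(-d2nn(X))[:k]       # largest D2NN values first
```

Moving the slider from k to k+1 requires no recomputation, only reading one more entry of the sorted D2NN list.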