FAUST Oblique Analytics are based on a linear functional, the dot product (o). Let X(X1...Xn) be a table.

FAUST Oblique Analytics are based on a linear functional, the dot product (o). Let X(X1...Xn) be a table. FAUST Oblique analytics employ the ScalarPTreeSet (SPTS) of a valueTree, XoD ≡ Σk=1..n Xk*Dk, where D=(D1...Dn) is a fixed vector.

FAUST Count Change (FC2) for clustering. Choose a nextD recursion plan to specify which D to use at each recursive step, e.g., if a cluster, C, needs further partitioning:
a. D = the diagonal producing the maximum Standard Deviation, STD(C), or the maximum STD(C)/Spread(C).
b. AM(C) (Average-to-Median).
c. AFFA(C) (Average-to-FurthestFromAverage) [or FFAFFF(C) (FurthestFromAverage-to-FurthestFromFurthest)].
d. Cycle through the diagonals e1, ..., en, e1±e2, ...; or cycle through AM, AFFA, FFAFFF; or cycle through both.
Choose a DensityThreshold (DT), a DensityUniformityThreshold (DUT), and a Precipitous Count Change (PCC) definition (PCCs include gaps).
ALGORITHM: If DT (and DUT) are not exceeded at a cluster, C, partition C by cutting at each PCC in CoD, using the nextD plan.

FAUST Polygon Prediction (FP2) for 1-class or multi-class classification. Let Xn+1 = the class label column, C. For each vector, D, let lD,k ≡ min(CkoD) (or the 1st Precipitous Count Increase, PCI?); hD,k ≡ max(CkoD) (or the last PCD?).
ALGORITHM: y is declared to be class k iff y ∈ Hullk, where Hullk = {z | lD,k ≤ Doz ≤ hD,k for all D}. (If y is in multiple hulls, Hi1..Hih, y is assigned the class Ck for the k maximizing OneCount{PCk & PHi1 & ... & PHih}, or fuzzy-classify using those OneCounts as k-weights.)

Outlier Mining can mean:
1. Given a set of n objects and given a k, find the top k objects in terms of dissimilarity from the rest of the objects.
1.a This could mean the k objects, xh (h=1..k), most dissimilar to [distant from] their individual complements, X-{xh}, or
1.b the top "set of k objects", Sk, for which that set is most dissimilar from its complement, X-Sk.
2. Given a Training Set, identify outliers in each class (correctly classified but noticeably dissimilar to fellow class members).
3. Determine "fuzzy" clusters, i.e., assign a weight for each (object, cluster) pair.
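The nextD plan (a) above can be illustrated with a minimal NumPy sketch. This is not the pTree implementation (FAUST computes over bit-sliced SPTSs, not arrays), and it restricts the candidate diagonals to the unit vectors e1..en for simplicity; the function name `max_std_diagonal` is hypothetical.

```python
import numpy as np

def max_std_diagonal(C):
    """nextD plan (a): among the unit diagonals e1..en, pick the D
    on which the cluster C has maximum standard deviation."""
    n = C.shape[1]
    diags = np.eye(n)                     # e1..en as candidate D vectors
    stds = [np.std(C @ d) for d in diags] # STD of the projection CoD
    return diags[int(np.argmax(stds))]
```

For example, a cluster that varies only in its second coordinate yields D = e2.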
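The FC2 cutting step can be sketched as follows. This is a plain-array stand-in, assuming the simplest PCC definition (a gap in the projection values larger than a threshold); the function names and the `gap_threshold` parameter are illustrative, not part of FAUST.

```python
import numpy as np

def fc2_cut_points(X, D, gap_threshold=2.0):
    """Project rows of X onto D (the XoD values) and return cut values
    at large gaps -- the simplest kind of precipitous count change."""
    proj = np.sort(X @ D)                 # sorted XoD values
    gaps = np.diff(proj)                  # spacing between consecutive values
    cut_idx = np.where(gaps > gap_threshold)[0]
    return [(proj[i] + proj[i + 1]) / 2 for i in cut_idx]  # cut mid-gap

def fc2_partition(X, D, gap_threshold=2.0):
    """Partition X into sub-clusters by cutting XoD at each gap."""
    cuts = fc2_cut_points(X, D, gap_threshold)
    return np.searchsorted(cuts, X @ D)   # sub-cluster index per row
```

In the full algorithm this partitioning is applied recursively, with D supplied by the chosen nextD plan, until DT (and DUT) are satisfied.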
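The FP2 hull test can likewise be sketched with arrays: for each class k and each vector D, record the interval [lD,k, hD,k] = [min(CkoD), max(CkoD)], and declare y to be class k iff every projection of y lands inside class k's intervals. The PCI/PCD refinement and the multi-hull OneCount tie-break are omitted; the function names are illustrative.

```python
import numpy as np

def fp2_train(X, y, Ds):
    """For each class k and each vector D in Ds, record the interval
    [min(CkoD), max(CkoD)] -- the hull bounds for that class."""
    hulls = {}
    for k in np.unique(y):
        Ck = X[y == k]
        hulls[k] = [(float((Ck @ D).min()), float((Ck @ D).max())) for D in Ds]
    return hulls

def fp2_classify(x, hulls, Ds):
    """Return every class k with x in Hull_k (may be empty or multiple;
    ties would be broken by the OneCount rule, not shown here)."""
    return [k for k, bounds in hulls.items()
            if all(lo <= x @ D <= hi for D, (lo, hi) in zip(Ds, bounds))]
```

A point falling in no hull is declared an outlier in the 1-class setting; a point in several hulls would go to the tie-break.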
(A dendrogram does that to some extent.) Note: FC3 (the FAUST Count Change Clusterer) is a good outlier detector, since it identifies and removes large clusters so that small clusters (outliers) appear. FAUST Distance Analytics use the SPTS of a distance valueTree, e.g., SquareDistanceToNearestNeighbor (D2NN). FAUST Outlier Observer (FO2) uses D2NN. (L2, or Euclidean, distance is best, but L∞ (EIN) works too.) D2NN provides an instantaneous k-slider for 1.a (find the k objects, x, most dissimilar from X-{x}); it is useful for the others too.
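The D2NN table and its k-slider can be sketched as follows. This is a brute-force array version (the pTree version computes these counts without materializing a distance matrix); once the D2NN values are computed, answering "top k" for any k is just a sort-prefix, which is the "instantaneous k-slider". Function names are illustrative.

```python
import numpy as np

def d2nn(X):
    """Squared Euclidean distance from each row of X to its nearest
    other row (the D2NN valueTree, computed brute-force)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)          # pairwise squared distances
    np.fill_diagonal(d2, np.inf)          # exclude self-distance
    return d2.min(axis=1)

def top_k_outliers(X, k):
    """k-slider for 1.a: the k rows most dissimilar from X - {x}."""
    return np.argsort(-d2nn(X))[:k]       # largest D2NN values first
```

Moving the slider from k to k+1 requires no recomputation, only reading one more entry of the sorted D2NN list.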