GhostMiner Wine example Włodzisław Duch Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland ISEP Porto, 8-12 July 2002

Similar presentations
Applications of one-class classification
Visualization of the hidden node activities or hidden secrets of neural networks. Włodzisław Duch Department of Informatics Nicolaus Copernicus University,
The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke
ECG Signal processing (2)
DECISION TREES. Decision trees  One possible representation for hypotheses.
Universal Learning Machines (ULM) Włodzisław Duch and Tomasz Maszczyk Department of Informatics, Nicolaus Copernicus University, Toruń, Poland ICONIP 2009,
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
An Introduction of Support Vector Machine
Support Vector Machines
Heterogeneous Forests of Decision Trees Krzysztof Grąbczewski & Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Torun, Poland.
Chapter 4: Linear Models for Classification
Discriminative and generative methods for bags of features
Heterogeneous adaptive systems Włodzisław Duch & Krzysztof Grąbczewski Department of Informatics, Nicholas Copernicus University, Torun, Poland.
K-separability Włodzisław Duch Department of Informatics Nicolaus Copernicus University, Torun, Poland School of Computer Engineering, Nanyang Technological.
Fuzzy rule-based system derived from similarity to prototypes Włodzisław Duch Department of Informatics, Nicolaus Copernicus University, Poland School.
Coloring black boxes: visualization of neural network decisions Włodzisław Duch School of Computer Engineering, Nanyang Technological University, Singapore,
Transfer functions: hidden possibilities for better neural networks. Włodzisław Duch and Norbert Jankowski Department of Computer Methods, Nicholas Copernicus.
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
A Posteriori Corrections to Classification Methods Włodzisław Duch & Łukasz Itert Department of Informatics, Nicholas Copernicus University, Torun, Poland.
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Competent Undemocratic Committees Włodzisław Duch, Łukasz Itert and Karol Grudziński Department of Informatics, Nicholas Copernicus University, Torun,
Support Feature Machine for DNA microarray data Tomasz Maszczyk and Włodzisław Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland.
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
CS Instance Based Learning1 Instance Based Learning.
Feature selection based on information theory, consistency and separability indices Włodzisław Duch, Tomasz Winiarski, Krzysztof Grąbczewski, Jacek Biesiada,
Data Mining Techniques
Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.
The Tutorial of Principal Component Analysis, Hierarchical Clustering, and Multidimensional Scaling Wenshan Wang.
July 11, 2001Daniel Whiteson Support Vector Machines: Get more Higgs out of your data Daniel Whiteson UC Berkeley.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Chapter 9 Neural Network.
IJCNN 2012 Competition: Classification of Psychiatric Problems Based on Saccades Włodzisław Duch 1,2, Tomasz Piotrowski 1 and Edward Gorzelańczyk 3 1 Department.
Computational Intelligence: Methods and Applications Lecture 37 Summary & review Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
_____KOSYR 2001______ Rules for Melanoma Skin Cancer Diagnosis Włodzisław Duch, K. Grąbczewski, R. Adamczak, K. Grudziński, Department of Computer Methods,
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Computational Intelligence: Methods and Applications Lecture 19 Pruning of decision trees Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
Computational Intelligence: Methods and Applications Lecture 30 Neurofuzzy system FSM and covering algorithms. Włodzisław Duch Dept. of Informatics, UMK.
1 SUPPORT VECTOR MACHINES İsmail GÜNEŞ. 2 What is SVM? A new generation learning system. A new generation learning system. Based on recent advances in.
Computational Intelligence: Methods and Applications Lecture 36 Meta-learning: committees, sampling and bootstrap. Włodzisław Duch Dept. of Informatics,
Computational Intelligence: Methods and Applications Lecture 20 SSV & other trees Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
Computational Intelligence: Methods and Applications Lecture 23 Logistic discrimination and support vectors Włodzisław Duch Dept. of Informatics, UMK Google:
1Ellen L. Walker Category Recognition Associating information extracted from images with categories (classes) of objects Requires prior knowledge about.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Data Mining and Decision Support
Computational Intelligence: Methods and Applications Lecture 21 Linear discrimination, linear machines Włodzisław Duch Dept. of Informatics, UMK Google:
Computational Intelligence: Methods and Applications Lecture 29 Approximation theory, RBF and SFN networks Włodzisław Duch Dept. of Informatics, UMK Google:
Mete Ozay, Fatos T. Yarman Vural —Presented by Tianxiao Jiang
Computational Intelligence: Methods and Applications Lecture 15 Model selection and tradeoffs. Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
Computational Intelligence: Methods and Applications Lecture 22 Linear discrimination - variants Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
Computational Intelligence: Methods and Applications Lecture 26 Density estimation, Expectation Maximization. Włodzisław Duch Dept. of Informatics, UMK.
Data Transformation: Normalization
Computational Intelligence: Methods and Applications
Data Mining, Neural Network and Genetic Programming
Department of Informatics, Nicolaus Copernicus University, Toruń
K Nearest Neighbor Classification
Projection of network outputs
Where did we stop? The Bayes decision rule guarantees an optimal classification… But it requires the knowledge of P(c_i|x) (or p(x|c_i) and P(c_i)).
CSE572, CBS572: Data Mining by H. Liu
CSE572: Data Mining by H. Liu
A task of induction to find patterns
Presentation transcript:

GhostMiner Wine example Włodzisław Duch Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland ISEP Porto, 8-12 July 2002

GhostMiner Philosophy

GhostMiner, data mining tools from our lab. There is no free lunch – provide different types of tools for knowledge discovery: decision tree, neural, neurofuzzy, similarity-based, committees. Provide tools for visualization of data. Support the process of knowledge discovery / model building and evaluation, organizing it into projects. Separate the process of model building and knowledge discovery from model use => GhostMiner Developer & GhostMiner Analyzer.

GM summary

GhostMiner combines 4 basic tools for predictive data mining and understanding of data, avoiding too many parameter choices (like network structure specs):

IncNet, an ontogenic neural network using Kalman filter learning, separating each class from all other classes.
Feature Space Mapping, a neurofuzzy system producing logical rules of crisp and fuzzy types.
Separability Split Value decision tree.
Weighted nearest neighbor method.

On top of these come K-classifiers and committees of models, plus MDS visualization.

Wine data example

Chemical analysis of wine from grapes grown in the same region in Italy, but derived from three different cultivars. Task: recognize the source of a wine sample. 13 quantities measured, continuous features: alcohol content, malic acid content, ash content, alkalinity of ash, magnesium content, total phenols content, flavanoids content, nonflavanoid phenols content, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, proline.
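This is the classic UCI Wine benchmark, so the same setup can be reproduced outside GhostMiner; a minimal sketch using scikit-learn's bundled copy (an assumed stand-in for GhostMiner's own data loader, not part of the original tutorial):

```python
# Load the UCI Wine data: 178 samples, 13 continuous features, 3 cultivars.
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target      # X: (178, 13) array, y: labels 0..2
print(wine.feature_names)          # the 13 measured quantities
print(X.shape, sorted(set(y)))
```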

Exploration and visualization Load data (using load icon) and look at general info about the data.

Exploration: data Inspect the data itself in raw form.

Exploration: data statistics Look at the distribution of feature values. Note that Proline has very large values; therefore the data should be standardized before further processing.

Exploration: data standardized Standardized data: unit standard deviation; about 2/3 of all data should fall within [mean-std, mean+std]. Other options: normalize to fit in [-1,+1], or normalize rejecting some extreme values.
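A minimal sketch of this standardization step, reusing X from the loading example above:

```python
# Standardize to zero mean / unit standard deviation per feature, so that
# large-valued features such as proline do not dominate distance measures.
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0).round(6))   # ~0 for every feature
print(X_std.std(axis=0).round(6))    # 1 for every feature
```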

Exploration: 1D histograms Distribution of feature values in classes. Some features are more useful than others.
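A quick way to see this, assuming matplotlib is available, is to overlay per-class histograms of one feature (flavanoids chosen here purely for illustration):

```python
# Per-class histograms of one feature; well-separated histograms mean
# the feature is useful for discriminating the cultivars.
import matplotlib.pyplot as plt

i = list(wine.feature_names).index("flavanoids")
for c in range(3):
    plt.hist(X[y == c, i], bins=15, alpha=0.5, label=f"class {c}")
plt.xlabel("flavanoids"); plt.legend(); plt.show()
```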

Exploration: 1D/3D histograms Distribution of feature values in classes, 3D

Exploration: 2D projections Projections (cuboids) on selected 2D planes.

Visualize data

Relations in more than 3D are hard to imagine. SOM mappings: popular for visualization, but rather inaccurate, no measure of distortions. Measure of topographical distortions: map all X_i points from R^n to x_i points in R^m, m < n, and ask: how well are the distances R_ij = D(X_i, X_j) reproduced by the distances r_ij = d(x_i, x_j)? Use m = 2 for visualization, use higher m for dimensionality reduction.
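One simple version of such a distortion measure, written out as code (the exact normalization used by GhostMiner is not stated here, so this Sammon-style stress is an assumption):

```python
# Compare pairwise distances before (R_ij) and after (r_ij) the mapping;
# 0 means distances are reproduced perfectly.
import numpy as np
from scipy.spatial.distance import pdist

def distortion(X_high, x_low):
    R = pdist(X_high)                    # R_ij = D(X_i, X_j) in R^n
    r = pdist(x_low)                     # r_ij = d(x_i, x_j) in R^m
    return np.sum((R - r) ** 2) / np.sum(R ** 2)
```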

Visualize data: MDS Multidimensional scaling: invented in psychometrics by Torgerson (1952), re-invented by Sammon (1969) and myself (1994)… Minimize the measure of topographical distortions by moving the x_i coordinates.
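A sketch of the same idea with scikit-learn's metric MDS, which iteratively moves the 2D coordinates to minimize a stress function (parameter choices are illustrative):

```python
# Embed the standardized Wine data in 2D and check the residual distortion
# with the distortion() helper defined above.
from sklearn.manifold import MDS

mds = MDS(n_components=2, dissimilarity="euclidean", random_state=0)
x_2d = mds.fit_transform(X_std)
print("stress:", mds.stress_, "distortion:", distortion(X_std, x_2d))
```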

Visualize data: Wine The green outlier can be identified easily. 3 clusters are clearly distinguished, 2D is fine.

Decision trees

Simplest things first: use a decision tree to find logical rules. Test a single attribute, find a good point to split the data, separating vectors from different classes. DT advantages: fast, simple, easy to understand, easy to program, many good algorithms. Result on Wine: 4 attributes used, 10 errors, 168 correct, 94.4% correct.
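The SSV tree itself ships with GhostMiner; as a rough stand-in, a generic CART tree from scikit-learn shows the same kind of rule extraction (numbers will differ from the SSV results quoted above):

```python
# Fit a decision tree on the Wine data and print it as readable rules.
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(wine.feature_names)))
print("training accuracy:", tree.score(X, y))
```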

Decision borders

Univariate trees: test the value of a single attribute, x < a. Result: feature space is divided into cuboids. Multivariate trees: test on combinations of attributes, e.g. hyperplanes. Wine data: univariate decision tree borders for proline and flavanoids.

Separability Split Value (SSV)

SSV criterion: select the attribute and split value that maximize the number of correctly separated pairs from different classes; if several equivalent split values exist, select the one that minimizes the number of pairs split from the same class. Works on raw data, including symbolic values. Search for splits using best-first or beam-search methods. Tests are A(x) < T or x ∈ {s_i}. Create a tree that classifies all data correctly. Use crossvalidation to determine how many nodes to prune or what the pruning level should be.
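A small sketch of the pair-counting idea behind the SSV criterion for a single candidate test A(x) < T (function name and details are this write-up's own, not GhostMiner code):

```python
# For a split at threshold T, count pairs from different classes that land
# on opposite sides (to maximize) and same-class pairs that are broken
# apart (the tie-breaker, to minimize).
import numpy as np

def ssv_counts(values, labels, T):
    left = values < T
    separated = split_same = 0
    for c in np.unique(labels):
        n_left_c = np.sum(left & (labels == c))
        n_right_c = np.sum(~left & (labels == c))
        n_right_other = np.sum(~left & (labels != c))
        separated += n_left_c * n_right_other   # each cross-class pair once
        split_same += n_left_c * n_right_c      # each same-class pair once
    return separated, split_same

vals = X[:, list(wine.feature_names).index("proline")]
print(ssv_counts(vals, y, T=np.median(vals)))
```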

Wine – SSV 5 rules

Lower pruning leads to a more complex tree: 7 nodes, corresponding to 5 rules; 10 errors, mostly Class 2/3 wines mixed; check the confusion matrix in "results".

Wine – SSV optimal rules

What is the optimal complexity of rules? Use crossvalidation to estimate generalization. Various solutions may be found, depending on the search: 5 rules with 12 premises, making 6 errors; 6 rules with 16 premises and 3 errors; 8 rules, 25 premises, and 1 error. For example (numeric thresholds omitted here, marked …):

if OD280/OD315 > … and proline > … and color > … then class 1
if OD280/OD315 > … and proline > … and color < … then class 2
if OD280/OD315 < … and malic-acid < 2.82 then class 2
if OD280/OD315 > … and proline < … then class 2
if OD280/OD315 < … and hue < … then class 3
if OD280/OD315 < … and malic-acid > 2.82 then class 3
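A sketch of this crossvalidation step, using CART's cost-complexity pruning as an assumed stand-in for SSV pruning levels:

```python
# Pick the pruning strength that generalizes best in 10-fold crossvalidation.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

for alpha in (0.0, 0.01, 0.02, 0.05):       # illustrative pruning levels
    t = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    print(alpha, cross_val_score(t, X, y, cv=10).mean().round(3))
```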

Neurofuzzy systems

MLP: discrimination, finds separating surfaces as combinations of sigmoidal functions. Fuzzy approach: define membership functions (MF), replacing the crisp yes/no test x < a by a degree μ(x). Typically triangular, trapezoidal, Gaussian... MFs are used. MFs in many dimensions are constructed using products of one-dimensional MFs, μ(X) = Π_i μ_i(x_i). Advantage: easy to add a priori knowledge (proper bias); may work well for very small datasets!
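For concreteness, minimal definitions of two typical MFs and the product construction (function names are this write-up's own):

```python
# Triangular and Gaussian membership functions, and a separable
# multidimensional MF built as a product of one-dimensional ones.
import numpy as np

def triangular(x, a, b, c):        # rises on [a,b], falls on [b,c]
    return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

def gaussian(x, mu, sigma):
    return float(np.exp(-0.5 * ((x - mu) / sigma) ** 2))

def mu_product(x, mfs, params):    # mu(X) = prod_i mu_i(x_i)
    return float(np.prod([mf(xi, *p) for mf, xi, p in zip(mfs, x, params)]))

print(mu_product([13.0, 2.5], [gaussian, triangular],
                 [(13.2, 0.5), (1.0, 2.0, 3.5)]))
```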

Feature Space Mapping

Feature Space Mapping (FSM) neurofuzzy system: describe the joint probability density p(X,C). Neural adaptation using RBF-like algorithms. Good for logical rules and NN predictive models. Find the best network architecture (number of nodes and feature selection) using an ontogenic network (growing and shrinking) with one hidden layer. Use separable rectangular, triangular, Gaussian MF. Initialize using clusterization techniques. Allow for rotation of Gaussian functions.
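FSM itself is part of GhostMiner, but the flavor of "separable Gaussian nodes initialized by clusterization" can be sketched loosely (all names and choices here are hypothetical):

```python
# Per-class density model: k-means centers as node positions, per-dimension
# cluster spreads as widths of separable (product-form) Gaussian nodes.
import numpy as np
from sklearn.cluster import KMeans

def fit_nodes(X_class, n_nodes=3):
    km = KMeans(n_clusters=n_nodes, n_init=10, random_state=0).fit(X_class)
    widths = np.array([X_class[km.labels_ == k].std(axis=0) + 1e-3
                       for k in range(n_nodes)])
    return km.cluster_centers_, widths

def density(x, centers, widths):
    z = (x - centers) / widths               # one row per Gaussian node
    return np.exp(-0.5 * (z ** 2).sum(axis=1)).sum()
```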

Wine – FSM rules

Complexity of rules depends on the desired accuracy. Use rectangular functions for crisp rules. Optimal accuracy may be evaluated using crossvalidation. FSM discovers simpler rules, for example (thresholds omitted here, marked …):

if proline > … then class 1 (48 cases, 45 correct, 2 recovered by other rules)
if color < … then class 2 (63 cases, 60 correct)

SSV: hierarchical rules. FSM: density estimation with feature selection.

IncNet

Incremental Neural Network (IncNet): ontogenic NN with a single hidden layer, adding, removing and merging neurons. Transfer functions: Gaussians or combinations of sigmoids (bi-central functions). Training: use a Kalman filter approach to estimate network parameters; fast Kalman filter training is usually sufficient. Always creates one network per class, separating it from other samples. Creates predictive models equivalent to fuzzy rules.
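The one-network-per-class scheme can be illustrated with a generic one-vs-rest wrapper around a small neural network; this is a stand-in only, not IncNet's ontogenic architecture or its Kalman filter training:

```python
# Train K binary networks, each separating one class from all others,
# and estimate accuracy by 10-fold crossvalidation.
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

ovr = OneVsRestClassifier(MLPClassifier(hidden_layer_sizes=(10,),
                                        max_iter=2000, random_state=0))
print(cross_val_score(ovr, X_std, y, cv=10).mean())
```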

k-nearest neighbors

Use various similarity functions to evaluate how similar a new case is to all reference (training) cases; classify using p(C_i|X) = k(C_i)/k. Similarity functions include Minkowski and similar functions. Optimize k, the number of neighbors included. Optimize the scaling factors of features, W_i|X_i - Y_i|: this goes beyond feature selection. Use search-based techniques to find good scaling parameters for features. Notice that for k=1, 100% accuracy on the training set is always obtained! To evaluate accuracy on training data, use the leave-one-out procedure.
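A sketch of weighted kNN with leave-one-out evaluation; multiplying each feature by a scaling factor W_i before fitting implements the weighted distance, and the weights would then be tuned by search (uniform weights shown as the starting point):

```python
# Weighted kNN: scale features by W_i, then evaluate with leave-one-out,
# which gives an honest training estimate (plain k=1 resubstitution
# accuracy would always be 100%).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

w = np.ones(X_std.shape[1])                 # scaling factors, to be optimized
knn = KNeighborsClassifier(n_neighbors=5)
print(cross_val_score(knn, X_std * w, y, cv=LeaveOneOut()).mean())
```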

Committees and K-classifiers

Committees combine results from different classification models: create different models using the same method (for example decision tree) on different data samples (bootstrapping); combine several different models, including other committees, into one model; use majority voting to decide on the predicted class. No rules, but stable and accurate classification models. K-classifiers: in K-class problems create K classifiers, one for each class.
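A minimal committee in this spirit, with illustrative members: bagged trees (same method, bootstrap samples) combined with a kNN model under majority voting:

```python
# Bagging builds many trees on bootstrap samples; the voting committee
# then decides each prediction by majority vote across member models.
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=25, random_state=0)
committee = VotingClassifier([("trees", bagged),
                              ("knn", KNeighborsClassifier(n_neighbors=5))],
                             voting="hard")
committee.fit(X_std, y)
print(committee.score(X_std, y))
```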

Summary

GhostMiner combines 4 basic tools for predictive data mining and understanding of data. GM includes K-classifiers and committees of models. GM includes MDS visualization/dimensionality reduction. Model building is separated from model use. GM provides tools for easy testing of statistical accuracy. Many new classification models are coming. Please get your copy from …