Projection methods in chemistry. Autumn 2011. By: Atefe Malek.khatabi. Based on: M. Daszykowski, B. Walczak, D.L. Massart, Chemometrics and Intelligent Laboratory Systems 65 (2003) 97–112.

Visualization of a data set's structure is one of the most challenging goals in data mining. In this paper, a survey of different projection techniques, linear and nonlinear, is given. Visualization and interpretation of the structure of a high-dimensional data set are carried out with clustering of the data or with data reduction. Compression is possible for two reasons: often many variables are highly correlated, and their variance is smaller than the measurement noise.

Linear projection methods: principal component analysis (PCA) and projection pursuit (PP). PCA was first proposed by Pearson and fully developed by Hotelling. PCA allows projection of multidimensional data onto a few orthogonal features, called principal components (PCs), constructed as linear combinations of the original variables so as to maximize the description of the data variance. Such dimensionality reduction does not always reveal the clustering tendency of the data. The intent of projection pursuit (PP) is to reveal the sharpest low-dimensional projection, i.e. to find clusters.
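As an illustration only (not from the paper), a minimal NumPy sketch of PCA: the data are mean-centred and projected onto the first PCs obtained from the singular value decomposition; the function name and defaults are ours.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the first n_components principal components."""
    Xc = X - X.mean(axis=0)                    # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # PC scores (projected coordinates)
    explained = (s ** 2) / np.sum(s ** 2)      # fraction of variance per PC
    return scores, explained[:n_components]
```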

PP was originally introduced by Roy and Kruskal. PP is an unsupervised technique that searches for interesting low-dimensional linear projections of high-dimensional data by optimizing a certain objective function called the projection index (PI). The goal of data mining (i.e. revealing the clustering tendency of the data) has to be translated into a numerical index, a functional of the projected data distribution. This function should change continuously with the parameters defining the projection, and it should have a large value when the projected distribution is considered interesting and a small value otherwise. In this paper, the described algorithm is used with two different projection indices.
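A crude sketch of the PP idea, purely illustrative: here the "optimization" of the projection index is a plain random search over unit directions, an assumption made for brevity rather than the optimizer used in the paper.

```python
import numpy as np

def projection_pursuit_1d(X, index_fn, n_trials=2000, seed=None):
    """Crude projection pursuit by random search: draw random unit-length
    directions and keep the one whose 1-D projection maximises the
    projection index returned by index_fn."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                    # centre the data first
    best_w, best_val = None, -np.inf
    for _ in range(n_trials):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)                 # unit projection direction
        val = index_fn(Xc @ w)                 # score the projected data
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val
```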

Entropy: Huber, and Jones and Sibson, suggested a PI based on the Shannon entropy,

$I = \int f(x)\,\ln f(x)\,dx$,

where f(x) is a density estimate of the projected data. This index is uniquely minimized by the standard normal density. The required density estimate f(x) can be calculated as a sum of m individual density functions (kernels), generated at any position x by each projected object:

$f(x) = \frac{1}{m h} \sum_{j=1}^{m} K\!\left(\frac{x - t_j}{h}\right)$,

where h is the so-called smoothing parameter (bandwidth), K is a kernel function, and t1, t2, ..., tm denote the coordinates of the projected objects. The bandwidth h is computed from a parameter r, estimated from the data (usually as the sample standard deviation), and from m, the number of data objects.
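A rough NumPy sketch of the entropy index under stated assumptions: a Gaussian kernel and a rule-of-thumb bandwidth stand in for the paper's exact choice of K, h and r.

```python
import numpy as np

def entropy_index(t, h=None):
    """Entropy-based projection index: the sample average of log f(t_i),
    where f is a Gaussian kernel density estimate of the projected
    coordinates t.  Large values flag 'interesting' (non-normal) projections."""
    t = np.asarray(t, dtype=float)
    m = t.size
    if h is None:
        h = 1.06 * t.std(ddof=1) * m ** (-1 / 5)   # rule-of-thumb bandwidth (assumption)
    diffs = (t[:, None] - t[None, :]) / h
    f = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (m * h * np.sqrt(2.0 * np.pi))
    return float(np.mean(np.log(f)))
```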

Yenyukov index Q: According to the nearest-neighbour approach proposed by Yenyukov, the clustering tendency of the data can be judged from the ratio of the mean of all inter-object distances, D, to the average nearest-neighbour distance, d, i.e. $Q = D / d$. For clustered data Q has a large value, whereas for less clustered data Q is small.
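The index Q can be sketched directly from its definition; the function below is illustrative and assumes Euclidean distances.

```python
import numpy as np

def yenyukov_q(T):
    """Q = (mean of all inter-object distances D) / (mean nearest-neighbour
    distance d).  Large Q suggests clustered data."""
    T = np.asarray(T, dtype=float)
    if T.ndim == 1:
        T = T[:, None]                         # allow 1-D projections
    dist = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=-1)
    iu = np.triu_indices(len(T), k=1)
    mean_all = dist[iu].mean()                 # D: mean of all pairwise distances
    np.fill_diagonal(dist, np.inf)
    mean_nn = dist.min(axis=1).mean()          # d: mean nearest-neighbour distance
    return mean_all / mean_nn
```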

Nonlinear projection methods: Kohonen self-organizing maps (SOM), generative topographic maps (GTM), Sammon projection, and auto-associative feed-forward networks.

Kohonen self-organizing maps (SOM): A Kohonen neural network is an iterative technique used to map multivariate data. The network is able to learn and display the topology of the data. When each sample is represented by n measurements (n > 3), a two- or three-dimensional representation of the measurement space lets us visualize the relative positions of the data points in n-space.

In contrast to PCA, SOM does not require data preprocessing. A Kohonen neural network maps multivariate data onto a layer of neurons arranged in a two-dimensional grid. Each neuron in the grid has a weight associated with it, which is a vector of the same dimension as the pattern vectors comprising the data set. [Figure: Kohonen network architecture; each input and each neuron carries m weight levels, and the position of the neuron excited by a sample marks that sample on the map.]

The number of neurons used should be between 33% and 50% of the number of input vectors in the training set. The components of each weight vector are initialized with random numbers. The weights are updated according to

$w_i(t+1) = w_i(t) + \eta(t)\,\Lambda_i(t)\,[x(t) - w_i(t)]$,

where $w_i(t+1)$ is the ith weight vector for the next iteration, $w_i(t)$ is the ith weight vector for the current iteration, $\eta(t)$ is the learning-rate function, $\Lambda_i(t)$ is the neighborhood function, and $x(t)$ is the sample vector currently passed to the network. The learning rate is chosen by the user as a positive real number less than 1. The decrease of the neighborhood can be scaled to be linear with time, thereby reducing the number of neurons around the winner that are adjusted during each epoch.
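A minimal SOM training sketch, assuming a Gaussian neighbourhood function and linearly decaying learning rate and radius (illustrative choices, not necessarily those used in the paper):

```python
import numpy as np

def train_som(X, grid=(10, 10), epochs=100, eta0=0.5, seed=None):
    """Minimal SOM training loop: Gaussian neighbourhood, learning rate and
    neighbourhood radius both shrinking linearly with time."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))              # random initial weights
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        eta = eta0 * (1.0 - frac)                          # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3               # shrinking neighbourhood
        for x in rng.permutation(X):                       # samples in random order
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            d2 = np.sum((pos - pos[winner]) ** 2, axis=1)  # grid distance to winner
            h = np.exp(-d2 / (2.0 * sigma ** 2))           # neighbourhood function
            W += eta * h[:, None] * (x - W)                # w_i(t+1) = w_i(t) + eta*h_i*(x - w_i)
    return W.reshape(rows, cols, X.shape[1])
```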

The control parameters include: the number of epochs (iterations), the grid topology and size, the neighborhood function, the neighborhood adjustment factor, and the learning-rate function. [Figure: top map.]

Example: Raman spectra of six common household plastics, with a training set of 169 spectra and a prediction set of 188 spectra.

Generative topographic maps (GTM): Generative topographic mapping (GTM) was introduced by Bishop et al. The aim of the GTM procedure is to model the distribution of data in an n-dimensional space, x = [x1, x2, ..., xn], in terms of a smaller number of latent variables, u = [u1, u2, ..., uL].

Sammon projection: Sammon's algorithm maps the original space onto a low-dimensional projection space in such a way that the distances among the objects in the original space are preserved as well as possible. It minimizes the stress

$E = \frac{1}{\sum_{i<j} d_{ij}^{*}} \sum_{i<j} \frac{\left(d_{ij}^{*} - d_{ij}\right)^{2}}{d_{ij}^{*}}$,

where $d_{ij}^{*}$ is the distance between two objects i and j in the original space and $d_{ij}$ is the distance between those objects in the reduced space. The computational time is much longer than for SOM, and for new samples it is not possible to compute their coordinates in the latent space, whereas SOM allows that.
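The stress E can be sketched directly from the definition above (Euclidean distances assumed, no duplicate objects):

```python
import numpy as np

def sammon_stress(X, Y):
    """Sammon stress between the original-space distances d*_ij (from X) and
    the reduced-space distances d_ij (from Y); lower values mean better
    preservation of the inter-object distances."""
    Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    dstar, d = Dx[iu], Dy[iu]                  # assumes all d*_ij > 0
    return float(np.sum((dstar - d) ** 2 / dstar) / np.sum(dstar))
```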

Auto-associative feed-forward networks (BNN): Auto-associative mapping was first used by Ackley et al.

Feed-forward networks are usually used in supervised settings. This type of neural network is also known as a bottleneck neural network (BNN) and is often referred to in the literature as nonlinear PCA. Training the net is equivalent to adjusting its weights. The weights, initialized randomly, are adjusted in each iteration to minimize the sum of squared residuals between the desired and observed outputs. Once the net is trained, the outputs of the nonlinear nodes in the second hidden layer serve as the data coordinates in the reduced data space.
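A schematic forward pass of such a network, with tanh nonlinearities assumed and biases omitted for brevity; the two-node bottleneck matches the setup described later in the slides.

```python
import numpy as np

def bnn_forward(X, W1, W2, W3, W4):
    """Forward pass of an auto-associative (bottleneck) network:
    n inputs -> mapping layer -> 2-node bottleneck -> de-mapping layer -> n outputs.
    After training, the bottleneck activations z serve as the 2-D coordinates
    of the samples in the reduced space."""
    h1 = np.tanh(X @ W1)          # mapping layer (nonlinear)
    z = np.tanh(h1 @ W2)          # bottleneck: second hidden layer, 2 nodes
    h3 = np.tanh(z @ W3)          # de-mapping layer (nonlinear)
    X_hat = h3 @ W4               # reconstruction of the input
    return z, X_hat

# Training (not shown) would adjust W1..W4 to minimise the reconstruction
# error np.sum((X - X_hat) ** 2), i.e. the sum of squared residuals between
# the desired and observed outputs.
```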

Results and discussion. Data sets:
Data set 1 contains 536 NIR spectra of three creams with three different concentrations of an active drug.
Data set 2 contains 83 NIR spectra collected in the spectral range of 1330–2352 nm for four different quality classes of polymer products.
Data set 3 contains 159 variables and 576 objects. The objects are the products of the Maillard reaction of mixtures of one sugar and one or two amino acids at constant pH = 3.

Data set 1, containing 701 variables, can be compressed very efficiently by PCA to two significant PCs.

Data set 2

Data set 3:

The size and the color intensity of a node are proportional to the number of objects therein. The biggest node, (1,1), contains 21 objects, and the smallest nodes, (4,2) and (5,2), contain only one object each.
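The node occupancies behind such a map can be obtained by counting best-matching units; a small illustrative sketch, assuming the grid of weight vectors returned by the SOM sketch above:

```python
import numpy as np

def som_hit_counts(X, W_grid):
    """Count how many objects have each SOM node as their best-matching unit;
    these counts drive the node size / color intensity in the map."""
    rows, cols, n = W_grid.shape
    W = W_grid.reshape(rows * cols, n)
    counts = np.zeros(rows * cols, dtype=int)
    for x in X:
        counts[np.argmin(np.linalg.norm(W - x, axis=1))] += 1
    return counts.reshape(rows, cols)
```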

[Figure: projections of the data obtained with Sammon mapping, SOM, BNN and PCA.]

In the case of the Sammon projection, no real clustering tendency is observed. In the Kohonen map, the biggest nodes are in the corners of the map. Based on the content of Fig. 10 only, it is difficult to draw more conclusions. The results of BNN, with two nodes in the ''bottleneck'' and seven nodes in the mapping and de-mapping layers, respectively, reveal two classes, better separated than in SOM.