Institute for Advanced Studies in Basic Sciences – Zanjan Kohonen Artificial Neural Networks in Analytical Chemistry Mahdi Vasighi
Contents Introduction to Artificial Neural Network (ANN) Self Organizing Map ANN Kohonen ANN Applications
Introduction
An artificial neural network (ANN) is a mathematical model based on biological neural networks. In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.
The basic types of goals or problems in analytical chemistry for which ANNs can be used are the following:
- Selection of samples from a large quantity of existing ones for further handling.
- Classification of an unknown sample into one of several pre-defined (known in advance) classes.
- Clustering of objects, i.e., finding the inner structure of the measurement space to which the samples belong.
- Making models for predicting behaviors or effects of unknown samples in a quantitative manner.
The first thing to establish when considering ANNs is the nature of the problem we are trying to solve: supervised or unsupervised.
Supervised Learning
The supervised problem means that the chemist already has a set of experiments with known outcomes (targets) for specific inputs at hand. In these networks, the structure consists of an interconnected group of artificial neurons that processes information using a connectionist approach to computation.
Unsupervised Learning
The unsupervised problem means that one deals with a set of experimental data that have no specific associated answers (or supplemental information) attached. In unsupervised problems (like clustering) it is not necessary to know in advance to which cluster or group the training objects Xs belong. The network automatically adapts itself in such a way that similar input objects are associated with topologically close neurons in the ANN.
Kohonen Artificial Neural Networks
The Kohonen ANN offers a considerably different approach to ANNs. The main reason is that the Kohonen ANN is a 'self-organizing' system capable of solving unsupervised rather than supervised problems. The Kohonen network is probably the closest of all artificial neural network architectures and learning schemes to the biological neuron network.
As a rule, the Kohonen type of net is based on a single layer of neurons arranged in a two-dimensional plane having a well-defined topology. A defined topology means that each neuron has a defined number of neurons as nearest neighbors, second-nearest neighbors, etc.
The neighborhood of a neuron is usually arranged either in squares or in hexagons. In the Kohonen conception of neural networks, signal similarity is related to the spatial (topological) relation among neurons in the network.
Competitive Learning
The Kohonen learning concept tries to map the input so that similar signals excite neurons that are very close together.
1st step: An m-dimensional object Xs enters the network and only one neuron in the output layer is selected: after the input occurs, the network selects the winner "c" (central) according to some criterion. "c" is the neuron having either the largest output in the entire network or the weight vector most similar to the input Xs.
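The winner selection in this step can be sketched as follows. This is a minimal illustration, assuming NumPy and a weight array of shape (rows, cols, m); the function name `find_winner` and the two criterion labels are hypothetical. The "output" criterion takes the neuron output to be the dot product of the weights with the input, and the "distance" criterion uses the smallest squared Euclidean distance.

```python
import numpy as np

def find_winner(weights, x, criterion="distance"):
    """Return the (row, col) position of the winning neuron c.

    weights: array of shape (rows, cols, m), one weight vector per neuron.
    x: input object Xs, array of shape (m,).
    """
    if criterion == "output":
        # Largest output: dot product of each weight vector with the input.
        scores = weights @ x
        flat = np.argmax(scores)
    elif criterion == "distance":
        # Most similar weight vector: smallest squared Euclidean distance.
        scores = np.sum((weights - x) ** 2, axis=-1)
        flat = np.argmin(scores)
    else:
        raise ValueError("criterion must be 'output' or 'distance'")
    position = np.unravel_index(flat, weights.shape[:2])
    return tuple(int(i) for i in position)
```

The two criteria agree when all weight vectors have the same length, which is one reason the weights are often normalized after each correction.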
2nd step: After finding the neuron c, its weight vector is corrected to make its response closer to the input.
3rd step: The weights of neighboring neurons must be corrected as well. These corrections are usually scaled down, depending on the distance from c. Besides decreasing with increasing distance from c, the correction decreases with each iteration step (learning rate).
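A minimal sketch of the 2nd and 3rd steps, assuming NumPy, a square-ring neighborhood on a rectangular grid, and a triangular (linearly decreasing) scaling function. The names `triangular_scaling` and `update_weights` are hypothetical, and the exact form 1 - d/(d_max + 1) is an assumption, not taken from the slides.

```python
import numpy as np

def triangular_scaling(d, d_max, eta):
    """Linearly decreasing correction: eta at the winner (d = 0), 0 beyond d_max."""
    return eta * np.clip(1.0 - d / (d_max + 1.0), 0.0, None)

def update_weights(weights, x, winner, d_max, eta):
    """Correct the winner and its neighbors toward the input x.

    weights: (rows, cols, m) array; winner: (row, col) of neuron c;
    d_max: largest neighbor ring that is still corrected;
    eta: learning rate for the current iteration step.
    """
    rows, cols, _ = weights.shape
    r, c = np.indices((rows, cols))
    # Topological distance counted in square rings around the winner.
    d = np.maximum(np.abs(r - winner[0]), np.abs(c - winner[1]))
    a = triangular_scaling(d, d_max, eta)
    # Each weight vector moves toward x in proportion to its scaling factor.
    return weights + a[..., None] * (x - weights)
```

Lowering `eta` from one iteration to the next gives the second kind of decrease mentioned above.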
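[Figure: input vector Xs (1×i) presented to a 5×5 Kohonen map; two neighbor scaling functions, triangular and Mexican hat, each plotted as correction a (maximum a_max) against distance d from the winner (d_c)]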
4th step: After the corrections have been made, the weights should be normalized to a constant value, usually 1.
5th step: The next object Xs is input and the process is repeated. After all objects have been input once, one epoch is completed.
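The five steps together make up one epoch, which can be sketched as below. This is an illustrative sketch, assuming NumPy, winner selection by Euclidean distance, a triangular neighbor scaling, and normalization of each weight vector to unit Euclidean length (the slides say only "a constant value, usually 1"). The names `normalize` and `train_epoch` are hypothetical.

```python
import numpy as np

def normalize(weights):
    """Scale every neuron's weight vector to unit Euclidean length."""
    norms = np.linalg.norm(weights, axis=-1, keepdims=True)
    # Leave all-zero vectors unchanged to avoid dividing by zero.
    return weights / np.where(norms == 0.0, 1.0, norms)

def train_epoch(weights, X, eta, d_max):
    """Present every object in X once and return the corrected weights."""
    rows, cols, _ = weights.shape
    r, c = np.indices((rows, cols))
    for x in X:
        # Step 1: the winner is the neuron whose weights are closest to x.
        dist = np.sum((weights - x) ** 2, axis=-1)
        wr, wc = np.unravel_index(np.argmin(dist), (rows, cols))
        # Steps 2-3: correct the winner and, scaled down, its neighbors.
        d = np.maximum(np.abs(r - wr), np.abs(c - wc))
        a = eta * np.clip(1.0 - d / (d_max + 1.0), 0.0, None)
        weights = weights + a[..., None] * (x - weights)
        # Step 4: normalize the corrected weights.
        weights = normalize(weights)
        # Step 5: the loop continues with the next object.
    return weights
```

Training then amounts to calling `train_epoch` repeatedly while lowering `eta`.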
[Figure: an input vector is presented to the network, the neuron outputs are computed, and the winner is selected]
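[Figure: weight correction for one input vector in the first epoch (t = 1) with a linear neighbor function, a_max = 0.9 and a_min = 0.1. The winner is corrected by a factor of 0.9; neurons at increasing distance d from the winner are corrected by 0.8×0.9, 0.6×0.9 and 0.4×0.9.]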
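The factors on this slide (0.9, 0.8×0.9, 0.6×0.9, 0.4×0.9) can be reproduced with a short calculation. The formulas below are assumptions chosen to match those numbers: a learning rate that falls linearly from a_max at t = 1 to a_min at t = t_max, and a linear neighbor function 1 - d/(d_max + 1) with d_max = 4, the largest ring distance on a 5×5 map. The value t_max = 10 is hypothetical; it does not affect the result at t = 1.

```python
def learning_rate(t, t_max, a_max, a_min):
    """Linear decrease from a_max at epoch t = 1 to a_min at epoch t = t_max."""
    return (a_max - a_min) * (t_max - t) / (t_max - 1) + a_min

def linear_neighbor(d, d_max):
    """Linear neighbor function: 1 at the winner, decreasing with distance d."""
    return max(0.0, 1.0 - d / (d_max + 1))

# First epoch: the learning rate equals a_max.
eta = learning_rate(t=1, t_max=10, a_max=0.9, a_min=0.1)

# Correction factors for the winner (d = 0) and the next three rings.
factors = [linear_neighbor(d, d_max=4) * eta for d in range(4)]
print(factors)  # approximately [0.9, 0.72, 0.54, 0.36]
```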
Top Map
After the training process is complete, the complete set of training vectors is run through the KANN once more. In this last run, the labels of the neurons excited by the input vectors are entered into a table called the top map.
[Figure: objects a to e presented as Xs to the trained KANN, and the resulting top map with the labels a to e at the positions of the excited neurons]
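A minimal sketch of building a top map, assuming NumPy, a trained weight array of shape (rows, cols, m), and Euclidean-distance winner selection. The function name `top_map` and the choice to store a list of labels per neuron are illustrative assumptions.

```python
import numpy as np

def top_map(weights, X, labels):
    """Run all objects through the trained map and label the excited neurons.

    Returns a rows x cols table; each cell lists the labels of the objects
    whose winning neuron sits at that position.
    """
    rows, cols, _ = weights.shape
    table = [[[] for _ in range(cols)] for _ in range(rows)]
    for x, label in zip(X, labels):
        dist = np.sum((weights - np.asarray(x)) ** 2, axis=-1)
        r, c = np.unravel_index(np.argmin(dist), (rows, cols))
        if label not in table[r][c]:
            table[r][c].append(label)
    return table
```

Cells that stay empty correspond to neurons that no training object excites.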
Weight Map
The number of weights in each neuron is equal to the dimension m of the input vector. Hence, in each level of weights only data of one specific variable are handled.
[Figure: input vector Xs entering the trained KANN; a top map with regions labeled L and H shown together with the weight levels]
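In array terms, a weight level is a slice through the weight array along the variable axis. A minimal sketch, assuming NumPy and weights stored with shape (rows, cols, m); the name `weight_level` and the small example array are hypothetical.

```python
import numpy as np

def weight_level(weights, k):
    """Return the k-th weight level: weight k of every neuron on the map grid."""
    return weights[:, :, k]

# Hypothetical 2x2 map with m = 3 weights per neuron.
weights = np.arange(12, dtype=float).reshape(2, 2, 3)
level_0 = weight_level(weights, 0)
print(level_0.shape)  # (2, 2)
```

Each level has the same layout as the top map, so the two can be compared position by position.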
Toroidal Topology
[Figure: Kohonen map wrapped into a toroid; the 3rd layer of neighbor neurons around neuron W]
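On a toroidal map, opposite edges are joined, so neighbor layers wrap around the borders. A minimal sketch of the wrapped ring distance, assuming NumPy and a rectangular grid with square neighbor rings; the function name `toroidal_ring_distance` is hypothetical.

```python
import numpy as np

def toroidal_ring_distance(shape, winner):
    """Ring distance of every neuron from the winner on a toroidal map."""
    rows, cols = shape
    r, c = np.indices(shape)
    dr = np.abs(r - winner[0])
    dc = np.abs(c - winner[1])
    # Take the shorter way around the torus along each axis.
    dr = np.minimum(dr, rows - dr)
    dc = np.minimum(dc, cols - dc)
    return np.maximum(dr, dc)
```

With this distance, a winner in a corner of a 5×5 map has complete neighbor layers, and no neuron is more than two rings away.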
Analytical Applications: Classification and Reaction Monitoring
Anal. Chem. 2007, 79, Linking Databases of Chemical Reactions to NMR Data: an Exploration of 1H NMR-Based Reaction Classification
Changes in the 1H NMR spectrum of a mixture are interpreted in terms of the chemical reactions taking place. Classification of photochemical and metabolic reactions by Kohonen self-organizing maps is demonstrated. The difference between the 1H NMR spectra of the products and the reactants was introduced as a reaction descriptor and used as the input vector to the Kohonen self-organizing map.
Dataset: Photochemical cycloadditions, partitioned into a training set of 147 reactions and a test set of 42 reactions, all manually classified into seven classes. The 1H NMR spectra were simulated from the molecular structures by SPINUS.
Input variables: Reaction descriptors derived from 1H NMR spectra.
Topology: Toroidal, 13×13 and 15×15 for photochemical reactions and 29×29 for metabolic reactions.
Neighbor scaling function: Linearly decreasing triangular, with the learning rate decreasing from 0.1 to 0 with epoch.
Winning neuron selection criterion: Euclidean distance.
Classes of Photochemical Reactions
[Figure: toroidal top map of a 14×14 Kohonen self-organizing map]
After the predictive models for the classification of chemical reactions were established on the basis of simulated NMR data, their applicability to reaction data from mixed sources (experimental and simulated) was evaluated.
A second dataset: 911 metabolic reactions catalyzed by transferases, classified into eight subclasses according to the Enzyme Commission (E.C.) system.
[Figure: resulting surface for such a SOM, with each neuron colored according to the Enzyme Commission subclass of the reactions activating it, that is, the second digit of the EC number]
For photochemical reactions, the SOMs gave correct predictions for 94-99% of the training set and for 81-88% of the test set. For metabolic reactions, 94-96% correct predictions were obtained with SOMs, and the test set was predicted with 66-67% accuracy by individual SOMs.
Analytical Applications: QSAR & QSTR
Current Computer-Aided Drug Design, 2005, 1, Kohonen Artificial Neural Network and Counter Propagation Neural Network in Molecular Structure-Toxicity Studies
A general problem in QSAR modeling is the selection of the most relevant descriptors.
[Figure: each molecule is described by m descriptors; for n molecules this gives an n×m descriptor matrix and an n×1 activity vector]
[Figure: the data matrix of n molecules by m descriptors used in two ways: descriptor clustering, where each input is an n×1 vector, and calibration and test set selection, where each input is an m×1 vector]
References
Chemom. Intell. Lab. Syst. 38 (1997) 1-23
Neural Networks for Chemists: An Introduction (VCH Publishers, Weinheim)
Anal. Chem. 2007, 79,
Current Computer-Aided Drug Design, 2005, 1,
Acta Chimica Slovenica 1994, pp
Thanks