Unsupervised Learning with Artificial Neural Networks
The ANN is given a set of patterns, P, from a space, S, but little or no information about their classification, evaluation, interesting features, etc. It must learn these by itself!
Tasks
–Clustering - Group patterns based on similarity (focus of this lecture)
–Vector Quantization - Fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P.
–Probability Density Approximation - Find a small set of points whose distribution matches that of P.
–Feature Extraction - Reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P).
Weight Vectors in Clustering Networks
Node k represents a particular class of input vectors, and the weights into k encode a prototype/centroid of that class. So if prototype(class(k)) = [i_k1, i_k2, i_k3, i_k4], then: w_km = f_e(i_km) for m = 1..4, where f_e is the encoding function. In some cases, the encoding function involves normalization; then each weight depends on the whole prototype, i.e. w_km = f_e(i_k1 ... i_k4). The weight vectors are learned during the unsupervised training phase.
[Figure: node k with incoming weights w_k1, w_k2, w_k3, w_k4]
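A minimal Python sketch of this encoding (the helper name encode_prototype and the choice of unit-length normalization as the encoding function f_e are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def encode_prototype(prototype, normalize=True):
    """Encode a class prototype as the incoming weight vector of node k.

    Without normalization f_e is the identity (w_km = i_km); with it, each
    weight depends on the whole prototype (w_km = f_e(i_k1 ... i_k4)).
    """
    w = np.asarray(prototype, dtype=float)
    if normalize:
        w = w / np.linalg.norm(w)  # unit-length weight vector
    return w

# Prototype of class(k) and its encoded weights w_k1..w_k4
print(encode_prototype([2.0, 0.0, 1.0, 2.0]))  # [0.667 0.    0.333 0.667]
```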
Network Types for Clustering
Winner-Take-All Networks
–Hamming Networks
–Maxnet
–Simple Competitive Learning Networks
Topologically Organized Networks
–Winner & its neighbors “take some”
Hamming Networks
Given: a set of m patterns, P, from an n-dimensional input space, S.
Create: a network with n input nodes and m simple linear output nodes (one per pattern), where the incoming weights to the output node for pattern p are based on the n features of p.
i_pj = jth input bit of the pth pattern, with i_pj = 1 or -1. Set w_pj = i_pj / 2. Also include a threshold input of -n/2 at each output node.
Testing: enter an input pattern, I, and use the network to determine which member of P is closest to I. Closeness is based on the Hamming distance (# of non-matching bits in the two patterns).
Given input I, the output value of the output node for pattern p = sum over j of (w_pj * I_j) - n/2 = the negative of the Hamming distance between I and p.
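A minimal Python sketch of the construction and scoring (the names hamming_net and hamming_outputs are assumptions; the w_pj = i_pj/2 weights and the -n/2 threshold follow the slide):

```python
import numpy as np

def hamming_net(patterns):
    """Build the Hamming network for an m x n array of +1/-1 patterns."""
    P = np.asarray(patterns, dtype=float)
    W = P / 2.0                # w_pj = i_pj / 2, one row of weights per output node
    theta = -P.shape[1] / 2.0  # threshold input of -n/2 at each output node
    return W, theta

def hamming_outputs(W, theta, I):
    """Output of node p = sum_j w_pj * I_j + theta = -HammingDistance(I, p)."""
    return W @ np.asarray(I, dtype=float) + theta
```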
Hamming Networks (2)
Proof (that the output of output node p is the negative of the Hamming distance between p and input vector I):
Assume: k bits match. Then: n - k bits do not match, and n - k is the Hamming distance.
And: the output value of p’s output node is built from
–k matches, where each match gives (1)(1) or (-1)(-1) = 1, contributing 1/2 per match through the weights w_pj = i_pj/2, and
–n - k mismatches, where each gives (-1)(1) or (1)(-1) = -1, contributing -1/2 per mismatch.
So output(p) = k/2 - (n - k)/2 - n/2 = (2k - n)/2 - n/2 = k - n = -(n - k) = the negative of the Hamming distance.
The pattern p* with the largest negative Hamming distance to I is thus the pattern with the smallest Hamming distance to I (i.e. the nearest to I). Hence, the output node that represents p* will have the highest output value of all output nodes: it wins!
Hamming Network Example
P = {(1 1 1), (-1 -1 -1), (1 -1 1)} = 3 patterns of length 3
[Figure: input nodes i_1, i_2, i_3 fully connected to output nodes p_1, p_2, p_3, with weights of 1/2 or -1/2 and a threshold input of -n/2 = -3/2 at each output node.]
Given: input pattern I = (-1 1 1)
Output(p_1) = -1/2 + 1/2 + 1/2 - 3/2 = -1 (Winner)
Output(p_2) = 1/2 - 1/2 - 1/2 - 3/2 = -2
Output(p_3) = -1/2 - 1/2 + 1/2 - 3/2 = -2
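The same numbers can be checked with a short standalone snippet (the second pattern (-1 -1 -1) is reconstructed from the output computations above):

```python
import numpy as np

# Patterns from the slide; the second pattern is reconstructed as (-1 -1 -1).
P = np.array([[ 1,  1,  1],
              [-1, -1, -1],
              [ 1, -1,  1]], dtype=float)
I = np.array([-1, 1, 1], dtype=float)

W = P / 2.0                       # w_pj = i_pj / 2
outputs = W @ I - P.shape[1] / 2  # subtract the n/2 threshold
print(outputs)                    # [-1. -2. -2.] -> p_1 wins
```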
Simple Competitive Learning
Combination: Hamming-like net + Maxnet, with learning of the input-to-output weights.
–Inputs can be real-valued, not just 1 / -1, so the distance metric is actually Euclidean or Manhattan, not Hamming.
Each output node represents a centroid for the input patterns it wins on.
Learning: the winner node’s incoming weights are updated to move closer to the input vector.
[Figure: Euclidean-distance layer followed by a Maxnet.]
Winning & Learning
“Winning isn’t everything… it’s the ONLY thing” - Vince Lombardi
Only the incoming weights of the winner node are modified.
Winner = the output node whose incoming weight vector has the shortest Euclidean distance to the input vector:
d(I, w_k) = sqrt(sum over m of (i_m - w_km)^2) = Euclidean distance from input vector I to the vector represented by output node k’s incoming weights.
Update formula: if j is the winning output node, w_jm <- w_jm + eta * (i_m - w_jm), i.e. the winner’s weight vector is moved a fraction eta toward I.
Note: The use of real-valued inputs & Euclidean distance means that the simple product of weights and inputs does not correlate with “closeness” as in binary networks using Hamming distance.
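A compact Python sketch of this training loop (the function names, epoch structure, and random initialization are assumptions; the winner selection and the w_j <- w_j + eta*(I - w_j) update follow the slide):

```python
import numpy as np

def scl_train(inputs, n_clusters, eta=0.5, epochs=10, rng=None):
    """Simple competitive learning: only the winner's weights move toward the input."""
    rng = np.random.default_rng(rng)
    X = np.asarray(inputs, dtype=float)
    W = rng.random((n_clusters, X.shape[1]))       # initial random weight vectors
    for _ in range(epochs):
        for x in X:
            dists = np.linalg.norm(W - x, axis=1)  # Euclidean distance to each node
            j = np.argmin(dists)                   # winner = closest weight vector
            W[j] += eta * (x - W[j])               # move the winner toward the input
    return W

def scl_clusters(W, inputs):
    """Assign each input to the cluster of its nearest weight vector."""
    X = np.asarray(inputs, dtype=float)
    return [int(np.argmin(np.linalg.norm(W - x, axis=1))) for x in X]
```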
SCL Examples (1)
6 Cases: (0 1 1), ( ), ( ), ( ), ( ), (0 0 0)
Learning Rate: 0.5
Initial Randomly-Generated Weight Vectors: [ ], [ ], [ ] (hence, there are 3 classes to be learned)
Training on Input Vectors
Input vector # 1: [ ]  Winning weight vector # 1: [ ]  Distance: 0.41  Updated weight vector: [ ]
Input vector # 2: [ ]  Winning weight vector # 3: [ ]  Distance: 0.50  Updated weight vector: [ ]
SCL Examples (2)
Input vector # 3: [ ]  Winning weight vector # 2: [ ]  Distance: 0.86  Updated weight vector: [ ]
Input vector # 4: [ ]  Winning weight vector # 2: [ ]  Distance: 0.27  Updated weight vector: [ ]
Input vector # 5: [ ]  Winning weight vector # 2: [ ]  Distance: 0.25  Updated weight vector: [ ]
Input vector # 6: [ ]  Winning weight vector # 2: [ ]  Distance: 0.83  Updated weight vector: [ ]
Weight Vectors after epoch 1: [ ], [ ], [ ]
SCL Examples (3)
Clusters after epoch 1:
Weight vector # 1: [ ]
  Input vector # 1: [ ]
Weight vector # 2: [ ]
  Input vector # 3: [ ]
  Input vector # 4: [ ]
  Input vector # 5: [ ]
  Input vector # 6: [ ]
Weight vector # 3: [ ]
  Input vector # 2: [ ]
Weight Vectors after epoch 2: [ ], [ ], [ ]
Clusters after epoch 2: unchanged.
SCL Examples (4)
6 Cases: ( ), ( ), ( ), (1 1 1), ( ), ( )
Other parameters: initial weights drawn from the set { }; learning rate: 0.5; # epochs: 10
Run the same case twice, but with different initial randomly-generated weight vectors.
The clusters formed are highly sensitive to the initial weight vectors.
SCL Examples (5)
Initial Weight Vectors: [ ], [ ], [ ]  * All weights are medium to high
Clusters after 10 epochs:
Weight vector # 1: [ ]
  Input vector # 3: [ ]
  Input vector # 6: [ ]
Weight vector # 2: [ ]
  (no input vectors)
Weight vector # 3: [ ]
  Input vector # 1: [ ]
  Input vector # 2: [ ]
  Input vector # 4: [ ]
  Input vector # 5: [ ]
** Weight vector # 3 is the big winner & # 2 loses completely!!
SCL Examples (6)
Initial Weight Vectors: [ ], [ ], [ ]  * Better balance of initial weights
Clusters after 10 epochs:
Weight vector # 1: [ ]
  Input vector # 1: [ ]
  Input vector # 2: [ ]
Weight vector # 2: [ ]
  Input vector # 4: [ ]
  Input vector # 5: [ ]
Weight vector # 3: [ ]
  Input vector # 3: [ ]
  Input vector # 6: [ ]
** 3 clusters of equal size!!
Note: All of these SCL examples were run by a simple piece of code that had NO neural-net model, but merely a list of weight vectors that was continually updated.
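Reusing the scl_train / scl_clusters sketch from the Winning & Learning slide, the sensitivity to the initial weights can be illustrated as below; since the slide's actual input vectors were lost (only (1 1 1) survives), the data here are placeholders:

```python
import numpy as np

# Placeholder inputs: only (1 1 1) is known from the slide; the rest are illustrative.
X = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 1],
              [1, 0, 1], [0, 1, 1], [0, 0, 0]], dtype=float)

# Two runs that differ only in the random initial weight vectors (seeds are arbitrary).
for seed in (0, 1):
    W = scl_train(X, n_clusters=3, eta=0.5, epochs=10, rng=seed)
    print(f"seed {seed}: cluster assignments = {scl_clusters(W, X)}")
```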
SCL Variations: Normalized Weight & Input Vectors
To minimize the distance ||I - W||, maximize the dot product I . W:
||I - W||^2 = ||I||^2 + ||W||^2 - 2 (I . W) = 2 - 2 (I . W) when I and W are normal vectors (i.e. length = 1), and I . W = cos(a), where a is the angle between I and W.
Normalization of I: I <- I / ||I||. W is normalized similarly.
So by keeping all vectors normalized, the dot product of the input and weight vector is the cosine of the angle between the two vectors, and a high value means a small angle (i.e., the vectors are nearly the same). Thus, the node with the highest net value is the nearest neighbor to the input.
Due to normalization, Distance = ||I - W|| = sqrt(2 - 2 (I . W)).
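A small numerical check of this equivalence (random unit vectors; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.random((3, 4))                            # 3 weight vectors
I = rng.random(4)                                 # one input vector

W = W / np.linalg.norm(W, axis=1, keepdims=True)  # normalize to unit length
I = I / np.linalg.norm(I)

nets = W @ I                                      # dot products = cos(angle)
dists = np.linalg.norm(W - I, axis=1)             # Euclidean distances

# For unit vectors, ||I - w||^2 = 2 - 2*(I . w): highest net = nearest neighbor.
assert np.argmax(nets) == np.argmin(dists)
print(np.allclose(dists, np.sqrt(2 - 2 * nets)))  # True
```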
Maxnet
Simple network to find the node with the largest initial input value.
Topology: a clique with self-arcs, where all self-arcs have a positive (excitatory) weight theta, and all other arcs have a small negative (inhibitory) weight -epsilon.
Nodes: have transfer function f_T = max(sum, 0)
Algorithm:
–Load initial values into the clique
–Repeat: synchronously update all node values via f_T
–Until: all but one node has a value of 0
Winner = the non-zero node
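A minimal Python sketch of this update loop (the name maxnet and the iteration cap are assumptions; theta is the self-arc weight and -epsilon the mutual inhibition from the slide):

```python
import numpy as np

def maxnet(values, epsilon, theta=1.0, max_iters=100):
    """Synchronous Maxnet updates: self-weight theta, mutual inhibition -epsilon."""
    a = np.asarray(values, dtype=float)
    for _ in range(max_iters):
        # new_i = f_T(theta * a_i - epsilon * (sum of the other activations))
        a = np.maximum(theta * a - epsilon * (a.sum() - a), 0.0)
        if np.count_nonzero(a) <= 1:
            break
    return int(np.argmax(a)), a
```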
Maxnet Examples
Input values: (1, 2, 5, 4, 3) with epsilon = 1/5 and theta = 1
Input values: (1, 2, 5, 4.5, 4.7) with epsilon = 1/5 and theta = 1
[Per-iteration node values omitted. In both runs the node starting at 5 ends up as the only non-zero node; the second run, whose initial values are closer together, takes more iterations to settle. At the stable attractor the winner no longer changes, since its update is (1)(value) - (0.2)(0) = value once all other nodes are 0.]
Maxnet Examples (2)
Input values: (1, 2, 5, 4, 3) with epsilon = 1/10 and theta = 1
Input values: (1, 2, 5, 4, 3) with epsilon = 1/2 and theta = 1
* Epsilon > 1/n with theta = 1 => too much inhibition. [Per-iteration node values omitted: with epsilon = 1/10 the node starting at 5 survives as the stable attractor, but with epsilon = 1/2 every node is driven to 0 on the first update, so no single winner emerges.]
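The runs on these two slides can be reproduced with the maxnet sketch above (a usage example; the printed values are not taken from the slides):

```python
import numpy as np

# Reusing the maxnet() sketch defined earlier.
for eps in (1/5, 1/10, 1/2):
    winner, final = maxnet([1, 2, 5, 4, 3], epsilon=eps, theta=1.0)
    # An all-zero final vector (as with epsilon = 1/2) means inhibition wiped out every node.
    print(f"epsilon = {eps:.2f}: final activations = {np.round(final, 3)}")
```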