Presentation on theme: "Unsupervised learning"— Presentation transcript:

1 Unsupervised learning
Unsupervised learning is the process of finding structure, patterns or correlations in the given data. We distinguish:
- Unsupervised Hebbian learning
  - Principal component analysis (PCA)
- Unsupervised competitive learning (UCL)
  - Clustering
  - Data compression
PCA is not treated here; Oja's rule performs normalized Hebbian learning: $w_i(n+1) = w_i(n) + \eta\, y(n)\,(x_i(n) - y(n)\, w_i(n))$. UCL is treated in what follows.
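As an aside, a minimal Python sketch of Oja's rule as stated above; the learning rate, the generated data, and the single pass over the samples are illustrative assumptions, not part of the slides.

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of Oja's rule: w <- w + eta * y * (x - y * w), with y = w . x."""
    y = np.dot(w, x)
    return w + eta * y * (x - y * w)

# Illustrative use: on zero-mean data, w converges towards the first
# principal component (the dominant eigenvector of the covariance matrix).
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])  # stretched Gaussian cloud
w = rng.normal(size=2)
for x in data:
    w = oja_update(w, x)
print(w / np.linalg.norm(w))  # roughly (+/-1, 0)
```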

2 Unsupervised Competitive Learning
In unsupervised competitive learning the neurons take part in a competition for each input. The winner of the competition, and sometimes some other neurons as well, is allowed to change its weights.
- In simple competitive learning only the winner is allowed to learn (change its weights).
- In self-organizing maps (SOMs) other neurons in the neighborhood of the winner may also learn; SOMs also retain topological properties of the data.
The competition requires lateral inhibition: output neurons that fire inhibit the others from firing. If a single neuron fires, this is a form of classification. Each class determines a cluster of input vectors, and the weight vector associated with each output neuron can be interpreted as representing the whole cluster. Question: how many clusters does the data contain?

3 Applications
- Speech recognition
- OCR, e.g. handwritten characters
- Image compression (using code-book vectors)
- Texture maps
- Classification of cloud patterns (cumulus etc.)
- Contextual maps

4 Network topology
For simple competitive learning the network consists of a single layer of linear neurons, each connected to all inputs. (Lateral inhibition between the output neurons is not indicated in the figure.)

5 Definition of the Winner
There are various criteria to define which neuron i becomes the winner of the competition for input x: for instance the neuron whose weight vector $w_i$ has the largest inner product $w_i^T x$, or the neuron whose weight vector is closest to x in Euclidean distance. When the weights are normalized these criteria are identical, as can be seen from the equation $\|x - w_i\|^2 = \|x\|^2 - 2\, w_i^T x + \|w_i\|^2$: for fixed $\|w_i\|$, minimizing the distance amounts to maximizing the inner product.
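A minimal sketch of both winner criteria in Python, assuming the weight vectors are stored as the rows of a matrix W (the function names win and win_dot are illustrative):

```python
import numpy as np

def win(W, x):
    """Winner by Euclidean distance: the neuron whose weight row is closest to x."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def win_dot(W, x):
    """Winner by inner product; equivalent to win() when all rows of W have equal norm."""
    return int(np.argmax(W @ x))
```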

6 Training Set
A training set for unsupervised learning consists only of input vectors (no targets!). Given a network with weight matrix W, the training set can be partitioned into clusters $X_i$ according to the classification made by the network: $X_i = \{\, x \mid \mathrm{win}(W, x) = i \,\}$.

7 Simple Competitive Learning (incremental version)
For each input x(n) only the winner i = win(W, x(n)) reinforces its weights (compare the perceptron rule): $w_i(n+1) = w_i(n) + \alpha\,(x(n) - w_i(n))$, with learning rate $0 < \alpha < 1$. This technique is sometimes called 'winner takes all'.
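A minimal Python sketch of this incremental rule; the initialization from random training samples, the fixed number of epochs, and the constant learning rate are illustrative assumptions:

```python
import numpy as np

def scl_incremental(X, k, alpha=0.1, epochs=20, seed=0):
    """Simple competitive learning, incremental ('winner takes all') version."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial weights
    for _ in range(epochs):
        for x in X:
            i = np.argmin(np.linalg.norm(W - x, axis=1))  # winner for this input
            W[i] += alpha * (x - W[i])                     # move winner towards x
    return W
```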

8 Convergence (incremental version)
Unless the learning parameter tends to 0, the incremental version of simple competitive learning does not converge. In the absence of convergence the weight vectors oscillate around the centers of their clusters.

9 Simple Competitive Learning (batch version)

10 Cluster Means
Let $n_i$ be the number of elements in cluster $X_i$. Then we define the mean $m_i$ of cluster i by $m_i = \frac{1}{n_i} \sum_{x \in X_i} x$. Hence in the batch version the weight update is given by $w_i(n+1) = w_i(n) + \alpha\,(m_i - w_i(n))$, so the weights of the winning neuron are moved in the direction of the mean of its cluster.
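A minimal Python sketch of the batch version described above; the initialization, the epoch count, and the handling of empty (dead) clusters are illustrative assumptions:

```python
import numpy as np

def scl_batch(X, k, alpha=0.5, epochs=20, seed=0):
    """Simple competitive learning, batch version: winners move towards cluster means."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        # Partition the training set according to the current winners.
        winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
        for i in range(k):
            cluster = X[winners == i]
            if len(cluster) > 0:                          # leave dead neurons untouched
                W[i] += alpha * (cluster.mean(axis=0) - W[i])
    return W
```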

11 Data Compression
The final values of the weight vectors are sometimes called code-book vectors. This nomenclature stems from data compression applications:
- Compress (encode): map vector x to the code-word i = win(W, x).
- Decompress (decode): map code-word i to the code-book vector $w_i$, which is presumably close to the original vector x.
Note that this is a form of lossy data compression. The compression serves to speed up data transport; it is not encryption.

12 Convergence (batch version)
In the batch version of simple competitive learning the weight vector $w_i$ can be shown to converge to the mean of the input vectors that have i as winning neuron. In fact the batch version is a gradient descent method that converges to a local minimum of a suitably chosen error function.

13 Error Function
For a network with weight matrix W and training set $\{x(p)\}$ we define the error function E(W) by $E(W) = \frac{1}{2} \sum_i \sum_{x \in X_i} \|x - w_i\|^2$. Let $E_i(W) = \frac{1}{2} \sum_{x \in X_i} \|x - w_i\|^2$, then $E(W) = \sum_i E_i(W)$.
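A small Python helper that evaluates this error function for a given weight matrix and training set (a sketch under the squared-distance form given above):

```python
import numpy as np

def error(W, X):
    """E(W): half the sum of squared distances from each input to its winning weight."""
    winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
    return 0.5 * float(np.sum((X - W[winners]) ** 2))
```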

14 Gradients of the Error Function
Because $\nabla_{w_i}\, \tfrac{1}{2}\|x - w_i\|^2 = -(x - w_i)$, it follows that the gradient of the error in the i-th cluster is given by $\nabla_{w_i} E_i(W) = -\sum_{x \in X_i} (x - w_i) = n_i\,(w_i - m_i)$.

15 Minima are Cluster Means
After termination of the learning algorithm all gradients are zero, i.e. for all i, $1 \le i \le k$, $n_i\,(w_i - m_i) = 0$. So after learning, the weight vectors of the non-empty clusters have converged to the mean vectors of those clusters. Note that learning stops in a local minimum, so better clusterings may exist.

16 Dead neurons & minima
A one-dimensional example (the original table, with columns input, w1, w2, E and winner, is only partly recoverable from the transcript): for inputs 0, 2 and 7, if w2 lies so far from the data that it never wins (in particular |w2 - 7| > |w1 - 7|), then neuron 2 is dead. In that case w1 converges to the overall mean (0+2+7)/3 and the error stays above 11, which is only a local minimum; the global minimum, in which both neurons win part of the inputs, has a smaller error.

17 K-means clustering as SCL
K-means clustering is a popular statistical method to organize multi-dimensional data into K groups. K-means clustering can be seen as an instance of simple competitive learning, where each neuron has its own learning rate.

18 SCL (batch version)

19 Move learning factor outside repetition

20 Set individual learning rate

21 Split such that

22 Eliminate

23 Introduce separate cluster variables

24 Reuse mj: K-means clustering I

25 K-means Clustering II
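Slides 18-25 transform batch SCL step by step into K-means clustering; the transformed algorithm itself did not survive in the transcript, so the following Python sketch shows a standard K-means loop consistent with that derivation (initialization and stopping test are illustrative choices):

```python
import numpy as np

def k_means(X, k, seed=0):
    """K-means: alternate between assigning vectors to the nearest mean and
    recomputing each cluster mean, until the means no longer move."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial means
    while True:
        labels = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
        new_m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
                          for i in range(k)])
        if np.allclose(new_m, m):
            return m, labels
        m = new_m
```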

26 Convergence of K-means Clustering
The convergence proof of the K-means clustering algorithm involves showing two facts:
- Reassigning a vector to a different cluster does not increase the error function.
- Updating the mean of a cluster does not increase the error function.

27 Reassigning a vector
Assume vector x(p) moves from cluster j to cluster i. The reassignment only happens because x(p) is closer to $m_i$ than to $m_j$, so it follows that $\tfrac{1}{2}\|x(p) - m_i\|^2 \le \tfrac{1}{2}\|x(p) - m_j\|^2$. Hence the contribution of x(p) to the error does not increase, while all other contributions are unchanged, so E does not increase.

28 Updating the Mean of a Cluster
Consider cluster $X_i$ with old mean $m_i^{\mathrm{old}}$ and new mean $m_i^{\mathrm{new}} = \frac{1}{n_i}\sum_{x \in X_i} x$. Since the mean minimizes the sum of squared distances within the cluster, replacing the old mean by the new one does not increase the error function.
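This last step can be justified with the standard decomposition below (a sketch in the notation introduced above):

```latex
\[
\sum_{x \in X_i} \|x - m_i^{\mathrm{old}}\|^2
  = \sum_{x \in X_i} \|x - m_i^{\mathrm{new}}\|^2
    + n_i\,\|m_i^{\mathrm{new}} - m_i^{\mathrm{old}}\|^2
  \;\ge\; \sum_{x \in X_i} \|x - m_i^{\mathrm{new}}\|^2 .
\]
```

The cross term drops out because $\sum_{x \in X_i} (x - m_i^{\mathrm{new}}) = 0$ by definition of the mean.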

29 Non-optimal Stable Clusters
[Figure: a one-dimensional example with the points 3, 6, 8 and 10 (apparently the input points), illustrating a clustering that is stable but not optimal.]

