Radial Basis Function ANN, an alternative to back propagation, uses clustering of examples in the training set.

Example of a Radial Basis Function (RBF) network (figure): an input vector of d dimensions feeds K radial basis functions in the hidden layer, which connect to a single output. This structure is used for multivariate regression or binary classification.

Review: the RBF network provides an alternative to back propagation. Each hidden node is associated with a cluster of input instances, and the hidden layer is connected to the output by linear least squares. Gaussians are the most frequently used radial basis function, φ_j(x) = exp(−½ (‖x − μ_j‖/σ_j)²). Clusters of input instances are parameterized by a mean and a variance.
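As a concrete illustration (not from the slides), a minimal NumPy sketch of the hidden-layer computation; `X`, `centers`, and `sigmas` are hypothetical placeholders for the data, the cluster means, and the cluster widths:

```python
import numpy as np

def rbf_design_matrix(X, centers, sigmas):
    """Gaussian RBF activations: D[t, j] = exp(-0.5 * (||x_t - mu_j|| / sigma_j)**2)."""
    # X: (N, d) instances, centers: (K, d) cluster means, sigmas: (K,) cluster spreads
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K) distances
    return np.exp(-0.5 * (dists / sigmas) ** 2)
```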

Linear least squares with basis functions: given the training set and the mean and variance of K clusters of input data, construct the N×K matrix D of basis-function activations and the column vector r of targets. Add a column of ones to include a bias node. Solve the normal equations DᵀD w = Dᵀr for the vector w of weights (one per hidden node, plus the bias) connecting the hidden layer to the output node.
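A hedged sketch of this step in NumPy; the bias column and the `lstsq` solver are my choices, not prescribed by the slide:

```python
import numpy as np

def rbf_output_weights(D, r):
    """Least-squares output weights: solves the normal equations D^T D w = D^T r."""
    D1 = np.hstack([D, np.ones((D.shape[0], 1))])   # append a column of ones for the bias node
    w, *_ = np.linalg.lstsq(D1, r, rcond=None)      # numerically safer than inverting D^T D
    return w
```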

RBF networks perform best with large datasets. With a large dataset, expect redundancy (i.e., multiple examples expressing the same general pattern). In an RBF network, the hidden layer is a feature-space representation of the data in which this redundancy has been used to reduce noise. A validation set may be helpful for determining K, the best number of clusters of input data.

Background on clustering (based on Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © The MIT Press). Supervised learning maps input to output; unsupervised learning finds regularities in the input. The regularities reflect some probability distribution of the attribute vectors, p(x^t), and discovering p(x^t) is called "density estimation"; a parametric method uses MLE to find θ in p(x^t | θ). In clustering we look for regularities as group membership: assume we know the number of clusters, K; given K and the dataset X, we want to find the size of each group, P(G_i), and its component density, p(x | G_i).

K-Means Clustering: hard labels. Define trial centers by reference vectors m_j, j = 1…K. Find group labels using the geometric interpretation of a cluster: the points in attribute space closer to that cluster's "center" than to any other center. Assign each point a group label based on its nearest center, compute new trial centers from the labeled points, and judge convergence by whether the centers (equivalently, the reconstruction error) stop changing.

K-means clustering pseudo code (figure; see the sketch below).
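The pseudo code on the slide is an image; the sketch below is a NumPy reconstruction of the same hard-label loop (initialization by random sampling is an assumption):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: X is (N, d); returns centers (K, d) and hard labels (N,)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # random initial centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # E-step: label each instance with its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: move each center to the mean of the instances labeled with it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(K)])
        if np.allclose(new_centers, centers):                # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels
```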

Example of the pseudo code applied to a dataset (figure).

Example of K-means with arbitrary starting centers, and the corresponding convergence plot (figures).

K-means is an example of the Expectation-Maximization (EM) approach to MLE. The log likelihood of a mixture model cannot be solved analytically for the parameters Φ, so a 2-step iterative method is used. E-step: estimate the labels of x^t given the current knowledge of the mixture components. M-step: update the component knowledge using the labels from the E-step.

K-means clustering pseudo code with the E-step and M-step labeled (figure).

Application of K-means clustering to RBF-ANN: given converged K-means centers, estimate the variance for the RBFs by σ² = d²_max / (2K), where d_max is the largest distance between cluster centers. Gaussian mixture theory is another approach to getting RBFs.
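Tying the steps together, a sketch of the whole pipeline under the same assumptions, reusing the hypothetical helpers `kmeans`, `rbf_design_matrix`, and `rbf_output_weights` from the earlier sketches:

```python
import numpy as np

def train_rbf(X, r, K):
    """K-means centers, a shared width from the d_max heuristic, then linear least squares."""
    centers, _ = kmeans(X, K)
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)  # largest center-to-center distance
    sigma = np.sqrt(d_max ** 2 / (2 * K))          # sigma^2 = d_max^2 / (2K), one width for all basis functions
    D = rbf_design_matrix(X, centers, np.full(K, sigma))
    w = rbf_output_weights(D, r)                   # hidden-to-output weights, including the bias
    return centers, sigma, w
```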

Gaussian Mixture Densities: the distribution of attributes is a mixture of Gaussians. X = {x^t} is made up of K groups (clusters); P(G_i) is the proportion of X in group i; the attributes in each group are Gaussian distributed, p(x^t | G_i) = N_d(μ_i, Σ_i), where μ_i is the mean of the x^t in group i and Σ_i is their covariance matrix.

Estimators: given a group label r_i^t for each data point, MLE provides estimates of the parameters of the Gaussian mixture, Φ = {P(G_i), μ_i, Σ_i}, i = 1…K, where p(x | G_i) ~ N(μ_i, Σ_i).
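The estimator formulas on this slide appear only as images; the standard textbook forms for hard labels r_i^t ∈ {0, 1} are:

```latex
\hat{P}(G_i) = \frac{\sum_t r_i^t}{N}, \qquad
\mathbf{m}_i = \frac{\sum_t r_i^t\, \mathbf{x}^t}{\sum_t r_i^t}, \qquad
\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^\top}{\sum_t r_i^t}
```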

1D Gaussian distribution: p(x) = N(μ, σ²), with MLE estimates for μ and σ² (figure shows the density with μ and σ marked; formulas below).
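The density and the MLE formulas here are images on the slide; the standard forms are:

```latex
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad
m = \frac{1}{N}\sum_t x^t, \qquad
s^2 = \frac{1}{N}\sum_t (x^t - m)^2
```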

d-dimensional Gaussian distribution (d denotes the number of attributes). Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ), analogous to (x − μ)²/σ² in 1D. Here x − μ is a d×1 column vector and Σ is a d×d matrix, so the M-distance is a scalar; it measures the distance of x from the mean in units of σ.
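The d-dimensional density itself is shown only as an image; the standard form, built from the Mahalanobis distance above, is:

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
```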

If the attributes x_i are independent, the off-diagonal elements of Σ are 0 and p(x) is the product of the probabilities of the individual components of x.

Gaussian mixture model by EM: soft labels. Replace the hard labels r_i^t by soft labels h_i^t, the probability that x^t belongs to cluster i. Assuming the cluster densities p(x^t | Φ) are Gaussian, the mixture proportions, means, and covariance matrices are estimated by the same weighted averages as in the hard-label estimators, with r_i^t replaced by the soft labels h_i^t from the previous E-step.

Gaussian mixture model by EM: soft labels (continued). Initialize by k-means clustering; after a few iterations, use the centers m_i and the instances covered by each center to estimate the covariance matrices S_i and the mixture proportions π_i. From m_i, S_i, and π_i, calculate the soft labels h_i^t (E-step). Then calculate new proportions, centers, and covariances from the soft labels (M-step), use these to calculate new soft labels, and repeat; a sketch follows.
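The E-step and M-step formulas on this slide are images; below is a reconstruction as a NumPy sketch, using the standard responsibilities h_i^t = π_i N(x^t; m_i, S_i) / Σ_j π_j N(x^t; m_j, S_j). SciPy's `multivariate_normal` is my choice of density routine, and the small diagonal regularizer is an added safeguard, not part of the slide:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, centers, n_iter=50):
    """EM for a Gaussian mixture, initialized from K-means centers (K, d)."""
    N, d = X.shape
    K = len(centers)
    m = centers.copy()
    S = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)   # start every cluster from the overall covariance
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: soft labels h[t, i] proportional to pi_i * N(x_t; m_i, S_i)
        h = np.column_stack([pi[i] * multivariate_normal.pdf(X, m[i], S[i]) for i in range(K)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means, and covariances
        Nk = h.sum(axis=0)
        pi = Nk / N
        m = (h.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - m[i]
            S[i] = (h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return pi, m, S, h
```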

Comparison (figures): K-means with hard labels and centers marked, versus EM Gaussian mixtures with soft labels; contours show 1 standard deviation and colors show the mixture proportions.

K-means hard labels (figure).

Gaussian mixtures, soft labels (figure): data points are color coded by their greater soft label; contours show μ + σ of the Gaussian densities; the dashed contour is the "separating" curve P(G_1 | x) = 0.5; x marks each cluster mean. Outliers?

In applications of Gaussian mixtures to RBFs, the correlation of attributes is ignored and the diagonal elements of the covariance matrix are taken to be equal. In this approximation the Mahalanobis distance reduces to the Euclidean distance, and the variance parameter of each radial basis function becomes a scalar.
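Concretely, with Σ_i = σ_i² I the Mahalanobis distance reduces to a scaled Euclidean distance:

```latex
(\mathbf{x}-\boldsymbol{\mu}_i)^\top (\sigma_i^2 \mathbf{I})^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)
= \frac{\lVert \mathbf{x}-\boldsymbol{\mu}_i \rVert^2}{\sigma_i^2}
```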

Hierarchical Clustering: cluster based on similarities (distances). Distance measures between instances x^r and x^s include the Minkowski (L_p) distance (Euclidean for p = 2) and the city-block distance; the formulas are reconstructed below.
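The two distance formulas appear only as images on the slide; the standard definitions are:

```latex
d_{p}(\mathbf{x}^r, \mathbf{x}^s) = \Big(\sum_{j=1}^{d} \lvert x_j^r - x_j^s \rvert^{p}\Big)^{1/p}
\quad\text{(Minkowski; Euclidean for } p = 2\text{)}, \qquad
d_{cb}(\mathbf{x}^r, \mathbf{x}^s) = \sum_{j=1}^{d} \lvert x_j^r - x_j^s \rvert
```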

Agglomerative Clustering: start with N groups, each containing one instance, and merge the two closest groups at each iteration. Distance between two groups G_i and G_j: single-link uses the smallest distance over all possible pairs of instances (one from each group); complete-link uses the largest such distance; average-link uses the distance between centroids.
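A small usage sketch (assuming SciPy is available; not part of the slides) that builds single-, complete-, and average-link hierarchies and cuts each dendrogram at a chosen height:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))        # toy data: 20 instances in 2-D
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                        # bottom-up merge tree (dendrogram)
    labels = fcluster(Z, t=1.5, criterion="distance")    # cut the tree at height 1.5
    print(method, "->", len(set(labels)), "clusters")
```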

Example: single-linked clusters (dendrogram figure). At a height h between sqrt(2) and 2, the dendrogram has the 3 clusters shown on the data graph; at h > 2 the dendrogram shows 2 clusters, with c, d, and f forming one cluster at that distance.

Choosing K (how many clusters?): the choice is application specific. Plot the data (after PCA, for example) and check for clusters, or add clusters one at a time using a validation set.