CS623: Introduction to Computing with Neural Nets (lecture-15)


CS623: Introduction to Computing with Neural Nets (lecture-15). Pushpak Bhattacharyya, Computer Science and Engineering Department, IIT Bombay.

Finding weights for Hopfield net applied to TSP. An alternative and more convenient problem energy is Eproblem = E1 + E2, where E1 is the equation for the n cities (each city in exactly one position and each position holding exactly one city) and E2 is the equation for the tour distance.

Expressions for E1 and E2
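The transcript does not reproduce the slide's equations. A reconstruction that is consistent with the weights and thresholds derived later in this lecture (my assumption, not taken verbatim from the slides):

E_1 = \frac{A}{2}\left[\sum_i \Big(\sum_\alpha x_{i\alpha} - 1\Big)^2 + \sum_\alpha \Big(\sum_i x_{i\alpha} - 1\Big)^2\right]

E_2 = \frac{1}{2} \sum_i \sum_{j \neq i} \sum_\alpha d_{ij}\, x_{i\alpha} \left(x_{j,\alpha+1} + x_{j,\alpha-1}\right)

E1 is minimized when every row and every column of the x matrix contains exactly one 1; E2 then contributes (dij + dji)/2 for every pair of cities i, j adjacent on the tour (positions taken cyclically), i.e., the tour length when distances are symmetric.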

Explanatory example. Fig. 1 (not reproduced here) shows the two possible directions in which the tour of cities 1, 2, 3 can take place. The tour is encoded by the matrix below, with one row per city and one column per position; xiα = 1 if and only if the ith city is in position α.

         pos 1   pos 2   pos 3
city 1    x11     x12     x13
city 2    x21     x22     x23
city 3    x31     x32     x33

Expressions of Energy

Expressions (contd.)

Enetwork
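The slide's expression is again not in the transcript. The usual Hopfield energy, written here in the sign convention that makes the weight-reading rule of the next slides work (an assumption on my part):

E_{network} = -\frac{1}{2} \sum_{(i,\alpha) \neq (j,\beta)} w_{i\alpha,\,j\beta}\, x_{i\alpha} x_{j\beta} + \sum_{i,\alpha} \theta_{i\alpha}\, x_{i\alpha}

Matching coefficients with Eproblem then gives each weight as the negative of the coefficient of the corresponding product term, and each threshold as the coefficient of the corresponding linear term.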

Find row weight. To find w11,12 = -(coefficient of x11x12 in Enetwork), search for the coefficient of x11x12 in Eproblem: w11,12 = -A, coming from E1; E2 cannot contribute.

Find column weight. To find w11,21 = -(coefficient of x11x21 in Enetwork), search for the coefficient of x11x21 in Eproblem: w11,21 = -A, coming from E1; E2 cannot contribute.

Find cross weights. To find w11,22 = -(coefficient of x11x22 in Enetwork), search for x11x22 in Eproblem. E1 cannot contribute. The coefficient of x11x22 in E2 is (d12 + d21)/2, therefore w11,22 = -((d12 + d21)/2).

Find cross weights (contd.). To find w11,33 = -(coefficient of x11x33 in Enetwork), search for x11x33 in Eproblem: w11,33 = -((d13 + d31)/2).

Summary
Row weights = -A
Column weights = -A
Cross weights = -[(dij + dji)/2] for j = i ± 1; 0 for j > i + 1 or j < i - 1
Threshold = -2A
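As a concrete illustration, here is a minimal sketch (mine, not from the lecture) that builds the weight matrix and thresholds from the summary above. The flattened indexing of the (city, position) neurons and the cyclic treatment of positions are my assumptions.

```python
import numpy as np

def tsp_hopfield_weights(d, A=1.0):
    """d: n x n distance matrix. Returns the weight matrix W (n^2 x n^2)
    and the threshold vector theta, following the summary above."""
    n = d.shape[0]
    idx = lambda i, a: i * n + a                    # flatten (city i, position a)
    W = np.zeros((n * n, n * n))
    theta = np.full(n * n, -2.0 * A)                # threshold = -2A for every neuron
    for i in range(n):
        for a in range(n):
            for j in range(n):
                for b in range(n):
                    if (i, a) == (j, b):
                        continue
                    if i == j:                      # same row: same city, two positions
                        W[idx(i, a), idx(j, b)] = -A
                    elif a == b:                    # same column: two cities, one position
                        W[idx(i, a), idx(j, b)] = -A
                    elif (b - a) % n in (1, n - 1): # adjacent tour positions (cyclic)
                        W[idx(i, a), idx(j, b)] = -(d[i, j] + d[j, i]) / 2.0
                    # all other pairs keep weight 0
    return W, theta
```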

Interpretation of weights and thresholds. The row weight being negative causes the winner neuron to suppress the others: one 1 per row. The column weight being negative causes the winner neuron to suppress the others: one 1 per column. The threshold being -2A makes it possible for activations to sometimes be positive. For non-neighbour row and column neurons (j > i + 1 or j < i - 1), the weight is 0, because non-neighbouring cities should not influence the activations of the corresponding neurons. Cross weights, when non-zero, are proportional to the negative of the distance; this discourages cities that are far apart from being neighbours on the tour.

Can we compare Eproblem and Enetwork? E1 has square terms (xiα)² which evaluate to 1 or 0. It also has constants, again evaluating to 1. The sum of the square terms and constants is at most n × (n + 1) + n × (n + 1) = 2n(n + 1). Additionally, there are linear terms of the form const × xiα, which produce the thresholds of the neurons when equated with the linear terms in Enetwork.

Can we compare Eproblem and Enetwork? (contd.) This expression can contribute only product terms, which are equated with the product terms in Enetwork.

Can we compare Eproblem and Enetwork? (contd.) So yes, we CAN compare Eproblem and Enetwork: Eproblem <= Enetwork + 2n(n + 1). When the weight and threshold values are chosen by the described procedure, minimizing Enetwork implies minimizing Eproblem.

Principal Component Analysis

Purpose and methodology. Detect correlations in multivariate data. Given P variables in the multivariate data, introduce P principal components Z1, Z2, Z3, …, ZP. Find those components which are responsible for the biggest variation, retain only them, and thereby reduce the dimensionality of the problem.
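A minimal sketch of this recipe (my own, not from the slides), assuming standardized variables and an eigen-decomposition of the correlation matrix, which the later slides introduce:

```python
import numpy as np

def pca_top_components(X, k):
    """X: n x p data matrix. Projects the data onto the k principal
    components that account for the largest variation."""
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
    R = np.cov(Y, rowvar=False)                       # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]               # indices of the k largest
    return Y @ eigvecs[:, top]
```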

Example: IRIS data (only 3 of the 150 records shown)

ID    Sepal Length (a1)   Sepal Width (a2)   Petal Length (a3)   Petal Width (a4)   Classification
001   5.1                 3.5                1.4                 0.2                Iris-setosa
051   7.0                 3.2                4.7                 1.4                Iris-versicolor
101   6.3                 3.3                6.0                 2.5                Iris-virginica

Training and testing data. Training: 80% of the data, 40 records from each class, 120 in total. Testing: the remaining 30. Do we have to consider all 4 attributes for classification? Do we have to have 4 neurons in the input layer? Fewer neurons in the input layer may reduce the overall size of the network and thereby reduce training time. It is also likely to improve generalization performance (Occam's Razor hypothesis: a simpler hypothesis, i.e., a smaller neural net, generalizes better).
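A hedged sketch of the 120/30 split described above, using scikit-learn's bundled copy of the IRIS data (the lecture does not name any library; the stratified split is my assumption, chosen to give 40 training records per class):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                    # 150 records, 4 attributes, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, stratify=y, random_state=0)  # 40 per class train, 10 per class test
print(X_train.shape, X_test.shape)                   # (120, 4) (30, 4)
```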

The multivariate data

X1    X2    X3    X4    X5   …   Xp
x11   x12   x13   x14   x15  …   x1p
…
xn1   xn2   xn3   xn4   xn5  …   xnp

Some preliminaries. Sample mean vector: <µ1, µ2, µ3, …, µp>; for the ith variable, µi = (Σj=1..n xij) / n. Variance for the ith variable: σi² = [Σj=1..n (xij - µi)²] / (n - 1). Sample covariance: cab = [Σj=1..n (xaj - µa)(xbj - µb)] / (n - 1); this measures the correlation in the data. In fact, the correlation coefficient rab = cab / (σa σb).
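These formulas can be checked directly; a small sketch (the toy data and variable names are mine):

```python
import numpy as np

X = np.random.rand(10, 4)                          # toy data: n = 10 rows, p = 4 variables
n = len(X)
mu = X.mean(axis=0)                                # sample mean vector <mu_1, ..., mu_p>
var = ((X - mu) ** 2).sum(axis=0) / (n - 1)        # sigma_i^2, with the n - 1 divisor
cov = (X - mu).T @ (X - mu) / (n - 1)              # c_ab, the sample covariance matrix
corr = cov / np.outer(np.sqrt(var), np.sqrt(var))  # r_ab = c_ab / (sigma_a * sigma_b)
assert np.allclose(corr, np.corrcoef(X, rowvar=False))
```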

Standardize the variables. For each variable xij, replace the values by yij = (xij - µi) / σi.

Correlation Matrix
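A short check (mine, not from the slides) of why standardization leads to the correlation matrix of the next slide: the sample covariance of the standardized variables yij equals the correlation matrix of the original variables.

```python
import numpy as np

X = np.random.rand(10, 4)                              # toy data: n = 10, p = 4
Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # y_ij = (x_ij - mu_i) / sigma_i
assert np.allclose(np.cov(Y, rowvar=False),            # covariance of standardized data
                   np.corrcoef(X, rowvar=False))       # ... equals the correlation matrix
```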