Review for test #2: fundamentals of ANN, dimensionality reduction, genetic algorithms
HW #3 Boolean OR
Linear discriminant: wᵀx = x1 + x2 - 0.5 = 0
Classes are linearly separable.

x1 x2 | r | required         | choice
 0  0 | 0 | w0 < 0           | w0 = -0.5
 0  1 | 1 | w2 + w0 > 0      | w2 = 1
 1  0 | 1 | w1 + w0 > 0      | w1 = 1
 1  1 | 1 | w1 + w2 + w0 > 0 | satisfied: 1 + 1 - 0.5 > 0
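A quick numeric check of these choices (a minimal Python sketch):

```python
# Check the chosen weights against the OR truth table.
w0, w1, w2 = -0.5, 1.0, 1.0
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    g = w1 * x1 + w2 * x2 + w0             # discriminant value w^T x
    print((x1, x2), "-> r =", int(g > 0))  # positive side of the line -> class 1
```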
XOR in feature space with Gaussian kernels
f1 = exp(-||x - [1,1]ᵀ||²), f2 = exp(-||x - [0,0]ᵀ||²)
[Figure: the four inputs (0,0), (0,1), (1,0), (1,1) plotted in (f1, f2) feature space]
This transformation puts examples (0,1) and (1,0) at the same point in feature space.
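Evaluating both kernels at the four inputs confirms the collapse (a minimal sketch):

```python
import numpy as np

c1, c2 = np.array([1, 1]), np.array([0, 0])      # Gaussian kernel centers
phi = lambda x, c: np.exp(-np.sum((x - c) ** 2))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    f1, f2 = phi(np.array(x), c1), phi(np.array(x), c2)
    print(x, "->", (round(f1, 3), round(f2, 3)))
# (0, 1) and (1, 0) both map to (0.368, 0.368), so XOR
# becomes linearly separable in (f1, f2) space.
```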
Consider hidden units zh as features. Choose wh so that in feature space (0,0) and (1,1) are at the same point.
[Figure: the XOR problem in attribute space (x1, x2) and in feature space (z1, z2)]
Design criteria for hidden layer
x1 x2 | r | z1 z2
 0  0 | 0 | ~0 ~0
 0  1 | 1 | ~0 ~1
 1  0 | 1 | ~1 ~0
 1  1 | 0 | ~0 ~0

whᵀx < 0 → zh ~ 0;  whᵀx > 0 → zh ~ 1
Find weights for design criteria
For z1 (weight vector w1):
x1 x2 | z1 | w1ᵀx required | choice
 0  0 | ~0 | < 0           | w0 < 0: w0 = -0.5
 0  1 | ~0 | < 0           | w2 + w0 < 0: w2 = -1
 1  0 | ~1 | > 0           | w1 + w0 > 0: w1 = 1
 1  1 | ~0 | < 0           | w1 + w2 + w0 < 0: satisfied, 1 - 1 - 0.5 < 0

For z2 (weight vector w2):
x1 x2 | z2 | w2ᵀx required | choice
 0  0 | ~0 | < 0           | w0 < 0: w0 = -0.5
 0  1 | ~1 | > 0           | w2 + w0 > 0: w2 = 1
 1  0 | ~0 | < 0           | w1 + w0 < 0: w1 = -1
 1  1 | ~0 | < 0           | w1 + w2 + w0 < 0: satisfied, -1 + 1 - 0.5 < 0
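Verifying the chosen weights, with a hard threshold standing in for the sigmoid (a minimal sketch):

```python
import numpy as np

step = lambda a: int(a > 0)          # hard threshold in place of the sigmoid

# Weights [w1, w2, w0] for hidden units z1 and z2, from the tables above.
wz1 = np.array([1.0, -1.0, -0.5])
wz2 = np.array([-1.0, 1.0, -0.5])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([x1, x2, 1.0])      # append the bias input
    print((x1, x2), "-> z =", (step(wz1 @ x), step(wz2 @ x)))
# (0,0) and (1,1) both give z = (0, 0); in feature space,
# XOR reduces to the linearly separable function z1 OR z2.
```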
Training a neural network by back-propagation
Initialize weights randomly. How is the adjustment of the weights related to the difference between output and target?
Approaches to Training
Online: weights are updated from training-set examples seen one by one, in random order.
Batch: weights are updated from the whole training set, after summing the deviations from the individual examples.
How can we tell the difference?
Online or batch? Update = learning factor ∙ output error ∙ input
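What the two regimes look like for this update rule, for a single linear unit (a sketch; function and variable names are illustrative):

```python
import numpy as np

def online_epoch(w, X, R, eta=0.1):
    """Update after every example, visiting them in random order."""
    for t in np.random.permutation(len(X)):
        y = X[t] @ w                        # forward pass on one example
        w = w + eta * (R[t] - y) * X[t]     # learning factor * output error * input
    return w

def batch_epoch(w, X, R, eta=0.1):
    """Sum the per-example updates over the whole set, then apply once."""
    total = sum((R[t] - X[t] @ w) * X[t] for t in range(len(X)))
    return w + eta * total
```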
Multivariate nonlinear regression with multilayer perceptron
[Figure: forward and backward passes of the multilayer perceptron on input x]
Can you express Eᵗ as an explicit function of whj?
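Writing Eᵗ out for a regression MLP with sigmoid hidden units and linear outputs makes the whj dependence explicit (a sketch; names are illustrative):

```python
import numpy as np

sigmoid = lambda a: 1 / (1 + np.exp(-a))

def E_t(W, V, x, r):
    """Error on one example as an explicit function of the first-layer
    weights W (H x (d+1)) and the output weights V (K x (H+1))."""
    z = sigmoid(W @ np.append(x, 1.0))   # z_h = sigmoid(w_h^T x), bias appended
    y = V @ np.append(z, 1.0)            # y_i = sum_h v_ih z_h + v_i0
    return 0.5 * np.sum((r - y) ** 2)    # E^t = 1/2 sum_i (r_i^t - y_i^t)^2
```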
Batch mode
[Figure: forward and backward passes on input x, with the weight updates summed over the whole training set]
Why do some sums go and others stay?
[Figure: MLP with input xj, first-layer weight whj, hidden unit zh, second-layer weight vih, output yi]
Total error in the connections from zh to the outputs. Update = learning factor ∙ output error ∙ input
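The two online update rules side by side (a sketch under the regression setup above; names are illustrative). The sum over outputs i stays in the whj update because zh feeds every output yi; no such sum appears in the vih update, which touches only output yi.

```python
import numpy as np

def backprop_updates(W, V, x, r, eta=0.1):
    """One online step for a regression MLP with sigmoid hidden units."""
    x1 = np.append(x, 1.0)
    z = 1 / (1 + np.exp(-(W @ x1)))      # hidden activations z_h
    z1 = np.append(z, 1.0)
    y = V @ z1                           # linear outputs y_i
    err = r - y                          # one error term per output i
    dV = eta * np.outer(err, z1)         # dv_ih = eta (r_i - y_i) z_h: no sum over i
    # The sum over i stays here: z_h influences every output y_i.
    dW = eta * np.outer((V[:, :-1].T @ err) * z * (1 - z), x1)
    return dV, dW
```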
Back propagation for perceptron dichotomizer
Like the sum of squared residuals, cross entropy depends on the weights w through yᵗ. The dependence on yᵗ is more complex, with yᵗ = sigmoid(wᵀx). The same principles apply.
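For the dichotomizer, the gradient of the cross entropy with yᵗ = sigmoid(wᵀxᵗ) collapses to the familiar error-times-input form, because the sigmoid's derivative cancels (a sketch, batch form):

```python
import numpy as np

sigmoid = lambda a: 1 / (1 + np.exp(-a))

def cross_entropy_step(w, X, R, eta=0.1):
    """Gradient step on E = -sum_t [r log y + (1 - r) log(1 - y)].
    The update is again eta * sum_t (r^t - y^t) x^t."""
    Y = sigmoid(X @ w)
    return w + eta * X.T @ (R - Y)
```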
Review of protein biology
Central dogma of biology
Dogma on protein function
Proteins are polymers of amino acids. The sequence of amino acids determines a protein's shape (folding pattern). The shape of a protein determines its function. In natural selection, which changes faster: protein sequence or protein shape?
Chemical properties of amino acids
Which of the amino acids V and S is more likely to be found in the core of a protein structure?
Dimensionality Reduction by Auto-Association: the hidden layer is smaller than the input, and the output is required to reproduce the input. The network is trained to minimize the "reconstruction error".
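A minimal linear auto-associator trained by gradient descent on the reconstruction error (an illustrative sketch, not the lecture's exact network):

```python
import numpy as np

def train_autoassociator(X, k, eta=0.01, epochs=500):
    """Encode d-dimensional rows of X into k < d hidden units, decode back,
    and minimize the reconstruction error ||x - x_hat||^2."""
    rng = np.random.default_rng(0)
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(k, d))   # encoder: input -> hidden
    V = rng.normal(scale=0.1, size=(d, k))   # decoder: hidden -> output
    for _ in range(epochs):
        Z = X @ W.T                          # hidden code
        Xh = Z @ V.T                         # reconstruction
        E = Xh - X                           # reconstruction error
        V -= eta * E.T @ Z / len(X)
        W -= eta * (E @ V).T @ X / len(X)
    return W, V
```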
On validation and test sets the reconstruction error will not be zero. How do we make use of this?
Linear and Non-linear Data Smoothing
[Figure: example signals, blue original and red smoothed]
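Two small smoothers for illustration, one linear and one non-linear (the window size is an arbitrary choice):

```python
import numpy as np

def moving_average(y, window=5):
    """Linear smoothing: each point becomes the mean of its neighborhood."""
    return np.convolve(y, np.ones(window) / window, mode="same")

def moving_median(y, window=5):
    """Non-linear smoothing: the median resists outliers better than the mean."""
    pad = window // 2
    yp = np.pad(y, pad, mode="edge")
    return np.array([np.median(yp[i:i + window]) for i in range(len(y))])
```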
PCA brings back an old friend
Find w1 such that w1ᵀSw1 is maximum, subject to the constraint ||w1||² = w1ᵀw1 = 1.
Maximize L = w1ᵀSw1 + c(w1ᵀw1 - 1)
Gradient of L: 2Sw1 + 2cw1 = 0, so Sw1 = -cw1
Thus w1 is an eigenvector of the covariance matrix. Let c = -λ1; λ1 is the eigenvalue associated with w1.
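Numerically, w1 and λ1 come from an eigensolver rather than from the Lagrangian algebra (a sketch using numpy):

```python
import numpy as np

def first_principal_component(X):
    """w1 is the unit eigenvector of the covariance matrix S with the
    largest eigenvalue; eigh returns eigenvalues in ascending order."""
    S = np.cov(X, rowvar=False)
    lam, W = np.linalg.eigh(S)
    return lam[-1], W[:, -1]     # (lambda_1, w1), with ||w1|| = 1
```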
A simple example: constrained optimization using Lagrange multipliers
Find the stationary points of f(x1, x2) = 1 - x1² - x2² subject to the constraint g(x1, x2) = x1 + x2 - 1 = 0
Form the Lagrangian L(x, λ) = 1 - x1² - x2² + λ(x1 + x2 - 1)
Set the partial derivatives of L with respect to x1, x2, and λ equal to zero:
-2x1 + λ = 0
-2x2 + λ = 0
x1 + x2 - 1 = 0
Solve for x1 and x2. How? The first two equations give x1 = x2 = λ/2; substituting into the third gives λ = 1.
x1* = x2* = ½
[Figure: contours of f(x1, x2) with the constraint line x1 + x2 = 1]
In this case it is not necessary to find λ; λ is sometimes called an "undetermined multiplier".
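The same stationary point found symbolically, just to verify the algebra (a sketch using sympy):

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam")
L = 1 - x1**2 - x2**2 + lam * (x1 + x2 - 1)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam))
print(sol)   # {x1: 1/2, x2: 1/2, lam: 1}
```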
Application of characteristic polynomial
Calculate the eigenvalues of A = [3 1; 1 3]:
det(A - λI) = (3 - λ)(3 - λ) - 1 = λ² - 6λ + 8 = 0
By the quadratic formula, λ1 = 4 and λ2 = 2.
This is not a practical way to calculate the eigenvalues of S.
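For comparison, the numerical route one would actually use for S:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
print(np.linalg.eigvalsh(A))   # [2. 4.], matching the quadratic-formula roots
```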
In PCA, don’t confuse eigenvalues and principal components
Are these eigenvalues principal components?
λ = 2.8856, 1.9068, 0.7278, 0.5444, 0.4238, 0.3501, 0.1631, … (d = 9; k = 1, 2, …)
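They are not: the eigenvalues λk measure the variance along each principal direction, while the principal components are the corresponding eigenvectors wk. The eigenvalues are typically used to choose k through the proportion of variance (a sketch over the seven values listed):

```python
import numpy as np

lam = np.array([2.8856, 1.9068, 0.7278, 0.5444, 0.4238, 0.3501, 0.1631])
pov = np.cumsum(lam) / lam.sum()   # proportion of variance among the listed values
print(pov)                         # choose the smallest k with pov[k-1] ~ 0.9
```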
Data Projected onto the 1st and 2nd Principal Components
How was this figure constructed?
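In outline (a sketch; the dataset behind the figure is not shown here): center the data at its mean m, then project each point onto the eigenvectors with the two largest eigenvalues, zᵗ = Wᵀ(xᵗ - m).

```python
import numpy as np

def project_2d(X):
    """Coordinates of each row of X on the first two principal components."""
    m = X.mean(axis=0)
    lam, Wfull = np.linalg.eigh(np.cov(X, rowvar=False))
    W = Wfull[:, ::-1][:, :2]      # eigh sorts ascending; take the top two
    return (X - m) @ W             # n x 2 array, ready to scatter-plot
```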
Principal Components Analysis (PCA)
If w is a unit vector, then z = wᵀx is the projection of x in the direction of w. Note that z = wᵀx = xᵀw = w1x1 + w2x2 + … is a scalar.
Use projection to find a low-dimensional feature space where the essential information in the data is preserved. Accomplish this by finding features z such that Var(z) is maximal (i.e., spread the data out).
Method to select chromosomes for refinement
Calculate the fitness f(xi) for each chromosome in the population. Assign each chromosome a discrete probability pi = f(xi) / Σj f(xj). Use the pi to design a roulette wheel. How do we spin the wheel?
Spinning the roulette wheel
Divide the number line between 0 and 1 into segments of length pi, in a specified order.
Get r, a random number uniformly distributed between 0 and 1.
Choose the chromosome of the line segment containing r.
Similar draws decide crossover and mutation: crossover probability = 0.75, mutation probability = 0.002.
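A minimal sketch of the wheel and its spin (cumulative sums mark the segment boundaries; names are illustrative):

```python
import numpy as np

def roulette_select(fitness, rng):
    """Fitness-proportionate selection: segment i has length p_i = f_i / sum f."""
    p = fitness / fitness.sum()
    r = rng.random()                               # the spin: uniform on [0, 1)
    return int(np.searchsorted(np.cumsum(p), r))   # index of the segment holding r

rng = np.random.default_rng(0)
f = np.array([4.0, 1.0, 3.0, 2.0])
picks = [roulette_select(f, rng) for _ in range(10000)]
print(np.bincount(picks) / 10000)                  # approximately [0.4 0.1 0.3 0.2]
```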
Sigma scaling allows variable selection pressure
Sigma scaling of fitness f(x), where μ and σ are the mean and standard deviation of fitness in the population.
In early generations, selection pressure should be low to enable wider coverage of the search space (large σ).
In later generations, selection pressure should be higher to encourage convergence to the optimum solution (small σ).
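A sketch assuming the common sigma-scaling rule from Mitchell's GA text, ExpVal = 1 + (f - μ) / (2σ); large σ compresses fitness differences and small σ amplifies them:

```python
import numpy as np

def sigma_scaled(f):
    """Expected selection counts under sigma scaling (assumed common form)."""
    mu, sigma = f.mean(), f.std()
    if sigma == 0:
        return np.ones_like(f)      # all equally fit: no selection pressure
    return np.maximum(1 + (f - mu) / (2 * sigma), 0.1)  # floor keeps values positive
```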