1
ECE 5424: Introduction to Machine Learning
Topics: Neural Networks, Backprop. Readings: Murphy 16.5. Stefan Lee, Virginia Tech
2
Recap of Last Time (C) Dhruv Batra
3
Not linearly separable data
Some datasets are not linearly separable!
4
Addressing non-linearly separable data – Option 1, non-linear features
Choose non-linear features. Typical linear features: $w_0 + \sum_i w_i x_i$. Example of non-linear features (degree-2 polynomials): $w_0 + \sum_i w_i x_i + \sum_{i,j} w_{ij} x_i x_j$. The classifier $h_w(x)$ is still linear in the parameters $w$, so it is just as easy to learn, and the data becomes linearly separable in the higher-dimensional feature space. Can be expressed via kernels (a sketch of the feature expansion follows). (C) Dhruv Batra Slide Credit: Carlos Guestrin
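A minimal NumPy sketch (not from the slides; names are illustrative) of the degree-2 feature expansion described above. A classifier built on these features stays linear in its weights even though its decision boundary in the original space can be non-linear:

```python
import numpy as np

def degree2_features(X):
    # Map each row x to [1, x_i, x_i * x_j]: a degree-2 polynomial expansion.
    n, d = X.shape
    cols = [np.ones((n, 1)), X]
    for i in range(d):
        for j in range(i, d):
            cols.append((X[:, i] * X[:, j])[:, None])
    return np.hstack(cols)

# XOR-like points are not linearly separable in 2-D, but the x1*x2 cross
# term added by the expansion makes them separable in the larger space.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(degree2_features(X).shape)  # (4, 6): [1, x1, x2, x1^2, x1*x2, x2^2]
```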
5
Addressing non-linearly separable data – Option 2, non-linear classifier
Choose a classifier $h_w(x)$ that is non-linear in the parameters $w$, e.g., decision trees, neural networks, ... More general than linear classifiers, but often harder to learn (non-convex optimization required). Often very useful (outperforms linear classifiers). In a way, both ideas are related. (C) Dhruv Batra Slide Credit: Carlos Guestrin
6
Biological Neuron (C) Dhruv Batra
7
Recall: The Neuron Metaphor
Neurons accept information from multiple inputs and transmit information to other neurons. Multiply inputs by weights along edges; apply some function to the set of inputs at each node. Slide Credit: HKUST
8
Types of Neurons: Linear Neuron, Logistic Neuron, Perceptron; potentially more. Require a convex loss function for gradient descent training. Slide Credit: HKUST
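For concreteness, a tiny sketch (illustrative helper names, not from the slides) of the three neuron types named above; each computes a weighted sum of its inputs followed by a different output function:

```python
import numpy as np

def linear_neuron(w, b, x):
    # Output is the weighted sum itself (used for regression).
    return np.dot(w, x) + b

def logistic_neuron(w, b, x):
    # Weighted sum squashed through a sigmoid into (0, 1).
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def perceptron_neuron(w, b, x):
    # Hard threshold on the weighted sum; output is 0 or 1.
    return float(np.dot(w, x) + b > 0)

x, w, b = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 0.1
print(linear_neuron(w, b, x), logistic_neuron(w, b, x), perceptron_neuron(w, b, x))
```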
9
Limitation: a single “neuron” still yields only a linear decision boundary.
What to do? Idea: stack a bunch of them together! (C) Dhruv Batra
10
Multilayer Networks: cascade neurons together.
The output from one layer is the input to the next; each layer has its own set of weights (see the sketch below). Slide Credit: HKUST
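A minimal forward-pass sketch of such a cascade (illustrative only; the slides' notation may differ). Each layer applies its own weights to the previous layer's output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # layers is a list of (W, b) pairs, one per layer; the output of each
    # layer becomes the input to the next.
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Toy 2-3-1 network, each layer with its own weights.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), np.zeros(3)),
          (rng.normal(size=(1, 3)), np.zeros(1))]
print(forward(np.array([0.5, -1.0]), layers))
```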
11
Universal Function Approximators
Theorem: a 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi '89]. (C) Dhruv Batra
12
Plan for Today: Neural Networks, parameter learning, backpropagation
(C) Dhruv Batra
13
Forward Propagation On board (C) Dhruv Batra
14
Feed-Forward Networks
Predictions are fed forward through the network to classify. Slide Credit: HKUST
20
Gradient Computation. First let's try: a single neuron for linear regression. (C) Dhruv Batra
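The single-neuron case amounts to this (a sketch, assuming a squared-error loss; names are illustrative):

```python
import numpy as np

def single_neuron_grad(w, b, x, y):
    # Linear neuron yhat = w.x + b with loss L = 0.5 * (yhat - y)^2.
    # Chain rule: dL/dw = (yhat - y) * x and dL/db = (yhat - y).
    err = np.dot(w, x) + b - y
    return err * x, err

w, b = np.zeros(2), 0.0
dw, db = single_neuron_grad(w, b, np.array([1.0, 2.0]), 3.0)
print(dw, db)  # [-3. -6.] -3.0
```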
21
Gradient Computation. First: a single neuron for linear regression. Now let's try the general case: Backpropagation! Really efficient. (C) Dhruv Batra
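A compact backprop sketch for a fully connected sigmoid network with squared-error loss (illustrative, not the exact derivation from the board). The efficiency comes from caching each layer's activations on the forward pass and reusing them on the backward pass:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, layers):
    # Forward pass: cache every layer's activation.
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(W @ acts[-1] + b))
    # Backward pass: start from the squared-error loss at the output.
    grads = []
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for (W, b), a_prev in zip(reversed(layers), reversed(acts[:-1])):
        grads.append((np.outer(delta, a_prev), delta))   # dL/dW, dL/db
        delta = (W.T @ delta) * a_prev * (1 - a_prev)    # error pushed back one layer
    return list(reversed(grads))

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), np.zeros(3)),
          (rng.normal(size=(1, 3)), np.zeros(1))]
print([dW.shape for dW, db in backprop(np.array([0.5, -1.0]), np.array([1.0]), layers)])
```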
22
Neural Nets: best performers on OCR; NetTalk, a text-to-speech system from 1987; Rick Rashid speaks Mandarin (a live speech-to-speech translation demo). (C) Dhruv Batra
23
Historical Perspective
(C) Dhruv Batra
24
Convergence of backprop
The perceptron leads to a convex optimization problem: gradient descent reaches the global minimum. Multilayer neural nets are not convex: gradient descent can get stuck in local minima, the learning rate is hard to set, and selecting the number of hidden units and layers is a fuzzy process. NNs had fallen out of fashion in the 90s and early 2000s; they are back with a new name and significantly improved performance: deep networks, with dropout, trained on much larger corpora. (C) Dhruv Batra Slide Credit: Carlos Guestrin
25
Overfitting: many, many, many parameters. Avoiding overfitting? More training data, regularization, early stopping (a sketch of early stopping follows). (C) Dhruv Batra
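A minimal early-stopping sketch (illustrative model and data, not from the slides): keep the weights that did best on held-out data and stop once validation loss stops improving:

```python
import numpy as np

def fit_with_early_stopping(X_tr, y_tr, X_val, y_val, lr=0.1, max_epochs=500, patience=10):
    # Gradient descent on a linear least-squares model; early stopping keeps
    # the best validation weights and halts after `patience` stale epochs.
    w, best_w, best_val, stale = np.zeros(X_tr.shape[1]), None, np.inf, 0
    for _ in range(max_epochs):
        w -= lr * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        val = np.mean((X_val @ w - y_val) ** 2)
        if val < best_val:
            best_w, best_val, stale = w.copy(), val, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_w

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1., 0., 0., 0., 0.]) + 0.1 * rng.normal(size=60)
print(fit_with_early_stopping(X[:40], y[:40], X[40:], y[40:]))
```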
26
A quick note (C) Dhruv Batra Image Credit: LeCun et al. ‘98
27
Rectified Linear Units (ReLU)
(C) Dhruv Batra
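For reference, the ReLU activation and its (sub)gradient (a sketch, not from the slide); unlike the sigmoid, it does not saturate for large positive inputs:

```python
import numpy as np

def relu(z):
    # Passes positive inputs through unchanged, zeroes out negatives.
    return np.maximum(0.0, z)

def relu_grad(z):
    # Sub-gradient: 1 where the input is positive, 0 elsewhere.
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 0.5])
print(relu(z), relu_grad(z))
```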
28
Convolutional Nets. Basic idea: on board. Assumptions: local receptive fields; weight sharing / translational invariance / stationarity. Each layer is just a convolution (a toy sketch follows)! (C) Dhruv Batra Image Credit: Chris Bishop
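A toy 2-D convolution sketch (illustrative only) showing both assumptions: each output depends on a small local patch, and the same kernel weights are shared across all locations:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide one small shared filter over the image ('valid' positions only).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., -1.]])           # tiny horizontal-difference filter
print(conv2d_valid(img, kernel).shape)   # (5, 4)
```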
[Slides 29-37: image-only slides.] (C) Dhruv Batra, Slide Credit: Marc'Aurelio Ranzato
38
Convolutional Nets Example:
(C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy
[Slides 39-41: image-only slides.] (C) Dhruv Batra, Slide Credit: Marc'Aurelio Ranzato
42
Visualizing Learned Filters
(C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14]
45
Autoencoders. Goal: compression; the output tries to predict the input. (C) Dhruv Batra
46
Autoencoders. Goal: learn a low-dimensional "basis" for the data (a minimal sketch follows). (C) Dhruv Batra Image Credit: Andrew Ng
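A minimal autoencoder sketch (illustrative class and sizes, not from the slides): encode the input into a low-dimensional code, then decode and try to reproduce the input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    def __init__(self, d_in, d_code, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(d_code, d_in))  # compress
        self.W_dec = rng.normal(scale=0.1, size=(d_in, d_code))  # reconstruct

    def encode(self, x):
        # Low-dimensional "basis" coefficients for x.
        return sigmoid(self.W_enc @ x)

    def reconstruct(self, x):
        # The output tries to predict the input (trained with reconstruction loss).
        return self.W_dec @ self.encode(x)

ae = TinyAutoencoder(d_in=8, d_code=2)
x = np.ones(8)
print(ae.encode(x).shape, ae.reconstruct(x).shape)  # (2,) (8,)
```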
47
Stacked Autoencoders. How about we compress the low-dimensional features further (see the sketch below)? (C) Dhruv Batra
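A sketch of the stacking idea (the weights here are random placeholders; in practice each autoencoder is trained before the next is stacked on its codes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_enc1 = rng.normal(scale=0.1, size=(16, 64))   # 64-dim input -> 16-dim code
W_enc2 = rng.normal(scale=0.1, size=(4, 16))    # 16-dim code  -> 4-dim code

x = rng.normal(size=64)
code1 = sigmoid(W_enc1 @ x)       # first compression
code2 = sigmoid(W_enc2 @ code1)   # compress the low-dimensional features further
print(code1.shape, code2.shape)   # (16,) (4,)
```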
48
Sparse DBNs [Lee et al. ICML '09]. Figure courtesy: Quoc Le. (C) Dhruv Batra
49
Stacked Autoencoders. Finally, perform classification with these low-dimensional features. (C) Dhruv Batra
50
What you need to know about neural networks
Perceptron: representation, derivation. Multilayer neural nets: derivation of backprop, learning rule, expressive power.