1
Using Neural Networks
Hanock Kwak
Biointelligence Lab, Computer Science and Engineering, Seoul National University
2
Train, Validation, and Test Data
Training set: used to adjust the weights of the neural network.
Validation set: used to reduce overfitting (e.g. model selection, early stopping).
Test set: used only to evaluate the final model, to confirm its actual predictive power.
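A minimal sketch of such a split in numpy (the 60/20/20 ratio and the names here are illustrative assumptions, not part of the slides):

import numpy as np

def split_data(X, y, seed=0):
    # Shuffle the indices, then cut them into 60% train, 20% validation, 20% test.
    idx = np.random.RandomState(seed).permutation(len(X))
    n_train, n_valid = int(0.6 * len(X)), int(0.2 * len(X))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return (X[train], y[train]), (X[valid], y[valid]), (X[test], y[test])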
3
Early Stopping
To prevent overfitting, stop training once the validation accuracy stops improving:
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the validation data
    if validation accuracy decreased by some threshold
        exit training
    else
        continue training
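A minimal Python sketch of this loop; train_one_epoch and validation_accuracy are hypothetical helpers standing in for the propagate/adjust and evaluation steps above:

def train_with_early_stopping(model, train_data, valid_data,
                              max_epochs=100, threshold=0.01):
    best_acc = 0.0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)             # propagate errors, adjust weights
        acc = validation_accuracy(model, valid_data)   # accuracy on the validation set
        if acc < best_acc - threshold:                 # validation accuracy decreased
            break                                      # exit training
        best_acc = max(best_acc, acc)
    return model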
4
Adjusting Learning Rate
Decreasing the learning rate when validation accuracy stops improving tends to improve overall performance:
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the validation data
    if validation accuracy decreased by some threshold
        if learning rate is too low
            exit training
        else
            decrease learning rate
    continue training
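The same loop with the learning-rate schedule added, as a sketch (the helpers, the halving factor, and the minimum learning rate are assumptions):

def train_with_lr_decay(model, train_data, valid_data, lr=0.1,
                        max_epochs=100, threshold=0.01, min_lr=1e-5):
    best_acc = 0.0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data, lr)         # propagate errors, adjust weights
        acc = validation_accuracy(model, valid_data)
        if acc < best_acc - threshold:                 # validation accuracy decreased
            if lr <= min_lr:                           # learning rate is already too low
                break                                  # exit training
            lr *= 0.5                                  # otherwise decrease the learning rate
        best_acc = max(best_acc, acc)
    return model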
5
Cross Validation
For a rigorous evaluation, use cross validation: split the data into k folds, train on k-1 of them, validate on the remaining fold, and average the scores.
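A k-fold sketch in numpy (k = 5 and the fit_and_score callback are illustrative assumptions):

import numpy as np

def k_fold_scores(X, y, fit_and_score, k=5, seed=0):
    # fit_and_score(X_tr, y_tr, X_va, y_va) -> validation score (hypothetical callback)
    idx = np.random.RandomState(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(fit_and_score(X[train], y[train], X[valid], y[valid]))
    return np.mean(scores), np.std(scores)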
6
Optimizer
Instead of plain gradient descent, use an improved optimizer:
momentum
Adagrad
Adadelta
RMSProp
Adam (recommended)
...
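For reference, the Adam update for a single parameter array looks roughly like this (a numpy sketch with the usual default hyperparameters, not tied to any particular framework):

import numpy as np

def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction matters most during the first few steps (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step size.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v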
7
Regularizer
Another way of preventing overfitting is to regularize the parameters, e.g. with an L1 (or L2) regularizer.
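For example, an L2 regularizer adds a penalty on the weight magnitudes to the loss, which shows up as an extra lam * w term in the gradient (a sketch; lam is the regularization strength):

import numpy as np

def l2_regularized(data_loss, grad_w, w, lam=1e-4):
    # Add 0.5 * lam * ||w||^2 to the loss and lam * w to its gradient.
    reg_loss = data_loss + 0.5 * lam * np.sum(w ** 2)
    reg_grad = grad_w + lam * w
    return reg_loss, reg_grad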
8
Data Preprocessing
Do not make the model do everything: do some preprocessing so the data is easier for the model to learn from.
9
Data Normalization
Normalizing the data makes the neural network easier to train:
Zero-centered
Similar scale for each dimension
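A minimal sketch; note that the statistics should be computed on the training set only and then reused for the validation and test sets:

import numpy as np

def normalize(X):
    # X: (num_examples, num_features). Zero-center and rescale each dimension.
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8        # avoid division by zero for constant features
    return (X - mean) / std, mean, std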
10
Weight Initialization
All-zero initialization: bad. If every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo exactly the same parameter updates. Small random numbers: w = np.random.randn(n) / np.sqrt(n)
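Applied to a whole layer, the rule above looks like the sketch below (for ReLU units a factor of sqrt(2/n) is commonly used instead):

import numpy as np

def init_layer(n_in, n_out):
    # Scale the random weights by 1/sqrt(n_in) to keep the output variance calibrated.
    W = np.random.randn(n_in, n_out) / np.sqrt(n_in)
    b = np.zeros(n_out)               # biases can safely start at zero
    return W, b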
11
Batch Normalization
Performs normalization for each training mini-batch.
Reduces internal covariate shift.
Accelerates the learning process.
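The training-time forward transform in numpy form (a sketch; at test time, running averages of the batch statistics are used instead, and gamma/beta are learned parameters):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features); gamma, beta: learned scale and shift.
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize each feature
    return gamma * x_hat + beta              # scale and shift back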
12
See More Information! http://cs231n.github.io/
These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
13
Supplementary Notes on Convolutional Neural Networks
14
Why Convolutional Layers?
A filter slides over the input image to produce a feature map. Each filter extracts one type of feature across the whole image. Filters are defined by their weights.
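A direct (unoptimized) numpy sketch of one filter sliding over a single-channel image with stride 1 and no padding:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each position yields one feature-map value.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out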
15
Hierarchical Features
In general, the more convolution layers we stack, the more complex the features our network will be able to learn to recognize: early layers capture simple patterns such as edges, while deeper layers combine them into higher-level structures.
16
Max Pooling
Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the most important information.
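A 2x2 max-pooling sketch in numpy (assuming a single-channel feature map with even dimensions):

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the maximum value inside each size x size window.
    H, W = feature_map.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out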
17
Zero Padding
Padding the input with zeros ensures that the filter window can slide from the very start of the input to the very end, so border pixels are covered and the output size can be preserved.
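For example, padding a 3x3 input by one pixel of zeros on each side keeps the output of a 3x3, stride-1 convolution at 3x3:

import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
# padded is 5x5; a 3x3 filter with stride 1 now produces a 3x3 feature map,
# so the spatial size of the input is preserved.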
18
All Convolutional Nets
Remove fully connected layers.
Replace max-pooling with convolutions of stride 2.
Similar or better performance with lower computation.
19
Transposed Convolution (Deconvolution)
Roughly the reverse of a convolution in terms of spatial size: it upsamples a small feature map to a larger one (it is not a true mathematical inverse).
20
Residual Network
Adding identity shortcut (skip) connections helps very deep neural networks (50+ layers) train well.
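A residual block computes F(x) + x, so the shortcut lets gradients flow straight through; a minimal sketch, where residual_fn is a hypothetical branch (e.g. conv -> batch norm -> ReLU -> conv -> batch norm):

import numpy as np

def residual_block(x, residual_fn):
    out = residual_fn(x) + x          # identity shortcut adds the input back
    return np.maximum(out, 0.0)       # final ReLU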