1
Using Neural Networks
Hanock Kwak
Biointelligence Lab, Computer Science and Engineering, Seoul National University
2
Train, Validation, and Test Data
Training set: used to adjust the weights of the neural network.
Validation set: used to reduce overfitting (e.g. model selection, early stopping).
Test set: used only to evaluate the final model, to confirm its actual predictive power.
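A minimal sketch of such a split in numpy (the 60/20/20 ratio and the names here are illustrative assumptions, not part of the slides):

import numpy as np

def split_data(X, y, seed=0):
    # Shuffle the indices, then cut them into 60% train, 20% validation, 20% test.
    idx = np.random.RandomState(seed).permutation(len(X))
    n_train, n_valid = int(0.6 * len(X)), int(0.2 * len(X))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return (X[train], y[train]), (X[valid], y[valid]), (X[test], y[test])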
3
Early Stopping
To prevent overfitting, stop training once the validation accuracy stops improving:
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the validation data
    if validation accuracy decreased by some threshold
        exit training
    else
        continue training
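A minimal Python sketch of this loop; train_one_epoch and validation_accuracy are hypothetical helpers standing in for the propagate/adjust and evaluation steps above:

def train_with_early_stopping(model, train_data, valid_data,
                              max_epochs=100, threshold=0.01):
    best_acc = 0.0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)             # propagate errors, adjust weights
        acc = validation_accuracy(model, valid_data)   # accuracy on the validation set
        if acc < best_acc - threshold:                 # validation accuracy decreased
            break                                      # exit training
        best_acc = max(best_acc, acc)
    return model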
4
Adjusting Learning Rate
Decreasing the learning rate when validation accuracy stops improving tends to improve overall performance:
for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the validation data
    if validation accuracy decreased by some threshold
        if learning rate is too low
            exit training
        else
            decrease learning rate
    continue training
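The same loop with the learning-rate schedule added, as a sketch (the helpers, the halving factor, and the minimum learning rate are assumptions):

def train_with_lr_decay(model, train_data, valid_data, lr=0.1,
                        max_epochs=100, threshold=0.01, min_lr=1e-5):
    best_acc = 0.0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data, lr)         # propagate errors, adjust weights
        acc = validation_accuracy(model, valid_data)
        if acc < best_acc - threshold:                 # validation accuracy decreased
            if lr <= min_lr:                           # learning rate is already too low
                break                                  # exit training
            lr *= 0.5                                  # otherwise decrease the learning rate
        best_acc = max(best_acc, acc)
    return model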
5
Cross Validation
For a rigorous evaluation, use cross validation: split the data into k folds, train on k-1 of them, validate on the remaining fold, and average the scores.
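A k-fold sketch in numpy (k = 5 and the fit_and_score callback are illustrative assumptions):

import numpy as np

def k_fold_scores(X, y, fit_and_score, k=5, seed=0):
    # fit_and_score(X_tr, y_tr, X_va, y_va) -> validation score (hypothetical callback)
    idx = np.random.RandomState(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(fit_and_score(X[train], y[train], X[valid], y[valid]))
    return np.mean(scores), np.std(scores)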
6
Optimizer
Instead of plain gradient descent, use an improved optimizer:
momentum
Adagrad
Adadelta
RMSProp
Adam (recommended)
...
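For reference, the Adam update for a single parameter array looks roughly like this (a numpy sketch with the usual default hyperparameters, not tied to any particular framework):

import numpy as np

def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction matters most during the first few steps (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step size.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v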
7
Regularizer
Another way of preventing overfitting is to regularize the parameters, e.g. with an L1 (or L2) regularizer.
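For example, an L2 regularizer adds a penalty on the weight magnitudes to the loss, which shows up as an extra lam * w term in the gradient (a sketch; lam is the regularization strength):

import numpy as np

def l2_regularized(data_loss, grad_w, w, lam=1e-4):
    # Add 0.5 * lam * ||w||^2 to the loss and lam * w to its gradient.
    reg_loss = data_loss + 0.5 * lam * np.sum(w ** 2)
    reg_grad = grad_w + lam * w
    return reg_loss, reg_grad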
8
Data Preprocessing
Do not make the model do everything: do some preprocessing so the data is easier for the model to learn from.
9
Data Normalization
Normalizing the data makes the neural network easier to train:
Zero-centered
Similar scale for each dimension
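A minimal sketch; note that the statistics should be computed on the training set only and then reused for the validation and test sets:

import numpy as np

def normalize(X):
    # X: (num_examples, num_features). Zero-center and rescale each dimension.
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8        # avoid division by zero for constant features
    return (X - mean) / std, mean, std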
10
Weight Initialization
All-zero initialization: bad. If every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo exactly the same parameter updates. Small random numbers: w = np.random.randn(n) / np.sqrt(n)
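Applied to a whole layer, the rule above looks like the sketch below (for ReLU units a factor of sqrt(2/n) is commonly used instead):

import numpy as np

def init_layer(n_in, n_out):
    # Scale the random weights by 1/sqrt(n_in) to keep the output variance calibrated.
    W = np.random.randn(n_in, n_out) / np.sqrt(n_in)
    b = np.zeros(n_out)               # biases can safely start at zero
    return W, b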
11
Batch Normalization
Performs normalization for each training mini-batch.
Reduces internal covariate shift.
Accelerates the learning process.
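The training-time forward transform in numpy form (a sketch; at test time, running averages of the batch statistics are used instead, and gamma/beta are learned parameters):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features); gamma, beta: learned scale and shift.
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize each feature
    return gamma * x_hat + beta              # scale and shift back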
12
See More Information! http://cs231n.github.io/
These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
13
Supplementary Notes on Convolutional Neural Networks
14
Why Convolutional Layers?
A filter slides over the input image to produce a feature map. Each filter extracts one type of feature across the whole image. Filters are defined by their weights.
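A direct (unoptimized) numpy sketch of one filter sliding over a single-channel image with stride 1 and no padding:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each position yields one feature-map value.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out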
15
Hierarchical Features
In general, the more convolution layers we stack, the more complex the features our network will be able to learn to recognize: early layers capture simple patterns such as edges, while deeper layers combine them into higher-level structures.
16
Max Pooling
Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the most important information.
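A 2x2 max-pooling sketch in numpy (assuming a single-channel feature map with even dimensions):

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the maximum value inside each size x size window.
    H, W = feature_map.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out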
17
Zero Padding
Padding the input with zeros ensures that the filter window can slide from the very start of the input to the very end, so border pixels are covered and the output size can be preserved.
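For example, padding a 3x3 input by one pixel of zeros on each side keeps the output of a 3x3, stride-1 convolution at 3x3:

import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
# padded is 5x5; a 3x3 filter with stride 1 now produces a 3x3 feature map,
# so the spatial size of the input is preserved.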
18
All Convolutional Nets
Remove fully connected layers.
Replace max-pooling with convolutions of stride 2.
Similar or better performance with lower computation.
19
Transposed Convolution (Deconvolution)
Roughly the reverse of a convolution in terms of spatial size: it upsamples a small feature map to a larger one (it is not a true mathematical inverse).
20
Residual Network
Adding identity shortcut (skip) connections helps very deep neural networks (50+ layers) train well.
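A residual block computes F(x) + x, so the shortcut lets gradients flow straight through; a minimal sketch, where residual_fn is a hypothetical branch (e.g. conv -> batch norm -> ReLU -> conv -> batch norm):

import numpy as np

def residual_block(x, residual_fn):
    out = residual_fn(x) + x          # identity shortcut adds the input back
    return np.maximum(out, 0.0)       # final ReLU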