Coding Neural Networks: A Gentle Introduction to Keras


Why do I want to learn about neural networks?
They work! Examples: nuclei segmentation (Caicedo et al. 2018), automated motif discovery (Koo and Eddy 2018), and protein subcellular localization classification (Kraus et al. 2017).

Lesson Outline and Learning Goals
How do I preprocess data? Learn about image scaling and one-hot vector representations. Learn about test/train splits and evaluation.
How do I build a convolutional neural network? Learn about the building blocks of a neural network: filters, downsampling, layers, etc.
How do I train a neural network? Learn about loss functions, gradient descent, and parameters.
Overall goals: Learn how to code with Keras and TensorFlow. Understand how to build, train, and feed data in and out of a neural network. Understand the components of a neural network conceptually.

Keras
Keras is an open-source neural network library for Python. It’s designed to use TensorFlow (and other packages) as a back-end. It was developed with a focus on user-friendliness and extensibility, and it automatically interacts with the CPU/GPU. Coding neural networks used to be a LOT harder: Keras makes it easy and accessible! As you’ll see today, you can build your own neural network with just a few lines of code.
Documentation: https://keras.io/

Preprocessing data
Open the “step1_preprocessing.py” file now…

Fashion-MNIST
We’ll be working with a toy dataset today called Fashion-MNIST, which comes prepackaged with Keras. This dataset contains 70,000 images of 10 classes of clothes/shoes/bags. Each image has 1 channel and is 28 x 28 pixels.

Task 1: Explore the images in the dataset
Run step1_preprocessing.py as is. This will give you a pop-up window showing the first 25 images in the dataset. How are the images stored numerically? Hint: hover your mouse over the images and observe how the number reported in the bottom left of the pop-up window changes.

Training, validation, and test sets
You’ll notice that I have split the dataset into training, validation, and test sets. This is an important practice when working with supervised neural networks. Just because the CNN gets good results on the training dataset doesn’t mean it will get good results on all datasets (“overfitting”). To be more confident we have a general solution, we evaluate performance on images that the CNN has never seen.
Training dataset: the CNN learns a model using these images.
Validation dataset: the CNN is not trained on these images; this dataset is used to check that the model is not overfitting DURING training.
Test dataset: the CNN is not trained on these images; this dataset is used to check that the model is not overfit AFTER training.
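
The split described above can be sketched with plain NumPy slicing. This is only an illustration: it uses small random stand-in arrays instead of the real dataset, and the proportion held out for validation is an assumption, not necessarily what step1_preprocessing.py does.

```python
import numpy as np

# Stand-in data with the same layout as Fashion-MNIST images and labels;
# the sizes are shrunk so the sketch runs instantly.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(600, 28, 28), dtype=np.uint8)
train_labels = rng.integers(0, 10, size=600)

# Hold out the last part of the training data for validation.
# The 5/6 vs 1/6 proportion is an illustrative assumption.
n_val = len(train_images) // 6
x_train, y_train = train_images[:-n_val], train_labels[:-n_val]
x_val, y_val = train_images[-n_val:], train_labels[-n_val:]

print(x_train.shape, x_val.shape)
```

The test set stays completely separate and is only used once training is finished.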

Task 2: Explore the labels in the dataset
In your console, you should see various values printed: these are the labels for the first 10 training images before and after processing. How are the labels stored before processing? How are the labels stored after processing? What are these lines of code doing to the labels?

One-hot vector encoding
This format for storing labels is called a one-hot vector encoding. It is also the format in which we will require the neural network to output its predictions. Our classes are discrete and independent: we don’t want the neural network to predict integers on a linear scale (what does a 5.5 mean?). With a one-hot encoding, a neural network can predict a probability for each class.
[0., 0., 0.04, 0., 0.40, 0., 0.56, 0., 0., 0.]
“I’m 56% sure this is a shirt!”
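
As a rough sketch of what one-hot encoding does, here is a NumPy version. The label values are made up for illustration; Keras offers `keras.utils.to_categorical` for the same job, and the workshop script may well use that instead.

```python
import numpy as np

labels = np.array([9, 0, 0, 3])  # integer class labels (example values)
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot)

# Going the other way: the predicted class is the index of the largest probability.
prediction = np.array([0., 0., 0.04, 0., 0.40, 0., 0.56, 0., 0., 0.])
predicted_class = int(np.argmax(prediction))
print(predicted_class)
```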

Task 3: Preprocess the images
CNNs work best when the input data is scaled between 0 and 1 (e.g. stored as floating-point values). Right now, the images are stored as integers between 0 and 255. Under this line, add in some lines to scale both the test and train images between 0 and 1. Hint: There are multiple ways to do this! Hint: With some ways, you may have to convert the images to float first. Check your work by running the script and making sure the images are scaled properly.

Task 3 Solutions: Two different ways
Bonus points: How are these two strategies different?
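
The slide’s own solutions are not part of this transcript, so the following shows two plausible ways to scale images, on a tiny stand-in array. Treat both as assumptions about what the solutions looked like.

```python
import numpy as np

images = np.array([[0, 64], [128, 255]], dtype=np.uint8)  # tiny stand-in image

# Way 1: divide by the largest possible pixel value.
scaled_fixed = images.astype("float32") / 255.0

# Way 2: min-max scaling using the values actually present in the data.
as_float = images.astype("float32")
scaled_minmax = (as_float - as_float.min()) / (as_float.max() - as_float.min())

print(scaled_fixed)
print(scaled_minmax)
```

Here the two results agree because the stand-in data happens to span the full 0 to 255 range. On data that does not, the first way keeps the original brightness scale while the second stretches whatever range is present to fill 0 to 1.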

Building a model
Open the “step2_building_cnn.py” file now…

What are image convolutions and filters?
Filters (also called kernels) are small matrices of weights that are applied to images in a sliding window (convolutions). By applying filters to images, we get feature maps: transformations of the image that show how strongly the filter is activated at each position. The above filters detect edges in the image.
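
To make the sliding window concrete, here is a minimal NumPy sketch of applying one filter to one image. The image and the vertical-edge filter weights are illustrative choices, and the helper name `convolve2d` is hypothetical; Keras does this (much faster) inside its convolutional layers.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the window by the weights, then sum.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# A 5x5 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A simple vertical-edge filter (illustrative weights).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The feature map is large where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region.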

Introducing some terminology…
Depth: The number of filters we use. The left image uses two filters. Neural networks will often use a lot of filters to generate hundreds or thousands of feature maps at each network layer!
Stride: How much the filter “jumps” from step to step. For example, a stride of 2 would move the filter 2 pixels for each step. The stride affects how big the feature map will be.
Padding: We often want to control the size of our feature maps. To do this, we can add a border to the image, so that the filter can “see” the edges.
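
How stride and padding affect the feature map size can be worked out with the standard output-size formula. The helper below is a hypothetical illustration (not from the workshop code), applied to a 28-pixel-wide image and a 3-pixel-wide filter.

```python
def feature_map_size(input_size, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

no_padding = feature_map_size(28, 3)                     # filter cannot reach past the edges
with_border = feature_map_size(28, 3, padding=1)         # a 1-pixel border keeps the size
strided = feature_map_size(28, 3, stride=2, padding=1)   # stride 2 halves the size

print(no_padding, with_border, strided)
```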

CNNs are composed of layers of filters
A CNN will consist of multiple layers of many filters. The first layer will calculate features on the original image, the second layer will calculate features on the output of the first layer, and so on. These filters are random to begin with, but a CNN will learn to improve them as training proceeds (we’ll discuss how later).

Activation Functions
Each layer in a CNN outputs feature maps showing how strongly its filters are activated, in a linear way. We transform this into a non-linear activation using an activation function (analogy: action potentials in biology). ReLU functions usually work well and are a good default for most cases.
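
For reference, ReLU itself is a one-liner. This NumPy sketch applies it to a made-up feature map.

```python
import numpy as np

def relu(x):
    """Keep positive values, replace negative values with zero."""
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 0.5],
                        [2.0, -0.1]])
activated = relu(feature_map)
print(activated)
```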

Putting this all together in code…
We’ll add a layer to our neural network. We need to specify: how many filters we want; the size of each filter (here, it’s 3x3); the padding (“same” is fine in most cases); and our activation function.
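
The code from this slide is not included in the transcript, so here is a hedged sketch of what adding such a layer might look like in Keras. The number of filters (32) is an assumption; the 3x3 filter size, “same” padding, and ReLU activation follow the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),   # 28x28 images with 1 channel
    layers.Conv2D(
        filters=32,                   # how many filters (assumed value)
        kernel_size=(3, 3),           # size of each filter
        padding="same",               # keep the feature maps at 28x28
        activation="relu",            # our activation function
    ),
])
model.summary()
```

Each of the 32 filters has 3 x 3 weights plus one bias, which is where the 320 parameters in the summary come from.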

Task 1: Add another convolutional layer to our neural network
I started creating a CNN for you: I declared a model in Keras, I added a convolutional layer, and I added a max-pooling layer (we’ll discuss this next). Task: add another convolutional layer to this CNN. Use 64 filters, a kernel size of 2x2, “same” padding, and “relu” activation. When you’re done, run the code to output your model.

Max-Pooling Layers
Between convolutions, we often apply max pooling to the feature maps. Pooling lets us fit more layers and filters into memory. Usually, we want to increase the depth of layers as we go deeper. Pooling shrinks the feature maps, which reduces the number of parameters needed later in the network and helps control overfitting. Most of the time, nearby spatial features are somewhat redundant, so we don’t lose a lot of important information by pooling.
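
As an illustration of what a 2x2 max-pooling step computes, here is a small NumPy sketch on a made-up 4x4 feature map. The helper name is hypothetical, and it assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max within each pair of pairs.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4x4 map becomes 2x2, so every later layer has a quarter as many values to process.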

Task 2: Add a maximum pooling layer
Immediately after the convolutional layer you just added, add a maximum pooling layer. Use a pooling size of 2: this tells the CNN to pool in 2x2 blocks. Hint: Look at the code above to understand how it’s done! Run the code and make sure your model looks like this:

Fully connected layers
Convolutional layers slide over images in a sliding window, so they only see a kernel-sized patch of pixels at once (local features). In contrast, fully connected layers are layers where each neuron sees EVERY pixel at once (global features). We usually use fully connected layers only at the very end, to sum up all the information, because they’re very expensive in terms of number of parameters.

Task 3: Add two fully connected layers
After the flattening layer, add two fully connected layers. Run the code again to see your model. Why are there so many more parameters than before? Note that we will use softmax activation for the last fully connected layer, because this is intended to be our output. The softmax function will translate the output into a multi-class probability prediction.
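
One possible version of the finished network is sketched below. The second convolutional layer and the pooling sizes follow the tasks above; the 32 filters in the first layer and the 128 units in the first fully connected layer are assumptions, since the workshop file is not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),  # assumed 32 filters
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, (2, 2), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                         # 7 x 7 x 64 = 3136 values
    layers.Dense(128, activation="relu"),     # 128 units is an assumed size
    layers.Dense(10, activation="softmax"),   # one probability per class
])
model.summary()
```

In this sketch the first fully connected layer alone has 3136 x 128 + 128 = 401,536 parameters, compared with 320 and 8,256 for the two convolutional layers, because every unit is connected to every flattened value.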

You’ve learned how to build a neural network!
We’ve kept this CNN very simple because we don’t have a GPU and we have to limit the number of parameters. If you have a GPU, you can increase the number of layers and filters.
Very deep networks can be harder to train. Further reading: Deep Residual Learning for Image Recognition (He et al. 2015), arXiv.
This type of architecture only works for classification. Further reading: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al. 2015), arXiv.

Compiling and Training
Open the “step3_compile_and_train.py” file now…

What are loss functions?
To train a CNN, we need to define a loss function, which measures how different the CNN’s prediction is from our ground truth. For this task, we will use the cross-entropy loss, which penalizes the model more the further the prediction diverges from the ground truth. Cross-entropy loss is appropriate for classification, but different problems require different losses.
Further reading: https://isaacchanghau.github.io/post/loss_functions/
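
To see how cross-entropy penalizes worse predictions more heavily, here is a small NumPy sketch for a single image. The two prediction vectors are made-up examples, and the helper is an illustration rather than the Keras implementation.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.zeros(10)
y_true[6] = 1.0  # ground truth: class 6

mostly_right = np.array([0., 0., 0.04, 0., 0.40, 0., 0.56, 0., 0., 0.])
mostly_wrong = np.array([0., 0., 0.04, 0., 0.86, 0., 0.10, 0., 0., 0.])

loss_right = cross_entropy(y_true, mostly_right)  # -log(0.56)
loss_wrong = cross_entropy(y_true, mostly_wrong)  # -log(0.10)
print(loss_right, loss_wrong)
```

Only the probability assigned to the true class matters here, and the loss grows quickly as that probability approaches zero.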

How do we learn from the loss function?
The derivative of the loss function tells us how much we need to tweak the parameters (or weights) of the filters in our CNN. By applying the chain rule, we can backpropagate this derivative to determine the local gradient for each parameter. The learning rate controls how much we update the parameters with respect to the gradient.
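
The update rule can be shown on the smallest possible “network”: a single weight and a squared-error loss. This toy setup is an assumption chosen to keep the chain rule visible; a real CNN applies the same idea to every filter weight.

```python
# One weight, one training example: prediction = w * x, loss = (prediction - y) ** 2
x, y = 2.0, 6.0        # input and ground truth (the ideal weight is 3.0)
w = 0.0                # initial weight
learning_rate = 0.1

for step in range(20):
    prediction = w * x
    # Chain rule: dloss/dw = dloss/dprediction * dprediction/dw
    grad = 2 * (prediction - y) * x
    # Move against the gradient, scaled by the learning rate.
    w = w - learning_rate * grad

print(w)
```

With a much larger learning rate the updates would overshoot and diverge; with a much smaller one, 20 steps would not be enough to get close to 3.0.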

Batch Size and Epochs
Two other important parameters are the batch size and the number of epochs.
The batch size controls how many images we input into the network at once during training. The weights are updated once per batch, so larger batches stabilize training. However, our batch size is constrained by memory.
The number of epochs controls how many passes we make over the entire dataset during training. We want enough epochs that the model converges to a good solution. However, too many epochs take too much time and may result in overfitting as the model memorizes the data. Data augmentation can combat the second issue.
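
Since the weights are updated once per batch, the batch size and the number of epochs together determine how many updates a training run performs. The helper below is a hypothetical illustration, and the training set size of 50,000 images is an assumed value.

```python
import math

def weight_updates(num_images, batch_size, epochs):
    """Total number of weight updates: one per batch, repeated every epoch."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    return batches_per_epoch * epochs

small_batches = weight_updates(50_000, batch_size=32, epochs=5)
large_batches = weight_updates(50_000, batch_size=256, epochs=5)
print(small_batches, large_batches)
```

Larger batches mean fewer but steadier updates per epoch, which is one reason the learning rate and batch size are usually tuned together.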

Task: Play around with parameters and understand the code
To train a model, run the step3_compile_and_train.py script. Your goal is to find parameters that increase the validation and test accuracy: try adjusting the learning rate, batch size, and epochs. What line of code starts the training, and what variables need to be given? How does adjusting these parameters affect run time? When you’re satisfied with your model, run step4_load_and_evaluate_saved_model.py to visualize some of the predictions.
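
For orientation, compiling and training in Keras generally looks like the sketch below. It is not the contents of step3_compile_and_train.py: the data is random stand-in data so that nothing needs to be downloaded, and the tiny model, the SGD optimizer, and all parameter values are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data so the sketch runs without downloading anything.
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28, 1), dtype=np.float32)
y_train = keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)
x_val = rng.random((64, 28, 28, 1), dtype=np.float32)
y_val = keras.utils.to_categorical(rng.integers(0, 10, size=64), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# Loss, optimizer (with its learning rate), and the metric to report.
model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    metrics=["accuracy"],
)

# fit() starts the training; it needs the data, the batch size, and the epochs.
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=2,
    validation_data=(x_val, y_val),
    verbose=0,
)
print(history.history["val_accuracy"])
```

On random data the accuracy stays near chance; the point is only to show where the learning rate, batch size, and epochs are set.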

Further reading for training
We won’t implement these today, but there are some common techniques to improve the training process. These are all available as simple, user-friendly implementations in Keras:
Adaptive learning rates during training: Adam: A Method for Stochastic Optimization (Kingma and Ba 2014)
Dropping out neurons randomly during training: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al. 2014)
Normalizing feature maps during training: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe and Szegedy 2015)

Saving and Loading Models
Keras provides nice and easy functions to save and load trained weights. You can either re-initialize the model architecture from scratch (done in the code that I have given you), or save the model architecture using the to_json() function (check out the Keras documentation for details).
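
Both options can be sketched as follows. The small stand-in network, the `build_model` helper, and the file name are hypothetical; the workshop scripts may organize this differently.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    """Re-create the architecture from scratch (a small stand-in network)."""
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

model = build_model()
weights_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")  # hypothetical file name
model.save_weights(weights_path)

# Option 1: rebuild the architecture in code, then load the saved weights.
restored = build_model()
restored.load_weights(weights_path)

# Option 2: store the architecture as JSON and rebuild the model from it.
architecture = model.to_json()
from_json = keras.models.model_from_json(architecture)
from_json.load_weights(weights_path)

# The restored model should give the same predictions as the original.
x = np.random.rand(1, 28, 28, 1).astype("float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```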

What can I do with this knowledge?
Today, I’ve covered how to build classifiers with CNNs, but they are not limited to that!
Generative models of cells: Generative Modeling with Conditional Autoencoders: Building an Integrated Cell (Johnson et al. 2017); GANs for Biological Image Synthesis (Osokin et al. 2017)
Classification of cells using whole image labels: Classifying and Segmenting Microscopy Images using Convolutional Multiple Instance Learning (Kraus et al. 2016)
Identifying motifs (sequences): Convolutional neural network architectures for predicting DNA–protein binding (Zeng et al. 2016); Representation Learning of Genomic Sequence Motifs with Convolutional Neural Networks (Koo and Eddy 2018)
Unsupervised representation learning: Weakly Supervised Learning of Single Cell Feature Embeddings (Caicedo et al. 2018); Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting (Lu et al. 2018)
Segmenting images: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al. 2015); Adapting Mask-RCNN for Automatic Nucleus Segmentation (Johnson 2018)
Detecting mitosis in histology images: Mitosis detection in breast cancer histology images with deep neural networks (Ciresan et al. 2013)

Questions?