
1 Coding Neural Networks: A Gentle Introduction to Keras

2 Why do I want to learn about neural networks?
They work!
Nuclei Segmentation (Caicedo et al. 2018)
Automated Motif Discovery (Koo and Eddy 2018)
Protein Subcellular Localization Classification (Kraus et al. 2017)

3 Lesson Outline and Learning Goals
How do I preprocess data?
Learn about image scaling and one-hot vector representations.
Learn about test/train splits and evaluation.
How do I build a convolutional neural network?
Learn about the building blocks of a neural network: filters, downsampling, layers, etc.
How do I train a neural network?
Learn about loss functions, gradient descent, and parameters.
Overall goals:
Learn how to code with Keras and TensorFlow.
Understand how to build, train, and feed data in and out of a neural network.
Understand the components of a neural network conceptually.

4 Keras
Keras is an open-source neural network library for Python.
It’s designed to use TensorFlow (and other packages) as a back-end.
Developed with a focus on user-friendliness and extensibility, it automatically interacts with the CPU/GPU.
Coding neural networks used to be a LOT harder: Keras makes it easy and accessible! As you’ll see today, you can build your own neural network with just a few lines of code.
Documentation:

5 Preprocessing data
Open the “step1_preprocessing.py” file now…

6 Fashion-MNIST
We’ll be working with a toy dataset today called Fashion-MNIST, which comes prepackaged with Keras.
The dataset contains 70,000 images covering 10 classes of clothes, shoes, and bags.
Each image has 1 channel and is 28 x 28 pixels.
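For reference, a minimal sketch of how Fashion-MNIST can be loaded through Keras (the workshop’s step1_preprocessing.py may organize this differently, and the variable names here are assumptions):

    # Minimal sketch: load the prepackaged Fashion-MNIST dataset via Keras.
    from tensorflow.keras.datasets import fashion_mnist

    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    print(train_images.shape)  # (60000, 28, 28) -- 60,000 training images, 28x28, 1 channel
    print(test_images.shape)   # (10000, 28, 28) -- 10,000 test images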

7 Task 1: Explore the images in the dataset
Run step1_preprocessing.py as is. This will give you a pop-up screen of the first 25 images in the dataset.
How are the images stored numerically?
Hint – hover your mouse over the images and observe how the number reported in the bottom left of the pop-up window changes.
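If you prefer to check in code rather than with the mouse, a quick (hypothetical) inspection like the one below shows the storage type and value range:

    from tensorflow.keras.datasets import fashion_mnist

    (train_images, train_labels), _ = fashion_mnist.load_data()
    print(train_images.dtype)                       # uint8: unsigned 8-bit integers
    print(train_images.min(), train_images.max())   # 0 255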

8 Training, validation, and test sets
You’ll notice that I have split the dataset into training, validation, and test sets.
This is an important practice when working with supervised neural networks.
Just because the CNN gets good results on the training dataset doesn’t mean it will get good results on all datasets (“overfitting”).
To be more confident we have a general solution, we evaluate performance on images that the CNN has never seen.
Training dataset: the CNN learns a model using these images.
Validation dataset: the CNN is not shown these images; this dataset is used to ensure the model is not overfit DURING training.
Test dataset: the CNN is not shown these images; this dataset is used to ensure the model is not overfit AFTER training.
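Keras only ships Fashion-MNIST as a train/test split, so a validation set has to be carved out of the training data. One common way is sketched below; holding out the last 10,000 training images (and these variable names) is purely illustrative and may differ from the workshop script:

    from tensorflow.keras.datasets import fashion_mnist

    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # Hold out the last 10,000 training images as a validation set (illustrative split).
    val_images, val_labels = train_images[-10000:], train_labels[-10000:]
    train_images, train_labels = train_images[:-10000], train_labels[:-10000]

    print(train_images.shape, val_images.shape, test_images.shape)
    # (50000, 28, 28) (10000, 28, 28) (10000, 28, 28)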

9 Task 2: Explore the labels in the dataset
In your console, you should see various values printed – these are the labels for the first 10 training images before and after processing. How are the labels stored before processing? How are the labels stored after processing? What are these lines of code doing to the labels?

10 One-hot vector encoding
This format for storing labels is called a one-hot vector encoding. This is the format in which we will require the neural network to output its predictions.
Our classes are discrete and independent: we don’t want the neural network to predict integers on a linear scale (what does a 5.5 mean?).
With a one-hot encoding, a neural network can instead predict a probability for each class.
[0., 0., 0.04, 0., 0.40, 0., 0.56, 0., 0., 0.] “I’m 56% sure this is a shirt!”
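In Keras, this conversion is typically done with to_categorical; a short sketch (the variable name in the comment is an assumption about the script):

    from tensorflow.keras.utils import to_categorical

    # Convert an integer class label (0-9) into a one-hot vector of length 10.
    print(to_categorical([6], num_classes=10))
    # [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]  -> a 1 in position 6 ("shirt")

    # In the preprocessing script, the same call is applied to the whole label array,
    # e.g. train_labels = to_categorical(train_labels, num_classes=10)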

11 Task 3: Preprocess the images
CNNs work best when the input data is scaled between 0 and 1 (i.e., stored as floating-point numbers).
Right now, the images are stored as integers between 0 and 255.
Under this line, add some lines to scale both the test and train images between 0 and 1.
Hint: There are multiple ways to do this!
Hint: With some of them, you may have to convert the images to float first.
Check your work by running the script and making sure the images are scaled properly.

12 Task 3 Solutions: Two different ways
Bonus points: How are these two strategies different?
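For reference, two common ways to do the scaling are sketched below (whether these match the slide’s exact solutions is an assumption). A plausible answer to the bonus question: dividing a uint8 array by 255.0 promotes the result to float64, while converting to float32 first halves the memory footprint.

    import numpy as np

    images = np.random.randint(0, 256, size=(10, 28, 28), dtype='uint8')  # stand-in images

    # Way 1: divide by a float; NumPy promotes the result to float64.
    scaled_a = images / 255.0

    # Way 2: convert to float32 first, then divide; the result stays float32.
    scaled_b = images.astype('float32') / 255

    print(scaled_a.dtype, scaled_b.dtype)                 # float64 float32
    print(scaled_a.max() <= 1.0, scaled_b.max() <= 1.0)   # True True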

13 Building a model
Open the “step2_building_cnn.py” file now…

14 What are image convolutions and filters?
Filters (also called kernels) are small matrices of weights that are applied to images in a sliding window (convolutions).
By applying filters to images, we get feature maps: transformations of the image that show how strongly the filter is activated at each location.
The filters shown above detect edges in the image.
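As a toy illustration of the sliding-window idea (not part of the workshop scripts), one can convolve an image with a small edge filter using SciPy:

    import numpy as np
    from scipy.ndimage import convolve

    image = np.random.rand(28, 28).astype('float32')    # stand-in for a 28x28 image
    edge_filter = np.array([[-1., 0., 1.],
                            [-1., 0., 1.],
                            [-1., 0., 1.]])              # responds to vertical edges

    feature_map = convolve(image, edge_filter)           # slide the filter over the image
    print(feature_map.shape)                             # (28, 28): one feature map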

15 Introducing some terminology…
Depth: the number of filters we use. The left image uses two filters. Neural networks will often use a lot of filters to generate hundreds or thousands of feature maps at each network layer!
Stride: how much the filter “jumps” from step to step. For example, a stride of 2 would move the filter 2 pixels for each step. The stride affects how big the feature map will be.
Padding: we often want to control the size of our feature maps. To do this, we can add a border to the image so that the filter can “see” the edges.
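These terms map directly onto Conv2D arguments; a small sketch showing how depth, stride, and padding affect the feature-map size (the specific numbers are illustrative, not from the workshop code):

    import numpy as np
    from tensorflow.keras import layers

    x = np.zeros((1, 28, 28, 1), dtype='float32')   # one dummy 28x28, 1-channel image

    # depth = 8 filters, stride 2, 'same' padding (zero-border so the filter "sees" the edges)
    conv = layers.Conv2D(filters=8, kernel_size=3, strides=2, padding='same')
    print(conv(x).shape)   # (1, 14, 14, 8): stride 2 halves height/width; depth is 8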

16 CNNs are composed of layers of filters
A CNN will consist of multiple layers of many filters. The first layer will calculate features on the original image, the second layer will calculate features on the output of the first layer… These filters are random to begin with, but a CNN will learn to improve them as training proceeds (we’ll discuss how later.)

17 Activation Functions
Each layer in a CNN outputs feature maps showing how strongly filters are activated, in a linear way.
We transform this into a non-linear activation using an activation function (analogy: action potentials in biology).
Usually, the ReLU function works best and is a good default for most cases.
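Concretely, ReLU just clips negative activations to zero; a one-line illustrative check:

    import tensorflow as tf

    # ReLU(x) = max(0, x): negative activations become 0, positive ones pass through.
    print(tf.nn.relu([-2.0, 0.0, 3.0]).numpy())   # [0. 0. 3.]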

18 Putting this all together in code…
We’ll add a layer to our neural network…
How many filters do we want?
What’s the size of each filter? (Here, it’s 3x3.)
Padding (“same” is fine in most cases)
Our activation function
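Since the slide’s code screenshot isn’t reproduced here, a minimal sketch of what such a layer looks like in Keras (the 32-filter count and the Sequential model are assumptions, not necessarily the slide’s exact code):

    from tensorflow.keras import models, layers

    model = models.Sequential()
    model.add(layers.Conv2D(filters=32,                # how many filters do we want?
                            kernel_size=(3, 3),        # the size of each filter (3x3)
                            padding='same',            # 'same' is fine in most cases
                            activation='relu',         # our activation function
                            input_shape=(28, 28, 1)))  # 28x28, 1-channel Fashion-MNIST images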

19 Task 1: Add another convolutional layer to our neural network
I started creating a CNN for you:
I declared a model in Keras.
I added a convolutional layer.
I added a max-pooling layer (we’ll discuss this next).
Task: add another convolutional layer to this CNN:
Use 64 filters.
Use a kernel size of 2x2.
Use “same” padding.
Use “relu” activation.
When you’re done, run the code to output your model:
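One possible solution, assuming the model object in the script is called model (as in the sketch above):

    # Add a second convolutional layer: 64 filters, 2x2 kernel, 'same' padding, ReLU.
    model.add(layers.Conv2D(filters=64,
                            kernel_size=(2, 2),
                            padding='same',
                            activation='relu'))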

20 Max-Pooling Layers
Between convolutions, we often apply max pooling to feature maps.
Pooling lets us fit more layers and filters into memory; usually, we want to increase the depth of layers as we go deeper.
Pooling reduces the number of parameters in the network, controlling overfitting.
Most of the time, close spatial features are somewhat redundant – we don’t lose a lot of important information by pooling.

21 Task 2: Add a maximum pooling layer
Immediately after the convolutional layer you just added, add a maximum pooling layer. Use a pooling size of 2 – this will tell the CNN to pool in a 2x2 block. Hint: Look at the code above to understand how it’s done! Run the code and make sure your model looks like this:
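The model summary image isn’t reproduced here. A possible solution for the pooling layer, continuing the earlier sketch:

    # Pool in 2x2 blocks, halving the height and width of the feature maps.
    model.add(layers.MaxPooling2D(pool_size=2))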

22 Fully connected layers
Convolutional layers move over images in a sliding window, so they only see a kernel-sized patch of pixels at once (local features).
In contrast, fully connected layers are layers where each neuron sees EVERY pixel at once (global features).
We usually use fully connected layers only at the very end, to sum up all the information, because they’re very expensive in terms of number of parameters.

23 Task 3: Add two fully connected layers
After the flattening layer, add two fully connected layers:
Run the code again to see your model. Why are there so many more parameters than before?
Note that we will use softmax activation for the last fully connected layer, because this is intended to be our output.
The softmax function will translate the output into a nice multi-class probability prediction.
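A possible solution, assuming the provided script already ends with a Flatten() layer; the 128-unit size of the first Dense layer is an assumption, and the final layer has 10 units for the 10 Fashion-MNIST classes:

    # Two fully connected (Dense) layers after Flatten().
    model.add(layers.Dense(128, activation='relu'))     # hidden size (128) is illustrative
    model.add(layers.Dense(10, activation='softmax'))   # 10 class probabilities summing to 1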

24 You’ve learned how to build a neural network!
We’ve kept this CNN very simple because we don’t have a GPU and we have to limit the number of parameters. If you have a GPU, you can just increase the number of layers and filters.
Very deep networks can be harder to train. Further reading: Deep Residual Learning for Image Recognition (He et al. 2015) – arXiv.
This type of architecture only works for classification. Further reading: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al. 2015) – arXiv.

25 Compiling and Training
Open the “step3_compile_and_train.py” file now…

26 What are loss functions?
To train a CNN, we need to define a loss function: a measurement of how different the CNN’s prediction is from our ground truth.
For this task, we will use the cross-entropy loss, which penalizes the model more the further its prediction diverges from the ground truth.
Cross-entropy loss is appropriate for classification, but different problems require different losses.
Further reading: loss_functions/
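In Keras, the loss is chosen when the model is compiled; a sketch (using the Adam optimizer here is an assumption, and categorical_crossentropy matches the one-hot labels created earlier):

    # Compile the model with a cross-entropy loss for one-hot encoded labels.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])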

27 How do we learn from the loss function?
The derivative of the loss function tells us how much we need to tweak the parameters (or weights) of the filters in our CNN.
By applying the chain rule, we can back-propagate this derivative to determine the local gradient for each parameter.
The learning rate controls how much we update the parameters with respect to the gradient.
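The learning rate is set on the optimizer object; the value below is illustrative, not the workshop’s default:

    from tensorflow.keras import optimizers

    # Conceptually, each weight w is nudged against its gradient:
    #   w  <-  w - learning_rate * d(loss)/dw
    opt = optimizers.Adam(learning_rate=0.001)   # older Keras versions spell this lr=0.001
    model.compile(optimizer=opt,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])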

28 Batch Size and Epochs
Two other important parameters are the batch size and the number of epochs.
The batch size controls how many images we input into the network for training at once. The weights are updated after each batch, so larger batches stabilize training. However, our batch size is constrained by memory.
The number of epochs controls how many passes we make over the entire dataset during training. We want to pick enough epochs that the model converges to a good solution. However, too many epochs will take too much time and may result in overfitting as the model memorizes the data. Data augmentation can combat the second issue.
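Both parameters are passed to model.fit(); a sketch with illustrative values and assumed variable names:

    # Train for 10 epochs, updating the weights after every batch of 64 images,
    # and monitor accuracy on the held-out validation set after each epoch.
    history = model.fit(train_images, train_labels,
                        batch_size=64,
                        epochs=10,
                        validation_data=(val_images, val_labels))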

29 Task: Play around with parameters and understand the code
To train a model, run the step3_compile_and_train.py script. Your goal is to find parameters that increase the validation and test accuracy: try adjusting the learning rate, batch size, and epochs. What line of code starts the training, and what variables need to be given? How does adjusting these parameters affect run time? When you’re satisfied with your model, run step4_load_and_evaluate_saved_model.py to visualize some of the predictions.

30 Further reading for training
We won’t implement these today, but there are some common techniques to improve the training process – all available as simple, user-friendly implementations in Keras:
Adaptive learning rates during training: Adam: A Method for Stochastic Optimization (Kingma and Ba 2014)
Dropping out neurons randomly during training: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al. 2014)
Normalizing feature maps during training: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe and Szegedy 2015)
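For reference, the last two are ordinary Keras layers; where to insert them (and the 0.5 dropout rate) is illustrative, not a recommendation from the slides:

    from tensorflow.keras import layers

    model.add(layers.BatchNormalization())   # normalize feature maps during training
    model.add(layers.Dropout(0.5))           # randomly drop 50% of neurons during training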

31 Saving and Loading Models
Keras provides nice and easy functions to save and load trained weights: You can either re-initialize the model architecture from scratch (done in the code that I have given you), or save the model architecture using the to_json() function (check out the Keras documentation for details).
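A sketch of the relevant calls (the filenames are placeholders):

    # Save the trained weights to disk.
    model.save_weights('fashion_cnn_weights.h5')

    # Later: rebuild the same architecture, then load the weights back in.
    model.load_weights('fashion_cnn_weights.h5')

    # Optionally, save the architecture itself as a JSON string.
    json_string = model.to_json()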

32 What can I do with this knowledge?
Today, I’ve covered how to build classifiers with CNNs, but they are not just limited to that!
Generative models of cells: Generative Modeling with Conditional Autoencoders: Building an Integrated Cell (Johnson et al. 2017); GANs for Biological Image Synthesis (Osokin et al. 2017)
Classification of cells using whole-image labels: Classifying and Segmenting Microscopy Images using Convolutional Multiple Instance Learning (Kraus et al. 2016)
Identifying motifs (sequences): Convolutional neural network architectures for predicting DNA–protein binding (Zeng et al. 2016); Representation Learning of Genomic Sequence Motifs with Convolutional Neural Networks (Koo and Eddy 2018)
Unsupervised representation learning: Weakly Supervised Learning of Single Cell Feature Embeddings (Caicedo et al.); Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting (Lu et al. 2018)
Segmenting images: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al. 2015); Adapting Mask-RCNN for Automatic Nucleus Segmentation (Johnson 2018)
Detecting mitosis in histology images: Mitosis detection in breast cancer histology images with deep neural networks (Ciresan et al. 2013)

33 Questions?

