Deep Learning with TensorFlow www.easy-tensorflow.com www.github.com/easy-tensorflow
Easy-TensorFlow Team: Aryan, Mohammad, Jahandar www.easy-tensorflow.com
Outline 1st Lecture (9:15 a.m. - 10:30 a.m.) Intro. to Machine Learning Intro. to TensorFlow 2nd Lecture (10:45 a.m. - 12:00 p.m.) Neural Network 3rd Lecture (1:15 p.m. - 2:30 p.m.) Neural Network in TensorFlow & Keras Visualization in TensorBoard 4th Lecture (2:45 p.m. - 4:00 p.m.) Convolutional Neural Network (CNN) CNNs in TensorFlow/Keras
Deep Learning with TensorFlow Lecture 1: Introduction to Machine Learning & TensorFlow www.easy-tensorflow.com
Machine Learning Design machines that automatically learn from data and experience
Machine Learning restores the color of old black & white photos
Machine Learning Google Translate "reads" the text and replaces it with English text in real time
Machine Learning Generating new photos
Machine Learning Self-driving cars
Machine Learning Playing video games
Machine Learning Healthcare
Machine Learning Generate Music
Deep Learning A class of Machine Learning algorithms Multiple cascaded processing stages Each stage learns a representation of the data Oleksiy Ivakhnenko
What is TensorFlow? Created by researchers at Google “TensorFlow™ is an open source software library for numerical computation using data flow graphs.” “… software library for Machine Intelligence” TensorFlow has APIs available in several languages (Python, C++, Java, etc.) The Python API is at present the most complete and the easiest to use
Companies using TensorFlow
Why TensorFlow? Developed and maintained by Google Very large and active community + nice documentation Python API Multi-GPU support TensorBoard (a very powerful visualization tool) Faster model compilation than Theano-based options High-level APIs built on top of TensorFlow (such as Keras and TFLearn)
How to set it up?! Python (the programming language), Anaconda (a package manager; optional, instead of installing Python directly), TensorFlow, and an IDE (preferably PyCharm). http://www.easy-tensorflow.com/tf-tutorials/install
Intro to TensorFlow What is a Tensor? A multi-dimensional array: a 0-d tensor is a scalar (number), a 1-d tensor is a vector, a 2-d tensor is a matrix. Importing the library: import tensorflow as tf. Key feature: the "computational graph" approach. Part 1: building the graph, which represents the data flow of the computations. Part 2: running a session, which executes the operations in the graph. TensorFlow separates the definition of computations from their execution.
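As a quick illustration of tensor ranks (a minimal sketch, not from the slides; the values are arbitrary):
import tensorflow as tf
scalar = tf.constant(7)                    # 0-d tensor, shape ()
vector = tf.constant([1, 2, 3])            # 1-d tensor, shape (3,)
matrix = tf.constant([[1, 2], [3, 4]])     # 2-d tensor, shape (2, 2)
print(scalar.shape, vector.shape, matrix.shape)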
Graph and Session Graph: Nodes = operations, Edges = Tensors. Session: executes the operations in the graph.
Graph and Session Example 1:
import tensorflow as tf
c = tf.add(2, 3, name='Add')
print(c)
TF automatically names the nodes when you don't explicitly name them.
Graph and Session Example 1 (continued): what does print(c) show?
import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
print(c)
Output: Tensor("Add:0", shape=(), dtype=int32)
Printing c returns the Tensor object, not its value, because the graph has not been executed in a session yet.
Graph and Session Example 1 (continued):
import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
sess = tf.Session()
print(sess.run(c))
sess.close()
Output: 5
Create a session and assign it to the variable sess so we can call it later; within the session, evaluate the graph to fetch the value of c.
Graph and Session Example 1 (continued): the same computation using a with block, which closes the session automatically:
import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))
Output: 5
Graph and Session Example 2:
import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
with tf.Session() as sess:
    pow_out = sess.run(pow_op)
Graph and Session Example 3:
import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
useless_op = tf.multiply(x, add_op, name='Useless')
with tf.Session() as sess:
    pow_out = sess.run(pow_op)
Only the nodes needed to evaluate pow_op are executed; useless_op is never run.
Graph and Session Example 3 (continued): fetch both ops in a single run to also get the value of useless_op:
import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
useless_op = tf.multiply(x, add_op, name='Useless')
with tf.Session() as sess:
    pow_out, useless_out = sess.run([pow_op, useless_op])
Data types 1. Constants are used to create constant values.
tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)
Example:
s = tf.constant(2, name='scalar')
m = tf.constant([[1, 2], [3, 4]], name='matrix')
Data types 1. Constants (continued). Before, with plain Python values:
import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))
Output: 5
Now, with explicit constant nodes:
import tensorflow as tf
a = tf.constant(2, name='A')
b = tf.constant(3, name='B')
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))
Output: 5
Data types 2. Variables are stateful nodes (= ops) which output their current value, meaning that they can retain their value over multiple executions of a graph. They can be saved and restored. Gradient updates will apply to all variables in the graph. ⇒ Network parameters (weights and biases).
tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, partitioner=None, validate_shape=True, use_resource=None, custom_getter=None, constraint=None)
Example:
s1 = tf.get_variable(name='scalar1', initializer=2)
s2 = tf.get_variable(name='scalar2', initializer=tf.constant(2))
m = tf.get_variable('matrix', initializer=tf.constant([[0, 1], [2, 3]]))
M = tf.get_variable('big_matrix', shape=(784, 10), initializer=tf.zeros_initializer())
W = tf.get_variable('weight', shape=(784, 10), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
Data types 2. Variables (continued):
import tensorflow as tf
# create graph
a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]]))
b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]]))
c = tf.add(a, b, name="Add")
# launch the graph in a session
with tf.Session() as sess:
    # now we can run the desired operation
    print(sess.run(c))
Output: FailedPreconditionError: Attempting to use uninitialized value
The fix: add an op that initializes the variables and run it before anything else:
import tensorflow as tf
# create graph
a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]]))
b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]]))
c = tf.add(a, b, name="Add")
# Add an Op to initialize variables
init_op = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as sess:
    # run the variable initializer
    sess.run(init_op)
    # now we can run the desired operation
    print(sess.run(c))
Output: [[ 4  6] [ 8 10]]
Data types 3. Placeholders are nodes whose value is fed in at execution time. ⇒ We can assemble the graph without knowing the values needed for the computation, and supply the data later, at execution time. ⇒ Input data (in a classification task: inputs and labels).
tf.placeholder(dtype, shape=None, name=None)
Example:
a = tf.placeholder(tf.float32, shape=[5])
b = tf.placeholder(dtype=tf.float32, shape=None, name=None)
X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')
Data types 3. Placeholders (continued):
import tensorflow as tf
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    print(sess.run(c))
Error: You must feed a value for placeholder tensor 'B' with dtype float and shape [3]
Data types 3. Placeholders (continued): feed the value through feed_dict:
import tensorflow as tf
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    # create a dictionary:
    d = {b: [1, 2, 3]}
    # feed it to the placeholder
    print(sess.run(c, feed_dict=d))
Output: [6. 7. 8.]
Example
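The code on the example slide is not reproduced in the text. A minimal sketch that combines the three data types above (constant, variable, placeholder) might look like this; the names and values are assumptions:
import tensorflow as tf
a = tf.constant(5.0, name='A')                            # constant
w = tf.get_variable('w', initializer=tf.constant(2.0))    # variable
x = tf.placeholder(tf.float32, shape=None, name='x')      # placeholder
y = tf.add(tf.multiply(w, x), a, name='y')                # y = w*x + a
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(y, feed_dict={x: 3.0}))                # prints 11.0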
Deep Learning with TensorFlow Lecture 2: Classification using a Neural Network www.easy-tensorflow.com
Neural Network Input data → MODEL → Output
MNIST Data: 28×28 images of handwritten digits
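As a hedged sketch (the data-loading code is not on the slide), MNIST can be read with the helper that ships with the TF 1.x tutorials package; the directory name is an assumption:
from tensorflow.examples.tutorials.mnist import input_data
# Downloads the data on the first run; labels are returned one-hot encoded.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist.train.images.shape)   # (55000, 784): each 28x28 image is flattened
print(mnist.train.labels.shape)   # (55000, 10)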
Logistic Classifier (linear classifier) Set of N labeled inputs: D = {(X^1, y^1), …, (X^N, y^N)} (superscript: index of elements). Linear model: takes the input and applies a linear function to generate its predictions: WX + b = y. Each 28×28 image is flattened into a (1, 784) input vector; Weight: (784, 10), Bias: (1, 10). The outputs y are the logits; SOFTMAX turns the logits into probabilities.
Logistic Classifier (linear classifier) Pipeline for one sample: Input (X^n) → logits y^n = WX^n + b → SOFTMAX → probabilities S(y^n) → cross-entropy against the one-hot encoded label L^n. (Superscript: index of elements.)
Logistic Classifier (linear classifier) The cross-entropy compares the predicted probabilities S(y^n), where y^n = WX^n + b, with the one-hot encoded label L^n.
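A minimal TF 1.x sketch of this pipeline (logits, softmax, and the cross-entropy loss D(S, L) = −Σᵢ Lᵢ log(Sᵢ)); the variable names are assumptions rather than the slides' own code:
import tensorflow as tf
X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
L = tf.placeholder(tf.float32, shape=[None, 10], name='label')   # one-hot labels
W = tf.get_variable('weight', shape=(784, 10),
                    initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b = tf.get_variable('bias', initializer=tf.zeros(10))
logits = tf.matmul(X, W) + b          # y = WX + b
probs = tf.nn.softmax(logits)         # S(y)
# Cross-entropy, computed in a numerically stable way directly from the logits
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=L, logits=logits))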
Gradient Descent Batch gradient descent: calculate the gradients for the whole dataset to perform just one update → computationally expensive; it can be very slow and is intractable for datasets that don't fit in memory. Instead of computing this loss exactly, we are going to estimate it. Stochastic gradient descent (SGD): feed one example, update, repeat. The loss fluctuates a lot, which enables SGD to jump to new and potentially better local minima, but complicates convergence to the exact minimum, as SGD will keep overshooting. When we slowly decrease the learning rate, SGD shows the same convergence behaviour as batch gradient descent.
Gradient Descent Mini-batch gradient descent: takes the best of both worlds. Take a small batch of training samples at random, compute the loss L, compute its derivative, and pretend that this derivative is the right direction to take (sometimes it is not, and actually increases the loss); we compensate for that by running this procedure many, many times. One challenge: finding a proper learning rate.
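A hedged sketch of the mini-batch training loop, reusing X, L, and loss from the classifier sketch above and the mnist loader from earlier; the batch size and learning rate are arbitrary assumptions:
# One gradient-descent update per mini-batch
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        # Draw a small random batch instead of using the full dataset
        x_batch, y_batch = mnist.train.next_batch(100)
        sess.run(optimizer, feed_dict={X: x_batch, L: y_batch})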
Gradient Descent Momentum: take advantage of accumulated knowledge by keeping a running average of the gradients. (Reference: "An overview of gradient descent optimization algorithms".)
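For reference (not on the slide), the standard momentum update with learning rate η and momentum coefficient γ (commonly around 0.9) is:
v_t = γ · v_{t−1} + η · ∇_θ L(θ)
θ = θ − v_t
so the running average v_t accumulates past gradients and smooths the update direction.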
Neural Network Introduce nonlinearity. Sigmoid: squashes real numbers to the range [0, 1]; sigmoids saturate and kill gradients; sigmoid outputs are not zero-centered → zig-zagging dynamics. ReLU (popularized by AlexNet): accelerates convergence, is a less expensive operation, and has a simpler derivative.
Neural Network
Neural Network #parameters = 200x784 + 200 + 10x200 + 10 =159,010
Neural Network In TensorFlow, the inputs become placeholders and the network parameters become variables:
X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')
W = tf.get_variable('weight', shape=(784, 200), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b = tf.get_variable('bias', initializer=tf.zeros(200))
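Continuing this code, a minimal sketch of the rest of the 784 → 200 → 10 network; the 'weight_out'/'bias_out' names are assumptions, not from the slides:
# Hidden layer activations; ReLU provides the non-linearity
h = tf.nn.relu(tf.matmul(X, W) + b)                       # shape [None, 200]
# Output layer: 200 -> 10 logits
W_out = tf.get_variable('weight_out', shape=(200, 10),
                        initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b_out = tf.get_variable('bias_out', initializer=tf.zeros(10))
logits = tf.matmul(h, W_out) + b_out                      # shape [None, 10]
# Softmax cross-entropy loss against the one-hot labels Y;
# any of the optimizers discussed earlier can then minimize it.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))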
Deep Learning with TensorFlow Lecture 3: TensorBoard www.easy-tensorflow.com
TensorBoard is a flashlight for our neural net's black box. 1. What does the network graph look like? 2. What is the best network configuration? 3. How does the data look in high dimension?
What does the network graph look like? Understanding connection of nodes and layers
What does the network graph look like? Visualizing multiple runs simultaneously
How does the data look in high dimension? Understanding relationship between samples
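A minimal sketch of how a graph and a scalar summary end up in TensorBoard (the log directory and names are assumptions):
import tensorflow as tf
a = tf.constant(2.0, name='A')
b = tf.constant(3.0, name='B')
c = tf.add(a, b, name='Add')
tf.summary.scalar('c_value', c)            # record a scalar value
merged = tf.summary.merge_all()
with tf.Session() as sess:
    # passing sess.graph lets TensorBoard draw the computational graph
    writer = tf.summary.FileWriter('./logs', sess.graph)
    summary_val, _ = sess.run([merged, c])
    writer.add_summary(summary_val, global_step=0)
    writer.close()
Then launch it with: tensorboard --logdir=./logs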
Deep Learning with TensorFlow Lecture 4: Classification using a Convolutional Neural Network www.easy-tensorflow.com
Feed-forward Neural Network (NN)
NN problems: 1. Doesn't use the structure of the data (e.g., translation invariance)!
NN problems: 1. Doesn't use the structure of the data! Solution: weight sharing. CNNs: NNs that share their parameters across space.
NN problems: 2. Doesn't scale well to full images: connecting 784 units to 500 units already gives #parameters = 784 × 500 + 500 ≈ 392K !!!
NN problems: 2. Doesn't scale well to full images. Solution: weight sharing + a 3D volume of neurons, sharing the same set of weights (and biases).
Layers used to build CNNs
Convolution Layer What is convolution? A function derived from two given functions by integration that expresses how the shape of one is modified by the other. In practice: 1. slide, 2. multiply, 3. integrate (i.e. sum).
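A hedged NumPy sketch of the slide-multiply-sum recipe for a single-channel 'valid' convolution (strictly speaking a cross-correlation, as used in CNNs); the function name is an assumption:
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image, multiply element-wise, and sum.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):              # 1. slide vertically ...
        for j in range(ow):          #    ... and horizontally
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # 2. multiply, 3. sum
    return out

print(conv2d_valid(np.ones((5, 5)), np.ones((3, 3))))    # every entry is 9.0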
Convolution Layer Input: spatial dimensions 32×32, depth 3 feature maps (R, G, B).
Convolution Layer (Filter = Kernel = Patch) Convolve the filter with the image, i.e. slide over the image spatially, computing dot products and summing over all.
Convolution Layer
Convolution Layer #parameters: 14,455,392 !!! (fully connected) vs. 456 (convolution)
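Assuming the 456 figure corresponds to six 5×5×3 filters plus biases (5·5·3·6 + 6 = 456) applied to the 32×32×3 input from the earlier slides, a hedged TF 1.x sketch looks like this; the names are assumptions:
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='image')
# Six 5x5 filters spanning all 3 input channels: 5*5*3*6 = 450 weights
filters = tf.get_variable('filters', shape=(5, 5, 3, 6),
                          initializer=tf.truncated_normal_initializer(stddev=0.01))
biases = tf.get_variable('conv_biases', initializer=tf.zeros(6))   # + 6 biases = 456 parameters
# 'VALID' padding and stride 1: each 32x32 map shrinks to 28x28
feature_maps = tf.nn.conv2d(x, filters, strides=[1, 1, 1, 1], padding='VALID') + biases
print(feature_maps.shape)   # (?, 28, 28, 6)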
Max-Pooling Layer - To reduce the spatial dimension of feature maps
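A minimal sketch of 2×2 max pooling with stride 2, which halves the spatial dimensions; it reuses feature_maps from the convolution sketch above:
pooled = tf.nn.max_pool(feature_maps,
                        ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1],
                        padding='VALID')
print(pooled.shape)   # (?, 14, 14, 6)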