Presentation on theme: "Deep Learning with TensorFlow"— Presentation transcript:

1 Deep Learning with TensorFlow

2 Easy-TensorFlow Team: Aryan, Mohammad, Jahandar

3 Outline
1st Lecture (9:15 a.m. - 10:30 a.m.): Intro. to Machine Learning; Intro. to TensorFlow
2nd Lecture (10:45 a.m. - 12:00 p.m.): Neural Network
3rd Lecture (1:15 p.m. - 2:30 p.m.): Neural Network in TensorFlow & Keras; Visualization in TensorBoard
4th Lecture (2:45 p.m. - 4:00 p.m.): Convolutional Neural Network (CNN); CNNs in TensorFlow/Keras

4 Deep Learning with TensorFlow
Lecture 1: Introduction to Machine Learning & TensorFlow

5 Machine Learning Design machines that automatically learn from data and experience

6 Machine Learning restores the color of old black & white photos

7 Machine Learning Google Translate "reads" the text and replaces it with English text in real time

8 Machine Learning Generating new photos

9 Machine Learning Self-driving cars

10 Machine Learning Playing video games

11 Machine Learning Playing video games

12 Machine Learning Healthcare

13 Machine Learning Generate Music

14 Deep Learning
A class of Machine Learning algorithms: multiple cascaded processing stages, where each stage learns a representation of the data. (Oleksiy Ivakhnenko)

15 What is TensorFlow?
Created by researchers at Google.
"TensorFlow™ is an open source software library for numerical computation using data flow graphs."
"… software library for Machine Intelligence"
TensorFlow has APIs available in several languages (Python, C++, Java, etc.); the Python API is at present the most complete and the easiest to use.

16 Companies using TensorFlow

17

18 Why TensorFlow?
- Developed and maintained by Google
- Very large and active community + nice documentation
- Python API
- Multi-GPU support
- TensorBoard (a very powerful visualization tool)
- Faster model compilation than Theano-based options
- High-level APIs built on top of TensorFlow (such as Keras and TFLearn)

19 How to set it up?!
Python – programming language
Anaconda – package manager (optional; instead of installing Python directly)
TensorFlow – the deep learning library itself
IDE – software application (preferably PyCharm)
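Once everything is installed, a quick sanity check (a minimal sketch; assumes TensorFlow 1.x, the API version used throughout these slides):

import tensorflow as tf
print(tf.__version__)                      # e.g. 1.x
hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))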

20 Intro to TensorFlow
What is a Tensor? A multi-dimensional array:
- 0-d tensor: scalar (number)
- 1-d tensor: vector
- 2-d tensor: matrix
Importing the library: import tensorflow as tf
Key feature: the "computational graph" approach. TensorFlow separates the definition of computations from their execution:
- Part 1: building the graph, which represents the data flow of the computations
- Part 2: running a session, which executes the operations in the graph

21 Graph and Session
Graph: Nodes = operations

22 Graph and Session
Graph: Nodes = operations, Edges = Tensors

23 Graph and Session
Graph: Nodes = operations, Edges = Tensors
Session: executes the operations in the graph

24 Graph and Session
Example 1: build the graph

import tensorflow as tf
c = tf.add(2, 3, name='Add')
print(c)

TF automatically names the nodes when you don't explicitly name them.

25 Graph and Session
Example 1: build the graph

import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
print(c)

Output: Tensor("Add:0", shape=(), dtype=int32)
Printing c returns the Tensor object, not its value; to get the value we need to run the graph in a session. TF automatically names the nodes when you don't explicitly name them.

26 Graph and Session
Example 1: run the graph in a session

import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
sess = tf.Session()
print(sess.run(c))
sess.close()

Output: 5
Create a session and assign it to the variable sess so we can call it later. Within the session, evaluate the graph to fetch the value of c.

27 Graph and Session
Example 1: the same, using a with block that closes the session automatically

import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))

Output: 5

28 Graph and Session
Example 2:

import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
with tf.Session() as sess:
    pow_out = sess.run(pow_op)
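Fetching pow_op evaluates (2 + 3) ** (2 * 3) = 5 ** 6, so pow_out comes back as 15625.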

29 Graph and Session
Example 3:

import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
useless_op = tf.multiply(x, add_op, name='Useless')
with tf.Session() as sess:
    pow_out = sess.run(pow_op)

Only pow_op is fetched here, so the session never executes useless_op.

30 Graph and Session
Example 3: fetching both ops

import tensorflow as tf
x = 2
y = 3
add_op = tf.add(x, y, name='Add')
mul_op = tf.multiply(x, y, name='Multiply')
pow_op = tf.pow(add_op, mul_op, name='Power')
useless_op = tf.multiply(x, add_op, name='Useless')
with tf.Session() as sess:
    pow_out, useless_out = sess.run([pow_op, useless_op])

31 Data types
1. Constants are used to create constant values.

tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)

Example:
s = tf.constant(2, name='scalar')
m = tf.constant([[1, 2], [3, 4]], name='matrix')

32 Data types
1. Constants
Before:

import tensorflow as tf
a = 2
b = 3
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))

Output: 5

33 Data types
1. Constants
Now:

import tensorflow as tf
a = tf.constant(2, name='A')
b = tf.constant(3, name='B')
c = tf.add(a, b, name='Add')
with tf.Session() as sess:
    print(sess.run(c))

Output: 5

34 Data types
2. Variables are stateful nodes (ops) that output their current value, meaning they can retain their value over multiple executions of the graph.
- They can be saved and restored.
- Gradient updates will apply to all variables in the graph ⇒ network parameters (weights and biases).

tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, partitioner=None, validate_shape=True, use_resource=None, custom_getter=None, constraint=None)

Example:
s1 = tf.get_variable(name='scalar1', initializer=2)
s2 = tf.get_variable(name='scalar2', initializer=tf.constant(2))
m = tf.get_variable('matrix', initializer=tf.constant([[0, 1], [2, 3]]))
M = tf.get_variable('big_matrix', shape=(784, 10), initializer=tf.zeros_initializer())
W = tf.get_variable('weight', shape=(784, 10), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))

35 Data types
2. Variables

import tensorflow as tf
# create graph
a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]]))
b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]]))
c = tf.add(a, b, name="Add")
# launch the graph in a session
with tf.Session() as sess:
    # now we can run the desired operation
    print(sess.run(c))

Error: FailedPreconditionError: Attempting to use uninitialized value

36 Data types Graph Variables
import tensorflow as tf # create graph a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]])) b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]])) c = tf.add(a, b, name="Add") # Add an Op to initialize variables init_op = tf.global_variables_initializer() # launch the graph in a session with tf.Session() as sess: # run the variable initializer sess.run(init_op) # now we can run the desired operation print(sess.run(c)) [[ 4  6]  [ 8 10]] Graph Variables meaning that they can retain their value over multiple executions of a graph. 

37 Data types
3. Placeholders are nodes whose values are fed in at execution time.
⇒ Assemble the graph without knowing the values needed for the computation; we can supply the data later, at execution time.
⇒ Input data (in a classification task: inputs and labels)

tf.placeholder(dtype, shape=None, name=None)

Example:
a = tf.placeholder(tf.float32, shape=[5])
b = tf.placeholder(dtype=tf.float32, shape=None, name=None)
X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')

38 Data types
3. Placeholders

import tensorflow as tf
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    print(sess.run(c))

Error: You must feed a value for placeholder tensor 'B' with dtype float and shape [3]

39 Data types
3. Placeholders

import tensorflow as tf
a = tf.constant([5, 5, 5], tf.float32, name='A')
b = tf.placeholder(tf.float32, shape=[3], name='B')
c = tf.add(a, b, name="Add")
with tf.Session() as sess:
    # create a dictionary:
    d = {b: [1, 2, 3]}
    # feed it to the placeholder
    print(sess.run(c, feed_dict=d))

Output: [6. 7. 8.]

40 Example
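The slide's example itself isn't reproduced in the transcript; below is a minimal sketch in the same spirit, combining the three data types in one graph (the names x, w, b and their values are illustrative, not from the slide):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[3], name='x')                  # fed at execution time
w = tf.get_variable('w', initializer=tf.constant([2.0, 2.0, 2.0]))   # stateful, retains its value
b = tf.constant([1.0, 1.0, 1.0], name='b')                           # fixed constant
y = tf.add(tf.multiply(w, x), b, name='y')                           # element-wise w*x + b

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]}))               # [3. 5. 7.]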

41 Deep Learning with TensorFlow
Lecture 2: Classification using a Neural Network

42 Neural Network
Input data → MODEL → Output

43 Neural Network

44 MNIST Data: 28 × 28 pixel images of handwritten digits (flattened to vectors of length 784)
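A common way to load this dataset in TensorFlow 1.x is the bundled helper module (a sketch; input_data ships with TF 1.x and is deprecated in later versions):

from tensorflow.examples.tutorials.mnist import input_data

# downloads the data on first use; one_hot=True gives labels as length-10 one-hot vectors
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
x_batch, y_batch = mnist.train.next_batch(100)
print(x_batch.shape)   # (100, 784): each 28 x 28 image arrives flattened
print(y_batch.shape)   # (100, 10)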

45 Logistic Classifier (linear classifier)
Set of N labeled inputs: D = {(X1, y1), …, (XN, yN)}
Linear model: takes the input and applies a linear function to generate its predictions (logits): WX + b = y
Input: a 28 × 28 image flattened to shape (1, 784); Weight: (784, 10); Bias: (1, 10)
SOFTMAX turns the logits into probabilities.
(Superscript: index of elements)

46 Logistic Classifier (linear classifier)
Input (Xn) → Logits (yn = WXn + b) → SOFTMAX → Probs. (S(yn))
Cross-entropy compares the predicted probabilities S(yn) with the one-hot encoded labels (Ln).
(Superscript n: index of the sample)

47 Logistic Classifier (linear classifier)
yn = WXn + b → Probs. S(yn)
Cross-entropy between S(yn) and the one-hot encoded labels Ln: D(S, L) = −Σi Li log(Si)
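A minimal sketch of this classifier in TensorFlow 1.x (the placeholder shapes follow the slides; the variable names and the use of tf.nn.softmax_cross_entropy_with_logits are my choices, not from the slides):

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')    # one-hot labels Ln
W = tf.get_variable('weight', shape=(784, 10), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b = tf.get_variable('bias', initializer=tf.zeros(10))

logits = tf.matmul(X, W) + b                                       # yn = XnW + b, shape (batch, 10)
probs = tf.nn.softmax(logits)                                      # S(yn)
# average cross-entropy D(S(yn), Ln) over the batch
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))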

48 Gradient Descent
Batch gradient descent: calculate the gradients for the whole dataset to perform just one update → computationally expensive; it can be very slow and is intractable for datasets that don't fit in memory.
Instead of computing this loss exactly, we are going to estimate it.
Stochastic gradient descent (SGD): feed one example at a time and update. The loss function fluctuates a lot:
- this enables SGD to jump to new and potentially better local minima,
- but it complicates convergence to the exact minimum, as SGD will keep overshooting.
When we slowly decrease the learning rate, SGD shows the same convergence behaviour as batch gradient descent.

49 Gradient Descent
Mini-batch gradient descent takes the best of both worlds: take a small batch of training samples at random, compute the loss and its derivative, and pretend that this derivative is the right direction to take (sometimes it isn't, and the loss increases). We compensate for that by running this procedure many, many times.
One challenge: finding a proper learning rate.

50 Gradient Descent
Momentum: take advantage of accumulated knowledge by keeping a running average of the gradients.
Further reading: "An overview of gradient descent optimization algorithms"
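In TensorFlow 1.x these update rules are applied through optimizer ops (a sketch; assumes a loss tensor like the one above, and the learning-rate/momentum values are illustrative):

# plain (mini-batch) gradient descent
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)
# or gradient descent with momentum (keeps a running average of past gradients)
train_op = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        # mini-batch loop; assumes the MNIST loader sketched earlier
        x_batch, y_batch = mnist.train.next_batch(100)
        sess.run(train_op, feed_dict={X: x_batch, Y: y_batch})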

51 Neural Network
Introduce nonlinearity:
- Sigmoid squashes real numbers to the range [0, 1]. But sigmoids saturate and kill gradients, and their outputs are not zero-centered, which leads to zig-zagging dynamics during gradient descent.
- ReLU (popularized by AlexNet): accelerates convergence, is a cheaper operation, and has a simpler derivative.

52 Neural Network

53 Neural Network #parameters = (784 × 200 + 200) + (200 × 10 + 10) = 159,010

54 Neural Network

55 Neural Network

56 Neural Network

X = tf.placeholder(tf.float32, shape=[None, 784], name='input')
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')
W = tf.get_variable('weight', shape=(784, 200), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b = tf.get_variable('bias', initializer=tf.zeros(200))
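A sketch of how these pieces can be assembled into the one-hidden-layer network from the slides (200 hidden units with ReLU; the second-layer names W2/b2 and the training op are my additions, not from the slide):

h = tf.nn.relu(tf.matmul(X, W) + b)        # hidden layer: 784 -> 200

W2 = tf.get_variable('weight2', shape=(200, 10), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b2 = tf.get_variable('bias2', initializer=tf.zeros(10))
logits = tf.matmul(h, W2) + b2             # output layer: 200 -> 10

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)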

57 Deep Learning with TensorFlow
Lecture 3: TensorBoard

58 TensorBoard is a flashlight for our neural network's black box.
1. What does the network graph look like?
2. What is the best network configuration?
3. How does the data look in high dimensions?

59 What does the network graph look like?
Understanding the connections between nodes and layers
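To get the graph (and training curves) into TensorBoard, the usual TF 1.x pattern is to write summaries to a log directory (a sketch; the path './logs' and the loss tensor are assumptions carried over from the earlier sketches):

loss_summary = tf.summary.scalar('loss', loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./logs', sess.graph)     # writes the graph definition
    summary = sess.run(merged, feed_dict={X: x_batch, Y: y_batch})
    writer.add_summary(summary, global_step=0)
    writer.close()
# then launch it with:  tensorboard --logdir=./logs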

60 What does the network graph look like?
Visualizing multiple runs simultaneously

61 How does the data look in high dimensions?
Understanding the relationships between samples

62 Deep Learning with TensorFlow
Lecture 4: Classification using a Convolutional Neural Network

63 Feed-forward Neural Network (NN)

64 NN problems: 1. Doesn't exploit the structure of the data! (e.g. no built-in translation invariance)

65 NN problems:
1. Doesn't exploit the structure of the data.
Solution: weight sharing
CNNs: NNs that share their parameters across space

66 NN problems:
2. Doesn't scale well to full images.
784 units fully connected to 500 units: #parameters = 784 × 500 = 392K !!!

67 NN problems:
2. Doesn't scale well to full images.
Solution: weight sharing + a 3D volume of neurons (sharing the same set of weights and biases)

68 Layers used to build CNNs

69 Convolution Layer
What is convolution? A function derived from two given functions by integration that expresses how the shape of one is modified by the other.
1. slide  2. multiply  3. integrate (i.e. sum)
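A tiny worked example of these three steps on a 1-D signal (plain NumPy, for illustration only; note that CNN libraries usually skip the kernel flip, i.e. they compute cross-correlation):

import numpy as np

signal = np.array([1, 2, 3, 4, 5], dtype=float)
kernel = np.array([1, 0, -1], dtype=float)

out = []
for i in range(len(signal) - len(kernel) + 1):
    window = signal[i:i + len(kernel)]          # 1. slide
    products = window * kernel[::-1]            # 2. multiply (flipped kernel = true convolution)
    out.append(products.sum())                  # 3. sum
print(out)                                            # [2.0, 2.0, 2.0]
print(np.convolve(signal, kernel, mode='valid'))      # same result: [2. 2. 2.]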

70 Convolution Layer
- Spatial dimensions: 32 × 32
- Depth: 3 feature maps (R, G, B)

71 Convolution Layer (Filter = Kernel = Patch)
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products, and summing over all."

72 Convolution Layer

73 Convolution Layer

74 Convolution Layer

75 Convolution Layer

76 Convolution Layer

77 Convolution Layer

78 Convolution Layer

79 Convolution Layer
#parameters: a fully connected layer from a 32×32×3 input (3,072 units) to a 28×28×6 output (4,704 units) needs 3,072 × 4,704 + 4,704 = 14,455,392 !!! vs. a convolutional layer with six 5×5×3 filters: 6 × (5 × 5 × 3) + 6 = 456

80 Max-Pooling Layer - To reduce the spatial dimension of feature maps

81 Max-Pooling Layer - To reduce the spatial dimension of feature maps
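A sketch of a convolution + max-pooling block in TensorFlow 1.x (filter sizes and names are illustrative; assumes MNIST-style image batches of shape [batch, 28, 28, 1]):

import tensorflow as tf

x_image = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='image')
# six 5x5 filters over a 1-channel input; the filters and biases are the shared parameters
conv_w = tf.get_variable('conv1_w', shape=(5, 5, 1, 6), initializer=tf.truncated_normal_initializer(stddev=0.01))
conv_b = tf.get_variable('conv1_b', initializer=tf.zeros(6))

conv = tf.nn.conv2d(x_image, conv_w, strides=[1, 1, 1, 1], padding='SAME')
feature_maps = tf.nn.relu(conv + conv_b)                     # shape: [batch, 28, 28, 6]
# 2x2 max-pooling halves the spatial dimensions
pooled = tf.nn.max_pool(feature_maps, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# pooled shape: [batch, 14, 14, 6]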

