Deep Learning with TensorFlow


Deep Learning with TensorFlow www.easy-tensorflow.com www.github.com/easy-tensorflow

Easy-TensorFlow Team Aryan Mohammad Jahandar www.easy-tensorflow.com

Outline 1st Lecture (9:15 a.m. - 10:30 a.m.) Intro. to Machine Learning Intro. to TensorFlow 2nd Lecture (10:45 a.m. - 12:00 p.m.) Neural Network 3rd Lecture (1:15 p.m. - 2:30 p.m.) Neural Network in TensorFlow & Keras Visualization in TensorBoard 4th Lecture (2:45 p.m. - 4:00 p.m.) Convolutional Neural Network (CNN) CNNs in TensorFlow/Keras

Deep Learning with TensorFlow Lecture 1: Introduction to Machine Learning & TensorFlow www.easy-tensorflow.com

Machine Learning Design machines that automatically learn from data and experience

Machine Learning restores the color of old black & white photos

Machine Learning Google Translate "reads" the text and replaces it with the English translation in real time

Machine Learning Generating new photos

Machine Learning Self-driving cars

Machine Learning Playing video games

Machine Learning Playing video games

Machine Learning Healthcare

Machine Learning Generate Music

Deep Learning A class of Machine Learning algorithms Multiple cascaded processing stages Each stage learns a representation of the data (pioneered by Oleksiy Ivakhnenko)

What is TensorFlow? Created by researchers at Google “TensorFlow™ is an open source software library for numerical computation using data flow graphs.” “… software library for Machine Intelligence” TensorFlow has APIs available in several languages (Python, C++, Java, etc.) The Python API is at present the most complete and the easiest to use

Companies using TensorFlow

Why TensorFlow? Developed and maintained by Google Very large and active community + nice documentation Python API Multi-GPU support TensorBoard (a very powerful visualization tool) Faster model compilation than Theano-based options High-level APIs built on top of TensorFlow (such as Keras and TFLearn)

How to set it up?! Python – programming language Anaconda – package manager (optional; instead of installing Python directly) TensorFlow – the library itself IDE – software application (preferably PyCharm) http://www.easy-tensorflow.com/tf-tutorials/install

Intro to TensorFlow What is a Tensor? A multi-dimensional array: 0-d tensor: scalar (number), 1-d tensor: vector, 2-d tensor: matrix. Importing the library: import tensorflow as tf. Key feature: the "computational graph" approach. Part 1: building the graph, which represents the data flow of the computations. Part 2: running a session, which executes the operations in the graph. TensorFlow separates the definition of computations from their execution.

Graph and Session Graph Nodes = operations

Graph and Session Graph Nodes = operations Edges = Tensors

Graph and Session Graph Nodes = operations Edges = Tensors Session

Graph and Session Example 1: Graph import tensorflow as tf c = tf.add(2, 3, name='Add') print(c) TF automatically names the nodes when you don't explicitly name them.

Graph and Session Example 1: Graph import tensorflow as tf a = 2 b = 3 c = tf.add(a, b, name='Add') print(c) Output: Tensor("Add:0", shape=(), dtype=int32) Printing c returns the Tensor object, not its value: we have only built the graph, not executed it.

Graph and Session Example 1: Graph import tensorflow as tf a = 2 b = 3 c = tf.add(a, b, name='Add') sess = tf.Session() print(sess.run(c)) sess.close() Output: 5 Create a session, assign it to the variable sess so we can call it later; within the session, evaluate the graph to fetch the value of c.

Graph and Session Example 1: Graph import tensorflow as tf a = 2 b = 3 c = tf.add(a, b, name='Add') with tf.Session() as sess: print(sess.run(c)) Output: 5 With the with block, the session is closed automatically.

Graph and Session Example 2: Graph import tensorflow as tf x = 2 y = 3 add_op = tf.add(x, y, name='Add') mul_op = tf.multiply(x, y, name='Multiply') pow_op = tf.pow(add_op, mul_op, name='Power') with tf.Session() as sess: pow_out = sess.run(pow_op)

Graph and Session Example 3: Graph import tensorflow as tf x = 2 y = 3 add_op = tf.add(x, y, name='Add') mul_op = tf.multiply(x, y, name='Multiply') pow_op = tf.pow(add_op, mul_op, name='Power') useless_op = tf.multiply(x, add_op, name='Useless') with tf.Session() as sess: pow_out = sess.run(pow_op) Since only pow_op is fetched, the session never executes useless_op.

Graph and Session Example 3: Graph import tensorflow as tf x = 2 y = 3 add_op = tf.add(x, y, name='Add') mul_op = tf.multiply(x, y, name='Multiply') pow_op = tf.pow(add_op, mul_op, name='Power') useless_op = tf.multiply(x, add_op, name='Useless') with tf.Session() as sess: pow_out, useless_out = sess.run([pow_op, useless_op]) Passing a list of operations to sess.run evaluates all of them in a single call.

Data types 1. Constants are used to create constant values tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False) Example: s = tf.constant(2, name='scalar') m = tf.constant([[1, 2], [3, 4]], name='matrix')

Data types 1. Constants are used to create constant values Before: import tensorflow as tf a = 2 b = 3 c = tf.add(a, b, name='Add') with tf.Session() as sess: print(sess.run(c)) Output: 5

Data types 1. Constants are used to create constant values Now: import tensorflow as tf a = tf.constant(2, name='A') b = tf.constant(3, name='B') c = tf.add(a, b, name='Add') with tf.Session() as sess: print(sess.run(c)) Output: 5

Data types 2. Variables are stateful nodes (= ops) which output their current value, meaning that they can retain their value over multiple executions of the graph. They can be saved and restored, and gradient updates will apply to all (trainable) variables in the graph. ⇒ Network parameters (weights and biases) tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, partitioner=None, validate_shape=True, use_resource=None, custom_getter=None, constraint=None) Example: s1 = tf.get_variable(name='scalar1', initializer=2) s2 = tf.get_variable(name='scalar2', initializer=tf.constant(2)) m = tf.get_variable('matrix', initializer=tf.constant([[0, 1], [2, 3]])) M = tf.get_variable('big_matrix', shape=(784, 10), initializer=tf.zeros_initializer()) W = tf.get_variable('weight', shape=(784, 10), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))

Data types 2. Variables import tensorflow as tf # create graph a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]])) b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]])) c = tf.add(a, b, name="Add") # launch the graph in a session with tf.Session() as sess: # now we can run the desired operation print(sess.run(c)) Output: FailedPreconditionError: Attempting to use uninitialized value (variables must be initialized before they are used).

Data types 2. Variables import tensorflow as tf # create graph a = tf.get_variable(name="A", initializer=tf.constant([[0, 1], [2, 3]])) b = tf.get_variable(name="B", initializer=tf.constant([[4, 5], [6, 7]])) c = tf.add(a, b, name="Add") # Add an Op to initialize variables init_op = tf.global_variables_initializer() # launch the graph in a session with tf.Session() as sess: # run the variable initializer sess.run(init_op) # now we can run the desired operation print(sess.run(c)) Output: [[ 4  6] [ 8 10]]

Data types 3. Placeholders are nodes whose value is fed in at execution time. ⇒ Assemble the graph without knowing the values needed for the computation; we supply the data later, at execution time. ⇒ Input data (in a classification task: inputs and labels) tf.placeholder(dtype, shape=None, name=None) a = tf.placeholder(tf.float32, shape=[5]) b = tf.placeholder(dtype=tf.float32, shape=None, name=None) X = tf.placeholder(tf.float32, shape=[None, 784], name='input') Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')

Data types 3. Placeholders import tensorflow as tf a = tf.constant([5, 5, 5], tf.float32, name='A') b = tf.placeholder(tf.float32, shape=[3], name='B') c = tf.add(a, b, name="Add") with tf.Session() as sess: print(sess.run(c)) Error: You must feed a value for placeholder tensor 'B' with dtype float and shape [3]

Data types 3. Placeholders import tensorflow as tf a = tf.constant([5, 5, 5], tf.float32, name='A') b = tf.placeholder(tf.float32, shape=[3], name='B') c = tf.add(a, b, name="Add") with tf.Session() as sess: # create a dictionary: d = {b: [1, 2, 3]} # feed it to the placeholder print(sess.run(c, feed_dict=d)) Output: [6. 7. 8.]

Example
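The example slide itself is image-only; as a minimal sketch of how the three data types work together (a constant, a trainable variable, and a placeholder fed at run time), using the same TF 1.x API as above:

import tensorflow as tf

# placeholder: data that is fed in at execution time
x = tf.placeholder(tf.float32, shape=[3], name='x')
# variable: a stateful, trainable parameter
w = tf.get_variable('w', initializer=tf.constant([1.0, 2.0, 3.0]))
# constant: a fixed value baked into the graph
b = tf.constant(5.0, name='b')

y = tf.reduce_sum(w * x) + b   # build the graph

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)                                 # initialize the variable
    print(sess.run(y, feed_dict={x: [1., 1., 1.]}))   # 1 + 2 + 3 + 5 = 11.0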

Deep Learning with TensorFlow Lecture 2: Classification using a Neural Network www.easy-tensorflow.com

Neural Network Input data -> MODEL -> Output

Neural Network

MNIST Data 28x28 grayscale images of handwritten digits

Logistic Classifier (linear classifier) Set of N labeled inputs: D = {(X1, y1), …, (XN, yN)} (the numbers index the training examples) Linear model: takes the input and applies a linear function to generate its predictions: WX + b = y The 28x28 input image is flattened into a (1, 784) vector; the weight W has shape (784, 10) and the bias has shape (1, 10). The outputs y are the logits, which SOFTMAX turns into probabilities.

Logistic Classifier (linear classifier) For each input Xn, the logits yn = WXn + b are passed through SOFTMAX to obtain probabilities S(yn), which are compared with the one-hot encoded label Ln using the cross-entropy.

Logistic Classifier (linear classifier) Cross-entropy between the probabilities S(yn), with yn = WXn + b, and the one-hot encoded label Ln: D(S, L) = − Σi Li · log(Si), evaluated at S = S(yn), L = Ln.
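As a sketch, the whole linear classifier can be written in TF 1.x as follows, assuming flattened 784-dimensional inputs and 10 classes; tf.nn.softmax_cross_entropy_with_logits_v2 combines the softmax and the cross-entropy in one op:

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None, 784], name='input')   # flattened images
Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')    # one-hot labels

W = tf.get_variable('weight', shape=(784, 10),
                    initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b = tf.get_variable('bias', initializer=tf.zeros(10))

logits = tf.matmul(X, W) + b    # the linear model WX + b
probs = tf.nn.softmax(logits)   # softmax probabilities
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))  # cross-entropy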

Gradient Descent Batch gradient descent: W ← W − α·∇W L(W), where α is the learning rate. It calculates the gradients on the whole dataset to perform just one update -> computationally expensive; it can be very slow and is intractable for datasets that don't fit in memory. Instead of computing this loss exactly, we are going to estimate it. Stochastic gradient descent (SGD): feed one example, update, repeat; the loss fluctuates a lot! This enables it to jump to new and potentially better local minima, but complicates convergence to the exact minimum, as SGD will keep overshooting. When we slowly decrease the learning rate, SGD shows the same convergence behaviour as batch gradient descent.

Gradient Descent Mini-batch gradient descent: W ← W − α·∇W L_batch(W), where α is the learning rate. It takes the best of both worlds: take a small batch of training samples at random, compute the loss L on the batch, compute its derivative, and pretend that this derivative is the right direction to take (sometimes it isn't, and the loss increases); we compensate for that by running this procedure many times. One challenge: finding a proper learning rate.
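A sketch of a mini-batch training loop for the classifier above; x_train and y_train are hypothetical NumPy arrays (flattened images and one-hot labels) standing in for the real data pipeline:

import numpy as np

train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)

batch_size = 100
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        # shuffle the training set, then walk over it in mini-batches
        idx = np.random.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            batch = idx[start:start + batch_size]
            sess.run(train_op, feed_dict={X: x_train[batch], Y: y_train[batch]})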

Gradient Descent Momentum: take advantage of accumulated knowledge by keeping a running average of the gradients (see "An overview of gradient descent optimization algorithms").
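In TF 1.x, switching to momentum is a one-line change to the optimizer; this is a sketch, and 0.9 is a common but arbitrary choice for the momentum coefficient:

train_op = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(loss)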

Neural Network Introduce nonlinearity: The sigmoid non-linearity squashes real numbers to the range [0, 1]. Sigmoids saturate and kill gradients, and sigmoid outputs are not zero-centered -> zig-zagging dynamics. ReLU (used in AlexNet): accelerates convergence, is a less expensive operation, and has a simpler derivative.
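Both nonlinearities are available as TensorFlow ops; a small illustrative sketch, where h stands for any layer's pre-activation:

h = tf.matmul(X, W) + b     # pre-activation (illustrative names)
a_sigmoid = tf.sigmoid(h)   # squashes each value into (0, 1)
a_relu = tf.nn.relu(h)      # max(0, h): cheap, with a simple derivative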

Neural Network

Neural Network (one hidden layer with 200 units) #parameters = 784x200 + 200 + 200x10 + 10 = 159,010


Neural Network b = tf.get_variable('bias', initializer=tf.zeros(200)) W = tf.get_variable('weight', shape=(784, 200), initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01)) X = tf.placeholder(tf.float32, shape=[None, 784], name='input') Y = tf.placeholder(tf.float32, shape=[None, 10], name='label')
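A sketch of the full one-hidden-layer network built from these pieces, assuming 200 ReLU hidden units and a softmax cross-entropy loss; the *_out names are illustrative:

# hidden layer: 784 -> 200, ReLU
h = tf.nn.relu(tf.matmul(X, W) + b)

# output layer: 200 -> 10, with its own weights and bias
W_out = tf.get_variable('weight_out', shape=(200, 10),
                        initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
b_out = tf.get_variable('bias_out', initializer=tf.zeros(10))
logits = tf.matmul(h, W_out) + b_out

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)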

Deep Learning with TensorFlow Lecture 3: TensorBoard www.easy-tensorflow.com

TensorBoard is a flashlight for our neural net's black box. 1. What does the network graph look like? 2. What is the best network configuration? 3. How does the data look in high dimensions?

What does the network graph look like? Understanding the connections between nodes and layers
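A minimal sketch of exporting the graph so TensorBoard can display it (TF 1.x API; './logs' is an arbitrary log directory):

with tf.Session() as sess:
    # write the graph definition (and, later, summaries) to disk
    writer = tf.summary.FileWriter('./logs', sess.graph)
    # ... run training here ...
    writer.close()
# then launch TensorBoard from a terminal: tensorboard --logdir=./logs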

What does the network graph look like? Visualizing multiple runs simultaneously

How does the data look in high dimension? Understanding relationship between samples

Deep Learning with TensorFlow Lecture 4: Classification using a Convolutional Neural Network www.easy-tensorflow.com

Feed-forward Neural Network (NN)

NN problems: 1. Doesn't use the structure of the data! (e.g. translation invariance in images)

NN problems: 1. Doesn't use the structure of the data! Solution: weight sharing CNNs: NNs that share their parameters across space

NN problems: 2. Doesn't scale well to full images: fully connecting 784 units to 500 units already needs #parameters = 784 x 500 + 500 = 392,500 ≈ 392K !!!

NN problems: 2. Doesn't scale well to full images Solution: weight sharing + a 3D volume of neurons, sharing the same set of weights (and biases)

Layers used to build CNNs

Convolution Layer What is convolution? A function derived from two given functions by integration that expresses how the shape of one is modified by the other. In practice, for each output position: 1. slide 2. multiply 3. integrate (i.e. sum)
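A tiny worked example of the slide-multiply-sum recipe in 1-D, with illustrative numbers:

import numpy as np

signal = np.array([1., 2., 3., 4.])
kernel = np.array([1., 0., -1.])

# slide the kernel over the signal; at each position, multiply and sum
out = [np.sum(signal[i:i + 3] * kernel) for i in range(len(signal) - 2)]
print(out)   # [1*1 + 2*0 + 3*(-1), 2*1 + 3*0 + 4*(-1)] = [-2.0, -2.0]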

Convolution Layer - Spatial dimensions: 32x32 - Depth: 3 Feature maps (R+G+B)

Convolution Layer (Filter = Kernel = Patch) Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products, and sum over all"
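A sketch of a convolution layer in TF 1.x, assuming a 32x32x3 input and six 5x5 filters (matching the parameter count quoted a few slides below); the names are illustrative:

images = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='images')

conv = tf.layers.conv2d(
    inputs=images,
    filters=6,             # number of output feature maps
    kernel_size=[5, 5],    # spatial extent of each filter
    padding='valid',       # no padding: 32x32 input -> 28x28 output
    activation=tf.nn.relu,
    name='conv1')
# parameters: 6 filters x (5*5*3 weights) + 6 biases = 456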

Convolution Layer (image-only slides: the filter sliding across the image, one position at a time)

Convolution Layer For a 32x32x3 input and a 28x28x6 output, a fully-connected layer would need (28x28x6) x (32x32x3) + 28x28x6 = 14,455,392 parameters, vs. 6 x (5x5x3) + 6 = 456 for the convolution layer!

Max-Pooling Layer - To reduce the spatial dimension of feature maps

Max-Pooling Layer - To reduce the spatial dimension of feature maps
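A sketch of max-pooling in TF 1.x, continuing from the conv layer above; 2x2 windows with stride 2 are a common (but not mandatory) choice:

pool = tf.layers.max_pooling2d(
    inputs=conv,
    pool_size=[2, 2],   # take the max over each 2x2 window
    strides=2,          # move the window 2 pixels at a time
    name='pool1')
# spatial dimensions are halved: 28x28x6 -> 14x14x6 (depth is unchanged)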