Deep Learning Basics and Intuition: Cyrus M. Vahid, Principal Solutions Architect, AWS Deep Learning

Presentation transcript:

Deep Learning Basics and Intuition
Cyrus M. Vahid, Principal Solutions Architect @ AWS Deep Learning
cyrusmv@amazon.com
June 2017

Deductive Logic How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? https://en.wikiquote.org/wiki/Sherlock_Holmes

Deductive Logic

P   Q   P ∧ Q   P ∨ Q
T   T     T       T
T   F     F       T
F   T     F       T
F   F     F       F

P ∴ Q: inferring Q from P alone is not truth-preserving; the row P = T, Q = F is a counterexample.

Deductive Logic
P = T ∧ Q = T ∴ P ∧ Q = T
P ∧ Q ∴ P → Q;  ∼P ∴ P → Q
Modus ponens:
P → Q
P
_________
∴ Q

Complexities of Deductive Logic

Predicates (n)   # of rows in truth table (2^n)
1                2
2                4
3                8
…                …
10               1,024
20               1,048,576
30               1,073,741,824

[Ω → (Ε ∧ Η) ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ F

Useless for dealing with partial information.
Useless for dealing with non-deterministic inference.
We perform very poorly when computing complex logical statements intuitively.

http://bit.ly/2rwtb0o

Search Trees and Graphs http://aima.cs.berkeley.edu/


http://bit.ly/2rwtb0o

Neurobiology

Perceptron
f(xᵢ, wᵢ) = Φ(b + Σᵢ wᵢ·xᵢ)
Φ(x) = 1 if x ≥ 0.5, 0 if x < 0.5

Example of Perceptron
I1 = I2 = B1 = 1: O1 = (1×1) + (1×1) + (−1.5) = 0.5 ∴ Φ(O1) = 1
I2 = 0; I1 = B1 = 1: O1 = (1×1) + (0×1) + (−1.5) = −0.5 ∴ Φ(O1) = 0
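As a minimal sketch (the function name and structure are mine, not from the talk), the perceptron above can be written directly in Python; with the slide's weights of 1, bias weight of −1.5, and threshold of 0.5 it computes a logical AND of its two inputs.

```python
def perceptron(inputs, weights, bias, threshold=0.5):
    # Weighted sum plus bias, followed by the step activation Phi from the previous slides
    o = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if o >= threshold else 0

weights = [1.0, 1.0]   # w1 = w2 = 1
bias = -1.5            # weight carried by the bias unit B1 = 1

print(perceptron([1, 1], weights, bias))  # (1*1) + (1*1) - 1.5 =  0.5 -> 1
print(perceptron([1, 0], weights, bias))  # (1*1) + (0*1) - 1.5 = -0.5 -> 0
print(perceptron([0, 0], weights, bias))  # (0*1) + (0*1) - 1.5 = -1.5 -> 0
```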

Linearity and Non-Linear Solutions
P ∧ Q on the P-Q plane: a single straight line separates the one true point (P = 1, Q = 1) from the false points, so AND is linearly separable.

Linearity and Non-Linear Solutions
P ⊕ Q (XOR) on the P-Q plane: no single straight line separates its true points from its false points, so a single perceptron cannot represent it; a non-linear decision boundary, and hence a hidden layer, is needed.

Multi-Layer Perceptron
A feedforward neural network is a biologically inspired classification algorithm. It consists of a (possibly large) number of simple neuron-like processing units, organized in layers. Every unit in a layer is connected with all the units in the previous layer. Each connection has a weight that encodes the knowledge of the network. Data enters at the input layer and arrives at the output through the hidden layers. There is no feedback between the layers.
Fully-Connected Multi-Layer Feed-Forward Neural Network Topology

Activation Function (Φ)
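As a rough illustration (not code from the slides), the snippet below evaluates a few common choices of Φ: the hard threshold used by the perceptron, plus sigmoid, tanh, and ReLU.

```python
import numpy as np

def step(x, threshold=0.5):
    # Hard threshold from the perceptron slides
    return (x >= threshold).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
for name, phi in [("step", step), ("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu)]:
    print(name, np.round(phi(x), 3))
```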

Training an ANN
A well-trained ANN has weights that amplify the signal and dampen the noise. A bigger weight signifies a tighter correlation between a signal and the network's outcome. For any learning algorithm that uses weights, learning is the process of re-adjusting the weights and biases.
Backpropagation is a popular training algorithm; it distributes the blame for the error among the contributing weights.
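To make "distributing the blame" concrete, here is a minimal numpy sketch of backpropagation for a two-layer sigmoid network learning XOR, the non-linearly-separable problem from the earlier slide. The layer sizes, learning rate, and iteration count are illustrative assumptions, not values from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.RandomState(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

W1, b1 = rng.randn(2, 4), np.zeros((1, 4))             # input -> hidden
W2, b2 = rng.randn(4, 1), np.zeros((1, 1))             # hidden -> output
eta = 1.0                                              # learning rate

for _ in range(5000):
    # Forward pass
    h = sigmoid(X.dot(W1) + b1)
    out = sigmoid(h.dot(W2) + b2)

    # Backward pass: propagate the output error back through the weights
    d_out = (out - y) * out * (1 - out)
    d_h = d_out.dot(W2.T) * h * (1 - h)

    # Gradient descent updates
    W2 -= eta * h.T.dot(d_out); b2 -= eta * d_out.sum(axis=0, keepdims=True)
    W1 -= eta * X.T.dot(d_h);   b1 -= eta * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```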

Error Surface https://commons.wikimedia.org/wiki/File:Error_surface_of_a_linear_neuron_with_two_input_weights.png

Gradient Descent
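The core update is w ← w − η·∇L(w). A tiny sketch on an illustrative one-dimensional loss (not an example from the talk):

```python
# Gradient descent on L(w) = (w - 3)^2, whose gradient is 2(w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w, eta = 0.0, 0.1          # initial weight and learning rate
for _ in range(100):
    w -= eta * grad(w)     # step against the gradient

print(round(w, 4))         # approaches the minimum at w = 3
```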

SGD Visualization http://sebastianruder.com/optimizing-gradient-descent/

Learning Parameters
Learning Rate: A real number that decides how far to move in the direction of steepest descent at each update.
Online Learning: Weights are updated after every single example. Often slow to learn.
Batch Learning: Weights are updated after a pass over the whole training set. Often makes optimization hard.
Mini-Batch: A combination of both: the training set is broken into smaller batches and the weights are updated after each mini-batch (see the sketch below).
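A generic sketch of the three regimes (assumed structure, not code from the talk): the same loop gives online learning with batch_size=1, full-batch learning with batch_size=len(X), and mini-batch learning for anything in between.

```python
import numpy as np

def sgd(X, y, grad_fn, w, eta=0.1, batch_size=32, epochs=20):
    """Shuffle the data each epoch, slice it into batches, update after every batch."""
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w -= eta * grad_fn(w, X[batch], y[batch])
    return w

# Illustrative use: linear regression with a mean-squared-error gradient
def mse_grad(w, Xb, yb):
    return 2.0 * Xb.T.dot(Xb.dot(w) - yb) / len(yb)

X = np.random.randn(1000, 3)
y = X.dot(np.array([1.0, -2.0, 0.5]))
print(sgd(X, y, mse_grad, w=np.zeros(3)))  # close to [1.0, -2.0, 0.5]
```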

Overview of Learning Rate

Data Encoding Source: Alec Radford; https://www.tensorflow.org/get_started/mnist/beginners
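The MNIST tutorial linked above encodes each digit label as a one-hot vector. A small sketch of that encoding (the helper name is mine, not from the tutorial):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Each row contains a single 1 at the position of the class label
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(one_hot(np.array([3, 0, 9])))
```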

Why Apache MXNet?
Most Open: Accepted into the Apache Incubator.
Best on AWS: Optimized for deep learning on AWS (integration with AWS).

Building a Fully Connected Network in Apache MXNet
A convolution is a mathematical operation describing a rule for how to merge two sets of information.
Filter Size: Every filter is spatially small relative to the width and height of the input. For example, a first convolutional layer may have a 5x5x3 filter: 5 pixels wide, 5 pixels tall, with 3 channels corresponding to a 3-channel RGB input image.
Output Depth: We can manually pick the depth of the output volume. The depth hyperparameter controls the number of neurons in the convolutional layer that connect to the same region of the input volume.
Edges and Activation: Different neurons along the depth dimension learn to activate when stimulated by different features of the input (e.g., colors or edges). A set of neurons that all look at the same region of the input volume is called a depth column.
Stride: Stride configures how far the sliding filter window moves per application of the filter. Each application of the filter to an input column creates a new depth column in the output volume. Lower stride values (e.g., 1, a single-unit step) allocate more depth columns, yield more heavily overlapping receptive fields, and produce larger output volumes; higher stride values give less overlap and spatially smaller output volumes.
Zero-Padding: Zero-padding controls the spatial size of the output volume; it is used when we want the output volume to maintain the spatial size of the input volume.
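A sketch of the fully connected network using the MXNet symbol API, along the lines of the MNIST notebook used in the demo. The 128-unit first hidden layer matches the figure on this slide; the 64-unit second layer and the 10-class output are assumptions based on the standard MXNet MNIST tutorial.

```python
import mxnet as mx

# Input placeholder; MNIST images are flattened from 28x28 to 784-dimensional vectors
data = mx.sym.Variable('data')
data = mx.sym.Flatten(data=data)

# Two fully connected hidden layers with ReLU activations
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=64, name='fc2')
act2 = mx.sym.Activation(data=fc2, act_type='relu', name='relu2')

# Output layer: 10 classes (digits 0-9) with softmax
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10, name='fc3')
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
```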


Training a Fully Connected Network in Apache MXNet
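A sketch of training with the MXNet Module API, assuming the `mlp` symbol defined in the previous sketch. The stand-in data arrays, batch size, learning rate, and epoch count are illustrative placeholders, not the exact settings used in the demo notebook.

```python
import logging
import numpy as np
import mxnet as mx

logging.getLogger().setLevel(logging.INFO)   # print training progress per epoch

# Stand-in arrays shaped like flattened MNIST; the demo notebook loads the real dataset
train_x = np.random.rand(1000, 784).astype('float32')
train_y = np.random.randint(0, 10, 1000)
val_x   = np.random.rand(200, 784).astype('float32')
val_y   = np.random.randint(0, 10, 200)

train_iter = mx.io.NDArrayIter(train_x, train_y, batch_size=100, shuffle=True)
val_iter   = mx.io.NDArrayIter(val_x, val_y, batch_size=100)

# 'mlp' is the SoftmaxOutput symbol built on the previous slide
mod = mx.mod.Module(symbol=mlp, context=mx.cpu())
mod.fit(train_iter,
        eval_data=val_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1},
        eval_metric='acc',
        num_epoch=10)

print(mod.score(val_iter, 'acc'))  # validation accuracy after training
```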

Demo Time http://localhost:9999/notebooks/mxnet-notebooks/python/tutorials/mnist.ipynb

References
How ANN predicts Stock Market Indices? by Vidya Sagar Reddy Gopala and Deepak Ramu
Deep Learning, O'Reilly Media Inc., ISBN: 9781491914250, by Josh Patterson and Adam Gibson
Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks, CreateSpace Independent Publishing Platform, ISBN: 1505714346, by Jeff Heaton
Coursera, Neural Networks for Machine Learning by Geoffrey Hinton, University of Toronto
https://www.researchgate.net/figure/267214531_fig1_Figure-1-RNN-unfolded-in-time-Adapted-from-Sutskever-2013-with-permission
www.yann-lecun.com
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
https://github.com/cyrusmvahid/mxnet-notebooks/tree/master/python

Cyrus M. Vahid cyrusmv@amazon.com