LeCun, Bengio, and Hinton, doi:10.1038/nature14539

Presentation transcript:

Deep Learning. LeCun, Bengio, and Hinton, doi:10.1038/nature14539

Image Recognition Speech Recognition Complex Interaction Prediction Wat? Deep learning is a family of machine learning techniques that are widely employed in image recognition/categorization, speech recognition, and translation, including semantic processing. It is also used to predict the behavior of complex interactions, such as those between drugs and bio-chemicals, and the effects of mutations in non-coding DNA on gene expression.

Moar Wat? X Y Cat The topic we will concentrate on most is image and object recognition. Most widely covered: cats. And in one case, a dog.

Teaching to Learn Supervised / Unsupervised Representation Learning Kernel Methods Convolutional Neural Networks Recurrent Neural Networks
- Supervised Learning – involves presenting the machine with data that has been previously collected, categorized, and labeled; the machine produces a vector of scores, one per category (see the short sketch after this list).
- Unsupervised Learning – presenting unlabeled data to the machine and allowing it to self-categorize and internally correlate information. This process leads to better generalization and helps prevent overfitting (when a model describes error/noise instead of the underlying relationship, usually because the model has too many parameters relative to the number of observations).
- Representation Learning – also called 'feature learning': a set of techniques that learn a transformation of raw input data into a representation that can be effectively exploited in machine learning tasks. This lets a machine both learn a specific task and learn the features themselves; basically, it teaches machines how to learn.
- Kernel Methods – examples: Support Vector Machines (supervised learning techniques that solve classification and regression problems), polynomial kernels (for learning non-linear models), and the Fisher kernel (classification and information retrieval).
- Convolutional Neural Networks – widely deployed for image recognition: feed-forward neural networks in which neurons are tiled so that they respond to overlapping regions of the visual field (more on this later).
- Recurrent Neural Networks – an architecture for sequence recognition and reproduction / temporal association and prediction (more on this later).
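To make the supervised/unsupervised distinction concrete, here is a minimal sketch using scikit-learn (my own illustration, not code from the paper or the talk); the synthetic data, labels, and parameter choices are all placeholders.

```python
# Illustrative sketch only: supervised classification vs. unsupervised clustering.
import numpy as np
from sklearn.svm import SVC          # a kernel method (Support Vector Machine)
from sklearn.cluster import KMeans   # a simple unsupervised method

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # raw feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # labels (used only in the supervised case)

# Supervised: learn from labelled examples, then produce a vector of scores per input.
clf = SVC(kernel="rbf", probability=True).fit(X, y)
scores = clf.predict_proba(X[:5])             # one score per category, per example

# Unsupervised: no labels; the machine groups the data on its own.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(scores.shape, clusters[:10])
```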

How is this possible? Loads of data Time (and/or) Colossal amounts of computational horsepower Almost every supervised machine learning technique requires a mountain of data to learn to classify with any significant degree of accuracy. There are exceptions and tradeoffs for each technique: some need more or less data, run faster or slower, and are more or less accurate in particular arenas.

More specifically, how? Generally speaking, an input image is ingested, processed, and output as a vector of scores, one per category. Units that are neither input nor output are usually referred to as hidden. Error is then calculated from the difference between the output and the desired pattern of scores, and the machine modifies its weights accordingly. Weights can be thought of as the tunable elements in the machine. In practice, most practitioners train the machine with stochastic gradient descent and then gauge its ability to generalize on images it has not seen before.
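As a rough illustration of the "tune the weights against the error" idea, here is a hypothetical NumPy sketch of stochastic gradient descent on a single linear layer; the shapes, learning rate, and squared-error cost are assumptions for illustration, not the setup used in the paper.

```python
import numpy as np

# Hypothetical sketch: one linear layer, squared-error cost, plain SGD.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 3))     # 10 input features -> 3 category scores

def sgd_step(W, x, target, lr=0.01):
    y = x @ W                               # output: one score per category
    error = y - target                      # difference from the desired pattern
    grad = np.outer(x, error)               # dE/dW for E = 0.5 * ||y - target||^2
    return W - lr * grad                    # nudge the tunable weights downhill

x = rng.normal(size=10)                     # one training example
target = np.array([1.0, 0.0, 0.0])          # desired score vector
for _ in range(100):
    W = sgd_step(W, x, target)
```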

MOAR SPECIFICALLY! Even more specifically – because machine learning isn't difficult enough yet… The math necessary for creating a machine like this relies heavily on partial derivatives and Jacobians. If you're familiar with the chain rule of derivatives, you can see how a change in the value of the variable x affects y, and, subsequently, how the change in y affects z, yielding a cascading effect through the network. If you're not familiar, just notice how all of the functions are related: changing one of them will change them all. Look at the left graph, center screen; you can see how the graph from the input image creates nearly homogeneous curves. When the machine applies a non-linear function of the weights to the data set, it not only affects the curves, it affects the entire graph manifold, warping and distorting the input space – and this process makes the data sets LINEARLY SEPARABLE. Dare I say: almost binary. The bottom two images help to visualize each layer: both their values as well as their abstract network topology.
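For anyone who wants the chain rule spelled out, here it is in standard calculus notation (this is textbook material, not a formula quoted from the paper):

```latex
\frac{\partial z}{\partial x}
    = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x},
\qquad\text{so}\qquad
\Delta z \;\approx\; \frac{\partial z}{\partial y}\,\Delta y
         \;\approx\; \frac{\partial z}{\partial y}\,
                     \frac{\partial y}{\partial x}\,\Delta x ,
```

which is exactly the cascading effect described above: a small change in x perturbs y, and that perturbation is in turn passed on to z.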

Feed-forward network with one input layer of 3 units, 2 hidden layers with 4 and 3 units respectively, and a single output layer with 2 units. (Starting from the bottom right of the screen and moving up.) First, compute the total input to each unit (z), which is the weighted sum of the previous layer's outputs. Next, apply the non-linear function f(z) to calculate the unit's output; in this case f is the Rectified Linear Unit, f(z) = max(0, z). Repeat this process for all subsequent layers until the output layer is reached.
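Here is a minimal NumPy sketch of that forward pass, assuming random placeholder weights for the 3-4-3-2 architecture described above (the numbers are made up; only the structure matters):

```python
import numpy as np

# Forward pass for a 3 -> 4 -> 3 -> 2 feed-forward network with ReLU units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(3, 2)), np.zeros(2)

def relu(z):
    return np.maximum(0.0, z)           # f(z) = max(0, z)

x = np.array([0.5, -1.2, 0.3])          # input layer (3 units)
z1 = x @ W1 + b1; h1 = relu(z1)         # hidden layer 1: weighted sum, then f(z)
z2 = h1 @ W2 + b2; h2 = relu(z2)        # hidden layer 2
y = h2 @ W3 + b3                        # output layer (2 units), kept linear here
print(y)
```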

Backward Propagation of Errors – This is the same topology, just run backwards. The error of each unit is calculated layer by layer and eventually returned to the original input layer so that each layer's weights can be tuned. This is done by computing the partial derivative of the (E)rror with respect to the output of each unit, which is in turn a weighted sum of the partial derivatives of the error with respect to the total inputs of the units in the layer above. Computationally, this is nothing more than the chain rule of derivatives, again. “We then convert the error derivative with respect to the output into the error derivative with respect to the input by multiplying it by the gradient of f(z) [middle left of the screen, bottom line]. At the output layer, the error derivative with respect to the output of a unit is computed by differentiating the cost function. This gives y_l − t_l if the cost function for unit l is 0.5(y_l − t_l)^2, where t_l is the target value. Once ∂E/∂z_k is known, the error derivative for the weight w_jk on the connection from unit j in the layer below is just y_j ∂E/∂z_k.”
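And a matching sketch of the backward pass for the same toy 3-4-3-2 ReLU network, following the quoted recipe step by step; again, the weights and target are placeholders rather than anything from the paper.

```python
import numpy as np

# Backpropagation for the toy 3 -> 4 -> 3 -> 2 ReLU network.
rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 4)), rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
relu = lambda z: np.maximum(0.0, z)

x = np.array([0.5, -1.2, 0.3]); t = np.array([1.0, 0.0])   # input and target
z1 = x @ W1;  h1 = relu(z1)                                 # forward pass
z2 = h1 @ W2; h2 = relu(z2)
y = h2 @ W3                                                 # linear output units

dE_dy  = y - t                      # from the cost E = 0.5 * (y - t)^2
dE_dz3 = dE_dy                      # output units are linear, so gradient of f is 1
dW3    = np.outer(h2, dE_dz3)       # y_j * dE/dz_k for each connection j -> k
dE_dh2 = W3 @ dE_dz3                # weighted sum of errors from the layer above
dE_dz2 = dE_dh2 * (z2 > 0)          # multiply by the gradient of ReLU
dW2    = np.outer(h1, dE_dz2)
dE_dh1 = W2 @ dE_dz2
dE_dz1 = dE_dh1 * (z1 > 0)
dW1    = np.outer(x, dE_dz1)        # these dW matrices are what SGD would use
```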

Any questions?

A brief history Revival surged in 2006 thanks to CIFAR The researchers introduced unsupervised learning methods that did not need labelled data Remarkable performance when detecting pedestrians and recognizing handwritten digits The revival of interest in deep feed-forward networks happened in 2006, when researchers brought together by the Canadian Institute for Advanced Research succeeded in creating unsupervised learning machines that performed extremely well at recognizing handwritten digits and detecting pedestrians while using very limited labelled data.

Cheap GPUs 20-fold increase in speed Pre-training isn’t as necessary as originally thought One type of network stood out as easier to train and as generalizing better.
- Graphics Processing Units became cheaper and easier to program.
- Their power, coupled with more modern deep learning algorithms, led to a 10- to 20-fold increase in speed.
- Shortly after this revival, deep learning advocates found that pre-training was only needed for smaller data sets.

The Convolutional Neural Network Inspired by the human visual cortex High-level features are composed of lower-level ones Backpropagating gradients through a CNN is as simple as through a regular deep neural network. ConvNets process input data as multiple arrays – a good example is an image with three color intensity channels: red, green, and blue. A CNN digests the data in a series of stages, moving from a convolutional layer to a pooling layer and then repeating that cycle (a rough sketch of one such cycle follows below).
- Units in the convolutional layer are organized into feature maps connected through a filter bank (the weights in the previous diagrams); each layer's filter bank differs from the previous one, which helps the network distinguish local motifs and find the same motif at different locations in an image.
- The pooling layer is then used to semantically merge similar features into one. It coarse-grains the position of each feature by computing a local maximum over nearby units.
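Here is a rough, plain-NumPy sketch of one convolution-then-pooling cycle; the filter, image size, and pooling window are illustrative assumptions, not the configuration from the paper.

```python
import numpy as np

# One convolution + max-pooling stage, written out by hand for clarity.
def convolve2d(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # local motif match
    return out

def max_pool(fmap, size=2):
    out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))                  # one channel of an input image
edge_filter = np.array([[1., -1.], [1., -1.]])   # one entry of a tiny "filter bank"
feature_map = np.maximum(0.0, convolve2d(image, edge_filter))  # convolution + ReLU
pooled = max_pool(feature_map)                   # coarse-grain each feature's position
print(feature_map.shape, pooled.shape)           # (7, 7) -> (3, 3)
```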

Convolutional Neural Network Let’s look at a Convolutional Neural Network – starting from the bottom, the network is given an image which it processes upward, layer by layer. The features uncovered at the lower levels act as edge detectors for each subsequent layer, and a score is computed for each image class at the output. Edges are detected and combined into motifs, motifs then form parts, and parts make up objects. ReLU = Rectified Linear Unit.

Into the future… Traffic Sign Recognition Biological Segmentation Facial Recognition So the present state of deep learning is already somewhat sobering. One can imagine how all of these self-learning mechanisms might be employed and what they may evolve into. Between self-driving cars, natural language processing, and general semantic understanding, it is no wonder that convolutional networks have become the de facto standard for recognition and detection tasks.

Now I want to talk to you about a powerful CNN that has been blowing up the internet lately: Google Deep Dream. It is a type of convolutional neural network driven by an algorithm that searches according to a set of input base cases – in this case, an image and a description of what is in it. The algorithm searches the image space for objects matching the description and, in the process, projects images that meet the weighted criteria; it acts as a form of pareidolia, the psychological phenomenon where a pattern is perceived when no such pattern is actually there (e.g. the face on Mars, the man in the moon). Google recently open-sourced the algorithms; since then, a few sites have popped up that allow you to make your own Deep Dream distortions.
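To give a flavor of the mechanism (this is emphatically not Google's code, just a toy with a single made-up linear filter standing in for a whole network): instead of tuning the weights, gradient ascent tunes the pixels so that the filter responds more and more strongly.

```python
import numpy as np

# Toy "dreaming": adjust the image, not the weights, to maximize a filter's response.
rng = np.random.default_rng(0)
image = rng.normal(size=64)        # a flattened toy "image"
w = rng.normal(size=64)            # a fixed filter standing in for the network

print("activation before:", float(w @ image))
for _ in range(50):
    grad = w                       # d(activation)/d(image) for this linear filter
    image = image + 0.1 * grad     # gradient ascent on the pixels
print("activation after: ", float(w @ image))
```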

Any questions?

A brief overview of Recurrent Neural Networks Another architecture worth mentioning, due to its uncanny power, is the recurrent neural network. The neurons act as in a typical deep learning network, but with one tricky difference: one of the output connections returns to the neuron itself. This allows the network to maintain a state vector that contains information about the history of past elements in the sequence. Even this comparatively naïve approach competes with state-of-the-art translation algorithms.
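A minimal sketch of that feedback loop, with made-up sizes and weights, to show how the state vector is reused at every step:

```python
import numpy as np

# A bare-bones recurrent update: the new state depends on the input AND the old state.
rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(5, 8))   # input -> hidden weights
Wh = rng.normal(scale=0.1, size=(8, 8))   # hidden -> hidden weights (the loop back)

h = np.zeros(8)                           # state vector, summarizes the history so far
sequence = rng.normal(size=(10, 5))       # ten time steps of 5-dimensional inputs
for x_t in sequence:
    h = np.tanh(x_t @ Wx + h @ Wh)        # fold each new element into the state
print(h)
```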

They’re hard. Problems Backpropagated gradients can grow until they explode Backpropagated gradients can shrink until they vanish Architecturally difficult to train Recurrent Neural Networks have posed some significant challenges in the past, so much so that the architecture of the network itself and the training regimes have had to adapt to overcome their once limited scope of use. Now, however, thanks to advances in architecture and training, they have been found to be excellent predictors of the next character in text and the next word in a sequence, and they can even handle language translation.
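A tiny numeric illustration (scalar case only) of why those backpropagated gradients explode or vanish over many time steps:

```python
# Backpropagating through T steps multiplies the gradient by the recurrent weight each step.
for w in (0.9, 1.1):
    grad = 1.0
    for _ in range(100):          # 100 time steps
        grad *= w
    print(f"recurrent weight {w}: gradient after 100 steps = {grad:.3e}")
# 0.9 -> roughly 2.7e-05 (vanishes); 1.1 -> roughly 1.4e+04 (explodes)
```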

Now for the future! Who thinks robots will take the place of humans? And why not? If time permits: https://www.youtube.com/watch?v=7Pq-S557XQU&feature=youtu.be&t=7m25s Where else can convolutional networks be applied? All images are public record/open source.