Deep Learning with Symbols

Presentation transcript:

Deep Learning with Symbols Daniel L. Silver Acadia University, Wolfville, NS, Canada HL NeSy Seminar - Dagstuhl, April 2017

Introduction Shameer Iqbal Ahmed Galilia

Motivation Humans are multimodal learners; we are able to associate one modality with another. Conjecture: knowing a concept has a lot to do with the fusion of sensory/motor channels.

Motivation Further conjecture: symbols allow us to share complex concepts quickly and concisely; they are a human communication tool and a coarse approximation of a noisy concept (the digit symbols 0 1 2 3 4 5 6 7 8 9 and their sounds: “one”, “two”, …). Symbols also help us to escape local minima when learning.

Objective 1 To develop a multimodal system: a generative deep learning architecture, trained using unsupervised algorithms, that scales linearly in the number of channels and can reconstruct missing modalities. Train and test it on digits 0-9 with four channels: Image, Audio, Motor, and Symbolic Classification.

Background Deep Belief Networks Stacked auto-encoders develop a rich feature space from unlabelled examples using unsupervised algorithms [Source: Caner Hazibas – slideshare]
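
To make this concrete, here is a minimal sketch, assuming flattened unlabelled inputs (e.g. 784-value digit images), of greedy layer-wise training of a stacked auto-encoder; the layer sizes, activations, and training settings are illustrative, not taken from the talk.

```python
# Minimal sketch (not the talk's code): greedy layer-wise training of a
# stacked auto-encoder on unlabelled examples; all settings are illustrative.
import numpy as np
from tensorflow.keras import layers, Model

def pretrain_stack(X, layer_sizes=(512, 256, 64), epochs=5):
    """Train one auto-encoder per layer on the codes of the previous layer."""
    codes, encoders = X, []
    for size in layer_sizes:
        inp = layers.Input(shape=(codes.shape[1],))
        h = layers.Dense(size, activation="sigmoid")(inp)            # encoder
        out = layers.Dense(codes.shape[1], activation="sigmoid")(h)  # decoder
        ae = Model(inp, out)
        ae.compile(optimizer="adam", loss="mse")
        ae.fit(codes, codes, epochs=epochs, batch_size=128, verbose=0)
        encoders.append(Model(inp, h))
        codes = encoders[-1].predict(codes, verbose=0)  # input to the next layer
    return encoders  # a stack of feature extractors learned without labels

# Example with random stand-in data: pretrain_stack(np.random.rand(1000, 784))
```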

Background Deep Belief Networks RBM = Restricted Boltzmann Machines
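
For reference, a small sketch of an RBM trained with one-step contrastive divergence (CD-1), the usual building block when RBMs are stacked into a DBN; the hyperparameters here are assumptions, not values from the talk.

```python
# Sketch of a Restricted Boltzmann Machine trained with CD-1 (illustrative
# hyperparameters); V is a binary data matrix of shape (n_examples, n_visible).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden=64, lr=0.05, epochs=10):
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data, <v h>
        p_h = sigmoid(V @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one Gibbs step to get a reconstruction, <v' h'>
        p_v = sigmoid(h @ W.T + b_v)
        p_h2 = sigmoid(p_v @ W + b_h)
        # CD-1 update: move weights toward <v h> and away from <v' h'>
        W += lr * (V.T @ p_h - p_v.T @ p_h2) / len(V)
        b_v += lr * (V - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h2).mean(axis=0)
    return W, b_v, b_h
```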

Background Multimodal Learning The MML approach has been adopted by several deep learning researchers: (Srivastava and Salakhutdinov 2012), (Ngiam et al. 2011), (Socher et al. 2014), (Kiros, Salakhutdinov, and Zemel 2014), (Karpathy and Fei-Fei 2015). However, these tend to associate only two modalities, and the association layer is fine-tuned using supervised techniques such as back-prop.

Background Problem Refinement

Background Problem Refinement #1 Supervised fine-tuning does not scale well to three or more modalities: all possible input-output modality combinations must be fine-tuned, and their number grows exponentially as (2^n − 2), where n is the number of channels.

Background Problem Refinement Example: n=3 channels, 6 configurations; (2^3 − 2) = 6, (2^4 − 2) = 14, (2^5 − 2) = 30.
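
A quick check of these counts (the number of non-empty, non-full input/output splits over n channels):

```python
# Confirms the numbers on the slide: 2**n - 2 configurations for n channels.
for n in (3, 4, 5):
    print(n, 2**n - 2)   # prints 3 6, 4 14, 5 30
```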

Background Problem Refinement #2 Standard unsupervised learning using RBM approaches yields poor reconstruction: a channel that provides a simple, noise-free signal will dominate over the other channels at the associative layer, making it difficult for another channel to generate the correct features at the associative layer.

Theory and Approach Network Architecture Propose an MML deep belief network that scales linearly in the number of channels and provides a concise symbolic representation of the associative memory.
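
A rough structural sketch of this idea, not the authors' implementation: one encoder stack per channel feeds a single shared associative layer, so adding a channel adds one stack rather than exponentially many fine-tuned configurations. The channel dimensions and layer sizes below are made up.

```python
# Structural sketch only (assumed dimensions, not the authors' network):
# per-channel stacks meet at one shared associative layer, and every channel
# is reconstructed from that layer so missing modalities can be regenerated.
from tensorflow.keras import layers, Model

def channel_encoder(dim, name):
    inp = layers.Input(shape=(dim,), name=f"{name}_in")
    h = layers.Dense(256, activation="sigmoid")(inp)
    h = layers.Dense(64, activation="sigmoid")(h)
    return inp, h

channels = {"image": 784, "audio": 512, "motor": 20, "symbol": 10}
inputs, codes = zip(*(channel_encoder(d, n) for n, d in channels.items()))
assoc = layers.Dense(128, activation="sigmoid", name="associative")(
    layers.concatenate(list(codes)))
outputs = [layers.Dense(d, activation="sigmoid", name=f"{n}_out")(assoc)
           for n, d in channels.items()]
model = Model(list(inputs), list(outputs))
model.summary()
```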

Theory and Approach RBM training of DBN stack [figure: channel stacks with example inputs, image “8” and audio “eight”]

Theory and Approach Fine-tuning with Iterative Back-fitting Create and save h_i; split the weights into w_r and w_g [figure: example inputs, image “8” and audio “eight”]

Theory and Approach Fine-tuning with Iterative Back-fitting Update weights: Δw_r = ε(⟨v_i h_j⟩ − ⟨v_i′ h_j′⟩), to minimize Σ_k Σ_m (v_i − v_i′)²; create new h_j′ [figure labels: w_r, w_g, v_i, v_i′, split weight, image “8”, audio “eight”]
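
One illustrative reading of the update above, not the authors' code: the recognition weights w_r are nudged toward agreement between the data-driven statistics ⟨v_i h_j⟩ and the reconstruction-driven statistics ⟨v_i′ h_j′⟩, which drives down the squared reconstruction error.

```python
# Illustrative NumPy reading of the back-fitting step (an assumption about the
# details, not the authors' implementation); w_g is held fixed in this sketch.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backfit_step(v, w_r, w_g, b_h, b_v, eps=0.05):
    h = sigmoid(v @ w_r + b_h)               # recognition: h_j from v_i
    v_recon = sigmoid(h @ w_g + b_v)         # generation: v_i' from h_j
    h_recon = sigmoid(v_recon @ w_r + b_h)   # new h_j' from the reconstruction
    w_r += eps * (v.T @ h - v_recon.T @ h_recon) / len(v)   # Δw_r = ε(<v h> − <v' h'>)
    return w_r, np.sum((v - v_recon) ** 2)   # the quantity being minimized
```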

Empirical Studies Data and Method 10 reps x 20 male Canadian students: handwritten digits 0-9, audio recordings, vectors of noisy motor coordinates, and classifications; 2000 examples in total (100 examples per student x 20 students). Conducted 10-fold cross validation: 18 subjects in the training set (1800 examples), 2 subjects in the test set (200 examples). DEMO
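
A small sketch of the subject-wise split described above; the array names are illustrative.

```python
# 20 students x 100 examples; each of the 10 folds holds out 2 students
# (200 test examples) and trains on the other 18 (1800 examples).
import numpy as np

subjects = np.repeat(np.arange(20), 100)      # subject id for each of the 2000 examples
for fold in range(10):
    test_mask = np.isin(subjects, [2 * fold, 2 * fold + 1])   # 2 held-out students
    train_idx, test_idx = np.where(~test_mask)[0], np.where(test_mask)[0]
    assert len(train_idx) == 1800 and len(test_idx) == 200
```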

Deep Learning and LML http://ml3cpu.acadiau.ca [Iqbal and Silver, in press]

Empirical Studies Data and Method Evaluation: examine the reconstruction error of each channel given input on another channel. The error measure differs for each channel: Class: misclassification error; Image: agreement with an ANN classifier (99% acc); Audio: the STFT signal is not invertible back to sound, so agreement with an RF classifier (93% acc) is used; Motor: error = distance to the target vector template (anything < 2.2 is human readable).
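
For the motor channel, the measure above amounts to a distance between the reconstructed coordinate vector and its target template; a minimal sketch with made-up vectors (only the 2.2 threshold is from the slide):

```python
# Euclidean distance between a reconstructed motor vector and its template;
# the vectors below are placeholders, not data from the study.
import numpy as np

def motor_error(reconstructed, template):
    return np.linalg.norm(np.asarray(reconstructed) - np.asarray(template))

print(motor_error([0.1, 0.9, 0.4], [0.0, 1.0, 0.5]) < 2.2)   # True → "human readable"
```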

Empirical Studies Results Reconstruction of Classification Channel

Empirical Studies Results Reconstruction of Image Channel

Empirical Studies Results Reconstruction of Motor Channel

Empirical Studies Discussion Elimination of channel dominance is not perfect, but significantly better. The reconstruction error of a missing channel decreases as the number of available channels increases. NOTE: the classification channel is not needed; it was introduced to clarify the concept in the associative memory, a symbol for a noisy concept.

Objective 2 To show that learning with symbols is easier and more accurate than without: a deep supervised learning architecture; develop a model to add two MNIST digits, with and without symbolic inputs; test on previously unseen examples; and examine what is happening in the network. Is the network learning addition or just a mapping function?

Challenge in Training Deep Architectures Many tricks are used to overcome local minima; most are a form of inductive bias that favours portions of weight space where good solutions tend to be found.

Two learners are better than one! Consider you’re in the jungle, learning concepts… Then you meet another person and you share symbols: accuracy improves and the learning rate increases.

Challenge in Training Deep Architectures A single learner is hampered by the presence of local minima within its representation space, and overcoming this difficulty requires a lot of training examples. Instead, an agent’s learning effectiveness can be significantly improved with symbols, which inspires social interaction and the development of culture. Bengio, Y.: Evolving culture vs local minima. ArXiv 1203.2990v1, Springer (2013), http://arxiv.org/abs/1203.2990

Empirical Studies: Learning to Add MNIST Digits Google TensorFlow. Noisy (images only): input: 2 MNIST digit images (784 x 2 values); output: 2 MNIST digit images (784 x 2 values). With binary symbolic values for each digit: input: 1568 (images) + 10 + 10 (symbols) values; output: 1568 + 10 + 10 values. 1-3 hidden layers of ReLU units.

DL Model – Without Symbols

DL Model – With Symbols
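
A minimal tf.keras sketch consistent with the “with symbols” model pictured above; the layer width, depth, and MSE loss are assumptions within the “1-3 hidden layers of ReLU units” description on the earlier slide.

```python
# Sketch only (assumed sizes and loss): input and output are two flattened
# MNIST digits (784 x 2) plus a 10-value one-hot symbol per digit.
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(2 * 784 + 10 + 10,))   # images + symbolic one-hots
h = inp
for _ in range(3):                               # 1-3 ReLU hidden layers
    h = layers.Dense(1024, activation="relu")(h)
out = layers.Dense(2 * 784 + 10 + 10, activation="sigmoid")(h)
model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")
# The "without symbols" variant simply drops the 10 + 10 symbolic values from
# both the input and output vectors.
```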

Most recent results: Without symbols With Symbols

Discussion: Results improved by about 10% with symbolic outputs (based on classification of the output digits by a highly accurate convolutional network). We believe we can do much better. The lab is working on: different architectures, varying the number of training examples with symbols, and interpreting hidden node features.

Thank You! QUESTIONS? https://ml3cpu.acadiau.ca/ danny.silver@acadiau.ca http://tinyurl/dsilver

References
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, Now Publishers.
Bengio, Y. and LeCun, Y. (2007). Scaling learning algorithms towards AI. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors, Large Scale Kernel Machines. MIT Press.
Bengio, Y. (2013). Evolving culture vs local minima. ArXiv 1203.2990v1, Springer. http://arxiv.org/abs/1203.2990
Goodfellow, I., Bengio, Y., and Courville, A. (2017). Deep Learning. MIT Press, Cambridge, MA.