Structure learning with deep autoencoders


Structure learning with deep autoencoders Network Modeling Seminar, 30/4/2013 Patrick Michl

Agenda: Autoencoders, Biological Model, Validation & Implementation

Real-world data is usually high dimensional …

… which makes structural analysis and modeling complicated!

Dimensionality reduction techniques like PCA …

… cannot preserve complex structures: PCA only captures linear relations of the form x2 = α·x1 + β!

Therefore the analysis of unknown structures …

… needs more sophisticated, nonlinear techniques that can capture relations of the form x2 = f(x1)!

Autoencoders are artificial neural networks that map input data X to output data X', built from perceptrons (sigmoid units) and Gaussian units …

… with multiple hidden layers between the visible layers (input data X and output data X').

Such networks are called deep networks. Definition (deep network): deep networks are artificial neural networks with multiple hidden layers.

Autoencoders have a symmetric topology …

… with an odd number of hidden layers.

The small layer in the center works like an information bottleneck …

… that creates a low-dimensional code for each sample in the input data.

The upper stack does the encoding …

… and the lower stack does the decoding.

Definition (autoencoder): autoencoders are deep networks with a symmetric topology and an odd number of hidden layers, consisting of an encoder, a low-dimensional representation, and a decoder.
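To make the definition concrete, here is a minimal sketch of such a symmetric topology in Python; the layer sizes are purely illustrative assumptions, not values from the talk.

```python
# Minimal sketch of a symmetric autoencoder topology (layer sizes are assumptions).
visible = 8                              # width of input X and output X'
hidden = [32, 16, 5, 16, 32]             # odd number of hidden layers, bottleneck of 5 in the center
layers = [visible] + hidden + [visible]

mid = len(layers) // 2
encoder_layers = layers[:mid + 1]        # [8, 32, 16, 5]  input ... bottleneck (encoder)
decoder_layers = layers[mid:]            # [5, 16, 32, 8]  bottleneck ... output (decoder)
print(encoder_layers, decoder_layers)
```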

Autoencoders can be used to reduce the dimension of data … (Problem: the dimensionality of the data. Idea: train the autoencoder to minimize the distance between input X and output X'; encode X to a low-dimensional code Y; decode Y to the output X'. The output X' is then effectively low dimensional, since it is reconstructed from the low-dimensional code Y.)

… if we can train them!
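As a hedged illustration of this encode-decode-reconstruct idea (the talk names no library; the library choice, layer sizes, and hyperparameters below are assumptions), a small autoencoder trained to reproduce its input could be set up roughly like this:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: input X -> low-dimensional code Y; decoder: code Y -> reconstruction X'.
inputs = keras.Input(shape=(8,))
h = layers.Dense(32, activation="sigmoid")(inputs)
code = layers.Dense(5, activation="sigmoid", name="bottleneck")(h)   # code Y
h = layers.Dense(32, activation="sigmoid")(code)
outputs = layers.Dense(8, activation="linear")(h)                    # output X'

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")    # minimize the distance between X and X'

X = np.random.normal(size=(100, 8)).astype("float32")   # placeholder data
autoencoder.fit(X, X, epochs=10, batch_size=16, verbose=0)
Y = keras.Model(inputs, code).predict(X)                 # low-dimensional code for each sample
```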

In feedforward ANNs, backpropagation is a good approach.

Backpropagation: the distance (error) between the current output X' and the desired output Y is computed; this gives an error function, e.g. X' = F(X), error = (X' − Y)². (Example: a linear neural unit with two inputs.) By calculating the negative gradient −∇error we get a vector that points in a direction which decreases the error; we update the parameters along that direction, and we repeat.
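A minimal sketch of this gradient step for the slide's example of a linear unit with two inputs; the data, the target relation, and the learning rate below are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # two inputs per sample
y = 0.7 * X[:, 0] - 0.3 * X[:, 1]        # desired output Y (assumed target relation)
w = np.zeros(2)                          # weights of the linear unit
lr = 0.1                                 # learning rate

for _ in range(200):
    y_hat = X @ w                        # current output X' = F(X)
    grad = X.T @ (y_hat - y) / len(y)    # gradient of the mean squared error
    w -= lr * grad                       # step along -grad(error), which decreases the error

print(w)                                 # converges towards [0.7, -0.3]
```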

The problem, however, is the multiple hidden layers!

Backpropagation is known to be slow far away from the output layer …

… and can converge to poor local minima.

The task is to initialize the parameters close to a good solution!

Therefore the training of autoencoders has a pretraining phase …

… which uses Restricted Boltzmann Machines (RBMs).

RBMs are Markov random fields: every unit influences its neighbors, and the coupling is undirected. (Motivation: the Ising model, where a set of magnetic dipoles (spins) is arranged on a lattice and neighbors are coupled with a given strength.) An RBM has a bipartite topology of visible units (v) and hidden units (h); the local energy is used to calculate the probabilities of the unit values, and training is done by contrastive divergence (Gibbs sampling).
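A minimal sketch of one contrastive divergence (CD-1) update for a binary-binary RBM; the sizes, learning rate, and training sample are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 4, 3, 0.05
W = 0.01 * rng.normal(size=(n_visible, n_hidden))    # weights w_vh
b_v = np.zeros(n_visible)                            # visible biases
b_h = np.zeros(n_hidden)                             # hidden biases

v0 = np.array([1.0, 0.0, 1.0, 1.0])                  # one binary training sample

# Positive phase: sample hidden units given the data.
p_h0 = sigmoid(b_h + v0 @ W)
h0 = (rng.random(n_hidden) < p_h0).astype(float)

# One Gibbs step: reconstruct the visibles, then resample the hiddens.
p_v1 = sigmoid(b_v + h0 @ W.T)
v1 = (rng.random(n_visible) < p_v1).astype(float)
p_h1 = sigmoid(b_h + v1 @ W)

# Update: difference between data statistics and reconstruction statistics.
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
b_v += lr * (v0 - v1)
b_h += lr * (p_h0 - p_h1)
```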

The top-layer RBM transforms real-valued data into binary codes: the visible units v ∈ V take real values x_v ∈ ℝ, while the hidden units h ∈ H take binary values x_h ∈ {0, 1}.

Therefore the visible units are modeled with Gaussians to encode the data … (x_v ~ N(b_v + Σ_h w_vh·x_h, σ_v), where σ_v is the standard deviation of unit v, b_v the bias of unit v, and w_vh the weight of edge (v, h))

… and the many hidden units with sigmoids to encode the dependencies (x_h ~ sigm(b_h + Σ_v w_vh·x_v/σ_v), where b_h is the bias of unit h).

The objective function is the sum of the local energies: E_v := −Σ_h w_vh·(x_v/σ_v)·x_h + (x_v − b_v)²/(2σ_v²) and E_h := −Σ_v w_vh·(x_v/σ_v)·x_h + x_h·b_h.
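A small sketch of one sampling pass with these Gaussian visible and sigmoid hidden units, following the formulas above; all sizes and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 4, 5
W = 0.01 * rng.normal(size=(n_v, n_h))    # weights w_vh
b_v = np.zeros(n_v)                       # visible biases b_v
b_h = np.zeros(n_h)                       # hidden biases b_h
sigma = np.ones(n_v)                      # standard deviations sigma_v

x_v = rng.normal(size=n_v)                                 # real-valued visible data
p_h = 1.0 / (1.0 + np.exp(-(b_h + (x_v / sigma) @ W)))     # sigm(b_h + sum_v w_vh x_v / sigma_v)
x_h = (rng.random(n_h) < p_h).astype(float)                # binary hidden code
x_v_recon = rng.normal(b_v + W @ x_h, sigma)               # N(b_v + sum_h w_vh x_h, sigma_v)
```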

The next RBM layer (the reduction RBM) maps the dependency encoding … (Here both the visible units v ∈ V and the hidden units h ∈ H take binary values: x_v, x_h ∈ {0, 1}.)

… from the upper layer … (x_v ~ sigm(b_v + Σ_h w_vh·x_h), with b_v the bias of unit v and w_vh the weight of edge (v, h))

… to a smaller number of sigmoids … (x_h ~ sigm(b_h + Σ_v w_vh·x_v), with b_h the bias of unit h)

… which can be trained faster than the top layer. Local energies: E_v := −Σ_h w_vh·x_v·x_h + x_v·b_v and E_h := −Σ_v w_vh·x_v·x_h + x_h·b_h.

Unrolling: the symmetric topology allows us to skip further training, since the decoder stack can simply mirror the weights learned for the encoder stack.
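As a hedged sketch of the unrolling step (the layer sizes are illustrative assumptions): the weights learned by the stacked RBMs initialize the encoder, and their transposes initialize the mirrored decoder, which backpropagation then fine-tunes jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 32, 16, 5]                             # visible -> ... -> bottleneck (assumed sizes)
rbm_stack = [0.01 * rng.normal(size=(m, n)) for m, n in zip(sizes, sizes[1:])]

encoder_init = list(rbm_stack)                     # maps input X to the code Y
decoder_init = [W.T for W in reversed(rbm_stack)]  # mirrored weights map Y back to X'
```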

After pretraining, backpropagation usually finds good solutions. (Full pipeline: pretraining of the top RBM (GRBM) and the reduction RBMs, unrolling, then finetuning with backpropagation.)

The algorithmic complexity of RBM training depends on the network size: time complexity O(i·n·w), where i is the number of iterations, n the number of nodes, and w the number of weights; memory complexity O(w).

Agenda: Autoencoders, Biological Model, Validation & Implementation

Network modeling with Restricted Boltzmann Machines (RBMs): how do we model the topological structure?

We define S and E as the visible data and identify them with the visible layer of an RBM …

… and the TFs with the hidden layer of the RBM.

The training of the RBM gives us a model.
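A small sketch of this modeling idea (the unit names and sizes are illustrative assumptions): the measured genes S and E become the visible units and the transcription factors the hidden units of an RBM.

```python
import numpy as np

visible_units = ["s1", "s2", "s3", "s4", "e1", "e2", "e3", "e4"]   # genes S and E (visible layer)
hidden_units = ["tf1", "tf2", "tf3"]                               # transcription factors (hidden layer)
W = np.zeros((len(visible_units), len(hidden_units)))              # couplings, learned e.g. by CD training

# After training, a large |W[v, h]| suggests a coupling between gene v and TF h.
```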

Agenda: Autoencoders, Biological Model, Implementation & Results

Validation of the results needs information about the true regulation and about the descriptive power of the data. Without this information, validation can only be done using artificial datasets!

Artificial datasets: we simulate data in three steps. Step 1: choose the number of genes (E + S) and create random bimodally distributed data. Step 2: manipulate the data in a fixed order. Step 3: add noise to the manipulated data and normalize it.

Simulation, step 1: number of visible nodes 8 (4 E, 4 S); create random data: random {−1, +1} + N(0, σ = 0.5).

Simulation, step 2: manipulate the data: e1 = 0.25·s1 + 0.25·s2 + 0.25·s3 + 0.25·s4, e2 = 0.5·s1 + 0.5·Noise, e3 = 0.5·s1 + 0.5·Noise, e4 = 0.5·s1 + 0.5·Noise.

Simulation, step 3: add noise N(0, σ = 0.5) and normalize.
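A sketch of the three simulation steps in numpy, based on my reading of the slides; the reconstruction of the e2–e4 formulas and the distribution of their noise terms are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Step 1: 4 S nodes with random {-1, +1} values plus N(0, 0.5).
s = rng.choice([-1.0, 1.0], size=(n_samples, 4)) + rng.normal(0, 0.5, size=(n_samples, 4))

# Step 2: manipulate the data in a fixed order to create the 4 E nodes.
noise = lambda: rng.normal(0, 1.0, size=n_samples)   # assumed noise distribution
e1 = 0.25 * s[:, 0] + 0.25 * s[:, 1] + 0.25 * s[:, 2] + 0.25 * s[:, 3]
e2 = 0.5 * s[:, 0] + 0.5 * noise()
e3 = 0.5 * s[:, 0] + 0.5 * noise()
e4 = 0.5 * s[:, 0] + 0.5 * noise()
X = np.column_stack([s, e1, e2, e3, e4])              # 8 visible nodes (4 S, 4 E)

# Step 3: add noise N(0, 0.5) and normalize.
X = X + rng.normal(0, 0.5, size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)
```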

We analyse the data X with an RBM.

We train an autoencoder with 9 hidden layers and 165 hidden nodes: layers 1 & 9: 32 hidden units, layers 2 & 8: 24 hidden units, layers 3 & 7: 16 hidden units, layers 4 & 6: 8 hidden units, layer 5: 5 hidden units.
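For orientation, a quick computation of the node and weight counts for this architecture; that the visible input and output layers have width 8 (the simulated dataset above) is an assumption:

```python
hidden = [32, 24, 16, 8, 5, 8, 16, 24, 32]                   # the 9 hidden layers from the slide
layers = [8] + hidden + [8]                                  # assumed 8-dimensional visible layers

n_hidden_nodes = sum(hidden)                                 # 165, matching the slide
n_weights = sum(a * b for a, b in zip(layers, layers[1:]))   # 3152 connections to train
print(n_hidden_nodes, n_weights)
```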

We transform the data from X to X' and reduce the dimensionality.

We analyse the transformed data X' with an RBM.

Let's compare the models.

Another example with more nodes and a larger autoencoder.

Conclusion: Autoencoders can improve modeling significantly by reducing the dimensionality of the data. Autoencoders preserve complex structures in their multilayer perceptron network; analysing those networks (for example with knockout tests) could give more structural information. The drawback is the high computational cost. However, since the field of deep learning is becoming more popular (face recognition, voice recognition, image transformation), many improvements that address these computational costs have been made.

Acknowledgement: eilsLABS, PD Dr. Rainer König, Prof. Dr. Roland Eils, Network Modeling Group