Deep Learning(II) Dong Wang CSLT ML Summer Seminar (5)


Deep Learning (II) Dong Wang CSLT ML Summer Seminar (5) Many slides are from Bengio and LeCun's NIPS'15 tutorial; a few slides are from Deng's ICASSP'16 keynote speech.

Content Special nets Regularization in deep models Generative model Achievements

CNN Convolution: an adaptive filtering/feature-learning operation Pooling: a way of imposing an invariance constraint A form of parameter sharing A form of structural knowledge A way of implementing human perception
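A minimal sketch of the conv + pooling pattern described above (PyTorch assumed, sizes and layer names are illustrative, not from the slides): convolution applies the same shared filters at every position, and pooling adds a small amount of translation invariance.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # shared filters over all positions
            nn.ReLU(),
            nn.MaxPool2d(2),                              # invariance to small shifts
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

x = torch.randn(8, 1, 28, 28)    # e.g. a batch of MNIST-sized images
logits = TinyCNN()(x)            # -> shape (8, 10)
```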

http://cs231n.github.io/convolutional-networks/

Recurrent net A way of parameter sharing A way of learning temporal principles A way of learning logical programming (inference) A way of implementing human memory and inference (but mixed)
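A minimal sketch of the recurrent-net idea above (PyTorch assumed, sizes are illustrative): the same recurrent weights are reused at every time step, which is how the network shares parameters and accumulates a memory of the sequence.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
readout = nn.Linear(64, 10)

x = torch.randn(4, 20, 32)        # batch of 4 sequences, 20 steps, 32-dim inputs
outputs, h_last = rnn(x)          # same recurrent weights applied at every step
prediction = readout(h_last[-1])  # predict from the final hidden state
```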

Content Special nets Regularization in deep models Generative model Achievements

Regularization A DNN is a very flexible structure Prone to overfitting: the bias-variance trade-off Prone to underfitting: complex Hessian (local minima, saddle points, slow training...) Introduce regularization to balance description and generalization, and to introduce human knowledge (a Bayesian prior?)

Various regularizations Parameter norm penalty (L1, L2) Data augmentation Noise injection Hidden-unit noise injection (dropout) Target noise injection
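A minimal sketch of three of the regularizers listed above (PyTorch assumed, hyperparameters are illustrative): an L2 parameter norm penalty via weight decay, input noise injection, and dropout on hidden units.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # hidden-unit noise injection (dropout)
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            weight_decay=1e-4)   # L2 parameter norm penalty

x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))
x_noisy = x + 0.1 * torch.randn_like(x)          # input noise injection

loss = nn.functional.cross_entropy(model(x_noisy), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```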

Parameter sharing CNN RNN NADE

Associated training Semi-supervised learning Multitask learning Collaborative learning Deeply-supervised nets: Chen-Yu Lee et al., Deeply-Supervised Nets, Deep Learning Workshop 2014.
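A minimal sketch of the multitask flavour of associated training (PyTorch assumed, task heads and sizes are illustrative): two task heads share one representation, and their losses are summed so each task regularizes the other.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

shared = nn.Sequential(nn.Linear(100, 64), nn.ReLU())
head_a = nn.Linear(64, 10)    # e.g. main classification task
head_b = nn.Linear(64, 1)     # e.g. auxiliary regression task

x = torch.randn(32, 100)
y_a = torch.randint(0, 10, (32,))
y_b = torch.randn(32, 1)

h = shared(x)                 # shared representation
loss = F.cross_entropy(head_a(h), y_a) + F.mse_loss(head_b(h), y_b)
loss.backward()
```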

Sparse representation L0 and L1 norm penalties on hidden units Pretraining leads to sparsity CAE, sparse RBM, sparse DBN, denoising auto-encoder Winner-take-all convolution J. Li et al., Sparseness Analysis in the Pretraining of Deep Neural Networks, 2016.
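A minimal sketch of an L1 sparsity penalty on hidden units, one of the devices listed above (PyTorch assumed; `sparsity_weight` and the layer sizes are illustrative hyperparameters).

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Linear(128, 784)

x = torch.rand(16, 784)
h = encoder(x)                                   # hidden representation
recon = decoder(h)

sparsity_weight = 1e-3
loss = nn.functional.mse_loss(recon, x) + sparsity_weight * h.abs().mean()
loss.backward()
```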

Adversarial training Find misclassified (adversarial) examples Train with these data to avoid the border effect
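A minimal sketch of one common way to construct such examples, the fast gradient sign method (PyTorch assumed; the slide does not name a specific method, so this is only an illustration of training on perturbed inputs the model would otherwise misclassify).

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))

x.requires_grad_(True)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

epsilon = 0.05
x_adv = (x + epsilon * x.grad.sign()).detach()   # perturb toward higher loss

# also train on the adversarial examples
adv_loss = nn.functional.cross_entropy(model(x_adv), y)
```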

Manifold regularization Keep the tangent direction of the manifold orthogonal to the gradient of the cost The tangent can be obtained from unsupervised learning
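A minimal sketch in the spirit of the slide, a tangent-propagation-style penalty (PyTorch assumed): penalize the directional derivative of the network output along a given tangent direction, so the output gradient stays orthogonal to the tangent. Here `tangent` is a random placeholder; in practice it would come from unsupervised manifold learning.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(16, 50, requires_grad=True)
tangent = torch.randn(16, 50)                 # placeholder tangent directions

out = net(x).sum()
grads = torch.autograd.grad(out, x, create_graph=True)[0]
penalty = ((grads * tangent).sum(dim=1) ** 2).mean()   # want grad orthogonal to tangent
penalty.backward()
```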

Dark knowledge transfer Train a teacher, then let the teacher teach the student Can be used to train a simple model from a complex model (e.g., an ensemble) Can be used to train a complex model from a simple model Lili Mou, Ge Li, Yan Xu, Lu Zhang, Zhi Jin, Distilling Word Embeddings: An Encoding Approach, 2015. Mingsheng Long and Jianmin Wang, Learning Transferable Features with Deep Adaptation Networks, ICML 2015.
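A minimal sketch of a knowledge-distillation loss (PyTorch assumed): the student is trained to match the teacher's softened output distribution as well as the hard labels. The temperature and loss weighting are illustrative choices, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets from the teacher, compared with KL divergence at temperature T
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # hard targets from the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```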

Content Special nets Regularization in deep models Generative model Achievements

A DNN is basically not very generative G=(V,E) represents deterministic inference Probabilistic interpretation of the output: Gaussian, binomial, or MDN Some 'randomness' on inputs, labels, hidden units Basically not a generative model

A graphical model is generative G=(V,E) represents the joint probability over V, with conditionals or potentials represented by E Probabilistic variables Probabilistic inference Probabilistic generation Graphical models and variational methods: Message-passing and relaxations, ICML 2008 tutorial, http://www.eecs.berkeley.edu/~wainwrig/icml08/tutorial_icml08.html

Introducing a latent variable for the NN Treat the NN as a feature transform In the transformed space, the variable is simple, e.g., Gaussian Generation becomes simple: sample the latent variable and run it through the network

Variational Bayes with an auto-encoder Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. In ICLR. Danilo J. Rezende, Shakir Mohamed, Daan Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.

Variational Auto-encoder

What has changed? Bayesian perspective: an encoder (a parametric function) maps the input x to a code z, where the prior p(z) is simpler than p(x); given x, p(z|x) stays simple; given z, the conditional p(x|z) is simpler than p(x). Everything looks simpler, and model training becomes parameter adjustment via BP. Neural model perspective: randomness is introduced, so can we still BP? Use sampling on the simple p(x|z). It looks like variational inference plus MCMC-style sampling.
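A minimal sketch of this setup (PyTorch assumed, layer sizes are illustrative): the encoder outputs the mean and log-variance of a Gaussian q(z|x), the decoder defines p(x|z), and the reparameterization trick makes the sampling step differentiable so the whole model trains by backprop on the ELBO.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, 400)
        self.mu, self.logvar = nn.Linear(400, z_dim), nn.Linear(400, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 400), nn.ReLU(),
                                 nn.Linear(400, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(recon_logits, x, mu, logvar):
    # reconstruction term (x assumed in [0,1]) plus KL(q(z|x) || N(0, I))
    recon_term = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl
```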

Extend to other encoder-decoder models

Stochastic Recurrent Networks (STORNs) Bayer, J. and Osendorfer, C. (2014). Learning stochastic recurrent networks. In NIPS Workshop on Advances in Variational Inference

Variational Recurrent AE (VRAE) Fabius, O. and van Amersfoort, J. R. (2014). Variational recurrent auto-encoders. arXiv:1412.6581. Music generation

Variational Encoder-Decoder RNN Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio, A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016.

Variational RNN LM Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. (2015). Generating sentences from a continuous space. arXiv:1511.06349.

Variational image generation DRAW: A Recurrent Neural Network For Image Generation

Generation using noise corruption Inject noise and let the NN recover the input This leads to learning the manifold on which the data lie The manifold can then be used to generate x
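A minimal sketch of this denoising idea (PyTorch assumed, sizes and noise level are illustrative): corrupt the input with noise and train the network to reconstruct the clean data.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(16, 784)                       # clean data
x_corrupted = x + 0.3 * torch.randn_like(x)   # noise injection
loss = nn.functional.mse_loss(ae(x_corrupted), x)
loss.backward()
```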

Denoising AE

DAE learns manifold and scores (gradients) Guillaume Alain and Yoshua Bengio, What Regularized Auto-Encoders Learn from the Data Generating Distribution

A DAE can be used to sample x

Any corruption + any cost Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent, Generalized Denoising Auto-Encoders as Generative Models.

Introducing latent variables Yoshua Bengio et al., Deep Generative Stochastic Networks Trainable by Backprop.

Multi-step generation Train a DAE with random corruption Reconstruct iteratively until convergence This amounts to settling at a minimum-energy state, i.e., a maximum of p(x) It can be proved that with symmetric corruption, the iteration converges to a stationary point.
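A minimal sketch of this corrupt-and-reconstruct loop (PyTorch assumed): starting from an arbitrary point, repeatedly corrupt and reconstruct with a trained denoising auto-encoder so the iterates drift toward high-probability regions. Here `dae` is assumed to be a trained denoising auto-encoder; the step count and noise level are illustrative.

```python
import torch

def generate(dae, steps=50, dim=784, noise_std=0.1):
    x = torch.rand(1, dim)                    # arbitrary starting point
    for _ in range(steps):
        x_corrupted = x + noise_std * torch.randn_like(x)
        x = dae(x_corrupted).detach()         # reconstruct; iterate toward a stable point
    return x
```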

Content Special nets Regularization in deep models Generative model Achievements

Speech recognition

Facebook structure

Natural language processing

Joint semantic learning Combine text knowledge and graph knowledge Embed words in the common (co-)space Dongxu Zhang, Dong Wang, Rong Liu, "Joint Semantic Relevance Learning with Text Data and Graph Knowledge", ACL 2015 Workshop on CVSC.

Relation classification Using an RNN to classify relations

Poem generation

Another generation example: texture synthesis, https://papers.nips.cc/paper/5633-texture-synthesis-using-convolutional-neural-networks.pdf

Wrap up Deep learning research mostly focuses on imposing structure. Deep learning can be used as both a predictive and a descriptive tool: the former is more related to function approximation, the latter to abstraction and memorization. Deep learning has achieved brilliant improvements so far and opens the door to true AI. Certainly, true AI is not only deep learning, as we will see.