Learning Deep L0 Encoders

Learning Deep L0 Encoders
Qing Ling, Department of Automation, University of Science and Technology of China (USTC)
Joint work with Zhangyang Wang and Thomas Huang (UIUC)
The 2016 AAAI Conference on Artificial Intelligence (AAAI 2016)
The 2015 Youth Symposium of Scientific and Engineering Computation (YSSEC 2015)
2015/12/11

Starter: A Joke about Deep Learning
The way to do machine learning research 5 years ago: Collect Data -> Analyze Data -> Design Features -> Build Model -> Verify Model -> Optimize Model -> Evaluate Model
The way to do machine learning research now: Collect Data -> Tune Network -> Collect Data -> Tune Network -> Collect Data -> Tune Network -> ...

Theme of This Talk
Behind the success of deep learning lie several difficulties:
Structure design, network initialization & parameter tuning
Incorporation of problem-level priors & their interpretation
From engineering (or art) to science:
Statistical bounds
Convergence analysis
Bridging deep (big data) & shallow (small data) models
Our goal: a connection between deep learning & sparse coding

Outline
A brief introduction to deep learning
Connection between deep learning & sparse coding
Deep L0-regularized encoder
Deep M-sparse L0 encoder
Numerical experiments
Conclusions

Learning Deep Representations/Features
Example: a feed-forward network
Train with big input & output data; infer from input to output
In the train stage: learn nonlinear features Fi, each a linear weight followed by a nonlinear neuron, via stochastic (sub)gradient descent
In the inference stage: transform the input with the learned features; fast end-to-end inference
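To make the train/inference split concrete, here is a minimal NumPy sketch of a two-layer feed-forward network: each layer is a linear weight followed by a nonlinear neuron, training takes stochastic gradient steps, and inference is a single forward pass. The layer sizes, the ReLU neuron and the squared loss are illustrative assumptions, not the networks discussed in this talk.

import numpy as np

# Minimal 2-layer feed-forward network: linear weight + nonlinear neuron per layer.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (64, 784))
W2 = rng.normal(0, 0.1, (10, 64))

def forward(x):
    h = np.maximum(W1 @ x, 0.0)        # nonlinear feature F1 (ReLU neuron)
    return W2 @ h, h                   # linear output layer

def sgd_step(x, y, lr=0.01):
    """One stochastic gradient step on the squared loss 0.5*||out - y||^2."""
    global W1, W2
    out, h = forward(x)
    g_out = out - y                    # gradient w.r.t. the output
    g_h = (W2.T @ g_out) * (h > 0)     # backprop through the ReLU neuron
    W2 -= lr * np.outer(g_out, h)
    W1 -= lr * np.outer(g_h, x)

# Inference is a fast end-to-end forward pass:
x = rng.normal(size=784)
scores, _ = forward(x)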

Power of Being Deep: Example of ILSVRC
[Figure: ILSVRC image-classification error falling year by year, past the human level of 5.1%, driven by data, algorithms & systems]

Sparse Coding Revisited
Train in sparse coding: given (X, Y), where Y is a sparse representation of X, learn a dictionary D such that X = DY by some approach
Inference in sparse coding: Y = argmin_Y ||X - D Y||^2 + r(Y), where the regularizer r(Y) enforces the sparsity of Y
Viewing sparse coding from the perspective of deep learning:
Train & inference are done over different architectures
Inference uses an iterative algorithm that is often slow
Not end-to-end (classification, etc.)
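To illustrate why inference in sparse coding is iterative and slow, here is a proximal-gradient (ISTA-style) sketch, using the L1 norm as one common choice of the regularizer r(Y); the step size and iteration count below are assumptions.

import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, D, lam=0.1, n_iter=200):
    """Iterative inference for min_Y ||X - D Y||^2 + lam*||Y||_1.
    Every new sample needs many such iterations, which is the slow part."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2    # Lipschitz constant of the gradient
    Y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ Y - X)
        Y = soft_threshold(Y - grad / L, lam / L)
    return Y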

Connect Sparse Coding & Deep Learning
Idea: truncate the iterative algorithm for both train & inference
Train & inference are then done in the same architecture
Fast & end-to-end inference (at the cost of adding a new operator/neuron)
[Diagram: a single iteration drawn as a block with operators O1, O2, O3 and an adder, mapping X to Y]
Example: the algorithm unfolded & truncated up to the second iteration
[Diagram: the same operators unfolded and repeated over two iterations, still mapping X to Y]
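A sketch of the unfolding idea, assuming a generic iteration of the form Y_{k+1} = neuron(D^T X + W Y_k): after truncation, the matrices playing D^T and W become free weights of a fixed-depth network and are trained end-to-end. The function and variable names below are mine.

import numpy as np

def unrolled_encoder(X, Dt, W, neuron, K=2):
    """Truncated (K-iteration) unfolding of Y_{k+1} = neuron(D^T X + W Y_k).
    Dt (playing D^T) and W are treated as trainable weights; the slide's
    example truncates at the second iteration."""
    Y = neuron(Dt @ X)                 # first layer, starting from Y = 0
    for _ in range(K):
        Y = neuron(Dt @ X + W @ Y)     # one unfolded "layer" per iteration
    return Y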

Case Study: Deep L0-Regularized Encoder
L0-regularized least squares: Y = argmin_Y ||X - D Y||^2 + c^2 ||Y||_0
IHT (iterative hard thresholding): Y^{k+1} = h_c(D^T X + (I - D^T D) Y^k) = h_c(D^T X + W Y^k), with W = I - D^T D
[Diagram: one IHT iteration drawn as a layer with linear weights D^T and W, an adder, and the nonlinearity h_c]
Trained as a deep network; fast & end-to-end inference
[Diagram: the iteration unfolded into a feed-forward network X -> D^T -> h_c -> (+W, h_c) -> (+W, h_c) -> Y]
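A sketch of the hard-thresholding operator h_c and the IHT iteration above; the unit step size implicitly assumes a suitably normalized dictionary, and the iteration count is an assumption.

import numpy as np

def h_c(v, c):
    """Hard thresholding: zero entries with |v_i| <= c, keep the rest unchanged.
    This is the nonlinearity that replaces ReLU in the deep L0-regularized encoder."""
    return v * (np.abs(v) > c)

def iht(X, D, c, n_iter=100):
    """Iterative hard thresholding for min_Y ||X - D Y||^2 + c^2 ||Y||_0,
    written as Y_{k+1} = h_c(D^T X + W Y_k) with W = I - D^T D."""
    W = np.eye(D.shape[1]) - D.T @ D
    Y = h_c(D.T @ X, c)
    for _ in range(n_iter):
        Y = h_c(D.T @ X + W @ Y, c)
    return Y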

HELU: A New Nonlinear Neuron
h_c tolerates large values & strongly penalizes small values
Compared with the logistic, sigmoid & ReLU neurons, HELU is discontinuous & hence hard to train with stochastic (sub)gradients
[Figure: plots comparing HELU and its smoothed variant HELU_d]
HELU_d: close to HELU as d goes to 0; d is adjusted dynamically during training
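A sketch of HELU and a smoothed surrogate; the exact piecewise-linear interpolation used for helu_delta below is an assumption of mine, chosen only so that it is continuous and approaches HELU as delta goes to 0.

import numpy as np

def helu(x, c=1.0):
    """HELU: pass large values through untouched, zero out small ones.
    Identical to the hard-thresholding operator h_c used by IHT."""
    return x * (np.abs(x) > c)

def helu_delta(x, c=1.0, delta=0.1):
    """Continuous surrogate: zero below c - delta, identity above c,
    a linear ramp in between (the ramp is an assumed interpolation;
    the key property is that it tends to HELU as delta -> 0 and is SGD-friendly)."""
    ramp = np.sign(x) * c * (np.abs(x) - (c - delta)) / delta
    out = np.where(np.abs(x) <= c - delta, 0.0, ramp)
    return np.where(np.abs(x) >= c, x, out)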

Case Study: Deep M-Sparse L0 Encoder
M-sparse constrained least squares: Y = argmin_Y ||X - D Y||^2, s.t. ||Y||_0 ≤ M
PGD (projected gradient descent): Y^{k+1} = p_M(D^T X + (I - D^T D) Y^k) = p_M(D^T X + W Y^k), with W = I - D^T D
[Diagram: one PGD iteration drawn as a layer with linear weights D^T and W, an adder, and the projection p_M]
Similar train & inference as in the deep L0-regularized encoder
[Diagram: the iteration unfolded into a feed-forward network X -> D^T -> p_M -> (+W, p_M) -> (+W, p_M) -> Y]
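A sketch of the projection p_M and the PGD iteration above; as with IHT, the unit step size assumes a suitably normalized dictionary.

import numpy as np

def p_M(v, M):
    """Projection onto {y : ||y||_0 <= M}: keep the M entries with the largest
    absolute values and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-M:]       # indices of the top-M magnitudes
    out[idx] = v[idx]
    return out

def m_sparse_pgd(X, D, M, n_iter=100):
    """Projected gradient descent for min_Y ||X - D Y||^2 s.t. ||Y||_0 <= M,
    i.e. Y_{k+1} = p_M(D^T X + W Y_k) with W = I - D^T D."""
    W = np.eye(D.shape[1]) - D.T @ D
    Y = p_M(D.T @ X, M)
    for _ in range(n_iter):
        Y = p_M(D.T @ X + W @ Y, M)
    return Y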

Interpreting Max-M Pooling/Unpooling
p_M keeps the coefficients with the top-M largest absolute values
This is exactly the well-known max-M pooling/unpooling operator
Explains its success in deep learning: it yields a sparse representation
[Figure: example of max-2 pooling followed by the corresponding unpooling]
Comparing max-M pooling/unpooling & HELU: two different sparsification approaches, with either an exact sparsity level imposed (p_M) or a threshold trained through samples (HELU)
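A sketch showing max-M pooling and unpooling as a pair of operators, and that composing them reproduces the projection p_M from the previous slide; the function names are mine.

import numpy as np

def max_m_pool(v, M):
    """Max-M pooling: keep the top-M magnitudes and remember where they were."""
    idx = np.argsort(np.abs(v))[-M:]
    return v[idx], idx

def max_m_unpool(vals, idx, n):
    """Max-M unpooling: scatter the pooled values back to their positions."""
    out = np.zeros(n)
    out[idx] = vals
    return out

# Pooling followed by unpooling equals the projection p_M:
v = np.array([0.3, -2.0, 0.1, 1.5, -0.7])
vals, idx = max_m_pool(v, 2)
print(max_m_unpool(vals, idx, v.size))     # -> [ 0.  -2.   0.   1.5  0. ]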

Implementation Issues
Use (small-scale) sparse coding to initialize deep learning: this simplifies the initialization of deep learning & makes sparse coding scalable
Train & test data follow the same distribution: no magic here

Numerical Experiments
MNIST dataset: 60,000 samples for train & 10,000 for test
Both encoders outperform iterative sparse coding & existing deep networks
The L0-regularized encoder learns the regularization parameter
The M-sparse L0 encoder incorporates the prior on the sparsity level M

Concluding Remarks
Bridging deep learning & sparse coding:
Explains & exploits the structure design of deep learning
Incorporates problem-level priors & interprets neurons
Gives an effective initialization strategy for deep learning
Toward a general coding scheme covering sparse & nonsparse models?
Deep L1 encoder (LeCun et al 2012)
Learning sparse & low-rank models (Sprechmann et al 2015)
Laplacian regularization (Wang et al 2015)
Design & explanation of more general encoders

Thank you for your attention
Qing Ling
Department of Automation, University of Science and Technology of China
http://home.ustc.edu.cn/~qingling
qingling@mail.ustc.edu.cn