Lecture 8: Why deep? We explain why deep learning works from two aspects:
1. Experimental evidence
2. Theoretical proof
1. Experiments show deeper is better
Layer x Size    Word Error Rate (%)
1 x 2k          24.2
2 x 2k          20.4
3 x 2k          18.4
4 x 2k          17.8
5 x 2k          17.2
7 x 2k          17.1

1 x 3772        22.5
1 x 4634        22.6
1 x 16k         22.1

Not surprising? Going deeper adds parameters, and more parameters could simply mean better performance. Seide, Frank, Gang Li, and Dong Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," Interspeech 2011.
Fat + Short vs. Thin + Tall
With the same number of parameters, which one is better: a shallow (fat + short) network or a deep (thin + tall) network?
Fat + Short vs. Thin + Tall
The table answers it: with comparable numbers of parameters, deep wins. For example, 5 x 2k reaches 17.2% word error rate versus 22.5% for 1 x 3772, and 7 x 2k reaches 17.1% versus 22.6% for 1 x 4634 (Seide, Li, and Yu, Interspeech 2011). Why?
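As a quick sanity check on "the same number of parameters", here is a minimal sketch (mine, not from the slides) that counts weights and biases for a deep versus a wide shallow fully connected network. The input and output sizes are illustrative assumptions, not the actual dimensions used by Seide et al.

```python
def n_params(layer_sizes):
    """Weights plus biases of a fully connected net with these layer sizes."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

n_in, n_out = 429, 9304                    # assumed input/output sizes, for illustration only

deep    = [n_in] + [2000] * 7 + [n_out]    # 7 x 2k hidden layers
shallow = [n_in, 4634, n_out]              # 1 x 4634 hidden layer

print(f"deep:    {n_params(deep):,}")      # about 43 million
print(f"shallow: {n_params(shallow):,}")   # about 45 million
```

With comparable parameter budgets, the deep network is the one with the much lower error rate, so parameter count alone does not explain the gap.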
A common explanation is called “modularization.”
Consider training one classifier per class directly from the image: Classifier 1 (girls with long hair), Classifier 2 (boys with long hair), Classifier 3 (girls with short hair), Classifier 4 (boys with short hair). Classes that are lacking data, such as boys with long hair, end up with weak classifiers.
Modularization: an intuitive example

Instead of learning the four classes directly, first train basic classifiers for the attributes: boy or girl? long or short hair? Each basic classifier can have sufficient training examples, because all of the images can be used to learn each attribute.
Modularization or deeper reasons?
The basic classifiers (boy or girl? long or short hair?) are shared by the following classifiers as modules. Because the shared modules do most of the work, each final classifier (girls with long hair, boys with long hair, girls with short hair, boys with short hair) can be trained with little data and still do fine.
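A minimal PyTorch sketch of this sharing (my own illustration, not code from the lecture; the module names and sizes are made up): two attribute modules are shared, and each of the four final classes is served by a tiny head on top of them.

```python
import torch
import torch.nn as nn

class SharedAttributes(nn.Module):
    """Two basic classifiers shared as modules: boy/girl and long/short hair."""
    def __init__(self, in_dim=256, hidden=64):
        super().__init__()
        self.gender = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))
        self.hair   = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):
        # Every image has a gender and a hair length, so both attribute
        # modules see plenty of training data.
        return torch.cat([self.gender(x), self.hair(x)], dim=-1)

class FourClassModel(nn.Module):
    """Final classifiers for {girl, boy} x {long, short} hair, built on the shared modules."""
    def __init__(self):
        super().__init__()
        self.shared = SharedAttributes()
        self.heads = nn.Linear(4, 4)    # tiny heads: even a rare class needs little data

    def forward(self, x):
        return self.heads(self.shared(x))

model = FourClassModel()
print(model(torch.randn(8, 256)).shape)   # torch.Size([8, 4]): logits for the four classes
```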
Hidden nodes, modularization, features
→ Less training data? The modularization is learned automatically: the hidden nodes are the modules. The 1st layer learns the most basic classifiers; the 2nd layer uses the 1st layer as modules to build more complex classifiers; later layers build on those in turn. People say AI = deep learning + big data, as if big data is what calls for deep models. But deep is not only for big data: because the learned modules are shared and reused, deep learning also works on small data sets. It needs less data.
Image understanding: levels of features

In a network trained on images, the 1st layer learns the most basic classifiers (simple local features); the 2nd layer uses the 1st layer as modules; higher layers act as modules for objects. Because the modules are reused, less training data is needed. Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014.
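In the same spirit, a minimal sketch (my illustration, with made-up layer sizes) of reusing early convolutional layers as fixed modules and training only a small head, which is how a new task can get by with less data:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                          # stands in for an already-trained network
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),     # 1st layer: basic local features
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),    # 2nd layer: combinations of them
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False                        # keep the shared modules fixed

head = nn.Linear(32, 5)                            # only this small part is trained on the new task
model = nn.Sequential(backbone, head)

x = torch.randn(4, 3, 32, 32)                      # a tiny fake batch of images
print(model(x).shape)                              # torch.Size([4, 5])
```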
2. Theoretical proof: Deeper is better

M. Telgarsky, "Benefits of depth in neural networks," 2016. We give an informal argument of Telgarsky's proof.

Claim 1. A function with few oscillations cannot fit a function with many oscillations. Proof by picture: draw a function with few oscillations on top of one with many oscillations; stars mark the regions where they disagree, and there are many of them.
Claim 2. ReLU networks can make exponentially many oscillations.
Let ReLU(x) := max{0, x}, and let h(x) := ReLU(ReLU(2x) − ReLU(4x − 2)). Then

    h(x) = 2x         for x in [0, 1/2]
    h(x) = 2(1 − x)   for x in [1/2, 1]
    h(x) = 0          otherwise

so the graph of h is a triangle through (0,0), (1/2,1), (1,0). Write h^k for the k-fold composition h(h(…h(x)…)). Then h has 1 peak and h^k has 2^(k−1) peaks.
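A quick numerical check of Claim 2 (a sketch of mine, not part of the slides): sample h on a fine grid over [0, 1], compose it k times, and count the local maxima.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def h(x):
    # h(x) = ReLU(ReLU(2x) - ReLU(4x - 2)): a triangle peaking at (1/2, 1)
    return relu(relu(2 * x) - relu(4 * x - 2))

def count_peaks(y):
    """Count strict local maxima of a sampled function."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

x = np.linspace(0.0, 1.0, 200001)
y = x
for k in range(1, 6):
    y = h(y)                                        # y now holds h^k(x), evaluated pointwise
    print(f"h^{k} has {count_peaks(y)} peaks")      # 1, 2, 4, 8, 16
```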
Claim 3. Few layers implies few oscillations.
Say a function is s-affine if it is piecewise linear with s pieces, and suppose f is s-affine and g is t-affine. Then

    f + g is at most (s + t − 1)-affine   (adding nodes in the same layer)
    g ∘ f is at most (s·t)-affine         (composition, i.e. the next layer)

ReLU is 2-affine, so composing k layers can produce an exp(k)-affine function, while adding nodes within a fixed number of layers only grows the number of pieces additively. Hence with O(1) layers, one needs exponentially many nodes to approximate what k layers can do.
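For a concrete count using these two rules: a single hidden layer of m ReLU units outputs a weighted sum of m 2-affine functions, which is at most (m + 1)-affine. The function h^k from Claim 2 has 2^(k−1) peaks and hence at least 2^k linear pieces, so a one-hidden-layer network matching it needs m ≥ 2^k − 1 units, exponential in k, whereas the deep construction uses only about 3k ReLU units.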
What does this mean? There exists a function that can be represented by a “deep” neural network with a polynomial number of nodes but that needs exponentially many nodes for any “shallow” neural network. Open question: this only says that such a function exists; it does not tell us what the function is, and it might be a function we do not care about.