
1 CSCE 2017, ICAI 2017, Las Vegas, July 17

2 Loading Discriminative Feature Representations in Hidden Layer
Daw-Ran Liou, Yang-En Chen, Cheng-Yuan Liou, Dept. of Computer Science and Information Engineering, National Taiwan University

3 Hinton et al.: the "optimal spaced codes" are unclear and can hardly be accomplished by any learning algorithm for the restricted Boltzmann machine.

4 Deep learning: loading similar features (partial plus global features)

5 Proposed objective function: different classes
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$

6 Objective function: same class
$E_{att} = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} d\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right) = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} E_{p_{k_1} p_{k_2}}$

7 Similar architecture

8 Becker and Hinton, 1992

9 Becker and Hinton, 1992: maximize mutual information between two modules

10 Proposed: single module

11 Three hidden layers: layers $x$, $y$, $z$, $o$ with unit indices $l$, $k$, $j$, $i$ and weight matrices $u$, $v$, $w$

12 Different class patterns
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{i=1}^{I}\left(o_i^{(p_1)} - o_i^{(p_2)}\right)^2$

13 Training formulas
$\Delta w_{ij} = \delta_{o_i}^{(p_1)} z_j^{(p_1)} - \delta_{o_i}^{(p_2)} z_j^{(p_2)}$
$\Delta v_{jk} = \delta_{z_j}^{(p_1)} y_k^{(p_1)} - \delta_{z_j}^{(p_2)} y_k^{(p_2)}$
$\Delta u_{kl} = \delta_{y_k}^{(p_1)} x_l^{(p_1)} - \delta_{y_k}^{(p_2)} x_l^{(p_2)}$
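The training formulas above are outer products between a layer's delta vector and the previous layer's activation, evaluated on the two patterns of a pair. A minimal sketch (function name and argument order are my own):

```python
import numpy as np

def weight_updates(do1, z1, do2, z2, dz1, y1, dz2, y2, dy1, x1, dy2, x2):
    # Each Delta is delta(p1) outer activation(p1) - delta(p2) outer activation(p2)
    dW = np.outer(do1, z1) - np.outer(do2, z2)  # updates w_ij (z -> o)
    dV = np.outer(dz1, y1) - np.outer(dz2, y2)  # updates v_jk (y -> z)
    dU = np.outer(dy1, x1) - np.outer(dy2, x2)  # updates u_kl (x -> y)
    return dW, dV, dU
```

For example, with `do1 = [1, 0]`, `z1 = [2, 3]`, `do2 = [0, 1]`, `z2 = [1, 1]`, the first update is `dW = [[2, 3], [-1, -1]]`: the first output neuron's weights move toward pattern 1's hidden code, the second away from pattern 2's.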

14 52 patterns (26+26 letters), 16×16 pixels each

15 Sorted minimum Hamming distances for the 52 representations

16 The minimum distances for all patterns are less than 90 (the curve marked -input-). Single layer: -output 1- obtained with orthogonal initial weights, -output 2- obtained with small random initial weights. Three hidden layers: -output 3- obtained with orthogonal initial weights, -output 4- obtained with random initial weights.

17 Sorted maximum Hamming distance between a representation and all others

18 Sorted averaged Hamming distance for each representation

19 Restoration of noisy patterns

20 Restoration

21 Single layer: set weights as logic combinations of two patterns

22 Logic combinations of two patterns as discriminative weights Wij

23 Objective function: two different patterns
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$

24 Two different patterns: white pixel represented by 1, black pixel represented by -1

25 Define four logic operations
Not: $\{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Not}([A]) = [R]$, where $R_{ij} = -A_{ij}$, $\forall i,j = 1,\dots,16$

26 Define four logic operations
Or: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Or}([A],[B]) = [A]\ \mathrm{Or}\ [B] = [R]$, where $R_{ij} = \max(A_{ij}, B_{ij})$, $\forall i,j = 1,\dots,16$

27 Define four logic operations
And: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{And}([A],[B]) = [A]\ \mathrm{And}\ [B] = [R]$, where $R_{ij} = \min(A_{ij}, B_{ij})$, $\forall i,j = 1,\dots,16$

28 Define four logic operations
Xor: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Xor}([A],[B]) = [A]\ \mathrm{Xor}\ [B] = [R] = \left([A]\ \mathrm{And}\ \mathrm{Not}[B]\right)\ \mathrm{Or}\ \left([B]\ \mathrm{And}\ \mathrm{Not}[A]\right)$
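On ±1 images the four operations reduce to a sign flip, an elementwise max, and an elementwise min; a NumPy sketch:

```python
import numpy as np

def NOT(A):    return -A                   # R_ij = -A_ij
def OR(A, B):  return np.maximum(A, B)     # R_ij = max(A_ij, B_ij)
def AND(A, B): return np.minimum(A, B)     # R_ij = min(A_ij, B_ij)
def XOR(A, B): return OR(AND(A, NOT(B)), AND(B, NOT(A)))

A = np.array([ 1, -1,  1, -1])
B = np.array([ 1,  1, -1, -1])
print(XOR(A, B))  # [-1  1  1 -1]: white (1) exactly where A and B differ
```

With white = 1 treated as logical true, max and min are exactly Or and And, and the Xor composition marks the pixels where the two patterns disagree.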

29-33 Total 16 logic combinations of {"A","B"} (figures)

34 Set the 256 weights as one of the 16 combinations; output value E of the output neuron

35 Pre-activations and sigmoid outputs for the 16 logic functions of A and B

 #  Function of A and B   w_i.x^A  w_i.x^B   Diff    y_i^A   y_i^B   y_i^A-y_i^B
 1  A And Not A             186      152       34    0.621   0.533    0.088
 2  Not A Or B             -166     -200            -0.571  -0.653    0.083
 3  B And Not A              96      242     -146    0.358   0.738   -0.379
 4  Not A                  -256     -110            -0.762  -0.405   -0.357
 5  A And Not B                               146                     0.379
 6  Not B                                                             0.357
 7  A Xor B                                   -34                    -0.088
 8  Not A And B                                                      -0.083
 9  A And B                 200      166             0.653   0.571
10  Not A Xor B            -152     -186            -0.533  -0.621
11  B                       110      256             0.405   0.762
12  B Or Not A             -242      -96            -0.738  -0.358
13  A
14  A Or Not B
15  A Or B
16  A Or Not A
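A property visible in the table (compare rows 3 and 5) is that swapping A and B inside the weight function swaps the two pre-activations, so the output differences are exact negatives of each other. A check with random ±1 stand-ins for the letter images (the actual "A"/"B" bitmaps are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
xA = rng.choice([-1, 1], size=256).astype(float)  # stand-in for image "A"
xB = rng.choice([-1, 1], size=256).astype(float)  # stand-in for image "B"

w3 = np.minimum(xB, -xA)   # function #3: B And Not A, as the 256 weights
w5 = np.minimum(xA, -xB)   # function #5: A And Not B

a3, b3 = w3 @ xA, w3 @ xB  # pre-activations of the neuron with weights w3
a5, b5 = w5 @ xA, w5 @ xB

print(a5 == b3, b5 == a3)  # True True: the A/B swap swaps the pre-activations
```

This matches the table entries 96/242/-146 for row 3 against 146 (and sigmoid difference 0.379) for row 5.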

36 $\mathbf{W}' = \{B\ \mathrm{And\ Not}\ A,\ A\ \mathrm{And\ Not}\ B,\ B\ \mathrm{Or\ Not}\ A,\ A\ \mathrm{Or\ Not}\ B\}$: $\mathbf{w}_3 = B$ And Not $A$, $\mathbf{w}_5 = A$ And Not $B$, $\mathbf{w}_{12} = B$ Or Not $A$, $\mathbf{w}_{14} = A$ Or Not $B$

37 Figures 3-8: black-white images, black for -1 and white for 1; red-black-green figures, intensity of green for values from 0 to 1, black for 0, intensity of red for values from 0 to -1

38 Single layer, ten hidden neurons

39 Similarity of two images [U] and [V]

40 Initial weights: random numbers in [-1,1]
Trained weight matrix in the upper row; the most similar of the 16 logic functions in the bottom row. Trained weights are similar to discriminative functions #3 and #12.

41 Initial weights: random numbers in [-1,1]
Row # 1. initial weights; 2. most similar logic functions; 3. (y^A - y^B)^2/4: 1 (green) or 0 (black)

42

43

44 Initial weights: random numbers in [-1,1]
Row # 1. initial weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4: value 1 (green) or 0 (black); 4. similarity to A-B

45

46

47 Initial weights in W'. Weights unchanged during training (a technical problem with the nearly hard-limited activation function).

48 Initial weights in W'
Row # 1. initial weights; 2. trained weights; 3. most similar logic function among the 16; 4. (y^A - y^B)^2/4; 5. similarity to A-B

49

50

51 Initial weights in 0.001W'. Bottom row: similar discriminative functions #3, 5, 12, 14. Trained weights = [A]-[B] or its negative version.

52 Initial weights in 0.001W'
Row # 1. initial weights; 2. trained weights; 3. most similar logic functions + similarities; 4. (y^A - y^B)^2/4; 5. similarity to A-B

53

54

55 Initial W: small random numbers
Bottom row: similar functions #3, 5, 14; trained weights = [A]-[B] or its negative version.

56 Initial W: small random numbers in [-0.01, 0.01]
Row # 1. trained weights; 2. most similar logic functions + similarities; 3. (y^A - y^B)^2/4; 4. similarity to A-B

57

58

59 Initial weights as [A]-[B] or its negative version: the optimal discriminative weights for distinguishing {A, B}. Weights unchanged during training.

60 Initial weights as [A]-[B] or its negative version
Row # 1. initial weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4: all values 1 (green)

61

62

63 Autoencoder (BP): train an autoencoder with MATLAB trainAutoencoder.

64 Autoencoder (BP). Bottom row: several similar features of the 10 hidden neurons.

65 Autoencoder 256-10-256 (BP), trained with Autoencoder
Row # 1. trained weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4; 4. similarity to A-B

66

67

68 Autoencoder 256-10-256 (BP), trained with Autoencoder
Row # 1. trained weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4

69

70

71 [A]-[B] or [B]-[A]: the optimal discriminative weights. Their discrimination exceeds that of the logic operations and cannot be reached by logic combinations.

72 Optimal weights for [A] and [B] with three pixels

73 Optimal discriminative weights: see Fig. 1 in Cheng-Yuan Liou (2006), Backbone structure of hairy memory, ICANN 2006, The 16th International Conference on Artificial Neural Networks, September 10-14, LNCS 4131, Part I, pp.

74 Biological plausibility

75 Biological plausibility
Hebbian learning

76 Resembles Hebbian learning (single layer)

77 Similar hypothesis: the covariance hypothesis (Sejnowski TJ, 1977)

78 Have a nice day. Code is on the website.

79 Eq. 2
$\frac{\partial E_{p_1 p_2}}{\partial w_{ij}} = -\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)\left(\frac{\partial y_m^{(p_1)}}{\partial w_{ij}} - \frac{\partial y_m^{(p_2)}}{\partial w_{ij}}\right) = -\left(y_i^{(p_1)} - y_i^{(p_2)}\right) \times \frac{1}{2}\left[\left(1 - (y_i^{(p_1)})^2\right) x_j^{(p_1)} - \left(1 - (y_i^{(p_2)})^2\right) x_j^{(p_2)}\right]$

80 Eq. 2 (continued)
$y_i = f(net_i)$, $net_i = \sum_{j=1}^{N} w_{ij} x_j$, $f(net_i) = \tanh(0.5\, net_i) = \frac{1 - \exp(-net_i)}{1 + \exp(-net_i)}$
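The two forms of the activation on this slide are the same function; a quick numerical check:

```python
import numpy as np

net = np.linspace(-5.0, 5.0, 101)
f_tanh = np.tanh(0.5 * net)                       # tanh(0.5 * net)
f_frac = (1 - np.exp(-net)) / (1 + np.exp(-net))  # logistic-style form
print(np.allclose(f_tanh, f_frac))  # True
```

Both are the logistic sigmoid rescaled to the range (-1, 1), whose derivative is $f'(net) = \frac{1}{2}(1 - f(net)^2)$, the factor that appears in Eq. 2.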

81 Eq. 2 (continued)
$\frac{\partial net_m^{(p_1)}}{\partial w_{ij}} = \frac{\partial net_m^{(p_2)}}{\partial w_{ij}} = 0$ for $m \neq i$; updating equation for the weights: $w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E}{\partial w_{ij}}$

82 $w_{i,N+1} \leftarrow w_{i,N+1} + \frac{\eta}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[\left(y_i^{(p_1)}\right)^2 - \left(y_i^{(p_2)}\right)^2\right]$
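Eq. 2 can be verified against a finite-difference gradient of the per-pair objective $E_{p_1 p_2} = -\frac{1}{2}\sum_i \left(y_i^{(p_1)} - y_i^{(p_2)}\right)^2$ for a single layer (a sketch; function names are mine):

```python
import numpy as np

def forward(W, x):
    return np.tanh(0.5 * (W @ x))          # the slide-80 activation

def e_pair(W, x1, x2):
    y1, y2 = forward(W, x1), forward(W, x2)
    return -0.5 * ((y1 - y2) ** 2).sum()   # per-pair E_rep

def grad_eq2(W, x1, x2):
    # Eq. 2: dE/dw_ij = -(y1_i - y2_i) * [f'(net1_i) x1_j - f'(net2_i) x2_j]
    y1, y2 = forward(W, x1), forward(W, x2)
    g1 = 0.5 * (1 - y1 ** 2)               # f' for f = tanh(0.5 net)
    g2 = 0.5 * (1 - y2 ** 2)
    return -np.outer((y1 - y2) * g1, x1) + np.outer((y1 - y2) * g2, x2)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x1 = rng.choice([-1.0, 1.0], 4)
x2 = rng.choice([-1.0, 1.0], 4)

# Central finite differences, entry by entry
eps, num = 1e-6, np.zeros_like(W)
for i in range(3):
    for j in range(4):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (e_pair(Wp, x1, x2) - e_pair(Wm, x1, x2)) / (2 * eps)

print(np.allclose(grad_eq2(W, x1, x2), num, atol=1e-6))  # True
```

The update $w_{ij} \leftarrow w_{ij} - \eta\, \partial E / \partial w_{ij}$ then moves the two representations apart at each step.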

83 Eq. 4
$\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}} = \frac{1}{2}\left[\left(1 - (y_i^{(p_{k_1})})^2\right) x_j^{(p_{k_1})} - \left(1 - (y_i^{(p_{k_2})})^2\right) x_j^{(p_{k_2})}\right] \times \left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)$

84 Eq. 4 (continued)
$E_{p_{k_1} p_{k_2}} = d\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right) = \frac{1}{2} \sum_{i=1}^{M} \left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)^2$

85 Eq. 5
$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}}$

86 $y_i^A = \tanh\left(\sum_{j=1}^{N=256} w_{ij} x_j^A\right), \forall i = 1, \dots, M$

87 $E = E_{AB}^{rep} = \frac{1}{2} \sum_{i=1}^{M=10} \left(y_i^A - y_i^B\right)^2$

88 𝑠 π‘ˆ , 𝑉 = 𝐱 π‘ˆ 𝐱 π‘ˆ βˆ™ 𝐱 𝑉 𝐱 𝑉

89 𝜹 π‘œ 𝑖 = πœ•πΈ πœ• π‘œ 𝑖 πœ• π‘œ 𝑖 πœ• 𝑛𝑒𝑑 𝑖 𝒐𝑖 is obtained much as in Eq.2

90 𝜹 π‘œ 𝑖 𝑝 1 = π‘œ 𝑖 𝑝 1 βˆ’ π‘œ 𝑖 𝑝 2 1 2 1βˆ’ π‘œ 𝑖 𝑝 1 2 𝜹 π‘œ 𝑖 𝑝 2 = π‘œ 𝑖 𝑝 1 βˆ’ π‘œ 𝑖 𝑝 2 1 2 1βˆ’ π‘œ 𝑖 𝑝 2 2

91 $\delta_{z_j}^{(p_1)} = \frac{1}{2}\left(1 - (z_j^{(p_1)})^2\right) \sum_r \delta_{o_r}^{(p_1)} w_{rj}$, $\delta_{z_j}^{(p_2)} = \frac{1}{2}\left(1 - (z_j^{(p_2)})^2\right) \sum_r \delta_{o_r}^{(p_2)} w_{rj}$, $\delta_{y_k}^{(p_1)} = \frac{1}{2}\left(1 - (y_k^{(p_1)})^2\right) \sum_r \delta_{z_r}^{(p_1)} v_{rk}$, $\delta_{y_k}^{(p_2)} = \frac{1}{2}\left(1 - (y_k^{(p_2)})^2\right) \sum_r \delta_{z_r}^{(p_2)} v_{rk}$
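The recursions above are ordinary backpropagation through the tanh(0.5 net) layers; for one pattern they can be sketched as (function name and shapes are my own):

```python
import numpy as np

def deltas_back(delta_o, z, y, W, V):
    # delta_o: output deltas; z, y: hidden activations; W: z->o, V: y->z.
    # f = tanh(0.5 * net), so f'(net) = 0.5 * (1 - f**2).
    delta_z = 0.5 * (1 - z ** 2) * (W.T @ delta_o)  # sum_r delta_o_r * w_rj
    delta_y = 0.5 * (1 - y ** 2) * (V.T @ delta_z)  # sum_r delta_z_r * v_rk
    return delta_z, delta_y

I2 = np.eye(2)
dz, dy = deltas_back(np.array([1.0, 2.0]), np.zeros(2), np.zeros(2), I2, I2)
# with identity weights and zero activations: dz = [0.5, 1.0], dy = [0.25, 0.5]
```

Each layer picks up one factor of $\frac{1}{2}(1 - f^2)$ and one weighted sum of the deltas above it, which is what the four slide formulas spell out for the two patterns $p_1$ and $p_2$.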

92 Ch. 5
$v_{ki} \leftarrow v_{ki} + \eta \left(x_k^{(p)} - x_k'\right)\left(1 - (x_k')^2\right) y_i^{(p)}$

93 Ch. 5, Eq. 7
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta \left(y_i^{(p_1)}(n) - y_i^{(p_2)}(n)\right)\left(x_j^{(p_1)}(n) - x_j^{(p_2)}(n)\right)$

94 Ch. 5, Eq. 8
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\, y_i(n)\, x_j(n)$

95 Ch. 5, Eq. 9
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta \left(y_i(n) - \bar{y}\right)\left(x_j(n) - \bar{x}\right)$
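The contrast between the Hebbian rule (Eq. 8) and the covariance rule (Eq. 9) in code (a sketch, with the mean activities passed in explicitly):

```python
import numpy as np

def hebb_step(W, x, y, eta):
    # Eq. 8: correlate raw activities
    return W + eta * np.outer(y, x)

def covariance_step(W, x, y, x_bar, y_bar, eta):
    # Eq. 9: correlate deviations from the mean activity
    return W + eta * np.outer(y - y_bar, x - x_bar)

W_h = np.zeros((1, 1))
W_c = np.zeros((1, 1))
x = y = np.array([1.0])           # a unit that is always equally active
for _ in range(3):
    W_h = hebb_step(W_h, x, y, 0.1)
    W_c = covariance_step(W_c, x, y, x, y, 0.1)  # means equal the activity

print(W_h[0, 0], W_c[0, 0])  # ~0.3 and 0.0
```

For constantly active units the Hebbian weight grows without bound, while the covariance rule changes weights only when pre- and post-synaptic activities co-vary around their means, which is the sense in which the paired-pattern update (Eq. 7) resembles it.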

