CSCE 2017 / ICAI 2017, Las Vegas, July 17
Loading Discriminative Feature Representations in Hidden Layer
Daw-Ran Liou, Yang-En Chen, Cheng-Yuan Liou
Dept. of Computer Science and Information Engineering, National Taiwan University
Hinton et al., 2006: the notion of "optimally spaced codes" is unclear and can hardly be achieved by any learning algorithm for the restricted Boltzmann machine.
Deep learning: loading similar features (partial plus global features).
Proposed objective function, different classes:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$
Objective function, same class:
$E^{att} = \frac{1}{2}\sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} d\!\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right)^2 = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} E_{p_{k_1} p_{k_2}}$
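A minimal numpy sketch of the two objectives, assuming the representations are stacked row-wise in an array (function and variable names are ours):

```python
import numpy as np

def e_rep(Y):
    """Repelling objective E^rep over all pattern pairs.
    Y: (P, M) array; row p holds the representation y^(p)."""
    diff = Y[:, None, :] - Y[None, :, :]   # (P, P, M) pairwise differences
    return -0.5 * np.sum(diff ** 2)

def e_att(Yk):
    """Attracting objective E^att over pairs within one class k.
    Yk: (P_k, M) array of same-class representations."""
    diff = Yk[:, None, :] - Yk[None, :, :]
    return 0.5 * np.sum(diff ** 2)
```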
Similar architecture
Becker and Hinton, 1992
Becker and Hinton, 1992: maximize the mutual information between two modules.
Proposed: single module
Three hidden layers: $x_l \xrightarrow{u} y_k \xrightarrow{v} z_j \xrightarrow{w} o_i$
Different-class patterns:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{i=1}^{I}\left(o_i^{(p_1)} - o_i^{(p_2)}\right)^2$
Training formulas:
$\Delta w_{ij} = \delta_{o_i}^{(p_1)} z_j^{(p_1)} - \delta_{o_i}^{(p_2)} z_j^{(p_2)}$
$\Delta v_{jk} = \delta_{z_j}^{(p_1)} y_k^{(p_1)} - \delta_{z_j}^{(p_2)} y_k^{(p_2)}$
$\Delta u_{kl} = \delta_{y_k}^{(p_1)} x_l^{(p_1)} - \delta_{y_k}^{(p_2)} x_l^{(p_2)}$
52 patterns (26 + 26), each 16×16 pixels.
Sorted minimum Hamming distances for the 52 representations
Single layer: -output 1- obtained with orthogonal initial weights, -output 2- with small random initial weights. Three hidden layers: -output 3- obtained with orthogonal initial weights, -output 4- with random initial weights. The minimum distances for all input patterns are all less than 90 (the curve marked -input-).
Sorted maximum Hamming distance between a representation and all others
Sorted averaged Hamming distance for each representation
Restoration of noisy patterns
Single layer: set the weights as logic combinations of two patterns.
Logic combinations of two patterns as discriminative weights $w_{ij}$.
Objective function for two different patterns:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$
Two different patterns; a white pixel is represented by 1, a black pixel by −1.
Define four logic operations.
Not: $\{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $\mathrm{Not}(A) = R$, where $R_{ij} = -A_{ij}\ \forall i,j = 1,\dots,16$.
Or: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{Or}\ B = R$, where $R_{ij} = \max\!\left(A_{ij}, B_{ij}\right)\ \forall i,j = 1,\dots,16$.
And: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{And}\ B = R$, where $R_{ij} = \min\!\left(A_{ij}, B_{ij}\right)\ \forall i,j = 1,\dots,16$.
Xor: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{Xor}\ B = \left(A\ \mathrm{And}\ \mathrm{Not}(B)\right)\ \mathrm{Or}\ \left(B\ \mathrm{And}\ \mathrm{Not}(A)\right)$.
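The four operations transcribe directly into numpy; a short sketch over {−1, 1} images:

```python
import numpy as np

# Patterns are 16x16 arrays over {-1, +1}: white pixel = 1, black pixel = -1.
def Not(A):
    return -A                    # R_ij = -A_ij

def Or(A, B):
    return np.maximum(A, B)      # R_ij = max(A_ij, B_ij)

def And(A, B):
    return np.minimum(A, B)      # R_ij = min(A_ij, B_ij)

def Xor(A, B):                   # (A And Not B) Or (B And Not A)
    return Or(And(A, Not(B)), And(B, Not(A)))
```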
Total: 16 logic combinations of {"A", "B"}.
Set the 256 weights of the output neuron to one of the 16 combinations; the table below lists the pre-activation and output values of the neuron.
Pre-activation $\mathbf{w}_i \cdot \mathbf{x}^A = \sum_{j=1}^{N} w_{ij} x_j^A$ (and likewise for $B$) and sigmoid outputs for each of the 16 logic functions of $A$ and $B$:

 #   Function of A and B   w·x^A   w·x^B   Diff     y^A      y^B     y^A − y^B
 1   A And Not(A)            186     152     34    0.621    0.533     0.088
 2   Not(A Or B)            -166    -200     34   -0.571   -0.653     0.083
 3   B And Not(A)             96     242   -146    0.358    0.738    -0.379
 4   Not(A)                 -256    -110   -146   -0.762   -0.405    -0.357
 5   A And Not(B)            242      96    146    0.738    0.358     0.379
 6   Not(B)                 -110    -256    146   -0.405   -0.762     0.357
 7   A Xor B                 152     186    -34    0.533    0.621    -0.088
 8   Not(A And B)           -200    -166    -34   -0.653   -0.571    -0.083
 9   A And B                 200     166     34    0.653    0.571     0.083
10   Not(A Xor B)           -152    -186     34   -0.533   -0.621     0.088
11   B                       110     256   -146    0.405    0.762    -0.357
12   B Or Not(A)            -242     -96   -146   -0.738   -0.358    -0.379
13   A                       256     110    146    0.762    0.405     0.357
14   A Or Not(B)             -96    -242    146   -0.358   -0.738     0.379
15   A Or B                  166     200    -34    0.571    0.653    -0.083
16   A Or Not(A)            -186    -152    -34   -0.621   -0.533    -0.088
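A sketch reproducing one table row. The listed outputs are consistent with $y = \tanh(net/N)$, $N = 256$ (e.g. $\tanh(200/256) \approx 0.653$); this scaling is our inference from the table, and the random patterns below are stand-ins for the slide's letter images:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=(16, 16))   # stand-ins for the letter images
B = rng.choice([-1, 1], size=(16, 16))

def neuron_output(W, X, N=256):
    net = float(np.sum(W * X))           # pre-activation  w_i . x
    return net, np.tanh(net / N)         # assumed output scaling

W9 = np.minimum(A, B)                    # row 9: weights = A And B
net_A, y_A = neuron_output(W9, A)        # slide's letters give 200, 0.653
net_B, y_B = neuron_output(W9, B)        # slide's letters give 166, 0.571
print(net_A, y_A, net_B, y_B, y_A - y_B)
```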
$\mathbf{w}' = \{\, \mathbf{w}_3 = B\ \mathrm{And}\ \mathrm{Not}(A),\ \mathbf{w}_5 = A\ \mathrm{And}\ \mathrm{Not}(B),\ \mathbf{w}_{12} = B\ \mathrm{Or}\ \mathrm{Not}(A),\ \mathbf{w}_{14} = A\ \mathrm{Or}\ \mathrm{Not}(B) \,\}$
Figures 3–8: in black-white images, black is −1 and white is 1; in red-black-green figures, the intensity of green shows values from 0 to 1, black is 0, and the intensity of red shows values from 0 to −1.
Single layer, ten hidden neurons.
Similarity of two images [U] and [V]
Initial weights: random numbers in [−1, 1]. Upper row: trained weight matrices; bottom row: the most similar of the 16 logic functions. The trained weights are similar to discriminative functions #3 and #12.
Initial weights: random numbers in [−1, 1]. Rows: 1. initial weights; 2. most similar logic functions; 3. $(y^A - y^B)^2/4$: 1 (green) or 0 (black).
Initial weights: random numbers in [−1, 1]. Rows: 1. initial weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$: value 1 (green) or 0 (black); 4. similarity to A−B.
Initial weights set to W′. The weights stay unchanged during training (a technical problem with the nearly hard-limited activation function).
Initial weights set to W′. Rows: 1. initial weights; 2. trained weights; 3. the most similar of the logic functions; 4. $(y^A - y^B)^2/4$; 5. similarity to A−B.
Initial weights set to 0.001·W′. Bottom row: similar discriminative functions #3, 5, 12, 14. The trained weights equal [A]−[B] or its negative version.
Initial weights set to 0.001·W′. Rows: 1. initial weights; 2. trained weights; 3. most similar logic functions and their similarities; 4. $(y^A - y^B)^2/4$; 5. similarity to A−B.
Initial W: small random numbers. Bottom row: similar functions #3, 5, 14; the trained weights equal [A]−[B] or its negative version.
Initial W: small random numbers in [−0.01, 0.01]. Rows: 1. trained weights; 2. most similar logic functions and their similarities; 3. $(y^A - y^B)^2/4$; 4. similarity to A−B.
Initial weights set to [A]−[B] or its negative version: the optimal discriminative weights for distinguishing {A, B}. The weights stay unchanged during training.
Initial weights set to [A]−[B] or its negative version. Rows: 1. initial weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$: all equal to 1 (green).
Autoencoder 256-10-256 (BP), trained with MATLAB's trainAutoencoder.
Autoencoder 256-10-256 (BP). Bottom row: several similar features among the ten hidden neurons.
Autoencoder 256-10-256 (BP), trained as an autoencoder. Rows: 1. trained weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$; 4. similarity to A−B.
Autoencoder 256-10-256 (BP), trained as an autoencoder. Rows: 1. trained weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$.
[A]−[B] (or [B]−[A]) gives the optimal discriminative weights, and its performance exceeds that of the logic operations. It cannot be reached by any logic combination: its entries take values in {−2, 0, 2}, while every logic combination has entries in {−1, 1}.
Optimal weights for [A] and [B] with three pixels
Optimal discriminative weights: see Fig. 1 in Cheng-Yuan Liou (2006), "Backbone structure of hairy memory," ICANN 2006, the 16th International Conference on Artificial Neural Networks, September 10–14, LNCS 4131, Part I, pp. 688–697.
Biological plausibility
Biological plausibility Hebbian learning
Single layer: resembles Hebbian learning.
Similar hypothesis: the covariance hypothesis (Sejnowski TJ, 1977).
Have a nice day. Code available at http://red.csie.ntu.edu.tw/NN/Classinfo/classinfo_eng.html
Eq. 2:
$\frac{\partial E_{p_1 p_2}}{\partial w_{ij}} = -\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)\left(\frac{\partial y_m^{(p_1)}}{\partial w_{ij}} - \frac{\partial y_m^{(p_2)}}{\partial w_{ij}}\right) = -\frac{1}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[\left(1 - (y_i^{(p_1)})^2\right) x_j^{(p_1)} - \left(1 - (y_i^{(p_2)})^2\right) x_j^{(p_2)}\right]$
Eq. 2 (continued):
$y_i = f(net_i)$, $net_i = \sum_{j=1}^{N} w_{ij} x_j$, $f(net_i) = \tanh(0.5\,net_i) = \frac{1 - \exp(-net_i)}{1 + \exp(-net_i)}$
Eq. 2 (continued):
$\frac{\partial net_m^{(p_1)}}{\partial w_{ij}} = \frac{\partial net_m^{(p_2)}}{\partial w_{ij}} = 0$ for $m \neq i$.
Updating equation for the weights: $w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}$
Bias update (from Eq. 2 with $x_{N+1} = 1$):
$w_{i,N+1} \leftarrow w_{i,N+1} + \frac{\eta}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[(y_i^{(p_2)})^2 - (y_i^{(p_1)})^2\right]$
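A sketch of the single-layer update of Eq. 2 for one pattern pair, assuming gradient descent on $E_{p_1 p_2}$ with the activation $f(net) = \tanh(0.5\,net)$ (function name and learning rate are ours):

```python
import numpy as np

def update_pair(W, x1, x2, eta=0.1):
    """One step of w_ij <- w_ij - eta dE_{p1p2}/dw_ij; since E_{p1p2} is the
    negative squared distance, the step pushes y^(p1) and y^(p2) apart.
    W: (M, N) weights; x1, x2: (N,) inputs in {-1, +1}."""
    y1 = np.tanh(0.5 * (W @ x1))
    y2 = np.tanh(0.5 * (W @ x2))
    d = y1 - y2
    # Delta w_ij = (eta/2) d_i [(1 - y1_i^2) x1_j - (1 - y2_i^2) x2_j]
    W += 0.5 * eta * (np.outer(d * (1 - y1**2), x1)
                      - np.outer(d * (1 - y2**2), x2))
    return W
```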
Eq. 4:
$\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}} = \frac{1}{2}\left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)\left[\left(1 - (y_i^{(p_{k_1})})^2\right) x_j^{(p_{k_1})} - \left(1 - (y_i^{(p_{k_2})})^2\right) x_j^{(p_{k_2})}\right]$
Eq. 4 (continued):
$E_{p_{k_1} p_{k_2}} = \frac{1}{2}\,d\!\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right)^2 = \frac{1}{2}\sum_{i=1}^{M}\left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)^2$
Eq. 5:
$w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}}$
$y_i^A = \tanh\!\left(\sum_{j=1}^{N=256} w_{ij} x_j^A\right), \forall i = 1,\dots,M$
$E = E_{AB}^{rep} = \frac{1}{2}\sum_{i=1}^{M=10}\left(y_i^A - y_i^B\right)^2$
$s(U, V) = \frac{\mathbf{x}^U}{\lVert \mathbf{x}^U \rVert} \cdot \frac{\mathbf{x}^V}{\lVert \mathbf{x}^V \rVert}$
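$s(U, V)$ is the cosine similarity of the flattened images; a one-function sketch:

```python
import numpy as np

def similarity(U, V):
    """s(U, V) = (x^U / |x^U|) . (x^V / |x^V|) for two images U, V."""
    u = np.ravel(U).astype(float)
    v = np.ravel(V).astype(float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```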
$\delta_{o_i} = \frac{\partial E}{\partial o_i}\frac{\partial o_i}{\partial net_i}$; $\delta_{o_i}$ is obtained much as in Eq. 2.
$\delta_{o_i}^{(p_1)} = \left(o_i^{(p_1)} - o_i^{(p_2)}\right)\frac{1}{2}\left(1 - (o_i^{(p_1)})^2\right)$, $\delta_{o_i}^{(p_2)} = \left(o_i^{(p_1)} - o_i^{(p_2)}\right)\frac{1}{2}\left(1 - (o_i^{(p_2)})^2\right)$
$\delta_{z_j}^{(p_1)} = \frac{1}{2}\left(1 - (z_j^{(p_1)})^2\right)\sum_r \delta_{o_r}^{(p_1)} w_{rj}$, $\delta_{z_j}^{(p_2)} = \frac{1}{2}\left(1 - (z_j^{(p_2)})^2\right)\sum_r \delta_{o_r}^{(p_2)} w_{rj}$
$\delta_{y_k}^{(p_1)} = \frac{1}{2}\left(1 - (y_k^{(p_1)})^2\right)\sum_r \delta_{z_r}^{(p_1)} v_{rk}$, $\delta_{y_k}^{(p_2)} = \frac{1}{2}\left(1 - (y_k^{(p_2)})^2\right)\sum_r \delta_{z_r}^{(p_2)} v_{rk}$
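A sketch of one pair update for the three-hidden-layer network ($x \to y \to z \to o$), combining these delta formulas with the $\Delta w$, $\Delta v$, $\Delta u$ increments given earlier (function name and learning rate are ours):

```python
import numpy as np

def f(net):
    return np.tanh(0.5 * net)

def pair_step(U, V, W, x1, x2, eta=0.1):
    """One update on a different-class pattern pair (p1, p2).
    U: (K, L), V: (J, K), W: (I, J) weight matrices."""
    y1, y2 = f(U @ x1), f(U @ x2)
    z1, z2 = f(V @ y1), f(V @ y2)
    o1, o2 = f(W @ z1), f(W @ z2)
    # output-layer deltas
    do1 = (o1 - o2) * 0.5 * (1 - o1**2)
    do2 = (o1 - o2) * 0.5 * (1 - o2**2)
    # backpropagated deltas
    dz1 = 0.5 * (1 - z1**2) * (W.T @ do1)
    dz2 = 0.5 * (1 - z2**2) * (W.T @ do2)
    dy1 = 0.5 * (1 - y1**2) * (V.T @ dz1)
    dy2 = 0.5 * (1 - y2**2) * (V.T @ dz2)
    # Delta w_ij = delta_o_i^(p1) z_j^(p1) - delta_o_i^(p2) z_j^(p2), etc.
    W += eta * (np.outer(do1, z1) - np.outer(do2, z2))
    V += eta * (np.outer(dz1, y1) - np.outer(dz2, y2))
    U += eta * (np.outer(dy1, x1) - np.outer(dy2, x2))
    return U, V, W
```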
Ch. 5:
$v_{ki} \leftarrow v_{ki} + \eta\,\frac{1}{2}\left(x_k^{(p)} - x_k'\right)\left(1 - x_k'^2\right) y_i^{(p)}$
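A sketch of one reconstruction step for the 256-10-256 autoencoder. The decoder update implements the $v_{ki}$ equation above; the encoder update is standard backpropagation, our addition (the slides instead call MATLAB's trainAutoencoder):

```python
import numpy as np

def autoencoder_step(W, V, x, eta=0.1):
    """W: (10, 256) encoder, V: (256, 10) decoder, x: (256,) pattern."""
    y = np.tanh(0.5 * (W @ x))           # 10 hidden activations y^(p)
    x_rec = np.tanh(0.5 * (V @ y))       # reconstruction x'
    err = 0.5 * (x - x_rec) * (1 - x_rec**2)
    V += eta * np.outer(err, y)          # v_ki update from the slide
    W += eta * np.outer(0.5 * (1 - y**2) * (V.T @ err), x)  # standard BP (our addition)
    return W, V
```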
Ch. 5, Eq. 7:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\left(y_i^{(p_1)}(n) - y_i^{(p_2)}(n)\right)\left(x_j^{(p_1)}(n) - x_j^{(p_2)}(n)\right)$
Ch. 5, Eq. 8:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\,y_i(n)\,x_j(n)$
Ch. 5, Eq. 9:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\left(y_i(n) - \bar{y}\right)\left(x_j(n) - \bar{x}\right)$
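The three update rules side by side as numpy one-liners (a sketch; in Eq. 9 the mean values $\bar{y}$, $\bar{x}$ would be estimated from the data):

```python
import numpy as np

def eq7_pair(W, x1, x2, y1, y2, eta=0.1):
    """Eq. 7: correlate the pattern-pair differences."""
    return W + eta * np.outer(y1 - y2, x1 - x2)

def eq8_hebb(W, x, y, eta=0.1):
    """Eq. 8: plain Hebbian rule, Delta w_ij = eta y_i x_j."""
    return W + eta * np.outer(y, x)

def eq9_covariance(W, x, y, x_mean, y_mean, eta=0.1):
    """Eq. 9: covariance hypothesis, correlate deviations from the means."""
    return W + eta * np.outer(y - y_mean, x - x_mean)
```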