1
CSCE 2017, ICAI 2017, Las Vegas, July 17
2
Loading Discriminative Feature Representations in Hidden Layer
Daw-Ran Liou, Yang-En Chen, Cheng-Yuan Liou. Dept. of Computer Science and Information Engineering, National Taiwan University
3
Hinton, et al.: the "optimal spaced codes" is unclear and can hardly be accomplished by any learning algorithm for the restricted Boltzmann machine.
4
Deep learning: loading similar features (partial plus global features).
5
Proposed objective function for different classes:
$$E_{dif} = -\sum_{p_1=1}^{P} \sum_{p_2=1}^{P} \sum_{n=1}^{N} \left( y_n^{(p_1)} - y_n^{(p_2)} \right)^2$$
6
Objective function for the same class:
$$E_{att} = \sum_{p_1=1}^{P_c} \sum_{p_2=1}^{P_c} E\!\left( \mathbf{y}^{(p_1)}, \mathbf{y}^{(p_2)} \right) = \sum_{p_1=1}^{P_c} \sum_{p_2=1}^{P_c} E_{p_1 p_2}$$
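For readers who want to reproduce the two objectives, here is a minimal NumPy sketch of E_dif and E_att as read from these two slides; the array layout (one hidden representation per row) and the function names are my own.

```python
import numpy as np

def e_dif(Y):
    """E_dif over representations of different-class patterns (one row
    per pattern): minus the summed squared differences, so minimizing
    it pushes the representations apart."""
    total = 0.0
    for p1 in range(Y.shape[0]):
        for p2 in range(Y.shape[0]):
            total += np.sum((Y[p1] - Y[p2]) ** 2)
    return -total

def e_att(Yc):
    """E_att over representations of one class c: summed pairwise
    E(y(p1), y(p2)) = 0.5 * sum_n (y_n(p1) - y_n(p2))^2, so minimizing
    it pulls same-class representations together."""
    total = 0.0
    for p1 in range(Yc.shape[0]):
        for p2 in range(Yc.shape[0]):
            total += 0.5 * np.sum((Yc[p1] - Yc[p2]) ** 2)
    return total
```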
7
Similar architecture
8
Becker and Hinton, 1992
9
Becker and Hinton, 1992: mutual information between two modules
10
Proposed: single module
11
Three hidden layers: x → y → z → r, with weight matrices u, v, w
12
Different class patterns
$$E_{dif} = -\sum_{p_1=1}^{P} \sum_{p_2=1}^{P} \sum_{n=1}^{I} \left( r_n^{(p_1)} - r_n^{(p_2)} \right)^2$$
13
Training formulas:
$$\Delta w_{ji} = \delta_{r_j}^{(p_1)} z_i^{(p_1)} - \delta_{r_j}^{(p_2)} z_i^{(p_2)}, \qquad \Delta v_{ji} = \delta_{z_j}^{(p_1)} y_i^{(p_1)} - \delta_{z_j}^{(p_2)} y_i^{(p_2)}, \qquad \Delta u_{ji} = \delta_{y_j}^{(p_1)} x_i^{(p_1)} - \delta_{y_j}^{(p_2)} x_i^{(p_2)}$$
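A rough NumPy sketch of how these three update formulas could be applied to one pair of different-class patterns, assuming the x → y → z → r architecture with weights u, v, w and the activation tanh(0.5·net) used elsewhere in the slides; all function names here are mine.

```python
import numpy as np

def act(net):
    # y = tanh(0.5 * net), as on the earlier slides
    return np.tanh(0.5 * net)

def forward(x, U, V, W):
    y = act(U @ x)          # first hidden layer
    z = act(V @ y)          # second hidden layer
    r = act(W @ z)          # third hidden layer (output representation)
    return y, z, r

def pair_updates(x1, x2, U, V, W, eta=0.1):
    """One discriminative update for a pair of patterns from different
    classes, following my reading of the slide's Δw, Δv, Δu formulas."""
    y1, z1, r1 = forward(x1, U, V, W)
    y2, z2, r2 = forward(x2, U, V, W)

    # top-layer deltas: δ_r^(p) = (r^(p1) - r^(p2)) * 0.5 * (1 - (r^(p))^2)
    d_r1 = (r1 - r2) * 0.5 * (1 - r1 ** 2)
    d_r2 = (r1 - r2) * 0.5 * (1 - r2 ** 2)
    # back-propagated deltas for z and y
    d_z1 = 0.5 * (1 - z1 ** 2) * (W.T @ d_r1)
    d_z2 = 0.5 * (1 - z2 ** 2) * (W.T @ d_r2)
    d_y1 = 0.5 * (1 - y1 ** 2) * (V.T @ d_z1)
    d_y2 = 0.5 * (1 - y2 ** 2) * (V.T @ d_z2)

    # Δw_ji = δ_rj^(p1) z_i^(p1) - δ_rj^(p2) z_i^(p2), and similarly for v, u
    W += eta * (np.outer(d_r1, z1) - np.outer(d_r2, z2))
    V += eta * (np.outer(d_z1, y1) - np.outer(d_z2, y2))
    U += eta * (np.outer(d_y1, x1) - np.outer(d_y2, x2))
    return U, V, W
```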
14
52 patterns, 16×16 pixels (26 + 26)
15
Sorted minimum Hamming distances for the 52 representations
16
The minimum distances for all patterns are all less than 90 (the curve marked with -input-).
Single layer: -output 1- obtained by orthogonal initial weights; -output 2- obtained by small random initial weights.
3 hidden layers: -output 3- obtained by orthogonal initial weights; -output 4- obtained by random initial weights.
17
Sorted maximum Hamming distance between a representation and all others
18
Sorted averaged Hamming distance for each representation
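The three Hamming-distance plots (minimum, maximum, averaged) can be computed from the 52 representations with a short sketch like the one below; the binarization of the hidden outputs and the array layout are my assumptions.

```python
import numpy as np

def hamming_stats(R):
    """R: (52, D) array of binary representations (e.g. the sign of the
    hidden outputs). Returns the sorted minimum, maximum and averaged
    Hamming distance of each representation to all the others."""
    P = R.shape[0]
    # pairwise Hamming distance = number of differing components
    D = np.array([[np.sum(R[a] != R[b]) for b in range(P)] for a in range(P)], float)
    D_off = D.copy()
    np.fill_diagonal(D_off, np.inf)        # ignore the zero self-distance
    mins = np.sort(D_off.min(axis=1))
    maxs = np.sort(D.max(axis=1))
    avgs = np.sort(D.sum(axis=1) / (P - 1))
    return mins, maxs, avgs
```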
19
Restoration of noisy patterns
20
Restoration
21
Single layer: set weights as logic combinations of two patterns
22
Logic combinations of two patterns as discriminative weights Wij
23
Objective function for two different patterns:
$$E_{dif} = -\sum_{p_1}^{P} \sum_{p_2}^{P} \sum_{n=1}^{N} \left( y_n^{(p_1)} - y_n^{(p_2)} \right)^2$$
24
Two different patterns; white pixels represented by 1, black pixels by -1.
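A one-line helper for this pixel encoding; the assumed raw format (1 = white, 0 = black) is mine.

```python
import numpy as np

def to_bipolar(img01):
    """Map a 16x16 image with white pixels = 1 and black pixels = 0
    to the slide's bipolar encoding: white -> 1, black -> -1."""
    return np.where(np.asarray(img01) > 0, 1, -1)
```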
25
Define four logic operations.
Not: $\{-1,1\}^{16\times 16} \to \{-1,1\}^{16\times 16}$
$$\mathrm{Not}\left([A]\right) = [\mathrm{Not}\ A] = [R], \quad \text{where } R_{ij} = -A_{ij}, \ \forall\, i,j = 1,\dots,16$$
26
Define four logic operations.
Or: $\{-1,1\}^{16\times 16} \times \{-1,1\}^{16\times 16} \to \{-1,1\}^{16\times 16}$
$$\mathrm{Or}\left([A],[B]\right) = [A \ \mathrm{Or}\ B] = [R], \quad \text{where } R_{ij} = \max\left(A_{ij}, B_{ij}\right), \ \forall\, i,j = 1,\dots,16$$
27
Define four logic operations.
And: $\{-1,1\}^{16\times 16} \times \{-1,1\}^{16\times 16} \to \{-1,1\}^{16\times 16}$
$$\mathrm{And}\left([A],[B]\right) = [A \ \mathrm{And}\ B] = [R], \quad \text{where } R_{ij} = \min\left(A_{ij}, B_{ij}\right), \ \forall\, i,j = 1,\dots,16$$
28
Define four logic operations.
Xor: $\{-1,1\}^{16\times 16} \times \{-1,1\}^{16\times 16} \to \{-1,1\}^{16\times 16}$
$$\mathrm{Xor}\left([A],[B]\right) = [A \ \mathrm{Xor}\ B] = \left([A] \ \mathrm{And}\ \mathrm{Not}\,[B]\right) \mathrm{Or} \left([B] \ \mathrm{And}\ \mathrm{Not}\,[A]\right)$$
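The four operations translate directly to NumPy on {-1,1}-valued 16×16 arrays; this is a straight transcription of the definitions above.

```python
import numpy as np

def Not(A):
    # R_ij = -A_ij
    return -np.asarray(A)

def Or(A, B):
    # R_ij = max(A_ij, B_ij)
    return np.maximum(A, B)

def And(A, B):
    # R_ij = min(A_ij, B_ij)
    return np.minimum(A, B)

def Xor(A, B):
    # (A And Not B) Or (B And Not A)
    return Or(And(A, Not(B)), And(B, Not(A)))
```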
29
Total 16 logic combinations of {"A", "B"}
30
Total 16 logic combinations of {"A", "B"}
31
Total 16 logic combinations of {"A", "B"}
32
Total 16 logic combinations of {"A", "B"}
33
Total 16 logic combinations of {"A", "B"}
34
Set the 256 weights as one of the 16 combinations. Output value E of the output neuron.
35
Pre-activation A: $\mathbf{w}_n \cdot \mathbf{x}^A = \sum_i w_{ni} x_i^A$; pre-activation B: $\mathbf{w}_n \cdot \mathbf{x}^B = \sum_i w_{ni} x_i^B$; sigmoid outputs: $y_n^A$, $y_n^B$.

#  | Function of [A] and [B] | Pre-act. A | Pre-act. B | Pre-act. difference | Sigmoid A (y^A) | Sigmoid B (y^B) | Difference (y^A - y^B)
---|-------------------------|------------|------------|---------------------|-----------------|-----------------|-----------------------
1  | A And Not A             | 186        | 152        | 34                  | 0.621           | 0.533           | 0.088
2  | Not (A Or B)            | -166       | -200       |                     | -0.571          | -0.653          | 0.083
3  | B And Not A             | 96         | 242        | -146                | 0.358           | 0.738           | -0.379
4  | Not A                   | -256       | -110       |                     | -0.762          | -0.405          | -0.357
5  | A And Not B             |            |            | 146                 |                 |                 | 0.379
6  | Not B                   |            |            |                     |                 |                 | 0.357
7  | A Xor B                 |            |            | -34                 |                 |                 | -0.088
8  | Not (A And B)           |            |            |                     |                 |                 | -0.083
9  | A And B                 | 200        | 166        |                     | 0.653           | 0.571           |
10 | Not (A Xor B)           | -152       | -186       |                     | -0.533          | -0.621          |
11 | B                       | 110        | 256        |                     | 0.405           | 0.762           |
12 | B Or Not A              | -242       | -96        |                     | -0.738          | -0.358          |
13 | A                       |            |            |                     |                 |                 |
14 | A Or Not B              |            |            |                     |                 |                 |
15 | A Or B                  |            |            |                     |                 |                 |
16 | A Or Not A              |            |            |                     |                 |                 |
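A sketch that reproduces one row of the table: set the 256 weights of the output neuron to one of the logic combinations and compute the pre-activations for [A] and [B]. The squashing that matches the tabulated sigmoid values appears to be tanh(net/256); that scaling, like the helper names, is my inference from the numbers rather than something stated on the slide. The Not/And/Or helpers are the ones sketched after the operation definitions.

```python
import numpy as np

def responses(W_logic, A, B):
    """Pre-activations and outputs of one output neuron whose 256
    weights are set to the 16x16 logic combination W_logic."""
    w = np.asarray(W_logic).ravel()
    net_A = float(w @ np.asarray(A).ravel())
    net_B = float(w @ np.asarray(B).ravel())
    y_A = np.tanh(net_A / 256.0)   # matches e.g. 186 -> 0.621, 242 -> 0.738
    y_B = np.tanh(net_B / 256.0)
    return net_A, net_B, net_A - net_B, y_A, y_B, y_A - y_B

# e.g. responses(And(B, Not(A)), A, B) should reproduce row 3 of the table.
```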
36
$$\mathbf{W}' = \left\{ [B \ \mathrm{And}\ \mathrm{Not}\ A],\ [A \ \mathrm{And}\ \mathrm{Not}\ B],\ [B \ \mathrm{Or}\ \mathrm{Not}\ A],\ [A \ \mathrm{Or}\ \mathrm{Not}\ B] \right\}$$
$$\mathbf{w}_3 = [B \ \mathrm{And}\ \mathrm{Not}\ A], \quad \mathbf{w}_5 = [A \ \mathrm{And}\ \mathrm{Not}\ B], \quad \mathbf{w}_{12} = [B \ \mathrm{Or}\ \mathrm{Not}\ A], \quad \mathbf{w}_{14} = [A \ \mathrm{Or}\ \mathrm{Not}\ B]$$
37
Figures 3-8. Black-white images: black for -1 and white for 1. Red-black-green figures: intensity of green for values from 0 to 1, black for 0, intensity of red for values from 0 to -1.
38
Single layer, ten hidden neurons
39
Similarity of two images [U] and [V]
40
Initial weights with random numbers [-1,1]
Trained weight matrix in the upper row; its most similar function (of the 16) in the bottom row. Trained weights are similar to discriminative functions #3 and #12.
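The "most similar function" rows in the following figures can be produced by comparing each trained 16×16 weight matrix against the 16 logic combinations with the image similarity of slide 39; the measure used here is the normalized inner product, which is my reading of m([U],[V]).

```python
import numpy as np

def similarity(U, V):
    """Similarity of two 16x16 images: normalized inner product
    (my reading of the slide's m([U],[V]))."""
    u, v = np.asarray(U, float).ravel(), np.asarray(V, float).ravel()
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def most_similar_function(W_trained, logic_images):
    """logic_images: the 16 logic combinations of [A] and [B] in the
    order of the table. Returns (index 1..16, similarity)."""
    sims = [similarity(W_trained, F) for F in logic_images]
    best = int(np.argmax(sims))
    return best + 1, sims[best]
```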
41
Initial weights with random numbers [-1,1]
Row #
1. Initial weights
2. Most similar logic functions
3. (y^A - y^B)^2 / 4: 1 (green) or 0 (black)
44
Initial weights with random numbers [-1,1]
Row #
1. Initial weights
2. Most similar logic function among the 16
3. (y^A - y^B)^2 / 4: value is 1 (green) or 0 (black)
4. Similarity to [A]-[B]
47
Initial weights in W′. Weights unchanged during training (technical problem with the nearly hard-limited function).
48
Initial weights in W′.
Row #
1. Initial weights
2. Trained weights
3. Most similar logic function among the 16
4. (y^A - y^B)^2 / 4
5. Similarity to [A]-[B]
51
Initial weights in 0.001·W′. Bottom row: similar discriminative functions #3, #5, #12, #14. Trained weights = [A]-[B] or its negative version.
52
Initial weights in 0.001·W′.
Row #
1. Initial weights
2. Trained weights
3. Most similar logic functions and similarities
4. (y^A - y^B)^2 / 4
5. Similarity to [A]-[B]
55
Initial W: small random number
Bottom row: similar functions #3, #5, #14; trained weights = [A]-[B] or its negative version.
56
Initial W: small random numbers between [-0.01, 0.01]
Row #
1. Trained weights
2. Most similar logic functions and similarities
3. (y^A - y^B)^2 / 4
4. Similarity to [A]-[B]
59
Initial weights as [A]-[B] or negative version
Optimal discriminative weights for distinguishing {A, B}. Weights unchanged.
60
Initial weights as [A]-[B] or negative version
Row #
1. Initial weights
2. Most similar logic function among the 16
3. (y^A - y^B)^2 / 4: all values are 1 (green)
63
Autoencoder (BP). Trained with MATLAB trainAutoencoder ("Train an autoencoder - MATLAB trainAutoencoder").
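The slide uses MATLAB's trainAutoencoder; for completeness, here is a rough NumPy stand-in for a 256-10-256 autoencoder trained with plain backpropagation. It is a sketch of the general setup, not the authors' code; the tanh units, learning rate, and epoch count are assumptions.

```python
import numpy as np

def train_autoencoder(X, hidden=10, eta=0.01, epochs=2000, seed=0):
    """Plain-BP 256-hidden-256 autoencoder with tanh units."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.uniform(-0.01, 0.01, (hidden, n_in))
    W2 = rng.uniform(-0.01, 0.01, (n_in, hidden))
    for _ in range(epochs):
        for x in X:
            h = np.tanh(W1 @ x)            # hidden code
            x_hat = np.tanh(W2 @ h)        # reconstruction
            e = x_hat - x                  # reconstruction error
            d_out = e * (1 - x_hat ** 2)
            d_hid = (W2.T @ d_out) * (1 - h ** 2)
            W2 -= eta * np.outer(d_out, h)
            W1 -= eta * np.outer(d_hid, x)
    return W1, W2
```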
64
Autoencoder (BP). Bottom row: several similar features of the 10 hidden neurons.
65
Autoencoder 256-10-256 (BP), trained with the autoencoder.
Row #
1. Trained weights
2. Most similar logic function among the 16
3. (y^A - y^B)^2 / 4
4. Similarity to [A]-[B]
68
Autoencoder 256-10-256 (BP), trained with the autoencoder.
Row #
1. Trained weights
2. Most similar logic function among the 16
3. (y^A - y^B)^2 / 4
71
Exceeds that of logic operations.
[A]-[B] or [B]-[A]: optimal discriminative weights. Cannot be reached by logic combinations.
72
Optimal weights for [A] and [B] with three pixels
73
Optimal discriminative weights. See Fig. 1 in: Cheng-Yuan Liou (2006), "Backbone structure of hairy memory," The 16th International Conference on Artificial Neural Networks (ICANN 2006), September 10-14, LNCS 4131, Part I, pp.
74
Biological plausibility
75
Biological plausibility
Hebbian learning
76
Resembles Hebbian learning
Single layer
77
Similar hypothesis: covariance hypothesis (Sejnowski TJ, 1977)
78
Have a nice day. Code is on the website.
79
Eq. 2:
$$\frac{\partial E_{p_1 p_2}}{\partial w_{ni}} = -\sum_{n=1}^{N} \left( y_n^{(p_1)} - y_n^{(p_2)} \right) \left( \frac{\partial y_n^{(p_1)}}{\partial w_{ni}} - \frac{\partial y_n^{(p_2)}}{\partial w_{ni}} \right) = -\left( y_n^{(p_1)} - y_n^{(p_2)} \right) \times \left[ \left( 1 - \left( y_n^{(p_1)} \right)^2 \right) x_i^{(p_1)} - \left( 1 - \left( y_n^{(p_2)} \right)^2 \right) x_i^{(p_2)} \right]$$
80
Eq. 2 (continued):
$$y_n = f(net_n), \qquad net_n = \sum_{i=1}^{N} w_{ni} x_i, \qquad f(net_n) = \tanh\left(0.5\, net_n\right) = \frac{1 - \exp\left(-net_n\right)}{1 + \exp\left(-net_n\right)}$$
81
Eq. 2 (continued):
$$\frac{\partial\, net_m^{(p_1)}}{\partial w_{ni}} = \frac{\partial\, net_m^{(p_2)}}{\partial w_{ni}} = 0, \quad \text{for } m \neq n$$
Updating equations for the weights:
$$w_{ni} \leftarrow w_{ni} - \eta\, \frac{\partial E}{\partial w_{ni}}$$
82
$$w_{ni}^{t+1} \leftarrow w_{ni}^{t} + \frac{\eta}{2} \left( y_n^{(p_1)} - y_n^{(p_2)} \right) \left[ \left( 1 - \left( y_n^{(p_1)} \right)^2 \right) x_i^{(p_1)} - \left( 1 - \left( y_n^{(p_2)} \right)^2 \right) x_i^{(p_2)} \right]$$
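Eq. 2 and this update amount to a short single-layer training step per pattern pair; a hedged NumPy reading, assuming gradient descent on E = -Σ_n (y_n^A - y_n^B)^2 with y = tanh(0.5·W x):

```python
import numpy as np

def pair_step(W, xA, xB, eta=0.05):
    """One single-layer update for a pair of different patterns,
    following my reading of Eq. 2 and the update rule above."""
    yA = np.tanh(0.5 * (W @ xA))
    yB = np.tanh(0.5 * (W @ xB))
    diff = yA - yB
    # dE/dW_ni = -(y_n^A - y_n^B) [(1-(y_n^A)^2) x_i^A - (1-(y_n^B)^2) x_i^B]
    grad = -np.outer(diff * (1 - yA ** 2), xA) + np.outer(diff * (1 - yB ** 2), xB)
    return W - eta * grad          # w <- w - eta * dE/dw
```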
83
Eq. 4:
$$\frac{\partial E_{cp_1, cp_2}}{\partial w_{ni}} = \left[ \left( 1 - \left( y_n^{(c,p_1)} \right)^2 \right) x_i^{(c,p_1)} - \left( 1 - \left( y_n^{(c,p_2)} \right)^2 \right) x_i^{(c,p_2)} \right] \times \left( y_n^{(c,p_1)} - y_n^{(c,p_2)} \right)$$
84
Eq. 4 (continued):
$$E_{cp_1, cp_2} = E\!\left( \mathbf{y}^{(c,p_1)}, \mathbf{y}^{(c,p_2)} \right) = \frac{1}{2} \sum_{n=1}^{N} \left( y_n^{(c,p_1)} - y_n^{(c,p_2)} \right)^2$$
85
Eq. 5:
$$w_{ni} \leftarrow w_{ni} - \eta\, \frac{\partial E_{cp_1, cp_2}}{\partial w_{ni}}$$
86
$$y_n^{A} = \tanh\!\left( \sum_{i=1}^{256} w_{ni} x_i^{A} \right), \quad \forall\, n = 1, \dots, N$$
87
$$E = E_{AB}^{dif} = \frac{1}{2} \sum_{n=1}^{10} \left( y_n^{A} - y_n^{B} \right)^2$$
88
$$m\!\left([U],[V]\right) = \frac{\mathbf{x}_U \cdot \mathbf{x}_V}{\left\| \mathbf{x}_U \right\| \left\| \mathbf{x}_V \right\|}$$
89
$$\delta_{r_n} = \frac{\partial E}{\partial r_n} \, \frac{\partial r_n}{\partial net_{r_n}}$$
is obtained much as in Eq. 2.
90
$$\delta_{r_n}^{(p_1)} = \left( r_n^{(p_1)} - r_n^{(p_2)} \right) \frac{1}{2} \left( 1 - \left( r_n^{(p_1)} \right)^2 \right), \qquad \delta_{r_n}^{(p_2)} = \left( r_n^{(p_1)} - r_n^{(p_2)} \right) \frac{1}{2} \left( 1 - \left( r_n^{(p_2)} \right)^2 \right)$$
91
$$\delta_{z_i}^{(p_1)} = \frac{1}{2} \left( 1 - \left( z_i^{(p_1)} \right)^2 \right) \sum_{j} \delta_{r_j}^{(p_1)} w_{ji}, \qquad \delta_{z_i}^{(p_2)} = \frac{1}{2} \left( 1 - \left( z_i^{(p_2)} \right)^2 \right) \sum_{j} \delta_{r_j}^{(p_2)} w_{ji}$$
$$\delta_{y_i}^{(p_1)} = \frac{1}{2} \left( 1 - \left( y_i^{(p_1)} \right)^2 \right) \sum_{j} \delta_{z_j}^{(p_1)} v_{ji}, \qquad \delta_{y_i}^{(p_2)} = \frac{1}{2} \left( 1 - \left( y_i^{(p_2)} \right)^2 \right) \sum_{j} \delta_{z_j}^{(p_2)} v_{ji}$$
92
Ch. 5:
$$v_{ij} \leftarrow v_{ij} + \eta \left( x_i^{p} - x_i' \right) \left( 1 - \left( x_i' \right)^2 \right) y_j^{p}$$
93
Ch. 5, Eq. 7:
$$w_{ni}^{t+1} \leftarrow w_{ni}^{t} + \eta \left( y_n^{(p_1), t} - y_n^{(p_2), t} \right) \left( x_i^{(p_1), t} - x_i^{(p_2), t} \right)$$
94
Ch. 5, Eq. 8:
$$w_{ni}^{t+1} \leftarrow w_{ni}^{t} + \eta\, y_n^{p} x_i^{p}$$
95
Ch. 5, Eq. 9:
$$w_{ni}^{t+1} \leftarrow w_{ni}^{t} + \eta \left( y_n^{p} - \bar{y}_n \right) \left( x_i^{p} - \bar{x}_i \right)$$
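Seen side by side, the pairwise rule of Eq. 7 has the same outer-product form as the Hebbian rule (Eq. 8) and the covariance rule (Eq. 9, Sejnowski 1977); a minimal sketch of the three one-step updates, under my reconstruction of the equations above:

```python
import numpy as np

def eq7_pairwise(W, yA, yB, xA, xB, eta=0.01):
    # proposed rule: driven by the differences within a pattern pair
    return W + eta * np.outer(yA - yB, xA - xB)

def eq8_hebbian(W, y, x, eta=0.01):
    # plain Hebbian rule
    return W + eta * np.outer(y, x)

def eq9_covariance(W, y, x, y_mean, x_mean, eta=0.01):
    # covariance rule (Sejnowski, 1977): deviations from mean activities
    return W + eta * np.outer(y - y_mean, x - x_mean)
```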