
1 CSCE 2017, ICAI 2017, Las Vegas, July 17

2 Loading Discriminative Feature Representations in Hidden Layer
Daw-Ran Liou, Yang-En Chen, Cheng-Yuan Liou, Dept. of Computer Science and Information Engineering, National Taiwan University

3 Hinton et al.: the "optimal spaced codes" are unclear and can hardly be accomplished by any learning algorithm for the restricted Boltzmann machine.

4 Deep learning: loading similar features (partial plus global features)

5 Proposed objective function: different classes
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$

6 Objective function: same class
$E_{att} = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} d\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right) = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} E_{p_{k_1} p_{k_2}}$

7 Similar architecture

8 Becker and Hinton, 1992

9 Becker and Hinton, 1992: maximize mutual information between two modules

10 Proposed: single module

11 Three hidden layers: layers $x$, $y$, $z$, $o$ with unit indices $l$, $k$, $j$, $i$ and weight matrices $u$, $v$, $w$

12 Different class patterns
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{i=1}^{I}\left(o_i^{(p_1)} - o_i^{(p_2)}\right)^2$

13 Training formulas
$\Delta w_{ij} = \delta_{o_i}^{(p_1)} z_j^{(p_1)} - \delta_{o_i}^{(p_2)} z_j^{(p_2)}$
$\Delta v_{jk} = \delta_{z_j}^{(p_1)} y_k^{(p_1)} - \delta_{z_j}^{(p_2)} y_k^{(p_2)}$
$\Delta u_{kl} = \delta_{y_k}^{(p_1)} x_l^{(p_1)} - \delta_{y_k}^{(p_2)} x_l^{(p_2)}$
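The training formulas above are outer products between a layer's delta vector and the previous layer's activation, evaluated on the two patterns of a pair. A minimal sketch (function name and argument order are my own):

```python
import numpy as np

def weight_updates(do1, z1, do2, z2, dz1, y1, dz2, y2, dy1, x1, dy2, x2):
    # Each Delta is delta(p1) outer activation(p1) - delta(p2) outer activation(p2)
    dW = np.outer(do1, z1) - np.outer(do2, z2)  # updates w_ij (z -> o)
    dV = np.outer(dz1, y1) - np.outer(dz2, y2)  # updates v_jk (y -> z)
    dU = np.outer(dy1, x1) - np.outer(dy2, x2)  # updates u_kl (x -> y)
    return dW, dV, dU
```

For example, with `do1 = [1, 0]`, `z1 = [2, 3]`, `do2 = [0, 1]`, `z2 = [1, 1]`, the first update is `dW = [[2, 3], [-1, -1]]`: the first output neuron's weights move toward pattern 1's hidden code, the second away from pattern 2's.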

14 52 patterns (26+26 letters), 16×16 pixels each

15 Sorted minimum Hamming distances for the 52 representations

16 The minimum distances for all patterns are less than 90 (the curve marked -input-). Single layer: -output 1- obtained with orthogonal initial weights, -output 2- obtained with small random initial weights. Three hidden layers: -output 3- obtained with orthogonal initial weights, -output 4- obtained with random initial weights.

17 Sorted maximum Hamming distance between a representation and all others

18 Sorted averaged Hamming distance for each representation

19 Restoration of noisy patterns

20 Restoration

21 Single layer: set weights as logic combinations of two patterns

22 Logic combinations of two patterns as discriminative weights Wij

23 Objective function: two different patterns
$E_{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$

24 Two different patterns: white pixel represented by 1, black pixel represented by -1

25 Define four logic operations
Not: $\{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Not}([A]) = [R]$, where $R_{ij} = -A_{ij}$, $\forall i,j = 1,\dots,16$

26 Define four logic operations
Or: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Or}([A],[B]) = [A]\ \mathrm{Or}\ [B] = [R]$, where $R_{ij} = \max(A_{ij}, B_{ij})$, $\forall i,j = 1,\dots,16$

27 Define four logic operations
And: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{And}([A],[B]) = [A]\ \mathrm{And}\ [B] = [R]$, where $R_{ij} = \min(A_{ij}, B_{ij})$, $\forall i,j = 1,\dots,16$

28 Define four logic operations
Xor: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$, $\mathrm{Xor}([A],[B]) = [A]\ \mathrm{Xor}\ [B] = [R] = \left([A]\ \mathrm{And}\ \mathrm{Not}[B]\right)\ \mathrm{Or}\ \left([B]\ \mathrm{And}\ \mathrm{Not}[A]\right)$
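On ±1 images the four operations reduce to a sign flip, an elementwise max, and an elementwise min; a NumPy sketch:

```python
import numpy as np

def NOT(A):    return -A                   # R_ij = -A_ij
def OR(A, B):  return np.maximum(A, B)     # R_ij = max(A_ij, B_ij)
def AND(A, B): return np.minimum(A, B)     # R_ij = min(A_ij, B_ij)
def XOR(A, B): return OR(AND(A, NOT(B)), AND(B, NOT(A)))

A = np.array([ 1, -1,  1, -1])
B = np.array([ 1,  1, -1, -1])
print(XOR(A, B))  # [-1  1  1 -1]: white (1) exactly where A and B differ
```

With white = 1 treated as logical true, max and min are exactly Or and And, and the Xor composition marks the pixels where the two patterns disagree.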

29-33 Total 16 logic combinations of {"A","B"} (figures)

34 Set the 256 weights as one of the 16 combinations; output value E of the output neuron

35 Pre-activations and sigmoid outputs for the 16 logic functions of A and B

 #  Function of A and B   w_i.x^A  w_i.x^B   Diff    y_i^A   y_i^B   y_i^A-y_i^B
 1  A And Not A             186      152       34    0.621   0.533    0.088
 2  Not A Or B             -166     -200            -0.571  -0.653    0.083
 3  B And Not A              96      242     -146    0.358   0.738   -0.379
 4  Not A                  -256     -110            -0.762  -0.405   -0.357
 5  A And Not B                               146                     0.379
 6  Not B                                                             0.357
 7  A Xor B                                   -34                    -0.088
 8  Not A And B                                                      -0.083
 9  A And B                 200      166             0.653   0.571
10  Not A Xor B            -152     -186            -0.533  -0.621
11  B                       110      256             0.405   0.762
12  B Or Not A             -242      -96            -0.738  -0.358
13  A
14  A Or Not B
15  A Or B
16  A Or Not A
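A property visible in the table (compare rows 3 and 5) is that swapping A and B inside the weight function swaps the two pre-activations, so the output differences are exact negatives of each other. A check with random ±1 stand-ins for the letter images (the actual "A"/"B" bitmaps are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
xA = rng.choice([-1, 1], size=256).astype(float)  # stand-in for image "A"
xB = rng.choice([-1, 1], size=256).astype(float)  # stand-in for image "B"

w3 = np.minimum(xB, -xA)   # function #3: B And Not A, as the 256 weights
w5 = np.minimum(xA, -xB)   # function #5: A And Not B

a3, b3 = w3 @ xA, w3 @ xB  # pre-activations of the neuron with weights w3
a5, b5 = w5 @ xA, w5 @ xB

print(a5 == b3, b5 == a3)  # True True: the A/B swap swaps the pre-activations
```

This matches the table entries 96/242/-146 for row 3 against 146 (and sigmoid difference 0.379) for row 5.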

36 $\mathbf{W}' = \{B\ \mathrm{And\ Not}\ A,\ A\ \mathrm{And\ Not}\ B,\ B\ \mathrm{Or\ Not}\ A,\ A\ \mathrm{Or\ Not}\ B\}$: $\mathbf{w}_3 = B$ And Not $A$, $\mathbf{w}_5 = A$ And Not $B$, $\mathbf{w}_{12} = B$ Or Not $A$, $\mathbf{w}_{14} = A$ Or Not $B$

37 Figures 3-8: black-white images, black for -1 and white for 1; red-black-green figures, intensity of green for values from 0 to 1, black for 0, intensity of red for values from 0 to -1

38 Single layer, ten hidden neurons

39 Similarity of two images [U] and [V]

40 Initial weights: random numbers in [-1,1]
Trained weight matrix in the upper row; the most similar of the 16 logic functions in the bottom row. Trained weights are similar to discriminative functions #3 and #12.

41 Initial weights: random numbers in [-1,1]
Row # 1. initial weights; 2. most similar logic functions; 3. (y^A - y^B)^2/4: 1 (green) or 0 (black)

42

43

44 Initial weights: random numbers in [-1,1]
Row # 1. initial weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4: value 1 (green) or 0 (black); 4. similarity to A-B

45

46

47 Initial weights in W'. Weights unchanged during training (a technical problem with the nearly hard-limited activation function).

48 Initial weights in W'
Row # 1. initial weights; 2. trained weights; 3. most similar logic function among the 16; 4. (y^A - y^B)^2/4; 5. similarity to A-B

49

50

51 Initial weights in 0.001W'. Bottom row: similar discriminative functions #3, 5, 12, 14. Trained weights = [A]-[B] or its negative version.

52 Initial weights in 0.001W'
Row # 1. initial weights; 2. trained weights; 3. most similar logic functions + similarities; 4. (y^A - y^B)^2/4; 5. similarity to A-B

53

54

55 Initial W: small random numbers
Bottom row: similar functions #3, 5, 14; trained weights = [A]-[B] or its negative version.

56 Initial W: small random numbers in [-0.01, 0.01]
Row # 1. trained weights; 2. most similar logic functions + similarities; 3. (y^A - y^B)^2/4; 4. similarity to A-B

57

58

59 Initial weights as [A]-[B] or its negative version: the optimal discriminative weights for distinguishing {A, B}. Weights unchanged during training.

60 Initial weights as [A]-[B] or its negative version
Row # 1. initial weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4: all values 1 (green)

61

62

63 Autoencoder (BP): train an autoencoder with MATLAB trainAutoencoder.

64 Autoencoder (BP). Bottom row: several similar features of the 10 hidden neurons.

65 Autoencoder 256-10-256 (BP), trained with Autoencoder
Row # 1. trained weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4; 4. similarity to A-B

66

67

68 Autoencoder 256-10-256 (BP), trained with Autoencoder
Row # 1. trained weights; 2. most similar logic function among the 16; 3. (y^A - y^B)^2/4

69

70

71 [A]-[B] or [B]-[A]: the optimal discriminative weights. Their discrimination exceeds that of the logic operations and cannot be reached by logic combinations.

72 Optimal weights for [A] and [B] with three pixels

73 Optimal discriminative weights: see Fig. 1 in Cheng-Yuan Liou (2006), Backbone structure of hairy memory, ICANN 2006, The 16th International Conference on Artificial Neural Networks, September 10-14, LNCS 4131, Part I, pp.

74 Biological plausibility

75 Biological plausibility
Hebbian learning

76 Resembles Hebbian learning (single layer)

77 Similar hypothesis: the covariance hypothesis (Sejnowski TJ, 1977)

78 Have a nice day. Code is on the website.

79 Eq. 2
$\frac{\partial E_{p_1 p_2}}{\partial w_{ij}} = -\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)\left(\frac{\partial y_m^{(p_1)}}{\partial w_{ij}} - \frac{\partial y_m^{(p_2)}}{\partial w_{ij}}\right) = -\left(y_i^{(p_1)} - y_i^{(p_2)}\right) \times \frac{1}{2}\left[\left(1 - (y_i^{(p_1)})^2\right) x_j^{(p_1)} - \left(1 - (y_i^{(p_2)})^2\right) x_j^{(p_2)}\right]$

80 Eq. 2 (continued)
$y_i = f(net_i)$, $net_i = \sum_{j=1}^{N} w_{ij} x_j$, $f(net_i) = \tanh(0.5\, net_i) = \frac{1 - \exp(-net_i)}{1 + \exp(-net_i)}$
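The two forms of the activation on this slide are the same function; a quick numerical check:

```python
import numpy as np

net = np.linspace(-5.0, 5.0, 101)
f_tanh = np.tanh(0.5 * net)                       # tanh(0.5 * net)
f_frac = (1 - np.exp(-net)) / (1 + np.exp(-net))  # logistic-style form
print(np.allclose(f_tanh, f_frac))  # True
```

Both are the logistic sigmoid rescaled to the range (-1, 1), whose derivative is $f'(net) = \frac{1}{2}(1 - f(net)^2)$, the factor that appears in Eq. 2.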

81 Eq. 2 (continued)
$\frac{\partial net_m^{(p_1)}}{\partial w_{ij}} = \frac{\partial net_m^{(p_2)}}{\partial w_{ij}} = 0$ for $m \neq i$; updating equation for the weights: $w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E}{\partial w_{ij}}$

82 $w_{i,N+1} \leftarrow w_{i,N+1} + \frac{\eta}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[\left(y_i^{(p_1)}\right)^2 - \left(y_i^{(p_2)}\right)^2\right]$
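Eq. 2 can be verified against a finite-difference gradient of the per-pair objective $E_{p_1 p_2} = -\frac{1}{2}\sum_i \left(y_i^{(p_1)} - y_i^{(p_2)}\right)^2$ for a single layer (a sketch; function names are mine):

```python
import numpy as np

def forward(W, x):
    return np.tanh(0.5 * (W @ x))          # the slide-80 activation

def e_pair(W, x1, x2):
    y1, y2 = forward(W, x1), forward(W, x2)
    return -0.5 * ((y1 - y2) ** 2).sum()   # per-pair E_rep

def grad_eq2(W, x1, x2):
    # Eq. 2: dE/dw_ij = -(y1_i - y2_i) * [f'(net1_i) x1_j - f'(net2_i) x2_j]
    y1, y2 = forward(W, x1), forward(W, x2)
    g1 = 0.5 * (1 - y1 ** 2)               # f' for f = tanh(0.5 net)
    g2 = 0.5 * (1 - y2 ** 2)
    return -np.outer((y1 - y2) * g1, x1) + np.outer((y1 - y2) * g2, x2)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x1 = rng.choice([-1.0, 1.0], 4)
x2 = rng.choice([-1.0, 1.0], 4)

# Central finite differences, entry by entry
eps, num = 1e-6, np.zeros_like(W)
for i in range(3):
    for j in range(4):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (e_pair(Wp, x1, x2) - e_pair(Wm, x1, x2)) / (2 * eps)

print(np.allclose(grad_eq2(W, x1, x2), num, atol=1e-6))  # True
```

The update $w_{ij} \leftarrow w_{ij} - \eta\, \partial E / \partial w_{ij}$ then moves the two representations apart at each step.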

83 Eq. 4
$\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}} = \frac{1}{2}\left[\left(1 - (y_i^{(p_{k_1})})^2\right) x_j^{(p_{k_1})} - \left(1 - (y_i^{(p_{k_2})})^2\right) x_j^{(p_{k_2})}\right] \times \left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)$

84 Eq. 4 (continued)
$E_{p_{k_1} p_{k_2}} = d\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right) = \frac{1}{2} \sum_{i=1}^{M} \left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)^2$

85 Eq. 5
$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}}$

86 $y_i^A = \tanh\left(\sum_{j=1}^{N=256} w_{ij} x_j^A\right), \forall i = 1, \dots, M$

87 $E = E_{AB}^{rep} = \frac{1}{2} \sum_{i=1}^{M=10} \left(y_i^A - y_i^B\right)^2$

88 𝑠 π‘ˆ , 𝑉 = 𝐱 π‘ˆ 𝐱 π‘ˆ βˆ™ 𝐱 𝑉 𝐱 𝑉

89 𝜹 π‘œ 𝑖 = πœ•πΈ πœ• π‘œ 𝑖 πœ• π‘œ 𝑖 πœ• 𝑛𝑒𝑑 𝑖 𝒐𝑖 is obtained much as in Eq.2

90 𝜹 π‘œ 𝑖 𝑝 1 = π‘œ 𝑖 𝑝 1 βˆ’ π‘œ 𝑖 𝑝 2 1 2 1βˆ’ π‘œ 𝑖 𝑝 1 2 𝜹 π‘œ 𝑖 𝑝 2 = π‘œ 𝑖 𝑝 1 βˆ’ π‘œ 𝑖 𝑝 2 1 2 1βˆ’ π‘œ 𝑖 𝑝 2 2

91 $\delta_{z_j}^{(p_1)} = \frac{1}{2}\left(1 - (z_j^{(p_1)})^2\right) \sum_r \delta_{o_r}^{(p_1)} w_{rj}$, $\delta_{z_j}^{(p_2)} = \frac{1}{2}\left(1 - (z_j^{(p_2)})^2\right) \sum_r \delta_{o_r}^{(p_2)} w_{rj}$, $\delta_{y_k}^{(p_1)} = \frac{1}{2}\left(1 - (y_k^{(p_1)})^2\right) \sum_r \delta_{z_r}^{(p_1)} v_{rk}$, $\delta_{y_k}^{(p_2)} = \frac{1}{2}\left(1 - (y_k^{(p_2)})^2\right) \sum_r \delta_{z_r}^{(p_2)} v_{rk}$
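The recursions above are ordinary backpropagation through the tanh(0.5 net) layers; for one pattern they can be sketched as (function name and shapes are my own):

```python
import numpy as np

def deltas_back(delta_o, z, y, W, V):
    # delta_o: output deltas; z, y: hidden activations; W: z->o, V: y->z.
    # f = tanh(0.5 * net), so f'(net) = 0.5 * (1 - f**2).
    delta_z = 0.5 * (1 - z ** 2) * (W.T @ delta_o)  # sum_r delta_o_r * w_rj
    delta_y = 0.5 * (1 - y ** 2) * (V.T @ delta_z)  # sum_r delta_z_r * v_rk
    return delta_z, delta_y

I2 = np.eye(2)
dz, dy = deltas_back(np.array([1.0, 2.0]), np.zeros(2), np.zeros(2), I2, I2)
# with identity weights and zero activations: dz = [0.5, 1.0], dy = [0.25, 0.5]
```

Each layer picks up one factor of $\frac{1}{2}(1 - f^2)$ and one weighted sum of the deltas above it, which is what the four slide formulas spell out for the two patterns $p_1$ and $p_2$.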

92 Ch. 5
$v_{ki} \leftarrow v_{ki} + \eta \left(x_k^{(p)} - x_k'\right)\left(1 - (x_k')^2\right) y_i^{(p)}$

93 Ch. 5, Eq. 7
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta \left(y_i^{(p_1)}(n) - y_i^{(p_2)}(n)\right)\left(x_j^{(p_1)}(n) - x_j^{(p_2)}(n)\right)$

94 Ch. 5, Eq. 8
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\, y_i(n)\, x_j(n)$

95 Ch. 5, Eq. 9
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta \left(y_i(n) - \bar{y}\right)\left(x_j(n) - \bar{x}\right)$
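The contrast between the Hebbian rule (Eq. 8) and the covariance rule (Eq. 9) in code (a sketch, with the mean activities passed in explicitly):

```python
import numpy as np

def hebb_step(W, x, y, eta):
    # Eq. 8: correlate raw activities
    return W + eta * np.outer(y, x)

def covariance_step(W, x, y, x_bar, y_bar, eta):
    # Eq. 9: correlate deviations from the mean activity
    return W + eta * np.outer(y - y_bar, x - x_bar)

W_h = np.zeros((1, 1))
W_c = np.zeros((1, 1))
x = y = np.array([1.0])           # a unit that is always equally active
for _ in range(3):
    W_h = hebb_step(W_h, x, y, 0.1)
    W_c = covariance_step(W_c, x, y, x, y, 0.1)  # means equal the activity

print(W_h[0, 0], W_c[0, 0])  # ~0.3 and 0.0
```

For constantly active units the Hebbian weight grows without bound, while the covariance rule changes weights only when pre- and post-synaptic activities co-vary around their means, which is the sense in which the paired-pattern update (Eq. 7) resembles it.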

