CSCE 2017 / ICAI 2017, Las Vegas, July 17
Loading Discriminative Feature Representations in Hidden Layer
Daw-Ran Liou, Yang-En Chen, Cheng-Yuan Liou
Dept. of Computer Science and Information Engineering, National Taiwan University
Hinton et al., 2006: the notion of "optimally spaced codes" is unclear and can hardly be achieved by any learning algorithm for the restricted Boltzmann machine.
Deep learning: loading similar features (partial plus global features).
Proposed objective function, different classes:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$
Objective function, same class:
$E^{att} = \frac{1}{2}\sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} d\!\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right)^2 = \sum_{p_{k_1}=1}^{P_k}\sum_{p_{k_2}=1}^{P_k} E_{p_{k_1} p_{k_2}}$
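A minimal numpy sketch of the two objectives, assuming the representations are stacked row-wise in an array (function and variable names are ours):

```python
import numpy as np

def e_rep(Y):
    """Repelling objective E^rep over all pattern pairs.
    Y: (P, M) array; row p holds the representation y^(p)."""
    diff = Y[:, None, :] - Y[None, :, :]   # (P, P, M) pairwise differences
    return -0.5 * np.sum(diff ** 2)

def e_att(Yk):
    """Attracting objective E^att over pairs within one class k.
    Yk: (P_k, M) array of same-class representations."""
    diff = Yk[:, None, :] - Yk[None, :, :]
    return 0.5 * np.sum(diff ** 2)
```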
Similar architecture
Becker and Hinton, 1992
Becker and Hinton, 1992: maximize the mutual information between two modules.
Proposed: single module
Three hidden layers: $x_l \xrightarrow{u} y_k \xrightarrow{v} z_j \xrightarrow{w} o_i$
Different-class patterns:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{i=1}^{I}\left(o_i^{(p_1)} - o_i^{(p_2)}\right)^2$
Training formulas:
$\Delta w_{ij} = \delta_{o_i}^{(p_1)} z_j^{(p_1)} - \delta_{o_i}^{(p_2)} z_j^{(p_2)}$
$\Delta v_{jk} = \delta_{z_j}^{(p_1)} y_k^{(p_1)} - \delta_{z_j}^{(p_2)} y_k^{(p_2)}$
$\Delta u_{kl} = \delta_{y_k}^{(p_1)} x_l^{(p_1)} - \delta_{y_k}^{(p_2)} x_l^{(p_2)}$
52 patterns (26 + 26), each 16×16 pixels.
Sorted minimum Hamming distances for the 52 representations
Single layer: -output 1- obtained with orthogonal initial weights, -output 2- with small random initial weights. Three hidden layers: -output 3- obtained with orthogonal initial weights, -output 4- with random initial weights. The minimum distances for all input patterns are all less than 90 (the curve marked -input-).
Sorted maximum Hamming distance between a representation and all others
Sorted averaged Hamming distance for each representation
Restoration of noisy patterns
Single layer: set the weights as logic combinations of two patterns.
Logic combinations of two patterns as discriminative weights $w_{ij}$.
Objective function for two different patterns:
$E^{rep} = -\frac{1}{2}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P}\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)^2$
Two different patterns; a white pixel is represented by 1, a black pixel by −1.
Define four logic operations.
Not: $\{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $\mathrm{Not}(A) = R$, where $R_{ij} = -A_{ij}\ \forall i,j = 1,\dots,16$.
Or: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{Or}\ B = R$, where $R_{ij} = \max\!\left(A_{ij}, B_{ij}\right)\ \forall i,j = 1,\dots,16$.
And: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{And}\ B = R$, where $R_{ij} = \min\!\left(A_{ij}, B_{ij}\right)\ \forall i,j = 1,\dots,16$.
Xor: $\{-1,1\}^{16\times16} \times \{-1,1\}^{16\times16} \to \{-1,1\}^{16\times16}$; $A\ \mathrm{Xor}\ B = \left(A\ \mathrm{And}\ \mathrm{Not}(B)\right)\ \mathrm{Or}\ \left(B\ \mathrm{And}\ \mathrm{Not}(A)\right)$.
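The four operations transcribe directly into numpy; a short sketch over {−1, 1} images:

```python
import numpy as np

# Patterns are 16x16 arrays over {-1, +1}: white pixel = 1, black pixel = -1.
def Not(A):
    return -A                    # R_ij = -A_ij

def Or(A, B):
    return np.maximum(A, B)      # R_ij = max(A_ij, B_ij)

def And(A, B):
    return np.minimum(A, B)      # R_ij = min(A_ij, B_ij)

def Xor(A, B):                   # (A And Not B) Or (B And Not A)
    return Or(And(A, Not(B)), And(B, Not(A)))
```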
Total: 16 logic combinations of {"A", "B"}.
Set the 256 weights of the output neuron to one of the 16 combinations; the table below lists the pre-activation and output values of the neuron.
Pre-activation $\mathbf{w}_i \cdot \mathbf{x}^A = \sum_{j=1}^{N} w_{ij} x_j^A$ (and likewise for $B$) and sigmoid outputs for each of the 16 logic functions of $A$ and $B$:

 #   Function of A and B   w·x^A   w·x^B   Diff     y^A      y^B     y^A − y^B
 1   A And Not(A)            186     152     34    0.621    0.533     0.088
 2   Not(A Or B)            -166    -200     34   -0.571   -0.653     0.083
 3   B And Not(A)             96     242   -146    0.358    0.738    -0.379
 4   Not(A)                 -256    -110   -146   -0.762   -0.405    -0.357
 5   A And Not(B)            242      96    146    0.738    0.358     0.379
 6   Not(B)                 -110    -256    146   -0.405   -0.762     0.357
 7   A Xor B                 152     186    -34    0.533    0.621    -0.088
 8   Not(A And B)           -200    -166    -34   -0.653   -0.571    -0.083
 9   A And B                 200     166     34    0.653    0.571     0.083
10   Not(A Xor B)           -152    -186     34   -0.533   -0.621     0.088
11   B                       110     256   -146    0.405    0.762    -0.357
12   B Or Not(A)            -242     -96   -146   -0.738   -0.358    -0.379
13   A                       256     110    146    0.762    0.405     0.357
14   A Or Not(B)             -96    -242    146   -0.358   -0.738     0.379
15   A Or B                  166     200    -34    0.571    0.653    -0.083
16   A Or Not(A)            -186    -152    -34   -0.621   -0.533    -0.088
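A sketch reproducing one table row. The listed outputs are consistent with $y = \tanh(net/N)$, $N = 256$ (e.g. $\tanh(200/256) \approx 0.653$); this scaling is our inference from the table, and the random patterns below are stand-ins for the slide's letter images:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=(16, 16))   # stand-ins for the letter images
B = rng.choice([-1, 1], size=(16, 16))

def neuron_output(W, X, N=256):
    net = float(np.sum(W * X))           # pre-activation  w_i . x
    return net, np.tanh(net / N)         # assumed output scaling

W9 = np.minimum(A, B)                    # row 9: weights = A And B
net_A, y_A = neuron_output(W9, A)        # slide's letters give 200, 0.653
net_B, y_B = neuron_output(W9, B)        # slide's letters give 166, 0.571
print(net_A, y_A, net_B, y_B, y_A - y_B)
```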
$\mathbf{w}' = \{\, \mathbf{w}_3 = B\ \mathrm{And}\ \mathrm{Not}(A),\ \mathbf{w}_5 = A\ \mathrm{And}\ \mathrm{Not}(B),\ \mathbf{w}_{12} = B\ \mathrm{Or}\ \mathrm{Not}(A),\ \mathbf{w}_{14} = A\ \mathrm{Or}\ \mathrm{Not}(B) \,\}$
Figures 3–8: in black-white images, black is −1 and white is 1; in red-black-green figures, the intensity of green shows values from 0 to 1, black is 0, and the intensity of red shows values from 0 to −1.
Single layer, ten hidden neurons.
Similarity of two images [U] and [V]
Initial weights: random numbers in [−1, 1]. Upper row: trained weight matrices; bottom row: the most similar of the 16 logic functions. The trained weights are similar to discriminative functions #3 and #12.
Initial weights: random numbers in [−1, 1]. Rows: 1. initial weights; 2. most similar logic functions; 3. $(y^A - y^B)^2/4$: 1 (green) or 0 (black).
Initial weights: random numbers in [−1, 1]. Rows: 1. initial weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$: value 1 (green) or 0 (black); 4. similarity to A−B.
Initial weights set to W′. The weights stay unchanged during training (a technical problem with the nearly hard-limited activation function).
Initial weights set to W′. Rows: 1. initial weights; 2. trained weights; 3. the most similar of the logic functions; 4. $(y^A - y^B)^2/4$; 5. similarity to A−B.
Initial weights set to 0.001·W′. Bottom row: similar discriminative functions #3, 5, 12, 14. The trained weights equal [A]−[B] or its negative version.
Initial weights set to 0.001·W′. Rows: 1. initial weights; 2. trained weights; 3. most similar logic functions and their similarities; 4. $(y^A - y^B)^2/4$; 5. similarity to A−B.
Initial W: small random numbers. Bottom row: similar functions #3, 5, 14; the trained weights equal [A]−[B] or its negative version.
Initial W: small random numbers in [−0.01, 0.01]. Rows: 1. trained weights; 2. most similar logic functions and their similarities; 3. $(y^A - y^B)^2/4$; 4. similarity to A−B.
Initial weights set to [A]−[B] or its negative version: the optimal discriminative weights for distinguishing {A, B}. The weights stay unchanged during training.
Initial weights set to [A]−[B] or its negative version. Rows: 1. initial weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$: all equal to 1 (green).
Autoencoder 256-10-256 (BP), trained with MATLAB's trainAutoencoder.
Autoencoder 256-10-256 (BP). Bottom row: several similar features among the ten hidden neurons.
Autoencoder 256-10-256 (BP), trained as an autoencoder. Rows: 1. trained weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$; 4. similarity to A−B.
Autoencoder 256-10-256 (BP), trained as an autoencoder. Rows: 1. trained weights; 2. the most similar of the logic functions; 3. $(y^A - y^B)^2/4$.
[A]−[B] (or [B]−[A]) gives the optimal discriminative weights, and its performance exceeds that of the logic operations. It cannot be reached by any logic combination: its entries take values in {−2, 0, 2}, while every logic combination has entries in {−1, 1}.
Optimal weights for [A] and [B] with three pixels
Optimal discriminative weights: see Fig. 1 in Cheng-Yuan Liou (2006), "Backbone structure of hairy memory," ICANN 2006, the 16th International Conference on Artificial Neural Networks, September 10–14, LNCS 4131, Part I, pp. 688–697.
Biological plausibility
Biological plausibility Hebbian learning
Single layer: resembles Hebbian learning.
Similar hypothesis: the covariance hypothesis (Sejnowski TJ, 1977).
Have a nice day. Code available at http://red.csie.ntu.edu.tw/NN/Classinfo/classinfo_eng.html
Eq. 2:
$\frac{\partial E_{p_1 p_2}}{\partial w_{ij}} = -\sum_{m=1}^{M}\left(y_m^{(p_1)} - y_m^{(p_2)}\right)\left(\frac{\partial y_m^{(p_1)}}{\partial w_{ij}} - \frac{\partial y_m^{(p_2)}}{\partial w_{ij}}\right) = -\frac{1}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[\left(1 - (y_i^{(p_1)})^2\right) x_j^{(p_1)} - \left(1 - (y_i^{(p_2)})^2\right) x_j^{(p_2)}\right]$
Eq. 2 (continued):
$y_i = f(net_i)$, $net_i = \sum_{j=1}^{N} w_{ij} x_j$, $f(net_i) = \tanh(0.5\,net_i) = \frac{1 - \exp(-net_i)}{1 + \exp(-net_i)}$
Eq. 2 (continued):
$\frac{\partial net_m^{(p_1)}}{\partial w_{ij}} = \frac{\partial net_m^{(p_2)}}{\partial w_{ij}} = 0$ for $m \neq i$.
Updating equation for the weights: $w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}$
Bias update (from Eq. 2 with $x_{N+1} = 1$):
$w_{i,N+1} \leftarrow w_{i,N+1} + \frac{\eta}{2}\left(y_i^{(p_1)} - y_i^{(p_2)}\right)\left[(y_i^{(p_2)})^2 - (y_i^{(p_1)})^2\right]$
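A sketch of the single-layer update of Eq. 2 for one pattern pair, assuming gradient descent on $E_{p_1 p_2}$ with the activation $f(net) = \tanh(0.5\,net)$ (function name and learning rate are ours):

```python
import numpy as np

def update_pair(W, x1, x2, eta=0.1):
    """One step of w_ij <- w_ij - eta dE_{p1p2}/dw_ij; since E_{p1p2} is the
    negative squared distance, the step pushes y^(p1) and y^(p2) apart.
    W: (M, N) weights; x1, x2: (N,) inputs in {-1, +1}."""
    y1 = np.tanh(0.5 * (W @ x1))
    y2 = np.tanh(0.5 * (W @ x2))
    d = y1 - y2
    # Delta w_ij = (eta/2) d_i [(1 - y1_i^2) x1_j - (1 - y2_i^2) x2_j]
    W += 0.5 * eta * (np.outer(d * (1 - y1**2), x1)
                      - np.outer(d * (1 - y2**2), x2))
    return W
```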
Eq. 4:
$\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}} = \frac{1}{2}\left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)\left[\left(1 - (y_i^{(p_{k_1})})^2\right) x_j^{(p_{k_1})} - \left(1 - (y_i^{(p_{k_2})})^2\right) x_j^{(p_{k_2})}\right]$
Eq. 4 (continued):
$E_{p_{k_1} p_{k_2}} = \frac{1}{2}\,d\!\left(\mathbf{y}^{(p_{k_1})}, \mathbf{y}^{(p_{k_2})}\right)^2 = \frac{1}{2}\sum_{i=1}^{M}\left(y_i^{(p_{k_1})} - y_i^{(p_{k_2})}\right)^2$
Eq. 5:
$w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E_{p_{k_1} p_{k_2}}}{\partial w_{ij}}$
$y_i^A = \tanh\!\left(\sum_{j=1}^{N=256} w_{ij} x_j^A\right), \forall i = 1,\dots,M$
$E = E_{AB}^{rep} = \frac{1}{2}\sum_{i=1}^{M=10}\left(y_i^A - y_i^B\right)^2$
$s(U, V) = \frac{\mathbf{x}^U}{\lVert \mathbf{x}^U \rVert} \cdot \frac{\mathbf{x}^V}{\lVert \mathbf{x}^V \rVert}$
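$s(U, V)$ is the cosine similarity of the flattened images; a one-function sketch:

```python
import numpy as np

def similarity(U, V):
    """s(U, V) = (x^U / |x^U|) . (x^V / |x^V|) for two images U, V."""
    u = np.ravel(U).astype(float)
    v = np.ravel(V).astype(float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```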
$\delta_{o_i} = \frac{\partial E}{\partial o_i}\frac{\partial o_i}{\partial net_i}$; $\delta_{o_i}$ is obtained much as in Eq. 2.
$\delta_{o_i}^{(p_1)} = \left(o_i^{(p_1)} - o_i^{(p_2)}\right)\frac{1}{2}\left(1 - (o_i^{(p_1)})^2\right)$, $\delta_{o_i}^{(p_2)} = \left(o_i^{(p_1)} - o_i^{(p_2)}\right)\frac{1}{2}\left(1 - (o_i^{(p_2)})^2\right)$
$\delta_{z_j}^{(p_1)} = \frac{1}{2}\left(1 - (z_j^{(p_1)})^2\right)\sum_r \delta_{o_r}^{(p_1)} w_{rj}$, $\delta_{z_j}^{(p_2)} = \frac{1}{2}\left(1 - (z_j^{(p_2)})^2\right)\sum_r \delta_{o_r}^{(p_2)} w_{rj}$
$\delta_{y_k}^{(p_1)} = \frac{1}{2}\left(1 - (y_k^{(p_1)})^2\right)\sum_r \delta_{z_r}^{(p_1)} v_{rk}$, $\delta_{y_k}^{(p_2)} = \frac{1}{2}\left(1 - (y_k^{(p_2)})^2\right)\sum_r \delta_{z_r}^{(p_2)} v_{rk}$
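A sketch of one pair update for the three-hidden-layer network ($x \to y \to z \to o$), combining these delta formulas with the $\Delta w$, $\Delta v$, $\Delta u$ increments given earlier (function name and learning rate are ours):

```python
import numpy as np

def f(net):
    return np.tanh(0.5 * net)

def pair_step(U, V, W, x1, x2, eta=0.1):
    """One update on a different-class pattern pair (p1, p2).
    U: (K, L), V: (J, K), W: (I, J) weight matrices."""
    y1, y2 = f(U @ x1), f(U @ x2)
    z1, z2 = f(V @ y1), f(V @ y2)
    o1, o2 = f(W @ z1), f(W @ z2)
    # output-layer deltas
    do1 = (o1 - o2) * 0.5 * (1 - o1**2)
    do2 = (o1 - o2) * 0.5 * (1 - o2**2)
    # backpropagated deltas
    dz1 = 0.5 * (1 - z1**2) * (W.T @ do1)
    dz2 = 0.5 * (1 - z2**2) * (W.T @ do2)
    dy1 = 0.5 * (1 - y1**2) * (V.T @ dz1)
    dy2 = 0.5 * (1 - y2**2) * (V.T @ dz2)
    # Delta w_ij = delta_o_i^(p1) z_j^(p1) - delta_o_i^(p2) z_j^(p2), etc.
    W += eta * (np.outer(do1, z1) - np.outer(do2, z2))
    V += eta * (np.outer(dz1, y1) - np.outer(dz2, y2))
    U += eta * (np.outer(dy1, x1) - np.outer(dy2, x2))
    return U, V, W
```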
Ch. 5:
$v_{ki} \leftarrow v_{ki} + \eta\,\frac{1}{2}\left(x_k^{(p)} - x_k'\right)\left(1 - x_k'^2\right) y_i^{(p)}$
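A sketch of one reconstruction step for the 256-10-256 autoencoder. The decoder update implements the $v_{ki}$ equation above; the encoder update is standard backpropagation, our addition (the slides instead call MATLAB's trainAutoencoder):

```python
import numpy as np

def autoencoder_step(W, V, x, eta=0.1):
    """W: (10, 256) encoder, V: (256, 10) decoder, x: (256,) pattern."""
    y = np.tanh(0.5 * (W @ x))           # 10 hidden activations y^(p)
    x_rec = np.tanh(0.5 * (V @ y))       # reconstruction x'
    err = 0.5 * (x - x_rec) * (1 - x_rec**2)
    V += eta * np.outer(err, y)          # v_ki update from the slide
    W += eta * np.outer(0.5 * (1 - y**2) * (V.T @ err), x)  # standard BP (our addition)
    return W, V
```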
Ch. 5, Eq. 7:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\left(y_i^{(p_1)}(n) - y_i^{(p_2)}(n)\right)\left(x_j^{(p_1)}(n) - x_j^{(p_2)}(n)\right)$
Ch. 5, Eq. 8:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\,y_i(n)\,x_j(n)$
Ch. 5, Eq. 9:
$w_{ij}(n+1) \leftarrow w_{ij}(n) + \eta\left(y_i(n) - \bar{y}\right)\left(x_j(n) - \bar{x}\right)$
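The three update rules side by side as numpy one-liners (a sketch; in Eq. 9 the mean values $\bar{y}$, $\bar{x}$ would be estimated from the data):

```python
import numpy as np

def eq7_pair(W, x1, x2, y1, y2, eta=0.1):
    """Eq. 7: correlate the pattern-pair differences."""
    return W + eta * np.outer(y1 - y2, x1 - x2)

def eq8_hebb(W, x, y, eta=0.1):
    """Eq. 8: plain Hebbian rule, Delta w_ij = eta y_i x_j."""
    return W + eta * np.outer(y, x)

def eq9_covariance(W, x, y, x_mean, y_mean, eta=0.1):
    """Eq. 9: covariance hypothesis, correlate deviations from the means."""
    return W + eta * np.outer(y - y_mean, x - x_mean)
```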