An Introduction to MATRIX CAPSULES WITH EM ROUTING
1st May 2018
Who Am I?
Education: 2002, B.S., Department of Geography, Chinese Culture University; 2006, M.S., Department of Geography, National Taiwan University.
Experience: Head, Remote Sensing and Value-Added Data Group, NTU Center for Spatial Information Research; Research Assistant and later Head, Value-Added Data Group, NTU Center for Spatial Information Research; Head, Remote Sensing and Value-Added Data Group, Space and Environment Technology Culture and Education Foundation; Research Assistant, Remote Sensing Group and Value-Added Data Group, Space and Environment Technology Culture and Education Foundation; Lecturer, Department of Leisure Management, Taiwan Hospitality and Tourism College; Adjunct Geography Teacher, Private Datong Senior High School.
Journal articles: Tzu-How Chu, Meng-Lung Lin* and Chia-Hao Chang (2012), mGUIDING (Mobile Guiding): Using a Mobile GIS APP for Guiding, Scandinavian Journal of Hospitality and Tourism, 12(3). Tzu-How Chu, Meng-Lung Lin, Chia-Hao Chang and Cheng-Wu Chen (2011), Developing a tour guiding information system for tourism service using mobile GIS and GPS techniques, Advances in Information Sciences and Service Sciences, Vol. 3, No. 6, pp. 49-58. Chia-Hao Chang, Tzu-How Chu and Ying-Yu Liu (2005), Monitoring rooftop additions in Taipei City with high-resolution remote sensing imagery, Journal of Taiwan Geographic Information Science, No. 3, pp. 15-26 (in Chinese).
Geoffrey E. Hinton: The Godfather of AI
British cognitive psychologist and computer scientist. Emeritus Professor at the Dept. of Computer Science, University of Toronto. AI: Artificial Intelligence
Convolutional Neural Networks (CNNs)
ReLU
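The slide's figure is not reproduced in this transcript. As a reminder, ReLU simply clamps negative inputs to zero, f(x) = max(0, x); a minimal numpy sketch:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp negatives to 0."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]
```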
https://chtseng.wordpress.com
Calculated from training data
The human face we imagined vs. the human face CNNs thought they saw (figure panel labels).
What Is a Capsule? Instead of aiming for viewpoint invariance in the activities of “neurons” that use a single scalar output to summarize the activities of a local pool of replicated feature detectors, artificial neural networks should use local “capsules” that perform some quite complicated internal computations on their inputs and then encapsulate the results of these computations into a small vector of highly informative outputs. Each capsule learns to recognize an implicitly defined visual entity over a limited domain of viewing conditions and deformations and it outputs both the probability that the entity is present within its limited domain and a set of “instantiation parameters” that may include the precise pose, lighting and deformation of the visual entity relative to an implicitly defined canonical version of that entity. --Geoffrey E. Hinton et al., 2011, Transforming Auto-Encoders, ICANN 2011
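In code terms, each capsule therefore emits a small structured record rather than one scalar. A minimal sketch, with field names that are illustrative rather than taken from the paper:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CapsuleOutput:
    """One capsule's output: a presence probability plus instantiation parameters."""
    activation: float     # probability that the visual entity is present
    pose: np.ndarray      # instantiation parameters, e.g. a 4x4 pose matrix

# A capsule that is confident it sees its entity in canonical pose:
cap = CapsuleOutput(activation=0.93, pose=np.eye(4))
```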
Differences between CNNs and Capsules
An important difference between capsules and standard neural nets is that the activation of a capsule is based on a comparison between multiple incoming pose predictions whereas in a standard neural net it is based on a comparison between a single incoming activity vector and a learned weight vector.
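A hedged sketch of that contrast. The agreement measure below (an exponential of the vote spread) is an illustrative stand-in; the paper's actual routing is the EM procedure on a later slide:

```python
import numpy as np

def standard_neuron(x, w, b):
    """Standard net: activation compares ONE incoming activity vector
    with a learned weight vector."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def capsule_activation(votes):
    """Capsule: activation depends on how tightly MULTIPLE incoming pose
    predictions agree. votes: (n_lower_capsules, 16) flattened 4x4 pose votes."""
    disagreement = votes.var(axis=0).sum()   # spread among the incoming votes
    return np.exp(-disagreement)             # high only when the votes cluster
```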
How Do Capsules Work? Each capsule has a 4x4 pose matrix, M, and an activation probability, a. Between each capsule i in layer L and each capsule j in layer L+1 is a 4x4 trainable transformation matrix, Wij. The pose matrix of capsule i is transformed by Wij to cast a vote Vij = MiWij for the pose matrix of capsule j.
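A minimal numpy sketch of that vote computation over flattened capsule lists; the random matrices stand in for learned parameters:

```python
import numpy as np

n_in, n_out = 8, 4                       # capsules in layer L and in layer L+1
M = np.random.randn(n_in, 4, 4)          # pose matrix M_i of each capsule i in layer L
W = np.random.randn(n_in, n_out, 4, 4)   # trainable transformation matrices W_ij

# Vote of capsule i for the pose of capsule j: V_ij = M_i W_ij
V = np.einsum('iab,ijbc->ijac', M, W)    # shape (n_in, n_out, 4, 4)
```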
Expectation-Maximization Algorithm
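Below is a simplified, hedged numpy sketch of EM routing between two capsule layers, following the E-step/M-step structure of the paper; beta_a, beta_u and the inverse temperature lambda_ are stand-ins for the learned and scheduled values:

```python
import numpy as np

def em_routing(V, a_in, n_iters=3, beta_a=0.0, beta_u=0.0, lambda_=1.0):
    """Simplified EM routing.
    V:    (n_in, n_out, 16) votes (flattened 4x4 pose matrices)
    a_in: (n_in,) activations of the lower-level capsules
    Returns pose means (n_out, 16) and activations (n_out,) of the upper layer."""
    n_in, n_out, d = V.shape
    R = np.full((n_in, n_out), 1.0 / n_out)   # initial routing assignments

    for _ in range(n_iters):
        # M-step: fit one Gaussian per upper capsule from the votes assigned to it.
        Ra = R * a_in[:, None]                # assignments weighted by input activations
        mu = (Ra[..., None] * V).sum(0) / Ra.sum(0)[:, None]
        sigma2 = (Ra[..., None] * (V - mu) ** 2).sum(0) / Ra.sum(0)[:, None] + 1e-8
        cost = (beta_u + 0.5 * np.log(sigma2)).sum(1) * Ra.sum(0)
        a_out = 1.0 / (1.0 + np.exp(-lambda_ * (beta_a - cost)))

        # E-step: reassign each lower capsule to the Gaussians that explain its votes.
        logp = -0.5 * (((V - mu) ** 2) / sigma2 + np.log(2 * np.pi * sigma2)).sum(2)
        p = a_out * np.exp(logp - logp.max(1, keepdims=True))
        R = p / p.sum(1, keepdims=True)

    return mu, a_out

mu, a_out = em_routing(np.random.randn(8, 4, 16), np.random.rand(8))
```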
SPREAD LOSS Use “spread loss” to directly maximize the gap between the activation of the target class (at) and the activations of the other classes, making training less sensitive to the initialization and hyper-parameters of the model: Li = (max(0, m - (at - ai)))^2, L = Σ(i≠t) Li. L: total spread loss; Li: spread loss for wrong class i; at: activation of the target class; ai: activation of wrong class i; m: margin.
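A minimal numpy sketch of this loss, assuming the class activations are already computed (the paper anneals the margin m from 0.2 up to 0.9 during training):

```python
import numpy as np

def spread_loss(a, target, m=0.9):
    """L_i = max(0, m - (a_t - a_i))^2, summed over the non-target classes i."""
    L_i = np.maximum(0.0, m - (a[target] - a)) ** 2
    L_i[target] = 0.0                       # the target class contributes no loss
    return L_i.sum()

a = np.array([0.1, 0.8, 0.3, 0.2, 0.1])    # activations of the 5 class capsules
print(spread_loss(a, target=1))            # ~0.33
```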
Capsules Architecture
ReLU: Rectified Linear Unit
The Small NORB Dataset Contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees). NORB: NYU Object Recognition Benchmark
Experiments Training and test each use stereo pairs of 96x96-pixel images: 5 categories x 5 physical instances x 18 azimuths x 9 elevations x 6 lighting conditions. Images are downsampled from 96x96 to 48x48 pixels. For training data: randomly crop 32x32 pixels and add random brightness and contrast. For test data: crop 32x32 pixels from the center of the image.
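A hedged numpy sketch of this preprocessing pipeline; the 2x2 block averaging used for downsampling is an assumption, since the paper does not pin down the resampling method:

```python
import numpy as np

def downsample_2x(img):
    """96x96 -> 48x48 by averaging each 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def train_patch(img, rng):
    """Random 32x32 crop plus random brightness and contrast jitter."""
    y, x = rng.integers(0, 48 - 32, size=2)
    patch = img[y:y + 32, x:x + 32] + rng.uniform(-0.1, 0.1)              # brightness
    return (patch - patch.mean()) * rng.uniform(0.8, 1.2) + patch.mean()  # contrast

def test_patch(img):
    """Deterministic 32x32 crop from the center of the 48x48 image."""
    return img[8:40, 8:40]

rng = np.random.default_rng(0)
img = downsample_2x(np.random.rand(96, 96))
print(train_patch(img, rng).shape, test_patch(img).shape)   # (32, 32) (32, 32)
```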
ReLU Conv1 [14, 14, 32] → PrimaryCaps → ConvCaps1 → ConvCaps2 → Class Capsules
ReLU Conv1: the original 32x32 image goes through a 5x5 convolutional layer with stride 2 and 32 channels, followed by ReLU.
PrimaryCaps: [14, 14, 32] capsules; each capsule is a 4x4 pose matrix plus 1 activation, forming layer L. M and a for layer L+1 are calculated by EM.
ConvCaps1: [6, 6, 32] capsules; a 3x3 convolutional capsule layer with stride 2. M and a for layer L+2 are calculated by EM.
ConvCaps2: [4, 4, 32] capsules; a 3x3 convolutional capsule layer with stride 1. M and a for layer L+3 are calculated by EM.
Class Capsules: [5, 1] capsules, one per toy category, fully connected to the layer L+2 capsules with Coordinate Addition; their M and a are calculated by EM.
In ReLU Conv1, the original image is convolved with 32 different 5x5 kernels at stride 2, and negative-valued results are set to 0 by the Rectified Linear Unit (ReLU). In PrimaryCaps, the 32 channels are kept, but for each pixel of each channel a 4x4 pose matrix and an activation are computed from the 32 channels, packing that position into 32 capsules.
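The spatial sizes quoted above follow from the valid-convolution formula floor((n - k) / s) + 1; a quick check:

```python
def conv_out(n, k, s):
    """Output width of a valid convolution: floor((n - k) / s) + 1."""
    return (n - k) // s + 1

n = conv_out(32, 5, 2)   # ReLU Conv1 / PrimaryCaps: 14
n = conv_out(n, 3, 2)    # ConvCaps1: 6
n = conv_out(n, 3, 1)    # ConvCaps2: 4
print(n)                 # 4, then fully connected to the 5 class capsules
```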
Conclusions A new type of capsule system is proposed, with a new iterative routing procedure between capsule layers based on the EM algorithm.
On the smallNORB data set, the error falls from 5.2% (baseline CNN) to 1.8%; the paper reports a 45% error reduction over the state of the art.
Relations between Capsules and My Proposal (draft)
Temporal changes in different land cover and land use types can be identified by integrating multi-source, multi-temporal remote sensing data. With multi-temporal remote sensing data, the change in pixel values at the same position could be described by a pose transformation matrix.
The End