Analysis of Trained CNN (Receptive Field & Weights of Network)
Bukweon Kim
Basic Example: MNIST Data
MNIST is a dataset of 28×28 handwritten digit images, each labeled with the digit it shows. Let us observe the structural characteristics of a simple CNN using this MNIST data.
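As a concrete reference, here is a minimal Python sketch of loading MNIST with torchvision (the library choice is an assumption; the slides do not name one):

import torch
from torchvision import datasets, transforms

# Each sample is a 28x28 grayscale image with a digit label 0-9.
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
image, label = mnist[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and an integer label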
Example CNN structure
28×28 input image → first convolution (5×5×4 weights, ReLU) → 24×24×4 → 2×2 pooling → 12×12×4 → second convolution (5×5×4×8 weights, ReLU) → 8×8×8 → pooling → 4×4×8 → fully connected layer (4×4×8 × 16 weights, ReLU) → 16×1 signal → classification against a 16×10 dictionary → softmax
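A minimal PyTorch sketch of a network matching these dimensions (the module names and the bias-free dictionary layer are my assumptions; the slide only gives the shapes):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MnistCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=5)      # 28x28 -> 24x24x4
        self.conv2 = nn.Conv2d(4, 8, kernel_size=5)      # 12x12 -> 8x8x8
        self.pool = nn.MaxPool2d(2, stride=2)            # halves each spatial dim
        self.fc = nn.Linear(4 * 4 * 8, 16)               # 4x4x8 -> 16 signals
        self.dictionary = nn.Linear(16, 10, bias=False)  # the 16x10 "library"

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))             # -> 12x12x4
        x = self.pool(F.relu(self.conv2(x)))             # -> 4x4x8
        x = F.relu(self.fc(x.flatten(1)))                # -> 16 features
        return F.softmax(self.dictionary(x), dim=1)      # inner product + softmax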
Receptive Field
The receptive field of a value is the region of the earlier layer (and ultimately of the input) that determines it. The green box is determined by the red boxes, and the green box in turn affects the cyan box values. Any value outside the red boxes does not affect the value of the green box.
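The receptive field of a unit can be computed layer by layer from kernel sizes and strides. A short sketch for the architecture above (the recurrence is standard; the layer list is read off the structure slide):

# (kernel, stride) for conv1, pool1, conv2, pool2
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

rf, jump = 1, 1  # receptive field size, and spacing of adjacent units in input pixels
for k, s in layers:
    rf += (k - 1) * jump
    jump *= s
print(rf)  # 16: each value in the 4x4x8 map sees a 16x16 patch of the input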
Classification
The signal of 16 features extracted by the previous steps of the CNN is compared with a library of 16 features for each digit, 0 through 9. A high inner product means the signal is similar to that digit's library entry; applying softmax to the ten inner products gives the final, nearly one-hot result. A single feature is usually not decisive on its own: one feature may strongly suggest only that the input is 0, 4, 5, 6, or 8.
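A numpy sketch of this final step (the arrays are random placeholders; the shapes follow the slide):

import numpy as np

signal = np.random.rand(16)        # 16 features extracted by the CNN
library = np.random.rand(16, 10)   # 16 features for each digit 0-9

scores = signal @ library                      # inner product per digit
probs = np.exp(scores) / np.exp(scores).sum()  # softmax
print(probs.argmax())                          # predicted digit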
Outputs of several 6s
We will focus on the 13th signal and analyze what it means, comparing against the library of 16 features for 6.
6-like image examples for explanation
[Example images: the 14th MNIST sample, which the network confuses between 6 and 0; two images classified as 6; and one classified as 1.]
What is the meaning of this 13th feature?
The 13th signal is the ReLU of the inner product between the output of the previous pooling layer and the weights for the 13th unit of the next layer; per-location contributions such as 0.2, 0.1, 0.4, and 0.7 accumulate into the final value of 1.7. For the convenience of understanding, I will focus on the strongest signal.
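A numpy sketch of how one such signal is produced (index 12 selects the 13th unit; the arrays are placeholders):

import numpy as np

pooled = np.random.rand(8, 4, 4)             # second pooling output: 8 maps of 4x4
fc_weights = np.random.randn(16, 8 * 4 * 4)  # fully connected layer weights

pre = fc_weights[12] @ pooled.flatten()      # inner product for the 13th unit
signal_13 = max(pre, 0.0)                    # ReLU

# Per-channel contributions show which pooled maps drive the signal.
contrib = (fc_weights[12].reshape(8, 4, 4) * pooled).sum(axis=(1, 2))
print(signal_13, contrib)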
Weight and signal analysis: weight value
[Figure: the unit's weight values with the receptive field of each value, positive and negative regions marked, next to the main signal given from the previous layer.] When the main signal lines up with the positive weights, the inner product is 0.7; when the same image is translated 8 pixels down, the signal lands on the negative weights and the inner product becomes -0.7.
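A toy numpy illustration of this sign flip (the weight map and signal are made up for the demo, not the slide's actual values):

import numpy as np

w = np.zeros((12, 12))
w[:4, :] = 1.0    # positive weights near the top
w[8:, :] = -1.0   # negative weights near the bottom

signal = np.zeros((12, 12))
signal[1:3, 4:8] = 1.0                # activation sitting on the positive region

shifted = np.roll(signal, 8, axis=0)  # same image moved 8 pixels down

print((w * signal).sum())   # positive inner product
print((w * shifted).sum())  # negative: activation now sits on negative weights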
Weight and signal analysis: pooling
The 2×2 max pooling with stride 2 makes the signal somewhat locally translation invariant: even though we moved the image by 2 pixels, the signal at the selected position did not change (the maximum, 0.22, survives the shift).
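A small numpy check of this effect (toy values; here the peak moves within a single pooling window, which is the mechanism behind the invariance the slide shows for its 2-pixel shift):

import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.zeros((8, 8))
x[2, 2] = 0.22                      # strongest response
x_shift = np.zeros((8, 8))
x_shift[3, 3] = 0.22                # response moved, but inside the same 2x2 window

print(max_pool_2x2(x)[1, 1])        # 0.22
print(max_pool_2x2(x_shift)[1, 1])  # 0.22: the pooled signal did not change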
Weight and signal analysis: weight value, deeper understanding 1
Increasing an input value that corresponds to a positive weight enhances the signal. Increasing an input value that corresponds to a negative weight suppresses the signal. Changing an input value whose weight is near 0 does not affect the signal much. [Figure: example weight map with signal values 0.42099, 0.27828, and 0.26084 under these perturbations.]
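A numpy sketch of this sensitivity (toy weight vector; for such a unit the signal change equals weight × input change, which ReLU preserves as long as the pre-activation stays positive):

import numpy as np

w = np.array([0.8, -0.85, 0.02])   # positive, negative, and near-zero weight
x = np.array([0.5, 0.3, 0.5])

base = max(w @ x, 0.0)
for i in range(3):
    x2 = x.copy()
    x2[i] += 0.1                    # increase one input value
    print(i, max(w @ x2, 0.0) - base)
# index 0: +0.08 (enhanced), index 1: -0.085 (suppressed), index 2: +0.002 (negligible)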
Weight and signal analysis: pooling, deeper understanding
The strongest signal is the output of one of these inner products: the max pooling passes on whichever of the neighboring inner products is largest.
Weight and signal analysis: weight value, deeper understanding 2
[Figure: weights of the first convolution layer, outputs of the first pooling layer, and the 8th weight in the second convolutional layer.] Each output of the first pooling layer is a map of the pattern determined by the previous filters (convolution & pooling). The second convolution followed by ReLU then produces a map of the signal of combinations of any of the 4 patterns we looked for.
Weight and signal analysis: weight value, deeper understanding 3
Each map looks for patterns somewhat similar to these (not exactly, because the network is not linear). The final output may be considered a value derived from taking many combinations of patterns into account, and these patterns may not only enhance the value but also suppress it: the output is a ReLU of a weighted sum of pattern responses, with some terms enhancing and some suppressing. This becomes the input for the fully connected layer.
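Ignoring the pooling and the ReLU for a moment, the pattern a second-layer unit responds to can be approximated by composing its weights with the first-layer filters. A scipy/numpy sketch (the composition is exact only for a purely linear network, matching the slide's caveat that these patterns are not exact):

import numpy as np
from scipy.signal import convolve2d

filters1 = np.random.randn(4, 5, 5)   # 4 first-layer 5x5 filters
weights2 = np.random.randn(4, 5, 5)   # one second-layer unit's 5x5x4 weights

# Linear composition: the effective input-space pattern of the second-layer unit.
pattern = sum(convolve2d(filters1[c], weights2[c], mode="full")
              for c in range(4))
print(pattern.shape)  # (9, 9): the combined pattern covers a larger input region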
Why was the CNN fooled / not fooled by these examples?
The strong 13th signal usually tells whether the input is a 6, because of what it looks for. The 14th MNIST sample confuses the network between 6 and 0; two of the 6-like examples are still classified as 6; but in the last example the existence of a / pattern in the middle fooled the network into thinking it is a 1, while the 6-like evidence was ignored.
Conclusion
A CNN with ReLU looks for combinations of patterns as it gets deeper. The pooling layers tell the CNN that we are looking for locally translation-invariant features. Deeper layers allow the network to look for more complex combinations of patterns, and they also allow wider invariance for local patterns. Knowing exactly what the CNN looks for gives us a deeper understanding of how it works and of what it can or cannot do.
Semantic Segmentation Using Image Classification (Pixelwise Classification)
Classes: Amniotic Fluid, Umbilical Vein, Stomach Bubble, Shadowing Artifact, Bone, Other white region. For each pixel: extract a patch centered at the pixel, classify it with the classification CNN (e.g., Stomach Bubble), and repeat for every pixel.
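A Python sketch of this patchwise loop (classify_patch stands in for the trained classification CNN and is a hypothetical function; the patch size is an assumption):

import numpy as np

def segment(image, classify_patch, patch=28):
    # Pad so a full patch exists around every pixel, then classify pixel by pixel.
    half = patch // 2
    padded = np.pad(image, half, mode="reflect")
    labels = np.zeros(image.shape, dtype=int)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + patch, j:j + patch]  # patch centered at (i, j)
            labels[i, j] = classify_patch(window)      # e.g., index of "Stomach Bubble"
    return labels

Classifying every pixel independently is expensive; overlapping patches can share computation in practice, but the loop above is the procedure the slide describes.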
Comparison of segmentation results with and without spine position
With some changes to the CNN structure, we could feed the spine-position information into the network at the points where we wanted it to be applied.