VALSE Webinar ICCV Pre-conference SORT & Genetic CNN

1 VALSE Webinar ICCV Pre-conference SORT & Genetic CNN
Speaker: Lingxi Xie
Slides available at my homepage (TALKS)!
Department of Computer Science
The Johns Hopkins University

2 We Focus on Image Recognition
Image recognition (classification) is important
  It is the most basic goal of understanding an image
  Data are easy to collect, and large-scale datasets are available
Recognition itself is of little use, but it helps other tasks
  Many other tasks, including instance retrieval, object detection, semantic segmentation, boundary detection, etc., benefit from models pre-trained on a large dataset
Meanwhile, the recognition task is still developing
  A single label is not enough to describe an image
  Recognition is being combined with natural language processing
11/22/2018 VALSE Webinar 2017

3 Brief History: Image Recognition
Image recognition: a fundamental task
  Clearly defined, with labeled data easy to obtain
Development in datasets
  Small datasets: from two classes to a few classes
  Mid-level datasets: tens or hundreds of classes
  Current age: more than 10,000 classes [Deng, 2009]
Evolution in algorithms
  Early years: global features, e.g., color histograms
  From the 2000s: local features, e.g., SIFT
  Current age: deep neural networks, e.g., AlexNet

4 Key Principles: Image Recognition
Principle #1: invariance
  The ability to model and capture invariance determines how well a representation transfers
  Local features are often more repeatable than global features
  Example: handcrafted features, from global to local
Principle #2: parameters
  A large parameter count often brings a risk of over-fitting
  Example: neuron connectivity, from fully-connected to convolutional (partially-connected with weight sharing)
Principle #3: capacity
  A model with a large capacity benefits from more data
  Example: network structure, from shallow to deep

5 Deep Learning Basics
Deep learning is the idea of constructing a very complicated mathematical function from a hierarchy of differentiable operations
  We provide a large function space, and let the data speak for themselves
The hierarchy often appears as a network structure, and the operations are often illustrated as links between neurons
People tend to believe that a network with sufficient depth and a sufficient number of neurons is able to fit any complicated feature space

6 Recognition: Background
Deeper architectures
  AlexNet: the first deep network for large-scale recognition (8 layers)
  VGGNet: deeper structures (16 or 19 layers)
  GoogLeNet: multi-scale, multi-path (22 layers)
  ResNet: deeper networks with highway connections (50, 101 layers or more)
  DenseNet: dense layer connections (100+ layers)

7 Recognition: Background (cont.)
Towards efficient network training
  Basic elements: learning rate, mini-batch, momentum
  ReLU: a non-linear unit to prevent gradient vanishing
  Dropout: introducing randomness to prevent over-fitting
  Batch normalization: towards better numerical stability

8 Our Work on Image Recognition
Novel network modules
  L. Xie et al., Towards Reversal-Invariant Image Representation, ICCV’2015, IJCV’2017
  L. Xie et al., Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons, ECCV’2016
  Y. Wang et al., SORT: Second-Order Response Transform for Visual Recognition, ICCV’2017
A new training strategy
  L. Xie et al., DisturbLabel: Regularizing CNN on the Loss Layer, CVPR’2016
Automatically discovering new network structures
  L. Xie et al., Genetic CNN, ICCV’2017

9 ICCV 2017 SORT: Second-Order Response Transform for Visual Recognition
Speaker: Lingxi Xie
Authors: Yan Wang, Lingxi Xie, Chenxi Liu, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Qi Tian, Alan Yuille
Department of Computer Science
The Johns Hopkins University

10 Outline
Introduction
Second-Order Response Transform
Experiments
Conclusions and Future Work

12 Introduction
Deep Learning
  The state-of-the-art machine learning theory
  Using a cascade of many layers of non-linear neurons for feature extraction and transformation
  Learning multiple levels of feature representation
    Higher-level features are derived from lower-level features to form a hierarchical architecture
    Multiple levels of representation correspond to different levels of abstraction

13 Introduction (cont.)
Convolutional Neural Networks
  A fundamental machine learning tool
  Good performance in a wide range of problems in computer vision as well as other research areas
  Evolutions in many real-world applications
  Theory: a multi-layer, hierarchical network often has a larger capacity, but also requires a larger amount of training data

14 Outline
Introduction
Second-Order Response Transform
Experiments
Conclusions and Future Work

15 Motivation
The representation ability of deep neural networks comes from the composition of nonlinear functions
Currently, the main sources of nonlinearity are the ReLU (or sigmoid) activation and the max-pooling operation
We add a second-order term into the network to increase its nonlinearity

16 Branched Network Structures
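An input data cube $\mathbf{x}$ is fed into two parallel modules, giving intermediate outputs $\mathbf{F}_1(\mathbf{x};\boldsymbol{\theta}_1)$ and $\mathbf{F}_2(\mathbf{x};\boldsymbol{\theta}_2)$, which are then fused into an output cube $\mathbf{y}$
Example 1: in the Maxout network, $\mathbf{F}_1(\mathbf{x})=\boldsymbol{\theta}_1\mathbf{x}$, $\mathbf{F}_2(\mathbf{x})=\boldsymbol{\theta}_2\mathbf{x}$, and $\mathbf{y}=\max\{\mathbf{F}_1(\mathbf{x}),\mathbf{F}_2(\mathbf{x})\}$
Example 2: in the deep ResNet, $\mathbf{F}_1(\mathbf{x})=\mathbf{x}$, $\mathbf{F}_2(\mathbf{x})=\boldsymbol{\theta}_2'\,\sigma(\boldsymbol{\theta}_2\mathbf{x})$, and $\mathbf{y}=\mathbf{F}_1(\mathbf{x})+\mathbf{F}_2(\mathbf{x})$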

17 Formulation
Adding a second-order term into the fusion stage of $\mathbf{F}_1(\mathbf{x})$ and $\mathbf{F}_2(\mathbf{x})$
  $\mathbf{y}=\mathbf{F}_1(\mathbf{x})+\mathbf{F}_2(\mathbf{x})+\mathbf{F}_1(\mathbf{x})\odot\mathbf{F}_2(\mathbf{x})$
  $\odot$ is the element-wise product operation
Implementation details
  Gradient back-propagation is straightforward
  Less than 5% extra time, no extra memory
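The fusion rule on this slide can be sketched in a few lines. This is a minimal illustration in plain Python on flat lists of activations, not the authors' implementation; the function name `sort_fusion` is hypothetical, and a real network would apply the same element-wise formula to feature tensors inside a deep learning framework.

```python
from typing import List, Sequence


def sort_fusion(f1: Sequence[float], f2: Sequence[float]) -> List[float]:
    """Fuse two branch outputs element-wise: y = F1 + F2 + F1 * F2."""
    if len(f1) != len(f2):
        raise ValueError("both branches must produce outputs of the same size")
    return [a + b + a * b for a, b in zip(f1, f2)]


# Two-branch block: fuse the responses of the two branches.
y_two_branch = sort_fusion([1.0, -2.0, 0.5], [3.0, 1.0, -1.0])

# Residual block: the identity branch x plays the role of F1.
x = [2.0, 0.0]
residual = [0.5, 4.0]  # stands in for F(x)
y_residual = sort_fusion(x, residual)
```

The only change relative to plain summation is the extra product term, which is why the slide reports a small time overhead and no additional parameters.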

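18 Illustration
A single-branch network, after each convolution layer is replaced by a two-branch module, can be improved by SORT
A Two-Branch Block (input $\mathbf{x}$; layers conv-1a, conv-1b, conv-2a, conv-2b; fusion)
  ORIGINAL: $\mathbf{y}_\mathrm{R}=\mathbf{F}_1(\mathbf{x})+\mathbf{F}_2(\mathbf{x})$
  SORT: $\mathbf{y}_\mathrm{S}=\mathbf{F}_1(\mathbf{x})+\mathbf{F}_2(\mathbf{x})+\mathbf{F}_1(\mathbf{x})\odot\mathbf{F}_2(\mathbf{x})$
A Residual Block (input $\mathbf{x}$; layers conv-a, conv-b; fusion)
  ORIGINAL: $\mathbf{y}_\mathrm{R}=\mathbf{x}+\mathbf{F}(\mathbf{x})$
  SORT: $\mathbf{y}_\mathrm{S}=\mathbf{x}+\mathbf{F}(\mathbf{x})+\mathbf{x}\odot\mathbf{F}(\mathbf{x})$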

19 Benefit?
What is the benefit of the second-order term?
  Increasing nonlinearity
  The roles of different orders
  Cross-branch gradient back-propagation
  Other explanations?

20 Increasing the Nonlinearity
Both ReLU and max operations are nonlinear only at a sub-dimension, but a real second-order term is nonlinear over the entire input space
Error rates (%) of ResNet-20 on CIFAR10 for different combinations of $\mathbf{F}_1+\mathbf{F}_2$, $\max\{\mathbf{F}_1,\mathbf{F}_2\}$ and $\mathbf{F}_1\odot\mathbf{F}_2$, in slide order: 7.60, 7.55, not converged, 7.63, 7.14 (best), 7.64, 7.90

21 The Role of Different Orders
Linear terms help convergence
  It is not recommended to use $\mathbf{F}_1\odot\mathbf{F}_2$ alone
Nonlinear terms help representation ability
  Using a second-order term is better than using a piecewise linear term (such as ReLU and max)
A combination of linear and nonlinear terms produces the best performance

22 Cross-Branch Gradient Back-Prop
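Original form: $\mathbf{y}=\mathbf{F}_1(\mathbf{x};\boldsymbol{\theta}_1)+\mathbf{F}_2(\mathbf{x};\boldsymbol{\theta}_2)$
  $\partial\mathbf{y}/\partial\boldsymbol{\theta}_1$ depends only on $\boldsymbol{\theta}_1$, and $\partial\mathbf{y}/\partial\boldsymbol{\theta}_2$ depends only on $\boldsymbol{\theta}_2$
SORT: $\mathbf{y}=\mathbf{F}_1(\mathbf{x};\boldsymbol{\theta}_1)+\mathbf{F}_2(\mathbf{x};\boldsymbol{\theta}_2)+\mathbf{F}_1(\mathbf{x};\boldsymbol{\theta}_1)\odot\mathbf{F}_2(\mathbf{x};\boldsymbol{\theta}_2)$
  Both $\partial\mathbf{y}/\partial\boldsymbol{\theta}_1$ and $\partial\mathbf{y}/\partial\boldsymbol{\theta}_2$ depend on both $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$
  A branch can update its parameters based on information from the other branch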
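The cross-branch dependence follows directly from the chain rule. A short derivation, written per output coordinate $i$ (the coordinate index is notation introduced here for illustration):

```latex
\begin{align*}
y_i &= F_{1,i}(\mathbf{x};\boldsymbol{\theta}_1) + F_{2,i}(\mathbf{x};\boldsymbol{\theta}_2)
     + F_{1,i}(\mathbf{x};\boldsymbol{\theta}_1)\,F_{2,i}(\mathbf{x};\boldsymbol{\theta}_2), \\
\frac{\partial y_i}{\partial \boldsymbol{\theta}_1}
    &= \bigl(1 + F_{2,i}(\mathbf{x};\boldsymbol{\theta}_2)\bigr)\,
       \frac{\partial F_{1,i}(\mathbf{x};\boldsymbol{\theta}_1)}{\partial \boldsymbol{\theta}_1}, \\
\frac{\partial y_i}{\partial \boldsymbol{\theta}_2}
    &= \bigl(1 + F_{1,i}(\mathbf{x};\boldsymbol{\theta}_1)\bigr)\,
       \frac{\partial F_{2,i}(\mathbf{x};\boldsymbol{\theta}_2)}{\partial \boldsymbol{\theta}_2}.
\end{align*}
```

Without the product term, both factors in parentheses reduce to 1, so each gradient would involve only its own branch.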

23 Any Other Explanations?
This is still an open problem!
Possible options
  Using a nonlinear kernel in visual recognition
  Gating: a popular idea in recurrent CNN
  The mask operation in the attention model

24 Outline
Introduction
Second-Order Response Transform
Experiments
Conclusions and Future Work

25 Small-Scale Experiments
Datasets
  CIFAR10, CIFAR100, SVHN
Networks
  LeNet (5 layers)
  BigNet (11 layers)
  ResNet (20 layers, 32 layers, 56 layers)
  WideResNet (28 layers)

26 Small-Scale Results
Error rates (%). For the last six networks, each cell reads: original network / network with SORT (bold on the slide).
Network | CIFAR10 | CIFAR100 | SVHN
DSN (2014) | 7.97 | 34.57 | 1.92
r-CNN (2015) | 7.09 | 31.75 | 1.77
GePool (2016) | 6.05 | 32.37 | 1.69
WRN (2016) | 5.37 | 24.53 | 1.85
StocNet (2016) | 5.25 | 24.98 | 1.75
DenNet (2017) | 3.74 | 19.25 | 1.59
LeNet* | 11.10 / 10.34 | 36.93 / 34.75 | 2.55 / 2.39
BigNet* | 6.84 / 6.60 | 29.25 / 28.07 | 1.97 / 1.87
ResNet-20 | 7.60 / 7.14 | 30.66 / 30.19 | 2.04 / 2.01
ResNet-32 | 6.72 / 6.16 | 29.55 / 28.84 | 2.20 / 1.94
ResNet-56 | 6.00 / 5.52 | 27.55 / 26.88 | 2.22 / 1.81
WRN-28 | 4.78 / 4.00 | 22.05 / 20.94 | 1.80 / 1.52

29 ImageNet Experiments
Dataset
  ILSVRC2012
Networks
  AlexNet (8 layers)
  ResNet (18, 34, or 50 layers)
  The Facebook implementation on PyTorch is used

30 ImageNet Results
Network | Top-1 Error | Top-5 Error
AlexNet | 43.19 | 19.87
AlexNet* | 36.66 | 14.79
AlexNet*+SORT | 35.66 | 14.13
ResNet-18 | 30.50 | 11.07
ResNet-18+SORT | 29.95 | 10.80
ResNet-34 | 27.02 | 8.77
ResNet-34+SORT | 26.57 | 8.55
ResNet-50 | 24.10 | 7.11
ResNet-50+SORT | 23.82 | 6.72

31 Outline
Introduction
Second-Order Response Transform
Experiments
Conclusions and Future Work

32 Conclusions
SORT: a simple idea to improve deep networks
  Effective: accuracy is boosted consistently
  Efficient: a lightweight operation which needs less than 2% extra time and no extra memory
  Can be applied to a wide range of networks
The role of different terms
  First-order terms: basic property and convergence
  Second-order terms: nonlinearity

33 Future Work
Applying SORT to the concatenation module?
  Inception, ResNeXt, DenseNet, etc.
Adding other terms?
  Even higher-order, or arbitrary polynomial terms
  Non-polynomial terms
Application to recurrent neural networks?

34 ICCV 2017 Genetic CNN
Speaker: Lingxi Xie
Authors: Lingxi Xie, Alan Yuille
Department of Computer Science
The Johns Hopkins University

35 Outline
Introduction
Designing CNN Structures
Genetic CNN
Experiments
Discussions and Conclusions

37 Introduction
Deep Learning
  The state-of-the-art machine learning theory
  Using a cascade of many layers of non-linear neurons for feature extraction and transformation
  Learning multiple levels of feature representation
    Higher-level features are derived from lower-level features to form a hierarchical architecture
    Multiple levels of representation correspond to different levels of abstraction

38 Introduction (cont.)
Convolutional Neural Networks
  A fundamental machine learning tool
  Good performance in a wide range of problems in computer vision as well as other research areas
  Evolutions in many real-world applications
  Theory: a multi-layer, hierarchical network often has a larger capacity, but also requires a larger amount of training data

39 Outline
Introduction
Designing CNN Structures
Genetic CNN
Experiments
Discussions and Conclusions

40 Designing CNN Structures
History
  From linear to non-linear
  From shallow to deep
  From fully-connected to convolutional
Today
  A cascade of various types of non-linear units
  Typical units: convolution, pooling, activation, etc.

41 Example Networks
LeNet [LeCun et al., 1998]

42 Example Networks (cont.)
AlexNet [Krizhevsky et al., 2012]

43 Example Networks (cont.)
Other deep networks
  VGGNet [Simonyan et al., 2014]
  GoogLeNet (Inception) [Szegedy et al., 2014]
  Deep ResNet [He et al., 2016]
  DenseNet [Huang et al., 2016]

44 Problem
All the network architectures are fixed
  This limits the ability and complexity of the networks
  There are examples such as the Stochastic Network [Huang et al., 2016], which allows the network to skip some layers in the training stage, but we point out that this is a fixed structure with a stochastic training strategy

45 Outline
Introduction
Designing CNN Structures
Genetic CNN
Experiments
Discussions and Conclusions

46 General Idea
Modeling a large family of CNN architectures as a solution space
  In this work, each architecture is encoded into a binary string of a fixed length
Using an efficient search algorithm to explore good candidates
  In this work, the genetic algorithm is used

47 The Genetic Algorithm
A metaheuristic inspired by the process of natural selection, belonging to the larger class of evolutionary algorithms
Commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection

48 The Genetic Algorithm (cont.)
Typical requirements of a genetic process
  A genetic representation of each individual (a sample in the solution space)
  A function to evaluate each individual (cost function or loss function)

49 The Genetic Algorithm (cont.)
Flowchart of a genetic process
  Initialization: generating a population of individuals to start with
  Selection: determining which individuals survive
  Genetic operations: crossover, mutation, etc.
  Iteration: repeating the above two processes several times and ending when a condition holds

50 The Genetic Algorithm (cont.)
Example: the Traveling Salesman Problem (TSP)
  Finding the shortest Hamiltonian path over $N$ towns
A typical genetic algorithm for TSP
  Genetic representation: a permutation of $N$ numbers
  Cost function: the total length of the current path
  Crossover: switching the sub-sequences in two paths
  Mutation: switching the position of two towns in a path
  Termination: after a fixed number of generations
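The TSP recipe above can be sketched as a small program. This is a toy illustration under stated assumptions, not a method from the talk: the slide only says that crossover switches sub-sequences, so the sketch uses order crossover (a common way to keep the child a valid permutation); keeping the better half of the population and the 0.2 mutation rate are also arbitrary choices, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(order: Sequence[int], towns: Sequence[Point]) -> float:
    """Cost function: total length of the open path visiting towns in `order`."""
    return sum(math.dist(towns[a], towns[b]) for a, b in zip(order, order[1:]))


def order_crossover(p1: Sequence[int], p2: Sequence[int], rng: random.Random) -> List[int]:
    """Keep a random slice of p1 in place; fill the other slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    kept = set(p1[i:j])
    rest = [t for t in p2 if t not in kept]
    return rest[:i] + list(p1[i:j]) + rest[i:]


def swap_mutation(order: Sequence[int], rng: random.Random) -> List[int]:
    """Mutation: switch the positions of two towns in the path."""
    child = list(order)
    a, b = rng.sample(range(len(child)), 2)
    child[a], child[b] = child[b], child[a]
    return child


def solve_tsp(towns: Sequence[Point], pop_size: int = 30,
              generations: int = 100, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n = len(towns)
    # Genetic representation: each individual is a permutation of town indices.
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):  # termination: fixed number of generations
        population.sort(key=lambda o: path_length(o, towns))
        survivors = population[: pop_size // 2]  # assumption: keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # assumption: mutation rate
                child = swap_mutation(child, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda o: path_length(o, towns))
```

For example, `solve_tsp([(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (2.0, 0.0)])` returns a permutation of the four town indices; because the better half always survives, the best path never gets longer from one generation to the next.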

51 General Framework
Two requirements of the genetic algorithm
  A genetic representation: each CNN is encoded into a binary code of fixed length $L$
  An evaluation function: the network is trained from scratch and its accuracy is obtained
Note: the genetic algorithm is only used to generate network structures; the network weights are trained from scratch!

52 General Framework (cont.)
Input: # of individuals $N$, # of generations $T$, network configuration (to be detailed later), hyper-parameters (to be detailed later)
Initialization: generating $N$ random individuals
  Evaluating each individual by training from scratch
Repeat the following process for $T$ rounds
  Selection: generating $N$ individuals with Russian roulette
  Crossover and mutation: generating new individuals pairwise or singly
  Evaluating each new individual by training from scratch
Output: the population after $T$ generations
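The loop above can be sketched as a generic skeleton. This is only an outline under assumptions: `genetic_search` and the three `toy_*` stand-ins are hypothetical names, and the toy fitness (fraction of 1-bits) replaces the expensive step of training each encoded network from scratch. For simplicity the sketch re-evaluates every individual in each round rather than only the new ones.

```python
import random
from typing import Callable, List, Tuple

Individual = List[int]


def genetic_search(
    n_individuals: int,
    n_rounds: int,
    code_length: int,
    evaluate: Callable[[Individual], float],
    select: Callable[[List[Individual], List[float], random.Random], List[Individual]],
    vary: Callable[[List[Individual], random.Random], List[Individual]],
    seed: int = 0,
) -> Tuple[List[Individual], List[float]]:
    rng = random.Random(seed)
    # Initialization: N random binary strings, each evaluated once.
    population = [[rng.randint(0, 1) for _ in range(code_length)]
                  for _ in range(n_individuals)]
    fitness = [evaluate(ind) for ind in population]
    for _ in range(n_rounds):
        population = select(population, fitness, rng)    # selection
        population = vary(population, rng)               # crossover and mutation
        fitness = [evaluate(ind) for ind in population]  # evaluation
    return population, fitness


# Toy stand-ins (assumptions for demonstration only).
def toy_evaluate(ind: Individual) -> float:
    return sum(ind) / len(ind)


def toy_select(pop, fit, rng):
    weights = [f + 1e-6 for f in fit]  # small offset keeps the total weight positive
    return [list(ind) for ind in rng.choices(pop, weights=weights, k=len(pop))]


def toy_vary(pop, rng):
    return [[bit ^ 1 if rng.random() < 0.05 else bit for bit in ind] for ind in pop]


final_pop, final_fit = genetic_search(20, 50, 13, toy_evaluate, toy_select, toy_vary)
```

Passing the three steps in as functions keeps the loop independent of how individuals are scored, selected and varied.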

53 Encoding CNN into Binary Codes
Input: the number of stages $S$ and the number of nodes $m_s$ in each stage
Each stage is a DAG: node $j$ can receive information from node $i$ if $i<j$
  One bit denotes whether node $j$ takes input from node $i$
  A node sums up all its inputs and performs convolution
  There is a “source” node at the beginning, which performs convolution and feeds the result to all nodes without a predecessor; there is a “destination” node at the end, which collects from all nodes without a successor
Output: a binary vector of length $\sum_s \frac{1}{2} m_s (m_s-1)$
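A small sketch of how such a code could be decoded into per-node inputs. The bit ordering is an assumption: bits are grouped by receiving node, i.e. (1,2), then (1,3), (2,3), then (1,4), (2,4), (3,4), and so on, which matches the dash-separated way codes such as 1-11 are written in this deck. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def stage_code_length(num_nodes: int) -> int:
    """Number of bits for one stage: one bit per ordered pair i < j."""
    return num_nodes * (num_nodes - 1) // 2


def total_code_length(nodes_per_stage: Sequence[int]) -> int:
    """Length of the full binary code over all stages."""
    return sum(stage_code_length(m) for m in nodes_per_stage)


def decode_stage(bits: Sequence[int], num_nodes: int) -> Dict[int, List[int]]:
    """Map one stage's bits to {node j: list of nodes i < j feeding into j}."""
    if len(bits) != stage_code_length(num_nodes):
        raise ValueError("wrong number of bits for this stage")
    inputs: Dict[int, List[int]] = {j: [] for j in range(1, num_nodes + 1)}
    pos = 0
    for j in range(2, num_nodes + 1):   # receiving node
        for i in range(1, j):           # candidate input node, i < j
            if bits[pos]:
                inputs[j].append(i)
            pos += 1
    return inputs


chain = decode_stage([1, 0, 1], 3)      # code 1-01: 1 -> 2 -> 3
skip = decode_stage([1, 1, 1], 3)       # code 1-11: node 3 also takes input from node 1
```

Under this ordering, `total_code_length((3, 5))` gives 13 and `total_code_length((3, 4, 5))` gives 19.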

54 What is Encoded?
What is encoded:
  Connection between layers in the same stage
What is not encoded:
  Network weights
  The number of filters at each layer
  Geometric information such as stride and size
  Other layers such as pooling and activation
  Fully-connected stages

55 Example of CNN Encoding
[Figure: a two-stage example of the encoding area. INPUT (32×32×3); Stage 1 with nodes A0 to A5, followed by POOL1 (16×16×32); Stage 2 with nodes B0 to B6, followed by POOL2 (8×8×64); each stage is labeled with its code]

56 Relationship to Popular Nets
What can be encoded:
  Chain nets (e.g., VGGNet)
  Highway nets (e.g., ResNet)
  DenseNet [Huang et al., 2016]
What cannot be encoded:
  Multi-scale nets (e.g., GoogLeNet, a.k.a. Inception)
  Tricky modules (e.g., MaxOut)
[Figure: conv-layer diagrams of VGGNet ($K=4$) and ResNet ($K=3$, code: 1-11)]

57 Notations
$N$: # of individuals, $T$: # of rounds
$S$: # of stages, $K_s$: # of nodes at the $s$-th stage, $L=\sum_s \frac{1}{2}K_s(K_s-1)$: # of bits
$\mathbb{M}_{t,n}$: the $n$-th individual in the $t$-th round
$b_{t,n}^{l}\in\{0,1\}$: the $l$-th bit in $\mathbb{M}_{t,n}$
$r_{t,n}$: the fitness function value of $\mathbb{M}_{t,n}$

58 Initialization
$b_{0,n}^{l}\sim\mathcal{B}(0.5)$, $l=1,2,\cdots,L$
We shall see later that initialization has little impact on the genetic process

59 Selection
An individual is more likely to be selected if it produces better recognition performance
  The probability of selecting $\mathbb{M}_{t,n}$ is proportional to $r_{t,n}-\min_{n'} r_{t,n'}$
A Russian roulette process
  The worst individual is always eliminated, and some good individuals may be selected multiple times
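The selection rule can be sketched with the standard library's weighted sampling. This is an illustration only; `roulette_select` is a hypothetical name, and the uniform fallback when all fitness values are equal is an assumption the slide does not cover.

```python
import random
from typing import List, Sequence


def roulette_select(population: Sequence[List[int]], fitness: Sequence[float],
                    rng: random.Random) -> List[List[int]]:
    """Draw len(population) individuals with probability proportional to r - min(r)."""
    worst = min(fitness)
    weights = [r - worst for r in fitness]
    if sum(weights) == 0:
        # Assumption: with identical fitness everywhere, fall back to uniform sampling.
        weights = [1.0] * len(population)
    chosen = rng.choices(population, weights=weights, k=len(population))
    return [list(ind) for ind in chosen]  # copies, so later edits do not alias


population = [[0, 0], [0, 1], [1, 1]]
fitness = [0.5, 0.9, 0.7]
selected = roulette_select(population, fitness, random.Random(0))
```

Because the worst individual gets weight zero, it can never be drawn, while an individual with high fitness may appear several times in `selected`.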

60 Crossover and Mutation
Enumerating each pair: crossover is performed with probability $p_\mathrm{C}$; if not used for crossover, mutation is performed with probability $p_\mathrm{M}$
  Crossover: switching each stage (multiple bits) with probability $q_\mathrm{C}$
  Mutation: flipping each bit with probability $q_\mathrm{M}$
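A sketch of the two operators and of how they could be applied to a population. Pairing consecutive individuals and applying mutation to each member of a pair independently are assumptions, since the slide does not specify them; the function names are hypothetical, and `stage_lengths` lists the number of bits in each stage.

```python
import random
from typing import List, Sequence, Tuple


def crossover(a: Sequence[int], b: Sequence[int], stage_lengths: Sequence[int],
              q_c: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Switch each stage (a contiguous block of bits) with probability q_c."""
    a, b = list(a), list(b)
    start = 0
    for length in stage_lengths:
        end = start + length
        if rng.random() < q_c:
            a[start:end], b[start:end] = b[start:end], a[start:end]
        start = end
    return a, b


def mutate(ind: Sequence[int], q_m: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability q_m."""
    return [bit ^ 1 if rng.random() < q_m else bit for bit in ind]


def vary(population: Sequence[Sequence[int]], stage_lengths: Sequence[int],
         p_c: float, q_c: float, p_m: float, q_m: float,
         rng: random.Random) -> List[List[int]]:
    out = [list(ind) for ind in population]
    # Assumption: pairs are consecutive individuals; an unpaired last one is left as is.
    for k in range(0, len(out) - 1, 2):
        if rng.random() < p_c:
            out[k], out[k + 1] = crossover(out[k], out[k + 1], stage_lengths, q_c, rng)
        else:
            for idx in (k, k + 1):
                if rng.random() < p_m:
                    out[idx] = mutate(out[idx], q_m, rng)
    return out
```

With two stages of 3 and 5 nodes, `stage_lengths` would be `(3, 10)`, so crossover exchanges either the first 3 bits, the last 10 bits, both, or neither.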

61 Evaluation
A training-from-scratch process on $\mathbb{M}_{t,n}$
  If $\mathbb{M}_{t,n}$ has been evaluated before, it is evaluated once again and the average accuracy is kept
  To guarantee that the testing data remain unseen, we partition the original training set into two subsets (training and validation)
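The two bookkeeping steps on this slide can be sketched as follows. Both helpers are hypothetical: `FitnessCache` keeps the running average accuracy per code, and `split_train_val` draws a random hold-out set; the actual training run that produces each accuracy is outside the sketch.

```python
import random
from typing import Dict, List, Sequence, Tuple


class FitnessCache:
    """Keeps the average accuracy over all evaluations of the same code."""

    def __init__(self) -> None:
        self._stats: Dict[Tuple[int, ...], Tuple[float, int]] = {}

    def update(self, code: Sequence[int], accuracy: float) -> float:
        key = tuple(code)
        total, count = self._stats.get(key, (0.0, 0))
        total, count = total + accuracy, count + 1
        self._stats[key] = (total, count)
        return total / count  # average over all runs seen so far


def split_train_val(n_samples: int, n_val: int,
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly hold out n_val sample indices for validation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_val:], indices[:n_val]


cache = FitnessCache()
first = cache.update([1, 0, 1], 0.9)    # first evaluation of this code
second = cache.update([1, 0, 1], 0.7)   # same code evaluated again: average is kept
train_idx, val_idx = split_train_val(60000, 10000, random.Random(0))
```

Fitness is then always measured on the validation indices, so the test set plays no part in the search.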

62 Outline
Introduction
Designing CNN Structures
Genetic CNN
Experiments
Discussions and Conclusions

63 MNIST Experiments
The MNIST dataset
  10 classes, 60,000 training images (50,000 for training, 10,000 for validation) and 10,000 testing images
Network setting
  $S=2$, $(K_1,K_2)=(3,5)$
  $L=13$, $2^L=8192$
  $N=20$ (individuals), $T=50$ (rounds)
  $p_\mathrm{M}=0.8$, $q_\mathrm{M}=0.1$, $p_\mathrm{C}=0.2$, $q_\mathrm{C}=0.3$

64 MNIST Results
Columns: Gen, Max %, Min %, Avg %, Med %, Std-D %
Surviving values per generation:
Gen 00: 99.59 99.38 99.50 0.06
Gen 01: 99.61 99.40 99.53 99.54 0.05
Gen 02: 99.62 99.43 99.55 99.58
Gen 03: 99.56
Gen 05: 99.46 99.57 0.04
Gen 08: 99.63 99.60
Gen 10:
Gen 20: 99.45
Gen 30: 99.64 99.49
Gen 50: 99.66 99.51 99.65

65 CIFAR10 Experiments
The CIFAR10 dataset
  10 classes, 50,000 training images (40,000 for training, 10,000 for validation) and 10,000 testing images
Network setting
  $S=3$, $(K_1,K_2,K_3)=(3,4,5)$
  $L=19$, $2^L=524288$
  $N=20$ (individuals), $T=50$ (rounds)
  $p_\mathrm{M}=0.8$, $q_\mathrm{M}=0.05$, $p_\mathrm{C}=0.2$, $q_\mathrm{C}=0.2$

66 CIFAR10 Results
Columns: Gen, Max %, Min %, Avg %, Med %, Std-D %
Surviving values per generation:
Gen 00: 75.96 71.81 74.39 74.53 0.91
Gen 01: 73.93 75.01 75.17 0.57
Gen 02: 73.95 75.32 75.48
Gen 03: 76.06 73.47 75.37 75.62 0.70
Gen 05: 76.24 72.60 75.65 0.89
Gen 08: 76.59 74.75 75.77 75.86 0.53
Gen 10: 76.72 73.92 75.68 75.80 0.88
Gen 20: 76.83 74.91 76.45 76.79 0.61
Gen 30: 76.95 74.38 76.42 76.53 0.46
Gen 50: 77.06 75.84 76.58 76.81 0.55

67 Diagnosis: Initialization Issues
Is the genetic process sensitive to initialization?

68 Diagnosis: Rationality
Do strong parents generate strong children?

69 Designed CNN Structures
[Figure: the learned structures (nodes 1 to 6), annotated with chain-shaped networks (AlexNet, VGGNet), multiple-path networks (GoogLeNet) and highway networks (Deep ResNet); surviving code label: 1-01]
Two independent genetic processes are performed
  The best individuals after the final round are shown
  A little surprisingly, the learned network structures are similar in the two independent genetic processes

70 Transferring to Other Datasets
Using a basic structure learned from VGGNet
  For small datasets, 3 learned stages followed by fully-connected layers
    64, 128, 256 filters at the 3 stages
  For ILSVRC2012, 2 fixed down-sampling stages followed by 3 learned stages followed by fully-connected layers
    256, 512, 512 filters at the 3 stages

71 Experiments: SVHN and CIFAR
Network | SVHN | CIFAR10 | CIFAR100
DSN [Lee et al., 2014] | 1.92 | 7.97 | 34.57
Gener. Pooling [Lee et al., 2016] | 1.69 | 6.05 | 32.37
WideResNet [Zagoruyko, 2016] | 1.85 | 5.37 | 24.53
StocNet [Huang et al., 2016] | 1.75 | 5.25 | 24.98
DenseNet [Huang et al., 2016] | 1.59 | 3.74 | 19.25
GeNet #1, after Gen-00 | 2.25 | 8.18 | 31.46
GeNet #1, after Gen-05 | 2.15 | 7.67 | 30.17
GeNet #1, after Gen-20 | 2.05 | 7.36 | 29.63
GeNet #1, after Gen-50 | 1.99 | 7.19 | 29.03
GeNet #2, after Gen-50 | 1.97 | 7.10 | 29.05

72 Experiments: ILSVRC12
Network | Top-1 | Top-5 | Depth
AlexNet [Krizhevsky et al., 2012] | 42.6 | 19.6 | 8
GoogLeNet [Szege. et al., 2016] | 34.2 | 12.9 | 22
VGGNet-16 [Simon. et al., 2016] | 28.5 | 9.9 | 16
VGGNet-19 [Simon. et al., 2016] | 28.7 | | 19
ResNet-50 [He et al., 2016] | 24.6 | 7.7 | 50
ResNet-101 [He et al., 2016] | 23.4 | 7.0 | 101
ResNet-152 [He et al., 2016] | 23.0 | 6.7 | 152
GeNet #1 | 28.12 | 9.95 |
GeNet #2 | 27.87 | 9.74 |

73 Outline
Introduction
Designing CNN Structures
Genetic CNN
Experiments
Discussions and Conclusions

74 Limitations
The genetic process is very slow
The explored network structures are still of limited flexibility
Our approach has not been evaluated in the scenario of very deep networks (hundreds of layers)
Our approach cannot symbiotically learn network structure and network weights

75 Conclusions
A genetic process to explore CNN structures
  Foundation: a CNN encoding scheme
  Fact: the “genes” in strong individuals
  Efficient genetic operations are performed
Much future work remains
  Increasing the depth of the networks
  Adding more network modules
  Incorporating the learning of network weights

76 Thank you! Questions please?

