Genetic CNN (CVPR 2017, in submission)
Speaker: Lingxi Xie
Authors: Lingxi Xie, Alan Yuille
Department of Computer Science, The Johns Hopkins University
Outline
- Introduction
- Designing CNN Structures
- Genetic CNN
- Experiments
- Discussions and Conclusions
Lingxi Xie (谢凌曦)
Education Background
- Bachelor in Engineering, Tsinghua University, 2010
- Ph.D. in Engineering, Tsinghua University, 2015
Working Experience
- Visiting Student, the University of Texas at San Antonio, 2014 (supervisor: Prof. Qi Tian)
- Research Intern, Microsoft Research Asia, 2013-2015 (supervisor: Dr. Jingdong Wang)
- Postdoc Researcher, the Johns Hopkins University, 2015-present (supervisor: Prof. Alan Yuille)
Introduction
Deep Learning
- The current state of the art in machine learning
- Uses a cascade of many layers of non-linear neurons for feature extraction and transformation
- Learns multiple levels of feature representation
- Higher-level features are derived from lower-level features, forming a hierarchical architecture
- Multiple levels of representation correspond to different levels of abstraction
Introduction (cont.)
The Convolutional Neural Network (CNN)
- A fundamental machine learning tool
- Strong performance on a wide range of problems in computer vision as well as other research areas
- Powering advances in many real-world applications
- Theory: a multi-layer, hierarchical network often has a larger capacity, but also requires a larger amount of data to train
Designing CNN Structures
History
- From linear to non-linear
- From shallow to deep
- From fully-connected to convolutional
Today
- A cascade of various types of non-linear units
- Typical units: convolution, pooling, activation, etc.
Example Networks
LeNet [LeCun et al., 1998]
Example Networks (cont.)
AlexNet [Krizhevsky et al., 2012]
Example Networks (cont.)
Other deep networks
- VGGNet [Simonyan et al., 2014]
- GoogLeNet (Inception) [Szegedy et al., 2014]
- Deep ResNet [He et al., 2016]
- DenseNet [Huang et al., 2016]
Problem
All these network architectures are fixed
- This limits the flexibility and complexity of the networks
- There are examples such as the Stochastic Network [Huang et al., 2016], which allows the network to skip some layers during training, but we point out that this is still a fixed structure with a stochastic training strategy
General Idea
- Modeling a large family of CNN architectures as a solution space
  - In this work, each architecture is encoded into a binary string of fixed length
- Using an efficient search algorithm to explore good candidates
  - In this work, the genetic algorithm is used
The Genetic Algorithm
- A metaheuristic inspired by the process of natural selection, belonging to the larger class of evolutionary algorithms
- Commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection
The Genetic Algorithm (cont.)
Typical requirements of a genetic process
- A genetic representation of each individual (a sample in the solution space)
- A function to evaluate each individual (a cost or fitness function)
The Genetic Algorithm (cont.)
Flowchart of a genetic process (sketched in code below)
- Initialization: generating a population of individuals to start with
- Selection: determining which individuals survive
- Genetic operations: crossover, mutation, etc.
- Iteration: repeating the selection and genetic-operation steps several times, ending the process when a stopping condition holds
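In code, the flowchart reduces to a short loop. The sketch below is a generic skeleton, not the implementation used in this work; `random_individual`, `fitness`, `crossover`, and `mutate` are hypothetical problem-specific callbacks, and plain truncation selection stands in for whatever selection rule a given application uses.

```python
import random

def run_genetic_process(pop_size, generations,
                        random_individual, fitness, crossover, mutate):
    """Generic genetic process: initialize, then iterate
    selection -> genetic operations -> evaluation."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation: rank every individual by its fitness.
        ranked = sorted(population, key=fitness, reverse=True)
        # Selection: keep the better half as parents (truncation selection).
        parents = ranked[:max(2, pop_size // 2)]
        # Genetic operations: breed offspring until the population is refilled.
        population = list(parents)
        while len(population) < pop_size:
            a, b = random.sample(parents, 2)
            population.append(mutate(crossover(a, b)))
    return population
```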
The Genetic Algorithm (cont.)
Example: the Traveling Salesman Problem (TSP)
- Finding the shortest Hamiltonian path over 𝑁 towns
A typical genetic algorithm for TSP (sketched below)
- Genetic representation: a permutation of the 𝑁 towns
- Cost function: the total length of the current path
- Crossover: exchanging sub-sequences between two paths
- Mutation: swapping the positions of two towns in a path
- Termination: after a fixed number of generations
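A minimal sketch of these operators, assuming `dist` is a distance matrix indexed by town. One hedge: naively swapping sub-sequences between two permutations can produce invalid tours, so the crossover below uses the standard order-crossover (OX) variant, which exchanges a sub-sequence while keeping the child a valid permutation.

```python
import random

def path_length(path, dist):
    """Cost function: total length of the Hamiltonian path."""
    return sum(dist[path[i]][path[i + 1]] for i in range(len(path) - 1))

def mutate_swap(path):
    """Mutation: swap the positions of two randomly chosen towns."""
    i, j = random.sample(range(len(path)), 2)
    child = list(path)
    child[i], child[j] = child[j], child[i]
    return child

def crossover_order(a, b):
    """Order crossover: copy a random slice of parent a, then fill the
    remaining positions with the missing towns in parent b's order."""
    i, j = sorted(random.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    kept = set(middle)
    rest = [town for town in b if town not in kept]
    return rest[:i] + middle + rest[i:]
```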
General Framework
Two requirements of the genetic algorithm
- A genetic representation: each CNN is encoded into a binary string of fixed length 𝐿
- An evaluation function: each network is trained from scratch and its accuracy is recorded
Note: the genetic algorithm is only used to generate network structures; the network weights are always trained from scratch!
General Framework (cont.)
- Input: # of individuals 𝑁, # of generations 𝑇, network configuration (to be detailed later), hyper-parameters (to be detailed later)
- Initialization: generating 𝑁 random individuals, and evaluating each one by training it from scratch
- Repeat the following process for 𝑇 rounds:
  - Selection: generating 𝑁 individuals with Russian roulette
  - Crossover and mutation: generating new individuals pairwise or singly
  - Evaluation: training each new individual from scratch
- Output: the population after 𝑇 rounds
A minimal sketch of this loop follows.
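The sketch below shows only the outer loop; `train_from_scratch`, `russian_roulette_select`, and `crossover_and_mutate` are placeholders for the steps detailed on the following slides. In the real system the evaluation step dominates the cost, since every individual is a CNN trained to convergence.

```python
import numpy as np

def genetic_cnn(N, T, L, train_from_scratch,
                russian_roulette_select, crossover_and_mutate):
    """Outer loop: the GA searches over L-bit architecture codes;
    network weights are never inherited, always re-trained."""
    rng = np.random.default_rng()
    # Initialization: N random individuals, each an L-bit string.
    population = (rng.random((N, L)) < 0.5).astype(np.uint8)
    fitness = np.array([train_from_scratch(ind) for ind in population])
    for _ in range(T):
        population = russian_roulette_select(population, fitness)
        population = crossover_and_mutate(population)
        fitness = np.array([train_from_scratch(ind) for ind in population])
    return population, fitness
```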
Encoding CNN into Binary Codes
- Input: the number of stages 𝑆 and the number of nodes 𝐾_𝑠 in each stage
- Each stage is a DAG structure: node 𝑗 can receive information from any node 𝑖 with 𝑖 < 𝑗
  - One bit denotes whether node 𝑗 takes input from node 𝑖
  - A node sums up all its inputs and performs convolution
- A "source" node at the beginning of each stage performs convolution and feeds its result to all nodes without a predecessor; a "destination" node at the end collects the outputs of all nodes without a follower
- Output: a binary vector of length 𝐿 = Σ_𝑠 𝐾_𝑠(𝐾_𝑠 − 1)/2 (a decoding sketch follows)
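A sketch of the decoding step. The bit ordering here is an assumption for illustration: for each node 𝑗 we read one bit per earlier node 𝑖 < 𝑗. The implicit source and destination nodes are recovered from the decoded graph rather than encoded.

```python
def decode_stage(bits, K):
    """Turn K*(K-1)//2 bits into per-node input lists for one stage.

    Assumed bit order: node 2's bit for node 1, then node 3's bits for
    nodes 1 and 2, and so on (1-based node indices)."""
    assert len(bits) == K * (K - 1) // 2
    in_edges = {1: []}
    pos = 0
    for j in range(2, K + 1):
        in_edges[j] = [i for i in range(1, j) if bits[pos + i - 1]]
        pos += j - 1
    # Nodes with no inputs read from the stage's source node; nodes that
    # feed no other node send their output to the destination node.
    feeds_someone = {i for inputs in in_edges.values() for i in inputs}
    from_source = [j for j, inputs in in_edges.items() if not inputs]
    to_destination = [j for j in in_edges if j not in feeds_someone]
    return in_edges, from_source, to_destination
```

Under this assumed ordering, `decode_stage([1, 1, 1], 3)` yields a fully connected 𝐾 = 3 stage, matching the 1-11 code that a later slide attaches to a ResNet-like stage.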
What is Encoded?
What is encoded:
- Connections between layers in the same stage
What is not encoded:
- Network weights
- The number of filters at each layer
- Geometric information such as stride and size
- Other layers such as pooling and activation
- Fully-connected stages
Example of CNN Encoding
[Figure: a worked two-stage example. Stage 1 operates on the 32×32×3 input with nodes A0–A5 and ends with POOL1 (output 16×16×32); Stage 2 continues with nodes B0–B6 and ends with POOL2 (output 8×8×64). The highlighted encoding area covers only the inter-node connections within each stage.]
Relationship to Popular Nets
What can be encoded:
- Chain nets (e.g., VGGNet)
- Highway nets (e.g., ResNet)
- DenseNet [Huang et al., 2016]
What cannot be encoded:
- Multi-scale nets (e.g., GoogLeNet, a.k.a. Inception)
- Tricky modules (e.g., MaxOut)
[Figure: example codes for a VGGNet-style chain of conv layers (𝐾 = 4) and a ResNet-style stage (𝐾 = 3, code 1-11).]
Notations
- 𝑁: # of individuals; 𝑇: # of rounds
- 𝑆: # of stages; 𝐾_𝑠: # of nodes at the 𝑠-th stage; 𝐿 = Σ_𝑠 𝐾_𝑠(𝐾_𝑠 − 1)/2: # of bits (verified in the snippet below)
- 𝕄_{𝑡,𝑛}: the 𝑛-th individual in the 𝑡-th round
- 𝑏_{𝑡,𝑛}^{(𝑙)} ∈ {0,1}: the 𝑙-th bit of 𝕄_{𝑡,𝑛}
- 𝑟_{𝑡,𝑛}: the fitness value of 𝕄_{𝑡,𝑛}
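The bit count can be checked directly against the experimental settings that appear later in the deck (MNIST: (𝐾_1, 𝐾_2) = (3, 5) gives 𝐿 = 13; CIFAR10: (𝐾_1, 𝐾_2, 𝐾_3) = (3, 4, 5) gives 𝐿 = 19):

```python
def code_length(stage_sizes):
    """L = sum over stages of K_s * (K_s - 1) / 2."""
    return sum(k * (k - 1) // 2 for k in stage_sizes)

assert code_length([3, 5]) == 13       # MNIST setting: 2**13 = 8192 codes
assert code_length([3, 4, 5]) == 19    # CIFAR10 setting: 2**19 = 524288 codes
```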
Initialization
- 𝑏_{0,𝑛}^{(𝑙)} ~ ℬ(0.5), 𝑙 = 1, 2, ⋯, 𝐿: every bit is an independent Bernoulli(0.5) sample
- We shall see later that initialization has little impact on the genetic process
Selection
- An individual is more likely to be selected if it achieves better recognition performance
- The probability of selecting 𝕄_{𝑡,𝑛} is proportional to 𝑟_{𝑡,𝑛} − min_{𝑛′} 𝑟_{𝑡,𝑛′}
- A Russian roulette process (sketched below): the worst individual is always eliminated, and some good individuals may be selected multiple times
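A sketch of this selection step, assuming `fitness` is a NumPy array of the 𝑟_{𝑡,𝑛} values; the uniform fallback for an all-equal population is our addition, not from the slide.

```python
import numpy as np

def russian_roulette_select(population, fitness, rng=None):
    """Sample N individuals with probability proportional to
    r_n - min(r): the worst individual can never be chosen."""
    if rng is None:
        rng = np.random.default_rng()
    shifted = fitness - fitness.min()
    total = shifted.sum()
    if total == 0:                      # degenerate case: all fitnesses equal
        probs = np.full(len(fitness), 1.0 / len(fitness))
    else:
        probs = shifted / total
    idx = rng.choice(len(population), size=len(population), p=probs)
    return population[idx]
```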
Crossover and Mutation
- Enumerating each pair of individuals: crossover is performed with probability 𝑝_C; if a pair is not used for crossover, each of its individuals is mutated with probability 𝑝_M
- Crossover: exchanging each stage (a multi-bit unit) between the two parents with probability 𝑞_C
- Mutation: flipping each bit with probability 𝑞_M
Both operators are sketched below.
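A sketch of both operators, assuming individuals are NumPy bit arrays and `stage_slices` lists the bit range of each stage (e.g., a hypothetical `[slice(0, 3), slice(3, 13)]` for the MNIST setting). An even population size is assumed, as in the 𝑁 = 20 experiments.

```python
import numpy as np

def crossover_pair(a, b, stage_slices, qC, rng):
    """Crossover: exchange whole stages (multi-bit units), each with prob qC."""
    a, b = a.copy(), b.copy()
    for sl in stage_slices:
        if rng.random() < qC:
            a[sl], b[sl] = b[sl].copy(), a[sl].copy()
    return a, b

def mutate(ind, qM, rng):
    """Mutation: flip each bit independently with probability qM."""
    flips = rng.random(ind.shape) < qM
    return np.where(flips, 1 - ind, ind)

def crossover_and_mutate(population, stage_slices, pC, qC, pM, qM, rng=None):
    """Enumerate pairs: cross over with prob pC; otherwise each member
    of the pair mutates with prob pM."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for k in range(0, len(population), 2):   # assumes an even population size
        a, b = population[k], population[k + 1]
        if rng.random() < pC:
            a, b = crossover_pair(a, b, stage_slices, qC, rng)
        else:
            a = mutate(a, qM, rng) if rng.random() < pM else a
            b = mutate(b, qM, rng) if rng.random() < pM else b
        out.extend([a, b])
    return np.stack(out)
```

Bound to the hyper-parameters (e.g., via `functools.partial`), these functions slot into the outer loop sketched earlier.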
Evaluation
- A training-from-scratch process on 𝕄_{𝑡,𝑛}
- If 𝕄_{𝑡,𝑛} was evaluated previously, it is evaluated once again and the average accuracy is preserved (see the sketch below)
- To guarantee that the testing data remain unseen, we partition the original training set into two subsets (training and validation)
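A sketch of this evaluation rule: a small cache keyed by the bit string re-trains duplicates and returns the running average, damping the noise of any single training run. `train_from_scratch` is again a stand-in for the full train-on-training-set, score-on-validation-set procedure.

```python
class AveragedEvaluator:
    """Re-evaluate repeated individuals and keep the average accuracy."""

    def __init__(self, train_from_scratch):
        self.train_from_scratch = train_from_scratch
        self.history = {}        # bit-tuple -> list of observed accuracies

    def __call__(self, individual):
        key = tuple(int(b) for b in individual)
        # Each call trains anew, even for previously seen structures...
        accuracy = self.train_from_scratch(individual)
        self.history.setdefault(key, []).append(accuracy)
        # ...and the preserved fitness is the average over all runs.
        runs = self.history[key]
        return sum(runs) / len(runs)
```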
MNIST Experiments
The MNIST dataset
- 10 classes; 60,000 training images (50,000 for training, 10,000 for validation) and 10,000 testing images
Network setting
- 𝑆 = 2, (𝐾_1, 𝐾_2) = (3, 5)
- 𝐿 = 13, 2^𝐿 = 8192
- 𝑁 = 20 (individuals), 𝑇 = 50 (rounds)
- 𝑝_M = 0.8, 𝑞_M = 0.1, 𝑝_C = 0.2, 𝑞_C = 0.3
MNIST Results
Recognition accuracy (%) by generation. Several cells were lost in extraction (marked –); values whose column placement is ambiguous are placed on a best-effort, left-to-right basis.

Gen | Max % | Min % | Avg % | Med % | Std-D %
  0 | 99.59 | 99.38 | 99.50 | –     | 0.06
  1 | 99.61 | 99.40 | 99.53 | 99.54 | 0.05
  2 | 99.62 | 99.43 | 99.55 | 99.58 | –
  3 | –     | –     | 99.56 | –     | –
  5 | –     | 99.46 | 99.57 | –     | 0.04
  8 | 99.63 | –     | 99.60 | –     | –
 10 | –     | –     | –     | –     | –
 20 | –     | 99.45 | –     | –     | –
 30 | 99.64 | 99.49 | –     | –     | –
 50 | 99.66 | 99.51 | 99.65 | –     | –
Diagnosis: Rationality
Do strong parents generate strong children?
Diagnosis: Initialization Issues
Is the genetic process sensitive to initialization?
CIFAR10 Experiments
The CIFAR10 dataset
- 10 classes; 50,000 training images (40,000 for training, 10,000 for validation) and 10,000 testing images
Network setting
- 𝑆 = 3, (𝐾_1, 𝐾_2, 𝐾_3) = (3, 4, 5)
- 𝐿 = 19, 2^𝐿 = 524288
- 𝑁 = 20 (individuals), 𝑇 = 50 (rounds)
- 𝑝_M = 0.8, 𝑞_M = 0.05, 𝑝_C = 0.2, 𝑞_C = 0.2
CIFAR10 Results
Recognition accuracy (%) by generation (cells lost in extraction marked –).

Gen | Max % | Min % | Avg % | Med % | Std-D %
  0 | 75.96 | 71.81 | 74.39 | 74.53 | 0.91
  1 | –     | 73.93 | 75.01 | 75.17 | 0.57
  2 | –     | 73.95 | 75.32 | 75.48 | –
  3 | 76.06 | 73.47 | 75.37 | 75.62 | 0.70
  5 | 76.24 | 72.60 | 75.65 | –     | 0.89
  8 | 76.59 | 74.75 | 75.77 | 75.86 | 0.53
 10 | 76.72 | 73.92 | 75.68 | 75.80 | 0.88
 20 | 76.83 | 74.91 | 76.45 | 76.79 | 0.61
 30 | 76.95 | 74.38 | 76.42 | 76.53 | 0.46
 50 | 77.06 | 75.84 | 76.58 | 76.81 | 0.55
Designed CNN Structures
[Figure: the discovered structures, shown with their stage codes next to the hand-designed patterns they resemble: chain-shaped networks (AlexNet, VGGNet), multiple-path networks (GoogLeNet), and highway networks (Deep ResNet).]
- Two independent genetic processes are performed
- The best individuals after the final round are shown
- Somewhat surprisingly, the network structures learned in the two independent processes are similar
Transferring to Other Datasets
Using a basic structure modeled on VGGNet
- For small datasets: 3 learned stages followed by fully-connected layers, with 64, 128, 256 filters at the 3 stages
- For ILSVRC2012: 2 fixed down-sampling stages, followed by 3 learned stages, followed by fully-connected layers, with 256, 512, 512 filters at the 3 learned stages
Experiments: SVHN and CIFAR
Error rates (%); lower is better.

Network | SVHN | CIFAR10 | CIFAR100
DSN [Lee et al., 2014] | 1.92 | 7.97 | 34.57
Gener. Pooling [Lee et al., 2016] | 1.69 | 6.05 | 32.37
WideResNet [Zagoruyko, 2016] | 1.85 | 5.37 | 24.53
StocNet [Huang et al., 2016] | 1.75 | 5.25 | 24.98
DenseNet [Huang et al., 2016] | 1.59 | 3.74 | 19.25
GeNet #1, after Gen-00 | 2.25 | 8.18 | 31.46
GeNet #1, after Gen-05 | 2.15 | 7.67 | 30.17
GeNet #1, after Gen-20 | 2.05 | 7.36 | 29.63
GeNet #1, after Gen-50 | 1.99 | 7.19 | 29.03
GeNet #2, after Gen-50 | 1.97 | 7.10 | 29.05
Experiments: ILSVRC12
Top-1 / Top-5 error rates (%); cells lost in extraction marked –.

Network | Top-1 | Top-5 | Depth
AlexNet [Krizhevsky et al., 2012] | 42.6 | 19.6 | 8
GoogLeNet [Szegedy et al., 2016] | 34.2 | 12.9 | 22
VGGNet-16 [Simonyan et al., 2016] | 28.5 | 9.9 | 16
VGGNet-19 [Simonyan et al., 2016] | 28.7 | – | 19
ResNet-50 [He et al., 2016] | 24.6 | 7.7 | 50
ResNet-101 [He et al., 2016] | 23.4 | 7.0 | 101
ResNet-152 [He et al., 2016] | 23.0 | 6.7 | 152
GeNet #1 | 28.12 | 9.95 | –
GeNet #2 | 27.87 | 9.74 | –
Limitations
- The genetic process is very slow
- The explored network structures are still of limited flexibility
- Our approach has not been evaluated in the scenario of very deep networks (hundreds of layers)
- Our approach cannot jointly learn the network structure and the network weights
Conclusions
A genetic process to explore CNN structures
- Foundation: a CNN encoding scheme
- Strong individuals carry strong "genes", which efficient genetic operations preserve
A lot of future work remains
- Increasing the depth of the networks
- Adding more network modules
- Incorporating the learning of network weights
Thank you! Questions, please?