Published by Caio di Azevedo. Modified over 5 years ago.
1
Learning Deep Compositional Grammatical Architectures for Visual Recognition
Xilai Li, Tianfu Wu, and Xi Song. CVPR 2019. Presented by Dingquan Li.
Notes on the authors:
Xilai Li (李曦来): EE -> CS; AND-OR graph.
Tianfu Wu: PhD under Song-Chun Zhu.
Xi Song: student of Yunde Jia; CVPR 2013 AND-OR graph work with Song-Chun Zhu.
2
Outline
Contributions
Motivation and Objective
Method Overview
AOGNets
Experiments
Conclusions
4
Contributions
The first work to use (AND-OR) grammar models in network engineering, which facilitates both feature exploration and exploitation in a hierarchical and compositional way.
Better performance than state-of-the-art networks in image classification and object detection.
6
Motivation and Objective
DLA, …
Can we unify the best practices developed in the popular networks?
Can we generate building blocks, and thus networks, in a principled way?
Yes, by compositional grammatical architectures!
8
AOG Building Block
The phrase structure grammar: Terminal-nodes, AND-nodes, and OR-nodes.
The dependency grammar: models lateral connections.
The hierarchy facilitates a gradual increase of feature channels, as in Deep Pyramid ResNets [20], and also leads to a good balance between the depth and width of networks.
The compositional structure provides much more flexible information flows than DPN [7] and DLA [69].
The lateral connections induce feature diversity and increase the effective depth of nodes along the path without introducing extra parameters.
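The phrase structure grammar can be sketched as a small graph builder. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the block input is split into n primitives, that every contiguous sub-sequence (i, j) gets one Terminal-node and one OR-node, and that every binary split of a sub-sequence gets one AND-node. The function name `build_aog` and the tuple encoding of nodes are hypothetical, and the lateral connections of the dependency grammar are not modeled.

```python
from collections import Counter


def build_aog(n):
    """Enumerate nodes and child edges of a full AND-OR graph over n primitives.

    Hypothetical node encoding, for illustration only:
      ("T", i, j)      Terminal-node covering primitives i..j
      ("OR", i, j)     OR-node: alternative ways to form primitives i..j
      ("AND", i, k, j) AND-node: composition of i..k and k+1..j
    Returns a dict mapping each node to the list of its children.
    """
    children = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            terminal = ("T", i, j)
            children[terminal] = []  # terminals are leaves
            alternatives = [terminal]
            for k in range(i, j):  # every binary split point
                and_node = ("AND", i, k, j)
                children[and_node] = [("OR", i, k), ("OR", k + 1, j)]
                alternatives.append(and_node)
            children[("OR", i, j)] = alternatives
    return children


aog = build_aog(4)
counts = Counter(node[0] for node in aog)
print(dict(counts))  # each of "T", "OR", "AND" appears 10 times for n = 4
```

Under these assumptions the root OR-node `("OR", 0, n - 1)` chooses between using the whole input directly (its Terminal-node) and composing it from two smaller parts (its AND-nodes), which is where the hierarchy and the shared sub-structures come from.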
9
Nodes in AOG Building Block
Terminal-nodes implement the split-transform heuristic.
AND-nodes implement DenseNet-like aggregation (i.e., concatenation) for feature exploration.
OR-nodes implement ResNet-like aggregation (i.e., summation) for feature exploitation.
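The three node types can be illustrated on a toy feature vector. This is a minimal sketch, assuming a feature map is reduced to a flat list of channel values and that every learned transformation is replaced by a placeholder function; the names `terminal`, `and_node`, `or_node`, and `scale` are hypothetical and not taken from the AOGNet code.

```python
def terminal(x, i, j, n, transform):
    """Split: take the channel slice for primitives i..j (of n), then transform it."""
    width = len(x) // n  # channels per primitive
    return transform(x[i * width:(j + 1) * width])


def and_node(left, right):
    """DenseNet-like aggregation: concatenate the two child features."""
    return left + right


def or_node(children):
    """ResNet-like aggregation: element-wise sum of same-sized child features."""
    return [sum(values) for values in zip(*children)]


def scale(v):
    """Placeholder for a learned transformation (assumption, not the real op)."""
    return [2.0 * c for c in v]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # 8 channels, 2 primitives
left = or_node([terminal(x, 0, 0, 2, scale)])   # OR-node with one child
right = or_node([terminal(x, 1, 1, 2, scale)])
composed = and_node(left, right)                # concatenation restores 8 channels
root = or_node([terminal(x, 0, 1, 2, scale), composed])
print(root)  # [4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
```

Concatenation at the AND-node restores the full channel width, so the OR-node can sum it with the Terminal-node that covers the same range of primitives.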
10
Node Operations in AOG Block
12
AOGNet
13
Simplifying AOG Building Blocks
15
Experiments
Image Classification: CIFAR-10, CIFAR-100, ImageNet-1K
Object Detection: PASCAL VOC 2007, PASCAL VOC 2012
Ablation Study
17
CIFAR-10 and CIFAR-100
Model naming: AOGNet-PrimitiveSize-(#AOG blocks per stage)-[OutputFeatDim]
Note on FLOPs: FLOPS (also written flops or flop/s) usually means floating point operations per second, a measure of hardware speed. In the table, FLOPs appears to denote the number of floating point operations, a measure of model cost.
Pooling contains no parameters but still costs floating point operations; Conv has a larger FLOPs/#Params ratio than FC.
FLOPs/#Params: Pooling/ReLU/… > Conv > FC
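The FLOPs/#Params ordering can be checked with simple counting formulas. This is a minimal sketch, assuming one multiply-accumulate counts as one operation, biases are ignored, and pooling costs one operation per window element; the function names and the layer sizes in the example are hypothetical and are not taken from the tables in the paper.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out):
    """Return (params, ops) for a k x k convolution without bias."""
    params = k * k * c_in * c_out
    ops = params * h_out * w_out  # every weight is reused at each output position
    return params, ops


def fc_cost(n_in, n_out):
    """Return (params, ops) for a fully connected layer without bias."""
    params = n_in * n_out
    return params, params  # every weight is used exactly once


def pool_cost(channels, k, h_out, w_out):
    """Return (params, ops) for k x k pooling: no parameters, but nonzero ops."""
    return 0, channels * k * k * h_out * w_out


conv_params, conv_ops = conv2d_cost(c_in=64, c_out=64, k=3, h_out=32, w_out=32)
fc_params, fc_ops = fc_cost(n_in=512, n_out=10)
pool_params, pool_ops = pool_cost(channels=64, k=2, h_out=16, w_out=16)

print(conv_ops // conv_params)  # 1024, i.e. h_out * w_out
print(fc_ops // fc_params)      # 1
print(pool_params, pool_ops)    # 0 65536, so the ratio is unbounded
```

Under these assumptions the ratio for a convolution equals the number of output positions, the ratio for a fully connected layer is 1, and pooling has operations with zero parameters. This is why two networks with similar parameter counts can differ widely in FLOPs.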
18
ImageNet-1K (cloud platforms)
19
ImageNet-1K (mobile platforms)
21
PASCAL VOC 2007 and 2012
23
Ablation Study
RS: removing symmetric child nodes of OR-nodes in the pruned AOG building blocks.
LC: adding lateral connections for dependency grammars.
25
Conclusions
A method for learning deep compositional grammatical architectures that harness the best of both grammars and deep neural networks for visual recognition.
An implementation with AND-OR grammars, called AOGNets.
Promising performance on three image classification datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and two object detection datasets (PASCAL VOC 2007 and 2012).
26
Update!
AOGNets: Compositional Grammatical Architectures for Deep Learning (v3, maybe camera-ready)
Model Interpretability
Adversarial Defense
Object Detection and Segmentation in COCO
27
Model Interpretability
28
Adversarial Defense