Xilai Li, Tianfu Wu, and Xi Song CVPR 2019 Presented by Dingquan Li


1 Learning Deep Compositional Grammatical Architectures for Visual Recognition
Xilai Li, Tianfu Wu, and Xi Song. CVPR 2019. Presented by Dingquan Li.
Xilai Li (李曦来): EE -> CS, AND-OR graph.
Tianfu Wu: PhD under Song-Chun Zhu.
Xi Song: student of Yunde Jia; CVPR 2013 AND-OR graph work with Song-Chun Zhu.

2 Outline
Contributions; Motivation and Objective; Method Overview; AOGNets; Experiments; Conclusions

4 Contributions
The first work to use (AND-OR) grammar models in network engineering, which facilitates both feature exploration and exploitation in a hierarchical and compositional way.
Better performance than state-of-the-art networks in image classification and object detection.

6 Motivation and Objective
DLA, …
Can the best practices developed in popular networks be unified? Can building blocks, and thus networks, be generated in a principled way? The proposed answer: compositional grammatical architectures.

8 AOG Building Block
The phrase structure grammar has three node types: Terminal, AND, and OR. The dependency grammar models lateral connections.
The hierarchy facilitates a gradual increase of feature channels, as in Deep Pyramid ResNets [20], and also leads to a good balance between the depth and width of networks.
The compositional structure provides much more flexible information flows than DPN [7] and DLA [69].
The lateral connections induce feature diversity and increase the effective depth of nodes along the path without introducing extra parameters.
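To make the Terminal/AND/OR vocabulary more concrete, here is a minimal sketch of how an AND-OR graph over a sequence of n primitives might be enumerated. The rule used is an assumption for illustration and is not taken from the slides: every sub-sequence gets one OR-node, whose alternatives are a Terminal-node covering the whole sub-sequence and one AND-node per binary split. The function name `build_aog` is hypothetical, and lateral connections are not modeled.

```python
from collections import deque


def build_aog(n):
    """Enumerate the nodes of an AND-OR graph over n primitives.

    Assumed rule (illustrative only): each sub-sequence [i, j) has one
    OR-node whose children are a Terminal-node for [i, j) and one
    AND-node for every binary split [i, k) + [k, j).
    """
    or_nodes, and_nodes, terminal_nodes = set(), set(), set()
    queue = deque([(0, n)])  # breadth-first search from the root sub-sequence
    while queue:
        i, j = queue.popleft()
        if (i, j) in or_nodes:
            continue  # this sub-sequence was already expanded
        or_nodes.add((i, j))
        terminal_nodes.add((i, j))  # alternative 1: treat [i, j) as one unit
        for k in range(i + 1, j):  # alternative 2: compose two smaller parts
            and_nodes.add((i, k, j))
            queue.append((i, k))
            queue.append((k, j))
    return or_nodes, and_nodes, terminal_nodes


or_nodes, and_nodes, terminal_nodes = build_aog(4)
print(len(or_nodes), len(and_nodes), len(terminal_nodes))
```

Under this assumed rule, sub-sequences of length one have no split point, so their OR-node has only the Terminal-node as a child.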

9 Nodes in AOG Building Block
Terminal-nodes implement the split-transform heuristic.
AND-nodes implement DenseNet-like aggregation (i.e., concatenation) for feature exploration.
OR-nodes implement ResNet-like aggregation (i.e., summation) for feature exploitation.
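The three node types can be illustrated with a toy sketch that treats a feature as a plain Python list of channel values. All function names are hypothetical, and the Terminal-node's transform is left as an identity placeholder where a real block would presumably apply learned layers; only the split, concatenation, and summation steps named on the slide are shown.

```python
def terminal_node(features, start, end):
    """Split step: take the channel slice [start, end).

    The transform step is an identity placeholder here (assumption).
    """
    return list(features[start:end])


def and_node(left, right):
    """DenseNet-like aggregation: concatenate the two children's channels."""
    return left + right


def or_node(children):
    """ResNet-like aggregation: element-wise sum of equally sized children."""
    if len({len(child) for child in children}) != 1:
        raise ValueError("OR-node children must have the same number of channels")
    return [sum(values) for values in zip(*children)]


x = [1.0, 2.0, 3.0, 4.0]
# One alternative composes two halves (AND), the other takes the whole slice.
composed = and_node(terminal_node(x, 0, 2), terminal_node(x, 2, 4))
whole = terminal_node(x, 0, 4)
out = or_node([whole, composed])
print(out)  # [2.0, 4.0, 6.0, 8.0]
```

Concatenation grows the channel count while summation keeps it fixed, which is why the OR-node's children must agree in size.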

10 Nodes Operations in AOG Block

12 AOGNet

13 Simplifying AOG Building Blocks

15 Experiments
Image Classification: CIFAR-10, CIFAR-100, ImageNet-1K
Object Detection: PASCAL VOC 2007, PASCAL VOC 2012
Ablation Study

17 CIFAR-10 and CIFAR-100
Model naming: AOGNet-PrimitiveSize-(#AOG blocks per stage)-[OutputFeatDim].
Terminology: FLOPS (also written flops or flop/s) means floating point operations per second, whereas FLOPs in the table most likely denotes the number of floating point operations.
Pooling has no parameters but still costs floating point operations, and Conv has a larger FLOPs/#Params ratio than FC. Ordering by FLOPs/#Params: Pooling/ReLU/… > Conv > FC.
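The FLOPs/#Params ordering can be checked with a rough back-of-the-envelope count. This sketch assumes common simplifications that the slides do not state: one multiply-accumulate counts as one operation, biases are ignored, and a pooling window costs one operation per element it covers. The function names and layer sizes are hypothetical.

```python
def conv2d_cost(c_in, c_out, kernel, h_out, w_out):
    """Return (flops, params) for a square-kernel convolution, bias ignored."""
    params = c_in * c_out * kernel * kernel
    flops = params * h_out * w_out  # weights are reused at every output position
    return flops, params


def fc_cost(n_in, n_out):
    """Return (flops, params) for a fully connected layer, bias ignored."""
    params = n_in * n_out
    return params, params  # each weight is used exactly once


def pool_cost(channels, kernel, h_out, w_out):
    """Return (flops, params) for pooling: operations but no parameters."""
    return channels * kernel * kernel * h_out * w_out, 0


conv_flops, conv_params = conv2d_cost(64, 64, 3, 32, 32)
fc_flops, fc_params = fc_cost(512, 10)
pool_flops, pool_params = pool_cost(64, 2, 16, 16)

print(conv_flops // conv_params)  # 1024, the number of output positions
print(fc_flops // fc_params)      # 1
print(pool_flops, pool_params)    # 65536 0
```

Under these assumptions a convolution's ratio equals its output spatial size, a fully connected layer's ratio is 1, and pooling's ratio is unbounded because it has no parameters.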

18 ImageNet-1K (cloud platforms)

19 ImageNet-1K (mobile platforms)

21 PASCAL VOC 2007 and 2012

23 Ablation Study
RS: Removing Symmetric child nodes of OR-nodes in the pruned AOG building blocks.
LC: adding Lateral Connections for dependency grammars.

25 Conclusions
A method for learning deep compositional grammatical architectures that can harness the best of both grammars and deep neural networks for visual recognition.
An implementation with AND-OR grammars, called AOGNets.
Promising performance on three image classification datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and two object detection datasets (PASCAL VOC 2007 and 2012).

26 Update!
AOGNets: Compositional Grammatical Architectures for Deep Learning (v3, maybe camera-ready)
Model Interpretability; Adversarial Defense; Object Detection and Segmentation in COCO

27 Model Interpretability

28 Adversarial Defense

