Learning Deep Compositional Grammatical Architectures for Visual Recognition. Xilai Li, Tianfu Wu, and Xi Song, CVPR 2019. Presented by Dingquan Li. About the authors: Xilai Li (李曦来): EE -> CS, AND-OR graphs. Tianfu Wu: PhD with Song-Chun Zhu. Xi Song: student of Yunde Jia; CVPR 2013 paper on AND-OR graphs with Song-Chun Zhu.
Outline: Contributions; Motivation and Objective; Method Overview; AOGNets; Experiments; Conclusions
Contributions: The first work that uses (AND-OR) grammar models in network engineering, facilitating both feature exploration and feature exploitation in a hierarchical and compositional way. Better performance than state-of-the-art networks on image classification and object detection.
Motivation and Objective: DLA, … Can we unify the best practices developed in popular networks? Can we generate building blocks, and thus networks, in a principled way? Answer: by compositional grammatical architectures!
AOG Building Block: The phrase structure grammar defines Terminal-, AND-, and OR-nodes; the dependency grammar models lateral connections. The hierarchy facilitates a gradual increase of feature channels, as in Deep Pyramidal ResNets [20], and also leads to a good balance between the depth and width of networks. The compositional structure provides much more flexible information flows than DPN [7] and DLA [69]. The lateral connections induce feature diversity and increase the effective depth of nodes along the path without introducing extra parameters.
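To make the phrase structure grammar concrete, here is a minimal sketch that enumerates a full AOG over a sequence of `num_primitives` channel groups. It assumes (our reading, not the authors' code) that every sub-span `[i, j]` gets a Terminal-node, that every span longer than one primitive also gets an OR-node, and that each OR-node's children are its own Terminal-node plus one AND-node per binary split point. The function name `build_aog` and the tuple-based node encoding are hypothetical.

```python
def build_aog(num_primitives):
    """Enumerate a full AND-OR graph over primitives 0..num_primitives-1.

    Hypothetical encoding: spans are inclusive (i, j) tuples; an AND-node
    is the pair of sub-spans it combines; an OR-node maps its span to its
    list of children.
    """
    terminals = []   # one Terminal-node per span
    and_nodes = []   # one AND-node per (span, split point)
    or_nodes = {}    # span -> children, only for spans longer than 1
    for length in range(1, num_primitives + 1):
        for i in range(num_primitives - length + 1):
            j = i + length - 1
            terminals.append((i, j))
            if length == 1:
                continue  # a single primitive is just a Terminal-node
            children = [("T", (i, j))]
            for m in range(i, j):
                split = ((i, m), (m + 1, j))  # left and right sub-spans
                and_nodes.append(split)
                children.append(("AND", split))
            or_nodes[(i, j)] = children
    return terminals, and_nodes, or_nodes


terminals, and_nodes, or_nodes = build_aog(4)
print(len(terminals), len(and_nodes), len(or_nodes))  # 10 10 6
print(or_nodes[(0, 3)])  # root OR-node: its Terminal-node plus 3 AND-node splits
```

The root is the OR-node over the whole span `(0, num_primitives - 1)`. Under this encoding the number of AND-nodes grows roughly cubically with the number of primitives, which suggests why simplifying the building blocks matters.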
Nodes in AOG Building Blocks: Terminal-nodes implement the split-transform heuristic. AND-nodes implement DenseNet-like aggregation (i.e., concatenation) for feature exploration. OR-nodes implement ResNet-like aggregation (i.e., summation) for feature exploitation.
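The three aggregation rules can be sketched in plain Python, assuming a feature map is a list of channels and each channel is a flattened list of values. The element-wise ReLU in `terminal_op` is a hypothetical stand-in for the learned transform (e.g., a Conv-BN-ReLU unit) each Terminal-node would apply; all function names are illustrative.

```python
def relu(v):
    return v if v > 0 else 0.0


def terminal_op(x, span, channels_per_primitive):
    """Split-transform: slice the channels covered by `span`, then transform them.

    The per-element ReLU is a hypothetical stand-in for a learned conv unit.
    """
    i, j = span
    sub = x[i * channels_per_primitive:(j + 1) * channels_per_primitive]
    return [[relu(v) for v in channel] for channel in sub]


def and_op(left, right):
    """DenseNet-like aggregation: concatenate children along the channel axis."""
    return left + right


def or_op(*inputs):
    """ResNet-like aggregation: element-wise sum of same-shaped children."""
    return [[sum(vals) for vals in zip(*channels)] for channels in zip(*inputs)]


# Feature map with 4 channels (one per primitive), each flattened to 2 values.
x = [[1.0, -2.0], [3.0, 4.0], [-1.0, 0.5], [2.0, 2.0]]
t01 = terminal_op(x, (0, 1), 1)
a01 = and_op(terminal_op(x, (0, 0), 1), terminal_op(x, (1, 1), 1))
print(or_op(t01, a01))  # [[2.0, 0.0], [6.0, 8.0]]
```

Because the stand-in transform is element-wise, the AND-node and the Terminal-node produce identical outputs here; in a real block each node has its own learned weights, so the OR-node sums genuinely different features of the same shape.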
Node Operations in AOG Building Blocks
AOGNet
Simplifying AOG Building Blocks
Experiments: Image Classification (CIFAR-10, CIFAR-100, ImageNet-1K); Object Detection (PASCAL VOC 2007, PASCAL VOC 2012); Ablation Study
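CIFAR-10 and CIFAR-100. Model naming: AOGNet-PrimitiveSize-(#AOG blocks per stage)-[OutputFeatDim]. Note on units: FLOPS (also flops or flop/s) usually means floating point operations per second, but in the table FLOPs most likely denotes the number of floating point operations. Pooling has no parameters but still costs floating point operations, and a Conv layer has a larger FLOPs/#Params ratio than an FC layer because each convolution weight is reused at every output position. Hence FLOPs/#Params: Pooling/ReLU/… > Conv > FC.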
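The FLOPs/#Params ordering can be checked with a small worked example. This sketch assumes one multiply-accumulate (MAC) counts as one operation (some papers count two), ignores biases, and uses a CIFAR-sized 32x32 feature map; the function names and layer sizes are illustrative, not taken from the paper's tables.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a k x k conv layer, bias ignored."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is reused at each output position
    return params, macs


def fc_cost(n_in, n_out):
    """Weights and MACs of a fully connected layer, bias ignored."""
    params = n_in * n_out
    return params, params  # each weight is used exactly once per input vector


def max_pool_cost(channels, k, h_out, w_out):
    """Max pooling: no weights, but about k*k operations per output element."""
    return 0, channels * k * k * h_out * w_out


conv_p, conv_f = conv2d_cost(64, 64, 3, 32, 32)
fc_p, fc_f = fc_cost(256, 10)
pool_p, pool_f = max_pool_cost(64, 2, 16, 16)
print(conv_f / conv_p, fc_f / fc_p)  # 1024.0 1.0
print(pool_p, pool_f)                # 0 65536
```

The conv ratio equals the number of output positions (32 x 32 = 1024), the FC ratio is 1, and pooling has operations with zero parameters, giving Pooling > Conv > FC.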
ImageNet-1K (cloud platforms)
ImageNet-1K (mobile platforms)
PASCAL VOC 2007 and 2012
Ablation Study. RS: removing symmetric child nodes of OR-nodes in the pruned AOG building blocks. LC: adding lateral connections for dependency grammars.
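The slide only names the RS variant, so the sketch below encodes one plausible reading of it, assuming that two AND-node children of an OR-node are "symmetric" when their left and right sub-span lengths are mirror images (e.g., splitting four primitives as 1|3 versus 3|1), and keeping the first of each pair. This is an assumption, not necessarily the paper's exact pruning rule; `or_children` is a hypothetical helper.

```python
def or_children(i, j, remove_symmetric=False):
    """Children of the OR-node over inclusive span (i, j).

    Returns its Terminal-node plus one AND-node per split point. With
    remove_symmetric=True, a split whose (left length, right length) mirrors
    an already kept split is dropped (hypothetical reading of RS).
    """
    children = [("T", (i, j))]
    seen = set()
    for m in range(i, j):
        left_len, right_len = m - i + 1, j - m
        key = tuple(sorted((left_len, right_len)))  # 1|3 and 3|1 share a key
        if remove_symmetric and key in seen:
            continue
        seen.add(key)
        children.append(("AND", ((i, m), (m + 1, j))))
    return children


print(len(or_children(0, 3)))                         # 4: T + splits 1|3, 2|2, 3|1
print(len(or_children(0, 3, remove_symmetric=True)))  # 3: T + splits 1|3, 2|2
```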
Conclusions: A method of learning deep compositional grammatical architectures capable of harnessing the best of grammars and deep neural networks for visual recognition. An implementation with AND-OR grammars, called AOGNets. Promising performance on three image classification datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and two object detection datasets (PASCAL VOC 2007 and 2012).
Update: AOGNets: Compositional Grammatical Architectures for Deep Learning (arXiv 1711.05847v3, possibly the camera-ready version). New topics: model interpretability; adversarial defense; object detection and segmentation on COCO.
Model Interpretability
Adversarial Defense