Xilai Li, Tianfu Wu, and Xi Song CVPR 2019 Presented by Dingquan Li

Learning Deep Compositional Grammatical Architectures for Visual Recognition
Xilai Li (李曦来): EE -> CS, AND-OR graph
Tianfu Wu: PhD under Song-Chun Zhu
Xi Song: student of Yunde Jia; CVPR 2013 AND-OR graph work with Song-Chun Zhu

Outline
Contributions
Motivation and Objective
Method Overview
AOGNets
Experiments
Conclusions

Contributions
The first work to use (AND-OR) grammar models in network engineering, facilitating both feature exploration and feature exploitation in a hierarchical and compositional way.
Better performance than state-of-the-art networks in image classification and object detection.

Motivation and Objective
DLA, …
Can we unify the best practices developed in the popular networks?
Can we generate building blocks, and thus networks, in a principled way?
Yes: by compositional grammatical architectures!

AOG Building Block
The phrase structure grammar: Terminal-nodes, AND-nodes, and OR-nodes.
The dependency grammar: models lateral connections.
The hierarchy facilitates a gradual increase of feature channels, as in Deep Pyramid ResNets [20], and also leads to a good balance between the depth and width of networks.
The compositional structure provides much more flexible information flows than DPN [7] and DLA [69].
The lateral connections induce feature diversity and increase the effective depth of nodes along the path without introducing extra parameters.

Nodes in AOG Building Block
Terminal-nodes implement the split-transform heuristic.
AND-nodes implement DenseNet-like aggregation (i.e., concatenation) for feature exploration.
OR-nodes implement ResNet-like aggregation (i.e., summation) for feature exploitation.
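
The three node types can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: a feature is a plain list with one value per channel (spatial dimensions are omitted), and the function names and the `double` transform are hypothetical stand-ins for learned operations.

```python
from typing import Callable, List

Feature = List[float]  # one value per channel; spatial dims omitted for brevity


def terminal_node(x: Feature, start: int, end: int,
                  transform: Callable[[Feature], Feature]) -> Feature:
    """Split-transform: take a channel slice of the input, then transform it."""
    return transform(x[start:end])


def and_node(children: List[Feature]) -> Feature:
    """DenseNet-like aggregation: concatenate child features along channels."""
    out: Feature = []
    for child in children:
        out.extend(child)
    return out


def or_node(children: List[Feature]) -> Feature:
    """ResNet-like aggregation: element-wise sum of equally sized children."""
    width = len(children[0])
    if any(len(c) != width for c in children):
        raise ValueError("OR-node children must have the same number of channels")
    return [sum(vals) for vals in zip(*children)]


def double(v: Feature) -> Feature:
    # Hypothetical stand-in for a learned transform (assumption, not from the slides).
    return [2.0 * a for a in v]


x = [1.0, 2.0, 3.0, 4.0]
left = terminal_node(x, 0, 2, double)    # channels 0-1
right = terminal_node(x, 2, 4, double)   # channels 2-3
whole = terminal_node(x, 0, 4, double)   # all four channels
composed = and_node([left, right])       # concatenation restores 4 channels
result = or_node([whole, composed])      # sum of the two alternatives
```

Here the OR-node sums two alternative ways of computing the same 4-channel feature: directly through one terminal node, or as an AND-composition of two smaller terminal nodes.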

Nodes Operations in AOG Block

AOGNet

Simplifying AOG Building Blocks

Experiments
Image Classification: CIFAR-10, CIFAR-100, ImageNet-1K
Object Detection: PASCAL VOC 2007, PASCAL VOC 2012
Ablation Study

CIFAR-10 and CIFAR-100
Model naming: AOGNet-PrimitiveSize-(#AOG blocks per stage)-[OutputFeatDim]
Note on terminology: FLOPS (also written flops or flop/s) usually means floating point operations per second, whereas FLOPs in the table most likely denotes the number of floating point operations.
Pooling has no parameters but still costs floating point operations; a Conv layer has a larger FLOPs/#Params ratio than an FC layer.
FLOPs/#Params: Pooling/ReLU/… > Conv > FC
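
The FLOPs/#Params ordering can be checked with a little arithmetic. The sketch below assumes one multiply-accumulate is counted as one operation and ignores bias terms; the helper names and layer sizes are illustrative choices, not values from the paper.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Return (ops, params) for a k x k convolution, bias ignored."""
    params = k * k * c_in * c_out
    ops = params * h_out * w_out  # every weight is reused at each output position
    return ops, params


def fc_cost(n_in, n_out):
    """Return (ops, params) for a fully connected layer, bias ignored."""
    params = n_in * n_out
    return params, params  # each weight is used exactly once


def pool_cost(channels, k, h_out, w_out):
    """Return (ops, params) for k x k pooling: operations but no parameters."""
    return channels * k * k * h_out * w_out, 0


def ratio(ops, params):
    """FLOPs/#Params; infinite for parameter-free layers."""
    return float("inf") if params == 0 else ops / params


conv_ratio = ratio(*conv_cost(c_in=64, c_out=64, k=3, h_out=32, w_out=32))
fc_ratio = ratio(*fc_cost(n_in=512, n_out=10))
pool_ratio = ratio(*pool_cost(channels=64, k=2, h_out=16, w_out=16))
```

Under these assumptions the Conv ratio equals the number of output positions (32 x 32 = 1024), the FC ratio is 1, and the pooling ratio is unbounded, which matches the ordering Pooling > Conv > FC.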

ImageNet-1K (cloud platforms)

ImageNet-1K (mobile platforms)

PASCAL VOC 2007 and 2012

Ablation Study
RS: removing symmetric child nodes of OR-nodes in the pruned AOG building blocks.
LC: adding lateral connections for dependency grammars.

Conclusions
A method of learning deep compositional grammatical architectures that can harness the best of grammars and deep neural networks for visual recognition.
An implementation with AND-OR grammars, called AOGNets.
Promising performance on three image classification datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and two object detection datasets (PASCAL VOC 2007 and 2012).

Update!
AOGNets: Compositional Grammatical Architectures for Deep Learning (v3, maybe camera-ready)
Model Interpretability
Adversarial Defense
Object Detection and Segmentation in COCO

Model Interpretability

Adversarial Defense
