Iterative Crowd Counting


Viresh Ranjan, Hieu Le, Minh Hoai
Department of Computer Science, Stony Brook University

Introduction

We present a method for crowd counting via density estimation.

[Figure: example crowd image annotated with a count of 512 people]

We propose the iterative counting CNN (ic-CNN), a two-branch architecture for coarse-to-fine estimation of crowd density maps. ic-CNN estimates a high-resolution crowd density map in two stages:

1. A low-resolution density map, ¼ the size of the original image, is predicted first.
2. The low-resolution density map is refined and transformed into the final high-resolution crowd density map.

Highlights of the ic-CNN architecture:

- Achieves state-of-the-art performance
- Can be trained end-to-end
- Has significantly fewer parameters than previous approaches
- Trains faster

We also present a multi-stage extension of ic-CNN, which refines its prediction across multiple stages.

Iterative Counting CNN

Low Resolution CNN (LR-CNN)

- Fully convolutional branch with 11 conv layers
- Max-pooling layers down-sample the feature maps
- The predicted density map is ¼ the size of the original image

High Resolution CNN (HR-CNN)

- Fully convolutional branch with 9 conv layers
- Max-pooling layers for down-sampling, bilinear interpolation for up-sampling
- To handle variations in crowd density, the high-resolution branch incorporates features from the low-resolution branch
- The low-resolution prediction is passed to HR-CNN as a feature map

Loss: separate weighted mean squared loss terms for the two branches.

Multi-stage extension

- Multiple ic-CNN blocks
- Each block uses the predictions from all previous blocks

Results

Datasets: ShanghaiTech [2], UCF CC [5], World Expo [11]
Evaluation metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE)

Results on ShanghaiTech Part A and Part B

Approach               Part A MAE   Part A RMSE   Part B MAE   Part B RMSE
Crowd CNN [9]          181.8        277.7         32.0         49.8
MCNN [2]               110.2        173.2         26.4         41.3
Switch CNN [3]         90.4         135.0         21.6         33.4
CP-CNN [4]             73.6         106.4         20.1         30.1
Semi-supervised [6]    [missing]    112.0         13.7         21.4
DecideNet [7]          --           --            20.7         29.4
ic-CNN (1 stage)       69.8         117.3         10.4         16.7
ic-CNN (2 stages)      68.5         116.2         10.7         16.0

Ablation study on ShanghaiTech Part A

Approach                        MAE     RMSE
LR-CNN alone                    78.5    133.2
HR-CNN alone                    136.2   204.0
HR-CNN + low res prediction     75.1    129.0
HR-CNN + low res features       77.4    130.4
ic-CNN                          69.8    117.3

Comparing model complexity and training time

Approach          Training time   # Parameters    MAE
MCNN [2]          unknown         0.12 million    110.2
Switch CNN [3]    22 hrs          12 million      90.4
CP-CNN [4]        [missing]       63 million      73.6
ic-CNN            10 hrs          7.9 million     69.8

Results on UCF CC dataset

Approach                     MAE     RMSE
Lempitsky & Zisserman [1]    493.4   487.1
Idrees et al. [8]            419.5   541.6
Crowd CNN [9]                467.0   498.5
MCNN [2]                     377.6   509.1
Hydra-2s [10]                333.7   425.2
Switching CNN [3]            318.1   439.2
CP-CNN [4]                   295.8   320.9
Semi-supervised [6]          279.6   388.9
ic-CNN                       260.9   365.5

Results on World Expo dataset

Approach          S1     S2     S3     S4          S5    Avg
MCNN [2]          3.4    20.6   12.9   13.0        8.1   11.6
Switch CNN [3]    4.2    14.9   14.2   18.7        4.3   11.2
CP-CNN [4]        2.9    14.7   10.5   10.4        5.8   8.8
ic-CNN            17.0   12.3   9.2    [missing]   4.7   10.3

[Figure: qualitative results; columns: Image, GT, Low Res, High Res]

References

[1] Lempitsky, V. & Zisserman, A., Learning to count objects in images, NIPS 2010
[2] Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y., Single-image crowd counting via multi-column convolutional neural network, CVPR 2016
[3] Sam, D. B., Surya, S., & Babu, R. V., Switching convolutional neural network for crowd counting, CVPR 2017
[4] Sindagi, V. A. & Patel, V. M., Generating high-quality crowd density maps using contextual pyramid CNNs, ICCV 2017
[5] Ali, S. & Shah, M., A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis, CVPR 2007
[6] Liu, X., van de Weijer, J., & Bagdanov, A. D., Leveraging unlabeled data for crowd counting by learning to rank, CVPR 2018
[7] Liu, J., Gao, C., Meng, D., & Hauptmann, A. G., DecideNet: Counting varying density crowds through attention guided detection and density estimation, CVPR 2018
[8] Idrees, H., Saleemi, I., Seibert, C., & Shah, M., Multi-source multi-scale counting in extremely dense crowd images, CVPR 2013
[9] Zhang, C., Li, H., Wang, X., & Yang, X., Cross-scene crowd counting via deep convolutional neural networks, CVPR 2015
[10] Onoro-Rubio, D. & Lopez-Sastre, R. J., Towards perspective-free object counting with deep learning, ECCV 2016
[11] Zhang, C., Li, H., Wang, X., & Yang, X., Cross-scene crowd counting via deep convolutional neural networks, CVPR 2015

Acknowledgements

This work was supported by the SUNY2020 Infrastructure Transportation Security Center. The authors would like to thank Boyu Wang for participating in the discussions and experiments related to an earlier version of the proposed technique.

Contemporary crowd counting papers at ECCV 2018

[a] Idrees, H., Tayyab, M., Athrey, K., Zhang, D., Al-Maadeed, S., Rajpoot, N., & Shah, M., Composition loss for counting, density map estimation and localization in dense crowds
[b] Cao, X., Wang, Z., Zhao, Y., & Su, F., Scale aggregation network for accurate and efficient crowd counting
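
Worked example: evaluation metrics

The poster reports MAE and RMSE on predicted crowd counts, where a count is obtained from a density map. The sketch below is a minimal illustration of those two metrics, assuming the usual convention that the count is the sum of all density-map cells. The function names and the toy numbers are hypothetical and are not taken from the authors' code.

```python
import math


def count_from_density(density_map):
    """Return the crowd count as the sum over all cells of a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    n = len(true_counts)
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / n)


# Toy data (hypothetical): one tiny density map plus one precomputed count.
toy_map = [[0.0, 0.5],
           [1.0, 0.5]]
pred = [count_from_density(toy_map), 10.0]
true = [3.0, 8.0]

print("MAE :", mae(pred, true))
print("RMSE:", rmse(pred, true))
```

RMSE penalizes large per-image errors more heavily than MAE, which is why the two metrics can rank methods differently in the tables above.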
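
Worked example: two-branch loss

The poster describes the training objective only as separate weighted mean squared loss terms for the two branches. The sketch below shows one way such an objective could be computed. It rests on several assumptions that the poster does not confirm: the low-resolution ground truth is built by summing 4x4 blocks of the full-resolution map (which keeps the total count unchanged), "¼ the size" is read as ¼ along each side, and the weights `w_lr` and `w_hr` are hypothetical names with placeholder values.

```python
def sum_pool(density, factor=4):
    """Down-sample a 2-D density map by summing factor x factor blocks.

    Assumption: summing (not averaging) so the total count is preserved.
    """
    h, w = len(density), len(density[0])
    assert h % factor == 0 and w % factor == 0, "size must divide evenly"
    return [
        [
            sum(density[r + dr][c + dc]
                for dr in range(factor) for dc in range(factor))
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]


def mse(pred, target):
    """Mean squared error over all cells of two equally sized 2-D maps."""
    cells = [(p - t) ** 2
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(cells) / len(cells)


def two_branch_loss(lr_pred, lr_gt, hr_pred, hr_gt, w_lr=1.0, w_hr=1.0):
    """Weighted sum of the low-resolution and high-resolution MSE terms."""
    return w_lr * mse(lr_pred, lr_gt) + w_hr * mse(hr_pred, hr_gt)


# Toy 4x4 ground truth with two annotated people (hypothetical data).
hr_gt = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
lr_gt = sum_pool(hr_gt, factor=4)           # single cell holding the count
hr_pred = [[0.0] * 4 for _ in range(4)]     # a branch that predicts nothing
lr_pred = [[1.0]]                           # a branch that predicts one person

print(two_branch_loss(lr_pred, lr_gt, hr_pred, hr_gt, w_lr=0.5, w_hr=2.0))
```

Keeping the two terms separate lets each branch receive its own supervision signal while the whole network is still trained end-to-end.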