Presenter: Usman Sajid


1 CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
Vishwanath A. Sindagi, Vishal M. Patel (Rutgers University)
Presenter: Usman Sajid

2 Why this Paper?
Among the top 3 results in 2017 (still in the top 5 on different datasets to date, as cited by several ECCV and CVPR 2018 papers)
My research implementation aligns with part of this paper (may reuse the code)
Used their code (model) to prove my hypothesis
Simple yet effective approach

3 Crowd Analysis (Counting)
Many applications: political rallies, public ceremonies, Hajj
One of the problems: large, non-uniform variations in the scale and appearance of the objects
Previous models did not focus much on the differing count distributions within an image

4 Crowd Analysis (Counting)
No single model gives the best results on the 3 most widely used datasets: UCF_CC_50, ShanghaiTech (Parts A and B), and UCF-QNRF (recently released)

5 Proposed Model
A novel end-to-end cascaded network of CNNs that jointly learns crowd count classification and density map estimation

6 Proposed Model
2 stages: High Level Prior and Density Estimator
Empirical observation to date: better to use density maps rather than direct regression
But very poor in localization

7 Proposed Model- High Level Prior
10-way crowd count classifier
Classification or regression?
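A minimal sketch of how a crowd count could be turned into one of 10 class labels for the classifier. The slide does not give the bin boundaries, so the equal-width binning and the names `count_to_class`, `max_count`, and `num_classes` are assumptions made for illustration only.

```python
def count_to_class(count, max_count, num_classes=10):
    """Map a crowd count to a class index in [0, num_classes - 1].

    Assumption: equal-width bins over [0, max_count]; the actual
    quantization used by the authors is not stated on the slide.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    width = max_count / num_classes
    # Counts at or above max_count fall into the last class.
    return min(int(count // width), num_classes - 1)


# Example: label a patch with 250 people when the largest count seen is 1000.
label = count_to_class(250, max_count=1000)
```

Treating the count as a coarse class rather than an exact number is what makes this stage a classification problem instead of a regression one.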

8 Proposed Model- Density Estimator
Final stage, resulting in density map
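As an illustration of what a density map represents, the sketch below builds one from head annotations by placing a normalized Gaussian at each point, so that the sum over the map equals the number of people. The fixed `sigma`, the function names, and the per-point normalization are assumptions; the slide does not describe how the ground-truth maps were generated.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map from (x, y) head positions.

    Assumption: one Gaussian with a fixed sigma per head, normalized
    over the image so each person contributes exactly 1 to the sum.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for px, py in points:
        weights = [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap


def estimated_count(dmap):
    """The crowd count is the sum of all density values."""
    return sum(map(sum, dmap))


heads = [(3, 4), (10, 2), (7, 7)]
dmap = density_map(heads, height=16, width=16)
```

At test time the same summation is applied to the predicted map to obtain the count.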

9 Objective Function
2 losses: one for each stage
For High Level Prior: cross-entropy loss
For the density estimation:
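The equations from this slide are not preserved in the transcript. The sketch below shows one plausible way to combine the two losses for a single training sample: cross-entropy for the count classifier plus a pixel-wise squared Euclidean loss for the density map. The Euclidean form, the weighting factor `lam`, and all function names are assumptions, not the slide's formulas.

```python
import math


def cross_entropy(class_probs, true_class):
    """Cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(class_probs[true_class])


def euclidean_loss(pred_map, true_map):
    """Sum of squared pixel differences between two density maps (assumed form)."""
    return sum(
        (p - t) ** 2
        for pred_row, true_row in zip(pred_map, true_map)
        for p, t in zip(pred_row, true_row)
    )


def total_loss(class_probs, true_class, pred_map, true_map, lam=1.0):
    """Weighted sum of both stage losses; lam is a hypothetical weight."""
    return lam * cross_entropy(class_probs, true_class) + euclidean_loss(pred_map, true_map)


loss = total_loss([0.5, 0.5], 0, [[1.0]], [[0.0]], lam=2.0)
```

Because both terms are summed into one objective, gradients from the density estimator and the classifier update the shared layers together, which is what end-to-end training of the cascade requires.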

10 Training Details
Very few training images available (e.g. 300 images)
Create additional images:
Patches one quarter the size of the original image are cropped from 100 random locations
Augmentation techniques such as horizontal flipping and noise addition are used to create another 200 patches
So a total of 300 patches of arbitrary sizes are extracted from each image
Trained on an NVIDIA GTX TITAN-X GPU using the Torch framework
Training time: 6 hours
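The patch-creation steps above can be sketched as follows. Two points are assumptions, since the slide does not spell them out: "one quarter the size" is read as half the height and half the width, and the 200 augmented patches are taken to be one flipped and one noisy copy of each of the 100 crops. The image is a plain list of rows, and `sample_patches`, `noise_std`, and `seed` are illustrative names.

```python
import random


def sample_patches(image, n_crops=100, noise_std=5.0, seed=0):
    """Return n_crops random crops plus a flipped and a noisy copy of each."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    ph, pw = h // 2, w // 2  # assumed: quarter of the area
    patches = []
    for _ in range(n_crops):
        top = rng.randint(0, h - ph)
        left = rng.randint(0, w - pw)
        crop = [row[left:left + pw] for row in image[top:top + ph]]
        flipped = [row[::-1] for row in crop]  # horizontal flip
        noisy = [[v + rng.gauss(0.0, noise_std) for v in row] for row in crop]
        patches.extend([crop, flipped, noisy])
    return patches


image = [[float(x + y) for x in range(8)] for y in range(6)]
patches = sample_patches(image)
```

In a real pipeline the ground-truth density map would need the same crop and flip applied so that it stays aligned with each patch.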

11 Evaluation Criteria
2 criteria widely used in this particular field:
MAE and MSE as follows:
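The formulas from this slide are not preserved in the transcript. The sketch below follows the convention common in crowd-counting papers, where MAE is the mean absolute difference between true and predicted counts and "MSE" is reported as the square root of the mean squared difference. That convention is assumed here; the function names and sample counts are illustrative.

```python
import math


def mae(true_counts, pred_counts):
    """Mean absolute error over per-image counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n


def mse(true_counts, pred_counts):
    """Root of the mean squared error, as 'MSE' is usually reported in this field."""
    n = len(true_counts)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts)) / n)


true_counts = [100, 200, 300]
pred_counts = [110, 190, 330]
print(mae(true_counts, pred_counts), mse(true_counts, pred_counts))
```

MAE indicates the accuracy of the estimates, while the squared term makes MSE more sensitive to occasional large errors.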

12 Datasets for Crowd Counting
Dataset              Number of Images   Number of Annotations   Average Count   Maximum Count   Average Resolution
UCF_CC_50            50                 63,974                  1279            4633            2101 x 2888
ShanghaiTech_PartA   482                241,677                 501             3139            589 x 868
UCF-QNRF             1535               1,251,642               815             12865           2013 x 2902
Note: UCF-QNRF was released approx. 3 weeks ago
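As a quick consistency check on the figures above, the average count for each dataset should equal the number of annotations divided by the number of images. The sketch below recomputes it from the listed values; the dictionary layout and function name are illustrative.

```python
# (images, annotations, reported average count), taken from the slide
datasets = {
    "UCF_CC_50": (50, 63974, 1279),
    "ShanghaiTech_PartA": (482, 241677, 501),
    "UCF-QNRF": (1535, 1251642, 815),
}


def average_count(images, annotations):
    """Average number of annotated people per image."""
    return annotations / images


for name, (images, annotations, reported) in datasets.items():
    computed = average_count(images, annotations)
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```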

13 Results- Shanghai Tech

14 Results- Shanghai Tech
Current best (as reported at CVPR 2018):
Part A: MAE: 68.2, MSE: Approx
Part B: MAE: Approx. 10.6, MSE: Approx. 16.0

15 Results - UCF
Current best: MAE: 266.1, MSE: Approx

16 Conclusion
Proposed a multi-task cascaded CNN for jointly learning crowd count classification and density map estimation
Incorporated a high-level prior into the network, which enables it to learn globally relevant discriminative features and thereby account for large count variations in the dataset
Trained end-to-end
Among the top 3 in 2017 (still in the top 5)

17 Thank you 

