Presenter: Usman Sajid

Presentation transcript:

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting (Vishwanath A. Sindagi, Vishal M. Patel, Rutgers University) Presenter: Usman Sajid

Why this Paper? Among the top 3 results in 2017 (still in the top 5 on different datasets to date, as cited by several ECCV and CVPR 2018 papers). My research implementation aligns with part of this paper (may reuse the code). Used their code (model) to prove my hypothesis. Simple yet effective approach.

Crowd Analysis (Counting): Many applications: political rallies, public ceremonies, Hajj. One of the problems: non-uniform, large variations in scale and appearance of the objects. Previous models do not focus much on the different count distributions within an image.

Crowd Analysis (Counting): No single model gives the best results on the 3 most used datasets (UCF_CC_50, ShanghaiTech (A & B), UCF-QNRF (recently released)).

Proposed Model: A novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation.

Proposed Model: 2 stages: High Level Prior and Density Estimator. Empirical observation to date: better to use density maps rather than direct regression, but very poor in localization.

Proposed Model - High Level Prior: 10-way crowd count classifier. Classification or regression?
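To make the "10-way classifier" idea concrete, the following is a minimal sketch of how a crowd count could be turned into a class label. It assumes equal-width bins up to a known maximum count; the function name `count_to_class`, the `max_count` parameter, and the equal-width binning are illustrative assumptions, not details given on the slide.

```python
def count_to_class(count, max_count, num_classes=10):
    """Map a crowd count to one of `num_classes` equal-width bins.

    Equal-width binning is an assumption made for illustration only.
    """
    if count < 0 or max_count <= 0:
        raise ValueError("count must be >= 0 and max_count must be > 0")
    # Integer arithmetic avoids floating-point edge cases at bin borders.
    label = (count * num_classes) // max_count
    # Counts at or above max_count fall into the last class.
    return min(int(label), num_classes - 1)


# Example with the maximum count listed for ShanghaiTech Part A (3139).
labels = [count_to_class(c, 3139) for c in (0, 501, 3139)]
print(labels)
```

Treating the count as a coarse class rather than an exact number gives the first stage an easier target, which is one way to read the "classification or regression" question on the slide.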

Proposed Model - Density Estimator: Final stage, resulting in the density map.
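As a rough illustration of what a density map is, the sketch below builds one from head annotations by placing a normalized Gaussian at each annotated point, so that the map sums to the number of people. The fixed `sigma`, the `(row, col)` point format, and the function name `density_map` are assumptions for this example; the slide does not say how the ground-truth maps were generated.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Sum one normalized 2D Gaussian per annotated head at (row, col)."""
    dmap = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        # Unnormalized Gaussian evaluated over the whole grid.
        weights = [
            [math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so each head contributes exactly 1 to the map.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


heads = [(5, 5), (10, 12), (3, 15)]  # hypothetical annotations
dmap = density_map(heads, height=16, width=20)
estimated_count = sum(map(sum, dmap))
print(round(estimated_count, 6))
```

Summing the map recovers the count, while the spatial layout of the map shows where in the image the people are.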

Objective Function: 2 losses, one for each stage. For the High Level Prior: cross-entropy loss. For the density estimation:
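The formulas from this slide are not preserved in the transcript. The sketch below shows one plausible shape for the two losses: cross-entropy for the count classifier, and a mean squared per-pixel error for the density map. The pixel-wise squared error, the weighted sum, and the weight `lam` are assumptions made for illustration, not the slide's exact definitions.

```python
import math


def cross_entropy(probs, target_class):
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(probs[target_class])


def density_loss(pred, target):
    """Mean squared per-pixel error between two density maps (assumed form)."""
    n = sum(len(row) for row in target)
    squared = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return squared / n


def total_loss(probs, target_class, pred, target, lam=1.0):
    """Combine both stage losses; `lam` is a hypothetical weighting factor."""
    return density_loss(pred, target) + lam * cross_entropy(probs, target_class)


loss = total_loss(
    probs=[0.5, 0.5], target_class=0,
    pred=[[1.0, 2.0], [3.0, 4.0]], target=[[1.0, 2.0], [3.0, 2.0]],
    lam=0.5,
)
print(loss)
```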

Training Details: Very few training images are available (e.g. 300 images), so additional images are created. Patches of 1/4 the size of the original image are cropped from 100 random locations. Augmentation techniques such as horizontal flipping and noise addition are used to create another 200 patches, so a total of 300 patches of arbitrary sizes are extracted from each image. Trained on an NVIDIA GTX TITAN-X GPU using the Torch framework, training for 6 hours.
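The patch-creation step can be sketched as follows. This is a simplified illustration: it reads "1/4 the size" as half the height and half the width, and it assumes the 200 augmented patches are made up of 100 flipped and 100 noisy copies of the crops. Both readings, the noise scale, and all function names are assumptions; the slide does not spell them out.

```python
import random


def crop_random_patches(image, num_patches=100, rng=None):
    """Crop quarter-area patches (half height, half width) at random locations."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    patch_h, patch_w = height // 2, width // 2
    patches = []
    for _ in range(num_patches):
        top = rng.randint(0, height - patch_h)
        left = rng.randint(0, width - patch_w)
        patches.append([row[left:left + patch_w]
                        for row in image[top:top + patch_h]])
    return patches


def hflip(patch):
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]


def add_noise(patch, rng, scale=0.01):
    """Add zero-mean Gaussian noise to every pixel (scale is hypothetical)."""
    return [[v + rng.gauss(0.0, scale) for v in row] for row in patch]


rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(6)]  # toy 6x8 image
crops = crop_random_patches(image, 100, rng)
augmented = [hflip(p) for p in crops] + [add_noise(p, rng) for p in crops]
all_patches = crops + augmented
print(len(all_patches))
```

When patches are flipped for density estimation, the matching ground-truth density map would need the same flip so the two stay aligned.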

Evaluation Criteria: 2 criteria are widely used in this field, MAE and MSE, as follows:
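The formulas from this slide are not preserved in the transcript. The sketch below follows the convention common in crowd counting work, where MAE is the mean absolute count error over test images and the quantity reported as "MSE" is the square root of the mean squared count error. That convention is assumed here rather than taken from the slide.

```python
import math


def mae(true_counts, pred_counts):
    """Mean absolute error between true and predicted counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n


def mse(true_counts, pred_counts):
    """Root of the mean squared error, as 'MSE' is commonly reported (assumed)."""
    n = len(true_counts)
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts)) / n
    )


true_counts = [100, 200, 300]  # hypothetical ground-truth counts
pred_counts = [110, 190, 330]  # hypothetical predictions
print(mae(true_counts, pred_counts), mse(true_counts, pred_counts))
```

MAE reflects the accuracy of the estimates, while the squared term makes MSE more sensitive to occasional large errors.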

Datasets for Crowd Counting
Dataset              Number of Images   Number of Annotations   Average Count   Maximum Count   Average Resolution
UCF_CC_50                          50                  63,974            1279            4633   2101 x 2888
ShanghaiTech_PartA                482                 241,677             501            3139   589 x 868
UCF-QNRF                         1535               1,251,642             815           12865   2013 x 2902
(UCF-QNRF came out approx. 3 weeks back.)
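The average counts listed above are consistent with dividing the number of annotations by the number of images, which the short check below reproduces using only the figures from the slide.

```python
# (images, annotations, reported average count), taken from the slide.
datasets = {
    "UCF_CC_50": (50, 63974, 1279),
    "ShanghaiTech_PartA": (482, 241677, 501),
    "UCF-QNRF": (1535, 1251642, 815),
}

# Average count per image = total annotations / number of images.
computed = {
    name: round(annotations / images)
    for name, (images, annotations, _) in datasets.items()
}

for name, (_, _, reported) in datasets.items():
    print(name, computed[name], reported)
```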

Results- Shanghai Tech

Results - ShanghaiTech: Current best (as reported in CVPR 18): Part A, MAE: 68.2, MSE: approx. 106.4. Part B, MAE: approx. 10.6, MSE: approx. 16.0.

Results - UCF: Current best: MAE: 266.1, MSE: approx. 320.9.

Conclusion: Proposed a multi-task cascaded CNN network for jointly learning crowd count classification and density map estimation. Incorporated a high-level prior into the network, which enables it to learn globally relevant discriminative features, thereby accounting for large count variations in the dataset. Trained end-to-end. Top 3 in 2017 (still in the top 5).

Thank you 