Presenter: Usman Sajid

Presentation transcript:

CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting (Vishwanath A. Sindagi, Vishal M. Patel, Rutgers University). Presenter: Usman Sajid

Why this Paper? Among the top 3 results in 2017 (still in the top 5 on different datasets to date, as cited by several ECCV and CVPR 2018 papers). My research implementation aligns with part of this paper (may reuse the code). Used their code (model) to prove my hypothesis. Simple yet effective approach.

Crowd Analysis (Counting) Many applications: political rallies, public ceremonies, Hajj. One of the problems: large, non-uniform variations in scale and appearance of the objects. Previous models did not focus much on the different count distributions within an image.

Crowd Analysis (Counting) No single model gives the best results on all 3 of the most used datasets (UCF_CC_50, ShanghaiTech (A & B), UCF-QNRF (recently released)).

Proposed Model A novel end-to-end cascaded network of CNNs that jointly learns crowd count classification and density map estimation.

Proposed Model 2 stages: High Level Prior and Density Estimator. Empirical observation to date: better to use density maps rather than direct regression. But very poor in localization.

Proposed Model- High Level Prior 10-way crowd count classifier. Classification or regression?
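To make the 10-way classifier concrete, here is a minimal sketch of how a ground-truth count could be turned into a class label. The equal-width binning and the names `count_to_class` and `max_count` are assumptions for illustration; the slides do not say how the 10 classes are defined.

```python
def count_to_class(count, max_count, num_classes=10):
    """Map a crowd count to a class index in [0, num_classes - 1].

    Assumption: equal-width bins over [0, max_count]. The slides only
    state that the classifier is 10-way, not how the bins are chosen.
    """
    if count < 0 or max_count <= 0:
        raise ValueError("count must be >= 0 and max_count must be > 0")
    bin_width = max_count / num_classes
    # Clamp so that count == max_count falls in the last bin.
    return min(int(count // bin_width), num_classes - 1)
```

With `max_count=1000`, a patch containing 250 people would be labelled class 2 under this binning.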

Proposed Model- Density Estimator Final stage, resulting in density map
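A density map is usually turned into a count by summing its values. A minimal sketch, assuming the map is a 2-D list of non-negative floats (the function name `count_from_density` is hypothetical):

```python
def count_from_density(density_map):
    """Estimate the crowd count as the sum over all density map pixels."""
    return sum(sum(row) for row in density_map)
```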

Objective Function 2 losses: one for each stage. For the High Level Prior: cross-entropy loss. For the density estimation:
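The formulas from this slide did not survive in the transcript. As a rough sketch of a two-loss objective of this kind: the cross-entropy term follows the slide, while the squared-error form of the density loss, the weighting factor `weight`, and all function names are assumptions, not the authors' exact definitions.

```python
import math

def cross_entropy(class_probs, label):
    """Cross-entropy for one sample: -log of the probability of the true class."""
    return -math.log(class_probs[label])

def density_loss(pred_map, true_map):
    """Mean squared per-pixel difference between two density maps.

    Assumption: a Euclidean-style loss; the slide's formula is missing.
    """
    num_pixels = sum(len(row) for row in true_map)
    squared_error = sum(
        (p - t) ** 2
        for pred_row, true_row in zip(pred_map, true_map)
        for p, t in zip(pred_row, true_row)
    )
    return squared_error / num_pixels

def total_loss(class_probs, label, pred_map, true_map, weight=1.0):
    """Weighted sum of the two stage losses (weight is a hypothetical parameter)."""
    return weight * cross_entropy(class_probs, label) + density_loss(pred_map, true_map)
```

Because both terms are combined into one scalar, a single backward pass can update both stages, which is what training end-to-end requires.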

Training Details Very few training images are available (e.g. 300 images), so additional images are created. Patches 1/4th the size of the original image are cropped from 100 random locations. Augmentation techniques such as horizontal flipping and noise addition are used to create another 200 patches. So a total of 300 patches of arbitrary sizes are extracted from each image. Trained on an NVIDIA GTX TITAN-X GPU using the Torch framework. Training takes 6 hours.
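The patch generation described above can be sketched as follows. This is an illustration, not the authors' code: it assumes "1/4th the size" means a quarter of the area (half the height and half the width), that the 200 extra patches are 100 flipped and 100 noisy copies of the crops, and that images are plain 2-D lists. All function names and the `sigma` parameter are hypothetical.

```python
import random

def random_crop(image, rng):
    """Crop a patch with half the height and half the width (a quarter of the area)."""
    height, width = len(image), len(image[0])
    patch_h, patch_w = height // 2, width // 2
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    return [row[left:left + patch_w] for row in image[top:top + patch_h]]

def horizontal_flip(patch):
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]

def add_noise(patch, rng, sigma=1.0):
    """Add Gaussian noise to every pixel (sigma is an assumed value)."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in patch]

def make_patches(image, seed=0):
    """Return 300 patches: 100 random crops, 100 flipped, 100 noisy."""
    rng = random.Random(seed)
    crops = [random_crop(image, rng) for _ in range(100)]
    flipped = [horizontal_flip(patch) for patch in crops]
    noisy = [add_noise(patch, rng) for patch in crops]
    return crops + flipped + noisy
```

In practice the ground-truth density map would have to be cropped and flipped with the same coordinates so that it stays aligned with the image patch.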

Evaluation Criteria 2 criteria are widely used in this field, MAE and MSE, as follows:
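The formulas from this slide are missing in the transcript. A minimal sketch of the two metrics as they are commonly defined in the crowd counting literature, where "MSE" denotes the square root of the mean squared count error (an assumption about what the slide showed; function names are hypothetical):

```python
import math

def mae(true_counts, pred_counts):
    """Mean absolute error between true and predicted per-image counts."""
    errors = [abs(t - p) for t, p in zip(true_counts, pred_counts)]
    return sum(errors) / len(errors)

def mse(true_counts, pred_counts):
    """Square root of the mean squared count error (crowd counting convention)."""
    squared = [(t - p) ** 2 for t, p in zip(true_counts, pred_counts)]
    return math.sqrt(sum(squared) / len(squared))
```

MAE reflects the accuracy of the estimates, while the squared term in MSE penalizes large errors on individual images more heavily.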

Datasets for Crowd Counting

Dataset              Number of Images   Number of Annotations   Average Count   Maximum Count   Average Resolution
UCF_CC_50                          50                  63,974            1279            4633          2101 x 2888
ShanghaiTech_PartA                482                 241,677             501            3139           589 x 868
UCF-QNRF                         1535               1,251,642             815           12865          2013 x 2902

UCF-QNRF came out approx. 3 weeks before this presentation.

Results- Shanghai Tech

Results- Shanghai Tech Current best (as reported in CVPR 18): Part A, MAE: 68.2, MSE: approx. 106.4; Part B, MAE: approx. 10.6, MSE: approx. 16.0

Results - UCF Current Best: MAE: 266.1, MSE: Approx. 320.9

Conclusion Proposed a multi-task cascaded CNN network for jointly learning crowd count classification and density map estimation. Incorporated a high-level prior into the network, which enables it to learn globally relevant discriminative features, thereby accounting for large count variations in the dataset. Trained end-to-end. Top 3 in 2017 (still in the top 5).

Thank you 