Rethinking architectures of DCNN and object detection in scene recognition
Wenchi MA, CV Group, EECS, KU, 03/20/2017

Current work and related consideration

Task: Object detection
Algorithm: You Only Look Once (YOLO)
Architecture: GoogLeNet-based
Parameters: >= 97M (relatively small)
Techniques:
- Inception V3 (construction series)
- Efficient grid size reduction (channels in parallel; see the sketch below)
- Feature fusion by multi-resolution feature maps
Problems: relatively high training loss and non-ideal mAP

Thinking:
- The structure of the model strongly impacts detection accuracy. Is the loss reasonable? Is the gap between training loss and test loss small? Is the model too large (easy to overfit and hard to control)?
- Keep searching for the balance between accuracy and model size. What standard should guide the construction?
- Relationship between objects and scenes: does scene classification benefit object detection? Merge scene information into object detection.
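As a concrete illustration of the "efficient grid size reduction" technique listed above, here is a minimal sketch, assuming PyTorch and illustrative channel sizes (not the actual configuration of the YOLO/GoogLeNet model described here): a strided convolution branch and a pooling branch run in parallel, and their outputs are concatenated, so the feature map is downsampled while the channel count grows.

    import torch
    import torch.nn as nn

    class EfficientGridReduction(nn.Module):
        """Inception-V3-style grid size reduction: a strided conv branch and a
        pooling branch run in parallel and are concatenated along channels."""
        def __init__(self, in_ch, conv_ch):
            super().__init__()
            self.conv_branch = nn.Sequential(
                nn.Conv2d(in_ch, conv_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(conv_ch),
                nn.ReLU(inplace=True),
            )
            self.pool_branch = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        def forward(self, x):
            # Both branches halve the spatial resolution; their outputs are
            # concatenated along the channel dimension.
            return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

    # Example: a 28x28 map with 192 channels becomes a 14x14 map with 192+160 channels.
    x = torch.randn(1, 192, 28, 28)
    y = EfficientGridReduction(192, 160)(x)   # shape: (1, 352, 14, 14)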

Wide-Residual-Inception Networks for Real-time Object Detection (Youngwan Lee [2017], Computer Vision Laboratory, Inha University)

Goal: scale down the size of the model further.

Wide-Residual-Inception Networks for Real-time Object Detection

Feature extractor: Inception modules combined with ResNet (residual) connections.
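A minimal sketch of the general idea of combining Inception-style parallel branches with a ResNet-style identity shortcut, assuming PyTorch; the branch widths and layer choices are placeholders, not the exact WR-Inception block from Lee [2017].

    import torch
    import torch.nn as nn

    class ResidualInceptionBlock(nn.Module):
        """Inception-style multi-branch convolutions wrapped in a residual connection."""
        def __init__(self, ch):
            super().__init__()
            b = ch // 2
            self.branch1 = nn.Conv2d(ch, b, kernel_size=1)        # 1x1 branch
            self.branch3 = nn.Sequential(                         # 1x1 -> 3x3 branch
                nn.Conv2d(ch, b, kernel_size=1),
                nn.Conv2d(b, b, kernel_size=3, padding=1),
            )
            self.merge = nn.Conv2d(2 * b, ch, kernel_size=1)      # restore `ch` channels
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = torch.cat([self.branch1(x), self.branch3(x)], dim=1)
            out = self.merge(out)
            return self.relu(out + x)   # identity (residual) shortcut

    block = ResidualInceptionBlock(128)
    y = block(torch.randn(1, 128, 32, 32))   # same shape as the input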

Wide-Residual-Inception Networks for Real-time Object Detection

Detection framework: SSD, using the WR-Inception network as its feature extractor.

Wide-Residual-Inception Networks for Real-time Object Detection

Results on KITTI (AP / AR per class, %); one value in the WR-Inception-12 row is missing from the transcript:

Model           | Car AP | Car AR | Pedestrian AP | Pedestrian AR | Cyclist AP | Cyclist AR | mAP   | mAR
VGG-16          | 74     | 75     | 50            | 56            | 52         | 71         | 58    | 69
ResNet-101      | 76.04  | 74.82  | 47.74         | 56.07         | 53.61      | 75.26      | 58.9  | 70.06
WR-Inception    | 77.2   | 76.18  | 52.51         | 63.01         | 54.63      | 76.17      | 61.18 | 73.51
WR-Inception-12 | 78.24  | 80.24  | 51.08         | 64.29         | 59.28      | -          | 63.03 | 75.14

Dataset: KITTI, collected with stereo cameras and LiDAR scanners in urban, rural, and highway driving environments; its categories are cars, vans, trucks, pedestrians, sitting people, cyclists, trams, miscellaneous, and "do not care".
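For reference, the mAP/mAR columns appear to be roughly the unweighted means of the per-class AP/AR values; a quick Python sanity check for the WR-Inception row (the paper's exact KITTI evaluation protocol may aggregate differently, so the match is only approximate):

    ap = {"Car": 77.2, "Pedestrian": 52.51, "Cyclist": 54.63}   # WR-Inception AP values
    mAP = sum(ap.values()) / len(ap)
    print(round(mAP, 2))   # 61.45, close to the 61.18 reported in the table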

Wide-Residual-Inception Networks for Real-time Object Detection

Contribution:
- Proposes a model that requires less memory and fewer computations while showing better performance.
- Maintains the real-time performance of the object detector.

Query:
- KITTI is still a relatively small dataset and its categories are limited. The performance of this model should also be tested on larger, more common datasets such as ImageNet.

Proper model for a specific dataset

The somewhat unanswered question in deep learning: is the selected CNN optimal for the dataset in terms of accuracy and model size? Some standard is needed, but based on what?

Given a CNN pre-trained on a specific dataset, refine the architecture in order to potentially increase the accuracy while possibly reducing the model size.

Standard: the feature extraction ability of a CNN for a specific dataset.
Intuition: separation enhancement, i.e. best separating the classes of a dataset, assuming a constant depth of the network.

Reference: Refining Architectures of Deep Convolutional Neural Networks, Machine Intelligence Lab, University of Cambridge, UK, and Microsoft Research Cambridge, UK [CVPR 2016]

Separation enhancement and deterioration capacity of a layer

Figure: correlation matrices for 8 convolutional layers of VGG-11 trained on SAD and CAMIT-NSAD.
- Dark blue: minimum correlation between classes; bright yellow: maximum correlation.
- The correlation matrices give an indication of the separation between classes at a given convolutional layer.
- Top row (SAD): the lower layers separate the classes better than the deeper layers.
- Bottom row (CAMIT-NSAD): the classes are separated less in the lower layers and more prominently in the deeper layers.
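A minimal sketch of how such a class-correlation matrix might be computed for one convolutional layer, assuming NumPy; the variable names (`features`, `labels`) and the use of pooled activations are my own simplification, and the construction in the CVPR 2016 paper may differ in detail.

    import numpy as np

    def class_correlation_matrix(features, labels, num_classes):
        """features: (N, D) pooled activations of one layer; labels: (N,) class ids."""
        # Mean feature vector of each class at this layer.
        means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
        # Pairwise correlation between class means: high values (bright yellow)
        # suggest the layer separates that class pair poorly, low values (dark
        # blue) suggest it separates them well.
        return np.corrcoef(means)

    # Example with random stand-in data: 100 images, 512-dim features, 5 classes.
    feats = np.random.randn(100, 512)
    labs = np.random.randint(0, 5, size=100)
    C = class_correlation_matrix(feats, labs, 5)   # 5x5 matrix, one cell per class pair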

Separation enhancement and deterioration capacity of a layer

Comparing C_l and C_{l+1}: for which class pairs does the separation increase, and for which does it deteriorate?
- Count the class pairs whose separation increased from layer l to layer l+1.
- Count the class pairs whose separation decreased from layer l to layer l+1.
This quantifies the inter-class separation, which varies across layers for different datasets.
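Continuing the sketch above, the layer-to-layer comparison can be expressed as counting class pairs whose correlation dropped (separation increased) or rose (separation deteriorated) between consecutive layers; this assumes the low-correlation-means-good-separation reading and is not necessarily the paper's exact criterion.

    import numpy as np

    def separation_change(C_l, C_l1):
        """C_l, C_l1: class-correlation matrices of layer l and layer l+1."""
        iu = np.triu_indices_from(C_l, k=1)            # each class pair counted once
        improved = int(np.sum(C_l1[iu] < C_l[iu]))     # separation increased at the deeper layer
        deteriorated = int(np.sum(C_l1[iu] > C_l[iu])) # separation decreased
        return improved, deteriorated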

Separation enhancement and deterioration capacity of a layer

Dataset split: t: 22084, V: 3056, T: 5618

Legend:
- DR = Deep Refined Architecture (proposed approach)
- DR-1 = Deep Refined Architecture with only the Stretch network
- DR-2 = Deep Refined Architecture with only the Symmetric Split
- Sp-1 = L1-sparsified network
- Sp-2 = L2-sparsified network

Separation enhancement and deterioration capacity of a layer

Contribution:
- Provides a quantified way of refining a network architecture.
- Achieves a balance between precision and model size.

Query:
- SAD and CAMIT-NSAD are relatively small datasets and contain only scene data. What about large object datasets such as ImageNet?
- The generalization problem is avoided in this paper: when we do not know the source of the test data, how should we apply transfer learning and refine a better model?

MIT: Object Detectors Emerge in Deep Scene CNNs (published as a conference paper at ICLR 2015)
- The same network can do both object localization and scene recognition in a single forward pass.
- The deep features from Places-CNN tend to perform better on scene-related recognition tasks than the features from ImageNet-CNN.

Scene recognition and classification (Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (CAS), CVPR 2016)
- Mix scene data and object data together in the training process.

What needs to be taken into consideration?
- Scale down the size of our model and make it easier to control, while improving its feature extraction ability.
- How can we carry out training with both an object dataset and a scene dataset using one single feature detector, and how can we merge the abstracted scene information with the object features? (A rough sketch of one possible setup follows below.)
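One possible way to realize the single-feature-detector idea in the last bullet is a shared backbone with a detection head and a scene classification head; a minimal sketch, assuming PyTorch, with placeholder layer sizes and no claim that this matches the presenter's intended design:

    import torch
    import torch.nn as nn

    class SharedDetectorSceneNet(nn.Module):
        """Shared feature extractor with an object detection head and a scene head."""
        def __init__(self, num_det_outputs, num_scene_classes):
            super().__init__()
            self.backbone = nn.Sequential(          # stand-in feature extractor
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            self.det_head = nn.Conv2d(128, num_det_outputs, 1)   # per-cell detection outputs
            self.scene_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_scene_classes)
            )

        def forward(self, x):
            f = self.backbone(x)
            # Scene logits could also be fed back into the detection head
            # (e.g. broadcast and concatenated) to merge scene information
            # with the object features.
            return self.det_head(f), self.scene_head(f)

    # During training, batches from the object dataset drive the detection loss and
    # batches from the scene dataset drive the scene-classification loss; the shared
    # backbone receives gradients from both.
    model = SharedDetectorSceneNet(num_det_outputs=30, num_scene_classes=10)
    det_out, scene_out = model(torch.randn(2, 3, 224, 224))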

Thank you!