Kuan-Chuan Peng Tsuhan Chen


A Framework of Extracting Multi-scale Features Using Multiple Convolutional Neural Networks
Kuan-Chuan Peng, Tsuhan Chen

Introduction
Breakthrough progress in object classification (e.g., cat, dog, lion, tiger).
O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.

Introduction
Humans are interested in more than objects, for example, aesthetic quality.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.

How do machines describe images?
Examples by a state-of-the-art algorithm:
“man in black shirt is playing guitar.”
“woman is holding bunch of bananas.”
A. Karpathy and F.-F. Li. Deep visual-semantic alignments for generating image descriptions. CVPR15. http://cs.stanford.edu/people/karpathy/deepimagesent/

How do experts describe images?
Examples by Pulitzer Prize winners:
“At bath times, Danielle appears serene. But no one knows what lies beyond those eyes.” (by Lane DeGregory)
“The surgery has dragged on for hours with little progress, and Mulliken, taking a breather next to an array of Sam's CAT scans, is feeling the frustration and exhaustion.” (by Tom Hallman Jr.)
http://www.pulitzer.org/archives/8417
http://www.pulitzer.org/archives/6451

How do experts describe images?
Images convey more than objects.

Beyond Objects
Abstract attributes matter.
Abstract attributes: attributes relating to or involving general ideas or qualities rather than specific people, objects, or actions. [Merriam-Webster dictionary]
Bridge the gap between machines and humans: teach machines to solve abstract tasks (tasks involving abstract attributes).
http://www.merriam-webster.com/dictionary/abstract

Goal
A general framework to achieve better performance in abstract tasks.
Approach: extract multi-scale features using convolutional neural networks (CNNs).

Why CNN?
Object classification: O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
Speech recognition: L. Deng et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. ICASSP13.
Video classification: A. Karpathy et al. Large-scale video classification with convolutional neural networks. CVPR14.

Existing Abstract Tasks
More and more abstract tasks are being proposed.

Artistic Style & Artist Style Classification [F. S. Khan et al. MVA14.]
Architectural Style Classification [Z. Xu et al. ECCV14.]

Emotion Classification [J. Machajdik et al. ACMMM10.]: amusement, anger, awe, contentment, disgust, excitement, fear, sad
Aesthetic Classification [N. Murray et al. CVPR12.]: high aesthetic quality, low aesthetic quality

Fashion Style Classification [M. H. Kiapour et al. ECCV14.]: Bohemian, Hipster
Memorability Prediction [P. Isola et al. CVPR11.]
Interestingness Prediction [M. Gygli et al. ICCV13.]

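Inspiration
It is tricky to describe abstract attributes the way we describe objects: it is not easy to “locate” an abstract attribute in an image.
What if abstract attributes prevail everywhere in the image? This is the label-inheritable (LI) property.
Example: for an image labeled “contentment” [J. Machajdik et al. ACMMM10.], does a patch of the image carry the same label?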

Label-Inheritable (LI) Property

Dataset             Painting-91 [1]               arcDataset [2]                       Caltech-101 [3]
Task                Artist style classification   Architectural style classification   Object classification
Label               Picasso                       Baroque Architecture                 Faces
Label-inheritable   Yes                           Partial                              Mostly No

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
[3] F.-F. Li et al. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. CVPRW04.

Multi-Scale CNN
Assume the LI property holds for each image and its associated label.
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
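
The slides do not spell out how the multi-scale training samples are built, so the following is only a minimal sketch of what the LI assumption allows. It assumes, purely for illustration, that scale s cuts an image into a 2^(s-1) x 2^(s-1) grid of non-overlapping patches and that every patch inherits the label of its source image. The function names and the toy image are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image stored as rows of pixel values


def split_into_patches(image: Image, grid: int) -> List[Image]:
    """Split an image into grid x grid non-overlapping patches (row-major order)."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    patch_h, patch_w = height // grid, width // grid
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * patch_h:(gy + 1) * patch_h]
            patches.append([row[gx * patch_w:(gx + 1) * patch_w] for row in rows])
    return patches


def label_inherited_samples(image: Image, label: str, scale: int) -> List[Tuple[Image, str]]:
    """Build (patch, label) pairs for one scale; each patch inherits the image label.

    Assumption (not from the slides): scale s uses a 2**(s-1) x 2**(s-1) grid,
    so scale 1 is the whole image.
    """
    grid = 2 ** (scale - 1)
    return [(patch, label) for patch in split_into_patches(image, grid)]


toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
scale1 = label_inherited_samples(toy_image, "Picasso", scale=1)
scale2 = label_inherited_samples(toy_image, "Picasso", scale=2)
print(len(scale1), len(scale2))  # prints: 1 4
```

Under this sketch, each scale yields its own training set, which is what would let a separate CNN be trained per scale. When the LI property does not hold (for example, a patch of a "Faces" image that contains no face), the inherited labels become noisy.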

AlexNet
The number of nodes in the output layer is changed to the number of classes in each task.
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
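
As a rough illustration of what resizing the output layer means, the sketch below computes the size of a fully connected output layer for a few class counts. It assumes the standard AlexNet layout, in which the last hidden layer has 4096 units and the original output layer has 1000 ImageNet classes. The class counts are taken from the results table (10 / 25 architectural styles) and from the Caltech-101 name (101 object categories); the exact counts used in the experiments are an assumption, and the helper name is hypothetical.

```python
FC7_DIM = 4096           # width of AlexNet's last hidden layer (assumed standard layout)
IMAGENET_CLASSES = 1000  # size of the original AlexNet output layer


def output_layer_parameters(num_classes: int, in_features: int = FC7_DIM) -> int:
    """Weights plus biases of a fully connected output layer with num_classes nodes."""
    return in_features * num_classes + num_classes


# Class counts are illustrative assumptions, not confirmed experiment settings.
task_classes = {
    "architectural style (10 classes)": 10,
    "architectural style (25 classes)": 25,
    "Caltech-101 object classification": 101,
}

for task, num_classes in task_classes.items():
    print(task, output_layer_parameters(num_classes))
print("original ImageNet output layer", output_layer_parameters(IMAGENET_CLASSES))
```

Only the output layer depends on the task; the layers below it keep the same shape for every task.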

Experimental Results
Classification accuracy (%)

Method                     Artist style   Artistic style   Caltech-101 object   Architectural style
Previous work (baseline)   53.10 [1]      62.20 [1]        83.80 / 86.50 [2]    69.17 / 46.21 [3]
Single-scale CNN           55.15          67.37            83.45 / 88.19        70.64 / 54.84
2-scale CNN (ours)         58.11          69.67            80.19 / 87.58        74.82 / 58.89
3-scale CNN                57.91          70.96            N/A                  75.32 / 59.13
Label-inheritable          Yes            Yes              Mostly No            Partial

All columns are classification tasks. Caltech-101: 15 / 30 training examples per class. Architectural style: 10 / 25 classes.

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
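
To make the trend in the results easier to read, the short calculation below subtracts the single-scale accuracy from the 2-scale accuracy for each task. The numbers are copied from the Experimental Results slide; the dictionary keys are just illustrative labels.

```python
# (single-scale CNN accuracy, 2-scale CNN accuracy) in percent, copied from the results slide
accuracy = {
    "artist style": (55.15, 58.11),
    "artistic style": (67.37, 69.67),
    "Caltech-101, 15 per class": (83.45, 80.19),
    "Caltech-101, 30 per class": (88.19, 87.58),
    "architectural style, 10 classes": (70.64, 74.82),
    "architectural style, 25 classes": (54.84, 58.89),
}

# Gain of the 2-scale CNN over the single-scale CNN, in percentage points
gains = {task: round(two - single, 2) for task, (single, two) in accuracy.items()}

for task, gain in gains.items():
    print(f"{task}: {gain:+.2f}")
```

The gain is positive on the tasks where the LI property holds or partially holds (artist, artistic, and architectural style) and negative on Caltech-101, where it mostly does not.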

Is it because of more training data?
What if we train one CNN with images at different scales?
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.

Additional Results
Classification accuracy (%)

Method                     Artist style   Artistic style   Caltech-101 object   Architectural style
Previous work (baseline)   53.10 [1]      62.20 [1]        83.80 / 86.50 [2]    69.17 / 46.21 [3]
Single-scale CNN           55.15          67.37            83.45 / 88.19        70.64 / 54.84
2-scale CNN (ours)         58.11          69.67            80.19 / 87.58        74.82 / 58.89
1 CNN + 2-scale images     46.86          61.95            N/A                  67.93 / 49.06
Label-inheritable          Yes            Yes              Mostly No            Partial

All columns are classification tasks. Caltech-101: 15 / 30 training examples per class. Architectural style: 10 / 25 classes.

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.

Conclusion
We proposed Multi-Scale Convolutional Neural Networks (MSCNN), which extract multi-scale features based on the label-inheritable (LI) property.
MSCNN can outperform the state of the art on datasets where the LI property holds, or even partially holds.

Towards Solving Abstract Tasks
More CNN features to achieve better performance in abstract tasks:
Multi-scale features (ICME15).
Multi-depth features (ICIP15).
Multi-task features (submitted to ICCV15).
K.-C. Peng and T. Chen. A Framework of extracting multi-scale features using multiple convolutional neural networks. ICME15.
K.-C. Peng and T. Chen. Cross-layer features in convolutional neural networks for generic classification tasks. ICIP15.
K.-C. Peng and T. Chen. Toward correlating and solving abstract tasks using convolutional neural networks. Submitted to ICCV15.

Q & A