High-level Cues for Predicting Motivations

Presentation transcript:

High-level Cues for Predicting Motivations
Arun Mallya and Svetlana Lazebnik
University of Illinois at Urbana-Champaign

Dataset: Vondrick et al., CVPR 2016, with train/val/test sets of size 6133/1532/2526.
Action predictions (600-d) from the Fusion net (Mallya and Lazebnik, ECCV 2016) trained on HICO.
Scene predictions (365-d) from VGG-16 trained on Places (Zhou et al., NIPS 2014).
Image embedding of size 600 + 365 = 965, vs. 8192 in prior work.
Sentence embeddings (4800-d) from Skip-Thought Vectors (Kiros et al., NIPS 2015).
Three nCCA models for retrieval with 300-d embeddings, one for each prediction type (motivation, action, scene).
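The retrieval setup above (concatenated action and scene scores on the image side, sentence vectors on the text side, a shared low-dimensional space in between) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain regularized linear CCA rather than the nCCA variant named on the poster, it runs on synthetic data with toy dimensions (60 + 36 and 48 instead of 600 + 365 and 4800, and a 5-d rather than 300-d embedding), and the names `fit_cca` and `cosine_scores` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def fit_cca(X, Y, dim, reg=1e-3):
    """Fit regularized linear CCA between paired rows of X and Y."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    cov_xx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    cov_yy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    cov_xy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(cov):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(cov)
        return (vecs / np.sqrt(vals)) @ vecs.T

    kx, ky = inv_sqrt(cov_xx), inv_sqrt(cov_yy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, _, Vt = np.linalg.svd(kx @ cov_xy @ ky, full_matrices=False)
    return mean_x, kx @ U[:, :dim], mean_y, ky @ Vt[:dim].T

def cosine_scores(A, B, eps=1e-12):
    """Cosine similarity between every row of A and every row of B."""
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    return A @ B.T

# Synthetic stand-ins for the real features (assumption: toy sizes, shared latent signal).
rng = np.random.default_rng(0)
n, latent_dim = 200, 5
latent = rng.normal(size=(n, latent_dim))
action_scores = latent @ rng.normal(size=(latent_dim, 60)) + 0.01 * rng.normal(size=(n, 60))
scene_scores = latent @ rng.normal(size=(latent_dim, 36)) + 0.01 * rng.normal(size=(n, 36))
sentence_vecs = latent @ rng.normal(size=(latent_dim, 48)) + 0.01 * rng.normal(size=(n, 48))

# Compact image representation: action and scene predictions side by side.
image_feats = np.concatenate([action_scores, scene_scores], axis=1)

mean_x, Wx, mean_y, Wy = fit_cca(image_feats, sentence_vecs, dim=latent_dim)
scores = cosine_scores((image_feats - mean_x) @ Wx, (sentence_vecs - mean_y) @ Wy)
top1 = scores.argmax(axis=1)  # index of the best-matching sentence for each image
```

In the poster's setting one such model would be fitted per prediction type, each pairing the same image features with the sentences for motivation, action, or scene.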

Intermediate Predictions and Results

Example 1
Action predictions: read-laptop, type-on-laptop, hold-laptop
Scene predictions: computer-room, home-office, physics-lab
Ground Truth: looking at computer screen home office do work
Prediction: typing on a computer meeting work

Example 2
Action predictions: watch-giraffe, feed-giraffe, pet-giraffe
Scene predictions: stable, rodeo-arena, barndoor
Ground Truth: bottle zoo feeding a baby giraffe
Prediction: feeding a giraffe care for giraffe

Example 3
Action predictions: wear-skis, stand-on-skis, ride-skis
Scene predictions: ski-slope, igloo, ski-resort
Ground Truth: skiing ski resort get down the hill
Prediction: resort have fun

Numbers and Takeaways

Each cell lists Median Rank / Recall@1 / Recall@10. A "-" marks a value that is not reported; "[?]" marks a value missing from the transcript.

Sentences replaced with cluster centers. #Clusters (Motivation, Action, Scene): (256, 100, 100)

Method                                          | Motivation        | Action            | Scene
SSVM - Image + Person fc7 + Language Model [1]  | 28 / - / -        | 14 / - / -        | 3 / - / -
CCA - ImageNet VGG fc7                          | 19 / 9.0 / 38.8   | 11 / 14.5 / 49.0  | [?] / 35.2 / 72.2
CCA - Action & Scene cues                       | [?] / 12.0 / 44.9 | 8 / 18.6 / 56.3   | [?] / 33.4 / 73.8

Sentences used as-is. #Sentences: 2526

The transcript does not repeat the method names for these two rows; they are listed in the order given.

Row   | Motivation        | Action            | Scene
1     | 171 / 0.9 / 7.7   | 187 / 1.3 / 9.5   | 116 / 1.1 / 7.9
2     | 130 / 1.4 / [?]   | 117 / 1.8 / 13.2  | 113 / 1.0 / 8.8

Takeaways:
High-level cues provide a compact image representation and are more informative than generic VGG features.
High-level cues increase model interpretability and allow us to diagnose and understand errors.
In spite of advances, there is scope for improving action and scene predictions.
CCA gives good generalization compared to complex methods on small datasets.
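The two metrics reported above, median rank and Recall@K, can be computed from a query-by-candidate score matrix along the following lines. This is a minimal sketch, not the evaluation code behind the poster: the function name `retrieval_metrics` is hypothetical, and the tie-handling rule (rank is one plus the number of strictly higher-scoring candidates) is an assumption. In the cluster-center setting the candidates would be the cluster centers, and in the as-is setting the 2526 test sentences.

```python
import numpy as np

def retrieval_metrics(scores, gt_index, ks=(1, 10)):
    """Median rank and Recall@K (in percent) for retrieval.

    scores:   (n_queries, n_candidates) similarity matrix, higher is better.
    gt_index: (n_queries,) index of the correct candidate for each query.
    """
    scores = np.asarray(scores, dtype=float)
    gt_index = np.asarray(gt_index)
    gt_scores = scores[np.arange(len(gt_index)), gt_index]
    # Rank of the correct candidate: 1 + number of candidates scoring strictly higher.
    ranks = 1 + (scores > gt_scores[:, None]).sum(axis=1)
    metrics = {"median_rank": float(np.median(ranks))}
    for k in ks:
        metrics[f"recall@{k}"] = 100.0 * float((ranks <= k).mean())
    return metrics

# Tiny worked example: three queries, three candidates, correct candidate on the diagonal.
toy_scores = [[0.9, 0.1, 0.3],
              [0.2, 0.5, 0.7],
              [0.4, 0.6, 0.1]]
toy_metrics = retrieval_metrics(toy_scores, [0, 1, 2], ks=(1, 2))
```

Here the correct candidates land at ranks 1, 2, and 3, so the median rank is 2, Recall@1 is one third of the queries, and Recall@2 is two thirds.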