Look-ahead before you leap

Slides:



Advertisements
Similar presentations
Is Random Model Better? -On its accuracy and efficiency-
Advertisements

Semantic Contours from Inverse Detectors Bharath Hariharan et.al. (ICCV-11)
Université du Québec École de technologie supérieure Face Recognition in Video Using What- and-Where Fusion Neural Network Mamoudou Barry and Eric Granger.
3rd Workshop On Semantic Perception, Mapping and Exploration (SPME) Karlsruhe, Germany,2013 Semantic Parsing for Priming Object Detection in RGB-D Scenes.
ICONIP 2005 Improve Naïve Bayesian Classifier by Discriminative Training Kaizhu Huang, Zhangbing Zhou, Irwin King, Michael R. Lyu Oct
Limin Wang, Yu Qiao, and Xiaoou Tang
From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains Baochen Sun and Kate Saenko UMass Lowell.
Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs Roozbeh Mottaghi 1, Sanja Fidler 2, Jian Yao 2, Raquel Urtasun 2, Devi Parikh 3 1 UCLA.
High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning Jeff Michels Ashutosh Saxena Andrew Y. Ng Stanford University ICML 2005.
Computer Vision. Computer vision is concerned with the theory and technology for building artificial Computer vision is concerned with the theory and.
Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.
What is the Best Multi-Stage Architecture for Object Recognition Kevin Jarrett, Koray Kavukcuoglu, Marc’ Aurelio Ranzato and Yann LeCun Presented by Lingbo.
Quick Overview of Robotics and Computer Vision. Computer Vision Agent Environment camera Light ?
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
Human abilities Presented By Mahmoud Awadallah 1.
1 Computational Vision CSCI 363, Fall 2012 Lecture 31 Heading Models.
Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags Sung Ju Hwang and Kristen Grauman University of Texas at Austin Jingnan.
Hybrid-Structure Robot Design From the authors of Chang Gung University and Metal Industries R&D Center, Taiwan.
AUTOMATIC TARGET RECOGNITION AND DATA FUSION March 9 th, 2004 Bala Lakshminarayanan.
Context-based vision system for place and object recognition Antonio Torralba Kevin Murphy Bill Freeman Mark Rubin Presented by David Lee Some slides borrowed.
HIGH PERFORMANCE OBJECT DETECTION BY COLLABORATIVE LEARNING OF JOINT RANKING OF GRANULES FEATURES Chang Huang and Ram Nevatia University of Southern California,
Local features and image matching October 1 st 2015 Devi Parikh Virginia Tech Disclaimer: Many slides have been borrowed from Kristen Grauman, who may.
Week8 Fatemeh Yazdiananari.  Fixed the issues with classifiers  We retrained SVMs with the new UCF101 histograms  On temporally untrimmed videos: ◦
Learning video saliency from human gaze using candidate selection CVPR2013 Poster.
Unsupervised Visual Representation Learning by Context Prediction
Collaborative Grasp Planning with Multiple Object Representations Peter Brook Matei Ciocarlie Kaijen Hsiao.
Preliminary Transformations Presented By: -Mona Saudagar Under Guidance of: - Prof. S. V. Jain Multi Oriented Text Recognition In Digital Images.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Announcements Paper presentation Project meet with me ASAP
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Perceptual real-time 2D-to-3D conversion using cue fusion
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Summary of “Efficient Deep Learning for Stereo Matching”
Object Detection based on Segment Masks
Deep Learning Amin Sobhani.
Object detection with deformable part-based models
Fitting: Voting and the Hough Transform
Deep Predictive Model for Autonomous Driving
Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor Streams Jun Ye Kai Li Guo-Jun Qi Kien.
Recognizing Deformable Shapes
Supervised Time Series Pattern Discovery through Local Importance
Action Recognition in the Presence of One
Compositional Human Pose Regression
Ch. 29: Predetermined Time Systems
Intelligent Information System Lab
Week 6 Cecilia La Place.
VQA: Visual Question Answering
Object detection as supervised classification
A New Approach to Track Multiple Vehicles With the Combination of Robust Detection and Two Classifiers Weidong Min , Mengdan Fan, Xiaoguang Guo, and Qing.
Lecture 25: Introduction to Recognition
Bird-species Recognition Using Convolutional Neural Network
Attributes and Simile Classifiers for Face Verification
Face Recognition with Deep Learning Method
CS 1674: Intro to Computer Vision Scene Recognition
The Open World of Micro-Videos
KFC: Keypoints, Features and Correspondences
Local features and image matching
Object Detection Creation from Scratch Samsung R&D Institute Ukraine
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Predicting Body Movement and Recognizing Actions: an Integrated Framework for Mutual Benefits Boyu Wang and Minh Hoai Stony Brook University Experiments:
Unsupervised Perceptual Rewards For Imitation Learning
Human-object interaction
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Presented By: Harshul Gupta
Deep Structured Scene Parsing by Learning with Image Descriptions
Volodymyr Bobyr Supervised by Aayushjungbahadur Rana
Week 6: Moving Target Detection Using Infrared Sensors
ML Approach to Approximating Ambient Light Exposure
Presentation transcript:

Look-ahead before you leap Dinesh Jayaraman, Kristen Grauman

Overview Problem Methods Experiments Conclusion

Problem: Active Recognition Hard to understand in one view. Acquire new views. “Active Recognition” Recognition system can select which views to see.

Components of Active Recognition Per-view Recognition Control Evidence Fusion Control (change view) Per-view recognition Evidence Fusion

Related Work Independent pipelines Certain heuristics Predict next-best move. Some predefined categories. Train for 1 view recognition.

Applications Instance recognition from turn-table scans Slide Credit: Dinesh Jayaraman Instance recognition from turn-table scans Toy category recognition with custom robot

Overview Problem Methods Experiments Conclusion

Multi-task training of active recognition components + look-ahead. Method Overview Perception Action selection Evidence fusion JOINT TRAINING Look-ahead FORECASTING THE EFFECTS OF ACTIONS Multi-task training of active recognition components + look-ahead. Slide Credit: Dinesh Jayaraman

System Architecture Actor Sensor Aggregator Look- ahead Classifier (RCNN)

Unrolled Architecture Actor Time t Sensor m(t) p(t) a(t) s1,s2,s3…, s(t) a(t) = Aggregator s(t) Actor Sensor Aggregator Time t+1 a(t+1) Time T Classifier m(t+1) LOOKAHEAD a(t+1) LOOKAHEAD Trained through combination of gradient descent and REINFORCE

Overview Problem Methods Experiments Conclusion

Accurate Scene Categorization Dataset: Sun360 8992 panoramic images Indoor and outdoor scenes Goal: predict scene Baselines: single view, randow views (average), random views(recurrent), transinformation, seqDP, transinformation + SeqDP

Look-Ahead active RNN outperforms all baseline Normal vs Active RNN Look-Ahead active RNN outperforms all baseline

Scene Categorization Result Ground Truth: plaza courtyard

Ground Truth: Church

Object Categorization Dataset: GERMS 6 videos 136 objects rotated around different fixed axis Goal: predict instance Baselines: single view, randow views (average), random views(recurrent), transinformation, seqDP, transinformation + SeqDP

Training all 3 components jointly is most critical to performance Normal vs Active RNN Training all 3 components jointly is most critical to performance

GERMS have less accuracy as compared to SUN360

Why? GERMS dataset is small as compared to SUN360. Single degree of motion as compared to two in SUN360.

Object Categorization Result Predicted Label: T=1 Slide Credit: Dinesh Jayaraman

Future Work Need more large 3-D and video benchmark dataset. Extension to other real word problems.

Failure Cases Predict street at T = 2 but because of T=3 view, it changes it prediction. Though street score is still high but give more preference to plaza courtyard.

Failure Cases Field having different flowers but it recognize it as park.

Conclusion Problem Methods Experiments Active recognition RNN with look-ahead module End to end joint training with gradient descent and REINFORCE. Experiments Outperformed all baselines on scene categorization and object detection. Need more large 3-D and video benchmark dataset.

Thank You