Presentation transcript:

Capturing Human Insight for Visual Learning
Kristen Grauman, Department of Computer Science, University of Texas at Austin
Work with Sudheendra Vijayanarasimhan, Adriana Kovashka, Devi Parikh, Prateek Jain, Sung Ju Hwang, and Jeff Donahue
Frontiers in Computer Vision Workshop, MIT, August 22, 2011

Problem: how to capture human insight about the visual world?
The complex space of visual objects, activities, and scenes. [tiny image montage by Torralba et al.]
Status quo: the annotator points and labels. This point-and-label "mold" is restrictive, and human effort is expensive.

Problem: how to capture human insight about the visual world?
The complex space of visual objects, activities, and scenes. [tiny image montage by Torralba et al.]
Our approach:
– Listen: explanations, comparisons, implied cues, ...
– Ask: actively learn

Deepening human communication to the system
Example queries, from simple to rich: What is this? What property is changing here? What's worth mentioning? Do you find him attractive? Why? How do you know? Is it 'furry'? Which is more 'open'?
[Donahue & Grauman ICCV 2011; Hwang & Grauman BMVC 2010; Parikh & Grauman ICCV 2011, CVPR 2011; Kovashka et al. ICCV 2011]

Soliciting rationales
We propose to ask the annotator not just what, but also why. Is the team winning? Is her form perfect? Is it a safe route? How can you tell?

Soliciting rationales
Annotation task: Is her form perfect? How can you tell?
[Figure: annotator marks rationales on "good form" vs. "bad form" examples, e.g. pointed toes, balanced, knee angled, falling. Each spatial or attribute rationale becomes a synthetic contrast example that influences the classifier.]
[Donahue & Grauman, ICCV 2011] [Zaidan et al. HLT 2007]
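To make the mechanics concrete, here is a minimal sketch of the contrast-example idea in the spirit of Zaidan et al.: for each labeled example, the features the rationale touches are weakened to produce a contrast example that the original should outscore by a margin, and one common reduction folds those margin constraints into a standard linear SVM as scaled difference vectors. The feature layout, masking scheme, and hyperparameters below are illustrative assumptions, not the published implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def make_contrast(X, rationale_masks, damp=0.1):
    """Weaken the features an annotator highlighted as the 'reason'.

    X: (n, d) feature matrix; rationale_masks: (n, d) boolean array
    marking which dimensions a rationale touches (assumed layout).
    The contrast example keeps everything *except* the rationale,
    so the original should score higher than its contrast.
    """
    V = X.astype(float)
    V[rationale_masks] *= damp  # suppress rationale features
    return V

def train_with_rationales(X, y, rationale_masks, mu=0.5, C=1.0):
    V = make_contrast(X, rationale_masks)
    # Desired constraint: y_i * w.(x_i - v_i) >= mu. Folded into the
    # SVM by treating scaled difference vectors as pseudo-examples.
    diffs = (X - V) / mu
    X_aug = np.vstack([X, diffs])
    y_aug = np.concatenate([y, y])
    clf = LinearSVC(C=C)
    clf.fit(X_aug, y_aug)
    return clf
```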

Rationale results
Three datasets: Scene Categories (how can you tell the scene category?), Hot or Not (what makes them hot, or not?), and Public Figures (what attributes make them (un)attractive?). Rationales collected from hundreds of MTurk workers. [Donahue & Grauman, ICCV 2011]

Rationale results (mean AP)

PubFig        Originals   +Rationales
Male          64.60%      68.14%
Female        51.74%      55.65%

Hot or Not    Originals   +Rationales
Male          54.86%      60.01%
Female        55.99%      57.07%

Scenes (Originals vs. +Rationales): per-category results for Kitchen, Living Rm, Inside City, Coast, Highway, Bedroom, Street, Country, Mountain, Office, Tall Building, Store, and Forest (values shown in the original slide).

[Donahue & Grauman, ICCV 2011]

Learning what to mention
Issue: presence of objects != significance.
Our idea: learn a cross-modal representation that accounts for "what to mention".
Training: human-given descriptions, e.g. TAGS: Cow, Birds, Architecture, Water, Sky.
– Textual cues: frequency, relative order, mutual proximity
– Visual cues: texture, scene, color, ...

Learning what to mention
Importance-aware semantic space: project the visual view (View x) and the textual view (View y) into a common space. [Hwang & Grauman, BMVC 2010]
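A minimal sketch of the two-view idea using plain CCA (the paper uses a kernelized variant); the feature dimensions and random stand-in data below are illustrative assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Illustrative stand-ins: rows are images; X holds visual features,
# Y holds importance-aware tag features (frequency, order, proximity).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))   # visual view (View x)
Y = rng.normal(size=(500, 40))    # textual view (View y)

# Learn projections that maximally correlate the two views.
cca = CCA(n_components=10)
cca.fit(X, Y)
Xc, Yc = cca.transform(X, Y)      # both views in the shared space

def nearest_neighbors(x_new, k=5):
    """Project a new image into the shared space and return the
    indices of the k most similar training images, whose human-given
    tags can then be transferred to the query."""
    xq = cca.transform(x_new.reshape(1, -1))
    d = np.linalg.norm(Xc - xq, axis=1)
    return np.argsort(d)[:k]
```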

Learning what to mention: results
[Figure: for each query image, the tags retrieved by our method (words + visual) versus a visual-only baseline.] [Hwang & Grauman, BMVC 2010]

Problem: how to capture human insight about the visual world?
The complex space of visual objects, activities, and scenes. [tiny image montage by Torralba et al.]
Our approach:
– Listen: explanations, comparisons, implied cues
– Ask: actively learn

Traditional active learning
Loop: [Current Model] → [Active Selection over unlabeled data] → [Annotator] → [Labeled data] → retrain. At each cycle, obtain a label for the most informative or uncertain example. [Mackay 1992; Freund et al. 1997; Tong & Koller 2001; Lindenbaum et al. 2004; Kapoor et al. 2007; ...]
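A minimal sketch of this classic loop with margin-based uncertainty sampling for a linear SVM; the oracle, pool, and seed set here are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def active_learning_loop(X_pool, oracle, X_init, y_init, rounds=20):
    """Pool-based active learning with uncertainty sampling.

    X_pool: (n, d) ndarray of unlabeled examples; oracle(i) returns
    the annotator's label for pool index i; (X_init, y_init) is a
    small seed set containing both classes.
    """
    X_lab, y_lab = list(X_init), list(y_init)
    unlabeled = list(range(len(X_pool)))
    clf = LinearSVC()
    for _ in range(rounds):
        clf.fit(np.array(X_lab), np.array(y_lab))
        # Most uncertain = smallest |distance to the hyperplane|.
        scores = np.abs(clf.decision_function(X_pool[unlabeled]))
        pick = unlabeled[int(np.argmin(scores))]
        X_lab.append(X_pool[pick])
        y_lab.append(oracle(pick))   # ask the annotator
        unlabeled.remove(pick)
    return clf
```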

Challenges in active visual learning
– Annotation tasks vary in cost and informativeness
– Multiple annotators working in parallel
– Massive unlabeled pools of data
[Vijayanarasimhan & Grauman NIPS 2008, CVPR 2009; Vijayanarasimhan et al. CVPR 2010, CVPR 2011; Kovashka et al. ICCV 2011]
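One hedged sketch of cost-sensitive selection: rank candidate annotation requests by estimated value of information per unit cost, rather than by uncertainty alone. The uncertainty proxy and cost model below are illustrative assumptions, not the published criterion:

```python
import numpy as np

def select_request(clf, X_pool, candidate_tasks):
    """Pick the (example, task) pair with the best info-per-cost ratio.

    candidate_tasks: list of (pool_index, task_name, predicted_cost_sec),
    e.g. ("image label", 5s) vs. ("full segmentation", 60s).
    Margin-based uncertainty stands in for expected information gain.
    """
    margins = np.abs(clf.decision_function(X_pool))
    best, best_ratio = None, -np.inf
    for idx, task, cost in candidate_tasks:
        info = 1.0 / (1e-6 + margins[idx])  # more uncertain -> more informative
        ratio = info / cost                 # value of information per second
        if ratio > best_ratio:
            best, best_ratio = (idx, task), ratio
    return best
```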

Sub-linear time active selection
We propose a novel hashing approach to identify the most uncertain examples in sub-linear time: hash the unlabeled data into a table, then use the current classifier as the query to retrieve the actively selected examples. For 4.5 million unlabeled instances, 10 minutes of machine time per iteration, vs. 60 hours for a naïve scan. [Jain, Vijayanarasimhan, Grauman, NIPS 2010]
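A simplified sketch of the hyperplane-hashing idea, using a two-bit hash per random pair in the spirit of the NIPS 2010 paper; the code length, single table, and random data are illustrative assumptions (the paper probes multiple tables and analyzes the collision probabilities):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_pairs = 64, 4  # assumed feature dim; short codes for clarity

# Two random Gaussian vectors per bit-pair, shared by points and queries.
U = rng.normal(size=(n_pairs, d))
V = rng.normal(size=(n_pairs, d))

def point_code(x):
    # Database points: per pair, bits [sign(u.x), sign(v.x)].
    return tuple(np.concatenate([(U @ x) > 0, (V @ x) > 0]))

def hyperplane_code(w):
    # Hyperplane queries flip the v-half: [sign(u.w), sign(-v.w)],
    # so collisions favor points nearly perpendicular to w.
    return tuple(np.concatenate([(U @ w) > 0, (V @ w) < 0]))

# Index an unlabeled pool once.
X = rng.normal(size=(10000, d))
table = defaultdict(list)
for i, x in enumerate(X):
    table[point_code(x)].append(i)

# Query with the current classifier's hyperplane normal w: the bucket
# holds candidates likely to have small |w.x|, i.e. near the boundary,
# without a linear scan. (In practice several tables are probed so the
# candidate set is never empty.)
w = rng.normal(size=d)
candidates = table[hyperplane_code(w)]
```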

Live active learning results on Flickr test set
Outperforms the status quo data collection approach. [Vijayanarasimhan & Grauman, CVPR 2011]

Summary
– Humans are not simply "label machines"
– Widen access to visual knowledge: new forms of input, often requiring associated new learning algorithms
– Manage large-scale annotation efficiently: cost-sensitive active question asking
– Live learning: moving beyond canned datasets