Beyond Mindless Labeling: Really Leveraging Humans to Build Intelligent Machines Devi Parikh Virginia Tech.

Image Understanding. “Color College Avenue”, Blacksburg, VA, May 2012. Slide credit: Devi Parikh

State of Affairs. Accuracy: Machine, Human. Slide credit: Devi Parikh

How do we teach machines today? Slide credit: Devi Parikh

And on, and on, and on… Slide credit: Devi Parikh

How do machines behave? Slide credit: Devi Parikh

Airplane Cabin, Amusement Park, Aquarium, Badminton Court, Bedroom. Xiao et al., CVPR 2010. Slide credit: Devi Parikh

Clarifai, April 10th, 2014

Interacting with Vision Systems: Need a better mode of communication! Slide credit: Devi Parikh

Attributes. Examples: furry, natural, young, etc. Mid-level; shareable across concepts; human understandable; machine detectable; allow for human-machine communication. Slide credit: Devi Parikh
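To make the idea of attributes as a shared, human-readable middle layer concrete, here is a minimal sketch. The concepts, their attribute signatures, and the function names are all hypothetical choices made for illustration; the talk does not specify a representation. Each concept is described by which attributes it has, and a set of detected attributes is matched to the closest concept.

```python
# Attribute vocabulary taken from the slide's examples.
ATTRIBUTES = ("furry", "natural", "young")

# Hypothetical signatures: 1 = attribute present, 0 = absent.
# Because attributes are shared across concepts, every concept
# is described with the same small vocabulary.
CONCEPT_SIGNATURES = {
    "kitten": (1, 1, 1),
    "boulder": (0, 1, 0),
    "teddy bear": (1, 0, 0),
}


def describe(signature):
    """Turn a binary signature into human-readable attribute names."""
    return [name for name, present in zip(ATTRIBUTES, signature) if present]


def hamming_distance(a, b):
    """Count the positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))


def closest_concept(detected, signatures=CONCEPT_SIGNATURES):
    """Return the concept whose signature best matches the detected attributes."""
    return min(signatures, key=lambda c: hamming_distance(detected, signatures[c]))


if __name__ == "__main__":
    detected = (1, 1, 1)  # a machine's (assumed) attribute detections
    print(describe(detected), "->", closest_concept(detected))
```

The same signatures can be read by a person ("furry, natural, young") and compared by a machine, which is the communication channel the slide points to.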

Role of the Human: Supervisor, User, Communicator (Human, Machine).
Topics: Image Search; Instilling Domain Knowledge; Characterizing Failure Modes; Interpretable Models; Active and Interactive Learning.
Example statements: “My missing brother is fuller-faced than this boy.” “Polar bears are white and larger than rabbits.” “If the image is blurry or the face is not frontal, I may fail.” “I think this is a polar bear because this is a white and furry animal.”
[Parikh and Grauman, ICCV 2011] [Parkash and Parikh, ECCV 2012] [Biswas and Parikh, CVPR 2013] [Lad and Parikh, ECCV 2014] [Kovashka, Parikh and Grauman, CVPR 2012] [Parikh and Grauman, ICCV 2013] [Bansal, Farhadi and Parikh, ECCV 2014]
Slide credit: Devi Parikh
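A statement such as “My missing brother is fuller-faced than this boy” can be read as a relative constraint on one attribute. The sketch below is a deliberately simplified illustration, not the method of the cited papers: it assumes each image already has a numeric strength for the attribute, and the image identifiers, the `fuller_faced` scores, and the function name are all hypothetical.

```python
def filter_by_relative_feedback(database, reference_id, attribute, direction):
    """Keep images that are 'more' or 'less' <attribute> than the reference image.

    database maps image id -> {attribute name: strength}. The strengths are
    assumed to come from some attribute predictor that is not shown here.
    """
    if direction not in ("more", "less"):
        raise ValueError("direction must be 'more' or 'less'")

    reference_strength = database[reference_id][attribute]
    kept = []
    for image_id, strengths in database.items():
        if image_id == reference_id:
            continue  # the reference itself is not a search result
        strength = strengths[attribute]
        if direction == "more" and strength > reference_strength:
            kept.append(image_id)
        elif direction == "less" and strength < reference_strength:
            kept.append(image_id)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical attribute strengths for four face images.
    faces = {
        "boy_in_photo": {"fuller_faced": 0.40},
        "candidate_1": {"fuller_faced": 0.75},
        "candidate_2": {"fuller_faced": 0.10},
        "candidate_3": {"fuller_faced": 0.55},
    }
    # "My missing brother is fuller-faced than this boy."
    print(filter_by_relative_feedback(faces, "boy_in_photo", "fuller_faced", "more"))
```

A hard threshold like this discards every image on the wrong side of the reference; a practical system would more likely rank candidates softly so that noisy attribute predictions do not eliminate the right answer.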

Accessing Common Sense: Direct communication. Learn by observing structure in our visual world? Slide credit: Devi Parikh

Two professors converse in front of a blackboard. Slide credit: Larry Zitnick

Two professors stand in front of a blackboard. Slide credit: Larry Zitnick

Two professors converse in front of a blackboard. Slide credit: Larry Zitnick

Challenges: Lacking visual density; annotations are expensive (and boring); computer vision doesn’t work well enough. Slide credit: Devi Parikh

Is photorealism necessary? Slide credit: Larry Zitnick

Jenny, Mike. Slide credit: Larry Zitnick

Interface 2x

Mike fights off a bear by giving him a hotdog while Jenny runs away. Slide credit: Larry Zitnick

Dataset: 1,000 classes of semantically similar scenes (Class 1, Class 2, ..., Class 1,000). 1,000 classes x 10 scenes per class = 10,000 scenes. Dataset online [Zitnick and Parikh, CVPR 2013]. Slide credit: Larry Zitnick

Visual Features. Slide credit: Larry Zitnick

Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing. Slide credit: Larry Zitnick

Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing. Which visual features are important for semantic meaning? Which words correlate with specific visual features? Slide credit: Devi Parikh

Generate and Retrieve Scenes. Input: Jenny is catching the ball. Mike is kicking the ball. The table is next to the tree. Tuples: [not preserved in transcript]. Automatically Generated, Human Generated. Retrieval: score a database of scenes. [Zitnick, Parikh and Vanderwende, ICCV 2013] Slide credit: Devi Parikh
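The retrieval step, "score a database of scenes", can be sketched as follows. This is an illustrative simplification and not the model from the cited paper: it assumes each sentence is reduced to a (subject, relation, object) triple and that each stored scene is annotated with triples of the same form. The triples below are hand-written from the slide's three input sentences (they are not the slide's own tuples, which the transcript does not preserve), and the scene ids and function names are hypothetical.

```python
def score_scene(query_tuples, scene_tuples):
    """Fraction of the query tuples that also appear in the scene's annotation."""
    query = set(query_tuples)
    if not query:
        return 0.0
    return len(query & set(scene_tuples)) / len(query)


def retrieve(query_tuples, database, top_k=1):
    """Rank scene ids by descending score, breaking ties by id."""
    ranked = sorted(
        database,
        key=lambda scene_id: (-score_scene(query_tuples, database[scene_id]), scene_id),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Hand-written triples for the three input sentences (assumed format).
    query = [
        ("Jenny", "catch", "ball"),
        ("Mike", "kick", "ball"),
        ("table", "next to", "tree"),
    ]
    # Hypothetical scene annotations.
    scenes = {
        "scene_1": [("Jenny", "catch", "ball"), ("Mike", "kick", "ball"),
                    ("table", "next to", "tree")],
        "scene_2": [("Mike", "kick", "ball"), ("Jenny", "sit", "table")],
        "scene_3": [("Jenny", "hold", "hotdog")],
    }
    print(retrieve(query, scenes, top_k=2))
```

Exact tuple overlap ignores everything a scene shows visually (positions, poses, expressions); it only demonstrates the shape of the retrieval loop, in which every stored scene receives a score for the query and the best-scoring scenes are returned.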

Learning Fine-grained Interactions 3x [Antol, Zitnick and Parikh, ECCV 2014] Slide credit: Devi Parikh

Learning Fine-grained Interactions: Train on abstract, test on real

Results: 60 categories (chart axis: Accuracy %). Domain adaptation; learn explicit mapping from abstract to real world; multi-label problem. [Antol, Zitnick and Parikh, ECCV 2014]

Visual Abstraction For… Studying mappings between images and text; zero-shot learning; studying image memorability, specificity, etc.; learning common sense knowledge; rich annotation modality – Ask for descriptions – Ask for scenes – Show scene and ask for modification. Goes beyond “Jenny and Mike.” Study high-level image understanding tasks without waiting for lower-level vision tasks to be solved.

[Xinlei Chen]

Conclusion. Accuracy: Machine, Human. Give computer vision systems access to common-sense knowledge – Communication with humans via attributes (text) – Visual abstraction. Use humans for more than just “labels”. Slide credit: Devi Parikh

Thank you. Slide credit: Devi Parikh