C. Lawrence Zitnick Microsoft Research, Redmond Devi Parikh Virginia Tech Bringing Semantics Into Focus Using Visual Abstraction

Outline
- Introduction
- Related work
- Generating abstract images
- Semantic importance of visual features
- Measuring the semantic similarity of images
- Relating text to visual phenomena
- Discussion

Introduction A fundamental goal of computer vision is to discover the semantically meaningful information contained within an image. Images contain a vast amount of knowledge including the presence of various objects, their properties, and their relations to other objects.

Introduction Numerous works have explored related areas, including ranking the relative importance of visible objects and semantically interpreting images. Unfortunately, progress in this direction is restricted by our limited ability to automatically extract a diverse and accurate set of visual features from real images.

Introduction In this paper we pose the question: “Is photorealism necessary for the study of semantic understanding?” We propose a novel methodology for studying semantic understanding. Unlike traditional approaches that use real images, we hypothesize that the same information can be learned from abstract images rendered from a collection of clip art.

Introduction The use of synthetic images provides two main advantages over real images. First, the difficulties in automatically detecting or hand-labeling relevant information in real images can be avoided. Using abstract images, even complex relation information can be easily computed given the relative placement of the clip art. Second, it is possible to generate different, yet semantically similar, scenes.

Introduction Contributions:
1. Our main contribution is a new methodology for studying semantic information using abstract images.
2. We measure the mutual information between visual features and the semantic classes to discover which visual features are most semantically meaningful.
3. We compute the relationship between words and visual features.
4. We analyze the information provided by various types of visual features in predicting semantic similarity.

Related work Numerous papers have explored the semantic understanding of images. Most relevant are those that try to predict a written description of a scene from image features. These methods use a variety of approaches. For instance, methods generating novel sentences rely on the automatic detection of objects and attributes, and use language statistics or spatial relationships for verb prediction. Our dataset has the unique property of having sets of semantically similar images, i.e., multiple images per sentence description. Our scenes are also (trivially) fully annotated, unlike previous datasets that have limited visual annotation.

Generating abstract images Our goal is to create a set of scenes that are semantically similar. There are two main concerns when generating a collection of abstract images. First, they should be comprehensive. The images must have a wide variety of objects, actions, relations, etc. Second, they should generalize. The properties learned from the dataset should be applicable to other domains.

Generating abstract images Initial scene creation Our scenes are created from a collection of 80 pieces of clip art created by an artist. Clip art depicting a boy and a girl is drawn in seven different poses and five different facial expressions, resulting in 35 possible combinations for each. Other pieces of clip art represent the remaining objects in the scene, including trees, toys, hats, animals, etc.

Generating abstract images Generating scene descriptions A new set of subjects was asked to describe the scenes, using one or two sentences per scene.

Generating abstract images Generating semantically similar scenes We asked subjects to generate scenes depicting the written descriptions. By having multiple subjects generate scenes for each description, we can create sets of semantically similar scenes. The same scene generation interface was used as described above, with two differences. First, the subjects were given a written description of a scene and asked to create a scene depicting it. Second, the clip art was randomly chosen as above, except we enforced that any clip art used in the original scene was also included.
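The sampling constraint above (random palette, but every piece from the original scene must reappear) can be sketched as follows; the function and its parameters are illustrative, not taken from the authors' interface:

```python
import random

def sample_clip_art(library, original_scene, n_extra=20, seed=None):
    """Sample a clip-art palette for the re-creation task: every piece
    used in the original scene is guaranteed to be in the palette, and
    the rest is filled with random pieces from the library."""
    rng = random.Random(seed)
    required = set(original_scene)                 # must be offered again
    remaining = [p for p in library if p not in required]
    extras = rng.sample(remaining, min(n_extra, len(remaining)))
    return sorted(required) + extras

# e.g. an 80-piece library, an original scene using pieces 3, 7, and 11
palette = sample_clip_art(list(range(80)), [3, 7, 11], seed=0)
```

Subjects drawing a new scene from a description would then choose only from `palette`, which keeps the original scene reproducible while still varying the distractor clip art.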

Generating abstract images Generating semantically similar scenes In total, we generated 1,002 original scenes and descriptions. Ten scenes were generated from each written description, resulting in a total of 10,020 scenes. That is, we have 1,002 sets of 10 scenes that are known to be semantically similar.

Semantic importance of visual features We examine the relative semantic importance of various scene properties or features. We use the mutual information shared between a specified feature and a set of classes representing semantically similar scenes. Mutual information (MI) measures how much information the knowledge of either the feature or the class provides about the other. For instance, if the MI between a feature and the classes is small, it indicates that the feature provides minimal information for determining whether scenes are semantically similar.

Semantic importance of visual features The mutual information between a feature X and the scene classes Y is

MI(X; Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x, y) log( p(x, y) / (p(x) p(y)) ),

where X is the set of feature values, and Y is the set of scene classes. In many instances, we want to measure the gain in information due to the addition of new features. To measure the amount of information that is gained from a feature X over another feature Z we use the Conditional Mutual Information (CMI),

MI(X; Y | Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x, y, z) log( p(x, y | z) / (p(x | z) p(y | z)) ).
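As a concrete illustration, the MI above can be estimated from paired samples of a discrete feature and the scene-class labels. This is a minimal sketch of the standard plug-in estimator, not the authors' code:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of MI(X; Y) in bits from paired samples of a
    discrete feature X and scene-class labels Y."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_joint * math.log2(c * n / (px[x] * py[y]))
    return mi

# A binary feature that perfectly separates two classes carries 1 bit:
print(mutual_information([0, 0, 1, 1], ['a', 'a', 'b', 'b']))  # → 1.0
```

A semantically meaningless feature (independent of the class) scores near zero under the same estimator, which is exactly the ranking criterion described above.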

Measuring the semantic similarity of images The semantic similarity of images depends on various characteristics of an image, such as the objects present, their attributes, and their relations.
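One simple instance of such a similarity score, using only the first of those characteristics (object occurrence), is the overlap of the clip-art objects two scenes contain. This is an illustrative sketch, not the paper's full feature set, which also uses attributes and relative positions:

```python
def occurrence_similarity(scene_a, scene_b):
    """Score the semantic similarity of two scenes by the Jaccard
    overlap of the objects they contain (1.0 = identical object sets).
    Attribute and spatial-relation features could be folded in the
    same way by comparing richer feature sets."""
    a, b = set(scene_a), set(scene_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Two scenes sharing 2 of 4 distinct objects:
print(occurrence_similarity(['boy', 'dog', 'tree'],
                            ['boy', 'dog', 'ball']))  # → 0.5
```

Because abstract scenes are fully annotated by construction, feature sets like these are exact rather than estimated from a detector.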

Relating text to visual phenomena

Discussion The potential of using abstract images to study the high-level semantic understanding of visual data is especially promising. Numerous potential applications exist for semantic datasets using abstract images, which we've only begun to explore in this paper. Abstract scenes can represent the same complex relationships that exist in natural scenes, and additional datasets may be generated to explore new scenarios or scene types. Future research on high-level semantics will be free to focus on the core problems related to the occurrence of and relations between visual phenomena.

Thank You