Image-Language Association: are we looking at the right features?
Katerina Pastra
Language Technology Applications, Institute for Language and Speech Processing, Athens, Greece
The pervasive digital video context
- File-swapping networks (P2P): video files & video blogs
- IPTV, iTV
- Video search engines
- Conversational robots, MM presentation systems...
- Access to MM content / generation of MM content
- Auto-analysis of image-language relations: complementarity, independence, equivalence
Overview
- Focus on the semantic equivalence relation = Multimedia Integration = image-language association
- Brief review of state-of-the-art association mechanisms – feature sets used
- The OntoVis feature set suggestion
- Using OntoVis in the VLEMA prototype
- Prospects for going from 3D to 2D
- Future plans and conclusions
Association Mechanisms in prototypes
Intelligent MM systems, from SHRDLU to the conversational robots of the new millennium (Pastra and Wilks 2004):
- Simulated or manually abstracted visual input is used, to avoid difficulties in image analysis
- Integration resources are used with a priori known associations (e.g. image X on screen is a “ball”), or allowing simple inferences (e.g. matching an input image to an object-model in the resource, which is in turn linked to a “concept/word”), to avoid difficulties in associating vision and language
- Applications are restricted to blocksworlds/miniworlds
- Scaling issues
Association algorithms
To be embedded in prototypes:
- Probabilistic approaches for learning (e.g. Barnard et al. 2003)
  - use word/phrase + image/image-region feature-value vectors
  - require properly annotated corpora (IBM, Pascal etc.)
- Logic-based approaches (e.g. Dasiopoulou et al. 2004)
  - use feature-augmented ontologies
  - match low-level image features to leaf nodes
- Use of both approaches has been reported too (Srikanth et al. 2005)
Feature set used: shape, colour, texture, position, size
Scaling?
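To make the logic-based flavour concrete, here is a minimal Prolog sketch of matching low-level region features against feature-augmented ontology leaf nodes. The concepts, hue ranges and predicate names are hypothetical illustrations, not the resources of the cited systems.

% Leaf concepts annotated with a hue range and a coarse texture label
% (hypothetical values, for illustration only).
leaf_features(sky,   hue(190, 250), texture(uniform)).
leaf_features(grass, hue(70, 160),  texture(coarse)).

% A segmented region, described by its mean hue and texture label,
% is associated with a leaf concept whose constraints it satisfies.
match_region(region(Hue, Texture), Concept) :-
    leaf_features(Concept, hue(Low, High), texture(Texture)),
    Hue >= Low,
    Hue =< High.

% ?- match_region(region(210, uniform), C).
% C = sky.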
The quest for the appropriate f-set
Constraints in defining a feature set:
- Features must be distinctive of object classes (at the basic level)
- Feature values must be detectable by image analysis modules
Cognitive thesis: no feature set is fully representative of the characteristics of an object, but one may be more or less successful in fixing the reference of the corresponding concept (word).
The OntoVis suggestion
Feature set suggested:
- physical structure: the number of parts into which an object is expected to decompose in different dimensions
- visually verifiable functionality: visual characteristics an object may have which are related to its function
- interrelations: relative location of objects, relative size
A domain model: ontology + KBase for static indoor scenes (sitting rooms in 3D – XI KR language)
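As an illustration of the interrelations features, a minimal Prolog sketch of deriving relative location and relative size from object extents. The detected/4 facts, the reduction of the 3D scene to horizontal extents, and the predicate names are assumptions made for this example, not part of OntoVis itself.

% detected(Id, Label, XMin, XMax): horizontal extent of a named object
% (a hypothetical, simplified representation).
detected(o1, heater, 6.0, 7.0).
detected(o2, sofa,   2.0, 5.0).

% Relative location: A lies entirely to the right of B.
right_of(A, B) :-
    detected(A, _, AXMin, _),
    detected(B, _, _, BXMax),
    AXMin >= BXMax.

% Relative size: A's extent is shorter than B's.
shorter_in_length(A, B) :-
    detected(A, _, AXMin, AXMax),
    detected(B, _, BXMin, BXMax),
    AXMax - AXMin < BXMax - BXMin.

% ?- right_of(o1, o2), shorter_in_length(o1, o2).
% true.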
The OntoVis suggestion
[Figure: decomposition of an object into parts along the x, y and z axes]
OntoVis – KB examples
props(sofa(X),[has_xclusters_moreThan(X,1)]).
props(sofa(X),[has_yclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_yclusters_equalLessThan(X,4)]).
props(sofa(X),[has_zclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_zclusters_equalLessThan(X,3)]).
props(sofa(X),[on_floor(X,yes)]).
props(sofa(X),[has_surface(X,yes)]).
props(sofa(X),[size(X,XCLUSTERS)]).

props(chair(X),[has_xclusters(X,1)]).
props(chair(X),[has_yclusters_equalMoreThan(X,2)]).
props(chair(X),[has_yclusters_equalLessThan(X,4)]).
props(chair(X),[has_zclusters_equalMoreThan(X,2)]).
props(chair(X),[has_zclusters_equalLessThan(X,3)]).
props(chair(X),[on_floor(X,yes)]).
props(chair(X),[has_surface(X,yes)]).
props(chair(X),[size(X,XCLUSTER_YValue,TableYDIM_UpperConstraint)]).

armchairs? stools?
OntoVis – KB examples
props(table(X),[has_xclusters(X,1)]).
props(table(X),[has_yclusters(X,2)]).
props(table(X),[has_zclusters(X,1)]).
props(table(X),[on_floor(X,yes)]).
props(table(X),[has_surface(X,yes)]).
props(table(X),[size(X,YDIM,XDIM,Relative_to_Room_YXDIM)]).

props(heater(X),[has_xclusters(X,1)]).
props(heater(X),[has_yclusters(X,1)]).
props(heater(X),[has_zclusters(X,1)]).
props(heater(X),[on_wall(X,yes)]).
props(heater(X),[on_floor(X,no)]).
props(heater(X),[has_surface(X,yes)]).
props(heater(X),[size(X,XDIM,YDIM,Relative_to_Wall_YXDIM)]).
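A minimal sketch of how object naming could consult constraints like the ones above: a candidate object, described by its cluster counts per axis plus two functionality cues, is checked against rules paraphrasing the sofa and chair entries. The measured/6 format and the rules are illustrative assumptions, not the actual VLEMA matcher.

% measured(Id, XClusters, YClusters, ZClusters, OnFloor, HasSurface):
% a hypothetical summary of a segmented object.
measured(obj1, 3, 3, 2, yes, yes).

% Sofa: more than one x-cluster, 2-4 y-clusters, 2-3 z-clusters,
% on the floor, with a sitting surface.
named_as(Obj, sofa) :-
    measured(Obj, XC, YC, ZC, yes, yes),
    XC > 1,
    YC >= 2, YC =< 4,
    ZC >= 2, ZC =< 3.

% Chair: exactly one x-cluster, otherwise the same constraints.
named_as(Obj, chair) :-
    measured(Obj, 1, YC, ZC, yes, yes),
    YC >= 2, YC =< 4,
    ZC >= 2, ZC =< 3.

% ?- named_as(obj1, Class).
% Class = sofa.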
OntoVis F-set advantages
- It generalizes over visual appearance differences (e.g. different styles of sofas)
- It goes beyond viewpoint (view angle + distance) differences
- It can be used to reason about object identity by analogy (e.g. to describe “sofa-like” objects if not certain)
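To illustrate the last point, a sketch of analogy-based reasoning over the same kind of constraints: rather than requiring every constraint of a class to hold, one can count how many hold and call a near-miss object “sofa-like”. The reified constraint list, the scoring and the threshold are assumptions for illustration only, not the actual OntoVis/VLEMA mechanism.

% Constraints of a class, reified as a list of goals over
% XClusters-YClusters-ZClusters-OnFloor-HasSurface.
class_constraints(sofa, XC-YC-ZC-Floor-Surface,
                  [XC > 1, YC >= 2, YC =< 4, ZC >= 2, ZC =< 3,
                   Floor == yes, Surface == yes]).

% Count how many of the goals succeed.
satisfied_count([], 0).
satisfied_count([G|Gs], N) :-
    satisfied_count(Gs, N0),
    ( call(G) -> N is N0 + 1 ; N = N0 ).

% An object is "Class-like" if it meets all but one of the constraints.
class_like(XC-YC-ZC-Floor-Surface, Class) :-
    class_constraints(Class, XC-YC-ZC-Floor-Surface, Goals),
    length(Goals, Total),
    satisfied_count(Goals, Sat),
    Sat < Total,
    Sat >= Total - 1.

% ?- class_like(3-5-2-yes-yes, C).   % one y-cluster too many
% C = sofa.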
Using OntoVis
VLEMA: a Vision-Language intEgration MechAnism
- Input: automatically reconstructed static scenes in 3D (VRML format) from RESOLV (robot-surveyor)
- Integration task: medium translation from images (3D sitting rooms) to text (what and where, in English)
- Domain: estate surveillance
- Horizontal prototype
- Implemented in shell programming and Prolog
The Input
System Architecture
[Figure: pipeline from Data Transformations to Object Segmentation to Object Naming (consulting OntoVis + KB) to the final Description, e.g. “…a heater … and a sofa with 3 seats…”]
The Output
Wed Jul 7 13:22:22 GMTDT 2004
VLEMA V1.0
Katerina of Sheffield
Description of the automatically constructed VRML file “development-scene.wrl”
This is a general view of a room. We can see the front wall, the left-side wall, the floor, a heater on the lower part of the front wall and a sofa with 3 seats. The heater is shorter in length than the sofa. It is on the right of the sofa.
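For illustration, a minimal sketch of how the comparative and locative sentences in this output could be produced from relation facts. The facts and the templates below are assumptions about the generation step, not the actual VLEMA generator.

% Relation facts assumed to have been derived earlier in the pipeline.
shorter_in_length(heater, sofa).
right_of(heater, sofa).

% Fill simple sentence templates from the relation facts.
sentence(S) :-
    shorter_in_length(A, B),
    format(atom(S), "The ~w is shorter in length than the ~w.", [A, B]).
sentence(S) :-
    right_of(_, B),
    format(atom(S), "It is on the right of the ~w.", [B]).

% ?- forall(sentence(S), writeln(S)).
% The heater is shorter in length than the sofa.
% It is on the right of the sofa.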
Future Plans & Conclusions
- Extension of OntoVis and testing in VRML worlds: modular description of clusters/parts (not relying just on their number in each dimension)
- Exploration of the portability of the f-set to 2D images; initial signs of feasibility (cf. research on detecting spatial relations in 2D, structure identification in 2D, algorithms for 3D reconstruction from photographs)
- To what extent is OntoVis scalable, even in 3D? Complementary or alternative to current approaches?
- Indications of OntoVis scalability & feasibility that are worth further exploration