Published by Esmond Cole. Modified over 9 years ago.
1
Image-Language Association: are we looking at the right features?
Katerina Pastra
Language Technology Applications, Institute for Language and Speech Processing, Athens, Greece
2
The pervasive digital video context
File-swapping networks (P2P) (video files & video blogs)
IPTV, iTV
Video search engines
Conversational robots, MM presentation systems...
Access to MM content; generation of MM content
Auto-analysis of image-language relations: complementarity, independence, equivalence
3
Overview
Focus on the semantic equivalence relation (= Multimedia Integration = image-language association)
Brief review of state-of-the-art association mechanisms and the feature sets they use
The OntoVis feature set suggestion
Using OntoVis in the VLEMA prototype
Prospects for going from 3D to 2D
Future plans and conclusions
4
Association Mechanisms in prototypes
Intelligent MM systems, from SHRDLU to the conversational robots of the new millennium (Pastra and Wilks 2004):
  Simulated or manually abstracted visual input is used, to avoid difficulties in image analysis
  Integration resources are used with a priori known associations (e.g. image X on screen is a “ball”), or allow simple inferences (e.g. matching an input image to an object-model in the resource, which is in turn linked to a “concept/word”), to avoid difficulties in associating V-L
  Applications are restricted to blocksworlds/miniworlds: scaling issues
5
Association algorithms
To be embedded in prototypes:
  Probabilistic approaches for learning (e.g. Barnard et al. 2003)
    use word/phrase + image/image region (f-v vectors)
    require properly annotated corpora (IBM, Pascal etc.)
  Logic-based approaches (e.g. Dasiopoulou et al. 2004)
    use feature-augmented ontologies
    match low-level image features + leaf nodes
  Use of both approaches reported too (Srikanth et al. 2005)
Feature set used: shape, colour, texture, position, size
Scaling?
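To make the probabilistic family concrete, here is a minimal sketch of word/region association by normalised co-occurrence counting. It is not the model of Barnard et al. (2003), only the simplest instance of the idea. It assumes each image region has already been quantised into a discrete token; the tokens (`blue_smooth` etc.), the toy corpus and the function name `word_given_region` are all hypothetical.

```python
from collections import Counter, defaultdict

def word_given_region(corpus):
    """Estimate P(word | region token) from co-occurrence counts.

    corpus: iterable of (region_tokens, caption_words) pairs, one per image.
    """
    counts = defaultdict(Counter)
    for regions, words in corpus:
        for region in regions:
            for word in words:
                counts[region][word] += 1
    # Normalise each region's counts into a probability distribution.
    return {
        region: {w: c / sum(ws.values()) for w, c in ws.items()}
        for region, ws in counts.items()
    }

# Hypothetical annotated corpus: region tokens paired with caption words.
corpus = [
    (["blue_smooth", "green_rough"], ["sky", "grass"]),
    (["blue_smooth", "grey_rough"], ["sky", "rock"]),
    (["green_rough"], ["grass"]),
]

model = word_given_region(corpus)
best_word = max(model["blue_smooth"], key=model["blue_smooth"].get)
print(best_word)
```

Even this toy version shows why annotated corpora are required: the estimates are only as good as the image/caption pairs they are counted from.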
6
The quest for the appropriate f-set
Constraints in defining an f-set:
  Features must be distinctive of object classes (at the basic level)
  Feature values must be detectable by image analysis modules
Cognitive thesis: no feature set is fully representative of the characteristics of an object, but one may be more or less successful in fixing the reference of the corresponding concept (word)
7
The OntoVis suggestion
Feature set suggested:
  physical structure: the number of parts into which an object is expected to be decomposed in different dimensions
  visually verifiable functionality: visual characteristics an object may have which are related to its function
  interrelations: relative location of objects, relative size
A domain model: Ontology + KBase for static indoor scenes (sitting rooms in 3D, XI KR language)
8
The OntoVis suggestion
[Figure; axis labels: x, y, z]
9
OntoVis – KB examples
props(sofa(X),[has_xclusters_moreThan(X,1)]).
props(sofa(X),[has_yclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_yclusters_equalLessThan(X,4)]).
props(sofa(X),[has_zclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_zclusters_equalLessThan(X,3)]).
props(sofa(X),[on_floor(X,yes)]).
props(sofa(X),[has_surface(X,yes)]).
props(sofa(X),[size(X,XCLUSTERS)]).

props(chair(X),[has_xclusters(X,1)]).
props(chair(X),[has_yclusters_equalMoreThan(X,2)]).
props(chair(X),[has_yclusters_equalLessThan(X,4)]).
props(chair(X),[has_zclusters_equalMoreThan(X,2)]).
props(chair(X),[has_zclusters_equalLessThan(X,3)]).
props(chair(X),[on_floor(X,yes)]).
props(chair(X),[has_surface(X,yes)]).
props(chair(X),[size(X,XCLUSTER_YValue,TableYDIM_UpperConstraint)]).

armchairs? stools?
10
OntoVis – KB examples
props(table(X),[has_xclusters(X,1)]).
props(table(X),[has_yclusters(X,2)]).
props(table(X),[has_zclusters(X,1)]).
props(table(X),[on_floor(X,yes)]).
props(table(X),[has_surface(X,yes)]).
props(table(X),[size(X,YDIM,XDIM,Relative_to_Room_YXDIM)]).

props(heater(X),[has_xclusters(X,1)]).
props(heater(X),[has_yclusters(X,1)]).
props(heater(X),[has_zclusters(X,1)]).
props(heater(X),[on_wall(X,yes)]).
props(heater(X),[on_floor(X,no)]).
props(heater(X),[has_surface(X,yes)]).
props(heater(X),[size(X,XDIM,YDIM,Relative_to_Wall_YXDIM)]).
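As a rough illustration of how such facts could drive object naming, the sketch below restates the cluster-count and placement properties from the KB examples as inclusive ranges in Python and checks a segmented object against them. This is not the XI implementation used in VLEMA. The dictionary layout and the names `KB`, `matches` and `candidates` are assumptions, and the `size` properties are left out because the slides do not spell out their semantics.

```python
import math

# Inclusive (min, max) cluster counts per axis, plus yes/no placement flags.
# Transcribed from the props(...) facts; size constraints are omitted.
KB = {
    "sofa":   {"x": (2, math.inf), "y": (2, 4), "z": (2, 3),
               "on_floor": True, "has_surface": True},
    "chair":  {"x": (1, 1), "y": (2, 4), "z": (2, 3),
               "on_floor": True, "has_surface": True},
    "table":  {"x": (1, 1), "y": (2, 2), "z": (1, 1),
               "on_floor": True, "has_surface": True},
    "heater": {"x": (1, 1), "y": (1, 1), "z": (1, 1),
               "on_wall": True, "on_floor": False, "has_surface": True},
}

def matches(spec, obs):
    """True if every property in spec is satisfied by the observation."""
    for key, expected in spec.items():
        value = obs.get(key)
        if isinstance(expected, tuple):
            low, high = expected
            if value is None or not (low <= value <= high):
                return False
        elif value != expected:
            return False
    return True

def candidates(obs):
    """Names of all KB classes whose properties the observation satisfies."""
    return sorted(name for name, spec in KB.items() if matches(spec, obs))

# Hypothetical segmentation results for two objects.
sofa_obs = {"x": 3, "y": 2, "z": 2, "on_floor": True, "has_surface": True}
heater_obs = {"x": 1, "y": 1, "z": 1, "on_wall": True,
              "on_floor": False, "has_surface": True}

print(candidates(sofa_obs), candidates(heater_obs))
```

Under these ranges a sofa and a chair differ only in the number of x clusters, which is one way to read the slide's open question about armchairs and stools.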
11
OntoVis F-set advantages
It generalizes over visual appearance differences (e.g. different styles of sofas)
It goes beyond viewpoint (view angle + distance) differences
It can be used to reason on object id by analogy (e.g. to describe “sofa-like” objects if not certain)
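One possible reading of reasoning by analogy is partial matching: score an object by the fraction of a class's properties it satisfies, and fall back to an "X-like" description when the best class is close but not exact. The sketch below assumes that reading. The scoring rule, the 0.75 threshold and the names `score` and `describe` are hypothetical, not taken from OntoVis.

```python
import math

# Two classes from the KB examples, as inclusive cluster-count ranges
# plus placement flags (size constraints omitted).
KB = {
    "sofa":  {"x": (2, math.inf), "y": (2, 4), "z": (2, 3),
              "on_floor": True, "has_surface": True},
    "chair": {"x": (1, 1), "y": (2, 4), "z": (2, 3),
              "on_floor": True, "has_surface": True},
}

def satisfied(expected, value):
    """Check one property: a range test for tuples, equality otherwise."""
    if isinstance(expected, tuple):
        low, high = expected
        return value is not None and low <= value <= high
    return value == expected

def score(name, obs):
    """Fraction of the class's properties that the observation satisfies."""
    spec = KB[name]
    hits = sum(satisfied(expected, obs.get(key)) for key, expected in spec.items())
    return hits / len(spec)

def describe(obs, threshold=0.75):
    """Exact match -> class name; near match -> 'X-like object'."""
    best = max(KB, key=lambda name: score(name, obs))
    best_score = score(best, obs)
    if best_score == 1.0:
        return best
    if best_score >= threshold:
        return f"{best}-like object"
    return "unknown object"

# Hypothetical object with one y cluster too many for a sofa.
odd_obs = {"x": 3, "y": 5, "z": 2, "on_floor": True, "has_surface": True}
print(describe(odd_obs))
```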
12
Using OntoVis
VLEMA: A Vision-Language intEgration MechAnism
Input: automatically reconstructed static scenes in 3D (VRML format) from RESOLV (robot-surveyor)
Integration task: Medium Translation from images (3D sitting rooms) to text (what and where, in English)
Domain: estates surveillance
Horizontal prototype
Implemented in shell programming and Prolog
13
The Input
14
System Architecture
[Diagram components: Data Transformations; Object Segmentation; Object Naming; OntoVis + KB; Description: “…a heater … and a sofa with 3 seats…”]
15
The Output
Wed Jul 7 13:22:22 GMTDT 2004
VLEMA V1.0
Katerina Pastra@University of Sheffield
Description of the automatically constructed VRML file “development-scene.wrl”
This is a general view of a room. We can see the front wall, the left-side wall, the floor, A heater on the lower part of the front-wall and a sofa with 3 seats. The heater is shorter in length than the sofa. It is on the right of the sofa.
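The last two sentences of this output express interrelations (relative size and relative location). A minimal sketch of how such sentences might be produced from object extents follows. It is not VLEMA's shell/Prolog code. It assumes each named object comes with a minimum and maximum x coordinate and that x increases to the viewer's right; the `Extent` class, `compare` function and coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    """Hypothetical horizontal extent of a named object in scene units."""
    name: str
    x_min: float
    x_max: float

    @property
    def length(self) -> float:
        return self.x_max - self.x_min

def compare(a: Extent, b: Extent) -> str:
    """Describe a's length and horizontal position relative to b."""
    sentences = []
    subject = f"The {a.name}"
    if a.length < b.length:
        sentences.append(f"{subject} is shorter in length than the {b.name}.")
        subject = "It"
    elif a.length > b.length:
        sentences.append(f"{subject} is longer in length than the {b.name}.")
        subject = "It"
    # Non-overlapping extents give a left/right relation (x grows rightwards).
    if a.x_min >= b.x_max:
        sentences.append(f"{subject} is on the right of the {b.name}.")
    elif a.x_max <= b.x_min:
        sentences.append(f"{subject} is on the left of the {b.name}.")
    return " ".join(sentences)

heater = Extent("heater", 3.0, 3.8)
sofa = Extent("sofa", 0.5, 2.5)
print(compare(heater, sofa))
```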
16
Future Plans & Conclusions
Future plans:
  Extension of OntoVis and testing in VRML worlds
  Modular description of clusters/parts (not relying just on their number in each dimension)
  Exploration of the portability of the f-set to 2D images
  Initial signs of feasibility (cf. research on detecting spatial relations in 2D, structure identification in 2D, algorithms for 3D reconstruction from photographs)
Conclusions on OntoVis:
  To what extent is it scalable, even in 3D?
  Is it complementary or alternative to current approaches?
  There are indications of OntoVis scalability & feasibility that are worth further exploration