
1 Image-Language Association: are we looking at the right features? Katerina Pastra Language Technology Applications, Institute for Language and Speech Processing, Athens, Greece

2 The pervasive digital video context
- File-swapping networks (P2P) (video files & video blogs)
- IPTV, iTV
- Video search engines
- Conversational robots, MM presentation systems...
Access to MM content; generation of MM content
Auto-analysis of image-language relations: complementarity, independence, equivalence

3 Overview
Focus on the semantic equivalence relation = Multimedia Integration = image-language association
- Brief review of state-of-the-art association mechanisms and the feature sets used
- The OntoVis feature set suggestion
- Using OntoVis in the VLEMA prototype
- Prospects for going from 3D to 2D
- Future plans and conclusions

4 Association Mechanisms in prototypes
Intelligent MM systems, from SHRDLU to the conversational robots of the new millennium (Pastra and Wilks 2004):
- Simulated or manually abstracted visual input is used, to avoid difficulties in image analysis
- Integration resources are used with a priori known associations (e.g. image X on screen is a “ball”), or with associations allowing simple inferences (e.g. matching an input image to an object model in the resource, which is in turn linked to a “concept/word”), to avoid difficulties in associating V-L
- Applications are restricted to blocksworlds/miniworlds, which raises scaling issues

5 Association algorithms
To be embedded in prototypes:
- Probabilistic approaches for learning (e.g. Barnard et al. 2003)
  - use word/phrase + image/image region (f-v vectors)
  - require properly annotated corpora (IBM, Pascal etc.)
- Logic-based approaches (e.g. Dasiopoulou et al. 2004)
  - use feature-augmented ontologies
  - match low-level image features + leaf nodes
- Use of both approaches reported too (Srikanth et al. 2005)
Feature set used: shape, colour, texture, position, size
Scaling?

6 The quest for the appropriate f-set
Constraints in defining an f-set:
- Features must be distinctive of object classes (at the basic level)
- Feature values must be detectable by image analysis modules
Cognitive thesis: no feature set is fully representative of the characteristics of an object, but one may be more or less successful in fixing the reference of the corresponding concept (word)

7 The OntoVis suggestion
Feature set suggested:
- physical structure: the number of parts into which an object is expected to be decomposed in different dimensions
- visually verifiable functionality: visual characteristics an object may have which are related to its function
- interrelations: relative location of objects, relative size
A domain model: Ontology + KBase for static indoor scenes (sitting rooms in 3D, XI KR language)

8 The OntoVis suggestion
[Figure: x, y and z axes]

9 OntoVis – KB examples
props(sofa(X),[has_xclusters_moreThan(X,1)]).
props(sofa(X),[has_yclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_yclusters_equalLessThan(X,4)]).
props(sofa(X),[has_zclusters_equalMoreThan(X,2)]).
props(sofa(X),[has_zclusters_equalLessThan(X,3)]).
props(sofa(X),[on_floor(X,yes)]).
props(sofa(X),[has_surface(X,yes)]).
props(sofa(X),[size(X,XCLUSTERS)]).
props(chair(X),[has_xclusters(X,1)]).
props(chair(X),[has_yclusters_equalMoreThan(X,2)]).
props(chair(X),[has_yclusters_equalLessThan(X,4)]).
props(chair(X),[has_zclusters_equalMoreThan(X,2)]).
props(chair(X),[has_zclusters_equalLessThan(X,3)]).
props(chair(X),[on_floor(X,yes)]).
props(chair(X),[has_surface(X,yes)]).
props(chair(X),[size(X,XCLUSTER_YValue,TableYDIM_UpperConstraint)]).
armchairs? stools?

10 OntoVis – KB examples
props(table(X),[has_xclusters(X,1)]).
props(table(X),[has_yclusters(X,2)]).
props(table(X),[has_zclusters(X,1)]).
props(table(X),[on_floor(X,yes)]).
props(table(X),[has_surface(X,yes)]).
props(table(X),[size(X,YDIM,XDIM,Relative_to_Room_YXDIM)]).
props(heater(X),[has_xclusters(X,1)]).
props(heater(X),[has_yclusters(X,1)]).
props(heater(X),[has_zclusters(X,1)]).
props(heater(X),[on_wall(X,yes)]).
props(heater(X),[on_floor(X,no)]).
props(heater(X),[has_surface(X,yes)]).
props(heater(X),[size(X,XDIM,YDIM,Relative_to_Wall_YXDIM)]).
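To make the KB entries above concrete, here is a minimal Python sketch of how such properties could be checked against a segmented object. It is an illustration only, not the XI implementation used in OntoVis: the `Observed` record, the `KB` dictionary layout and the `candidates` function are hypothetical names, and the `size(...)` properties are left out because the slides do not say how they are evaluated.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical record of what a segmentation step reports for one object."""
    x_clusters: int
    y_clusters: int
    z_clusters: int
    on_floor: bool
    on_wall: bool
    has_surface: bool


# Cluster constraints are inclusive (min, max) ranges; None means unbounded.
# Boolean constraints must match exactly. Size properties are omitted (assumption).
KB = {
    "sofa": {"x_clusters": (2, None), "y_clusters": (2, 4), "z_clusters": (2, 3),
             "on_floor": True, "has_surface": True},
    "chair": {"x_clusters": (1, 1), "y_clusters": (2, 4), "z_clusters": (2, 3),
              "on_floor": True, "has_surface": True},
    "table": {"x_clusters": (1, 1), "y_clusters": (2, 2), "z_clusters": (1, 1),
              "on_floor": True, "has_surface": True},
    "heater": {"x_clusters": (1, 1), "y_clusters": (1, 1), "z_clusters": (1, 1),
               "on_wall": True, "on_floor": False, "has_surface": True},
}


def satisfies(constraint, value):
    """Check one observed feature value against one KB constraint."""
    if isinstance(constraint, bool):
        return value is constraint
    low, high = constraint
    return (low is None or value >= low) and (high is None or value <= high)


def candidates(obs):
    """Return every object class whose listed properties all hold for obs."""
    return [
        name
        for name, props in KB.items()
        if all(satisfies(c, getattr(obs, feature)) for feature, c in props.items())
    ]


if __name__ == "__main__":
    three_seater = Observed(3, 3, 2, on_floor=True, on_wall=False, has_surface=True)
    print(candidates(three_seater))
```

In this sketch the sofa and the chair differ only in the number of x-clusters, which is also where the open question about armchairs and stools would show up.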

11 OntoVis F-set advantages
- It generalizes over visual appearance differences (e.g. different styles of sofas)
- It goes beyond viewpoint (view angle + distance) differences
- It can be used to reason on object id by analogy (e.g. to describe “sofa-like” objects if not certain)
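The "sofa-like" idea can be sketched as partial matching: when no class is satisfied completely, the class with the largest share of satisfied properties is reported with a "-like" suffix. This is one possible reading, not the mechanism described in the slides; the scoring rule, the 0.75 threshold and all names below are assumptions made for illustration.

```python
# Property tables mirroring the KB examples; size properties are omitted (assumption).
KB = {
    "sofa": {"x_clusters": (2, None), "y_clusters": (2, 4), "z_clusters": (2, 3),
             "on_floor": True, "has_surface": True},
    "chair": {"x_clusters": (1, 1), "y_clusters": (2, 4), "z_clusters": (2, 3),
              "on_floor": True, "has_surface": True},
    "table": {"x_clusters": (1, 1), "y_clusters": (2, 2), "z_clusters": (1, 1),
              "on_floor": True, "has_surface": True},
    "heater": {"x_clusters": (1, 1), "y_clusters": (1, 1), "z_clusters": (1, 1),
               "on_wall": True, "on_floor": False, "has_surface": True},
}


def satisfied(constraint, value):
    """Booleans must match exactly; ranges are inclusive, None means unbounded."""
    if isinstance(constraint, bool):
        return value is constraint
    low, high = constraint
    return (low is None or value >= low) and (high is None or value <= high)


def match_score(obs, props):
    """Fraction of a class's properties that the observation satisfies."""
    hits = sum(satisfied(c, obs[feature]) for feature, c in props.items())
    return hits / len(props)


def name_object(obs, threshold=0.75):
    """Name the object, falling back to '<class>-like object' on a partial match."""
    scores = {name: match_score(obs, props) for name, props in KB.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 1.0:
        return best
    if scores[best] >= threshold:  # hypothetical cut-off, not from the slides
        return f"{best}-like object"
    return "unknown object"


if __name__ == "__main__":
    odd_sofa = {"x_clusters": 3, "y_clusters": 5, "z_clusters": 2,
                "on_floor": True, "on_wall": False, "has_surface": True}
    print(name_object(odd_sofa))
```

Here the example object violates only the sofa's y-cluster range, so it is described by analogy instead of being rejected.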

12 Using OntoVis
VLEMA: A Vision-Language intEgration MechAnism
- Input: automatically reconstructed static scenes in 3D (VRML format) from RESOLV (robot-surveyor)
- Integration task: medium translation from images (3D sitting rooms) to text (what and where, in English)
- Domain: estates surveillance
- Horizontal prototype
- Implemented in shell programming and Prolog

13 The Input

14 System Architecture
[Diagram. Components: Data Transformations, Object Segmentation, Object Naming, Description. Resource: OntoVis + KB. Sample output: “…a heater … and a sofa with 3 seats…”]

15 The Output
Wed Jul 7 13:22:22 GMTDT 2004
VLEMA V1.0
Katerina Pastra@University of Sheffield
Description of the automatically constructed VRML file “development-scene.wrl”
This is a general view of a room. We can see the front wall, the left-side wall, the floor, A heater on the lower part of the front-wall and a sofa with 3 seats. The heater is shorter in length than the sofa. It is on the right of the sofa.
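The last two sentences of this output express interrelations (relative size and relative location). A minimal Python sketch of how such sentences could be produced from two named objects follows. It is not VLEMA's code, which the slides describe as shell and Prolog; the `SceneObject` fields and the convention that a larger `x_center` means "further right" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x_center: float  # hypothetical: position along the viewer's left-right axis
    length: float    # hypothetical: extent along that same axis


def compare_length(a, b):
    """Relative-size sentence comparing object a with object b."""
    if a.length < b.length:
        relation = "shorter in length than"
    elif a.length > b.length:
        relation = "longer in length than"
    else:
        relation = "the same length as"
    return f"The {a.name} is {relation} the {b.name}."


def relative_position(a, b):
    """Relative-location sentence; assumes a larger x_center is further right."""
    side = "right" if a.x_center > b.x_center else "left"
    return f"It is on the {side} of the {b.name}."


def describe_pair(a, b):
    """Join the size and location sentences for one pair of objects."""
    return " ".join([compare_length(a, b), relative_position(a, b)])


if __name__ == "__main__":
    heater = SceneObject("heater", x_center=3.0, length=0.8)
    sofa = SceneObject("sofa", x_center=1.0, length=2.1)
    print(describe_pair(heater, sofa))
```

The coordinate values in the usage example are invented; a real system would take them from the segmented VRML scene.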

16 Future Plans & Conclusions
- Extension of OntoVis and testing in VRML worlds
- Modular description of clusters/parts (not relying just on their number in each dimension)
- Exploration of portability of the f-set to 2D images. Initial signs of feasibility: cf. research on detecting spatial relations in 2D, structure identification in 2D, and algorithms for 3D reconstruction from photographs
OntoVis:
- To what extent is it scalable, even in 3D?
- Complementary or alternative to current approaches?
- Indications of OntoVis scalability & feasibility that are worth further exploration

