1
D1 - 12/12/2000 This document contains information that is the property of France Télécom. Acceptance of this document by its recipient implies acknowledgement of the confidential nature of its contents and an undertaking not to reproduce it, transmit it to third parties, disclose it or make any commercial use of it without the prior written consent of France Télécom R&D.
France Telecom's expectations and research in Object Recognition
Henri Sanson, Christophe Laurent, Olivier Bernier
2
Outline
- France Telecom markets and context evolution
- Visual content indexing: from low-level to semantic description
  - Overview of current research in image retrieval and video annotation
- Object recognition for Human-Computer Interfaces
3
France Telecom Markets and Applications
- France Telecom is a global telecommunications operator, from telephony to multimedia and audiovisual services:
  - Fixed networks/services
  - Mobile networks/services
  - Internet access and services
  - IP data and communication services for corporations
- Two structuring trends:
  - An increasing importance and presence of visual content in services:
    - leveraging the higher bandwidth
    - high-value-added services (people pay for content and for the means to reach it easily)
  - A need to compensate for increasing technological and functional complexity by providing natural human-machine/service interfaces:
    - vocal interfaces
    - visual interfaces
4
Visual content indexing: from low-level to semantic description (1)
- Context: more visual content, huge data volumes, temporal constraints for video
  - Need for efficient indexing methods enabling fast and relevant access
- Applications:
  - Media asset management: in addition to traditional audiovisual companies (TV, production), more and more enterprises own image or video assets and face management issues.
    - Relevance is of prime importance,
    - but cost-effectiveness is becoming more and more critical.
  - Web search and filtering engines:
    - Huge and highly variable volumes: robust automatic indexing is the only solution.
    - Although surrounding text may be used, the visual content itself is the only reliable source.
  - Video surveillance:
    - Specific environment and content type
    - Automatic processing
5
Visual content indexing: from low-level to semantic description (2)
- Traditionally, two radically opposite approaches:
  - Accurate but manual semantic annotation (ontologies):
    - time-consuming
    - the indexing choices limit the possible queries
  - Low-level feature descriptions, aka "Color, Texture and Shape" (MPEG-7 Visual framework):
    - automatic processing, but very limited in practice (except for some classification purposes): relies on query-by-example, which is of limited usability
- Emerging trend: convergence of both paradigms
  - Automated, knowledge-based semantic indexing using visual recognition
  - Many advantages:
    - semantic and automatic
    - no language barrier
    - the index can always be supplemented later
  - But still difficult!
6
Visual content indexing: from low-level to semantic description (3)
- Application constraints affecting recognition:
  - High variability of shooting conditions: the same objects can look very different (color, pose, scale, shadows)
  - High variability in content type (indoor, outdoor, news, movies, sports)
  - Potentially huge number of objects or object categories to recognize concurrently
  - Real-time operation targeted for video, and much faster processing for still images
  - A recognition approach that is as flexible as possible is expected for generic objects
- Ultimately, methods must be qualified through real experiments "in the field" with actual end users, and are measured by their satisfaction rate.
7
Current work (1)
- Research:
  - Color space invariance with respect to shooting conditions
  - Salient-feature-based image retrieval and object description
  - Face detection and recognition in generic images
- Development: video indexing platform
  - Video: shot change detection, specific image labelling (news anchor, weather report / commercial jingle), face detection, text detection
  - Audio/speech: speech/music/other segmentation, keyword recognition, free-vocabulary phonetic search
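Shot change detection is listed without details of the method used. A common baseline, shown here only as a hedged sketch, flags a hard cut whenever the histogram distance between consecutive frames exceeds a threshold. The grey-level histogram, the 64 bins, the L1 distance and the 0.5 threshold are all assumptions rather than the platform's settings; frames are assumed to be H x W x 3 uint8 NumPy arrays.

```python
import numpy as np


def grey_histogram(frame: np.ndarray, bins: int = 64) -> np.ndarray:
    """Normalized grey-level histogram of one H x W x 3 uint8 frame."""
    # Average the three channels to get one intensity value per pixel.
    grey = frame.astype(np.float64).mean(axis=2)
    hist, _ = np.histogram(grey, bins=bins, range=(0.0, 256.0))
    return hist / hist.sum()


def detect_cuts(frames, threshold: float = 0.5, bins: int = 64) -> list:
    """Return indices i such that a hard cut is detected between frames i-1 and i."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        hist = grey_histogram(frame, bins)
        # The L1 distance between two normalized histograms lies in [0, 2].
        if prev is not None and np.abs(hist - prev).sum() > threshold:
            cuts.append(i)
        prev = hist
    return cuts


# Usage: three dark frames followed by three bright frames give one cut at index 3.
dark = [np.full((8, 8, 3), 20, dtype=np.uint8)] * 3
bright = [np.full((8, 8, 3), 230, dtype=np.uint8)] * 3
print(detect_cuts(dark + bright))  # [3]
```

Histogram differences are cheap and tolerate small object motion within a shot; gradual transitions such as fades and dissolves would need a windowed or twin-threshold variant.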
8
Saliency-based Color Image Indexing
- The image signature is extracted from a limited number of perceptually important pixels called salient points.
- Salient points are computed by combining a discrete wavelet transform with a zerotree representation of the wavelet coefficients.
  - Salient points are mostly located on sharp boundaries.
- The image signature is composed of a color correlogram computed in the neighborhood of each salient point.
  - This signature can be complemented with a texture signature computed around the salient points.
- An invariant color space (c1c2c3) is used for robustness to imaging conditions.
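The c1c2c3 space is usually defined (Gevers and Smeulders) as c1 = arctan(R / max(G, B)), c2 = arctan(G / max(R, B)), c3 = arctan(B / max(R, G)), which does not change when all three channels are scaled by the same factor, e.g. under a change of illumination intensity. The sketch below converts an image to c1c2c3, quantizes it, and accumulates a color autocorrelogram over square windows centred on given salient points: for each color c and distance d, the probability that a pixel at L-infinity distance d from a pixel of color c (inside the same window) also has color c. The 4 quantization levels per channel, the window radius, the distance set and the pooling of all windows into a single signature are illustrative assumptions; the salient-point coordinates are taken as input.

```python
import numpy as np


def to_c1c2c3(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to the c1c2c3 color space (each channel in [0, pi/2])."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # arctan2(y, x) equals arctan(y / x) for x > 0 and returns 0 for black pixels (0, 0).
    c1 = np.arctan2(r, np.maximum(g, b))
    c2 = np.arctan2(g, np.maximum(r, b))
    c3 = np.arctan2(b, np.maximum(r, g))
    return np.stack([c1, c2, c3], axis=-1)


def quantize(c: np.ndarray, levels: int = 4) -> np.ndarray:
    """Map each c1c2c3 pixel to a single color index in [0, levels**3)."""
    q = np.minimum((c / (np.pi / 2) * levels).astype(int), levels - 1)
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]


def _ring_counts(win: np.ndarray, d: int, n_colors: int):
    """Per color: number of pixel pairs at L-infinity distance d in `win`, and how many share the color."""
    same = np.zeros(n_colors)
    total = np.zeros(n_colors)
    h, w = win.shape
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            if max(abs(dy), abs(dx)) != d:
                continue  # keep only offsets lying on the ring of radius d
            y0, y1 = max(0, -dy), min(h, h - dy)
            x0, x1 = max(0, -dx), min(w, w - dx)
            if y0 >= y1 or x0 >= x1:
                continue
            a = win[y0:y1, x0:x1]                        # reference pixels
            b = win[y0 + dy:y1 + dy, x0 + dx:x1 + dx]    # neighbours shifted by (dy, dx)
            total += np.bincount(a.ravel(), minlength=n_colors)
            same += np.bincount(a[a == b], minlength=n_colors)
    return same, total


def salient_autocorrelogram(rgb, points, radius=4, distances=(1, 3), levels=4):
    """Autocorrelogram (len(distances) x levels**3) pooled over windows around salient points."""
    labels = quantize(to_c1c2c3(rgb), levels)
    n_colors = levels ** 3
    h, w = labels.shape
    same = np.zeros((len(distances), n_colors))
    total = np.zeros((len(distances), n_colors))
    for r, c in points:
        win = labels[max(0, r - radius):min(h, r + radius + 1),
                     max(0, c - radius):min(w, c + radius + 1)]
        for k, d in enumerate(distances):
            s, t = _ring_counts(win, d, n_colors)
            same[k] += s
            total[k] += t
    # Colors never observed in any window get 0 instead of a division by zero.
    return np.divide(same, total, out=np.zeros_like(same), where=total > 0)
```

A uniformly colored image yields 1.0 for its single quantized color at every distance and 0 elsewhere; two images that differ only by a global intensity scale map to the same c1c2c3 values and therefore to the same signature.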
9
Salient Points Extraction
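The deck gives no details on salient point extraction beyond the wavelet/zerotree summary. A well-known wavelet-based salient point detector (Loupias, Sebe et al.) exploits the same parent-child coefficient tree that zerotree coders use: starting from each coefficient at the coarsest scale, it repeatedly follows the child with the largest magnitude down to the finest scale and scores the endpoint by the sum of the magnitudes along the path. The sketch below assumes a Haar wavelet, a grey-level input whose sides are divisible by 2**levels, |LH| + |HL| + |HH| as the per-position magnitude, and the top-left pixel of the final 2 x 2 support as the point location; it is not necessarily the exact France Télécom method.

```python
import numpy as np


def haar_detail_pyramid(img: np.ndarray, levels: int) -> list:
    """Per-level detail magnitudes |LH| + |HL| + |HH| of a Haar DWT, finest level first."""
    ll = img.astype(np.float64)
    details = []
    for _ in range(levels):
        a, b = ll[0::2, 0::2], ll[0::2, 1::2]
        c, d = ll[1::2, 0::2], ll[1::2, 1::2]
        lh = (a - b + c - d) / 2.0  # horizontal differences (vertical edges)
        hl = (a + b - c - d) / 2.0  # vertical differences (horizontal edges)
        hh = (a - b - c + d) / 2.0  # diagonal differences
        details.append(np.abs(lh) + np.abs(hl) + np.abs(hh))
        ll = (a + b + c + d) / 2.0  # approximation passed to the next, coarser level
    return details


def salient_points(gray: np.ndarray, levels: int = 3, n_points: int = 10) -> list:
    """Return the n_points (row, col) positions with the highest path saliency."""
    h, w = gray.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("image sides must be divisible by 2**levels")
    details = haar_detail_pyramid(gray, levels)
    coarse = details[-1]
    candidates = []
    for i in range(coarse.shape[0]):
        for j in range(coarse.shape[1]):
            saliency = coarse[i, j]
            y, x = i, j
            # Descend the parent-child tree, always following the strongest of the 4 children.
            for lvl in range(levels - 2, -1, -1):
                block = details[lvl][2 * y:2 * y + 2, 2 * x:2 * x + 2]
                dy, dx = np.unravel_index(np.argmax(block), block.shape)
                y, x = 2 * y + int(dy), 2 * x + int(dx)
                saliency += block[dy, dx]
            # A finest-level coefficient covers a 2 x 2 pixel block; keep its top-left pixel.
            candidates.append((float(saliency), 2 * y, 2 * x))
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in candidates[:n_points]]
```

Because every coarse coefficient contributes exactly one candidate, the points spread across the image instead of clustering on a single strong edge, which suits a signature computed from a limited number of points.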
10
Experimental Results
- Database containing ~2000 TV images
- Extraction of 18 difficult queries
- Computation of a ranking metric
- Comparison with the MPEG-7 SCD (Scalable Color Descriptor)
  - Outperforms SCD in 90% of cases!
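The slide does not name the ranking metric. ANMRR (Average Normalized Modified Retrieval Rank) was developed within MPEG-7 for evaluating its color descriptors, including the SCD; it ranges from 0 (all relevant images ranked first) to 1 (no relevant image within the first K results), so lower is better. A minimal sketch follows, assuming each query comes with a ranked result list and a set of relevant image ids; that this is the metric used in the experiment above is an assumption.

```python
def anmrr(results: dict, ground_truth: dict) -> float:
    """ANMRR over all queries.

    results:      query id -> ranked list of retrieved image ids (best first)
    ground_truth: query id -> set of relevant image ids
    """
    gtm = max(len(g) for g in ground_truth.values())  # largest ground-truth set size
    nmrrs = []
    for q, relevant in ground_truth.items():
        ng = len(relevant)
        k = min(4 * ng, 2 * gtm)  # only the first K results count as retrieved
        ranks = {item: pos + 1 for pos, item in enumerate(results[q])}
        total = 0.0
        for item in relevant:
            r = ranks.get(item)
            # Relevant items missing from the top K get a penalty rank of 1.25 * K.
            total += r if r is not None and r <= k else 1.25 * k
        avr = total / ng
        mrr = avr - 0.5 - ng / 2  # 0 when the relevant items occupy ranks 1..NG
        nmrrs.append(mrr / (1.25 * k - 0.5 - ng / 2))  # normalized to [0, 1]
    return sum(nmrrs) / len(nmrrs)


gt = {"q1": {"a", "b"}}
print(anmrr({"q1": ["a", "b", "c", "d"]}, gt))  # 0.0 (perfect ranking)
```

"Outperforms in 90% of cases" would then correspond to the per-query NMRR being lower than the SCD's for 90% of the queries, but the slide does not state how the comparison was counted.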
11
Object recognition for Human-Computer Interfaces (1)
- Context:
  - Service functionalities are becoming ever more sophisticated.
  - End users expect simpler user interfaces.
- Visual interaction appears to be a good complement to the more usual vocal interfaces:
  - Universality: much less constrained by linguistic variability
  - Webcams are widespread
  - Possibly less sensitive to environmental noise and capture conditions
  - Allows fast interaction: a picture is worth a thousand words
12
Visual recognition for Human-Computer Interfaces
- Face detection and tracking
  - Neural-network-based face detection for still images
  - Extension to real-time face detection in video streams
  - Real-time face tracking for HCI using statistical models (EM, particle filtering)
- Gesture recognition
  - Static hand posture recognition based on neural networks
  - Dynamic gesture recognition (HMMs, IOHMMs and GIOHMMs)
  - Body tracking in 2D
  - Body tracking in 3D using disparity cameras (Triclops)
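Dynamic gesture recognition with HMMs typically trains one model per gesture and assigns a new observation sequence to the model with the highest likelihood, computed with the forward algorithm. The sketch below shows only that decision rule for plain discrete-output HMMs (the IOHMM and GIOHMM variants are not covered); the two gestures, the symbol alphabet (0 = leftward, 1 = rightward hand motion) and all probabilities are hypothetical.

```python
import numpy as np


def log_likelihood(obs, pi, A, B) -> float:
    """Scaled forward algorithm: log P(obs | HMM) for a non-empty discrete symbol sequence.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:  (N, M) emissions,   B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]  # alpha_1(i) = pi_i * b_i(o_1)
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) * A[i, j] * b_j(o_t)
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_p += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return float(log_p)


def classify(obs, models: dict) -> str:
    """Return the name of the gesture model that best explains the sequence."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over symbols 0 = leftward, 1 = rightward motion.
_PI = np.array([1.0, 0.0])
_A = np.array([[0.7, 0.3], [0.0, 1.0]])
MODELS = {
    "swipe_left": (_PI, _A, np.array([[0.9, 0.1], [0.8, 0.2]])),
    "swipe_right": (_PI, _A, np.array([[0.1, 0.9], [0.2, 0.8]])),
}
print(classify([1, 1, 0, 1], MODELS))  # swipe_right
```

Normalizing alpha at every step and summing the log scale factors gives the exact log-likelihood while avoiding the underflow that unscaled forward probabilities reach on long gesture sequences.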