Published by Kelley Campbell. Modified over 8 years ago.
1
eNTERFACE 08, Project 2: “multimodal high-level data integration”
Mid-term presentation, August 19th, 2008
2
Team
- Olga Vybornova (Université catholique de Louvain, UCL-TELE, Belgium)
- Hildeberto Mendonça (Université catholique de Louvain, UCL-TELE, Belgium)
- Ao Shen (University of Birmingham, UK)
- Daniel Neiberg (TMH/CTT, KTH Royal Institute of Technology, Sweden)
- David Antonio Gomez Jauregui (TELECOM and Management SudParis, France)
3
Project objectives
- To augment and improve the previous work and to look for new methods of data fusion.
- To address the problem of deciding which data from different modalities should be fused and which should not be fused but analyzed separately, and to implement a technique that makes this distinction.
- To explore and employ a context-aware cognitive architecture for decision-making purposes.
4
Background - Multimodality
Multimodality: a set of variables describing states of the world (user’s input, an object, an event, behavior, etc.) represented in different media and through different information channels.
Goal of data fusion: the result of the fusion (merging semantic content from multiple streams) should give an efficient joint interpretation of the multimodal behavior of the user(s), in order to provide effective and advanced interaction.
5
[Diagram: processing pipeline]
- Components: Audio Stream, Video Stream, Speech Recognizer, Syntactic Analyzer, Semantic Analyzer, Video Analyzer, Human Behavior Analyzer, Knowledge Base, Fusion Mechanism
- Data labels: Sound Waves, Recognized String, Syntactic Triple, Linguistic Meanings, Sequence of Images, Movements Coordinates, Movements Meanings
- Output labels: Advise, People
7
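[Diagram: processing pipeline with the tools used]
- Tools: Sphinx-4 (speech recognition), C&C Tools parser (syntax analysis), C&C Tools Boxer (semantic analysis), OpenCV (video analysis), Protégé and Jena (knowledge base, semantic validation)
- Other components: Audio Stream, Video Stream, Human Behavior Analyzer, Fusion Mechanism
- Data labels: Sound Waves, Recognized String, Linguistic Meanings, Sequence of Images, Movements Coordinates, Movements Meanings
- Output labels: Advise, People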
8
Integration
- All tools are integrated through socket communication, so the C++ and Java components interoperate normally
- The data interchange format is XML:
  - verifiable
  - easy data identification
  - easy data compatibility
  - low cost of manipulation
  - XML is processed on demand
- Main issues: transparency, extensibility and customization
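The socket-plus-XML exchange described on this slide can be sketched as follows. This is only an illustration in Python, not the project's C++/Java code: the element and attribute names (`message`, `modality`, `timestamp`, `payload`) and the helper functions are hypothetical, and a local socket pair stands in for the real connection between tools.

```python
import socket
import xml.etree.ElementTree as ET


def build_message(modality, timestamp_ms, payload):
    """Serialize one recognizer result as a small XML document (bytes).

    The message layout is an assumption made for this sketch.
    """
    root = ET.Element("message", modality=modality, timestamp=str(timestamp_ms))
    ET.SubElement(root, "payload").text = payload
    return ET.tostring(root, encoding="utf-8")


def parse_message(data):
    """Parse the XML bytes back into a plain dictionary."""
    root = ET.fromstring(data)
    return {
        "modality": root.get("modality"),
        "timestamp": int(root.get("timestamp")),
        "payload": root.findtext("payload"),
    }


def send_and_receive(message):
    """Send one message over a connected socket pair and read it back."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(message)
        sender.shutdown(socket.SHUT_WR)  # signal the end of the message
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


received = parse_message(send_and_receive(
    build_message("speech", 1200, "yesterday i received an email from nick")))
print(received)
```

Because both sides agree only on the XML layout, the sender and receiver could be written in different languages, which is the point the slide makes about C++ and Java.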
9
Speech Recognition
- Sphinx-4, integrated into the system
- Fine-tuned for the maximum length of the n-best lists
- Two language models created:
  - Scenario-dependent: 3-grams, 150 words; 86.9% accuracy; speed 0.94 x real time
  - Wall Street Journal + scenarios: 3-grams, 5000 words; 68.6% accuracy; speed 3.19 x real time
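The slide does not say how accuracy and speed were measured. A common convention, assumed here, is word accuracy computed as one minus the word error rate, and a real-time factor computed as processing time divided by audio duration. The function names below are hypothetical, and the numbers in the usage lines are made up for illustration, not taken from the project.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the hypothesis into the reference (edit distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy as 1 - WER (assumed definition, see lead-in)."""
    return 1.0 - word_errors(reference, hypothesis) / len(reference.split())


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


reference = "yesterday i received an email from nick"
hypothesis = "yesterday i received an email from nick to"
print(word_accuracy(reference, hypothesis))
print(real_time_factor(9.4, 10.0))
```

Under this reading, the 150-word model (0.94 x real time) keeps up with live speech, while the 5000-word model (3.19 x real time) does not.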
10
Speaker Identification
- Standard GMM-based speaker identification system
- Developed in Matlab
- Results on a 2-person development set, as a function of the number of Gaussians, are shown in the plot on the right of the slide
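The scoring step of a GMM-based speaker identifier can be sketched as below: each speaker has a Gaussian mixture model, and the speaker whose model gives the highest total log-likelihood for the observed features is chosen. This is a Python illustration, not the project's Matlab system. It assumes one-dimensional features and hand-set mixture parameters (no training step), and all names and values are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def gmm_log_likelihood(x, components):
    """log p(x) under a 1-D mixture; components are (weight, mean, variance)."""
    logs = [math.log(w) + gaussian_log_pdf(x, m, v) for w, m, v in components]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def identify_speaker(features, speaker_models):
    """Return the best-scoring speaker and the per-speaker log-likelihoods."""
    scores = {name: sum(gmm_log_likelihood(x, gmm) for x in features)
              for name, gmm in speaker_models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-speaker models with two Gaussians each.
MODELS = {
    "speaker_a": [(0.6, 110.0, 100.0), (0.4, 130.0, 150.0)],
    "speaker_b": [(0.5, 200.0, 120.0), (0.5, 225.0, 200.0)],
}

best, scores = identify_speaker([118.0, 125.0, 109.0, 131.0], MODELS)
print(best)
```

The number of components per speaker is the quantity varied in the results the slide refers to; here it is fixed at two purely for brevity.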
11
Speech Recognition Output
yesterday i received an email from nick
yesterday i received an email from nick
yesterday i received an email from nick to
yesterday i received an email from nick for
12
Syntax and Semantics
15
Image Processing
- OpenCV library (open source)
- Motion history to calculate the motion direction
- Template matching to identify objects in the scene
- Gaussian probability distribution to model the color of clothes
- Background subtraction to detect the foreground
- Blob identification to track people in the scene
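Two of the steps listed above, background subtraction and blob identification, can be illustrated on a tiny grayscale frame. The project used OpenCV for this. The sketch below is plain Python that only shows the idea and does not reproduce OpenCV calls. The threshold value, the frame data and the function names are assumptions.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose gray level differs from the background by more than threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def find_blobs(mask):
    """Group 4-connected foreground pixels.

    Returns one bounding box (top, left, bottom, right) per blob,
    in the order the blobs are met when scanning row by row.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for row in range(height):
        for col in range(width):
            if not mask[row][col] or seen[row][col]:
                continue
            stack = [(row, col)]
            seen[row][col] = True
            top, left, bottom, right = row, col, row, col
            while stack:
                r, c = stack.pop()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x6 scene: a static background and a frame with two objects.
BACKGROUND = [[10] * 6 for _ in range(4)]
FRAME = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 10, 180],
    [10, 10, 10, 10, 10, 180],
]

blobs = find_blobs(foreground_mask(BACKGROUND, FRAME))
print(blobs)  # [(1, 1, 2, 2), (2, 5, 3, 5)]
```

Tracking people would then amount to associating these bounding boxes across consecutive frames, which is not shown here.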
17
Image Processing Output
18
Ontology
- Restricted-domain ontology: structure and its instantiation
- Pattern situations (semantic frames)
- User profile: information about users collected a priori (preferences, social relationships, etc.) plus dynamically obtained data
- Protégé is used to create and edit the ontology
- Jena is used to manage the ontology data
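As a rough illustration of how ontology data of this kind can be stored and queried, the sketch below keeps facts as subject-predicate-object triples and matches them against patterns. It is a minimal in-memory stand-in written in Python, not Jena's API and not the project's ontology: the class, the predicates and the facts are all hypothetical.

```python
class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(t for t in self._triples
                      if all(p is None or p == v for p, v in zip(pattern, t)))


# Hypothetical facts: part structure, part user profile.
store = TripleStore()
store.add("Nick", "type", "Person")
store.add("Nick", "colleagueOf", "User")
store.add("Email", "type", "CommunicationEvent")
store.add("User", "prefers", "Email")


def is_known_person(name):
    """Check a recognized name against the stored facts."""
    return bool(store.match(name, "type", "Person"))


print(is_known_person("Nick"))
print(store.match(predicate="type"))
```

A check like `is_known_person` is one simple way recognized input could be validated against the knowledge base before fusion.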
20
Project schedule
Overall progress: 65%
- WP1: Workshop preparation: Done
- WP2: Integration of multimodal components: Done
- WP3: Multimodal fusion implementation: Running
- WP4: Scenario implementation and reporting: To do
Strategic changes to achieve the goal:
- Everybody focuses on the fusion mechanism
- Lower priority for improving the individual modalities
- Each risky task has an associated plan B that is less time-consuming but also less robust
21
Next Steps
- Integration of WordNet into the ontology
- Rules to process human behavior
- Mapping the semantic analysis onto the ontology
- Fusion mechanism