1
ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10
Prof. Petros Maragos
National Technical University of Athens, School of Electrical and Computer Engineering
URL: http://cvsp.cs.ntua.gr/projects/muscle, http://cvsp.cs.ntua.gr
2
MUSCLE ICCS - NTUA WP6 E-teams: 8-12-2005
ICCS-NTUA: E-team Researchers & Directions
Researchers:
● P. Maragos, S. Kollias (Faculty members)
● G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
● G. Stamou, I. Avrithis (Post-Doc)
(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
● Face Detection, Modeling & Tracking
● AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
● AV to Articulatory Speech Inversion
(WP6) E-team 2: Audio-Visual Understanding
● Audio-Visual Salient Event Detection, Integrated Multimedia Content Analysis
3
AV-ASR Front-End
[Block diagram; labels listed in extraction order]
Speech; Feature Transform./Selection; Modulations – Energy; Multiband Filtering; Nonlinear Processing; Demodulation; VAD; Dynamics - Fractals; Embedding; Geometrical Filtering; Fractal Dimensions; Speaker Normalization; M-Array Processing; Visual; Active Appearance Model; Face Detection/Tracking; Mouth R.O.I. Features; Fusion; Feature Stream; MFCC
4
Audiovisual ASR: Face Modeling
● A well-studied problem in Computer Vision
● Active Appearance Models, Morphable Models, Active Blobs
● Both Shape & Appearance can enhance lipreading
● The shape and appearance of human faces “live” in low-dimensional manifolds
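The low-dimensional manifold claim can be illustrated with a linear (PCA) model, the kind of statistical model that Active Appearance Models build for shape and appearance. The following is only a minimal sketch on synthetic vectors, not the slides' implementation: the function names, the 20-dimensional "shape" vectors, and the choice of two modes are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_model(samples, n_modes):
    """Fit a mean vector plus n_modes principal directions to row-vector samples."""
    mean = samples.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_modes]

def project(x, mean, modes):
    """Low-dimensional parameters of a sample x."""
    return modes @ (x - mean)

def reconstruct(params, mean, modes):
    """Map low-dimensional parameters back to the original space."""
    return mean + modes.T @ params

# Synthetic stand-in for landmark vectors: 100 samples of dimension 20
# that truly vary along only 2 hidden directions (hypothetical data).
rng = np.random.default_rng(0)
true_modes = rng.normal(size=(2, 20))
coeffs = rng.normal(size=(100, 2))
shapes = 5.0 + coeffs @ true_modes

mean, modes = fit_linear_model(shapes, n_modes=2)
p = project(shapes[0], mean, modes)
error = np.linalg.norm(reconstruct(p, mean, modes) - shapes[0])
print(p.shape, error)
```

Because the synthetic data vary along exactly two directions, two parameters suffice to reconstruct each 20-dimensional sample up to numerical precision; real face data would only be approximately low-dimensional.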
5
Image Fitting Example
[Figure panels: step 2, step 6, step 10, step 14, step 18]
6
Example: Face Interpretation Using AAM
[Video panels: original video; shape track superimposed on original video; reconstructed face]
This is what the visual-only speech recognizer “sees”!
Generative models like AAM allow us to evaluate the output of the visual front-end.
7
Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
● Generative models ‘compete’ for image observations
● Segmentation translates into the assignment of image observations to one of K models (image labelling)
● Segmentation labels are treated as hidden data
● EM algorithm:
  E-step: use current parameter estimates to assign micro-segments to objects
  M-step: use assignment probabilities to derive optimal model parameters
● Active Appearance Models are used as generative models for the object categories of cars and faces
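The E-step/M-step alternation above can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the method on the slide: each "generative model" is a 1-D Gaussian over a scalar observation (a stand-in for an AAM), K = 2, and the observations are synthetic intensities. All names and initial values are hypothetical.

```python
import numpy as np

def e_step(obs, means, sigmas, priors):
    """Posterior probability that each observation belongs to each model."""
    diff = obs[:, None] - means[None, :]
    lik = priors * np.exp(-0.5 * (diff / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    return lik / lik.sum(axis=1, keepdims=True)

def m_step(obs, resp):
    """Re-estimate model parameters from the soft assignments."""
    weight = resp.sum(axis=0)
    means = (resp * obs[:, None]).sum(axis=0) / weight
    var = (resp * (obs[:, None] - means) ** 2).sum(axis=0) / weight
    # Small constant keeps sigma away from zero (an illustrative safeguard).
    return means, np.sqrt(var) + 1e-6, weight / len(obs)

def fit(obs, means, sigmas, priors, n_iter=50):
    obs = np.asarray(obs, dtype=float)
    means, sigmas, priors = (np.asarray(a, dtype=float) for a in (means, sigmas, priors))
    for _ in range(n_iter):
        resp = e_step(obs, means, sigmas, priors)   # labels are the hidden data
        means, sigmas, priors = m_step(obs, resp)
    return e_step(obs, means, sigmas, priors), means, sigmas, priors

# Synthetic observations drawn from two sources (hypothetical data).
rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0.2, 0.05, 200), rng.normal(0.8, 0.05, 200)])
resp, means, sigmas, priors = fit(obs, [0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
labels = resp.argmax(axis=1)  # hard labelling derived from the soft assignments
print(means, priors)
```

The two models compete for each observation through the normalization in `e_step`; replacing the Gaussian likelihood with the reconstruction likelihood of a richer model would keep the same loop structure.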
8
Top-Down Segmentation Results
● Thresholding the E-step output gives a hard figure-ground segmentation
● No ‘shape-prior’ knowledge is necessary for the segmentation: the generative model contains information about shape variation
● Combination of bottom-up & top-down detection
● At false-alarm locations the object model manages to reconstruct the image appearance only by chance, and therefore typically gets a small image support for the object.
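A minimal sketch of the thresholding step and of using image support to reject false alarms, assuming the E-step yields a per-pixel posterior for the object model. The threshold of 0.5, the `min_support` value, and the 3x3 posterior map are hypothetical choices for illustration only.

```python
import numpy as np

def figure_ground_mask(object_posterior, threshold=0.5):
    """Hard segmentation: True where the object model wins the observation."""
    return np.asarray(object_posterior) > threshold

def support_fraction(mask):
    """Fraction of the examined window claimed by the object model."""
    return float(mask.mean())

# Hypothetical posterior map for one candidate detection window.
posterior = np.array([[0.10, 0.20, 0.10],
                      [0.30, 0.90, 0.80],
                      [0.20, 0.95, 0.70]])
mask = figure_ground_mask(posterior)
support = support_fraction(mask)

# Assumed acceptance rule: a detection with small support is treated as a false alarm.
min_support = 0.3
accepted = support >= min_support
print(mask.sum(), support, accepted)
```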
9
Spatio-Temporal Visual Attention I: Video Analysis
● Create video volume
● Feature extraction from spatiotemporal data
● Fusion & saliency generation
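The three stages can be sketched as follows. The slide does not say which features or which fusion rule are used, so the two features here (global intensity contrast and absolute frame difference) and the fusion by summing normalized maps are assumptions chosen only to make the pipeline concrete.

```python
import numpy as np

def normalize(v):
    """Rescale an array to [0, 1]; a constant array maps to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def spatiotemporal_saliency(frames):
    # 1. Create the video volume with shape (T, H, W).
    volume = np.stack(frames).astype(float)
    # 2. Feature extraction (assumed features, not taken from the slides).
    intensity = np.abs(volume - volume.mean())                     # contrast to global mean
    motion = np.abs(np.diff(volume, axis=0, prepend=volume[:1]))   # frame differences
    # 3. Fusion and saliency generation: sum of normalized feature maps.
    return normalize(normalize(intensity) + normalize(motion))

# Toy video: a single bright pixel moving right across a dark 5x5 background.
frames = []
for t in range(4):
    frame = np.zeros((5, 5))
    frame[2, t] = 1.0
    frames.append(frame)

saliency = spatiotemporal_saliency(frames)
peak = np.unravel_index(saliency[1].argmax(), saliency[1].shape)
print(saliency.shape, peak)
```

In this toy video the most salient voxel of the second frame is the moving pixel's current position, since it scores highly on both assumed features.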
10
Spatio-Temporal Visual Attention II: Classification & Segmentation
● Use spatiotemporal VA for efficient global classification of videos
  Claim: features extracted only from low or high saliency regions are more representative of the input video
● Foreground/Background segmentation
  Claim: most salient regions are related to foreground areas of the video
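Both claims amount to selecting voxels by saliency before further processing. A minimal sketch, assuming a saliency volume is already available: the quantile-based cutoff, the `top_fraction` value, and the intensity-histogram descriptor are hypothetical choices, and the random arrays merely stand in for real video data.

```python
import numpy as np

def salient_mask(saliency, top_fraction=0.2):
    """Mark the most salient voxels as candidate foreground."""
    cutoff = np.quantile(saliency, 1.0 - top_fraction)
    return saliency >= cutoff

def region_descriptor(volume, mask, bins=8):
    """Normalized intensity histogram computed only over the selected voxels."""
    hist, _ = np.histogram(volume[mask], bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

# Placeholder data standing in for a video volume and its saliency (T, H, W).
rng = np.random.default_rng(0)
volume = rng.random((10, 10, 10))
saliency = rng.random((10, 10, 10))

mask = salient_mask(saliency)                   # foreground/background split
descriptor = region_descriptor(volume, mask)    # features from high-saliency regions only
print(mask.mean(), descriptor)
```

Passing `~mask` instead of `mask` to `region_descriptor` would give the corresponding descriptor for low-saliency regions; the descriptor could then feed any standard classifier.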