ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10
Prof. Petros Maragos
National Technical University of Athens
School of Electrical and Computer Engineering
URL:
MUSCLE ICCS - NTUA WP6 E-teams: ICCS-NTUA: E-team Researchers & Directions
Researchers:
● P. Maragos, S. Kollias (Faculty members)
● G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
● G. Stamou, I. Avrithis (Post-Doc)
(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
● Face Detection, Modeling & Tracking
● AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
● AV to Articulatory Speech Inversion
(WP6) E-team 2: Audio-Visual Understanding
● Audio-Visual Salient Event Detection, Integrated Multimedia Content Analysis
AV-ASR Front-End
[Block diagram of the AV-ASR front-end. Block labels, as recovered from the slide:
Speech: Feature Transform./Selection; Modulations – Energy (Multiband Filtering, Nonlinear Processing, Demodulation, VAD); Dynamics - Fractals (Embedding, Geometrical Filtering, Fractal Dimensions); Speaker Normalization; M-Array Processing; MFCC
Visual: Active Appearance Model; Face Detection/Tracking; Mouth R.O.I. Features
Output: Fusion; Feature Stream]
Audiovisual ASR: Face Modeling
● A well-studied problem in Computer Vision
● Active Appearance Models, Morphable Models, Active Blobs
● Both Shape & Appearance can enhance lipreading
● The shape and appearance of human faces “live” in low-dimensional manifolds
[Figure: two pictorial equations; the images are not recoverable]
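The low-dimensional manifold claim can be illustrated with a linear model of the kind commonly used for the shape and appearance parts of an AAM: a mean vector plus a few PCA modes. The following is a minimal sketch on synthetic data, not the authors' implementation; the function names, the 20-landmark layout, and the choice of two modes are assumptions made for illustration.

```python
import numpy as np

def fit_linear_model(samples, n_modes):
    """PCA: return the mean vector and the top n_modes orthonormal modes (as rows)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_modes]

def project(x, mean, modes):
    """Low-dimensional parameters of one sample."""
    return modes @ (x - mean)

def reconstruct(params, mean, modes):
    """Mean plus a weighted sum of modes."""
    return mean + params @ modes

# Synthetic "shapes": 20 landmarks (x, y flattened to 40 numbers) that
# really vary along only 2 hidden directions (hypothetical data).
rng = np.random.default_rng(0)
true_modes = rng.normal(size=(2, 40))
coeffs = rng.normal(size=(100, 2))
shapes = 5.0 + coeffs @ true_modes

mean, modes = fit_linear_model(shapes, n_modes=2)
params = project(shapes[0], mean, modes)      # 2 numbers describe 40
approx = reconstruct(params, mean, modes)
```

Because the synthetic shapes vary along exactly two directions, two parameters reconstruct each 40-dimensional shape up to numerical precision. Real face data would need more modes and would leave a residual.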
Image Fitting Example
[Figure: fitting result at steps 2, 6, 10, 14, 18]
Example: Face Interpretation Using AAM
[Video panels: original video; shape track superimposed on original video; reconstructed face]
● This is what the visual-only speech recognizer “sees”!
● Generative models like AAM allow us to evaluate the output of the visual front-end
Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
● Generative models ‘compete’ for image observations
● Segmentation translates into the assignment of image observations to one of K models (image labelling)
● Segmentation labels are treated as hidden data
● EM algorithm:
  E-step: use current parameter estimates to assign micro-segments to objects
  M-step: use assignment probabilities to derive optimal model parameters
● Active Appearance Models are used as generative models for the object categories of cars and faces
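The E-step/M-step alternation described on this slide can be sketched as follows. This is only an illustration of the mechanism: each of the K competing models is replaced by an isotropic Gaussian over per-segment feature vectors, which stands in for the Active Appearance Models used in the actual work. The function name `em_assign` and the synthetic data are assumptions.

```python
import numpy as np

def em_assign(obs, init_means, n_iter=20):
    """Soft-assign N observations (N, D) to K isotropic Gaussian models.

    Stand-in for competing generative models; NOT an AAM implementation.
    Returns the assignment probabilities (N, K) and the fitted means (K, D).
    """
    means = np.asarray(init_means, dtype=float).copy()
    n, d = obs.shape
    k = means.shape[0]
    var = np.ones(k)
    prior = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: current parameters -> assignment probabilities.
        sq = ((obs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(prior) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: assignment probabilities -> updated model parameters.
        weight = resp.sum(axis=0) + 1e-12
        means = (resp.T @ obs) / weight[:, None]
        sq = ((obs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * weight) + 1e-6
        prior = weight / n
    return resp, means

# Two well-separated groups of hypothetical micro-segment features.
rng = np.random.default_rng(0)
obs = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
resp, means = em_assign(obs, init_means=[[1.0, 1.0], [4.0, 4.0]])
labels = resp.argmax(axis=1)   # hard labels, if needed
```

The hidden segmentation labels never appear explicitly; only their posterior probabilities `resp` are carried between iterations.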
Top-Down Segmentation Results
● Thresholding the E-step assignment probabilities yields a hard figure-ground segmentation
● No ‘shape-prior’ knowledge is necessary for the segmentation: the generative model contains information about shape variation
● Combination of bottom-up & top-down detection
● At false alarm locations the object model manages to reconstruct the image appearance only by chance, and therefore typically gets a small image support for the object.
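A minimal sketch of the thresholding step, assuming the E-step output is available as a per-pixel map of object assignment probabilities. Using the size of the resulting support to reject false alarms is an assumption suggested by the last bullet, not a procedure stated on the slide; the threshold values and function names are hypothetical.

```python
import numpy as np

def hard_segmentation(object_posterior, threshold=0.5):
    """Figure-ground mask: True where the object model wins the pixel."""
    return object_posterior > threshold

def support_fraction(mask):
    """Fraction of the window that the object model explains."""
    return float(mask.mean())

# Hypothetical 4x4 posterior map with a confident 2x2 object region.
posterior = np.full((4, 4), 0.1)
posterior[1:3, 1:3] = 0.9

mask = hard_segmentation(posterior)
support = support_fraction(mask)
accept = support >= 0.2   # assumed minimum support for a true detection
```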
Spatio-Temporal Visual Attention I: Video Analysis
● Create video volume
● Feature extraction from spatiotemporal data
● Fusion & saliency generation
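The three stages above can be sketched end to end on a toy clip. The features (raw intensity and absolute temporal difference), the crude global-mean contrast used as conspicuity, and the fusion by averaging are all simplifying assumptions chosen for brevity; the slide does not specify which features or fusion rule are used.

```python
import numpy as np

def video_volume(frames):
    """Stack 2-D frames into a (T, H, W) volume."""
    return np.stack(frames, axis=0).astype(float)

def normalize(x):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def saliency(volume):
    """Fuse simple spatiotemporal feature maps into one saliency volume."""
    intensity = volume
    # Absolute frame difference as a rough motion feature (first frame: 0).
    motion = np.abs(np.diff(volume, axis=0, prepend=volume[:1]))
    # Conspicuity: how much each voxel deviates from the global mean.
    maps = [np.abs(f - f.mean()) for f in (intensity, motion)]
    return np.mean([normalize(m) for m in maps], axis=0)

# Toy clip: a bright 2x2 square moving right across a dark 8x8 frame.
frames = []
for t in range(5):
    frame = np.zeros((8, 8))
    frame[2:4, t:t + 2] = 1.0
    frames.append(frame)

volume = video_volume(frames)
sal = saliency(volume)
```

In this toy clip the leading edge of the moving square is both bright and changing, so it receives the highest saliency, while the static background stays at zero.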
Spatio-Temporal Visual Attention II: Classification & segmentation
● Use spatiotemporal VA for efficient global classification of videos
  Claim: features extracted only from low or high saliency regions are more representative of the input video
● Foreground/Background segmentation
  Claim: most salient regions are related to foreground areas of the video
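Both claims rest on selecting voxels by saliency, which can be sketched as below. The quantile threshold, the intensity histogram used as a global descriptor, and the reuse of the pixel values as a placeholder saliency map are assumptions for illustration only; the slide does not say which threshold, features, or classifier are used.

```python
import numpy as np

def foreground_mask(saliency, quantile=0.9):
    """Mark the most salient voxels as foreground (assumed quantile rule)."""
    return saliency >= np.quantile(saliency, quantile)

def region_histogram(volume, mask, bins=8):
    """Normalized intensity histogram over the selected voxels only."""
    hist, _ = np.histogram(volume[mask], bins=bins, range=(0.0, 1.0))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

# Hypothetical clip with values in [0, 1]; the saliency map is a placeholder.
rng = np.random.default_rng(0)
volume = rng.uniform(size=(3, 4, 4))
saliency_map = volume.copy()

mask = foreground_mask(saliency_map, quantile=0.75)
high_desc = region_histogram(volume, mask)     # high-saliency descriptor
low_desc = region_histogram(volume, ~mask)     # low-saliency descriptor
```

Either descriptor could then be passed to any standard classifier to label the whole video, and `mask` itself serves as a rough foreground/background segmentation.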