ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10 Prof. Petros Maragos National Technical University of Athens School of Electrical and Computer.

Presentation transcript:

ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10 Prof. Petros Maragos National Technical University of Athens School of Electrical and Computer Engineering URL:

MUSCLE ICCS - NTUA WP6 E-teams: ICCS-NTUA: E-team Researchers & Directions
Researchers:
● P. Maragos, S. Kollias (Faculty members)
● G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
● G. Stamou, I. Avrithis (Post-Doc)
(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
● Face Detection, Modeling & Tracking
● AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
● AV to Articulatory Speech Inversion
(WP6) E-team 2: Audio-Visual Understanding
● Audio-Visual Salient Event Detection, Integrated Multimedia Content Analysis

AV-ASR Front-End
[Block diagram; block labels as they appear on the slide]
● Speech: Modulations – Energy, Multiband Filtering, Nonlinear Processing, Demodulation, VAD, Dynamics - Fractals, Embedding, Geometrical Filtering, Fractal Dimensions, Speaker Normalization, M-Array Processing
● Visual: Active Appearance Model, Face Detection/Tracking, Mouth R.O.I. Features
● Other blocks: Feature Transform./Selection, Fusion, Feature Stream, MFCC

Audiovisual ASR: Face Modeling
● A well studied problem in Computer Vision:
  Active Appearance Models, Morphable Models, Active Blobs
● Both Shape & Appearance can enhance lipreading
● The shape and appearance of human faces “live” in low dimensional manifolds
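The low-dimensional-manifold remark can be illustrated with a linear (PCA) model, which is the usual way shape and appearance subspaces are built in AAM-style methods. This is only a minimal sketch on synthetic data: the function names, the 20-landmark "shapes" and the choice of 3 modes are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def fit_linear_model(samples, n_modes):
    """PCA: return the mean vector and the top n_modes orthonormal modes."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_modes]

def project(x, mean, modes):
    """Low-dimensional parameters of one sample."""
    return modes @ (x - mean)

def reconstruct(params, mean, modes):
    """Map low-dimensional parameters back to the full vector."""
    return mean + modes.T @ params

# Synthetic "shapes": 200 samples of 20 landmarks (40 coordinates)
# that lie exactly on a 3-dimensional affine subspace (an assumption).
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 40))
shapes = rng.normal(size=(200, 3)) @ basis + 5.0

mean, modes = fit_linear_model(shapes, n_modes=3)
p = project(shapes[0], mean, modes)
error = np.linalg.norm(reconstruct(p, mean, modes) - shapes[0])
```

Here three parameters describe each 40-dimensional sample, so the reconstruction error is at the level of floating-point round-off. Real face data would only be approximately low-dimensional.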

Image Fitting Example
[Figure: model fitting shown at steps 2, 6, 10, 14 and 18]

Example: Face Interpretation Using AAM
[Figure panels: original video; shape track superimposed on original video; reconstructed face]
● This is what the visual-only speech recognizer “sees”!
● Generative models like AAM allow us to evaluate the output of the visual front-end

Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
● Generative models ‘compete’ for image observations
● Segmentation translates into the assignment of image observations to one of K models (image labelling)
● Segmentation labels are treated as hidden data
● EM algorithm:
  E-step: use current parameter estimates to assign micro-segments to objects
  M-step: use assignment probabilities to derive optimal model parameters
● Active Appearance Models are used as generative models for the object categories of cars and faces
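The E-step/M-step alternation can be sketched on a toy problem. In the example below, K one-dimensional Gaussians stand in for the generative models (the slides use Active Appearance Models, which are not reproduced here), and scalar values stand in for micro-segment observations. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def em_gaussian_labels(x, k, n_iter=50):
    """EM for a 1-D mixture of k Gaussians.

    Returns (posteriors, means, stds, weights); posteriors[i, j] is the
    probability that observation i is explained by model j.
    """
    # Deterministic initialisation: spread the means over data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    stds = np.full(k, x.std() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: soft assignment of each observation to each model,
        # computed in the log domain (constant terms cancel on normalising).
        log_lik = (-0.5 * ((x[:, None] - means) / stds) ** 2
                   - np.log(stds) + np.log(weights))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate model parameters from the soft assignments.
        nk = post.sum(axis=0) + 1e-12
        means = (post * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((post * (x[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
        weights = nk / len(x)
    return post, means, stds, weights

# Two synthetic "objects" competing for 600 observations.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 300), rng.normal(5.0, 0.5, 300)])
post, means, stds, weights = em_gaussian_labels(x, k=2)
labels = post.argmax(axis=1)  # hard labelling derived from the posteriors
```

The posteriors play the role of the hidden segmentation labels: each row sums to one, and the model that explains an observation best receives most of its probability mass.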

Top-Down Segmentation Results
● Thresholding the E-step assignment probabilities gives a hard figure-ground segmentation
● No ‘shape-prior’ knowledge is necessary for the segmentation: the generative model contains information about shape variation
● Combination of bottom-up & top-down detection
● At false-alarm locations the object model manages to reconstruct the image appearance only by chance, and therefore typically gets a small image support for the object.
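A minimal sketch of the thresholding step and of the "small image support" cue for false alarms follows. The posterior map, the 0.5 threshold and the 0.2 minimum-support value are hypothetical numbers chosen for illustration; the slides do not state the values actually used.

```python
import numpy as np

def hard_segmentation(object_posterior, threshold=0.5):
    """Figure-ground mask: True where the object model wins the observation."""
    return object_posterior > threshold

def object_support(mask):
    """Fraction of the window that is assigned to the object."""
    return mask.mean()

# Hypothetical E-step output for a 3x3 window around a candidate detection.
posterior = np.array([[0.9, 0.8, 0.10],
                      [0.7, 0.6, 0.20],
                      [0.1, 0.3, 0.05]])

mask = hard_segmentation(posterior)
support = object_support(mask)
# Hypothetical rule: treat a detection with little support as a false alarm.
is_false_alarm = support < 0.2
```

In this toy window four of the nine observations are assigned to the object, so the detection is kept.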

Spatio-Temporal Visual Attention I: Video Analysis
● Create video volume
● Feature extraction from spatiotemporal data
● Fusion & saliency generation
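The three steps can be sketched as a small pipeline. The two features used here (intensity contrast and frame differencing) and the equal-weight fusion are simplifying assumptions for illustration; the slides do not specify which features or which fusion rule the authors use.

```python
import numpy as np

def video_volume(frames):
    """Step 1: stack 2-D frames into a (T, H, W) volume."""
    return np.stack(frames, axis=0).astype(float)

def normalize(feature):
    """Rescale a feature volume to [0, 1]; a constant volume maps to zeros."""
    span = feature.max() - feature.min()
    return (feature - feature.min()) / span if span > 0 else np.zeros_like(feature)

def saliency(volume):
    """Steps 2 and 3: extract two features and fuse them with equal weights."""
    # Intensity contrast: deviation from the global mean of the volume.
    intensity = np.abs(volume - volume.mean())
    # Motion: absolute difference to the previous frame (zero for frame 0).
    motion = np.abs(np.diff(volume, axis=0, prepend=volume[:1]))
    return 0.5 * normalize(intensity) + 0.5 * normalize(motion)

# Toy clip: a single bright pixel moving right along row 4.
frames = []
for t in range(5):
    frame = np.zeros((8, 8))
    frame[4, t + 1] = 1.0
    frames.append(frame)

sal = saliency(video_volume(frames))
peak = tuple(int(i) for i in np.unravel_index(sal[2].argmax(), sal[2].shape))
```

In frame 2 the saliency peak sits on the moving pixel at row 4, column 3, while the static background receives zero saliency.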

Spatio-Temporal Visual Attention II: Classification & Segmentation
● Use spatiotemporal visual attention (VA) for efficient global classification of videos
  Claim: features extracted only from low or high saliency regions are more representative of the input video
● Foreground/Background segmentation
  Claim: most salient regions are related to foreground areas of the video
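Both claims can be illustrated with a saliency-driven mask: pixels above a saliency quantile are taken as foreground, and a descriptor is computed only inside (or only outside) that mask. The quantile of 0.8 and the intensity-histogram descriptor are hypothetical choices made for this sketch, not the authors' method.

```python
import numpy as np

def foreground_mask(saliency_map, quantile=0.8):
    """Label the most salient pixels as foreground."""
    return saliency_map >= np.quantile(saliency_map, quantile)

def region_descriptor(frame, mask, bins=8):
    """Normalized intensity histogram restricted to the masked pixels."""
    hist, _ = np.histogram(frame[mask], bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

# Toy frame: a bright 2x2 object on a dark background, with matching saliency.
frame = np.full((4, 4), 0.1)
frame[1:3, 1:3] = 0.9
saliency_map = np.zeros((4, 4))
saliency_map[1:3, 1:3] = 1.0

mask = foreground_mask(saliency_map)
high_desc = region_descriptor(frame, mask)    # features from high-saliency pixels
low_desc = region_descriptor(frame, ~mask)    # features from low-saliency pixels
```

The two descriptors could then be fed to any classifier. Whether high- or low-saliency regions are more informative for a given video class is the empirical question raised by the slide.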