Research activities at AUTH related to dialogue detection. Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis


1 Research activities at AUTH related to dialogue detection. Ioannis Pitas, Constantine Kotropoulos, Nikos Nikolaidis. WP6 e-team: Audiovisual Understanding

2 AIIA Lab, Department of Informatics, Aristotle University of Thessaloniki. Outline:
• Introduction
• Dialogue detection concept: cross-correlation of indicator functions
• Speaker turn detection based on speech and visual cues (mouth activity)
• Frontal face detection; facial feature detection (e.g. mouth)
• One-two speaker detection
• Speaker clustering based on speech and visual cues
• Fingerprinting

3 Indicator functions and their cross-correlation (1): a dialogue between two persons from the movie “Secret Window” [Dialogue 1].

4 Indicator functions and their cross-correlation (2): a scene without a dialogue between two persons.
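The cross-correlation idea on slides 3 and 4 can be illustrated with a minimal sketch. It assumes each indicator function is a binary sequence sampled at a fixed rate that equals 1 while the corresponding person is speaking (or visible); the function name `cross_correlation`, the normalization, and the toy sequences are illustrative assumptions and are not taken from the slides.

```python
import numpy as np

def cross_correlation(ind_a, ind_b, max_lag):
    """Normalized cross-correlation of two indicator sequences.

    Returns the lags in [-max_lag, max_lag] and the correlation at each lag,
    where a positive lag compares ind_a[t] with ind_b[t + lag].
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    a = a - a.mean()  # remove the mean so constant activity does not correlate
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for lag in lags:
        if lag >= 0:
            num = np.sum(a[:len(a) - lag] * b[lag:])
        else:
            num = np.sum(a[-lag:] * b[:len(b) + lag])
        values.append(num / denom if denom > 0 else 0.0)
    return lags, np.array(values)

# Toy dialogue (assumed data): the two persons alternate every 3 samples.
speaker_a = np.tile([1, 1, 1, 0, 0, 0], 4)
speaker_b = 1 - speaker_a
lags, values = cross_correlation(speaker_a, speaker_b, max_lag=5)
for lag, value in zip(lags, values):
    print(int(lag), round(float(value), 3))
```

In this toy case the indicators are strongly anti-correlated at lag 0 and positively correlated near the turn length, which is the kind of alternation pattern a dialogue would be expected to produce.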

5 Speaker Turn Detection
• Audio segmentation aims at finding acoustic events within an audio stream. Speaker turn detection is a special case of speaker segmentation.
• It is an important pre-processing step for speech when implementing audio indexing or speaker tracking.
• Usually, no prior knowledge about the speakers is assumed.
[Figure: Speaker 1, Speaker 2]

6 Model-based segmentation (DISTBIC): contrast the hypothesis of no speaker turn against the hypothesis of a speaker turn using the BIC criterion. [Figure annotation: Speaker turn]
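The hypothesis test on slide 6 can be sketched as follows. This is one common formulation of the BIC speaker-change test, assuming each hypothesis models the feature vectors with full-covariance Gaussians; it covers only the BIC comparison, not the complete DISTBIC procedure. The function name `delta_bic`, the `penalty_weight` parameter, and the sign convention (positive values favor a speaker turn) are assumptions made for illustration.

```python
import numpy as np

def delta_bic(features, split, penalty_weight=1.0):
    """BIC difference between a two-model and a one-model description.

    features: array of shape (N, d), one feature vector per frame.
    split: frame index of the candidate speaker turn.
    Under the convention assumed here, a positive value favors the
    speaker-turn hypothesis and a negative value favors no turn.
    """
    x = np.asarray(features, dtype=float)
    n, d = x.shape
    n1, n2 = split, n - split

    def logdet_cov(segment):
        # Maximum-likelihood covariance estimate (bias=True divides by N).
        cov = np.atleast_2d(np.cov(segment, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Penalty for the extra parameters of the second Gaussian.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(x)
            - 0.5 * n1 * logdet_cov(x[:split])
            - 0.5 * n2 * logdet_cov(x[split:])
            - penalty_weight * penalty)

# Synthetic example (assumed data): two "speakers" with different means.
rng = np.random.default_rng(0)
speaker_1 = rng.normal(0.0, 1.0, size=(200, 2))
speaker_2 = rng.normal(5.0, 1.0, size=(200, 2))
window = np.vstack([speaker_1, speaker_2])
print(delta_bic(window, split=200) > 0)
```

In practice the test would be evaluated at several candidate split points inside a sliding window, and `penalty_weight` would be tuned on held-out data.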

7 Frontal face images at quartet and octet resolution. [Figure panels: Original Image, Quartet Image, Octet Image]

8 Face detection based on corners. The figures show the 3 possible feature point set configurations, each with 100 feature points. They differ in the minimum distance allowed between feature points. In general, small inter-point distances concentrate the feature points and yield poor face detection. The minimum allowed distance is a parameter of the training procedure.
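The slides do not say how the feature points are chosen. As a hedged illustration of the minimum-distance constraint only, the sketch below greedily keeps the strongest candidates that lie at least `min_distance` away from every point already kept. The function name `select_feature_points`, the notion of a per-candidate `scores` array (for example a corner strength), and the greedy strategy are all assumptions, not the authors' training procedure.

```python
import numpy as np

def select_feature_points(candidates, scores, num_points=100, min_distance=5.0):
    """Greedily pick up to num_points candidates, strongest first,
    rejecting any candidate closer than min_distance to a chosen point."""
    pts = np.asarray(candidates, dtype=float)
    order = np.argsort(np.asarray(scores, dtype=float))[::-1]
    chosen = []
    for idx in order:
        too_close = any(
            np.linalg.norm(pts[idx] - pts[j]) < min_distance for j in chosen
        )
        if not too_close:
            chosen.append(int(idx))
            if len(chosen) == num_points:
                break
    return pts[np.array(chosen, dtype=int)]

# Assumed toy data: candidates on a 20 x 20 pixel grid with random scores.
rng = np.random.default_rng(1)
grid = np.array([(x, y) for x in range(20) for y in range(20)])
strength = rng.random(len(grid))
points = select_feature_points(grid, strength, num_points=100, min_distance=3.0)
print(len(points))
```

A larger `min_distance` spreads the points over the image, while a small one lets them cluster around the strongest responses, which matches the concentration effect described on the slide.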

9 Face detection: Receiver Operating Characteristic (ROC) curves. For SVM-based face detection, the best results were obtained with the sigmoidal kernel (best equal error rate: 4.5%). Maximum likelihood detection commits a few false alarms: for FAR in [5.2%, 5.67%], the FRR drops quickly from 6.1% to 0.7%.
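For readers unfamiliar with the quantities on slide 9, the following sketch shows how FAR, FRR, and the equal error rate can be estimated from detector scores. It assumes higher scores mean "face", that a pattern is accepted when its score is at least the threshold, and that the EER is approximated at the threshold where FAR and FRR are closest; the function names and the toy scores are illustrative assumptions, not the evaluation protocol used in the slides.

```python
import numpy as np

def far_frr_curve(genuine_scores, impostor_scores):
    """FAR and FRR at every distinct score used as an acceptance threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # False acceptance: a non-face pattern scores at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection: a face pattern scores below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return thresholds, far, frr

def equal_error_rate(far, frr):
    """Approximate EER: average of FAR and FRR where they are closest."""
    idx = int(np.argmin(np.abs(far - frr)))
    return 0.5 * (far[idx] + frr[idx])

# Assumed toy scores for face (genuine) and non-face (impostor) patterns.
face_scores = [0.9, 0.8, 0.7, 0.4]
non_face_scores = [0.6, 0.3, 0.2, 0.1]
_, far, frr = far_frr_curve(face_scores, non_face_scores)
print(equal_error_rate(far, frr))
```

Plotting `far` against `frr` (or against the detection rate `1 - frr`) over all thresholds gives an ROC-style curve like the ones the slide refers to.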

10 One/Two Speaker Detection. Two-speaker detection (NIST 2002): best EER 16.2%. Kajarekar, Adami, Hermansky, 2003. One-speaker detection (NIST 2002): best EER 7.1%.

11 Frontal face authentication

12 Fingerprinting

