Speaker Change Detection using Support Vector Machines V.Kartik, D.Srikrishna Satish and C.Chandra Sekhar Speech and Vision Laboratory Department of Computer.

Presentation transcript:

Speaker Change Detection using Support Vector Machines
V. Kartik, D. Srikrishna Satish and C. Chandra Sekhar
Speech and Vision Laboratory, Department of Computer Science and Engineering
Indian Institute of Technology Madras, Chennai, India

Speaker Change Detection
- Automatic segmentation of multispeaker speech data into segments that each contain the data of one speaker only
- Dissimilarity of the distributions of the data before and after a speaker change point
- Proposal: speaker change detection as a pattern classification problem
  - Patterns extracted from the data around the speaker change points serve as positive examples
  - Patterns extracted from the data between the speaker change points serve as negative examples
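The construction of positive and negative examples can be sketched as below. This is only an illustration under assumptions that the slides do not state: frames are lists of feature values, a pattern is the concatenation of the frames in a window centred on a point, and negative examples are taken at the midpoint between consecutive change points. The names extract_pattern, build_examples and half_width are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def extract_pattern(frames: Sequence[Frame], center: int, half_width: int) -> List[float]:
    """Concatenate the frames in [center - half_width, center + half_width)."""
    window = frames[center - half_width:center + half_width]
    return [value for frame in window for value in frame]


def build_examples(
    frames: Sequence[Frame],
    change_points: Sequence[int],
    half_width: int,
) -> Tuple[List[List[float]], List[int]]:
    """Return (patterns, labels) with +1 around change points and -1 between them."""
    patterns: List[List[float]] = []
    labels: List[int] = []

    # Positive examples: windows centred on the marked change points.
    for point in sorted(change_points):
        if half_width <= point <= len(frames) - half_width:
            patterns.append(extract_pattern(frames, point, half_width))
            labels.append(+1)

    # Negative examples (assumption): one window at the midpoint of each
    # speaker turn, kept only if the window fits inside the turn.
    boundaries = [0] + sorted(change_points) + [len(frames)]
    for left, right in zip(boundaries, boundaries[1:]):
        mid = (left + right) // 2
        if mid - half_width >= left and mid + half_width <= right:
            patterns.append(extract_pattern(frames, mid, half_width))
            labels.append(-1)

    return patterns, labels
```

With 20 one-dimensional frames and a single change point at frame 10, this yields one positive pattern and two negative patterns, one per speaker turn.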

Speaker Change Detection using SVMs
- An SVM is trained using the positive and negative examples of speaker change points
- The trained SVM scans the multispeaker data to hypothesize speaker change points
- Main issues:
  - Speaker-independent detection of the change points
  - Silence regions before speaker change points
  - Varying durations of speaker turns
  - Length of the window used for extraction of patterns
  - Large dimension of segmental pattern vectors
  - Large number of false alarms
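To make the training step concrete, here is a minimal sketch of a two-class SVM. The slides do not say which kernel or solver was used, so this substitutes the simplest option: a linear soft-margin SVM trained by batch subgradient descent on the regularized hinge loss. The function names and the default values of reg, learning_rate and epochs are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    patterns: Sequence[Sequence[float]],
    labels: Sequence[int],
    reg: float = 0.01,
    learning_rate: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise (reg/2)*||w||^2 + mean hinge loss; labels must be +1 or -1."""
    n = len(patterns)
    dim = len(patterns[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [reg * w for w in weights]
        grad_b = 0.0
        # Subgradient of the hinge loss: only margin violators contribute.
        for x, y in zip(patterns, labels):
            if y * (dot(weights, x) + bias) < 1.0:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def svm_output(weights: Sequence[float], bias: float, pattern: Sequence[float]) -> float:
    """Signed output; a positive value is read as 'speaker change'."""
    return dot(weights, pattern) + bias
```

In practice the large dimension of the segmental pattern vectors listed among the issues above would likely call for an established SVM library rather than this toy solver.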

Speaker Change Detection System

Fixed Duration Window based Patterns

Speaker Change Point Hypothesization using Fixed Duration Window based Patterns
- Input: the continuous speech signal of multispeaker speech data, without silence regions
- The SVM is trained with pattern vectors extracted from fixed-length windows of n frames
- Sliding window method: a test pattern is extracted from each window of n frames, with a shift of one frame
- Test patterns with a positive SVM output are hypothesized as speaker change points
- Several of these hypotheses may be spurious
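The sliding window scan can be sketched as follows. The scorer is passed in as a plain function so that any trained classifier can be plugged in. Two things here are assumptions rather than details from the slides: a hypothesis is placed at the centre frame of its window, and make_mean_shift_scorer is a toy stand-in for a trained SVM that is used only to make the example runnable.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
Scorer = Callable[[List[float]], float]


def hypothesize_change_points(frames: Sequence[Frame], n: int, score: Scorer) -> List[int]:
    """Slide an n-frame window with a one-frame shift and keep positive outputs."""
    hypotheses: List[int] = []
    for start in range(len(frames) - n + 1):
        window = frames[start:start + n]
        pattern = [value for frame in window for value in frame]
        if score(pattern) > 0:
            # Assumption: the hypothesis is placed at the window centre.
            hypotheses.append(start + n // 2)
    return hypotheses


def make_mean_shift_scorer(threshold: float) -> Scorer:
    """Toy stand-in for an SVM: positive when the two window halves differ in mean."""
    def score(pattern: List[float]) -> float:
        half = len(pattern) // 2
        first, second = pattern[:half], pattern[half:]
        shift = abs(sum(second) / len(second) - sum(first) / len(first))
        return shift - threshold
    return score
```

On a toy signal of ten frames of 0.0 followed by ten frames of 1.0, a strict threshold of 0.75 gives the single hypothesis at frame 10, while a looser threshold of 0.25 also fires at frames 9 and 11, which illustrates how neighbouring windows produce spurious hypotheses.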

False Alarm Reduction
- Two methods are considered for reducing spurious hypotheses (false alarms)
- 1st method: a threshold of 5 frames on the duration of speaker turns
- 2nd method: the false hypotheses on validation data are used as negative examples in training an SVM for false alarm reduction
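One way to read the 1st method is that two hypothesized change points closer together than 5 frames would imply a speaker turn shorter than the threshold, so one of them is discarded. The sketch below follows that reading; keeping the earliest hypothesis of each cluster is an assumption, since the slides only state the threshold, and the function name is hypothetical.

```python
from typing import List, Sequence


def enforce_min_turn_duration(hypotheses: Sequence[int], min_frames: int = 5) -> List[int]:
    """Drop hypotheses that would create a speaker turn shorter than min_frames."""
    kept: List[int] = []
    for point in sorted(hypotheses):
        # Assumption: the earliest hypothesis of a cluster is the one retained.
        if not kept or point - kept[-1] >= min_frames:
            kept.append(point)
    return kept
```

For example, the hypotheses 9, 10, 11, 40, 42 and 100 reduce to 9, 40 and 100.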

Studies on Speaker Change Detection
- Extended data of the NIST2003 speaker recognition evaluation database
- 2-speaker conversations, each of about 5 minutes duration, including 3 for each of M-M, M-F and F-F speaker conversations
- Speaker change points are manually marked
- Data divided into training, validation and test datasets
- Each dataset includes one each of M-M, M-F and F-F
- Training dataset: used to train the SVM
- Validation dataset: used to derive the negative examples for the false alarm reduction SVM
- Test dataset: used to evaluate the performance of the speaker change detection system

Performance of Speaker Change Detection System
- # actual speaker change points in test dataset: 282
- # frames in the test dataset: about [value missing]
- M: # speaker change points missed (not detected)
- FA: # false alarms
- [Table: M and FA for each window length (in msec), reported after speaker change hypothesization, after smoothing, and after false alarm reduction; values not recoverable]
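The two counts reported above, M and FA, can be computed from the manually marked change points and the hypothesized ones. The slides do not say how close a hypothesis must be to a marked point to count as a detection, so the tolerance parameter and the greedy one-to-one matching in this sketch are assumptions, and score_detection is a hypothetical name.

```python
from typing import Sequence, Set, Tuple


def score_detection(
    reference: Sequence[int],
    hypotheses: Sequence[int],
    tolerance: int,
) -> Tuple[int, int]:
    """Return (missed, false_alarms); positions are frame indices, hypotheses unique."""
    matched: Set[int] = set()
    missed = 0
    for ref in sorted(reference):
        # Unused hypotheses within the tolerance of this reference point.
        candidates = [
            h for h in hypotheses
            if abs(h - ref) <= tolerance and h not in matched
        ]
        if candidates:
            # Greedy choice: the closest candidate is consumed by this reference.
            matched.add(min(candidates, key=lambda h: abs(h - ref)))
        else:
            missed += 1
    false_alarms = sum(1 for h in hypotheses if h not in matched)
    return missed, false_alarms
```

With reference points at frames 10, 50 and 90, hypotheses at 9, 30, 52 and 70, and a tolerance of 3 frames, this gives M = 1 and FA = 2.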

Summary
- Speaker change detection posed as a pattern classification problem
- Fixed duration window method
- SVMs to hypothesize the speaker change points
- Methods for reducing the number of false alarms
- Performance of the proposed method evaluated on the NIST2003 speaker recognition evaluation database

Thank You