Speaker Change Detection using Support Vector Machines
V. Kartik, D. Srikrishna Satish and C. Chandra Sekhar
Speech and Vision Laboratory
Department of Computer Science and Engineering
Indian Institute of Technology Madras, Chennai – India
Speaker Change Detection
- Automatic segmentation of multispeaker speech data into segments that each contain the data of one speaker only
- Dissimilarity of the distributions of the data before and after a speaker change point
- Proposal: speaker change detection as a pattern classification problem
  - Patterns extracted from the data around the speaker change points serve as positive examples
  - Patterns extracted from the data between the speaker change points serve as negative examples
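The proposal above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each frame is a feature vector, that a pattern is the concatenation of n consecutive frames, that positive windows are centred on the marked change points, and that one negative window is taken from the middle of each speaker turn. The names `window_pattern` and `build_examples` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def window_pattern(frames: Sequence[Frame], center: int, n: int) -> List[float]:
    """Concatenate the n frames around `center` into one pattern vector."""
    start = center - n // 2
    if start < 0 or start + n > len(frames):
        raise ValueError("window falls outside the signal")
    return [x for frame in frames[start:start + n] for x in frame]


def build_examples(
    frames: Sequence[Frame], change_points: Sequence[int], n: int
) -> List[Tuple[List[float], int]]:
    """Return (pattern, label) pairs: +1 around change points, -1 between them.

    A change point is the index of the first frame of the new speaker
    (an assumption made for this sketch).
    """
    examples = []
    half = n // 2
    # Positive examples: windows that straddle a marked change point.
    for cp in change_points:
        if cp - half >= 0 and cp - half + n <= len(frames):
            examples.append((window_pattern(frames, cp, n), +1))
    # Negative examples: one window from the middle of each speaker turn,
    # taken only when the turn is long enough to hold a whole window.
    bounds = [0] + sorted(change_points) + [len(frames)]
    for left, right in zip(bounds, bounds[1:]):
        if right - left >= n:
            mid = (left + right) // 2
            examples.append((window_pattern(frames, mid, n), -1))
    return examples
```

With 20 one-dimensional frames, a single change point at frame 10 and n = 4, this yields one positive pattern spanning frames 8 to 11 and two negative patterns, one from each speaker turn.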
Speaker Change Detection using SVMs
- An SVM is trained using the positive and negative examples of speaker change points
- The trained SVM scans the multispeaker data to hypothesize speaker change points
- Main issues:
  - Speaker independent detection of the points
  - Silence regions before speaker change points
  - Varying durations of speaker turns
  - Length of the window used for extraction of patterns
  - Large dimension of segmental pattern vectors
  - Large number of false alarms
Speaker Change Detection System
Fixed Duration Window based Patterns
Speaker Change Point Hypothesization using Fixed Duration Window based Patterns
- Input: the continuous speech signal of multispeaker speech data without silence regions
- The SVM is trained with pattern vectors extracted from fixed length windows of n frames
- Sliding window method: a test pattern is extracted from every window of n frames, with a shift of one frame
- The test patterns for which the SVM output is positive are hypothesized as speaker change points
- Several of these hypotheses may be spurious
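The sliding window scan described above can be sketched as follows. The trained SVM is represented by an arbitrary callable `svm_output` that maps a pattern vector to a real-valued score; the function names and the toy scoring function are assumptions made for illustration and stand in for a real trained model.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def hypothesize_change_points(
    frames: Sequence[Frame],
    n: int,
    svm_output: Callable[[List[float]], float],
) -> List[int]:
    """Slide an n-frame window with a one-frame shift over the signal.

    Every window whose score is positive yields a hypothesis located at
    the window centre (start + n // 2).
    """
    hypotheses = []
    for start in range(len(frames) - n + 1):
        pattern = [x for frame in frames[start:start + n] for x in frame]
        if svm_output(pattern) > 0:
            hypotheses.append(start + n // 2)
    return hypotheses


def make_toy_output(threshold: float) -> Callable[[List[float]], float]:
    """Hypothetical stand-in for a trained SVM (one feature per frame).

    Scores a pattern by the gap between the means of its two halves.
    """
    def output(pattern: List[float]) -> float:
        half = len(pattern) // 2
        left = sum(pattern[:half]) / half
        right = sum(pattern[half:]) / (len(pattern) - half)
        return abs(right - left) - threshold
    return output


# Toy signal: speaker A (value 0.0) for 10 frames, then speaker B (value 1.0).
toy_frames = [[0.0]] * 10 + [[1.0]] * 10
strict = hypothesize_change_points(toy_frames, 4, make_toy_output(0.75))
loose = hypothesize_change_points(toy_frames, 4, make_toy_output(0.25))
```

On this toy signal the strict scorer fires only at frame 10, while the looser one also fires at the neighbouring frames 9 and 11, which illustrates how a single change point can produce several adjacent hypotheses.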
False Alarm Reduction
- Two methods are considered for reducing the number of spurious hypotheses (false alarms)
- 1st method: a threshold of 5 frames on the duration of speaker turns
- 2nd method: the false hypotheses on the validation data are used as the negative examples in training a second SVM for false alarm reduction
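One way to read the first method is sketched below: a hypothesis is dropped when it would create a speaker turn shorter than the threshold. Keeping the earliest hypothesis of each cluster is an assumption of this sketch, since the slides do not say which one is retained; `enforce_min_turn` is a hypothetical name.

```python
from typing import Iterable, List


def enforce_min_turn(hypotheses: Iterable[int], min_turn: int = 5) -> List[int]:
    """Drop hypotheses that imply a speaker turn shorter than min_turn frames.

    The earliest hypothesis of each cluster is kept (an assumption).
    """
    kept: List[int] = []
    for h in sorted(hypotheses):
        if not kept or h - kept[-1] >= min_turn:
            kept.append(h)
    return kept
```

For example, the hypotheses 9, 10, 11, 40, 42 and 100 reduce to 9, 40 and 100 with the 5-frame threshold.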
Studies on Speaker Change Detection
- Extended data of the NIST2003 speaker recognition evaluation database
- Two-speaker conversations, each of about 5 minutes duration, including 3 each of M-M, M-F and F-F speaker conversations
- Speaker change points are manually marked
- Data divided into training, validation and test datasets
- Each dataset includes one each of M-M, M-F and F-F conversations
  - Training dataset: to train the SVM
  - Validation dataset: to derive the negative examples for the false alarm reduction SVM
  - Test dataset: to evaluate the performance of the speaker change detection system
Performance of Speaker Change Detection System
- # actual speaker change points in test dataset: 282
- # frames in the test dataset: about
- # speaker change points missed (not detected): M
- # false alarms: FA
[Table: window length (in msec) against M and FA after speaker change hypothesization, after smoothing, and after false alarm reduction]
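The two counts M and FA can be computed along the following lines. The slides do not state how a hypothesis is matched to a marked change point, so the tolerance in frames and the one-to-one greedy matching used here are assumptions, and `count_missed_and_false_alarms` is a hypothetical name.

```python
from typing import Iterable, Tuple


def count_missed_and_false_alarms(
    hypotheses: Iterable[int], actual: Iterable[int], tolerance: int
) -> Tuple[int, int]:
    """Return (M, FA) for hypothesized versus manually marked change points.

    Each marked point is matched to at most one unused hypothesis lying
    within `tolerance` frames. Unmatched marked points count as missed (M);
    hypotheses left over count as false alarms (FA).
    """
    unmatched = sorted(hypotheses)
    missed = 0
    for a in sorted(actual):
        match = next((h for h in unmatched if abs(h - a) <= tolerance), None)
        if match is None:
            missed += 1
        else:
            unmatched.remove(match)
    return missed, len(unmatched)
```

With hypotheses at frames 10, 11 and 50, marked points at frames 10 and 30, and a tolerance of 2 frames, one marked point is missed and two hypotheses are counted as false alarms.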
Summary
- Speaker change detection posed as a pattern classification problem
- Fixed duration window method for pattern extraction
- SVMs to hypothesize the speaker change points
- Methods for reducing the number of false alarms
- Performance of the proposed method evaluated on the NIST2003 speaker recognition evaluation database
Thank You