VQ speaker verification with sentence codebook Filipe Moreira, Carlos Espain CEFAT / DEEC / FEUP / Universidade do Porto.

Presentation transcript:

VQ speaker verification with sentence codebook Filipe Moreira, Carlos Espain CEFAT / DEEC / FEUP / Universidade do Porto

VQ speaker verification with sentence codebook ABSTRACT Experiments on text-independent speaker verification (with speakers of the same sex) using vector quantisation are reported in this paper. Several situations were considered, using different feature sets, training times and testing utterance durations, both with speaker-independent and speaker-dependent thresholds. The amount and location of silence in the testing sentences is usually a problem in text-independent speaker verification systems. We proposed and tested the use of string codebooks to attenuate this problem: a codebook is extracted from the test utterance, and it is this codebook, rather than the raw frames, that is quantised into the claimant's codebook. An additional experiment combining different speaker verification techniques is also reported.
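To make the abstract's two schemes concrete, here is a minimal sketch (not the authors' code): frame_quantisation corresponds to the usual direct quantisation of the test frames (method S1 later in the slides), while string_codebook_quantisation is the proposed scheme (S2), which first trains a codebook on the test utterance and quantises that codebook instead. The k-means training, Euclidean distance and the codebook size of 16 are assumptions made for illustration; the actual codebook training and distance measure are not detailed in these slides.

```python
import numpy as np

def nearest_distortion(vectors, codebook):
    """Average Euclidean distance from each vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def train_codebook(frames, size=16, iters=20, seed=0):
    """Plain k-means as an illustrative stand-in for LBG codebook training."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for k in range(size):
            members = frames[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def frame_quantisation(test_frames, claimant_codebook):
    """Method S1: quantise the raw test frames into the claimant's codebook."""
    return nearest_distortion(test_frames, claimant_codebook)

def string_codebook_quantisation(test_frames, claimant_codebook, test_cb_size=16):
    """Method S2: train a small codebook on the test utterance and quantise
    that codebook, not the frames, into the claimant's codebook, so that
    long stretches of silence cannot dominate the score."""
    test_codebook = train_codebook(test_frames, test_cb_size)
    return nearest_distortion(test_codebook, claimant_codebook)
```

In both cases a lower distortion means the utterance is closer to the claimant's model, and the verification decision compares that distortion with a threshold.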

VQ speaker verification with sentence codebook TRAINING CONDITIONS
55 men / 55 women
2 kinds of sentences (3- and 5-digit strings), one for each codebook size (average duration 9.6 s for the first and 25.2 s for the second)
2 different feature sets per codebook (CCs and CCs plus ΔCCs; a delta-feature sketch follows)
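The two feature sets used with each codebook are cepstral coefficients (CCs) alone and CCs augmented with their time derivatives (ΔCCs). The slides do not say how the deltas are obtained; the regression over neighbouring frames below is only a common convention, assumed here for illustration.

```python
import numpy as np

def delta_features(ccs, window=2):
    """Delta cepstral coefficients by linear regression over +/- `window`
    neighbouring frames; `ccs` has shape (num_frames, num_coeffs)."""
    padded = np.pad(ccs, ((window, window), (0, 0)), mode="edge")
    num_frames = len(ccs)
    num = sum(k * (padded[window + k:window + k + num_frames]
                   - padded[window - k:window - k + num_frames])
              for k in range(1, window + 1))
    den = 2 * sum(k * k for k in range(1, window + 1))
    return num / den
```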

VQ speaker verification with sentence codebook TESTING CONDITIONS
55 men / 55 women
2 kinds of sentences (2- and 7-digit strings) of different durations (1.3 s for the first, 3.2 s for the second)
2 different feature sets per codebook
2 kinds of thresholds: speaker dependent (SDT) and speaker independent (SIT)
1210 sessions for false rejection (FR) and 5490 for false acceptance (FA); EER estimation from these trials is sketched below
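The tables that follow report equal error rates (EER), i.e. the operating point at which the false rejection and false acceptance rates coincide. A hedged sketch of how an EER can be estimated from the 1210 genuine and 5490 impostor trial scores with a single speaker-independent threshold is shown below; the threshold sweep is an illustration, not necessarily the authors' procedure, and it assumes that a lower distortion indicates a better match to the claimant.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER with one speaker-independent threshold (SIT).
    Scores are quantisation distortions: a claim is accepted when the
    score is at or below the threshold."""
    genuine_scores = np.asarray(genuine_scores)
    impostor_scores = np.asarray(impostor_scores)
    best_gap, eer = np.inf, None
    for t in np.sort(np.concatenate([genuine_scores, impostor_scores])):
        fr = np.mean(genuine_scores > t)    # false rejection rate
        fa = np.mean(impostor_scores <= t)  # false acceptance rate
        if abs(fr - fa) < best_gap:
            best_gap, eer = abs(fr - fa), (fr + fa) / 2.0
    return eer
```

For speaker-dependent thresholds (SDT), the same sweep would be run separately on each claimant's own genuine and impostor scores.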

VQ speaker verification with sentence codebook RESULTS
Table 1: EER (%) for normalised-distance quantisation with different feature sets (CCs, CCs plus ΔCCs), codebooks (Cbk 1, Cbk 2) and thresholds (SIT, SDT), short sentences
Table 2: EER (%) for the string codebook quantisation with the same feature sets, codebooks and thresholds, short sentences
Table 3: EER (%) for normalised-distance quantisation with the same feature sets, codebooks and thresholds, long sentences
Table 4: EER (%) for the string codebook quantisation with the same feature sets, codebooks and thresholds, long sentences

VQ speaker verification with sentence codebook COMBINATION OF TECHNIQUES Several techniques for combining the results of different statistically independent classifiers are known [2, 4]. We applied four combination techniques to the two methods described above: the quantisation of the frames into the claimant's codebook (S1) and the quantisation of the string codebook into the claimant's codebook (S2).

VQ speaker verification with sentence codebook COMBINATION OF TECHNIQUES In the first technique, Maximum (MAX), the larger of the quantisation errors given by methods S1 and S2 for each sentence was taken as the final output. In the second, Minimum (MIN), the smaller of the two errors was used instead. In the third, Product (PROD), the final output is the product of the two errors, and in the fourth, Sum (SUM), it is their average, as sketched below.
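A minimal sketch of the four rules, assuming the two errors have already been brought to comparable ranges (a normalisation detail the slides do not cover):

```python
def combine(s1_error, s2_error, rule="SUM"):
    """Combine the frame-quantisation error (S1) and the string-codebook
    quantisation error (S2) obtained for one test sentence."""
    rules = {
        "MAX": max(s1_error, s2_error),
        "MIN": min(s1_error, s2_error),
        "PROD": s1_error * s2_error,
        "SUM": (s1_error + s2_error) / 2.0,  # average of the two errors
    }
    return rules[rule]
```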

VQ speaker verification with sentence codebook COMBINATION OF TECHNIQUES - RESULTS
Fig. 1. EER (%) for the several methods using the CCs plus ΔCCs feature set, short sentences
Fig. 2. EER (%) for the several methods using the CCs feature set, short sentences
Fig. 3. EER (%) for the several methods using the CCs plus ΔCCs feature set, long sentences
Fig. 4. EER (%) for the several methods using the CCs feature set, long sentences

VQ speaker verification with sentence codebook CONCLUSIONS The use of codebooks generated from the strings to be verified proved to be a much better technique than direct quantisation of the utterance frames. Concerning the combination of techniques, the results met expectations, giving a significant improvement in the overall performance of the system. The system with Cbk 2 (25.2 s of training speech), CCs plus ΔCCs features, and the PROD or SUM combination of S1 and S2 gave an EER of 0.6% with testing utterances of 3.2 s.