University of Joensuu Dept. of Computer Science P.O. Box 111 FIN Joensuu Tel fax
Speaker Recognition
University of Joensuu, Department of Computer Science
PUMS seminar, Turku
Pasi Fränti, Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, Tomi Kinnunen, Ismo Kärkkäinen

Research Group
– Pasi Fränti, Professor
– Juhani Saastamoinen, Project manager
– Evgeny Karpov, Project researcher
– Ville Hautamäki, Project researcher
– Tomi Kinnunen, Researcher
– Ismo Kärkkäinen, Clustering algorithms
PUMS project

PUMS & JoY Speaker Recognition
PUMS season:
– Identification, no verification
– Port to a mobile phone
– Feature fusion
– Real-time

Application Scenarios
Speaker Recognition covers two tasks:
– Speaker Identification: "Whose voice is this?"
– Speaker Verification: "Is this Bob’s voice?" The input is a voice sample plus an identity claim; the system either verifies the claim or rejects the speaker as an impostor.
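
The two scenarios differ only in the decision rule. The following is a minimal sketch, assuming each enrolled speaker already has a distance score against the input (smaller means more similar); the function names, the example scores and the threshold value are illustrative and do not come from the slides.

```python
def identify_speaker(distances):
    """Identification: return the enrolled speaker with the smallest distance."""
    return min(distances, key=distances.get)


def verify_claim(distances, claimed, threshold):
    """Verification: accept the claim only if the claimed speaker's
    distance is within a (hypothetical) acceptance threshold."""
    return distances[claimed] <= threshold


# Hypothetical distances between the input speech and each enrolled speaker.
distances = {"alice": 0.8, "bob": 0.3}

who = identify_speaker(distances)                  # "Whose voice is this?"
bob_ok = verify_claim(distances, "bob", 0.5)       # "Is this Bob's voice?"
alice_ok = verify_claim(distances, "alice", 0.5)   # impostor case
```

Identification always returns some enrolled speaker, whereas verification can reject; this is why verification needs a threshold and identification does not.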

Identification System
Processing chain: Speech Audio -> Signal Processing -> Feature Vectors -> Speaker Modelling -> Speaker Profile Database -> Decision
– Training: add trained speaker profiles to the database
– Recognition: use all profiles in the database; choose the profile with the minimum MSE over the input speech
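
The minimum-MSE recognition rule can be sketched as follows. This is an illustration only: it assumes each speaker profile is a codebook (a list of code vectors) and that the MSE is the average squared distance from each input feature vector to its nearest code vector. The names `mse_distortion` and `identify` and the toy database are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mse_distortion(features, codebook):
    """Average squared distance from each feature vector to its nearest code vector."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)


def identify(features, database):
    """Score the input against every profile and return (best speaker, all scores)."""
    scores = {name: mse_distortion(features, cb) for name, cb in database.items()}
    return min(scores, key=scores.get), scores


# Toy profile database: two speakers, two 2-D code vectors each.
database = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 6.0]],
}
features = [[0.1, 0.0], [0.9, 1.1]]   # feature vectors of the input speech

best, scores = identify(features, database)
```

Because every profile is scored against every input vector, the cost grows with both the number of speakers and the number of vectors.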

Results
Software architecture diagram; recoverable labels:
– Applications: sprofiler, ProfMatch, SpeakerProfiler, Winsprofiler, Epocsprofiler
– User interfaces: console UI, Windows, Series60, TCL/TK (HY), console UI
– Library: srlib with DB, behind a common speaker recognition app. interface
– Components: Fusion, Speech features (HY), Real-time

Planned Results
Software architecture diagram; recoverable labels:
– Results: sprofiler, ProfMatch, SpeakerProfiler, Winsprofiler, Epocsprofiler
– Library: srlib with DB, behind a common speaker recognition app. interface
– Components: Fusion, Speech features (HY), Real-time, Segmentation, VAD, Verification
– Applications: Access control, Teleconference, Large scale database, Mobile phone login?

System in Mobile Phone
Port to Symbian OS with the Series 60 UI platform

Symbian Phones
Platforms pictured: Series 80, Series 60, UIQ
Series 60 phone features:
– 16 MB ROM
– 8 MB RAM
– 176 x 208 display
– 32-bit ARM processor
– No floating-point unit!!!

FFTGEN
Multiplication results must fit in 32 bits, so the multiplication inputs are truncated.
FFTGEN truncates both inputs to 16 bits (“16/16 FFT”):
– FFT layer input: 16-bit integer
– FFT twiddle factor: 16-bit integer
– Product: 32-bit multiplication result (16 used bits, 16 crop-off bits)
– FFT layer output (part of it): crop off 16 bits for the next layer
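
The 16/16 multiplication can be illustrated with plain integers. This is a sketch under assumptions, not FFTGEN's actual code: the twiddle factor is assumed to be stored in Q15 format (value times 2^15), and the name `mul_16_16` is hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def mul_16_16(sample, twiddle):
    """Multiply two signed 16-bit integers and crop the 16 low bits.

    The product of two 16-bit values has magnitude at most 2**30,
    so it always fits in a signed 32-bit register.
    """
    assert INT16_MIN <= sample <= INT16_MAX
    assert INT16_MIN <= twiddle <= INT16_MAX
    product = sample * twiddle       # 32-bit multiplication result
    assert -(1 << 31) <= product < (1 << 31)
    return product >> 16             # keep 16 used bits, drop 16 crop-off bits


# Assumed Q15 twiddle: 16384 represents 0.5.
result = mul_16_16(12000, 16384)     # 12000 * 0.5, halved again by the extra cropped bit
```

With a Q15 twiddle, cropping 16 bits instead of 15 halves the output at every layer, which keeps the butterfly sums inside 16 bits at the cost of one bit of precision per layer.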

Proposed Information Preserving “22/10 FFT”
Approximate the DFT operator F with G. Accept a larger ||F-G|| in exchange for preserving more signal information:
– Minimize the maximum relative error in the scaled sine values with respect to the scale; a scale of 980 is good for FFT sizes up to 1024
– Truncate multiplication inputs to 22/10 bits (signal/op)
– FFT layer input: 32-bit integer, 22 bits used
– FFT twiddle factor: 16-bit integer, 10 bits used
– Product: 32-bit multiplication result (22 used bits, 10 crop-off bits)
– FFT layer output (part of it): 32-bit integer; crop off 10 bits for the next layer
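
One possible reading of the scale selection is sketched below. It is an assumption about the criterion, not the authors' procedure: each sine value sin(2*pi*k/N) is multiplied by an integer scale and rounded, and the scale is chosen to minimize the largest relative rounding error. The candidate range 512 to 1023 (scales that need 10 bits) and the function names are hypothetical.

```python
import math


def max_relative_error(scale, fft_size):
    """Largest relative error of round(scale * sin) / scale over the first quadrant."""
    worst = 0.0
    for k in range(1, fft_size // 4 + 1):
        s = math.sin(2.0 * math.pi * k / fft_size)
        approx = round(scale * s) / scale
        worst = max(worst, abs(approx - s) / s)
    return worst


def best_scale(fft_size, lo=512, hi=1023):
    """Exhaustively search the integer scales in [lo, hi] for the smallest worst-case error."""
    return min(range(lo, hi + 1), key=lambda sc: max_relative_error(sc, fft_size))


scale = best_scale(1024)
error_at_best = max_relative_error(scale, 1024)
error_at_980 = max_relative_error(980, 1024)   # scale quoted on the slide
```

The worst case is dominated by the smallest sine values, where a rounding error of half a unit is large relative to the scaled value, so a good scale places those small products close to integers.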

Scale of Error in Proposed FFT
Table (values not recoverable): log10 of relative error in FFT elements, average and standard deviation, for FFTGEN (16/16) and the 22/10 FFT

Mobile Phone Results
TIMIT, 100 speakers; recog. rate (%) and std. dev. (%):
– FLOAT: 100.0, N/A
– FFTGEN:
– FIXED:
– MIXED: 100.0, N/A
MIXED implementation, signal; recog. rate (%) and std. dev. (%):
– FLOAT, Symbian audio:
– FLOAT, PC audio: 100.0, N/A
– FIXED, Symbian audio:
– FIXED, PC audio: 100.0, N/A

Improving Accuracy by Information Fusion
Each feature set supplies a feature vector to its own classifier; a score combiner merges the classifier scores into the decision:
– Feature set 1 (e.g. 5 MFCCs) -> Classifier 1 -> score 1
– Feature set 2 (e.g. F0 + ΔF0) -> Classifier 2 -> score 2
– Feature set 3 (e.g. formants F1, F2, F3) -> Classifier 3 -> score 3
– ...
– score 1, score 2, score 3 -> Score combiner -> Decision
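
A score combiner can be sketched as a weighted sum of normalized per-classifier scores. The slide does not state the combination rule, so the sum normalization, the equal default weights and all names below are assumptions made for illustration; scores are treated as distances (smaller is better).

```python
def normalize(scores):
    """Scale one classifier's distances so that they sum to 1."""
    total = sum(scores.values())
    return {speaker: s / total for speaker, s in scores.items()}


def fuse_scores(score_sets, weights=None):
    """Weighted sum of normalized distances, one score dict per classifier."""
    if weights is None:
        weights = [1.0 / len(score_sets)] * len(score_sets)
    fused = {}
    for w, scores in zip(weights, score_sets):
        for speaker, s in normalize(scores).items():
            fused[speaker] = fused.get(speaker, 0.0) + w * s
    return fused


def decide(fused):
    """Pick the speaker with the smallest fused distance."""
    return min(fused, key=fused.get)


# Hypothetical distances from three classifiers (one per feature set).
score_sets = [
    {"alice": 1.0, "bob": 3.0},   # classifier 1
    {"alice": 4.0, "bob": 4.0},   # classifier 2 cannot separate the speakers
    {"alice": 2.0, "bob": 6.0},   # classifier 3
]
fused = fuse_scores(score_sets)
winner = decide(fused)
```

Normalizing before summing keeps a classifier with a large numeric range from dominating the others.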

Information Fusion Results
Table (cell values not recoverable):
– Columns: BASELINE (best individual), feature-level fusion, score-level fusion, decision-level fusion
– Rows (feature set combination): MFCC + ΔMFCC, All feature sets, FMT + ΔFMT, ARCSIN + ΔARCSIN, LPCC + ΔLPCC
– Legend: Fusion successful, Fusion sucks, N/A

Real-Time Speaker Identification
Processing loop:
1. Speech input stream: fill buffer with new data
2. Frame blocking (all frames)
3. Silence detection (non-silent frames)
4. Feature extraction (feature vectors)
5. Pre-quantization (reduced set of vectors)
6. Matching against the speaker database (Speaker 1 model ... Speaker N model)
7. Database pruning: the list of candidate speakers is split into active speakers and pruned speakers
8. Decision? Yes: END. No: fill buffer with new data and repeat.
Speed-up techniques:
– Reducing # vectors (pre-quantization): 1. Averaging 2. Random sampling 3. Decimation 4. Clustering (LBG)
– Speed up NN search: vantage-point tree (VPT) indexing of the code vectors
– Reduce # speakers (database pruning): 1. Static pruning 2. Hierarchical pruning 3. Adaptive pruning 4. Confidence-based pruning
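
The three simplest pre-quantization methods and the simplest pruning variant can be sketched in a few lines. These are illustrative interpretations of the method names: averaging is assumed to replace each block of consecutive vectors by its mean, decimation to keep every k-th vector, and static pruning to keep a fixed number of best-matching speakers. All function names and parameters are hypothetical.

```python
import random


def decimate(vectors, k):
    """Decimation: keep every k-th feature vector."""
    return vectors[::k]


def random_sample(vectors, m, seed=0):
    """Random sampling: keep m vectors chosen without replacement."""
    return random.Random(seed).sample(vectors, m)


def average_blocks(vectors, k):
    """Averaging: replace each block of k consecutive vectors by its mean."""
    reduced = []
    for start in range(0, len(vectors), k):
        block = vectors[start:start + k]
        dim = len(block[0])
        reduced.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return reduced


def static_prune(distances, keep):
    """Static pruning: keep the `keep` speakers with the smallest distances so far."""
    return sorted(distances, key=distances.get)[:keep]


# Eight toy 2-D feature vectors.
vectors = [[float(i), float(2 * i)] for i in range(8)]

decimated = decimate(vectors, 4)
sampled = random_sample(vectors, 2)
averaged = average_blocks(vectors, 4)
active = static_prune({"a": 3.0, "b": 1.0, "c": 2.0}, keep=2)
```

Pre-quantization reduces the number of input vectors to match, while pruning reduces the number of speaker models they are matched against, so the two savings multiply when combined.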

Results: Baseline System (TIMIT)
(Average length of test utterance = 8.9 s)
– Real-time requirement satisfied
– 4 x realtime

Results: Pre-Quantization (TIMIT)
(Codebook size = 64)
– Averaging performs worst, clustering best
– About 2:1 speed-up over full search (no pre-quantization) without degradation in accuracy
– 9 x realtime

Results: Pruning Variants (TIMIT)
(Codebook size = 64)
– Recommended method: adaptive pruning (AP)
– 11 x realtime

Results: PQ, Pruning and PQP (TIMIT)
(Codebook size = 64)
– Recommended method: combination of pre-quantization and pruning (PQP)
– 33 x realtime

Results: VQ vs. GMM (TIMIT)
(Average length of test utterance = 8.9 s)
VQ: 13:1 speed-up without degradation
– Best time: 0.27 s = 33 x realtime, error rate 0.32 %
– Smallest error: s = 28 x realtime
GMM: 9:1 to 10:1 speed-up without degradation
– Best time: 0.18 s = 49 x realtime, error rate 0.16 %
– Smallest error: s = 49 x realtime

Results: VQ vs. GMM (NIST-1999)
(Average length of test utterance = 30.4 s)
VQ: 13:1 to 16:1 speed-up with minor degradation
– Best time: 0.48 s = 63 x realtime, error rate %
– Smallest error: s = 3 x realtime
GMM: 23:1 to 34:1 speed-up with minor degradation
– Best time: 0.82 s = 37 x realtime, error rate %
– Smallest error: s = 0.8 x realtime
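
The "x realtime" figures follow from dividing the average utterance length by the identification time. A short check of the best-time figures quoted in the VQ vs. GMM results, using a hypothetical helper name:

```python
def realtime_factor(utterance_seconds, processing_seconds):
    """How many times faster than real time the utterance is processed."""
    return utterance_seconds / processing_seconds


# TIMIT: average test utterance 8.9 s.
timit_vq = round(realtime_factor(8.9, 0.27))
timit_gmm = round(realtime_factor(8.9, 0.18))

# NIST-1999: average test utterance 30.4 s.
nist_vq = round(realtime_factor(30.4, 0.48))
nist_gmm = round(realtime_factor(30.4, 0.82))
```

A factor below 1, such as the 0.8 x realtime quoted for the smallest-error GMM setting on NIST-1999, means that processing takes longer than the utterance itself.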