University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu
Tel. +358 13 251 7959, fax +358 13 251 7955, www.cs.joensuu.fi

Speaker Recognition
University of Joensuu, Department of Computer Science
PUMS seminar, Turku
Pasi Fränti, Juhani Saastamoinen, Evgeny Karpov, Ville Hautamäki, Tomi Kinnunen, Ismo Kärkkäinen

Research Group (PUMS project)
– Pasi Fränti: Professor
– Juhani Saastamoinen: Project manager
– Evgeny Karpov: Project researcher
– Ville Hautamäki: Project researcher
– Tomi Kinnunen: Researcher
– Ismo Kärkkäinen: Clustering algorithms

PUMS & JoY Speaker Recognition
Goals for the PUMS season:
– Identification only, no verification
– Port the system to a mobile phone
– Feature fusion
– Real-time operation

Application Scenarios
Speaker recognition covers two tasks:
– Speaker identification: "Whose voice is this?" The system picks the best-matching speaker from the database.
– Speaker verification: "Is this Bob's voice?" The speaker makes an identity claim, and the system either accepts it or rejects the speaker as an impostor.
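
The two tasks differ mainly in the final decision rule. A minimal Python sketch, assuming each enrolled speaker already has a distortion score for the test utterance (lower means a better match) and using a hypothetical acceptance threshold for verification:

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification ("Whose voice is this?"): pick the best-matching speaker."""
    return min(scores, key=lambda speaker: scores[speaker])


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Verification ("Is this Bob's voice?"): accept the claim only if the
    claimed speaker's score is good enough, otherwise reject as an impostor.

    `threshold` is a hypothetical tuning parameter, not a value from the slides.
    """
    return scores[claimed] <= threshold


# Hypothetical distortion scores for one test utterance.
scores = {"alice": 0.8, "bob": 0.3, "carol": 1.1}
print(identify(scores))              # bob
print(verify(scores, "bob", 0.5))    # True: claim accepted
print(verify(scores, "alice", 0.5))  # False: impostor
```

Verification only needs the claimed speaker's model, whereas identification has to compare the input against every model in the database, which is why identification cost grows with the number of enrolled speakers.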

Identification System
Speech audio → signal processing → feature vectors → speaker modelling.
– Training: add the trained speaker profiles to the speaker profile database.
– Recognition: use all profiles in the database; the decision is the profile with the minimum MSE over the input speech.
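
The recognition rule "minimum MSE within the database" can be sketched as vector-quantization (VQ) matching, assuming each speaker profile is a codebook of code vectors and the MSE is the average squared distance from each input feature vector to its nearest code vector. The profile names and the toy two-dimensional vectors are hypothetical:

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """MSE of quantizing the input feature vectors with one speaker's codebook."""
    total = sum(min(squared_distance(x, c) for c in codebook) for x in features)
    return total / len(features)


def identify(features: List[Vector], database: Dict[str, List[Vector]]) -> str:
    """Use all profiles in the database and return the one with minimum MSE."""
    return min(database, key=lambda name: average_distortion(features, database[name]))


# Hypothetical speaker profile database (trained codebooks) and test features.
database = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 5.0]],
}
features = [[0.1, 0.0], [0.9, 1.2]]
print(identify(features, database))  # alice
```

Each test vector costs one distance computation per code vector per speaker, so the work scales with (number of vectors) × (codebook size) × (number of speakers); the real-time slides later in the talk attack each of these factors.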

Results
Software components: srlib (common speaker recognition application interface), Fusion, Speech features (HY), ProfMatch, Real-time, and the speaker profile DB.
Applications: sprofiler, SpeakerProfiler, Winsprofiler and Epocsprofiler, with console, TCL/TK (HY), Windows and Series 60 user interfaces.

Planned Results
The existing components (srlib common speaker recognition application interface, Fusion, Speech features (HY), ProfMatch, Real-time, DB, and the sprofiler, SpeakerProfiler, Winsprofiler and Epocsprofiler applications), extended with segmentation, voice activity detection (VAD) and verification.
Target applications: access control, teleconferencing, large-scale databases, mobile phone login (?).

System in Mobile Phone
Port to Symbian OS with the Series 60 UI platform.

Symbian Phones (Series 80, Series 60, UIQ)
Series 60 phone features:
– 16 MB ROM
– 8 MB RAM
– 176 x 208 display
– 32-bit ARM processor
– No floating-point unit: signal processing must use integer (fixed-point) arithmetic

FFTGEN
Multiplication results must fit in 32 bits, so the multiplication inputs are truncated.
FFTGEN truncates to 16/16 bits ("16/16 FFT"):
– FFT layer input: 16-bit integer
– Twiddle factor: 16-bit integer
– Their product: 32-bit multiplication result
– Crop-off for the next layer: 16 bits, so the layer output keeps 16 used bits and discards 16 crop-off bits
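
One 16/16 multiplication step can be simulated with Python integers. This is a minimal sketch, assuming the twiddle factor is stored in Q15 format (scaled by 2**15); the slide only says "16-bit integer", so the Q15 convention and the function names are hypothetical:

```python
import math

INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed integer of width `bits`."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def mul_crop_16_16(sample: int, twiddle: int) -> int:
    """One 16/16 multiplication: 16-bit inputs, 32-bit product, 16 bits cropped."""
    if not (fits_signed(sample, 16) and fits_signed(twiddle, 16)):
        raise ValueError("truncate both inputs to 16 bits first")
    product = sample * twiddle  # magnitude at most 2**30, so it fits in 32 bits
    assert INT32_MIN <= product <= INT32_MAX
    return product >> 16  # drop the 16 crop-off bits (Python's >> floors)


# Hypothetical Q15 twiddle factor: cos(pi/4) scaled by 2**15.
W = round(math.cos(math.pi / 4) * (1 << 15))
print(W)                         # 23170
print(mul_crop_16_16(20000, W))  # 7070
```

Because a Q15 twiddle carries 15 fractional bits, cropping 16 bits returns about half of 20000 · cos(π/4) ≈ 14142; that extra halving acts like a per-layer 1/2 scaling, but whether FFTGEN uses exactly this convention is an assumption. The 16 discarded low bits are the information loss that the 22/10 scheme on the next slide tries to reduce.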

Proposed Information-Preserving "22/10 FFT"
Approximate the DFT operator F with a fixed-point operator G, reducing ||F-G|| so that more signal information is preserved:
– Choose the twiddle-factor scale that minimizes the maximum relative error in the scaled sine values; a scale of 980 works well for FFT sizes up to 1024.
– Truncate the multiplication inputs to 22/10 bits (signal/operator): the FFT layer input is a 32-bit integer with 22 bits used, the twiddle factor is a 16-bit integer with 10 bits used, and their product fits in a 32-bit multiplication result.
– Crop-off for the next layer: only 10 bits, so the FFT layer output remains a 32-bit integer.
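
The scale selection in the first bullet can be sketched as a search over candidate scales, assuming the twiddle factors are stored as round(scale · sin(2πk/N)) and that "relative error" means |rounded − exact| / |exact|. The candidate range 512 to 1023 and the function names are hypothetical, and the sketch makes no claim about which scale the search returns:

```python
import math

INT32_MAX = (1 << 31) - 1
SIGNAL_BITS = 22  # bits used for the FFT layer input (22/10 scheme)
SCALE = 980       # twiddle scale reported on the slide for FFT sizes up to 1024


def max_relative_error(scale: int, n: int) -> float:
    """Worst relative rounding error of round(scale * sin(2*pi*k/n)).

    Only 0 < k < n/2 is checked: there the sine is positive, and every other
    nonzero sine or cosine value of an n-point FFT (n divisible by 4) repeats
    one of these magnitudes.
    """
    worst = 0.0
    for k in range(1, n // 2):
        exact = scale * math.sin(2 * math.pi * k / n)
        worst = max(worst, abs(round(exact) - exact) / exact)
    return worst


def best_scale(n: int, lo: int = 512, hi: int = 1023) -> int:
    """Scale in [lo, hi] that minimizes the maximum relative error."""
    return min(range(lo, hi + 1), key=lambda s: max_relative_error(s, n))


# Overflow check: a 22-bit signed sample has magnitude at most 2**21 and the
# scaled twiddle at most 980 < 2**10, so the product stays below 2**31.
assert (1 << (SIGNAL_BITS - 1)) * SCALE <= INT32_MAX
print(max_relative_error(SCALE, 1024))
```

The overflow check shows why 22 + 10 bits is a safe split for a 32-bit multiplier: 2**21 · 980 = 2,055,208,960, just under the signed 32-bit limit of 2,147,483,647.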

Scale of Error in Proposed FFT
[Figure: log10 of the relative error in FFT elements (average and standard deviation) for FFTGEN (16/16) and the proposed 22/10 FFT.]

Mobile Phone Results

TIMIT, 100 speakers            Recog. rate (%)   Std. dev. (%)
FLOAT                          100.0             N/A
FFTGEN                         …                 …
FIXED                          …                 …
MIXED                          100.0             N/A

MIXED implementation, signal   Recog. rate (%)   Std. dev. (%)
FLOAT, Symbian audio           …                 …
FLOAT, PC audio                100.0             N/A
FIXED, Symbian audio           …                 …
FIXED, PC audio                100.0             N/A

Improving Accuracy by Information Fusion
The feature vector is split into several feature sets, each scored by its own classifier; a score combiner merges the classifier scores into a single decision.
– Feature set 1 → classifier 1 → score 1 (e.g. 5 MFCCs)
– Feature set 2 → classifier 2 → score 2 (e.g. F0 + ΔF0)
– Feature set 3 → classifier 3 → score 3 (e.g. formants F1, F2, F3)
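
A minimal sketch of the score combiner, assuming each classifier returns a distortion-like score per enrolled speaker (lower is better), that scores are z-normalized per classifier so different feature sets become comparable, and that the combiner is a weighted sum. The normalization, the weights and the scores are hypothetical choices for illustration:

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one classifier's per-speaker scores to zero mean, unit variance."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0.0:
        return {spk: 0.0 for spk in scores}
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def combine_scores(per_classifier: List[Dict[str, float]],
                   weights: List[float]) -> Dict[str, float]:
    """Weighted sum of normalized scores (the "score combiner")."""
    normalized = [z_normalize(scores) for scores in per_classifier]
    speakers = per_classifier[0].keys()
    return {spk: sum(w * n[spk] for w, n in zip(weights, normalized)) for spk in speakers}


def decide(fused: Dict[str, float]) -> str:
    return min(fused, key=lambda spk: fused[spk])  # distortion-like: lower is better


# Hypothetical distortion scores from two classifiers (e.g. MFCC and F0 features).
mfcc = {"alice": 2.0, "bob": 4.0, "carol": 6.0}
f0 = {"alice": 30.0, "bob": 10.0, "carol": 20.0}
print(decide(combine_scores([mfcc, f0], [0.7, 0.3])))  # alice
print(decide(combine_scores([mfcc, f0], [0.5, 0.5])))  # bob
```

The two classifiers disagree here, so the weights decide the outcome; in practice they would have to be tuned on data not used for testing.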

Information Fusion Results
[Table: for each feature set combination (MFCC + ΔMFCC, FMT + ΔFMT, ARCSIN + ΔARCSIN, LPCC + ΔLPCC, and all feature sets), whether decision-level, score-level and feature-level fusion improved on the baseline (the best individual feature set); each cell is marked "fusion successful", "fusion unsuccessful" or N/A.]
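
The three fusion levels differ in where the information is combined; score-level fusion is the score combiner of the previous slide. A minimal sketch of the other two, assuming feature-level fusion concatenates per-frame feature vectors (e.g. MFCC with ΔMFCC) and decision-level fusion takes a majority vote over the speakers chosen by the individual classifiers (a hypothetical rule; the slides do not name one):

```python
from collections import Counter
from typing import List, Sequence


def feature_level_fusion(*feature_sets: Sequence[float]) -> List[float]:
    """Concatenate per-frame feature sets into one vector for a single classifier."""
    fused: List[float] = []
    for features in feature_sets:
        fused.extend(features)
    return fused


def decision_level_fusion(decisions: List[str]) -> str:
    """Majority vote over the speakers chosen by the individual classifiers.

    Counter.most_common lists equal counts in first-seen order, so a tie goes
    to the earliest classifier's choice among the tied speakers.
    """
    return Counter(decisions).most_common(1)[0][0]


mfcc, delta_mfcc = [1.0, 2.0], [0.1, -0.2]  # hypothetical per-frame values
print(feature_level_fusion(mfcc, delta_mfcc))          # [1.0, 2.0, 0.1, -0.2]
print(decision_level_fusion(["bob", "alice", "bob"]))  # bob
```

Feature-level fusion needs one classifier trained on the longer vectors, while decision-level fusion keeps the classifiers separate and only combines their final answers, discarding how confident each one was.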

Real-Time Speaker Identification
Processing loop over the speech input stream:
1. Frame blocking: fill the buffer with new data and split it into frames.
2. Silence detection: keep only the non-silent frames.
3. Feature extraction: compute a feature vector for each frame.
4. Pre-quantization: reduce the number of vectors by averaging, random sampling, decimation or clustering (LBG).
5. Matching: compare the reduced set of vectors with the speaker models (speaker 1 to speaker N) in the speaker database; vantage-point tree (VPT) indexing of the code vectors speeds up the nearest-neighbour search.
6. Database pruning: reduce the number of speakers by moving poorly matching ones from the active to the pruned set (static, hierarchical, adaptive or confidence-based pruning).
7. Decision: if a decision can be made, output the list of candidate speakers and stop; otherwise return to step 1.
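
Three of the pre-quantization variants in step 4 can be sketched in a few lines, assuming "averaging" replaces each non-overlapping block of consecutive vectors by its mean and "decimation" keeps every k-th vector; `factor` is a hypothetical reduction parameter. Clustering (LBG) is omitted because it needs a full codebook-training loop:

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def pq_averaging(vectors: List[Vector], factor: int) -> List[List[float]]:
    """Replace each block of `factor` consecutive vectors with its mean."""
    reduced = []
    for start in range(0, len(vectors), factor):
        block = vectors[start:start + factor]
        reduced.append([sum(column) / len(block) for column in zip(*block)])
    return reduced


def pq_decimation(vectors: List[Vector], factor: int) -> List[Vector]:
    """Keep every `factor`-th vector."""
    return vectors[::factor]


def pq_random_sampling(vectors: List[Vector], factor: int, seed: int = 0) -> List[Vector]:
    """Keep a random subset of about len(vectors) / factor vectors."""
    if not vectors:
        return []
    rng = random.Random(seed)
    return rng.sample(vectors, max(1, len(vectors) // factor))


frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
print(pq_averaging(frames, 2))             # [[1.0, 2.0], [5.0, 6.0]]
print(pq_decimation(frames, 2))            # [[0.0, 1.0], [4.0, 5.0]]
print(len(pq_random_sampling(frames, 2)))  # 2
```

Each variant shrinks the test sequence by roughly `factor`, which divides the matching cost in step 5 by about the same amount; according to the pre-quantization results slide, averaging performed worst and clustering best.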

Results: Baseline System (TIMIT)
Average length of test utterance: 8.9 s.
The real-time requirement is satisfied: 4 x realtime.

Results: Pre-Quantization (TIMIT)
Codebook size = 64.
– Averaging performs worst, clustering best.
– About 2:1 speed-up over full search (no pre-quantization) without loss of accuracy: 9 x realtime.

Results: Pruning Variants (TIMIT)
Codebook size = 64.
– 11 x realtime
– Recommended method: adaptive pruning (AP)
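
Adaptive pruning can be sketched as follows, assuming that after each block of frames every active speaker whose accumulated distortion exceeds the mean plus `eta` standard deviations of the active speakers' distortions is pruned. This threshold rule, the parameter `eta` (kept non-negative so that at least the best speaker always survives) and the toy one-dimensional models are hypothetical:

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_sq_distance(x: Vector, codebook: List[Vector]) -> float:
    return min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)


def adaptive_prune(distortions: Dict[str, float], eta: float) -> Dict[str, float]:
    """Keep the speakers whose distortion is at most mean + eta * std."""
    values = list(distortions.values())
    threshold = mean(values) + eta * pstdev(values)
    return {spk: d for spk, d in distortions.items() if d <= threshold}


def identify_with_pruning(blocks: List[List[Vector]],
                          models: Dict[str, List[Vector]], eta: float = 0.0) -> str:
    active = {name: 0.0 for name in models}  # accumulated distortion per speaker
    for block in blocks:
        for name in active:
            active[name] += sum(nearest_sq_distance(x, models[name]) for x in block)
        active = adaptive_prune(active, eta)
        if len(active) == 1:  # early decision: no need to read more frames
            break
    return min(active, key=lambda name: active[name])


models = {"a": [[0.0]], "b": [[1.0]], "c": [[10.0]]}
blocks = [[[0.1]], [[0.0]]]
print(identify_with_pruning(blocks, models))  # a
```

Because the threshold adapts to the current spread of distortions, clearly mismatched speakers such as "c" are dropped after the first block, and the loop can stop before the whole utterance has been processed.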

Results: Pre-Quantization, Pruning and PQP (TIMIT)
Codebook size = 64.
– 33 x realtime
– Recommended method: combination of pre-quantization and pruning (PQP)

Results: VQ vs. GMM (TIMIT)
Average length of test utterance: 8.9 s.
VQ: 13:1 speed-up without degradation
– Best time: 0.27 s = 33 x realtime, error rate 0.32 %
– Smallest error: … s = 28 x realtime
GMM: 9:1 to 10:1 speed-up without degradation
– Best time: 0.18 s = 49 x realtime, error rate 0.16 %
– Smallest error: … s = 49 x realtime
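
The "x realtime" figures on these slides match the ratio of utterance length to processing time. A minimal check with the TIMIT numbers (average test utterance 8.9 s); the function name is hypothetical:

```python
def realtime_factor(utterance_s: float, processing_s: float) -> float:
    """How many times faster than real time an utterance is processed."""
    return utterance_s / processing_s


AVERAGE_UTTERANCE_S = 8.9  # TIMIT average test utterance length, from the slide
print(round(realtime_factor(AVERAGE_UTTERANCE_S, 0.27)))  # 33 (VQ best time)
print(round(realtime_factor(AVERAGE_UTTERANCE_S, 0.18)))  # 49 (GMM best time)
```

The same ratio reproduces the NIST-1999 best-time figures: 30.4 s / 0.48 s ≈ 63 and 30.4 s / 0.82 s ≈ 37.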

Results: VQ vs. GMM (NIST-1999)
Average length of test utterance: 30.4 s.
VQ: 13:1 to 16:1 speed-up with minor degradation
– Best time: 0.48 s = 63 x realtime, error rate … %
– Smallest error: … s = 3 x realtime
GMM: 23:1 to 34:1 speed-up with minor degradation
– Best time: 0.82 s = 37 x realtime, error rate … %
– Smallest error: … s = 0.8 x realtime