University of Joensuu Dept. of Computer Science P.O. Box 111 FIN Joensuu Tel fax
Speaker Recognition Research in Joensuu
Speech and Image Processing Unit (SIPU)
Speech Technology Winter Seminar (Puheteknologian talviseminaari)
Pasi Fränti
Joensuu
Goals for PUMS season 3 (1/2)
1. Usability of automatic speaker identification in forensic applications
2. Compatibility with large databases
3. Automation of LTAS + fusion with MFCC
4. Voice activity detection
Goals for PUMS season 3 (2/2)
5. Speaker verification in real (noisy) environments
6. Prototype for access control
7. Solving the technical requirements for the elevator prototype
8. Usability for detecting sound sources in general
9. Keyword search (using HTK or Lingsoft Recognizer)
Research Group
Pasi Fränti, Professor
Juhani Saastamoinen, PhLic
Tomi Kinnunen, PhD (Singapore)
Ville Hautamäki, MSc
Ismo Kärkkäinen, MSc
PUMS personnel
Marko Tuononen, BSc
Doctoral researchers
Collaborators
Rosa Gonzalez-Hautamäki, MSc
Ilja Sidoroff
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
Applicability to forensic applications
• A study of automatic speaker recognition has been carried out.
• Results are not reported separately, but actions were taken within tasks 3 and 4.
• Material can be found in Kinnunen's PhD thesis [4] and Niemi-Laitinen's presentation.
Support for large databases
- Not yet done -
LTAS and other features
• Automatic calculation of LTAS is done. Integration into WinSprofiler and reporting are in progress.
• The benefit of LTAS is only its speed and ease of use: it has no difficult control parameters.
• It brings no additional recognition accuracy, since MFCC includes the same information.
• It could be used for preliminary pruning with large datasets.
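As a rough illustration of why LTAS needs so few control parameters, the sketch below averages the magnitude spectra of fixed-length frames over a whole signal. It is a minimal sketch and not the WinSprofiler implementation: the function name `ltas`, the frame length, the hop size, the Hann window and the naive DFT are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def ltas(signal, frame_len=64, hop=32):
    """Long-term average spectrum: mean magnitude spectrum over all frames.

    Hypothetical helper for illustration; a practical version would use an FFT.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    total = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT of bin k (O(N^2) per frame, acceptable for a demo).
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            total[k] += abs(coeff)
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [t / n_frames for t in total]


# Synthetic tone with 8 cycles per 64-sample frame, so energy peaks at bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(1024)]
spectrum = ltas(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

The result is a single fixed-length vector per recording. Comparing such vectors with a simple distance would be one cheap way to do the preliminary pruning mentioned above.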
Noise robustness of F0 feature
Results reported in [3, 5].
Voice activity detection
• Software for speech segmentation (VoiceGrep).
• Command-line version for Linux.
• Windows version in WinSprofiler.
• Testing done in the SIPU laboratory.
– Labtec® PC mic 333, 44.1 kHz
– Recordings were amplified by 24 dB with the Audacity audio editor
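The slides do not describe how VoiceGrep decides what is speech. As a point of reference only, the sketch below shows a generic energy-threshold voice activity detector, which is a common baseline and not necessarily VoiceGrep's method. The function name `detect_speech` and all parameter values are illustrative assumptions.

```python
import math


def detect_speech(samples, sample_rate, frame_ms=20, threshold_db=-30.0,
                  min_frames=3):
    """Return (start_s, end_s) segments whose frame energy exceeds a threshold.

    Generic energy-threshold VAD; all parameter values are illustrative.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    active = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Energy in dB relative to full scale (samples assumed in [-1, 1]).
        level_db = 10 * math.log10(energy) if energy > 0 else -math.inf
        active.append(level_db > threshold_db)

    segments = []
    run_start = None
    for i, is_active in enumerate(active + [False]):  # sentinel closes last run
        if is_active and run_start is None:
            run_start = i
        elif not is_active and run_start is not None:
            if i - run_start >= min_frames:  # drop very short bursts
                segments.append((run_start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            run_start = None
    return segments


# One second of near-silence, one second of a loud tone, one second of silence.
rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
segments = detect_speech(quiet + loud + quiet, rate)
```

A detector that looks only at energy reacts to any loud sound, so it would also fire on non-speech events such as doors or running water.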
a. Test material and results
• Material
– 4 hours in total.
– Poor-quality recordings: 11-bit data, of which 4-5 bits carry information and the rest is noise.
– VoiceGrep made 168 detections:
– 56 speech (33%)
– 112 non-speech (67%)
• The material included 71 real speech segments:
– Average segment length 16 s.
– VoiceGrep found 25 of these (35%)
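The percentages above can be read as detection precision and segment recall. The short calculation below reproduces them from the reported counts. The variable names are illustrative. The slide does not say how the 56 speech detections map onto the 25 found segments; one possibility is that several detections fall inside the same segment.

```python
# Figures taken from the test results above.
detections = 168          # segments reported by VoiceGrep
speech_detections = 56    # detections that contained speech
true_segments = 71        # real speech segments in the material
found_segments = 25       # real segments that VoiceGrep found

precision = speech_detections / detections   # share of detections that are speech
recall = found_segments / true_segments      # share of real segments found
false_alarm_share = 1 - precision            # share of detections that are non-speech

print(f"precision {precision:.0%}, recall {recall:.0%}, "
      f"false alarms {false_alarm_share:.0%}")
```

Both figures are low: about two out of three detections are false alarms, and about two out of three real speech segments are missed.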
b. VoiceGrep overall results
c. VoiceGrep example (correct detection)
[Figure: the start of the speech is detected correctly; the end of the speech is missed]
d. VoiceGrep example (false detections)
[Figure labels: door opening, running water, walking, door]
e. VoiceGrep example (missed speech segment)
[Figure labels: door, speech and walking, door]
f. Entire data set (4 hours)
[Figure rows: speech segments, result of VoiceGrep, data]
Speaker verification in noisy environment
• Systematic testing of the parameters that affect accuracy has been reported in [1].
• Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen's PhD thesis [4].
• Additional testing will be done if time permits.
a. Text-dependent verification in access control
• Utilizing time series information improves recognition.
• Best results are obtained if everyone has their own password.
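The slide does not name the technique used to exploit time series information. Dynamic time warping (DTW) is one classic way to compare utterances frame by frame while tolerating differences in speaking rate, and the sketch below uses it purely as an illustration. The function name `dtw_distance` and the one-dimensional toy "features" are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two frames.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy one-dimensional feature sequences (made-up values).
template = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
shuffled = [(2.0,), (0.0,), (1.0,), (0.0,), (1.0,)]

same_order = dtw_distance(template, slow)          # slower version of the template
different_order = dtw_distance(template, shuffled)  # same frames, different order
```

Here `slow` is a time-stretched copy of `template` and matches it exactly, whereas `shuffled` contains the same frames in another order and is penalized. A model that ignores frame order could not tell the last two apart.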
Prototype for access control
[Figure labels: microphone, motion detector, emergency button]
Calling elevator (technical requirements)
• Communication with the OPC server:
– Implemented with a Matrikon server.
• Program logic for the elevator implemented:
– Reads variables from the OPC server.
– Interprets and shows elevator status.
– Includes recording logic.
• Speaker- and voice-related functionality:
– Not yet implemented.
– Main window does not show anything yet.
Usability for detecting sound sources in general
- Not yet done -
Keyword search
- Not yet done -
Publications (season 3)
1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.
2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.
3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, October 2005.
Theses (season 3)
4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.
5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.
Application scenarios
Speaker recognition covers two tasks:
• Speaker identification: "Whose voice is this?"
• Speaker verification: "Is this Bob's voice?" (the speaker makes an identity claim; an impostor should be rejected)
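The difference between the two tasks can be sketched in a few lines. This is a toy illustration under strong assumptions: each enrolled speaker is represented by a single made-up feature vector, matching uses squared Euclidean distance, and the threshold value is arbitrary. The function names are hypothetical and do not come from the group's software.

```python
def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(sample, models):
    """Identification: which enrolled speaker is closest to the sample?"""
    return min(models, key=lambda name: distance(sample, models[name]))


def verify(sample, claimed, models, threshold):
    """Verification: is the sample close enough to the claimed speaker?"""
    return distance(sample, models[claimed]) <= threshold


# Toy speaker models: one made-up feature vector per enrolled speaker.
models = {"Alice": (1.0, 0.0), "Bob": (0.0, 1.0), "Minna": (1.0, 1.0)}
sample = (0.1, 0.9)

who = identify(sample, models)
accepted = verify(sample, "Bob", models, threshold=0.1)
rejected = verify(sample, "Alice", models, threshold=0.1)
```

Identification compares the sample against every enrolled speaker, so its cost grows with the size of the database. Verification needs only the claimed speaker's model plus a decision threshold.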
Software 1: Console program
Software 2: WinSprofiler
Software 3: Symbian
Port to Symbian OS with the Series 60 UI platform
Software 4: Door SProfiler
Opening the laboratory door by speaking
Software 5: Lift SProfiler
(to appear in season 4, perhaps)
Future development (1): Software integration
[Diagram labels: VAD, WinSprofiler, Windows (JoY), Mobile, Series 60 (JoY), SRLIB, MSE, GMM, MFCC, VQ, DB support, LTAS, F0 extraction, fusion by weighted MSE, keyword search]
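The diagram lists VQ, MSE and "fusion by weighted MSE" without details. One plausible reading, sketched below as an assumption, is that each feature type yields a mean squared quantization error against a speaker's codebook, and the per-feature errors are combined as a weighted sum. The function names, data and weights are all made up for illustration and are not taken from SRLIB.

```python
def quantization_mse(vectors, codebook):
    """Mean squared distance from each vector to its nearest code vector."""
    total = 0.0
    for v in vectors:
        total += min(sum((x - c) ** 2 for x, c in zip(v, code))
                     for code in codebook)
    return total / len(vectors)


def fused_score(scores, weights):
    """Weighted sum of per-feature distortions; lower means a better match."""
    return sum(weights[name] * scores[name] for name in scores)


# Made-up test vectors and codebooks for one candidate speaker.
mfcc_test = [(1.0, 2.0), (1.2, 1.8)]
mfcc_codebook = [(1.0, 2.0), (5.0, 5.0)]
ltas_test = [(0.5, 0.7)]        # LTAS gives a single vector per recording
ltas_codebook = [(0.6, 0.6)]

scores = {
    "mfcc": quantization_mse(mfcc_test, mfcc_codebook),
    "ltas": quantization_mse(ltas_test, ltas_codebook),
}
score = fused_score(scores, weights={"mfcc": 0.8, "ltas": 0.2})
```

In practice the distortions of different features live on different scales, so they would likely need normalization before weighting. The sketch omits that step.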
Future development (2): Applications
[Diagram labels: classifier fusion, srlib, DB, access control, speech analyzer tool, forensic applications, segmentation, VAD, common speaker recognition app. interface, verification, calling elevator, keyword search, call center]
Future development (3): Technical development
• Implement and integrate F0, and possibly also the formants F1 and F2.
• Automatic voiced/unvoiced segmentation.
• User enrollment.
• Use of sequence information (triplets).
• Development of WinSprofiler towards a voice profiler and speech analyzer tool.
Future development (4): Elevator prototype
[Diagram labels: OPC server, machine room, CAN, Ethernet, TCP/IP, microphone, display, OPC client, LiftCaller, SRLIB 3.0, approach detection, DCOM, lift car & hardware, our PC, GW box]
Vision 1: Teleconferencing
[Diagram labels: Unknown, Bob, Minna, Alice, VPN, Paul, Speaker Recognition (five instances), Alice, Bob, Minna, Unknown, Verified & allowed, Not registered]
Vision 2: Call center
• Speech is the main tool for people working in a call center.
• Voice login of personnel.
• Removes the need for manual entry.
Vision 3: Language recognition
• A problem related to speaker recognition: the same research groups usually study both.
• Not trivial to solve.
• Studied extensively for Asian languages, even for rare languages that have no written form.
Vision 4: Medical applications
• Doctors use voice to record summaries of patient meetings.
• Access by keyword search.
• Annotation.
• Authentication of the speaker.
Thank you for your patience!
Questions?