A methodology for the creation of a forensic speaker recognition database to handle mismatched conditions Anil Alexander and Andrzej Drygajlo Swiss Federal.

Slides:



Advertisements
Similar presentations
Presented by Eroika Jeniffer.  We want to set tasks that form a representative of the population of oral tasks that we expect candidates to be able to.
Advertisements

Brief introduction on Logistic Regression
Experimental Research Designs
VALIDITY.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Basic Business Statistics.
Chapter 3 Preparing and Evaluating a Research Plan Gay and Airasian
Independent Sample T-test Often used with experimental designs N subjects are randomly assigned to two groups (Control * Treatment). After treatment, the.
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
Why is ASR Hard? Natural speech is continuous
Marketing Research and Information Systems
Authors: Anastasis Kounoudes, Anixi Antonakoudi, Vasilis Kekatos
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Real-Time Odor Classification Through Sequential Bayesian Filtering Javier G. Monroy Javier Gonzalez-Jimenez
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 14.
Chapter 8 Experimental Research
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 9. Hypothesis Testing I: The Six Steps of Statistical Inference.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Business Statistics,
Schizophrenia and Depression – Evidence in Speech Prosody Student: Yonatan Vaizman Advisor: Prof. Daphna Weinshall Joint work with Roie Kliper and Dr.
1 Robust HMM classification schemes for speaker recognition using integral decode Marie Roch Florida International University.
Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009) Accuracy evaluation of Nuts level 2 hypercubes with the adoption of.
Analyzing Reliability and Validity in Outcomes Assessment (Part 1) Robert W. Lingard and Deborah K. van Alphen California State University, Northridge.
T tests comparing two means t tests comparing two means.
Statistical inference. Distribution of the sample mean Take a random sample of n independent observations from a population. Calculate the mean of these.
IAFPA 2007 Plymouth, July 22-25, 2007 Developments in automatic speaker recognition at the BKA Michael Jessen, Bundeskriminalamt Franz Broß, Univ. Applied.
Bug Localization with Machine Learning Techniques Wujie Zheng
Multiuser Detection (MUD) Combined with array signal processing in current wireless communication environments Wed. 박사 3학기 구 정 회.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 10. Hypothesis Testing II: Single-Sample Hypothesis Tests: Establishing the Representativeness.
Interaction Effects and Theory Testing Kaiser et al. (2006) social identity theory –tested hypotheses about attention to prejudice cues in the environment.
Chapter 5: Producing Data “An approximate answer to the right question is worth a good deal more than the exact answer to an approximate question.’ John.
EVIDENCE ABOUT DIAGNOSTIC TESTS Min H. Huang, PT, PhD, NCS.
Hypothesis Testing Hypothesis Testing Topic 11. Hypothesis Testing Another way of looking at statistical inference in which we want to ask a question.
1 Paired Differences Paired Difference Experiments 1.Rationale for using a paired groups design 2.The paired groups design 3.A problem 4.Two distinct ways.
DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR SPEECH RECOGNITION Hong-Kwang Jeff Kuo, Eric Fosler-Lussier, Hui Jiang, Chin-Hui Lee ICASSP 2002 Min-Hsuan.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
INFORMATION NETWORKS DIVISION COMPUTER FORENSICS UNCLASSIFIED 1 DFRWS2002 Language and Gender Author Cohort Analysis of .
© aSup-2007 CHI SQUARE   1 The CHI SQUARE Statistic Tests for Goodness of Fit and Independence.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
Variational Bayesian Methods for Audio Indexing
1 Sampling This lecture ties into Terre Blanche chapter 15 Now you have a design, and a protocol plus measures – who do you run it on? Selecting a group.
AP Statistics Chapter 11 Notes. Significance Test & Hypothesis Significance test: a formal procedure for comparing observed data with a hypothesis whose.
Campaigns and Data Analytics Why it is important to know YOU.
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
URBDP 591 A Lecture 16: Research Validity and Replication Objectives Guidelines for Writing Final Paper Statistical Conclusion Validity Montecarlo Simulation/Randomization.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
HYPOTHESIS TESTING FOR DIFFERENCES BETWEEN MEANS AND BETWEEN PROPORTIONS.
Project Plan Task 8 and VERSUS2 Installation problems Anatoly Myravyev and Anastasia Bundel, Hydrometcenter of Russia March 2010.
Anil Alexander 1, Oscar Forth 1, Marianne Jessen 2 and Michael Jessen 3 1 Oxford Wave Research Ltd, Oxford, United Kingdom 2 Stimmenvergleich, Wiesbaden,
Chapter 7: Hypothesis Testing. Learning Objectives Describe the process of hypothesis testing Correctly state hypotheses Distinguish between one-tailed.
Research proposal (Lecture 3) Dr.Rehab F Gwada. Objectives of the Lecture The student at the end of this lecture should Know Identify Target Population.
Chapter 11: Test for Comparing Group Means: Part I.
STA248 week 121 Bootstrap Test for Pairs of Means of a Non-Normal Population – small samples Suppose X 1, …, X n are iid from some distribution independent.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Reza Yazdani Albert Segura José-María Arnau Antonio González
Lecture Nine - Twelve Tests of Significance.
Research Methodology Lecture No :25 (Hypothesis Testing – Difference in Groups)
Conditional Random Fields for ASR
3.0 Map of Subject Areas.
Martin Rajman, Martin Vesely
Chapter Eight: Quantitative Methods
When we free ourselves of desire,
Analyzing Reliability and Validity in Outcomes Assessment Part 1
Review: What influences confidence intervals?
Decision Making Based on Cohort Scores for
Introduction to Basic Statistical Methodology
Update on “Channel Models for 60 GHz WLAN Systems” Document
What are their purposes? What kinds?
A maximum likelihood estimation and training on the fly approach
Analyzing Reliability and Validity in Outcomes Assessment
Presentation transcript:

A methodology for the creation of a forensic speaker recognition database to handle mismatched conditions Anil Alexander and Andrzej Drygajlo Swiss Federal Institute of Technology Lausanne Signal Processing Institute Thursday 4th August 2005 Marrakesh, Morocco

Outline Evaluation of the strength of evidence in forensic automatic speaker recognition using a corpus-based Bayesian Interpretation (B.I.) method Mismatched recording conditions of the databases and why they influence the estimation of the strength of evidence Methodology for the creation of a database to handle the mismatch in recording conditions Estimation and compensation of the mismatch using this database Conclusion

Bayesian Interpretation in Forensic Automatic Speaker Recognition Evidence (E): score obtained comparing statistical model of suspect’s voice and a questioned recording (trace) H 0 – The suspected speaker is the source of the trace H 1 - Another speaker is the source of the trace Likelihood Ratio (LR) is the relative probability of observing a particular score of E, with respect to two competing hypotheses

Corpus-based Bayesian Interpretation in FASR Relevant Speakers’ models Feature extraction and modeling Potential population Database (P) Comparative Analysis Comparative Analysis Comparative Analysis Between-sources Variability (H 1 ) Evidence (E) Within-source Variability (H 0 ) Suspected Speakers’ models Feature extraction and modeling Suspected speaker Ref. database(R) Features Feature extraction Suspected Speaker Control Database (C) Features Feature extraction Trace

Likelihood Ratio Between-sources variability (H 1 ) Within-source variability (H o )

Databases required in the B.I. methodology Databases Required Suspect reference database (R) Suspect control database (C) A relevant potential population (P) database For performance evaluation purposes, a trace (T) database is used –Assumptions: –P database similar in recording conditions and linguistic contents to the R database –C database similar in recording conditions and linguistic contents to the questioned recording –Difficulty : Size, availability, and mismatch of databases Often not possible to record the suspect in conditions of the case or to obtain a representative potential population in matched conditions.

Requirements for forensic speaker database Some practical considerations: –Attempt to simulate real world conditions closely –Contain similar-sounding subjects as far as possible. i.e., same language and accent, preferably the same sex. –Technical conditions (recording conditions, transmission channels) –Statistically significant number of speakers –Sufficient duration of recordings to create statistical models of speakers

Mismatched recording conditions of the database What if it is not possible to obtain all the files in the same recording conditions ? –What if the trace and the three databases (P,R,C) are obtained in different recording conditions ? This kind of situation often appears in forensic speaker recognition casework, as –The conditions in which the questioned recording (trace) and the suspected speaker’s recordings (R, C) were obtained are beyond the control of the forensic expert. –A potential population database (P) in the recording conditions of the trace is not available. In this presentation the assumptions made are: –that the conditions of the trace and P are known –the conditions of R and C can be chosen correspondingly

Using databases with mismatched recording conditions FBI NIST 2002 Database : 2 conditions (Microphone -Telephone) The extent of mismatch can be measured using statistical testing [Alexander et al ’04]

The influence of mismatched potential population database E H 1 scores (matched conditions) Pot Pop. H 1 scores (mismatched conditions) Ho scores (matched conditions) The influence of mismatch can be the difference between an LR 1 P,R,C,T in conditions C1,C2,C2,C2

Mismatched databases Sit. No. PCRT 1.C2C2 C2C2 C2C2 C2C2 2.C2C2 C2C2 C2C2 C1C1 3.C2C2 C2C2 C1C1 C2C2 4.C2C2 C1C1 C2C2 C2C2 5.C1C1 C2C2 C2C2 C2C2 6.C2C2 C1C1 C2C2 C1C1 7.C2C2 C1C1 C1C1 C2C2 8.C1C1 C1C1 C1C1 C1C1 Where C 1 represents recording condition 1 (e.g. GSM) C 2 represents recording condition 2 (e.g. PSTN) Corpus-based B.I. and mismatch in different situations

Handling mismatch In order to estimate and compensate for mismatched conditions, we require: –Several databases containing the same speakers in the different recording conditions. Recording several such databases for each case is very expensive. –A methodology to estimate and compensate for the ‘shift’ in scores resulting from a mismatched recording condition.

A prototype forensic speaker recognition database IPSC03 –Institut de Police Scientifique, UNIL and the Signal Processing Institute, EPFL 70 male speakers in three different recording conditions: –Public switched telephone network (PSTN) –global system for mobile communications (GSM) –calling room acoustic recording using digital recorder controlled and uncontrolled speaking modes controlled conditions in a quiet room over 4800 audio files totalling over 40 hours

Features of the IPSC03 database Forensically realistic questioned recordings Statistically significant number of speakers Same telephones for all speakers to minimize the effects of individual handsets Prompted and free questioned recordings in the form of threatening telephone calls Digital recordings at the source and after having passed through a transmission channel.

Database Layout Speaker DigitalPSTNGSMDigital ReferenceTraceControl 9 read traces 2 spontaneous traces 2 read references 1 semi spontaneous references 1 spontaneous 1 read1 dialogue

Method for compensation of mismatch Statistical compensation of mismatch –Use a same set of speakers in two different conditions (C1 and C2) –Compare the test recording with models of the set of speakers in two conditions –Estimate the speaker independent ‘shift’ of scores due to the mismatched conditions Ideally using the whole set of the database speakers Practically, using a subset of this database

The influence of mismatched potential population database E H 1 scores (matched conditions) Pot Pop. H 1 scores (mismatched conditions) Ho scores (matched conditions) Score distribution compensation using distribution scaling: P,R,C,T in conditions C1,C2,C2,C2

C2C2 Experimental Framework The databases chosen are the subsets of the IPSC03 database C1C1 22 speakers in PSTN and GSM 50 speakers in PSTN conditions used as the relevant potential population 50 speakers in GSM conditions used as the mismatched potential population P in PSTN and GSM R, C and T in PSTN Simulated case: the suspect is the source of the questioned recording

Schema of a forensic database to handle mismatch C1C1 C2C2 Ci Ci CNCN Several speakers in one of the frequently observed conditions C i A smaller database of speakers in several recording conditions Shift due to difference in conditions

Compensating for Mismatch We propose to use the following normalization: [Similar to feature mapping technique Reynolds, ICASSP03] We choose a sub-database of 22 speakers in both PSTN and GSM conditions, and use these scores to estimate the shift between conditions

Matched conditions The LR estimated in matched conditions is 354 H1 matched (PSTN-PSTN) H0 matched (PSTN-PSTN) E=12.42

Mismatched conditions The LR estimated in mismatched conditions is 73,678 because of the shift in potential population scores H0 matched (PSTN-PSTN) H1 mismatched (GSM-PSTN) E=12.42

H 1 matched H 1 mismatched H 0 matched H 1 mismatch compensated Compensation for mismatch The LR estimated using statistical compensation (29.34) is more representative of the strength of evidence obtained in matched conditions. E=12.42

Conclusion The strength of evidence (likelihood ratio) depends on the recording conditions of the databases used. In certain cases, it may not be possible to obtain all the databases in the same conditions. A forensic potential population database should include several smaller databases in different conditions containing the same speakers. These databases can be used to estimate and reduce the effects of mismatch due to different recording conditions.

Questions? Thank you for your attention