The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures Martin Duckworth, Kirsty McDougall,


The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey

Introduction Formant measurement is implicitly required legally in the UK in speaker comparison cases. Measurements on analogue spectrograms had to be made by hand and eye. Measurements on digital spectrograms can be assisted by formant trackers; LPC is common.

Introduction How replicable are measurements by eye on digital spectrograms? If LPC tracking is used, what can lead to variability? − Software settings − Point at which data is extracted

Study Aims What is required in order to make measurements more replicable? If software (but not method) is held constant and the data is high quality, can different laboratories make the same F1-F3 measurements? If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?

Aims continued We are aiming to find a reliable means of obtaining formant values. We are examining reliability, not validity.

Data Read speech from the Cambridge DyViS database; male Standard Southern British English aged speakers: Set 1 (20 speakers), Set 2 (20 speakers)

Data 6 monophthongs: /iː, æ, ɑː, ɔː, ʊ, uː/; 6 repetitions per vowel per speaker, elicited in hVd contexts in sentences: It’s a warning we’d better HEED today. It’s only one loaf, but it’s all Peter HAD today. We worked rather HARD today. We built up quite a HOARD today. He insisted on wearing a HOOD today. He hates contracting words, but he said a WHO’D today.

Measurements Analysts from 3 labs: Cambridge, Plymouth, Reading. Task: to measure F1, F2, F3 for each vowel token using Praat. Set 1: using individual, but constrained, methods. Set 2: after a meeting at which a single method is agreed.

Set 1 Methods Measure the formants at a relatively early point in the vowel. Measure formants over no more than 5 glottal pulses. Use either: − LPC tracking checked against the spectrogram, or − hand/eye measures

Set 2 Method Measure towards the start of the vowel. Measure in a relatively steady early part of the vowel. Measure around the vowel's maximum intensity. Use a single time slice.

Set 2 Method (continued) Use the LPC formant tracker adjusted for best visual fit. When values generated by Praat are judged by visual inspection to be incorrect, replace them by correct values from a time-slice immediately preceding or following the slice being measured.
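
The Set 2 rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the labs actually ran in Praat: the function names, the `early_fraction` cut-off and the example numbers are all hypothetical, the "relatively steady" criterion is not modelled, and the set `rejected` stands in for the analyst's visual judgement that a tracked value is incorrect.

```python
def pick_slice(times, intensity, vowel_start, vowel_end, early_fraction=0.5):
    """Index of the frame with maximum intensity in the early part of the vowel.

    early_fraction is an assumed cut-off; the slides only say "towards the start".
    """
    cutoff = vowel_start + early_fraction * (vowel_end - vowel_start)
    candidates = [i for i, t in enumerate(times) if vowel_start <= t <= cutoff]
    if not candidates:
        raise ValueError("no analysis frames fall inside the early part of the vowel")
    return max(candidates, key=lambda i: intensity[i])


def measure(track, index, rejected):
    """Value at the chosen slice, or at the immediately preceding or following
    slice when the chosen one has been judged incorrect by visual inspection."""
    for i in (index, index - 1, index + 1):
        if 0 <= i < len(track) and i not in rejected:
            return track[i]
    return None  # no usable slice; left for manual measurement


# Invented example values (not data from the study).
times = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
intensity = [60, 68, 71, 70, 66, 61]
f1_track = [640, 655, 1200, 662, 650, 630]  # frame 2 is an obvious tracking error

chosen = pick_slice(times, intensity, vowel_start=0.10, vowel_end=0.15)
f1 = measure(f1_track, chosen, rejected={2})
```

Trying the preceding slice before the following one is an arbitrary choice here; the agreed method allows either.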

Results: HAD, F1 [Figure: F1 of HAD as measured by Lab1, Lab2 and Lab3; Set 1 and Set 2]

Statistical Analysis 3 formants × 6 vowels × 2 datasets = 36 tests. Two-way ANOVA: repeated measures on the factor Lab (3); between-groups factor Speaker (20). If Lab significant at p < 0.05: pairwise comparisons with Sidak correction.
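
For readers unfamiliar with the Sidak correction, the arithmetic can be sketched as below. The helper names are hypothetical, and the assumption that each test involves three pairwise comparisons (Lab1 vs Lab2, Lab1 vs Lab3, Lab2 vs Lab3) follows from there being three labs; the slides do not say which statistics package was used or how it reports the corrected values.

```python
def sidak_alpha(alpha, m):
    """Per-comparison significance level that keeps the family-wise
    error rate at alpha across m comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p, m):
    """Sidak-adjusted p-value for one of m comparisons."""
    return min(1.0, 1.0 - (1.0 - p) ** m)


n_tests = 3 * 6 * 2   # formants x vowels x datasets
n_pairs = 3           # pairwise lab comparisons within one test (assumed)
per_comparison_alpha = sidak_alpha(0.05, n_pairs)  # roughly 0.017
```

Under this correction an unadjusted pairwise p-value has to fall below roughly 0.017, rather than 0.05, to count as significant.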

Results: HAD, F1 [Figure: Lab1, Lab2, Lab3; Set 1 and Set 2] Set 1: Lab significant. Set 2: Lab significant, but pairwise comparisons NS.

Results: HAD, F2 [Figure: Lab1, Lab2, Lab3; Set 1 and Set 2] Lab: not significant (NS) in Set 1 and Set 2.

Results: HAD, F3 [Figure: Lab1, Lab2, Lab3; Set 1 and Set 2] Set 1: Lab significant. Set 2: Lab not significant (NS).

Summary - HAD

            Set 1              Set 2
            F1    F2    F3     F1    F2    F3
Lab         sig   NS    sig    sig   NS    NS     (main effect)
1 vs 2      sig   NS    NS     NS    NS    NS     (pairwise comparisons)
1 vs 3      sig   NS    sig    NS    NS    NS
2 vs 3      sig   NS    sig    NS    NS    NS

Set 2: improvement over Set 1. Set 2: good news.

Effect of Lab - 6 vowels Set 1 (F1 F2 F3): heed: sig NS sig; had: sig NS sig; hard: sig; hoard: sig; who’d: sig NS; hood: sig

Effect of Lab - 6 vowels Set 1 and Set 2 (F1 F2 F3, F1 F2 F3): heed: sig NS sig NS sig; had: sig NS sig NS; hard: sig NS sig; hoard: sig NS; who’d: sig NS sig; hood: sig NS sig NS

Influence of Speaker Interaction Lab × Speaker significant (p < 0.05) for F1-F3 of all 6 vowels for both Set 1 and Set 2 → certain speakers lead to measurement differences among labs, for example…

F3 of HARD (Set 2): means by speaker [Figure] Agreement across labs in most cases, but certain individuals lead to measurement differences among labs.
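
One way to find the individuals who drive lab disagreement is to compute each lab's mean per speaker and flag speakers whose lab means are far apart. The sketch below is an assumption about how this could be done, not the analysis reported in the slides: the function names, the 100 Hz threshold and the example values are all invented for illustration.

```python
from statistics import mean


def speaker_lab_means(rows):
    """rows: iterable of (speaker, lab, value_hz).
    Returns {speaker: {lab: mean value in Hz}}."""
    grouped = {}
    for speaker, lab, value in rows:
        grouped.setdefault(speaker, {}).setdefault(lab, []).append(value)
    return {s: {lab: mean(vals) for lab, vals in labs.items()}
            for s, labs in grouped.items()}


def flag_disagreement(means, threshold_hz):
    """Speakers whose largest difference between lab means exceeds threshold_hz."""
    flagged = {}
    for speaker, by_lab in means.items():
        spread = max(by_lab.values()) - min(by_lab.values())
        if spread > threshold_hz:
            flagged[speaker] = spread
    return flagged


# Invented F3 values in Hz for two hypothetical speakers, two tokens per lab.
rows = [
    ("A", "Lab1", 2500), ("A", "Lab1", 2520),
    ("A", "Lab2", 2510), ("A", "Lab2", 2530),
    ("A", "Lab3", 2505), ("A", "Lab3", 2515),
    ("B", "Lab1", 2600), ("B", "Lab1", 2620),
    ("B", "Lab2", 3300), ("B", "Lab2", 3320),
    ("B", "Lab3", 2610), ("B", "Lab3", 2630),
]
flagged = flag_disagreement(speaker_lab_means(rows), threshold_hz=100)
```

In this made-up data only speaker B is flagged, because one lab has tracked a different spectral peak as F3.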

Difficult cases: subject 42 F3 Subject 42 HARD2 F3 = 2579 Hz; Subject 42 HARD4 F3 = 2219 Hz; Subject 42 HARD6 F3 = 3325 Hz

Difficult cases: subject 43 F3 Subject 43 HARD1 F3? Subject 43 HARD2 F3? [Figure: visual inspection vs formant tracker]

The effect of intraspeaker variability, possibly voice quality. This can affect: − The visibility of formants − The functioning of the LPC tracker, for example…

The effect of intraspeaker variability Subject 37: HAD1 F1 = ?? Subject 37: HAD6 F1 .. had today.

Discussion: Laboratory Effects Do different laboratories produce different formant values? YES Does replicating the measurement method reduce these differences? YES Could these be reduced further? YES

Other sources of variability Settings (e.g. number of poles; number of formants in Praat). The exact point in the vowel at which the measure is taken. The ‘readability’ of the spectrogram, which can be affected by speaker characteristics.

Conclusion Developing standard ways of collecting formant values could assist comparisons between experts in case work. If records are kept relating to time points, software and settings, then the measurement process can be replicated.
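
As an illustration of the kind of record-keeping suggested here, a measurement could be stored together with its time point, software and settings. The record layout below is a hypothetical sketch: the class, the field names and all the values are assumptions made for the example, not a format used by the authors.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FormantRecord:
    """One formant measurement plus what is needed to repeat it (hypothetical layout)."""
    speaker: str
    token: str
    time_s: float           # time point of the measured slice
    software: str
    max_formant_hz: float   # tracker ceiling used for this token
    n_formants: int         # number of formants requested from the tracker
    f1_hz: float
    f2_hz: float
    f3_hz: float
    hand_corrected: bool = False  # True if a tracked value was replaced after inspection


# Invented values, for illustration only.
record = FormantRecord(
    speaker="S01", token="HAD1", time_s=0.123, software="Praat",
    max_formant_hz=5000.0, n_formants=5,
    f1_hz=700.0, f2_hz=1500.0, f3_hz=2500.0,
)

line = json.dumps(asdict(record), sort_keys=True)   # one JSON line per measurement
restored = FormantRecord(**json.loads(line))        # a second analyst can reload it
```

Storing one such line per token would let another laboratory return to the same time slice with the same settings.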

Acknowledgements IAFPA Research Grant for travel expenses. Economic and Social Research Council UK for funding the DyViS Project ‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’ [RES ]. Other members of the DyViS project: Francis Nolan and Toby Hudson.