Whither Linguistic Interpretation of Acoustic Pronunciation Variation
Annika Hämäläinen, Yan Han, Lou Boves & Louis ten Bosch


Contents
Introduction
Objectives
Trajectory clustering: short introduction
Speech material
Evaluation of trajectory clustering: ASR
Phonetic and linguistic analysis
– Relationship between trajectory clusters and transcription variants
– Relationship between trajectory clusters and linguistic properties
Summary

Introduction (1/2)
Syllable-length acoustic models are expected to be better suited for modelling long-term spectral and temporal dependencies in speech
– No need for precise segmental modelling
A large number of factors affect the way syllables are pronounced:
– Phonetic context
– Position in a multisyllabic word and in a sentence
– Lexical stress and accent
– Speaking rate
– etc.

Introduction (2/2)
Because of the diverse sources of pronunciation variation, it may be necessary to create multi-path syllable models to capture variation that makes a difference for ASR performance.
Methods to alleviate the data sparsity problem (Sethy & Narayanan, 2003):
– Combining syllable models for frequent syllables with triphones covering the less frequent syllables
– Bootstrapping the topologies and observation densities of the syllable models using triphones

Objectives
To study trajectory clustering as a method of building multi-path syllable models.
To investigate whether there is a relationship between phonetic/linguistic properties and the results of trajectory clustering.
– Such a relationship could be utilised in building or adapting multi-path syllable models.

Trajectory Clustering (Han et al., 2005)
Deriving homogeneous clusters of longer-length models directly from the speech signal:
– Sound intervals regarded as continuous trajectories along time in observation space
– Sound intervals clustered based on the similarity of the trajectories
– An individual path created for each cluster
– Parallel paths used during recognition
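As a rough illustration of the idea (not the exact algorithm of Han et al., 2005), the steps above can be sketched in Python: each token's feature trajectory is resampled to a fixed length so that an ordinary clustering method such as k-means can group similar trajectories. All function names and parameter values here are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def resample_trajectory(traj, n_frames=20):
    """Linearly resample a (T, D) feature trajectory to a fixed length."""
    T, D = traj.shape
    idx = np.linspace(0, T - 1, n_frames)
    out = np.empty((n_frames, D))
    for d in range(D):
        out[:, d] = np.interp(idx, np.arange(T), traj[:, d])
    return out

def cluster_trajectories(tokens, n_clusters=2, n_frames=20, seed=0):
    """Cluster variable-length trajectories; returns one cluster label per token."""
    X = np.stack([resample_trajectory(t, n_frames).ravel() for t in tokens])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
```

In the actual models, the resulting cluster labels determine which parallel path each syllable token is used to train.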

Speech Material
Female read speech from the Spoken Dutch Corpus

Statistic      Training    Test        Development
Word Tokens    215,810     12,327      11,822
Speakers       166
Duration       20:15:44    01:08:54    01:06:21

Evaluation of Trajectory Clustering: ASR

Speech Recognition / Method
Baseline: triphone recogniser
Experimental recognisers:
– Syllable models for the 94 most frequent syllables; triphones used to cover the rest of the syllables
– The path topologies and observation densities of the syllable models bootstrapped using triphones corresponding to canonical syllable transcriptions, then trained further using Baum-Welch re-estimation
– 1-path mixed-model recogniser: all tokens of a given syllable used for training the single path
– 2-path & 3-path mixed-model recognisers: trajectory clustering used to divide the syllable tokens for training the parallel paths

Speech Recognition / Results & Conclusions

Recogniser Type       WER (%)
Triphone              9.2 ±
1-Path Mixed-Model    9.4 ±
2-Path Mixed-Model    8.7 ±
3-Path Mixed-Model    8.7 ± 0.5

A single path is not sufficient to capture syllable-level variation
2-path syllable models capture important pronunciation variation and lead to improved recognition performance
Undertraining of the 3-path syllable models hinders performance

Phonetic Analysis

Phonetic Analysis / Method
To check whether syllable tokens with different phonetic transcriptions go into different clusters:
1. Phonetic distances between the pronunciation variants of each syllable were computed on the basis of articulatory features
2. A multidimensional scaling (MDS) analysis was carried out for 1- or 2-dimensional representations of the phonetic distances between the pronunciation variants
3. The MDS distance representations were compared with the clusters produced by trajectory clustering
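Steps 1 and 2 can be illustrated with a minimal sketch. The articulatory feature vectors below are invented for illustration (the real analysis used actual articulatory feature descriptions), and scikit-learn's MDS stands in for whatever MDS implementation was used.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical binary articulatory feature vectors for three
# pronunciation variants of one syllable (values invented).
variants = {
    "O":     np.array([1, 0, 0, 1, 0]),
    "w_O_f": np.array([1, 1, 0, 1, 1]),
    "j_O_f": np.array([1, 0, 1, 1, 1]),
}

names = list(variants)
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # Hamming distance between feature vectors as a crude phonetic distance
        dist[i, j] = np.sum(variants[names[i]] != variants[names[j]])

# Embed the pairwise distances in 2 dimensions (step 2 of the method)
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dist)  # one 2-D point per variant
```

The resulting 2-D coordinates can then be compared against the trajectory-clustering assignments (step 3).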

Phonetic Analysis / Results
Example: syllable /O_f/
2-dimensional MDS distance representation (figure)
Proportions of pronunciation variant tokens assigned to clusters:

Variant    Count    Cluster 1    Cluster 2
O          7        57%          43%
w_O_f      33       82%          18%
j_O_f      7        100%         0%
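The per-variant cluster proportions shown above are simply a row-normalised contingency table over the tokens; a minimal sketch with made-up token assignments (the `variant`/`cluster` records are hypothetical):

```python
import pandas as pd

# Made-up per-token records: each syllable token has a transcription
# variant and the trajectory cluster it was assigned to.
df = pd.DataFrame({
    "variant": ["O", "O", "w_O_f", "w_O_f", "w_O_f", "j_O_f"],
    "cluster": [1, 2, 1, 1, 2, 1],
})

# Row-normalised contingency table: per-variant cluster proportions
table = pd.crosstab(df["variant"], df["cluster"], normalize="index")
```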

Phonetic Analysis / Conclusions
Even though MDS produced phonetically solid distance representations, there was no clear correspondence between the clusters of syllable transcription variants produced by the MDS analysis and the clusters produced by trajectory clustering.
– Further analysis is needed, as the varying numbers of tokens in the different clusters make the interpretation of the results difficult.

Linguistic Analysis

Linguistic Analysis / Method
To check whether syllable tokens with certain linguistic properties go into different clusters, a graphical representation was used to compare the 2-way clusters produced by trajectory clustering with 2-way clusters based on the following linguistic properties:
– Duration (long vs. short syllable)
– POS (function vs. content word)
– Lexical stress (stressed vs. unstressed syllable)
– Monosyllabicity (monosyllabic vs. multisyllabic word)
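One simple way to quantify such a comparison (the study itself used a graphical representation) is the best-case label agreement between the 2-way clusters and a binary property; the function below is a hypothetical sketch, not the authors' procedure.

```python
import numpy as np

def cluster_property_agreement(cluster_labels, property_labels):
    """Best-case agreement between a 2-way clustering and a binary property.

    Cluster labels are arbitrary, so both mappings of clusters {0, 1} onto
    property values {0, 1} are tried; the higher proportion of matching
    tokens is returned. 1.0 means the clusters coincide with the property;
    0.5 means no association at all.
    """
    c = np.asarray(cluster_labels)
    p = np.asarray(property_labels)
    direct = np.mean(c == p)
    flipped = np.mean(c == 1 - p)
    return max(direct, flipped)
```

For example, a clustering that exactly mirrors stressed vs. unstressed syllables scores 1.0 regardless of how the cluster numbers happen to be assigned.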

Linguistic Analysis / Results (2/2)
Overall pattern:

Proportion of Syllables    Correspondence between Clusters and Linguistic Factors
5%                         Duration and POS
15%                        Duration
15%                        POS
65%                        None

Linguistic Analysis / Conclusions There were hardly any syllables showing a systematic connection between the linguistic properties tested and the results of trajectory clustering.

Summary
Improved ASR performance suggests that trajectory clustering is an attractive way of building multi-path syllable models
There is no straightforward relationship between the acoustically defined clusters and the phonetic/linguistic factors tested in this study
⇒ Designing or adapting multi-path syllable models based on such properties seems very difficult.

Questions?

Linguistic Analysis / Results (1/2)
Example syllables: /t_ei_t/, /z_o/, and /h_a_r/ (figure)