Whither Linguistic Interpretation of Acoustic Pronunciation Variation Annika Hämäläinen, Yan Han, Lou Boves & Louis ten Bosch.


1 Whither Linguistic Interpretation of Acoustic Pronunciation Variation Annika Hämäläinen, Yan Han, Lou Boves & Louis ten Bosch

2 Contents
Introduction
Objectives
Trajectory clustering: short introduction
Speech material
Evaluation of trajectory clustering: ASR
Phonetic and linguistic analysis
– Relationship between trajectory clusters and transcription variants
– Relationship between trajectory clusters and linguistic properties
Summary

3 Introduction (1/2)
Syllable-length acoustic models are expected to be better suited for modelling long-term spectral and temporal dependencies in speech
– No need for precise segmental modelling
A large number of factors affect the way syllables are pronounced:
– Phonetic context
– Position in a multisyllabic word and in a sentence
– Lexical stress and accent
– Speaking rate
– etc.

4 Introduction (2/2)
Because of the diverse sources of pronunciation variation, it may be necessary to create multi-path syllable models to capture variation that makes a difference for ASR performance.
Methods to alleviate the data sparsity problem (Sethy & Narayanan, 2003):
– Combining syllable models for frequent syllables with triphones covering the less frequent syllables
– Bootstrapping the topologies and observation densities of the syllable models using triphones

5 Objectives
To study trajectory clustering as a method of building multi-path syllable models.
To investigate whether there is a relationship between phonetic/linguistic properties and the results of trajectory clustering.
– Such a relationship could be utilised in building or adapting multi-path syllable models.

6 Trajectory Clustering (Han et al., 2005)
Deriving homogeneous clusters of longer-length models directly from the speech signal:
– Sound intervals regarded as continuous trajectories along time in observation space
– Sound intervals clustered based on the similarity of the trajectories
– An individual path created for each cluster
– Parallel paths used during recognition
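The clustering idea on this slide can be sketched as follows. This is a minimal illustration, not the procedure of Han et al. (2005), whose distance measure and clustering algorithm are not given in the slides. It assumes that each sound interval is a (frames x dimensions) feature array, that trajectories are made comparable by linear resampling to a fixed number of points, and that plain k-means is an acceptable stand-in for the similarity-based clustering. All function names and the toy data are hypothetical.

```python
import numpy as np


def resample(traj, n_points=10):
    """Linearly interpolate a (frames x dims) trajectory to n_points frames."""
    traj = np.asarray(traj, dtype=float)
    src = np.linspace(0.0, 1.0, len(traj))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.column_stack(
        [np.interp(dst, src, traj[:, d]) for d in range(traj.shape[1])]
    )


def cluster_trajectories(trajs, k=2, n_points=10, n_iter=20, seed=0):
    """Assumed stand-in: k-means over resampled, flattened trajectories."""
    X = np.stack([resample(t, n_points).ravel() for t in trajs])
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every token to every cluster centre.
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = X[labels == c].mean(axis=0)
    return labels


# Toy data: three rising and three falling one-dimensional trajectories
# of different lengths (hypothetical, not taken from the corpus).
rng = np.random.default_rng(1)
rising = [np.linspace(0, 1, n)[:, None] + 0.05 * rng.standard_normal((n, 1))
          for n in (8, 12, 15)]
falling = [np.linspace(1, 0, n)[:, None] + 0.05 * rng.standard_normal((n, 1))
           for n in (9, 11, 14)]
labels = cluster_trajectories(rising + falling, k=2)
print(labels)
```

In a multi-path syllable model, the tokens sharing a label would then be used to train one of the parallel paths.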

7 Speech Material
Female read speech from the Spoken Dutch Corpus

Statistic             Training    Test        Development
Word tokens           215,810     12,327      11,822
Speakers              166
Duration (hh:mm:ss)   20:15:44    01:08:54    01:06:21

8 Evaluation of Trajectory Clustering: ASR

9 Speech Recognition / Method
Baseline: Triphone recogniser
Experimental recognisers:
– Syllable models for 94 most frequent syllables; triphones used to cover the rest of the syllables
– The path topologies and observation densities of syllable models bootstrapped using triphones corresponding to canonical syllable transcriptions and trained further using Baum-Welch re-estimation
– 1-path mixed-model recogniser: all tokens of a given syllable used for training the single path
– 2-path & 3-path mixed-model recognisers: trajectory clustering used to divide the syllable tokens for training the parallel paths
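The mixed-model unit inventory on this slide can be illustrated with a short sketch. It assumes syllables are written as underscore-separated phone strings (as in /h_a_r/ elsewhere in the deck). The function names, the toy syllable counts, the `left-phone+right` triphone notation and the `#` boundary symbol are hypothetical; a real recogniser would take triphone context from the neighbouring syllables rather than from a placeholder.

```python
from collections import Counter


def top_syllables(corpus_syllables, n):
    """Return the n most frequent syllables in a list of syllable tokens."""
    return {syl for syl, _ in Counter(corpus_syllables).most_common(n)}


def to_units(syllable, frequent, boundary="#"):
    """Map a syllable to one syllable unit, or to triphones if it is rare."""
    if syllable in frequent:
        return [syllable]
    phones = [boundary] + syllable.split("_") + [boundary]
    return [
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    ]


# Toy token list (hypothetical counts); the slides use the 94 most
# frequent syllables, here n=2 keeps the example small.
tokens = ["l_@", "l_@", "z_o", "l_@", "z_o", "h_a_r"]
frequent = top_syllables(tokens, n=2)
print(to_units("l_@", frequent))    # kept as a single syllable unit
print(to_units("h_a_r", frequent))  # expanded into triphones
```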

10 Speech Recognition / Results & Conclusions

Recogniser Type       WER (%)
Triphone              9.2 ± 0.5
1-Path Mixed-Model    9.4 ± 0.5
2-Path Mixed-Model    8.7 ± 0.5
3-Path Mixed-Model    8.7 ± 0.5

Single path not sufficient to capture syllable-level variation
2-path syllable models capture important pronunciation variation and lead to improved recognition performance
Undertraining of the 3-path syllable models hindering performance
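The slides do not say how the ± 0.5 margins were obtained. One common choice, shown here purely as an assumption, is a 95% normal-approximation interval that treats each word as an independent error trial. With the 12,327 test word tokens from the Speech Material slide, this reproduces a half-width of about 0.5 percentage points for each reported WER.

```python
import math


def wer_half_width(wer_percent, n_words, z=1.96):
    """Half-width (in percentage points) of a normal-approximation interval.

    Assumes word errors behave like independent Bernoulli trials, which is
    a simplification for WER.
    """
    p = wer_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_words)


N_TEST_WORDS = 12_327  # test-set word tokens from the Speech Material slide

for name, wer in [("Triphone", 9.2), ("1-path", 9.4),
                  ("2-path", 8.7), ("3-path", 8.7)]:
    print(f"{name}: {wer} ± {wer_half_width(wer, N_TEST_WORDS):.1f}")
```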

11 Phonetic Analysis

12 Phonetic Analysis / Method
To check whether syllable tokens with different phonetic transcriptions go into different clusters:
1. Phonetic distances between the pronunciation variants of each syllable were computed on the basis of articulatory features
2. A multidimensional scaling (MDS) analysis was carried out for 1- or 2-dimensional representations of the phonetic distances between the pronunciation variants
3. The MDS distance representations were compared with the clusters produced by trajectory clustering
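Steps 1 and 2 can be sketched as follows, using the variant labels listed on the Phonetic Analysis / Results slide. The slides do not specify the articulatory feature set, the alignment method or the MDS variant, so everything here is an assumption: the binary feature table is invented for illustration, variants are compared with a feature-weighted edit distance, and the embedding uses classical (Torgerson) MDS.

```python
import numpy as np

# Hypothetical binary features: [vowel, rounded, voiced, fricative, glide]
FEATURES = {
    "O": [1, 1, 1, 0, 0],
    "@": [1, 0, 1, 0, 0],
    "f": [0, 0, 0, 1, 0],
    "v": [0, 0, 1, 1, 0],
    "w": [0, 1, 1, 0, 1],
    "j": [0, 0, 1, 0, 1],
}


def phone_cost(a, b):
    """Fraction of articulatory features on which two phones differ."""
    return float(np.mean(np.array(FEATURES[a]) != np.array(FEATURES[b])))


def variant_distance(v1, v2, indel=1.0):
    """Edit distance between two variants with feature-based substitution cost."""
    p, q = v1.split("_"), v2.split("_")
    d = np.zeros((len(p) + 1, len(q) + 1))
    d[:, 0] = indel * np.arange(len(p) + 1)
    d[0, :] = indel * np.arange(len(q) + 1)
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            d[i, j] = min(d[i - 1, j] + indel,
                          d[i, j - 1] + indel,
                          d[i - 1, j - 1] + phone_cost(p[i - 1], q[j - 1]))
    return float(d[-1, -1])


def classical_mds(D, n_dims=2):
    """Classical MDS: double-centre squared distances, keep top eigenvectors."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


variants = ["O", "O_v", "O_f", "@_v", "@_f", "w_O_f", "j_O_f"]
D = np.array([[variant_distance(a, b) for b in variants] for a in variants])
coords = classical_mds(D, n_dims=2)
for name, (x, y) in zip(variants, coords):
    print(f"{name:6s} {x: .2f} {y: .2f}")
```

Step 3 would then compare how the variants group in `coords` with the cluster assignments of their tokens.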

13 Phonetic Analysis / Results
Example: syllable /O_f/
[Figure: 2-dimensional MDS distance representation]
Proportions of pronunciation variant tokens assigned to clusters:

Variant   Count   Cluster 1   Cluster 2
O         7       57%         43%
O_v       135     51%         49%
O_f       655     52%         48%
@_v       28      82%         18%
@_f       23      83%         17%
w_O_f     33      82%         18%
j_O_f     7       100%        0%

14 Phonetic Analysis / Conclusions
Even though MDS produced phonetically solid distance representations, there was no clear correspondence between the clusters of syllable transcription variants produced by the MDS analysis and the clusters produced by trajectory clustering.
– Further analysis is needed, as the varying numbers of tokens in the different clusters make the results difficult to interpret.

15 Linguistic Analysis

16 Linguistic Analysis / Method
To check whether syllable tokens with certain linguistic properties go into different clusters, a graphical representation was used to compare the 2-way clusters produced by trajectory clustering with 2-way clusters based on the following linguistic properties:
– Duration (long vs. short syllable)
– POS (function vs. content word)
– Lexical stress (stressed vs. unstressed syllable)
– Monosyllabicity (monosyllabic vs. multisyllabic word)
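The study compared the two partitions graphically. As a numerical counterpart, and only as an assumed alternative to that procedure, the sketch below scores how well a 2-way cluster assignment matches a binary linguistic property, allowing for the fact that cluster labels are arbitrary. The function name and the toy labels are hypothetical.

```python
def two_way_agreement(cluster_labels, property_labels):
    """Share of tokens on which two binary labellings agree, up to label swap.

    Returns a value between 0.5 (no correspondence) and 1.0 (identical split).
    """
    if len(cluster_labels) != len(property_labels):
        raise ValueError("label sequences must have the same length")
    n = len(cluster_labels)
    same = sum(c == p for c, p in zip(cluster_labels, property_labels))
    # Cluster numbering is arbitrary, so also consider the swapped labelling.
    return max(same, n - same) / n


# Toy tokens of one syllable (hypothetical labels).
clusters    = [0, 0, 0, 1, 1, 1, 0, 1]
is_long     = [1, 1, 1, 0, 0, 0, 1, 0]  # duration: long (1) vs. short (0)
is_stressed = [1, 0, 1, 0, 1, 0, 0, 1]  # lexical stress

print(two_way_agreement(clusters, is_long))      # perfect match after swapping
print(two_way_agreement(clusters, is_stressed))  # chance-level agreement
```

A score near 0.5 for a property means the clusters carry essentially no information about it; with unbalanced classes the chance level is higher, so the score should be read against the class proportions.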

17 Linguistic Analysis / Results (2/2)
Overall pattern:

Proportion of Syllables   Correspondence between Clusters and Linguistic Factors
5%                        Duration and POS
15%                       Duration
15%                       POS
65%                       None

18 Linguistic Analysis / Conclusions
There were hardly any syllables showing a systematic connection between the linguistic properties tested and the results of trajectory clustering.

19 Summary
Improved ASR performance suggests that trajectory clustering is an attractive way of building multi-path syllable models.
There is no straightforward relationship between the acoustically defined clusters and the phonetic/linguistic factors tested in this study.
Designing or adapting multi-path syllable models based on such properties seems very difficult.

20 Questions?

21 Linguistic Analysis / Results (1/2)
Example syllables: /t_ei_t/, /z_o/, /l_@/ and /h_a_r/

