CROSS ENTROPY INFORMATION METRIC FOR QUANTIFICATION AND CLUSTER ANALYSIS OF ACCENTS Alireza Ghorshi Brunel University, London.


ABSTRACT An accent metric is introduced based on the cross entropy (CE) of the probability models (HMMs) of speech from different accents. The CE metric has potential for use in the analysis, identification, quantification and ranking of the salient features of accents. The accent metric is used for phonetic-tree cluster analysis of phonemes, for cross-accent phonetic clustering and for quantification of the distances between phonetic sounds in different English accents.

INTRODUCTION Accent may be defined as a distinctive pattern of pronunciation, including phonemic systems, lexicon and intonation characteristics, of a community of people who belong to a national, regional or social grouping. There are two general methods for the classification of the differences between accents: 1- Historical approach 2- Structural, synchronic approach

Historical approach: Compares the historical roots of accents, the rules of pronunciation and how these rules change and evolve. Structural, synchronic approach: First proposed by Trubetzkoy in 1931, models an accent in terms of the differences in: a) Phonemic systems, i.e. the number and identity of phonemes. b) Lexical distribution of words. c) Phonotactic (structural) distribution, as in the pronunciation of 'r' in rhotic and non-rhotic accents. d) Phonetic (acoustic) realisation. e) Rhythms, intonations and stress patterns of accents.

CROSS ENTROPY ACCENT METRIC This section proposes cross entropy as a measure of the differences between the acoustic units of speech in different accents. 1) Cross Entropy of Accents 2) The Effect of Speakers on Cross Entropy

1) Cross Entropy of Accents Cross entropy is a measure of the difference between two probability distributions. There are a number of different definitions of cross entropy; the definition used in our paper is also known as the Kullback-Leibler distance. Given the probability models P1(x) and P2(x) of a phoneme, or some other sound unit, in two different accents, a measure of their difference is the cross entropy of accents, defined as:

CE(P1, P2) = ∫ P1(x) log( P1(x) / P2(x) ) dx

- The cross entropy is a non-negative function.
- It has a value of zero for two identical distributions.
- It increases with increasing dissimilarity between the two distributions.
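These properties can be checked numerically. The following is a minimal sketch using simple discrete distributions (the paper works with continuous HMM state densities, so this is only an illustration); the distributions p1, p2 and p3 are made-up values:

```python
import math

def cross_entropy(p, q):
    """Kullback-Leibler distance CE(P1, P2) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p1 = [0.5, 0.3, 0.2]
p2 = [0.5, 0.3, 0.2]   # identical to p1
p3 = [0.1, 0.2, 0.7]   # very different from p1

print(cross_entropy(p1, p2))  # zero for identical distributions
print(cross_entropy(p1, p3))  # positive, grows with dissimilarity
```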

The cross entropy between two left-right N-state HMMs of speech with M-dimensional (cepstral) features is the sum of the cross entropies of their respective states, obtained as:

CE(λ1, λ2) = Σ_{s=1..N} Σ_i CE( p1(x_i|s), p2(x_i|s) )

where p(x_i|s) is the pdf of the ith mixture of speech in state s. Cross entropy is asymmetric: CE(P1,P2) ≠ CE(P2,P1). A symmetric cross entropy measure can be defined as

CE_sym(P1, P2) = CE(P1, P2) + CE(P2, P1)

In the following, the cross entropy distance refers to the symmetric measure and the subscript sym will be dropped. Therefore the total distance between two accents can be defined as:

D(Accent1, Accent2) = Σ_{i=1..Nu} P_i CE( P1_i, P2_i )

where Nu is the number of speech units and Pi the probability of the ith speech unit. The cross-entropy distance can be used for a wide range of purposes including: a) To find the differences between two accents or the voices of two speakers. b) To cluster phonemes, speakers or accents. c) To rank voice or accent features.
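The unit-weighted total distance can be sketched as follows. This is an illustrative assumption, not the paper's implementation: per-phoneme models are replaced by small discrete distributions, and the unit probabilities and phoneme labels are made up:

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce_sym(p, q):
    # Symmetric cross entropy: CE(P1,P2) + CE(P2,P1)
    return kl(p, q) + kl(q, p)

# Hypothetical per-phoneme distributions for two accents.
accent1 = {"iy": [0.6, 0.4], "ae": [0.3, 0.7]}
accent2 = {"iy": [0.5, 0.5], "ae": [0.2, 0.8]}

# Hypothetical unit probabilities P_i (relative frequencies of the units).
unit_prob = {"iy": 0.55, "ae": 0.45}

# Total accent distance: unit-probability-weighted sum of symmetric CEs.
total = sum(unit_prob[u] * ce_sym(accent1[u], accent2[u]) for u in accent1)
print(total)
```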

2) The Effect of Speakers on Cross Entropy Speech models include speakers' variability. Question: how much of the cross entropy of the voice models of two speakers is due to their accents, and how much of it is due to the differences in their voice characteristics? It is assumed that the cross entropy due to the differences in speaker characteristics and the cross entropy due to accent characteristics are additive. An accent distance is defined as the difference between the cross entropies of inter-accent models and intra-accent models. The adjusted accent distance is

D_accent = CE_inter - CE_intra
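Under the additivity assumption above, the adjustment is a simple subtraction. The numbers here are made-up illustrative values, not measurements from the paper:

```python
# Illustrative (made-up) average cross entropies:
ce_inter = 2.4   # models of speakers from two different accents
ce_intra = 0.9   # models of different speakers of the same accent

# The intra-accent term estimates the speaker-only contribution, so
# subtracting it leaves the part of the distance attributable to accent.
adjusted_accent_distance = ce_inter - ce_intra
print(adjusted_accent_distance)
```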

MINIMUM CROSS ENTROPY (MCE) FOR PHONETIC-TREE CLUSTERING Clustering is the grouping together of similar items. The minimum cross entropy (MCE) information criterion is used in a bottom-up hierarchical clustering process to construct phonetic trees for different accents of English. The results of the application of MCE clustering for the construction of phonetic-trees of American, Australian and British English are shown in Figures 1, 2 and 3.
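The bottom-up MCE process can be sketched as follows. This is a simplified illustration, not the paper's implementation: the phoneme models are made-up discrete distributions standing in for HMMs, and merging clusters by averaging member distributions is an assumed merge rule:

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce_sym(p, q):
    # Symmetric cross entropy distance.
    return kl(p, q) + kl(q, p)

# Hypothetical phoneme models as discrete distributions (stand-ins for HMMs).
models = {
    "iy": [0.70, 0.20, 0.10],
    "ih": [0.65, 0.25, 0.10],
    "aa": [0.10, 0.30, 0.60],
    "ao": [0.15, 0.25, 0.60],
}

# Bottom-up clustering: start with singleton clusters and repeatedly merge
# the pair with the minimum cross entropy between cluster representatives
# (here simply the element-wise average of the member distributions).
clusters = {name: ([name], dist) for name, dist in models.items()}
merge_order = []
while len(clusters) > 1:
    pairs = [(a, b) for a in clusters for b in clusters if a < b]
    a, b = min(pairs, key=lambda ab: ce_sym(clusters[ab[0]][1], clusters[ab[1]][1]))
    members = clusters[a][0] + clusters[b][0]
    merged = [(x + y) / 2 for x, y in zip(clusters[a][1], clusters[b][1])]
    del clusters[a], clusters[b]
    clusters["+".join(members)] = (members, merged)
    merge_order.append(tuple(members))

print(merge_order)  # the two similar front vowels merge first, then the back vowels
```

The merge order records the tree structure: phonemes whose models are closest in cross entropy join earliest, exactly as in the phonetic-trees of Figures 1 to 3.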

Figure 1- Phonetic-tree of the American accent. The phonetic-tree of the American accent, Figure 1, confirms the reputation of American English as a 'phonetic accent' (i.e. an accent in which phonemes are clearly pronounced). The clustering of American phonemes more or less corresponds to how one would expect the phonemes to cluster.

The phonetic-trees of the Australian and British accents, Figures 2 and 3, are more similar to each other than to the American phonetic-tree. This observation is also supported by the calculation of the cross entropy of these accents, presented in the next section. Figure 2- Phonetic-tree of the Australian accent. Figure 3- Phonetic-tree of the British accent.

Figure 4 shows a cross-accent phonetic-tree. This tree shows how the vowels in the British accent cluster with the vowels in the Australian accent. Figure 4- Cross-accent phonetic-tree between British and Australian (bold) accents.

CROSS ENTROPY QUANTIFICATION OF ACCENTS OF ENGLISH The plots in Figure 5 illustrate the results of measurements of inter-accent and intra-accent cross entropies. Figure 5- Plots of inter-accent and intra-accent cross entropies of a number of phonemes of American, British and Australian accents.

The results clearly show that in all cases the inter-accent model differences are significantly greater than the intra-accent model differences. Furthermore, the results show that in all cases the difference between Australian and British is less than the distance between American and British (or Australian). The results further show that the largest differences of the Australian accent are in the diphthongs. Table 1 shows the cross entropy distances between the different accents. It is evident that of the three accent pairs Australian and British are the closest. Furthermore, American is closer to British than to Australian.

Table 1- Cross entropy between American, British and Australian accents.

Accents:        Am-Au   Am-Br   Br-Au
Cross Entropy:

Table 2 shows that, as expected, cultivated Australian (CulAu) is closer to British than broad Australian (BrAu) and general Australian (GenAu).

Table 2- Cross entropy between British and different Australian (Broad, General and Cultivated) accents.

Accents:        Br-BrAu   Br-GenAu   Br-CulAu
Cross Entropy:

CONCLUSION  we applied the cross-entropy measure for quantification of accents and for classifications of speech models.  The cross entropy is a general information measure that can be used for a wide range of speech applications from quantification of differences between accents and voices to maximum cross entropy discriminative speech modelling.  The consistency of phonetic-trees for different groups of the same accent shows that cross-entropy is a good measure for hierarchical clustering.  The cross entropy of inter-accent groups compared to that of the intra- accent groups clearly shows the level of dissimilarity of models due to accents.

End of Presentation Thank you