Published by Alberta Peters. Modified over 9 years ago.
1
Accent Modeling: An Overview
2
02/09/07 iCONS Group Presentation
Prologue: Our Initial Effort
Enhancement of speaker recognition through score-level fusion of Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques.
Performance improvements of 22% and 6% in true acceptance rate (at a 5% false acceptance rate) on the YOHO and USF multi-modal biometric datasets, respectively.
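Score-level fusion can be sketched roughly as follows. The slide does not say which normalization or combination rule was used, so min-max normalization and a weighted sum are assumptions here, as are the function names, the weight, and the example scores. The sketch also assumes AHS produces a distance (lower is better) and the HMM a log-likelihood (higher is better).

```python
def min_max_normalize(scores):
    """Map a list of raw scores onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # no spread: return a neutral value
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(ahs_distances, hmm_log_likelihoods, weight=0.5):
    """Weighted-sum fusion of two matchers (hypothetical rule and weight).

    Distances are flipped after normalization so that, for both matchers,
    a larger value means a better match.
    """
    ahs_similarity = [1.0 - d for d in min_max_normalize(ahs_distances)]
    hmm_similarity = min_max_normalize(hmm_log_likelihoods)
    return [weight * a + (1.0 - weight) * h
            for a, h in zip(ahs_similarity, hmm_similarity)]


def accept(fused_score, threshold):
    """Verification decision: accept the claimed identity above a threshold."""
    return fused_score >= threshold


# Three hypothetical verification trials scored by both matchers.
fused = fuse_scores([0.2, 0.8, 0.5], [-10.0, -30.0, -20.0])
```

In a real system the threshold would be set on held-out data so that the false acceptance rate reaches the target operating point, such as the 5% quoted on the slide.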
3
Prologue (contd.)
4
Prologue: What Next
Further improvement of the recognition rate through speaker accent.
Speaker accent will play a critical role in the evaluation of biometric systems, since users will be international in nature.
Incorporating an accent model into the speaker recognition/verification system will be a key focus of our study.
5
Accent
What is accent?
The cumulative auditory effect of those features of pronunciation which identify where a person is from, regionally and socially.
Difference between accent and dialect
Accent is the negative (or rather colorful) influence of a speaker's first language (L1) on a second language, while dialects of a given language are differences in speaking style within that language (all belonging to L1) that arise from geographical and ethnic differences.
6
Accent
Factors affecting the level of accent:
- Age at which the speaker learns the second language.
- Nationality of the speaker’s language instructor.
- Grammatical and phonological differences between the primary and secondary languages.
- Amount of interaction the speaker has with native language speakers.
7
Applications of Accent Modeling
- Accent knowledge can be used to select alternative pronunciations or to provide information for biasing a language model for speech recognition.
- Accent can be useful in profiling speakers for call routing in a call centre.
- Document retrieval systems.
- Speaker recognition systems.
8
Examples of Accent
- Native American English
- Indian
- Chinese
- British
- Japanese
- Russian
- Arabic
- Greek
9
World’s Major Languages
10
Accent Classification System
Block diagram:
Speech Data (Training) -> Extract Accent Features -> Reference Accent Model 1
Speech Data (Training) -> Extract Accent Features -> Reference Accent Model N
Speech Data (Testing) -> Extract Accent Features -> Classification -> Score
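The train/test flow in this diagram can be sketched as below. This is a minimal illustration, not the system from the slides: each reference accent model is reduced to a single diagonal-covariance Gaussian so the example stays dependency-free, and all function names and feature values are hypothetical.

```python
import math


def train_model(frames):
    """Fit one diagonal Gaussian (mean, variance per dimension) to training frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    var = [max(sum((f[j] - mean[j]) ** 2 for f in frames) / n, 1e-6)
           for j in range(dim)]
    return mean, var


def score(model, frames):
    """Average per-frame log-likelihood of test frames under one reference model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def classify(models, frames):
    """Score the test features against every reference model and pick the best."""
    scores = {accent: score(model, frames) for accent, model in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-dimensional accent features for two accents.
models = {
    "accent_1": train_model([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]),
    "accent_N": train_model([[5.0, 5.0], [6.0, 6.0], [5.0, 6.0], [6.0, 5.0]]),
}
label, scores = classify(models, [[0.5, 0.4], [0.8, 0.6]])
```

Swapping the single Gaussian for a mixture model or an HMM changes only `train_model` and `score`; the argmax over reference models stays the same.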
11
Accent: Research Work
M. V. Chan, et al., "Classification of speech accents with neural networks," IEEE World Congress on Computational Intelligence, vol. 7, pp. 4483-4486, 27 Jun-2 Jul 1994.
L. M. Arslan, "Foreign Accent Classification in American English," Ph.D. Dissertation, Duke University, 1996.
C. Teixeira, I. Trancoso, and A. Serralheiro, "Accent identification," in Proc. International Conference on Spoken Language Processing, vol. 3, pp. 1784-1787, 1996.
P. Fung and W. K. Liu, "Fast Accent Identification and Accented Speech Recognition," in Proc. ICASSP'99, vol. 1, pp. 221-224, 1999.
T. Chen, et al., "Automatic accent identification using Gaussian mixture models," ASRU '01, pp. 343-346, 9-13 Dec. 2001.
P. Angkititrakul, J. H. L. Hansen, "Stochastic Trajectory Model Analysis for Accent Classification," Inter. Conf. on Spoken Language Processing, vol. 1, pp. 493-496, Sept. 2002.
X. Lin, S. Simske, "Phoneme-less hierarchical accent classification," Signals, Systems and Computers, vol. 2, pp. 1801-1804, 7-10 Nov. 2004.
12
Research Work (contd.)
F. Farahani, et al., "Speaker identification using supra-segmental pitch pattern dynamics," in Proc. ICASSP'04, vol. 1, pp. I-89-92, 17-21 May 2004.
M. M. Tanabian, et al., "Automatic speaker recognition with formant trajectory tracking using CART and neural networks," Canadian Conference on Electrical and Computer Engineering, pp. 1225-1228, 1-4 May 2005.
S. Gray, J. H. L. Hansen, "An integrated approach to the detection and classification of accents/dialects for a spoken document retrieval system," ASRU '05, pp. 35-40, 27 Nov-1 Dec. 2005.
P. Angkititrakul, J. H. L. Hansen, "Advances in Phone-based Modeling For Automatic Accent Classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 634-646, March 2006.
K. Bartkova, D. Jouvet, "Using Multilingual Units for Improved Modeling of Pronunciation Variants," in Proc. ICASSP'06, vol. 5, pp. V-1037-V-1040, 14-19 May 2006.
A. Ikeno, J. H. L. Hansen, "Perceptual Recognition Cues in Native English Accent Variation: Listener Accent, Perceived Accent, and Comprehension," in Proc. ICASSP'06, vol. 1, pp. I-401-I-404, 14-19 May 2006.
13
Accent Classification Tree
Speech Dataset
Accent Features: Pitch, Energy, Formants, Formant Trajectories, MFCCs, Delta MFCCs
Modeling: Stochastic Trajectory Models, Artificial Neural Networks, Gaussian Mixture Models, Hidden Markov Models
Classification/Decision
14
Foreign Accent Classification in American English: Dataset
The dataset consists of neutral American English, German, Spanish, Chinese, Turkish, French, Italian, Hindi, Rumanian, Japanese, Persian, and Greek accents.
All speech was sampled at 8000 Hz.
In total, 43 speakers used microphone input and 68 speakers used telephone input, in a quiet office environment.
15
Formant Frequency Analysis
Formants are the resonant frequencies of the vocal tract, modeled as an acoustic tube between source and output, around which most of the acoustic energy is concentrated.
The second and third formants are particularly useful for accent classification.
16
Mel Scale vs. Accent Scale
17
Accent Classifier
The features consisted of 8-dimensional ASCCs and energy, along with their delta features.
The IW-FS, CS-FS, and CS-PS configurations achieved classification rates of 74.5%, 61.3%, and 68.3%, respectively.
With 7-8 test words, accent classification accuracy among 4 accents is 93%.
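Delta features approximate the time derivative of each static feature. The slide does not say how they were computed; the sketch below assumes the common regression formula over a window of two frames on each side, with edge frames repeated. Function names and the window size are illustrative.

```python
def delta(features, window=2):
    """Regression-based delta of a [frames][dims] feature matrix.

    Assumed formula: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
    for n = 1..window, repeating the first/last frame at the edges.
    """
    num_frames = len(features)
    num_dims = len(features[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for j in range(num_dims):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = features[min(t + n, num_frames - 1)][j]
                behind = features[max(t - n, 0)][j]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(features, window=2):
    """Concatenate each static frame with its delta frame."""
    return [row + d for row, d in zip(features, delta(features, window))]


# A one-dimensional ramp rises by 1 per frame, so the interior delta is 1.0.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_deltas = delta(ramp)
```

If deltas are taken over all nine static values (8 ASCCs plus energy), which is one reading of the slide, `append_deltas` yields 18 values per frame.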
18
Computer vs. Humans
19
Conclusions about specific features
- Word-final stop release time is longer among foreign accents.
- The slope of the intonation contour for isolated words is more negative for Chinese speakers, and more positive for German speakers, than for native speakers.
- Voice onset time for unvoiced stops is not a significant contributor for the accents considered in this study.
- Second and third formant positions differ between native and non-native speakers.
20
Accent Classification/Detection using ANN
Demographic features (the speaker’s age, the percentage of the day in which English is used for communication, and the number of years English has been spoken) were given as inputs to the neural network, along with speech features (average pitch frequency and the averaged first three formant frequencies).
A dataset of 10 native and 12 non-native speakers was used.
The F2 and F3 distributions of the native and non-native groups show high dissimilarity.
Three neural network classification techniques were compared: competitive learning, counter propagation, and back propagation.
Back propagation gave a detection rate of 100% on the training data and 90.9% on the testing data.
21
Phoneme-less Hierarchical Accent Classification
WSJCAM0 and TIDIGITS were used to train the British and American accent models, respectively.
IViE and Voicemail were used to test British and American accents, respectively.
13-dimensional MFCCs were used as features, and a 64-component Gaussian mixture model was used for modeling.
22
Results show an average relative error rate reduction of 7.1% compared to direct accent classification.
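A relative reduction compares error rates as a ratio rather than as a difference in percentage points. The baseline value below is hypothetical and chosen only for illustration; the slide does not report the absolute error rates, and the symbols are assumed notation.

$$\text{relative reduction} = \frac{E_{\text{direct}} - E_{\text{hier}}}{E_{\text{direct}}}$$

$$E_{\text{direct}} = 20.0\% \;\Rightarrow\; E_{\text{hier}} = 20.0\% \times (1 - 0.071) = 18.58\%$$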
23
Accent Classification Application
24
Advances in Phone-Based Modeling
Conventional HMMs assume that the sequence of features is produced by a piecewise stationary process.
Hidden Markov modeling assumes that adjacent frames are acoustically uncorrelated.
It also assumes that the state-dependent duration distributions are exponentially decreasing.
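The exponentially decreasing duration assumption follows from the self-transition structure of a standard HMM state. Writing $a_{ii}$ for the self-transition probability of state $i$ (notation assumed here, not taken from the slides), the probability of staying in the state for exactly $d$ frames is geometric:

$$P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad E[d] = \frac{1}{1 - a_{ii}}$$

For an illustrative value $a_{ii} = 0.8$, this gives $P_i(1) = 0.2$, $P_i(2) = 0.16$, $P_i(3) = 0.128$, and a mean duration of 5 frames. The single most likely duration is always one frame, whatever the value of $a_{ii}$.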
25
Why Phone-Based Modeling?
- Capturing the temporal variation of the acoustic signal is an important aspect of speech recognition.
- It provides a better framework for modeling the evolution of the spectral dynamics of speech.
- It offers flexibility and power through whole-segment classification, in contrast to frame-by-frame classification.
26
Trajectories of the phoneme sequence /aa/ - /r/ from the word ‘Target’
27
Stochastic Trajectory Model
An STM represents the acoustic observations of a phoneme as clusters of trajectories in a parametric space.
Let X be a sequence of N points,
$$X = [x_1, x_2, \ldots, x_N]$$
where each point $x_i$ is a D-dimensional vector. X is obtained by resampling a sequence of d frames along the linear time scale.
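The resampling step can be sketched as follows. The slides say only that the d frames are resampled along a linear time scale; linear interpolation between neighbouring frames is an assumption made for this example, and the function name is hypothetical.

```python
def resample_trajectory(frames, num_points):
    """Resample d frames (each a D-dimensional list) to num_points points.

    The output points are evenly spaced in time between the first and the
    last frame; values between frames are linearly interpolated (assumed).
    """
    d = len(frames)
    if d == 0 or num_points < 2:
        raise ValueError("need at least one frame and at least two output points")
    if d == 1:
        return [list(frames[0]) for _ in range(num_points)]
    resampled = []
    for i in range(num_points):
        pos = i * (d - 1) / (num_points - 1)  # fractional frame index
        left = int(pos)
        right = min(left + 1, d - 1)
        frac = pos - left
        resampled.append([(1 - frac) * a + frac * b
                          for a, b in zip(frames[left], frames[right])])
    return resampled


# A 3-frame segment stretched to N = 5 points, and a 5-frame one shrunk to 3.
stretched = resample_trajectory([[0.0], [2.0], [4.0]], 5)
shrunk = resample_trajectory([[0.0], [1.0], [2.0], [3.0], [4.0]], 3)
```

After this step every segment has the same number of points N regardless of its original duration d, which is what lets segments of different lengths be compared against the same trajectory model.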
28
Stochastic Trajectory Model
The resampled N-frame vector sequence X is considered to be the underlying trajectory of the original sequence with d frames.
The pdf of a segment X given a duration d and the segment symbol s is:
$$p(X \mid d, s) = \sum_{t_k \in T_s} p(X \mid t_k, d, s)\,\Pr(t_k \mid s)$$
where $T_s$ is the set of all trajectory components associated with $s$; $\Pr(t_k \mid s)$ is the probability of observing trajectory $t_k$ given that the segment is $s$, with the constraint that $\sum_{t_k \in T_s} \Pr(t_k \mid s) = 1$; and $p(X \mid t_k, d, s)$ is the pdf of the vector sequence X given component trajectory $t_k$, duration $d$, and symbol $s$.
29
Stochastic Trajectory Model
The distribution assigned to each sample point $x_i$ on a component trajectory $t_k$ is a multivariate Gaussian with mean vector $\mu_{k,i}$ and covariance matrix $\Sigma_{k,i}$.
With the assumption of frame-independent trajectories, the pdf is modeled as
$$p(X \mid t_k, d, s) = \prod_{i=1}^{N} \mathcal{N}(x_i;\, \mu_{k,i}, \Sigma_{k,i})$$
The training algorithm performs maximum likelihood estimation of the parameters of the Gaussian distributions.
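Putting the pieces together, the segment likelihood is a mixture over trajectory components, and each component is a product of per-point Gaussians. The sketch below evaluates it in the log domain. It assumes diagonal covariance matrices for simplicity (the slides do not state the covariance structure), takes X as already resampled to N points, and uses hypothetical names and parameter values.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def stm_log_likelihood(X, components):
    """Log pdf of a resampled segment X under one segment symbol.

    components: list of (prior, means, variances), one entry per trajectory
    component, where means[i] and variances[i] belong to sample point i and
    the priors sum to 1.
    """
    log_terms = []
    for prior, means, variances in components:
        log_p = math.log(prior)
        # Frame independence: the log-likelihoods of the N points add up.
        for x, mean, var in zip(X, means, variances):
            log_p += log_gaussian_diag(x, mean, var)
        log_terms.append(log_p)
    # Log-sum-exp over components, shifted by the peak for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


# Two hypothetical components over N = 2 points of dimension D = 1.
components = [
    (0.5, [[0.0], [0.0]], [[1.0], [1.0]]),
    (0.5, [[3.0], [3.0]], [[1.0], [1.0]]),
]
value = stm_log_likelihood([[0.0], [0.0]], components)
```

For classification, this value would be computed for each candidate accent's models and the highest-scoring accent selected.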
30
Accent Classification System
31
Performance: Male and Female, Chinese vs. American English
32
Further Investigation
- Further study of accent classification and detection.
- Study of accent from a linguistic point of view.
- Experimentation and formulation of accent modeling and classification.
- Combination of accent information with my previous work to achieve speaker recognition enhancement.
33
Questions
Thank You