
1 Accent Modeling An Overview

2 Prologue (iCONS Group Presentation, 02/09/07)
- Our initial effort
  - Enhancement of speaker recognition through score-level fusion of Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques.
  - At a 5% false acceptance rate (FAR), the true acceptance rate (TAR) improved by 22% on the YOHO dataset and by 6% on the USF multi-modal biometric dataset.
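
Score-level fusion of two matchers can be sketched in two steps: normalize each matcher's scores, then combine them with a weighted sum. The true acceptance rate is then read at the threshold that gives the target false acceptance rate. This is a minimal sketch, not the fusion rule used in the original work. Min-max normalization, the equal weight `w=0.5`, the toy scores, and the convention that a higher score means a better match are all assumptions.

```python
def min_max_normalize(scores, lo, hi):
    """Map raw scores into [0, 1] using bounds (lo, hi) from development data (assumed)."""
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(scores_a, scores_b, w=0.5):
    """Weighted-sum fusion of two normalized score lists; w is a hypothetical weight."""
    return [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]


def tar_at_far(genuine, impostor, target_far=0.05):
    """Return (threshold, TAR) for the lowest threshold whose FAR <= target_far.

    A trial is accepted when its score is >= threshold, so FAR can only fall
    as the threshold rises; the first qualifying threshold maximizes TAR.
    """
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        if far <= target_far:
            tar = sum(s >= t for s in genuine) / len(genuine)
            return t, tar
    return None, 0.0  # no observed score reaches the target FAR


if __name__ == "__main__":
    # Made-up scores for two matchers (e.g. an AHS-like and an HMM-like system).
    gen_a, imp_a = [0.9, 0.7, 0.6], [0.65, 0.3, 0.2]
    gen_b, imp_b = [0.8, 0.9, 0.7], [0.4, 0.5, 0.1]
    print(tar_at_far(fuse(gen_a, gen_b), fuse(imp_a, imp_b), target_far=0.05))
```

In practice the fusion weight would be tuned on held-out data. Only the fused scores are thresholded.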

3 Prologue… contd.

4 Prologue: What Next
- Further improve the recognition rate by exploiting speaker accent.
- Speaker accent will play a critical role in the evaluation of biometric systems, since their users will come from many countries.
- Incorporating an accent model into the speaker recognition/verification system will be the key focus of our study.

5 Accent
- What is accent?
  - The cumulative auditory effect of those features of pronunciation that identify where a person is from, regionally and socially.
- Difference between accent and dialect
  - Accent is the influence of a speaker's first language (L1) on a second language (L2). This influence is often viewed negatively, or more charitably as colorful.
  - Dialects, by contrast, are differences in speaking style within a single language, all spoken as an L1, that arise from geographical and ethnic differences.

6 Accent
- Factors affecting the level of accent:
  - Age at which the speaker learns the second language.
  - Nationality of the speaker's language instructor.
  - Grammatical and phonological differences between the primary and secondary languages.
  - Amount of interaction the speaker has with native speakers of the language.

7 Applications of Accent Modeling
- Speech recognition: accent knowledge can be used to select alternative pronunciations or to bias a language model.
- Call centres: accent can be used to profile speakers for call routing.
- Document retrieval systems.
- Speaker recognition systems.

8 Examples of Accent
- Native speakers of American English
- Indian
- Chinese
- British
- Japanese
- Russian
- Arabic
- Greek

9 World’s Major Languages

10 Accent Classification System
- Training (repeated for each accent, 1 to N): Speech Data (Training) → Extract Accent Features → Reference Accent Model 1, …, Reference Accent Model N
- Testing: Speech Data (Testing) → Extract Accent Features → Classification against Reference Accent Models 1 to N → Score
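
The training/testing flow above can be sketched with one reference model per accent and a maximum-likelihood decision. As a simplifying assumption, each reference accent model here is a single diagonal-covariance Gaussian fitted to that accent's training feature vectors; the systems surveyed in this talk use richer models (GMMs, HMMs, STMs, neural networks). Feature extraction is not shown. `train_accent_model`, `score`, and `classify` are hypothetical names.

```python
import math


def train_accent_model(frames, var_floor=1e-3):
    """Fit a diagonal Gaussian (per-dimension mean and variance) by maximum likelihood.

    Assumption: one Gaussian per accent stands in for a real reference accent model.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    var = [max(sum((f[j] - mean[j]) ** 2 for f in frames) / n, var_floor)
           for j in range(dim)]
    return mean, var


def frame_log_likelihood(x, model):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def score(frames, model):
    """Average per-frame log-likelihood of a test utterance under one accent model."""
    return sum(frame_log_likelihood(x, model) for x in frames) / len(frames)


def classify(test_frames, models):
    """Return the accent whose reference model scores highest, plus all scores."""
    scores = {label: score(test_frames, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores
```

Averaging per frame keeps scores comparable across utterances of different lengths.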

11 Accent: Research Work
- M. V. Chan et al., "Classification of speech accents with neural networks," IEEE World Congress on Computational Intelligence, vol. 7, pp. 4483-4486, 27 Jun.-2 Jul. 1994.
- L. M. Arslan, "Foreign Accent Classification in American English," Ph.D. dissertation, Duke University, 1996.
- C. Teixeira, I. Trancoso, and A. Serralheiro, "Accent identification," in Proc. International Conference on Spoken Language Processing, vol. 3, pp. 1784-1787, 1996.
- P. Fung and W. K. Liu, "Fast Accent Identification and Accented Speech Recognition," in Proc. ICASSP'99, vol. 1, pp. 221-224, 1999.
- T. Chen et al., "Automatic accent identification using Gaussian mixture models," in Proc. ASRU'01, pp. 343-346, 9-13 Dec. 2001.
- P. Angkititrakul and J. H. L. Hansen, "Stochastic Trajectory Model Analysis for Accent Classification," in Proc. International Conference on Spoken Language Processing, vol. 1, pp. 493-496, Sept. 2002.
- X. Lin and S. Simske, "Phoneme-less hierarchical accent classification," Signals, Systems and Computers, vol. 2, pp. 1801-1804, 7-10 Nov. 2004.

12 Research Work… contd.
- F. Farahani et al., "Speaker identification using supra-segmental pitch pattern dynamics," in Proc. ICASSP'04, vol. 1, pp. I-89-92, 17-21 May 2004.
- M. M. Tanabian et al., "Automatic speaker recognition with formant trajectory tracking using CART and neural networks," Canadian Conference on Electrical and Computer Engineering, pp. 1225-1228, 1-4 May 2005.
- S. Gray and J. H. L. Hansen, "An integrated approach to the detection and classification of accents/dialects for a spoken document retrieval system," in Proc. ASRU'05, pp. 35-40, 27 Nov.-1 Dec. 2005.
- P. Angkititrakul and J. H. L. Hansen, "Advances in Phone-based Modeling for Automatic Accent Classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 634-646, March 2006.
- K. Bartkova and D. Jouvet, "Using Multilingual Units for Improved Modeling of Pronunciation Variants," in Proc. ICASSP'06, vol. 5, pp. V-1037-V-1040, 14-19 May 2006.
- A. Ikeno and J. H. L. Hansen, "Perceptual Recognition Cues in Native English Accent Variation: Listener Accent, Perceived Accent, and Comprehension," in Proc. ICASSP'06, vol. 1, pp. I-401-I-404, 14-19 May 2006.

13 Accent Classification Tree
- Speech Dataset → Accent Features → Modeling → Classification/Decision
- Accent features: Pitch, Energy, MFCCs, Delta MFCCs, Formants, Formant Trajectories
- Modeling and classification/decision: Stochastic Trajectory Models, Gaussian Mixture Models, Hidden Markov Models, Artificial Neural Networks
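
Delta MFCCs in the tree above are commonly computed as a linear regression over neighbouring frames. The same function applies to energy, pitch, or formant tracks. This sketch uses the common regression formula. The window `N=2` and the replication of edge frames are assumptions, and `with_deltas` is a hypothetical helper.

```python
def deltas(frames, N=2):
    """Regression-based delta coefficients for a list of equal-length feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame replicated at the edges (an assumed edge rule).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for j in range(len(frames[0])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][j]
                prv = frames[max(t - n, 0)][j]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def with_deltas(frames, N=2):
    """Append the delta coefficients to each static feature vector."""
    return [list(c) + d for c, d in zip(frames, deltas(frames, N))]
```

On a linear ramp, interior deltas equal the ramp's slope per frame. Edge frames come out smaller because of the replication.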

14 Foreign Accent Classification in American English: Dataset
- The dataset consists of neutral American English together with German, Spanish, Chinese, Turkish, French, Italian, Hindi, Romanian, Japanese, Persian, and Greek accents.
- All speech was sampled at 8000 Hz.
- In total, 43 speakers were recorded with microphone input and 68 speakers with telephone input, all in a quiet office environment.

15 Formant Frequency Analysis
- Formants are the resonances of the vocal tract. When the tract is modeled as an acoustic tube, they are the frequencies that carry most of the acoustic energy from the source to the output.
- The second and third formants (F2 and F3) are particularly useful for accent classification.
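
The acoustic-tube view gives a quick sanity check on where formants lie. A uniform tube closed at the glottis and open at the lips resonates at odd multiples of c/(4L). This is the textbook idealization of a neutral vowel, not how the surveyed systems measure formants; they estimate formants from the speech signal itself. The tube length of 17.5 cm and speed of sound of 350 m/s are assumed values.

```python
def tube_formants(length_m=0.175, c=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L).

    Assumed defaults: vocal tract length 17.5 cm, speed of sound 350 m/s.
    Values are rounded to 0.1 Hz.
    """
    return [round((2 * n - 1) * c / (4 * length_m), 1) for n in range(1, count + 1)]


print(tube_formants())
```

This prints `[500.0, 1500.0, 2500.0]`. All three resonances lie below the 4 kHz Nyquist limit of speech sampled at 8000 Hz.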

16 Mel Scale vs. Accent Scale

17 Accent Classifier
- The features were 8-dimensional ASCCs (accent-sensitive cepstral coefficients) and energy, together with their delta features.
- The IW-FS, CS-FS, and CS-PS classifiers achieved accuracies of 74.5%, 61.3%, and 68.3%, respectively.
- With 7 to 8 test words, classification accuracy among 4 accents reached 93%.
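
The gain from using 7 to 8 test words fits the idea of pooling evidence across words before deciding. The sketch below sums per-word log-likelihoods for each accent and picks the largest total. It assumes the per-word scores are already available, that every word is scored under the same set of accents, and that words are independent so their log-likelihoods add. The accent labels and numbers are made up.

```python
def classify_over_words(word_scores):
    """word_scores: list of dicts {accent: log-likelihood of one word under that accent}.

    Assumption: words are independent, so per-word log-likelihoods add.
    Returns the best accent and the accumulated totals.
    """
    totals = {}
    for scores in word_scores:
        for accent, ll in scores.items():
            totals[accent] = totals.get(accent, 0.0) + ll
    return max(totals, key=totals.get), totals


words = [
    {"A": -10.0, "B": -9.0},   # this word alone would favour B
    {"A": -8.0, "B": -12.0},
    {"A": -9.0, "B": -9.5},
]
best, totals = classify_over_words(words)
```

Here a single word would pick accent B, while the accumulated evidence picks A.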

18 Computer vs. Humans

19 Conclusions about Specific Features
- Word-final stop release time is longer for foreign-accented speakers.
- The slope of the intonation contour for isolated words is more negative for Chinese speakers, and more positive for German speakers, than for native speakers.
- Voice onset time for unvoiced stops is not a significant contributor for the accents considered in this study.
- Second and third formant positions differ between native and non-native speakers.
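
One way to measure an intonation-contour slope is to fit a least-squares line to a word's pitch (F0) track. This is a sketch under stated assumptions: F0 values in Hz, a fixed 10 ms frame step, unvoiced frames already removed, and at least two frames. The slides do not say how the slope was computed, and the example contour is invented.

```python
def contour_slope(f0_hz, frame_step_s=0.01):
    """Least-squares slope (Hz per second) of an F0 contour.

    Assumptions: one F0 value per frame, frames frame_step_s apart,
    unvoiced frames removed, and len(f0_hz) >= 2.
    """
    n = len(f0_hz)
    t = [i * frame_step_s for i in range(n)]
    t_mean = sum(t) / n
    f_mean = sum(f0_hz) / n
    num = sum((ti - t_mean) * (fi - f_mean) for ti, fi in zip(t, f0_hz))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


falling = [200.0, 198.0, 196.0, 194.0]  # invented contour, 2 Hz drop per 10 ms frame
slope = contour_slope(falling)
```

A falling contour gives a negative slope, about -200 Hz/s here. Speaker groups can then be compared by their average slopes.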

20 Accent Classification/Detection Using ANN
- The neural network inputs combined demographic data with speech features.
  - Demographic data: the speaker's age, the percentage of the day in which English is used for communication, and the number of years English has been spoken.
  - Speech features: average pitch frequency and the averaged first three formant frequencies.
- A dataset of 10 native and 12 non-native speakers was used.
- The F2 and F3 distributions of the native and non-native groups are highly dissimilar.
- Three neural network classification techniques were compared: competitive learning, counter-propagation, and back-propagation.
- Back-propagation gave a detection rate of 100% on the training data and 90.9% on the test data.
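
Back-propagation on these seven inputs (age, share of the day using English, years of English, mean pitch, and F1 to F3) can be sketched with a one-hidden-layer network. This is not the network from the study. The layer size, learning rate, epoch count, cross-entropy loss, and input scaling to [0, 1] are assumptions, and the eight training vectors are invented.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer sigmoid network trained by per-sample back-propagation.

    With a cross-entropy loss the output error term is simply (o - t).
    Returns the weights as (W1, b1, w2, b2).
    """
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k]) for k in range(hidden)]
            o = sigmoid(sum(w * hk for w, hk in zip(w2, h)) + b2)
            # Backward pass: output error, then hidden errors (computed before w2 changes).
            delta_o = o - t
            delta_h = [delta_o * w2[k] * h[k] * (1 - h[k]) for k in range(hidden)]
            for k in range(hidden):
                w2[k] -= lr * delta_o * h[k]
                for j in range(d):
                    W1[k][j] -= lr * delta_h[k] * x[j]
                b1[k] -= lr * delta_h[k]
            b2 -= lr * delta_o
    return W1, b1, w2, b2


def predict(model, x):
    """Network output in (0, 1); > 0.5 is read as 'non-native'."""
    W1, b1, w2, b2 = model
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hk for w, hk in zip(w2, h)) + b2)


# Invented, pre-scaled inputs: [age, English-usage share, years of English, pitch, F1, F2, F3].
X = [
    [0.30, 0.95, 0.90, 0.40, 0.50, 0.60, 0.55],  # native-like (label 0)
    [0.50, 0.90, 0.95, 0.60, 0.45, 0.65, 0.60],
    [0.40, 1.00, 0.85, 0.30, 0.55, 0.60, 0.50],
    [0.60, 0.85, 1.00, 0.50, 0.50, 0.70, 0.65],
    [0.30, 0.20, 0.10, 0.40, 0.50, 0.30, 0.25],  # non-native-like (label 1)
    [0.50, 0.10, 0.20, 0.60, 0.45, 0.35, 0.30],
    [0.40, 0.30, 0.15, 0.30, 0.55, 0.25, 0.20],
    [0.60, 0.25, 0.30, 0.50, 0.50, 0.40, 0.35],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_mlp(X, y)
```

Demographic and acoustic inputs have very different natural ranges, which is why the sketch assumes they are scaled to a common range first.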

21 Phoneme-less Hierarchical Accent Classification
- WSJCAM0 and TIDIGITS were used to train the British and American accent models, respectively.
- IViE and Voicemail were used to test the British and American accents, respectively.
- 13-dimensional MFCCs were used as features, and 64-component Gaussian mixture models were used for modeling.

22 02/09/07iCONS Group Presentation22  Results show an average 7.1% error rate reduction relatively when compared to direct accent classification.

23 Accent Classification Application

24 Advances in Phone-Based Modeling
- Conventional HMMs assume that the feature sequence is produced by a piecewise stationary process.
- They assume that adjacent frames are conditionally independent given the state sequence, i.e., acoustically uncorrelated.
- They also imply that state-dependent duration distributions decrease exponentially (geometrically, in discrete time).
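
The duration assumption follows from the self-loop transition. If a state repeats with probability a at each frame, the probability of staying exactly d frames is a^(d-1)(1 - a). This geometric distribution always peaks at d = 1, which matches real phone durations poorly. The self-loop value 0.8 is an assumed example.

```python
def state_duration_pmf(a, max_d):
    """P(stay exactly d frames), d = 1..max_d, for self-loop probability a."""
    return [a ** (d - 1) * (1 - a) for d in range(1, max_d + 1)]


def expected_duration(a):
    """Mean of the geometric duration distribution: 1 / (1 - a) frames."""
    return 1.0 / (1.0 - a)


pmf = state_duration_pmf(0.8, 5)  # assumed self-loop probability 0.8
```

The probabilities fall monotonically from d = 1, and the mean is 5 frames for a = 0.8.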

25 Why Phone-Based Modeling?
- Capturing the temporal variation of the acoustic signal is an important aspect of speech recognition.
- It provides a better framework for modeling the evolution of the spectral dynamics of speech.
- It gains flexibility and power from whole-segment classification, in contrast to frame-by-frame classification.

26 Trajectories of the phoneme sequence /aa/ - /r/ from the word ‘Target’

27 Stochastic Trajectory Model
- An STM represents the acoustic observations of a phoneme as clusters of trajectories in a parametric space.
- A segment is represented as a sequence of N points, $X = [x_1, x_2, \ldots, x_N]$, where each point $x_i$ is a D-dimensional vector. X is obtained by resampling the original sequence of d frames along a linear time scale.
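
Resampling a d-frame segment to N points along a linear time scale can be sketched as linear interpolation of every feature dimension at N equally spaced positions from the first frame to the last. The interpolation method is an assumption; the slide states only that the d frames are resampled to N points.

```python
def resample_segment(frames, N):
    """Resample a d-frame segment (list of D-dim vectors) to N >= 2 points.

    Assumption: linear interpolation at equally spaced positions on a
    linear time scale from the first frame to the last.
    """
    d = len(frames)
    if d == 1:
        return [list(frames[0]) for _ in range(N)]
    out = []
    for i in range(N):
        pos = i * (d - 1) / (N - 1)   # position in frame units, 0 .. d-1
        lo = int(pos)
        hi = min(lo + 1, d - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

Segments of any duration d map to the same N points. This is what lets every segment be compared against trajectories with a fixed number of points.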

28 Stochastic Trajectory Model
- The resampled N-point sequence X is treated as the underlying trajectory of the original d-frame segment. The pdf of a segment X, given its duration d and segment symbol s, is

  $$p(X \mid d, s) = \sum_{T_k \in \mathcal{T}_s} P(T_k \mid s)\, p(X \mid T_k, d, s)$$

  The terms are defined as follows:
  - $\mathcal{T}_s$ is the set of all trajectory components associated with s.
  - $P(T_k \mid s)$ is the probability of observing trajectory $T_k$ given that the segment is s, with the constraint $\sum_{T_k \in \mathcal{T}_s} P(T_k \mid s) = 1$.
  - $p(X \mid T_k, d, s)$ is the pdf of the vector sequence X given the component trajectory $T_k$, the duration d, and the symbol s.

29 Stochastic Trajectory Model
- The distribution of each sample point on a trajectory is a multivariate Gaussian with mean vector $\mu_{k,i}$ and covariance matrix $\Sigma_{k,i}$. Assuming the points are independent given the trajectory, the pdf is modeled as

  $$p(X \mid T_k, d, s) = \prod_{i=1}^{N} \mathcal{N}(x_i;\, \mu_{k,i}, \Sigma_{k,i})$$

- The training algorithm performs maximum likelihood estimation of the parameters of the Gaussian distributions.
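
Putting the two STM equations together, a segment's score is a mixture over trajectory components. Each component scores the N resampled points with independent Gaussians. This sketch assumes diagonal covariances, which the slides do not require, and works in the log domain with log-sum-exp for numerical stability. The `(weight, means, variances)` layout for a trajectory is a hypothetical data structure.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance (assumed)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def stm_log_likelihood(X, trajectories):
    """log p(X | d, s) for a resampled segment X (N points of dimension D).

    trajectories: list of (weight, means, variances), where weight = P(T_k | s)
    (weights sum to 1) and means/variances each hold N D-dimensional vectors.
    """
    terms = []
    for weight, means, variances in trajectories:
        ll = math.log(weight)
        for x, m, v in zip(X, means, variances):
            ll += log_gauss_diag(x, m, v)  # points independent given the trajectory
        terms.append(ll)
    top = max(terms)  # log-sum-exp over trajectory components
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

To classify an accent, the same segment would be scored against each accent's phone STMs and the higher likelihood chosen.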

30 Accent Classification System

31 Performance: Male and Female Chinese vs. American English

32 Further Investigation
- Further study of accent classification and detection.
- Study of accent from a linguistic point of view.
- Experimentation with, and formulation of, accent modeling and classification.
- Combination of accent information with my previous work to enhance speaker recognition.

33 Questions Thank You

