Published by Shannon Bates. Modified over 9 years ago.

Stress Detection
J.-S. Roger Jang (張智星)
MIR Lab, CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang
2015/9/13

-2- Intro to Stress Detection
- Stress detection (SD) for English
  - Given an English word and its pronunciation
  - Detect the stress position of the pronunciation
- Applications
  - Computer-assisted pronunciation training (CAPT)
- Similar to…
  - Tone recognition in Mandarin Chinese
  - Intonation scoring

-3- Examples of Stress in English Words
- A multisyllabic English word has a stressed syllable
- Examples
  - Dictionary: stressed at syllable 1
  - Tomorrow: stressed at syllable 2
  - International: stressed at syllable 3

-4- Steps in Stress Detection
- Preprocessing
  - Use forced alignment to find vowel locations
- Feature extraction
  - Extract features for each vowel
- Model construction
  - Build a classifier for vowel-based stress detection
- Post-processing
  - Combine the vowel-level results into a word-based stress decision

-5- Forced Alignment (1/2)
- A process that aligns an utterance with its corresponding canonical phonetic transcription
- Example: International
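
To make the idea of forced alignment concrete, here is a minimal sketch in Python. It assumes that a per-frame log-likelihood for every phone is already available (the `frame_loglik` input is hypothetical; in practice it would come from an acoustic model) and that phones are visited left to right without skips. It is a toy dynamic-programming illustration, not the ASRA engine.

```python
def forced_align(frame_loglik, phones):
    """Align frames to a fixed phone sequence (left to right, no skips).

    frame_loglik: list of dicts, one per frame, mapping phone -> log-likelihood
                  (hypothetical input, assumed to come from an acoustic model).
    phones:       canonical phone sequence of the word.
    Returns a list of (phone, start_frame, end_frame) with inclusive ends.
    """
    n_frames, n_phones = len(frame_loglik), len(phones)
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    # score[t][j]: best log-likelihood of frames 0..t when frame t is in phone j
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    # advanced[t][j]: True if the best path entered phone j exactly at frame t
    advanced = [[False] * n_phones for _ in range(n_frames)]
    score[0][0] = frame_loglik[0][phones[0]]
    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            best = max(stay, advance)
            if best == neg_inf:
                continue  # phone j cannot be reached by frame t
            advanced[t][j] = advance > stay
            score[t][j] = best + frame_loglik[t][phones[j]]
    # Backtrack: the last frame must belong to the last phone.
    labels = [0] * n_frames
    j = n_phones - 1
    for t in range(n_frames - 1, -1, -1):
        labels[t] = j
        if advanced[t][j]:
            j -= 1
    # Merge consecutive frames with the same phone index into segments.
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or labels[t] != labels[start]:
            segments.append((phones[labels[start]], start, t - 1))
            start = t
    return segments


# Toy input: the first two frames favor "t", the last two favor "u".
frames = [
    {"t": -1.0, "u": -5.0},
    {"t": -1.0, "u": -5.0},
    {"t": -5.0, "u": -1.0},
    {"t": -5.0, "u": -1.0},
]
print(forced_align(frames, ["t", "u"]))
```

The segments for the vowel phones are what the later feature-extraction step would operate on.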

-6- Forced Alignment (2/2)
- Applications of forced alignment
  - Speech scoring (based on timbre only)
  - Utterance verification
- Our forced alignment engine
  - ASRA (Automatic Speech Recognition & Assessment): for voice command recognition and speech assessment (scoring)

-7- Corpora for Stress Detection
- Merriam-Webster dictionary
  - Website
- Some statistics
  - Number of pronunciations: 21950
  - Usable files: 14994
    - No. of syllables > 1
    - Available in our dictionary
    - Valid output from ASRA
- In-house recordings
  - Recordings from MSAR for several years
  - Available upon request

-8- Speech Corpus for Lexical Stress Detection
- Merriam-Webster Online Dictionary's lexical pronunciations
  - http://www.merriam-webster.com
  - All utterances are pronounced by native speakers

Stress      Number of syllables
position       2      3      4      5      6      7      8
1           5090   2421    465     36      0      1      0
2           1691   1654   1324    147      9      0      0
3              0    348    926    450     27      0      0
4              0      0     34    242     72      4      2
5              0      0      0      1     30     11      0
6              0      0      0      0      0      7      0
7              0      0      0      0      0      0      0
Total       6781   4423   2749    876    138     23      2

Total utterances: 14992
Total syllables: 43212
Stressed syllables: 14992
Unstressed syllables: 28220
Stressed : unstressed = 1 : 1.9
Sample rate: 16000 Hz
Resolution: 16 bits
Channel: mono
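
The summary figures follow directly from the stress-position table. The short Python check below recomputes them from the per-cell counts, assuming (as the slide's totals imply) that every word contributes exactly one stressed syllable.

```python
# Rows: stress position 1..7; columns: words with 2..8 syllables.
counts = [
    [5090, 2421, 465, 36, 0, 1, 0],
    [1691, 1654, 1324, 147, 9, 0, 0],
    [0, 348, 926, 450, 27, 0, 0],
    [0, 0, 34, 242, 72, 4, 2],
    [0, 0, 0, 1, 30, 11, 0],
    [0, 0, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
syllable_counts = range(2, 9)

# Column sums give the number of words for each word length.
words_per_length = [sum(column) for column in zip(*counts)]
total_words = sum(words_per_length)
total_syllables = sum(n * w for n, w in zip(syllable_counts, words_per_length))
stressed = total_words                  # one stressed syllable per word
unstressed = total_syllables - stressed
ratio = unstressed / stressed

print(words_per_length)   # [6781, 4423, 2749, 876, 138, 23, 2]
print(total_words, total_syllables, unstressed, round(ratio, 1))
# 14992 43212 28220 1.9
```

The roughly 1 : 1.9 class imbalance is worth keeping in mind when reading per-vowel accuracy figures.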

-9- Stress Detection based on Vowel Classification
- SD is based on vowel classification due to the following observations
  - Each word has a stressed syllable
  - Each syllable is usually composed of a consonant and a vowel
  - Vowels are always voiced (have pitch)
- Therefore
  - Each vowel is classified as “unstressed” or “stressed”
  - To determine the stressed syllable in an utterance, pick the vowel with
    - Max likelihood of the class “Stressed”
    - Min likelihood of the class “Unstressed”
    - Max difference of the above two
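
The three word-level decision criteria can be sketched as follows. This is an illustration only: the function name, the input layout (one pair of log-likelihoods per vowel), and the example scores are assumptions, not taken from the slides.

```python
def pick_stressed_syllable(vowel_scores, rule="difference"):
    """Choose the stressed vowel of one word.

    vowel_scores: list of (loglik_stressed, loglik_unstressed), one per vowel,
                  as produced by some vowel-level classifier (hypothetical).
    rule: "max_stressed", "min_unstressed", or "difference".
    Returns the 0-based index of the vowel judged to be stressed.
    """
    if rule == "max_stressed":
        keys = [stressed for stressed, _ in vowel_scores]
    elif rule == "min_unstressed":
        keys = [-unstressed for _, unstressed in vowel_scores]
    elif rule == "difference":
        keys = [stressed - unstressed for stressed, unstressed in vowel_scores]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(len(keys)), key=keys.__getitem__)


# Made-up scores for a three-syllable word.
scores = [(-4.0, -1.0), (-1.5, -3.0), (-1.0, -0.5)]
print(pick_stressed_syllable(scores, "max_stressed"))    # 2
print(pick_stressed_syllable(scores, "min_unstressed"))  # 1
print(pick_stressed_syllable(scores, "difference"))      # 1
```

The example shows that the criteria can disagree: the third vowel has the highest "stressed" score, but it looks even more like an unstressed vowel, so the difference rule prefers the second one.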

-10- Features for Vowels
- Vowel-based features
  - Pitch: min, mean, max, range, std, slope, etc.
  - Volume: min, mean, max, range, std, slope, etc.
  - Duration (normalized by speech rate)
  - Legendre polynomial fitting for pitch & volume
  - Spectrally emphasized versions of the above
  - …
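
As a rough sketch of how such contour features could be computed for a single vowel, the Python snippet below uses NumPy. The function name, the choice of a third-order Legendre fit, and the rescaling of time to [-1, 1] are assumptions made for illustration; the slides do not specify these details. The same function would apply to a pitch contour or a volume contour.

```python
import numpy as np
from numpy.polynomial import legendre


def contour_features(contour, legendre_order=3):
    """Summary statistics and Legendre coefficients of one vowel's contour.

    contour: per-frame pitch (or volume) values within the vowel segment;
             it must contain more than `legendre_order` frames.
    """
    y = np.asarray(contour, dtype=float)
    # Legendre polynomials are defined on [-1, 1], so map frame times there.
    x = np.linspace(-1.0, 1.0, len(y))
    slope = np.polyfit(x, y, 1)[0]            # slope per unit of normalized time
    coefs = legendre.legfit(x, y, legendre_order)
    return {
        "min": y.min(),
        "mean": y.mean(),
        "max": y.max(),
        "range": y.max() - y.min(),
        "std": y.std(),
        "slope": slope,
        "legendre": coefs,                    # coefficients, lowest order first
    }


# A made-up pitch contour rising linearly from 100 to 120.
features = contour_features([100 + 2 * i for i in range(11)])
print(features["slope"], features["legendre"])
```

Because time is normalized per vowel, the Legendre coefficients describe the shape of the contour independently of the vowel's duration, which is captured by a separate feature.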

-11- Lexical Stress Detection – Experiment 1
- Feature set
  - E: Root Mean Square Energy
  - D: Duration
  - P: Pitch
  - S: Root Mean Square Spectral Emphasis Energy
  - PS: Pitch Slope
  - CE: Legendre Coefficients of Root Mean Square Energy Contour
  - CP: Legendre Coefficients of Pitch Contour
  - CS: Legendre Coefficients of Spectral Emphasis Energy Contour
- 10-fold cross validation
- Classifier: SVM

-12-
2-syllable words
         1st      2nd
1st    95.13%    4.87%
2nd    25.67%   74.33%

3-syllable words
         1st      2nd      3rd
1st    96.08%    3.10%    0.83%
2nd     8.28%   86.58%    5.14%
3rd    31.90%    5.75%   62.36%

4-syllable words
         1st      2nd      3rd      4th
1st    96.13%    2.37%    1.51%       0%
2nd     8.91%   87.76%    2.34%    0.98%
3rd    21.62%    2.46%   73.95%    0.97%
4th    38.24%    5.88%    2.94%   52.94%

5-syllable words
         1st      2nd      3rd      4th      5th
1st      100%       0%
2nd     8.16%   88.44%    2.72%    0.68%       0%
3rd    19.33%    1.78%   76.67%    1.78%    0.44%
4th    13.64%   13.22%    2.48%   70.66%       0%
5th      100%       0%
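
Each row of these tables sums to about 100%, which suggests they are row-normalized confusion matrices; the slides do not say whether rows are the true or the detected stress position, so the sketch below simply assumes rows are the true position. The counts in the example are made up and are not the experiment's data.

```python
def row_percentages(confusion_counts):
    """Convert confusion counts to row-wise percentages (each row sums to 100)."""
    result = []
    for row in confusion_counts:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result


def overall_accuracy(confusion_counts):
    """Fraction of all words that fall on the diagonal."""
    total = sum(sum(row) for row in confusion_counts)
    correct = sum(row[i] for i, row in enumerate(confusion_counts))
    return correct / total


# Hypothetical counts for 2-syllable words (rows assumed to be true positions).
counts_2syl = [
    [180, 20],
    [25, 75],
]
print(row_percentages(counts_2syl))   # [[90.0, 10.0], [25.0, 75.0]]
print(overall_accuracy(counts_2syl))  # 0.85
```

Row percentages hide how many words each row represents, so a row with very few words (such as a late stress position in long words) can show an extreme value like 100% or 0%.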

-13- Lexical Stress Detection – Experiment 2
- 10-fold cross validation
- Classifier: SVM
- Syllable-number-independent classifier vs. syllable-number-dependent classifier
- Feature set
  - Max. Root Mean Square Energy
  - Mean Root Mean Square Energy
  - Max. Pitch
  - Median Pitch
  - Duration
  - Max. Spectral Emphasis Root Mean Square Energy
  - Mean Spectral Emphasis Root Mean Square Energy
  - Pseudo-Slope of Pitch Contour
  - Legendre Polynomial Coefficients of Pitch Contour
  - Legendre Polynomial Coefficients of RMS Energy Contour
  - Legendre Polynomial Coefficients of Spectral Emphasis RMS Energy

-14- Lexical Stress Detection – Experiment 3
- Classifiers
  - GMMC: Gaussian Mixture Model Classifier
  - NBC: Naïve Bayes Classifier
  - QC: Quadratic Classifier
  - SVMC: Support Vector Machine Classifier
- Feature set
  - Max. Root Mean Square Energy
  - Mean Root Mean Square Energy
  - Max. Pitch
  - Median Pitch
  - Duration
  - Max. Spectral Emphasis Root Mean Square Energy
  - Mean Spectral Emphasis Root Mean Square Energy
  - Pseudo-Slope of Pitch Contour
  - Legendre Polynomial Coefficients of Pitch Contour
  - Legendre Polynomial Coefficients of RMS Energy Contour
  - Legendre Polynomial Coefficients of Spectral Emphasis RMS Energy
- 10-fold cross validation
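
The evaluation protocol (10-fold cross validation of a vowel classifier) can be sketched with the standard library alone. The example below uses a small Gaussian naïve Bayes classifier as a stand-in for the NBC entry; it is not the classifier implementation used in the experiments, the two-dimensional data are synthetic, and the folds are shuffled but not stratified.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Per class: log prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6  # floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def cross_validate(X, y, k=10, seed=0):
    """Mean accuracy over k folds of a shuffled (non-stratified) split."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Synthetic, well-separated two-feature data (not the corpus from the slides).
rng = random.Random(1)
X, y = [], []
for _ in range(100):
    X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append("unstressed")
    X.append([rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)])
    y.append("stressed")
accuracy = cross_validate(X, y)
print(f"mean 10-fold accuracy: {accuracy:.3f}")
```

Swapping in another classifier only requires replacing the fit and predict functions, which is how a comparison across GMMC, NBC, QC, and SVMC on the same folds could be organized.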

-15- Lexical Stress Detection – Error Analysis
- Error types:
  1. Wrong ground truth / more than one pronunciation of the word
     - conduct (2 syllables): [kənˋdʌkt] / [ˋkɑndʌkt]
  2. Complex word with two primary-stressed syllables
     - worldwide (2 syllables): [ˋwɝldˋwaɪd]
     - histochemistry (5 syllables): [ˋhɪstəˋkɛmɪstrɪ]
  3. Word with both a primary-stressed and a secondary-stressed syllable
     - deposition (4 syllables): [͵dɛpəˋzɪʃən]
     - cafeteria (5 syllables): [͵kæfəˋtɪrɪə]

-16- Lexical Stress Detection – Error Analysis
- Error types:
  4. Wrong result from pitch tracking
     - elegant (3 syllables): [ˋɛləgənt]
  5. Wrong result from forced alignment
     - peremptory (4 syllables): [pəˋrɛmptərɪ]

-17- More on Stress Detection
- ASRA
  - Chapter 20 of the online tutorial on Audio Signal Processing
  - Demo
    - Recognition: goDemoVc.m in ASR Web
    - Assessment: goDemoSa.m in ASR Web
- Stress detection
  - Application note
  - Demo