Published by Jasmine White. Modified over 8 years ago.
1
Yow-Bang Wang, Lin-Shan Lee INTERSPEECH 2010 Speaker: Hsiao-Tsung Hung
3
Introduction Tone recognition is influenced by at least the following: 1. Speaker 2. The “prosodic state” 3. Co-articulation effects
4
Introduction Although tones depend heavily on many intra-syllabic and prosodic behaviors that are speaker dependent, native speakers of Mandarin can easily recognize them. This implies that tones can be classified by some “robust” prosodic cues, which remain useful across many different conditions.
5
Introduction In this paper we try to introduce robustness into prosodic features through different feature normalization schemes, based on the affine invariance property proposed in recent years. We also combine the prosodic features with context information through a tone posteriorgram, analogous to the TANDEM system for speech recognition.
7
Prosodic feature set
Category    Feature                                                  Num
Pitch       Mean and slope (3 segments)                              6
            Mean and slope (preceding and following syllable)        4
            First frame, last frame, minimal, maximal pitch value    4
            The last voiced frame pitch of preceding syllable        1
            The first voiced frame pitch of following syllable       1
            (feature description not recoverable)                    1
Duration    Duration ratio with two adjacent syllables               2
Energy      Log-energy difference with two adjacent syllables        2
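As an illustration of the first row of the feature set, the sketch below computes a mean and a slope for each of three segments of a pitch contour, giving 6 values. The equal-length segmentation, the least-squares slope, and the names `slope` and `segment_mean_slope` are assumptions made for this example; the slides do not say how the segments or slopes were actually computed.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against frame index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def segment_mean_slope(pitch, n_segments=3):
    """Split a contour into equal segments (assumption) and return
    [mean, slope] for each segment, flattened into one list.
    Expects at least n_segments frames."""
    n = len(pitch)
    feats = []
    for k in range(n_segments):
        seg = pitch[k * n // n_segments:(k + 1) * n // n_segments]
        feats.extend([mean(seg), slope(seg)])
    return feats


# Hypothetical voiced pitch contour in Hz, one value per frame.
contour = [200.0, 202.0, 204.0, 210.0, 210.0, 210.0, 208.0, 204.0, 200.0]
features = segment_mean_slope(contour)
print(features)
```

The same function applied to the preceding and following syllables would cover the second row of the feature set, under the same assumptions.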
8
Affine Invariance property
10
Affine invariance for normalized pitch features
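The formulas of this slide did not survive in the transcript, so the following is only a sketch of the general idea. It assumes a per-contour z-score normalization, which is one common scheme and not necessarily the one used in the paper, and checks numerically that the normalized contour is unchanged when the raw pitch is mapped through an affine transform a*x + b with a > 0. The values of `a`, `b`, and `pitch` are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Subtract the mean and divide by the standard deviation.
    Requires a non-constant contour (non-zero deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical pitch contour in Hz.
pitch = [180.0, 195.0, 210.0, 205.0, 190.0]

# Hypothetical affine transform, e.g. a different speaker's pitch range.
a, b = 1.5, -40.0
transformed = [a * p + b for p in pitch]

original_norm = zscore(pitch)
transformed_norm = zscore(transformed)

# Largest difference between the two normalized contours.
max_gap = max(abs(x - y) for x, y in zip(original_norm, transformed_norm))
print(max_gap)
```

The gap is zero up to floating-point rounding because the offset b cancels in the mean subtraction and the scale a cancels in the division by the standard deviation.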
13
Invariance of duration and energy features
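The duration and energy features listed in the feature set are a ratio and a log difference, and both are unaffected by a constant scale factor. The sketch below checks this numerically. The direction of the ratio (current over neighbor), the function names, and the `rate` and `gain` values are assumptions for illustration; the slide content itself is not available in the transcript.

```python
import math


def duration_ratios(prev_dur, cur_dur, next_dur):
    """Duration of the current syllable relative to each neighbor."""
    return [cur_dur / prev_dur, cur_dur / next_dur]


def log_energy_diffs(prev_e, cur_e, next_e):
    """Log-energy of the current syllable minus that of each neighbor."""
    return [math.log(cur_e) - math.log(prev_e),
            math.log(cur_e) - math.log(next_e)]


# Hypothetical syllable durations (seconds) and energies (arbitrary units).
durs = (0.18, 0.24, 0.20)
energies = (0.5, 0.9, 0.7)

rate = 0.8  # hypothetical speaking-rate factor applied to all durations
gain = 3.0  # hypothetical recording gain applied to all energies

dur_gap = max(abs(x - y) for x, y in zip(
    duration_ratios(*durs),
    duration_ratios(*(rate * d for d in durs))))
energy_gap = max(abs(x - y) for x, y in zip(
    log_energy_diffs(*energies),
    log_energy_diffs(*(gain * e for e in energies))))
print(dur_gap, energy_gap)
```

Both gaps vanish up to rounding: the rate factor cancels in the ratio, and the gain becomes an additive constant in the log domain that cancels in the difference.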
14
Pitch contour normalization schemes
15
Tone recognition Input: 21-dimensional prosodic feature vector. Classifier: SVM. Enh1: current syllable. Enh2: current, preceding and following syllables.
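One possible reading of Enh1 and Enh2, sketched here under stated assumptions, is that the prosodic vector of a syllable is extended with tone posteriors from a first-pass recognizer: those of the current syllable only (Enh1), or those of the current, preceding and following syllables (Enh2). The function `enhanced_vector`, the choice of 5 tone classes, and the uniform padding at utterance edges are all hypothetical and are not confirmed by the slides.

```python
def enhanced_vector(prosodic, posteriors, i, use_context):
    """Concatenate the prosodic features of syllable i with tone
    posteriors of syllable i and, optionally, of its two neighbors."""
    n_classes = len(posteriors[i])
    # Uniform posterior used as padding at utterance edges (assumption).
    uniform = [1.0 / n_classes] * n_classes
    vec = list(prosodic[i]) + list(posteriors[i])
    if use_context:
        prev_p = posteriors[i - 1] if i > 0 else uniform
        next_p = posteriors[i + 1] if i + 1 < len(posteriors) else uniform
        vec += list(prev_p) + list(next_p)
    return vec


# Hypothetical utterance of 3 syllables: 21 prosodic features each,
# and posteriors over 5 tone classes (assumed) from a first-pass model.
prosodic = [[0.0] * 21 for _ in range(3)]
posteriors = [
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.05, 0.05],
]

enh1 = enhanced_vector(prosodic, posteriors, 1, use_context=False)
enh2 = enhanced_vector(prosodic, posteriors, 1, use_context=True)
print(len(enh1), len(enh2))
```

Under this reading, the extended vectors would be passed to a second SVM in place of the raw 21-dimensional input.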
17
Corpus and experiment setup Corpus: Sinica Continuous Speech Prosody Corpora (COSPRO), containing 4672 utterances (more than 60,000 syllables) produced by 38 male and 40 female native speakers. Classifiers: SVM tone recognizers.
18
Experimental results