Yow-Bang Wang, Lin-Shan Lee INTERSPEECH 2010 Speaker: Hsiao-Tsung Hung.

1 Yow-Bang Wang, Lin-Shan Lee INTERSPEECH 2010 Speaker: Hsiao-Tsung Hung

3 Introduction: Tone recognition is definitely influenced by at least the following: 1. The speaker 2. The “prosodic state” 3. Co-articulation effects

4 Introduction: Although the tones depend heavily on many intra-syllabic and prosodic behaviors, which are definitely speaker dependent, native speakers of Mandarin can easily recognize the tones. This implies that the tones should be classified by some “robust” prosodic cues, which remain useful across many different conditions.

5 Introduction: In this paper we try to introduce robustness into the prosodic features through different feature normalization schemes, based on the concept of the affine invariance property proposed in recent years. We also combine the prosodic features with context information through a tone posteriorgram, analogous to the TANDEM system for speech recognition.

7 Prosodic feature set
Category | Feature | Num
Pitch | Mean and slope (3 segments) | 6
 | Mean and slope (preceding and following syllable) | 4
 | First frame, last frame, minimal, maximal pitch value | 4
 | The last voiced frame pitch of preceding syllable | 1
 | The first voiced frame pitch of following syllable | 1
 | (feature name not recoverable from the transcript) | 1
Duration | Duration ratio with two adjacent syllables | 2
Energy | Log-energy difference with two adjacent syllables | 2
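
As a rough illustration of the "mean and slope (3 segments)" pitch features, the sketch below computes a mean and a least-squares slope over three segments of one syllable's pitch contour, giving 6 values. The split into near-equal thirds, the function names `mean_and_slope` and `three_segment_features`, and the sample contour are assumptions made for illustration; the slides do not say how the segments are defined.

```python
def mean_and_slope(values):
    """Return (mean, least-squares slope) for values sampled at frames 0..n-1."""
    n = len(values)
    mean_y = sum(values) / n
    if n < 2:
        return mean_y, 0.0  # a single frame has no defined slope; use 0.0
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    return mean_y, sxy / sxx


def three_segment_features(pitch):
    """Split a pitch contour into three near-equal segments (an assumption)
    and return [mean1, slope1, mean2, slope2, mean3, slope3]."""
    n = len(pitch)
    if n < 3:
        raise ValueError("need at least three voiced frames")
    bounds = [0, n // 3, 2 * n // 3, n]
    feats = []
    for start, end in zip(bounds, bounds[1:]):
        feats.extend(mean_and_slope(pitch[start:end]))
    return feats


# Hypothetical rising contour in Hz, one value per voiced frame.
contour = [100, 102, 104, 106, 108, 110, 112, 114, 116]
features = three_segment_features(contour)
```

For the hypothetical contour above, each segment rises by 2 Hz per frame, so the three slopes are equal and only the segment means differ.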

8 Affine Invariance property

10 Affine invariance for normalized pitch features
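
The normalization formulas from this slide were not preserved in the transcript. As a generic illustration of what affine invariance means for a normalized pitch feature, the sketch below uses z-score normalization (a stand-in chosen here, not necessarily one of the schemes in the paper) and checks that a contour x and its affine transform a*x + b, with a > 0, normalize to the same values. The names `zscore` and `affine` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(contour):
    """Subtract the mean and divide by the population standard deviation."""
    m = mean(contour)
    s = pstdev(contour)
    if s == 0:
        raise ValueError("contour is constant; z-score is undefined")
    return [(v - m) / s for v in contour]


def affine(contour, a, b):
    """Apply the affine map v -> a * v + b to every pitch value."""
    return [a * v + b for v in contour]


# Hypothetical pitch contour (Hz) and an affine-transformed copy of it,
# standing in for the same tone produced by a different speaker.
original = [180.0, 195.0, 210.0, 200.0, 170.0]
transformed = affine(original, 0.6, -15.0)

# Largest difference between the two normalized contours.
max_diff = max(abs(p - q) for p, q in zip(zscore(original), zscore(transformed)))
```

The shift b cancels when the mean is subtracted and the scale a cancels when dividing by the standard deviation, so `max_diff` is zero up to floating-point rounding.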

13 Invariance of duration and energy features
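
The body of this slide is missing from the transcript, but the two feature definitions in the feature set already suggest why they are robust: a ratio of durations does not change when every duration is scaled by the same factor (for example, a faster or slower speaking rate), and a difference of log-energies does not change when every energy is scaled by the same gain. The sketch below checks both facts numerically. The function names, the direction of the ratio (current syllable over neighbor), and the sample values are assumptions for illustration.

```python
import math


def duration_ratios(prev_dur, cur_dur, next_dur):
    """Duration of the current syllable relative to each adjacent syllable."""
    return cur_dur / prev_dur, cur_dur / next_dur


def log_energy_diffs(prev_e, cur_e, next_e):
    """Log-energy of the current syllable minus that of each adjacent syllable."""
    return (math.log(cur_e) - math.log(prev_e),
            math.log(cur_e) - math.log(next_e))


# Hypothetical durations (seconds) and energies for three adjacent syllables.
durations = (0.20, 0.30, 0.25)
energies = (1.5, 2.4, 0.9)

rate = 1.5   # uniform change in speaking rate (assumed)
gain = 4.0   # uniform change in recording gain (assumed)

ratios = duration_ratios(*durations)
ratios_scaled = duration_ratios(*(rate * d for d in durations))

diffs = log_energy_diffs(*energies)
diffs_scaled = log_energy_diffs(*(gain * e for e in energies))
```

Both pairs of results agree up to floating-point rounding, which is the invariance the slide title refers to under these assumptions.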

14 Pitch contour normalization schemes

15 Tone recognition: 21-dimensional prosodic feature vector; SVM; Enh1: current syllable; Enh2: current, preceding and following syllable
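
The diagram for this slide did not survive, so the following is only one plausible reading of it: a first-stage SVM maps each 21-dimensional prosodic vector to tone posteriors, and the enhanced feature vector appends the posteriors of the current syllable (Enh1) or of the current, preceding and following syllables (Enh2). The sketch shows just the feature concatenation, with the posteriors taken as given. The function name `enhance`, the number of tone classes, and the uniform padding at utterance boundaries are all assumptions, not details confirmed by the slides.

```python
N_TONES = 5  # assumption: four lexical tones plus the neutral tone


def enhance(prosodic, posteriors, index, use_context):
    """Build an enhanced feature vector for the syllable at `index`.

    prosodic   -- list of per-syllable prosodic feature vectors
    posteriors -- list of per-syllable tone posterior vectors
    use_context=False appends only the current syllable's posteriors (Enh1);
    use_context=True also appends the preceding and following ones (Enh2).
    """
    # Padding for the first and last syllable of an utterance (assumption).
    uniform = [1.0 / N_TONES] * N_TONES
    feats = list(prosodic[index]) + list(posteriors[index])
    if use_context:
        prev_post = posteriors[index - 1] if index > 0 else uniform
        next_post = posteriors[index + 1] if index + 1 < len(posteriors) else uniform
        feats += list(prev_post) + list(next_post)
    return feats


# Hypothetical three-syllable utterance with placeholder feature values.
prosodic = [[0.0] * 21 for _ in range(3)]
posteriors = [[0.2] * 5, [0.1, 0.6, 0.1, 0.1, 0.1], [0.2] * 5]

enh1 = enhance(prosodic, posteriors, 1, use_context=False)
enh2 = enhance(prosodic, posteriors, 1, use_context=True)
```

Under these assumptions Enh1 has 21 + 5 values and Enh2 has 21 + 15 values per syllable; a second classifier would then be trained on these vectors.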

17 Corpus and experiment setup: Sinica Continuous Speech Prosody Corpora (COSPRO). Contains 4672 utterances (more than 60,000 syllables), produced by 38 male and 40 female native speakers. SVM tone recognizers.

18 Experimental results

