Published by Clement Lyons. Modified over 8 years ago.
1
Data-Driven Intonation Modeling Using a Neural Network and a Command Response Model
Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center)
Nobuaki Minematsu (Dep. of Comm. Eng., The Univ. of Tokyo, Japan)
Keikichi Hirose (Dep. of Frontier Eng., The Univ. of Tokyo, Japan)
2
Introduction to Corpus-Based Intonation Modeling
Traditional approach: rules derived from linguistic expertise
–Human-dependent, too complicated, and not satisfactory, because the phenomena involved are not completely understood
Corpus-based approach: modeling derived from statistical analysis of speech corpora
–Automatic, and with the potential to improve as better and larger speech corpora become available
3
Outline of the Method
[Diagram labels: Text, Linguistic info., Input parameters, Neural network, F0 Model parameters]
4
F0 Contour Model
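The slide text does not give the model equations. As a minimal sketch, assuming the command response model meant here is the standard Fujisaki formulation (log F0 = log baseline + phrase components + accent components), the contour could be generated as below. The constants `ALPHA`, `BETA`, and `GAMMA`, the baseline `fb`, and all command values are illustrative assumptions, not values taken from the presentation.

```python
import math

# Assumed typical constants; the slides do not state the values used.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times`.

    phrase_cmds: list of (A_p, t_0) pairs.
    accent_cmds: list of (A_a, t_1, t_2) triples.
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for a_p, t0 in phrase_cmds:
            log_f0 += a_p * phrase_response(t - t0)
        for a_a, t1, t2 in accent_cmds:
            log_f0 += a_a * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, fb=120.0,
                     phrase_cmds=[(0.5, 0.0)],
                     accent_cmds=[(0.4, 0.3, 0.8)])
```

In this formulation a whole accentual phrase is described by a handful of numbers (A_p, t_0, A_a, t_1, t_2), which is the property that makes the parameters a compact prediction target.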
5
Neural Network and the F0 Model
The F0 Model provides:
–Direct association with physical quantities, expressing the F0 contour with a small number of parameters
–Relatively good correspondence with syntactic structure
A neural network was used because:
–The mapping is nonlinear
–The problem admits multiple solutions
–It is robust to ambiguities and imprecision
6
Neural Network Structure
(a) Elman network: input layer, hidden layer, output layer, context layer
(b) Jordan network: input layer, hidden layer, output layer, state layer
(c) Multi-layer perceptron (MLP): input layer, hidden layer, output layer
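To make the role of the context layer concrete, here is a minimal sketch of one forward step of an Elman network, in which the context layer holds a copy of the previous hidden activations. The layer sizes, the tanh activation, the linear output layer, the absence of bias terms, and the random weights are all assumptions for illustration; the slides do not specify them.

```python
import math
import random


def elman_step(x, context, w_in, w_ctx, w_out):
    """Run one time step and return (output, new_context)."""
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * ci for w, ci in zip(w_ctx[j], context))
        )
        for j in range(len(w_in))
    ]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    # The new context is a copy of the current hidden layer.
    return output, list(hidden)


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_hidden, n_out = 4, 10, 6  # hypothetical sizes
w_in = random_matrix(n_hidden, n_in, rng)
w_ctx = random_matrix(n_hidden, n_hidden, rng)
w_out = random_matrix(n_out, n_hidden, rng)

# Process a sequence of two input vectors, one per accentual phrase.
context = [0.0] * n_hidden
sequence = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
outputs = []
for x in sequence:
    y, context = elman_step(x, context, w_in, w_ctx, w_out)
    outputs.append(y)
```

A Jordan network differs in that the state layer is fed from the previous output rather than the previous hidden layer, and an MLP has no recurrent input at all.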
7
Input Features
Input feature:
–Position of accentual phrase within utterance
–No. of morae in accentual phrase
–Accent type of accentual phrase
–No. of words in accentual phrase
–POS category of first word
–Conjugation category of first word
–Conjugation of first word
–POS category of last word
–Conjugation category of last word
–Conjugation of last word
Number of classes (in the order listed on the slide): 18, 15, 9, 8, 37, 7, 37, 7
8
Example sentence: Isshuukanbakari nyuuyookuo shuzaishita. (一週間ばかりニューヨークを取材した, "(I) reported on New York for about a week.")
Input features for the accentual phrase “nyuuyookuo”:
–Position of accentual phrase within utterance: 2
–No. of morae in accentual phrase: 6
–Accent type of accentual phrase: 3
–No. of words in accentual phrase: 2
–POS, conjugation type/category of first word: noun/-/-
–POS, conjugation type/category of last word: particle/-/-
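The slides do not say how these categorical features are presented to the network. One common choice, shown here only as an assumed sketch, is one-hot encoding of each feature followed by concatenation. The class counts 18, 15, 9, and 8 are assumed to belong to the first four features in slide order, the lowest value of each feature (1 for counts and positions, 0 for accent type) is also an assumption, and the word-level features are omitted because their class counts cannot be matched unambiguously.

```python
def one_hot(index, n_classes):
    """Return a list of n_classes zeros with a single 1.0 at `index`."""
    if not 0 <= index < n_classes:
        raise ValueError(f"index {index} outside 0..{n_classes - 1}")
    vec = [0.0] * n_classes
    vec[index] = 1.0
    return vec


# (feature name, number of classes, lowest feature value); all hypothetical.
FEATURES = [
    ("position", 18, 1),
    ("n_morae", 15, 1),
    ("accent_type", 9, 0),
    ("n_words", 8, 1),
]


def encode(phrase):
    """Concatenate the one-hot codes of the features of one accentual phrase."""
    vec = []
    for name, n_classes, lowest in FEATURES:
        vec.extend(one_hot(phrase[name] - lowest, n_classes))
    return vec


# The accentual phrase "nyuuyookuo" from the example above.
x = encode({"position": 2, "n_morae": 6, "accent_type": 3, "n_words": 2})
```

With these four features the input vector has 18 + 15 + 9 + 8 = 50 elements, exactly four of which are set.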
9
Output Features
[Figure: command timeline (A_p, A_a, t_0, t_1, t_2) shown against the waveform, with the time points t_A, t_B, t_C, t_D and the mora containing the accent nucleus marked]
10
Output feature: type
–Phrase command magnitude (A_p): continuous
–Accent command amplitude (A_a): continuous
–Phrase command delay (t_0off): continuous
–Accent command onset delay (t_1off): continuous
–Accent command reset delay (t_2off): continuous
–Phrase command flag: binary
11
Notation:
phrase command flag: 1 if a phrase command exists at the accentual phrase
t_0off = t_A - t_0
t_1off = t_A - t_1 (accent type 1) or t_B - t_1 (others)
t_2off = t_C - t_2
t_A: beginning of accentual phrase at segmental level
t_B: beginning of 2nd mora
t_C: accent nucleus at segmental level
t_D: end of accentual phrase at segmental level
t_0, t_1, t_2: F0 Model parameters
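The offset definitions above can be sketched as a pair of helper functions: one turns command times into the offsets used as training targets, the other turns predicted offsets back into command times. The function names and the example times (in seconds) are hypothetical.

```python
def timing_offsets(accent_type, t_a, t_b, t_c, t0, t1, t2):
    """Compute (t_0off, t_1off, t_2off) from segmental times and command times."""
    t0_off = t_a - t0
    # Accent type 1 is referenced to the phrase start, others to the 2nd mora.
    t1_ref = t_a if accent_type == 1 else t_b
    t1_off = t1_ref - t1
    t2_off = t_c - t2
    return t0_off, t1_off, t2_off


def command_times(accent_type, t_a, t_b, t_c, t0_off, t1_off, t2_off):
    """Invert timing_offsets: recover (t_0, t_1, t_2) from the offsets."""
    t1_ref = t_a if accent_type == 1 else t_b
    return t_a - t0_off, t1_ref - t1_off, t_c - t2_off


# Hypothetical accentual phrase of accent type 3.
offsets = timing_offsets(3, t_a=1.00, t_b=1.15, t_c=1.40,
                         t0=0.80, t1=1.10, t2=1.45)
recovered = command_times(3, 1.00, 1.15, 1.40, *offsets)
```

Expressing timing relative to segmental landmarks keeps the targets independent of where the phrase falls in the utterance. When the phrase command flag is 0 there is no t_0, so t_0off would not be used for that phrase.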
12
Training
–Training data: 388 sentences (2803 accentual phrases)
–Validation data: 50 sentences (317 accentual phrases)
–Test data: 48 sentences (262 accentual phrases)
–Epochs (cycles): 12 to 30
13
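Experimental Results (1): Predicting the Occurrence of Phrase Commands
[Table: results for the Elman10, Elman20, and Elman50 networks; column headers and cell boundaries not recoverable. Raw values: Elman10: 8130372.19; Elman20: 8229372.22; Elman50: 7833372.11]
No. of phrase commands: 111
No. of non-phrasal boundaries: 151
Total no. of prosodic boundaries: 262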
14
Experimental Results (2): Phrase Command Parameters
15
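Experimental Results (3): Accent Command Parameters
[Table: results for the Elman20 and Elman50 networks; column headers and cell boundaries not recoverable. Raw values: Elman20: 284.74.8; Elman50: 284.44.6]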
16
Experimental Results (4): Comparison with Natural F0 Contours
[Table: column header only partly recoverable ("F0")]
Elman10: 0.214
Elman20: 0.211
Elman50: 0.232
17
A practical example
18
Comments
Neural network modeling of F0 Model parameters offers the following advantages:
–Data-driven modeling
–Relatively good perceptual results
However:
–Still prone to errors
–Neural network modeling is still poorly understood
From now on:
–Compare with other methods (tree regression, etc.)
–Gain deeper insight into the effects of each input feature