Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight
Speaker: Jr-Feng Huang
PI: Chiu-yu Tseng
Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Subproject 5: "Research on Prosodic Attributes and Speech Event Detection"
2011/07/12, Jr-Feng Huang, NGASR 2011 Summer Workshop
Outline
– Research Direction
– Introduction
– Speech Materials
– Discourse Prosodic Attributes
– Analysis of Prosodic Boundaries
– Analysis of Prosodic Highlights
– Findings So Far
Research Direction
Argument: the prosody model includes
– Discourse structure (DS): groups phrases and utterances into speech paragraphs and spoken discourse
– Information structure (IS): realizes information weighting in continuous speech
In addition to prosody from the segmental, lexical, phonological and syntactic levels, discourse prosody is an intrinsic part of naturally occurring speech. The human ear is sensitive to it, yet it can be neither pinned down by analyzing sentence prosody nor fully recovered from the corresponding text transcription (Tseng, Interspeech 2010).
[Figure: Abundant information. Segmental, lexical, syntactic, phonological, discourse structure and information structure levels; duration, F0 and amplitude]
Introduction
Cues for the prosody model
– Discourse structure → prosodic boundaries
– Information structure → prosodic highlight (perceived emphasis)
Goals
– Acoustic attributes and discriminative analysis of prosodic boundaries across genres (Tseng et al., 2008, 2009)
– Examining how perceived prosodic highlights can be explained by systematic patterns of genre, discourse structure, information weighting and acoustic manifestation (Tseng et al., 2011)
Speech Materials: Taiwan Mandarin
Read speech
– Plain text of 26 discourse pieces read by M051 and F051 (CNA) (about 45 and 46 minutes, 160 MB)
– 34 simulated weather broadcast pieces read by M054 and F054 (WB) (about 23 and 27 minutes, 95 MB)
Spontaneous speech
– NTU DSP lecture by LSL (one male speaker, about 30 minutes) (SpnL/LEC)
Annotations
Preprocessing
– Automatic segmental labeling with HTK, with phone boundaries manually spot-checked
– Manual labeling of perceived prosodic boundaries following the HPG protocols
– Manual labeling of perceived focus and prominence (prosodic highlight)
Annotation Rationale
Labeling perceived boundary breaks
Break | Definition | Characteristics
B1 | Normal syllabic boundary | No identifiable pause
B2 | Prosodic word boundary | Followed by a slight change in tone of voice
B3 | Prosodic phrase boundary | A clearly perceived pause
B4 | Breath group boundary | A clearly heard change of breath
B5 | Prosodic group boundary | Final lengthening followed by a complete stop before a new paragraph, with a change of break
Labeling perceived prosodic highlight (emphasis, accent)
Level | Definition
E0 | Unstressed portions, marked by reduced pitch, volume and/or segmental reduction
E1 | Normal pitch and volume, with no segmental reduction
E2 | Higher pitch or louder volume, irrespective of the speaker's tone of voice or intention
E3 | Higher pitch or louder volume, marked by the speaker's tone of voice or intention
Annotation Examples
[Figure: annotation layers for the utterance "以自有品牌建立起國際品牌形象" ("establishing an international brand image with its own brand"): phone boundary layer, perceived prosodic boundary layer, perceived prosodic highlight layer]
Acoustic Features and Methodology
Acoustic features
– Vowel-based F0
– Syllable-based duration
– Vowel-based intensity
Methodology
– Multiple regression model (Tseng et al., 2005)
[Figure: intrinsic attributes (SYL) and higher-layer information (PW, PPh, BG) modeled in layers, with residues]
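A minimal sketch of the layered idea, assuming each layer is a least-squares regression on one-hot coded categorical labels (for example, syllable type or position within a PW) and that each layer is fit to the residues left by the layer below. The function names, layer labels and toy values are hypothetical; the Tseng et al. (2005) model may use different predictors and coding.

```python
import numpy as np

def one_hot(labels):
    """Code categorical labels (e.g. positions in a PW) as 0/1 columns."""
    cats = sorted(set(labels))
    index = {c: k for k, c in enumerate(cats)}
    X = np.zeros((len(labels), len(cats)))
    for i, lab in enumerate(labels):
        X[i, index[lab]] = 1.0
    return X

def layered_regression(y, layers):
    """Fit each layer to the residues of the layer below it.

    y      : per-syllable acoustic values (e.g. duration in ms)
    layers : list of (layer_name, per-syllable labels), lowest layer first
    Returns the part of y explained by each layer and the final residues.
    """
    residue = np.asarray(y, dtype=float)
    explained = {}
    for name, labels in layers:
        X = one_hot(labels)
        coef, *_ = np.linalg.lstsq(X, residue, rcond=None)
        pred = X @ coef            # with one-hot predictors this is the group mean
        explained[name] = pred
        residue = residue - pred   # passed up to the next (higher) layer
    return explained, residue

# Hypothetical toy data: four syllables, a SYL-level factor and a PW-position factor.
y = [1.0, 2.0, 3.0, 4.0]
layers = [("SYL", ["a", "a", "b", "b"]),
          ("PW", ["initial", "final", "initial", "final"])]
explained, residue = layered_regression(y, layers)
```

Because each layer only sees what the lower layers left unexplained, the per-layer contributions plus the final residues add back up to the original values.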
Discourse Prosodic Attributes
Example: 3-PPh paragraph (Tseng et al., 2010)
[Figure: normalized F0 by syllable position at the PW, PPh and PG layers; normalized F0, duration and intensity by syllable position in PG-initial, PG-medial and PG-final positions]
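The slide does not say how the features are normalized. Assuming a per-speaker z-score (an assumption), PG-initial/medial/final profiles like those plotted could be computed as below; the function names and values are hypothetical.

```python
import numpy as np

def zscore_by_group(values, groups):
    """z-score values separately within each group (e.g. each speaker)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(values)
    for g in np.unique(groups):
        mask = groups == g
        mu, sd = values[mask].mean(), values[mask].std()
        if sd > 0:                      # constant groups stay at 0
            out[mask] = (values[mask] - mu) / sd
    return out

def mean_by_label(values, labels):
    """Average a normalized feature per label, e.g. PG initial/medial/final."""
    sums, counts = {}, {}
    for v, lab in zip(values, labels):
        sums[lab] = sums.get(lab, 0.0) + float(v)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Hypothetical vowel F0 values (Hz) for two speakers, three PG positions each.
f0 = [140, 120, 100, 240, 220, 200]
speaker = ["M051", "M051", "M051", "F051", "F051", "F051"]
position = ["PG initial", "PG medial", "PG final"] * 2
profile = mean_by_label(zscore_by_group(f0, speaker), position)
```

Normalizing within each speaker removes register differences (here the female voice is 100 Hz higher), so the two speakers contribute the same positional shape to the profile.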
Prosodic Boundary
Phrases are not only major and minor phrases
Acoustic realization of prosodic boundaries
– Pre-boundary: F0 lowering, duration lengthening, intensity decay
– Boundary pause
– Post-boundary: F0 reset, duration shortening, intensity jump
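The cues listed above can be expressed as simple contrast features across a boundary. This sketch assumes per-syllable measurements and compares only the two syllables adjacent to the boundary; the `Syllable` record, its fields and the example numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    duration: float            # syllable duration (ms)
    f0: float                  # mean vowel F0 (Hz)
    intensity: float           # mean vowel intensity (dB)
    pause_after: float = 0.0   # silence following the syllable (ms)

def boundary_contrast(sylls, b):
    """Contrast features across the boundary after syllable index b."""
    pre, post = sylls[b], sylls[b + 1]
    return {
        "pause": pre.pause_after,                           # boundary pause
        "f0_reset": post.f0 - pre.f0,                       # > 0: F0 lowered before, reset after
        "duration_change": post.duration - pre.duration,    # < 0: lengthened before, shortened after
        "intensity_jump": post.intensity - pre.intensity,   # > 0: decay before, jump after
    }

sylls = [Syllable(180.0, 110.0, 60.0, pause_after=250.0),
         Syllable(150.0, 140.0, 68.0)]
feats = boundary_contrast(sylls, 0)
```

Features of this kind capture boundary-neighborhood contrast directly, rather than relying on the pause alone.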
How Reliable Is Pause Duration? (1/2)
Across genres, speakers and language: a systematic pattern in pause duration, i.e. B3 < B4 < B5
Pause duration (ms) by break (B3, B4 and B5) and genre, μ / σ
Corpus | B3 | B4 | B5
RS_CNA_M051P | 249 / | / 124 | 621 / 113
RS_CNA_F051P | 229 / | / | / 237
RS_WB_M054 | 165 / 145 | 490 / 123 | 555 / 166
SpnL_LSL | 423 / 429 | 739 / | / 498
Read speech (RS): CNA and weather broadcast (WB); spontaneous speech (SpnL)
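Statistics like those in the table can be recomputed from (break label, pause) pairs. A sketch, assuming that input format and the sample standard deviation (both assumptions); the records below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def pause_stats(records):
    """records: iterable of (break_label, pause_ms). Returns {break: (mean, sd)}."""
    by_break = defaultdict(list)
    for brk, pause in records:
        by_break[brk].append(pause)
    # The sample standard deviation needs at least two observations.
    return {brk: (mean(v), stdev(v) if len(v) > 1 else None)
            for brk, v in by_break.items()}

def means_ordered(stats, order=("B3", "B4", "B5")):
    """True if the mean pause increases along the given break order."""
    ms = [stats[b][0] for b in order if b in stats]
    return all(a < b for a, b in zip(ms, ms[1:]))

records = [("B3", 100), ("B3", 200), ("B4", 400), ("B4", 600), ("B5", 900)]
stats = pause_stats(records)
```

An ordered set of means says nothing about overlap: a σ comparable to the mean, as for SpnL_LSL at B3, means individual B3 pauses cannot be separated from B4 by duration alone.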
How Reliable Is Pause Duration? (2/2)
– B3 (PPh) boundaries vary a great deal
– Pause duration alone is not reliable
– How, then, is the PPh boundary B3 perceived? (Tseng et al., 2009)
[Figure: distribution of pause duration for discourse boundary breaks B2, B3 and B4 in read speech (RS) CNA, speakers F051P (left) and M051P (right)]
Comparison of Discourse Boundary Discrimination (Tseng et al., 2009)
Cross-feature comparison by corpus: CNA_F051, CNA_M051, LEC
[Figure: cross-feature comparison of mean values by corpus (LEC, CNA_F051 and CNA_M051, top to bottom); horizontal axis: feature type index; vertical axis: mean value of each feature. Panel: discrimination for LEC]
Analysis of Perceived Emphasis Annotations (1/3)
[Figure: distribution of perceived emphasis, including combined emphasis (E2+E3)]
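The distribution, including the combined E2+E3 category, amounts to label proportions over syllables. A minimal sketch with hypothetical labels:

```python
from collections import Counter

LEVELS = ("E0", "E1", "E2", "E3")

def emphasis_distribution(labels):
    """Proportion of syllables at each emphasis level, plus combined E2+E3."""
    counts = Counter(labels)
    total = sum(counts[lvl] for lvl in LEVELS)
    dist = {lvl: counts[lvl] / total for lvl in LEVELS}
    dist["E2+E3"] = dist["E2"] + dist["E3"]   # combined emphasis
    return dist

labels = ["E0", "E1", "E1", "E2", "E3", "E1", "E2", "E0"]
dist = emphasis_distribution(labels)
```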
Analysis of Perceived Emphasis Annotations (2/3)
Perceived emphasis scale
– Reflects not only perceived emphasis but also syntactic constraints
Analysis of Perceived Emphasis Annotations (3/3)
Distribution of perceived emphasis by phrase boundaries
– LEC: post-boundary = pre-boundary
– CNA: post-boundary > pre-boundary
– WB: post-boundary < pre-boundary
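One way to make the pre- vs. post-boundary comparison concrete is sketched below. It assumes that "phrase boundary" means a break of B3 or above and that "emphasized" means E2 or E3; both are assumptions, and the function name and data are hypothetical.

```python
EMPHASIZED = {"E2", "E3"}  # combined emphasis

def boundary_emphasis_rates(emph, breaks, min_break=3):
    """Share of emphasized syllables just before and just after phrase boundaries.

    emph[i]   : emphasis label of syllable i ("E0".."E3")
    breaks[i] : break level after syllable i (1..5 for B1..B5)
    Returns (pre_rate, post_rate), or None if there are no boundaries.
    """
    pre = post = n = 0
    for i in range(len(emph) - 1):   # the last syllable has no following syllable
        if breaks[i] >= min_break:
            n += 1
            pre += emph[i] in EMPHASIZED
            post += emph[i + 1] in EMPHASIZED
    return (pre / n, post / n) if n else None
```

For example, with emphasis labels E1 E1 E3 E1 E2 E2 and breaks B1 B3 B1 B4 B1 B5, both boundaries are followed by an emphasized syllable and neither is preceded by one, giving (0.0, 1.0), the CNA-like pattern.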
Emphasis Loading
Why?
– To estimate information weighting in continuous speech
Methodology
– Normalize the length of each PPh
– Estimate the loading over syllables (Syl) in each PPh, normalized by N
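The exact estimation formula is not legible on the slide. As a hedged illustration only, this sketch normalizes PPh length by mapping each syllable to one of `n_bins` relative-position bins and takes the loading of a bin as the share of E2/E3 syllables in it; the binning and the normalization are assumptions, not necessarily the authors' definition.

```python
def emphasis_loading(phrases, n_bins=5):
    """phrases: list of PPhs, each a list of per-syllable emphasis labels.

    Returns, for each relative-position bin, the fraction of syllables
    in that bin labeled E2 or E3 (an assumed definition of loading).
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for ph in phrases:
        n = len(ph)
        for i, lab in enumerate(ph):
            b = i * n_bins // n        # integer bin index in [0, n_bins - 1]
            totals[b] += 1
            hits[b] += lab in ("E2", "E3")
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]

phrases = [["E2", "E1", "E1", "E1"], ["E3", "E1", "E1", "E2"]]
loading = emphasis_loading(phrases, n_bins=2)
```

Integer binning avoids floating-point edge cases and lets PPhs of different lengths contribute to the same relative positions.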
Results of Emphasis Loading
[Figure: emphasis loading within PPh by relative syllable position, and within BG and PG by relative PPh position (initial, medial and final PPh)]
Acoustic Characteristics of Prosodic Highlights (1/2)
Emphasis vs. no emphasis, without considering PPh position
[Figure: mean values of acoustic correlates by emphasis/no emphasis and genre]
Significant acoustic factors by genre
– LEC: duration, average F0 (F-ratio = 846), F0 range, intensity (F-ratio = 873)
– CNA: average F0 (F-ratio = 492), intensity (F-ratio = 364)
– WB: intensity (F-ratio = 196), duration (F-ratio = 170)
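Assuming the reported F-ratios come from a one-way ANOVA comparing emphasized and non-emphasized syllables (the slide does not name the test), the statistic is the between-group mean square divided by the within-group mean square. A sketch with hypothetical values:

```python
from statistics import fmean

def f_ratio(groups):
    """One-way ANOVA F statistic: between-group MS / within-group MS."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical normalized intensity for non-emphasized vs. emphasized syllables.
no_emph = [1.0, 2.0, 3.0]
emph = [4.0, 5.0, 6.0]
F = f_ratio([no_emph, emph])
```

A larger F means the emphasis classes differ more on that feature relative to the spread within each class, which is how the factors above can be ranked within a genre.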
Acoustic Characteristics of Perceived Highlights (2/2)
Emphasis vs. no emphasis, considering PPh position (PPh-initial, PPh-medial, PPh-final)
– LEC: duration, average F0, F0 range, intensity (by all PPh positions)
– CNA: average F0, intensity (by all PPh positions); duration in PPh-medial position only
– WB: intensity by all PPh positions; duration in PPh-medial position only
Analysis of Perceived Emphasis by Decision Tree Toolkit
Why?
– To evaluate which factors are most significant for classification
Methodology:
Results:
[Figure: decision trees for CNA, WB and LEC]
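The toolkit used is not named here. As a stand-in illustration, a decision tree picks the feature whose split best separates emphasized from non-emphasized syllables; the sketch below finds that first split by Gini impurity. The feature columns (for example duration, F0, intensity) and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = emphasized, 0 = not)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2                      # midpoint between observed values
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best

# Hypothetical features per syllable: [normalized intensity, normalized duration].
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
split = best_split(X, y)
```

The feature chosen at the root is the single most discriminative one; a full tree repeats this search recursively on each branch.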
Discourse Pattern of Emph vs. Non-Emph: CNA
[Figure: CNA normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh; a second set of panels after removing the emphasis effect]
Discourse Pattern of Emph vs. Non-Emph: WB
[Figure: WB normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh; a second set of panels after removing the emphasis effect]
Discourse Pattern of Emph vs. Non-Emph: LEC
[Figure: LEC normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh; a second set of panels after removing the emphasis effect]
Findings
Prosodic boundary
– Pause duration varies widely and is not a reliable cue on its own
– Contrast in the boundary neighborhood (pre- vs. post-boundary) is more significant
Prosodic highlights
– Related to speech mode (genre)
– Independent of discourse structure
– Underlying linguistic structures can be derived
Future directions
– Speech technology development could benefit from a better understanding of information structure in relation to prosodic highlight