Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight. Speaker: Jr-Feng Huang; PI: Chiu-yu Tseng; Phonetics Lab, Institute of Linguistics, Academia Sinica

Presentation transcript:

Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight
Speaker: Jr-Feng Huang
PI: Chiu-yu Tseng
Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Subproject 5: "A Study of Prosodic Attributes and Speech Event Detection"
2011/07/12, Jr-Feng Huang, NGASR 2011 Summer Workshop

Outline
- Research direction
- Introduction
- Speech materials
- Discourse prosodic attributes
- Analysis of prosodic boundaries
- Analysis of prosodic highlights
- Findings so far

Research Direction
Argument: a prosody model should account for
- Discourse structure (DS), which groups phrases and utterances into speech paragraphs and spoken discourse
- Information structure (IS), which realizes information weighting in continuous speech
Beyond the prosody contributed by the segmental, lexical, phonological and syntactic levels, discourse prosody is an intrinsic part of naturally occurring speech. The human ear is sensitive to it, yet it cannot be pinned down by analyzing sentence prosody, nor fully recovered from the corresponding text transcription (Tseng, Interspeech 2010).
[Figure: "abundant information" in speech: segmental, lexical, syntactic and phonological levels plus discourse structure and information structure, realized through duration, F0 and amplitude]

Introduction
Cues of the prosody model:
- Discourse structure → prosodic boundaries
- Information structure → prosodic highlight (perceived emphasis)
Goals:
- Acoustic attributes and discriminative analysis of prosodic boundaries across genres (Tseng et al., 2008, 2009)
- Examining how perceived prosodic highlights can be explained by systematic patterns of genre, discourse structure, information weighting and acoustic manifestation (Tseng et al., 2011)

Speech Materials: Taiwan Mandarin
Read speech
- Plain text of 26 discourse pieces read by M051 and F051 (CNA) (about 45 and 46 minutes, respectively; 160 MB)
- 34 simulated weather broadcast pieces read by M054 and F054 (WB) (about 23 and 27 minutes, respectively; 95 MB)
Spontaneous speech
- NTU DSP lecture by LSL (one male speaker, about 30 minutes) (SpnL/LEC)

Annotations
Preprocessing
- Automatic segmental labeling with HTK, with phone boundaries manually spot-checked
- Manual labeling of perceived prosodic boundaries following the HPG protocol
- Manual labeling of perceived focus and prominence (prosodic highlight)

Annotation Rationale
Labeling perceived boundary breaks

Label  Definition                 Characteristics
B1     Normal syllable boundary   No identifiable pause
B2     Prosodic word boundary     Followed by a slight change in tone of voice
B3     Prosodic phrase boundary   A clearly perceived pause
B4     Breath group boundary      A clearly audible change of breath
B5     Prosodic group boundary    Final lengthening followed by a complete stop before a new paragraph, with a change of break

Labeling perceived prosodic highlight (emphasis, accent)

Label  Definition
E0     Unstressed portions, marked by reduced pitch, reduced volume and/or segmental reduction
E1     Normal pitch and volume, with no segmental reduction
E2     Higher pitch or louder volume, irrespective of the speaker's tone of voice or intention
E3     Higher pitch or louder volume, marked by the speaker's tone of voice or intention
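Both label tiers are ordinal, so one convenient representation is an ordered integer scale. That makes comparisons such as "B3 or stronger" and the pooled E2+E3 emphasis category straightforward. The following is a minimal Python sketch. The names `Break`, `Emphasis`, `is_highlight` and `highlight_rate` are hypothetical and are not part of the HPG protocol or any labeling tool.

```python
from enum import IntEnum


class Break(IntEnum):
    """Perceived boundary breaks, ordered from weakest to strongest."""
    B1 = 1  # normal syllable boundary, no identifiable pause
    B2 = 2  # prosodic word (PW) boundary
    B3 = 3  # prosodic phrase (PPh) boundary
    B4 = 4  # breath group (BG) boundary
    B5 = 5  # prosodic group (PG) boundary


class Emphasis(IntEnum):
    """Perceived prosodic highlight, ordered from reduced to intended emphasis."""
    E0 = 0  # unstressed: reduced pitch, volume and/or segments
    E1 = 1  # normal pitch and volume, no reduction
    E2 = 2  # higher pitch or louder, irrespective of tone of voice or intention
    E3 = 3  # higher pitch or louder, marked by tone of voice or intention


def is_highlight(label: Emphasis) -> bool:
    """Pool E2 and E3 into a single 'highlighted' class."""
    return label >= Emphasis.E2


def highlight_rate(tier: list[str]) -> float:
    """Share of syllables labeled E2 or E3 in a tier of strings such as 'E1'."""
    labels = [Emphasis[s] for s in tier]
    return sum(is_highlight(lab) for lab in labels) / len(labels) if labels else 0.0


print(Break["B4"] > Break["B3"])                 # True
print(highlight_rate(["E1", "E2", "E0", "E3"]))  # 0.5
```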

Annotation Examples
[Figure: annotation layers (phone boundaries, perceived prosodic boundaries, perceived prosodic highlight) for the utterance "以自有品牌建立起國際品牌形象" ("building up an international brand image with its own brand")]

Acoustic Features and Methodology
Acoustic features
- Vowel-based F0
- Syllable-based duration
- Vowel-based intensity
Methodology
- Multiple regression model (Tseng et al., 2005)
[Figure: layered model in which intrinsic syllable (SYL) attributes are modeled first and residues are passed up as higher-layer information at the PW, PPh and BG layers]
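The slide names a multiple regression model (Tseng et al., 2005) in which intrinsic syllable attributes are accounted for first and the residues are passed up to higher layers (PW, PPh, BG). The sketch below illustrates only that general layering idea and is not the published model. It assumes each layer contributes a single categorical predictor, such as tone or position within the PPh, and it fits the layers sequentially. A least-squares fit on one one-hot-coded categorical predictor with an intercept reduces to the category means, so each step simply subtracts group means from the current residues. The layer names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean_fit(values, labels):
    """Least-squares fit on one categorical predictor: each category's mean."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    means = {lab: mean(vs) for lab, vs in groups.items()}
    return [means[lab] for lab in labels]


def layered_regression(values, layers):
    """Fit layers in order, passing each layer's residues to the next.

    layers: list of (name, labels) pairs, lowest layer first
    (e.g. intrinsic syllable attributes, then PW, PPh, BG positions).
    Returns the share of total variance explained by each layer and
    the final residues.
    """
    mu = mean(values)
    residues = [v - mu for v in values]
    total_ss = sum(r * r for r in residues)
    explained = {}
    for name, labels in layers:
        fitted = group_mean_fit(residues, labels)
        new_residues = [r - f for r, f in zip(residues, fitted)]
        before = sum(r * r for r in residues)
        after = sum(r * r for r in new_residues)
        explained[name] = (before - after) / total_ss if total_ss else 0.0
        residues = new_residues
    return explained, residues


# Toy syllable durations (ms) with hypothetical tone and PPh-position labels.
durations = [100, 120, 110, 130, 150, 170]
layers = [
    ("tone", [1, 2, 1, 2, 1, 2]),
    ("pph_pos", ["I", "I", "M", "M", "F", "F"]),
]
explained, residues = layered_regression(durations, layers)
print({k: round(v, 3) for k, v in explained.items()})  # {'tone': 0.176, 'pph_pos': 0.824}
```

Because the layers are fitted one after another, the share credited to a higher layer is what remains after the lower layers. With correlated predictors this is not the same as a single joint regression.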

Discourse Prosodic Attributes
Example: a 3-PPh paragraph (Tseng et al., 2010)
[Figure: normalized F0, normalized duration and normalized intensity by syllable position for PG-initial, PG-medial and PG-final positions, shown at the PW, PPh and PG layers]
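The slide does not state how F0, duration and intensity were normalized. This sketch assumes a per-speaker z-score, which is one common choice. It then averages the normalized values by position within the prosodic group (PG-initial, PG-medial, PG-final). The function names and position labels are hypothetical, and the F0 values are toy numbers rather than measurements.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore(values):
    """Per-speaker z-score (assumed normalization): zero mean, unit variance."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def profile_by_position(values, positions):
    """Mean normalized value for each position label (e.g. 'PG-initial')."""
    groups = defaultdict(list)
    for z, pos in zip(zscore(values), positions):
        groups[pos].append(z)
    return {pos: mean(zs) for pos, zs in groups.items()}


# Toy vowel F0 values (Hz) from one speaker, not measured data.
f0 = [200, 220, 180, 160]
pos = ["PG-initial", "PG-medial", "PG-medial", "PG-final"]
print(profile_by_position(f0, pos))
```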

Prosodic Boundary
Phrases are not limited to major and minor phrases.
Acoustic realization of prosodic boundaries:
- Pre-boundary: F0 lowering, duration lengthening, intensity decay
- Boundary pause
- Post-boundary: F0 reset, duration shortening, intensity jump
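The cues listed above can be turned into simple contrast measures across each boundary. The sketch assumes one record per syllable that carries mean vowel F0, syllable duration, mean vowel intensity, the following pause and the break label after the syllable. The `Syllable` record and the post-minus-pre definitions are illustrative assumptions, not the measures used in the study.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    f0: float           # mean vowel F0 (Hz)
    duration: float     # syllable duration (ms)
    intensity: float    # mean vowel intensity (dB)
    pause_after: float  # silent pause after the syllable (ms)
    brk: int            # break level after the syllable (1 = B1 ... 5 = B5)


def boundary_contrasts(syls, min_break=3):
    """Post-minus-pre differences at every boundary of level >= min_break."""
    out = []
    for i in range(len(syls) - 1):
        pre, post = syls[i], syls[i + 1]
        if pre.brk < min_break:
            continue
        out.append({
            "index": i,
            "pause": pre.pause_after,                          # boundary pause
            "f0_reset": post.f0 - pre.f0,                      # > 0: reset after pre-boundary lowering
            "duration_change": post.duration - pre.duration,   # < 0: shortening after final lengthening
            "intensity_jump": post.intensity - pre.intensity,  # > 0: jump after pre-boundary decay
        })
    return out


# Toy utterance with one B3 boundary after the second syllable.
utt = [
    Syllable(210, 180, 68, 0, 1),
    Syllable(170, 260, 62, 150, 3),
    Syllable(230, 170, 70, 0, 1),
]
print(boundary_contrasts(utt))
```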

How Reliable Is Pause Duration? (1/2)
Across genres, speakers and language, pause duration shows a systematic pattern: B3 < B4 < B5.

Pause duration (ms) by break (B3, B4 and B5) and genre, given as mean μ / standard deviation σ (n/a = value not recoverable from the slide). Read speech (RS): CNA and weather broadcast (WB); spontaneous speech (SpnL).

Corpus          B3 (μ/σ)   B4 (μ/σ)   B5 (μ/σ)
RS_CNA_M051P    249/n/a    n/a/124    621/113
RS_CNA_F051P    229/n/a    n/a/n/a    n/a/237
RS_WB_M054      165/145    490/123    555/166
SpnL_LSL        423/429    739/n/a    n/a/498
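Summary statistics like those in the table can be computed directly from labeled pauses. This minimal sketch assumes the pauses are available as (break label, duration in ms) pairs. The toy values are illustrative and are not taken from the corpora. A monotonic order of means (B3 < B4 < B5) can coexist with large standard deviations, and that overlap is what raises doubts about pause duration as a boundary cue.

```python
from collections import defaultdict
from statistics import mean, stdev


def pause_stats(pauses):
    """Mean and sample SD of pause duration (ms) per break label."""
    groups = defaultdict(list)
    for label, ms in pauses:
        groups[label].append(ms)
    return {lab: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for lab, v in groups.items()}


def means_increase(stats, order=("B3", "B4", "B5")):
    """True if mean pause duration strictly increases along `order`."""
    means = [stats[b][0] for b in order if b in stats]
    return all(a < b for a, b in zip(means, means[1:]))


# Toy pauses, not corpus data.
toy = [("B3", 200), ("B3", 300), ("B4", 500), ("B4", 700),
       ("B5", 600), ("B5", 800)]
stats = pause_stats(toy)
print(stats["B3"][0], means_increase(stats))  # mean B3 pause, ordering check
```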

How Reliable Is Pause Duration? (2/2)
Pause duration at B3 (PPh) boundaries varies a great deal, so pause duration alone is not reliable.
How, then, is a PPh boundary (B3) perceived? (Tseng et al., 2009)
[Figure: distribution of pause duration for discourse boundary breaks B2, B3 and B4 in read speech (RS) CNA, for speakers F051P (left) and M051P (right)]

Comparison of Discourse Boundary Discrimination (Tseng et al., 2009)
Cross-feature comparison by corpus: CNA_F051, CNA_M051, LEC
[Figure: cross-feature comparison of mean values by corpus (LEC, CNA_F051 and CNA_M051, from top to bottom); horizontal axis: feature-type index; vertical axis: mean value of each feature]

Analysis of Perceived Emphasis Annotations (1/3)
[Figure: distribution of perceived emphasis, with combined emphasis (E2+E3)]

Analysis of Perceived Emphasis Annotations (2/3)
Perceived emphasis scale: the labels reflect not only perceived emphasis but also syntactic constraints.

Analysis of Perceived Emphasis Annotations (3/3)
Distribution of perceived emphasis around phrase boundaries:
- LEC: post-boundary = pre-boundary
- CNA: post-boundary > pre-boundary
- WB: post-boundary < pre-boundary
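One way to sketch this comparison is to check, at each PPh boundary, whether the last pre-boundary syllable and the first post-boundary syllable are highlighted. The sketch assumes per-syllable emphasis flags (E2 and E3 pooled as highlighted) and a break label after each syllable. Comparing only the two syllables adjacent to the boundary is an assumption, since the slide does not state the window size.

```python
def pre_post_emphasis_rates(emph, brk, min_break=3):
    """Highlight rates on the syllables just before and just after each boundary.

    emph[i]: True if syllable i is highlighted (E2 or E3).
    brk[i]:  break level after syllable i (1..5).
    """
    pre = post = n = 0
    for i in range(len(emph) - 1):
        if brk[i] >= min_break:
            n += 1
            pre += emph[i]
            post += emph[i + 1]
    return (pre / n, post / n) if n else (0.0, 0.0)


# Toy labels: two B3 boundaries (after syllables 1 and 4) and a final B5.
emph = [False, True, True, False, False, True]
brk = [1, 3, 1, 1, 3, 5]
print(pre_post_emphasis_rates(emph, brk))  # (0.5, 1.0): post > pre, as reported for CNA
```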

Emphasis Loading
Why? To estimate information weighting in continuous speech.
Methodology:
- Normalize the length of each PPh
- Estimate emphasis loading over syllables (Syl) within PPh, across N phrases [formula not recoverable from the slide]
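The estimation formula on the slide cannot be recovered. One plausible reading, assumed here, is to map each syllable to a relative-position bin within its PPh so that phrases of different lengths become comparable, then average the emphasis flags in each bin over all N phrases. The function name and the binning scheme are hypothetical.

```python
def emphasis_loading(phrases, n_bins=3):
    """Mean emphasis per relative-position bin, pooled over all N phrases.

    phrases: list of PPhs, each a list of 0/1 emphasis flags (one per syllable).
    Syllable j of a PPh with L syllables falls into bin j * n_bins // L,
    which maps phrases of different lengths onto the same n_bins positions.
    """
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for pph in phrases:
        length = len(pph)
        for j, flag in enumerate(pph):
            b = j * n_bins // length
            totals[b] += flag
            counts[b] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Two toy PPhs of different lengths (1 = highlighted syllable).
loading = emphasis_loading([[1, 0, 0], [1, 0, 0, 0, 0, 1]])
print([round(x, 3) for x in loading])  # [0.667, 0.0, 0.333]
```

Pooling syllables gives longer phrases more weight. Averaging within each phrase first and then across phrases is an equally reasonable alternative.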

Results of Emphasis Loading
[Figure: emphasis loading within PPh by relative syllable position, and within BG and PG by relative PPh position (initial, medial and final PPh)]

Acoustic Characteristics of Prosodic Highlights (1/2)
Emphasis vs. no emphasis, without considering PPh position
[Table: mean values of acoustic correlates by emphasis/no emphasis and genre]
Significant acoustic factors by genre:
- LEC: duration, average F0 (F-ratio = 846), F0 range, intensity (F-ratio = 873)
- CNA: average F0 (F-ratio = 492), intensity (F-ratio = 364)
- WB: intensity (F-ratio = 196), duration (F-ratio = 170)
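The slide reports F-ratios but does not name the test. Assuming a one-way ANOVA on emphasized vs. non-emphasized syllables, the F-ratio is the between-group mean square divided by the within-group mean square, and features can then be ranked by F. A minimal sketch with toy values follows. The function names are hypothetical.

```python
from statistics import mean


def f_ratio(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features):
    """features: {name: (emphasized values, non-emphasized values)}; names by F, descending."""
    return sorted(features, key=lambda name: f_ratio(*features[name]), reverse=True)


# Toy normalized values, not measured data.
toy = {
    "intensity": ([3, 4, 5], [1, 2, 3]),
    "duration": ([2, 4, 6], [1, 3, 5]),
}
print(f_ratio(*toy["intensity"]))  # 6.0
print(rank_features(toy))          # ['intensity', 'duration']
```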

Acoustic Characteristics of Perceived Highlights (2/2)
Emphasis vs. no emphasis, taking PPh position into account (PPh-initial, PPh-medial, PPh-final)
- LEC: duration, average F0, F0 range and intensity, in all PPh positions
- CNA: average F0 and intensity; duration in PPh-medial position only
- WB: intensity in all PPh positions; duration in PPh-medial position only

Analysis of Perceived Emphasis with a Decision Tree Toolkit
Why? To evaluate which factors are most significant for classification.
Methodology: [not recoverable from the slide]
Results: [Figure: decision trees for CNA, WB and LEC]
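The slide does not name the decision tree toolkit or its settings. To show how a tree surfaces the most significant factor, the sketch below finds the best single (root) split by Gini impurity, as CART-style trees do. The feature names and toy values are hypothetical, and labels are 1 for emphasized and 0 for non-emphasized syllables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted Gini) of the best single split."""
    best = (None, None, gini(labels))
    n = len(labels)
    for f in feature_names:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best


# Toy syllables: intensity (dB) separates the classes, duration (ms) does not.
rows = [
    {"intensity": 60, "duration": 200},
    {"intensity": 62, "duration": 250},
    {"intensity": 70, "duration": 210},
    {"intensity": 72, "duration": 240},
]
labels = [0, 0, 1, 1]
print(best_split(rows, labels, ["intensity", "duration"]))  # ('intensity', 66.0, 0.0)
```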

Discourse Pattern of Emph vs. No-Emph: CNA
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, for emphasized vs. non-emphasized syllables, before and after removing the emphasis effect]
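The phrase "removing emphasis effect" on these slides is not accompanied by a stated procedure. One simple reading, assumed here, is to shift emphasized syllables by the mean emphasized-minus-non-emphasized offset for a feature, so that any remaining pattern across PPh positions reflects discourse position rather than emphasis. The function name and toy values are hypothetical.

```python
from statistics import mean


def remove_emphasis_effect(values, emphasized):
    """Shift emphasized tokens so both classes share the non-emphasis mean.

    values:     one normalized feature value per syllable (e.g. F0 z-score).
    emphasized: parallel list of booleans (True = E2/E3).
    """
    emph_vals = [v for v, e in zip(values, emphasized) if e]
    plain_vals = [v for v, e in zip(values, emphasized) if not e]
    if not emph_vals or not plain_vals:
        return list(values)  # nothing to compare against
    offset = mean(emph_vals) - mean(plain_vals)
    return [v - offset if e else v for v, e in zip(values, emphasized)]


print(remove_emphasis_effect([1.0, 2.0, 3.0, 4.0], [False, False, True, True]))
# [1.0, 2.0, 1.0, 2.0]
```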

Discourse Pattern of Emph vs. No-Emph: WB
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, for emphasized vs. non-emphasized syllables, before and after removing the emphasis effect]

Discourse Pattern of Emph vs. No-Emph: LEC
[Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, for emphasized vs. non-emphasized syllables, before and after removing the emphasis effect]

Findings
Prosodic boundary
- Pause duration can vary widely and is not a reliable boundary cue on its own
- The contrast between the two sides of a boundary (the boundary neighborhood) is more significant
Prosodic highlights
- Related to speech mode (genre)
- Independent of discourse structure
- Underlying linguistic structures can be derived
Future directions
- Speech technology development could benefit from a better understanding of how information structure relates to prosodic highlight