On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody Zhao-yu Su Phonetics Lab, Institute of Linguistics, Academia Sinica.

Applying the Fujisaki model to Mandarin
1. Phonetics Lab, Academia Sinica, Taiwan (PI: Prof. Chiu-yu Tseng)
   Mandarin -- automatic extraction of Fujisaki parameters (Mixdorff, 2003)
2. Hirose Lab, Tokyo University, Japan (PI: Prof. Keikichi Hirose)
   Mandarin -- manual extraction of Fujisaki parameters
   Japanese -- automatic extraction of Fujisaki parameters
3. DSP and Speech Technology Lab, CUHK, Hong Kong (PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan (William))
   Mandarin -- manual extraction of Fujisaki parameters

Outline
Introduction -- the Fujisaki model
Auto-extraction comparison -- methods used at two labs to generate the Fujisaki parameters:
1. Phonetics Lab, Academia Sinica, Taiwan -- on Mandarin (Tseng 2004, 2005, 2006)
2. Hirose Lab, Tokyo University, Japan -- on Japanese (Hirose and Narusawa 2002, 2003)
Manual extraction -- method used at CUHK to extract Fujisaki parameters:
DSP and Speech Technology Lab -- on Mandarin (Wentao Gu 2004, 2005)

The Fujisaki Model (Fujisaki & Hirose 1984) -- a superpositional model:
log(F0) = base frequency + phrase components + accent components
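As a minimal Python sketch of this superposition (not the implementation used by any of the labs discussed here), the standard phrase and accent response functions can be written as follows; the values of alpha, beta, gamma and the command timings are illustrative assumptions only.

```python
import numpy as np

def phrase_component(t, t0, ap, alpha=2.0):
    """Impulse response of the phrase control mechanism, scaled by the command magnitude Ap."""
    dt = t - t0
    return ap * np.where(dt >= 0, alpha**2 * dt * np.exp(-alpha * dt), 0.0)

def accent_component(t, t1, t2, aa, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism for a command lasting from t1 to t2."""
    def ga(dt):
        return np.where(dt >= 0,
                        np.minimum(1.0 - (1.0 + beta * dt) * np.exp(-beta * dt), gamma),
                        0.0)
    return aa * (ga(t - t1) - ga(t - t2))

# Superposition on the log scale: log F0(t) = log Fb + phrase components + accent components
t = np.linspace(0.0, 3.0, 300)               # time axis in seconds
fb = 100.0                                   # base frequency in Hz (illustrative)
log_f0 = (np.log(fb)
          + phrase_component(t, 0.0, 0.5)        # one phrase command at t = 0 s
          + accent_component(t, 0.5, 0.9, 0.4)   # two accent commands
          + accent_component(t, 1.5, 2.0, 0.3))
f0 = np.exp(log_f0)                          # resulting F0 contour in Hz
```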

Auto-extraction based on Mixdorff's method (2000, 2003): the original F0 contour is decomposed by a high-pass filter (stop frequency at 0.5 Hz) into a high-frequency contour (HFC) and a low-frequency contour (LFC).
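A hedged sketch of this decomposition step, assuming a 100 Hz frame rate and a 3rd-order Butterworth design (both are assumptions; Mixdorff's papers specify the exact filter):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_log_f0(log_f0, frame_rate=100.0, stop_hz=0.5):
    """Split an interpolated log-F0 contour into low- and high-frequency parts."""
    # high-pass filter with the 0.5 Hz stop frequency mentioned on the slide
    b, a = butter(3, stop_hz / (frame_rate / 2.0), btype='high')
    log_f0 = np.asarray(log_f0, dtype=float)
    hfc = filtfilt(b, a, log_f0)   # high-frequency contour (HFC), accent-related
    lfc = log_f0 - hfc             # low-frequency contour (LFC), phrase-related
    return lfc, hfc
```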

Decision of phrase commands -- the method based on perceptual labels (Phonetics Lab, Academia Sinica, Taiwan): candidate positions are taken at local minima of the low-frequency contour (LFC) from Mixdorff's method, then optimized and evaluated against perceptual phrase boundaries.
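One plausible reading of this step, offered only as an illustrative sketch (the 0.3 s search window and the minimum-detection order are assumptions, not values from the lab's method):

```python
import numpy as np
from scipy.signal import argrelmin

def phrase_command_candidates(lfc, perceptual_boundaries, frame_rate=100.0, window_s=0.3):
    """Keep LFC local minima that fall near a perceptually labelled phrase boundary."""
    minima_frames = argrelmin(np.asarray(lfc), order=5)[0]   # local minima of the LFC
    minima_times = minima_frames / frame_rate                # frame index -> seconds
    return [t for t in minima_times
            if any(abs(t - b) <= window_s for b in perceptual_boundaries)]
```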

Phonetics Lab, Academia Sinica -- auto-extraction results for Mandarin (Mixdorff 2003)

Hirose Lab -- auto-extraction (Narusawa 2002, 2003). (Figure: the original f0 contour, its derivative, and the residual contour used as the target of the phrase components.)

Decision of phrase commands: phrase commands are placed on the residual contour by dynamic programming (DP); the optimum set I is selected where the criterion c(I) is maximum.
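The criterion c(I) is not defined on this slide; purely as a hypothetical sketch of the DP idea, one could score consecutive command placements with a user-supplied function and keep the highest-scoring sequence:

```python
def select_commands(candidates, score):
    """Pick a subsequence of candidate command times maximizing an additive criterion.

    score(prev, cur) is a hypothetical stand-in for the contribution of placing
    consecutive phrase commands at times prev and cur; Narusawa's papers define
    the actual criterion c(I) over the residual contour.
    """
    if not candidates:
        return []
    n = len(candidates)
    best = [0.0] * n            # best[i]: best total score of a sequence ending at i
    back = [None] * n           # back-pointers for recovering that sequence
    for i in range(n):
        for j in range(i):
            s = best[j] + score(candidates[j], candidates[i])
            if s > best[i]:
                best[i], back[i] = s, j
    end = max(range(n), key=best.__getitem__)
    seq, i = [], end
    while i is not None:
        seq.append(candidates[i])
        i = back[i]
    return seq[::-1]
```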

Hirose Lab -- compensation from text analysis to aid auto-extraction: parsed text is used to adjust the extracted Fujisaki parameters.

Hirose Lab -- auto-extraction of Japanese (Narusawa 2002, 2003)
Original method: an accent component should be located on a phrase component.
New method: pauses are considered, and corrections are applied using information from the parsed text.

Auto-extraction of phrase components -- comparison of the two labs
Phonetics Lab, IL, AS (modified Mixdorff 2003): the pre-extracted phrase components are already relatively close.
Hirose Lab: the pre-extraction is not as close, but the final output can be compensated by text analysis:
1. Auto-extract the f0 contour from the acoustic signal.
2. Compensate the phrase components with the parsed text; the unit used is the bunsetsu (a lexical definition).

Manual adjustment -- Gu, CUHK
Note: 1. Insertion of phrase components is subjective. 2. Boundary identification is not explicitly specified; it relies on perception (duration? or f0 reset?).

Manual adjustment--Gu, CUHK

Possible Future Considerations (1/2)
1. Is the distinguishing acoustic feature only pause? duration? or f0?
2. Or a combination of acoustic features -- pause, duration, and/or f0? E.g., test whether duration can compensate for F0 reset.

Possible Future Considerations (2/2) -- improving auto-extraction of tone components
3. The concept of the tone nucleus:
- retaining only the nucleus of each syllable while ignoring vertical f0 variation (from Hirose's tone nucleus and Gu's manual adjustment)
- ignoring horizontal f0 variation (from Gu's manual adjustment)

One major ambiguity among the three labs -- phrase component unit selection
1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)
2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)
3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected -- PPh (adjusted from the visual display), PW (adjusted from perceptual decision)

Why can prosodic unit selection be a problem unique to Mandarin?
Japanese: the bunsetsu -- a compound word consisting of two or more content words.
Mandarin:
1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to be covered by a single application of the phrase component function.
2. CUHK: manual adjustment can be accurate but is not systematic enough; e.g., a phrase component sometimes corresponds to a prosodic phrase and sometimes to something shorter.

Concluding Remarks
1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.
2. What possible improvements can auto-extraction borrow from manual adjustment?
- Focusing on the nucleus (of the syllable)
- Understanding more about the acoustic properties (F0, duration, ...)
3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.
- Linguistic information: parsing (text analysis and syntax), semantics and pragmatics
- Cognitive information: speech planning and processing