On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody Zhao-yu Su Phonetics Lab, Institute of Linguistics, Academia Sinica.

1 On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody Zhao-yu Su Phonetics Lab, Institute of Linguistics, Academia Sinica

2 Applying the Fujisaki model to Mandarin 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/). PI: Prof. Chiu-yu Tseng. Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003). 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/). PI: Prof. Keikichi Hirose. Mandarin: manual extraction of Fujisaki parameters. Japanese: automatic extraction of Fujisaki parameters. 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/). PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William. Mandarin: manual extraction of Fujisaki parameters.

3 Outline. Introduction: the Fujisaki model. Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters. 1. Phonetics Lab, Academia Sinica, Taiwan, on Mandarin (Tseng 2004, 2005, 2006). 2. Hirose Lab, Tokyo University, Japan, on Japanese (Hirose and Narusawa 2002, 2003). Manual extraction: method used at CUHK to extract Fujisaki parameters. DSP and Speech Technology Lab, on Mandarin (Wentao Gu 2004, 2005).

4 The Fujisaki Model (Fujisaki & Hirose 1984), a superposed model: log(F0) = base frequency + phrase components + accent components
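The superposition on this slide can be sketched in a few lines of Python. This is a minimal illustration using the commonly cited form of the model, in which the phrase component is an impulse response and the accent component a ceiling-limited step response. The function names, the command tuple layouts, and the default constants (alpha = 3.0, beta = 20.0, gamma = 0.9) are assumptions chosen for illustration; they are not taken from the slides.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset_time, magnitude)          # assumed layout
    accent_cmds: list of (onset_time, offset_time, amp)   # assumed layout
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value

# Example: one phrase command at 0 s and one accent command from 0.2 s to 0.5 s,
# sampled every 10 ms over one second.
contour = [log_f0(i / 100.0, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
           for i in range(100)]
```

Before any command has started, the contour sits at the log of the base frequency; each command then adds its own component on top, which is what "superposed" refers to.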

5 Auto-extraction based on Mixdorff's method (2000, 2003): the original F0 contour is passed through a high-pass filter (stop frequency at 0.5 Hz), which separates it into a high-frequency contour (HFC) and a low-frequency contour (LFC).
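The split into LFC and HFC can be illustrated as follows. This is only a sketch: it swaps in a simple forward-backward first-order smoother as the low-pass stage and takes the HFC as the remainder, which is not necessarily the filter used in Mixdorff's method. The function name, the 100 Hz frame rate, and the assumption that unvoiced gaps in the contour have already been interpolated are all hypothetical.

```python
import math

def split_contour(contour, frame_rate=100.0, cutoff_hz=0.5):
    """Return (lfc, hfc) with lfc[i] + hfc[i] equal to contour[i]."""
    # Smoothing coefficient of a first-order low-pass at the given cutoff.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate)

    def smooth(xs):
        out, y = [], xs[0]
        for x in xs:
            y += a * (x - y)
            out.append(y)
        return out

    forward = smooth(contour)
    # Running the smoother backwards as well cancels the phase lag of the first pass.
    lfc = smooth(forward[::-1])[::-1]
    hfc = [x - low for x, low in zip(contour, lfc)]
    return lfc, hfc

# Example: a slow declination with a faster ripple on top (synthetic values).
example = [5.0 - 0.002 * i + 0.05 * math.sin(2.0 * math.pi * 4.0 * i / 100.0)
           for i in range(300)]
lfc, hfc = split_contour(example)
```

The LFC then serves as the target for phrase components and the HFC for accent (or tone) components.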

6 Decision of phrase commands (Phonetics Lab, Academia Sinica, Taiwan): a method based on perceptual labels. Low-frequency contour (LFC) from Mixdorff's method; position of local minimum; optimization; perceptual phrase boundary; evaluation.
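One way to picture the step from LFC minima to phrase command candidates is sketched below. The slide does not say how the local minima and the perceptual boundary labels are combined, so the rule used here (keep a minimum only if it lies within a fixed number of frames of a labelled boundary) is an assumption, as are the function names and the `max_distance` parameter.

```python
def local_minima(lfc):
    """Indices where the LFC is lower than its left neighbour and not higher than its right."""
    return [i for i in range(1, len(lfc) - 1)
            if lfc[i] < lfc[i - 1] and lfc[i] <= lfc[i + 1]]

def phrase_command_candidates(lfc, boundary_frames, max_distance=20):
    """Keep local minima within max_distance frames of a perceptual boundary label."""
    kept = []
    for i in local_minima(lfc):
        if any(abs(i - b) <= max_distance for b in boundary_frames):
            kept.append(i)
    return kept

# Example with a toy contour and one labelled boundary at frame 8.
toy_lfc = [3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 1.0, 2.0]
candidates = phrase_command_candidates(toy_lfc, [8], max_distance=2)
```

The surviving candidates would then be the starting points for the optimization step, which refines the onset time and magnitude of each phrase command.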

7 Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)

8 Hirose Lab: auto-extraction (Narusawa 2002, 2003). Original F0 contour. Derivative: target of phrase components. Residual contour: target of phrase components.

9 Decision of phrase commands: dynamic programming (DP) on the residual contour. The optimum I is the one for which c(I) is maximum.

10 Hirose Lab: compensation from text analysis to aid auto-extraction. Parsed text is used to adjust the extracted Fujisaki parameters.

11 Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003). Original method: an accent component should be located on a phrase component. New method: pauses are considered, and a correction is applied using information from parsed text.

12 Auto-extraction of phrase components: comparison of the two labs. Phonetics Lab, IL, AS (modified Mixdorff 2003): pre-extraction of phrase components is relatively close. Hirose Lab: pre-extraction is not as close, but the final output can be compensated by text analysis: 1. Auto-extract from the acoustic signal (F0 contour). 2. Compensate the phrase component with parsed text; unit used: bunsetsu (lexical definition).

13 Manual adjustment (Gu, CUHK). Note: 1. Insertion of phrase components is subjective. 2. Boundary identification is NOT explicitly specified; it relies on perception (duration? or F0 reset?).

14 Manual adjustment (Gu, CUHK)

15 Possible Future Considerations (1/2) 1. Is the distinguishing acoustic feature only pause, only duration, or only F0? 2. Or is it a combination of acoustic features (pause, duration, and/or F0)? E.g., test whether duration can compensate for F0 reset.

16 Possible Future Considerations (2/2): improving auto-extraction of tone components. 3. The concept of tone nucleus: retain only the nucleus of the syllable while ignoring vertical F0 variation (from Hirose's tone nucleus and Gu's manual adjustment); ignore horizontal F0 variation (from Gu's manual adjustment).

17 One major ambiguity among the 3 labs: phrase component unit selection. 1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase). 2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu). 3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected; PPh adjusted from visual display, PW adjusted from perceptual decision.

18 Why can prosodic unit selection be a problem unique to Mandarin? Japanese: bunsetsu, a compound word consisting of two or more content words. Mandarin: 1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to maintain the tendency of one application of the phrase component function. 2. CUHK: manual adjustment can be accurate but is not systematic enough, e.g. a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.

19 Concluding Remarks 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming. 2. What possible improvements can auto-extraction borrow from manual adjustment? Focusing on the nucleus (syllable); understanding more about acoustic properties (F0, duration…). 3. In addition to acoustic information, more linguistic and cognitive knowledge could help improve the prosody model. Linguistic information: parsing (text analysis and syntax), semantics, and pragmatics. Cognitive information: speech planning and processing.

