Published by Lewis Manning. Modified over 9 years ago.
1
Takeshi SAITOU (1), Masataka GOTO (1), Masashi UNOKI (2) and Masato AKAGI (2)
(1) National Institute of Advanced Industrial Science and Technology (AIST)
(2) Japan Advanced Institute of Science and Technology (JAIST)
2
Our research approach focuses not on text-to-singing (lyric-to-singing) synthesis but on speech-to-singing synthesis (vocal conversion): speech → singing ♪ ♪ ♪ ♪
⇒ Clarifying the acoustic differences between singing and speaking.
⇒ Developing novel applications for computer music production.
3
The vocal conversion system
- is based on the speech manipulation system STRAIGHT (Kawahara et al., 1998), and
- comprises three types of model:
  F0 control model
  Duration control model
  Spectral control model
4
- Speaking voice: a reading of the lyrics of a song.
- Musical score.
- Synchronization information (consonant/vowel labels: c, v).
5
Musical score → musical notes (melody contour) → F0 contour of singing voice
F0 control model: adds four types of F0 fluctuation to the melody contour given by the musical notes.
- Vibrato: quasi-periodic frequency modulation at 4-7 Hz.
- Preparation: deflection in the direction opposite to a note change, observed just before the note change.
- Fine fluctuation: irregular fluctuations above 10 Hz throughout the contour.
- Overshoot: deflection exceeding the target note after a note change.
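The four fluctuations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the second-order system used for overshoot, the raised-cosine shape of the preparation, the crude noise filter, and every parameter value (`omega`, `zeta`, `prep_depth`, `vib_depth`, `fine_depth`, and so on) are hypothetical choices. Only the 4-7 Hz vibrato range and the qualitative definitions come from the slide.

```python
import math
import random

def semitone_to_hz(note):
    """MIDI-style note number (may be fractional) to Hz, with A4 = 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def f0_contour(notes, fs=1000, omega=35.0, zeta=0.55, prep_depth=0.1,
               prep_ms=100, vib_rate=5.5, vib_depth=0.3, fine_depth=0.05, seed=0):
    """notes: list of (note_number, duration_s). Returns F0 in Hz, one value per frame.

    All default parameter values are illustrative assumptions.
    """
    # Melody contour: piecewise-constant note numbers (in semitones).
    step, changes = [], []
    for note, dur in notes:
        if step:
            changes.append((len(step), note - step[-1]))
        step.extend([float(note)] * int(round(dur * fs)))

    # Overshoot: response of an underdamped second-order system (zeta < 1),
    # integrated with a semi-implicit Euler step.
    dt, y, v, out = 1.0 / fs, step[0], 0.0, []
    for s in step:
        v += dt * (omega * omega * (s - y) - 2.0 * zeta * omega * v)
        y += dt * v
        out.append(y)

    # Preparation: raised-cosine deflection opposite to the coming note change.
    n_prep = int(prep_ms * fs / 1000)
    for idx, delta in changes:
        for k in range(n_prep):
            j = idx - n_prep + k
            if j >= 0:
                bump = 0.5 * (1.0 - math.cos(2.0 * math.pi * k / n_prep))
                out[j] -= prep_depth * delta * bump

    # Vibrato: sinusoidal modulation; vib_rate should lie in the 4-7 Hz range.
    for n in range(len(out)):
        out[n] += vib_depth * math.sin(2.0 * math.pi * vib_rate * n * dt)

    # Fine fluctuation: white noise minus its trailing 100 ms moving average,
    # a crude way to suppress slow drift so the fluctuation stays fast and irregular.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in out]
    win, acc = max(1, fs // 10), 0.0
    for n in range(len(out)):
        acc += noise[n]
        if n >= win:
            acc -= noise[n - win]
        out[n] += fine_depth * (noise[n] - acc / min(n + 1, win))

    return [semitone_to_hz(x) for x in out]
```

For example, `f0_contour([(60, 0.5), (64, 0.5)])` returns 1000 frames (one per millisecond) that rise from C4 to E4, dip slightly before the note change, and swing past E4 before settling.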
6
Speaking voice → STRAIGHT (analysis part)
7
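Spectral sequence, AP (aperiodicity) sequence
Duration control model:
- The consonant part is lengthened according to a fixed rate.
- The boundary part between consonant and vowel is not lengthened.
- The vowel part is lengthened so that the duration of the whole combination corresponds to the note duration.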
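The three lengthening rules amount to simple arithmetic, sketched below. It assumes the rules apply, in order, to a consonant part, a consonant-to-vowel boundary part, and a vowel part. The function name `lengthen_cv` and the durations in the usage note are hypothetical.

```python
def lengthen_cv(consonant_ms, boundary_ms, vowel_ms, note_ms, consonant_rate):
    """Return (consonant, boundary, vowel) durations in ms after lengthening.

    consonant_rate is the fixed lengthening rate for the consonant's class.
    The original vowel duration is replaced, so vowel_ms is used only to
    report the resulting stretch factor.
    """
    new_consonant = consonant_ms * consonant_rate      # fixed-rate lengthening
    new_boundary = boundary_ms                         # left unchanged
    new_vowel = note_ms - new_consonant - new_boundary # fills the rest of the note
    if new_vowel <= 0:
        raise ValueError("note too short for the lengthened consonant and boundary")
    return new_consonant, new_boundary, new_vowel

def stretch_factors(consonant_ms, boundary_ms, vowel_ms, note_ms, consonant_rate):
    """Per-part time-stretch factors to apply to the spectral and AP sequences."""
    c, b, v = lengthen_cv(consonant_ms, boundary_ms, vowel_ms, note_ms, consonant_rate)
    return c / consonant_ms, b / boundary_ms, v / vowel_ms
```

With a spoken syllable of 80 ms consonant, 40 ms boundary and 120 ms vowel, a 500 ms note, and a rate of 1.28, `lengthen_cv(80, 40, 120, 500, 1.28)` gives about 102.4 ms, 40 ms and 357.6 ms, which sum to the note duration.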
8
Lengthened spectral and AP sequences → modified spectral envelope and AP
Spectral control model 1: adds the singing formant by emphasizing the peak of the spectral envelope and the dip of the AP in the vowel part.
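One way to picture this emphasis is a frequency-dependent gain that raises the spectral envelope and lowers the AP by the same amount near the singing formant. The sketch below is an assumption-laden illustration, not the authors' formula: the Gaussian weighting, the 3 kHz centre, the 500 Hz width and the 12 dB peak are all hypothetical, and both envelopes are assumed to be given in dB on a shared frequency axis.

```python
import math

def singing_formant_gain_db(freq_hz, center_hz=3000.0, width_hz=500.0, peak_db=12.0):
    """Gaussian-shaped gain in dB centred on the assumed singing-formant frequency."""
    return peak_db * math.exp(-0.5 * ((freq_hz - center_hz) / width_hz) ** 2)

def add_singing_formant(freqs_hz, envelope_db, aperiodicity_db, **gain_params):
    """Return (new_envelope_db, new_aperiodicity_db) for one vowel frame."""
    new_env, new_ap = [], []
    for f, e, a in zip(freqs_hz, envelope_db, aperiodicity_db):
        g = singing_formant_gain_db(f, **gain_params)
        new_env.append(e + g)  # emphasize the spectral peak
        new_ap.append(a - g)   # deepen the AP dip at the same frequencies
    return new_env, new_ap
```

Frequencies far from the centre are left essentially unchanged, so the rest of the vowel's spectral shape is preserved.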
9
Modified spectral envelope and AP, generated F0 contour → STRAIGHT (synthesis) → synthesized singing voice
Spectral control model 2: adds an amplitude modulation (AM) of the formants synchronized with vibrato, by applying AM to the amplitude envelope of the synthesized singing voice during vibrato.
→ Synthesized singing voice (final version)
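The amplitude modulation step can be sketched as multiplying the signal by a sinusoid at the vibrato rate, only inside the vibrato interval. The sinusoidal form, the function name `apply_vibrato_am`, and the depth value are assumptions made for illustration. The slide only states that the AM is synchronized with vibrato and applied to the amplitude envelope.

```python
import math

def apply_vibrato_am(samples, fs, vib_rate_hz, am_depth, start_s, end_s, phase=0.0):
    """Scale samples by 1 + am_depth * sin(...) between start_s and end_s.

    vib_rate_hz should equal the vibrato rate used in the F0 contour so that
    the modulation stays synchronized with the vibrato.
    """
    out = list(samples)
    first = max(0, int(round(start_s * fs)))
    last = min(len(out), int(round(end_s * fs)))
    for n in range(first, last):
        t = (n - first) / fs  # time since vibrato onset
        out[n] *= 1.0 + am_depth * math.sin(2.0 * math.pi * vib_rate_hz * t + phase)
    return out
```

Samples outside the vibrato interval are returned unchanged, and `am_depth` below 1 keeps the gain positive.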
10
♪ Speaking voice (input): (male → female)
♪ Synthesized singing voice: (male → female → chorus)
11
Thank you!!
12
13
♪ Ratios of the duration of each consonant in singing voices to its duration in read speech
(table columns: articulation position, i.e. lips; teeth / alveolar arch; palate; glottis, with voiced and unvoiced sub-columns)
fricative: /z/ 1.37, /s/ 1.18, /h/ 1.28
plosive: /d/ 1.00, /t/ 1.09, /g/ 1.14, /k/ 0.97
semivowel: /w/ 2.61, /r/ 2.12
nasal: /m/ 1.35, /n/ 1.50
Phoneme duration can be controlled by articulation manner rather than articulation position: fricative 1.28, plosive 1.00, semivowel 2.37, nasal 1.43, /y/ 1.22.
14
♪ Singers’ formant: a prominent spectral peak at around 3 kHz (Sundberg, 1974).
♪ Amplitude modulation of formants synchronized with vibrato (Hirano, 1985).
Both features are prominent in professional singing voices.
15
Spectral control 1: the singing formant, a prominent spectral peak at around 3 kHz.
Spectral control 2: amplitude modulation of formants synchronized with vibrato in F0.