Takeshi SAITOU (1), Masataka GOTO (1), Masashi UNOKI (2) and Masato AKAGI (2)
(1) National Institute of Advanced Industrial Science and Technology (AIST)
(2) Japan Advanced Institute of Science and Technology (JAIST)

Our research approach focuses not on text-to-singing (lyric-to-singing) synthesis but on speech-to-singing synthesis (vocal conversion): speech → singing.
⇒ Clarifying acoustic differences between singing and speaking.
⇒ Developing novel applications for computer music production.

The vocal conversion system
- is based on the speech manipulation system STRAIGHT (Kawahara et al., 1998), and
- comprises three types of model:
  - F0 control model
  - Duration control model
  - Spectral control model

Inputs:
- Speaking voice: a reading of the lyrics of a song.
- Musical score
- Synchronization information

Musical score → musical notes → melody contour → F0 contour of singing voice

F0 control model: adds four types of F0 fluctuation to the melody contour given by the musical notes.
- Vibrato: quasi-periodic frequency modulation with … Hz.
- Preparation: a deflection in the direction opposite to the note change, observed just before the note change.
- Fine fluctuation: irregular fluctuations higher than 10 Hz over the full contour.
- Overshoot: a deflection exceeding the target note after a note change.
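The four fluctuations can be illustrated with a minimal sketch. This is not the authors' model: it assumes a second-order underdamped filter run forward (overshoot) and then backward (preparation), a sinusoidal vibrato, and crudely high-passed noise for fine fluctuation. The function names and all parameter values, including the 5.5 Hz vibrato rate, are hypothetical choices for illustration.

```python
import numpy as np

def note_to_hz(midi_note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def _second_order(x, omega, zeta, dt):
    """Track x with an underdamped second-order system (semi-implicit Euler)."""
    y = np.empty_like(x)
    pos, vel = x[0], 0.0
    for i, target in enumerate(x):
        acc = omega ** 2 * (target - pos) - 2.0 * zeta * omega * vel
        vel += acc * dt
        pos += vel * dt
        y[i] = pos
    return y

def synth_f0(notes, fs=200, omega=40.0, zeta=0.5,
             vib_rate=5.5, vib_depth_cents=40.0, fine_cents=5.0, seed=0):
    """notes: list of (midi_note, duration_s). Returns an F0 contour in Hz.

    All default values are assumptions, not figures from the slides.
    """
    dt = 1.0 / fs
    # Stepwise melody contour in cents relative to A4.
    melody = np.concatenate(
        [np.full(int(round(d * fs)), (n - 69) * 100.0) for n, d in notes])
    # Forward pass: the contour rings past the new note (overshoot).
    smooth = _second_order(melody, omega, zeta, dt)
    # Backward pass: the same ringing appears just before the change (preparation).
    smooth = _second_order(smooth[::-1], omega, zeta, dt)[::-1]
    # Vibrato: sinusoidal modulation over the whole contour, for simplicity.
    t = np.arange(len(melody)) * dt
    vibrato = vib_depth_cents * np.sin(2.0 * np.pi * vib_rate * t)
    # Fine fluctuation: white noise minus its ~0.1 s moving average, a crude
    # way to keep mostly the components above roughly 10 Hz.
    noise = np.random.default_rng(seed).standard_normal(len(melody))
    win = max(1, fs // 10)
    noise = noise - np.convolve(noise, np.ones(win) / win, mode="same")
    cents = smooth + vibrato + fine_cents * noise
    return 440.0 * 2.0 ** (cents / 1200.0)
```

For example, `synth_f0([(60, 1.0), (62, 1.0)])` gives a 2 s contour sampled at 200 Hz for a C4 to D4 step; setting `vib_depth_cents=0` and `fine_cents=0` isolates the overshoot and preparation.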

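Speaking voice → STRAIGHT (analysis part) → spectral sequence and AP (aperiodicity) sequence

Duration control model: each consonant-vowel combination is lengthened part by part.
- The consonant part is lengthened according to a fixed rate.
- The boundary part between consonant and vowel is not lengthened.
- The vowel part is lengthened so that the duration of the whole combination corresponds to the note duration.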
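The duration rule reduces to simple arithmetic. The sketch below assumes the three segments are a consonant part, a consonant-to-vowel boundary part, and a vowel part; the function name, argument names, and example durations are hypothetical.

```python
def lengthen_segments(consonant_s, boundary_s, vowel_s, note_s, consonant_rate):
    """Return the new (consonant, boundary, vowel) durations in seconds.

    consonant_rate is the fixed lengthening rate for the consonant (assumed
    to depend on the consonant's manner of articulation).
    """
    new_consonant = consonant_s * consonant_rate   # fixed-rate lengthening
    new_boundary = boundary_s                      # left untouched
    # The vowel absorbs whatever remains of the note; its original
    # duration (vowel_s) does not constrain the result.
    new_vowel = note_s - new_consonant - new_boundary
    if new_vowel <= 0.0:
        raise ValueError("note too short for this consonant and boundary")
    return new_consonant, new_boundary, new_vowel
```

With a 50 ms consonant, a 40 ms boundary and a 0.5 s note, `lengthen_segments(0.05, 0.04, 0.10, 0.50, 1.28)` lengthens the consonant to 64 ms and assigns the remaining 396 ms to the vowel.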

Lengthened spectral and AP sequences → spectral envelope and AP of the vowel part → modified spectral envelope and AP

Spectral control model 1: adds the singing formant by emphasizing the peak of the spectral envelope and the dip of the AP.
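One way to picture this step is a frequency-local gain applied to a single analysis frame. The sketch below is an assumption-laden illustration, not the authors' filter: it takes the envelope and AP in dB on a linear frequency grid, finds the envelope peak in a hypothetical 2 to 4 kHz band, and applies a Gaussian-shaped boost to the envelope and a matching cut to the AP. The gains, band, and width are made-up values.

```python
import numpy as np

def emphasize_singing_formant(env_db, ap_db, fs, band=(2000.0, 4000.0),
                              width_hz=300.0, gain_db=12.0, ap_drop_db=6.0):
    """Boost the envelope peak near 3 kHz and deepen the AP dip around it.

    env_db, ap_db: one frame each, in dB, on a linear grid from 0 to fs/2.
    """
    env_db = np.asarray(env_db, dtype=float)
    ap_db = np.asarray(ap_db, dtype=float)
    freqs = np.linspace(0.0, fs / 2.0, len(env_db))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Locate the existing envelope peak inside the band.
    peak_hz = freqs[in_band][np.argmax(env_db[in_band])]
    # Gaussian weight centered on that peak: 1 at the peak, about 0 far away.
    weight = np.exp(-0.5 * ((freqs - peak_hz) / width_hz) ** 2)
    return env_db + gain_db * weight, ap_db - ap_drop_db * weight
```

Applying the function to every vowel frame leaves the envelope unchanged away from the peak, so the vowel identity carried by the lower formants is preserved.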

Modified spectral envelope and AP + generated F0 contour → STRAIGHT (synthesis) → synthesized singing voice

Spectral control model 2: adds an amplitude modulation (AM) of formants synchronized with vibrato, by applying AM to the amplitude envelope of the synthesized singing voice during vibrato.

→ Synthesized singing voice (final version)
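As a rough sketch of this last step, the amplitude modulation can be written as a time-varying gain that follows the vibrato rate inside the vibrato interval and stays at 1 elsewhere. The sinusoidal shape, the modulation depth, and the function and parameter names are assumptions for illustration.

```python
import numpy as np

def apply_vibrato_am(signal, fs, vib_rate_hz, depth=0.2,
                     start_s=0.0, end_s=None, phase=0.0):
    """Multiply the signal by 1 + depth*sin(...) during [start_s, end_s).

    vib_rate_hz and phase should match the F0 vibrato so that the AM stays
    synchronized with it.
    """
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal)) / fs
    if end_s is None:
        end_s = len(signal) / fs
    active = (t >= start_s) & (t < end_s)
    gain = np.ones(len(signal))
    gain[active] += depth * np.sin(
        2.0 * np.pi * vib_rate_hz * (t[active] - start_s) + phase)
    return signal * gain
```

For instance, `apply_vibrato_am(y, 16000, 5.0, start_s=0.5)` leaves the first half second of `y` unchanged and modulates the rest at 5 Hz.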

♪ Speaking voice (input): male → female
♪ Synthesized singing voice: male → female → chorus

Thank you!!


♪ Ratios of the duration of each consonant in singing voices to its duration in read speech
(Slide columns: articulation position, i.e. lips, teeth / alveolar arch, palate, glottis; voiced or unvoiced.)

Manner      Consonant and ratio
fricative   /z/ 1.37   /s/ 1.18   /h/ 1.28
plosive     /d/ 1.00   /t/ 1.09   /g/ 1.14   /k/ 0.97
semivowel   /w/ 2.61   /r/ 2.12
nasal       /m/ 1.35   /n/ 1.50

We can control phoneme duration by articulation manner rather than articulation position: fricative 1.28, plosive 1.00, semivowel 2.37, nasal 1.43, /y/ 1.22.
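The slide does not say how the per-manner values were obtained. As a quick arithmetic check, assuming they might be plain averages of the per-consonant ratios, the sketch below computes those averages (the dictionary layout and function name are illustrative). The fricative, semivowel and nasal averages agree with the listed 1.28, 2.37 and 1.43 to within rounding; the plosive average is 1.05, whereas the slide lists 1.00.

```python
# Per-consonant duration ratios (singing / read speech) from the slide.
RATIOS = {
    "fricative": {"z": 1.37, "s": 1.18, "h": 1.28},
    "plosive": {"d": 1.00, "t": 1.09, "g": 1.14, "k": 0.97},
    "semivowel": {"w": 2.61, "r": 2.12},
    "nasal": {"m": 1.35, "n": 1.50},
}

def manner_means(table):
    """Average ratio per manner of articulation."""
    return {manner: sum(vals.values()) / len(vals)
            for manner, vals in table.items()}

if __name__ == "__main__":
    for manner, mean in manner_means(RATIOS).items():
        print(f"{manner}: {mean:.3f}")
```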

♪ Singer's formant: a prominent spectral peak at around 3 kHz. (Sundberg, 1974)
♪ Amplitude modulation of formants synchronized with vibrato. (Hirano, 1985)
Both features are prominent in professional singing voices.

Spectral control 1: the singing formant, a prominent peak of the spectrum at around 3 kHz.
Spectral control 2: amplitude modulation of formants synchronized with vibrato in F0.