Computer Science and Phonetics

Plurital Master's: Computer Science and Phonetics

The goal of this course is to present an approach similar to that of NLP ... but applied to the sound signal.
Previously ...
1- Automatic analysis of the sound signal
2- A query language over the sound signal
3- Applications outside the signal

This year ... TTS ("Text-To-Speech")
"I was designed..." "My voice..."

Outline
History of Text-To-Speech synthesis
Diphone concatenation synthesis
Phonetizer (grapheme → phoneme)
Speech markup languages

Forget about the « toothpaste » metaphor! (cf. Allen's words)
TTS = NLP + DSP
[Diagram: TEXT → TEXT-TO-SPEECH SYNTHESIZER → SPEECH. NATURAL LANGUAGE PROCESSING (linguistic formalisms, inference engines, logical inferences) produces a narrow phonetic transcription (phones, prosody); DIGITAL SIGNAL PROCESSING (mathematical models, algorithms, computations) turns it into speech.]

Challenges
Intelligibility: Coarticulation! Speech synthesis does not reduce to a mere concatenation of recorded phonemes. Segmental quality.
Naturalness: Coarticulation again... Intonation and phoneme durations must be "coherent"; it is easy to produce unnatural prosody.
Engineering: low computational and memory cost; easy adaptation to other languages.

Homer Dudley's Voder (Bell Labs, 1936)

John Holmes’ formant synthesizer (1964)
Rule-based synthesis: Haskins Labs (1968), DECtalk (1983), InfoVox ( )

Challenges?
Intelligibility: more or less...
Naturalness: NO
Engineering: low computational and memory cost: YES; easy adaptation to other languages: NO

Diphone concatenation (1977)
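
A diphone runs from the stable middle of one phone to the middle of the next, so each stored unit contains the transition between two sounds, which is where coarticulation shows up. As a minimal sketch (not taken from the slides), here is how a phoneme string can be mapped to the sequence of diphones a concatenative synthesizer would look up. The function name `to_diphones`, the `a-b` label format, and the use of `_` as a silence symbol are illustrative assumptions.

```python
def to_diphones(phonemes, silence="_"):
    """Map a phoneme sequence to diphone labels.

    Assumptions (illustrative only): the utterance is padded with a
    silence symbol on both sides, and a diphone is written "left-right".
    """
    padded = [silence] + list(phonemes) + [silence]
    # Pair each phoneme with its successor: n phonemes give n + 1 diphones.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    print(to_diphones(["a", "l", "i"]))
```

For a language with N phonemes, the inventory is bounded by roughly N² diphones, which is why a diphone database stays small compared with the corpora used later for unit selection.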

Christian Hamon’s PSOLA (1988)
CNET (1990), LIMSI (Paris, 1992)

MBROLA (1993)
Based on the same Poisson sum formula as PSOLA, but using edited diphones
Overall quality similar to PSOLA
Same computational load
Completely automatic! → can be used to create lots of compatible synthesizers
"I was designed..." "My voice..."

The MBROLA project

Challenges?
Intelligibility: YES
Naturalness: more or less...
Engineering: low computational and memory cost: more or less...; easy adaptation to other languages: YES

Automatic unit selection
Diphone-based synthesis

Unit selection-based synthesis

Costs? Open problem
Concatenation cost? Target cost?
Weights? Trained by resynthesizing the corpus and trying to minimize the difference between the original and the synthetic speech
Towards passing the Turing test?
(ATR, 1996) (Accuvoice, 1996) (Univ. Edinburgh, 1997) (AT&T, 1998) (L&H, 1999)
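
To make the two costs concrete, here is a minimal sketch of unit selection as a dynamic-programming (Viterbi-style) search: for each target position it picks one candidate unit so that the weighted sum of target costs and concatenation costs is minimal. Everything specific here is an assumption for illustration and not from the slides: the `Unit` fields, the toy cost functions (absolute pitch and duration differences), and the fixed weights `w_target` and `w_concat`. Real systems use richer features and, as noted above, choosing the costs and weights remains an open problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A target specification or a corpus candidate (illustrative fields)."""
    label: str       # e.g. a diphone label such as "a-l"
    pitch: float     # mean F0 in Hz
    duration: float  # in ms


def target_cost(target, unit):
    # Toy target cost: distance between what we want and what the unit offers.
    return abs(target.pitch - unit.pitch) + abs(target.duration - unit.duration)


def concat_cost(left, right):
    # Toy concatenation cost: pitch mismatch at the join.
    return abs(left.pitch - right.pitch)


def select_units(targets, candidates, w_target=1.0, w_concat=1.0):
    """Return (best unit sequence, total cost).

    candidates[i] is the non-empty list of corpus units usable at position i.
    """
    # best[i][j]: cheapest cumulative cost of a path ending in candidates[i][j].
    best = [[w_target * target_cost(targets[0], u) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, back_row = [], []
        for unit in candidates[i]:
            joins = [best[i - 1][k] + w_concat * concat_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(joins)), key=joins.__getitem__)
            row.append(joins[k_min] + w_target * target_cost(targets[i], unit))
            back_row.append(k_min)
        best.append(row)
        back.append(back_row)
    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        if i > 0:
            j = back[i][j]
    path.reverse()
    return path, total
```

In this sketch, raising `w_concat` relative to `w_target` favors smooth joins over closeness to the requested prosody.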

Towards corpus-based techniques
1995-?: The database years
For automatic phonetization
For automatic generation of intonation and phoneme duration
For automatic selection of units for concatenative synthesis

Large text and speech corpora
Tagged text corpora are required for training language models for ASR, and phonetizers and taggers for TTS
Phonetically labeled speech corpora are needed for ASR (multi-speaker, 100s of hours) and TTS (single speaker, 1-5 hours)
ELRA (European Language Resources Association) and LDC (Linguistic Data Consortium) collect and distribute databases
From expert-based systems to corpus-based systems

Presentation of MBROLA
