Informatique et Phonétique

Presentation on theme: "Informatique et Phonétique"— Presentation transcript:

1 Informatique et Phonétique (Computer Science and Phonetics)
Master Plurital

2 The goal of this course is to present an approach similar to that of NLP, but applied to the sound signal.
Previously:
1- Automatic analysis of the sound signal
2- A query language over the sound signal
3- Applications external to the signal

3 This year: TTS ("Text-To-Speech")
Audio samples: "J'ai été conçu..." ("I was designed..."), "Ma voix..." ("My voice...")

4 Outline
History of Text-To-Speech synthesis
Synthesis by diphone concatenation
Phonetizer (grapheme → phoneme)
Speech markup languages

5 Forget about the « toothpaste » metaphor! (cf. Allen's words)
TTS = NLP + DSP
[Diagram: TEXT-TO-SPEECH SYNTHESIZER. TEXT → NATURAL LANGUAGE PROCESSING (linguistic formalisms, inference engines, logical inferences) → narrow phonetic transcription (phones, prosody) → DIGITAL SIGNAL PROCESSING (mathematical models, algorithms, computations) → SPEECH]

6 Challenges
Intelligibility: Coarticulation! Speech synthesis does not reduce to a mere concatenation of recorded phonemes. Segmental quality.
Naturalness: Coarticulation again... Intonation and phoneme durations must be "coherent"; it is easy to produce unnatural prosody.
Engineering: low computational and memory cost; easy adaptation to other languages.

7 Homer Dudley's Voder (Bell Labs, 1936)

8 Rule-based Synthesis
John Holmes' formant synthesizer (1964)
Haskins Labs (1968)
DecTalk (1983)
InfoVox ( )

9 Challenges?
Intelligibility: more or less...
Naturalness: NO
Engineering: low computational and memory cost: YES
Engineering: easy adaptation to other languages: NO

10 Diphone concatenation (1977)
[Figure: example diphone labels: al, is, pS, si~]
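
To make the unit inventory concrete, here is a minimal sketch of how a phoneme string can be cut into diphones, each spanning the transition from one phoneme to the next. The function name `to_diphones`, the `_` silence symbol, and the SAMPA-like labels for "bonjour" are illustrative assumptions, not taken from the slides.

```python
def to_diphones(phonemes, silence="_"):
    """Split a phoneme sequence into overlapping diphone labels.

    The utterance is padded with silence on both sides so that the
    first and last phonemes also get a transition unit.
    """
    padded = [silence] + list(phonemes) + [silence]
    # Pair every phoneme with its right-hand neighbour.
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


# Hypothetical SAMPA-like transcription of the French word "bonjour".
print(to_diphones(["b", "o~", "Z", "u", "R"]))
# → ['_-b', 'b-o~', 'o~-Z', 'Z-u', 'u-R', 'R-_']
```

Because each unit is cut in the stable middle of a phoneme, the coarticulated transition is stored inside the unit instead of being created at the join.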

11 Diphone concatenation (1977)

12 Christian Hamon’s PSOLA (1988)
Cnet (1990)
Limsi (Paris, 1992)

13 MBROLA (1993)
Based on the same Poisson summation formula as PSOLA, but using edited diphones
Overall quality similar to PSOLA
Same computational load
Completely automatic! → can be used to create lots of compatible synthesizers
Audio samples: "J'ai été conçu..." ("I was designed..."), "Ma voix..." ("My voice...")
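
As an illustration of what a front end hands to an MBROLA-style synthesizer, here is a minimal sketch that renders phonemes, durations, and pitch targets as text. It assumes the usual `.pho` layout (one phoneme per line: symbol, duration in ms, then optional pairs of position in percent and pitch in Hz). The helper `to_pho` and all numeric values are hypothetical.

```python
def to_pho(segments):
    """Render (phoneme, duration_ms, pitch_points) tuples as .pho-style lines.

    pitch_points is a list of (position_percent, f0_hz) pairs placed
    inside the phoneme; an empty list means no pitch target.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position, f0 in pitch_points:
            fields += [str(position), str(f0)]
        lines.append(" ".join(fields))
    return "\n".join(lines)


# Hypothetical durations and pitch targets for "bonjour" (SAMPA-like labels).
segments = [
    ("_", 100, []),
    ("b", 60, []),
    ("o~", 120, [(50, 130)]),
    ("Z", 80, []),
    ("u", 110, [(20, 140), (80, 110)]),
    ("R", 90, []),
    ("_", 100, []),
]
print(to_pho(segments))
```

The point of such a narrow interface is that any phonetizer and prosody generator that can produce these lines can drive the same signal-processing back end.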

14 The MBROLA project

15 Challenges?
Intelligibility: YES
Naturalness: more or less...
Engineering: low computational and memory cost: more or less...
Engineering: easy adaptation to other languages: YES

16 Automatic unit selection
Diphone-based synthesis

17 Automatic unit selection
Unit selection-based synthesis

18 Automatic unit selection
Costs? Open problem
Concatenation cost? Target cost? Weights?
Trained by resynthesizing the corpus and trying to minimize the difference between original and synthetic
Towards passing the Turing test?
(ATR, 1996) (Accuvoice, 1996) (Univ. Edinburgh, 1997) (AT&T, 1998) (L&H, 1999)
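
The interplay of target cost and concatenation cost can be sketched as a shortest-path (Viterbi) search over candidate units. This is a toy sketch under stated assumptions, not any of the cited systems: units are reduced to hypothetical `(f0_start, f0_end)` pairs in Hz, the two cost functions are invented for illustration, and the weights are set by hand instead of trained.

```python
def select_units(targets, candidates, target_cost, concat_cost,
                 w_target=1.0, w_concat=1.0):
    """Pick one candidate per target, minimizing the weighted sum of
    target costs and concatenation costs (dynamic programming)."""
    # best[i][j] = (cheapest cumulative cost ending in candidates[i][j], backpointer)
    best = [[(w_target * target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + w_concat * concat_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + w_target * target_cost(targets[i], unit), back))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Illustrative costs: distance to the desired mean pitch, and pitch jump at the join.
def target_cost(target_f0, unit):
    return abs(target_f0 - (unit[0] + unit[1]) / 2)

def concat_cost(prev, unit):
    return abs(prev[1] - unit[0])

targets = [100, 120]                                  # desired mean f0 per slot
candidates = [[(100, 100), (95, 115)], [(120, 120), (115, 125)]]
path, total = select_units(targets, candidates, target_cost, concat_cost)
print(path, total)  # → [(95, 115), (115, 125)] 5.0
```

In this toy case the units that match each target best in isolation, `(100, 100)` then `(120, 120)`, would create a 20 Hz jump at the join; the joint search accepts a small target mismatch to get a smooth concatenation, which is exactly the trade-off the weights control.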

19 Towards corpus-based techniques
1995-?: The database years
For automatic phonetization
For automatic generation of intonation and phoneme duration
For automatic selection of units for concatenative synthesis

20 Large text and speech corpora
Tagged text corpora are required for training language models for ASR, and phonetizers and taggers for TTS
Phonetically labeled speech corpora are needed for ASR (multi-speaker, 100s of hours) and TTS (single speaker, 1-5 hours)
ELRA (European Language Resources Association) and LDC (Linguistic Data Consortium) collect and distribute databases
From expert-based systems to corpus-based systems

21 Presentation of MBROLA

