1
Computer Science and Phonetics
Master Plurital: Computer Science and Phonetics
2
The aim of this course is to present an approach similar to that of NLP, but applied to the sound signal.
Previously ...
1- Automatic analysis of the sound signal
2- A query language over the sound signal
3- Applications external to the signal
3
This year ... TTS "Text-To-Speech"
"J’ai été conçu..." ("I was designed...") "Ma voix..." ("My voice...")
4
Outline: a history of Text-To-Speech synthesis
Synthesis by diphone concatenation
Phonetizer (grapheme-to-phoneme conversion)
Speech markup languages
5
Forget about the « toothpaste » metaphor! (cf. Allen’s words)
TTS = NLP + DSP
[Diagram: TEXT-TO-SPEECH SYNTHESIZER]
TEXT -> NATURAL LANGUAGE PROCESSING (linguistic formalisms, inference engines, logical inferences) -> narrow phonetic transcription (phones, prosody) -> DIGITAL SIGNAL PROCESSING (mathematical models, algorithms, computations) -> SPEECH
6
Challenges
Intelligibility: coarticulation! Speech synthesis does not reduce to a mere concatenation of recorded phonemes. Segmental quality.
Naturalness: coarticulation again... Intonation and phoneme durations must be “coherent”; it is easy to produce unnatural prosody.
Engineering: low computational and memory cost; easy adaptation to other languages.
7
Homer Dudley’s Voder (Bell Labs, 1936)
8
John Holmes’ formant synthesizer (1964)
Rule-based synthesis: Haskins Labs (1968), DECtalk (1983), InfoVox ( )
9
Challenges?
Intelligibility: more or less...
Naturalness: NO
Engineering: low computational and memory cost: YES; easy adaptation to other languages: NO
10
Diphone concatenation (1977)
[Figure: example of diphone units; labels: al, is, pS, si~]
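To make the idea of a diphone concrete: each unit spans the transition between two adjacent phonemes, so coarticulation is stored inside the unit instead of being cut at its edges. The sketch below is only an illustration; the function name `to_diphones`, the silence symbol `_` and the `a-l` naming scheme are assumptions, not taken from any particular synthesizer.

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into the diphone sequence to concatenate.

    The utterance is padded with silence on both sides, so N phonemes
    give N + 1 diphones. Unit names like "a-l" are a hypothetical scheme.
    """
    padded = [silence] + list(phonemes) + [silence]
    # Pair every symbol with its right-hand neighbour.
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


# Example with a made-up four-phoneme input.
print(to_diphones(["a", "l", "i", "s"]))
```

A database covering a language then needs, in principle, one recorded unit per admissible phoneme pair, which is why diphone inventories are far larger than phoneme inventories.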
11
Diphone concatenation (1977)
12
Christian Hamon’s PSOLA (1988)
CNET (1990), LIMSI (Paris, 1992)
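The principle behind time-domain PSOLA can be sketched in a few lines: short windowed segments centred on pitch marks are copied and overlap-added at a new spacing, which changes the pitch without resampling the segments. The code below is a deliberately simplified illustration, assuming a constant pitch period known in advance; a real system places pitch marks per glottal cycle on natural speech. The names `hann` and `td_psola` are hypothetical.

```python
import math


def hann(length):
    # Periodic Hann window: two copies shifted by length/2 sum to exactly 1.
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)) for n in range(length)]


def td_psola(signal, period, new_period):
    """Overlap-add two-period segments at a new pitch period (in samples).

    Assumes the input is voiced with a constant period; the output has
    the same length as the input.
    """
    window = hann(2 * period)
    out = [0.0] * len(signal)
    last = len(signal) - period
    analysis_marks = list(range(period, last + 1, period))
    mark = period
    while mark <= last:
        # Reuse the analysis segment whose centre is closest in time.
        src = min(analysis_marks, key=lambda m: abs(m - mark))
        for n in range(2 * period):
            out[mark - period + n] += window[n] * signal[src - period + n]
        mark += new_period
    return out


# A 200-sample test tone with a 20-sample period, raised to a 16-sample period.
tone = [math.sin(2.0 * math.pi * n / 20) for n in range(200)]
higher = td_psola(tone, 20, 16)
```

With `new_period` equal to `period` the interior of the signal is reconstructed unchanged, which is a convenient sanity check for the windowing.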
13
MBROLA (1993)
Based on the same Poisson summation formula as PSOLA, but using edited diphones
Overall quality similar to PSOLA
Same computational load
Completely automatic! Can be used to create lots of compatible synthesizers
"J’ai été conçu..." ("I was designed...") "Ma voix..." ("My voice...")
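What makes such synthesizers interchangeable is that they take only phones and prosody as input. As far as I understand the MBROLA `.pho` input format, each line holds a phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz). The sketch below writes that layout; the helper name `to_pho` and all the durations and pitch values are invented for illustration, and the exact format should be checked against the MBROLA documentation.

```python
def to_pho(segments):
    """Format (phoneme, duration_ms, pitch_points) triples as .pho-style text.

    pitch_points is a list of (position_percent, pitch_hz) pairs and may be empty.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_pct, pitch_hz in pitch_points:
            fields += [str(position_pct), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Illustrative values only: a rough SAMPA rendering of French "bonjour".
bonjour = [
    ("_", 100, []),
    ("b", 60, []),
    ("o~", 130, [(50, 120)]),
    ("Z", 110, []),
    ("u", 140, [(80, 100)]),
    ("R", 90, []),
    ("_", 100, []),
]
print(to_pho(bonjour))
```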
14
The MBROLA project
15
Challenges?
Intelligibility: YES
Naturalness: more or less...
Engineering: low computational and memory cost: more or less...; easy adaptation to other languages: YES
16
Automatic unit selection
Diphone-based synthesis
17
Automatic unit selection
Unit selection-based synthesis
18
Automatic unit selection
Costs? Open problem
Concatenation cost? Target cost?
Weights? Trained by resynthesizing the corpus and trying to minimize the difference between the original and the synthetic speech
Towards passing the Turing test?
(ATR, 1996) (Accuvoice, 1996) (Univ. Edinburgh, 1997) (AT&T, 1998) (L&H, 1999)
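One common way to frame the selection problem is as a shortest-path search: every target position has several candidate units, and the chosen sequence minimizes a weighted sum of target costs and concatenation costs. The sketch below uses a Viterbi-style dynamic program. The unit representation `(label, pitch_hz, corpus_index)`, both toy cost functions and the weights are assumptions made for illustration, not the costs used by any of the systems listed above.

```python
def select_units(targets, candidates, target_cost, concat_cost,
                 w_target=1.0, w_concat=1.0):
    """Return (best unit sequence, total cost) by dynamic programming."""
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], back-pointer)
    best = [[(w_target * target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + w_concat * concat_cost(p, u), j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + w_target * target_cost(targets[i], u), prev_j))
        best.append(row)
    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


def toy_target_cost(target, unit):
    # Distance between the requested pitch and the recorded pitch.
    return abs(target[1] - unit[1])


def toy_concat_cost(prev, unit):
    # Units that were adjacent in the corpus join for free.
    if unit[2] == prev[2] + 1:
        return 0
    return 5 + abs(prev[1] - unit[1])


targets = [("_a", 120), ("al", 120), ("l_", 110)]
candidates = [
    [("_a", 118, 0), ("_a", 121, 40)],
    [("al", 130, 1), ("al", 120, 77)],
    [("l_", 112, 2), ("l_", 109, 78)],
]
path, total = select_units(targets, candidates, toy_target_cost, toy_concat_cost)
```

In this toy example, raising `w_concat` from 1 to 3 switches the result from the units closest to the target pitch to the three units that were contiguous in the corpus, which is the trade-off the weights control.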
19
Towards corpus-based techniques
1995-?: The database years
For automatic phonetization
For automatic generation of intonation and phoneme duration
For automatic selection of units for concatenative synthesis
20
Large text and speech corpora
Tagged text corpora are required for training language models for ASR, and phonetizers and taggers for TTS
Phonetically labeled speech corpora are needed for ASR (multi-speaker, 100s of hours) and TTS (single speaker, 1-5 hours)
ELRA (European Language Resources Association) and LDC (Linguistic Data Consortium) collect and distribute databases
From expert-based systems to corpus-based systems
21
Presentation of MBROLA