TEXT TO SPEECH SYNTHESIS

Slides:



Advertisements
Similar presentations
What is Word Study? PD Presentation: Union 61 Revised ELA guide Supplement (and beyond)
Advertisements

Teaching Pronunciation
Natural Language Processing Syntax. Syntactic structure John likes Mary PN VtVt NP VP S DetPNVtVt NP VP S Every man likes Mary Noun.
CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Natural Language Understanding Difficulties: Large amount of human knowledge assumed – Context is key. Language is pattern-based. Patterns can restrict.
Research-Based Instruction in Reading Dr. Bonnie B. Armbruster University of Illinois at Urbana-Champaign Archived Information.
Assistant Professor, (Program for Linguistics)
S. P. Kishore*, Rohit Kumar** and Rajeev Sangal* * Language Technologies Research Center International Institute of Information Technology Hyderabad **
Phonology Phonology is essentially the description of the systems and patterns of speech sounds in a language. It is, in effect, based on a theory of.
1 Frequency Domain Analysis/Synthesis Concerned with the reproduction of the frequency spectrum within the speech waveform Less concern with amplitude.
MULTI LINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES NIXON PATEL Bhrigus Inc Multilingual & International Speech.
1 Words and the Lexicon September 10th 2009 Lecture #3.
Introduction to Linguistics and Basic Terms
Bootstrapping a Language- Independent Synthesizer Craig Olinsky Media Lab Europe / University College Dublin 15 January 2002.
Stemming, tagging and chunking Text analysis short of parsing.
1 Phonetics Study of the sounds of Speech Articulatory Acoustic Experimental.
Linguisitics Levels of description. Speech and language Language as communication Speech vs. text –Speech primary –Text is derived –Text is not “written.
Sound and Speech. The vocal tract Figures from Graddol et al.
Chapter three Phonology
1 ENGLISH PHONETICS AND PHONOLOGY Lesson 3A Introduction to Phonetics and Phonology.
Text-To-Speech Synthesis An Overview. What is a TTS System  Goal A system that can read any text Automatic production of new sentences Not just audio.
1 Speech synthesis 2 What is the task? –Generating natural sounding speech on the fly, usually from text What are the main difficulties? –What to say.
Phonetics and Phonology.
Lecture 1 Introduction: Linguistic Theory and Theories
Language: Form, Meanings and Functions
Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.
A Text-to-Speech Synthesis System
The Description of Speech
Recommendations for Morgan’s Instruction Instruction for improving reading fluency Instruction for improving word recognition, word decoding, and encoding.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Speech & Language Development 1 Normal Development of Speech & Language Language...“Standardized set of symbols and the knowledge about how to combine.
Phonetics and Phonology
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
The Linguistics of Second Language Acquisition
Language. Language Communication – transmitting information Many animals communicate Call systems – system of communication limited to a set number of.
Graphophonemic System – Phonics
1 Computational Linguistics Ling 200 Spring 2006.
1. Information Conveyed by Speech 2. How Speech Fits in with the Overall Structure of Language TWO TOPICS.
A Summary of Terminology in Linguistics. First Session Orientation to the Course Introduction to Language & Linguistics 1. Definition of Language 2. The.
Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,
Natural Language Processing Lecture 6 : Revision.
Today we will talk about: -Phonology -Phonemes -morphemes.
Korea Maritime and Ocean University NLP Jung Tae LEE
Introduction to Linguistics Ms. Suha Jawabreh Lecture # 8.
Levels of Language 6 Levels of Language. Levels of Language Aspect of language are often referred to as 'language levels'. To look carefully at language.
Teaching Pronunciation. The articulation of consonants and vowels and the discrimination of minimal pairs had shifted Emphasis on suprasegmental features.
Chapter 3 Culture and Language. Chapter Outline  Humanity and Language  Five Properties of Language  How Language Works  Language and Culture  Social.
Diagnostic Assessment: Salvia, Ysseldyke & Bolt: Ch. 1 and 13 Dr. Julie Esparza Brown Sped 512/Fall 2010 Portland State University.
English Phonetics 许德华 许德华. Objectives of the Course This course is intended to help the students to improve their English pronunciation, including such.
Hello, Everyone! Part I Review Review questions 1.In what ways can English consonants be classified? 2. In what ways can English vowels be classified?
Artificial Intelligence: Natural Language
Universal properties of language From An Introduction to Language and Linguistics (Fasold & Connor-Linton (editors), 2006, Yule, 2003)
WHAT IS LANGUAGE?. INTRODUCTION In order to interact,human beings have developed a language which distinguishes them from the rest of the animal world.
Chapter Five Language Description language study and linguistic study 1Applied Linguistics Chapter 5 by TIAN Bing.
SYNTAX.
Levels of Linguistic Analysis
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
Slang. Informal verbal communication that is generally unacceptable for formal writing.
Phonetics and Phonology.
Chapter 1 Introduction PHONOLOGY (Lane 335). Phonetics & Phonology Phonetics: deals with speech sounds, how they are made (articulatory phonetics), how.
Building awareness and concern for pronunciation by Joanne Kenworthy - Teaching English Pronunciation FONETICA Y FONOLOGIA II - ALEXANDRA NAIR ZUÑIGA.
Grammar Module 1: Grammar: what and why? (GM1)
Linguistic knowledge for Speech recognition
INTRODUCTION TO PHONETICS AND PHONOLOGY
Text-To-Speech System for English
EXPERIMENTS WITH UNIT SELECTION SPEECH DATABASES FOR INDIAN LANGUAGES
Levels of Linguistic Analysis
ENGLISH PHONETICS AND PHONOLOGY Week 2
Artificial Intelligence 2004 Speech & Natural Language Processing
Presentation transcript:

TEXT TO SPEECH SYNTHESIS KESHU

INTRODUCTION Language is the ability to express one’s thoughts by means of a set of signs, whether graphical, gestual, acoustic or even musical. It is a distinctive feature of human beings who use such structured system

Speech Speech is major component of a language Oldest means of communication Levels of speech: 1. Acoustic 2. Phonetic 3. Phonological 4. Morphological 5. Syntactic 6. Semantic 7. Pragmatic

Perfect TTS Synthesizer Human beings The reading process involves: Seeing, Thinking, Saying, Hearing These are most complex processes Cannot be imitated

TTS Synthesizer System A text to speech synthesizer is a computer based system that should be able to read any text whether it was directly introduced into the computer or through character recognition system (OCR). And speech should be intelligible and natural.

Feature and Multilevel Data Structures Plays an important role in contemporary TTS systems for Natural Language Processing

Typical TTS Components Two components Natural Language Processing Module (NLP) Digital Signal Processing Module (DSP)

NLP and DSP Modules The NLP module is capable of producing a phonetic transcription of the text to be read, together with the desired intonation and rhythm. It takes in the text as input and give narrow phonetic transcription as output which is further forwarded to the DSP module. And the DSP module which transforms the symbolic information it receives into natural sounding speech. “Narrow phonetic transcription” which is taken as intermediate varies from synthesizer system to another.

NLP Module of typical TTS system Text Analyzer (Morpho Syntactic Analysis) Pre-processor Morphological Analyzer Contextual Analyzer Syntactic-Prosodic parser Letter to Sound Module

Preprocessor Takes in texts as strings of ASCII characters Transforms text into Broad Segmentation Units (BSU’s) following the set: A sequence of characters A sequence of digits A single punctuation mark or another special character A sequence of white space characters Eg: (I)()(know)()(1)(,)(000)()(words)(,)()(Dr)(.)() (Jones)(.) Rewrites the BSU’s into a list of word-like units and of syntax bearing punctuation marks called Final Segmentation Units are produced (FSU’s).

Preprocessor Sentence end detection (semicolon, period – ratio, time and decimal point, sentence ending respectively) Abbreviations (e.g. – for instance) Changed to their full form with the help of lexicons Acronyms (I.B.M – these can be read as a sequence of characters, or NASA which can be read following the default way) Numbers (Once detected, first interpreted as rational, time of the day, dates and ordinal depending on their context) Idioms (eg. “In spite of”, “as a matter of fact”– these are combined into single FSU using a special lexicon)

Morphological Analysis Task is to propose all possible parts of speech categories to each word taken individually on the basis of their spelling Words – Function and Content words

Function Words Function words (determiners, pronouns, prepositions, conjunctions..). Can be stored in a lexicon to get their parts of speech categories because of its size. Word he: <spel> = he <syn cat> = pronoun <syn num> = <syn gen> = masc <phon> = /h/

Content Words Content words- infinite in number Needs Morphology – part of linguistics that describes word forms as a function of reduced set of abstract semantically bearing units called morphemes. Inflectional, derivational and compound words (content words) are decomposed into their elementary graphemic units (morphemes) Uses regular grammars exploiting lexicons of stems and affixes which is the only way because of its infinite size

Contextual Analysis Considers words in their context Reduces the list of their parts of speech categories to a very restricted number of highly probable hypotheses, given the corresponding possible parts of speech of neighboring words. Achieved by N-grams, multi-layer perceptrons (Neural networks), local stochastic grammars (provided by expert linguistics) etc

Letter to Sound Module LTS module is responsible for the automatic determination of the phonetic transciption of the incoming text Cannot just look up in a pronunciation dictionary Do not follow the rule “one character = one phoneme” Examples Single character correspond to two phonemes -- x as /ks/ Several characters producing one phoneme— gh in thought Single character pronounced in different ways c in ancestor, ancient, epic Single phoneme resulting in several spellings – sh in dish, t in action, c in ancient

Letter to Sound Module Some of the cases to consider Consonants may be reduced or deleted in clusters (eg. t in softness) Assimilation which originates in articulatory constraints and leads to a change of some phonological features of a given phoneme (eg. obstacle) Heterophonic homographs which are pronounced differently even though when they have same spelling (eg. record, contrast) Phonetic liaisons which affect final consonants of French words immediately followed by a vocalic sound which results in pronunciation of characters that otherwise disappear or in a change of pronunciation Schwas (transformation of unstressed vowels into short central phonetic elements is done or simply deletes them – like in thoughtful and interesting Vowel lengthening, new words, proper nouns which are really dependent on the language of origin to know the correct pronunciation.

Two Basic Strategies Dictionary based and Rule-based

Dictionary Based Dictionary based consist of storing a maximum of phonological knowledge into a lexicon and entries are generally restricted to morphemes and pronunciation of surface forms is accounted by inflectional, derivational and compounding morphophonic rules which describe how the phonetic transcriptions of their morphemic constituents are modified when they are combined into words. For those words that are not in the lexicon are transcribed by rule.

Rule Based Rule based strategy which transfers most of the phonological competence of dictionaries into a set of letter to sound (grapheme to phoneme) rules. And those words which are pronounced in a such a particular way that they constitute a rule on their own are stored in exceptions directory.

Dictionary based and Rule based

Morpho-Phonemic Module in Dictionary based Morphophonology is concerned with phonological changes in the pronunciation of morphemes occurring in the process of word formation.

Morpho-Phonemic Module in Dictionary based This module deals with the phonological changes and one distinguishes the following in this module Rules for changing phonological features (eg. ion and ure in completion and exposure) Rules for deleting or inserting phonemes (eg. buses or landed) Rules that account stress shift in languages such as English or German (eg. adApt + ation = adaptation or which doesn’t change like in abOrt + ion = abOrtion). These are achieved by using rewrite rules and by using Two-level rules[Koskenniemi,1983].

LTS Transducer This is the key component that transforms graphemes to phones in the rule based strategy. This is achieved by following Expert rule based systems or trained rule based systems or by neural networks.

Phonetic Post Processing In order to increase the intelligibility and the naturalness of synthetic speech, some kind of phonetic post processing is required. After first phonemic transcription of each word has been obtained, this is applied so as to account for coarticulatory smoothing. This smoothing results in high quality speech.

Syntactic Prosodic Parser Prosody refers to certain properties of the speech signal which are related to audible changes in pitch, loudness, syllable length. This is also referred as intonation. The features of this are focus, relationships between words, finality. These have specific functions in speech communication.

Syntactic Prosodic parser Getting a speech with all those features is impossible. Focuses on obtaining an acceptable segmentation and translates it into the continuation or finality but ignores the relationships or contrastive meaning

Syntactic Prosodic Parser These prosodic groups are achieved by a recent very crude algorithm termed as chinks ‘n chunks by Liberman and Church [1992] in which prosodic phrases are accounted for by the simple regular rule A (minor) prosodic phrase = a sequence of chinks followed by a sequence of chunks

DSP Module Takes in the narrow phonetic transcription and gives out speech as output

Why we need TTS system There are several advantages of a high quality text to speech synthesis system Great use in Telecommunications, relay service, Language Education, aid to handicapped persons, talking books and toys, vocal monitoring, multimedia, man-machine communication etc

Conclusion There is longggg waaaay to reach to have a system similar to HAL (Space Odyssey) Development in technology and gaining interest in NLP makes everyone think optimistic about reaching the goal soon.