Pronunciation Variation: TTS & Probabilistic Models
CMSC Natural Language Processing
April 10, 2003
Fast Phonology
Consonants: closure/obstruction in the vocal tract
Place of articulation (where the restriction occurs)
–Labial: lips (p, b); Labiodental: lips & teeth (f, v)
–Dental: teeth (th, dh)
–Alveolar: roof of mouth behind teeth (t, d)
–Palatal: palate (y); Palato-alveolar: (sh, jh, zh)…
–Velar: soft palate, back (k, g); Glottal
Manner of articulation (how it is restricted)
–Stop (t): closure + release; plosive (with burst of air)
–Nasal (n): air through the nasal cavity
–Fricative (s, sh): turbulence; Affricate: stop + fricative (jh, ch)
–Approximant (w, l, r)
–Tap/Flap: quick touch to the alveolar ridge
Fast Phonology
Vowels: open vocal tract: articulator position
Vowel height: position of the highest point of the tongue
–Front (iy) vs Back (uw)
–High (ih) vs Low (eh)
–Diphthong: tongue moves (ey)
Lip shape
–Rounded: (uw)
Phonological Variation
Consider t in context:
–talk: t – unvoiced, aspirated
–stalk: t – unaspirated, often sounds like d
–butter: dx – just a flap, etc.
Can model with a phonological rule
–Flap rule: {t, d} -> [dx] / V' __ V
 t, d become a flap between a stressed and an unstressed vowel (see the sketch below)
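As a concrete illustration (not from the lecture), a minimal Python sketch of the flap rule over an ARPAbet-style phone list, assuming CMUDict-style stress digits on vowels; the function name and phone inventory are illustrative assumptions.

import re

# Vowels carry a stress digit (CMUDict style): AH1 = stressed, ER0 = unstressed.
STRESSED_VOWEL = r"[A-Z]+1"
UNSTRESSED_VOWEL = r"[A-Z]+0"

def apply_flap_rule(phones):
    """Rewrite t/d as a flap (dx) between a stressed and an unstressed vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("T", "D")
                and re.fullmatch(STRESSED_VOWEL, phones[i - 1])
                and re.fullmatch(UNSTRESSED_VOWEL, phones[i + 1])):
            out[i] = "DX"
    return out

# "butter": B AH1 T ER0  ->  B AH1 DX ER0
print(apply_flap_rule(["B", "AH1", "T", "ER0"]))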
Phonological Rules & FSTs
Foxes redux:
–[ix] insertion: ε:[ix] / [+sibilant] ^ __ z (see the sketch below)
[FST diagram: states q0–q5, with arcs labeled +sib, ^:ε, ε:ix, z, #, and "other"]
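A hedged sketch of the same transduction written as a plain function rather than a compiled FST (not from the lecture); the symbol inventory, sibilant set, and boundary conventions are illustrative assumptions.

SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}

def ix_insertion(lexical):
    """Insert ix between a sibilant-final stem and the plural z.

    Input is a list of lexical symbols where '^' marks the morpheme
    boundary and '#' the word end, e.g. fox + plural = f aa k s ^ z #.
    """
    surface = []
    for i, sym in enumerate(lexical):
        if sym == "^":
            # The boundary deletes on the surface; insert ix in the
            # context [+sibilant] ^ __ z #.
            if (surface and surface[-1] in SIBILANTS
                    and lexical[i + 1:i + 3] == ["z", "#"]):
                surface.append("ix")
        elif sym == "#":
            pass  # word boundary also deletes
        else:
            surface.append(sym)
    return surface

# fox + plural: f aa k s ^ z #  ->  f aa k s ix z
print(ix_insertion(["f", "aa", "k", "s", "^", "z", "#"]))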
Harmony
Vowel harmony:
–A vowel changes to sound more similar to another vowel
 E.g. assimilates to the roundness and backness of the preceding vowel
Yokuts examples (see the sketch below):
–dub+hin -> dubhun
–xil+hin -> xilhin
–bok'+al -> bok'ol
–xat+al -> xatal
Can also be handled by an FST
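A minimal sketch of the rounding pattern in the Yokuts examples above, assuming (as the data suggest) that a suffix vowel rounds after a round stem vowel of the same height; the symbols and the height pairing are illustrative assumptions, not a full analysis.

# Suffix vowel rounds after a round stem vowel of the same height:
# high i -> u after u; low a -> o after o.
HARMONY = {("u", "i"): "u", ("o", "a"): "o"}
VOWELS = set("aeiou")

def yokuts_harmony(stem, suffix):
    """Apply rounding harmony to the suffix vowels, then concatenate."""
    stem_vowel = next(c for c in reversed(stem) if c in VOWELS)
    out = []
    for c in suffix:
        if c in VOWELS:
            c = HARMONY.get((stem_vowel, c), c)
        out.append(c)
    return stem + "".join(out)

for stem, suffix in [("dub", "hin"), ("xil", "hin"), ("bok'", "al"), ("xat", "al")]:
    print(stem, "+", suffix, "->", yokuts_harmony(stem, suffix))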
Text-to-Speech
Key components:
–Pronouncing dictionary
–Rules
Dictionary: e.g. CELEX, PRONLEX, CMUDict
–List of pronunciations
 Different pronunciations, dialects
 Sometimes: part of speech, lexical stress
–Problem: lexical gaps
 E.g. names!
TTS: Resolving Lexical Gaps
Rules applied to fill lexical gaps
–Now and then
Gaps & productivity:
–Infinitely many forms; can't just list them
 Morphology
 Numbers
 –Different styles, contexts: e.g. phone number, date, ...
 Names
 –Other-language influences
FST-based TTS
Components:
–FST for pronunciation of words & morphemes in the lexicon
–FSA for legal morpheme sequences
–FSTs for individual pronunciation rules
–Rules/transducers for e.g. names & acronyms
–Default rules for unknown words
FST TTS
Enrich the lexicon (see the sketch below):
–Orthographic + phonological
 E.g. cat = c|k a|ae t|t; goose = g|g oo|uw s|s e|e
Build an FST from the lexicon to the intermediate level
–Use the rich lexicon
Build FSTs for pronunciation rules
Names & acronyms: Liberman & Church: word list
Generalization rules
–Affixes: -s, -ville, -son, ...; compounds
–Rhyming rules
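A hedged sketch (not from the lecture) of the lookup-then-fallback organization: an enriched lexicon of aligned letter|phone pairs, with a crude default letter-to-sound table standing in for the unknown-word rules; the tiny lexicon and rule table are illustrative only.

# Enriched lexicon: each entry pairs orthography with phones (aligned).
LEXICON = {
    "cat":   [("c", "k"), ("a", "ae"), ("t", "t")],
    "goose": [("g", "g"), ("oo", "uw"), ("s", "s"), ("e", "")],
}

# Crude default letter-to-sound rules for unknown words (illustrative).
DEFAULT_L2S = {"a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh",
               "g": "g", "o": "aa", "s": "s", "t": "t"}

def pronounce(word):
    """Return a phone list: lexicon entry if present, else default rules."""
    if word in LEXICON:
        return [ph for _, ph in LEXICON[word] if ph]
    return [DEFAULT_L2S.get(letter, letter) for letter in word]

print(pronounce("cat"))     # in lexicon: ['k', 'ae', 't']
print(pronounce("goose"))   # in lexicon: ['g', 'uw', 's']
print(pronounce("gat"))     # lexical gap, default rules: ['g', 'ae', 't']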
Probabilistic Pronunciation
Sources of variation:
–Lexical variation: represent in the lexicon
 Differences in what segments form a word
 »E.g. vase, Brownsville
–Sociolinguistic variation: e.g. dialect, register, style
–Allophonic variation: differences in segment values in context
 Surface form: phonetic & articulatory effects
 »E.g. t in "about"
 Coarticulation: dis/assimilation, deletion, flapping, vowel reduction, epenthesis
The ASR Pronunciation Problem
Given a series of phones, what is the most probable word?
Simplification: assume the phone sequence and the word boundaries are known
Approach: noisy channel model
–The surface form is an instance of the lexical form that has passed through a noisy communication channel
–Model the channel to remove the noise and recover the original
Bayesian Model
Pr(w|O) = Pr(O|w)Pr(w) / Pr(O)
Goal: most probable word
–Observations held constant
–Find w to maximize Pr(O|w)*Pr(w) (see the sketch below)
Where do we get the likelihoods Pr(O|w)?
–Probabilistic rules (Labov)
 Add probabilities to pronunciation-variation rules
–Count over a large corpus of surface forms with respect to the lexicon
Where do we get Pr(w)?
–Similarly: count over words in a large corpus
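A minimal sketch of the decision rule, assuming we already have pronunciation likelihoods Pr(O|w) and word priors Pr(w) for a few candidate words; the candidate set and all numbers are made up for illustration, not estimated from any corpus.

# Candidate words for a hypothetical observed phone sequence, with
# made-up likelihoods Pr(O|w) and priors Pr(w).
likelihood = {"the": 0.00024, "knee": 0.63, "new": 0.36, "neat": 0.52}
prior      = {"the": 0.046,   "knee": 0.00024, "new": 0.001, "neat": 0.00013}

def most_probable_word(candidates):
    """Pick argmax_w Pr(O|w) * Pr(w); Pr(O) is constant, so it drops out."""
    return max(candidates, key=lambda w: likelihood[w] * prior[w])

print(most_probable_word(likelihood))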
Automatic Rule Induction
Decision trees
–Supervised machine learning technique
 Input: lexical phone, context features
 Output: surface phone
–Approach: identify the feature split that produces subsets with the least entropy
 Repeatedly split inputs on features until some threshold is reached
Classification (see the sketch below):
–Traverse the tree based on the features of the new input
–Assign the majority classification at the leaf
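A hedged sketch of the idea using scikit-learn's DecisionTreeClassifier (the library and the feature encoding are assumptions, not part of the lecture): each training item is a lexical phone plus simple context features, and the class is the observed surface phone; the toy data encode only the flapping pattern from earlier.

from sklearn.tree import DecisionTreeClassifier

# Features: [lexical phone, previous phone stressed?, next phone unstressed vowel?]
# encoded numerically; the class label is the surface phone. Toy data only.
# lexical phone: 0 = t, 1 = d, 2 = other
X = [
    [0, 1, 1],  # t between a stressed and an unstressed vowel
    [1, 1, 1],  # d between a stressed and an unstressed vowel
    [0, 0, 1],  # t, previous vowel unstressed
    [0, 1, 0],  # t, next segment not an unstressed vowel
    [1, 0, 0],
]
y = ["dx", "dx", "t", "t", "d"]   # surface phones

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Classify a new context: t between a stressed and an unstressed vowel.
print(tree.predict([[0, 1, 1]]))   # expected: ['dx']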
Weighted Automata
Associate a weight (probability) with each arc (see the sketch below)
–Determine weights by decision-tree compilation or by counting over a large corpus
[Weighted pronunciation network for "about": start -> {ax, ix} -> b -> {aw, ae} -> {t, dx} -> end]
Computed from the Switchboard corpus
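A hedged sketch representing a pronunciation network like the one above as a plain data structure; the state names and arc probabilities are made up for illustration, not the Switchboard estimates.

# States are named; each arc is (next_state, phone, probability).
# Probabilities on the arcs leaving a state sum to 1 (illustrative values).
ABOUT = {
    "start": [("s1", "ax", 0.68), ("s1", "ix", 0.32)],
    "s1":    [("s2", "b", 1.0)],
    "s2":    [("s3", "aw", 0.80), ("s3", "ae", 0.20)],
    "s3":    [("end", "t", 0.54), ("end", "dx", 0.46)],
    "end":   [],
}

def path_probability(automaton, phones, state="start"):
    """Probability of one pronunciation: multiply the arc weights along its path."""
    p = 1.0
    for phone in phones:
        arc = next((a for a in automaton[state] if a[1] == phone), None)
        if arc is None:
            return 0.0
        state, p = arc[0], p * arc[2]
    return p if state == "end" else 0.0

print(path_probability(ABOUT, ["ax", "b", "aw", "t"]))   # 0.68*1.0*0.80*0.54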
Forward Computation
For a weighted automaton and a phone sequence, what is its likelihood?
–Automaton: a tuple
 Set of states Q: q0, ..., qN
 Set of transition probabilities between states, a_ij
 »Where a_ij is the probability of transitioning from state i to state j
 Special start & end states
–Input: observation sequence O = o1, o2, ..., oT
–Computed as (see the sketch below):
 forward[t, j] = P(o1, o2, ..., ot, qt = j | λ) = Σi forward[t-1, i] * a_ij * b_j(ot)
 The final forward value gives P(O|w), which is multiplied by P(w) to score the word
–Sums over all paths into state j at time t
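A minimal runnable sketch of the forward computation, using the same arc representation as the weighted-automaton sketch above and folding the emission probability b_j(o_t) into the arc weight for simplicity; the automaton, state names, and probabilities are illustrative assumptions.

# Same arc format as the weighted-automaton sketch: state -> [(next, phone, prob)].
ABOUT = {
    "start": [("s1", "ax", 0.68), ("s1", "ix", 0.32)],
    "s1":    [("s2", "b", 1.0)],
    "s2":    [("s3", "aw", 0.80), ("s3", "ae", 0.20)],
    "s3":    [("end", "t", 0.54), ("end", "dx", 0.46)],
    "end":   [],
}

def forward_likelihood(automaton, phones):
    """Sum the probability of every path that emits the phone sequence."""
    forward = {"start": 1.0}                   # state -> prob of reaching it so far
    for phone in phones:                       # advance one time step
        nxt = {}
        for state, prob in forward.items():
            for next_state, arc_phone, arc_prob in automaton[state]:
                if arc_phone == phone:         # b_j(o_t) folded into the arc weight
                    nxt[next_state] = nxt.get(next_state, 0.0) + prob * arc_prob
        forward = nxt
    return forward.get("end", 0.0)

print(forward_likelihood(ABOUT, ["ix", "b", "aw", "dx"]))   # 0.32*1.0*0.80*0.46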
Viterbi Decoding
Given an observation sequence O and a weighted automaton, what is the most likely state sequence?
–Use to identify words by merging multiple word-pronunciation automata in parallel
–Comparable to forward
 Replace the sum with a max
Dynamic programming approach
–Store the max probability through each state/time pair
Viterbi Algorithm
function Viterbi(observations of length T, state-graph) returns best-path
  num-states <- num-of-states(state-graph)
  create a path probability matrix viterbi[num-states+2, T+2]
  viterbi[0,0] <- 1.0
  for each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s' from s in state-graph
        new-score <- viterbi[s,t] * a[s,s'] * b_s'(o_t)
        if ((viterbi[s',t+1] = 0) || (viterbi[s',t+1] < new-score)) then
          viterbi[s',t+1] <- new-score
          back-pointer[s',t+1] <- s
  Backtrace from the highest-probability state in the final column of viterbi[] and return the path
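A hedged, runnable translation of the pseudocode above into Python, using the same arc representation as the earlier sketches (emission probabilities folded into arc weights) and returning both the best score and the back-traced state sequence; the example automaton is the illustrative one from before.

def viterbi(automaton, phones):
    """Best-path (max-probability) decoding over the weighted automaton."""
    # best maps state -> (probability of the best path so far, that path).
    best = {"start": (1.0, ["start"])}
    for phone in phones:
        nxt = {}
        for state, (prob, path) in best.items():
            for next_state, arc_phone, arc_prob in automaton[state]:
                if arc_phone != phone:
                    continue
                score = prob * arc_prob          # viterbi[s,t] * a[s,s'] * b
                if next_state not in nxt or score > nxt[next_state][0]:
                    nxt[next_state] = (score, path + [next_state])   # back-pointer
        best = nxt
    return best.get("end", (0.0, []))

ABOUT = {
    "start": [("s1", "ax", 0.68), ("s1", "ix", 0.32)],
    "s1":    [("s2", "b", 1.0)],
    "s2":    [("s3", "aw", 0.80), ("s3", "ae", 0.20)],
    "s3":    [("end", "t", 0.54), ("end", "dx", 0.46)],
    "end":   [],
}

print(viterbi(ABOUT, ["ax", "b", "aw", "dx"]))   # best score and state sequence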