Structure of Spoken Language


CSE 551: Structure of Spoken Language
Lecture 6: Characteristics of Place of Articulation; Phonetic Transcription
John-Paul Hosom
Fall 2004

Acoustic-Phonetic Features: Manner of Articulation

Approximately 8 manners of articulation:

Name          Sub-Types          Examples
Vowel         vowel, diphthong   aa, iy, uw, eh, ow, …
Approximant   liquid, glide      l, r, w, y
Nasal                            m, n, ng
Stop          unvoiced, voiced   p, t, k, b, d, g
Fricative     unvoiced, voiced   f, th, s, sh, v, dh, z, zh
Affricate     unvoiced, voiced   ch, jh
Aspiration                       h
Flap                             dx, nx

Change in manner of articulation usually abrupt and visible; manner provides much information about location of phonemes.

Acoustic-Phonetic Features: Place of Articulation

Approximately 8 places of articulation for consonants:

Name              Examples
Labial            p, b, m, (w)
Labio-Dental      f, v
Dental            th, dh
Alveolar          t, d, s, z, n, l
Palato-Alveolar   sh, zh, ch*, jh*, r**
Palatal           y
Velar             k, g, ng, (w)
Glottal           h

*  may start as alveolar (/t/, /d/) followed by palato-alveolar
** /r/ is really a retroflex, and has a complex place of articulation

Place of articulation more subject to coarticulation than manner; F2 trajectory important for identifying place of articulation.
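The manner and place tables can be treated as two lookup tables over the same phoneme symbols. The sketch below is only an illustration: the dictionaries copy the example phonemes from the two tables, and the names `MANNER`, `PLACE`, `invert`, and `describe` are made up here, not part of the lecture.

```python
# Example phonemes per class, copied from the manner and place tables.
MANNER = {
    "vowel": ["aa", "iy", "uw", "eh", "ow"],
    "approximant": ["l", "r", "w", "y"],
    "nasal": ["m", "n", "ng"],
    "stop": ["p", "t", "k", "b", "d", "g"],
    "fricative": ["f", "th", "s", "sh", "v", "dh", "z", "zh"],
    "affricate": ["ch", "jh"],
    "aspiration": ["h"],
    "flap": ["dx", "nx"],
}
PLACE = {
    "labial": ["p", "b", "m", "w"],
    "labio-dental": ["f", "v"],
    "dental": ["th", "dh"],
    "alveolar": ["t", "d", "s", "z", "n", "l"],
    "palato-alveolar": ["sh", "zh", "ch", "jh", "r"],
    "palatal": ["y"],
    "velar": ["k", "g", "ng", "w"],
    "glottal": ["h"],
}


def invert(table):
    """Turn {class: [phonemes]} into {phoneme: [classes]}."""
    by_phoneme = {}
    for cls, phonemes in table.items():
        for phoneme in phonemes:
            by_phoneme.setdefault(phoneme, []).append(cls)
    return by_phoneme


MANNER_OF = invert(MANNER)
PLACE_OF = invert(PLACE)


def describe(phoneme):
    """Return (manners, places) for a phoneme; empty lists if it is not listed."""
    return MANNER_OF.get(phoneme, []), PLACE_OF.get(phoneme, [])


w_manner, w_place = describe("w")    # /w/ is listed under two places
ng_manner, ng_place = describe("ng")
```

Because /w/ appears in parentheses under both Labial and Velar, a phoneme maps to a list of places rather than a single value.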

Acoustic-Phonetic Features: Place of Articulation

Labial (/p/, /b/, /m/, /w/):
• constriction (or complete closure) at lips
• the only unvoiced labial is /p/
• the only nasal labial is /m/
• characterized by F1, F2, (even) F3 of adjacent vowel(s) rapidly and briefly decreasing at border with labial

Acoustic-Phonetic Features: Place of Articulation

Labio-Dental (/f/, /v/):
• produced by constriction between lower lip and upper teeth
• only fricatives are labio-dental in English
• can be characterized by rising formants into adjacent vowels (similar to characteristics of labials)

Dental (/th/, /dh/):
• produced by constriction between tongue tip and upper teeth (sometimes tongue tip is closer to alveolar ridge)
• may be characterized by stronger energy above 6 kHz, but weaker than /sh/, /zh/ fricatives

Acoustic-Phonetic Features: Place of Articulation

Alveolar (/t/, /d/, /s/, /z/, /n/, /l/):
• tongue tip is at or near alveolar ridge
• a large number of English consonants are alveolar
• primary cue to alveolars: F2 of neighboring vowel(s) is around 1800 Hz, except for /l/
• /l/ has low F1 (about 500 Hz) and F2 (about 1000 Hz), high F3

Acoustic-Phonetic Features: Place of Articulation

Palato-Alveolar (/sh/, /zh/, /ch/, /jh/, /r/):
• tongue is between alveolar ridge and hard palate
• 2 fricatives, 2 affricates, 1 retroflex
• retroflex has “depression” midway along tongue
• the palato-alveolar fricatives tend to have strong energy due to weak constriction allowing large airflow
• /r/ (and /er/) most easily identified by F3 below 2000 Hz

Palatal (/y/):
• produced with tongue close to hard palate
• “extreme” production of /iy/
• F1-F2 tend to be more spread than /iy/, F1 is lower than /iy/

Acoustic-Phonetic Features: Place of Articulation

Velar (/k/, /g/, /ng/):
• produced with constriction against velum (soft palate)
• only plosives /k/ and /g/, and nasal /ng/
• characteristic of velars is the “velar pinch”, in which F2 and F3 of neighboring vowel become very close at boundary with velar; more visible in front vowel /ih/

Acoustic-Phonetic Features: Place of Articulation

Glottal (/h/):
• /h/ is the nominal glottal phoneme in English; in reality, the tongue can be in any vowel-like position
• the primary cue for /h/ is formant structure without voicing, an energy dip, and/or an increase in aspiration noise in higher frequencies

Distinctive Phonetic Features: Summary

• Distinctive features may be used to categorize phonetic sub-classes and show relationships between phonemes.
• There is often not a one-to-one correspondence between a feature value and a particular trait in the speech signal.
• A variety of context-dependent and context-independent cues (sometimes conflicting, sometimes complementary) serve to identify features.
• Speech is highly variable, highly context-dependent, and cues to phonemic identity are spread in both the spectral and time domains.
• The diffusion of features makes automatic speech recognition difficult, but human speech recognition is able to use this diffusion for robustness.

Redundancy

Distinctive features are not always independent; some redundancy may be implied (especially with binary features).

Example: Spanish

         i   e   a   o   u
High     +   −   −   −   +
Low      −   −   +   −   −
Back     −   −   +   +   +
Round    −   −   −   +   +

+high → −low       +low → −high
−back → −round     +round → +back
+low → +back       +low → −round
−back → −low       +round → −low

These relationships are language and feature-set specific.
(from Schane, p. 35-38)
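A redundancy such as "+round implies +back" can be checked mechanically: it holds if every vowel with the first feature value also has the second. The sketch below tests the eight implications on this slide. The feature matrix in the code is an assumption (the usual five-vowel analysis of Spanish, with /a/ treated as [+back]), and the names `VOWELS`, `RULES`, and `holds` are illustrative.

```python
# Assumed binary feature matrix for the five Spanish vowels (True = +, False = −).
VOWELS = {
    "i": {"high": True,  "low": False, "back": False, "round": False},
    "e": {"high": False, "low": False, "back": False, "round": False},
    "a": {"high": False, "low": True,  "back": True,  "round": False},
    "o": {"high": False, "low": False, "back": True,  "round": True},
    "u": {"high": True,  "low": False, "back": True,  "round": True},
}

# The eight implications, written as ((feature, value), (feature, value)).
RULES = [
    (("high", True), ("low", False)),
    (("low", True), ("high", False)),
    (("back", False), ("round", False)),
    (("round", True), ("back", True)),
    (("low", True), ("back", True)),
    (("low", True), ("round", False)),
    (("back", False), ("low", False)),
    (("round", True), ("low", False)),
]


def holds(rule, inventory=VOWELS):
    """True if every segment matching the antecedent also matches the consequent."""
    (f1, v1), (f2, v2) = rule
    return all(
        feats[f2] == v2 for feats in inventory.values() if feats[f1] == v1
    )


all_rules_hold = all(holds(rule) for rule in RULES)

# Not a redundancy in this inventory: /a/ is +back but −round.
back_implies_round = holds((("back", True), ("round", True)))
```

Swapping in a different inventory or feature set changes which rules hold, which is the sense in which these relationships are language and feature-set specific.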

Redundancy

Redundant information can be indicated by circling redundant features.

[Table: Spanish vowel feature matrix (High, Low, Back, Round for i, e, a, o, u) with redundant feature values circled]

Some redundancies are universal (can’t be +high and +low).

Phonetic sequences also have constraints (redundant info.): English has no more than 3 word-initial consonants; when there are 3, the first consonant is always /s/; the next is always /p/, /t/, or /k/; the third is always /r/ or /l/.
(from Schane, p. 36-40)
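The three-consonant constraint can be written as a small predicate. This is a sketch that encodes only the rule as stated on this slide; the function name and the use of lowercase phoneme symbols are assumptions made for illustration.

```python
def valid_three_consonant_onset(c1, c2, c3):
    """Check a word-initial three-consonant cluster against the slide's rule:
    first is /s/, second is /p/, /t/, or /k/, third is /r/ or /l/."""
    return c1 == "s" and c2 in {"p", "t", "k"} and c3 in {"r", "l"}


string_onset = valid_three_consonant_onset("s", "t", "r")   # as in "string"
splash_onset = valid_three_consonant_onset("s", "p", "l")   # as in "splash"
bad_onset = valid_three_consonant_onset("t", "s", "r")      # violates the rule
```

The redundancy is that, once a three-consonant onset is known to be present, the first consonant carries no information and the other two are drawn from very small sets.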

Phonetic Transcription

Given a corpus of speech data, it’s often necessary to create a transcription:
• word level
• phoneme level
• time-aligned phoneme level
• time-aligned detailed phoneme level (with diacritics)
• other information: phonetic stress, emotion, syntax, repair

Most common are word-level and time-aligned phoneme level.

Time-aligned phonetic transcription examples:

0    110  .pau
110  180  h
180  240  eh
240  280  l
280  390  ow
390  540  .pau

t uw .br
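A time-aligned phoneme transcription of this kind is straightforward to load as a list of (start, end, label) records. The sketch below parses the first example. It assumes one segment per line and that the times are in milliseconds (the slide does not state the unit); `Segment` and `parse_alignment` are illustrative names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_ms: int   # assumed unit: milliseconds
    end_ms: int
    label: str


def parse_alignment(text):
    """Parse lines of 'start end label' into a list of Segment records."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        segments.append(Segment(int(start), int(end), label))
    return segments


EXAMPLE = """
0 110 .pau
110 180 h
180 240 eh
240 280 l
280 390 ow
390 540 .pau
"""

segments = parse_alignment(EXAMPLE)
durations = [seg.end_ms - seg.start_ms for seg in segments]
```

Each segment's end time equals the next segment's start time, so the boundaries alone (110, 180, 240, 280, 390) are enough to compare two labelings of the same utterance.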

Phonetic Transcription

Are phonemes precise quantities with exact boundaries?
No… humans disagree on phonetic labels and boundary positions; disagreement may be a matter of interpretation of the utterance.

Phonetic label agreement between humans:*

            Full Labels   Base Labels   Broad Categories
English     70%           71%           89%
German      61%           65%           81%
Mandarin    66%           78%           87%
Spanish     74%           82%           90%

Full, Base Label Set: 55 (English), 62 (German), 50 (Mandarin), 42 (Spanish)
Broad Categories: 7, corresponding to manner of articulation

* From Cole, Oshika, et al., ICSLP’94
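The gap between full-label and broad-category agreement comes from scoring the same pair of labelings at two granularities. The sketch below shows the arithmetic on a toy example. It assumes the two label sequences are already matched one-to-one, and the `BROAD` mapping and the label sequences are made up for illustration; they are not data from the cited study.

```python
# Hypothetical mapping from phoneme labels to broad (manner) categories.
BROAD = {
    "p": "stop", "b": "stop",
    "s": "fricative", "z": "fricative",
    "iy": "vowel", "ih": "vowel",
    "m": "nasal", "n": "nasal",
}


def agreement(labels_a, labels_b, mapping=None):
    """Percentage of positions where two equal-length labelings agree.
    If a mapping is given, labels are collapsed through it first."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    if mapping is not None:
        labels_a = [mapping[x] for x in labels_a]
        labels_b = [mapping[x] for x in labels_b]
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


labeler_a = ["p", "iy", "s", "m"]
labeler_b = ["b", "ih", "s", "m"]

full_agreement = agreement(labeler_a, labeler_b)           # 50.0
broad_agreement = agreement(labeler_a, labeler_b, BROAD)   # 100.0
```

Disagreements such as /p/ versus /b/ vanish once both labels collapse to "stop", which is why the broad-category column is the highest in every row.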

Phonetic Transcription

70% agreement on 55 phonemes, 90% agreement on 7 categories??

Best phoneme-level automatic speech recognition results on TIMIT, with a 39-phoneme symbol set: 75.8% (Antoniou and Reynolds)

Differences:
• Human agreement evaluated on spontaneous speech (stories); TIMIT is read speech
• Humans used 55 phonemes; 39 phonemes for evaluating TIMIT

Phoneme agreement doesn’t translate into word accuracy… human word error rate is typically an order of magnitude lower than that of the best automatic speech recognition system.

Phonetic Transcription

Phonetic label boundary agreement between humans:
Agreement measured by comparing two manual labelings, A and B, and computing the percentage of cases in which B labels are within some threshold (20 msec) of A labels.

[Figure: agreement (%) vs. threshold (msec)]

• Average agreement of 93.8% within 20 msec threshold
• Maximum agreement of 96% within 20 msec
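The boundary agreement measure described here reduces to a few lines. In this sketch the boundary lists are made-up values, the two labelings are assumed to contain the same boundaries in the same order, and "within" is taken to include a difference of exactly the threshold; the function name is illustrative.

```python
def boundary_agreement(boundaries_a, boundaries_b, threshold_ms=20):
    """Percentage of B boundaries within threshold_ms of the matching A boundary."""
    if len(boundaries_a) != len(boundaries_b):
        raise ValueError("labelings must contain the same number of boundaries")
    within = sum(
        abs(a - b) <= threshold_ms for a, b in zip(boundaries_a, boundaries_b)
    )
    return 100.0 * within / len(boundaries_a)


# Hypothetical boundary times (msec) from two labelers.
labeling_a = [110, 180, 240, 280, 390]
labeling_b = [105, 185, 270, 282, 400]

agreement_20 = boundary_agreement(labeling_a, labeling_b)        # 4 of 5 within 20 msec
agreement_5 = boundary_agreement(labeling_a, labeling_b, 5)      # 3 of 5 within 5 msec
```

Sweeping `threshold_ms` over a range of values gives an agreement-versus-threshold curve of the kind plotted on the slide.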

Phonetic Transcription

Is there a “correct” answer?
No; inherently subjective, although semi-arbitrary guidelines can be imposed.

Is measuring accuracy meaningless?
No; phonemes do have identity and order, although details may be subjective. Sometimes very precise (if semi-arbitrary) labels and boundaries are extremely important (e.g. concatenative text-to-speech databases).

What about getting a computer to generate transcriptions?
Advantages:
• consistent, fast
Disadvantages:
• not accurate, compared to human transcription
• not robust to different speakers, environments

Phonetic Transcription

Automatic Phonetic Alignment (assume phonetic identity is known):

Two common methods:

(1) “Forced Alignment”: Use existing speech recognizer, constrained to recognize only the “correct” phoneme sequence. The search process used by HMM recognizers returns both phoneme identity and location. Location information is boundary information.

(2) Dynamic Time Warping:
(a) Use text-to-speech or utterance “templates” to generate same speech content with known boundaries.
(b) Warp time scale of reference (TTS or template) with input speech to minimize spectral error.
(c) Convert known boundary locations to original time scale.
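Steps (b) and (c) of the Dynamic Time Warping method can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: each frame is a list of feature values, "spectral error" is taken to be squared Euclidean distance, the warp allows only unit steps, and a reference boundary is mapped to the first input frame aligned with it. The toy frames and function names are made up.

```python
def dtw_path(ref, inp):
    """Return the minimum-cost alignment path as (ref_frame, input_frame) pairs."""
    n, m = len(ref), len(inp)
    inf = float("inf")
    # cost[i][j]: best cumulative error aligning ref[:i] with inp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = sum((a - b) ** 2 for a, b in zip(ref[i - 1], inp[j - 1]))
            cost[i][j] = dist + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_boundaries(path, ref_boundaries):
    """Map each reference boundary frame to the first input frame aligned with it."""
    first_match = {}
    for ref_frame, inp_frame in path:
        first_match.setdefault(ref_frame, inp_frame)
    return [first_match[b] for b in ref_boundaries]


# Toy one-dimensional "spectra": the reference changes sound at frame 2,
# the slower input changes at frame 4.
reference = [[0.0], [0.0], [1.0], [1.0]]
input_speech = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]

path = dtw_path(reference, input_speech)
input_boundaries = map_boundaries(path, [2])
```

In practice the frames would be spectral feature vectors computed at a fixed frame rate, so a frame index converts to a time by multiplying by the frame step.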

Phonetic Transcription

Accuracy of automatic alignment
Speaker-independent alignment using Forced Alignment:

[Figure: agreement (%) vs. threshold (msec)]

Phonetic Transcription

Comparing manual and automatic alignment of TIMIT corpus:
• Automatic method still makes “stupid” mistakes.
• Manual labeling criteria not rigorously defined.
• Performance degrades significantly in presence of noise.
• Assumes correct phonetic sequence is known…