Structure of Spoken Language


CSE 551/652: Structure of Spoken Language
Lecture 2: Spectrogram Reading and Introductory Phonetics
John-Paul Hosom, Fall 2005

Review: Power Spectrum
DFT:
$$X(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2 \pi n k / N}$$
in which x(k) is the amplitude at time sample k, n is a frequency value from 0 to N-1, N is the number of samples or frequency points of interest, and X(n) is the spectral-domain representation of x(k).
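A minimal sketch of the DFT described above, written directly from the definition rather than as a fast algorithm. Taking the power at each frequency index to be |X(n)|^2 is the usual convention and an assumption here; the function names and the cosine test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X(n) = sum over k of x(k) * exp(-j*2*pi*n*k/N), n = 0..N-1."""
    N = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]


def power_spectrum(x):
    """Power at each frequency index n, taken here as |X(n)|^2."""
    return [abs(X_n) ** 2 for X_n in dft(x)]


# Hypothetical test signal: exactly 4 cycles of a cosine in 32 samples.
N = 32
signal = [math.cos(2 * math.pi * 4 * k / N) for k in range(N)]
power = power_spectrum(signal)

# Search only the first half of the spectrum; the second half mirrors it
# for real-valued input.
peak_bin = max(range(N // 2), key=lambda n: power[n])
print("strongest frequency index:", peak_bin)
```

The direct form costs on the order of N^2 operations; an FFT computes the same values faster, which matters for real speech but not for a sketch of this size.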

Review: Formants
Note that the resonant frequencies, or formants, for the two vowels /aa/ and /iy/ can be identified in the spectra. For recognition of phonemes, the spectral envelope is important (envelope = shape of spectrum without harmonics).
[Figure: spectra of /aa/ (2048 samples) and /iy/ (2048 samples), 0 to 4 kHz, with the spectral envelope marked]

Review: Visualization of the Speech Signal: Spectrograms
The FFT window size has a large impact on visual properties:
“wideband” = small time window = small FFT size (example: FFT size = 5 msec)
“narrowband” = large time window = large FFT size (example: FFT size = 33 msec)
[Figure: wideband and narrowband spectrograms of /aa/, with frequency (Hz) and amplitude axes]
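The trade-off between the two window sizes can be sketched numerically. The 16 kHz sampling rate below is an assumption (the slides do not state one), and the function name is hypothetical. The spacing between frequency points is simply the sampling rate divided by the number of samples in the window; the effective resolution also depends on the window shape, which this sketch ignores.

```python
def window_properties(window_msec, sample_rate_hz):
    """Return (samples in the window, spacing between frequency points in Hz)."""
    n_samples = round(window_msec / 1000.0 * sample_rate_hz)
    freq_spacing_hz = sample_rate_hz / n_samples
    return n_samples, freq_spacing_hz


SAMPLE_RATE_HZ = 16000  # assumed sampling rate, not given in the slides

for label, msec in (("wideband", 5), ("narrowband", 33)):
    n, df = window_properties(msec, SAMPLE_RATE_HZ)
    print(f"{label}: {msec} msec = {n} samples, about {df:.0f} Hz between frequency points")
```

Under these assumptions the short window resolves events in time but blurs individual harmonics, while the long window does the opposite.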

Review: Spectrogram Reading: Vowels
12 English vowels (not all are phonemic), 8 or 9 phonemic vowels:
/iy/ beet (front, high, unrounded, tense) !
/ih/ bit (front, high, unrounded, lax) !
/eh/ bet (front, mid, unrounded, lax) !
/ae/ bat (front, low, unrounded, lax) !
/ix/* roses (back, high, unrounded, lax) (subst. /ih/)
/ux/* suit (back, high, rounded, lax) (subst. /uw/)
/ax/* above (back/central, mid, unrounded, lax) (subst. /ah/)
/uw/ boot (back, high, rounded, tense) !
/uh/ book (back, high, rounded, lax) !
/ah/ above (back/central, mid, unrounded, lax) !
/ao/ caught (back, low, rounded, tense) (subst. /aa/)
/aa/ father (back, low, unrounded, tense) !
* these vowels are more centralized and shorter in duration

Review: Spectrogram Reading: Vowels
6 English diphthongs:
/ey/ bay (front, mid→high, unrounded, tense)
/ay/ bye (back→front, low→high, unrounded, tense)
/oy/ boy (back→front, mid→high, rounded→unrounded, tense)
/yu/ beauty (front→back, high, unrounded→rounded, tense)
/aw/ about (back, mid→high, unrounded→rounded, tense)
/ow/ boat (back, mid, unrounded→rounded, tense)

Review: Spectrogram Reading: Vowels Vowel formant frequencies (averages for English, males only): *from Ladefoged, p. 193

Visualization of the Speech Signal: Formants
These formants can be modeled by a “damped sinusoid”, which has the following representations: [equations not preserved] where S(f) is the spectrum at frequency value f, A is overall amplitude, fc is the center frequency of the damped sine wave, and the remaining parameter is a damping factor (<< fc). [Olive, p. 48, 58]
[Figure: damped sinusoid waveform (amplitude vs. time in msec) and its spectrum (power in dB vs. frequency in Hz), with the peak at 0 dB at the center frequency fc]
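As a rough illustration of the damped-sinusoid idea, the sketch below generates A * exp(-d * t) * sin(2 * pi * fc * t) and locates the peak of its magnitude spectrum. This time-domain form is a common textbook one and is an assumption here; it is not necessarily the exact representation used on the slide. The parameter values, the 8 kHz sampling rate, and the function names are likewise hypothetical.

```python
import cmath
import math


def damped_sinusoid(amplitude, center_hz, damping, sample_rate_hz, n_samples):
    """Samples of amplitude * exp(-damping * t) * sin(2*pi*center_hz*t)."""
    out = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        out.append(
            amplitude * math.exp(-damping * t) * math.sin(2 * math.pi * center_hz * t)
        )
    return out


def magnitude_spectrum(x):
    """DFT magnitudes for the non-negative frequencies only (direct DFT)."""
    N = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N)))
        for n in range(N // 2)
    ]


SAMPLE_RATE_HZ = 8000  # assumed
N_SAMPLES = 400        # 50 msec of signal, so frequency points are 20 Hz apart

# Assumed example values: fc = 500 Hz, damping factor well below fc in rad/s.
wave = damped_sinusoid(1.0, 500.0, 190.0, SAMPLE_RATE_HZ, N_SAMPLES)
spectrum = magnitude_spectrum(wave)
peak_index = max(range(len(spectrum)), key=lambda n: spectrum[n])
peak_hz = peak_index * SAMPLE_RATE_HZ / N_SAMPLES
print("spectral peak near", peak_hz, "Hz")
```

A larger damping value makes the waveform die out faster and the spectral peak broader, which is the link to the bandwidth of a formant.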

Visualization of the Speech Signal: Formants
The bandwidth is defined as the width of the spectral peak measured at the point where the linear (non-dB) spectral power is ½ the maximum value. A reduction of the power by a factor of 2 is equivalent to a 3 dB change.
[Figure: spectral peak (power in dB vs. frequency in Hz) with the bandwidth marked 3 dB below the 0 dB maximum]
Also, the resonator must have a value slightly less than 0 dB at 0 Hz.
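The 3 dB figure can be checked with a short calculation. The sketch below uses the standard decibel definitions (10 * log10 for power ratios, 20 * log10 for amplitude ratios); the function names are hypothetical. It shows that halving the power gives about -3 dB, while halving the linear amplitude gives about -6 dB, so the half-power point corresponds to an amplitude of 1/sqrt(2) of the maximum.

```python
import math


def power_ratio_to_db(ratio):
    """Decibel value of a ratio of two powers."""
    return 10.0 * math.log10(ratio)


def amplitude_ratio_to_db(ratio):
    """Decibel value of a ratio of two linear amplitudes."""
    return 20.0 * math.log10(ratio)


print(round(power_ratio_to_db(0.5), 2))                   # -3.01
print(round(amplitude_ratio_to_db(0.5), 2))               # -6.02
print(round(amplitude_ratio_to_db(1 / math.sqrt(2)), 2))  # -3.01
```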

Visualization of the Speech Signal: Formants Formants are specified by a frequency, F, and bandwidth, B. A neutral vowel (/ax/) theoretically has formants at 500 Hz, 1500 Hz, 2500 Hz, 3500 Hz, etc. The first formant is called F1, the second is called F2, etc. (The fundamental frequency, or pitch, is F0.) F1, F2, and sometimes F3 are usually sufficient for identifying vowels. Formants can be thought of as filters, which act on the source waveform. For vowels, the source waveform is air pushed through the vibrating vocal folds. Energy is lost (hence a damped sinusoid model) by sound absorption in the mouth. A digital model of a formant can be implemented using an infinite-impulse response (IIR) filter.
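One way to sketch a digital formant as an IIR filter is a two-pole resonator specified by a frequency F and bandwidth B, in the form commonly used in Klatt-style formant synthesizers. That this is the filter the lecture has in mind is an assumption; the coefficient names, the 8 kHz sampling rate, and the example values F = 500 Hz and B = 60 Hz are hypothetical.

```python
import cmath
import math


def resonator_coefficients(formant_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (a, b, c) of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    T = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * T)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * T) * math.cos(
        2.0 * math.pi * formant_hz * T
    )
    a = 1.0 - b - c  # normalizes the gain to 1 (0 dB) at 0 Hz
    return a, b, c


def apply_resonator(x, coeffs):
    """Run the recursive (infinite-impulse-response) filter over the samples x."""
    a, b, c = coeffs
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def gain_db(freq_hz, coeffs, sample_rate_hz):
    """Filter gain in dB at freq_hz, from H(z) = a / (1 - b*z^-1 - c*z^-2)."""
    a, b, c = coeffs
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


SAMPLE_RATE_HZ = 8000  # assumed
coeffs = resonator_coefficients(500.0, 60.0, SAMPLE_RATE_HZ)
for f in (0.0, 300.0, 500.0, 700.0, 1500.0):
    print(f"{f:6.0f} Hz: {gain_db(f, coeffs, SAMPLE_RATE_HZ):6.2f} dB")

# The impulse response of this filter is a decaying oscillation.
impulse_response = apply_resonator([1.0] + [0.0] * 99, coeffs)
```

Several such resonators in series, one per formant, give a simple source-filter model: the source waveform is the input and each filter adds one spectral peak.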

Visualization of the Speech Signal: Excitation/Source
The vocal-fold vibration source looks like this:
[Figure: vocal-fold source waveform (amplitude vs. time in msec) and its spectrum (power in dB vs. frequency in Hz), falling at -6 dB/octave]
(Note: there are some gross simplifications here… we’ll go into more detail later in the course.)
In fricatives and other unvoiced speech, the source is turbulent air:
[Figure: turbulent-air source waveform (amplitude vs. time in msec) and its spectrum (power in dB vs. frequency in Hz), with flat slope]

Visualization of the Speech Signal: Pre-Emphasis
Because the source for voiced sounds decreases at -6 dB/octave, a simple filter can be used to increase the spectral tilt by +6 dB/octave, thereby making voiced sounds spectrally flat and easier to visualize. (NOTE: unvoiced sounds then have a spectral slope of +6 dB/octave.) In the filter equation [not preserved], x(n) is the time-domain speech signal at sample number n, and the filter output is the pre-emphasized speech signal at sample n.
[Figure: voiced-sound spectrum (power in dB vs. frequency in Hz) at -6 dB/octave before pre-emphasis and at 0 dB/octave after]
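A common form of pre-emphasis is a first-order difference, y(n) = x(n) - c * x(n-1), with c close to 1. Both this form and the value c = 0.97 are assumptions here, since the slide's own equation is not preserved; the function names and the 16 kHz sampling rate are hypothetical. The sketch also checks how much the filter gain rises over one octave.

```python
import cmath
import math


def pre_emphasize(x, coeff=0.97):
    """y(n) = x(n) - coeff * x(n-1), treating the sample before the first as 0."""
    out = []
    previous = 0.0
    for sample in x:
        out.append(sample - coeff * previous)
        previous = sample
    return out


def filter_gain_db(freq_hz, sample_rate_hz, coeff=0.97):
    """Gain of the filter 1 - coeff * z^-1 at freq_hz, in dB."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return 20.0 * math.log10(abs(1.0 - coeff * z_inv))


SAMPLE_RATE_HZ = 16000  # assumed

# A constant (0 Hz) input is almost removed after the first sample.
print(pre_emphasize([1.0, 1.0, 1.0, 1.0]))

# Gain increase over one octave, from 1000 Hz to 2000 Hz.
rise_db = filter_gain_db(2000.0, SAMPLE_RATE_HZ) - filter_gain_db(1000.0, SAMPLE_RATE_HZ)
print(f"gain rise from 1 kHz to 2 kHz: {rise_db:.2f} dB")
```

The rise is close to 6 dB per octave only over part of the frequency range; it flattens toward half the sampling rate and toward 0 Hz.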

Spectrogram Reading: Vowels Vowel formants, Peterson and Barney data:

Spectrogram Reading: Vowels Ratios of 1st and 2nd formant, from Miller (1989) based on Peterson and Barney (1952) data:

Spectrogram Reading
Why bother?? What’s the point of spectrogram reading? Do people read spectrograms as part of their job? Do computers “read” spectrograms in order to recognize speech?
There are some jobs that require spectrogram reading (e.g. phonetic time alignment), but not many. Automatic speech recognition systems do not process speech in this way.
Primary reason for spectrogram reading: If you’re going to work on a problem, it’s advisable to understand the nature of that problem. Spectrogram reading provides a direct method for “hands-on” learning of the characteristics of speech. Studying phonetics, signal processing, or techniques in speech recognition/speech synthesis does not fully convey the complexity and structure of spoken language.

Phonetics: Introduction
Phonology: A description of the systems and patterns of sounds that occur in a language (abstract).
Phonetics: A branch of phonology that deals with individual speech sounds, their production, and their written representation.
Phoneme:
• A unit of speech that can be used to differentiate words (e.g. “cat” /k ae t/ vs. “bat” /b ae t/).
• Phonemes identify minimal pairs in a language.
• The set of phonemes in a language is subject to interpretation; most languages have 20 to 40 phonemes.

Phonetics: Introduction
Phone: An acoustic realization of a phoneme. (Many different phones may represent the same phoneme.)
Allophone: A speech sound constituting one of the systematic phonetic variants of a given phoneme. Different allophones are predictable from environment (e.g. “toe”, “caught”, “fitness”, “writer”; “sill”, “still”, “spill”).
“The phoneme /s/ consists of more than 100 allophones” − Pickett, The Acoustics of Speech Communication, p. 7.
Phonemes indicated by / /; phones (allophones) indicated by [ ].

Phonetics: Introduction Syllable: • Unit of speech containing one or more phonemes. • A vowel in a syllable is called the syllable nucleus. • Most syllables contain one vowel (or diphthong); some contain only a lateral (“bott/le”) or nasal (“butt/on”) as the most intense sound. • Syllable boundaries sometimes ambiguous (“tas/ty” vs. “tast/y” vs. “ta/sty”) Coarticulation: The “blending” of two or more adjacent phones, causing a non-distinct boundary between them. Coarticulation is caused by smooth changes in the articulators (lips, tongue, jaw).

Phonetics: Introduction
Coarticulation Example: “you are”: /y uw aa r/
[Figure: “you are” with the phones y, uw, aa, r labeled]

Phonetics: Introduction (adapted from Schane, p. 4-6)
Speech signal is continuous; we perceive discrete entities. (How many sound units are in the word “cat”?)
One assumption of phonology: utterances can be represented as a sequence of discrete units.
Are such units purely an “invention” of linguistics? Spoonerisms (“belly jeans” vs. “jelly beans”; named after Reverend William Archibald Spooner (1844-1930)) and rhymes indicate small units of language.
Utterances of the same word(s) have many differences… we’re usually only interested in those differences that are “linguistically significant” or that are “perceived as different”. This implies a somewhat subjective nature to phonology, whereas we want an objective measure of perceived or produced units.

Phonetics: Distinctive Phonetic Features
• Phonemes do not differ randomly from one another; there are relationships among phonemes (e.g. /p/ vs. /t/ vs. /ah/)
• A (distinctive) feature is a “phonetic property that can be used to classify sounds” [Ladefoged, p. 42]
• Typically, features are associated with aspects of articulation
• Features may be binary or multi-valued
• Capital letters indicate feature name: Manner; square brackets [] indicate feature value: [+fricative]

Phonetics: Distinctive Phonetic Features • Exact set of features and feature values depends on goals (no “right” or “wrong” set of features or values) • Distinctive features provide a vocabulary for describing speech • Are distinctive features purely an “invention” of linguistics? memory tasks show that when people forget a phoneme, they usually remember a phoneme with similar distinctive features

Phonetics: Distinctive Phonetic Features
The Speech Production Apparatus (from Olive, p. 23)
[Figure labels: nasal tract, (hard) palate, velic port, oral tract, alveolar ridge, velum (soft palate), lips, tongue, teeth, pharynx, glottis (space between vocal cords), tongue tip, vocal cords (larynx)]

Phonetics: Distinctive Phonetic Features*
Feature: Description
Consonantal: produced with a constriction along center line of oral cavity. Only vowels, /w/, /h/, and /y/ are not.
Vocalic: largely unobstructed vocal tract. Vowels and liquids (/l/, /r/) are vocalic; glides (/w/, /y/) are not.
Anterior: point of articulation near alveolar ridge, including all labial and dental sounds.
Coronal: articulation involves front of tongue.
Continuant: no complete obstruction in oral cavity; only nasals, stops, and affricates are non-continuant.
Strident: articulation with long, narrow constriction, such as /s/, /z/, /f/, /v/, /sh/, /zh/, /ch/, /jh/.
Voiced: vibration of the vocal folds occurs during articulation.

Phonetics: Distinctive Phonetic Features*
Feature: Description
Lateral: contact between corona of tongue and roof of mouth, with lowering of sides of tongue (only /l/ in English).
Nasal: lowering of the velic port and opening of nasal cavity.
High: vowel with high tongue position (narrow constriction); in English, /iy/, /ih/, /uh/, /uw/.
Low: vowel with low tongue position (no constriction); /ae/, /ao/, /aa/ are (some) low vowels in English.
Back: vowels produced with tongue toward back of mouth; /uw/, /uh/, /ah/, /ao/, /aa/, /ow/ are back vowels.
Round: articulation involving rounding of the lips; only /uw/, /ow/, /ao/, and /uh/ are rounded in English. However, /uh/ may take an unrounded form.
*Adapted from “Language” by C.E. Cairns and F. Williams in Normal Aspects of Speech, Hearing, and Language, edited by Minifie, Hixon, and Williams, 1973, p. 424, as printed in Daniloff p. 51.

Phonetics: More Distinctive Phonetic Features*
Feature: Description
Sonorant: “resonant quality” of a sound; vowels are +sonorant, stops and fricatives are -sonorant.
Syllabic: is the phoneme the main sound in a syllable? Vowels are +syllabic, stops are usually -syllabic, but there are syllabic nasals and liquids.
Tense: tense vowels are longer, more fully articulated, and more “distinct,” e.g. /iy ey uw ow aa/; lax vowels are less so, e.g. /ih eh uh ah/.
Aspirated: produced without a constriction in the vocal tract, but also without voicing (/h/).
Glottalized: produced with aperiodic or extremely low-frequency vibrations of the vocal cords.
Diphthong: a single phoneme composed of two or more other phonemes in sequence (/ay/, /oy/, /ey/, /aw/, /ow/).
* from Schane, pp. 26-32
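The feature descriptions above can be sketched as a small lookup table, which makes it easy to ask which phonemes share a feature value and how many features separate two phonemes. The table is a hypothetical, deliberately tiny subset: the Continuant and Strident values follow the descriptions above, the voicing of the stops and fricatives follows the usual classification, and marking the nasals as voiced is an added assumption.

```python
# Hypothetical mini feature table; True means "+", False means "-".
FEATURES = {
    "p":  {"Voiced": False, "Nasal": False, "Continuant": False, "Strident": False},
    "b":  {"Voiced": True,  "Nasal": False, "Continuant": False, "Strident": False},
    "m":  {"Voiced": True,  "Nasal": True,  "Continuant": False, "Strident": False},
    "n":  {"Voiced": True,  "Nasal": True,  "Continuant": False, "Strident": False},
    "s":  {"Voiced": False, "Nasal": False, "Continuant": True,  "Strident": True},
    "z":  {"Voiced": True,  "Nasal": False, "Continuant": True,  "Strident": True},
    "th": {"Voiced": False, "Nasal": False, "Continuant": True,  "Strident": False},
}


def feature_distance(a, b):
    """Number of features on which phonemes a and b have different values."""
    return sum(1 for name in FEATURES[a] if FEATURES[a][name] != FEATURES[b][name])


def natural_class(**wanted):
    """Sorted phonemes whose features match every requested value."""
    return sorted(
        phoneme
        for phoneme, values in FEATURES.items()
        if all(values[name] == value for name, value in wanted.items())
    )


print(natural_class(Nasal=True))        # the [+Nasal] phonemes in the table
print(feature_distance("s", "z"))       # differ only in Voiced
print(feature_distance("p", "s"))       # differ in Continuant and Strident
```

A distance of this kind is one simple way to express the observation that confusable phonemes tend to share most of their distinctive features.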

Phonetics: Distinctive Phonetic Features
Physiological Features:
• Manner: stop /p/, fricative /s/, affricate /ch/, liquid /l/, /r/, glide /y/, /w/, nasal /m/, vowel /ah/, aspiration /h/
• Place: bilabial /p/, labiodental /f/, dental /th/, alveolar /t/, palato-alveolar /r/, palatal /sh/, velar /k/, glottal /h/, front /iy/, mid /ah/, back /aa/ (can combine mid + back)
• Height: high /iy/, mid-high /ih/, mid /ax/, mid-low /eh/, low /aa/; or high /iy/, mid /eh/, low /aa/ (3 values, plus tense/lax)
• Tense, Nasality, Rounding: same as previous descriptions

Phonetics: Distinctive Feature Relationships: Vowels

         Front                   Back
         Unrounded   Rounded     Unrounded   Rounded
High     i (iy)      ü           ɨ (ix)      u (uw)
Mid      e (eh)      ö           ʌ (ah)      o (ow)
Low      æ (ae)      œ           a (aa)      ɔ (ao)

         Front, -Round      Back, +Round       Back, -Round
         Tense    Lax       Tense    Lax       Tense    Lax
High     iy       ih        uw       uh                 ix
Mid      ey       eh        ow                          ah, ax†
Low               ae        ao                 aa

* from Schane, pp. 12-13.
†/ax/ is slightly more centralized than /ah/, and shorter in duration

Phonetics: Distinctive Phonetic Features: The Case of /ae/
/ae/ is classified in the preceding table as “lax”, but we have been considering it as “tense”.
One Rule for Differentiating Tense/Lax: A lax vowel can never be a word-final stressed vowel, e.g.
/iy/ can be word final: “be” /b iy/, “tea” /t iy/
/ih/ cannot be word final in a one-syllable word: /b ih/, /t ih/
/ah/ can be word final, but only if unstressed.
According to this rule, both /eh/ and /ae/ are lax, because they cannot be word-final stressed vowels. In this case, the tense vowel in contrast to /eh/ is /ey/.
However, /ae/ is long in duration (e.g. Forgie and Forgie (1959) and Peterson and Lehiste (1960)), making it acoustically more similar to a tense vowel. For spectrogram reading, we’re more concerned with acoustics, so we’ll call /ae/ a tense vowel, although others may call it lax.

Phonetics: Distinctive Phonetic Features: The Case of /ae/
Looking at 130,000 words in the CMU dictionary (counts of words ending in each vowel or diphthong):

PHN    CNT     PCNT      EXAMPLES
/iy/   12945   0.10002
/ih/   15      0.00012   “chui”, “des”, “kiwani”, “lui”, “moishe”, “pih”, “to”
/eh/   30      0.00023   “bienvenue”, “des”, “eh”, “moshe”, “yahweh”, “zeh”
/ae/   5       0.00004   “dhaka”, “lashua”, “losoya”, “pah”, “yeah”
/uw/   714     0.00552
/uh/   2       0.00002   “l’heureux”, “milieu”
/ah/   6413    0.04955
/aa/   170     0.00131
/ao/   243     0.00188
/ey/   962     0.00743
/ay/   379     0.00293
/oy/   167     0.00129
/yu/   171     0.00132
/aw/   226     0.00175
/ow/   5137    0.03969
Total          0.21280

21% of words end in vowel/diphthong
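A count of this kind can be sketched as follows. The lexicon below is a hypothetical toy list written in a CMU-dictionary-like layout (word, then phonemes with stress digits on the vowels); it is not the real dictionary, and the entries are illustrative only. The vowel set follows the symbols used in these slides, without /yu/, which is not a single symbol in that layout.

```python
from collections import Counter

# Vowel and diphthong symbols used in these slides (lowercase, no stress digits).
VOWELS = {"iy", "ih", "eh", "ae", "uw", "uh", "ah", "aa", "ao",
          "ey", "ay", "oy", "aw", "ow"}

# Hypothetical toy lexicon; each line is a word followed by its phonemes.
TOY_LEXICON = """\
BE  B IY1
TEA  T IY1
CAT  K AE1 T
BAT  B AE1 T
SOFA  S OW1 F AH0
BOY  B OY1
BOOK  B UH1 K
BOAT  B OW1 T
"""


def final_phone(entry):
    """Last phoneme of one lexicon line, lowercased and without stress digit."""
    _word, *phones = entry.split()
    return phones[-1].rstrip("012").lower()


def final_vowel_stats(lexicon_text):
    """Return (count per word-final vowel, fraction of words ending in a vowel)."""
    entries = [line for line in lexicon_text.splitlines() if line.strip()]
    finals = Counter(final_phone(entry) for entry in entries)
    vowel_final = {p: c for p, c in finals.items() if p in VOWELS}
    fraction = sum(vowel_final.values()) / len(entries)
    return vowel_final, fraction


counts, fraction = final_vowel_stats(TOY_LEXICON)
print(counts)
print(f"{fraction:.0%} of toy entries end in a vowel or diphthong")
```

In the toy list, as in the table above, no entry ends in stressed /ae/, /ih/, /eh/, or /uh/, which is the pattern the tense/lax rule predicts.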

Phonetics: Distinctive Feature Relationships: Vowels Front Central Back iy ju uw High ih uh ey ix oy ow Mid ax ao eh ay ah aw Low ae aa from Ladefoged, pp. 38, 81

Phonetics: Distinctive Feature Relationships: Consonants

Manner       Voicing   bilabial   labio-dental   dental   alveolar   palato-alveolar   palatal   velar   glottal
stops        +voice    b                                  d                                      g
             -voice    p                                  t                                      k
fricatives   +voice               v              dh       z                            zh
             -voice               f              th       s                            sh                h
affricates   +voice                                                                    jh
             -voice                                                                    ch
nasals                 m                                  n                                      ng
glides                 w                                                               y         (w)
retroflex                                                            r
lateral                                                   l

obstruent = stops, fricatives, affricates; approximant = glides, retroflex, lateral
from Olive, p. 28 and Daniloff, p. 56

Phonetics: Distinctive Feature Relationships: Consonants
Columns: Labial, Coronal (+anterior / -anterior), Dorsal
stop, +nasal: m n ng
stop, -nasal: p b t d k g
fricative, +sibilant: ch jh s z sh zh
fricative, -sibilant: f v th dh
approximant, -lateral: w r y
approximant, +lateral: l
from Ladefoged, p. 44

Approximants: Terminology
“Approximants” are NOT the same as “Semi-Vowels” (although Rabiner thinks they are the same…). American English /r/ is debatable, but we’ll exclude it from the Semi-Vowels for consistency. (Ladefoged p. 229)
Approximants can be divided into two groups: Liquids and Glides.
Liquid = {/l/, /r/}, Glide = {/w/, /y/} (Again, Rabiner confuses things by mixing up these sets)
Lateral = {/l/}
Retroflex = {/r/, /er/, /axr/}. (In some cases, /er/ is considered a retroflex but /r/ isn’t; we’ll keep things simple by calling /r/ a retroflex.)
Central Approximants = {/r/, /w/, /y/}, Lateral Approximant = {/l/}

Approximants: Terminology
Approximant
  Semi-Vowel / Glide: /y/, /w/ (central approximants)
  Liquid
    Retroflex: /r, er, axr/ (central approximants)
    Lateral: /l/ (lateral approximant)
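The terminology above maps naturally onto sets, which makes the relationships between the groups easy to check. The sketch below uses the set memberships given in the slides; the variable and function names are hypothetical.

```python
# Set memberships as listed in the terminology above.
LIQUIDS = {"l", "r"}
GLIDES = {"w", "y"}  # also called semi-vowels in these slides
APPROXIMANTS = LIQUIDS | GLIDES
LATERALS = {"l"}
RETROFLEXES = {"r", "er", "axr"}
CENTRAL_APPROXIMANTS = {"r", "w", "y"}
LATERAL_APPROXIMANTS = {"l"}

GROUPS = {
    "approximant": APPROXIMANTS,
    "liquid": LIQUIDS,
    "glide": GLIDES,
    "lateral": LATERALS,
    "retroflex": RETROFLEXES,
    "central approximant": CENTRAL_APPROXIMANTS,
    "lateral approximant": LATERAL_APPROXIMANTS,
}


def classify(phoneme):
    """Sorted names of every group that contains the given phoneme."""
    return sorted(name for name, members in GROUPS.items() if phoneme in members)


# Central and lateral approximants together make up all approximants.
print(CENTRAL_APPROXIMANTS | LATERAL_APPROXIMANTS == APPROXIMANTS)
print(classify("r"))
print(classify("er"))
```

Note that /er/ and /axr/ appear only in the retroflex set: they are retroflex vowels, not approximants, under the definitions given here.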