Speech acoustics and phonetics Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC) NATO-ASI “Dynamics.

Presentation transcript:

Speech acoustics and phonetics Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC) NATO-ASI “Dynamics of Speech Production and Perception” Il Ciocco, Tuscany, Italy, July 1, 2002

July 1st, 2002   Speech acoustics and phonetics, Il Ciocco   2
Overview
  Dynamics in speech acoustics
  Contour modeling (mainly formants)
  Aspects of spectral undershoot
  Modeling V and C reduction
  Phonetic knowledge from speech corpora
    IFA, CGN, TIMIT, found speech
  Conclusions

Dynamics in speech acoustics
  Dynamics is the norm, not stationarity
    articulatory efficiency
  Dynamics is everywhere
    generally no word boundaries in speech
    deletion of words, syllables, phonemes; insertion
    within/between word coarticulation/assimilation
    vowel and consonant reduction
  Acoustic manifestations
    segment duration, F0, loudness, spectral quality

Dynamics is the norm
  The speaker speaks as sloppily as the listeners allow him to do in communication
    communicative efficiency
  Articulatory vs. perceptual efficiency
    do spectral transitions facilitate or hamper perception? -> see other presentation
  Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere
  Deletion
    ‘bread and butter’ /brEmbY3/
    ‘Amsterdam’ (Du)
    ‘koninklijke’ (Du)
  Insertion
    homorganic glide insertion: ‘die een’ (Du)
  Degemination
    ‘is zichtbaar’ (Du) /Is zIxtbar/ -> /IsIxbar/
  Reduction, coarticulation, assimilation

Acoustic manifestations
  pitch, loudness, formant, component contours
  contour stylization (e.g., pitch in praat)
  contour modeling
    n-th degree curve fitting (D. van Bergem)
    Legendre polynomials, 16 points per segment (R. van Son)
  (phoneme) segmentation
    by hand (time consuming; inconsistent)
    automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; systematic errors)
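The Legendre-polynomial contour model listed above can be sketched in a few lines. This is only an illustration, not the procedure used by van Son or van Bergem: it assumes a track already resampled to equally spaced points (16 in the usage lines), maps time onto [-1, 1], and solves the least-squares normal equations in plain Python. The function names and the toy F2 values are hypothetical.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_legendre(track, order=4):
    """Least-squares Legendre coefficients for an equally spaced track."""
    n = len(track)
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # map time to [-1, 1]
    basis = [legendre_basis(x, order) for x in xs]
    k = order + 1
    # Normal equations: (B^T B) c = B^T y
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(basis, track)) for i in range(k)]
    return solve_linear_system(gram, rhs)


# Hypothetical F2 track (Hz): 16 equally spaced points on a pure parabola.
xs = [-1.0 + 2.0 * i / 15 for i in range(16)]
track = [1500.0 + 200.0 * x - 300.0 * x * x for x in xs]
coeffs = fit_legendre(track, order=4)  # five coefficients, orders 0..4
```

Because the toy track is itself a second-degree polynomial, the fit returns coefficients close to 1400, 200 and -200 for orders 0 to 2 and values near zero for orders 3 and 4. Real formant tracks would of course leave a residual.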

Contour modeling
  allows modeling of specific phenomena
    pitch accentuation (vs. vowel onset)
    reduction, centralization, undershoot
  allows generation of stimuli for perception experiments
    phoneme identification in extending context
    two-alternative forced-choice identification of continua
    discrimination, RT
  allows statistics on large speech corpora
    TIMIT, CGN, IFA-corpus, Switchboard

Static vs. dynamic V recognition
  see Weenink (2001) “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24
  males, both train & test sentences of TIMIT
  35,385 vowel segments, hand segmented
  13 monophthongal vowel categories
  1-Bark bandfilter analysis (18), intensity normalized
  3 frames per segment: central and 25 ms L/R
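The static versus dynamic comparison comes down to which analysis frames are handed to the classifier. A minimal sketch, assuming each frame is stored as a list of band levels and that frames are spaced 5 ms apart; the frame step, the data layout and the function names are assumptions for illustration and are not taken from Weenink (2001).

```python
def pick_frames(start_ms, end_ms, step_ms=5, offset_ms=25):
    """Frame indices for the segment centre and offset_ms to its left and right."""
    centre_ms = (start_ms + end_ms) / 2
    times = (centre_ms - offset_ms, centre_ms, centre_ms + offset_ms)
    return [round(t / step_ms) for t in times]


def vowel_features(frames, start_ms, end_ms, dynamic=True):
    """Static: centre frame only. Dynamic: left, centre and right frames concatenated."""
    left, centre, right = pick_frames(start_ms, end_ms)
    chosen = [left, centre, right] if dynamic else [centre]
    last = len(frames) - 1
    features = []
    for index in chosen:
        index = min(max(index, 0), last)  # clamp to the available frames
        features.extend(frames[index])
    return features


# Hypothetical input: 50 frames with 18 band levels each (values are placeholders).
frames = [[float(i)] * 18 for i in range(50)]
static = vowel_features(frames, 100, 200, dynamic=False)   # 18 values
dynamic = vowel_features(frames, 100, 200, dynamic=True)   # 54 values
```

With 18 bands per frame, the static vector has 18 dimensions and the dynamic one 54, which is the kind of difference the classification results on the next slide compare.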

Some results
  Vowel classification (%) with discriminant functions
  Table columns: Condition, # Items, Static (1 frame), Dynamic (3 frames)
    Original: 35, x13x(1…25)
    speaker normalized: 35,
    V centers per speaker: 5, x
    speaker normalized: 5,

Formant tracks / speaking rate
  Ph.D. thesis Rob van Son (1993) “Spectro-temporal features of vowel segments”
  see also Speech Comm. 13, (Pols & vSon)
  850-word text, read at normal and fast rate
  hand segmentation of 7 most freq. V + schwa
  formant tracks via 16 points per segm. or 5 Legendre polynomials
  influence of rate, V-dur., context, sent. acc.
  evidence for duration-controlled undershoot?

Some results
  no differences for F1/F2 in vowel center for normal- or fast-rate speech; only some overall rise in F1 for fast rate (irrespective of V)
  same formant track shape (normalized to 16 points) for normal- or fast-rate speech
  same results when using the more elaborate Legendre polynomials
  Concl.: changes in V-duration do not change the amount of undershoot -> active control of articulation speed

Formant representations
  [Figures: Legendre polynomials; zeroth order Legendre polynomial coefficients (mean Fi in vowel segment); second order polynomials (axes reversed)]

Modeling vowel reduction
  Ph.D. thesis Dick van Bergem (1995) “Acoustic and lexical vowel reduction”
  see also Speech Communication 16,
  lexical V reduction: Fr /betõ/ vs. Du
  acoustic V reduction: /banan, bAnan,
    f(sent. acc., w. str., w. class): can-candy-canteen
  coarticulatory effects on the schwa
    C 2 V- and VC 2 -type nonsense words
  perceptual effects (full V or schwa, e.g. ‘ananas’)

Some results
  The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context
  [Figure; surviving labels: t-nw-l]

Modeling consonant reduction
  Sp. Comm. (1999) 28, (vSon & Pols)
  20 min. speech, both spontaneous and read
  2 x 791 similar VCV; hand segmented
  5 aspects of V and C reduction
    related to coarticulation:
      F2 slope differences at CV- vs. VC-boundaries
      F2 locus equations (F2 onset vs. F2 target)
    related to speaking effort:
      duration
      spectral COG (mean freq.)
      V-C sound energy differences
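An F2 locus equation is a straight-line fit of F2 at the consonant-vowel boundary against F2 at the vowel target, one line per consonant. The sketch below uses ordinary least squares in plain Python; the function name and the F2 values are hypothetical and are not data from van Son & Pols (1999).

```python
def fit_locus_equation(f2_targets, f2_onsets):
    """Fit F2_onset = slope * F2_target + intercept by ordinary least squares."""
    n = len(f2_targets)
    if n < 2 or n != len(f2_onsets):
        raise ValueError("need at least two matched (target, onset) pairs")
    mean_t = sum(f2_targets) / n
    mean_o = sum(f2_onsets) / n
    sxx = sum((t - mean_t) ** 2 for t in f2_targets)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(f2_targets, f2_onsets))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_t
    return slope, intercept


# Hypothetical F2 values (Hz) for one consonant in five vowel contexts.
targets = [900.0, 1200.0, 1500.0, 1800.0, 2100.0]
onsets = [1240.0, 1420.0, 1600.0, 1780.0, 1960.0]
slope, intercept = fit_locus_equation(targets, onsets)
```

A slope near 1 means the onset follows the vowel target closely (strong coarticulation), while a slope near 0 means the onset stays at a fixed locus. Comparing slope and intercept across speaking styles is one way to test whether onsets and targets "change in concert".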

Some results
  V markedly reduced in spontaneous speech
  lower F2-slope diff. in spontaneous speech -> decrease in articulation speed
  no systematic effect on F2 locus equation; V onsets and targets change in concert -> any V reduction mirrored by comparable change in C
  spont. sp.: V and C shorter; lower COG -> decrease in vocal and articulatory effort
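The spectral centre of gravity (COG) used as an effort measure is the mean frequency of the spectrum, weighted by how much energy sits at each frequency. A minimal sketch, assuming the spectrum is already available as matched lists of frequencies and power values; weighting by power (rather than amplitude) is an assumption here, and the numbers are made up.

```python
def spectral_center_of_gravity(freqs_hz, power):
    """Power-weighted mean frequency of a spectrum, in Hz."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total


# Hypothetical three-bin spectra: energy shifted towards low frequencies lowers the COG.
freqs = [1000.0, 2000.0, 3000.0]
cog_flat = spectral_center_of_gravity(freqs, [1.0, 2.0, 1.0])
cog_low = spectral_center_of_gravity(freqs, [3.0, 1.0, 0.0])
```

In this toy case the second spectrum has most of its energy in the lowest bin, so its COG is lower than that of the first.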

Access to large corpora
  more, and more realistic, data
  phonetic knowledge via statistical analyses
  e.g. highly accessible IFA-corpus (free, SQL)
    see “Structure and access of the open source IFA-corpus”, IFA Proc. 24, (vSon & Pols)
    on-line
    4 M / 4 F speakers, 5.5 hrs of speech
    from informal to read + sent., words, syllables
    ~50K words segm. and labeled at phoneme level

Some results
  speech + annot. + meta data: relational DB
  realization of final n, e.g. Du ‘geven’
  Table rows:
    Informal 5,
    Retelling 6,
    Narr. story 14,
    Sentences 14,
    Pseudo-sent 2,
    All 43, ,2711,73036
  Other table labels: LF, HF, Read
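Storing speech, annotation and metadata in a relational database means that a question such as "how often is word-final n realized in each speaking style?" becomes a single grouped query. The sketch below uses an in-memory SQLite database from the Python standard library. The table name, column names and rows are invented for illustration and do not reflect the actual IFA-corpus schema or its counts.

```python
import sqlite3


def final_n_by_style(rows):
    """Count tokens and realized final /n/ per speaking style.

    rows: iterable of (style, word, n_realized) with n_realized 0 or 1.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical one-table layout; a real corpus DB would be normalized.
    conn.execute(
        "CREATE TABLE word_final_n (style TEXT, word TEXT, n_realized INTEGER)"
    )
    conn.executemany("INSERT INTO word_final_n VALUES (?, ?, ?)", rows)
    query = """
        SELECT style, COUNT(*) AS tokens, SUM(n_realized) AS realized
        FROM word_final_n
        GROUP BY style
        ORDER BY style
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up toy rows, not corpus data.
toy_rows = [
    ("informal", "geven", 0),
    ("informal", "lopen", 0),
    ("informal", "geven", 1),
    ("read", "geven", 1),
    ("read", "lopen", 1),
]
summary = final_n_by_style(toy_rows)
```

Each returned tuple holds the style, the number of tokens and the number of tokens with a realized final n, so a percentage per style follows directly.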

Spoken Dutch Corpus (CGN)
  10 M words, 1,000 hrs of speech
  variety of styles, incl. telephone speech
  adult Dutch and Flemish speakers
  for linguistic and technological research
  see various LREC and ICSLP papers (2002)
  see also
  fully transcribed: orthogr., POS, lemmas
  partly transcr.: phonemic, prosodic, syntactic

TIMIT
  popular DB in acoustic phonetics and ASR
  also telephone version (NTIMIT)
  hand segmented & labeled at phoneme level
  438 males, 192 females (8 dialect regions)
  10 sent./sp. (2 fixed, 5 phon. compact, 3 diverse)
  sa1: “She had your dark suit in greasy wash water all year”
  includes separate test data (112 M, 56 F)
  e.g. Ph.D. thesis X. Wang (1997) “Incorporating knowledge on segmental duration in HMM-based continuous speech recognition”

Useful info: durational variability
  Adapted from Wang (1998)
  normal rate = 95
  primary stress = 104
  word final = 136
  utterance final = 186
  overall average = 95 ms

[Figure: normalized phone duration vs. speaking rate; all 3,696 training sent. (sx + si) of TIMIT training set]

‘found’ speech
  DARPA-LVSR community rather ambitious
  Broadcast News (BN), Sp.Comm. 37 (2002)
  Table column headings: < ’95 WSJ NAB read sp; Market place; 1996 F0-F5, FX partitioned; hrs test unpartit; non Engl. speech also, < 10x RT
    audio training data: 100 hrs; 10 hrs; 55 hrs; + 50 hrs; + 100 hrs
    text (for LM): 430 K; 122 M; 540 M; > 900 M
    best % WER on test set: 27.0 %; 27.1 % (1:46 hrs); 16.2 % (3 hrs); 13.5 -> 16.1 % (3 hrs, 10xRT)
  For Proc. DARPA Workshops, see
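The WER figures above are word error rates: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch using the standard edit-distance recurrence follows. The function name and example strings are illustrative, and official evaluations apply text normalization that is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# One deleted word out of five reference words.
wer = word_error_rate("she had your dark suit", "she had dark suit")
```

Because insertions count as errors, WER can exceed 100 % when the hypothesis is much longer than the reference.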

Articul.-acoustic features in ASR
  “A Dutch treatment of an elitist approach to articulatory-acoustic feature classification”, Proc. Eurospeech-2001, (M. Wester et al.)
  “Integrating articulatory features into acoustic models for speech recognition”, Phonus 5, (K. Kirchhoff, 2000)
  “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, JASA 111 (2), (J. Sun & L. Deng, 2002)

Conclusions
  examples of dynamics in speech acoustics
  going from formal to informal speech: less dynamics, more reduction (artic. guided)
  undershoot vs. speaking style
    sloppiness or articulatory limits?
  functionality of dynamics? -> other paper
  systematicity of dynamics?
    easing ASR, rules for TTS, acquiring knowledge?