Comparing Computational Algorithms for Modeling Phoneme Learning. Ilana Heintz, Fangfang Li, and Jeff Holliday, The Ohio State University. MCWOP 2008, University of Minnesota.

Presentation transcript:

1 Comparing Computational Algorithms for Modeling Phoneme Learning Ilana Heintz, Fangfang Li, and Jeff Holliday The Ohio State University MCWOP 2008, University of Minnesota

2 Research Questions How do children learn to discriminate between similar phonemic categories? How does adult feedback affect that process? How are adults able to understand children? In what ways exactly is child speech different from adult speech?

3 Narrowing it Down How do children learn the difference between close consonants, for instance, /s/ vs. /S/ vs. /c}/? What are the differences in the productions of each of these consonants? How do the consonants differ across languages? How do children’s productions differ from adult speech?

4 Modeled data
            Dental/Alveolar   Post-alveolar   Alveo-palatal
English     [s]               [S]
Japanese    [s]                               [c}]
Mandarin    [s]               [S]             [c}]
Stimuli elicited from 160 children and 37 adults. 3 word tokens per CV type, 1390 total stimuli. Children aged 2-5 from America, Japan, and Songyuan, China (Mandarin speaking). Stimuli were later used in perception tests with adults; here we study only the production data.

5 Hand-measured acoustic analyses [Figure: axes are Frequency (Hz) and Sound pressure level (dB/Hz)]

6 Hand-measured acoustic analyses (as reported in Li 2008)

7 Hand-measured acoustic analyses: English-speaking children

8 These are great results… so why use computational methods? Automatically derive many features per stimulus. Derive time-varying features across the stimulus. Look at more interactions between features. Build a model that can be used to talk about acquisition and feedback.

9 Self-organizing maps: a result [Figure: map with regions labeled "E,J,M /s/" and "English /S/, Japanese /c}/, Mandarin /S/"]

10 Setting up the Map
Determine dimensionality of data:
– 4 variables
– Independent or correlated
– Data represented by four-dimensional numeric vector:
– [100.23, , , 4.3]
Neurons same dimensionality as data
Determine number of neurons: 15 x 15
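The setup on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation (the references point to the Matlab SOM Toolbox): the function names, the random initialisation, the linearly decaying learning rate and the Gaussian neighbourhood are all assumptions chosen for brevity.

```python
import math
import random


def make_map(rows=15, cols=15, dim=4, seed=0):
    """Create a rows x cols grid of neurons, each a dim-dimensional weight vector."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def best_matching_unit(som, x):
    """Return (row, col) of the neuron whose weights are closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train(som, data, epochs=10, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training; schedule and neighbourhood shape are assumptions."""
    rng = random.Random(seed)
    rows, cols = len(som), len(som[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
            br, bc = best_matching_unit(som, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = som[r][c]
                    for i in range(len(w)):
                        # Pull the neuron toward x, weighted by grid proximity.
                        w[i] += lr * h * (x[i] - w[i])
            t += 1
    return som
```

In practice the four input variables would probably need to be scaled to comparable ranges before training, since the distance computation treats all dimensions equally.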


27 Self-organizing maps: distance matrix [Figure: all adult speakers; regions labeled "E,J,M /s/" and "English /S/, Japanese /c}/, Mandarin /S/"]
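As a rough illustration of what a distance matrix over a map contains, the sketch below computes, for every neuron, the mean Euclidean distance between its weight vector and those of its grid neighbours. It is a simplified variant under stated assumptions (rectangular grid, 4-connected neighbours, hypothetical function name) and not necessarily the exact computation behind the figure.

```python
import math


def distance_matrix(som):
    """Mean distance from each neuron's weights to its 4-connected neighbours.

    `som` is a rows x cols nested list of weight vectors. High values mark
    boundaries between clusters; low values mark homogeneous regions.
    """
    rows, cols = len(som), len(som[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(som[r][c], som[nr][nc]))
            # A 1 x 1 map has no neighbours; report 0.0 in that case.
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat
```

Plotted as a grey-scale image, ridges of high values would separate regions such as the /s/ area from the /S/ and /c}/ area.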

28 Self-organizing maps: best-matching units, labeled [Figure: all adult speakers; regions labeled "E,J,M /s/" and "Mandarin /S/, Japanese /c}/"]
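One common way to label a trained map is to find the best-matching unit for each labeled stimulus frame and give every unit the label it receives most often. The sketch below assumes that majority-vote rule and a nested-list map; both function names are hypothetical, and the slides do not say which labeling rule was actually used.

```python
from collections import Counter, defaultdict


def bmu(som, x):
    """Return (row, col) of the neuron with the smallest squared distance to x."""
    return min(
        ((r, c) for r in range(len(som)) for c in range(len(som[0]))),
        key=lambda rc: sum((w - v) ** 2 for w, v in zip(som[rc[0]][rc[1]], x)),
    )


def label_units(som, samples):
    """Map each hit unit to its most frequent label.

    `samples` is an iterable of (vector, label) pairs, for example
    ([..four features..], "English /s/"). Units that are never a
    best-matching unit are left out of the result.
    """
    votes = defaultdict(Counter)
    for x, label in samples:
        votes[bmu(som, x)][label] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in votes.items()}
```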

29 English adults only [Figure: English-speaking adults; labels /s/, /S/]

30 Japanese adults only [Figure: Japanese-speaking adults; labels /s/, /c}/]

31 Mandarin adults only [Figure: Mandarin-speaking adults; labels /s/, /S/, /c}/]

32 English-speaking children [Figure: English-speaking children, all; labels /s/, /S/]

33 Child-produced data on adult-trained map [Figure: English-child data shown on English-adult map; labels /s/, /S/]
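Showing child data on an adult-trained map amounts to looking up best-matching units without updating any weights. A minimal sketch, assuming a nested-list map and a hypothetical `project` helper, counts how often each unit is hit and reports the mean distance between each input and its best-matching unit (the quantization error), which gives a rough sense of how well the adult map fits the child productions.

```python
import math
from collections import Counter


def project(som, data):
    """Project `data` onto a trained map without changing the map.

    Returns (hits, mean_qe): a Counter of (row, col) -> number of inputs
    mapped to that unit, and the mean Euclidean distance from each input
    to its best-matching unit. `data` must be non-empty.
    """
    hits = Counter()
    total = 0.0
    for x in data:
        best, best_d = None, float("inf")
        for r, row in enumerate(som):
            for c, w in enumerate(row):
                d = math.dist(w, x)
                if d < best_d:
                    best, best_d = (r, c), d
        hits[best] += 1
        total += best_d
    return hits, total / len(data)
```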

34 English-speaking 2-year-olds [Figure: English-speaking children, 2-year-olds; labels /s/, /S/]

35 English-speaking 3-year-olds [Figure: English-speaking children, 3-year-olds; labels /s/, /S/]

36 English-speaking 4-year-olds [Figure: English-speaking children, 4-year-olds; labels /s/, /S/]

37 English-speaking 5-year-olds [Figure: English-speaking children, 5-year-olds; labels /s/, /S/]

38 Conclusions Partially replicated results of the hand-measured acoustic analysis with self-organizing maps. Summing over four frequency regions of the excitation pattern mirrored the centroid results. Fewer than 1390 stimuli, split into 10 ms frames, were enough to train 15 x 15 maps.
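The feature extraction mentioned here (10 ms frames, sums over four frequency regions of the excitation pattern) could look roughly like the sketch below. The excitation pattern itself would come from a loudness model such as the one in PsySound3 and is treated as a given list of per-channel values. The non-overlapping frames and the equal-width regions are assumptions; the slides do not state the frame overlap or the region boundaries.

```python
def frame_bounds(n_samples, sample_rate, frame_ms=10):
    """Return (start, end) sample indices of consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, frame_len)]


def region_sums(excitation, n_regions=4):
    """Sum a per-channel excitation pattern over contiguous regions.

    The regions are of (near-)equal width in channels, which is an
    assumption made for illustration only.
    """
    n = len(excitation)
    bounds = [round(i * n / n_regions) for i in range(n_regions + 1)]
    return [sum(excitation[bounds[i]:bounds[i + 1]]) for i in range(n_regions)]
```

Each frame then yields one four-dimensional vector, which matches the input dimensionality of the maps.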

39 More to do… Find better features for Mandarin and Japanese. Incorporate dynamic features into the map. Study the children's productions more closely. Incorporate a notion of feedback by connecting the child and adult maps with Hebbian updates.
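Since the Hebbian connection is listed as future work, the following is only one possible reading of it, not a description of anything the authors built. It assumes each map responds to an input with a Gaussian activation per neuron, and that a link matrix between child-map and adult-map neurons is strengthened whenever both sides are active together. All names and parameters are hypothetical.

```python
import math


def activation(som, x, sigma=1.0):
    """Flat list of Gaussian activations, one per neuron, for input x."""
    return [math.exp(-math.dist(w, x) ** 2 / (2.0 * sigma ** 2))
            for row in som for w in row]


def hebbian_update(links, child_act, adult_act, lr=0.1):
    """Strengthen links[i][j] in proportion to co-activation (in place).

    links[i][j] connects child-map neuron i to adult-map neuron j, and the
    plain Hebbian rule is: links[i][j] += lr * child_act[i] * adult_act[j].
    """
    for i, a in enumerate(child_act):
        for j, b in enumerate(adult_act):
            links[i][j] += lr * a * b
    return links
```

A plain Hebbian rule like this grows without bound, so a real model would likely add normalisation or decay.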

40 References
Cabrera, D., Ferguson, S. and Schubert, E. “PsySound3: Software for acoustical and psychoacoustical analysis of sound recordings.” Proceedings of the 13th International Conference on Auditory Display. Montreal, Canada. pp
Glasberg, B.R. and Moore, B.C.J. “A model of loudness applicable to time-varying sounds.” Journal of the Audio Engineering Society. 50:5,
Kohonen, T. "Self-Organizing Map", 2nd ed., Springer-Verlag, Berlin.
Li, Fangfang. “Universal Development in Context: the Case of Child Acquisition of Sounds Across Languages.” Lecture, University of Lethbridge.
Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J. “Self-organizing map in Matlab: the SOM Toolbox.” In Proceedings of the Matlab DSP Conference 1999, pages

41 English variables

42 Japanese variables

43 Mandarin variables