Words in puddles of sound

Words in puddles of sound Padraic Monaghan University of York Morten Christiansen Cornell University

Words in a “sea of sound” (Saffran, 2001) Discovering words from continuous speech with no reliable cues to word boundaries (Jones, 1918; Liberman et al., 1967) where words are realised variably (Pollack & Pickett, 1964)

Segmentation and sublexical cues
Final syllables of words are longer (Klatt, 1975): hamster v. ham (Saffran, Newport, & Aslin, 1996; Salverda & McQueen, 2004)
First syllables of words are stressed ~60% of the time in English (Crystal & House, 1990; Pierrehumbert, 1981); Johnson & Jusczyk (2001); Thiessen & Saffran (2003)
Certain diphones are more likely to occur across words than within words (Mattys et al., 2005)

Multiple cues in speech segmentation Hierarchical model (Mattys, White, & Melhorn, 2005)

Puddles
whosalovelybabyyesyouareyourealovelybabyarentyouyesyouare
In 5.5M words of child-directed speech:
Utterance length (words)    Percentage of corpus
1                           26.2
2                           13.7
3                           13.1
4                           11.8
5                            9.5
6                            7.5
7                            5.6
8                            3.9
> 8                          8.6
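A distribution like the one above can be tallied from any transcript. The following is a minimal sketch, assuming utterances are already separated and words are whitespace-delimited; the function name `length_distribution` and the toy utterances are illustrative and not taken from the presentation.

```python
from collections import Counter

def length_distribution(utterances, cap=8):
    """Percentage of utterances at each length in words.

    Lengths above `cap` are pooled into one ">cap" bin, as in the table.
    """
    counts = Counter(min(len(u.split()), cap + 1) for u in utterances)
    total = sum(counts.values())
    return {
        (str(n) if n <= cap else f">{cap}"): 100.0 * counts[n] / total
        for n in sorted(counts)
    }

# Toy data (hypothetical, not from CHILDES)
toy = ["kitty", "look kitty", "say it again", "kitty"]
print(length_distribution(toy))
```

On real data the input would be the orthographic child-directed utterances of a corpus, one string per utterance.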

Lexical approach to segmentation
Once you’ve got the words, segmentation is easy (Norris, 1994; 2007)
Assume each utterance is a word until you know differently:
if it’s repeated, you keep it
if it doesn’t occur again, you lose it

Aims of Modelling
Objection: utterances can’t be used, because the learner doesn’t know when an utterance is a single word and when it is multiple words (Brent & Cartwright, 1996)
The modelling aims to show that:
utterance boundaries are sufficient to get started
single-word utterances are useful anchors for segmentation
it is possible to distinguish (most) single-word from (most) multiple-word utterances
proper nouns have a special role
frequent multiple-word sequences will be “lexicalised” (Tomasello, 2001)

Lexical approach to segmentation
Familiar words are used for segmentation, e.g. by “Maggie” (Bortfeld et al., 2005):
“maggie’s bike had big, black wheels”
“hannah’s cup was bright and shiny”
infants familiarised to “bike” more quickly than to “cup”
Proper nouns often occur as single-word utterances: 3.3% of utterances in the “naomi” corpus in CHILDES
Very high-frequency words are useful for categorising content words (Monaghan, Chater, & Christiansen, 2005; Redington, Chater, & Finch, 1998)

Corpora
6 corpora from CHILDES: child-directed speech to children aged < 2:6
Orthographic transcription run through the Festival speech synthesiser (Black et al., 1990)
Corpus    Utterances    Words      MLU     Reference
Eve       18,280        62,734     3.43    Brown, 1973
Peter     21,311        74,185     3.48    Bloom et al., 1974
Nina      17,075        73,562     4.31    Suppes, 1974
Naomi      9,006        29,003     3.22    Sachs, 1983
Anne      28,250        96,008     3.49    Theakston et al., 2001
Aran      24,801        106,983

The model
Input utterances: kitty, thatsrightkitty, sayitagain, lookkitty
LEXICON on each successive slide:
(empty)
kitty 1.0
kitty 0.99
kitty 1.99, thatsright 1.00
kitty 1.98, thatsright 0.99
kitty 2.98, thatsright 0.99
kitty 3.96, thatsright 0.97, sayitagain 0.99, look 1.00
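The walkthrough can be approximated in a few lines of code. This is a minimal sketch under stated assumptions, not the authors' implementation: decay is taken to be a fixed subtraction (0.01) applied before each new utterance, each use of a word adds 1.0, known words are tried in order of score, and letters stand in for phonemes. The names `segment` and `learn` are hypothetical. Run on only the four utterances shown, it gives smaller scores than the final slide, which suggests the original run saw "kitty" more often than the transcript shows.

```python
def segment(utterance, lexicon):
    """Split an utterance around known words; leftovers become candidates."""
    if utterance in lexicon:
        return [utterance]
    # Assumption: try the highest-scoring known words first.
    for word in sorted(lexicon, key=lambda w: -lexicon[w]):
        i = utterance.find(word)
        if i != -1:
            left, right = utterance[:i], utterance[i + len(word):]
            pieces = segment(left, lexicon) if left else []
            pieces.append(word)
            if right:
                pieces += segment(right, lexicon)
            return pieces
    return [utterance]  # nothing known inside: keep the whole utterance

def learn(utterances, decay=0.01):
    """Build a lexicon of word scores from unsegmented utterances."""
    lexicon = {}
    for n, utterance in enumerate(utterances):
        if n > 0:
            # Decay every entry; words whose score reaches zero are lost.
            lexicon = {w: s - decay for w, s in lexicon.items() if s - decay > 0}
        for piece in segment(utterance, lexicon):
            lexicon[piece] = lexicon.get(piece, 0.0) + 1.0
    return lexicon

lexicon = learn(["kitty", "thatsrightkitty", "sayitagain", "lookkitty"])
for word, score in sorted(lexicon.items(), key=lambda kv: -kv[1]):
    print(f"{word} {score:.2f}")
```

Because "kitty" first appears as a single-word utterance, it acts as the anchor that carves "thatsright" and "look" out of later utterances, while "sayitagain" stays whole until something inside it is recognised.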

More constraints in the model: Phonological glue
Input: oh, okay, noway, nevertheless; LEXICON: oh, kay, n, way, evertheless
Candidate words with recognised beginnings and endings are admitted
Candidate words which divide a recognised word-internal diphone are rejected
Input on the following slides: oh, okay, no, nevertheless
GLUE: Beg: oh; End: oh; Glue: oh. LEXICON: oh
GLUE: Beg: oh; End: oh; Glue: oh. LEXICON: oh; candidate rejected (x): ka? oh? ok?
GLUE: Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay. LEXICON: oh, okay
GLUE: Beg: oh, ka; End: oh, ay; Glue: oh, ok, ka, ay. LEXICON: oh, okay, no, nevertheless
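The two constraints can be expressed as simple checks on a proposed boundary. The sketch below is illustrative only: letter pairs stand in for diphones (the model itself worked on phonological transcriptions, so the sets need not match the slides exactly), and the function names `diphone_sets` and `admissible` are hypothetical.

```python
def diphone_sets(lexicon):
    """Collect word-initial, word-final and word-internal pairs from known words."""
    begs, ends, glue = set(), set(), set()
    for word in lexicon:
        if len(word) < 2:
            continue
        begs.add(word[:2])
        ends.add(word[-2:])
        glue.update(word[i:i + 2] for i in range(len(word) - 1))
    return begs, ends, glue

def admissible(utterance, cut, begs, ends, glue):
    """Can `utterance` be split at index `cut` (1 <= cut < len) into two candidates?"""
    # Glue constraint: the pair straddling the boundary must not be word-internal.
    if utterance[cut - 1:cut + 1] in glue:
        return False
    # Legal-edges constraint: both candidates need recognised beginnings and endings.
    for candidate in (utterance[:cut], utterance[cut:]):
        if candidate[:2] not in begs or candidate[-2:] not in ends:
            return False
    return True

begs, ends, glue = diphone_sets(["oh", "okay"])
print(sorted(glue))                                 # word-internal pairs
print(admissible("okay", 1, begs, ends, glue))      # "o" + "kay" cuts through "ok"
print(admissible("ohokay", 2, begs, ends, glue))    # "oh" + "okay"
```

A candidate such as "kay" is blocked here because the boundary would fall inside a pair already seen within a known word, which is the sense in which the pair acts as glue.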

Testing the model
Decisions:
Internal diphone glue constraint
Legal beginnings/endings constraint
Decay rate
Ordering of lexicon…
Accuracy: proportion of words segmented that are words
Completeness: proportion of words that are segmented
Baseline segmentation: correct number of words in utterance, randomly positioned boundaries (Brent & Cartwright, 1996)
Included By length
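The two measures and the baseline can be sketched as follows. This assumes scoring by word tokens, where a proposed word counts as correct only if both of its boundaries match the reference segmentation; whether the original evaluation scored tokens or types is not stated in the slides. All function names are illustrative.

```python
import random

def spans(words):
    """Turn a segmentation into a set of (start, end) character spans."""
    out, start = set(), 0
    for word in words:
        out.add((start, start + len(word)))
        start += len(word)
    return out

def accuracy_completeness(predicted, gold):
    """Accuracy: share of proposed words that are real words.
    Completeness: share of real words that were found."""
    p, g = spans(predicted), spans(gold)
    hits = len(p & g)
    return hits / len(p), hits / len(g)

def random_baseline(utterance, n_words, rng):
    """Correct number of words, boundaries placed at random positions."""
    cuts = sorted(rng.sample(range(1, len(utterance)), n_words - 1))
    edges = [0, *cuts, len(utterance)]
    return [utterance[a:b] for a, b in zip(edges, edges[1:])]

gold = ["thats", "right", "kitty"]
predicted = ["thatsright", "kitty"]
acc, comp = accuracy_completeness(predicted, gold)
print(acc, comp)
print(random_baseline("thatsrightkitty", 3, random.Random(0)))
```

In this toy case only "kitty" matches the reference, so one of two proposed words is correct and one of three reference words is recovered.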

Results: Accuracy t(5) = 19.637, p < .0001

Results: Completeness t(5) = 28.969, p < .0001

Results: Naomi’s Lexicon Top 10 after 1K utterances: Nomi Say No Yes The Okay Whatsthis Blanket Is What

Results: Naomi’s Lexicon Top 10 after 8K utterances: You Nomi The It To What I That’s No Your

Results: Naomi’s Lexicon 0.05 decay Dashed line shows mean word length in corpus

Results: Naomi’s Lexicon 0.01 decay

Summary
Model based on puddles of sound: accurate, complete
Reliance on proper nouns
Frequent words “pop” out; the same words are useful for grammatical categorisation
No mechanism for alternative, competing parses of speech
First, cognitively plausible step for how the lexicon may be generated
Relative role of phonological glue, legal boundaries, sorting by length/frequency