Statistics and Rules in Language Acquisition: Constraints and the Brain Richard N. Aslin Department of Brain and Cognitive Sciences University of Rochester.


Statistics and Rules in Language Acquisition: Constraints and the Brain
Richard N. Aslin
Department of Brain and Cognitive Sciences, University of Rochester
CALACEI Conference, Trieste, Italy: Tools to Study Language Acquisition in Early Infancy
May 6, 2006

Outline
1. What is Statistical Learning (SL)?
2. How is SL constrained?
3. Neural correlates of visual SL
4. Implications of SL for rule learning (RL)

1. What is SL?
Acquisition of structured information by listening or observing
No reinforcement or feedback
Sensitivity to frequency or probability distributions

Why is SL interesting?
Something like SL must be how language is acquired, since there is no instructor
SL appears to be implausible
– Computations involved (infinite # of statistics)
– Limits of information processing (real-time flow of input and demands on working memory)

Why word segmentation?
Tractable problem
Must be solved early by all language learners (words are defined similarly across languages)
Illustrative of a distributional learning mechanism that may apply more broadly

Sequence of elements: A-B-C-D-E-F-G-H-I-J-K-L...
Test triplets: D-E-F vs. I-J-K
Saffran, Aslin & Newport (1996)
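The logic of this test can be sketched in code. What follows is a minimal illustration, not the procedure used by Saffran, Aslin & Newport (1996). It assumes the elements group into four hypothetical "words" (ABC, DEF, GHI, JKL) that are concatenated in random order without immediate repeats, and it estimates the forward transitional probability TP(y | x) = count(xy) / count(x). The function names, the seed, and the stream length are illustrative choices.

```python
import random
from collections import Counter


def build_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word immediately."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)  # a string extends the list one element at a time
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate TP(y | x) = count(x followed by y) / count(x)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the last element starts no pair
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def min_tp(triplet, tp):
    """Score a triplet by its weakest internal transition."""
    return min(tp.get(pair, 0.0) for pair in zip(triplet, triplet[1:]))


# Hypothetical word inventory (an assumption, not taken from the slide).
words = ["ABC", "DEF", "GHI", "JKL"]
stream = build_stream(words, n_words=600)
tp = transitional_probabilities(stream)

word_score = min_tp("DEF", tp)      # every transition lies inside a word
partword_score = min_tp("IJK", tp)  # I -> J crosses a word boundary
print(word_score, partword_score)
```

Under these assumptions D-E-F has a TP of 1.0 at every transition, whereas I-J-K contains a transition across a word boundary (I to J), where the TP falls to roughly 1/3 because three different words can follow GHI.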

Domains and species
SL operates on human speech and tones (Saffran et al., 1996a,b; 1999), as well as on visual shapes in temporal (Fiser & Aslin, 2002; Kirkham et al., 2002) and spatial domains (Fiser & Aslin, 2001, 2002).
SL operates in human adults, infants, tamarin monkeys (Hauser et al., 2001, 2004), and rats (Toro & Trobalon, 2005); rats fail higher-order SL.

2. How is SL constrained?
Gestalt principles
– Proximity (Newport & Aslin, 2004; Pena et al., 2002)
– Similarity (Creel, Newport & Aslin, 2004)
– Good continuation (Fiser, Scholl & Aslin, in press)
Social/attentional cues (Yu, Ballard & Aslin, 2005)
Preferred units over which statistics are computed (Newport, Weiss, Wonnacott & Aslin, 2004)
Redundancy reduction (Fiser & Aslin, 2005)
Primacy (Gebhart, Aslin & Newport, in preparation)

Element similarity
Happy birthday to you
Twinkle twinkle little star
Twinkle twinkle little star + happy birthday to you

Same octave: AgBhCi
Creel, Newport & Aslin (2004)
TPs between adjacent tones = 0.5 and 0.25

Different octaves: ...ghi... ...ABC...
TPs between adjacent tones = 0.5 and 0.25

Results [figure: Same Octave vs. Different Octaves]

Are syllables (CV) or segments (C and V) the preferred unit for SL?
Saffran, Newport & Aslin (1996: adults) and Saffran, Aslin & Newport (1996: infants) assumed that the syllable was the relevant unit over which transitional probabilities are computed
However, BOTH syllable and segment transitional probabilities in our artificial languages would parse the speech streams in the same way

Syllables AND Segments

Syllables NOT Segments 1.0 .5

What about infants?
No previous work has examined this question for statistical computations
But there is a literature on infant perception of segments and syllables
– Jusczyk & Derrah: 2 months old, syllables
– Mehler et al.; Jusczyk: development from syllables to segments?
– Kuhl, Hillenbrand: 12 months old, segments (or acoustic similarity)

Syllables AND Segments

Syllables NOT Segments 1.0 .5

Infants: Syllables NOT segments

The Statistical ‘Garden Path’
Two languages with different words and partial overlap of syllables
Expose to Lang A + Lang B (5 min each)
No pause between languages
Post-test:
– words vs. partwords in A
– words vs. partwords in B

[Figure: 5 min of exposure to Lang A or B alone vs. 5 min of exposure each to Lang A+B; chance level marked]

Add 30 sec pause between languages
Change pitch of synthetic voice
Triple duration of 2nd language (15 min)
[Figure: chance level marked]

Eliminate syllable differences (all identical)
– 5 min exposure and test Lang A or B alone
– Test for word vs. partword in each language
[Figure: chance level marked]
Primacy: learning first structure ‘blocks’ new structure

3. Neural correlates of SL
Statistical learning in the visual modality: spatial structure, not temporal structure
How are higher-order visual features represented in the brain?
– Hemisphere bias in SL and interhemispheric transfer
– fMRI activations of brain regions during SL

Background Can mere exposure to a series of scenes enable adult learners to extract features defined by shape-conjunctions? (Fiser & Aslin, 2001)

Six base-pairs
Fit three base-pairs into a 3 × 3 grid

Testing phase
2AFC task: base-pair vs. non-base pair
70% correct
[Figure: example base-pair and non-base pair test items]

Split the base-pairs
Fiser, Roser, Aslin & Gazzaniga (in prep)
[Figure label: 2 deg]

Modified test phase: four lateralized test types
Ipsilateral:
– Practice: RH, Test: RH
– Practice: LH, Test: LH
Contralateral:
– Practice: RH, Test: LH
– Practice: LH, Test: RH

Subjects
Normal subjects: sixteen college students
Callosotomy patient: V.P. (Corballis et al., Neurology, 2001)

Results with normal subjects
Equal learning in all conditions, indicating interhemispheric transfer
[Figure: chance level marked]

Results with the split brain patient
Contralateral: No interhemispheric information transfer
Ipsilateral: Strong right hemisphere advantage (*)
[Figure: chance level marked]

Event-Related fMRI Design: LEARNING PHASE
Instructions, then baseline fixation (4 TRs), then trials consisting of a stimulus followed by jitter
Timing values shown on the slide: 2500; /5000/7500
144 stimuli, each presented once, divided into 3 runs of 6 min each

TEST PHASE
Baseline fixation (4 TRs), then trials consisting of a stimulus + response followed by jitter
Timing values shown on the slide: 2500; /5000/7500
Trial types: base pair, non-base pair
48 test trials: 24 base-pairs, 24 non base-pairs; yes/no familiarity task

Learning Phase: final 1/3 vs. initial 1/3
Right parietal activation
Consistent with split-brain findings

4. Implications of SL for RL
Generalization to new tokens: rule-learning
– Gomez & Gerken (1999)
– Marcus et al. (1999)
– Pena et al. (2002)
– Saffran & Wilson (2003)
Not based on perceptual similarity
Could be based on surrounding context (Mintz, 2003) and on category variability (Gomez, 2002; Gomez & Maye, 2005)

What enables RL?
Obtained with strings, not streams
Pauses enable encoding of position info
High variability in a sea of stability may induce categories by down-weighting the category exemplars and then enabling their differences to be learned after “frequent frames” (Mintz, 2003; Santelmann & Jusczyk, 1998) are established

RL vs. SL: Different mechanism?
RL operates over categories rather than over surface forms.
Computation of statistics over categories may involve the same SL mechanism as computation over surface forms. Is there only a difference in input?
RL in tamarins (Hauser, Weiss & Marcus, 2002) suggests that RL is not unique to language learning.

Conclusions
Statistical learning is ubiquitous and powerful.
SL must be constrained to operate efficiently and to extract the “right” structure.
The search for neural correlates of SL is ongoing.
Whether SL can also operate at the level of categories or whether RL involves a separate mechanism remains unclear.

Thanks to my collaborators and funding sources
Elissa Newport, Jenny Saffran, Jozsef Fiser, Andrea Gebhart, Sarah Creel, Matt Roser, Mike Gazzaniga
NIH, Packard Foundation, McDonnell Foundation


Why conditionalized statistics?
Element frequency (N-gram) is a poor predictor of underlying structure.
– Many high frequency sounds appear in multiple contexts
– Conditional probabilities are computable by adults and infants (and in classical conditioning by rats, but not in speech)
But element frequency can serve as an “anchor” or a “filter” on how SL operates.
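A small worked example may make the contrast concrete. The stream below is entirely hypothetical: "a" is a frequent element that occurs in several contexts, while "x" is rare but always followed by "y". The sketch shows that two pairs can have the same raw frequency while their conditional probabilities differ sharply. The names used are illustrative only.

```python
from collections import Counter

# Hypothetical stream: "a" is frequent and appears in several contexts,
# "x" is rare but is always followed by "y".
stream = ["a", "b", "x", "y", "a", "c", "a", "d"] * 50

element_counts = Counter(stream)                # raw element frequency
pair_counts = Counter(zip(stream, stream[1:]))  # raw pair (bigram) frequency
first_counts = Counter(stream[:-1])             # how often each element starts a pair


def conditional_probability(x, y):
    """P(y | x): proportion of occurrences of x that are followed by y."""
    return pair_counts[(x, y)] / first_counts[x]


freq_ab = pair_counts[("a", "b")]
freq_xy = pair_counts[("x", "y")]
tp_ab = conditional_probability("a", "b")
tp_xy = conditional_probability("x", "y")

print(element_counts["a"], element_counts["x"])  # "a" is three times as frequent as "x"
print(freq_ab, freq_xy)                          # the two pairs are equally frequent
print(tp_ab, tp_xy)                              # but only x -> y is fully predictive
```

In this toy stream the pairs a-b and x-y occur equally often, so pair frequency alone cannot distinguish them; the conditional probability does, because "a" is followed by three different elements.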

With what fidelity?
How much input is needed to compute the relevant statistic(s)?
– Brent & Siskind (2001)
– Mintz, Newport & Bever (2002)
What decision mechanism operates on those stored statistical values?
– Local minimum vs. hard threshold
– How many bits of resolution? Is a transitional probability difference of 0.43 > 0.39 relevant?
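The two candidate decision mechanisms can be compared on a toy example. This is a sketch under stated assumptions, not a model from the talk: the list of transitional probabilities is invented (it reuses the values 0.39 and 0.43 only for illustration), the function names are hypothetical, and a boundary at index i is taken to mean a boundary after the i-th transition.

```python
def boundaries_local_minimum(tps):
    """Place a boundary wherever a TP is lower than both of its neighbours."""
    return [
        i
        for i in range(1, len(tps) - 1)
        if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]
    ]


def boundaries_threshold(tps, threshold):
    """Place a boundary wherever a TP falls below a fixed cutoff."""
    return [i for i, p in enumerate(tps) if p < threshold]


# Hypothetical sequence of transitional probabilities along a stream.
tps = [1.0, 1.0, 0.39, 1.0, 1.0, 0.43, 1.0, 1.0]

local_min = boundaries_local_minimum(tps)
strict_cut = boundaries_threshold(tps, threshold=0.40)
loose_cut = boundaries_threshold(tps, threshold=0.50)

print(local_min)   # both dips are found, whatever their absolute values
print(strict_cut)  # only the 0.39 dip falls below 0.40
print(loose_cut)   # both dips fall below 0.50
```

With a local-minimum rule the small difference between 0.39 and 0.43 is irrelevant, since both are dips relative to their neighbours. With a hard threshold the outcome depends on where the cutoff sits and on how precisely the stored values are represented.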

Are SL studies just “toy” demos?
Saffran et al. used simple structures
Swingley (2005) showed that similar structures are present in infant-directed speech (IDS).

Which unit?
Saffran, Aslin & Newport (1996) presumed the unit was the syllable.
Newport et al. (BU: 2004) showed that SL in speech streams is computed over segments (Cs & Vs), not syllables.
Other cues are clearly important: Saffran et al. (1996): Although experience with speech in the real world is unlikely to be as concentrated as it was in these studies, infants in more natural settings presumably benefit from other types of cues correlated with statistical information.

Fiser, Scholl & Aslin (in press) Bouncing vs. streaming

Perception of bouncing or streaming biases statistical learning “streaming”

3. What are the limits of SL?
Some minimal “attention” is required.
– Saffran et al. (1997)
– Turk-Browne, Jungé & Scholl (2005)
– Toro, Sinnett & Soto-Faraco (in press)
In streams of syllables, non-adjacent learning is difficult.
– Newport & Aslin (2004)
– Pena et al. (2002)
Unfamiliar elements (noises) are hard to learn.
– Gebhart, Newport & Aslin (2004)

Test phase: correct – incorrect