2
AND Network
Input → Output: 0 0 → 0, 1 0 → 0, 0 1 → 0, 1 1 → 1
3
OR Network
NETWORK CONFIGURED BY TLEARN
# weights after 10000 sweeps
# WEIGHTS
# TO NODE 1
-1.9083807468  ## bias to 1
4.3717832565   ## i1 to 1
4.3582129478   ## i2 to 1
0.0000000000
Input → Output: 0 0 → 0, 1 0 → 1, 0 1 → 1, 1 1 → 1
4
XOR Network
5
XOR Network
-3.0456776619  ## bias to 1
5.5165352821   ## i1 to 1
-5.7562727928  ## i2 to 1
-3.6789164543  ## bias to 2
-6.4448370934  ## i1 to 2
6.4957633018   ## i2 to 2
-4.4429202080  ## bias to output
9.0652370453   ## 1 to output
8.9045801163   ## 2 to output
Hidden node 1: Input → Output: 0 0 → 0, 1 0 → 1, 0 1 → 0, 1 1 → 0
Hidden node 2: Input → Output: 0 0 → 0, 1 0 → 0, 0 1 → 1, 1 1 → 0
Output node (its inputs are the two hidden nodes): 0 0 → 0, 1 0 → 1, 0 1 → 1, 1 1 → 1
The mapping from the hidden units to the output is an OR network that never receives a [1 1] input.
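A minimal Python sketch (assuming the standard logistic activation, with the bias treated as a constant input) runs the four input patterns through these weights and reproduces the three truth tables above:

    from math import exp

    def sigmoid(x):
        # logistic activation, as in tlearn
        return 1.0 / (1.0 + exp(-x))

    # Weights copied from the slide: (bias, weight from i1, weight from i2) for each
    # hidden node, and (bias, weight from node 1, weight from node 2) for the output.
    NODE1  = (-3.0456776619, 5.5165352821, -5.7562727928)
    NODE2  = (-3.6789164543, -6.4448370934, 6.4957633018)
    OUTPUT = (-4.4429202080, 9.0652370453, 8.9045801163)

    def unit(weights, a, b):
        bias, w1, w2 = weights
        return sigmoid(bias + w1 * a + w2 * b)

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        h1 = unit(NODE1, i1, i2)     # strongly active only for input [1 0]
        h2 = unit(NODE2, i1, i2)     # strongly active only for input [0 1]
        out = unit(OUTPUT, h1, h2)   # ORs the two hidden nodes
        print(i1, i2, "->", round(h1, 2), round(h2, 2), "->", round(out, 2))

The outputs come out near 0, 1, 1, 0, and the two hidden nodes are never strongly active at the same time, which is why the output layer can behave as an OR that never sees [1 1].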
14
The Past Tense and Beyond
15
Classic Developmental Story
– Initial mastery of regular and irregular past tense forms
– Overregularization appears only later (e.g. goed, comed)
– ‘U-Shaped’ developmental pattern taken as evidence for learning of a morphological rule: V + [+past] --> stem + /d/
16
Rumelhart & McClelland 1986 Model learns to classify regulars and irregulars, based on sound similarity alone. Shows U-shaped developmental profile.
17
What is really at stake here?
– Abstraction
– Operations over variables
– Learning based on input
18
What is not at stake here Feedback, negative evidence, etc.
19
Who has the most at stake here?
– Those who deny the need for rules/variables in language have the most to lose here
– …but if they are successful, they bring with them a simple and attractive learning theory, and mechanisms that can readily be grounded at the neural level
– However, if the advocates of rules/variables succeed here or elsewhere, they face the more difficult challenge at the neuroscientific level
20
Questions about Lab 2b
– How did the network perform?
– How well did the network generalize to novel stems?
– What was the effect of the frequency manipulation?
– Does the network need to internalize a Blocking Principle?
– Does the network explicitly represent a default form?
21
1. Are regulars different?
2. Do regulars implicate operations over variables?
– Neuropsychological Dissociations
– Other Domains of Morphology
– Beyond Sound Similarity
– Regulars and Associative Memory
23
(Pinker & Ullman 2002)
24
Beyond Sound Similarity
Zero-derived denominals are regular
– Soldiers ringed the city
– *Soldiers rang the city
– high-sticked, grandstanded, …
– *high-stuck, *grandstood, …
Productive in adults & children
Shows sensitivity to morphological structure: [[ stem N ] ø V ]-ed
Provides good evidence that sound similarity is not everything
But nothing prevents a model from using a richer similarity metric
– morphological structure (for ringed)
– semantic similarity (for low-lifes)
25
1. Are regulars different?
2. Do regulars implicate operations over variables?
– Neuropsychological Dissociations
– Other Domains of Morphology
– Beyond Sound Similarity
– Regulars and Associative Memory
26
Regulars & Associative Memory
– Regulars are productive, need not be stored
– Irregulars are not productive, must be stored
– But are regulars immune to effects of associative memory? (frequency, over-irregularization)
Pinker & Ullman:
– regulars may be stored
– but they can also be generated on-the-fly
– ‘race’ can determine which of the two routes wins
– some tasks more likely to show effects of stored regulars
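A toy sketch of the race idea (the latencies, frequencies, and tiny lexicon below are invented for illustration, not Pinker & Ullman's actual model): retrieval from memory gets faster with a form's frequency, the rule route takes a roughly constant time, and whichever route finishes first supplies the response, so stored high-frequency regulars can show frequency effects while low-frequency and novel regulars fall back on the rule.

    from math import log

    # Toy race between a memory route and a rule route for past-tense production.
    LEXICON = {                     # stored past forms with invented token frequencies
        "go":   ("went",   1000),
        "walk": ("walked",  800),   # a high-frequency regular may also be stored
        "jump": ("jumped",    5),
    }

    RULE_TIME = 300.0               # assumed constant cost (ms) of applying stem + -ed

    def retrieval_time(freq):
        # assumption: retrieval speeds up with log frequency
        return 600.0 - 60.0 * log(freq + 1)

    def past_tense(stem):
        rule_form = stem + "-ed"    # the default rule (spelling adjustments ignored)
        if stem in LEXICON:
            stored_form, freq = LEXICON[stem]
            if retrieval_time(freq) < RULE_TIME:
                return stored_form, "memory route"
        return rule_form, "rule route"

    for verb in ["go", "walk", "jump", "wug"]:
        print(verb, "->", past_tense(verb))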
27
Child vs. Adult Impairments
Specific Language Impairment
– Early claims that regulars show greater impairment than irregulars are not confirmed
Pinker & Ullman 2002b
– ‘The best explanation is that language-impaired people are indeed impaired with rules, […] but can memorize common regular forms.’
– Regulars show consistent frequency effects in SLI, not in controls.
– ‘This suggests that children growing up with a grammatical deficit are better at compensating for it via memorization than are adults who acquired their deficit later in life.’
28
1. Are regulars different?
2. Do regulars implicate operations over variables?
– Neuropsychological Dissociations
– Other Domains of Morphology
– Beyond Sound Similarity
– Regulars and Associative Memory
29
Neuropsychological Dissociations
Ullman et al. 1997
– Alzheimer’s disease patients: poor memory retrieval; poor irregulars, good regulars
– Parkinson’s disease patients: impaired motor control, good memory; good irregulars, poor regulars
– Striking correlation involving laterality of effect
Marslen-Wilson & Tyler 1997
– Normals: past tense primes stem
– 2 Broca’s patients: irregulars prime stems, inhibition for regulars
– 1 patient with bilateral lesion: regulars prime stems, no priming for irregulars or semantic associates
30
Morphological Priming
Lexical Decision Task
– CAT, TAC, BIR, LGU, DOG
– press ‘Yes’ if this is a word
Priming
– facilitation in decision times when related word precedes target (relative to unrelated control)
– e.g., {dog, rug} - cat
Marslen-Wilson & Tyler 1997
– Regular: {jumped, locked} - jump
– Irregular: {found, shows} - find
– Semantic: {swan, hay} - goose
– Sound: {gravy, sherry} - grave
32
Neuropsychological Dissociations
Bird et al. 2003
– complain that arguments for selective difficulty with regulars are confounded with the phonological complexity of the word-endings
Pinker & Ullman 2002
– weight of evidence still supports dissociation; Bird et al.’s materials contained additional confounds
33
Brain Imaging Studies
Jaeger et al. 1996, Language
– PET study of past tense
– Task: generate past from stem
– Design: blocked conditions
– Result: different areas of activation for regulars and irregulars
– Is this evidence decisive? task demands very different; difference could show up in network; doesn’t implicate variables
Münte et al. 1997
– ERP study of violations
– Task: sentence reading
– Design: mixed
– Result: regulars: ~LAN; irregulars: ~N400
– Is this evidence decisive? allows possibility of comparison with other violations
34
1. Are regulars different?
2. Do regulars implicate operations over variables?
– Neuropsychological Dissociations
– Other Domains of Morphology
– Beyond Sound Similarity
– Regulars and Associative Memory
35
Low-Frequency Defaults
German Plurals
– die Straße → die Straßen, die Frau → die Frauen
– der Apfel → die Äpfel, die Mutter → die Mütter
– das Auto → die Autos, der Park → die Parks, die Schmidts
– -s plural: low frequency, used for loan-words, denominals, names, etc.
Response
– frequency is not the critical factor in a system that focuses on similarity
– distribution in the similarity space is crucial
– similarity space with islands of reliability: network can learn islands, or network can learn to associate a form with the space between the islands
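A minimal illustration of the islands-of-reliability point (the two-dimensional similarity space, cluster positions, and class sizes below are all invented): irregular classes form tight islands, -s items are spread thinly across the whole space, and a simple nearest-neighbour model then gives island-adjacent probes the island's plural while probes that fall between the islands get -s, even though -s is the rarest class.

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented 2-D similarity space: two dense islands of irregular classes,
    # plus a handful of -s items spread thinly over the whole space.
    en_island     = rng.normal(loc=(-2.0, -2.0), scale=0.3, size=(40, 2))  # -(e)n plurals
    umlaut_island = rng.normal(loc=(2.0, 2.0), scale=0.3, size=(40, 2))    # umlaut plurals
    s_items       = np.array([[-3.0, 3.0], [0.0, 0.5], [3.0, -3.0],
                              [-0.5, -3.5], [3.5, 0.0]])                   # -s plurals

    points = np.vstack([en_island, umlaut_island, s_items])
    labels = ["-(e)n"] * 40 + ["umlaut"] * 40 + ["-s"] * 5

    def nearest_plural(probe):
        # 1-nearest-neighbour classification in the similarity space
        dists = np.linalg.norm(points - np.asarray(probe), axis=1)
        return labels[int(np.argmin(dists))]

    # Probes near an island get that island's plural; probes between the
    # islands land nearest a scattered -s item and get the "default".
    for probe in [(-2.1, -1.9), (2.2, 1.8), (0.8, -0.8), (-3.0, 2.5)]:
        print(probe, "->", nearest_plural(probe))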
36
Similarity Space
38
Arabic Broken Plural
CvCC
– nafs → nufuus ‘soul’
– qidh → qidaah ‘arrow’
CvvCv(v)C
– xaatam → xawaatim ‘signet ring’
– jaamuus → jawaamiis ‘buffalo’
Sound Plural
– shuway?ir → shuway?ir-uun ‘poet (dim.)’
– kaatib → kaatib-uun ‘writing (participle)’
– hind → hind-aat ‘Hind (fem. name)’
– ramadaan → ramadaan-aat ‘Ramadan (month)’
39
German Plurals (Hahn & Nakisa 2000)
41
Syntax, Semantics, & Statistics
45
Starting Small Simulation
– How well did the network perform?
– How did it manage to learn?
47
Generalization
Training Items
– Input: 1 0 1 0  Output: 1 0 1 0
– Input: 0 1 0 0  Output: 0 1 0 0
– Input: 1 1 1 0  Output: 1 1 1 0
– Input: 0 0 0 0  Output: 0 0 0 0
Test Item
– Input: 1 1 1 1  Output: ? ? ? ?
– Humans: 1 1 1 1
– Network: 1 1 1 0
Generalization fails because learning is local
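A minimal sketch of the failure case (the slides do not specify the architecture, so this assumes a one-layer sigmoid network with an independent set of weights per output unit, trained by plain gradient descent): because each output unit learns only from its own targets, the fourth unit, whose target is always 0 in training, keeps answering 0 on the novel [1 1 1 1] pattern.

    import numpy as np

    # Identity-mapping task from the slide: copy the input pattern to the output.
    X = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
    Y = X.copy()                              # targets are identical to the inputs

    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.1, size=(4, 4))    # input-to-output weights
    b = np.zeros(4)                           # output biases

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Plain gradient descent on squared error; each output unit has its own
    # weights and bias, so it learns only from its own column of targets.
    for _ in range(5000):
        out = sigmoid(X @ W + b)
        delta = (out - Y) * out * (1.0 - out)
        W -= 0.5 * (X.T @ delta)
        b -= 0.5 * delta.sum(axis=0)

    print(np.round(sigmoid(X @ W + b), 2))           # training items: learned
    print(np.round(sigmoid(np.ones(4) @ W + b), 2))  # test item [1 1 1 1]
    # The fourth output stays near 0: nothing in training ever pushed it toward 1,
    # so the network answers 1 1 1 0 where people answer 1 1 1 1.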
50
Generalization
Training Items
– Input: 1 0 1 0  Output: 1 0 1 0
– Input: 0 1 0 0  Output: 0 1 0 0
– Input: 1 1 1 0  Output: 1 1 1 0
– Input: 0 0 0 0  Output: 0 0 0 0
Test Item
– Input: 1 1 1 1  Output: ? ? ? ?
– Humans: 1 1 1 1
– Network: 1 1 1 1
Generalization succeeds because representations are shared
51
Negative Evidence
Standard Doctrine
– Language learners do not receive negative evidence
– They must therefore learn only from positive examples
– This forces the learner to make constrained generalizations
– [even with negative evidence, generalizations must be constrained]
Common Suggestion
– ‘Implicit Negative Evidence’; the absence of certain examples in the input is taken to be significant
– Who do you think John saw __? / Who do you think that John saw __?
– Who do you think __ saw John? / *Who do you think that __ saw John?
Challenge: how to classify and store input appropriately, in order to detect appropriate generalizations; large memory needed
52
Feedback via Prediction
Simple Recurrent Network
– Prediction task provides feedback
– System does not need to explicitly store large amounts of data
– Challenge is to appropriately encode the feedback signal
– If learning rate is set low, learner is protected against noisy input data
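A bare-bones sketch of how prediction supplies the feedback signal in a simple recurrent (Elman-style) network; the vocabulary, layer sizes, and random weights are invented, and the training step that would use the error is omitted. At each time step the current word and a copy of the previous hidden state drive the hidden layer, the network guesses the next word, and the mismatch with the word that actually arrives is the error signal backpropagation would learn from.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = ["boy", "girl", "chases", "sees", "."]   # toy vocabulary
    V, H = len(VOCAB), 8                             # vocabulary size, hidden units

    W_in  = rng.normal(scale=0.5, size=(V, H))       # input -> hidden
    W_ctx = rng.normal(scale=0.5, size=(H, H))       # context (previous hidden) -> hidden
    W_out = rng.normal(scale=0.5, size=(H, V))       # hidden -> next-word prediction

    def one_hot(word):
        v = np.zeros(V)
        v[VOCAB.index(word)] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    sentence = ["boy", "chases", "girl", "."]
    context = np.zeros(H)                            # context units start at rest

    for t in range(len(sentence) - 1):
        x = one_hot(sentence[t])
        hidden = np.tanh(x @ W_in + context @ W_ctx)   # mix current input with context
        prediction = softmax(hidden @ W_out)           # distribution over the next word
        surprisal = -np.log(prediction[VOCAB.index(sentence[t + 1])])
        print(sentence[t], "-> surprisal at next word:", round(float(surprisal), 2))
        context = hidden.copy()                        # copy hidden state into context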
53
Feedback via Prediction
The prediction problem is very relevant
– can be viewed as representing many kinds of syntactic relations
– typically, many predictions are held simultaneously; if there’s an explicit representation of hierarchy and abstraction, then order of completion is easily predicted
– challenge is to avoid spurious competition among dependencies
55
Elman --> Rohde & Plaut
– whatever happens, agreement is not the decisive case
Seidenberg on locatives
– locatives across languages are highly constrained; generalizations go beyond surface patterns; the Seidenberg/Allen model is given the outline of the solution to the problem
– semantics, not statistics, is critical
How to use SRN ideas effectively
– Structured prediction device encodes hypotheses
– How to encode and act upon error signals?
– Partially matching features can lead to long-distance feedback
– Prediction of alternatives can lead to confirmation or disconfirmation of 1 choice
56
Infants and Statistical Learning
57
Saffran, Aslin, & Newport (1996)
8-month-old infants
– Passive exposure to a continuous speech sequence for 2 minutes: bidakupadotigolabubidaku…
– Test (Experiment #2): bidakubidakubidaku… vs. kupadokupadokupado…
– Infants listen longer to unfamiliar sequences
– Transitional probabilities between syllables (bi da ku pa do ti …): 1.0 within words, .33 across word boundaries
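The transitional probabilities can be computed directly from such a stream. The sketch below builds a random stream from the three words visible above plus a fourth, 'tupiro' (taken from the original study's four-word vocabulary), and counts syllable-to-syllable transitions; within-word transitions come out at 1.0 and across-word transitions near .33.

    import random
    from collections import Counter

    random.seed(0)
    WORDS = ["bidaku", "padoti", "golabu", "tupiro"]

    # Build a continuous stream with no pauses, avoiding immediate word repetition.
    stream, prev = [], None
    for _ in range(400):
        word = random.choice([w for w in WORDS if w != prev])
        stream.append(word)
        prev = word
    syllables = [w[i:i + 2] for w in stream for i in (0, 2, 4)]   # 3 syllables per word

    # Transitional probability TP(y | x) = count(x followed by y) / count(x)
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])

    def tp(x, y):
        return pair_counts[(x, y)] / first_counts[x]

    print("within word :", round(tp("bi", "da"), 2), round(tp("da", "ku"), 2))
    print("across words:", round(tp("ku", "pa"), 2), round(tp("ku", "go"), 2))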
58
Marcus et al. (1999)
Training
– ABA: ga na ga, li ti li
– ABB: ga na na, li ti ti
Testing
– ABA: wo fe wo
– ABB: wo fe fe
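The test items contain only novel syllables, so a record of which specific syllables were heard during training says nothing about them; a rule stated over variables does. A minimal sketch of the contrast (the helper functions are illustrations, not anyone's published model):

    def is_ABA(triple):
        # rule over variables: the third element is identical to the first
        a, b, c = triple
        return c == a and b != a

    def is_ABB(triple):
        # rule over variables: the third element is identical to the second
        a, b, c = triple
        return c == b and b != a

    TRAINED_ABA = {("ga", "na", "ga"), ("li", "ti", "li")}   # training items

    novel = ("wo", "fe", "wo")
    print("seen in training:", novel in TRAINED_ABA)   # False: syllable lookup is silent
    print("matches ABA rule:", is_ABA(novel))          # True: the variable-based rule generalizes
    print("matches ABB rule:", is_ABB(novel))          # False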
59
Aslin & Newport (in press) While some adjacent statistical regularities can be learned, other types of statistical regularities cannot
60
We have recently been developing a statistical approach to language acquisition and investigating the abilities of human adults, infants, and nonhuman primates to perform the computations that would be required for acquiring properties of natural languages by such a method. Our studies (initially in collaboration with Jenny Saffran) have shown that human adults and infants are capable of performing many of these computations online and with remarkable speed, during the presentation of controlled speech streams in the laboratory. We have also found that adults and infants can perform similar computations on nonlinguistic materials (e.g., music), and (in collaboration with Marc Hauser) that nonhuman primates can perform the simplest of these computations. However in our recent studies, when tested on more complex computations involving non-adjacent sounds, humans show strong selectivities (they can perform certain computations, but fail at others), corresponding to the patterns which natural languages do and do not exhibit. Primates are not capable of performing some of these more difficult computations. Additional recent studies examine how statistics can be used to form non-statistical generalizations and to regularize irregular structures, and how the computations we have hypothesized for word segmentation extend to acquiring syntactic phrase structure. Overall we feel that this approach may provide an important mechanism for learning certain aspects of language, particularly when combined with an understanding of the ways in which input statistics may be selectively extracted or altered as they are acquired. In addition, the constraints of learners in performing differing types and complexities of computations may provide part of the explanation for which learners can acquire human languages, and why languages have some of the properties they have.
(Newport & Aslin, Dec. 2003)
61
A Little Syntax
62
Gold (1967)
Hypothetical classes of languages
– #1: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA}
– #2: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA, …, A∞}
How could a learner figure out the target language, based on positive-only examples (‘text presentation’)?
– #1: convergence can be guaranteed, e.g. by always guessing the smallest language in the class that contains everything seen so far
– #2: any finite sample is consistent both with some finite language and with A∞, so a guess can always turn out to be wrong
Under class #2, there’s no way to guarantee convergence
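One way to make the asymmetry concrete is a toy simulation of a particular conservative learner (an illustration of the problem, not a proof of Gold's theorem): the learner always guesses the smallest language containing everything seen so far. Under class #1 the guess eventually stops changing; under class #2, if the target is the infinite language, every guess remains finite, so the learner never converges on the target.

    import random

    random.seed(0)

    def conservative_guess(sample):
        # Guess the smallest language {A, AA, ..., A^n} containing everything seen;
        # the guess is represented by n, the length of its longest string.
        return max(len(s) for s in sample)

    # Class #1: the target is the finite language {A, AA, AAA, AAAA} (n = 4).
    sample, guesses = [], []
    for _ in range(20):
        sample.append("A" * random.randint(1, 4))      # text presentation: positive examples only
        guesses.append(conservative_guess(sample))
    print("class #1 guesses:", guesses)    # once a length-4 string appears, the guess never changes

    # Class #2: the target is the infinite language {A, AA, AAA, ...}.
    sample, guesses = [], []
    for _ in range(20):
        sample.append("A" * random.randint(1, 1000))   # strings can be arbitrarily long
        guesses.append(conservative_guess(sample))
    print("class #2 guesses:", guesses)    # every guess is a finite language, never A-infinity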
63
Baker (1979)
Alternating Verbs
– John gave a cookie to the boy. / John gave the boy a cookie.
– Mary showed some photos to her family. / Mary showed her family some photos.
Non-Alternating Verbs
– John donated a painting to the museum. / *John donated the museum a painting.
– Mary displayed her art collection to the visitors. / *Mary displayed the visitors her art collection.
Learnability problem: how to avoid overgeneralization
64
Seidenberg (1997, Science)
Locative Verb Constructions
– John poured the water into the cup / *John poured the cup with water
– *Sue filled the water into the glass / Sue filled the glass with water
– Bill loaded the apples onto the truck / Bill loaded the truck with apples
“Connectionist networks are well suited to capturing systems with this character. Importantly, a network configured as a device that learns to perform a task such as mapping from sound to meaning will act as a discovery procedure, determining which kinds of information are relevant. Evidence that such models can encode precisely the right combinations of probabilistic constraints is provided by Allen (42), who implemented a network that learns about verbs and their argument structures from naturalistic input.” (p. 1602)
65
Seidenberg (Science, 3/14/97) “Research on language has arrived at a particularly interesting point, however, because of important developments outside of the linguistic mainstream that are converging on a different view of the nature of language. These developments represent an important turn of events in the history of ideas about language.” (p. 1599)
66
Seidenberg (Science, 3/14/97) “A second implication concerns the relevance of poverty-of-the-stimulus arguments to other aspects of language. Verbs and their argument structures are important, but they are language specific rather than universal properties of languages and so must be learned from experience.” (p. 1602)
67
Allen’s Model
Learns associations between (i) specific verbs & argument structures and (ii) semantic representations
Feature encoding for verbs, 360 features
– [eat]: +act, +cause, +consume, etc.
– [John]: +human, +animate, +male, +automotive, -vehicle
68
Allen’s Model
Learns associations between (i) specific verbs & argument structures and (ii) semantic representations
Training set: 1200 ‘utterance types’ taken from caretaker speech in Peter corpus (CHILDES)
69
Allen’s Model
Fine-grained distinction between hit, carry
– John kicked Mary the ball
– *John carried Mary the basket
[kick]: +cause, +apply-force, +move, +travel, +contact, +hit-with-foot, +strike, +kick, +instantaneous-force, +hit
[carry]: +cause, +apply-force, +move, +travel, +contact, +carry, +support, +continuous-force, +accompany
73
Allen’s Model
Fine-grained distinction between hit, carry
– John kicked Mary the ball
– *John carried Mary the basket
“This behavior shows crucially that the network is not merely sensitive to overall semantic similarity: rather, the network has organized the semantic space such that some features are more important than others.” (p. 5)
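The quoted point can be made concrete with the feature lists above. Under a flat overlap measure, kick and carry look fairly similar, since they share cause, apply-force, move, travel, and contact; a similarity measure that has learned to weight the force-dynamic features heavily pulls them apart. The weights in this sketch are invented for illustration, not taken from Allen's model.

    KICK  = {"cause", "apply-force", "move", "travel", "contact",
             "hit-with-foot", "strike", "kick", "instantaneous-force", "hit"}
    CARRY = {"cause", "apply-force", "move", "travel", "contact",
             "carry", "support", "continuous-force", "accompany"}

    def flat_similarity(a, b):
        # every feature counts the same (Jaccard overlap)
        return len(a & b) / len(a | b)

    def weighted_similarity(a, b, weights):
        # some features count more than others
        shared = sum(weights.get(f, 1.0) for f in a & b)
        total = sum(weights.get(f, 1.0) for f in a | b)
        return shared / total

    # Hypothetical learned weights: force-dynamic features dominate, because they
    # predict whether a verb allows the double-object frame.
    WEIGHTS = {"instantaneous-force": 10.0, "continuous-force": 10.0}

    print("flat similarity    :", round(flat_similarity(KICK, CARRY), 2))               # ~0.36
    print("weighted similarity:", round(weighted_similarity(KICK, CARRY, WEIGHTS), 2))  # ~0.16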
74
Challenges
– Allen’s results are impressive; the model is interesting in the way that it poses the learning task as a selection process (the linking rules do not emerge from nowhere)
– Fine-grained distinctions in English
– ‘Concealed’ distinctions in Korean
– Reason for universals
75
Challenges
Fine-grained distinctions, e.g. in English
– pour the water into the glass / pour the water / the poured water
– stand the lamp on the floor / *stand the lamp / *the stood lamp
76
Challenges
‘Concealed’ distinctions, e.g. in Korean
– pour the water into the glass / *pour the glass with water
– pile the books onto the shelf / *pile the shelf with books
– *pour-put the glass with water / pile-put the shelf with books
77
Challenges
– Universals, parametric connections: why should they exist and be stable?
78
(Pena et al. 2002)