1
The Harmonic Mind. Paul Smolensky, Cognitive Science Department, Johns Hopkins University. With: Géraldine Legendre, Alan Prince, Peter Jusczyk, Donald Mathis, Melanie Soderstrom, and a Mystery 'Co'-laborator.
2
Personal Firsts thanks to SPP First invited talk! (& first visit to JHU, 1986) First public confessional: midnight thoughts of a worried connectionist (UNC, 1988) First generative syntax talk (Memphis, 1994) First attempt at stand-up comedy (Columbia, 2000) First rendition of a 900-page book as a graphical synopsis in Powerpoint (1 minute from now)
3
Advertisement: The Harmonic Mind: From neural computation to optimality-theoretic grammar, Paul Smolensky & Géraldine Legendre, Blackwell 2002 (??). Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture; a case study in formalist multidisciplinary cognitive science.
4
Talk Plan. 'Sketch' the ICS cognitive architecture, pointing to contributions from/to traditional disciplines. Topics of direct philosophical relevance: explanation of the productivity of cognition; nativism. Theoretical work: symbolic; connectionist. Experimental work.
5
Mystery Quote #1 “Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story … 185 pages … are devoted to Smolensky’s telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed.”
6
Processing I: Activation. Computational neuroscience → ICS. Key sources: Hopfield 1982, 1984; Cohen & Grossberg 1983; Hinton & Sejnowski 1983, 1986; Smolensky 1983, 1986; Geman & Geman 1984; Golden 1986, 1988. Processing (spreading activation) is optimization: Harmony maximization. (Figure: two units a1, a2 with external inputs i1 = 0.6, i2 = 0.5 and an inhibitory connection –λ = –0.9.)
7
Processing II: Optimization. Cognitive psychology → ICS. Key sources: Hinton & Anderson 1981; Rumelhart, McClelland, & the PDP Group 1986. Harmony maximization is satisfaction of parallel, violable constraints: a1 must be active (strength 0.6); a2 must be active (strength 0.5); a1 and a2 must not be simultaneously active (strength λ = 0.9). Optimal compromise: a1 = 0.79, a2 = –0.21.
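The "optimal compromise" above can be checked with a small numerical sketch. The slide gives only the two inputs, the inhibitory weight, and the solution; the quadratic self-term in the Harmony function below is an assumption chosen so that its maximum reproduces the slide's numbers.

```python
import numpy as np

# Two-unit network from the slide: external inputs i1 = 0.6, i2 = 0.5,
# mutual inhibition -lambda = -0.9.
i = np.array([0.6, 0.5])
W = np.array([[0.0, -0.9],
              [-0.9, 0.0]])

def harmony(a):
    # Assumed form: H(a) = i.a + (1/2) a.W.a - (1/2) ||a||^2
    return i @ a + 0.5 * a @ W @ a - 0.5 * a @ a

# "Spreading activation is optimization": gradient ascent on H.
a = np.zeros(2)
for _ in range(2000):
    a += 0.05 * (i + W @ a - a)      # gradient of H

print(np.round(a, 2))                # [ 0.79 -0.21], the optimal compromise
```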
8
Representation. Symbolic theory → ICS: complex symbol structures. Generative linguistics → ICS: particular linguistic representations. PDP connectionism → ICS: distributed activation patterns. ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units ('tensor product representations' etc.).
9
Representation. The structure [σ k [æ t]] (the syllable σ with onset k and rime [æ t]) is encoded by the filler/role bindings k/r_0, æ/r_01, t/r_11, σ/r_ε. (Figure: the corresponding tree.)
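A minimal sketch of tensor product binding and unbinding for a structure like this one. The particular filler and role vectors (and the omission of the σ/r_ε binding) are illustrative choices, not taken from the slide; the theory only requires suitably independent role vectors.

```python
import numpy as np

# Toy filler vectors for k, ae, t and role vectors for the tree positions
# r0 (onset), r01, r11 (the two rime positions).  Orthonormal choices here.
F = {'k':  np.array([1., 0., 0.]),
     'ae': np.array([0., 1., 0.]),
     't':  np.array([0., 0., 1.])}
R = {'r0':  np.array([1., 0., 0.]),
     'r01': np.array([0., 1., 0.]),
     'r11': np.array([0., 0., 1.])}

# The pattern realizing [k [ae t]] is the sum of filler (x) role bindings.
s = sum(np.outer(F[f], R[r]) for f, r in [('k', 'r0'), ('ae', 'r01'), ('t', 'r11')])

# Unbinding: with orthonormal roles, multiplying by a role vector recovers
# the filler bound to that role.
print(s @ R['r01'])        # -> [0. 1. 0.], the vector for 'ae'
```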
10
Constraints. Linguistics (markedness theory) → ICS; ICS → Generative linguistics: Optimality Theory. Key sources: Prince & Smolensky 1993 [ms.; Rutgers report]; McCarthy & Prince 1993 [ms.]. Texts: Archangeli & Langendoen 1997; Kager 1999; McCarthy 2001. Electronic archive: rutgers/ruccs/roa.html. (Met in SPP Debate, 1988!)
11
Constraints. NOCODA: A syllable has no coda. The parse [σ k [æ t]] has a coda (t): one violation (*). Realized in the network by a weight matrix W such that H(a_[σ k [æ t]]) = –s_NOCODA < 0.
12
Constraint Interaction I. ICS → Grammatical theory: Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.).
13
Constraint Interaction I. For the parse of 'cat' as σ with onset k and coda t, H = H(k/σ) + H(σ\t): an ONSET reward for the onset k plus a NOCODA penalty for the coda t. The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths. Any formal language can be so generated.
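A toy evaluator illustrating "the grammar generates the representation that maximizes H". The candidates, violation counts, and numeric strengths are hypothetical; only the scheme of weighted, violable constraints summed into H comes from the slides.

```python
# Hypothetical constraint strengths (any positive numbers would do here).
strength = {'ONSET': 3.0, 'NOCODA': 2.0, 'PARSE': 4.0}

# Each candidate parse of an input maps constraint -> number of violations.
candidates = {
    '.kaet.':  {'ONSET': 0, 'NOCODA': 1, 'PARSE': 0},   # keeps the coda
    '.kae.':   {'ONSET': 0, 'NOCODA': 0, 'PARSE': 1},   # deletes /t/
    '.aet.':   {'ONSET': 1, 'NOCODA': 1, 'PARSE': 1},   # deletes /k/
}

def H(violations):
    # Each violation of constraint C lowers Harmony by strength[C].
    return -sum(strength[c] * n for c, n in violations.items())

winner = max(candidates, key=lambda cand: H(candidates[cand]))
print(winner, H(candidates[winner]))     # '.kaet.' wins with H = -2.0
```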
14
Harmonic Grammar Parser. A simple, comprehensible network for a simple grammar G: X → A B; Y → B A. (Figure: the language and its parsing, top-down and bottom-up.)
15
Harmonic Grammar Parser. Representations: tensor products (⊗) of filler vectors A, B, X, Y with role vectors r_ε = 1, r_0 = (1 1), r_1 = (1 –1). A tree with root i at depth 0 and children j, k at depth 1 (i, j, k ∊ {A, B, X, Y}) is the sum of its filler ⊗ role bindings. (Figure: the network's units, numbered 1-12.)
16
Harmonic Grammar Parser. Representations (figure).
17
Harmonic Grammar Parser. Weight matrix for Y → B A (figure): H(Y, B —) > 0, H(Y, — A) > 0.
18
Harmonic Grammar Parser Weight matrix for X → A B
19
Harmonic Grammar Parser Weight matrix for entire grammar G
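A sketch of how a rule's contribution to the weight matrix can be assembled from tensor-product patterns, in the spirit of the H(Y, B —) > 0 relations above. The role vectors r_ε = 1, r_0 = (1 1), r_1 = (1 –1) are from the slides; the filler vectors and the unit coefficient are invented, and the book's actual construction may differ.

```python
import numpy as np

# Fillers (hypothetical orthonormal vectors) and roles (from the slide),
# embedded in a 3-d role space: depth 0 = r_eps, depth 1 = span of r0, r1.
F = {s: v for s, v in zip('ABXY', np.eye(4))}
R = {'eps': np.array([1., 0., 0.]),
     '0':   np.array([0., 1., 1.]),
     '1':   np.array([0., 1., -1.])}

def bind(f, r):
    return np.kron(F[f], R[r])          # 12-unit pattern for one binding

def rule_matrix(parent, left, right):
    """Symmetric weights rewarding co-activity of the parent at the root
    with its children at the depth-1 positions (e.g. X -> A B)."""
    p, l, r = bind(parent, 'eps'), bind(left, '0'), bind(right, '1')
    W = np.outer(p, l) + np.outer(p, r)
    return W + W.T

W_G = rule_matrix('X', 'A', 'B') + rule_matrix('Y', 'B', 'A')   # whole grammar G

def harmony(a):
    return 0.5 * a @ W_G @ a

good = bind('X', 'eps') + bind('A', '0') + bind('B', '1')   # legal parse [X [A B]]
bad  = bind('X', 'eps') + bind('B', '0') + bind('A', '1')   # illegal  [X [B A]]
print(harmony(good), harmony(bad))      # the legal parse has higher Harmony
```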
20
Bottom-up Parsing
21
Top-down Parsing
22
Explaining Productivity. Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence. How to explain it?
23
1. Structured representations
24
+ 2. Structured connections
25
= Proof of Productivity Productive behavior follows mathematically from combining the combinatorial structure of the vectorial representations encoding inputs & outputs and the combinatorial structure of the weight matrices encoding knowledge
26
Mystery Quote #2 “Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. … This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it.”
27
Explaining Productivity I. Intra-level decomposition: [A B] → {A, B}. Inter-level decomposition: [A B] → (1, 0, 1, …, 1). (Table: which decomposition carries the Semantics and which the Processes, in GOFAI, ICS, or both.)
28
Explaining Productivity II. Intra-level decomposition: G → {X → A B, Y → B A}. Inter-level decomposition: [A B] → (1, 0, 1, …, 1). (Table: Semantics and Processes in GOFAI, ICS, or both.)
29
Mystery Quote #3 “ … even after all those pages, Smolensky hasn’t so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena.”
30
Constraint Interaction II: OT. ICS → Grammatical theory: Optimality Theory (Prince & Smolensky 1993).
31
Constraint Interaction II: OT. Differential strength is encoded in strict domination hierarchies: every constraint has complete priority over all lower-ranked constraints (combined). This is the 'Take-the-best' heuristic (Hertwig, today): constraint ~ cue, ranking ~ cue validity. A decision-theoretic justification for OT? Approximate numerical encoding employs special (exponentially growing) weights.
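A sketch of the "exponentially growing weights" remark: if each constraint's weight exceeds the combined effect of everything ranked below it, numerical Harmony comparison reproduces strict domination. The ranking and violation counts below are invented for illustration.

```python
ranking = ['ONSET', 'NOCODA', 'FAITH']          # hypothetical ranking, highest first
base = 10                                        # larger than any violation count
weight = {c: base ** (len(ranking) - k) for k, c in enumerate(ranking)}
# -> ONSET: 1000, NOCODA: 100, FAITH: 10

candidates = {
    'cand1': {'ONSET': 0, 'NOCODA': 1, 'FAITH': 0},
    'cand2': {'ONSET': 1, 'NOCODA': 0, 'FAITH': 0},
    'cand3': {'ONSET': 0, 'NOCODA': 0, 'FAITH': 3},
}

def H(viols):
    return -sum(weight[c] * viols[c] for c in ranking)

# Numeric optimization picks the same winner as strict domination would:
# three low-ranked FAITH violations still beat one higher-ranked violation.
print(max(candidates, key=lambda c: H(candidates[c])))      # -> cand3
```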
32
Constraint Interaction II: OT. "Grammars can't count": stress is on the initial heavy syllable iff the number of light syllables n obeys …? No way, man.
33
Constraint Interaction II: OT Constraints are universal Human grammars differ only in how these constraints are ranked ‘factorial typology’ First true contender for a formal theory of cross-linguistic typology
34
The Faithfulness / Markedness Dialectic. 'cat': /kat/ → [kæt] violates NOCODA; why? FAITHFULNESS requires identity; MARKEDNESS often opposes it. The Markedness/Faithfulness dialectic yields diversity: English: FAITH ≫ NOCODA; Polynesian: NOCODA ≫ FAITH (~French). Another markedness constraint M, Nasal Place Agreement ['Assimilation'] (NPA): mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ ng, mg (velar).
35
Nativism I: Learnability. Learning algorithm: provably correct and efficient (under strong assumptions). Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000. If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E.
36
Constraint Demotion Learning. If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E. Example, input /in+possible/ with initial ranking Faith ≫ Mark (NPA):
E = i[np]ossible: violates Mark (NPA); the wrongly expected winner (☹ ☞).
A = i[mp]ossible: violates Faith; the heard form, which wins once Faith is demoted below Mark (☺ ☞).
Correctly handles a difficult case: multiple violations in E.
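A much-simplified sketch of the demotion step just stated; real Constraint Demotion (Tesar & Smolensky) operates on stratified hierarchies and cancels shared marks, so this total-order version is only illustrative.

```python
def demote(ranking, viols_A, viols_E):
    """ranking: constraints, highest-ranked first.
       viols_A / viols_E: constraints violated by the heard form A and by the
       (wrongly) expected form E.  Demote each constraint violated by A until
       it sits below some constraint violated by E."""
    pivot = min(ranking.index(c) for c in viols_E)       # highest E-violated constraint
    for c in sorted(viols_A, key=ranking.index):
        if ranking.index(c) <= pivot:
            ranking.remove(c)
            ranking.insert(min(ranking.index(x) for x in viols_E) + 1, c)
    return ranking

# The slide's example: learner's initial ranking Faith >> Mark(NPA);
# it therefore expects E = i[np]ossible, but hears A = i[mp]ossible.
print(demote(['Faith', 'Mark(NPA)'], viols_A={'Faith'}, viols_E={'Mark(NPA)'}))
# -> ['Mark(NPA)', 'Faith']
```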
37
Nativism I: Learnability. M ≫ F is learnable with /in+possible/ → impossible: 'not' = in- except when followed by …; the "exception that proves the rule", M = NPA. M ≫ F is not learnable from data if there are no 'exceptions' (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: then √M and √F, no M vs. F conflict, and no evidence for their ranking. Thus M ≫ F must hold in the initial state, ℌ_0.
38
Nativism II: Experimental Test. Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time. More harmonic: √M ≻ *M when equal on F; √F ≻ *F when equal on M. When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F. M = Nasal Place Assimilation (NPA). Collaborators: Peter Jusczyk, Theresa Allocco (Elliott Moreton, Karen Arnold).
39
4.5 Months (NPA). Higher Harmony: um…ber…umber; Lower Harmony: um…ber…iŋgu. p = .006 (11/16).
40
4.5 Months (NPA). Higher Harmony: um…ber…umber; Lower Harmony: un…ber…unber. p = .044 (11/16).
41
Markedness vs. Faithfulness: un…ber…umber vs. un…ber…unber: which is higher-Harmony? ???
42
4.5 Months (NPA). Higher Harmony: un…ber…umber; Lower Harmony: un…ber…unber. p = .001 (12/16).
43
Nativism III: UGenome. Can we combine the connectionist realization of harmonic grammar with OT's characterization of UG, to examine the biological plausibility of UG as innate knowledge? Collaborators: Melanie Soderstrom, Donald Mathis.
44
Nativism III: UGenome The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
45
The Problem No concrete examples of such a LAD exist Even highly simplified cases pose a hard problem: How can genes — which regulate production of proteins — encode symbolic principles of grammar? Test preparation: Syllable Theory
46
Basic syllabification: Function ƒ: /underlying form/ → [surface form]. Plural form of dish: /dɪš+s/ → [.dɪ.šəz.]; schematically /CVCC/ → [.CV.CVC.] (with an epenthetic V).
Basic CV Syllable Structure Theory (Prince & Smolensky 1993: Chapter 6). 'Basic': no more than one segment per syllable position: .(C)V(C).
Correspondence Theory (McCarthy & Prince 1995, 'M&P'): /C1 V2 C3 C4/ → [.C1V2.C3VC4].
49
Syllabification: Constraints (Con)
PARSE: Every element in the input corresponds to an element in the output — "no deletion" [M&P: 'MAX']
FILL V/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] — "no insertion/epenthesis" [M&P: 'DEP']
ONSET: No V without a preceding C
NOCODA: No C without a following V
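To make the constraint set concrete, here is a toy evaluation of /CVCC/ using exponential weights for strict domination. The ranking PARSE ≫ ONSET ≫ FILL ≫ NOCODA and the small candidate set are hypothetical, chosen only so that the epenthesis output [.CV.CVC.] from the earlier slide wins; actual rankings vary by language.

```python
ranking = ['PARSE', 'ONSET', 'FILL', 'NOCODA']          # hypothetical ranking
weight = {c: 10 ** (len(ranking) - k) for k, c in enumerate(ranking)}

# A few candidate parses of /C1 V2 C3 C4/, with violation counts.
candidates = {
    '.C1V2.C3VC4.':   {'PARSE': 0, 'ONSET': 0, 'FILL': 1, 'NOCODA': 1},  # one epenthetic V
    '.C1V2.C3V.C4V.': {'PARSE': 0, 'ONSET': 0, 'FILL': 2, 'NOCODA': 0},  # two epenthetic Vs
    '.C1V2C3.':       {'PARSE': 1, 'ONSET': 0, 'FILL': 0, 'NOCODA': 1},  # delete C4
    '.C1V2.':         {'PARSE': 2, 'ONSET': 0, 'FILL': 0, 'NOCODA': 0},  # delete C3, C4
}

def H(viols):
    return -sum(weight[c] * viols[c] for c in ranking)

print(max(candidates, key=lambda c: H(candidates[c])))   # -> '.C1V2.C3VC4.'
```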
53
SA net architecture: /C1 C2/ → [C1 V C2]. (Figure: layers of C and V units.)
54
Connection substructure. Local part (fixed, genetically determined): the content of constraint i, its pattern of connection coefficients. Global part (variable during learning): the strength s_i of constraint i (s_1, s_2, …). Network weight: each constraint contributes its coefficient times s_i to W_ΦΨ. Network input: ι_Φ = W_ΦΨ a_Ψ.
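A sketch of the decomposition just described: fixed per-constraint coefficient matrices (the local "content"), scaled by learned strengths (the global part). The matrices and numbers below are placeholders, not the model's actual coefficients.

```python
import numpy as np

# Hypothetical fixed coefficient matrices ("content" of each constraint) ...
W_parse = np.array([[0.,  2.], [ 2., 0.]])
W_onset = np.array([[0., -1.], [-1., 0.]])
# ... and learned strengths ("global", variable during learning).
s = {'PARSE': 1.5, 'ONSET': 0.8}

W = s['PARSE'] * W_parse + s['ONSET'] * W_onset   # total network weights
a = np.array([1.0, 0.3])                          # activations of the Psi units
iota = W @ a                                      # net input to the Phi units
print(iota)
```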
55
PARSE (figure: weight-matrix diagram over the C and V units). All connection coefficients are +2.
56
ONSET (figure: weight-matrix diagram over the C and V units). All connection coefficients are –1.
57
Activation dynamics: Boltzmann Machine / Harmony Theory dynamics (temperature T → 0).
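A generic sketch of Boltzmann-machine / Harmony-theory settling with annealing (T → 0). The slide names the dynamics but gives no code, so the update rule below is the standard stochastic one rather than this particular model's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def settle(W, iota, T0=1.0, cooling=0.995, steps=4000):
    """Stochastic binary-unit dynamics: unit k adopts value 1 with probability
    sigmoid(dH / T), where dH is the Harmony gain; T is annealed toward 0."""
    a = rng.integers(0, 2, len(iota)).astype(float)
    T = T0
    for _ in range(steps):
        k = rng.integers(len(iota))
        dH = iota[k] + W[k] @ a - W[k, k] * a[k]       # H(a_k=1) - H(a_k=0)
        p = 1.0 / (1.0 + np.exp(-np.clip(dH / T, -50, 50)))
        a[k] = float(rng.random() < p)
        T *= cooling                                    # temperature -> 0
    return a

# On the earlier two-unit example this usually settles in the higher-Harmony
# binary state [1, 0] rather than the local optimum [0, 1].
print(settle(np.array([[0.0, -0.9], [-0.9, 0.0]]), np.array([0.6, 0.5])))
```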
58
Boltzmann-type learning dynamics. Clamped phases: P+ = input & output clamped; P– = input only.
Δs_i = ε [ E{H_i | P+} – E{H_i | P–} ]
E{H_i | P} is estimated as follows: during the processing of training data in phase P+, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify s_i by +ε′ (by –ε′ in phase P–) [ε′ = ε / N_P]. This is gradient descent in …
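A sketch of the two-phase strength update written above. Estimating E{H_i | P} by counting co-activity of the unit pairs a constraint connects is the bookkeeping the slide describes, but the data structures here are invented for illustration.

```python
def update_strengths(s, coact_plus, coact_minus, eps=0.01):
    """s: constraint strengths s_i.
       coact_plus[i]:  estimate of E{H_i} with input and output clamped (P+).
       coact_minus[i]: estimate of E{H_i} with only the input clamped   (P-).
       Implements  delta s_i = eps * ( E{H_i|P+} - E{H_i|P-} )."""
    for i in s:
        s[i] += eps * (coact_plus[i] - coact_minus[i])
    return s

print(update_strengths({'ONSET': 1.0}, {'ONSET': 0.8}, {'ONSET': 0.3}))
# -> {'ONSET': 1.005}
```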
59
Crucial Open Question (Truth in Advertising) Relation between strict domination and neural networks? Apparently not a problem in the case of the CV Theory
60
To be encoded:
How many different kinds of units are there?
What information is necessary (from the source unit's point of view) to identify the location of a target unit, and the strength of the connection with it?
How are constraints initially specified? How are they maintained through the learning process?
61
Unit types: input units (C, V); output units (C, V, x); correspondence units (C, V): 7 distinct unit types, each represented in a distinct sub-region of the abstract genome. We 'help ourselves' to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated.
62
Connectivity geometry: assume a 3-d grid geometry. (Figure: C and V cells on the grid, with axes labelled 'E', 'N', 'back'.)
63
Constraint: PARSE (figure: weight-matrix diagram). Input units grow south and connect; output units grow east and connect; correspondence units grow north & west and connect with input & output units.
64
Constraint: ONSET. Short connections grow north-south between adjacent V output units, and between the first V node and the first x node. (Figure: C and V output units.)
65
Direction of projection growth. Topographic organizations are widely attested throughout neural structures; activity-dependent growth is a possible alternative. Orientation information (axes): chemical gradients during development; cell age is a possible alternative.
66
Projection parameters: direction; extent (local vs. non-local); target unit type. Strength of connections is encoded separately.
67
Connectivity Genome. Contributions from ONSET and PARSE, by source unit type:
C_I: S L C_C
V_I: S L V_C
C_O: E L C_C
V_O: E L V_C; N&S S V_O; N S x_0
C_C: N L C_I; W L C_O
V_C: N L V_I; W L V_O
x_0: S S V_O
Key (Direction | Extent | Target): Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Target = Input: C_I, V_I; Output: C_O, V_O, x (0); Corr: C_C, V_C.
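To illustrate the three-field projection code in this table (direction, extent, target), here is a toy decoder; representing a "gene" as a Python tuple is of course an invention for illustration.

```python
DIRECTIONS = {'N', 'S', 'E', 'W', 'F', 'B', 'N&S'}
EXTENTS = {'L': 'long', 'S': 'short'}
TARGETS = {'C_I', 'V_I', 'C_O', 'V_O', 'x_0', 'C_C', 'V_C'}

def decode_projection(gene):
    """gene = (direction, extent, target), e.g. ('S', 'L', 'C_C'):
    'grow a long-range projection to the south and connect to C-correspondence units'."""
    d, e, t = gene
    assert d in DIRECTIONS and e in EXTENTS and t in TARGETS
    return {'direction': d, 'extent': EXTENTS[e], 'target': t}

# The C_I entry from the table above:
print(decode_projection(('S', 'L', 'C_C')))
```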
68
ONSET contributions: x_0 segment: S S V_O; V_O segment: N S x_0, N&S S V_O. (Figure: C and V output units.)
69
Encoding connection strength. For each constraint i we need to 'embody' (i) the constraint strength s_i and (ii) the connection coefficients (one per Φ → Ψ pair of cell types). The product of these is the contribution of constraint i to the Φ → Ψ connection weight. Network-level specification: …
70
Φ → Ψ Processing (figure): [P_1] ∝ s_1.
71
Φ → Ψ Development (figure).
72
Φ → Ψ Learning (figure): during phase P+; reverse during P–.
73
Learning Behavior. The simplified system can be solved analytically. The learning algorithm turns out to be, approximately, Δs_i ≈ ε [ (# violations of constraint i in phase P–) – (# violations in phase P+) ].
74
Abstract Gene Map (figure): regions for General Developmental Machinery, Connectivity (genes giving direction, extent, target, e.g. C-I: S L C_C; V-I: S L V_C), and Constraint Coefficients (genes for the CORRESPOND and RESPOND constraints).
75
Summary. Described an attempt to integrate: a connectionist theory of mental processes (computational neuroscience, cognitive psychology); a symbolic theory of mental functions (philosophy, linguistics); and representations, in both their general structure (philosophy, AI) and their specific structure (linguistics). This informs the theory of UG: its form and content, and its genetic encoding.
76
Mystery Quote #4 “Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any.”
77
Mystery Quote #5 “The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. … The goal … should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non ‑ symbolic computation; to enrich conceptual ‑ level research with new computational concepts and techniques that reflect an understanding of how conceptual ‑ level theoretical constructs emerge from subconceptual computation…”
79
Thanks for your attention