1
The Harmonic Mind. Paul Smolensky, Cognitive Science Department, Johns Hopkins University. With: Géraldine Legendre, Alan Prince, Peter Jusczyk, Donald Mathis, Melanie Soderstrom, and a Mystery 'Co'-laborator.
2
Personal Firsts thanks to SPP First invited talk! (& first visit to JHU, 1986) First public confessional: midnight thoughts of a worried connectionist (UNC, 1988) First generative syntax talk (Memphis, 1994) First attempt at stand-up comedy (Columbia, 2000) First rendition of a 900-page book as a graphical synopsis in Powerpoint (1 minute from now)
3
Advertisement: The Harmonic Mind: From neural computation to optimality-theoretic grammar, Paul Smolensky & Géraldine Legendre, Blackwell 2002 (??). Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture; a case study in formalist multidisciplinary cognitive science.
4
Talk Plan. 'Sketch' the ICS cognitive architecture, pointing to contributions from/to traditional disciplines. Topics of direct philosophical relevance: explanation of the productivity of cognition; nativism. Theoretical work: symbolic; connectionist. Experimental work.
5
Mystery Quote #1 “Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story … 185 pages … are devoted to Smolensky’s telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed.”
6
Processing I: Activation. Computational neuroscience → ICS. Key sources: Hopfield 1982, 1984; Cohen & Grossberg 1983; Hinton & Sejnowski 1983, 1986; Smolensky 1983, 1986; Geman & Geman 1984; Golden 1986, 1988. Processing (spreading activation) is optimization: Harmony maximization. (Figure: two units a1, a2 with external inputs i1 = 0.6, i2 = 0.5 and an inhibitory connection –λ = –0.9.)
7
Processing II: Optimization. Cognitive psychology → ICS. Key sources: Hinton & Anderson 1981; Rumelhart, McClelland, & the PDP Group 1986. Harmony maximization is satisfaction of parallel, violable constraints: a1 must be active (strength 0.6); a2 must be active (strength 0.5); a1 and a2 must not be simultaneously active (strength λ = 0.9). Optimal compromise: a1 = 0.79, a2 = –0.21.
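The "optimal compromise" above can be checked with a small numerical sketch. The slide gives only the two inputs, the inhibitory weight, and the solution; the quadratic self-term in the Harmony function below is an assumption chosen so that its maximum reproduces the slide's numbers.

```python
import numpy as np

# Two-unit network from the slide: external inputs i1 = 0.6, i2 = 0.5,
# mutual inhibition -lambda = -0.9.
i = np.array([0.6, 0.5])
W = np.array([[0.0, -0.9],
              [-0.9, 0.0]])

def harmony(a):
    # Assumed form: H(a) = i.a + (1/2) a.W.a - (1/2) ||a||^2
    return i @ a + 0.5 * a @ W @ a - 0.5 * a @ a

# "Spreading activation is optimization": gradient ascent on H.
a = np.zeros(2)
for _ in range(2000):
    a += 0.05 * (i + W @ a - a)      # gradient of H

print(np.round(a, 2))                # [ 0.79 -0.21], the optimal compromise
```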
8
Representation. Symbolic theory → ICS: complex symbol structures. Generative linguistics → ICS: particular linguistic representations. PDP connectionism → ICS: distributed activation patterns. ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units ('tensor product representations' etc.).
9
Representation. The structure [σ k [æ t]] (the syllable σ with onset k and rime [æ t]) is encoded by the filler/role bindings k/r_0, æ/r_01, t/r_11, σ/r_ε. (Figure: the corresponding tree.)
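A minimal sketch of tensor product binding and unbinding for a structure like this one. The particular filler and role vectors (and the omission of the σ/r_ε binding) are illustrative choices, not taken from the slide; the theory only requires suitably independent role vectors.

```python
import numpy as np

# Toy filler vectors for k, ae, t and role vectors for the tree positions
# r0 (onset), r01, r11 (the two rime positions).  Orthonormal choices here.
F = {'k':  np.array([1., 0., 0.]),
     'ae': np.array([0., 1., 0.]),
     't':  np.array([0., 0., 1.])}
R = {'r0':  np.array([1., 0., 0.]),
     'r01': np.array([0., 1., 0.]),
     'r11': np.array([0., 0., 1.])}

# The pattern realizing [k [ae t]] is the sum of filler (x) role bindings.
s = sum(np.outer(F[f], R[r]) for f, r in [('k', 'r0'), ('ae', 'r01'), ('t', 'r11')])

# Unbinding: with orthonormal roles, multiplying by a role vector recovers
# the filler bound to that role.
print(s @ R['r01'])        # -> [0. 1. 0.], the vector for 'ae'
```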
10
Constraints. Linguistics (markedness theory) → ICS; ICS → Generative linguistics: Optimality Theory. Key sources: Prince & Smolensky 1993 [ms.; Rutgers report]; McCarthy & Prince 1993 [ms.]. Texts: Archangeli & Langendoen 1997; Kager 1999; McCarthy 2001. Electronic archive: rutgers/ruccs/roa.html. (Met in SPP Debate, 1988!)
11
Constraints. NOCODA: A syllable has no coda. The parse [σ k [æ t]] has a coda (t): one violation (*). Realized in the network by a weight matrix W such that H(a_[σ k [æ t]]) = –s_NOCODA < 0.
12
Constraint Interaction I. ICS → Grammatical theory: Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.).
13
Constraint Interaction I. For the parse of 'cat' as σ with onset k and coda t, H = H(k/σ) + H(σ\t): an ONSET reward for the onset k plus a NOCODA penalty for the coda t. The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths. Any formal language can be so generated.
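A toy evaluator illustrating "the grammar generates the representation that maximizes H". The candidates, violation counts, and numeric strengths are hypothetical; only the scheme of weighted, violable constraints summed into H comes from the slides.

```python
# Hypothetical constraint strengths (any positive numbers would do here).
strength = {'ONSET': 3.0, 'NOCODA': 2.0, 'PARSE': 4.0}

# Each candidate parse of an input maps constraint -> number of violations.
candidates = {
    '.kaet.':  {'ONSET': 0, 'NOCODA': 1, 'PARSE': 0},   # keeps the coda
    '.kae.':   {'ONSET': 0, 'NOCODA': 0, 'PARSE': 1},   # deletes /t/
    '.aet.':   {'ONSET': 1, 'NOCODA': 1, 'PARSE': 1},   # deletes /k/
}

def H(violations):
    # Each violation of constraint C lowers Harmony by strength[C].
    return -sum(strength[c] * n for c, n in violations.items())

winner = max(candidates, key=lambda cand: H(candidates[cand]))
print(winner, H(candidates[winner]))     # '.kaet.' wins with H = -2.0
```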
14
Harmonic Grammar Parser. A simple, comprehensible network for a simple grammar G: X → A B; Y → B A. (Figure: the language and its parsing, top-down and bottom-up.)
15
Harmonic Grammar Parser. Representations: tensor products (⊗) of filler vectors A, B, X, Y with role vectors r_ε = 1, r_0 = (1 1), r_1 = (1 –1). A tree with root i at depth 0 and children j, k at depth 1 (i, j, k ∊ {A, B, X, Y}) is the sum of its filler ⊗ role bindings. (Figure: the network's units, numbered 1-12.)
16
Harmonic Grammar Parser. Representations (figure).
17
Harmonic Grammar Parser. Weight matrix for Y → B A (figure): H(Y, B —) > 0, H(Y, — A) > 0.
18
Harmonic Grammar Parser Weight matrix for X → A B
19
Harmonic Grammar Parser Weight matrix for entire grammar G
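A sketch of how a rule's contribution to the weight matrix can be assembled from tensor-product patterns, in the spirit of the H(Y, B —) > 0 relations above. The role vectors r_ε = 1, r_0 = (1 1), r_1 = (1 –1) are from the slides; the filler vectors and the unit coefficient are invented, and the book's actual construction may differ.

```python
import numpy as np

# Fillers (hypothetical orthonormal vectors) and roles (from the slide),
# embedded in a 3-d role space: depth 0 = r_eps, depth 1 = span of r0, r1.
F = {s: v for s, v in zip('ABXY', np.eye(4))}
R = {'eps': np.array([1., 0., 0.]),
     '0':   np.array([0., 1., 1.]),
     '1':   np.array([0., 1., -1.])}

def bind(f, r):
    return np.kron(F[f], R[r])          # 12-unit pattern for one binding

def rule_matrix(parent, left, right):
    """Symmetric weights rewarding co-activity of the parent at the root
    with its children at the depth-1 positions (e.g. X -> A B)."""
    p, l, r = bind(parent, 'eps'), bind(left, '0'), bind(right, '1')
    W = np.outer(p, l) + np.outer(p, r)
    return W + W.T

W_G = rule_matrix('X', 'A', 'B') + rule_matrix('Y', 'B', 'A')   # whole grammar G

def harmony(a):
    return 0.5 * a @ W_G @ a

good = bind('X', 'eps') + bind('A', '0') + bind('B', '1')   # legal parse [X [A B]]
bad  = bind('X', 'eps') + bind('B', '0') + bind('A', '1')   # illegal  [X [B A]]
print(harmony(good), harmony(bad))      # the legal parse has higher Harmony
```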
20
Bottom-up Parsing
21
Top-down Parsing
22
Explaining Productivity. Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence. How to explain it?
23
1. Structured representations
24
+ 2. Structured connections
25
= Proof of Productivity Productive behavior follows mathematically from combining the combinatorial structure of the vectorial representations encoding inputs & outputs and the combinatorial structure of the weight matrices encoding knowledge
26
Mystery Quote #2 “Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. … This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it.”
27
Explaining Productivity I. Intra-level decomposition: [A B] → {A, B}. Inter-level decomposition: [A B] → (1, 0, 1, …, 1). (Table: which decomposition carries the Semantics and which the Processes, in GOFAI, ICS, or both.)
28
Explaining Productivity II. Intra-level decomposition: G → {X → A B, Y → B A}. Inter-level decomposition: [A B] → (1, 0, 1, …, 1). (Table: Semantics and Processes in GOFAI, ICS, or both.)
29
Mystery Quote #3 “ … even after all those pages, Smolensky hasn’t so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena.”
30
Constraint Interaction II: OT. ICS → Grammatical theory: Optimality Theory (Prince & Smolensky 1993).
31
Constraint Interaction II: OT. Differential strength is encoded in strict domination hierarchies: every constraint has complete priority over all lower-ranked constraints (combined). This is the 'Take-the-best' heuristic (Hertwig, today): constraint ~ cue, ranking ~ cue validity. A decision-theoretic justification for OT? Approximate numerical encoding employs special (exponentially growing) weights.
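A sketch of the "exponentially growing weights" remark: if each constraint's weight exceeds the combined effect of everything ranked below it, numerical Harmony comparison reproduces strict domination. The ranking and violation counts below are invented for illustration.

```python
ranking = ['ONSET', 'NOCODA', 'FAITH']          # hypothetical ranking, highest first
base = 10                                        # larger than any violation count
weight = {c: base ** (len(ranking) - k) for k, c in enumerate(ranking)}
# -> ONSET: 1000, NOCODA: 100, FAITH: 10

candidates = {
    'cand1': {'ONSET': 0, 'NOCODA': 1, 'FAITH': 0},
    'cand2': {'ONSET': 1, 'NOCODA': 0, 'FAITH': 0},
    'cand3': {'ONSET': 0, 'NOCODA': 0, 'FAITH': 3},
}

def H(viols):
    return -sum(weight[c] * viols[c] for c in ranking)

# Numeric optimization picks the same winner as strict domination would:
# three low-ranked FAITH violations still beat one higher-ranked violation.
print(max(candidates, key=lambda c: H(candidates[c])))      # -> cand3
```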
32
Constraint Interaction II: OT. "Grammars can't count": stress is on the initial heavy syllable iff the number of light syllables n obeys …? No way, man.
33
Constraint Interaction II: OT Constraints are universal Human grammars differ only in how these constraints are ranked ‘factorial typology’ First true contender for a formal theory of cross-linguistic typology
34
The Faithfulness / Markedness Dialectic. 'cat': /kat/ → [kæt] violates NOCODA; why? FAITHFULNESS requires identity; MARKEDNESS often opposes it. The Markedness/Faithfulness dialectic yields diversity: English: FAITH ≫ NOCODA; Polynesian: NOCODA ≫ FAITH (~French). Another markedness constraint M, Nasal Place Agreement ['Assimilation'] (NPA): mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ ng, mg (velar).
35
Nativism I: Learnability. Learning algorithm: provably correct and efficient (under strong assumptions). Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000. If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E.
36
Constraint Demotion Learning. If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E. Example, input /in+possible/ with initial ranking Faith ≫ Mark (NPA):
E = i[np]ossible: violates Mark (NPA); the wrongly expected winner (☹ ☞).
A = i[mp]ossible: violates Faith; the heard form, which wins once Faith is demoted below Mark (☺ ☞).
Correctly handles a difficult case: multiple violations in E.
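A much-simplified sketch of the demotion step just stated; real Constraint Demotion (Tesar & Smolensky) operates on stratified hierarchies and cancels shared marks, so this total-order version is only illustrative.

```python
def demote(ranking, viols_A, viols_E):
    """ranking: constraints, highest-ranked first.
       viols_A / viols_E: constraints violated by the heard form A and by the
       (wrongly) expected form E.  Demote each constraint violated by A until
       it sits below some constraint violated by E."""
    pivot = min(ranking.index(c) for c in viols_E)       # highest E-violated constraint
    for c in sorted(viols_A, key=ranking.index):
        if ranking.index(c) <= pivot:
            ranking.remove(c)
            ranking.insert(min(ranking.index(x) for x in viols_E) + 1, c)
    return ranking

# The slide's example: learner's initial ranking Faith >> Mark(NPA);
# it therefore expects E = i[np]ossible, but hears A = i[mp]ossible.
print(demote(['Faith', 'Mark(NPA)'], viols_A={'Faith'}, viols_E={'Mark(NPA)'}))
# -> ['Mark(NPA)', 'Faith']
```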
37
Nativism I: Learnability. M ≫ F is learnable with /in+possible/ → impossible: 'not' = in- except when followed by …; the "exception that proves the rule", M = NPA. M ≫ F is not learnable from data if there are no 'exceptions' (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: then √M and √F, no M vs. F conflict, and no evidence for their ranking. Thus M ≫ F must hold in the initial state, ℌ_0.
38
Nativism II: Experimental Test. Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time. More harmonic: √M ≻ *M when equal on F; √F ≻ *F when equal on M. When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F. M = Nasal Place Assimilation (NPA). Collaborators: Peter Jusczyk, Theresa Allocco (Elliott Moreton, Karen Arnold).
39
4.5 Months (NPA). Higher Harmony: um…ber…umber; Lower Harmony: um…ber…iŋgu. p = .006 (11/16).
40
4.5 Months (NPA). Higher Harmony: um…ber…umber; Lower Harmony: un…ber…unber. p = .044 (11/16).
41
Markedness vs. Faithfulness: un…ber…umber vs. un…ber…unber: which is higher-Harmony? ???
42
4.5 Months (NPA). Higher Harmony: un…ber…umber; Lower Harmony: un…ber…unber. p = .001 (12/16).
43
Nativism III: UGenome. Can we combine the connectionist realization of harmonic grammar with OT's characterization of UG, to examine the biological plausibility of UG as innate knowledge? Collaborators: Melanie Soderstrom, Donald Mathis.
44
Nativism III: UGenome The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
45
The Problem No concrete examples of such a LAD exist Even highly simplified cases pose a hard problem: How can genes — which regulate production of proteins — encode symbolic principles of grammar? Test preparation: Syllable Theory
46
Basic syllabification: Function ƒ: /underlying form/ → [surface form]. Plural form of dish: /dɪš+s/ → [.dɪ.šəz.]; schematically /CVCC/ → [.CV.CVC.] (with an epenthetic V).
Basic CV Syllable Structure Theory (Prince & Smolensky 1993: Chapter 6). 'Basic': no more than one segment per syllable position: .(C)V(C).
Correspondence Theory (McCarthy & Prince 1995, 'M&P'): /C1 V2 C3 C4/ → [.C1V2.C3VC4].
49
Syllabification: Constraints (Con)
PARSE: Every element in the input corresponds to an element in the output — "no deletion" [M&P: 'MAX']
FILL V/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] — "no insertion/epenthesis" [M&P: 'DEP']
ONSET: No V without a preceding C
NOCODA: No C without a following V
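To make the constraint set concrete, here is a toy evaluation of /CVCC/ using exponential weights for strict domination. The ranking PARSE ≫ ONSET ≫ FILL ≫ NOCODA and the small candidate set are hypothetical, chosen only so that the epenthesis output [.CV.CVC.] from the earlier slide wins; actual rankings vary by language.

```python
ranking = ['PARSE', 'ONSET', 'FILL', 'NOCODA']          # hypothetical ranking
weight = {c: 10 ** (len(ranking) - k) for k, c in enumerate(ranking)}

# A few candidate parses of /C1 V2 C3 C4/, with violation counts.
candidates = {
    '.C1V2.C3VC4.':   {'PARSE': 0, 'ONSET': 0, 'FILL': 1, 'NOCODA': 1},  # one epenthetic V
    '.C1V2.C3V.C4V.': {'PARSE': 0, 'ONSET': 0, 'FILL': 2, 'NOCODA': 0},  # two epenthetic Vs
    '.C1V2C3.':       {'PARSE': 1, 'ONSET': 0, 'FILL': 0, 'NOCODA': 1},  # delete C4
    '.C1V2.':         {'PARSE': 2, 'ONSET': 0, 'FILL': 0, 'NOCODA': 0},  # delete C3, C4
}

def H(viols):
    return -sum(weight[c] * viols[c] for c in ranking)

print(max(candidates, key=lambda c: H(candidates[c])))   # -> '.C1V2.C3VC4.'
```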
53
SA net architecture: /C1 C2/ → [C1 V C2]. (Figure: layers of C and V units.)
54
Connection substructure. Local part (fixed, genetically determined): the content of constraint i, its pattern of connection coefficients. Global part (variable during learning): the strength s_i of constraint i (s_1, s_2, …). Network weight: each constraint contributes its coefficient times s_i to W_ΦΨ. Network input: ι_Φ = W_ΦΨ a_Ψ.
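A sketch of the decomposition just described: fixed per-constraint coefficient matrices (the local "content"), scaled by learned strengths (the global part). The matrices and numbers below are placeholders, not the model's actual coefficients.

```python
import numpy as np

# Hypothetical fixed coefficient matrices ("content" of each constraint) ...
W_parse = np.array([[0.,  2.], [ 2., 0.]])
W_onset = np.array([[0., -1.], [-1., 0.]])
# ... and learned strengths ("global", variable during learning).
s = {'PARSE': 1.5, 'ONSET': 0.8}

W = s['PARSE'] * W_parse + s['ONSET'] * W_onset   # total network weights
a = np.array([1.0, 0.3])                          # activations of the Psi units
iota = W @ a                                      # net input to the Phi units
print(iota)
```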
55
PARSE (figure: weight-matrix diagram over the C and V units). All connection coefficients are +2.
56
ONSET (figure: weight-matrix diagram over the C and V units). All connection coefficients are –1.
57
Activation dynamics: Boltzmann Machine / Harmony Theory dynamics (temperature T → 0).
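A generic sketch of Boltzmann-machine / Harmony-theory settling with annealing (T → 0). The slide names the dynamics but gives no code, so the update rule below is the standard stochastic one rather than this particular model's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def settle(W, iota, T0=1.0, cooling=0.995, steps=4000):
    """Stochastic binary-unit dynamics: unit k adopts value 1 with probability
    sigmoid(dH / T), where dH is the Harmony gain; T is annealed toward 0."""
    a = rng.integers(0, 2, len(iota)).astype(float)
    T = T0
    for _ in range(steps):
        k = rng.integers(len(iota))
        dH = iota[k] + W[k] @ a - W[k, k] * a[k]       # H(a_k=1) - H(a_k=0)
        p = 1.0 / (1.0 + np.exp(-np.clip(dH / T, -50, 50)))
        a[k] = float(rng.random() < p)
        T *= cooling                                    # temperature -> 0
    return a

# On the earlier two-unit example this usually settles in the higher-Harmony
# binary state [1, 0] rather than the local optimum [0, 1].
print(settle(np.array([[0.0, -0.9], [-0.9, 0.0]]), np.array([0.6, 0.5])))
```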
58
Boltzmann-type learning dynamics. Clamped phases: P+ = input & output clamped; P– = input only.
Δs_i = ε [ E{H_i | P+} – E{H_i | P–} ]
E{H_i | P} is estimated as follows: during the processing of training data in phase P+, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify s_i by +ε′ (by –ε′ in phase P–) [ε′ = ε / N_P]. This is gradient descent in …
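A sketch of the two-phase strength update written above. Estimating E{H_i | P} by counting co-activity of the unit pairs a constraint connects is the bookkeeping the slide describes, but the data structures here are invented for illustration.

```python
def update_strengths(s, coact_plus, coact_minus, eps=0.01):
    """s: constraint strengths s_i.
       coact_plus[i]:  estimate of E{H_i} with input and output clamped (P+).
       coact_minus[i]: estimate of E{H_i} with only the input clamped   (P-).
       Implements  delta s_i = eps * ( E{H_i|P+} - E{H_i|P-} )."""
    for i in s:
        s[i] += eps * (coact_plus[i] - coact_minus[i])
    return s

print(update_strengths({'ONSET': 1.0}, {'ONSET': 0.8}, {'ONSET': 0.3}))
# -> {'ONSET': 1.005}
```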
59
Crucial Open Question (Truth in Advertising) Relation between strict domination and neural networks? Apparently not a problem in the case of the CV Theory
60
To be encoded:
How many different kinds of units are there?
What information is necessary (from the source unit's point of view) to identify the location of a target unit, and the strength of the connection with it?
How are constraints initially specified? How are they maintained through the learning process?
61
Unit types: input units (C, V); output units (C, V, x); correspondence units (C, V): 7 distinct unit types, each represented in a distinct sub-region of the abstract genome. We 'help ourselves' to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated.
62
Connectivity geometry: assume a 3-d grid geometry. (Figure: C and V cells on the grid, with axes labelled 'E', 'N', 'back'.)
63
Constraint: PARSE (figure: weight-matrix diagram). Input units grow south and connect; output units grow east and connect; correspondence units grow north & west and connect with input & output units.
64
Constraint: ONSET. Short connections grow north-south between adjacent V output units, and between the first V node and the first x node. (Figure: C and V output units.)
65
Direction of projection growth. Topographic organizations are widely attested throughout neural structures; activity-dependent growth is a possible alternative. Orientation information (axes): chemical gradients during development; cell age is a possible alternative.
66
Projection parameters: direction; extent (local vs. non-local); target unit type. Strength of connections is encoded separately.
67
Connectivity Genome. Contributions from ONSET and PARSE, by source unit type:
C_I: S L C_C
V_I: S L V_C
C_O: E L C_C
V_O: E L V_C; N&S S V_O; N S x_0
C_C: N L C_I; W L C_O
V_C: N L V_I; W L V_O
x_0: S S V_O
Key (Direction | Extent | Target): Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Target = Input: C_I, V_I; Output: C_O, V_O, x (0); Corr: C_C, V_C.
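To illustrate the three-field projection code in this table (direction, extent, target), here is a toy decoder; representing a "gene" as a Python tuple is of course an invention for illustration.

```python
DIRECTIONS = {'N', 'S', 'E', 'W', 'F', 'B', 'N&S'}
EXTENTS = {'L': 'long', 'S': 'short'}
TARGETS = {'C_I', 'V_I', 'C_O', 'V_O', 'x_0', 'C_C', 'V_C'}

def decode_projection(gene):
    """gene = (direction, extent, target), e.g. ('S', 'L', 'C_C'):
    'grow a long-range projection to the south and connect to C-correspondence units'."""
    d, e, t = gene
    assert d in DIRECTIONS and e in EXTENTS and t in TARGETS
    return {'direction': d, 'extent': EXTENTS[e], 'target': t}

# The C_I entry from the table above:
print(decode_projection(('S', 'L', 'C_C')))
```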
68
ONSET contributions: x_0 segment: S S V_O; V_O segment: N S x_0, N&S S V_O. (Figure: C and V output units.)
69
Encoding connection strength. For each constraint i we need to 'embody' (i) the constraint strength s_i and (ii) the connection coefficients (one per Φ → Ψ pair of cell types). The product of these is the contribution of constraint i to the Φ → Ψ connection weight. Network-level specification: …
70
Φ → Ψ Processing (figure): [P_1] ∝ s_1.
71
Φ → Ψ Development (figure).
72
Φ → Ψ Learning (figure): during phase P+; reverse during P–.
73
Learning Behavior. The simplified system can be solved analytically. The learning algorithm turns out to be, approximately, Δs_i ≈ ε [ (# violations of constraint i in phase P–) – (# violations in phase P+) ].
74
Abstract Gene Map (figure): regions for General Developmental Machinery, Connectivity (genes giving direction, extent, target, e.g. C-I: S L C_C; V-I: S L V_C), and Constraint Coefficients (genes for the CORRESPOND and RESPOND constraints).
75
Summary. Described an attempt to integrate: a connectionist theory of mental processes (computational neuroscience, cognitive psychology); a symbolic theory of mental functions (philosophy, linguistics); and representations, in both their general structure (philosophy, AI) and their specific structure (linguistics). This informs the theory of UG: its form and content, and its genetic encoding.
76
Mystery Quote #4 “Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any.”
77
Mystery Quote #5 “The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. … The goal … should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non ‑ symbolic computation; to enrich conceptual ‑ level research with new computational concepts and techniques that reflect an understanding of how conceptual ‑ level theoretical constructs emerge from subconceptual computation…”
79
Thanks for your attention