Attendee questionnaire
Name
Affiliation/status
Area of study/research
For each of these subjects:
–Linguistics (Optimality Theory)
–Computation (connectionism/neural networks)
–Philosophy (symbolic/connectionist debate)
–Psychology (infant phonology)
please indicate your relative level of
–interest (for these lectures) [1 = least, 5 = most]
–background [1 = none, 5 = expert]
Thank you
Optimality in Cognition and Grammar
Paul Smolensky
Cognitive Science Department, Johns Hopkins University
Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University

Plan of lectures
1. Cognitive architecture
–Symbols and neurons
–Symbols in neural networks
–Optimization in neural networks
2. Optimization in grammar I: HG → OT
3. Optimization in grammar II: OT
4. OT in neural networks
Cognitive architecture
Central dogma of cognitive science: Cognition is computation.
But what type of computation? What exactly is computation, and what work must it do in cognitive science?
Computation
Cognitive functions:
–Pixels → objects, locations [low- to high-level vision]
–Sound stream → word string [phonetics + …]
–Word string → parse tree [syntax]
–Underlying form → surface form [phonology]
  petit copain: /pətit + kopɛ̃/ → [pə.ti.ko.pɛ̃]
  petit ami: /pətit + ami/ → [pə.ti.ta.mi]
Reduction of complex procedures for evaluating functions to combinations of primitive operations
Computational architecture:
–Operations: primitives + combinators
–Data
Symbolic Computation
Computational architecture:
–Operations: primitives + combinators
–Data
The Pure Symbolic Architecture (PSA)
–Data: strings, (binary) trees, graphs, …
–Operations
  Primitives
  –Concatenate(string, tree) = cons
  –First-member(string); left-subtree(tree) = ex₀
  Combinators
  –Composition: f(x) =def g(h(x))
  –IF(x = A) THEN … ELSE …
ƒ: Passive → LF
"Few leaders are admired by George" → admire(George, few leaders)
ƒ(s) = cons(ex₁(ex₀(ex₁(s))), cons(ex₁(ex₁(ex₁(s))), ex₀(s)))
[Figure: the parse tree of the passive sentence (A, Aux, V, by, P) mapped to the LF tree (V, P, A).]
But for cognition, we need a reduction to a very different computational architecture.
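As a minimal sketch of how these primitives compose (my own illustration, not from the lectures), the function ƒ can be run on binary trees encoded as nested pairs. The bracketing of the passive parse, [A [[Aux V] [by P]]], is an assumption chosen so that ƒ reproduces the slide's example; the Python names cons, ex0, ex1 simply mirror the symbolic primitives above.

```python
# Sketch: PSA primitives as operations on binary trees encoded as nested pairs.
# The bracketing of the passive parse below is an assumption, not from the lectures.

def cons(left, right):          # build a binary tree from two subtrees
    return (left, right)

def ex0(tree):                  # extract left subtree (first member)
    return tree[0]

def ex1(tree):                  # extract right subtree
    return tree[1]

def f(s):
    # f(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
    return cons(ex1(ex0(ex1(s))),
                cons(ex1(ex1(ex1(s))), ex0(s)))

# Passive parse, assumed bracketing [A [[Aux V] [by P]]]:
s = ('few leaders', (('are', 'admired'), ('by', 'George')))
print(f(s))   # -> ('admired', ('George', 'few leaders'))  ~ admire(George, few leaders)
```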
PDP Computation
The cognitive architecture: the connectionist hypothesis
At the lowest computational level of the mind/brain:
–Representations: distributed activation patterns
–Primitive operations (e.g.):
  multiplication of activations by synaptic weights
  summation of weighted activation values
  non-linear transfer functions
–Combination: massive parallelism
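A schematic rendering of these primitives (a minimal sketch of my own, not from the lectures): each unit multiplies incoming activations by synaptic weights, sums them, and passes the result through a non-linear transfer function, with all units updating in parallel. The tanh transfer function and the particular weight and activation values are illustrative assumptions.

```python
import numpy as np

def transfer(x):
    return np.tanh(x)            # one common choice of non-linear transfer function

def update(a, W):
    # All units in parallel: weight the activations, sum, apply the non-linearity.
    return transfer(W @ a)

a = np.array([0.2, -0.5, 0.9])                   # activation vector (illustrative values)
W = np.array([[ 0.0,  0.4, -0.3],
              [ 0.4,  0.0,  0.6],
              [-0.3,  0.6,  0.0]])               # synaptic weight matrix (illustrative)
print(update(a, W))
```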
Criticism of PDP (e.g., from neuroscientists): "Much too simple."
This criticism is misguided: it confuses two questions.
The relevant complaint is the opposite:
–Much too complex
–The target of computational reduction must be within the scope of neural computation.
The cognitive question for neuroscience:
What is the function of each component of the nervous system?
Our question is quite different.
The neural question for cognitive science:
How are complex cognitive functions computed by a mass of numerical processors like neurons, each very simple, slow, and imprecise relative to the components that have traditionally been used to construct powerful, general-purpose computational systems? How does the structure arise that enables such a medium to achieve cognitive computation?
The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS):
–In higher cognitive domains, representations and functions are well approximated by symbolic computation.
–The Connectionist Hypothesis is correct.
–Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation.
PassiveNet
[Figure: an input activation pattern (with Agent, Patient, Aux, by constituents) mapped through the weight matrix W to an output activation pattern (Agent, Patient constituents).]
The ICS Isomorphism
[Figure: the symbolic mapping Passive → LF (parse trees with A, Aux, V, by, P) realized, via tensor product representations, as a tensorial network (PassiveNet): input and output activation patterns connected by the weight matrix W.]
Within-level compositionality:
ƒ(s) = cons(ex₁(ex₀(ex₁(s))), cons(ex₁(ex₁(ex₁(s))), ex₀(s)))
W = W_cons0 · [W_ex1 · W_ex0 · W_ex1] + W_cons1 · [W_cons0 · (W_ex1 · W_ex1 · W_ex1) + W_cons1 · W_ex0]
Between-level reduction
Levels
The ICS Architecture
[Figure: Function ƒ (e.g. "dogs": dog+s → its surface form); Grammar G (constraints such as NOCODA; optimal candidate); Representation (activation patterns realizing symbolic structures such as the syllable [σ k [æ t]]); Algorithm A (connection weights, Harmony optimization, spreading activation). Processing (Learning).]
Processing I: Activation
Computational neuroscience: key sources
–Hopfield 1982, 1984
–Cohen and Grossberg 1983
–Hinton and Sejnowski 1983, 1986
–Smolensky 1983, 1986
–Geman and Geman 1984
–Golden 1986, 1988
Processing I: Activation
Processing — spreading activation — is optimization: Harmony maximization.
[Figure: two units a₁ and a₂ with external inputs i₁ (0.6) and i₂ (0.5) and a mutually inhibitory connection of weight –λ (–0.9).]
The ICS Architecture
[Figure, repeated: Function ƒ (e.g. "cat" → /kæt/); Grammar G (constraints such as NOCODA; the optimal candidate is the one with maximal Harmony, "optimal H"); Representation (activation patterns realizing [σ k [æ t]]); Algorithm A (connection weights, Harmony optimization, spreading activation).]
Processing II: Optimization
Cognitive psychology: key sources
–Hinton & Anderson 1981
–Rumelhart, McClelland, & the PDP Group 1986
Processing — spreading activation — is optimization: Harmony maximization.
[Figure: the same two-unit network: a₁ and a₂ with inputs i₁ (0.6), i₂ (0.5) and an inhibitory connection –λ (–0.9).]
Processing II: Optimization
Harmony maximization is satisfaction of parallel, violable well-formedness constraints:
–a₁ must be active (strength: 0.6)
–a₂ must be active (strength: 0.5)
–a₁ and a₂ must not be simultaneously active (strength: λ)   ← CONFLICT
Optimal compromise: a₁ = 0.79, a₂ = –0.21
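A minimal numerical sketch of this compromise (my own illustration, not from the lectures). The inputs (0.6, 0.5) and the inhibitory weight (–0.9) come from the slide; the quadratic decay term –½‖a‖² in the Harmony function is an assumption (a standard choice) under which the maximum of H(a) = i·a + ½aᵀWa – ½‖a‖² is exactly the compromise (0.79, –0.21) quoted above. Gradient ascent on H stands in for spreading activation.

```python
import numpy as np

# Sketch: Harmony maximization by gradient ascent for the two-unit example.
# The -1/2 ||a||^2 decay term is an assumption that reproduces the slide's optimum.

i = np.array([0.6, 0.5])                    # external inputs to a1, a2
lam = 0.9                                   # strength of the inhibitory constraint
W = np.array([[0.0, -lam],
              [-lam, 0.0]])                 # "a1 and a2 must not be co-active"

def harmony(a):
    return i @ a + 0.5 * a @ W @ a - 0.5 * a @ a

a = np.zeros(2)
for _ in range(2000):                       # spreading activation as gradient ascent on H
    a += 0.05 * (i + W @ a - a)             # gradient of harmony(a)
print(np.round(a, 2))                       # -> [ 0.79 -0.21]
```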
Processing II: Optimization
The search for an optimal state can employ randomness: the equations for units' activation values have random terms.
–pr(a) ∝ e^{H(a)/T}
–T ('temperature') ~ degree of randomness; T → 0 during the search
–Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
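A minimal sketch of such a stochastic search (my own illustration in the style of a Boltzmann Machine, not from the lectures). The inputs and inhibitory weight are the two-unit example above; binary units, the Gibbs update rule, and the geometric annealing schedule are illustrative assumptions. As T falls toward 0, the network settles into the Harmony optimum.

```python
import numpy as np

# Sketch: stochastic search for the Harmony optimum with annealed temperature.
# Binary units and the annealing schedule are illustrative assumptions.

rng = np.random.default_rng(0)
i = np.array([0.6, 0.5])
W = np.array([[0.0, -0.9],
              [-0.9, 0.0]])

def harmony(a):
    return i @ a + 0.5 * a @ W @ a

a = rng.integers(0, 2, size=2).astype(float)     # random initial binary state
for T in np.geomspace(2.0, 0.01, 500):           # temperature ~ randomness, lowered toward 0
    k = rng.integers(2)                          # pick a unit to update
    dH = i[k] + W[k] @ a                         # Harmony gain from turning unit k on
    p_on = 1.0 / (1.0 + np.exp(-dH / T))         # pr(a) ∝ exp(H(a)/T): Gibbs update
    a[k] = 1.0 if rng.random() < p_on else 0.0

print(a, harmony(a))                             # typically ends at the optimum [1. 0.], H = 0.6
```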
The ICS Architecture
[Figure, repeated: Function ƒ ("cat" → /kæt/); Grammar G (constraints: NOCODA; optimal = optimal constraint satisfaction); Representation (activation patterns realizing [σ k [æ t]]); Algorithm A (connection weights, Harmony optimization / constraint satisfaction, spreading activation).]
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?
Representation
–Symbolic theory: complex symbol structures
–Generative linguistics (Chomsky & Halle '68 …): particular linguistic representations
–Markedness Theory (Jakobson, Trubetzkoy, '30s …): good (well-formed) linguistic representations
–Connectionism (PDP): distributed activation patterns
–ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units ('tensor product representations' etc.); will employ 'local representations' as well
Representation
The syllable [σ k [æ t]] ('cat') as a set of filler/role bindings:
–σ/r_ε
–k/r_0
–æ/r_01
–t/r_11
[Figure: the syllable tree σ, k, æ, t annotated with these bindings.]
Tensor Product Representations
Filler vectors: A, B, X, Y
Role vectors: r_ε = 1, r_0 = (1 1), r_1 = (1 –1)
[Figure, built up over three slides: a grid of units (①–⑫) realizing the tensor product bindings (filler ⊗ role) of a small tree with fillers i, j, k ∊ {A, B, X, Y}, grouped into depth-0 and depth-1 units.]
Representations: local tree realizations
The ICS Isomorphism
[Figure, repeated from earlier: the Passive → LF mapping realized by tensor product representations as a tensorial network (PassiveNet) with weight matrix W.]
Tensor Product Representations
Structuring operation / symbolic formalization / connectionist formalization:
–Combining: sets, e.g. {c₁, c₂} → vector sum: c₁ + c₂
–Role/filler binding: strings, frames, e.g. AB = {A/r₁, B/r₂} → tensor product: A ⊗ r₁ + B ⊗ r₂
–Recursive embedding: trees, e.g. [A [B C]] → recursive role vectors: A ⊗ r₀ + [B ⊗ r₀ + C ⊗ r₁] ⊗ r₁, where r_left/right-child(x) = r_0/1 ⊗ r_x
Mental representations are defined by the activation values of connectionist units. When analyzed at a higher level, these representations are distributed patterns of activity — activation vectors. For core aspects of higher cognitive domains, these vectors realize symbolic structures. Such a symbolic structure s is defined by a collection of structural roles {rᵢ}, each of which may be occupied by a filler fᵢ; s is a set of constituents, each a filler/role binding fᵢ/rᵢ. The connectionist realization of s is an activity vector s = Σᵢ fᵢ ⊗ rᵢ. In higher cognitive domains such as language and reasoning, mental representations are recursive: the fillers or roles of s themselves have the same type of internal structure as s, and these structured fillers f or roles r in turn have the same type of tensor product realization as s.
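A minimal numerical sketch of such an encoding (my own illustration, not from the lectures). It uses the role vectors from the slides (r_ε = 1, r_0 = (1 1), r_1 = (1 –1)), builds recursive roles by the rule r_left/right-child(x) = r_0/1 ⊗ r_x, and encodes the syllable [σ k [æ t]] as the bindings {σ/r_ε, k/r_0, æ/r_01, t/r_11}. The one-hot filler vectors are an illustrative assumption (the slides leave the filler vectors unspecified). Because the role vectors at each depth are orthogonal, a bound filler can be recovered exactly by unbinding.

```python
import numpy as np

# Role vectors from the slides
r_eps = np.array([1.0])                  # root role
r0 = np.array([1.0, 1.0])                # left child
r1 = np.array([1.0, -1.0])               # right child

# Illustrative one-hot filler vectors (an assumption)
fillers = {'σ': np.array([1., 0., 0., 0.]),
           'k': np.array([0., 1., 0., 0.]),
           'æ': np.array([0., 0., 1., 0.]),
           't': np.array([0., 0., 0., 1.])}

# Recursive roles: r_{left/right-child}(x) = r_{0/1} ⊗ r_x
r01 = np.kron(r0, r1)                    # left child of the right subtree
r11 = np.kron(r1, r1)                    # right child of the right subtree

# [σ k [æ t]] as filler/role bindings; bindings at different depths live in
# different subspaces, kept here in a dict keyed by depth.
s = {
    0: np.kron(fillers['σ'], r_eps),
    1: np.kron(fillers['k'], r0),
    2: np.kron(fillers['æ'], r01) + np.kron(fillers['t'], r11),
}

def unbind(vec, role, filler_dim=4):
    # Recover the filler bound to `role` (exact here because the role vectors
    # at each depth are orthogonal).
    return vec.reshape(filler_dim, len(role)) @ role / (role @ role)

print(unbind(s[2], r01))                 # -> [0. 0. 1. 0.]  = filler æ
print(unbind(s[2], r11))                 # -> [0. 0. 0. 1.]  = filler t
```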
Representation: Combination
The ICS Architecture
[Figure, repeated: Function ƒ ("cat" → /kæt/); Grammar G (constraints: NOCODA; optimal = optimal constraint satisfaction); Representation (activation patterns realizing [σ k [æ t]]); Algorithm A (connection weights, Harmony optimization / constraint satisfaction, spreading activation).]
Two Fundamental Questions (repeated)
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?
Representation (repeated)
The syllable [σ k [æ t]] ('cat') as filler/role bindings: σ/r_ε, k/r_0, æ/r_01, t/r_11.
Two Fundamental Questions (repeated)
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?
Constraints
NOCODA: A syllable has no coda. [Māori]
The activation pattern a_[σ k [æ t]] realizing [σ k [æ t]] ('cat') violates NOCODA (*), so its Harmony contribution is negative:
H(a_[σ k [æ t]]) = –s_NOCODA < 0
[Figure: the syllable σ, k, æ, t with the coda marked * as the violation, and the weight matrix W.]
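A schematic sketch of how such a constraint scores candidates (my own illustration, not from the lectures): each NOCODA violation lowers a candidate's Harmony by s_NOCODA. The candidate syllabifications and the strength value below are made up for the example.

```python
# Sketch: each NOCODA violation lowers Harmony by s_NOCODA.
# Candidates are lists of syllables (onset, nucleus, coda); values are illustrative.

s_NOCODA = 1.0

def h_nocoda(candidate):
    violations = sum(1 for onset, nucleus, coda in candidate if coda)
    return -s_NOCODA * violations

cat_with_coda = [('k', 'æ', 't')]        # [σ k [æ t]]: one syllable with coda t -> one violation
cat_no_coda   = [('k', 'æ', '')]         # hypothetical coda-less parse          -> no violation

print(h_nocoda(cat_with_coda))           # -> -1.0
print(h_nocoda(cat_no_coda))             # ->  0.0
```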
The ICS Architecture
[Figure: the ICS Architecture diagram, repeated.]
NEXT LECTURE: HG, OT
[Figure: the ICS Architecture diagram, repeated, now highlighting the open question: Constraint Interaction??]