1
Attendee questionnaire
Name
Affiliation/status
Area of study/research
For each of these subjects:
–Linguistics (Optimality Theory)
–Computation (connectionism/neural networks)
–Philosophy (symbolic/connectionist debate)
–Psychology (infant phonology)
please indicate your relative level of:
–interest (for these lectures) [1 = least, 5 = most]
–background [1 = none, 5 = expert]
Thank you
2
Optimality in Cognition and Grammar
Paul Smolensky
Cognitive Science Department, Johns Hopkins University
3
Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
Plan of lectures:
1. Cognitive architecture
–Symbols and neurons
–Symbols in neural networks
–Optimization in neural networks
2. Optimization in grammar I: HG → OT
3. Optimization in grammar II: OT
4. OT in neural networks
4
Cognitive architecture
Central dogma of cognitive science: Cognition is computation.
But what type of computation? What exactly is computation, and what work must it do in cognitive science?
5
Computation
Functions, cognitive:
–Pixels → objects, locations [low- to high-level vision]
–Sound stream → word string [phonetics + …]
–Word string → parse tree [syntax]
–Underlying form → surface form [phonology]
  petit copain: /pətit + kopɛ̃/ → [pə.ti.ko.pɛ̃]
  petit ami: /pətit + ami/ → [pə.ti.ta.mi]
Reduction of complex procedures for evaluating functions to combinations of primitive operations.
Computational architecture:
–Operations: primitives + combinators
–Data
6
Symbolic Computation
Computational architecture:
–Operations: primitives + combinators
–Data
The Pure Symbolic Architecture (PSA):
–Data: strings, (binary) trees, graphs, …
–Operations
  Primitives:
  –Concatenate(string, tree) = cons
  –First-member(string); left-subtree(tree) = ex0
  Combinators:
  –Composition: f(x) =def g(h(x))
  –IF(x = A) THEN … ELSE …
7
ƒ: Passive → LF
Example: "Few leaders are admired by George" → admire(George, few leaders)
ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
[Diagram: the Passive parse tree (constituents A, Aux, V, by, P) and the resulting LF tree (V, P, A).]
But for cognition, we need a reduction to a very different computational architecture.
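As a concrete rendering of these primitives, here is a minimal sketch in Python. Binary trees are encoded as nested pairs; the bracketing of the passive input tree below is an assumption chosen so that the slide's formula yields the intended logical form, not something taken verbatim from the lecture.

```python
# Pure Symbolic Architecture primitives on binary trees encoded as nested pairs.
def cons(x, y): return (x, y)      # build a tree from two subtrees
def ex0(t):     return t[0]        # left subtree / first member
def ex1(t):     return t[1]        # right subtree

def f(s):
    # f(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
    return cons(ex1(ex0(ex1(s))),
                cons(ex1(ex1(ex1(s))), ex0(s)))

# "Few leaders are admired by George", bracketed (hypothetically) as
# [A [[Aux V] [by P]]]:
A, Aux, V, by, P = "few leaders", "are", "admired", "by", "George"
s = cons(A, cons(cons(Aux, V), cons(by, P)))

print(f(s))   # ('admired', ('George', 'few leaders')) ~ admire(George, few leaders)
```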
8
PDP Computation
The cognitive architecture: the connectionist hypothesis.
At the lowest computational level of the mind/brain:
–Representations: distributed activation patterns
–Primitive operations (e.g.): multiplication of activations by synaptic weights; summation of weighted activation values; non-linear transfer functions
–Combination: massive parallelism
9
Criticism of PDP (e.g., from neuroscientists): much too simple.
Misguided: a confusion between two questions.
Relevant complaint:
–Much too complex
–The target of computational reduction must be within the scope of neural computation.
10
The cognitive question for neuroscience
What is the function of each component of the nervous system?
Our question is quite different.
11
The neural question for cognitive science
How are complex cognitive functions computed by a mass of numerical processors like neurons, each very simple, slow, and imprecise relative to the components that have traditionally been used to construct powerful, general-purpose computational systems? How does the structure arise that enables such a medium to achieve cognitive computation?
12
The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS):
–In higher cognitive domains, representations and functions are well approximated by symbolic computation.
–The Connectionist Hypothesis is correct.
–Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation.
13
PassiveNet
[Diagram: a network mapping an input activation pattern (the passive parse, with Aux, by, Agent, and Patient constituents over units labeled B through G) through connection weights W to an output activation pattern (the logical form, with the Agent and Patient constituents rearranged).]
14
The ICS Isomorphism
[Diagram: the symbolic function from the Passive parse tree to its logical form (constituents A, P, V, Aux, by) corresponds to a tensorial network with weights W mapping the input activation pattern to the output activation pattern.]
Tensor product representations ↔ tensorial networks
15
Within-level compositionality:
ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
W = W_cons0 · [W_ex1 · W_ex0 · W_ex1] + W_cons1 · [W_cons0 · (W_ex1 · W_ex1 · W_ex1) + W_cons1 · (W_ex0)]
Between-level reduction: the weight matrix W realizes the symbolic function ƒ.
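A minimal numerical sketch of this between-level reduction for a single level of structure. The filler dimensionality, the particular filler vectors, and the row-major flattening are assumptions; the role vectors r0 = (1 1), r1 = (1 –1) are the ones used later in these slides. Each symbolic primitive becomes a fixed matrix, and composing the matrices mirrors composing the primitives.

```python
import numpy as np

# Role vectors from the slides and their dual (unbinding) vectors u_i,
# defined by u_i . r_j = delta_ij.
r0, r1 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
u0, u1 = np.linalg.inv(np.stack([r0, r1]).T)
I = np.eye(2)                              # identity on (2-d) filler space

# cons(x, y) is realized as W_cons0 @ x + W_cons1 @ y (bind x in role r0, y in r1).
W_cons0 = np.kron(I, r0.reshape(-1, 1))    # filler f -> flattened binding f (x) r0
W_cons1 = np.kron(I, r1.reshape(-1, 1))
W_ex0   = np.kron(I, u0.reshape(1, -1))    # flattened depth-1 tree -> left filler
W_ex1   = np.kron(I, u1.reshape(1, -1))    # flattened depth-1 tree -> right filler

B = np.array([1.0, 0.0])                   # illustrative filler vectors
C = np.array([0.0, 1.0])
s = W_cons0 @ B + W_cons1 @ C              # realization of the tree [B C]

print(W_ex0 @ s)              # -> B : the matrix W_ex0 realizes the primitive ex0
print(W_ex1 @ s)              # -> C
print(W_ex0 @ W_cons0 @ B)    # matrix composition mirrors ex0(cons0(B)) = B
```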
16
Levels
17
The ICS Architecture
[Diagram: Function ƒ (e.g., /dog+s/ → "dogs"), Grammar G (constraints such as NOCODA; the grammatical output is the optimal one), symbolic Representations (syllable structures over k, æ, t), and Algorithm A (activation patterns, connection weights, Harmony optimization, spreading activation). Processing (Learning).]
18
Processing I: Activation
Computational neuroscience. Key sources:
–Hopfield 1982, 1984
–Cohen and Grossberg 1983
–Hinton and Sejnowski 1983, 1986
–Smolensky 1983, 1986
–Geman and Geman 1984
–Golden 1986, 1988
19
Processing I: Activation
[Diagram: two units a1 and a2, external inputs i1 (0.6) and i2 (0.5), and a mutual connection of weight –λ (–0.9).]
Processing (spreading activation) is optimization: Harmony maximization.
20
The ICS Architecture
[Diagram as above, with the processing claim made explicit: the network's output is the representation that is optimal with respect to Harmony H (example: 'cat', /kæt/).]
21
Processing II: Optimization
Cognitive psychology. Key sources:
–Hinton & Anderson 1981
–Rumelhart, McClelland, & the PDP Group 1986
[Diagram: the same two-unit network: a1, a2, inputs i1 (0.6), i2 (0.5), connection weight –λ (–0.9).]
Processing (spreading activation) is optimization: Harmony maximization.
22
Processing II: Optimization
Harmony maximization is satisfaction of parallel, violable well-formedness constraints:
–a1 must be active (strength: 0.6)
–a2 must be active (strength: 0.5)
–a1 and a2 must not be simultaneously active (strength: λ)
These constraints CONFLICT. Optimal compromise: a1 = 0.79, a2 = –0.21.
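A small sketch of this computation. Harmony for the two-unit network is taken as H(a) = i·a + ½ aᵀWa; the quadratic self term (diagonal weights of –1, i.e. a leak) is an assumption added here so that H has a finite maximum. With it, gradient ascent (spreading activation) settles exactly at the slide's optimal compromise.

```python
import numpy as np

# External inputs and weights for the two-unit example on the slide.
i = np.array([0.6, 0.5])
lam = 0.9
W = np.array([[-1.0, -lam],    # diagonal -1 = assumed leak term -1/2 ||a||^2
              [-lam, -1.0]])   # off-diagonal -lambda = mutual inhibition

def harmony(a):
    return i @ a + 0.5 * a @ W @ a

# Spreading activation as gradient ascent on Harmony: dH/da = i + W a.
a = np.zeros(2)
for _ in range(1000):
    a += 0.1 * (i + W @ a)

print(np.round(a, 2))          # -> [ 0.79 -0.21], the optimal compromise
print(round(harmony(a), 3))
```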
23
Processing II: Optimization
The search for an optimal state can employ randomness:
–Equations for units’ activation values have random terms
–pr(a) ∝ e^(H(a)/T)
–T (‘temperature’) ~ randomness; T → 0 during the search
–Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
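A minimal sketch of the stochastic version: binary units updated by Gibbs sampling at temperature T, with T annealed toward 0 (Boltzmann-machine style). The tiny network and the annealing schedule are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

W = np.array([[0.0, -0.9],        # symmetric weights, no self-connections
              [-0.9, 0.0]])
b = np.array([0.6, 0.5])          # external inputs / biases
a = rng.integers(0, 2, size=2).astype(float)   # binary activations

def harmony(a):
    return b @ a + 0.5 * a @ W @ a

for T in np.geomspace(2.0, 0.05, 200):          # temperature -> 0 during search
    for j in range(len(a)):
        net = b[j] + W[j] @ a                   # net input to unit j
        p_on = 1.0 / (1.0 + np.exp(-net / T))   # pr(a_j = 1), since pr(a) ~ e^(H(a)/T)
        a[j] = float(rng.random() < p_on)

print(a, harmony(a))   # at low T the sample settles near a Harmony maximum
```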
24
The ICS Architecture
[Diagram as above; Harmony optimization is now glossed as optimal constraint satisfaction.]
25
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
26
Representation
–Symbolic theory: complex symbol structures
–Generative linguistics (Chomsky & Halle ’68 …): particular linguistic representations
–Markedness Theory (Jakobson, Trubetzkoy, ’30s …): good (well-formed) linguistic representations
–Connectionism (PDP): distributed activation patterns
–ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.); will employ ‘local representations’ as well
27
Representation
The structure [σ k [æ t]] ('cat') is the set of filler/role bindings {σ/rε, k/r0, æ/r01, t/r11}.
[Diagram: the tree for [σ k [æ t]] with each constituent labeled by its binding.]
28
Tensor Product Representations
Representations: trees with fillers i, j, k ∊ {A, B, X, Y} (filler vectors A, B, X, Y); role vectors rε = 1, r0 = (1 1), r1 = (1 –1); bindings combined by ⊗.
[Diagram: the network units realizing the Depth-0 and Depth-1 bindings, numbered ①–⑫.]
29
Tensor Product Representations (continued)
[Diagram: the same fillers and role vectors; the Depth-0 units (①–④) and Depth-1 units (⑤–⑫) of the realizing network shown as separate pools.]
30
Tensor Product Representations (continued)
[Diagram: the same network, with the Depth-0 and Depth-1 unit pools rearranged to show how each binding is laid out over the units.]
31
Local tree realizations
Representations: [Diagram: trees realized with local role vectors, one group of units per tree position.]
32
stopped
33
The ICS Isomorphism
[Diagram repeated: the symbolic Passive → LF function (constituents A, P, V, Aux, by) corresponds to a tensorial network with weights W mapping input to output activation patterns.]
Tensor product representations ↔ tensorial networks
34
Tensor Product Representations
Structuring operation / symbolic formalization (structures; example) / connectionist formalization (vector operation):
–Combining: sets; {c1, c2}; c1 + c2; vector sum (+)
–Role/filler binding: strings, frames; AB = {A/r1, B/r2}; A⊗r1 + B⊗r2; tensor product (⊗)
–Recursive embedding: trees; [A [B C]]; A⊗r0 + [B⊗r0 + C⊗r1]⊗r1; recursive role vectors: r_left/right-child(x) = r_0/1 ⊗ r_x
35
Mental representations are defined by the activation values of connectionist units. When analyzed at a higher level, these representations are distributed patterns of activity: activation vectors. For core aspects of higher cognitive domains, these vectors realize symbolic structures. Such a symbolic structure s is defined by a collection of structural roles {r_i}, each of which may be occupied by a filler f_i; s is a set of constituents, each a filler/role binding f_i/r_i. The connectionist realization of s is an activity vector s = Σ_i f_i ⊗ r_i. In higher cognitive domains such as language and reasoning, mental representations are recursive: the fillers or roles of s themselves have the same type of internal structure as s, and these structured fillers f or roles r in turn have the same type of tensor product realization as s.
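A small sketch of s = Σ_i f_i ⊗ r_i for the tree [A [B C]] from the table above. The filler vectors and the choice to keep each depth in its own array are illustrative assumptions; the role vectors and the recursive-role convention (r_child-0(x) = r0 ⊗ r_x) are the ones from the slides.

```python
import numpy as np

# Role vectors and their dual (unbinding) vectors u_i, with u_i . r_j = delta_ij.
r0, r1 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
u0, u1 = np.linalg.inv(np.stack([r0, r1]).T)

# Illustrative filler vectors (any linearly independent choice works).
A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])
C = np.array([0.0, 0.0, 1.0])

def outer(f, *roles):
    # filler (x) role (x) role ... as an explicit tensor product
    t = f
    for r in roles:
        t = np.tensordot(t, r, axes=0)
    return t

# s = A (x) r0 + [B (x) r0 + C (x) r1] (x) r1, i.e. the tree [A [B C]];
# depth-d constituents are order-(d+1) tensors, kept here per depth.
s = {
    1: outer(A, r0),                        # A in role r0 (left child of the root)
    2: outer(B, r0, r1) + outer(C, r1, r1)  # B, C inside the subtree in role r1
}

# Unbinding: contract role indices with the dual vectors.
print(s[1] @ u0)        # -> A : the left child of the root
print(s[2] @ u1 @ u0)   # -> B : descend into the right subtree (u1), then its left child (u0)
print(s[2] @ u1 @ u1)   # -> C : right child of the right subtree
```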
36
Representation: Combination
37
The ICS Architecture
[Diagram repeated: Function ƒ ('cat', /kæt/), Grammar G (NOCODA; optimality as optimal constraint satisfaction), Representation, Algorithm A (activation patterns, connection weights, Harmony optimization/constraint satisfaction, spreading activation).]
38
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
39
Representation
The structure [σ k [æ t]] ('cat') is the set of filler/role bindings {σ/rε, k/r0, æ/r01, t/r11}.
[Diagram repeated: the tree for [σ k [æ t]] with each constituent labeled by its binding.]
40
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
41
Constraints
NOCODA: a syllable has no coda. [Maori]
The candidate [σ k [æ t]] ('cat', /kæt/) has a coda (t), so it incurs a NOCODA violation (*).
Each violation contributes negative Harmony: H(a_[σ k [æ t]]) = –s_NOCODA < 0, where s_NOCODA is the constraint's strength, realized in the weights W.
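A tiny sketch of how weighted, violable constraints of this kind assign Harmony to candidates and pick an optimum, looking ahead to the next lecture. The candidate set, the extra faithfulness constraints MAX and DEP, and all numerical weights are illustrative assumptions, not from the slides; only NOCODA and the idea that a violation contributes –s_NOCODA to H come from this slide.

```python
# Constraint strengths (hypothetical weights).
weights = {"NOCODA": 2.0, "MAX": 3.0, "DEP": 1.0}

# Candidate parses of /kaet/ 'cat' with their violation counts.
candidates = {
    "[kaet]":   {"NOCODA": 1, "MAX": 0, "DEP": 0},   # faithful, but has a coda
    "[kae]":    {"NOCODA": 0, "MAX": 1, "DEP": 0},   # t deleted
    "[kae.te]": {"NOCODA": 0, "MAX": 0, "DEP": 1},   # vowel epenthesized
}

def harmony(violations):
    # Each violation of constraint C lowers Harmony by its strength s_C.
    return -sum(weights[c] * n for c, n in violations.items())

for cand, viol in candidates.items():
    print(cand, harmony(viol))

best = max(candidates, key=lambda c: harmony(candidates[c]))
print("optimal:", best)   # with these weights, the epenthesized candidate wins
```

With a different weighting (for example, faithfulness strongest), the faithful candidate [kaet] would be optimal instead; how constraints interact is exactly the question flagged for the next lecture.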
42
The ICS Architecture
[Diagram repeated: Function ƒ ('cat', /kæt/), Grammar G (NOCODA; optimality as optimal constraint satisfaction), Representation, Algorithm A (activation patterns, connection weights, Harmony optimization/constraint satisfaction, spreading activation).]
43
NEXT LECTURE: HG, OT
The ICS Architecture
[Diagram repeated, with the open issue flagged: Constraint Interaction ??]