
1 Attendee questionnaire
Name
Affiliation/status
Area of study/research
For each of these subjects:
–Linguistics (Optimality Theory)
–Computation (connectionism/neural networks)
–Philosophy (symbolic/connectionist debate)
–Psychology (infant phonology)
please indicate your relative level of
–interest (for these lectures) [1 = least, 5 = most]
–background [1 = none, 5 = expert]
Thank you

2 Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University

3 Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
Plan of lectures
1. Cognitive architecture
–Symbols and neurons
–Symbols in neural networks
–Optimization in neural networks
2. Optimization in grammar I: HG → OT
3. Optimization in grammar II: OT
4. OT in neural networks

4 Cognitive architecture
Central dogma of cognitive science: Cognition is computation.
But what type of computation? What exactly is computation, and what work must it do in cognitive science?

5 Computation
Functions, cognitive:
–Pixels → objects → locations [low- to high-level vision]
–Sound stream → word string [phonetics + …]
–Word string → parse tree [syntax]
–Underlying form → surface form [phonology]
  petit copain: /pətit + kopɛ̃/ → [pə.ti.ko.pɛ̃]
  petit ami: /pətit + ami/ → [pə.ti.ta.mi]
Reduction of complex procedures for evaluating functions to combinations of primitive operations
Computational architecture:
–Operations: primitives + combinators
–Data

6 Symbolic Computation
Computational architecture:
–Operations: primitives + combinators
–Data
The Pure Symbolic Architecture (PSA)
–Data: strings, (binary) trees, graphs, …
–Operations
  Primitives
  –Concatenate(string, tree) = cons
  –First-member(string); left-subtree(tree) = ex₀
  Combinators
  –Composition: f(x) =def g(h(x))
  –IF(x = A) THEN … ELSE …

7 ƒ: Passive
Few leaders are admired by George → admire(George, few leaders)
ƒ(s) = cons(ex₁(ex₀(ex₁(s))), cons(ex₁(ex₁(ex₁(s))), ex₀(s)))
[Diagram: the Passive parse tree (A, Aux, V, by, P) mapped to the LF tree (V, P, A).]
But for cognition, we need a reduction to a very different computational architecture.
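To make the symbolic side concrete, here is a minimal sketch of the PSA primitives cons, ex₀, ex₁ and the passive-to-LF function ƒ of slide 7, with binary trees encoded as nested Python tuples (the tree encoding and the sample bracketing of the sentence are illustrative assumptions, not taken from the slides):

    def cons(left, right):           # combine two trees into a new binary tree
        return (left, right)

    def ex0(tree):                   # extract the left subtree / first member
        return tree[0]

    def ex1(tree):                   # extract the right subtree
        return tree[1]

    def f(s):
        # f(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
        return cons(ex1(ex0(ex1(s))),
                    cons(ex1(ex1(ex1(s))), ex0(s)))

    # Assumed parse of "Few leaders are admired by George": s = [A [[Aux V] [by P]]]
    s = ("few leaders", (("are", "admired"), ("by", "George")))
    print(f(s))   # ('admired', ('George', 'few leaders')) ~ admire(George, few leaders)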

8 PDP Computation
The cognitive architecture: the connectionist hypothesis
At the lowest computational level of the mind/brain:
–Representations: distributed activation patterns
–Primitive operations (e.g.):
  Multiplication of activations by synaptic weights
  Summation of weighted activation values
  Non-linear transfer functions
–Combination: massive parallelism
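A minimal sketch of these primitives (illustrative only; the weight values and the choice of tanh as the non-linear transfer function are assumptions, not from the slides): each unit multiplies incoming activations by synaptic weights, sums them, and applies a non-linear transfer function, and all units do so in parallel.

    import numpy as np

    def transfer(x):
        return np.tanh(x)               # a non-linear transfer function (assumed: tanh)

    def update(activations, W):
        # W[i, j]: synaptic weight from unit j to unit i.
        net = W @ activations           # multiply by weights and sum, for all units in parallel
        return transfer(net)            # squash through the non-linearity

    a = np.array([0.2, -0.5, 0.9])      # a distributed activation pattern
    W = np.array([[ 0.0, 0.5, -0.3],
                  [ 0.5, 0.0,  0.8],
                  [-0.3, 0.8,  0.0]])   # illustrative weights
    print(update(a, W))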

9 Criticism of PDP (e.g., from neuroscientists): much too simple.
Misguided: a confusion between two questions.
The relevant complaint is the opposite:
–Much too complex
–The target of the computational reduction must be within the scope of neural computation

10 The cognitive question for neuroscience What is the function of each component of the nervous system? Our question is quite different.

11 The neural question for cognitive science How are complex cognitive functions computed by a mass of numerical processors like neurons — each very simple, slow, and imprecise relative to the components that have traditionally been used to construct powerful, general-purpose computational systems? How does the structure arise that enables such a medium to achieve cognitive computation?

12 The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)
–In higher cognitive domains, representations and functions are well approximated by symbolic computation
–The Connectionist Hypothesis is correct
–Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation

13 PassiveNet [Network diagram: an input activation pattern (fillers B, C, D, E, F, G bound to Agent, Patient, Aux, and by roles) mapped through the weight matrix W to the output activation pattern (Agent and Patient roles).]

14 The ICS Isomorphism [Diagram: the symbolic Passive → LF mapping over tensor product representations is realized by the input → output mapping of the weight matrix W over tensorial networks.]

15 Within-level compositionality
ƒ(s) = cons(ex₁(ex₀(ex₁(s))), cons(ex₁(ex₁(ex₁(s))), ex₀(s)))
W = W_cons0 [ W_ex1 W_ex0 W_ex1 ] + W_cons1 [ W_cons0 ( W_ex1 W_ex1 W_ex1 ) + W_cons1 ( W_ex0 ) ]
Between-level reduction

16 Levels

17 The ICS Architecture [Diagram: symbolic level: Representation, Function ƒ (dog+s → "dogs"), Grammar G (constraints: NOCODA; optimal parse among candidate syllabifications of 'cat'), Algorithm A; connectionist level: Activation Pattern, Connection Weights, Harmony Optimization, Spreading Activation. Processing (Learning).]

18 Processing I: Activation
Computational neuroscience. Key sources:
–Hopfield 1982, 1984
–Cohen and Grossberg 1983
–Hinton and Sejnowski 1983, 1986
–Smolensky 1983, 1986
–Geman and Geman 1984
–Golden 1986, 1988

19 Processing I: Activation
[Diagram: two units a₁ and a₂ with external inputs i₁ = 0.6 and i₂ = 0.5, connected by an inhibitory weight –λ = –0.9.]
Processing — spreading activation — is optimization: Harmony maximization.

20 The ICS Architecture [Diagram repeated with the example cat /kæt/: the grammar's optimal parse corresponds to the activation pattern with optimal Harmony (optimal ↔ optimal H).]

21 Processing II: Optimization
[Diagram: the same two-unit network (i₁ = 0.6, i₂ = 0.5, inhibitory weight –λ = –0.9).]
Cognitive psychology. Key sources:
–Hinton & Anderson 1981
–Rumelhart, McClelland, & the PDP Group 1986
Processing — spreading activation — is optimization: Harmony maximization.

22 Processing II: Optimization
[Same two-unit network.] Three parallel, violable well-formedness constraints are in CONFLICT:
–a₁ must be active (strength: 0.6)
–a₂ must be active (strength: 0.5)
–a₁ and a₂ must not be simultaneously active (strength: λ = 0.9)
Optimal compromise: a₁ = 0.79, a₂ = –0.21
Harmony maximization is satisfaction of parallel, violable well-formedness constraints.
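A numerical sketch of this compromise: assuming the standard quadratic Harmony function with unit self-decay terms, H(a) = i₁a₁ + i₂a₂ − λa₁a₂ − ½(a₁² + a₂²) (the exact form used on the slide is not shown), gradient ascent on H settles at a₁ ≈ 0.79, a₂ ≈ −0.21.

    import numpy as np

    i = np.array([0.6, 0.5])            # constraint strengths: a1, a2 "want" to be active
    lam = 0.9                           # strength of "not both active"

    def harmony(a):
        return i @ a - lam * a[0] * a[1] - 0.5 * (a @ a)

    def grad(a):
        # dH/da1 = i1 - lam*a2 - a1, and symmetrically for a2.
        return i - lam * a[::-1] - a

    a = np.zeros(2)
    for _ in range(2000):               # spreading activation as gradient ascent on Harmony
        a += 0.05 * grad(a)

    print(np.round(a, 2))               # [ 0.79 -0.21]: the optimal compromise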

23 Processing II: Optimization
The search for an optimal state can employ randomness
–Equations for units' activation values have random terms
–pr(a) ∝ e^{H(a)/T}
–T ('temperature') ~ randomness; T → 0 during search
–Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
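A sketch of such a stochastic search in the Boltzmann-machine style (illustrative assumptions: binary 0/1 units, the two-unit weights from the previous slides, and a geometric annealing schedule): units are resampled with probability governed by e^{H(a)/T} while T is lowered toward 0.

    import numpy as np

    rng = np.random.default_rng(0)

    W = np.array([[ 0.0, -0.9],
                  [-0.9,  0.0]])        # "not both active", as a negative weight
    b = np.array([0.6, 0.5])            # each unit individually "wants" to be on

    def harmony(a):
        return 0.5 * a @ W @ a + b @ a

    a = rng.integers(0, 2, size=2).astype(float)    # random binary start state
    for T in np.geomspace(2.0, 0.01, 500):          # temperature lowered toward 0
        k = rng.integers(0, 2)                      # pick a unit to resample
        on  = a.copy(); on[k]  = 1.0
        off = a.copy(); off[k] = 0.0
        gap = harmony(on) - harmony(off)
        p_on = 1.0 / (1.0 + np.exp(-gap / T))       # pr(a_k = 1), Boltzmann form
        a[k] = float(rng.random() < p_on)

    print(a)        # the highest-Harmony 0/1 state here is a = [1, 0]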

24 The ICS Architecture [Diagram repeated: Harmony optimization is now labeled optimal constraint satisfaction (Harmony Opt./Constraint Satisfaction).]

25 Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? Knowledge representation.
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?

26 Representation
–Symbolic theory: complex symbol structures
–Generative linguistics (Chomsky & Halle '68 …): particular linguistic representations
–Markedness Theory (Jakobson, Trubetzkoy, '30s …): good (well-formed) linguistic representations
–Connectionism (PDP): distributed activation patterns
–ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units ('tensor product representations' etc.); will employ 'local representations' as well

27 Representation [Tree diagram: the parse [σ k [æ t]] of 'cat', with filler/role bindings σ/r_ε, k/r_0, æ/r_01, t/r_11.]

28 Tensor Product Representations [Diagram: representations are built with ⊗; filler vectors A, B, X, Y; role vectors r_ε = 1, r_0 = (1 1), r_1 = (1 –1); binding units i ⊗ j ⊗ k with i, j, k ∈ {A, B, X, Y}, laid out for depth-0 and depth-1 tree positions (units ①–⑫).]

29 Tensor Product Representations [Same diagram as slide 28, shown in a further build step.]

30 Tensor Product Representations [Same diagram as slide 28, another build step.]

31 Local tree realizations [Diagram: local realizations of the example tree representations.]


33 The ICS Isomorphism [Diagram repeated from slide 14: tensor product representations ↔ tensorial networks.]

34 Tensor Product Representations
Structuring operation | Symbolic formalization (structures; example) | Connectionist formalization (vector operation)
–Combining | Sets; {c₁, c₂} | c₁ + c₂ (vector sum: +)
–Role/filler binding | Strings, frames; AB = { A/r₁, B/r₂ } | A ⊗ r₁ + B ⊗ r₂ (tensor product: ⊗)
–Recursive embedding | Trees; [A [B C]] | A ⊗ r₀ + [B ⊗ r₀ + C ⊗ r₁] ⊗ r₁ (recursive role vectors: r_left/right-child(x) = r_0/1 ⊗ r_x)

35 Mental representations are defined by the activation values of connectionist units. When analyzed at a higher level, these representations are distributed patterns of activity — activation vectors. For core aspects of higher cognitive domains, these vectors realize symbolic structures. Such a symbolic structure s is defined by a collection of structural roles { rᵢ }, each of which may be occupied by a filler fᵢ; s is a set of constituents, each a filler/role binding fᵢ/rᵢ. The connectionist realization of s is an activity vector s = Σᵢ fᵢ ⊗ rᵢ. In higher cognitive domains such as language and reasoning, mental representations are recursive: the fillers or roles of s themselves have the same type of internal structure as s, and these structured fillers f or roles r in turn have the same type of tensor product realization as s.
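A minimal numerical sketch of s = Σᵢ fᵢ ⊗ rᵢ, using the role vectors of slide 28 (r₀ = (1 1), r₁ = (1 –1)) and two filler vectors chosen here only for illustration; it also shows unbinding (recovering a filler from its role) and the recursive role construction of slide 34.

    import numpy as np

    A  = np.array([1.0, 0.0])           # filler vectors (chosen for illustration)
    B  = np.array([0.0, 1.0])
    r0 = np.array([1.0,  1.0])          # role: left child   (slide 28)
    r1 = np.array([1.0, -1.0])          # role: right child  (slide 28)

    # Representation of the tree [A B]: bind fillers to roles and superimpose.
    s = np.outer(A, r0) + np.outer(B, r1)        # s = A (x) r0 + B (x) r1

    # Unbinding: r0 and r1 are linearly independent, so each filler is recovered
    # by contracting s with the dual (unbinding) vector of its role.
    R = np.stack([r0, r1])               # rows are the role vectors
    U = np.linalg.inv(R)                 # columns are the dual vectors
    print(s @ U[:, 0])                   # [1. 0.]  = A, the filler bound to r0
    print(s @ U[:, 1])                   # [0. 1.]  = B, the filler bound to r1

    # Recursive embedding (slide 34): the role "left child of the right child"
    # is itself a tensor product of elementary roles, r01 = r0 (x) r1.
    r01 = np.outer(r0, r1).ravel()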

36 Representation: Combination

37 The ICS Architecture [Diagram repeated: symbolic level (Representation, Function ƒ, Grammar G with constraints: NOCODA, Algorithm A) realized by activation patterns, connection weights, Harmony optimization/constraint satisfaction, spreading activation; example cat /kæt/.]

38 Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? Knowledge representation.
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?

39 Representation [Tree diagram repeated from slide 27: the parse [σ k [æ t]] of 'cat', with filler/role bindings σ/r_ε, k/r_0, æ/r_01, t/r_11.]

40 Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? Knowledge representation.
Prior question:
1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?

41 Constraints
NOCODA: A syllable has no coda [Māori]
[Diagram: the parse [σ k [æ t]] of 'cat' violates NOCODA (the coda t is marked *); the constraint is realized in the connection weights W.]
H(a_[σ k [æ t]]) = –s_NOCODA < 0
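As a preview of constraint interaction (next lecture: HG, OT), here is a minimal Harmonic-Grammar-style sketch; the second constraint, the candidate set, and the numerical strengths are hypothetical, chosen only to illustrate how violations of weighted constraints subtract Harmony and how the maximal-Harmony candidate is selected.

    strengths = {"NOCODA": 2.0, "PARSE": 3.0}      # hypothetical constraint strengths

    # Candidate parses of /kaet/ 'cat', with violation counts per constraint.
    candidates = {
        "[kaet]":    {"NOCODA": 1, "PARSE": 0},    # the syllable keeps the coda t
        "[kae] <t>": {"NOCODA": 0, "PARSE": 1},    # t is left unparsed
    }

    def harmony(violations):
        # Each violation of constraint k lowers Harmony by its strength s_k.
        return -sum(strengths[k] * n for k, n in violations.items())

    for cand, viols in candidates.items():
        print(cand, harmony(viols))                # [kaet] -2.0, [kae] <t> -3.0

    best = max(candidates, key=lambda c: harmony(candidates[c]))
    print("optimal:", best)                        # with these assumed strengths: [kaet]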

42 The ICS Architecture [Diagram repeated, as on slide 37.]

43 NEXT LECTURE: HG, OT
The ICS Architecture [Diagram repeated, with the remaining question marked: Constraint Interaction ??]

