1
Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
Plan of lectures
1. Cognitive architecture: symbols & optimization in neural networks
2. Optimization in grammar: HG → OT; from numerical to algebraic optimization in grammar
3. OT and nativism: the initial state & neural/genomic encoding of UG
4. ?
2
The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS):
– In higher cognitive domains, representations and functions are well approximated by symbolic computation.
– The Connectionist Hypothesis is correct.
– Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation.
3
Levels
4
The ICS Architecture (diagram)
Symbolic level: the representation [σ k [æ t]]; the function ƒ: kæt → [σ k[æt]]; the grammar G, with constraints such as NOCODA ('optimal' = optimal constraint satisfaction); the algorithm A.
Connectionist level: activation patterns, connection weights, Harmony optimization / constraint satisfaction, spreading activation.
5
Representation (diagram)
The parse [σ k [æ t]] of 'cat' is the set of filler/role bindings {σ/r_ε, k/r_0, æ/r_01, t/r_11}, realized as an activation pattern.
6
Tensor Product Representations (diagram)
Filler vectors: A, B, X, Y (fillers i, j, k ∈ {A, B, X, Y}).
Role vectors: r_ε = (1; 0 0), r_0 = (0; 1 1), r_1 = (0; 1 −1) (depth 0 vs. depth 1).
Each binding is the tensor product (⊗) of a filler vector and a role vector; the circled indices ①–⑫ label the resulting network units.
7
Local tree realizations (diagram of the representations)
8
The ICS Isomorphism (diagram)
A passive sentence at LF (with constituents Aux, V, by, Agent, Patient in input and output) is shown both as tensor product representations and as the corresponding tensorial network with weight matrix W.
9
Tensor Product Representations
Structuring operation | symbolic structures | example | connectionist vector operation:
– Combining | sets | {c₁, c₂} ⇒ c₁ + c₂ | vector sum (+)
– Role/filler binding | strings, frames | AB = {A/r₁, B/r₂} ⇒ A⊗r₁ + B⊗r₂ | tensor product (⊗)
– Recursive embedding | trees | [A [B C]] ⇒ A⊗r₀ + [B⊗r₀ + C⊗r₁]⊗r₁ | recursive role vectors r_{left/right-child(x)} = r_{0/1} ⊗ r_x
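Below is a minimal NumPy sketch of these three operations. The filler vectors, their dimensionality, and the variable names are illustrative choices, not those used in the lectures, and the depth-0 component of the slide's role vectors is dropped for simplicity.

```python
import numpy as np

# Role vectors for left child (r_0) and right child (r_1): the depth-1
# parts of the role vectors on the earlier slide.
r_0 = np.array([1.0,  1.0])
r_1 = np.array([1.0, -1.0])

# Toy filler vectors for the symbols A, B, C (any linearly independent set works).
f = {'A': np.array([1.0, 0.0, 0.0]),
     'B': np.array([0.0, 1.0, 0.0]),
     'C': np.array([0.0, 0.0, 1.0])}

def bind(filler, role):
    """Role/filler binding: the tensor (outer) product filler ⊗ role."""
    return np.outer(filler, role)

# Combining + binding: the string AB = {A/r0, B/r1} is the superposition
# (vector sum) of the two bindings.
s_AB = bind(f['A'], r_0) + bind(f['B'], r_1)

# Unbinding: because r_0 and r_1 are orthogonal, projecting onto a role
# vector recovers the bound filler exactly.
recovered_A = s_AB @ r_0 / (r_0 @ r_0)
assert np.allclose(recovered_A, f['A'])

# Recursive embedding: the tree [A [B C]] uses recursive roles,
# r_{left/right-child(x)} = r_{0/1} ⊗ r_x, so depth-2 bindings are rank-3 tensors.
subtree_BC = bind(f['B'], r_0) + bind(f['C'], r_1)   # B and C at depth 1 of the subtree
depth1 = bind(f['A'], r_0)                           # A in the left child of the root
depth2 = np.einsum('fr,s->frs', subtree_BC, r_1)     # the subtree embedded as right child
tree_A_BC = (depth1, depth2)   # ICS combines these banks into one vector by direct sum
```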
10
Formal Role/Filler Binding by Synchrony (Shastri & Ajjanagadde 1993)
give(John, book, Mary) is encoded by having the giver and John units fire in one time slot, the give-obj and book units in another, and the recipient and Mary units in a third.
Formally (Tesar & Smolensky 1994), this is a tensor product representation whose role vectors are the time slots:
s = r₁ ⊗ [f_book + f_give-obj] + r₂ ⊗ [f_giver + f_John] + r₃ ⊗ [f_Mary + f_recipient]
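A small sketch of that observation, assuming one-hot toy vectors for the six semantic units; the slot and filler names come from the example above, everything else is an illustrative choice.

```python
import numpy as np

# Three time slots serve as three (orthogonal) role vectors.
r1, r2, r3 = np.eye(3)

# Toy filler vectors, one unit per semantic entity/role in give(John, book, Mary).
names = ['book', 'give-obj', 'John', 'giver', 'Mary', 'recipient']
f = dict(zip(names, np.eye(len(names))))

# s = r1⊗[f_book + f_give-obj] + r2⊗[f_giver + f_John] + r3⊗[f_Mary + f_recipient]
s = (np.outer(f['book'] + f['give-obj'], r1)
     + np.outer(f['giver'] + f['John'], r2)
     + np.outer(f['Mary'] + f['recipient'], r3))

# Reading out column t of s gives the units that fire in time slot t:
# firing together (synchrony) is exactly co-binding to the same role vector.
for t, slot in enumerate((r1, r2, r3), start=1):
    active = s @ slot
    print(f'slot {t}:', [n for n, a in zip(names, active) if a > 0])
# slot 1: ['book', 'give-obj']; slot 2: ['John', 'giver']; slot 3: ['Mary', 'recipient']
```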
11
The ICS Architecture (diagram, repeated).
12
Two Fundamental Questions
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
Harmony maximization is satisfaction of parallel, violable constraints.
13
Representation (diagram, repeated): the parse [σ k [æ t]] as the bindings {σ/r_ε, k/r_0, æ/r_01, t/r_11}.
14
Two Fundamental Questions
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
Harmony maximization is satisfaction of parallel, violable constraints.
15
Constraints
NOCODA: A syllable has no coda. [Maori/French/English]
In the network, the constraint is realized by connection weights W: the parse a_[σ k [æ t]] of 'cat' has a coda, so it incurs one violation (*) and this constraint contributes H(a_[σ k [æ t]]) = −s_NOCODA < 0.
16
The ICS Architecture (diagram, repeated).
17
The ICS Architecture (diagram, repeated), now asking: Constraint Interaction ??
18
Constraint Interaction I
ICS ⇒ grammatical theory: Harmonic Grammar
– Legendre, Miyata & Smolensky 1990 et seq.
19
Constraint Interaction I
For 'cat' parsed [σ k [æ t]]: H = H(k, σ) + H(σ, t), where ONSET rewards the onset (H(k, σ) = Onset/k > 0) and NOCODA penalizes the coda (H(σ, t) = Coda/t < 0).
The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.
Any formal language can be so generated.
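A toy numerical rendering of this; the constraint strengths are invented for illustration (only the sign pattern, onset rewarded and coda penalized, comes from the slide).

```python
# Harmonic Grammar as numerical constraint interaction: each constraint
# contributes signed Harmony, and the total H is their sum.
W_ONSET  = +2.0    # reward for a filled onset (illustrative strength)
W_NOCODA = -1.0    # penalty for a coda, i.e. a NOCODA violation

def H_parse(onsets, codas):
    return W_ONSET * onsets + W_NOCODA * codas

# The parse [σ k [æ t]] of 'cat' has one onset (k) and one coda (t):
print(H_parse(onsets=1, codas=1))    # H = H(k, σ) + H(σ, t) = 2.0 - 1.0 = 1.0
```

Which of several candidate parses has the greatest total H, and so is generated, depends on the relative strengths; that interaction is the topic of the following slides.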
20
The ICS Architecture (diagram, repeated). Constraint Interaction I: HG; 'optimal' now means Harmony-optimal (optimal H).
21
Harmonic Grammar Parser
A simple, comprehensible network for a simple grammar G:
– X → A B
– Y → B A
Language processing as completion (example: the string A B B A with categories X Y):
– Top-down: given X Y, complete the string A B B A.
– Bottom-up: given A B B A, complete the categories X Y.
22
The ICS Architecture (diagram, repeated, with a '?' marking the component taken up next).
23
Simple Network Parser
A fully self-connected, symmetric network (weight matrix W), like the network shown previously, except with 12 units; representations and connections shown below.
24
Harmonic Grammar Parser
Weight matrix for Y → B A: H(Y, _ A) > 0 and H(Y, B _) > 0 (Y is supported by B in the first position and A in the second).
25
Harmonic Grammar Parser Weight matrix for X → A B
26
Harmonic Grammar Parser Weight matrix for entire grammar G
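Below is a much-reduced sketch of such a parser, assuming just six units and hand-set ±1 weights; the network in the lecture has 12 units with weights derived from the tensor product construction, so this only illustrates how one symmetric weight matrix encoding both rules supports completion in both directions.

```python
import numpy as np

units = ['A1', 'B1', 'A2', 'B2', 'X', 'Y']        # symbols in positions 1-2, plus categories
idx = {u: i for i, u in enumerate(units)}

W = np.zeros((len(units), len(units)))
def connect(u, v, w):
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = w      # symmetric weights

# X -> A B: X supports A in position 1 and B in position 2, and conflicts
# with the opposite filling; Y -> B A is the mirror image.
connect('X', 'A1', +1); connect('X', 'B2', +1); connect('X', 'B1', -1); connect('X', 'A2', -1)
connect('Y', 'B1', +1); connect('Y', 'A2', +1); connect('Y', 'A1', -1); connect('Y', 'B2', -1)

def harmony(a):
    return 0.5 * a @ W @ a

def complete(clamped, sweeps=5):
    """Greedy Harmony maximization over the unclamped units (values 0 or 1)."""
    a = np.zeros(len(units))
    for u, v in clamped.items():
        a[idx[u]] = v
    free = [i for i, u in enumerate(units) if u not in clamped]
    for _ in range(sweeps):
        for i in free:
            for v in (0.0, 1.0):
                trial = a.copy(); trial[i] = v
                if harmony(trial) > harmony(a):
                    a = trial
    return {u: int(a[idx[u]]) for u in units}

# Bottom-up: clamp the string B A; the network completes Y on, X off.
print(complete({'A1': 0, 'B1': 1, 'A2': 1, 'B2': 0}))
# Top-down: clamp the category X; the network completes A in position 1, B in position 2.
print(complete({'X': 1, 'Y': 0}))
```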
27
Bottom-up Processing (diagram): the string A B B A is presented at the lower level and the network completes the categories X, Y.
28
Top-down Processing (diagram): the categories X, Y are presented at the upper level and the network completes the string A B B A.
29
Scaling up Not yet … Still conceptual obstacles to surmount
30
Explaining Productivity
We are approaching full-scale parsing of formal languages by neural-network Harmony maximization, and we have other networks (like PassiveNet) that provably compute recursive functions ⇒ productive competence. How to explain this?
31
1. Structured representations
32
+ 2. Structured connections
33
= Proof of Productivity
Productive behavior follows mathematically from combining
– the combinatorial structure of the vectorial representations encoding inputs & outputs, and
– the combinatorial structure of the weight matrices encoding knowledge.
34
Explaining Productivity I (diagram)
– Intra-level decomposition: [A B] ⇝ {A, B} (symbolic constituents; the PSA story)
– Inter-level decomposition: [A B] ⇝ {1, 0, 1, …, 1} (activation values; the ICS story)
Functions/semantics are shared by PSA & ICS; the two differ at the level of processes.
35
Explaining Productivity II (diagram)
– Intra-level decomposition: G ⇝ {X → A B, Y → B A} (the grammar's rules; the PSA story)
– Inter-level decomposition: W(G) ⇝ {1, 0, 1, 0; …} (the weight matrix's values; the ICS story)
Again, functions/semantics are shared by ICS & PSA; the decompositions differ at the level of processes.
36
The ICS Architecture (diagram, repeated).
37
The ICS Architecture (diagram, repeated). Constraint Interaction II.
38
Constraint Interaction II: OT
ICS ⇒ grammatical theory: Optimality Theory
– Prince & Smolensky 1991, 1993/2004
39
Constraint Interaction II: OT
Differential strength encoded in strict domination hierarchies (≫):
– Every constraint has complete priority over all lower-ranked constraints (combined).
– An approximate numerical encoding employs special (exponentially growing) weights, as in the sketch below.
– 'Grammars can't count.'
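A minimal sketch of that numerical encoding, with made-up constraint names and violation counts: OT compares candidates' violation profiles lexicographically, and a weighted sum reproduces the comparison when the weights grow exponentially, faster than any violation count that can occur.

```python
# Sketch: encoding a strict domination hierarchy C1 >> C2 >> C3 numerically.
# The candidates and their violation counts are invented for illustration.

def ot_prefers(v1, v2):
    """OT: candidate 1 beats candidate 2 iff its violation vector is
    lexicographically smaller (compare on the highest-ranked constraint first)."""
    return v1 < v2                    # Python tuple comparison is lexicographic

def hg_harmony(violations, base=10):
    """HG approximation: constraint k gets weight base**(n - k), so each
    constraint outweighs all lower-ranked ones combined, provided base
    exceeds the maximum possible violation count."""
    n = len(violations)
    return -sum(v * base ** (n - k) for k, v in enumerate(violations, start=1))

cand_a = (0, 3, 1)    # no C1 violations, but several lower-ranked ones
cand_b = (1, 0, 0)    # a single violation of top-ranked C1

print(ot_prefers(cand_a, cand_b))                # True: a wins under strict domination
print(hg_harmony(cand_a) > hg_harmony(cand_b))   # True: exponential weights agree
```

With merely linear weights, enough low-ranked violations could outweigh one high-ranked violation, which is exactly what strict domination forbids.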
40
Constraint Interaction II: OT
'Grammars can't count': Stress is on the initial heavy syllable iff the number of light syllables n obeys [a complicated arithmetic condition, not shown]. No way, man.
41
Constraint Interaction II: OT
– Differential strength encoded in strict domination hierarchies (≫). (1st innovation of OT: constraint ranking)
– Constraints are universal (Con).
– Candidate outputs are universal (Gen).
– Human grammars differ only in how these constraints are ranked: 'factorial typology'.
– The first true contender for a formal theory of cross-linguistic typology.
2nd innovation: 'Faithfulness'.
42
The Faithfulness/Markedness Dialectic
'cat': /kat/ → kæt, violating NOCODA: why?
– FAITHFULNESS requires pronunciation = lexical form.
– MARKEDNESS often opposes it.
The Markedness/Faithfulness dialectic ⇒ diversity (the two rankings are sketched in code below):
– English: FAITH ≫ NOCODA
– Polynesian: NOCODA ≫ FAITH (~French)
Another markedness constraint M, Nasal Place Agreement ['Assimilation'] (NPA):
– labial: mb ≻ nb, ŋb
– coronal: nd ≻ md, ŋd
– velar: ŋg ≻ ŋb, ŋd
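A toy sketch of this two-way typology, assuming a simplified two-candidate analysis of /kat/ (faithful kat vs. coda-less ka); real candidate sets and violation profiles are of course richer.

```python
# Factorial typology with two constraints: every ranking of {FAITH, NOCODA}
# is a possible grammar; here each ranking picks a different output for /kat/.
from itertools import permutations

candidates = {
    'kat': {'FAITH': 0, 'NOCODA': 1},   # faithful, but ends in a coda
    'ka':  {'FAITH': 1, 'NOCODA': 0},   # coda deleted: unfaithful, no coda
}

def optimal(ranking):
    """Strict domination: order each candidate's violations by the ranking and
    pick the lexicographically smallest profile."""
    return min(candidates, key=lambda c: tuple(candidates[c][con] for con in ranking))

for ranking in permutations(['FAITH', 'NOCODA']):
    print(' >> '.join(ranking), '->', optimal(ranking))
# FAITH >> NOCODA -> kat   (English-type)
# NOCODA >> FAITH -> ka    (Polynesian-type)
```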
43
The ICS Architecture (diagram, repeated). Constraint Interaction II: OT; 'optimal' is now defined by strict domination (≫), with a '?' over its connectionist realization.
44
Optimality Theory
Diversity of contributions to theoretical linguistics:
– Phonology & phonetics
– Syntax
– Semantics & pragmatics
– … e.g., the following lectures.
Now: Can strict domination be explained by connectionism?
45
Case study: Syllabification in Berber
Plan: data, then
– OT grammar
– Harmonic Grammar
– Network
46
Syllabification in Berber
Dell & Elmedlaoui (1985): Imdlawn Tashlhit Berber.
The syllable nucleus can be any segment, but nucleus choice is driven by a universal preference for nuclei to be the highest-sonority segments.
47
Berber syllable nuclei have maximal sonority
Segment class ρ      | Example segments | Sonority son(ρ) | Berber examples (nucleus of each syllable in capitals, or as the vowel itself)
voiceless stops      | t, k             | 1               | .ra.tK.ti.
voiced stops         | d, b, g          | 2               | .bD.dL.  .ma.ra.tGt.
voiceless fricatives | s, f, x          | 3               | .tF.tKt.  .tX.zNt.
voiced fricatives    | z, γ             | 4               | .txZ.nakkʷ.
nasals               | n, m             | 5               | .tzMt.  .tM.z.
liquids              | l, r             | 6               | .tR.gLt.
high vocoids         | i/y, u/w         | 7               | .rat.lult.  .il.di.
low vowel            | a                | 8               | .tR.ba.
48
OT Grammar: Brbr_OT (Prince & Smolensky '93/04)
HNUC: a syllable nucleus is sonorous.
ONSET: a syllable has an onset.
Strict domination: ONSET ≫ HNUC.
/txznt/        | ONSET | HNUC
a. ☞ .tX.zNt.  |       | n x
b.   .tXz.nT.  |       | x! t
c.   .txZ.Nt.  | *!    | n z
49
Harmonic Grammar: Brbr_HG
HNUC: a syllable nucleus is sonorous. A nucleus of sonority s contributes Harmony = 2^(s−1), for s ∈ {1, 2, …, 8} ~ {t, d, f, z, n, l, i, a}.
ONSET: implemented as *VV; each pair of adjacent nuclei contributes Harmony = −2^8.
Theorem. The global Harmony maxima are the correct Berber core syllabifications [of Dell & Elmedlaoui; no sonority plateaux, as in the OT analysis, here & henceforth].
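A brute-force sketch of the theorem for /txznt/, assuming the Harmony values reconstructed above (2^(s−1) per nucleus of sonority s, and a penalty of 2^8 for each adjacent pair of nuclei, standing in for ONSET); the exact numbers in the lecture may differ, but any sufficiently exponential scheme yields the same winner here. Searching all 2^5 corner states finds the Dell-Elmedlaoui parse .tX.zNt.

```python
# Brute-force Harmony maximization over all 0/1 (C/V) labelings of /txznt/.
from itertools import product

sonority = {'t': 1, 'd': 2, 'x': 3, 'f': 3, 'z': 4, 'n': 5, 'l': 6, 'i': 7, 'a': 8}

def harmony(word, nuclei):
    """nuclei: 0/1 tuple marking which segments are syllable nuclei (V)."""
    h = sum(2 ** (sonority[seg] - 1) for seg, v in zip(word, nuclei) if v)      # HNUC
    h -= 2 ** 8 * sum(nuclei[i] * nuclei[i + 1] for i in range(len(word) - 1))  # *VV (ONSET)
    return h

word = 'txznt'
best = max(product((0, 1), repeat=len(word)), key=lambda n: harmony(word, n))
print(''.join(s.upper() if v else s for s, v in zip(word, best)))   # tXzNt, i.e. .tX.zNt.
```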
50
BrbrNet realizes Brbr_HG (network diagram showing the ONSET and HNUC connections).
51
BrbrNet’s Global Harmony Maximum is the correct parse Contrasts with Goldsmith’s Dynamic Linear Models (Goldsmith & Larson ’90; Prince ’93) For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm
52
BrbrNet’s Search Dynamics Greedy local optimization –at each moment, make a small change of state so as to maximally increase Harmony –( gradient ascent : mountain climbing in fog) –guaranteed to construct a local maximum
53
/txznt/ → .tX.zNt. 'you (sg.) stored' (plot: Harmony over time as the network parses /txznt/).
54
The Hardest Case: 12378
/t.bx.ya/*
* hypothetical, but compare t.bx.la.kkʷ 'she even behaved as a miser' [tbx.lakkʷ]
55
Subsymbolic Parsing (animation: the segment units' activations gradually settle toward the V/C values of the parse).
56
Parsing sonority profile 8 1 2 1 3 4 5 7 8 7 (parse: a.tb.kf.zn.yay)
The network finds the best of infinitely many representations: 1024 corners/parses.
57
BrbrNet has many Local Harmony Maxima
An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen). That is, every activation value is 0 or 1, and the sequence of values realizes a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC}, where C = 0, V = 1 and # = word edge (a checker is sketched below).
Greedy optimization avoids these local maxima: why?
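A small sketch of this characterization as a checker over 0/1 strings (C = 0, V = 1); the regular expression simply spells out the inventory above, with an optional initial V or VC covering the word-edge cases.

```python
import re

def is_legal_parse(pattern):
    """True iff the 0/1 string divides into syllables from {CV, CVC, #V, #VC}."""
    # Optionally one word-initial V or VC, then any number of CV / CVC syllables.
    return re.fullmatch(r'(10?)?(010?)*', pattern) is not None

print(is_legal_parse('01010'))   # .CV.CVC.  (e.g. .tX.zNt.)               -> True
print(is_legal_parse('00110'))   # contains VV: word-internal onsetless syllable -> False
```

Each such state is a local maximum; the following slides ask why greedy ascent in the exponential network nonetheless reaches the global one.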
58
HG OT’s Strict Domination Strict Domination: Baffling from a connectionist perspective? Explicable from a connectionist perspective ? – Exponential BrbrNet escapes local H maxima – Linear BrbrNet does not
59
Linear BrbrNet makes errors (~ Goldsmith-Larson network)
Error: /12378/ → .123.78. (correct: .1.23.78.)
60
Subsymbolic Harmony optimization can be stochastic
The search for an optimal state can employ randomness: the equations for the units' activation values have random terms.
– pr(a) ∝ e^{H(a)/T}
– T ('temperature') ~ degree of randomness; T → 0 during the search
– Boltzmann Machine (Hinton & Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
This can guarantee computation of the global optimum in principle. In practice, how fast? Exponential vs. linear BrbrNet (a sketch of the stochastic search follows).
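The sketch below shows this stochastic search in the Boltzmann-machine style, reusing the kind of toy symmetric network from the earlier sketch (weights, biases, and the cooling schedule are arbitrary choices): each binary unit is resampled with a probability following pr ∝ e^{H/T}, while the temperature T is lowered toward 0.

```python
# Stochastic Harmony maximization (Gibbs sampling with simulated annealing).
import numpy as np

rng = np.random.default_rng(0)

def harmony(a, W, b):
    return 0.5 * a @ W @ a + b @ a

def anneal(W, b, T0=2.0, Tmin=0.01, cooling=0.95, sweeps_per_T=10):
    n = len(b)
    a = rng.integers(0, 2, size=n).astype(float)      # random 0/1 start
    T = T0
    while T > Tmin:
        for _ in range(sweeps_per_T * n):
            i = rng.integers(n)
            gain = W[i] @ a + b[i]                     # Harmony gain from a_i = 1 vs a_i = 0
            p_on = 1.0 / (1.0 + np.exp(-gain / T))     # Boltzmann: pr(a_i = 1) ∝ e^{H/T}
            a[i] = 1.0 if rng.random() < p_on else 0.0
        T *= cooling                                   # lower the randomness over time
    return a

W = np.array([[0.,  1., -1.],
              [1.,  0., -1.],
              [-1., -1., 0.]])
b = np.array([0.1, 0.1, 0.1])
a = anneal(W, b)
print(a, harmony(a, W, b))    # with slow enough cooling, a global Harmony maximum
```

How fast the temperature can be lowered while still reliably finding the global maximum is what the exponential-vs-linear BrbrNet comparison on the next slides measures.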
61
Stochastic BrbrNet: the exponential network can succeed 'fast' (plot: 5-run average).
62
Stochastic BrbrNet: the linear network can't succeed 'fast' (plot: 5-run average).
63
Stochastic BrbrNet (linear) (plot: 5-run average).
64
The ICS Architecture (diagram, repeated).