
1 Optimality in Cognition and Grammar. Paul Smolensky, Cognitive Science Department, Johns Hopkins University. Plan of lectures: 1. Cognitive architecture: symbols & optimization in neural networks. 2. Optimization in grammar: HG → OT, from numerical to algebraic optimization in grammar. 3. OT and nativism: the initial state & neural/genomic encoding of UG. 4. ?

2 The ICS Hypothesis: the Integrated Connectionist/Symbolic Cognitive Architecture (ICS). In higher cognitive domains, representations and functions are well approximated by symbolic computation. The Connectionist Hypothesis is correct. Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation.

3 Levels

4 The ICS Architecture (architecture diagram). Symbolic level: Function ƒ (e.g., ƒ: kæt → [σ k [æ t]]), specified by Grammar G (constraints such as NOCODA; 'optimal' = optimal constraint satisfaction), computed by Algorithm A. Connectionist level: Representation = Activation Pattern; Knowledge = Connection Weights; Optimization = Harmony Optimization / Constraint Satisfaction; Processing = Spreading Activation.

5 Representation: the syllable structure [σ k [æ t]] ('cat') realized as a sum of filler/role bindings: σ/r_ε, k/r_0, æ/r_01, t/r_11.

6 Tensor Product Representations. Filler vectors: A, B, X, Y; i, j, k ∈ {A, B, X, Y}. Role vectors: r_ε = (1; 0, 0), r_0 = (0; 1, 1), r_1 = (0; 1, −1). (Diagram: a depth-1 tree with root i and children j, k, realized over 12 numbered units ①–⑫, grouped into depth-0 and depth-1 blocks.)
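
To make slide 6 concrete: the sketch below (Python/NumPy; the one-hot filler vectors are an assumption, since the slide gives only the role vectors) builds the 12-unit activation pattern realizing a depth-1 tree and shows that, because the role vectors are mutually orthogonal, a filler can be unbound from its role exactly.

```python
import numpy as np

# Role vectors from slide 6: r_eps = (1; 0, 0), r_0 = (0; 1, 1), r_1 = (0; 1, -1)
r_eps = np.array([1.0, 0.0, 0.0])
r_0   = np.array([0.0, 1.0, 1.0])
r_1   = np.array([0.0, 1.0, -1.0])

# Filler vectors A, B, X, Y: assumed one-hot in R^4 (the slide does not fix them)
fillers = dict(zip("ABXY", np.eye(4)))

def bind(filler, role):
    """Filler/role binding = tensor (outer) product, flattened to an activation vector."""
    return np.outer(fillers[filler], role).ravel()

# Depth-1 tree with root X, left child A, right child B:
#   s = X (x) r_eps + A (x) r_0 + B (x) r_1   -- a 4 x 3 = 12-unit pattern
s = bind("X", r_eps) + bind("A", r_0) + bind("B", r_1)
print(s.reshape(4, 3))          # rows: fillers A, B, X, Y; columns: role dimensions

# Unbinding: the role vectors are mutually orthogonal, so the filler bound to
# r_1 is recovered by contracting with r_1 / ||r_1||^2
recovered = s.reshape(4, 3) @ r_1 / (r_1 @ r_1)
print(recovered)                # -> the one-hot pattern for B
```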

7 Local tree realizations (diagram of the realizations of local trees as activation patterns).

8 The ICS Isomorphism (diagram): a tensor product representation of a Passive LF structure (Aux, by, V; Agent and Patient; Input and Output) set alongside the tensorial network (weight matrix W) that computes over it; tensor product representations ↔ tensorial networks.

9 Tensor Product Representations (table: structuring operation; symbolic formalization, structures and example; connectionist formalization, vector operation):
– Combining: Sets, {c1, c2}: c1 + c2 (vector sum, +)
– Role/filler binding: Strings, frames, AB = {A/r1, B/r2}: A ⊗ r1 + B ⊗ r2 (tensor product, ⊗)
– Recursive embedding: Trees, [A [B C]]: A ⊗ r0 + [B ⊗ r0 + C ⊗ r1] ⊗ r1 (recursive role vectors: r_left/right-child(x) = r_0/1 ⊗ r_x)
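
The recursive-embedding row can likewise be sketched in a few lines. In this illustration (Python/NumPy) the one-hot fillers, the particular orthonormal r_0 and r_1, and the depth-stratified bookkeeping are all assumptions; the point is just that the tree [A [B C]] comes out as A ⊗ r_0 + [B ⊗ r_0 + C ⊗ r_1] ⊗ r_1.

```python
import numpy as np
from functools import reduce

# Minimal sketch of recursive tensor product embedding for binary trees,
# following slide 9: r_{left/right-child}(x) = r_{0/1} (x) r_x.
# Because roles at depth d are tensors with d factors, the realization is
# kept stratified by depth: {depth: tensor of shape (n_fillers, 2, ..., 2)}.

fillers = dict(zip("ABC", np.eye(3)))
r = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # r_0, r_1

def role(path):
    """Role tensor for a tree position; path = tuple of 0/1 steps from the root."""
    # r_path = r_{last step} (x) ... (x) r_{first step}; empty path -> scalar 1
    return reduce(lambda acc, step: np.tensordot(r[step], acc, axes=0),
                  path, np.array(1.0))

def positions(tree, path=()):
    """Yield (filler, path) pairs for every leaf of a nested-tuple binary tree."""
    if isinstance(tree, str):
        yield tree, path
    else:
        left, right = tree
        yield from positions(left, path + (0,))
        yield from positions(right, path + (1,))

def realize(tree):
    """Depth-stratified tensor product realization of the tree."""
    layers = {}
    for f, path in positions(tree):
        t = np.tensordot(fillers[f], role(path), axes=0)   # filler (x) role
        layers[len(path)] = layers.get(len(path), 0) + t
    return layers

# The tree [A [B C]]: A (x) r_0 + (B (x) r_0 + C (x) r_1) (x) r_1
layers = realize(("A", ("B", "C")))
print({d: t.shape for d, t in layers.items()})   # {1: (3, 2), 2: (3, 2, 2)}
```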

10 Binding by Synchrony (Shastri & Ajjanagadde 1993): in give(John, book, Mary), bindings such as giver = John, give-obj = book, recipient = Mary are signaled by firing in the same time slice. Formal role/filler version [Tesar & Smolensky 1994]: s = r_1 ⊗ [f_book + f_give-obj] + r_2 ⊗ [f_giver + f_John] + r_3 ⊗ [f_Mary + f_recipient].

11 The ICS Architecture (slide 4's diagram, repeated).

12 Two Fundamental Questions. Harmony maximization is satisfaction of parallel, violable constraints. 2. What are the constraints? (knowledge representation). Prior question: 1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?

13 Representation (slide 5 repeated: the tensor product realization of [σ k [æ t]]).

14 Two Fundamental Questions (slide 12 repeated).

15 Constraints. NOCODA: a syllable has no coda [Maori/French/English]. Realized as a weight matrix W: the representation of 'cat', a_[σ k [æ t]], violates NOCODA (*, the coda t), so H(a_[σ k [æ t]]) = −s_NOCODA < 0.

16 The ICS Architecture (slide 4's diagram, repeated).

17 The ICS Architecture (slide 4's diagram); next question: Constraint Interaction ??

18 Constraint Interaction I. ICS → grammatical theory: Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.).

19 Constraint Interaction I. The Harmony of the parse sums constraint-weighted terms: H = H(k, σ) + H(σ, t), with ONSET contributing Onset/k weight > 0 and NOCODA contributing Coda/t weight < 0. The grammar generates the representation that maximizes H: this best satisfies the constraints, given their differential strengths. Any formal language can be so generated.

20 The ICS Architecture (slide 4's diagram): Constraint Interaction I, HG; 'optimal' = maximal Harmony H.

21 Harmonic Grammar Parser. A simple, comprehensible network for a simple grammar G: X → A B, Y → B A. Language processing as completion; bottom-up: given the string A B B A, complete the parse with X Y; top-down: given X Y, complete with A B B A.

22 The ICS Architecture (slide 4's diagram; the Representation box now marked '?').

23 Simple Network Parser. A fully self-connected, symmetric network (weight matrix W), like the network shown previously, except with 12 units; representations and connections shown below.

24 Harmonic Grammar Parser. Weight matrix for Y → B A: H(Y, _A) > 0, H(Y, B_) > 0.

25 Harmonic Grammar Parser Weight matrix for X → A B

26 Harmonic Grammar Parser Weight matrix for entire grammar G

27 Bottom-up Processing (animation: terminals A B B A clamped; the network fills in the roots X Y).

28 Top-down Processing (animation: roots X Y clamped; the network fills in the terminals A B B A).
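
A toy version of the slide 21–28 parser can be spelled out as follows. The localist encoding (one unit per position/symbol pair), the +1/−2 weight values, and the brute-force completion are illustrative assumptions, not the 12-unit network actually shown on the slides; the sketch just shows how rule-derived weights make Harmony maximization perform both bottom-up and top-down completion.

```python
import numpy as np
from itertools import product

# 12 units: {X, Y} at two root positions r1, r2, plus {A, B} at four terminal
# positions t1..t4.  Each rule (X -> A B, Y -> B A) adds positive weights
# between a root unit and the terminal units it licenses; units competing for
# the same position inhibit each other.
units = [(pos, sym) for pos in ("r1", "r2") for sym in ("X", "Y")] + \
        [(pos, sym) for pos in ("t1", "t2", "t3", "t4") for sym in ("A", "B")]
idx = {u: i for i, u in enumerate(units)}
W = np.zeros((len(units), len(units)))

def connect(u, v, w):
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = w

# Grammar G: X -> A B and Y -> B A, applied to (r1: t1 t2) and (r2: t3 t4)
for root, (left, right) in (("r1", ("t1", "t2")), ("r2", ("t3", "t4"))):
    connect((root, "X"), (left, "A"), +1.0)
    connect((root, "X"), (right, "B"), +1.0)
    connect((root, "Y"), (left, "B"), +1.0)
    connect((root, "Y"), (right, "A"), +1.0)
# One symbol per position: mutual inhibition
for pos in ("r1", "r2", "t1", "t2", "t3", "t4"):
    a_sym, b_sym = [u for u in units if u[0] == pos]
    connect(a_sym, b_sym, -2.0)

def harmony(a):
    return 0.5 * a @ W @ a

def complete(clamped):
    """Clamp the given units to 1 (other units at those positions to 0) and
    brute-force the remaining units to maximize Harmony."""
    fixed_pos = {u[0] for u in clamped}
    free = [i for i, u in enumerate(units) if u[0] not in fixed_pos]
    base = np.zeros(len(units))
    for u in clamped:
        base[idx[u]] = 1.0
    best = None
    for bits in product([0.0, 1.0], repeat=len(free)):
        a = base.copy()
        a[free] = bits
        if best is None or harmony(a) > harmony(best):
            best = a
    return [u for i, u in enumerate(units) if best[i] > 0.5], harmony(best)

# Bottom-up completion: given terminals A B B A, infer the roots (X, Y)
print(complete([("t1", "A"), ("t2", "B"), ("t3", "B"), ("t4", "A")]))
# Top-down completion: given roots X Y, infer the terminals (A B B A)
print(complete([("r1", "X"), ("r2", "Y")]))
```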

29 Scaling up Not yet … Still conceptual obstacles to surmount

30 Explaining Productivity. We are approaching full-scale parsing of formal languages by neural-network Harmony maximization, and other networks (like PassiveNet) provably compute recursive functions, hence productive competence. How to explain it?

31 1. Structured representations

32 + 2. Structured connections

33 = Proof of Productivity Productive behavior follows mathematically from combining –the combinatorial structure of the vectorial representations encoding inputs & outputs and –the combinatorial structure of the weight matrices encoding knowledge

34 Explaining Productivity I. Intra-level decomposition (PSA): [A B] ⇝ {A, B}. Inter-level decomposition (ICS): [A B] ⇝ {1, 0, −1, …, 1}. (Diagram relating Functions/Semantics, shared by PSA & ICS, to Processes in each.)

35 Explaining Productivity II. Intra-level decomposition (PSA): G ⇝ {X → AB, Y → BA}. Inter-level decomposition (ICS): W(G) ⇝ {1, 0, −1, 0; …}. (Same diagram, now for the knowledge/Processes side, shared by ICS & PSA.)

36 The ICS Architecture (slide 4's diagram, repeated).

37 The ICS Architecture (slide 4's diagram); next: Constraint Interaction II.

38 Constraint Interaction II: OT. ICS → grammatical theory: Optimality Theory (Prince & Smolensky 1991, 1993/2004).

39 Constraint Interaction II: OT. Differential strength is encoded in strict domination hierarchies (≫): every constraint has complete priority over all lower-ranked constraints combined. An approximate numerical encoding employs special (exponentially growing) weights. "Grammars can't count."
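
Here is a small illustration of the "exponentially growing weights" point: with a large enough base, the weighted-sum (HG) winner coincides with the strict-domination (OT) winner, while slowly growing weights can disagree. The candidates and violation counts are invented for the example.

```python
# Constraints listed highest-ranked first: C1 >> C2 >> C3.
candidates = {                 # violation counts on (C1, C2, C3)
    "cand-a": (0, 2, 0),
    "cand-b": (0, 1, 3),
    "cand-c": (1, 0, 0),
}

def ot_winner(cands):
    """Strict domination: compare violation vectors lexicographically."""
    return min(cands, key=lambda c: cands[c])

def hg_winner(cands, base):
    """Harmonic Grammar with weights base**2, base**1, base**0:
    Harmony = -(weighted violations); pick the maximum."""
    n = len(next(iter(cands.values())))
    weights = [base ** (n - 1 - k) for k in range(n)]
    return max(cands, key=lambda c: -sum(w * v for w, v in zip(weights, cands[c])))

print(ot_winner(candidates))            # cand-b: ties cand-a on C1, beats it on C2
print(hg_winner(candidates, base=10))   # same winner: weights 100, 10, 1
print(hg_winner(candidates, base=2))    # slowly growing weights (4, 2, 1) can
                                        # pick a different candidate
```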

40 Constraint Interaction II: OT. "Grammars can't count": no grammar requires, e.g., that stress fall on the initial heavy syllable iff the number of light syllables n obeys [some arithmetic condition]. No way, man.

41 Constraint Interaction II: OT. Differential strength encoded in strict domination hierarchies (≫). Constraints are universal (Con); candidate outputs are universal (Gen); human grammars differ only in how these constraints are ranked ('factorial typology'). The first true contender for a formal theory of cross-linguistic typology. 1st innovation of OT: constraint ranking; 2nd innovation: 'Faithfulness'.

42 The Faithfulness/Markedness Dialectic. 'cat': /kat/ → kæt violates NOCODA; why? FAITHFULNESS requires pronunciation = lexical form; MARKEDNESS often opposes it. The Markedness/Faithfulness dialectic yields diversity: English, FAITH ≫ NOCODA; Polynesian, NOCODA ≫ FAITH (~French). Another markedness constraint M, Nasal Place Agreement ['Assimilation'] (NPA): mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ ŋb, ŋd (velar).
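
The English/Polynesian contrast on this slide amounts to nothing more than swapping which of two weights is larger; a toy calculation (with made-up weight values) makes that explicit.

```python
# 'cat' /kat/: kaet keeps the coda (violating NOCODA); kae deletes it
# (violating FAITHFULNESS).  Only the relative size of the weights matters.
candidates = {
    "kaet": {"NOCODA": 1, "FAITH": 0},   # faithful, but has a coda
    "kae":  {"NOCODA": 0, "FAITH": 1},   # coda deleted
}

def winner(weights):
    def H(cand):
        return -sum(weights[c] * v for c, v in candidates[cand].items())
    return max(candidates, key=H)

print(winner({"FAITH": 2, "NOCODA": 1}))   # English-like: FAITH >> NOCODA -> 'kaet'
print(winner({"FAITH": 1, "NOCODA": 2}))   # Polynesian-like: NOCODA >> FAITH -> 'kae'
```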

43 The ICS Architecture (slide 4's diagram; Representation marked '?'): Constraint Interaction II, OT; 'optimal' now defined by ranking ≫.

44 Optimality Theory. Diversity of contributions to theoretical linguistics: phonology & phonetics, syntax, semantics & pragmatics, … (e.g., the following lectures). Now: can strict domination be explained by connectionism?

45 Case study: Syllabification in Berber. Plan: data, then OT grammar, Harmonic Grammar, network.

46 Syllabification in Berber (Dell & Elmedlaoui 1985: Imdlawn Tashlhit Berber). The syllable nucleus can be any segment, but nucleus choice is driven by a universal preference for nuclei to be the highest-sonority segments.

47 Berber syllable nuclei have maximal sonority (capital letters mark nuclei). Segment class (example segments ρ): sonority son(ρ): Berber examples:
– voiceless stops (t, k): 1: .ra.tK.ti.
– voiced stops (d, b, g): 2: .bD.dL., .ma.ra.tGt.
– voiceless fricatives (s, f, x): 3: .tF.tKt., .tX.zNt.
– voiced fricatives (z, γ): 4: .txZ.nakkʷ.
– nasals (n, m): 5: .tzMt., .tM.z…
– liquids (l, r): 6: .tR.gLt.
– high vocoids (i/y, u/w): 7: .rat.lUlt., .Il.dI.
– low vowel (a): 8: .tR.bA.

48 OT Grammar: BrbrOT (Prince & Smolensky '93/04). HNUC: a syllable nucleus is sonorous. ONSET: a syllable has an onset. Strict domination: ONSET ≫ HNUC. Tableau for /txznt/ (columns: ONSET, HNUC):
– a. ☞ .tX.zNt. : (no ONSET violation) : n x
– b. .tXz.nT. : (no ONSET violation) : x! t
– c. .txZ.Nt. : *! : n z

49 Harmonic Grammar: BrbrHG. HNUC: a syllable nucleus is sonorous; a nucleus of sonority s contributes Harmony = 2^s − 1, with s ∈ {1, 2, …, 8} ~ {t, d, f, z, n, l, i, a}. ONSET (implemented as *VV): each violation contributes Harmony = −2^8. Theorem. The global Harmony maxima are the correct Berber core syllabifications [of Dell & Elmedlaoui; no sonority plateaux, as in the OT analysis, here & henceforth].
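
As a sanity check of these numbers, one can enumerate slide 56's "corners" (0/1 nucleus assignments) for /txznt/ and score them with HNUC = 2^s − 1 and a −2^8 penalty per *VV configuration. The sonority values follow slide 47; treating ONSET purely as *VV and ignoring all other network structure is a simplification.

```python
from itertools import product

# Sonority scale from slide 47 ("G" stands in here for the voiced fricative γ)
SONORITY = {"t": 1, "k": 1, "d": 2, "b": 2, "g": 2, "s": 3, "f": 3, "x": 3,
            "z": 4, "G": 4, "n": 5, "m": 5, "l": 6, "r": 6,
            "y": 7, "i": 7, "w": 7, "u": 7, "a": 8}

def harmony(segments, nuclei):
    """nuclei: tuple of 0/1 flags, one per segment (1 = this segment is a nucleus)."""
    h = sum(2 ** SONORITY[seg] - 1 for seg, v in zip(segments, nuclei) if v)
    h -= 2 ** 8 * sum(1 for a, b in zip(nuclei, nuclei[1:]) if a and b)   # *VV
    return h

def best_parse(word):
    segs = list(word)
    best = max(product([0, 1], repeat=len(segs)),
               key=lambda nuc: harmony(segs, nuc))
    return ("".join(s.upper() if v else s for s, v in zip(segs, best)),
            harmony(segs, best))

print(best_parse("txznt"))   # -> ('tXzNt', 38): nuclei x and n, i.e. .tX.zNt.
```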

50 BrbrNet realizes BrbrHG (network diagram showing the ONSET and HNUC connections).

51 BrbrNet's global Harmony maximum is the correct parse. For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm. This contrasts with Goldsmith's Dynamic Linear Models (Goldsmith & Larson '90; Prince '93).

52 BrbrNet's Search Dynamics: greedy local optimization. At each moment, make a small change of state so as to maximally increase Harmony (gradient ascent: mountain climbing in fog); guaranteed to construct a local maximum.
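
In network terms, "greedy local optimization" is just gradient ascent on the Harmony function. The sketch below uses an arbitrary random symmetric weight matrix (not BrbrNet's) purely to show the update rule and the climb toward a local maximum.

```python
import numpy as np

# Gradient ascent on a quadratic Harmony H(a) = 1/2 a^T W a + b^T a,
# with activations kept in [0, 1].  W and b are placeholders.
rng = np.random.default_rng(0)
n = 5
W = rng.normal(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def H(a):
    return 0.5 * a @ W @ a + b @ a

a = rng.uniform(0, 1, size=n)                  # start from a random state
for step in range(2000):
    grad = W @ a + b                           # dH/da for symmetric W
    a = np.clip(a + 0.01 * grad, 0.0, 1.0)     # small uphill step, stay in [0, 1]

print(np.round(a, 2), "H =", round(H(a), 3))   # approximately a local Harmony maximum
```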

53 /txznt/ → .tx.znt. 'you (sg.) stored' (plot: Harmony of the network state over time while parsing /txznt/).

54 The Hardest Case: sonority profile 1 2 3 7 8, /tbxya/* (*hypothetical, but compare t.bx.la.kkʷ 'she even behaved as a miser' [tbx.lakkʷ]).

55 Subsymbolic Parsing (animation of BrbrNet unit activations over time; V labels mark the emerging nucleus positions).

56 Parsing sonority profile 8 1 2 1 3 4 5 7 8 7 (e.g., a.tb.kf.zn.yay): the network finds the best of infinitely many representations, among 1024 corners/parses.

57 BrbrNet has many local Harmony maxima. An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen): every activation value is 0 or 1, and the sequence of values realizes a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC}, where C = 0, V = 1 and # = word edge. Greedy optimization avoids local maxima: why?
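
Slide 57's characterization of the local maxima is easy to operationalize: a 0/1 output string is a sequence of legal syllables exactly when it matches a simple pattern, as in this small checker (the regex encoding is mine).

```python
import re

# A 0/1 activation string (1 = nucleus/V, 0 = onset-or-coda/C) realizes a
# sequence of syllables from {CV, CVC, #V, #VC}: an optional word-initial
# onsetless syllable, then any number of CV or CVC syllables.
LEGAL = re.compile(r"^(1|10)?(01|010)*$")

def is_legal_parse(bits):
    return bool(LEGAL.fullmatch(bits))

print(is_legal_parse("01010"))   # .CV.CVC.  e.g. .tX.zNt.  -> True
print(is_legal_parse("1001"))    # #VC.CV    e.g. .Il.dI.   -> True
print(is_legal_parse("00110"))   # would need a CCV or VV syllable -> False
```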

58 HG → OT's Strict Domination. Strict domination: baffling (no, explicable) from a connectionist perspective: exponential BrbrNet escapes local H maxima; linear BrbrNet does not.

59 Linear BrbrNet makes errors (~ Goldsmith-Larson network). Error: /12378/ → .123.78. (correct: .1.23.78.).

60 Subsymbolic Harmony optimization can be stochastic. The search for an optimal state can employ randomness: the units' activation equations have random terms, pr(a) ∝ e^{H(a)/T}, where T ('temperature') ~ randomness → 0 during the search; Boltzmann Machine (Hinton & Sejnowski 1983, 1986), Harmony Theory (Smolensky 1983, 1986). This can guarantee computation of the global optimum in principle; in practice, how fast? Exponential vs. linear BrbrNet.
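
The stochastic search can be sketched as a standard Boltzmann-machine / simulated-annealing loop: each unit is set to 1 with the Boltzmann probability at temperature T, and T is lowered toward 0. The weights here are arbitrary placeholders, not BrbrNet's.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.normal(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def H(a):
    return 0.5 * a @ W @ a + b @ a

a = rng.integers(0, 2, size=n).astype(float)
T = 2.0
for sweep in range(200):
    for i in rng.permutation(n):
        # Harmony gain from setting unit i to 1 rather than 0
        # (the diagonal is zero, so a[i]'s current value does not matter)
        delta = W[i] @ a + b[i]
        a[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-delta / T)) else 0.0
    T *= 0.97                      # anneal: T -> 0 during the search

print(a, "H =", round(H(a), 3))    # with slow enough annealing, the global
                                   # Harmony maximum is reached in principle
```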

61 Stochastic BrbrNet: exponential weights can succeed 'fast' (plot: 5-run average).

62 Stochastic BrbrNet: linear weights can't succeed 'fast' (plot: 5-run average).

63 Stochastic BrbrNet (linear) (plot: 5-run average).

64 The ICS Architecture (slide 4's diagram, repeated).

