Markedness Optimization in Grammar and Cognition
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
with: Elliott Moreton, Karen Arnold, Donald Mathis, Melanie Soderstrom, Géraldine Legendre, Alan Prince, Peter Jusczyk, Suzanne Stevenson



2 Markedness Optimization in Grammar and Cognition
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
with: Elliott Moreton, Karen Arnold, Donald Mathis, Melanie Soderstrom, Géraldine Legendre, Alan Prince, Peter Jusczyk, Suzanne Stevenson

3 Grammar and Cognition
1. What is the system of knowledge?
2. How does this system of knowledge arise in the mind/brain?
3. How is this knowledge put to use?
4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?
(Chomsky '88, p. 3)

4 A Grand Unified Theory for the cognitive science of language is enabled by Markedness: Avoid α (Jakobson's Program; formalize through OT?)
① Structure: alternations eliminate α; typology: inventories lack α
② Acquisition: α is acquired late
③ Processing: α is processed poorly
④ Neural: brain damage most easily disrupts α

5 Advertisement
The complete story, forthcoming (2003) from Blackwell: The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, Smolensky & Legendre. Overview.

6 Structure | Acquisition | Use | Neural Realization
Theoretical. OT (Prince & Smolensky '91, '93):
– Construct formal grammars directly from markedness principles
– General formalism/framework for grammars: phonology, syntax, semantics; GB/LFG/…
– Strongly universalist: inherent typology
Empirical. OT:
– Allows completely formal markedness-based explanation of highly complex data

7 Structure | Acquisition | Use | Neural Realization
Theoretical. The formal structure enables OT-general learning algorithms:
– Constraint Demotion: provably correct and efficient (when part of a general decomposition of the grammar-learning problem); Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
– Gradual Learning Algorithm: Boersma 1998 et seq.
→ Initial state
Empirical. Initial-state predictions explored through behavioral experiments with infants.
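To make the Constraint Demotion idea concrete, here is a minimal Python sketch (my own, not code from the talk) of the error-driven variant: when the learner's current hierarchy prefers the wrong candidate, the constraints bearing uncancelled winner marks are demoted just below the highest-ranked constraint bearing an uncancelled loser mark. Constraint names and violation data are illustrative.

```python
# A sketch of error-driven Constraint Demotion (after Tesar & Smolensky).
# The hierarchy is a list of strata (sets of constraints), highest first.

def demote(hierarchy, winner_marks, loser_marks):
    """One demotion step for a (winner, loser) pair.

    Marks shared by winner and loser cancel; every constraint with
    uncancelled winner marks that is not already dominated is demoted
    to the stratum just below the highest stratum holding an
    uncancelled loser mark. Assumes every constraint occurs in the
    hierarchy and the loser has at least one uncancelled mark.
    """
    cons = set(winner_marks) | set(loser_marks)
    unc_w = {c for c in cons if winner_marks.get(c, 0) > loser_marks.get(c, 0)}
    unc_l = {c for c in cons if loser_marks.get(c, 0) > winner_marks.get(c, 0)}
    top = min(i for i, s in enumerate(hierarchy) if s & unc_l)
    for c in unc_w:
        i = next(j for j, s in enumerate(hierarchy) if c in s)
        if i <= top:                        # not yet dominated: demote
            hierarchy[i].discard(c)
            if top + 1 == len(hierarchy):
                hierarchy.append(set())
            hierarchy[top + 1].add(c)
    return hierarchy

# The flat initial hierarchy wrongly lets the marked, faithful loser
# win; one step yields *a >> Fa (markedness over faithfulness).
h = [{'*a', 'Fa'}]
print(demote(h, winner_marks={'Fa': 1}, loser_marks={'*a': 1}))
# -> [{'*a'}, {'Fa'}]
```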

8 Structure | Acquisition | Use | Neural Realization
Theoretical. Theorems regarding the computational complexity of algorithms for processing with OT grammars: Tesar '94 et seq.; Ellison '94; Eisner '97 et seq.; Frank & Satta '98; Karttunen '98.
Empirical (with Suzanne Stevenson):
– Typical sentence-processing theory: heuristic constraints
– OT: output for every input; enables incremental (word-by-word) processing
– Empirical results concerning human sentence-processing difficulties can be explained with OT grammars employing independently motivated syntactic constraints
– The competence theory [OT grammar] is the performance theory [human parsing heuristics]

9 Structure | Acquisition | Use | Neural Realization
Theoretical. OT derives from the theory of abstract neural (connectionist) networks via Harmonic Grammar (Legendre, Miyata, Smolensky '90). For moderate complexity, we now have general formalisms for realizing:
– complex symbol structures as distributed patterns of activity over abstract neurons
– structure-sensitive constraints/rules as distributed patterns of strengths of abstract synaptic connections
– optimization of Harmony
Empirical. Construction of a miniature, concrete LAD.

10 Program
Structure: OT constructs formal grammars directly from markedness principles; strongly universalist: inherent typology. OT allows completely formal markedness-based explanation of highly complex data.
Acquisition: Initial-state predictions explored through behavioral experiments with infants.
Neural Realization: Construction of a miniature, concrete LAD.


12 The Great Dialectic
Phonological representations serve two masters.
– Lexical interface: /underlying form/; recoverability: 'match this invariant form' → FAITHFULNESS
– Phonetic interface: [surface form]; often 'minimize effort (motoric & cognitive)' and 'maximize discriminability' → MARKEDNESS
The two are locked in conflict.

13 OT from Markedness Theory
MARKEDNESS constraints: *α: No α.
FAITHFULNESS constraints:
– Fα demands that /input/ → [output] leave α unchanged (McCarthy & Prince '95)
– Fα controls when α is avoided (and how)
Interaction of violable constraints: ranking
– α is avoided when *α ≫ Fα
– α is tolerated when Fα ≫ *α
– M₁ ≫ M₂: combines multiple markedness dimensions

14 OT from Markedness Theory
MARKEDNESS constraints: *α. FAITHFULNESS constraints: Fα. Interaction of violable constraints: ranking
– α is avoided when *α ≫ Fα
– α is tolerated when Fα ≫ *α
– M₁ ≫ M₂: combines multiple markedness dimensions
Typology: all cross-linguistic variation results from differences in ranking: in how the dialectic is resolved (and in how multiple markedness dimensions are combined).

15 OT from Markedness Theory
MARKEDNESS constraints; FAITHFULNESS constraints; interaction of violable constraints: ranking. Typology: all cross-linguistic variation results from differences in ranking, i.e., in resolution of the dialectic.
Harmony = MARKEDNESS + FAITHFULNESS
– A formally viable successor to Minimize Markedness is OT's Maximize Harmony (among competitors).
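To make the ranking logic concrete, here is a minimal EVAL sketch (mine, not the talk's): a ranking is an ordered list of constraints, and the maximal-Harmony candidate is the one whose violation vector, read in ranking order, is lexicographically least. Swapping *a and Fa flips between avoiding and tolerating the marked structure.

```python
# A minimal OT evaluation sketch; constraint and candidate names are
# illustrative stand-ins for *α and Fα from the slides.

def eval_ot(candidates, ranking):
    return min(candidates,
               key=lambda c: tuple(c['viols'].get(k, 0) for k in ranking))

# /a/ -> [a] is faithful but marked; /a/ -> [b] repairs the markedness.
candidates = [
    {'out': '[a]', 'viols': {'*a': 1, 'Fa': 0}},
    {'out': '[b]', 'viols': {'*a': 0, 'Fa': 1}},
]
print(eval_ot(candidates, ['*a', 'Fa'])['out'])  # [b]: a is avoided
print(eval_ot(candidates, ['Fa', '*a'])['out'])  # [a]: a is tolerated
```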

16 Structure: Explanatory goals achieved by OT
Individual grammars are literally and formally constructed directly from universal markedness principles.
Inherent typology: the analysis of phenomenon Φ in language L inherently contains a typology of Φ across all languages.

17 Program
Structure: OT constructs formal grammars directly from markedness principles; strongly universalist: inherent typology. OT allows completely formal markedness-based explanation of highly complex data.
Acquisition: Initial-state predictions explored through behavioral experiments with infants.
Neural Realization: Construction of a miniature, concrete LAD.

18 Markedness and Inventories (theoretical part)
An inventory structured by markedness: an inventory I is harmonically complete (HC) iff x ∈ I and y (strictly) less marked than x implies y ∈ I.
A typology structured by markedness: a typology T is strongly harmonically complete (SHarC) iff L ∈ T if and only if L is harmonically complete. (Prince & Smolensky '93: Ch. 9)
Are OT inventories harmonically complete? Are OT typologies SHarC?
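The HC definition is directly checkable. A small sketch (a hypothetical encoding, not from the talk): represent each element by the set of its marked feature values, so that "y strictly less marked than x" is proper subset of marks, and HC is downward closure under that order.

```python
# Harmonic completeness as downward closure under 'fewer marks'.

from itertools import combinations

def harmonically_complete(inventory):
    inv = {frozenset(x) for x in inventory}
    return all(frozenset(sub) in inv
               for x in inv
               for r in range(len(x))
               for sub in combinations(sorted(x), r))

# English obstruents w.r.t. Place/continuancy: t = {}, k = {velar},
# s = {+cont}, x = {velar, +cont}.
print(harmonically_complete([set(), {'velar'}, {'+cont'}]))           # True
print(harmonically_complete([set(), {'velar'}, {'velar', '+cont'}]))  # False
```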

19 Harmonic Completeness
The English obstruent inventory is HC w.r.t. Place/continuancy:

                      −velar    +velar (*[velar])
  −cont:                t          k
  +cont (*[+cont]):     s         *x

… but it is not generable by ranking {*[velar], *[+cont]; F_Place, F_cont}. Such an inventory Bans Only the Worst Of the Worst (BOWOW).

20 Local Conjunction
Local conjunction: *[+cont] &seg *[velar] is violated when both conjuncts are violated in the same segment.
Crucial to distinguish where the violations fall: in *[taxi], the single segment [x] violates *[+cont] and *[velar] together, which is fatal; in [saki], *[+cont] (by [s]) and *[velar] (by [k]) are violated in different segments, so the conjunction is satisfied. W.r.t. the segment inventory, only x, the worst of the worst, is banned.
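A small sketch of local conjunction as a constraint combinator (my illustration; the feature assignments are simplified): the conjoined constraint fires only when both conjuncts are violated within the same local domain, here a segment.

```python
# Local conjunction over segments, applied to the taxi/saki contrast.

FEATURES = {'t': set(), 'k': {'velar'}, 's': {'+cont'},
            'x': {'velar', '+cont'}, 'a': set(), 'i': set()}

def star(feature):
    """Markedness constraint *[feature]: one violation per segment."""
    return lambda seg: int(feature in FEATURES[seg])

def conjoin_seg(c1, c2):
    """c1 &seg c2: violated iff c1 and c2 are violated in one segment."""
    return lambda seg: int(c1(seg) and c2(seg))

both = conjoin_seg(star('+cont'), star('velar'))

for word in ('taxi', 'saki'):
    print(word, sum(both(seg) for seg in word))
# taxi 1  ([x] violates both conjuncts in the same segment: fatal)
# saki 0  ([s] and [k] violate one constraint each, in different segments)
```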

21 Basic Inventories/Typologies
Formal analysis of HC/SHarC in OT: definitions.
Basic inventory I[Φ] of elements of type T, where Φ = {φₖ}:
– Candidates: {X} = {[±φ₁, ±φ₂, ±φ₃, ±φ₄, …]}
– Con: MARK = {*[+φ₁], *[−φ₂], …}; FAITH = {F[φ₁], F[φ₂], …}
– I[Φ]: the inventory generated by a ranking of Con
Basic typology T[Φ]: all rankings of Con.
Basic typology with local conjunction T^LC[Φ]: all rankings of Con_LC = Con + all conjunctions of constraints in MARK, local to T.

22 SHarC Theorem
T[Φ]: each language is HC, but the SHarC property does not hold.
T^LC[Φ]: each language is HC, and the SHarC property holds.
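Both halves of the theorem can be checked by brute force in a toy setting. The following sketch (mine; two binary features, with invented constraint names '*1', '*2', '*1&2', 'F1', 'F2') enumerates all rankings, computes each ranking's inventory under richness of the base, and confirms that the BOWOW inventory appears only once the local conjunction is added, while every generated language is HC either way.

```python
# Brute-force SHarC check for a two-feature basic inventory system.

from itertools import permutations, product

ELEMENTS = list(product((0, 1), repeat=2))   # candidates [±φ1, ±φ2]

def viols(con, inp, out):
    if con == '*1':   return out[0]                   # *[+φ1]
    if con == '*2':   return out[1]                   # *[+φ2]
    if con == '*1&2': return int(out[0] and out[1])   # local conjunction
    if con == 'F1':   return int(inp[0] != out[0])
    if con == 'F2':   return int(inp[1] != out[1])

def inventory(ranking):
    best = lambda inp: min(
        ELEMENTS, key=lambda o: tuple(viols(c, inp, o) for c in ranking))
    return frozenset(best(i) for i in ELEMENTS)   # richness of the base

def hc(inv):   # downward closed: fewer '+' values = less marked
    return all(y in inv for x in inv for y in ELEMENTS
               if all(y[k] <= x[k] for k in (0, 1)))

basic   = {inventory(r) for r in permutations(('*1', '*2', 'F1', 'F2'))}
with_lc = {inventory(r) for r in
           permutations(('*1', '*2', '*1&2', 'F1', 'F2'))}

bowow = frozenset({(0, 0), (1, 0), (0, 1)})  # ban only the worst, (1,1)
print(all(hc(L) for L in basic),   bowow in basic)    # True False
print(all(hc(L) for L in with_lc), bowow in with_lc)  # True True
```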

23 Empirical Relevance (empirical part)
Local conjunction has seen many empirical applications; here, vowel harmony.
Lango (Nilotic, Uganda) ATR harmony:
– Woock & Noonan '79
– Archangeli & Pulleyblank '91 et seq., esp. '94
Markedness constraints:
– *[+ATR, −hi/fr], *[−ATR, +hi/fr] (marked articulatorily)
– *[+A]/σ_closed
– HD-L[ATR]
Rather than imposing a parametric superstructure on spreading rules (A&P '94), we build the grammar directly from these markedness constraints.

24 Lango ATR Harmony
Inventory of ATR domains D[ATR] (~ tiers); vowel harmony renders many possibilities ungrammatical.
'your SING/PLUR stew':
– /dɛ̀k+Cí/ → [dèkkí], not *[dɛ̀kkí]: [+ATR] spreads leftward from í
– /dɛ̀k+wú/ → [dɛ̀kwú], not *[dèkwú]: no spread from ú
Critical difference: i is [+fr], u is [−fr]; a [−fr] vowel is a 'worse' source for [+ATR] spread, violating *[+ATR, −fr] (marked w.r.t. ATR).
Complex system: interaction of 6 dimensions (2⁶ = 64 distinct environments).


26 /dɛ̀k+Cí/ → [dèkkí]

27 /dɛ̀k+wú/ → [dɛ̀kwú]


29 The Challenge Need a grammatical framework able to handle this nightmarish descriptive complexity while staying strictly within the confines of rigidly universal principles

30 Lango rules (Archangeli & Pulleyblank '94)
[+ATR] rules:
– α: [+ATR] spreads V C V
– β: [+ATR] spreads V (C)C V
[−ATR] rules:
– a: [−ATR] spreads V C V, [−hi]
– b: [−ATR] spreads V (C)C V, [−hi]
– c: [−ATR] spreads V (C)C V, [−hi, −fr]
[+ATR] rule:
– x: [+ATR] spreads V (C)C V, [−hi, −fr]



33 *[+A]/σ_closed &_{D[A]} *[−hi, +A]/HD[A]
'No [+ATR] spread into a closed syllable from a [−hi] source'
(cf. *[+cont] &seg *[velar])

34 BOWOW
*[−hi, −A] & HD-L[−A]
'No regressive [−ATR] spread from a [−hi] source'

35 Ranking (figure): the conjoined markedness constraints (X, Y, Z: conjunctions involving *[−A]; 1, 2, 3: conjunctions involving *[+A]) ≫ AGREE ≫ F[A]

36 The Challenge Need a grammatical framework able to handle this nightmarish descriptive complexity while staying strictly within the confines of rigidly universal principles

37 Inherent Typology Method applicable to related African languages, where the same markedness constraints govern the inventory (Archangeli & Pulleyblank ’94), but with different interactions: different rankings and active conjunctions Part of a larger typology including a range of vowel harmony systems

38 Structure: Summary
– OT builds formal grammars directly from markedness: MARK, with FAITH
– Inventories consistent with markedness relations are formally the result of OT with local conjunction: T^LC[Φ], SHarC theorem
– Even highly complex patterns can be explained purely with simple markedness constraints: all complexity lies in the constraints' interaction through ranking and conjunction: Lango ATR harmony

39 Program
Structure: OT constructs formal grammars directly from markedness principles; strongly universalist: inherent typology. OT allows completely formal markedness-based explanation of highly complex data.
Acquisition: Initial-state predictions explored through behavioral experiments with infants.
Neural Realization: Construction of a miniature, concrete LAD.

40 The Initial State
OT-general: MARKEDNESS ≫ FAITHFULNESS.
– Learnability demands it (Richness of the Base) (Alan Prince, p.c., '93; Smolensky '96a)
– Child production: restricted to the unmarked
– Child comprehension: not so restricted (Smolensky '96b)

41 Experimental Exploration of the Initial State
Collaborators: Peter Jusczyk, Theresa Allocco (Language Acquisition, 2002); Karen Arnold, Elliott Moreton (in progress).
Grammar at 4.5 months?

42 Experimental Paradigm
Headturn Preference Procedure (Kemler Nelson et al. '95; Jusczyk '97); X/Y/XY paradigm (P. Jusczyk): a highly general paradigm.
Main result: um…bə…umbə vs. un…bə…umbə, p = .006 → ∃ FAITH
Also: iŋ…gu…iŋgu vs. iŋ…gu…umbə

43 Linking Hypothesis
The experimental results are challenging to explain. Suppose stimuli A and B differ w.r.t. φ, and the child's grammar has MARK[φ] ≫ FAITH[φ] ('M ≫ F'). Then: if A is consistent with M ≫ F and B is consistent with F ≫ M, the child should 'prefer' (attend longer to) A: 'A > B'. Here MARK[φ] = Nasal Place Agreement.
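Stated as a decision rule, the linking hypothesis is a one-liner. A sketch (my own coding of stimulus consistency, purely illustrative):

```python
# The linking hypothesis as a decision rule over stimulus codings.

def predicted_preference(a, b):
    """a, b: which ranking each stimulus is consistent with."""
    if a == 'M>>F' and b == 'F>>M':
        return 'A'          # attend longer to A
    if b == 'M>>F' and a == 'F>>M':
        return 'B'
    return None             # no differential prediction

# e.g. A = um...ba...umba (place-agreeing), B = un...ba...umba
print(predicted_preference('M>>F', 'F>>M'))  # -> 'A'
```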

44 Experimental Results
Prediction: if A is consistent with M ≫ F and B with F ≫ M, then A > B.
Stimulus types: m+b → mb; n+b → nb; n+b → mb.
Results (A > B in each comparison):
– p < .05: ∃ MARK
– p < .001: nb → mb; M ≫ F
– p < .05: the n/m difference is detectable
Control: n+b → nd, p > .40; /n+b/: nd ≺_UG mb; p > .30: *UG or unreliability.

45 Program
Structure: OT constructs formal grammars directly from markedness principles; strongly universalist: inherent typology. OT allows completely formal markedness-based explanation of highly complex data.
Acquisition: Initial-state predictions explored through behavioral experiments with infants.
Neural Realization: Construction of a miniature, concrete LAD.

46 A LAD for OT
Acquisition hypothesis: universals are genetically encoded; learning is search among the UG-permitted grammars.
Question: is this even possible?
Collaborators: Melanie Soderstrom, Donald Mathis

47 UGenomics The game: Take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device ¿ Proteins ⇝ Universal grammatical principles ? Time to willingly suspend disbelief …

48 UGenomics The game: Take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device ¿ Proteins ⇝ Universal grammatical principles ? Case study: Basic CV Syllable Theory (Prince & Smolensky ’93) Innovation: Introduce a new level, an ‘abstract genome’ notion parallel to [and encoding] ‘abstract neural network’

49 UGenome for CV Theory
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome

50 UGenomics: Symbolic Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome

51 Basic syllabification: Function
Basic CV Syllable Structure Theory: 'basic' = no more than one segment per syllable position: .(C)V(C).
ƒ: /underlying form/ → [surface form]
/CVCC/ → [.CV.CVC.] (second V epenthetic), e.g. /pæd+d/ → [pædəd]
Correspondence Theory (McCarthy & Prince 1995, 'M&P'):
/C₁V₂C₃C₄/ → [.C₁V₂.C₃VC₄] (the epenthetic V has no input correspondent)

52 Syllabification: Constraints (Con)
PARSE: every element of the input corresponds to an element in the output.
ONSET: no V without a preceding C.
Etc.
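A hand-built tableau makes the evaluation concrete. In this sketch (mine: the candidate set and violation counts are entered by hand, not produced by a real GEN; FILL, penalizing epenthetic positions, is assumed from Prince & Smolensky's Basic CV theory), ranking PARSE above FILL yields the epenthesis mapping from slide 51.

```python
# Tableau for /CVCC/ under two rankings of PARSE and FILL.

def optimum(tableau, ranking):
    return min(tableau, key=lambda c: tuple(tableau[c][k] for k in ranking))

tableau = {                       # input: /CVCC/
    '[.CV.CVC.] (V epenthetic)':   {'PARSE': 0, 'FILL': 1},
    '[.CVC.] (final C unparsed)':  {'PARSE': 1, 'FILL': 0},
}
print(optimum(tableau, ['PARSE', 'FILL']))  # epenthesis, as on slide 51
print(optimum(tableau, ['FILL', 'PARSE']))  # underparsing instead
```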

53 UGenomics: Neural Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome

54 CVNet Architecture
/C₁C₂/ → [C₁VC₂]
(Figure: banks of C and V units representing the input /C₁C₂/ and the output [C₁VC₂]; '1', '2' index the input segments.)

55 Connections: PARSE
All connection coefficients are +2. (Figure: the weight pattern over the C/V unit grid.)

56 Connections: ONSET
All connection coefficients are −1. (Figure: the weight pattern over the C/V unit grid.)

57 CVNet Dynamics
Boltzmann machine/Harmony network (Hinton & Sejnowski '83 et seq.; Smolensky '83 et seq.):
– stochastic activation-spreading algorithm: higher Harmony → more probable
– CVNet innovation: connections realize fixed symbol-level constraints with variable strengths
– learning: modification of the Boltzmann machine algorithm to the new architecture
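For reference, here is the generic stochastic update at the heart of a Boltzmann machine/Harmony network (a textbook sketch, not the CVNet code): each unit turns on with probability that grows with the Harmony gain, and lowering the temperature T sharpens the search.

```python
# Generic Boltzmann-machine / Harmony-network dynamics: symmetric
# weights W, binary units a, Harmony H = 1/2 * sum_ij W[i][j]*a[i]*a[j].

import math, random

def harmony(W, a):
    n = len(a)
    return 0.5 * sum(W[i][j] * a[i] * a[j]
                     for i in range(n) for j in range(n))

def sweep(W, a, T):
    """One asynchronous stochastic update of every unit, random order."""
    n = len(a)
    for i in random.sample(range(n), n):
        gain = sum(W[i][j] * a[j] for j in range(n))  # H(a_i=1) - H(a_i=0)
        a[i] = 1 if random.random() < 1 / (1 + math.exp(-gain / T)) else 0
    return a

# Two mutually supporting units tend to settle into the 'both on',
# maximal-Harmony state as T is annealed downward.
W = [[0, 2], [2, 0]]
a = [0, 0]
for T in (2.0, 1.0, 0.5, 0.1):    # simple annealing schedule
    a = sweep(W, a, T)
print(a, harmony(W, a))
```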

58 UGenomics: Genome Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome

59 Connectivity geometry
Assume 3-d grid geometry (e.g., gradients). (Figure: planes of C and V units; axis labels 'E', 'N', 'back'.)

60 Connectivity: PARSE
Input units grow south and connect. Output units grow east and connect. Correspondence units grow north & west and connect with input & output units.

61 Connectivity: ONSET
(Figure: projections among C/V/x₀ units, e.g. x₀ segment: S S V_O and N S x₀; V_O segment: N&S S V_O; see the gene map below.)

62 Connectivity Genome
Contributions from ONSET and PARSE:
Source units: C_I, V_I, C_O, V_O, C_C, V_C, x₀
Projections: S L C_C; S L V_C; E L C_C; E L V_C; N&S S V_O; N S x₀; N L C_I; W L C_O; N L V_I; W L V_O; S S V_O
Key: direction N(orth), S(outh), E(ast), W(est), F(ront), B(ack); extent L(ong), S(hort); target: input C_I, V_I; output C_O, V_O, x(0); correspondence C_C, V_C

63 CVGenome: Connectivity

64 Abstract Gene Map
General Developmental Machinery | Connectivity | Constraint Coefficients
Connectivity genes specify direction, extent, and target, e.g. for CORRESPOND: C-I: S L C_C; V-I: S L V_C.
Constraint-coefficient genes, e.g. for RESPOND: C_C: C_I & C_O, +1; V_C: V_I & V_O, +1.
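The genome idea can be phrased as a tiny data structure plus a growth rule. This sketch is my own illustration (the field names, grid decoder, and example units are hypothetical): a connectivity gene tells units of one type to project in a direction, over an extent, onto a target type.

```python
# An 'abstract genome' connectivity gene and its growth rule.

from dataclasses import dataclass

@dataclass
class ConnGene:
    source: str      # unit type carrying the gene, e.g. 'C_I'
    direction: str   # 'N', 'S', 'E', 'W', 'F', 'B'
    extent: str      # 'L'(ong) or 'S'(hort)
    target: str      # unit type to synapse onto, e.g. 'C_C'

STEP = {'N': (0, 1, 0), 'S': (0, -1, 0), 'E': (1, 0, 0),
        'W': (-1, 0, 0), 'F': (0, 0, 1), 'B': (0, 0, -1)}

def grow(gene, units, long_reach=3):
    """units: dict position -> unit type on a 3-d grid; returns links."""
    dx, dy, dz = STEP[gene.direction]
    reach = long_reach if gene.extent == 'L' else 1
    links = []
    for (x, y, z), kind in units.items():
        if kind != gene.source:
            continue
        for k in range(1, reach + 1):
            tgt = (x + k * dx, y + k * dy, z + k * dz)
            if units.get(tgt) == gene.target:
                links.append(((x, y, z), tgt))
    return links

# A PARSE-style gene: input C units project south (long) onto
# correspondence C units.
units = {(0, 2, 0): 'C_I', (0, 0, 0): 'C_C'}
print(grow(ConnGene('C_I', 'S', 'L', 'C_C'), units))
# -> [((0, 2, 0), (0, 0, 0))]
```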

65 CVGenome: Connection Coefficients

66 UGenomics Realization of processing and learning algorithms in ‘abstract molecular biology’, using the types of interactions known to be biologically possible and genetically encodable

67 UGenomics
Host of questions to address:
– Will this really work?
– Can it be generalized to distributed nets?
– Is the number of genes [77 = 0.26%] plausible?
– Are the mechanisms truly biologically plausible?
– Is it evolvable?
– How is strict domination to be handled?

68 Hopeful Conclusion
Progress is possible toward a Grand Unified Theory of the cognitive science of language:
– addressing the structure, acquisition, use, and neural realization of knowledge of language
– strongly governed by universal grammar
– with markedness as the unifying principle
– as formalized in Optimality Theory at the symbolic level
– and realized via Harmony Theory in abstract neural nets which are potentially encodable genetically

69 Hopeful Conclusion
Progress is possible toward a Grand Unified Theory of the cognitive science of language. Still lots of promissory notes, but all in a common currency: Harmony ≈ unmarkedness; hopefully this will promote further progress by facilitating integration of the sub-disciplines of cognitive science.
Thank you for your attention (and indulgence).

