Formal Typology: Explanation in Optimality Theory. Paul Smolensky, Cognitive Science Department, Johns Hopkins University. With: Géraldine Legendre, Donald Mathis, Melanie Soderstrom, Alan Prince, Suzanne Stevenson, Peter Jusczyk †
Advertisement: The Harmonic Mind: From neural computation to optimality-theoretic grammar. Paul Smolensky & Géraldine Legendre. Blackwell 2002 (??). Develop the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture; apply to the theory of grammar
Chomsky 1988 “1. What is the system of knowledge? 2. How does this system of knowledge arise in the mind/brain? 3. How is this knowledge put to use? 4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?” (p. 3)
Responsibilities of Grammatical Theory Chomsky’s “Big 4” questions concerning knowledge of grammar Structure Acquisition Processing Neuro-genetics Nativist hypothesis OT ① ① ② ③ ④ Not new to Chomsky or generative grammar …
Jakobson’s Program Linguistic theory is not just for theoretical linguists The same principles that explain formal cross-linguistic and language-internal distributional patterns can also explain Acquisition Processing Neurological breakdown
Jakobson’s Program Markedness enables a Grand Unified Theory for the cognitive science of language: Avoid α ① Structure Inventories lack α Alternations eliminate α ② Acquisition α is acquired late ③ Processing α is processed poorly ④ Neural Brain damage most easily disrupts α
Talk Plan ① Structure ② Acquisition ③ Processing ④ Neuro- genetics OT Explanation Formal result(s) Jakobson’s program Question Achieves goal Empirical insights
Responsibilities of Grammatical Theory Chomsky’s “Big 4” questions concerning knowledge of grammar Structure Structure of UG: Captured in a general formalism for grammars and their variation OT ① ⇒ Possible strong version – Explanatory Goal ① : Analysis of phenomenon Φ in language L Universal typology of phenomenon Φ ① Inherent typology Acquisition Processing Neuro-genetics
From Markedness to OT Formalizing markedness ⋯ OT Markedness constraints Faithfulness constraints Competition Strict domination Strong universality & Richness of the Base
Structure: Formal Result Formalizing Markedness: Two Problems Goal: Change epiphenomenal explanatory status of markedness Markedness “explains grammars (e.g., rules)”; informal commentary about grammar vs. Markedness IS grammar: markedness- grammars formally determine languages
Structure: Formal Result Formalizing Markedness: Two Problems Problem 1: Multidimensional integration Each dimension of linguistic structure independently has its own marked pole, but how do these dimensions combine? Turns out to be related to another fundamental problem:
Structure: Formal Result Formalizing Markedness: Two Problems “α is marked” ⇝ “Avoid α”. But when & how does “avoidance” happen? Problem 2: Pervasive variability in “avoidance”. Inventories: If [θ] is absent in French “because it is marked”, how can it be present in English “despite being marked”? ¿ The grammar of every language turns on or off “No α” = *α, a markedness constraint ? OT: a more subtle version that also solves Problem 1. Alternations: If in environment E, α → β “because α is more marked than β”, how do we explain that in E, α ↛ β “even though” α is more marked than β?
Structure: Formal Result Formalizing Markedness Most crudely: Why aren’t marked elements always avoided? Something must oppose markedness forces. Markedness cannot be the sole basis of a formal grammatical theory: it is only one half of the complete story.
Structure: Formal Result The Great Dialectic. Phonological representations serve two masters, locked in eternal conflict. Phonetic interface (Phonetics): [surface form]; often ‘minimize effort (motoric & cognitive)’, ‘maximize discriminability’: MARKEDNESS. Lexical interface (Lexicon): /underlying form/; ‘be this invariant form’: FAITHFULNESS
Structure: Formal Result The Core Constraints of Con. MARKEDNESS: *α (“minimize effort; maximize distinctiveness”). “Constraint *α ∈ Con” when α meets empirical criteria for ‘marked’. Freedom? Empirically constrained by universal patterns. FAITHFULNESS (“be this invariant form”): /input/ → [output] is the identity map, i.e., elements /x/ and [x] are in one-to-one correspondence and identical (McCarthy & Prince ’95). Constraints: MAX(x), DEP(x), IDENT(x), … Essentially determined by elements {x} of representation. Freedom? Representations, as always: empirically constrained to allow statement of markedness constraints. ¿ “In OT you can invent any constraint you want” ?
Structure: Formal Result Conflict Dialectic: MARK vs. FAITH conflict. Why aren’t marked elements always avoided? Because sometimes MARK is over-ruled by FAITH. Why aren’t words always pronounced in their invariant, lexical form? Because sometimes FAITH is over-ruled by MARK. C1 over-rules (dominates) C2: C1 ≫ C2. Whether M gets violated (whether marked elements fail to ‘be avoided’) varies by Language (in some, M ≫ F; in others, F ≫ M) and by Context (in some, M ≫ F2; in others, F1 ≫ M)
Structure: Formal Result Conflict Dialectic: MARK vs. FAITH conflict. Whether M gets violated (whether marked elements fail to ‘be avoided’) varies by Language (in some, M ≫ F; in others, F ≫ M) and by Context (in some, M ≫ F2; in others, F1 ≫ M). Why is there cross-linguistic variation? The Phonetic vs. Lexical ~ MARK vs. FAITH dialectic gets resolved differently. Typology by re-ranking: Factorial Typology. {possible human languages} ~ {rankings of Con} (n constraints give n! rankings; many are equivalent)
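The remark that n constraints give n! rankings, many of them equivalent, can be illustrated with a minimal sketch. Everything in it is an illustrative assumption rather than material from the talk: the three-constraint toy Con, the single input /CVC/, the three hand-picked candidates, and the function names.

```python
from itertools import permutations

# Hypothetical toy Con and candidate set for the input /CVC/ (illustration only).
CON = ("NoCoda", "Max", "Dep")
CANDIDATES = {
    "CVC": {"NoCoda": 1},    # faithful output, keeps the coda
    "CV": {"Max": 1},        # deletes the final consonant
    "CV.CV": {"Dep": 1},     # epenthesizes a vowel
}

def optimal(ranking, candidates):
    """Pick the candidate with the best violation profile under strict domination.

    Profiles are tuples ordered from highest- to lowest-ranked constraint,
    so ordinary tuple comparison is exactly lexicographic comparison.
    """
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

def factorial_typology(con, candidates):
    """Group all n! rankings of con by the output they select."""
    languages = {}
    for ranking in permutations(con):
        languages.setdefault(optimal(ranking, candidates), []).append(ranking)
    return languages

typology = factorial_typology(CON, CANDIDATES)
for output, rankings in sorted(typology.items()):
    print(output, len(rankings), "rankings")
```

In this toy case the 3! = 6 rankings collapse into three distinct languages, because only the identity of the lowest-ranked constraint matters for this input.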
Structure: Formal Result Formalizing Markedness. Problem 2: ‘Avoidance of the marked’ is pervasively variable; exactly where does marked material appear? Solution: Constraint ranking, MARK w.r.t. FAITH. Will now see this also solves Problem 1: Multidimensional markedness. Solution: a single constraint ranking for all constraints in a given language
Structure: Formal Result Formalizing Markedness. Markedness is multidimensional. Each dimension has its universally marked pole. How do dimensions combine? (M1, *M2) vs. (*M1, M2). E.g., CV́C.CV (STRESSHEAVY, *MAINSTRESSRIGHT) vs. CVC.CV́. Integrate via a common markedness currency: Harmony. Numerical: *M1 = 3.2; *M2 = 2.8. Symbolic: *M1 absolutely worse than *M2 (see below). OT: For a given language, there is a single constraint ranking for all constraints. Strict domination hierarchy: markedness on higher-ranked constraints can never be compensated for by unmarkedness on lower-ranked ones
Structure: Formal Result Competition for Optimality. Given an input, an OT grammar does not provide a procedure for how to construct the output, but rather a description of the output: the structure that best-satisfies the constraint ranking. Best-satisfies is a comparative criterion; outputs compete and the grammar identifies the winner: the optimal (grammatical, highest-Harmony) output for that input
Structure: Formal Result Harmonic Competition. Numerical Harmony: Stress is on the initial heavy syllable iff the number of light syllables n obeys [inequality not recoverable from the slide]. Pathological grammars. “Grammars can’t count”
Structure: Formal Result Harmonic Competition. Symbolic Harmony: Strict domination. STRESSHEAVY ≫ MAINSTRESSRIGHT: stress the initial heavy syllable. MAINSTRESSRIGHT ≫ STRESSHEAVY: stress the final syllable. Strict domination: “Grammars can’t count”
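The contrast between numerical and symbolic Harmony can be sketched as follows. The assumptions are loud ones: the weights 3.2 and 2.8 are borrowed from the earlier numerical illustration and assigned to STRESSHEAVY and MAINSTRESSRIGHT arbitrarily, MAINSTRESSRIGHT is assumed to be violated once per syllable separating the stress from the right edge, and the word is assumed to be one initial heavy syllable followed by n ≥ 1 light syllables.

```python
def violations(n_light):
    """Violation counts for the two stress candidates (hypothetical toy model)."""
    return {
        "initial": {"StressHeavy": 0, "MainStressRight": n_light},
        "final": {"StressHeavy": 1, "MainStressRight": 0},
    }

def numerical_winner(n_light, weights):
    """Numerical Harmony: weighted violations are summed, so counts can trade off."""
    cands = violations(n_light)
    def harmony(name):
        return -sum(weights[c] * v for c, v in cands[name].items())
    return max(cands, key=harmony)

def strict_winner(n_light, ranking):
    """Strict domination: compare profiles lexicographically, highest rank first."""
    cands = violations(n_light)
    return min(cands, key=lambda name: tuple(cands[name][c] for c in ranking))

WEIGHTS = {"StressHeavy": 3.2, "MainStressRight": 2.8}  # assumed assignment
for n in (1, 2, 3):
    print(n,
          numerical_winner(n, WEIGHTS),
          strict_winner(n, ("StressHeavy", "MainStressRight")),
          strict_winner(n, ("MainStressRight", "StressHeavy")))
```

Under these assumptions the weighted grammar moves the stress as soon as the count of light syllables passes a threshold, which is the "counting" behaviour the slide calls pathological, while each strict ranking gives the same answer for every n.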
Structure: Formal Result OT: ‘Formal’ definition. Gen: Specifies candidate outputs for any given input. Con: The constraint set. A grammar: A hierarchical ranking of Con. H-Eval: Given two candidates and a ranking, a formal definition, employing strict domination, of which has higher Harmony (which better-satisfies the ranking). I → O mapping: I ↦ the maximal-Harmony candidate[s] in Gen(I)
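The four components above can be put together in a minimal sketch. It is not the formal definition itself: Gen is replaced by a hypothetical finite lookup table, the violation counts come from a crude length-based toy, and all function names are invented for illustration.

```python
def more_harmonic(a, b, ranking):
    """H-Eval sketch: a beats b if, at the highest-ranked constraint on which
    their violation counts differ, a has fewer violations."""
    for constraint in ranking:
        va, vb = a.get(constraint, 0), b.get(constraint, 0)
        if va != vb:
            return va < vb
    return False  # identical profiles: neither is more harmonic

def io_map(inp, gen, evaluate, ranking):
    """Return the maximal-Harmony candidate(s) among gen(inp)."""
    cands = list(gen(inp))
    profiles = {c: evaluate(inp, c) for c in cands}
    return [c for c in cands
            if not any(more_harmonic(profiles[d], profiles[c], ranking)
                       for d in cands)]

# Hypothetical finite stand-in for Gen; the real Gen may be infinite.
TOY_GEN = {"CVC": ["CVC", "CV", "CV.CV"]}

def toy_evaluate(inp, out):
    """Toy Con: NoCoda counts consonant-final syllables; Max and Dep are
    approximated by comparing segment counts (an assumption, not real
    correspondence theory)."""
    segments = out.replace(".", "")
    return {
        "NoCoda": sum(1 for syl in out.split(".") if syl.endswith("C")),
        "Max": max(0, len(inp) - len(segments)),
        "Dep": max(0, len(segments) - len(inp)),
    }

print(io_map("CVC", TOY_GEN.__getitem__, toy_evaluate, ("NoCoda", "Dep", "Max")))
print(io_map("CVC", TOY_GEN.__getitem__, toy_evaluate, ("Max", "Dep", "NoCoda")))
```

The grammar here is nothing but the ranking tuple passed to `io_map`; changing the ranking changes the language while Gen and Con stay fixed.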
Structure: Formal Result Richness of the Base. Universality: All systematic cross-linguistic variation arises from differences in constraint ranking. Therefore: Con is universal; H-Eval is universal; Gen is universal, including the space of possible inputs as well as possible outputs. I.e.: No systematic cross-linguistic variation is due to differences in inputs. E.g.: Languages with no surface codas cannot get this property from limitations on the lexicon (e.g., a morpheme structure constraint *C]wd), but rather from the ranking. I.e.: The grammar must have the property that even if there were C-final inputs, there would still be no surface codas
Aside Richness of the Base is a principle for inducing a grammar (generalizing) from a set of grammatical items It can be justified by the central principle of John Goldsmith’s presentation: Maximize the probability of the data
Structure: Conceptual “Question” Explanatory Power “OT is as unexplanatory as extrinsically-ordered rule-theory” Stipulating ranking ~ stipulating ordering Actually, OT achieves Explanatory Goal ①, Inherent Typology : In the analysis of phenomenon Φ in one language is inherent a typology of Φ in all languages Structure: Explanatory Goal Inherent Typology
Structure: Conceptual “Question” Analytic Restrictiveness. “You can make up any constraint you want in OT”. Actually, in OT, positing a constraint C in the analysis of a language L necessarily has a huge number of empirically falsifiable implications (one consequence of Inherent Typology). E.g., two pervasive patterns generated by ‘C ∈ Con’. Structure: Explanatory Goal Robust Falsifiability
Structure: Explanatory Goal Consequences of ‘C ∈ Con’ – I: The Subordination Pattern. E.g., C = NOCODA. Recall: If ‘No codas’ is in UG, why do codas ever appear? Conflict: with faithfulness constraints; with other markedness constraints (other dimensions of markedness). Cross-linguistic variation: codas are less and less restricted as NOCODA is subordinated to more and more conflicting constraints (i.e., dimensions of markedness)
Structure: Empirical Application Subordination Pattern: Codas. Patterns: No codas at all; Codas only in stressed syllables; … + Geminate codas; Codas unrestricted; … except prohibited inter-vocalically [~V.CV~]. Constraints shown: STRESS-TO-WEIGHT, MAXμ, MAX, NOCODA
Structure: Conceptual “Question” Multiplicity of Constraints. For the second pervasive pattern generated by ‘C ∈ Con’ (positing a constraint C): “Any framework which leads to the morass of constraints found in OT analyses in phonology cannot possibly be explanatorily adequate.” Actually, OT interaction-via-domination replaces many rules by fewer constraints. Structure: Explanatory Goal Factorial Interaction
Structure: Explanatory Goal Consequences of ‘C ∈ Con’ – II: Factorial Interaction. ‘Factorial interaction’: with varying interaction (re-ranking), n simple modular constraints correspond to: a multiplicity of rules (many more than n); complex, non-modular rules; rules + representational/notational tricks; rules + constraints. E.g., C = NOCODA
Structure: Empirical Application Factorial Interaction: Codas. Consider Con {MAX} ↪ {MAX, DEP}. The number of constraints increases by 1. The number of corresponding rules doubles, as the set of ‘repairs’ now includes epenthesis as well as deletion:
NOCODA ≫ MAX ~ C → Ø/— σ]   ↪   NOCODA ≫ DEP ~ Ø → V/C σ]—
ONSET ≫ MAX ~ V → Ø/[σ —   ↪   ONSET ≫ DEP ~ Ø → C/[σ —V
Structure: Empirical Application Factorial Interaction: Codas. MARKEDNESS ≫ FAITHFULNESS, with the comparable rule in each cell:
FAITHFULNESS \ MARKEDNESS    NOCODA            ONSET
MAX                          C → Ø/— σ]        V → Ø/[σ —
DEP                          Ø → V/C σ]—       Ø → C/[σ —V
In general, the number of comparable rules increases much faster than the number of constraints
Structure: Empirical Application Factorial Interaction: Codas.
STRESS-TO-WEIGHT ≫ NOCODA: Codas only in stressed syllables. Rule: C → Ø/— σ̆], a segmental rule sensitive to foot structure [‘non-modular rules’].
ANCHOR-R ≫ NOCODA: Codas only word-finally. Rule: C → Ø/— σ] plus final-C extrametricality [‘representational trick’].
MAXμ ≫ NOCODA: Only geminate codas, /Cμ/. Rule: C → Ø/— σ] plus Hayes’ exclusivity of association [‘notational trick’]
Structure: Empirical Application Factorial Interaction.
STRESS-TO-WEIGHT ≫ NOCODA: Codas only in stressed syllables. STRESS-TO-WEIGHT ≫ *Cμ: Geminates only after stressed V. Rule: μ → Ø/— σ̆].
ANCHOR-R ≫ NOCODA: Codas only word-finally. ANCHOR-R ≫ *[+voi, −son]: Obstruent devoicing except word-finally. Rule: [+voi] → [−voi]/[—, −son] plus ?? to block word-finally.
MAXμ ≫ NOCODA: Only geminate codas, /Cμ/. MAXμ ≫ WEIGHT-TO-STRESS: Geminates are the only codas in unstressed syllables. Rule: C → Ø/— σ̆] plus exclusivity of association
Structure: Jakobson’s Program Markedness + Faithfulness = Harmony In summary: Jakobson’s key insight concerning linguistic structure: the central organizing principle of grammar is: Minimize Markedness OT formalizes this as Maximize Harmony OT formalizes Markedness via violable constraints OT adds the crucial notion of Faithfulness – the other (lexical) half of the phonological dialectic OT Harmony combines Markedness with Faithfulness; their conflict is adjudicated via ranking Ranking unifies multiple dimensions of markedness
Structure: Summary OT achieves the explanatory goals of Changing the epiphenomenal status of markedness in grammatical theory: markedness is now in grammar, not about grammar A strongly universalist formalism exhibiting Inherent Typology Robust falsifiability
Responsibilities of Grammatical Theory Chomsky’s “Big 4” questions concerning knowledge of grammar Structure Acquisition Processing Neuro-genetics Nativist hypothesis OT ① ① ② Possible strong version – Explanatory Goal ② : ⇒ ② General Learning Theory Substantive structure ( ① ) of a UG module governing phenomenon Φ Acquisition theory — initial state, learning algorithm — for phenomenon Φ
Acquisition: Formal Result I Learning Theory Learning algorithm Provably correct and efficient (when part of a general decomposition of the grammar learning problem) Sources: Tesar 1995 et seq. Tesar & Smolensky 1993, …, 2000* * See for how to exploit the analogy to ‘weighted OT’ (Goldsmith, today) If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E
Acquisition: Formal Result I Constraint Demotion Algorithm. Tableau for /in + possible/:
Candidates               FAITH    MARK (NPA)
☹ ☞ E  inpossible                 *
       A  impossible     *
After FAITH is demoted below MARK (NPA), the optimal candidate is A (☺ ☞).
If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E. Correctly handles difficult case: multiple violations in E
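A single demotion step of the kind just described can be sketched as below. This is a simplified reading, assuming a stratified hierarchy stored as a hypothetical dict from constraint name to stratum index (0 = highest), with shared violation marks cancelled before demotion; the function and constraint names are illustrative, not taken from the cited sources.

```python
from collections import Counter

def constraint_demotion(strata, heard_marks, expected_marks):
    """One demotion step for a heard form A and an expected form E.

    strata: dict mapping constraint name -> stratum index (0 = highest).
    heard_marks / expected_marks: lists of violated constraints (with repeats).
    Returns a new strata dict in which every constraint violated only by A
    sits just below the highest-ranked constraint violated only by E.
    """
    heard, expected = Counter(heard_marks), Counter(expected_marks)
    heard_only = heard - expected       # mark cancellation
    expected_only = expected - heard
    if not expected_only:
        raise ValueError("no ranking can make the heard form beat the expected form")
    pivot = min(strata[c] for c in expected_only)  # highest-ranked E-violated constraint
    updated = dict(strata)
    for constraint in heard_only:
        if updated[constraint] <= pivot:           # not yet dominated by the pivot
            updated[constraint] = pivot + 1        # minimal demotion
    return updated

# Learner currently has FAITH >> MARK(NPA) and so expects E = "inpossible",
# but hears A = "impossible".
before = {"Faith": 0, "Mark(NPA)": 1}
after = constraint_demotion(before, heard_marks=["Faith"], expected_marks=["Mark(NPA)"])
print(after)
```

Because the pivot is chosen from the constraints E violates after cancellation, repeated violations in E do not need special handling in this sketch.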
Acquisition: Conceptual “Question” Large Grammar Space “Huge number of grammars” — “OT is too unrestrictive” Acquisition: Explanatory Goal General Learning Theory Actually, OT achieves Explanatory Goal ② : General Learning Theory: A theory-general, UG-informed learning algorithm, provably correct and efficient (under strong assumptions)
Acquisition: Formal Result II Learnability & the Initial State. M ≫ F is learnable with /in+possible/→impossible: ‘not’ = in- except when followed by … (the “exception that proves the rule”); M = NPA. M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: then both M and F are satisfied, there is no M vs. F conflict, and there is no evidence for their ranking. Thus must have M ≫ F in the initial state, ℌ₀
Acquisition: Empirical Application Initial State: Experimental Test Collaborators Peter Jusczyk Theresa Allocco (Elliott Moreton, Karen Arnold) Here, only a thumbnail sketch (more in the OT Workshop Thursday)
Acquisition: Empirical Application Initial State: Experimental Test. Linking hypothesis: More harmonic phonological stimuli ⇒ Longer listening time. More harmonic: M ≻ *M, when equal on F; F ≻ *F, when equal on M. When must choose one or the other, more harmonic to satisfy M: M ≫ F. M = Nasal Place Assimilation (NPA)
Acquisition: Empirical Application 4.5 Months (NPA). Higher Harmony: um…ber…umber. Lower Harmony: um…ber…iŋgu. p = .006 (11/16)
Acquisition: Empirical Application 4.5 Months (NPA). Higher Harmony: um…ber…umber. Lower Harmony: un…ber…unber. p = .044 (11/16)
Acquisition: Empirical Application 4.5 Months (NPA). un…ber…umber: satisfies Markedness, violates Faithfulness (*). un…ber…unber: violates Markedness (*), satisfies Faithfulness. ???
Acquisition: Empirical Application 4.5 Months (NPA). Higher Harmony: un…ber…umber. Lower Harmony: un…ber…unber. p = .001 (12/16)
Acquisition: Jakobson’s Program Markedness = Distance from Initial State. X is universally more marked than Y ~ In addition to the markedness constraints M1, M2, …, Mk violated by Y, X also violates markedness constraints M′1, M′2, …, M′n. Y will be acquired (become admitted into the child’s inventory) after M1, M2, …, Mk are all demoted below relevant faithfulness constraints. These demotions are all necessary for X to be acquired, and additional demotions of M′1, M′2, …, M′n are also required ~ X will require more time to be acquired
Responsibilities of Grammatical Theory Chomsky’s “Big 4” questions concerning knowledge of grammar Structure Acquisition Processing Neuro-genetics Nativist hypothesis OT ① ① ② ③ Possible strong version – Explanatory Goal ③ : ⇒ ③ General Processing Theory Substantive structure ( ① ) of a UG module governing phenomenon Φ Processing theory — e.g., parsing algorithm — for phenomenon Φ
Processing: Formal Results Context-Free Parsing Algorithm. Theorem (Tesar 1994, 1995b, a, 1996). Suppose: Gen parses a string of input symbols into structures specified via a context-free grammar; Con constraints meet a tree-locality condition and penalize empty structure. Then a given dynamic programming algorithm is: left-to-right; general (any such Gen, Con); guaranteed to find the optimal outputs; as efficient as parsers for conventional context-free grammars.
Processing: Formal Results Finite-State Parsing Algorithm. Theorem (Ellison 1994). Suppose: Gen(I) is representable as a (non-deterministic) finite-state transducer (particular to I) mapping the input string to a set of output candidates; Con constraints are reducible to multiply-violable binary constraints, each representable as a finite-state transducer mapping an output candidate to a sequence of violation marks. Then composing the Gen(I) and rank-sequenced constraint-transducers yields a transducer that: directly maps I to its optimal outputs; can be efficiently pruned by dynamic programming
Processing: Formal Results Complexity of Violable Constraints. Theorem (Frank and Satta 1998). Suppose: Gen is representable as a (non-deterministic) finite-state transducer mapping an input string to a set of output candidates; Con: the set of structures incurring n violations of each constraint is generable by a finite-state machine, and n can be finitely bounded for each constraint. Then the mapping from inputs to optimal outputs has the complexity of a finite-state transducer. Theorem (Hiller 1996, Smolensky 1997). If n is unbounded, there are (extremely simple) OT grammars with greater computational complexity.
Processing: Conceptual “Question” Processing (Symbolic): Theory. “Infinite candidate set uncomputable”. Actually, OT achieves Explanatory Goal ③ (computational): General Processing Theory. Substantive structure (①) of a UG module governing phenomenon Φ ⇒ Processing theory (e.g., parsing algorithm) for phenomenon Φ
Processing: Empirical Application Sentence Processing Because an OT grammar assigns a parse to any input, no additional principles (e.g., ‘parsing heuristics’) are needed for parsing the initial, incomplete segment of a sentence Linking hypothesis: Processing difficulty arises when previously established structure needs to be abandoned in the face of further input
Processing: Empirical Application PP Attachment. The servant of the actress who… (Cuetos & Mitchell 88) [Assuming who is ambiguous for Case.] Structure: [NP the servant [PP [P of] [NP the actress [+gen]]]]. Candidate parses of who:
(a) who [+nom], attached high (to the servant). Violates: *NOM, LOCALITY 2
(b) who [+nom], attached low (to the actress). Violates: *NOM, AGRCASE
(c) who [+gen], attached low (to the actress). Violates: *GEN
LOCALITY: If XP c-commands YP, then XP precedes YP. AGRCASE: A relative pronoun must agree in Case with the modified NP. *CASE: *GEN ≫ *DAT ≫ *ACC ≫ *NOM (universal)
Processing: Empirical Application PP Attachment. The servant of the actress who… (Cuetos & Mitchell 88). Candidate parses of who, given [NP the servant [PP [P of] [NP the actress [+gen]]]]:
(a) who [+nom], attached high. Violates: *NOM, LOCALITY 2
(b) who [+nom], attached low. Violates: *NOM, AGRCASE
(c) who [+gen], attached low. Violates: *GEN
If *GEN, AGRCASE ≫ LOCALITY 2, then (a): attach high. If LOCALITY 2 ≫ *GEN or AGRCASE, then (b) or (c): attach low
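The two conditionals can be checked mechanically with a small sketch. The candidate labels, the encoding of violations as sets, and the particular total rankings tried are illustrative assumptions; each ranking tried respects the universal subhierarchy *GEN ≫ *NOM.

```python
# Violation sets for the three candidate parses of "who" (labels are hypothetical).
CANDIDATES = {
    "high, who[+nom]": {"*Nom", "Locality"},
    "low, who[+nom]": {"*Nom", "AgrCase"},
    "low, who[+gen]": {"*Gen"},
}

def preferred(ranking):
    """Return the parse that best satisfies the ranking under strict domination."""
    def profile(name):
        # 1 if the candidate violates the constraint, else 0, highest rank first.
        return tuple(int(c in CANDIDATES[name]) for c in ranking)
    return min(CANDIDATES, key=profile)

RANKINGS = [
    ("*Gen", "AgrCase", "Locality", "*Nom"),   # *Gen, AgrCase >> Locality
    ("Locality", "*Gen", "AgrCase", "*Nom"),   # Locality >> *Gen
    ("Locality", "AgrCase", "*Gen", "*Nom"),   # Locality >> AgrCase
]
for ranking in RANKINGS:
    print(" >> ".join(ranking), "->", preferred(ranking))
```

With LOCALITY subordinated, the high attachment wins; once LOCALITY dominates either *GEN or AGRCASE, one of the two low attachments wins, and the relative ranking of *GEN and AGRCASE decides which Case the pronoun is assigned.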
Processing: Empirical Application PP Attachment Preliminary result: A cross-linguistic typology of PP attachment patterns (across differences in case and embedding depth) Empirically promising, but not perfect Unclear yet how rankings determining parsing preferences relate to rankings in the pure ‘competence grammar’
Processing: Jakobson’s Program Processing and Markedness. Phonological analogy: Incrementally parse C…V…C…: /C/ → [C]; /CV/ → [CV]; /CVC/ → [CV][C]. Now ‘expect’ a V … if get it, no ‘reanalysis’. But if get a C, need reanalysis ⇒ difficulty: /CVCC/ → [CVC][C]. Processing marked material (coda C) creates difficulty because it is initially analyzed as unmarked (as an onset)
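The phonological analogy can be made concrete with a toy incremental syllabifier. It is only a sketch under strong assumptions: segments are the bare symbols C and V, every incoming C is first parsed as an onset, and a reanalysis is counted whenever a pending onset has to be reattached as the coda of the preceding syllable. The function name and the error cases are invented for illustration.

```python
def incremental_parse(segments):
    """Parse a C/V string left to right; return (parse after each segment, reanalysis count)."""
    syllables = []    # e.g. ["CV", "C"]: a complete syllable plus a pending onset
    reanalyses = 0
    history = []
    for seg in segments:
        if seg == "V":
            if syllables and syllables[-1] == "C":
                syllables[-1] = "CV"          # pending onset receives its nucleus
            else:
                syllables.append("V")         # onsetless syllable
        elif seg == "C":
            if len(syllables) >= 2 and syllables[-1] == "C" and syllables[-2].endswith("V"):
                syllables[-2] += "C"          # earlier onset is reanalyzed as a coda
                syllables[-1] = "C"           # the new C becomes the pending onset
                reanalyses += 1
            elif syllables and syllables[-1] == "C":
                raise ValueError("toy parser cannot host this consonant cluster")
            else:
                syllables.append("C")         # tentatively an onset (unmarked parse)
        else:
            raise ValueError("segments must be 'C' or 'V'")
        history.append(".".join(syllables))
    return history, reanalyses

print(incremental_parse("CVCV"))
print(incremental_parse("CVCC"))
```

The string CVCV is parsed without revision, while CVCC forces one reanalysis at the point where the second C arrives, matching the difficulty the slide attributes to coda consonants.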
Processing: Conceptual “Question” Processing (Symbolic): Theory. “OT not psychologically plausible”. Actually, OT achieves Explanatory Goal ③ (empirical perspective): a competence theory automatically entails an empirically fruitful performance (processing) theory
Responsibilities of Grammatical Theory Chomsky’s “Big 4” questions concerning knowledge of grammar Structure Acquisition Processing Neuro-genetics Nativist hypothesis OT ① ① ② ③ ④ Possible strong version – Explanatory Goal ④ : ⇒ ④ General Biological Realization Substantive structure ( ① ) of a UG module M Neural network instantiating M (nativism: with genetic encoding)
Neuro-genetics: Formal Results Neural Representations (Gen). The structure [σ k [æ t]] as filler/role bindings: σ/r_ε, k/r_0, æ/r_01, t/r_11
OT & Connectionism OT derives from the numerical formalism, derived from connectionist Harmony maximization, of Harmonic Grammar (Legendre, Miyata, & Smolensky, 1990)
Neuro-genetics: Formal Results Neural Constraints (Con). NOCODA: A syllable has no coda. In the activation pattern a for [σ k [æ t]], the coda t incurs a violation (*) through the weights W: H(a_[σ k [æ t]]) = −s_NOCODA < 0
Neuro-genetics: Formal Results UGenome for CV Theory The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device ¿ Proteins ⇝ Universal grammatical principles ? Case study: Basic CV Syllable Theory Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ Collaborators Melanie Soderstrom Donald Mathis
Neuro-genetics: Formal Results Network Architecture /C 1 C 2 / [C 1 V C 2 ] C V /C 1 C 2 / [ C 1 V C 2 ]
Neuro-genetics: Formal Results PARSE [figure: network over C and V units; numeric unit labels (3 and 1) not reconstructed]. All connection coefficients are +2
Neuro-genetics: Formal Results O NSET All connection coefficients are 1 C V
Neuro-genetics: Formal Results Connectivity geometry Assume 3-d grid geometry V C ‘E’ ‘N’ ‘back’
Neuro-genetics: Formal Results Constraint: PARSE [figure: network over C and V units; numeric unit labels (3 and 1) not reconstructed]. Input units grow south and connect. Output units grow east and connect. Correspondence units grow north & west and connect with input & output units.
Neuro-genetics: Formal Results Connectivity Genome. Contributions from ONSET and PARSE. Source units: C_I, V_I, C_O, V_O, C_C, V_C, x_0. Projections (direction, extent, target): S L C_C; S L V_C; E L C_C; E L V_C; N&S S V_O; N S x_0; N L C_I; W L C_O; N L V_I; W L V_O; S S V_O. Key: Direction: N(orth), S(outh), E(ast), W(est), F(ront), B(ack). Extent: L(ong), S(hort). Target: Input: C_I, V_I; Output: C_O, V_O, x(0); Corr: V_C, C_C
Φ Ψ Neuro-genetics: Formal Results Processing [P 1 ] ∝ s 1
Φ Ψ Neuro-genetics: Formal Results Learning (during phase P + ; reverse during P )
Neuro-genetics: Formal Results Learning Behavior A simplified system can be solved analytically Learning algorithm turns out to ≈ s i ( ) = [# violations of constraint i P ]
Conclusion. OT is enabling progress on several explanatory goals for linguistic theory: Inherent typology; General learning theory; General processing theory; General biological realization. Often, OT formalizes Jakobson’s program. Thank you for your attention