Jakobson's Grand Unified Theory of Linguistic Cognition Paul Smolensky Cognitive Science Department Johns Hopkins University Elliott Moreton Karen Arnold.

Jakobson's Grand Unified Theory of Linguistic Cognition
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
with: Elliott Moreton, Karen Arnold, Donald Mathis, Melanie Soderstrom, Géraldine Legendre, Alan Prince, Peter Jusczyk, Suzanne Stevenson

Grammar and Cognition
1. What is the system of knowledge?
2. How does this system of knowledge arise in the mind/brain?
3. How is this knowledge put to use?
4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?
(Chomsky ’88, p. 3)

Advertisement: the complete story, forthcoming (2003, Blackwell): Smolensky & Legendre, The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar

Jakobson’s Program
A Grand Unified Theory for the cognitive science of language is enabled by Markedness: Avoid α.
① Structure: alternations eliminate α; Typology: inventories lack α
② Acquisition: α is acquired late
③ Processing: α is processed poorly
④ Neural: brain damage most easily disrupts α
Formalize ①–④ through OT?

Structure
Theoretical. OT (Prince & Smolensky ’91, ’93):
– Construct formal grammars directly from markedness principles
– General formalism/framework for grammars: phonology, syntax, semantics; GB/LFG/…
– Strongly universalist: inherent typology
Empirical. OT:
– Allows completely formal markedness-based explanation of highly complex data

Acquisition
Theoretical. Formal structure enables OT-general:
– Learning algorithms
  – Constraint Demotion: provably correct and efficient (when part of a general decomposition of the grammar learning problem); Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
  – Gradual Learning Algorithm: Boersma 1998 et seq.
– Initial state
Empirical.
– Initial state predictions explored through behavioral experiments with infants

Use
Theoretical.
– Theorems regarding the computational complexity of algorithms for processing with OT grammars: Tesar ’94 et seq.; Ellison ’94; Eisner ’97 et seq.; Frank & Satta ’98; Karttunen ’98
Empirical (with Suzanne Stevenson).
– Typical sentence processing theory: heuristic constraints
– OT: output for every input; enables incremental (word-by-word) processing
– Empirical results concerning human sentence processing difficulties can be explained with OT grammars employing independently motivated syntactic constraints
– The competence theory [OT grammar] is the performance theory [human parsing heuristics]

Neural Realization
Theoretical. OT derives from the theory of abstract neural (connectionist) networks
– via Harmonic Grammar (Legendre, Miyata & Smolensky ’90)
For moderate complexity, we now have general formalisms for realizing
– complex symbol structures as distributed patterns of activity over abstract neurons
– structure-sensitive constraints/rules as distributed patterns of strengths of abstract synaptic connections
– optimization of Harmony
Empirical.
– Construction of a miniature, concrete LAD
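Harmonic Grammar scores each candidate by a weighted sum of constraint violations. As a minimal sketch (the constraint names, violation counts and strengths are hypothetical), the code below compares that weighted-sum winner with the strict-domination winner of OT. With strengths spaced widely enough the two agree; with close strengths they can diverge, which is why the relation between strict domination and networks remains an open question later in the talk.

```python
from typing import Dict, List

Profile = Dict[str, int]  # constraint name -> number of violations


def hg_harmony(profile: Profile, strengths: Dict[str, float]) -> float:
    """Harmonic Grammar: H = -(sum over constraints of strength * violations)."""
    return -sum(s * profile.get(c, 0) for c, s in strengths.items())


def hg_winner(cands: Dict[str, Profile], strengths: Dict[str, float]) -> str:
    """The candidate with maximal Harmony."""
    return max(cands, key=lambda out: hg_harmony(cands[out], strengths))


def ot_winner(cands: Dict[str, Profile], ranking: List[str]) -> str:
    """Strict domination: lexicographic comparison, highest-ranked constraint first."""
    return min(cands, key=lambda out: [cands[out].get(c, 0) for c in ranking])


# Hypothetical tableau: candidate "a" violates M once, "b" violates F twice.
cands = {"a": {"M": 1, "F": 0}, "b": {"M": 0, "F": 2}}
print(ot_winner(cands, ["M", "F"]))             # M >> F
print(hg_winner(cands, {"M": 10.0, "F": 1.0}))  # widely separated strengths
print(hg_winner(cands, {"M": 1.5, "F": 1.0}))   # strengths too close together
```

Here the OT winner is b, the widely spaced strengths also select b, and the close strengths select a, because two violations of the weaker constraint outweigh one violation of the stronger one.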

Program
Structure
– OT constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
– OT allows completely formal markedness-based explanation of highly complex data
Acquisition
– Initial state predictions explored through behavioral experiments with infants
Neural Realization
– Construction of a miniature, concrete LAD

The Great Dialectic
Phonological representations serve two masters, the Lexicon and Phonetics, which are locked in conflict:
– Phonetic interface [surface form]: often ‘minimize effort (motoric & cognitive)’; ‘maximize discriminability’ (MARKEDNESS)
– Lexical interface /underlying form/: recoverability, ‘match this invariant form’ (FAITHFULNESS)

OT from Markedness Theory
– MARKEDNESS constraints: *α: No α
– FAITHFULNESS constraints:
  – Fα demands that /input/ → [output] leave α unchanged (McCarthy & Prince ’95)
  – Fα controls when α is avoided (and how)
– Interaction of violable constraints: Ranking
  – α is avoided when *α ≫ Fα
  – α is tolerated when Fα ≫ *α
  – M1 ≫ M2: combines multiple markedness dimensions
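The two ranking cases can be checked with a tiny evaluator. This is a sketch, assuming each candidate is summarized by a dict of violation counts and that strict domination means comparing those counts lexicographically in ranking order; the labels `*alpha` and `F_alpha` and the two candidate outputs are hypothetical stand-ins for *α and Fα.

```python
def evaluate(candidates, ranking):
    """Return the optimal output under strict domination.

    candidates: dict mapping each output form to its violation counts
    ranking: constraint names, highest-ranked first
    """
    return min(candidates,
               key=lambda out: tuple(candidates[out].get(c, 0) for c in ranking))


# Hypothetical input containing alpha, with two competing outputs.
candidates = {
    "faithful (keeps alpha)": {"*alpha": 1, "F_alpha": 0},
    "unfaithful (alpha changed)": {"*alpha": 0, "F_alpha": 1},
}
print(evaluate(candidates, ["*alpha", "F_alpha"]))  # *alpha >> F_alpha: alpha avoided
print(evaluate(candidates, ["F_alpha", "*alpha"]))  # F_alpha >> *alpha: alpha tolerated
```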

OT from Markedness Theory
– MARKEDNESS constraints: *α
– FAITHFULNESS constraints: Fα
– Interaction of violable constraints: Ranking
  – α is avoided when *α ≫ Fα
  – α is tolerated when Fα ≫ *α
  – M1 ≫ M2: combines multiple markedness dimensions
– Typology: all cross-linguistic variation results from differences in ranking, i.e., in how the dialectic is resolved (and in how multiple markedness dimensions are combined)
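If all variation comes from ranking, the typology predicted by a constraint set can be computed by evaluating every permutation of it (a "factorial typology"). The sketch below assumes a hypothetical input containing both α and β, two markedness and two faithfulness constraints, and four candidate outputs; with 4 constraints there are 24 rankings.

```python
from itertools import permutations

# Hypothetical input containing both alpha and beta; each output records
# its violations of two markedness and two faithfulness constraints.
CANDIDATES = {
    "keeps both":    {"*alpha": 1, "*beta": 1, "F_alpha": 0, "F_beta": 0},
    "alpha changed": {"*alpha": 0, "*beta": 1, "F_alpha": 1, "F_beta": 0},
    "beta changed":  {"*alpha": 1, "*beta": 0, "F_alpha": 0, "F_beta": 1},
    "both changed":  {"*alpha": 0, "*beta": 0, "F_alpha": 1, "F_beta": 1},
}


def evaluate(candidates, ranking):
    """Optimal output under strict domination (highest-ranked constraint first)."""
    return min(candidates, key=lambda out: tuple(candidates[out][c] for c in ranking))


def factorial_typology(candidates):
    """Count, for each output, how many total rankings select it."""
    constraints = sorted(next(iter(candidates.values())))
    typology = {}
    for ranking in permutations(constraints):
        winner = evaluate(candidates, ranking)
        typology[winner] = typology.get(winner, 0) + 1
    return typology


print(factorial_typology(CANDIDATES))
```

Each of the four outputs is selected by 6 of the 24 rankings: whether α survives depends only on the relative ranking of *α and Fα, and likewise for β, so the four "languages" are exactly the four ways of resolving the two dialectics.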

OT from Markedness Theory
– MARKEDNESS constraints
– FAITHFULNESS constraints
– Interaction of violable constraints: Ranking
– Typology: all cross-linguistic variation results from differences in ranking, i.e., in resolution of the dialectic
– Harmony = MARKEDNESS + FAITHFULNESS
  – A formally viable successor to Minimize Markedness is OT’s Maximize Harmony (among competitors)

Structure: Explanatory goals achieved by OT
– Individual grammars are literally and formally constructed directly from universal markedness principles
– Inherent Typology: within the analysis of phenomenon Φ in language L is inherent a typology of Φ across all languages

Program
Structure
– OT constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
– OT allows completely formal markedness-based explanation of highly complex data (Friday)
Acquisition
– Initial state predictions explored through behavioral experiments with infants
Neural Realization
– Construction of a miniature, concrete LAD

Structure: Summary
– OT builds formal grammars directly from markedness: MARK, with FAITH
– Friday: inventories consistent with markedness relations are formally the result of OT with local conjunction
– Even highly complex patterns can be explained purely with simple markedness constraints; all complexity is in the constraints’ interaction through ranking and conjunction: Lango ATR vowel harmony


Nativism I: Learnability
Learning algorithm
– Provably correct and efficient (under strong assumptions)
– Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
– If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E

Constraint Demotion Learning
/in + possible/ candidates | Faith | Mark (NPA)
☹☞ E  inpossible           |       |     *
   A  impossible           |   *   |
After demoting Faith below Mark (NPA):
/in + possible/ candidates | Mark (NPA) | Faith
   E  inpossible           |     *      |
☺☞ A  impossible           |            |   *
If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E.
Correctly handles a difficult case: multiple violations in E.
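One learning step of this kind can be sketched as follows, assuming the grammar is stored as a list of strata (sets of constraints, highest first), as in Tesar & Smolensky's Constraint Demotion, and candidates are given as violation counts. This is an illustrative sketch, not the authors' implementation; the function name and data layout are hypothetical.

```python
def constraint_demotion(strata, winner, loser):
    """One error-driven Constraint Demotion step.

    strata: list of sets of constraint names, highest-ranked stratum first
    winner: violation counts of the form actually heard (A)
    loser:  violation counts of the form the current grammar expected (E)
    Returns a new list of strata in which the winner beats the loser.
    """
    constraints = set().union(*strata)
    # Mark cancellation: only violation differences matter, so
    # multiple violations in E are handled by comparing counts.
    winner_marks = {c for c in constraints if winner.get(c, 0) > loser.get(c, 0)}
    loser_marks = {c for c in constraints if loser.get(c, 0) > winner.get(c, 0)}
    if not loser_marks:
        raise ValueError("no constraint prefers the heard form; nothing to learn")
    level = {c: i for i, stratum in enumerate(strata) for c in stratum}
    pivot = min(level[c] for c in loser_marks)  # highest-ranked loser-mark constraint
    new_strata = [set(stratum) for stratum in strata] + [set()]
    for c in winner_marks:
        if level[c] <= pivot:  # not yet dominated by the pivot: demote just below it
            new_strata[level[c]].discard(c)
            new_strata[pivot + 1].add(c)
    return [stratum for stratum in new_strata if stratum]


# /in+possible/: the learner's grammar ranks Faith above Mark (NPA), so it
# expects E = "inpossible", but hears A = "impossible".
grammar = [{"Faith"}, {"Mark (NPA)"}]
heard = {"Faith": 1, "Mark (NPA)": 0}     # A: impossible
expected = {"Faith": 0, "Mark (NPA)": 1}  # E: inpossible
print(constraint_demotion(grammar, heard, expected))  # [{'Mark (NPA)'}, {'Faith'}]
```

Only constraints violated more by A than by E are moved, and each moves only as far as just below the highest constraint that E violates more, which is the "minimal" demotion the slide describes.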

Nativism I: Learnability
– M ≫ F is learnable with /in+possible/ → impossible
  – ‘not’ = in- except when followed by …
  – “exception that proves the rule”, M = NPA
– M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if the lexicon produces only inputs with mp, never np: then M and F are both satisfied, there is no M vs. F conflict, and so there is no evidence for their ranking
– Thus we must have M ≫ F in the initial state, ℌ0
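The learnability argument can be made concrete by checking which rankings are consistent with the data. A minimal sketch, assuming just two constraints (M = NPA, approximated as counting the letter sequence "np", and F counting nasals changed by assimilation), at most two candidate outputs per input, and spellings standing in for phonological forms; all helper names are hypothetical.

```python
def evaluate(candidates, ranking):
    """Optimal output under strict domination (highest-ranked constraint first)."""
    return min(candidates, key=lambda out: tuple(candidates[out][c] for c in ranking))


def candidates_for(underlying):
    """Faithful output plus, if different, the output with every 'np' assimilated to 'mp'."""
    cands = {underlying: {"M": underlying.count("np"), "F": 0}}
    assimilated = underlying.replace("np", "mp")
    if assimilated != underlying:
        cands[assimilated] = {"M": assimilated.count("np"), "F": underlying.count("np")}
    return cands


def consistent_rankings(data):
    """Rankings of M (NPA) and F under which every input surfaces as observed."""
    return [ranking for ranking in (("M", "F"), ("F", "M"))
            if all(evaluate(candidates_for(u), ranking) == s for u, s in data.items())]


# Underlying forms are written without the morpheme boundary: /in+possible/ -> "inpossible".
with_alternation = {"inpossible": "impossible"}
without_alternation = {"impossible": "impossible"}  # lexicon supplies only 'mp' inputs
print(consistent_rankings(with_alternation))     # [('M', 'F')]
print(consistent_rankings(without_alternation))  # [('M', 'F'), ('F', 'M')]
```

With the alternating form only M ≫ F fits the data; with only mp inputs both rankings fit, so the data carry no evidence about the ranking.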

The Initial State
OT-general: MARKEDNESS ≫ FAITHFULNESS
– Learnability demands it (Richness of the Base) (Alan Prince, p.c., ’93; Smolensky ’96a)
– Child production: restricted to the unmarked
– Child comprehension: not so restricted (Smolensky ’96b)

Nativism II: Experimental Test
– Collaborators: Peter Jusczyk, Theresa Allocco
– Language Acquisition (2002)

Nativism II: Experimental Test
Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time
More harmonic:
– ✓M ≻ *M, when equal on F
– ✓F ≻ *F, when equal on M
– When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F
M = Nasal Place Assimilation (NPA)

Experimental Paradigm: X / Y / XY paradigm (P. Jusczyk)
– e.g. un...b...umb
– Headturn Preference Procedure (Kemler Nelson et al. ’95; Jusczyk ’97)
– Highly general paradigm
– ∃FAITH: um...b...umb vs. um...b...iŋgu (p = .006); iŋ...gu...iŋgu vs. iŋ...gu...umb
– Main result: *F vs. *NP

4.5 Months (NPA)
Higher Harmony: um…ber…umber
Lower Harmony: um…ber…iŋgu
p = .006 (11/16)

4.5 Months (NPA)
Higher Harmony: um…ber…umber
Lower Harmony: un…ber…unber
p = .044 (11/16)

Markedness vs. Faithfulness
un…ber…umber: ✓Markedness, *Faithfulness
un…ber…unber: *Markedness, ✓Faithfulness
???

4.5 Months (NPA)
Higher Harmony: un…ber…umber
Lower Harmony: un…ber…unber
p = .001 (12/16)
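Under the linking hypothesis, the umber/unber comparison reduces to comparing violation profiles. A sketch, assuming stimuli written as (X, Y, XY) strings, NPA simplified to "n directly before p or b", and FAITHFULNESS simplified to counting segments of XY that differ from X followed by Y; the helpers are hypothetical and ignore everything else about the actual stimuli.

```python
def npa_violations(form):
    """NPA (simplified): count 'n' immediately before a labial stop."""
    return sum(1 for a, b in zip(form, form[1:]) if a == "n" and b in "pb")


def faith_violations(x, y, xy):
    """Count segments of XY that differ from X followed by Y (equal lengths assumed)."""
    return sum(1 for a, b in zip(x + y, xy) if a != b)


def profile(stimulus, ranking):
    """Violation profile of an (X, Y, XY) stimulus, highest-ranked constraint first."""
    x, y, xy = stimulus
    counts = {"NPA": npa_violations(xy), "F": faith_violations(x, y, xy)}
    return tuple(counts[c] for c in ranking)


def more_harmonic(s1, s2, ranking):
    """The stimulus predicted (by the linking hypothesis) to earn longer listening."""
    return s1 if profile(s1, ranking) < profile(s2, ranking) else s2


umber = ("un", "ber", "umber")  # satisfies NPA, violates F
unber = ("un", "ber", "unber")  # satisfies F, violates NPA
print(more_harmonic(umber, unber, ["NPA", "F"]))  # M >> F favours umber
print(more_harmonic(umber, unber, ["F", "NPA"]))  # F >> M would favour unber
```

Only the M ≫ F ranking picks un…ber…umber, the stimulus labeled Higher Harmony above; the opposite ranking would predict the reverse listening preference.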


The question
The nativist hypothesis, central to generative linguistic theory: grammatical principles respected by all human languages are encoded in the genome.
Questions:
– Evolutionary theory: How could this happen?
– Empirical question: Did this happen?
– Today: What, concretely, could it mean for a genome to encode innate knowledge of universal grammar?

UGenomics
The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device.
¿Proteins ⇝ universal grammatical principles?
Time to willingly suspend disbelief …

UGenomics
The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device.
¿Proteins ⇝ universal grammatical principles?
Case study: Basic CV Syllable Theory (Prince & Smolensky ’93)
Innovation: introduce a new level, an ‘abstract genome’ notion parallel to [and encoding] ‘abstract neural network’

Approach: Multiple Levels of Encoding
[Figure: Grammar and Innate Constraints; Abstract Neural Network and Abstract Genome; Biological Neural Network and Biological Genome; arrows mark ‘A instantiates B’ and ‘A encodes B’.]

UGenome for CV Theory
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome

UGenomics: Symbolic Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome


Basic syllabification: Function
Basic CV Syllable Structure Theory
– ‘Basic’: no more than one segment per syllable position: .(C)V(C).
ƒ: /underlying form/ → [surface form]
– /CVCC/ → [.CV.CVC.] (the second V is epenthetic)
– /pæd+d/ → [pæd□d] (□: epenthetic vowel)
Correspondence Theory (McCarthy & Prince 1995, ‘M&P’)
– /C1V2C3C4/ → [.C1V2.C3VC4.] (the unindexed V has no input correspondent)

Why basic CV syllabification?
– ƒ: underlying → surface linguistic forms
– Forms simple but combinatorially productive
– Well-known universals; typical typology
– Mini-component of real natural language grammars
– A (perhaps the) canonical model of universal grammar in OT

Syllabification: Constraints (Con)
– PARSE: every element in the input corresponds to an element in the output
– ONSET: no V without a preceding C
– etc.
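To make the input → output function concrete, here is a brute-force sketch of Basic CV syllabification in OT. It assumes the standard Basic CV constraint set (ONSET, NOCODA, PARSE, FILL-Nuc, FILL-Ons), at most one segment per syllable position, inputs written as strings of C and V, and hypothetical rankings; GEN is limited to parses in which every syllable contains at least one input segment. Epenthetic positions are written □ and unparsed segments <Ci>.

```python
EPENTHETIC = "□"  # an epenthetic (unfilled) position


def syllables_from(inp, i):
    """Yield (syllable, next_index) for (C)V(C) syllables built from inp[i:].

    A syllable is (onset, nucleus, coda): each slot is an input index, EPENTHETIC,
    or None (empty onset/coda). Each syllable must use at least one input segment.
    """
    n = len(inp)
    onsets = [(None, i), (EPENTHETIC, i)] + ([(i, i + 1)] if inp[i] == "C" else [])
    for onset, j in onsets:
        nuclei = [(EPENTHETIC, j)] + ([(j, j + 1)] if j < n and inp[j] == "V" else [])
        for nucleus, k in nuclei:
            codas = [(None, k)] + ([(k, k + 1)] if k < n and inp[k] == "C" else [])
            for coda, m in codas:
                if m > i:
                    yield (onset, nucleus, coda), m


def gen(inp, i=0):
    """GEN: every parse of inp[i:] as syllables plus unparsed segment indices."""
    if i == len(inp):
        yield []
        return
    for rest in gen(inp, i + 1):  # leave inp[i] unparsed
        yield [i] + rest
    for syllable, j in syllables_from(inp, i):
        for rest in gen(inp, j):
            yield [syllable] + rest


def violations(candidate):
    """Violation counts for the Basic CV Syllable Theory constraints."""
    v = {"ONSET": 0, "NOCODA": 0, "PARSE": 0, "FILL-Nuc": 0, "FILL-Ons": 0}
    for item in candidate:
        if isinstance(item, int):  # unparsed input segment
            v["PARSE"] += 1
            continue
        onset, nucleus, coda = item
        v["ONSET"] += onset is None
        v["FILL-Ons"] += onset == EPENTHETIC
        v["FILL-Nuc"] += nucleus == EPENTHETIC
        v["NOCODA"] += coda is not None
    return v


def render(inp, candidate):
    """Write a parse like .C1V2.C3□C4., with <C4> for an unparsed segment."""
    out = ""
    for item in candidate:
        if isinstance(item, int):
            out += f"<{inp[item]}{item + 1}>"
        else:
            slots = [s for s in item if s is not None]
            out += "." + "".join(
                s if s == EPENTHETIC else f"{inp[s]}{s + 1}" for s in slots) + "."
    return out.replace("..", ".")


def optimal(inp, ranking):
    """EVAL: all optimal parses (ties kept) under strict domination."""
    scored = [(tuple(violations(c)[k] for k in ranking), render(inp, c)) for c in gen(inp)]
    best = min(score for score, _ in scored)
    return sorted({form for score, form in scored if score == best})


print(optimal("CVCC", ["ONSET", "PARSE", "FILL-Nuc", "NOCODA", "FILL-Ons"]))
print(optimal("CVCC", ["ONSET", "NOCODA", "PARSE", "FILL-Nuc", "FILL-Ons"]))
```

With PARSE and FILL-Nuc ranked above NOCODA, /CVCC/ receives one epenthetic vowel, and these constraints alone do not decide between .C1V2.C3□C4. and .C1V2C3.C4□., so both are returned. Ranking NOCODA above PARSE and FILL-Nuc yields .C1V2.C3□.C4□. instead.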

UGenomics: Neural Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome


CVNet Architecture
/C1C2/ → [C1VC2]
[Figure: C and V unit rows for the input /C1C2/ and the output [C1VC2], with correspondence units labeled ‘1’ and ‘2’.]

Connection substructure
– Local: fixed, genetically determined: the content of constraint 1
– Global: variable during learning: the strength s1 of constraint 1
Network weight: the sum over constraints of strength × fixed coefficient pattern (s1, s2, …)
Network input: ι = W a
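The local/global split can be sketched in a few lines: each constraint contributes a fixed coefficient pattern scaled by its learnable strength, and the net input is the weight matrix applied to the activation vector. This sketch assumes three units, two hypothetical constraints C1 and C2 with made-up symmetric patterns, and the standard quadratic Harmony ½ aᵀWa without bias terms; it is not CVNet's actual connectivity.

```python
def combine_weights(strengths, patterns):
    """W = sum_i s_i * P_i: fixed, local coefficient patterns scaled by global strengths."""
    n = len(next(iter(patterns.values())))
    W = [[0.0] * n for _ in range(n)]
    for name, pattern in patterns.items():
        for p in range(n):
            for q in range(n):
                W[p][q] += strengths[name] * pattern[p][q]
    return W


def net_input(W, a):
    """iota = W a"""
    return [sum(w * x for w, x in zip(row, a)) for row in W]


def harmony(W, a):
    """Quadratic Harmony H(a) = 1/2 a^T W a (no bias terms)."""
    return 0.5 * sum(x * i for x, i in zip(a, net_input(W, a)))


# Two hypothetical constraints over three units, each a fixed symmetric pattern.
patterns = {
    "C1": [[0, 1, 0], [1, 0, 0], [0, 0, 0]],    # rewards units 0 and 1 being on together
    "C2": [[0, 0, -1], [0, 0, 0], [-1, 0, 0]],  # penalizes units 0 and 2 being on together
}
strengths = {"C1": 2.0, "C2": 3.0}  # learnable, global
W = combine_weights(strengths, patterns)
print(net_input(W, [1, 1, 0]), harmony(W, [1, 1, 0]))  # [2.0, 2.0, -3.0] 2.0
print(harmony(W, [1, 0, 1]))                           # -3.0
```

Learning would change only the two numbers in `strengths`; the patterns, which carry each constraint's content, stay fixed.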

PARSE
[Figure: PARSE connection pattern over the C and V units.]
All connection coefficients are +2.

ONSET
[Figure: ONSET connection pattern over the C and V units.]
All connection coefficients are −1.

Crucial Open Question (Truth in Advertising)
Relation between strict domination and neural networks?

CVNet Dynamics
Boltzmann machine/Harmony network
– Hinton & Sejnowski ’83 et seq.; Smolensky ’83 et seq.
– stochastic activation-spreading algorithm: higher Harmony → more probable
– CVNet innovation: connections realize fixed symbol-level constraints with variable strengths
– learning: a modification of the Boltzmann machine algorithm to the new architecture
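"Higher Harmony → more probable" is the equilibrium property of a Boltzmann machine. The sketch below assumes binary (0/1) units, a symmetric weight matrix with zero diagonal and no biases, temperature T = 1, and a hypothetical three-unit network; it runs the standard stochastic (Gibbs) update and compares visit frequencies with the exact distribution p(a) ∝ exp(H(a)/T). It illustrates generic Boltzmann-machine dynamics, not CVNet's architecture or its modified learning rule.

```python
import itertools
import math
import random


def local_input(W, a, k):
    """iota_k = sum_j W[k][j] a_j (self-connection excluded)."""
    return sum(W[k][j] * a[j] for j in range(len(a)) if j != k)


def harmony(W, a):
    """H(a) = 1/2 sum_{j != k} W[k][j] a_k a_j for binary (0/1) units."""
    return 0.5 * sum(a[k] * local_input(W, a, k) for k in range(len(a)))


def gibbs_frequencies(W, T=1.0, sweeps=20000, seed=0):
    """Stochastic activation spreading: unit k turns on with prob. 1/(1 + exp(-iota_k/T))."""
    rng = random.Random(seed)
    a = [0] * len(W)
    counts = {}
    for _ in range(sweeps):
        for k in range(len(a)):
            p_on = 1.0 / (1.0 + math.exp(-local_input(W, a, k) / T))
            a[k] = 1 if rng.random() < p_on else 0
        counts[tuple(a)] = counts.get(tuple(a), 0) + 1
    return {state: c / sweeps for state, c in counts.items()}


def boltzmann(W, T=1.0):
    """Exact equilibrium distribution p(a) proportional to exp(H(a)/T)."""
    states = list(itertools.product((0, 1), repeat=len(W)))
    weights = [math.exp(harmony(W, list(s)) / T) for s in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}


W = [[0, 2, -3], [2, 0, 0], [-3, 0, 0]]  # hypothetical symmetric weights
sampled, exact = gibbs_frequencies(W), boltzmann(W)
best = max(exact, key=exact.get)
print(best, round(exact[best], 3), round(sampled.get(best, 0.0), 3))
```

The maximum-Harmony state (1, 1, 0) is both the most probable state of the exact distribution and the state the sampler visits most often.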

Learning Behavior
A simplified system can be solved analytically. The learning algorithm turns out to be approximately:
Δsᵢ(−) = ε [# violations of constraint i in phase P−]

UGenomics: Genome Level
Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome


Connectivity geometry
Assume a 3-d grid geometry.
[Figure: V and C units on the grid, with axes ‘E’, ‘N’, ‘back’.]

[Figure: ONSET connectivity on the grid among C, V and x0 units, with an ‘x0 segment’ and a ‘VO segment’ and projections labeled N, S and N&S.]

Connectivity: PARSE
– Input units grow south and connect
– Output units grow east and connect
– Correspondence units grow north & west and connect with input & output units

To be encoded
– How many different kinds of units are there?
– What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it?
– How are constraints initially specified?
– How are they maintained through the learning process?

Unit types
– Input units: C, V
– Output units: C, V, x
– Correspondence units: C, V
7 distinct unit types
– Each represented in a distinct sub-region of the abstract genome
– ‘Help ourselves’ to implicit machinery to spell out these sub-regions as distinct cell types, located in the grid as illustrated

Direction of projection growth
– Topographic organizations widely attested throughout neural structures
  – Activity-dependent growth a possible alternative
– Orientation information (axes)
  – Chemical gradients during development
  – Cell age a possible alternative

Projection parameters
– Direction
– Extent
  – Local
  – Non-local
– Target unit type
– Strength of connections encoded separately

Connectivity Genome
Contributions from ONSET and PARSE:
Source | Projections
C_I    | S L C_C
V_I    | S L V_C
C_O    | E L C_C; N&S S V_O
V_O    | E L V_C; N S x0
C_C    | N L C_I; W L C_O
V_C    | N L V_I; W L V_O
x0     | S S V_O
Key (Direction Extent Target):
– Direction: N(orth), S(outh), E(ast), W(est), F(ront), B(ack)
– Extent: L(ong), S(hort)
– Target: Input C_I, V_I; Output C_O, V_O, x(0); Correspondence V_C, C_C
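As a purely hypothetical illustration of how a (direction, extent, target) triple could be read as an instruction for growing a projection, the sketch below treats each gene as a small record and walks a 3-d grid. The coordinate convention, the reading of S(hort) as "adjacent cell only" and L(ong) as "whole ray to the grid edge", the unit-type labels and the toy grid are assumptions, not the CVGenome's actual encoding; the example genes mirror the N L C_I and W L C_O projections listed for C_C.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]  # (east, north, front) grid coordinates (assumed convention)

# Unit steps for the direction letters in the key.
STEPS = {"N": (0, 1, 0), "S": (0, -1, 0), "E": (1, 0, 0),
         "W": (-1, 0, 0), "F": (0, 0, 1), "B": (0, 0, -1)}


@dataclass(frozen=True)
class ProjectionGene:
    direction: str  # N, S, E, W, F or B
    extent: str     # "S": only the adjacent cell; "L": the whole ray to the grid edge
    target: str     # target unit type, e.g. "C_I" or "V_O"


def grow(gene: ProjectionGene, source: Cell, grid: Dict[Cell, str]) -> List[Cell]:
    """Cells of the gene's target type reached by a projection grown from source."""
    step = STEPS[gene.direction]
    cell, hits = source, []
    while True:
        cell = tuple(c + d for c, d in zip(cell, step))
        if cell not in grid:  # left the grid
            break
        if grid[cell] == gene.target:
            hits.append(cell)
        if gene.extent == "S":
            break
    return hits


# A hypothetical patch of grid: a C correspondence unit with input units to its
# north and an output unit to its west.
grid = {(1, 0, 0): "C_C", (1, 1, 0): "C_I", (1, 2, 0): "V_I", (0, 0, 0): "C_O"}
source = (1, 0, 0)
print(grow(ProjectionGene("N", "L", "C_I"), source, grid))  # [(1, 1, 0)]
print(grow(ProjectionGene("W", "L", "C_O"), source, grid))  # [(0, 0, 0)]
print(grow(ProjectionGene("N", "S", "V_I"), source, grid))  # [] (adjacent cell is C_I)
```

The design keeps connection strength out of the gene record, matching the slide's point that strength is encoded separately from direction, extent and target.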

CVGenome: Connectivity

Encoding connection strength
For each constraint i, need to ‘embody’:
– Constraint strength si
– Connection coefficients (Φ → Ψ cell types)
The product of these is the contribution of constraint i to the Φ → Ψ connection weight.
Network-level specification

Φ Ψ Processing: [P1] ∝ s1

Φ Ψ Development

Φ Ψ Learning (during phase P+; reverse during P−)

CVGenome: Connection Coefficients

Abstract Gene Map
[Figure: CVGenome gene map for C-C, C-I and V-I units, with regions for General Developmental Machinery, Connectivity (direction, extent, target) and Constraint Coefficients; genes labeled CORRESPOND and RESPOND.]

UGenomics
Realization of processing and learning algorithms in ‘abstract molecular biology’, using the types of interactions known to be biologically possible and genetically encodable

UGenomics
Host of questions to address:
– Will this really work?
– Can it be generalized to distributed nets?
– Is the number of genes [77 = 0.26%] plausible?
– Are the mechanisms truly biologically plausible?
– Is it evolvable?
– How is strict domination to be handled?

Hopeful Conclusion
Progress is possible toward a Grand Unified Theory of the cognitive science of language
– addressing the structure, acquisition, use, and neural realization of knowledge of language
– strongly governed by universal grammar
– with markedness as the unifying principle
– as formalized in Optimality Theory at the symbolic level
– and realized via Harmony Theory in abstract neural nets which are potentially encodable genetically

Hopeful Conclusion
Progress is possible toward a Grand Unified Theory of the cognitive science of language.
Still lots of promissory notes, but all in a common currency: Harmony ≈ unmarkedness. Hopefully this will promote further progress by facilitating integration of the sub-disciplines of cognitive science.
Thank you for your attention (and indulgence)