The Harmonic Mind
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
with: a Mystery ‘Co’-laborator, Géraldine Legendre, Alan Prince, Peter Jusczyk, Donald Mathis, Melanie Soderstrom

Personal Firsts, thanks to SPP
• First invited talk! (& first visit to JHU, 1986)
• First public confessional: midnight thoughts of a worried connectionist (UNC, 1988)
• First generative syntax talk (Memphis, 1994)
• First attempt at stand-up comedy (Columbia, 2000)
• First rendition of a 900-page book as a graphical synopsis in PowerPoint (1 minute from now)

Advertisement
The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, Paul Smolensky & Géraldine Legendre
• Blackwell 2002 (??)
• Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture
• Case study in formalist multidisciplinary cognitive science

Talk Plan
• ‘Sketch’ the ICS cognitive architecture, pointing to contributions from/to traditional disciplines
• Topics of direct philosophical relevance:
  – Explanation of the productivity of cognition
  – Nativism
• Theoretical work: symbolic and connectionist
• Experimental work

Mystery Quote #1 “Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story … 185 pages … are devoted to Smolensky’s telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed.”

Processing I: Activation
Computational neuroscience → ICS
Processing — spreading activation — is optimization: Harmony maximization
[Figure: two units a1 and a2 with external inputs i1 (0.6) and i2 (0.5), linked by an inhibitory connection –λ (–0.9)]
• Key sources: Hopfield 1982, 1984; Cohen & Grossberg 1983; Hinton & Sejnowski 1983, 1986; Smolensky 1983, 1986; Geman & Geman 1984; Golden 1986, 1988

Processing II: Optimization
Cognitive psychology → ICS
Harmony maximization is satisfaction of parallel, violable constraints:
• a1 must be active (strength: 0.6)
• a2 must be active (strength: 0.5)
• a1 and a2 must not be simultaneously active (strength: λ)
Optimal compromise: a1 = 0.79, a2 = –0.21
• Key sources: Hinton & Anderson 1981; Rumelhart, McClelland, & the PDP Group 1986
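To make the slide's numerical example concrete, here is a minimal sketch of Harmony maximization by gradient ascent. The quadratic decay term –½‖a‖² is my assumption (it is what makes the stated optimum finite); the inputs 0.6 and 0.5 and the inhibitory strength λ = 0.9 come from the slide.

```python
import numpy as np

# Sketch: maximize H(a) = i.a - lam*a1*a2 - 0.5*||a||^2 by gradient ascent.
# (The -0.5*||a||^2 decay term is an assumption; the inputs and lam are the slide's numbers.)
i = np.array([0.6, 0.5])          # "a1 must be active", "a2 must be active"
lam = 0.9                         # "a1 and a2 must not be simultaneously active"
W = np.array([[0.0, -lam],
              [-lam, 0.0]])       # single inhibitory connection

a = np.zeros(2)
for _ in range(2000):             # gradient ascent on Harmony
    a += 0.05 * (i + W @ a - a)   # dH/da = i + W a - a

print(np.round(a, 2))             # -> [ 0.79 -0.21]: the "optimal compromise"
```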

Representation  Symbolic theory  ICS Complex symbol structures Generative linguistics  ICS  Particular linguistic representations  PDP connectionism  ICS Distributed activation patterns  ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)

Representation
[Figure: the structure [σ k [æ t]] as fillers bound to tree roles: σ/r_ε, k/r_0, æ/r_01, t/r_11]
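A minimal sketch of tensor product binding, the construction this slide illustrates: each filler vector is bound to its role vector by an outer product, and the whole structure is the superposition of the bindings. The particular (random) filler and role vectors below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_f = dim_r = 8

# Illustrative (assumed) filler and role vectors
fillers = {name: rng.standard_normal(dim_f) for name in ["sigma", "k", "ae", "t"]}
roles   = {name: rng.standard_normal(dim_r) for name in ["r_eps", "r_0", "r_01", "r_11"]}

def bind(f, r):
    """Tensor product binding: filler (x) role."""
    return np.outer(f, r)

# [sigma k [ae t]] = sigma/r_eps + k/r_0 + ae/r_01 + t/r_11, superposed
structure = (bind(fillers["sigma"], roles["r_eps"]) + bind(fillers["k"], roles["r_0"]) +
             bind(fillers["ae"], roles["r_01"])     + bind(fillers["t"], roles["r_11"]))

# Unbinding: querying with the roles' dual vectors recovers the fillers
R = np.column_stack([roles[r] for r in ["r_eps", "r_0", "r_01", "r_11"]])
recovered = structure @ np.linalg.pinv(R.T)         # columns ~ sigma, k, ae, t
print(np.allclose(recovered[:, 1], fillers["k"]))   # True: the filler of role r_0 is k
```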

Constraints
• Linguistics (markedness theory) → ICS
• ICS → Generative linguistics: Optimality Theory
• Key sources: Prince & Smolensky 1993 [ms.; Rutgers report]; McCarthy & Prince 1993 [ms.]
• Texts: Archangeli & Langendoen 1997; Kager 1999; McCarthy 2001
• Electronic archive: rutgers/ruccs/roa.html
• Met in SPP Debate, 1988!

Constraints
• NOCODA: A syllable has no coda
[Figure: the coda t of [σ k [æ t]] incurs a * violation]
• H( a_[σ k [æ t]] ) = –s_NOCODA < 0

Constraint Interaction I  ICS  Grammatical theory Harmonic Grammar  Legendre, Miyata, Smolensky 1990 et seq.

Constraint Interaction I
[Figure: for [σ k [æ t]], the total Harmony is a sum of constraint contributions: an ONSET term H(k/_, σ) for the onset k and a NOCODA term H(σ, _\t) for the coda t]
The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.
Any formal language can be so generated.
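A toy Harmonic Grammar evaluation in the spirit of this slide. The candidate set, violation counts, and constraint strengths below are illustrative assumptions (FAITH only enters the talk later, but is needed here to make the competition non-trivial): each candidate's Harmony is the negative strength-weighted sum of its violations, and the grammar outputs the Harmony maximum.

```python
# Hypothetical Harmonic Grammar evaluation for input /kaet/ (illustrative numbers).
# H(candidate) = - sum_k s_k * (violations of constraint k)
strengths = {"FAITH": 3.0, "ONSET": 2.0, "NOCODA": 1.0}

violations = {                                   # candidate -> violation counts
    "[.kaet.]":   {"NOCODA": 1},                 # faithful, but has a coda
    "[.kae.]":    {"FAITH": 1},                  # deletes /t/
    "[.kae.te.]": {"FAITH": 1},                  # inserts a vowel after /t/
}

def harmony(cand):
    return -sum(strengths[c] * n for c, n in violations[cand].items())

for cand in violations:
    print(f"{cand:12s} H = {harmony(cand):5.1f}")
print("optimal:", max(violations, key=harmony))  # -> [.kaet.] under these strengths
```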

Harmonic Grammar Parser
• Simple, comprehensible network
• Simple grammar G: X → A B; Y → B A
• Language parsing
[Figure: the trees for X → A B and Y → B A, parsed top-down and bottom-up]

Harmonic Grammar Parser
• Representations: filler ⊗ role
  Filler vectors: A, B, X, Y
  Role vectors: r_ε = 1; r_0 = (1 1); r_1 = (1 –1)
[Figure: 12 units ①–⑫ encoding a tree [i [j k]], i, j, k ∊ {A, B, X, Y}, split into a Depth 0 group and a Depth 1 group]

Harmonic Grammar Parser
• Representations:

Harmonic Grammar Parser
• Weight matrix for Y → B A: H(Y, B —) > 0 and H(Y, — A) > 0

Harmonic Grammar Parser
• Weight matrix for X → A B

Harmonic Grammar Parser
• Weight matrix for the entire grammar G
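A sketch of the parser's representations and weight matrix under stated assumptions: one-hot filler vectors, the role vectors from the ‘Representations’ slide, a 12-unit state (4 depth-0 units plus 8 depth-1 units), and a simple symmetric rule-reward matrix of my own choosing for each rule. The grammar matrix is the sum of the two rule matrices, and Harmony ½ sᵀWs comes out highest for trees licensed by G.

```python
import numpy as np

fillers = {f: np.eye(4)[i] for i, f in enumerate("ABXY")}   # one-hot fillers (assumed)
r0, r1 = np.array([1., 1.]), np.array([1., -1.])            # depth-1 roles; r_eps = 1

def tree(root, left, right):
    """12-unit state: [root filler (depth 0) ; left/r0 + right/r1 (depth 1)]."""
    depth1 = np.kron(fillers[left], r0) + np.kron(fillers[right], r1)
    return np.concatenate([fillers[root], depth1])

def rule_matrix(parent, left, right):
    """Symmetric weights rewarding co-activation of a parent with its daughters
    (an assumed encoding, not necessarily the book's exact matrix)."""
    W = np.zeros((12, 12))
    block = np.outer(fillers[parent],
                     np.kron(fillers[left], r0) + np.kron(fillers[right], r1))
    W[:4, 4:] = block
    W[4:, :4] = block.T
    return W

W = rule_matrix("X", "A", "B") + rule_matrix("Y", "B", "A")  # entire grammar G

def H(s):
    return 0.5 * s @ W @ s

for root, l, r in [("X", "A", "B"), ("Y", "B", "A"), ("X", "B", "A"), ("Y", "A", "B")]:
    print(f"[{root} [{l} {r}]]  H = {H(tree(root, l, r)):.1f}")
# The grammatical trees [X [A B]] and [Y [B A]] receive the highest Harmony.
```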

Bottom-up Parsing

Top-down Parsing

Explaining Productivity
• Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence
• How to explain?

1. Structured representations

+ 2. Structured connections

= Proof of Productivity
• Productive behavior follows mathematically from combining
  – the combinatorial structure of the vectorial representations encoding inputs & outputs, and
  – the combinatorial structure of the weight matrices encoding knowledge

Mystery Quote #2 “Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. … This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it.”

Explaining Productivity I
• Intra-level decomposition: [A B] → {A, B}
• Inter-level decomposition: [A B] → {1, 0, –1, … 1}
[Table: which decomposition accounts for Semantics and for Processes in GOFAI, in ICS, and in ICS & GOFAI]

Explaining Productivity II
• Intra-level decomposition: G → {X → AB, Y → BA}
• Inter-level decomposition: [A B] → {1, 0, –1, … 1}
[Table: which decomposition accounts for Semantics and for Processes in GOFAI, in ICS, and in ICS & GOFAI]

Mystery Quote #3  “ … even after all those pages, Smolensky hasn’t so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena.”

Constraint Interaction II: OT
• ICS → Grammatical theory: Optimality Theory
• Prince & Smolensky 1993

Constraint Interaction II: OT
• Differential strength encoded in strict domination hierarchies: every constraint has complete priority over all lower-ranked constraints (combined)
• = ‘Take-the-best’ heuristic (Hertwig, today): constraint ↔ cue; ranking ↔ cue validity. Decision-theoretic justification for OT?
• Approximate numerical encoding employs special (exponentially growing) weights
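A small sketch of the last bullet, with illustrative candidates and an assumed bound on violation counts: comparing candidates lexicographically down the ranking (strict domination) agrees with comparing them by weighted Harmony when the weights grow geometrically faster than any violation count.

```python
# Strict domination vs. exponentially growing weights (illustrative violations).
# Violation profiles are listed from highest- to lowest-ranked constraint.
candidates = {
    "cand1": [0, 2, 0],
    "cand2": [1, 0, 0],
    "cand3": [0, 1, 3],
}

def ot_best(cands):
    """Strict domination: lexicographic comparison of violation profiles."""
    return min(cands, key=lambda c: cands[c])

def weighted_best(cands, base=10):
    """Numerical encoding: the k-th constraint from the bottom gets weight base**k.
    Agrees with strict domination whenever base exceeds every violation count."""
    n = len(next(iter(cands.values())))
    weights = [base ** (n - 1 - k) for k in range(n)]
    harmony = lambda c: -sum(w * v for w, v in zip(weights, cands[c]))
    return max(cands, key=harmony)

print(ot_best(candidates), weighted_best(candidates))   # both pick "cand3"
```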

Constraint Interaction II: OT
• “Grammars can’t count”
• Stress is on the initial heavy syllable iff the number of light syllables n obeys … No way, man

Constraint Interaction II: OT
• Constraints are universal
• Human grammars differ only in how these constraints are ranked: ‘factorial typology’
• First true contender for a formal theory of cross-linguistic typology

The Faithfulness / Markedness Dialectic
• ‘cat’: /kat/ → kæt, *NOCODA — why? FAITHFULNESS requires identity; MARKEDNESS often opposes it
• Markedness–Faithfulness dialectic → diversity: English: FAITH ≫ NOCODA; Polynesian: NOCODA ≫ FAITH (~French)
• Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA): mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ ŋb, ŋd (velar)

Nativism I: Learnability
• Learning algorithm: provably correct and efficient (under strong assumptions)
• Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
• If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E

Constraint Demotion Learning
If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E.

/in + possible/       Faith    Mark (NPA)
☹ ☞ E  inpossible                  *
       A  impossible    *

Hearing A (impossible) when E (inpossible) was expected, the learner demotes Faith below Mark (NPA), after which A is correctly selected: ☺ ☞
Correctly handles a difficult case: multiple violations in E.
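A minimal sketch of this demotion step (the stratified-ranking representation and the exact placement rule are my simplifications of Tesar & Smolensky's algorithm): every constraint violated more by the heard form A is moved to the stratum just below the highest-ranked constraint violated more by the expected form E.

```python
def constraint_demotion(hierarchy, viol_heard, viol_expected):
    """One error-driven demotion step (simplified sketch of Tesar & Smolensky).
    hierarchy: list of strata (lists of constraint names), highest-ranked first.
    viol_heard / viol_expected: {constraint: violations} for A (heard) and E (expected)."""
    violated_by_A = [c for s in hierarchy for c in s
                     if viol_heard.get(c, 0) > viol_expected.get(c, 0)]
    violated_by_E = [c for s in hierarchy for c in s
                     if viol_expected.get(c, 0) > viol_heard.get(c, 0)]
    if not violated_by_E:
        return hierarchy
    # highest-ranked stratum containing a constraint violated by E
    target = min(i for i, s in enumerate(hierarchy) if any(c in s for c in violated_by_E))
    new = [list(s) for s in hierarchy] + [[]]
    for c in violated_by_A:
        i = next(i for i, s in enumerate(new) if c in s)
        if i <= target:                     # demote only constraints not already below
            new[i].remove(c)
            new[target + 1].append(c)
    return [s for s in new if s]

# The slide's example: heard A = "impossible", expected E = "inpossible"
hierarchy = [["Faith"], ["Mark(NPA)"]]                    # initial ranking Faith >> Mark
A = {"Faith": 1, "Mark(NPA)": 0}                          # impossible
E = {"Faith": 0, "Mark(NPA)": 1}                          # inpossible
print(constraint_demotion(hierarchy, A, E))               # -> [['Mark(NPA)'], ['Faith']]
```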

Nativism I: Learnability
• M ≫ F is learnable with /in+possible/ → impossible: ‘not’ = in- except when followed by … “exception that proves the rule”: M = NPA
• M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: √M and √F, no M vs. F conflict, no evidence for their ranking
• Thus must have M ≫ F in the initial state, ℌ₀

Nativism II: Experimental Test
• Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time
• More harmonic: √M ≻ *M when equal on F; √F ≻ *F when equal on M; when one must choose one or the other, it is more harmonic to satisfy M: M ≫ F
• M = Nasal Place Assimilation (NPA)
• Collaborators: Peter Jusczyk, Theresa Allocco, (Elliott Moreton, Karen Arnold)

4.5 Months (NPA)
Higher Harmony: um…ber…umber     Lower Harmony: um…ber…iŋgu
p = .006 (11/16)

4.5 Months (NPA)
Higher Harmony: um…ber…umber     Lower Harmony: un…ber…unber
p = .044 (11/16)

Markedness vs. Faithfulness
un…ber…umber     vs.     un…ber…unber     ???

4.5 Months (NPA)
Higher Harmony: un…ber…umber     Lower Harmony: un…ber…unber
p = .001 (12/16)

Nativism III: UGenome
• Can we combine the connectionist realization of harmonic grammar with OT’s characterization of UG, to examine the biological plausibility of UG as innate knowledge?
• Collaborators: Melanie Soderstrom, Donald Mathis

Nativism III: UGenome
• The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device
• Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’
• Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!

The Problem
• No concrete examples of such a LAD exist
• Even highly simplified cases pose a hard problem: how can genes — which regulate production of proteins — encode symbolic principles of grammar?
• Test preparation: Syllable Theory

Basic syllabification: Function
• ƒ: /underlying form/ → [surface form]
• Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.]
• /CVCC/ → [.CV.CVC.] (the V of the second syllable is epenthetic)

Basic syllabification: Function
• ƒ: /underlying form/ → [surface form]
• Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.]
• /CVCC/ → [.CV.CVC.]
• Basic CV Syllable Structure Theory (Prince & Smolensky 1993: Chapter 6): ‘Basic’ — no more than one segment per syllable position: .(C)V(C).

Basic syllabification: Function
• ƒ: /underlying form/ → [surface form]
• Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.]
• /CVCC/ → [.CV.CVC.]
• Basic CV Syllable Structure Theory
• Correspondence Theory (McCarthy & Prince 1995, ‘M&P’): /C1V2C3C4/ → [.C1V2.C3VC4.]

Syllabification: Constraints (Con)
• PARSE: Every element in the input corresponds to an element in the output — “no deletion” [M&P: ‘MAX’]

Syllabification: Constraints (Con)
• PARSE: Every element in the input corresponds to an element in the output
• FILL V/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] — “no insertion/epenthesis” [M&P: ‘DEP’]

Syllabification: Constraints (Con)
• PARSE: Every element in the input corresponds to an element in the output
• FILL V/C: Every output V/C segment corresponds to an input V/C segment
• ONSET: No V without a preceding C

Syllabification: Constraints (Con)
• PARSE: Every element in the input corresponds to an element in the output
• FILL V/C: Every output V/C segment corresponds to an input V/C segment
• ONSET: No V without a preceding C
• NOCODA: No C without a following V
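To show how these constraints interact, here is a sketch of an OT evaluation for the input /CVCC/ from the earlier slide. The small hand-listed candidate set and the ranking ONSET ≫ PARSE ≫ FILL ≫ NOCODA are illustrative assumptions for the example (Gen would supply the full candidate set); candidates are compared lexicographically down the ranking, i.e. by strict domination.

```python
# Illustrative OT evaluation of /CVCC/ (candidate set and ranking are assumptions).
ranking = ["ONSET", "PARSE", "FILL", "NOCODA"]     # highest-ranked first

candidates = {                                     # violation counts per constraint
    ".CVC.<C>": {"PARSE": 1, "NOCODA": 1},         # final C left unparsed
    ".CV.<CC>": {"PARSE": 2},                      # both final Cs unparsed
    ".CV.CVC.": {"FILL": 1, "NOCODA": 1},          # epenthetic V fills the second nucleus
}

def profile(cand):
    """Violations in ranking order, for lexicographic (strict-domination) comparison."""
    return [candidates[cand].get(c, 0) for c in ranking]

print(min(candidates, key=profile))   # -> ".CV.CVC."  (cf. /CVCC/ -> [.CV.CVC.] above)
```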

SA net architecture
• /C1 C2/ → [C1 V C2]
[Figure: C and V unit layers for the input /C1 C2/ and the output [C1 V C2]]

Connection substructure
• Local (fixed, genetically determined): content of constraint i
• Global (variable during learning): strength of constraint s_i
[Figure: each Φ → Ψ connection is built from constraint-specific substructures with strengths s_1, s_2, …]
• Network weight: W_ΨΦ = Σ_i s_i w_i(ΨΦ); network input: ι_Ψ = Σ_Φ W_ΨΦ a_Φ

P ARSE C V 33 33 33 33 33 33 11 11 11 11 11 11 33 33 33 33 33 33 33 33 33 33 33 33  All connection coefficients are +2

O NSET  All connection coefficients are  1 C V

Activation dynamics
• Boltzmann Machine / Harmony Theory dynamics (temperature T → 0)
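A minimal sketch of such dynamics (the network, its size, and the annealing schedule are illustrative assumptions): each binary unit is updated stochastically according to the Harmony change it would produce, and the temperature is annealed toward 0 so the network settles into a Harmony maximum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative symmetric network; weights and biases are placeholders, not the model's.
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.standard_normal(n)
a = rng.integers(0, 2, n).astype(float)            # binary unit activations

def harmony(a):
    return 0.5 * a @ W @ a + b @ a

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for T in np.geomspace(2.0, 0.01, 300):             # temperature annealed toward 0
    for i in rng.permutation(n):
        dH = W[i] @ a + b[i]                       # Harmony gain from turning unit i on
        a[i] = 1.0 if rng.random() < sigmoid(dH / T) else 0.0

print("settled state:", a.astype(int), "H =", round(float(harmony(a)), 2))
```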

Boltzmann-type learning dynamics
• Clamped phases: P+ = input & output clamped; P– = input only
• Δs_i = ε [ E{H_i | P+} – E{H_i | P–} ]
• During the processing of training data in phase P±, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify s_i by ±ε̃ [ε̃ = ε/N_p]
• Gradient descent in …
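A schematic sketch of this two-phase update (the decomposition of Harmony into per-constraint terms H_i and the toy data below are illustrative assumptions): the strength s_i increases when constraint i's Harmony is higher with the outputs clamped to the training data (P+) than when the network runs freely on the input alone (P–).

```python
import numpy as np

# Schematic two-phase strength update (illustrative quantities, not the model's).
def expected_Hi(states, W_i):
    """Average Harmony contributed by constraint i's connection pattern W_i."""
    return float(np.mean([0.5 * s @ W_i @ s for s in states]))

def update_strength(s_i, W_i, states_plus, states_minus, eps=0.1):
    """delta s_i = eps * ( E{H_i | P+} - E{H_i | P-} ).
    states_plus:  states sampled with input AND output clamped (phase P+)
    states_minus: states sampled with only the input clamped (phase P-)"""
    return s_i + eps * (expected_Hi(states_plus, W_i) - expected_Hi(states_minus, W_i))

# Toy usage with made-up samples for the two phases:
rng = np.random.default_rng(0)
W_i = np.array([[0., 1.], [1., 0.]])               # constraint i rewards co-activation
plus  = [rng.integers(0, 2, 2) for _ in range(50)]
minus = [rng.integers(0, 2, 2) for _ in range(50)]
print(update_strength(1.0, W_i, plus, minus))
```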

Crucial Open Question (Truth in Advertising)
• Relation between strict domination and neural networks?
• Apparently not a problem in the case of the CV Theory

To be encoded
• How many different kinds of units are there?
• What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it?
• How are constraints initially specified?
• How are they maintained through the learning process?

Unit types
• Input units: C, V
• Output units: C, V, x
• Correspondence units: C, V
• 7 distinct unit types
• Each represented in a distinct sub-region of the abstract genome
• ‘Help ourselves’ to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated

Connectivity geometry
• Assume 3-d grid geometry
[Figure: C and V cells arranged on a 3-d grid with axes labelled ‘N’, ‘E’, and ‘back’]

Constraint: PARSE
[Figure: the PARSE connection pattern over the C/V grid]
• Input units grow south and connect
• Output units grow east and connect
• Correspondence units grow north & west and connect with input & output units

Constraint: ONSET
• Short connections grow north–south between adjacent V output units, and between the first V node and the first x node.

Direction of projection growth
• Topographic organizations widely attested throughout neural structures; activity-dependent growth a possible alternative
• Orientation information (axes): chemical gradients during development; cell age a possible alternative

Projection parameters
• Direction
• Extent: local / non-local
• Target unit type
• Strength of connections encoded separately

Connectivity Genome
• Contributions from ONSET and PARSE:

  Source:       C_I       V_I       C_O       V_O                  C_C                V_C                x_0
  Projections:  S L C_C   S L V_C   E L C_C   E L V_C; N&S S V_O   N L C_I; W L C_O   N L V_I; W L V_O   N S x_0; S S V_O

• Key: Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Target = Input: C_I, V_I; Output: C_O, V_O, x(0); Corr: C_C, V_C

ONSET
[Figure: the ONSET connection pattern over the C/V grid]
• x_0 segment: S S V_O | N S x_0
• V_O segment: N&S S V_O

Encoding connection strength
• For each constraint i, need to ‘embody’: the constraint strength s_i, and the connection coefficients (Φ → Ψ cell types)
• The product of these is the contribution of constraint i to the Φ → Ψ connection weight
• Network-level specification —

Processing
[Figure: a Φ-type cell connected to a Ψ-type cell; the connection carries a factor [P_1] ∝ s_1]

Development
[Figure: a Φ → Ψ connection during development]

Learning (during phase P+; reverse during P–)
[Figure: a Φ → Ψ connection during learning]

Learning Behavior
• Simplified system can be solved analytically
• Learning algorithm turns out to ≈ Δs_i(…) = –ε [# violations of constraint i in P…]

Abstract Gene Map
[Table with three regions: General Developmental Machinery; Connectivity (direction, extent, target codes such as S L C_C; S L V_C; F S V_C; N/E L C_C&V_C; S/W L C_C&V_C); Constraint Coefficients (CORRESPOND: C-C, C-I, V-I; RESPOND: G)]

Summary
• Described an attempt to integrate:
  – Connectionist theory of mental processes (computational neuroscience, cognitive psychology)
  – Symbolic theory of mental functions (philosophy, linguistics) and of representations: general structure (philosophy, AI), specific structure (linguistics)
• Informs theory of UG: form, content; genetic encoding

Mystery Quote #4
“Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any.”

Mystery Quote #5 “The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. … The goal … should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non ‑ symbolic computation; to enrich conceptual ‑ level research with new computational concepts and techniques that reflect an understanding of how conceptual ‑ level theoretical constructs emerge from subconceptual computation…”

Thanks for your attention