On the Priority of Markedness
Paul Smolensky
Cognitive Science Department, Johns Hopkins University
Workshop on Markedness and the Lexicon, January 24-25, 2003

Markedness Rules
Markedness is prior to lexical frequency:
① Developmentally
② Explanatorily
 – Markedness determines possible inventories (e.g., of lexical items)
 – Markedness determines relative frequency of structures
Have few solid results; mostly suggestive evidence, empirical and theoretical.

Developmental Priority
Look to see whether young infants are sensitive to markedness before they have had sufficient relevant experience.
Before 6 months, infants have not shown sensitivity to language-particular phonotactics.

Experimental Exploration of the Initial State

Talk Outline
Markedness is prior to lexical frequency:
① Developmentally
② Explanatorily
 – Markedness determines possible inventories (e.g., of lexical items)
 – Markedness determines relative frequency of structures

Markedness and Inventories
Insert: SHarC Theorem
Insert: Lango

Inherent Typology
Method applicable to related African languages, where the same markedness constraints govern the inventory (Archangeli & Pulleyblank '94), but with different interactions: different rankings and active conjunctions.
Part of a larger typology including a range of vowel harmony systems.

Summary
– OT builds formal grammars directly from markedness: Mark … with Faith.
– Inventories consistent with markedness relations are formally the result of OT … with local conjunction: TLC[Φ], SHarC theorem.
– Even highly complex patterns can be explained purely with simple markedness constraints: all complexity lies in the constraints' interaction through ranking and conjunction: Lango ATR harmony.
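Since local conjunction carries so much of the load here, a minimal sketch may help fix ideas (the constraints and segments below are illustrative, not the Lango analysis): a conjoined constraint C1 &_δ C2 is violated exactly when both conjuncts are violated within the same local domain δ, here taken to be a single segment.

```python
def conjoin(c1, c2):
    """Local conjunction C1 &_delta C2 with delta = the segment:
    violated exactly when BOTH conjuncts are violated by that segment."""
    return lambda seg: 1 if c1(seg) and c2(seg) else 0

# Hypothetical markedness constraints on segments (modeled as feature sets):
star_front = lambda seg: "front" in seg    # *[+front]
star_round = lambda seg: "round" in seg    # *[+round]

# *[+front] &_seg *[+round] penalizes front rounded vowels like /y/,
# but not plain /i/ (front only) or /u/ (round only):
star_fr_rd = conjoin(star_front, star_round)
assert star_fr_rd({"front", "round"}) == 1   # /y/: both conjuncts violated
assert star_fr_rd({"front"}) == 0            # /i/
assert star_fr_rd({"round"}) == 0            # /u/
```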

Talk Outline
Markedness is prior to lexical frequency:
① Developmentally
② Explanatorily
 – Markedness determines possible inventories (e.g., of lexical items)
 – Markedness determines relative frequency of structures [???]

Markedness → Frequency
The question is not: why does John say X more frequently than Y? It is: why does John's speech community say X more frequently than Y?
How are markedness and frequency to be theoretically related?
– Markedness theory must predict frequency distributions: frequencies are the data to be explained.
How, within generative grammar? Consider an extreme (but important) distribution in cross-linguistic typology.

A Generativist Paradox
UG must not generate unattested languages. But what counts as unattested?
– "The overwhelming generalization is U; the proposed UG₀ is right because all systems it generates satisfy U." [celebrates: X not generated]
– "This UG generates the somewhat odd system X (which violates U) … but this is actually a triumph, because it so happens that the actual (but obscure) language L is odd like X." [celebrates: X is generated]
Inconsistent!

The Generativist Paradox
That is: how to explain generalizations of the form "Overwhelmingly across languages, U is true, but in rare cases it is violated by an 'exception' X"?
Generative grammar has only two options:
– Generate only U-systems: strictly prohibits X; or
– Generate both U and not-U systems: allows X.
Neither explains the generalization.

The Generativist Paradox
A proposed UG₀ entails a universal U: T ≻ K. UG₀ thus predicts:
– if a language allows T it must also allow K
– errors must be directed K → T
Suppose this is overwhelmingly true, but rarely:
– a language X's inventory includes K but not T
– there are errors T → K: UG₀-impossible!
Is this evidence for or against UG₀? Must UG₀ be weakened to allow languages with K ≻ T?

Approaches to the Paradox
1. UG is not responsible for X; not core.
 – Linguists' judgment determines the core data.
 – Good approach?

Approaches to the Paradox
1. UG is not responsible for X; not core.
2. UG generates X and is not responsible for its rarity.
 – Rarity derives from extra-grammatical factors.

Approaches to the Paradox
1. UG is not responsible for X; not core.
2. UG generates X and is not responsible for its rarity.
3. UG generates X and derives its rarity:
 – qualitatively, or
 – quantitatively.
How, within a generative theory (OT)? I have no idea. Well, maybe three ideas …

Graded Generability in OT
Idea ①: Ranking Restrictiveness
Rare systems are those produced by only a highly restricted set of rankings.
Parallel to within-language variation in OT.
⇒ Grammar + Ø

Graded Generability in OT
Consider first within-language variation:
– a language has a range of rankings
– for a given input, the probability of an output is the combined probability of all the rankings for which it is optimal
Rankings with equal probability (Anttila); rankings with "Gaussian probability" (Boersma), which works surprisingly well (see the sketch below).
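As a concrete illustration of the equal-probability idea, here is a minimal sketch (the candidates and constraint names C1-C3 are made up for illustration; this is not Anttila's actual analysis): every total ranking of the constraints is equally likely, and an output's probability is the share of rankings on which it is optimal.

```python
from itertools import permutations

def output_distribution(candidates, constraints):
    """Anttila-style variation: all total rankings are equally probable;
    an output's probability is the fraction of rankings for which it is
    optimal.  `candidates` maps each output to its violation counts."""
    counts = {name: 0 for name in candidates}
    rankings = list(permutations(constraints))
    for r in rankings:
        # Standard OT evaluation: lexicographically minimal violation vector.
        best = min(candidates,
                   key=lambda name: tuple(candidates[name].get(c, 0) for c in r))
        counts[best] += 1
    return {name: n / len(rankings) for name, n in counts.items()}

# A hypothetical input with two competing outputs and three constraints:
cands = {"out1": {"C1": 1}, "out2": {"C2": 1, "C3": 1}}
print(output_distribution(cands, ["C1", "C2", "C3"]))
# out2 wins only when C1 outranks both C2 and C3 (2 of 6 rankings):
# out1 = 2/3, out2 = 1/3
```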

Graded Generability in OT
Consider first within-language variation:
– a language has a range of rankings
– for a given input, the probability of an output is the combined probability of all the rankings for which it is optimal
Can this work for cross-linguistic variation? I haven't a clue. Well, maybe three clues …

Clue 1: CV Theory
Encouraging or discouraging???

Clue 2: Constraint Sensitivity
The probabilistic interpretation would provide additional empirical constraints on OT theories.
Markedness of the low front rounded vowel (IPA ɶ):
① *[+fr, +lo, +rd], or
② *[+fr, +rd], *[+lo, +rd], *[+fr, +lo]?
With faithfulness constraints F[fr], F[rd], F[lo], the probability of ɶ in the inventory is: ① 25%; ② 7%.
Empirical probability informs constraint discovery (see the sketch below).
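As a check on those figures, here is a minimal sketch that counts, over all total rankings, how often the faithful mapping of /ɶ/ wins under each hypothesis. The candidate sets (repairs flipping any subset of the three features) are my assumption, but under them the counts come out at 25% for ① and 48/720 ≈ 7% for ②.

```python
from itertools import permutations

def p_faithful(candidates, constraints, faithful):
    """Fraction of total rankings (all equally probable) under which the
    faithful candidate is optimal, i.e., the segment enters the inventory."""
    rankings = list(permutations(constraints))
    def best(r):
        return min(candidates,
                   key=lambda n: tuple(candidates[n].get(c, 0) for c in r))
    return sum(best(r) == faithful for r in rankings) / len(rankings)

# Hypothesis 1: a single conjoined markedness constraint *[+fr,+lo,+rd].
# Hypothetical candidates for /ɶ/: faithful, plus one-feature repairs.
h1 = {
    "ɶ":    {"*[frlord]": 1},
    "unfr": {"F[fr]": 1},
    "unlo": {"F[lo]": 1},
    "unrd": {"F[rd]": 1},
}
print(p_faithful(h1, ["*[frlord]", "F[fr]", "F[lo]", "F[rd]"], "ɶ"))  # 0.25

# Hypothesis 2: three pairwise markedness constraints.  Candidates flip any
# subset of the three features; each retained [+F,+G] pair violates *[+F,+G].
feats = ("fr", "lo", "rd")
pairs = (("fr", "lo"), ("fr", "rd"), ("lo", "rd"))
h2 = {}
for mask in range(8):
    keep = {f for i, f in enumerate(feats) if mask >> i & 1}
    viol = {f"F[{f}]": 1 for f in feats if f not in keep}
    viol.update({f"*[{a}{b}]": 1 for a, b in pairs if {a, b} <= keep})
    h2[frozenset(keep)] = viol
cons = [f"*[{a}{b}]" for a, b in pairs] + [f"F[{f}]" for f in feats]
print(p_faithful(h2, cons, frozenset(feats)))  # 48/720 ≈ 0.067, i.e. ~7%
```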

Clue 3: BO(WO)ⁿW and &D
In Basic Inventory Theory with Local Conjunction, the proportion of rankings yielding a BO(WO)ⁿW inventory shrinks rapidly: even when many conjunctions are present, the likelihood that they matter becomes vanishingly small as n (the order of conjunction) increases.

Graded Generability in OT
Idea ②: Learnability
Rarer grammars are less robustly learnable.
⇒ Grammar + general learning theory ???

Graded Generability in OT
As with Ranking Restrictiveness, start with language-internal variation.
Idea ③: Connectionist substrate
Given an input I, a rare output O is one that is rarely found by the search process.
⇒ Grammar + general processing theory

Graded Generability in OT
A problem identified by Matt Goldrick:
– aphasic errors are predominantly k → t
– but t → k also occurs, rarely: exceptional behavior w.r.t. markedness
How is this possible if *dor ≫ *cor in UG? Under no possible ranking can t → k. Must we allow violations of *dor ≫ *cor?
An alternative approach via processing theory. Crucial: global vs. local optimization. (The ranking claim is checked in the sketch below.)
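The claim that no ranking respecting *dor ≫ *cor can map /t/ to [k] is small enough to verify exhaustively. A minimal sketch, with a deliberately tiny candidate set and a hypothetical place-faithfulness constraint Ident standing in for whatever faithfulness is at stake:

```python
from itertools import permutations

# Markedness violations of the two place candidates (a minimal sketch):
CANDS = {
    "t": {"*cor": 1},
    "k": {"*dor": 1},
}

def optimal(inp, ranking):
    """Standard OT evaluation: lexicographically minimal violation vector;
    an unfaithful output additionally violates the hypothetical Ident."""
    def viols(out):
        v = dict(CANDS[out])
        if out != inp:
            v["Ident"] = 1
        return tuple(v.get(c, 0) for c in ranking)
    return min(CANDS, key=viols)

for r in permutations(["*dor", "*cor", "Ident"]):
    if r.index("*dor") < r.index("*cor"):       # respect universal *dor >> *cor
        assert optimal("t", r) == "t"           # /t/ can never surface as [k]
        if r[0] == "*dor":                      # but k -> t is derivable:
            print("ranking", r, "maps /k/ ->", optimal("k", r))  # prints 't'
```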

OT ⇒ pr[I → O] via Connectionism
– Candidate A: realized as an activation pattern a (distributed, or local to a unit)
– Harmony of A: H(a), a numerical measure of consistency between a and the connection weights W
– Grammar: W
– Discrete symbolic candidate space embedded in a continuous state space
– Search: probability of A is pr_T(a) ∝ e^{H(a)/T}; during search, T → 0 (sketched below)
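A minimal sketch of such a search, assuming a Hopfield/Boltzmann-style network with symmetric weights (the toy network below is hypothetical, not BrbrNet): unit-by-unit Gibbs sampling draws from pr_T(a) ∝ e^{H(a)/T}, and lowering T toward 0 sharpens the distribution onto Harmony maxima. Cooling too fast can freeze the search in a local maximum, which is exactly the rare marked output (the t → k of the previous slide).

```python
import math
import random

def harmony(a, W, b):
    """H(a) = sum_i b_i*a_i + 1/2 * sum_ij W_ij*a_i*a_j: consistency of
    activation vector a with weights W and biases b (higher = better)."""
    n = len(a)
    return (sum(b[i] * a[i] for i in range(n))
            + 0.5 * sum(W[i][j] * a[i] * a[j]
                        for i in range(n) for j in range(n)))

def anneal(W, b, steps=5000, T0=2.0, cooling=0.999, rng=random):
    """Gibbs sampling for binary units from pr_T(a) ∝ exp(H(a)/T),
    lowering T toward 0 so sampling concentrates on Harmony maxima."""
    n = len(b)
    a = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    T = T0
    for _ in range(steps):
        i = rng.randrange(n)
        # Harmony gain from turning unit i on rather than off:
        gain = b[i] + sum(W[i][j] * a[j] for j in range(n) if j != i)
        x = max(-500.0, min(500.0, gain / T))   # clamp to avoid overflow
        a[i] = 1.0 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0.0
        T *= cooling                             # T -> 0
    return a

# Toy landscape: (1,1,0) is the global maximum (H = 2.2); (0,0,1) is a
# local maximum (H = 0.5) in which a fast quench occasionally freezes.
W = [[0, 2, -2], [2, 0, -2], [-2, -2, 0]]
b = [0.1, 0.1, 0.5]
a = anneal(W, b)
print(a, "H =", harmony(a, W, b))
```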

Harmony Maxima
– Patterns realizing optimal symbolic structures are global Harmony maxima
– Patterns realizing suboptimal symbolic structures are local Harmony maxima
– Search should find the global optimum; search will find a local optimum
Example: a simple local network for doing IT (Imdlawn Tashlhiyt) Berber syllabification

BrbrNet

BrbrNet's Local Harmony Maxima
An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen).
That is, every activation value is 0 or 1, and the sequence of values realizes a sequence of substrings taken from the inventory {CV, CVC, #V, #VC}, where C denotes 0, V denotes 1, and # denotes a word edge. (A checker for this condition is sketched below.)
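This characterization is easy to operationalize. A minimal checker, assuming (as the inventory's # suggests) that the onsetless syllables #V and #VC are available only at the left word edge:

```python
def is_legal_parse(bits):
    """True iff a 0/1 activation sequence realizes a sequence of legal
    Berber syllables, i.e., can be segmented into {CV, CVC, #V, #VC}
    with C = 0, V = 1, and # = a word edge (left edge only)."""
    s = "".join("V" if x else "C" for x in bits)
    def parse(i, at_edge):
        if i == len(s):
            return True
        pieces = ["CV", "CVC"] + (["V", "VC"] if at_edge else [])
        # #V/#VC consume the edge itself, so after the first syllable
        # we are no longer at a word edge.
        return any(s.startswith(p, i) and parse(i + len(p), False)
                   for p in pieces)
    return parse(0, True)

# Hypothetical examples:
assert is_legal_parse([0, 1, 0])          # CVC: one legal syllable
assert is_legal_parse([1, 0, 0, 1])       # #VC + CV
assert not is_legal_parse([0, 0, 1, 0])   # CCVC: no parse, not a local max
```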

Competence, Performance
So how can t → k? t is a global maximum, k a local maximum: now we can get k when we should get t.
Distinguish the Search Dynamics ('performance') from the Harmony Landscape ('competence'):
– the universals in the Harmony Landscape require that, absent performance errors, we must have k → t
– an imperfect Search Dynamics allows t → k
This captures the huge 'general case/exception' contrast:
– t's output derives from UG
– k's output derives from performance error

Summary
Exceptions to markedness universals may potentially be modeled as performance errors: the unmarked (optimal) elements are global Harmony maxima, but local search can end up with marked elements, which are local maxima.
Applicable, potentially, to sporadic, unsystematic exceptions in the I → O mapping.
Extensible to systematic exceptions in I → O, or to exceptional grammars???

Markedness Rules
Markedness is prior to lexical frequency:
① Developmentally
② Explanatorily
 – Markedness determines possible inventories (with local conjunction)
 – Markedness determines relative frequency of structures: ???