Models of Linguistic Choice Christopher Manning

2 Explaining more: How do people choose to express things?
What people do say has two parts:
- Contingent facts about the world: people in the Bay Area have talked a lot about electricity, housing prices, and stocks lately
- The way speakers choose to express ideas within a situation, using the resources of their language: people don't often put that-clauses pre-verbally ("That we will have to revise this program is almost certain")
We're focusing on linguistic models of the latter choice.

3 How do people choose to express things?
Simply delimiting a set of grammatical sentences provides only a very weak description of a language, and of the ways people choose to express ideas in it.
Probability densities over sentences and sentence structures can give a much richer view of language structure and use.
In particular, we find that the same soft generalizations and tendencies of one language often appear as (apparently) categorical constraints in other languages.
Linguistic theory should be able to uniformly capture these constraints, rather than only recognizing them when they are categorical.

4 Probabilistic Models of Choice
P(form | meaning, context)
Looks difficult to define. We're going to define it via features.
A feature is anything we can measure/check: P(form | f_1, f_2, f_3, f_4, f_5)
A feature might be "3rd singular subject", "object is old information", "addressee is a friend", "want to express solidarity"
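To make "a feature is anything we can measure/check" concrete, here is a minimal sketch (in Python, with hypothetical feature names and input fields that are not from the slides) of feature functions evaluated over a candidate form and its context:

```python
# Minimal sketch: features as yes/no checks over a form and its context.
# Feature names and input fields are hypothetical, purely for illustration.

def extract_features(form, context):
    return {
        "third_singular_subject": int(form.get("subject_person") == 3
                                      and form.get("subject_number") == "sg"),
        "object_is_old_information": int(form.get("object") in context.get("mentioned", set())),
        "addressee_is_friend": int(context.get("addressee_relation") == "friend"),
    }

form = {"subject_person": 3, "subject_number": "sg", "object": "the plan"}
context = {"mentioned": {"the plan"}, "addressee_relation": "friend"}
print(extract_features(form, context))
# {'third_singular_subject': 1, 'object_is_old_information': 1, 'addressee_is_friend': 1}
```

A model of P(form | f_1, ..., f_n) then conditions only on such measurable properties rather than on the full meaning and context.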

5 Explaining language via (probabilistic) constraints
Input: approve
Constraints = Features = Properties

                                          Linking            Person        Discourse
                                          f3 *Ag/Non-subj    f2 *3>1/2     f1 *Su/Newer
The plan was approved by us last week          1                 1              0
We approved the plan last week                 0                 0              1
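To make "Constraints = Features = Properties" concrete, the tableau above can be written down directly as data: each candidate is paired with a vector of violation counts, and the very same numbers can be read either as OT violation marks or as feature values for a probabilistic model. A minimal sketch, with hypothetical Python names:

```python
# The tableau above as data (hypothetical names, purely illustrative).
# Each candidate output for the input "approve" is paired with its
# violation counts for the three constraints / features.
CONSTRAINTS = ["f3_Ag_NonSubj",   # Linking:   agent realized as a non-subject
               "f2_3_over_1_2",   # Person:    3rd person outranks 1st/2nd
               "f1_Su_Newer"]     # Discourse: subject is newer information

CANDIDATES = {
    "The plan was approved by us last week": [1, 1, 0],
    "We approved the plan last week":        [0, 0, 1],
}
```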

6 Explaining language via (probabilistic) constraints
Categorical/constraint-based grammar [GB, LFG, HPSG, …]: All constraints must be satisfied; to capture elsewhere conditions / emergence of the unmarked, complex negated conditions need to be added.
Optimality Theory: The highest-ranked differentiating constraint always determines the outcome. Emergence of the unmarked. Single winner: no variable outputs. No ganging up.
Stochastic OT: Probabilistic noise at evaluation time allows variable rankings and hence a distribution over multiple outputs. No ganging up.
Generalized linear models (e.g., Varbrul)

7 A theory with categorical feature combination
In a given situation you can predict a single output or no well-formed output:
- No model of gradient grammaticality
- No way to model variation
Or you can predict a set of outputs:
- Can't model their relative frequency
Categorical models of constraint combination allow no room for soft preferences, or for constraints combining to make an output dispreferred or impossible ("ganging up" or "cumulativity").

8 Optimality Theory
Prince and Smolensky (1993/2005!): Provide a ranking of constraints (an ordinal model); the highest-ranked differentiating constraint determines the winner.
"When the scalar and the gradient are recognized and brought within the purview of theory, Universal Grammar can supply the very substance from which grammars are built: a set of highly general constraints, which, through ranking, interact to produce the elaborate particularity of individual languages."
No variation in output (except in the case of ties)
No cumulativity of constraints
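A minimal sketch of classical OT evaluation over the hypothetical tableau data from above: candidates are compared lexicographically in ranking order, so the highest-ranked differentiating constraint alone decides, a single winner emerges (barring ties), and lower-ranked violations can never gang up.

```python
# Minimal sketch of classical OT evaluation (hypothetical data and ranking).
# The winner is the candidate whose violation profile is best under a strict
# lexicographic comparison in ranking order.

RANKING = ["f1_Su_Newer", "f2_3_over_1_2", "f3_Ag_NonSubj"]  # highest-ranked first

CANDIDATES = {
    "The plan was approved by us last week": {"f1_Su_Newer": 0, "f2_3_over_1_2": 1, "f3_Ag_NonSubj": 1},
    "We approved the plan last week":        {"f1_Su_Newer": 1, "f2_3_over_1_2": 0, "f3_Ag_NonSubj": 0},
}

def ot_winner(candidates, ranking):
    # Violation counts listed in ranking order; min() compares the tuples
    # lexicographically, i.e. on the highest-ranked differentiating constraint.
    def profile(cand):
        return tuple(candidates[cand][c] for c in ranking)
    return min(candidates, key=profile)

print(ot_winner(CANDIDATES, RANKING))
# -> "The plan was approved by us last week" under this hypothetical ranking
```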

9 Creating more ties
One way to get more variation is to create more ties, by allowing various forms of floating constraint rankings or unordering of constraints.
If you have lots of ways of deriving a form from underlying meanings, then you can count the number of derivations: Anttila (1997).
(I confess I'm sceptical of such models; inter alia they inherit the problems of ties in OT: they're extremely unstable.)

10 Stochastic OT (Boersma 1997)
Basically follows Optimality Theory, but:
- We don't simply have a constraint ranking: constraints have a numeric value on a continuous scale
- A random perturbation is added to a constraint's ranking value at evaluation time; the randomness represents the incompleteness of our model
- Variation results if constraints have similar values – our grammar constrains but underdetermines the output
- One gets a probability distribution over optimal candidates for an input (over different evaluations)
[Figure: constraints f1–f4 as values on the continuous ranking scale]
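A minimal sketch of Stochastic OT evaluation as described above, reusing the hypothetical tableau data and assuming made-up ranking values and a Gaussian noise width: noise is added to each constraint's value at evaluation time, the noisy values fix an ordinary OT ranking, and repeating the evaluation gives an empirical distribution over winning candidates.

```python
import random
from collections import Counter

# Hypothetical ranking values on a continuous scale; f1 and f2 are close,
# so their relative ranking varies from evaluation to evaluation.
CONSTRAINT_VALUES = {"f1_Su_Newer": 100.0, "f2_3_over_1_2": 99.0, "f3_Ag_NonSubj": 90.0}

CANDIDATES = {
    "The plan was approved by us last week": {"f1_Su_Newer": 0, "f2_3_over_1_2": 1, "f3_Ag_NonSubj": 1},
    "We approved the plan last week":        {"f1_Su_Newer": 1, "f2_3_over_1_2": 0, "f3_Ag_NonSubj": 0},
}

def stochastic_ot_winner(values, candidates, noise_sd=2.0):
    # Perturb each constraint's value with Gaussian noise, then evaluate as in
    # classical OT under the ranking induced by the noisy values.
    noisy = {c: v + random.gauss(0.0, noise_sd) for c, v in values.items()}
    ranking = sorted(noisy, key=noisy.get, reverse=True)   # highest value = highest ranked
    return min(candidates, key=lambda cand: tuple(candidates[cand][c] for c in ranking))

# Repeated evaluations yield a probability distribution over optimal candidates.
counts = Counter(stochastic_ot_winner(CONSTRAINT_VALUES, CANDIDATES) for _ in range(10000))
for cand, n in counts.most_common():
    print(f"{n / 10000:.2f}  {cand}")
```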

11 Stochastic OT (Boersma 1997)
Stochastic OT can model variable outputs.
It does have a model of cumulativity, but constraints in the model are, and can only be, very weakly cumulative.
We'll look soon at some papers that discuss how well this works as a model of linguistic feature combination.

12 Generalized linear models
The grammar provides representations.
We define arbitrary properties over those representations (e.g. Subj=Pro, Subj=Topic).
We learn weights w_i for how important the properties are.
These are put into a generalized linear model, either as a linear score or as a probability distribution over candidates:
score(c_j) = Σ_i w_i f_i(c_j), or
P(c_j) = exp(Σ_i w_i f_i(c_j)) / Σ_k exp(Σ_i w_i f_i(c_k))
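A minimal sketch of the probabilistic (log-linear) version of such a model, with hand-picked illustrative weights over the hypothetical features used earlier: each candidate gets a score Σ_i w_i f_i(c), and exponentiating and normalizing the scores gives a probability distribution over candidates.

```python
import math

# Hypothetical feature weights (negative = penalty), chosen only for illustration.
WEIGHTS = {"f1_Su_Newer": -2.0, "f2_3_over_1_2": -1.5, "f3_Ag_NonSubj": -1.0}

CANDIDATES = {
    "The plan was approved by us last week": {"f1_Su_Newer": 0, "f2_3_over_1_2": 1, "f3_Ag_NonSubj": 1},
    "We approved the plan last week":        {"f1_Su_Newer": 1, "f2_3_over_1_2": 0, "f3_Ag_NonSubj": 0},
}

def loglinear_distribution(weights, candidates):
    # score(c) = sum_i w_i * f_i(c);  P(c) = exp(score(c)) / sum_k exp(score(k))
    scores = {c: sum(weights[f] * v for f, v in feats.items())
              for c, feats in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

for cand, p in loglinear_distribution(WEIGHTS, CANDIDATES).items():
    print(f"{p:.2f}  {cand}")
# 0.38  The plan was approved by us last week
# 0.62  We approved the plan last week
```

With these illustrative weights the two smaller penalties on the passive candidate together outweigh the single larger penalty on the active one – exactly the ganging up (cumulativity) that categorical models and classical OT rule out.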

13 Generalized linear models
Can get categorical or variable outputs:
- As a probability distribution: all outputs have some probability of occurrence, with the distribution based on the weights of the features. Ganging up. Emergence of the unmarked.
- Optimizing over generalized linear models: we choose the candidate for which the probability is highest, arg max_j P(c_j). Output for an input is categorical. Features gang up. (However, by setting weights far enough apart, ganging up will never have an effect – giving conventional OT.) Emergence of the unmarked.
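To illustrate the parenthetical point above, here is a small follow-on sketch reusing the data from the previous examples, with hypothetical weights spaced far apart so that each penalty exceeds the combined effect of all lower ones: the arg max of the linear scores then coincides with the classical OT winner under the corresponding ranking, and ganging up can never change the outcome.

```python
# Weights mirroring a strict ranking f1 >> f2 >> f3 (hypothetical values);
# each penalty is larger than the sum of all lower-ranked penalties.
WEIGHTS = {"f1_Su_Newer": -100.0, "f2_3_over_1_2": -10.0, "f3_Ag_NonSubj": -1.0}

CANDIDATES = {
    "The plan was approved by us last week": {"f1_Su_Newer": 0, "f2_3_over_1_2": 1, "f3_Ag_NonSubj": 1},
    "We approved the plan last week":        {"f1_Su_Newer": 1, "f2_3_over_1_2": 0, "f3_Ag_NonSubj": 0},
}

def argmax_candidate(weights, candidates):
    # arg max_j score(c_j); with widely separated weights this matches the
    # lexicographic (OT) comparison under the ranking implied by the weights.
    score = lambda feats: sum(weights[f] * v for f, v in feats.items())
    return max(candidates, key=lambda c: score(candidates[c]))

print(argmax_candidate(WEIGHTS, CANDIDATES))
# -> "The plan was approved by us last week", the same winner that classical OT
#    picks under the ranking f1 >> f2 >> f3
```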