Efficient Generation in Primitive Optimality Theory. Jason Eisner, University of Pennsylvania. ACL 1997.



2 Overview

A new formalism:
– What is Optimality Theory (OT)?
– Primitive Optimality Theory (OTP)

Some results for OTP:
– Linguistic fit
– Formal results
– Practical results on generation

3 What Is Optimality Theory?

Prince & Smolensky (1993). An alternative to stepwise derivation: stepwise winnowing of a candidate set.

[Figure: pipeline — input, then Gen, then Constraint 1, Constraint 2, Constraint 3, ..., then output.]

Different constraint orders yield different languages.

4 Filtering, OT-style

[Figure: a filtering step in which the constraint would prefer candidate A, but is only allowed to break the tie among the surviving candidates B, D, E. A doubled violation mark indicates that a candidate violates the constraint twice.]

5 Formalisms in phonology

Two communities with different needs...

6 Unformalized OT isn't a theory

We need a formalism here, not informal English. Using English, one can:
– express any constraint
– describe impossible languages
– specify any grammar with 1 big constraint (undermining the claim that typology = constraint reranking)
– and there are no algorithms (generation, parsing, learning)

7 OTFS: A finite-state formalization
(used computationally: Ellison 1994, Frank & Satta 1996)

Let's call this system OTFS, for "finite-state":

Q: What does a candidate look like?
A: It's a string. And a set of candidates is a regular set of strings.

Q: Where does the initial candidate set come from?
A: Gen is a nondeterministic transducer. It turns an input into a regular set of candidate strings.

Q: How powerful can a constraint be?
A: Each constraint is an arc-weighted DFA. A candidate that violates the constraint 3 times is accepted on a path of weight 3.
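An arc-weighted DFA of the kind OTFS uses for constraints can be sketched directly. This is a minimal illustration with names of my own choosing; the toy constraint below is invented (it charges one violation for each consonant that immediately follows another consonant), not one from the paper.

```python
class WeightedDFA:
    def __init__(self, start, accepting, arcs):
        # arcs: {(state, symbol): (next_state, weight)}
        self.start, self.accepting, self.arcs = start, accepting, arcs

    def violations(self, string):
        """Weight of the accepting path for `string`, or None if rejected.
        The accumulated weight is the candidate's violation count."""
        state, total = self.start, 0
        for sym in string:
            if (state, sym) not in self.arcs:
                return None
            state, w = self.arcs[(state, sym)]
            total += w
        return total if state in self.accepting else None

# Toy cluster constraint over a CV alphabet: a C right after a C costs 1.
cluster = WeightedDFA(
    start=0, accepting={0, 1},
    arcs={(0, "V"): (0, 0), (0, "C"): (1, 0),
          (1, "V"): (0, 0), (1, "C"): (1, 1)},
)
print(cluster.violations("CVCC"))   # → 1
```

The key property is the one the slide states: the automaton accepts every candidate, and the path weight, not acceptance itself, encodes how badly the candidate violates the constraint.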

8 ... but should linguists use OTFS?

Linguists probably won't use OTFS directly:
– Strings aren't a perspicuous representation
– Again, one can specify a grammar with 1 big constraint
– Too easy to express "unnatural" constraints
– Linguistically too strong? (e.g., it can count) Or too weak? (floating tones? Generalized Alignment?)

9 Solution: Primitive OT ("OTP")

OTP:
– formalizes current practice in linguistics (and is easy for linguists to use)
– turns out to be equivalent to OTFS (a new result, not in the paper)
– is simple enough for computational work

10 Representations in OTP

[Figure: the same form shown in Goldsmith-style autosegmental notation (old) and in OTP style (new), with tiers for V, C, voi, nas, σ, and Stem.]

OTP's "autosegmental timeline" specifies the relative timing of phonetic gestures and other constituents (not absolute timing).

11 Edges & Overlaps

[Figure: the OTP representation again, with tiers for V, C, σ, Stem, voi, nas.]

Edges are explicit; there are no association lines. Associations are now captured by temporal overlap. OTP's constraints are simple and local: they merely check whether these gestures overlap in time, and whether their edges line up.

12 The Primitive Constraints

Implication, written α → β: each α overlaps with some β (all other α's attract β's). [Figure: a timeline where this constraint incurs 2 violations.]

Clash, written α ⊥ β: each α overlaps with no β (all other α's repel β's). [Figure: a timeline where this constraint incurs 3 violations.]
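The two primitive constraint families, implication and clash, can be made concrete with a small sketch. Representing constituents as (start, end) intervals on the timeline is my assumption, as is the exact counting convention (one violation per unmatched α for implication, one per overlapping pair for clash); the paper's diagrams count violations but the slide text does not pin down these details.

```python
def overlaps(a, b):
    """Do two (start, end) spans on the timeline overlap in time?"""
    return a[0] < b[1] and b[0] < a[1]

def implication(alphas, betas):
    """alpha -> beta: one violation per alpha overlapping no beta."""
    return sum(1 for a in alphas if not any(overlaps(a, b) for b in betas))

def clash(alphas, betas):
    """alpha clash beta: one violation per overlapping (alpha, beta) pair."""
    return sum(1 for a in alphas for b in betas if overlaps(a, b))

# nas -> voi: every nasal interval should overlap some voicing interval.
nas = [(0, 2), (5, 7)]
voi = [(1, 3)]
print(implication(nas, voi))  # → 1  (the (5, 7) nasal is unvoiced)
print(clash(nas, voi))        # → 1  (one overlapping pair)
```

Note that both checks are purely local, exactly as the slide claims: they ask only whether gestures overlap, never how far apart they are.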

13 Examples from the literature

nas → voi: every nasal segment bears some voicing feature
[σ → [C: every syllable starts with some consonant (onset)
F → [μ: every foot crosses some mora boundary (non-degenerate)
ATR ⊥ low: no ATR feature on any low vowel
]F ⊥ ]word: no foot at the end of any word (extrametricality)
[μ ⊥ C: no μ boundary during any consonant (no geminates)
σ → H or L: every syllable bears some tone

(General schema: conjunction → disjunction.)

14 Input, Output, and Gen in OTP

[Figure: an input specified on the underlying tiers (C, V, voi), paired by Gen with many candidate outputs on the surface tiers, e.g., variants with and without voi, with an extra velar tier, with an extra V, etc.]

Gen proposes all candidates that include this input.

15 Example (Korean Final Devoicing)

Input: bi-bim bab → Output: bi-bim bap
– the final b is word-final and devoiced
– the final m is word-final but NOT devoiced (because it's sonorant)

Relevant constraints:
son → voi: "sonorants attract voicing"
]word ⊥ ]voi: "ends of words repel voicing"
voi → voi: "input voicing attracts surface voicing"

16 Example (Korean Final Devoicing)

[Figure: a tableau of candidates such as "b a b", "p a p", and the winner "b a p", each shown with its voi and word tiers (and many more candidates).]

17 INTERMISSION

I've sketched:
– why (something like) OTP is needed
– how OTP works

What's left:
– results about OTP and OTFS
– how can we build a tool for linguists?

18 Linguistic appropriateness

Tested OTP against the literature.

Powerful enough?
– Nearly all constraints turn out primitive.

Not too powerful?
– All degrees of freedom are exercised (e.g., [x → [y, [x ⊥ [y, x → y, x ⊥ y) ...
– ... in each of several domains: features, prosody, featural prosody, input-output, morphology.

19 Generative power: OTP = OTFS

Encode an OTP grammar in OTFS?
– Cheaply: OTP constraints are tiny automata!
– Encode multi-tier candidates as strings.

Encode an OTFS grammar with just OTP?
– Yes, if we're allowed some liberties:
– to invent new kinds of OTP constituents (beyond nas, voi, σ, ...; e.g., -F, +F, [F, ]F)
– to replace the big OTFS constraint with many small primitive constraints that shouldn't be reordered

20 Is OTP = OTFS strong enough?

OTP is less powerful than McCarthy & Prince's Generalized Alignment, which sums distances.

Proof:
– Align-Left(σ, Hi) prefers a floating tone to dock centrally; this essentially gives a^n b^n
– Pumping ⇒ OTFS can't capture this case

[Figure: docking the H tone at either edge of the word costs 21 violations; docking it centrally costs only 12.]
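The arithmetic character of Generalized Alignment is easy to see in a toy computation. This is my own construction, not the paper's: I take the alignment cost of a docked tone to be the sum, over syllables, of the distance to the tone, and I use a 7-syllable word because that is the length that reproduces the slide's counts of 21 (edge) and 12 (center).

```python
def align_violations(tone_pos, n_syllables):
    """Sum of distances from the docked tone to every syllable:
    a distance-summing cost in the spirit of Align-Left."""
    return sum(abs(i - tone_pos) for i in range(n_syllables))

costs = [align_violations(p, 7) for p in range(7)]
print(costs)          # → [21, 16, 13, 12, 13, 16, 21]
# Edge docking costs 21; the central syllable (index 3) costs only 12.
```

No finite-state constraint can compute such a sum-of-distances preference, since an automaton cannot compare unbounded counts on the two sides of the tone; that is the intuition behind the a^n b^n pumping argument on this slide.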

21 On the other hand...

OTFS is known to be more powerful than rational transductions (Frank & Satta 1997). So is OTP too weak or too strong?

rational transductions < OTP < OTP+GA

where rational transductions correspond to past linguistic practice (serial derivations), and OTP+GA to current linguistic practice (OT as she is often spoke).

22 Eliminating Generalized Alignment

rational transductions < OTP < OTP+GA

Should we pare OT back to the OTP level? Hard to imagine making it any simpler.

Should we beef OT up to the OTP+GA level, by allowing GA? Ugly mechanisms like GA weren't needed before OT. GA is non-local, arithmetic, and too powerful. Does OT really need it, or would OTP be enough?

23 Stress typology without GA

OTP forbids ALIGN and other GA stress constraints:
– but a complete reanalysis within OTP is possible
– the new analysis captures the data, and does a better job of explaining tricky typological facts!

In the OTP analysis, constraint reranking explains:
– several iambic-trochaic asymmetries
– the coexistence of metrical and non-metrical systems
– the restricted distribution of degenerate feet
– a new typological fact not previously spotted

24 Building a tool for generation

If linguists use OTP (or OTFS), can we help them filter the infinite candidate set?

[Figure: the pipeline again, input through Gen and Constraints 1-3 to output, now driven by an OTP grammar.]

25 Ellison's generation method (1994)

Encode every candidate as a string.

[Figure: input through Gen yields the candidate set, an unweighted DFA accepting candidate strings (simplified).]

26 Ellison's generation method (1994)

Encode every candidate as a string.
A constraint is an arc-weighted DFA that evaluates strings:
– weight of the accepting path = degree of violation

[Figure: the candidate-set DFA alongside a constraint, a simple weighted DFA.]

27 Ellison's generation method (1994)

Encode every candidate as a string.
A constraint is an arc-weighted DFA that scores strings:
– weight of the accepting path = degree of violation
Intersection yields a weighted DFA that accepts the candidates and scores each one.

[Figure: the candidate-set DFA intersected with the constraint DFA.]

28 Ellison's generation method (1994)

Encode every candidate as a string.
A constraint is a weighted DFA that scores strings:
– weight of the accepting path = degree of violation
Intersect, then prune back to the min-weight accepting paths (the best candidates).

[Figure: the intersection, pruned back to its lightest paths.]
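The intersect-and-prune step can be sketched compactly. This is my illustration, not Ellison's code: I build the product automaton implicitly and use a Dijkstra-style search to extract one minimum-violation candidate, which suffices to show the idea (full pruning would instead keep every min-weight accepting path). The tuple encoding of automata is an assumption of mine.

```python
import heapq

def best_candidate(cand, con):
    """cand, con: (start, accepting, arcs) with arcs as
    {(state, sym): (next_state, weight)}; cand arcs all carry weight 0.
    Returns a (string, violations) pair for a lightest accepting path
    of the (implicit) product automaton."""
    (cs, ca, carcs), (ks, ka, karcs) = cand, con
    heap, seen = [(0, (cs, ks), "")], set()
    while heap:
        w, (q1, q2), s = heapq.heappop(heap)
        if (q1, q2) in seen:
            continue
        seen.add((q1, q2))
        if q1 in ca and q2 in ka:
            return s, w                      # lightest accepting path
        for (q, sym), (r1, _) in carcs.items():
            if q != q1 or (q2, sym) not in karcs:
                continue
            r2, wt = karcs[(q2, sym)]
            heapq.heappush(heap, (w + wt, (r1, r2), s + sym))
    return None

# Toy run: candidates are the length-2 strings over {a, b};
# the constraint charges one violation per "b".
cand = (0, {2}, {(0, "a"): (1, 0), (0, "b"): (1, 0),
                 (1, "a"): (2, 0), (1, "b"): (2, 0)})
con  = (0, {0}, {(0, "a"): (0, 0), (0, "b"): (0, 1)})
print(best_candidate(cand, con))   # → ('aa', 0)
```

Since constraint weights are non-negative, the first accepting product state popped from the heap is guaranteed to lie on a globally lightest path, which is why the early return is safe.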

29 Alas: explosion of states

Ellison's algorithm is impractical for OTP. Why? The initial candidate set is a huge DFA:
– 2^k states: an intersection of many orthogonal 2-state automata
– for every left edge on any tier, there must be a right edge
– so the state must keep track: "I'm in C, and in nas, but out of σ..."

Mostly the same work gets duplicated at nasal and non-nasal states, etc.
– Wasteful: stress doesn't care whether the foot is nasal!

30 Solution: Factored automata

Clumsy big automata arise in OTP when we intersect many small automata. Instead, just maintain the list of small automata:
– like storing a large integer as a list of prime factors
– try to compute in this "factored" domain for as long as possible: defer intersection
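The factored representation itself is simple to sketch. In this hedged illustration of my own (names and toy factors invented), the candidate set is a list of small DFAs, a string belongs to the set iff every factor accepts it, and adding a restriction is just appending a factor rather than building a product automaton.

```python
def accepts(dfa, string):
    """Run one small unweighted DFA, given as (start, accepting, arcs)
    with arcs as {(state, sym): next_state}."""
    start, accepting, arcs = dfa
    state = start
    for sym in string:
        if (state, sym) not in arcs:
            return False
        state = arcs[(state, sym)]
    return state in accepting

class FactoredSet:
    def __init__(self, factors):
        self.factors = list(factors)        # each factor is a small DFA

    def add_factor(self, dfa):              # intersection is deferred
        self.factors.append(dfa)

    def __contains__(self, string):
        return all(accepts(f, string) for f in self.factors)

# Two toy 2-state factors over {a, b}: "even number of a's"
# and "no b immediately after another b".
even_a = (0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
no_bb  = (0, {0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0})
cands = FactoredSet([even_a, no_bb])
print("aab" in cands, "aabb" in cands)   # → True False
```

The explicit product of k such 2-state factors would have up to 2^k states; the factored list stays at 2k states, which is the point of the prime-factor analogy.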

31 Solution: Factored automata

[Figure: the candidate set as a list of factors — nas tier is well-formed; x tier is well-formed; F tier is well-formed; input material; word never ends on voiced obstruent; etc. A new constraint, [F → [x, is shown as a small weighted DFA whose heavy arc matches "[F without [x" and whose other arcs are free.]

To apply the new constraint: intersect the candidate set with it and prune back to the lightest paths.

32 Solution: Factored automata

[Figure: the same factor list and the weighted constraint DFA ("[F without [x" heavy, "other" free).]

Just add the constraint as a new factor? No: we must follow the heavy arc as rarely as possible. CERTAIN of the existing factors force us to take the heavy arc. Ignore the other factors!

33 Factored automata

Filter candidates via "best intersection":
– candidate set = unweighted factored DFA
– constraint = simple weighted DFA
– goal: winnow the candidate set (i.e., add a new factor)

[Figure: the factored DFA and the constraint [F → [x, together with a small DFA recording where [F bars [x, are intersected and pruned back to the best paths, yielding a new factored DFA.]

34 Good news & bad news

Factored methods work correctly, and can get a 100x speedup on a real problem. But what is the worst case?
– O(n log n) in the size of the input...
– ...but NP-complete in the size of the grammar! (One can encode Hamilton Path as an OTP grammar.)
– Significant if the grammar keeps changing:
– learning algorithms (Tesar 1997)
– tools for linguists to develop grammars

35 Summary

OTP: a clean formalism for linguists
– simple, empirically plausible version of OT
– good fit to current linguistic practice
– can force fruitful new analyses

Formal results
– transducers < OTFS = OTP < OTP+GA
– the generation problem is NP-complete

Practical results on generation
– use factored automata for efficiency

36 Representation = Edge Ordering

[Figure: the autosegmental diagram (tiers V, C, voi, nas, σ, Stem) rewritten as an equivalent ordering of labeled edge brackets — [voi ... voi], [Stem ... Stem], [V ... V], [C ... C], [nas ... nas], etc. — on a single timeline.]

37 Linguists have not formalized OT (to their chagrin)

[Figure: the pipeline — input, Gen, then Constraint 1 (informal English), Constraint 2 (more English), Constraint 3 (English), ..., output.]

– What do the candidates look like?
– How powerful is Gen in preselecting candidates?
– How powerful can constraints be?

38 Encoding OTFS into OTP

Given a finite-state constraint:
– invent a new constituent type for each arc
– use several primitive constraints to ensure that:
– each symbol must project an arc that accepts it
– these arcs must form an accepting path
– the path must have as few violations as possible

[Figure: the string abc regarded as three layers — symbols, the arcs (e.g., X, Y, Z) they project along an accepting path, and that path's violations.]

39 OTP generation is NP-complete

Solve Hamilton Path within OTP:
1. The word attracts one copy of each vertex
2. ...and repels added copies (so a candidate = an ordering of the vertices)
3. No gaps: vertices attract each other
4. Unconnected vertices repel each other

[Figure: a graph with vertices a, u, v and a candidate ...[a][v][u]...]

To solve a big Hamilton Path problem, construct a big grammar. For a fixed grammar, generation is only O(n log n), but some grammars require a huge constant.