600.325/425 Declarative Methods - J. Eisner: Constraints on Strings


Slide 1: Constraints on Strings

Slide 2: What's a constraint, again?
A unary constraint on X is a set of allowed values (X = ...); a binary constraint on (X, Y) is a set of allowed value pairs. Infinite sets? Sure: infinite subsets of (pairs of) integers, reals, ... How about soft constraints?

Slide 3: What's a constraint on strings?
Hard constraint: does string S match pattern P? (Is it in the set?) A pattern is a description of a set of strings. Like a constraint how? S is a variable whose domain is the set of all strings, so P can be regarded as a unary constraint: let's write P(S).
Soft constraint: how well does string S fit pattern P? A function mapping each string to a score / weight / cost. Like a soft constraint.

Slide 4: What is a pattern?
What operations would you expect for combining these string constraints?
If P is a pattern, then so is ~P; ~P matches exactly the strings that P doesn't.
If P and Q are both patterns, then so are P & Q and P | Q.
Wow, we can build up boolean formulas! Does this allow us to encode SAT? How?

Slide 5: More about the relation to constraints
By building complicated patterns from simple ones, we are building up complicated constraints! That is also allowed in ECLiPSe:
  alldiff3(X,Y,Z) :- X #\= Y, Y #\= Z, X #\= Z.
  between(X,Y,Z) :- X #< Y, Y #< Z.   % either this
  between(X,Y,Z) :- X #> Y, Y #> Z.   % ... or this
Now we can use "alldiff3" and "between" as new constraints. Hang on, patterns are only unary constraints. Generalize?
  between(X,Y,Z) :- (X #< Y, Y #< Z) ; (X #> Y, Y #> Z).

Slide 6: What is a pattern?
Binary constraint (relation): what are all the possible translations of string S? A description of a set of string pairs (S, T). Like a binary constraint: let's write P(S,T). We can also do n-ary constraints more generally, but most current solvers don't allow them.
Fuzzy case: how strongly is string S related to each T? Which one is it most strongly related to?
Ok, so what's new here? Why does it matter that they're string variables?

Slide 7: Some Pattern Operators
  ~    complementation         ~P
  &    intersection            P & Q
  |    union                   P | Q
       concatenation           P Q
  *    iteration (0 or more)   P*
  +    iteration (1 or more)   P+
  -    difference              P - Q
  \    char complement         \P  (equiv. to ? - P)
Which of these can be treated as syntactic sugar? That is, which of these can we get rid of?

Slide 8: More Pattern Operators
  .x.  crossproduct              P .x. Q
  .o.  composition               P .o. Q
  .u   upper (input) language    P.u  ("domain")
  .l   lower (output) language   P.l  ("range")

Slide 9: The language of "regular expressions"
A variable S has infinitely many possible values if its type is "string" or "real". So to specify a constraint on S, it's not enough to list possible values. The language for simple constraints on reals: linear equations. The language for simple constraints on strings: regular expressions.
You probably know the standard form of regular expressions. A standard regexp is a unary constraint ("X must match a*b(c|d)*"), with basic operators union "|", concatenation, and closure "*". But the language has been extended in various ways: soft constraints (specifies costs), binary constraints (over pairs of string variables), and n-ary constraints (over n string variables).

Slide 10: Regular expressions ↔ finite-state automata
1. Given a regexp that specifies a constraint, you can build an FSA that efficiently determines whether a given string satisfies the constraint.
2. Given an FSA, you can find an equivalent regexp. So the "compiled" form of the little language can be converted back to the source form.
Conclusion: anything you can do with regexps, you can do with FSAs, and vice-versa.

Slide 11: Given a regular expression ...
1. Make a parse tree for it.
2. Build up the FSA from the bottom up.
Example: (ab|c)*(bb*a)
[Parse tree diagram: the top-level concat joins closure(union(concat(a, b), c)) with concat(b, closure(b), a).]
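As a small illustration of working bottom-up over the parse tree, here is a sketch in Python. Rather than building explicit FSA states, it simulates the pattern directly over the string (an NFA-style position-set simulation); the class names Lit, Cat, Alt, and Star are my own, not notation from the slides.

```python
# Minimal regexp AST, matched bottom-up by simulating sets of positions.
class Lit:
    def __init__(self, c): self.c = c
class Cat:
    def __init__(self, a, b): self.a, self.b = a, b
class Alt:
    def __init__(self, a, b): self.a, self.b = a, b
class Star:
    def __init__(self, a): self.a = a

def matches(node, s):
    """True iff the whole string s is in the language of node."""
    def run(node, starts):
        # starts: set of positions in s; returns positions reachable
        # after consuming a match of node from any of them.
        if isinstance(node, Lit):
            return {i + 1 for i in starts if i < len(s) and s[i] == node.c}
        if isinstance(node, Cat):
            return run(node.b, run(node.a, starts))
        if isinstance(node, Alt):
            return run(node.a, starts) | run(node.b, starts)
        if isinstance(node, Star):
            seen, frontier = set(starts), set(starts)
            while frontier:                       # closure: iterate to a fixpoint
                frontier = run(node.a, frontier) - seen
                seen |= frontier
            return seen
        raise TypeError(node)
    return len(s) in run(node, {0})

# The slide's example: (ab|c)*(bb*a)
pattern = Cat(Star(Alt(Cat(Lit('a'), Lit('b')), Lit('c'))),
              Cat(Lit('b'), Cat(Star(Lit('b')), Lit('a'))))
```

For instance, `matches(pattern, "abcba")` holds (ab, c, then b b* a with zero middle b's), while `matches(pattern, "ab")` does not.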

Slide 12: Concatenation (of soft constraints)
[Weighted FSA diagrams omitted: two machines joined by concatenation.] Example thanks to M. Mohri.

Slide 13: Union
[Weighted FSA diagrams omitted: two machines combined by union.] Example thanks to M. Mohri.

Slide 14: Union
[Weighted FSA diagrams omitted: the union is built with epsilon arcs (eps/0, eps/0.3, eps/0.8) from a new start state.] Example thanks to M. Mohri.

Slide 15: Closure (also illustrates binary constraints)
[Weighted FSA diagrams omitted: a machine and its closure M*.] Why add new start state 4? Why not just make state 0 final? Example thanks to M. Mohri.

Slide 16: Complementation
M represents a constraint on strings. We'd like to represent ~M (i.e., a constraint that says the string must not be accepted by M). Just change M's final states to non-final and vice-versa. This only works if every string takes you to exactly one state in M (final or non-final), so M must be both deterministic and complete. Any M can be put in this form. Example thanks to M. Mohri.

Slide 17: Intersection
[Weighted FSA diagrams omitted: two machines over the words fat, pig, eats, sleeps are intersected; the result's states are pairs (q1, q2) and the weights of matching arcs add.] Example adapted from M. Mohri.

Slide 18: Intersection
Paths 0012 and 0110 both accept "fat pig eats", so the new machine must too: along path 0,0 -> 0,1 -> 1,1 -> 2,0. [Diagrams omitted.]

Slide 19: Intersection
Paths 0->0 and 0->1 both accept "fat", so the new machine must too: along path 0,0 -> 0,1. [Diagrams omitted.]

Slide 20: Intersection
Paths 0->0 and 1->1 both accept "pig", so the new machine must too: along path 0,1 -> 1,1. [Diagrams omitted.]

Slide 21: Intersection
Paths 1->2 and 1->2 both accept "sleeps" (the arcs shown are sleeps/0.6 and sleeps/1.3), so the new machine must too: along path 1,1 -> 2,2 with sleeps/1.9. [Diagrams omitted.]

Slide 22: Intersection
[Completed intersection machine shown; diagrams omitted.]

Slide 23: Intersection
Why is intersection guaranteed to terminate? How big a machine might be produced by intersection?
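A sketch of the cross-product construction the slides walk through, for weighted FSAs whose weights are costs that add along a path (the tropical semiring): the intersection's states are pairs, matching arcs are paired, and their costs sum. The toy machines m1 and m2 below are my own, not the slides' fat/pig example.

```python
def intersect(m1, m2):
    """Cross-product intersection of two weighted FSAs.
    A machine is (start, finals, arcs): arcs maps state -> list of
    (label, next_state, cost); finals maps final state -> final cost."""
    s1, f1, a1 = m1
    s2, f2, a2 = m2
    start = (s1, s2)
    arcs, finals = {}, {}
    stack, seen = [start], {start}
    while stack:
        (q1, q2) = q = stack.pop()
        if q1 in f1 and q2 in f2:
            finals[q] = f1[q1] + f2[q2]        # final costs add too
        out = []
        for lab1, n1, w1 in a1.get(q1, []):
            for lab2, n2, w2 in a2.get(q2, []):
                if lab1 == lab2:               # arcs must agree on the label
                    nxt = (n1, n2)
                    out.append((lab1, nxt, w1 + w2))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        arcs[q] = out
    return start, finals, arcs

def best_cost(m, string):
    """Minimum total cost of accepting `string`, or None if rejected."""
    start, finals, arcs = m
    costs = {start: 0.0}
    for sym in string:
        nxt = {}
        for q, c in costs.items():
            for lab, n, w in arcs.get(q, []):
                if lab == sym and c + w < nxt.get(n, float('inf')):
                    nxt[n] = c + w
        costs = nxt
    return min((c + finals[q] for q, c in costs.items() if q in finals),
               default=None)

# Toy machines: m1 accepts "ab"; m2 accepts a*b.
m1 = (0, {2: 0.0}, {0: [('a', 1, 0.5)], 1: [('b', 2, 0.3)]})
m2 = (0, {1: 0.2}, {0: [('a', 0, 0.1), ('b', 1, 0.4)]})
m = intersect(m1, m2)
```

Termination is guaranteed because there are at most |Q1| x |Q2| pair states, which also answers the slide's size question: the result can be as large as the product of the two machines.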

Slide 24: Given a regular expression ...
1. Make a parse tree for it.
2. Build up the FSA from the bottom up.
Example: (ab|c)*(bb*a) [Parse tree diagram omitted; same as slide 11.]

Slide 25: Given an FSA ...
Find a regular expression describing all paths from initial state 1 to final state 5.
Paths from 1 to 5: e12 ((e23 e33* e35) | e24 e45)

Slide 26: Given an FSA ...
Previously, paths from 1 to 5: e12 ((e23 e33* e35) | e24 e45).
Now, paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45)

Slide 27: Given an FSA ...
Previously, paths from 1 to 5: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45).
Now, paths from 1 to 5: e12 ((e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e35)))

Slide 28: Given an FSA ...
Find a regular expression describing all paths from initial state 1 to final state 5. Paths from 1 to 5: ???

Slide 29: Does there exist any path from initial state 1 to final state 5?
Let's do a simpler variant first. If there's a way to get from 1 to 3 and from 3 to 5, then there's a way to get from 1 to 5. More generally, the transitive closure problem: for each A, B, does there exist any path from A to B? Slide thanks to R. Tamassia & M. Goodrich (modified).

Slide 30: Does there exist any path from initial state 1 to final state 5?
If there's a way to get from 1 to 3 and from 3 to 5, then there's a way to get from 1 to 5. Hmm ... should I look for a 1->3 path first in hopes of using it to build a 1->5 path? Or vice-versa? More generally, the transitive closure problem: for each A, B, does there exist any path from A to B?

Slide 31: Let's do a simpler variant first ...
Hmm ... should I look for a 1->3 path first in hopes of using it to build a 1->5 path? Or vice-versa?
Option #1: gradually build up longer paths (length-1, length-2, length-3, ...). How do we deal with cycles?
Option #2 (less obvious): gradually allow paths of higher and higher order, where a path's order is the number of the highest vertex that the path goes through.
Both have O(n^3) runtime, but option #2 allows more flexible handling of cycles. We'll need that when we return to our FSA problem.

Slide 32: Floyd-Warshall transitive closure algorithm
Option #2 (less obvious): gradually allow paths of higher and higher order, where a path's order is the number of the highest vertex that the path goes through.
What are the paths of order 0? Of order 1? Of order 2? Of order 5? How big can a path's order be?

Slide 33: Floyd-Warshall transitive closure algorithm
Definition: p^k_ij = true iff there is an i->j path of order <= k.
1. Define p^0: for each i, j, set p^0_ij = true iff there is an i->j edge.
2. For k = 1, 2, ..., n, define p^k: ...

Slide 34: Floyd-Warshall transitive closure algorithm
Definition: p^k_ij = true iff there is an i->j path of order <= k.
1. Define p^0: for each i, j, set p^0_ij = true iff there is an i->j edge.
2. For k = 1, 2, ..., n, define p^k: for each i, j, set p^k_ij = p^{k-1}_ij v (p^{k-1}_ik ^ p^{k-1}_kj).
3. Return p^n (e.g., what is p^n_1n?).
The i->k and k->j subpaths use only vertices numbered 1, ..., k-1; the new i->j path still uses only vertices numbered 1, ..., k. Parts of slide thanks to R. Tamassia & M. Goodrich.
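The recurrence above is short enough to transcribe directly. This sketch numbers vertices 0..n-1 rather than the slides' 1..n, and updates p in place, a standard simplification of the p^{k-1}/p^k layering that is known to remain correct for Floyd-Warshall.

```python
def transitive_closure(n, edges):
    """Boolean Floyd-Warshall: p[i][j] is True iff some i->j path exists."""
    p = [[False] * n for _ in range(n)]
    for i, j in edges:
        p[i][j] = True                            # p^0: direct edges only
    for k in range(n):                            # allow vertex k as intermediate
        for i in range(n):
            for j in range(n):
                p[i][j] = p[i][j] or (p[i][k] and p[k][j])
    return p

# Chain 0 -> 1 -> 2 -> 3: everything downstream becomes reachable.
closure = transitive_closure(4, [(0, 1), (1, 2), (2, 3)])
```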

Slide 35: Floyd-Warshall example. [Graph on vertices v1, ..., v6; diagram omitted.] Slides 35-42 thanks to R. Tamassia & M. Goodrich (modified).

Slide 36: Floyd-Warshall: k=1 (computes p^1 from p^0). [Diagram omitted.]

Slide 37: Floyd-Warshall: k=2 (computes p^2 from p^1). [Diagram omitted.]

Slide 38: Floyd-Warshall: k=3 (computes p^3 from p^2). [Diagram omitted.]

Slide 39: Floyd-Warshall: k=4 (computes p^4 from p^3). [Diagram omitted.]

Slide 40: Floyd-Warshall: k=5 (computes p^5 from p^4). [Diagram omitted.]

Slide 41: Floyd-Warshall: k=6 (computes p^6 from p^5). [Diagram omitted.]

Slide 42: Floyd-Warshall: k=7 (computes p^7 from p^6). [Diagram omitted.]

Slide 43: Regular expression version (Kleene/Tarjan)
Find a regular expression describing all paths from initial state 1 to final state 5.
Previously: e12 ((e23 e33* (e35 | e34 e45)) | e24 e45).
Now: e12 ((e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e35)))

Slide 44: Regular expression version (Kleene/Tarjan)
Find a regular expression describing all paths from initial state 1 to final state 5. Paths from 1 to 5: ???

Slide 45: Regular expression version (Kleene/Tarjan)
Definition: p^k_ij = a regular expression describing all i->j paths of order <= k.
1. Define p^0: for each i, j, set p^0_ij = e_ij if that edge exists, else the empty language.
2. For k = 1, 2, ..., n, define p^k: for each i, j, set p^k_ij = p^{k-1}_ij | (p^{k-1}_ik (p^{k-1}_kk)* p^{k-1}_kj) (a regexp using all three of union, concat, closure!)
3. Return p^n (e.g., what is p^n_1n?).
Parts of slide thanks to R. Tamassia & M. Goodrich.

Slide 46: Regular expression version (Kleene/Tarjan)
Paths from 1 to 5 (as before): e12 ((e23 (e33 | e34 e43)* (e35 | e34 e45)) | (e24 (e43 e33* e34)* (e45 | e43 e35))).
What if the arcs have labels (a, b, c, epsilon, ...)? [Labeled FSA diagram omitted.]

Slide 47: Regular expression version (Kleene/Tarjan)
What if the arcs have labels? Just substitute them in: replace each e_ij in the path expression with the label of the arc from i to j. [Labeled FSA diagram omitted.]

Slide 48: Regular languages as points in a high-dimensional space
  abc          <->  abc
  abc:2        <->  2 abc   (weighted)
  ab|ac        <->  ab + ac
  a(b|c)       <->  ab + ac
  a(b|(c:2))   <->  ab + 2 ac
  ab*c         <->  ac + abc + abbc + abbbc + ...
  a(b:2)*c     <->  ac + 2 abc + 4 abbc + 8 abbbc + ...
Instead of dimensions x^2, y^2, xy, etc., every possible string is a dimension and its coefficient is the coordinate (often 0).

Slide 49: Regular languages as points in a high-dimensional space
Suppose P, Q are two regular languages represented as these "formal power series."
What is the sum P+Q? Union! (But we double-count strings in both.)
What is the product PQ? Concatenation!
What is the Hadamard product of P and Q (i.e., the dot product before you sum: the vector (x1 y1, x2 y2, ...))? Intersection!
What is 1/(1-P)? P* (closure)!
Could we use these techniques to classify strings using kernel SVMs?
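These claims can be checked concretely on finite series. The sketch below represents a power series as a dict {string: coefficient}; since the series are finite, the closure 1/(1-P) is only computed up to a length bound, and the helper names are my own.

```python
def series_add(p, q):
    """P + Q: union. Note ab|ac vs a(b|c): shared strings double-count."""
    r = dict(p)
    for s, c in q.items():
        r[s] = r.get(s, 0) + c
    return r

def series_concat(p, q):
    """PQ: Cauchy product, i.e. concatenation of the languages."""
    r = {}
    for s1, c1 in p.items():
        for s2, c2 in q.items():
            r[s1 + s2] = r.get(s1 + s2, 0) + c1 * c2
    return r

def series_hadamard(p, q):
    """Coordinatewise product: (weighted) intersection."""
    return {s: p[s] * q[s] for s in p if s in q}

def series_star(p, max_len):
    """Truncated 1/(1-P) = 1 + P + PP + ..., keeping strings up to max_len.
    Assumes p has no empty-string term (else the sum would not converge)."""
    r, term = {"": 1}, {"": 1}
    while True:
        term = {s: c for s, c in series_concat(term, p).items()
                if len(s) <= max_len}
        if not term:
            return r
        r = series_add(r, term)
```

For example, the slide's a(b|(c:2)) <-> ab + 2ac is `series_concat({"a": 1}, series_add({"b": 1}, {"c": 2}))`, and a(b:2)* up to length 2 gives coefficients 1, 2, 4 on the empty string, b, bb.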

Slide 50: Function from strings to ...
                Acceptors (FSAs)    Transducers (FSTs)
  Unweighted    {false, true}       strings
  Weighted      numbers             (string, num) pairs
[Example arc labels omitted: a:x/.5, c:z/.7, eps:y/.5.]

Slide 51: Sample functions
                Acceptors (FSAs)                   Transducers (FSTs)
  Unweighted    Grammatical?                       Markup, correction, translation
  Weighted      How grammatical? Better,           Good markups, good corrections,
                how likely?                        good translations

Slide 52: Sample data, encoded the same way
                Acceptors (FSAs)                   Transducers (FSTs)
  Unweighted    Input string, corpus,              Bilingual corpus, bilingual
                dictionary                         lexicon, database (WordNet)
  Weighted      Input lattice, reweighted          Prob. bilingual lexicon,
                corpus, weighted dictionary        weighted database

Slide 53: Some Applications
- Prediction, classification, generation of text
- More generally, "filling in the blanks" (probabilistic reconstruction of hidden data)
- Speech recognition
- Machine translation, OCR, other noisy-channel models
- Sequence alignment / edit distance / computational biology
- Text normalization, segmentation, categorization
- Information extraction
- Stochastic phonology/morphology, including lexicon
- Tagging, chunking, finite-state parsing
- Syntactic transformations (smoothing PCFG rulesets)

Slide 54: Finite-state "programming"
  Programming Langs                        Finite-State World
  Source code (programmer)                 Regular expression (programmer)
  compiler                                 regexp compiler
  Object code                              Finite-state machine
  optimizer -> better object code          determinization, minimization, pruning -> better machine
  Function                                 Function on strings (e.g., a?c*)

Slide 55: Finite-state "programming"
  Programming Langs                            Finite-State World
  Function composition                         FST/WFST composition
  Function inversion (available in Prolog)     FST inversion
  Higher-order functions ...                   Finite-state operators ...
  Small modular cooperating functions          Small modular regexps,
  (structured programming)                     combined via operators

Slide 56: Finite-state "programming"
The Finite-State World: more features you wish other languages had!

Slide 57: Finite-State Operations
Projection GIVES YOU marginal distribution:
  p(x) = domain( p(x,y) )
  p(y) = range( p(x,y) )

Slide 58: Finite-State Operations
Probabilistic union GIVES YOU mixture model: combine machines for p(x) and q(x).

Slide 59: Finite-State Operations
Probabilistic union GIVES YOU mixture model:
  p(x) +_α q(x) = α p(x) + (1-α) q(x)
Learn the mixture parameter α!

Slide 60: Finite-State Operations
Composition GIVES YOU chain rule:
  p(x|z) = p(x|y) .o. p(y|z)   (composition sums over the intermediate y)
The most popular statistical FSM operation. Cross-product construction.

Slide 61: Finite-State Operations
Concatenation and probabilistic closure HANDLE unsegmented text:
  p(x) q(x)   and   p(x)*
Just glue together machines for the different segments, and let them figure out how to align with the text.

Slide 62: Finite-State Operations
Directed replacement MODELS noise or postprocessing:
  p(x, noisy y) = p(x,y) .o. (a noise model defined by directed replacement)
The resulting machine compensates for noise or postprocessing.

Slide 63: Finite-State Operations
Intersection GIVES YOU product models, e.g., exponential / maxent, perceptron, Naïve Bayes, ...:
  p(x)*q(x) = p(x) & q(x);  e.g., p_NB(y | x) built from p(y) & p(A(x)|y) & p(B(x)|y) & ...
Cross-product construction (like composition). Need a normalization op too: it computes sum_x f(x), the "pathsum" or "partition function".

Slide 64: Finite-State Operations
Conditionalization (a new operation): p(y | x) = condit( p(x,y) ).
Construction: reciprocal(determinize(domain( p(x,y) ))) .o. p(x,y). Not possible for all weighted FSAs.
The resulting machine can be composed with other distributions: p(y | x) * q(x).

Slide 65: Other Useful Finite-State Constructions
- Complete graphs YIELD n-gram models
- Other graphs YIELD fancy language models (skips, caching, etc.)
- Compilation from other formalisms into FSMs:
  - Wordlist (cf. trie), pronunciation dictionary, ...
  - Speech hypothesis lattice
  - Decision tree (Sproat & Riley)
  - Weighted rewrite rules (Mohri & Sproat)
  - TBL or probabilistic TBL (Roche & Schabes)
  - PCFG (approximation!) (e.g., Mohri & Nederhof)
  - Optimality theory grammars (e.g., Eisner)
  - Logical description of set (Vaillette; Klarlund)

Slide 66: Regular Expression Calculus as a Programming Language
[Same correspondence table as slide 54: source code / compiler / object code / optimizer versus regular expression / regexp compiler / finite-state machine / determinization, minimization, pruning.]

Slide 67: Regular Expression Calculus as a Modelling Language
Oops! Statistical FSMs are still done "in assembly language": build machines by manipulating arcs and states. For training, get the weights by some exogenous procedure and patch them onto arcs; you may need extra training data for this, and you may need to devise and implement a new variant of EM.
Would rather build models declaratively, e.g.
  ((a*.7 b) +.5 (ab*.6))   or   repl.9((a:(b +.3 eps))*, L, R)

Slide 68: A Simple Example: Segmentation
tapirseatgrass -> tapirs eat grass? tapir seat grass? ...
Strategy: build a finite-state model of p(spaced text, spaceless text), then maximize p(???, tapirseatgrass).
Start with a distribution p(English word): a machine D (for dictionary).
Construct p(spaced text): (D space)*.99 D.
Compose with p(spaceless | spaced): ((? - space) + (space : eps))*.

Slide 69: A Simple Example: Segmentation
Strategy: build a finite-state model of p(spaced text, spaceless text), then maximize p(???, tapirseatgrass). Train on spaced or spaceless text.
D should include novel words: D = KnownWord + (Letter*.85 Suffix).
Could improve to consider letter n-grams, morphology, ...
The noisy channel could do more than just delete spaces: vowel deletion (Semitic); OCR garbling (cl -> d, ri -> n, rn -> m, ...).
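Stripped of the FSA machinery, the segmentation idea reduces to a dynamic program over split points: choose the spaced text maximizing the product of word probabilities, which is a Viterbi search through the (D space)* machine. The tiny dictionary and its probabilities below are invented for illustration, not data from the lecture.

```python
def segment(text, word_prob):
    """Return the highest-probability segmentation of text into dictionary
    words, or None if no segmentation exists."""
    # best[i] = (probability, word list) for the best split of text[:i]
    best = {0: (1.0, [])}
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 8), i):        # cap word length at 8 chars
            word = text[j:i]
            if j in best and word in word_prob:
                score = best[j][0] * word_prob[word]
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best.get(len(text), (0.0, None))[1]

# Toy dictionary D with made-up probabilities.
D = {"tapirs": 0.2, "tapir": 0.1, "eat": 0.3, "seat": 0.1, "grass": 0.3}
```

With these numbers, "tapirs eat grass" (0.2 * 0.3 * 0.3) beats "tapir seat grass" (0.1 * 0.1 * 0.3), so the winner depends on the word probabilities, exactly the point of making the constraint soft.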
