Composition is Our Friend

Wednesday PM. Kenneth R. Beesley, Xerox Research Centre Europe

View composition vertically
Underlying form: p a t + i n + a d + i m + a b
Rule 1: e -> i || _ .#.
Intermediate form: p a t + i n + a d + i m + a b
Rule 2: d -> j, t -> c || _ ("+") i
Intermediate form: p a c + i n + a j + i m + a b
Rule n: b -> p, d -> t, g -> k || _ .#.
Final form: p a c + i n + a j + i m + a p

View composition vertically
Upper: p a t + i n + a d + i m + a b
A single FST:
e -> i || _ .#.
.o.
d -> j, t -> c || _ ("+") i
.o.
b -> p, d -> t, g -> k || _ .#.
Lower: p a c + i n + a j + i m + a p
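For this one word, the cascade and its composition can be mimicked in Python. This is a minimal sketch that assumes each xfst rule can be approximated by a regular-expression substitution (`re.sub`) on a word written without spaces; xfst itself compiles the rules into transducers. Because each stand-in rule here is a total function on strings, applying the rules one after another and applying their single composed function give the same final form. The names `cascade` and `single` are hypothetical.

```python
import re
from functools import reduce

# Rough regex stand-ins (an assumption) for the three xfst replace rules.

def rule1(s: str) -> str:
    # e -> i || _ .#.   (word-final e becomes i)
    return re.sub(r"e$", "i", s)

def rule2(s: str) -> str:
    # d -> j, t -> c || _ ("+") i   (before an optional + followed by i)
    return re.sub(r"[dt](?=\+?i)", lambda m: {"d": "j", "t": "c"}[m.group()], s)

def rule_n(s: str) -> str:
    # b -> p, d -> t, g -> k || _ .#.   (word-final devoicing)
    return re.sub(r"[bdg]$", lambda m: {"b": "p", "d": "t", "g": "k"}[m.group()], s)

RULES = [rule1, rule2, rule_n]

def cascade(s: str) -> list[str]:
    """Apply the rules in order, keeping every intermediate form."""
    forms = [s]
    for rule in RULES:
        forms.append(rule(forms[-1]))
    return forms

# Composing the rules yields one function from underlying to final form.
single = reduce(lambda f, g: (lambda s: g(f(s))), RULES)

print(cascade("pat+in+ad+im+ab"))
print(single("pat+in+ad+im+ab"))
```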

Composition is Our Friend
The composition operation is often the key to building, modifying, filtering and testing finite-state systems.

You Can Compose Transducers
Regular languages (and the networks that encode them) can be unioned, concatenated, intersected, subtracted and complemented. Regular relations (and the transducers that encode them) can be unioned and concatenated. But you cannot, in general, intersect, complement, or subtract transducers (relations). This is a mathematical restriction: regular relations are not closed under these operations. But you can compose transducers, because regular relations are closed under composition.
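To make the composition operation concrete, here is a minimal sketch that models a finite relation as a Python set of (upper, lower) string pairs. This representation is an assumption for illustration only: any finite set of pairs is trivially regular, so the sketch shows what composition computes, not the closure properties discussed above. The functions `compose`, `union` and `concat` are hypothetical names.

```python
# Assumption: a finite relation is a set of (upper, lower) string pairs.
# Real transducers can also encode infinite relations.

def compose(r1, r2):
    """r1 .o. r2: pair x with z whenever r1 maps x to y and r2 maps y to z."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def union(r1, r2):
    """r1 | r2: every pair found in either relation."""
    return r1 | r2

def concat(r1, r2):
    """r1 r2: concatenate upper strings and lower strings pairwise."""
    return {(x1 + x2, y1 + y2) for (x1, y1) in r1 for (x2, y2) in r2}

r1 = {("a", "b"), ("a", "c")}
r2 = {("b", "x"), ("c", "y"), ("d", "z")}

print(sorted(compose(r1, r2)))  # [('a', 'x'), ('a', 'y')]
print(sorted(concat({("a", "b")}, {("c", "d")})))  # [('ac', 'bd')]
```

Note that the pair ("d", "z") in `r2` contributes nothing: composition only links pairs whose middle strings match.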

An Example for the Mathematicians
Regular relations are not closed under intersection (&), subtraction (-) or complementation (~). This means that when you intersect, subtract or complement regular relations, the result may no longer be regular, i.e. it may no longer be finite-state, and then it cannot be encoded as a finite-state network. The following example is based on intersection.

Intersection of Two Finite-State Relations
FST A: [ a:b ]* [ 0:c ]*
On the upper side, some number n of a's; on the lower side, n b's, followed by any number of c's.
FST B: [ 0:b ]* [ a:c ]*
On the upper side, some number n of a's; on the lower side, any number of b's, followed by n c's.

Attempted Intersection of Two Finite-State Relations (FSTs)
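A pair survives the intersection only if it belongs to both relations, so each string of n a's must be paired with exactly n b's and exactly n c's: the empty string with the empty string, a with bc, aa with bbcc, aaa with bbbccc, aaaa with bbbbcccc, and so on. The lower-side language of the resulting relation is therefore $b^n c^n$, and the $b^n c^n$ language is known to be context-free in power (i.e. beyond finite-state power).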
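A brute-force check in Python makes the result concrete. This sketch only inspects a bounded slice of each relation (the bound `N` is an assumption chosen for illustration), so it shows the pattern $a^n$:$b^n c^n$ emerging but cannot by itself prove that the full intersection is non-regular.

```python
# Both relations are infinite, so (an assumption for illustration) we only
# enumerate the pairs whose counts n and m are at most N.
N = 4

# FST A: [ a:b ]* [ 0:c ]*   pairs a^n with b^n c^m
A = {("a" * n, "b" * n + "c" * m) for n in range(N + 1) for m in range(N + 1)}

# FST B: [ 0:b ]* [ a:c ]*   pairs a^n with b^m c^n
B = {("a" * n, "b" * m + "c" * n) for n in range(N + 1) for m in range(N + 1)}

# A pair survives only if both relations contain it, which forces m == n.
both = A & B
lower_side = sorted({low for _, low in both}, key=len)
print(lower_side)  # ['', 'bc', 'bbcc', 'bbbccc', 'bbbbcccc']
```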

Back Down to Earth
Just be aware that transducers cannot, in general, be intersected, subtracted, or complemented. But transducers can be unioned, concatenated, and composed. Composition is often the key operation for modifying, filtering, and combining transducers.

Phonological/Orthographical Rules
“Application” of rules via composition is already familiar to us:
Lexicon FST (lexc)
.o.
Rule 1
.o.
Rule 2
.o.
Rule n

Orthographical Modification via Composition
Standard German spelling uses ü, ö, ä and ß. An alternative orthography, used where these letters are not available, replaces them with “ue”, “oe”, “ae” and “ss” respectively. Composing a rule on the bottom of StandardGermanFST (with ü, ö, ä and ß on the lower side) yields ModifiedGermanFST (with ue, oe, ae and ss on the lower side):
StandardGermanFST
.o.
[ ü -> u e , ö -> o e , ä -> a e , ß -> s s ]
For example, the surface form läßt becomes laesst.
How would we modify StandardGermanFST to analyze both über and ueber, läßt and laesst, and also laeßt and lässt?
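The same composition can be sketched on a toy finite relation in Python. Everything here is an assumption for illustration: `standard_german` is a hypothetical two-entry stand-in for StandardGermanFST, its upper-side analysis strings are invented, and the rule is restricted to the lower-side words that actually occur. Note that the upper side keeps its ü, because composition on the bottom changes only the lower side.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# Hypothetical stand-in for StandardGermanFST; the analyses are invented.
standard_german = {("lassen+Verb+3Sg", "läßt"), ("über+Prep", "über")}

# [ ü -> u e , ö -> o e , ä -> a e , ß -> s s ] as an obligatory mapping.
SPELL = {"ü": "ue", "ö": "oe", "ä": "ae", "ß": "ss"}

def respell(word):
    return "".join(SPELL.get(ch, ch) for ch in word)

# The rule, restricted to the lower-side words of the core.
rule = {(low, respell(low)) for _, low in standard_german}

# Composing the rule on the bottom changes only the lower side.
modified_german = compose(standard_german, rule)
print(sorted(modified_german))
```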

Composition: top and bottom
If you compose a rule on the bottom of an FST, it modifies only the lower-side language of the FST:
CoreFST .o. Rule
If you compose a rule on the top of an FST, it modifies only the upper-side language of the FST:
Rule .o. CoreFST

Change a Tagname on the Upper Side via Composition
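An example of composition on the upper side:
"[Subst]" <- "[Noun]"
.o.
Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
The core lexicon maps casa[Noun][Masc][Pl] to casas; after the composition, the upper side reads casa[Subst][Masc][Pl] and the surface word casas is unchanged.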
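A minimal Python sketch of this tag renaming, assuming (for illustration only) that the core lexicon is a finite set of (upper, lower) pairs containing just the slide's one entry, and that the rule is restricted to the core's upper-side strings. The lower side of `rename` carries the old tag, so it lines up with the core's upper side; its upper side carries the new tag.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

core = {("casa[Noun][Masc][Pl]", "casas")}

# "[Subst]" <- "[Noun]", restricted to the core's upper-side strings.
rename = {(up.replace("[Noun]", "[Subst]"), up) for up, _ in core}

# Composing on top changes only the upper side.
renamed = compose(rename, core)
print(renamed)  # {('casa[Subst][Masc][Pl]', 'casas')}
```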

Simple Filtering to Facilitate Testing
Take a “lexical transducer” and remove everything but the adjectives. When a simple language (rather than a relation) is used in composition, it is automatically treated as an identity relation.
$"[Adj]"
.o.
Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)

Simple Filtering II
Take a lexical transducer and remove the adjectives, leaving the rest.
~$"[Adj]"
.o.
Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
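Both filters can be sketched in Python on a toy finite lexicon. Assumptions for illustration: the lexicon entries and tags are hypothetical, and the languages $"[Adj]" and ~$"[Adj]" (which are infinite) are restricted to the upper-side strings that actually occur in the core. The `identity` helper mirrors the rule that a plain language used in composition acts as its identity relation.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def identity(language):
    """A plain language used in composition acts as its identity relation."""
    return {(w, w) for w in language}

# Hypothetical toy lexical transducer: upper = baseform+tags, lower = surface.
core = {
    ("friendly[Adj]", "friendly"),
    ("red[Adj]", "red"),
    ("friend[Noun][Sg]", "friend"),
    ("run[Verb][Inf]", "run"),
}
uppers = {up for up, _ in core}

# $"[Adj]" and ~$"[Adj]", restricted to the strings that actually occur.
contains_adj = {u for u in uppers if "[Adj]" in u}
lacks_adj = uppers - contains_adj

only_adjectives = compose(identity(contains_adj), core)
no_adjectives = compose(identity(lacks_adj), core)
```

Because the two filters are complementary, their results together give back the whole core.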

Simple Filtering III
Take an English lexical transducer and restrict it to contain only adjectives that end in -ly (friendly, lovely, cowardly, dastardly, …):
$"[Adj]"
.o.
Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
.o.
?* l y
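Filtering on both sides can be sketched the same way in Python. The toy entries and tags below are hypothetical, and each filter language is again restricted (an assumption for illustration) to strings that occur in the core. The noun family ends in -ly but is removed by the top filter, while red is an adjective but is removed by the bottom filter, so both filters are needed.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def identity(language):
    """A plain language used in composition acts as its identity relation."""
    return {(w, w) for w in language}

# Hypothetical toy English lexical transducer.
core = {
    ("friendly[Adj]", "friendly"),
    ("lovely[Adj]", "lovely"),
    ("red[Adj]", "red"),
    ("family[Noun][Sg]", "family"),
}

# Top filter: $"[Adj]", restricted to the core's upper side.
top = identity({up for up, _ in core if "[Adj]" in up})
# Bottom filter: ?* l y, restricted to the core's lower side.
bottom = identity({low for _, low in core if low.endswith("ly")})

ly_adjectives = compose(compose(top, core), bottom)
print(sorted(low for _, low in ly_adjectives))  # ['friendly', 'lovely']
```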

Mindtuning for Finite-State Development
Try to imagine all the possible uses and users of your system. Try to create a core system that may, by itself, serve nobody, but which, via filtering, may serve in multiple systems. If it seems that you have to decide between choice A and choice B, try to create a single core system, with one set of source files, that supports both A and B. Typical cases are language dialects, spelling dialects and spelling relaxations.

Language Dialects: equivalent ways to start
Multichar_Symbols ^A ^B +Sg +Pl

LEXICON Root
Nouns ;

LEXICON Nouns
jail^A:jail N ;
gaol^B:gaol N ;
dog N ;

LEXICON N
+Sg:0 # ;
+Pl:s # ;

Equivalently, using regular expressions in angle brackets:

LEXICON Root
Nouns ;

LEXICON Nouns
< j a i l %^A:0 > N ;
< g a o l %^B:0 > N ;
dog N ;

LEXICON N
< %+Sg:0 > # ;
< %+Pl:s > # ;
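Both versions should define the same relation. As a rough sketch, a few lines of Python can enumerate the upper:lower pairs that this lexicon spells out, by following the continuation classes from Root until `#`. This is not how lexc compiles a lexicon. It assumes that there are no cycles in the continuation classes and treats multichar symbols such as ^A or +Sg as plain substrings. `LEXICONS` and `expand` are hypothetical names.

```python
# Each LEXICON maps to (upper, lower, continuation) entries; "#" ends a word.
# Multichar symbols such as ^A or +Sg are kept as plain substrings here.
LEXICONS = {
    "Root": [("", "", "Nouns")],
    "Nouns": [("jail^A", "jail", "N"), ("gaol^B", "gaol", "N"), ("dog", "dog", "N")],
    "N": [("+Sg", "", "#"), ("+Pl", "s", "#")],
}

def expand(name="Root", upper="", lower=""):
    """Yield every (upper, lower) pair reachable from the named lexicon."""
    if name == "#":
        yield (upper, lower)
        return
    for up, low, cont in LEXICONS[name]:
        yield from expand(cont, upper + up, lower + low)

core = set(expand())
for pair in sorted(core):
    print(pair)
```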

One Core, Several Final Products
To leave both American and British words in the lexicon, just remove the dialect tags by mapping them to the empty string:
0 <- %^A .o. 0 <- %^B .o. CommonCoreFST
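A minimal Python sketch of this tag deletion, using the six (upper, lower) pairs that the jail/gaol/dog lexicon defines. Assumptions for illustration: the relation is a finite set of pairs, and each `0 <- tag` rule is restricted to the upper-side strings of the relation it is composed on. `delete_on_top` is a hypothetical helper. The nesting mirrors the stacking order: the ^B rule sits directly on the core, and the ^A rule sits on top of that.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# The (upper, lower) pairs defined by the jail/gaol/dog lexc lexicon.
core = {
    ("jail^A+Sg", "jail"), ("jail^A+Pl", "jails"),
    ("gaol^B+Sg", "gaol"), ("gaol^B+Pl", "gaols"),
    ("dog+Sg", "dog"), ("dog+Pl", "dogs"),
}

def delete_on_top(tag, rel):
    """Compose 0 <- tag on top of rel: the tag disappears from the upper side."""
    rule = {(up.replace(tag, ""), up) for up, _ in rel}
    return compose(rule, rel)

# 0 <- %^A .o. 0 <- %^B .o. CommonCoreFST
both_dialects = delete_on_top("^A", delete_on_top("^B", core))
```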

One Core, Several Products
To leave just British (and common) words in the lexicon, filter out the exclusively American words. There are two equivalent ways:
0 <- %^B .o. ~$[%^A] .o. CommonCoreFST
0 <- %^B .o. ~[?*] <- %^A .o. CommonCoreFST

One Core, Several Products
To leave just American (and common) words in the lexicon, filter out the exclusively British words. There are two equivalent ways:
0 <- %^A .o. ~$[%^B] .o. CommonCoreFST
0 <- %^A .o. ~[?*] <- %^B .o. CommonCoreFST
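The two equivalent ways can be checked against each other in Python on the jail/gaol/dog lexicon. This is a sketch under stated assumptions: relations are finite sets of (upper, lower) pairs, every rule is restricted to the upper-side strings it is composed on, and the helper names are hypothetical. The second way works because `~[?*]` is the empty language: each occurrence of the tag would have to be rewritten as one of its strings, and there are none, so any string containing the tag gets no output at all.

```python
from itertools import product

def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

core = {
    ("jail^A+Sg", "jail"), ("jail^A+Pl", "jails"),
    ("gaol^B+Sg", "gaol"), ("gaol^B+Pl", "gaols"),
    ("dog+Sg", "dog"), ("dog+Pl", "dogs"),
}

def uppers(rel):
    return {up for up, _ in rel}

def delete_on_top(tag, rel):
    """0 <- tag composed on top of rel."""
    return compose({(up.replace(tag, ""), up) for up in uppers(rel)}, rel)

def keep_without(tag, rel):
    """~$[tag] composed on top of rel: identity on strings lacking tag."""
    return compose({(up, up) for up in uppers(rel) if tag not in up}, rel)

def replace_by_empty_language(tag, rel):
    """~[?*] <- tag composed on top of rel."""
    empty_language: set[str] = set()
    rule = set()
    for up in uppers(rel):
        parts = up.split(tag)
        # One filler per occurrence of tag; with no occurrences this yields ().
        for fillers in product(empty_language, repeat=len(parts) - 1):
            new = parts[0] + "".join(f + p for f, p in zip(fillers, parts[1:]))
            rule.add((new, up))
    return compose(rule, rel)

british = delete_on_top("^B", keep_without("^A", core))
british_alt = delete_on_top("^B", replace_by_empty_language("^A", core))
american = delete_on_top("^A", keep_without("^B", core))
american_alt = delete_on_top("^A", replace_by_empty_language("^B", core))
```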

Vulgar/Slang/Substandard
Use similar feature symbols on the lexical side, e.g. ^V for vulgar words, ^S for slang and ^D for substandard forms. Then filter them out as necessary, via composition, for each version of the final product.

Spelling Distinctions
If one dialect makes a spelling distinction and another ignores it, build your core system to show the distinction:
lingüístico Adj ;
This is the traditional Brazilian Portuguese spelling. Then for Portugal, where the ü is not used, modify the core trivially via composition on both sides:
u <- ü .o. CommonCoreFST .o. ü -> u
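Composition on both sides can be sketched in Python on a single toy entry. Assumptions for illustration: the core is a finite set of (upper, lower) pairs, the upper-side tag `+Adj` is invented, and each rule is restricted to the strings of the core it touches. The top rule `u <- ü` rewrites the upper side, the bottom rule `ü -> u` rewrites the lower side, and the accent on í is left alone on both.

```python
def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# Hypothetical core entry; the "+Adj" tag on the upper side is invented here.
core = {("lingüístico+Adj", "lingüístico")}

top = {(up.replace("ü", "u"), up) for up, _ in core}        # u <- ü
bottom = {(low, low.replace("ü", "u")) for _, low in core}  # ü -> u

without_diaeresis = compose(compose(top, core), bottom)
print(without_diaeresis)
```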

Spelling Relaxations, Accentuation
Build your core system to reflect formally correct spelling. Then relax that spelling in some versions of your system via composition, e.g. to allow accents to be “dropped”. Because (->) is optional replacement, the accented forms are still accepted as well:
StandardSpanishFST .o. [ é (->) e , í (->) i , á (->) a , ó (->) o , ú (->) u , ü (->) u ]

Relaxed German, accept ü or ue
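Standard German spelling uses ü, ö, ä and ß. You might want to accept these letters AND also ue, oe, ae and ss:
StandardGermanFST .o. [ ü (->) u e , ö (->) o e , ä (->) a e , ß (->) s s ]
Because each replacement is optional, mixed spellings such as laeßt and lässt are accepted too.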
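The effect of the optional rule can be sketched in Python. Assumptions for illustration: `standard_german` is a hypothetical two-entry stand-in for StandardGermanFST with invented analyses, and optional replacement is modelled as letting each special letter independently stay or be rewritten. `variants` and `analyses` are hypothetical helpers; `analyses` plays the role of looking up a surface form (like apply up).

```python
from itertools import product

def compose(r1, r2):
    """r1 .o. r2 on finite relations given as sets of (upper, lower) pairs."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# Hypothetical stand-in for StandardGermanFST; the analyses are invented.
standard_german = {("lassen+Verb+3Sg", "läßt"), ("über+Prep", "über")}

# [ ü (->) u e , ö (->) o e , ä (->) a e , ß (->) s s ]:
# each special letter may be kept or replaced, independently of the others.
OPTIONS = {"ü": ("ü", "ue"), "ö": ("ö", "oe"), "ä": ("ä", "ae"), "ß": ("ß", "ss")}

def variants(word):
    choices = [OPTIONS.get(ch, (ch,)) for ch in word]
    return {"".join(combo) for combo in product(*choices)}

rule = {(low, v) for _, low in standard_german for v in variants(low)}
relaxed_german = compose(standard_german, rule)

def analyses(surface):
    """Every upper-side string paired with this surface form."""
    return {up for up, low in relaxed_german if low == surface}

print(sorted(variants("läßt")))
```

All four spellings of läßt, and both über and ueber, now receive the same analyses, which answers the question posed on the orthographical-modification slide.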

Summary: About Choices
When it appears that you have to make a choice (dialect, orthography, register, etc.) between A and B, always try to make a common “core” system that is the basis for Choice A alone, Choice B alone, and Choices A and B together. Composition is often the key to modifying a common core system for a variety of uses. The failure to abstract and generalize is a sign of a finite-state beginner.
