Published by Harold Willis. Modified over 8 years ago.
1
Grammar Engineering:
Coordination and Macros
METARULEMACRO
Interfacing finite-state morphology
Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions)
Colombo 2014
2
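Coordination
Every attribute can have only one value. So what do we do with coordinated constituents? We treat the coordinated phrase as a set: each conjunct's f-structure (!) is an element ($) of the mother's f-structure (^).
Example: gorillas sleep and eat
VP --> { … | VP: ! $ ^; CONJ VP: ! $ ^ }.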
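The annotation ! $ ^ can be pictured with a small Python sketch. This is not XLE code: the dict/list encoding of f-structures, the ELEMENTS key, and the function name `coordinate` are assumptions made purely for illustration.

```python
# Toy model of "! $ ^": each conjunct's f-structure becomes an element
# of the mother's set-valued f-structure. The encoding is hypothetical.

def coordinate(conjuncts):
    """Build a set-valued f-structure from the conjunct f-structures."""
    return {"ELEMENTS": list(conjuncts)}

sleep = {"PRED": "sleep", "TNS": "pres"}
eat = {"PRED": "eat", "TNS": "pres"}

vp = coordinate([sleep, eat])
preds = [f["PRED"] for f in vp["ELEMENTS"]]
print(preds)
```

Each conjunct keeps its own PRED, so no attribute ever needs two values.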
3
Coordination (cont’d)
Coordination can happen at basically any level of the c-structure.
Example: the gorillas peel and eat the bananas
V --> { … | V: ! $ ^; CONJ V: ! $ ^ }.
4
Coordination (cont’d)
Basically any category can be coordinated.
Example: the gorillas eat the bananas in the cage and in the garden
PP --> { … | PP: ! $ ^; CONJ PP: ! $ ^ }.
5
Coordination (cont’d)
How can we capture these generalizations? Via regular-expression macros!
SC-COORD(CAT) = CAT: ! $ ^; CONJ CAT: ! $ ^.
PP --> {... | @(SC-COORD PP) }.
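What a macro call such as @(SC-COORD PP) amounts to can be sketched as parameter substitution. The Python below only mimics the textual effect and is not how XLE is implemented; the names `SC_COORD_BODY` and `expand_sc_coord` are hypothetical.

```python
import re

# Body of the SC-COORD macro from the slide, with CAT as formal parameter.
SC_COORD_BODY = "CAT: ! $ ^; CONJ CAT: ! $ ^"

def expand_sc_coord(cat):
    """Replace the formal parameter CAT by the actual category."""
    # \b keeps us from touching other symbols that merely contain "CAT".
    return re.sub(r"\bCAT\b", cat, SC_COORD_BODY)

print(expand_sc_coord("PP"))
```

Calling the same function with V or VP gives the right-hand sides that the earlier slides spelled out by hand.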
6
Nominal coordination
Coordination of NP, N, etc. is special because the NUM attribute of the coordinated structure should typically have the value pl even when the individual set members are sg.
Examples:
Mary and the gorilla like bananas.
The boys and girls like bananas.
7
Nominal coordination (cont’d)
NP-COORD(CAT) = CAT: ! $ ^;
                CONJ: ^ = !
                      (^ NUM) = pl;
                CAT: ! $ ^.
NP --> {... | @(NP-COORD NP) }.
N --> {... | @(NP-COORD N) }.
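The point of (^ NUM) = pl is that number is stated on the set itself, not on its members. A toy Python sketch, again with a hypothetical dict encoding and hypothetical function names:

```python
# Toy model of NP-COORD: NUM = pl is a property of the whole set,
# while every conjunct keeps its own NUM value.

def np_coordinate(conjuncts):
    return {"NUM": "pl", "ELEMENTS": list(conjuncts)}

def agrees(subject, verb_num):
    """Agreement looks at the subject's own NUM, not at its elements."""
    return subject["NUM"] == verb_num

mary = {"PRED": "Mary", "NUM": "sg"}
gorilla = {"PRED": "gorilla", "NUM": "sg"}

subj = np_coordinate([mary, gorilla])
print(agrees(subj, "pl"))   # plural "like"
print(agrees(mary, "pl"))   # "Mary" on her own
```

This mirrors "Mary and the gorilla like bananas": two singular members, plural agreement.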
8
METARULEMACRO
Macros are nice. But can't we do better?
After all, it's pretty tedious to go into almost every rule and invoke either the SC-COORD or the NP-COORD macro.
XLE has a special macro called METARULEMACRO.
Every rule goes through METARULEMACRO unless specified otherwise.
9
METARULEMACRO (cont’d)
Takes three arguments: _CAT, _BASECAT, and _RHS
–_CAT is the category on the left-hand side of the rule
–_BASECAT is the same as _CAT unless you are dealing with a complex-category rule
–_RHS is the right-hand side of the rule
10
METARULEMACRO (cont’d)
METARULEMACRO(_CAT _BASECAT _RHS) =
  { _RHS
  | e: _CAT $ { N NP }; @(NP-COORD _CAT)
  | e: _CAT ~$ { N NP }; @(SC-COORD _CAT)
  }.
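The dispatch performed by this definition can be sketched in Python. This is an illustration only: the function names are hypothetical, right-hand sides are plain strings, and the example rules passed in are made up.

```python
# Toy version of the METARULEMACRO above: every rule keeps its own
# right-hand side and additionally gets a coordination disjunct.

def sc_coord(cat):
    return f"{cat}: ! $ ^; CONJ {cat}: ! $ ^"

def np_coord(cat):
    return f"{cat}: ! $ ^; CONJ: ^ = ! (^ NUM) = pl; {cat}: ! $ ^"

def metarulemacro(cat, basecat, rhs):
    """Return the disjuncts of the rewritten rule for category `cat`."""
    # basecat is accepted but unused, as in the slide's definition.
    coord = np_coord(cat) if cat in {"N", "NP"} else sc_coord(cat)
    return [rhs, coord]

print(metarulemacro("PP", "PP", "P NP"))
print(metarulemacro("NP", "NP", "D N"))
```

With this in place, no individual rule has to invoke a coordination macro itself.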
11
Interfacing finite-state transducers
Maintaining a full-form lexicon is tedious.
Many lexicon entries look alike.
Is there a way to get the information about the category of a word from somewhere, ideally along with information about morphosyntactic categories such as tense, mood, case, number, person, etc.?
Finite-state morphologies!
12
Interfacing finite-state transducers
The cascade of finite-state transducers to be used is specified in the MORPHOLOGY section.
At least two subsections:
–TOKENIZE
–ANALYZE
By default, the transducers listed are used both for parsing and for generation.
This behavior can be altered by prefixing the names of transducer files with P! (parsing only) or G! (generation only).
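The effect of the P! and G! prefixes can be sketched as a simple filter. The Python below is a toy, not XLE's configuration reader, and the transducer file names are hypothetical.

```python
# Toy selection of transducers by direction. Unprefixed files are used
# for both parsing and generation; P!/G! restrict a file to one of them.

def transducers_for(mode, listed):
    if mode not in ("parse", "generate"):
        raise ValueError("mode must be 'parse' or 'generate'")
    skip = "G!" if mode == "parse" else "P!"
    selected = []
    for name in listed:
        if name.startswith(skip):
            continue  # reserved for the other direction
        # Strip the prefix, if any, to get the bare file name.
        selected.append(name[2:] if name[:2] in ("P!", "G!") else name)
    return selected

tokenize_files = ["P!parse-tok.fst", "G!gen-tok.fst"]  # hypothetical names
analyze_files = ["morph.fst"]                          # hypothetical name

print(transducers_for("parse", tokenize_files))
print(transducers_for("generate", tokenize_files))
print(transducers_for("generate", analyze_files))
```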
13
Tokenization
So far, only white spaces have been considered as token boundaries.
However, there are more kinds of token boundaries in real-world text:
–Punctuation has to be split off the preceding token
–Some white spaces should not be treated as token boundaries, e.g. “Sri Lanka”
–Upper-case letters at sentence beginnings should optionally be lower-cased
A finite-state tokenizer takes care of these things.
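The three tokenization tasks listed above can be approximated in a few lines of Python. This is a rough sketch with regular expressions, not a finite-state tokenizer; the multiword list and the function name are assumptions.

```python
import re

# Hypothetical list of multiword tokens whose inner space is no boundary.
MULTIWORDS = {("Sri", "Lanka"): "Sri Lanka"}

def tokenize(sentence):
    """Return (alternatives for the first token, remaining tokens)."""
    # 1. Split punctuation off the preceding token.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    # 2. Re-join known multiword expressions.
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORDS:
            merged.append(MULTIWORDS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    if not merged:
        return [], []
    # 3. Optional lower-casing: offer both forms of the first token.
    first = merged[0]
    alternatives = [first] if first == first.lower() else [first, first.lower()]
    return alternatives, merged[1:]

print(tokenize("Gorillas live in Sri Lanka."))
```

Returning both "Gorillas" and "gorillas" keeps the lower-casing optional, so a sentence-initial proper name is not lost.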
14
Finite-state morphologies
Map surface forms to a canonical form (lemma) and a series of morphological tags.
Examples:
rode      ride +Verb +PastTense +123P
rides     ride +Verb +Pres +3sg
          ride +Noun +Pl
children  child +Noun +Pl
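The mapping can be mimicked with a lookup table that works in both directions, as a transducer does. This Python sketch uses a plain dict in place of a real finite-state morphology and covers only the example forms from the slide; the function names are hypothetical.

```python
# Toy stand-in for a morphological transducer: surface form on one side,
# lemma plus tags on the other. Only the slide's examples are covered.
ANALYSES = {
    "rode": ["ride +Verb +PastTense +123P"],
    "rides": ["ride +Verb +Pres +3sg", "ride +Noun +Pl"],
    "children": ["child +Noun +Pl"],
}

def analyze(surface):
    """Parsing direction: surface form -> list of [lemma, tag, ...]."""
    return [a.split() for a in ANALYSES.get(surface, [])]

def generate(lemma_and_tags):
    """Generation direction: lemma plus tags -> surface forms."""
    return [s for s, analyses in ANALYSES.items() if lemma_and_tags in analyses]

print(analyze("rides"))
print(generate("child +Noun +Pl"))
```

Note that "rides" is ambiguous between a verb and a noun reading; the grammar decides later which one fits.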
15
Interfacing Finite-state Morphology
Morphological tags need to be listed in the lexicon.
–Sublexical lexicon entries look like regular lexicon entries
–Difference: the morphcode is XLE instead of *
Lemmas with non-predictable subcategorization frames must be listed in the lexicon.
Other lemmas can be dealt with by the -unknown entry.
16
Lexicon entries for morphology output
+Verb    V-POS XLE.
+Pres    TNS XLE @VPRES.
+3sg     PERS XLE @S-AGR.
wait     V-S XLE (^ PRED)= ‘wait ’.
-unknown A-S XLE @(PRED %stem);
         N-S XLE @(PRED %stem).
17
Interfacing Finite-state Morphology
Morphology output needs to be parsed by sublexical rules.
–They look like regular rules
–They have f-annotations like regular rules
–Difference: sublexical categories are marked with the suffix _BASE
18
Interfacing Finite-state Morphology
V --> V-S_BASE V-POS_BASE
      { TNS_BASE PERS_BASE
      | ASP_BASE }.
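How a morphological analysis is matched against this sublexical rule can be sketched in Python. The tag-to-category table repeats the sample lexicon entries for +Verb, +Pres and +3sg; the _BASE suffix is left off for readability, and everything else (names, list encoding) is a hypothetical simplification.

```python
# Sublexical categories of the tags, as in the sample lexicon entries.
TAG_CATEGORY = {"+Verb": "V-POS", "+Pres": "TNS", "+3sg": "PERS"}

# The two expansions of V --> V-S V-POS { TNS PERS | ASP }.
V_EXPANSIONS = [
    ["V-S", "V-POS", "TNS", "PERS"],
    ["V-S", "V-POS", "ASP"],
]

def sublexical_categories(analysis, stem_cat):
    """Map 'lemma +Tag ...' to its sequence of sublexical categories."""
    _stem, *tags = analysis.split()
    return [stem_cat] + [TAG_CATEGORY.get(tag) for tag in tags]

def is_verb(analysis):
    """True if the analysis is covered by one expansion of the V rule."""
    return sublexical_categories(analysis, "V-S") in V_EXPANSIONS

print(is_verb("wait +Verb +Pres +3sg"))
print(is_verb("wait +Verb +Pres"))
```

The second call fails because TNS must be followed by PERS in the first expansion.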
19
XLE Lookup Model
Only one entry per headword per lexicon section.
The same headword may be covered by both an explicit entry and the -unknown entry.
To allow this, we need to mark the explicit entry with “; ETC”:
sleep    V-S XLE @(INTRANS sleep); ETC.
-unknown N-S XLE @(PRED %stem).
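The role of ETC can be pictured with a toy lookup function. This Python sketch is an assumption-laden simplification: it ignores the morphology step and lexicon sections, and it assumes that an explicit entry without ETC blocks the -unknown entry. The data layout and names are hypothetical.

```python
# Explicit entries; "etc" records whether the entry ends in "; ETC".
LEXICON = {
    "sleep": {"entries": [("V-S", "@(INTRANS sleep)")], "etc": True},
}
# The -unknown entry, with %stem as placeholder for the looked-up stem.
UNKNOWN = [("N-S", "@(PRED %stem)")]

def unknown_entries(stem):
    return [(cat, schema.replace("%stem", stem)) for cat, schema in UNKNOWN]

def lookup(stem):
    """Collect the entries that apply to `stem`."""
    explicit = LEXICON.get(stem)
    if explicit is None:
        return unknown_entries(stem)
    entries = list(explicit["entries"])
    if explicit["etc"]:
        # ETC: also keep what the -unknown entry contributes.
        entries += unknown_entries(stem)
    return entries

print(lookup("sleep"))
print(lookup("gorilla"))
```

With ETC, "sleep" is available both as an intransitive verb stem and, via -unknown, as a noun stem.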