Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
Dekai Wu
Presented by David Goss-Grubbs
Inversion Transduction Grammars
- What they are
- How they work
- What they can do
What an ITG is
- A formalism for modeling bilingual sentence pairs
- A transducer: it defines a relation between sentences
- Conceived as generating pairs of sentences, rather than translating one sentence into another
Grammar Scope
- Not intended to relate a sentence to all and only its correct translations; ITGs will overgenerate wildly
- Instead, used to extract useful information from parallel corpora
How ITGs work
- A subset of ‘context-free syntax-directed transduction grammars’
- A simple transduction grammar is just a CFG whose terminals are pairs of symbols (or singletons)
- In an ITG, for any given rule, the order of the constituents in one language may be the reverse of their order in the other language
Notation
- The right-hand side of a rule is written in square brackets [ ] when the constituent order is the same in both languages
- Angle brackets ⟨ ⟩ are used when the order is reversed
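As a minimal sketch (not from the paper), the effect of the two orientations can be shown by computing the two output strings of a derivation tree. The tuple-based tree encoding here is an illustrative assumption:

```python
# Sketch: computing the two output strings ("yields") of an ITG derivation.
# A node is either a terminal pair (e1, e2) -- either side may be None for
# a singleton -- or a triple (orientation, left_child, right_child), with
# orientation '[]' (straight) or '<>' (inverted).

def yields(node):
    """Return (lang1_tokens, lang2_tokens) for a derivation tree."""
    if len(node) == 2:                      # terminal couple / singleton
        e1, e2 = node
        return ([e1] if e1 is not None else [],
                [e2] if e2 is not None else [])
    orient, left, right = node
    l1, l2 = yields(left)
    r1, r2 = yields(right)
    if orient == '[]':                      # same order in both languages
        return l1 + r1, l2 + r2
    else:                                   # '<>': reversed in language 2
        return l1 + r1, r2 + l2

# The same children under the two orientations:
straight = ('[]', ('a', 'aY'), ('b', 'bY'))
inverted = ('<>', ('a', 'aY'), ('b', 'bY'))
print(yields(straight))   # (['a', 'b'], ['aY', 'bY'])
print(yields(inverted))   # (['a', 'b'], ['bY', 'aY'])
```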
Yoda Speak
S → ⟨SubjAux VP⟩
SubjAux → [NP Aux]
VP → ‘begun’ / ‘begunY’
NP → ‘the-clone-war’ / ‘the-clone-warY’
Aux → ‘has’ / ‘hasY’
English: ‘the-clone-war has begun’
Yoda: ‘begunY the-clone-warY hasY’
Yoda Speak
[Parse tree: S dominates SubjAux and VP (inverted); SubjAux dominates NP ‘the clone war’ and Aux ‘has’; VP dominates ‘begun’]
Normal Form
For any ITG there is a weakly equivalent grammar in normal form, where every right-hand side is one of:
- A terminal couple
- A terminal singleton
- A pair of nonterminals with straight orientation
- A pair of nonterminals with inverted orientation
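The Yoda grammar above is already in this normal form. A hedged illustration of one possible encoding, with a check that every rule has one of the four allowed shapes (the rule format is an assumption for this sketch):

```python
# Each rule is (lhs, rhs) where rhs is one of:
#   ('pair', e1, e2)                        terminal couple
#   ('pair', e1, None) / ('pair', None, e2) terminal singleton
#   ('[]', B, C)                            straight nonterminal pair
#   ('<>', B, C)                            inverted nonterminal pair
NONTERMS = {'S', 'SubjAux', 'VP', 'NP', 'Aux'}

RULES = [
    ('S',       ('<>', 'SubjAux', 'VP')),   # inverted: Yoda order
    ('SubjAux', ('[]', 'NP', 'Aux')),
    ('VP',      ('pair', 'begun', 'begunY')),
    ('NP',      ('pair', 'the-clone-war', 'the-clone-warY')),
    ('Aux',     ('pair', 'has', 'hasY')),
]

def in_normal_form(rules):
    """True iff every right-hand side has one of the four normal-form shapes."""
    for lhs, rhs in rules:
        tag = rhs[0]
        if tag == 'pair':
            ok = rhs[1] is not None or rhs[2] is not None
        elif tag in ('[]', '<>'):
            ok = rhs[1] in NONTERMS and rhs[2] in NONTERMS
        else:
            ok = False
        if not ok:
            return False
    return True

print(in_normal_form(RULES))  # True
```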
Expressiveness Limits
- Not every way of matching tokens is possible: ‘Obi (has) not won victory’ cannot be matched with ‘notY victoryY ObiY wonY’
- This is a good thing: we only have to consider a subset of the possible matchings
- The percentage of possibilities eliminated increases rapidly as the number of tokens increases
Stochastic ITGs
- A probability is attached to each rewrite rule
- The probabilities of all the rules with a given left-hand side must sum to 1
- An SITG gives the most probable matching parse for a sentence pair
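The normalization constraint can be checked mechanically. A small sketch, with illustrative rule probabilities that are not from the paper:

```python
from collections import defaultdict

# An SITG attaches a probability to every rule; for a proper distribution,
# the probabilities of rules sharing a left-hand side must sum to 1.
PRULES = [
    ('S',  ('<>', 'SubjAux', 'VP'), 0.4),   # illustrative numbers
    ('S',  ('[]', 'SubjAux', 'VP'), 0.6),
    ('VP', ('pair', 'begun', 'begunY'), 1.0),
]

def is_normalized(prules, tol=1e-9):
    """True iff, for every left-hand side, its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for lhs, _, p in prules:
        totals[lhs] += p
    return all(abs(t - 1.0) < tol for t in totals.values())

print(is_normalized(PRULES))  # True
```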
Parsing with an SITG: the chart
- Five dimensions to the chart: start and stop indices for sentence 1; start and stop indices for sentence 2; nonterminal category
- Each cell stores the probability of the most likely parse covering the corresponding substrings, rooted in the given category
Parsing with an SITG: the algorithm
- Initialize the cells corresponding to terminals using a translation lexicon
- For each remaining cell, find the most probable way of deriving that category: multiply the probability of the rule by the probabilities of both constituents
- Store that probability together with the orientation of the rule used
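The steps above can be sketched as a Viterbi dynamic program over the five-dimensional chart. This is a simplified illustration (singleton rules are omitted for brevity; the data formats are assumptions, not the paper's notation):

```python
from itertools import product

def biparse(sent1, sent2, lex_rules, syn_rules):
    """Simplified Viterbi biparsing over a normal-form SITG.

    lex_rules: {(X, e1, e2): prob} terminal couples from the lexicon.
    syn_rules: [(X, orient, Y, Z, prob)] with orient '[]' or '<>'.
    Chart key (s, t, u, v, X): sentence-1 span [s, t), sentence-2 span
    [u, v), nonterminal X; value is (best probability, orientation used).
    """
    n, m = len(sent1), len(sent2)
    delta = {}

    # Initialization: terminal couples looked up in the translation lexicon.
    for s, u in product(range(n), range(m)):
        for (X, e1, e2), p in lex_rules.items():
            if sent1[s] == e1 and sent2[u] == e2:
                delta[s, s + 1, u, u + 1, X] = (p, 'lex')

    # Recursion: smaller sentence-1 spans first; both split points free.
    for w1, w2 in product(range(1, n + 1), range(1, m + 1)):
        for s, u in product(range(n - w1 + 1), range(m - w2 + 1)):
            t, v = s + w1, u + w2
            for X, orient, Y, Z, p in syn_rules:
                for S, U in product(range(s + 1, t), range(u + 1, v)):
                    if orient == '[]':        # straight: spans line up
                        a = delta.get((s, S, u, U, Y))
                        b = delta.get((S, t, U, v, Z))
                    else:                     # '<>': language-2 spans swap
                        a = delta.get((s, S, U, v, Y))
                        b = delta.get((S, t, u, U, Z))
                    if a and b:
                        q = p * a[0] * b[0]
                        if q > delta.get((s, t, u, v, X), (0.0,))[0]:
                            delta[s, t, u, v, X] = (q, orient)
    return delta

# Usage with the Yoda grammar from earlier slides:
lex_rules = {('NP', 'the-clone-war', 'the-clone-warY'): 1.0,
             ('Aux', 'has', 'hasY'): 1.0,
             ('VP', 'begun', 'begunY'): 1.0}
syn_rules = [('S', '<>', 'SubjAux', 'VP', 1.0),
             ('SubjAux', '[]', 'NP', 'Aux', 1.0)]
delta = biparse(['the-clone-war', 'has', 'begun'],
                ['begunY', 'the-clone-warY', 'hasY'],
                lex_rules, syn_rules)
print(delta[0, 3, 0, 3, 'S'])  # (1.0, '<>')
```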
What you can do with an ITG
- Segmentation
- Bracketing
- Alignment
- Bilingual constraint transfer
Segmentation
- Several words might go together to make a single lexical entry; several characters might go together to make a single word
- Compare “sandwiches there” vs. “sand which is there”
- A segmentation may make sense in the monolingual case, but not in the bilingual case
Segmentation
Change the parsing algorithm:
- Allow the initialization step to find strings of any length in the translation lexicon
- The recursive step stores the most probable way of creating a constituent, whether it came from the lexicon or from rules
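The modified initialization step might look like the following sketch, where every substring pair (not just single tokens) is looked up in the lexicon. The lexicon entries and probabilities here are invented for illustration:

```python
def init_segmentation_cells(sent1, sent2, lexicon):
    """Return {(s, t, u, v): prob} for every lexicon match of any length.

    lexicon maps (lang1_string, lang2_string) pairs to probabilities;
    multi-token entries let one cell cover several words/characters.
    """
    cells = {}
    n, m = len(sent1), len(sent2)
    for s in range(n):
        for t in range(s + 1, n + 1):
            for u in range(m):
                for v in range(u + 1, m + 1):
                    key = (' '.join(sent1[s:t]), ' '.join(sent2[u:v]))
                    if key in lexicon:
                        cells[s, t, u, v] = lexicon[key]
    return cells

# Illustrative lexicon (hypothetical entries, echoing the slide's example):
lexicon = {('sand which is', 'sandwiches'): 0.9, ('there', 'la'): 0.8}
cells = init_segmentation_cells(['sand', 'which', 'is', 'there'],
                                ['sandwiches', 'la'], lexicon)
print(cells)  # {(0, 3, 0, 1): 0.9, (3, 4, 1, 2): 0.8}
```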
Bracketing
How do you assign structure to a sentence with no grammar available?
- Get a parallel corpus pairing it with some other language
- Get a reasonable translation dictionary
- Parse it with a bracketing transduction grammar
Bracketing Transduction Grammars
- A minimal, generic ITG with just one nonterminal: A → [A A], A → ⟨A A⟩, plus terminal couples and singletons
- The important probabilities are on the rules that rewrite to terminal couples, taken from the translation lexicon
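A hedged sketch of how such a grammar might be built from a translation lexicon (the mass split between structural and lexical rules is an illustrative choice, and singletons are omitted for brevity):

```python
def make_btg(lexicon, p_straight=0.4, p_inverted=0.2):
    """Build a one-nonterminal bracketing transduction grammar.

    lexicon maps (e1, e2) couples to weights; the probability mass left
    after the two structural rules is distributed over the couples in
    proportion to their lexicon weights.
    """
    p_lex = 1.0 - p_straight - p_inverted
    total = sum(lexicon.values())
    rules = [('A', ('[]', 'A', 'A'), p_straight),   # straight self-expansion
             ('A', ('<>', 'A', 'A'), p_inverted)]   # inverted self-expansion
    for (e1, e2), w in lexicon.items():
        rules.append(('A', ('pair', e1, e2), p_lex * w / total))
    return rules

rules = make_btg({('won', 'wonY'): 2.0, ('victory', 'victoryY'): 2.0})
# All rules share left-hand side A, so their probabilities sum to 1:
print(sum(p for _, _, p in rules))  # 1.0
```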
Bracketing Transduction Grammars
- Given lexical translation probabilities, only a subset of otherwise possible bracketings are available
- ‘Obi not won victory’ / ‘notY victoryY ObiY wonY’ is a limiting case: no bracketings at all
- ‘the-clone-war has begun’ alone has two bracketings; paired with ‘begunY the-clone-warY hasY’, it has just one
Bracketing with Singletons
- Singletons don’t help in bracketing
- Depending on the language, Wu uniformly attaches them to the left or to the right
- They are then ‘pushed down’ using yield-preserving transformations, e.g. [x [A B]] = [[x A] B]
Avoiding Arbitrary Choices
- Some arrangements leave us with an arbitrary choice: ‘create a-perimeter around-the-survivors’ / ‘around-the-survivors a-perimeter create’
- Both ⟨⟨A B⟩ C⟩ and ⟨A ⟨B C⟩⟩ work
- Use a more complex bracketing grammar that makes such structures left-branching, then fix them in post-processing to the flat ⟨A B C⟩
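The post-processing step can be sketched as a flattening pass that absorbs same-orientation children into their parent, so both arbitrary binarizations collapse to one flat bracket (the tree encoding is an assumption of this sketch):

```python
# Leaves are strings; internal nodes are tuples (orient, child, child, ...).
def flatten(node):
    """Collapse nested same-orientation brackets into one flat bracket."""
    if isinstance(node, str):
        return node
    orient = node[0]
    kids = []
    for child in node[1:]:
        child = flatten(child)
        if isinstance(child, tuple) and child[0] == orient:
            kids.extend(child[1:])      # absorb same-orientation child
        else:
            kids.append(child)
    return (orient,) + tuple(kids)

# The two arbitrary binarizations describe the same matching:
left  = ('<>', ('<>', 'A', 'B'), 'C')
right = ('<>', 'A', ('<>', 'B', 'C'))
print(flatten(left))                     # ('<>', 'A', 'B', 'C')
print(flatten(left) == flatten(right))   # True
```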
Bracketing Experiment
- Tested on 2,000 English-Chinese sentence pairs
- Sentence pairs that weren’t adequately covered by the translation lexicon were rejected
- Results: 80% bracket precision for English, 78% bracket precision for Chinese
Alignment
Alignments (phrasal or lexical) are a natural byproduct of bilingual parsing. Unlike ‘parse-parse-match’ methods, this approach:
- Doesn’t require a fancy grammar for both languages
- Guarantees compatibility between the two parses
- Has a principled way of choosing between possible alignments
- Provides a more reasonable ‘distortion penalty’
Bilingual Constraint Transfer
- A high-quality parse for one language can be leveraged to get structure for the other
- Alter the parsing algorithm to allow only constituents that match the parse that already exists for the well-known language
- This works for any sort of constraint supplied for the well-known language