Translating Collocations for Bilingual Lexicons

Collocations (idiomatic multi-word expressions) are:
- difficult to translate
- semantically opaque
- impossible to translate word-by-word
- a major obstacle to second language acquisition

Example: "demonstrate support" = "prouver son adhésion" (literally "prove adherence")

The Champollion approach

- Input: large parallel corpora
- Output: a list of collocations in each language, and equivalence mappings between these collocations
- The method is statistical and language-independent

Algorithm

1. Align sentences across corpora
2. Extract collocations from co-occurrence
3. Identify all target-language words that frequently co-occur with a source collocation in aligned sentences
4. Iteratively consider and score combinations of those words
5. Select the best set of words for the translation
6. Determine word order and fill in prepositions
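
The word-selection steps of the algorithm (finding correlated words, scoring combinations, selecting the best set) can be sketched roughly as follows. This is a minimal illustration, not Champollion's actual implementation. It assumes that sentence alignment and collocation extraction are already done, that tokens are split on whitespace, and that association is measured with a Dice coefficient over sentence indices. The function names, the threshold of 0.7, the tie-break in favour of longer word sets, and the toy corpus are all hypothetical. Word ordering and preposition filling are left out.

```python
from itertools import combinations


def dice(source_hits, target_hits):
    """Dice coefficient between two sets of sentence indices."""
    if not source_hits or not target_hits:
        return 0.0
    overlap = len(source_hits & target_hits)
    return 2 * overlap / (len(source_hits) + len(target_hits))


def translate_collocation(source_colloc, aligned_pairs, threshold=0.7, max_len=3):
    """Return (target_words, score) for one source collocation.

    aligned_pairs is a list of (source_sentence, target_sentence) strings.
    The returned words are in alphabetical order, not linguistic order.
    """
    # Sentences whose source side contains every word of the collocation.
    source_hits = {
        i for i, (src, _) in enumerate(aligned_pairs)
        if set(source_colloc) <= set(src.lower().split())
    }

    # Map each target word to the set of sentences it occurs in.
    target_index = {}
    for i, (_, tgt) in enumerate(aligned_pairs):
        for word in set(tgt.lower().split()):
            target_index.setdefault(word, set()).add(i)

    # Keep the single words that are well correlated with the collocation.
    candidates = sorted(
        word for word, hits in target_index.items()
        if dice(source_hits, hits) >= threshold
    )

    # Score every combination of candidate words up to max_len words.
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(candidates, size):
            hits = set.intersection(*(target_index[w] for w in combo))
            scored.append((dice(source_hits, hits), size, combo))

    if not scored:
        return (), 0.0

    # Highest score wins; on ties the longer combination is preferred (assumption).
    score, _, combo = max(scored)
    return combo, score


# Toy aligned corpus, invented for illustration only.
pairs = [
    ("the additional costs are high", "les coûts supplémentaires sont élevés"),
    ("we face additional costs", "nous avons des coûts supplémentaires"),
    ("the costs are high", "les coûts sont élevés"),
    ("additional time is needed", "il faut du temps supplémentaire"),
    ("trade is free", "le commerce est libre"),
]

words, score = translate_collocation(("additional", "costs"), pairs)
print(words, score)
```

On this toy corpus the sketch selects "coûts" and "supplémentaires" with a score of 1.0. The tie-break matters here: "supplémentaires" alone scores just as well, and preferring the longer set is what recovers the full two-word translation.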

Sample translations

English               French
additional costs      coûts supplémentaires
affirmative action    action positive
free trade            libre-échange
freer trade           libéralisation … échanges
take … steps          prendre … mesures
stock market          bourse

Evaluation results

- Corpus of 3.5 million words, collocations selected from the same corpus: 78%
- Corpus of 8.5 million words, collocations selected from the same corpus: 74%
- Corpus of 3.5 million words, collocations selected from a different corpus: 65%

Conclusion

Champollion provides collocation translation that is:
- robust
- language-independent
- not dependent on any tools

But: it requires parallel corpora