Statistical Machine Translation
Raghav Bashyal
Statistical Machine Translation
- Uses pre-translated text (corpora)
- Compares translated text to the original
- Notices patterns and associates words
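One way to picture "notice patterns, associate words" is an expectation-maximization loop in the style of IBM Model 1, which learns word translation probabilities from sentence pairs alone. The sketch below is only an illustration under that assumption, not the method used in this project; the three-sentence Spanish/English corpus and the function names `train_ibm1` and `best_translation` are made up for the example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=20):
    """Estimate t(english | spanish) with an IBM Model 1 style EM loop."""
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Start with a uniform guess for every word pair that co-occurs.
    t = {}
    for src, tgt in pairs:
        for e in tgt:
            for s in src:
                t[(e, s)] = 1.0 / len(tgt_vocab)
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (e, s) alignments
        total = defaultdict(float)   # expected count of s being aligned
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, s)] for s in src)
                for s in src:
                    frac = t[(e, s)] / norm
                    count[(e, s)] += frac
                    total[s] += frac
        # Re-estimate the probabilities from the expected counts.
        t = {(e, s): c / total[s] for (e, s), c in count.items()}
    return t

def best_translation(t, source_word):
    """Return the English word with the highest probability for source_word."""
    candidates = {e: p for (e, s), p in t.items() if s == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus: (Spanish tokens, English tokens).
corpus = [
    ("la casa".split(), "the house".split()),
    ("la mesa".split(), "the table".split()),
    ("una mesa".split(), "a table".split()),
]
table = train_ibm1(corpus)
for word in ["la", "casa", "mesa", "una"]:
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied: "la" is associated with "the" only because the two keep appearing together, which is also why a larger corpus generally gives better associations.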
Project
- Translate basic text from Spanish to English
- Test effectiveness with/without hard-coded components (syntax)
- Specific procedures/algorithms that add speed
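The with/without comparison could be set up roughly as follows. This is a hypothetical sketch: the tiny `LEXICON`, the `NOUNS` and `ADJECTIVES` sets, and the single reordering rule (Spanish noun-adjective order becomes English adjective-noun order) are all assumptions chosen for illustration, and a real system would learn the lexicon from a corpus.

```python
# Hypothetical hand-written lexicon standing in for a learned one.
LEXICON = {
    "la": "the", "casa": "house", "verde": "green",
    "es": "is", "grande": "big",
}
# Hypothetical word classes (in English) used by the hard-coded rule.
NOUNS = {"house"}
ADJECTIVES = {"green", "big"}

def translate(sentence, use_syntax_rule=True):
    """Word-by-word translation, optionally followed by one syntax rule."""
    # Unknown words are passed through unchanged.
    words = [LEXICON.get(w, w) for w in sentence.lower().split()]
    if use_syntax_rule:
        i = 0
        while i < len(words) - 1:
            # Hard-coded rule: swap "noun adjective" into "adjective noun".
            if words[i] in NOUNS and words[i + 1] in ADJECTIVES:
                words[i], words[i + 1] = words[i + 1], words[i]
                i += 2
            else:
                i += 1
    return " ".join(words)

print(translate("la casa verde", use_syntax_rule=False))
print(translate("la casa verde", use_syntax_rule=True))
```

Running the same input through both settings and scoring each against a reference translation gives a direct measure of what the hard-coded component contributes.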
Literature
- Guides on Statistical Machine Translation: NLTK, UC Berkeley
- Christina Wallin, UC Berkeley
- Modifications: larger text is more useful
- “State of the art” implementation: Google
Procedure
- NLTK – Natural Language Toolkit
- Python
- Made from natural language processing projects
- Current procedure – read the NLTK book
- Jump off and write relevant code
Expected Results
- Probably will be very basic translation
- Usually performs better with “sample” text than “real” text
- Highlighted errors
  - Program should use reference data to find some errors
  - Error frequency plots for certain words
- Test the effectiveness of adjustments
  - Hard coding, other algorithms
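The error counting behind such plots might look like the sketch below. It assumes a deliberately simple definition of "error": a word that appears in the reference translation but is missing from the program's output. The function name `word_error_counts` and the sample sentences are hypothetical, and the text bars stand in for a real plot.

```python
from collections import Counter

def word_error_counts(outputs, references):
    """Count, per word, how often a reference word is missing from the output."""
    errors = Counter()
    for out, ref in zip(outputs, references):
        # Counter subtraction keeps only the words the output failed to produce.
        missing = Counter(ref.lower().split()) - Counter(out.lower().split())
        errors.update(missing)
    return errors

# Hypothetical program outputs and their reference translations.
outputs = ["the house verde", "the table is grande", "the verde book"]
references = ["the green house", "the table is big", "the green book"]

errors = word_error_counts(outputs, references)
# Crude text "plot": one bar per word, longest bars first.
for word, n in errors.most_common():
    print(f"{word:10s} {'#' * n}")
```

Comparing these counts before and after an adjustment, such as a hard-coded rule or a different algorithm, shows which words the adjustment helped.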