Published by Ami Clark. Modified over 9 years ago.
1
An Investigation of Statistical Machine Translation (Spanish to English) Raghav Bashyal
2
SMT: Statistical Machine Translation
Possible only through computers
Global audience
Use of statistical techniques to produce natural translations
3
Kevin Knight's Book
SMT has two parts
The second part, N-grams, is simple
The first part, the alignment portion, is difficult
After many long projects, I made my own algorithm
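As a reminder of how the two parts fit together, the standard noisy-channel formulation of SMT can be written as below. The symbols are chosen for this note and do not come from the slides: s is the Spanish input, e = e_1 ... e_n is a candidate English output, and b is a bigram probability estimated from counts. The N-gram part supplies P(e); the alignment part is what is needed to estimate P(s | e).

```latex
\hat{e} = \arg\max_{e} \; P(e)\, P(s \mid e),
\qquad
P(e) \approx \prod_{i=1}^{n} b(e_i \mid e_{i-1}),
\qquad
b(y \mid x) = \frac{\operatorname{count}(x\,y)}{\operatorname{count}(x)}
```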
4
Before that, an introduction to the characters
NLTK – simplifies input of corpora
Corpora – hold text
N-grams – sequences of n consecutive words; their frequency in a corpus shows how common a phrase is
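To make the N-gram idea concrete, here is a minimal sketch of bigram counting in Python. It uses only the standard library rather than NLTK, and the function names and the three-sentence toy corpus are hypothetical, invented for illustration; they are not from the original project.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count single words and adjacent word pairs in whitespace-tokenized text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Relative-frequency estimate: count(prev word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0  # unseen context: no evidence either way
    return bigrams[(prev, word)] / unigrams[prev]


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["the monkey eats", "the monkey sleeps", "the dog eats"]
unigrams, bigrams = bigram_counts(toy_corpus)
```

Here "the" occurs three times and is followed by "monkey" twice, so `bigram_prob("the", "monkey", unigrams, bigrams)` is 2/3. A real system would also smooth the counts so that unseen pairs do not get probability zero.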
5
Algorithm
1. Match
a. Take a small Spanish input
b. Look through the corpus to find instances of the input
c. Collect the Spanish sentences in which this input was found, as well as the English translation right below each sentence
d. Compare the English sentences to discover similar words
e. Find the most common similar words and find permutations of them
2. Check
a. Gather bi-gram values for each permutation using the bigram calculator
b. Calculate the probabilities for each permutation with Knight’s formula
c. Return the most probable permutation as the most likely simple translation
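The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: the parallel corpus is a hypothetical list of (Spanish, English) pairs, "similar words" is taken to mean the English words shared by the most matched sentences, the `top_n` cutoff is invented, and the scoring assumes that Knight's formula here is the product of bigram conditional probabilities.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy parallel corpus: (Spanish sentence, English translation).
PARALLEL = [
    ("el mono come", "the monkey eats"),
    ("el mono duerme", "the monkey sleeps"),
    ("veo el mono", "i see the monkey"),
    ("el perro come", "the dog eats"),
]


def match(phrase, corpus, top_n):
    """Step 1: find sentences containing the phrase, return common English words."""
    needle = f" {phrase.lower()} "
    english = [en for es, en in corpus if needle in f" {es.lower()} "]
    # Count each word once per sentence (dict.fromkeys keeps a stable order).
    counts = Counter(w for sent in english for w in dict.fromkeys(sent.split()))
    return [word for word, _ in counts.most_common(top_n)]


def check(candidates, english_sentences):
    """Step 2: score every ordering with bigram probabilities, keep the best."""
    unigrams, bigrams = Counter(), Counter()
    for sent in english_sentences:
        tokens = ["<s>"] + sent.split()  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    def score(order):
        tokens = ("<s>",) + order
        prob = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            # Assumed formula: count(prev word) / count(prev), multiplied together.
            prob *= (bigrams[(prev, word)] / unigrams[prev]) if unigrams[prev] else 0.0
        return prob

    return max(permutations(candidates), key=score)


def translate(phrase, corpus=PARALLEL, top_n=2):
    """Run match, then check, and join the winning word order."""
    candidates = match(phrase, corpus, top_n)
    return " ".join(check(candidates, [en for _, en in corpus]))
```

On this toy corpus, `translate("el mono")` returns `"the monkey"`: both words appear in all three matched translations, and the ordering "monkey the" scores zero because no English sentence starts with "monkey". Trying every permutation grows factorially with the number of candidate words, so the sketch only suits short inputs.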
6
Development
Simple – goal was to translate
Corpora – functional “cosas” and “monkey”
7
Results
It works! “el mono” = “the monkey”
Deeper understanding of SMT’s power (Google Translate)
Expand, elaborate upon algorithm