Development of a German- English Translator Felix Zhang Period Thomas Jefferson High School for Science and Technology Computer Systems Research Lab
Summary of Quarter 2 NP Chunking Lemmatization Dictionary Lookup Inflection Noun-verb agreement
Scope for this quarter Focus less on statistical methods Get rudimentary grammar system working Fix all the bugs I’ve made since September
New and Modified Components More info stored in NP chunking Better noun-verb agreement Grammar –Element Assignment –Priority Number Assignment
Noun-verb agreement Simple method to eliminate more ambiguities def eliminateother(attribs, sub, closest): for x in attribs: if x[0][1] == "nou" and x != sub: for y in x[1]: if y[0]== "nom": attribs[attribs.index(x)][1].remove(y) return attribs
Noun phrase chunking Now used for English sentences Stores more info for later methods “the man make the children” NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
Element Assignment Based on linguistic information If case is nominative, chunk is subject If accusative, chunk is direct object [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
Priority Assignment Each sentence element is assigned priority number Based on position in English sentence Assignments: –sub 1 –mverb 2 –auxverb 3 –iobj 4 –dobj 5 Sort by number for English grammar
Full run of program input: “den Mann machen die kleinen Kinder” The small children make the man ~/research $ python proj.py Part of speech tags: [['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']] Morphological analysis: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]] Disambiguated after noun-verb agreement: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]] Lemmatized: [['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]] Root translated: [['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']] NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]] Inflected (only works before chunking): ['the', 'the'] ['man', ['akk', 'mas'], 'man'] ['man', ['dat', 'pl'], 'mans'] ['make', ['3', 'pl'], 'make'] ['the', 'the'] ['small', 'small'] ['child', ['nom', 'pl'], 'childs'] Assigned an element type: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']] Assigned priority: [['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']] Rearranged to English structure: [['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]
Problems Ambiguities (again) –One ambiguity can change the entire structure of the sentence –“I gave a horse the hat” vs. “I gave the hat a horse” –Attempt at all permutations possible User disambiguation
Problems Inflexible –Grammar can only be rearranged in one specific way –Subject – Main verb – Indirect – Direct – Auxiliary Verb –Does not accommodate for prepositions, conjunctions, etc.
Future research Implement more statistical methods –Morphological info –Actual translation – bilingual corpus Create better parse tree – Dependency grammar