Development of a German-English Translator. Felix Zhang, Period 5, 2007-2008. Thomas Jefferson High School for Science and Technology, Computer Systems Research Lab.

Presentation transcript:

Development of a German-English Translator
Felix Zhang
Period 5
Thomas Jefferson High School for Science and Technology
Computer Systems Research Lab

Summary of Quarter 2
NP Chunking
Lemmatization
Dictionary Lookup
Inflection
Noun-verb agreement

Scope for this quarter
Focus less on statistical methods
Get rudimentary grammar system working
Fix all the bugs I’ve made since September

New and Modified Components
More info stored in NP chunking
Better noun-verb agreement
Grammar
– Element Assignment
– Priority Number Assignment

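Noun-verb agreement
Simple method to eliminate more ambiguities

    def eliminateother(attribs, sub, closest):
        # Remove the nominative reading from every noun other than the subject.
        # (closest is not used in this excerpt.)
        for x in attribs:
            if x[0][1] == "nou" and x != sub:
                # Rebuild the list in place instead of calling remove() while
                # iterating over it, which would skip the following element.
                x[1][:] = [y for y in x[1] if y[0] != "nom"]
        return attribs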

Noun phrase chunking
Now used for English sentences
Stores more info for later methods
“the man make the children”
NP Chunked English:
    [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]],
     ['make', 'ver', [['3', 'pl'], 'pres']],
     [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
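A minimal sketch of how tagged words could be grouped into chunks of the shape shown on this slide. The function name `np_chunk` and the grouping rule (articles and adjectives open or extend a chunk, a noun closes it) are assumptions made for illustration, not the project's actual chunker.

```python
def np_chunk(tagged):
    """Group article/adjective/noun runs into chunks; pass other words through."""
    chunks, current = [], []
    for word in tagged:
        tag = word[1]
        if tag in ("art", "adj"):
            current.append(word)        # modifier: keep collecting
        elif tag == "nou":
            current.append(word)        # noun closes the chunk
            chunks.append(current)
            current = []
        else:
            chunks.extend(current)      # modifiers with no noun stay unchunked
            current = []
            chunks.append(word)
    chunks.extend(current)
    return chunks

# Tagged, root-translated words carrying the morphological info from the slides.
tagged = [
    ["the", "art"],
    ["man", "nou", [["akk", "mas"], ["dat", "pl"]]],
    ["make", "ver", [["3", "pl"], "pres"]],
    ["the", "art"],
    ["small", "adj"],
    ["child", "nou", [["nom", "pl"]]],
]
chunked = np_chunk(tagged)
```

With this input, `chunked` has three entries: a noun phrase, the bare verb, and a second noun phrase.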

Element Assignment
Based on linguistic information
If case is nominative, chunk is subject
If accusative, chunk is direct object
    [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'],
     ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'],
     [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
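The case-to-element rule can be sketched roughly as follows. The names `CASE_TO_ELEMENT` and `assign_element` are hypothetical, the dative-to-indirect-object entry is an assumption (the slide only states the nominative and accusative rules), and picking the first remaining case reading of a noun is a simplification, not necessarily what the project does.

```python
# "nom" and "akk" follow the slide; "dat" -> "iobj" is an assumed extension.
CASE_TO_ELEMENT = {"nom": "sub", "akk": "dobj", "dat": "iobj"}

def assign_element(chunk):
    """Return a copy of the chunk with an element label appended."""
    if chunk[1] == "ver":
        # A bare verb entry looks like ['make', 'ver', ...].
        return chunk + ["mverb"]
    noun = chunk[-1]          # the noun is the last word of an NP chunk
    case = noun[2][0][0]      # assumption: use the first remaining reading
    return chunk + [CASE_TO_ELEMENT[case]]

chunked = [
    [["the", "art"], ["man", "nou", [["akk", "mas"], ["dat", "pl"]]]],
    ["make", "ver", [["3", "pl"], "pres"]],
    [["the", "art"], ["small", "adj"], ["child", "nou", [["nom", "pl"]]]],
]
labeled = [assign_element(chunk) for chunk in chunked]
```

Auxiliary verbs are not distinguished from main verbs in this sketch.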

Priority Assignment
Each sentence element is assigned priority number
Based on position in English sentence
Assignments:
– sub 1
– mverb 2
– auxverb 3
– iobj 4
– dobj 5
Sort by number for English grammar
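As a rough sketch, the priority table above can be applied and sorted like this. The helper names `assign_priority` and `rearrange` are hypothetical; the priority is stored as a leading string because that is how it appears in the program output on the full-run slide.

```python
# Priority numbers as listed on the slide.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}

def assign_priority(elements):
    """Prepend each element's priority (as a string) based on its trailing label."""
    return [[str(PRIORITY[element[-1]])] + element for element in elements]

def rearrange(prioritized):
    """Sort elements into English order by their numeric priority."""
    return sorted(prioritized, key=lambda element: int(element[0]))

labeled = [
    [["the", "art"], ["man", "nou", [["akk", "mas"], ["dat", "pl"]]], "dobj"],
    ["make", "ver", [["3", "pl"], "pres"], "mverb"],
    [["the", "art"], ["small", "adj"], ["child", "nou", [["nom", "pl"]]], "sub"],
]
ordered = rearrange(assign_priority(labeled))
```

Sorting on `int(element[0])` rather than on the string avoids misordering if priorities ever reach two digits.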

Full run of program
input: “den Mann machen die kleinen Kinder”
The small children make the man

~/research $ python proj.py
Part of speech tags:
[['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']]
Morphological analysis:
[[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]
Disambiguated after noun-verb agreement:
[[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]]
Lemmatized:
[['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]]
Root translated:
[['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']]
NP Chunked English:
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
Inflected (only works before chunking):
['the', 'the']
['man', ['akk', 'mas'], 'man']
['man', ['dat', 'pl'], 'mans']
['make', ['3', 'pl'], 'make']
['the', 'the']
['small', 'small']
['child', ['nom', 'pl'], 'childs']
Assigned an element type:
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
Assigned priority:
[['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
Rearranged to English structure:
[['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]

Problems
Ambiguities (again)
– One ambiguity can change the entire structure of the sentence
– “I gave a horse the hat” vs. “I gave the hat a horse”
– Attempt at all permutations possible
User disambiguation
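One way to read "all permutations" is to enumerate every combination of the remaining analyses and let the user pick among them. The sketch below illustrates that idea only; the function name `all_readings` and the tuple-based data layout are assumptions, and the slides do not say how the project actually enumerates readings.

```python
from itertools import product

def all_readings(analyses):
    """Return every combination that picks one analysis per ambiguous word."""
    words = [word for word, _ in analyses]
    option_lists = [options for _, options in analyses]
    return [list(zip(words, combo)) for combo in product(*option_lists)]

# Ambiguous case/number readings taken from the morphological analysis slide.
analyses = [
    ("Mann", [("akk", "mas"), ("dat", "pl")]),
    ("Kinder", [("nom", "pl"), ("akk", "pl")]),
]
readings = all_readings(analyses)

for number, reading in enumerate(readings, start=1):
    print(number, reading)
```

The number of readings is the product of the option counts, so it grows quickly with sentence length; that growth is one reason to prune with agreement checks before asking the user to choose.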

Problems
Inflexible
– Grammar can only be rearranged in one specific way
– Subject – Main verb – Indirect – Direct – Auxiliary Verb
– Does not accommodate prepositions, conjunctions, etc.

Future research
Implement more statistical methods
– Morphological info
– Actual translation – bilingual corpus
Create better parse tree
– Dependency grammar