Creation of a Russian-English Translation Program Karen Shiells.

Purpose
- Object-oriented approach
- Interactive machine translation
- Designed for aid, not independent translation
- Explore algorithms used in machine translation
- Identify grammatical obstacles to translation
- Create a base to expand later

Scope of Study
- Machine translation is and will be imperfect
- Modern translation uses statistical methods
- Project is limited to:
  - Separating base words from morphological endings
  - Constructing syntax trees from source text
  - Generating simple English output from tree
  - Identifying words already known to the program
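
The first item in the project scope, separating base words from morphological endings, might look roughly like the sketch below. It assumes transliterated input and uses a deliberately tiny, hypothetical ending table (`NOUN_ENDINGS`); the presentation does not show the program's actual tables or function names.

```python
# Hypothetical, deliberately tiny ending table for transliterated Russian nouns.
# Real Russian declension has many more endings, and several are ambiguous.
NOUN_ENDINGS = {
    "ami": ("instrumental", "plural"),
    "oj": ("instrumental", "singular"),
    "u": ("accusative", "singular"),
    "y": ("genitive", "singular"),
    "e": ("prepositional", "singular"),
    "a": ("nominative", "singular"),
}


def split_ending(word):
    """Return (base, ending, analysis), trying the longest ending first."""
    for ending in sorted(NOUN_ENDINGS, key=len, reverse=True):
        # Require a non-empty base so a bare ending is never treated as a word.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending, NOUN_ENDINGS[ending]
    # No known ending: treat the whole word as the base.
    return word, "", None
```

Trying longer endings first keeps "-ami" from being misread as a base plus "-i"-style shorter ending; a fuller version would return every possible analysis rather than the first one.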

Other Research
- Part-of-speech tagging:
  - Uses probability to identify parts of speech
  - Applied to unknown words and structures
  - Complex labeling systems, beyond the conventional parts of speech
- Translation algorithms:
  - Massive dictionaries store words and information
  - Aided by verb categorization
  - Omit unknown words and translate without them
  - Output is usually comprehensible but requires human revision

Old Methods
- Direct Translation
  - First method
  - Rearranges sentences without parsing
  - Based on rules of transfer for specific languages
- Interlingua
  - From era of international languages
  - Uses one representation as an intermediary
  - Intermediary is usually a constructed language
  - Easier to add language pairs

Syntactic Transfer
- Similar to interlingua
- Generates syntax tree using a language-specific parser
- Rearranges tree to fit target structure
- Uses a language-specific generation method to form output
- Entire algorithm is specific to one language pair
- Best quality translations
- Relatively new
- Not as common in commercial software
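
The "rearranges tree to fit target structure" step can be illustrated with a minimal sketch. The `Node` class, the role labels, and the single rule in `TARGET_ORDER` are all assumptions made for illustration; they are not taken from the program described here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # e.g. "S", "SUBJ", "VERB", "OBJ"
    word: str = ""                  # filled only for leaves
    children: list = field(default_factory=list)


# Hypothetical transfer rule: English clauses want subject-verb-object order.
TARGET_ORDER = {"S": ["SUBJ", "VERB", "OBJ"]}


def transfer(node):
    """Recursively reorder children to match the target-language order."""
    kids = [transfer(child) for child in node.children]
    order = TARGET_ORDER.get(node.label)
    if order:
        # Labels not named in the rule keep their relative order at the end
        # (list.sort is stable).
        kids.sort(key=lambda k: order.index(k.label) if k.label in order else len(order))
    return Node(node.label, node.word, kids)


def leaves(node):
    """Collect leaf words left to right."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]
```

For example, an object-verb-subject source clause comes out in subject-verb-object order:

```python
source = Node("S", children=[Node("OBJ", "knigu"), Node("VERB", "chitaet"), Node("SUBJ", "mal'chik")])
print(leaves(transfer(source)))
```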

Alternative Structures
- Valency
  - Stores number of complements for each word
  - Type of complements not specified
  - Occupies less space in dictionary
- Phrase-Structure Representation
  - Most familiar: noun phrase, verb phrase, etc.
  - Breaks sentence into superstructures
  - Puts terminal symbols only in leaves
  - Non-terminal symbols for branches

Dependency Trees
- Use words as nodes, not just leaves
- Examples:
  - Verb dependent on subject
  - Objects dependent on verb
  - Adjectives dependent on nouns
  - Prepositions vary by type of prepositional phrase
- Easier to verify agreement between words
- Occupy less space
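
One way to see why agreement is easier to verify in a dependency tree: every word is a node that holds its own dependents, so a check only has to compare a head with the words attached to it. The sketch below assumes a hypothetical `Word` class and covers only the adjective-on-noun case, where case, number, and gender must all match.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    case: str
    number: str
    gender: str
    dependents: list = field(default_factory=list)   # words hanging off this node


def disagreeing_dependents(head):
    """Return dependents whose case, number, or gender differs from the head."""
    key = (head.case, head.number, head.gender)
    return [d for d in head.dependents if (d.case, d.number, d.gender) != key]
```

Other relations, such as an object depending on a verb, would need their own checks, since an object does not share the verb's features.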

Object Orientation
- Object-oriented approach allows more flexibility
- Endings, cases, and declensions are classes
- Fewer hard-coded rules
- Methods for locating dependents are in classes
- Modular design allows gradual changes
  - Changes in lexical analysis do not affect parsing
  - Changes in dictionary do not affect translation

Verb Typing
- Divides verbs into categories, for example:
  - Transitive
  - Intransitive
  - Directional or non-directional motion
- Condenses structure storage
  - Dictionary stores only the type of a verb
  - Particular structures are derived from the general type
  - Code can apply to general structures, not specific ones
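
The storage saving from verb typing can be sketched as two small tables: one maps each type to the complement structure it allows, and the dictionary entry for a verb stores only the type name. The table contents and the names `VERB_FRAMES`, `VERB_TYPES`, and `complements_for` are hypothetical.

```python
# Hypothetical verb types and the complement structure each one allows.
VERB_FRAMES = {
    "transitive": ["subject", "direct_object"],
    "intransitive": ["subject"],
    "directional_motion": ["subject", "destination"],
}

# The dictionary entry stores only the type, not the full structure.
VERB_TYPES = {
    "chitat'": "transitive",
    "spat'": "intransitive",
    "idti": "directional_motion",
}


def complements_for(verb):
    """Look up the general structure shared by all verbs of this verb's type."""
    return VERB_FRAMES[VERB_TYPES[verb]]
```

Adding a new transitive verb then costs one short entry in `VERB_TYPES`, and any parsing code written against the "transitive" frame applies to it unchanged.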

Dictionary
- Open, save, add, remove, and search functions
- Stores:
  - Russian nominative
  - English nominatives
  - Part of speech
  - Noun/pronoun attributes
  - Verb types
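
A minimal sketch of a dictionary with these five operations and these stored fields follows. The JSON file format, the `Entry` field names, and the use of `load` for the "open" operation are assumptions; the presentation does not describe how the program stores its dictionary on disk.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Entry:
    russian: str                                     # Russian nominative (dictionary form)
    english: list                                    # one or more English equivalents
    pos: str                                         # part of speech
    attributes: dict = field(default_factory=dict)   # noun/pronoun attributes
    verb_type: str = ""                              # used only for verbs


class Dictionary:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.russian] = entry

    def remove(self, russian):
        # Removing a missing word is a no-op rather than an error.
        self.entries.pop(russian, None)

    def search(self, russian):
        return self.entries.get(russian)

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(e) for e in self.entries.values()], f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        """Open a previously saved dictionary file."""
        dictionary = cls()
        with open(path, encoding="utf-8") as f:
            for raw in json.load(f):
                dictionary.add(Entry(**raw))
        return dictionary
```

Keying entries by the Russian nominative makes `search` a direct lookup, which also serves the scope item of identifying words already known to the program.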

Translator
- Uses transliteration for ease of testing
- Can be easily converted to Unicode Cyrillic
- Debugging output to terminal window
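
Converting transliterated text to Unicode Cyrillic is mostly a table lookup, with the one wrinkle that multi-letter sequences must be tried before single letters. The transliteration scheme below is an assumption (the presentation does not specify one) and covers lowercase letters only.

```python
# Hypothetical transliteration scheme; the presentation does not specify one.
TRANSLIT = {
    "shch": "щ", "zh": "ж", "ch": "ч", "sh": "ш", "ja": "я", "ju": "ю",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "x": "х",
    "c": "ц", "y": "ы", "'": "ь",
}
# Longest keys first, so "shch" wins over "sh" followed by "ch".
KEYS = sorted(TRANSLIT, key=len, reverse=True)


def to_cyrillic(text):
    """Convert lowercase transliterated text to Cyrillic, longest match first."""
    out, i = [], 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(TRANSLIT[key])
                i += len(key)
                break
        else:
            out.append(text[i])   # pass through anything unmapped (spaces, punctuation)
            i += 1
    return "".join(out)
```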

Results
- Subject, verb, direct object translated
  - Subject is first nominative
  - Verb matched by gender, number, and person
  - Direct object is first accusative
- Adjectives matched to nouns
  - Matched by case, number, and gender
  - Word order not considered
- Word order should be accounted for, but isn't
  - Adjectives should attach to the nearest matching noun, not just any match
  - Prepositional objects should be nearby
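
The role-finding rules reported here (subject is the first nominative, direct object is the first accusative) are simple enough to sketch directly. The word dictionaries, the `GLOSSARY` table, and the function names are hypothetical, and the verb is matched on number only, a simplification of the gender, number, and person matching described above.

```python
# Each word is a dict produced by an assumed earlier analysis step.
def find_roles(words):
    """Pick subject, verb, and direct object using the first-match rules."""
    nouns = [w for w in words if w["pos"] == "noun"]
    subject = next((w for w in nouns if w["case"] == "nominative"), None)
    direct_object = next((w for w in nouns if w["case"] == "accusative"), None)
    # Simplification: match the verb on number only.
    verb = next(
        (w for w in words
         if w["pos"] == "verb" and subject and w["number"] == subject["number"]),
        None,
    )
    return subject, verb, direct_object


# Hypothetical glossary standing in for the program's dictionary.
GLOSSARY = {"mal'chik": "boy", "chitaet": "reads", "knigu": "book"}


def translate(words):
    """Emit the roles that were found in English subject-verb-object order."""
    roles = find_roles(words)
    return " ".join(GLOSSARY.get(w["text"], w["text"]) for w in roles if w)
```

Because the roles are chosen by case rather than position, an object-first sentence still comes out in English order:

```python
sentence = [
    {"text": "knigu", "pos": "noun", "case": "accusative", "number": "singular"},
    {"text": "chitaet", "pos": "verb", "number": "singular"},
    {"text": "mal'chik", "pos": "noun", "case": "nominative", "number": "singular"},
]
print(translate(sentence))
```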

Conclusions
- Part-of-speech guessing could be added easily
  - When a subordinate is not found, add to list
  - For each unmatched word, prompt user
  - Allow selection between subordinates not found
- Verb typing would be harder, but helpful
  - Restricting complements makes matching more precise
  - More efficient: no search over all possible complements
  - Prepositions could be associated with nouns
- Even in inflecting languages, word order matters
  - Subordinates should be located by proximity
  - Multiple functions use the same inflections
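
Locating subordinates by proximity could be sketched as follows: among the words that agree with the source word, pick the one closest in the sentence instead of the first one found. The word representation and the function name `nearest_match` are assumptions, and this is one possible reading of the proposal, not the program's implementation.

```python
def nearest_match(words, index, wanted_pos="noun"):
    """Return the index of the nearest agreeing word of wanted_pos, or None."""
    source = words[index]
    features = (source["case"], source["number"], source["gender"])
    candidates = [
        i for i, w in enumerate(words)
        if i != index
        and w["pos"] == wanted_pos
        and (w["case"], w["number"], w["gender"]) == features
    ]
    # On a tie in distance, min keeps the earlier candidate.
    return min(candidates, key=lambda i: abs(i - index), default=None)
```

In a phrase like "novaja kniga i staraja gazeta", both nouns agree with both adjectives, so matching on inflection alone cannot tell them apart; distance attaches "staraja" to "gazeta" rather than to the first match, "kniga".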
