A Simple English-to-Punjabi Translation System, by Shailendra Singh



Introduction
- The Internet has encouraged multilingualism and driven the growth of the language industry.
- Internet users need information in a language they fully understand.
- Machine translation (MT) is a computer system that translates text from a source language into a target language.
- Current English-to-Punjabi translation systems are mostly word-for-word.
- The focus here is a simple English-to-Punjabi machine translation system.

Introduction
- English is structured as Subject-Verb-Object (SVO).
- Punjabi is structured as Subject-Object-Verb (SOV).
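
The word-order difference can be sketched in a few lines of Python. The function name and the three-part clause representation are assumptions made for illustration; the Punjabi words are taken from the worked example later in this talk.

```python
def svo_to_sov(subject, verb, obj):
    """Reorder a Subject-Verb-Object clause into Subject-Object-Verb order."""
    return [subject, obj, verb]

# English order: I (S) eat (V) vegetables (O)
# Punjabi order: Meh (S) savaji (O) khaan (V)
print(" ".join(svo_to_sov("Meh", "khaan", "savaji")))
```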

Literature Review
- Many MT strategies have been applied over time.
- They range from the direct approach to more recent ones such as example-based machine translation (EBMT).

Direct Approach
- The most primitive strategy.
- Translation is word-for-word or phrase-to-phrase.
- Needs a very large bilingual dictionary.
- Involves very little language analysis, because it relies almost entirely on the dictionary.
- As a result, the translation is often inaccurate and contains many errors.
- Example system: SYSTRAN
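
A minimal sketch of word-for-word translation, assuming a three-entry toy dictionary. The function name `direct_translate` and the pass-through handling of unknown words are illustrative assumptions; the Punjabi words come from the worked example in this talk.

```python
# Toy bilingual dictionary; entries taken from the worked example in this talk.
LEXICON = {"i": "Meh", "eat": "khaan", "vegetables": "savaji"}

def direct_translate(sentence, lexicon=LEXICON):
    """Replace each word with its dictionary entry, keeping the source word order."""
    words = sentence.lower().split()
    # Unknown words are passed through unchanged (an assumption of this sketch).
    return " ".join(lexicon.get(word, word) for word in words)

print(direct_translate("I eat vegetables"))  # prints "Meh khaan savaji"
```

The output keeps the English Subject-Verb-Object order, which is one reason a purely dictionary-based translation tends to be inaccurate for a Subject-Object-Verb language such as Punjabi.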

Rule Based
- Built on many types of rules: syntactic rules, lexical rules, lexical transfer rules, rules for syntactic generation, rules for morphology, and so on.
- Analysis starts by building a morphological tree, which is transformed into a syntactic tree and finally into a semantic tree (Hutchins 1994).
- The crucial step is the transformation from the source language to the target language.
- All of these rules refer to the particular grammars of the languages involved in the translation.
- Example systems: Ariane, SUSY

Rule Based
- Advantage: deep analysis during the translation process.
- Disadvantages:
  1) Requires much linguistic knowledge.
  2) It is impossible to write rules that cover an entire language.
  3) Transformation rules are always specified for a single language pair, so it is difficult to keep an overview of the system.
  4) Inconsistencies are introduced as the number of rules grows, and the cost is high.

Knowledge Based
- Mainly describes a rule based system displaying extensive semantic and pragmatic knowledge of a domain, including an ability to reason, to some limited extent, about concepts in the domain (Arnold et al. 1994, page 190).
- Most features are the same as in rule-based systems.
- Its distinctive trait is the focus on a particular domain, which minimizes ambiguity.
- Example system: KANT, which translates electronic manuals

Knowledge Based
- Advantage: avoids ambiguity and therefore gives good-quality translations.
- Disadvantages:
  1) The focus on a single domain limits its capability.
  2) The chosen domain must provide enough knowledge to support the translation.

Statistical Based
- Spurred by an experiment carried out in 1989 by a team from IBM.
- The results of the experiment seemed attractive and acceptable.
- The new methodology was based entirely on statistical methods.
- It marked a new approach to MT, called 'corpus based'.
- A vast language corpus is the main component.
- Words and phrases are aligned within the corpus, and the probabilities are then calculated (Hutchins 1994).
- Example system: Candide
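
The "align, then calculate probabilities" step can be illustrated with simple relative-frequency counts. This is only a sketch, not the model used in Candide: the function name, the pre-aligned word pairs and their counts are invented for illustration, and `other_form` is a placeholder rather than a real Punjabi word.

```python
from collections import Counter, defaultdict

def translation_probabilities(aligned_pairs):
    """Estimate P(target | source) as a relative frequency over aligned word pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {target: n / sum(targets.values()) for target, n in targets.items()}
        for source, targets in counts.items()
    }

# Hypothetical alignment output: "eat" aligned to "khaan" three times
# and to a placeholder alternative once.
pairs = [("eat", "khaan")] * 3 + [("eat", "other_form")]
probs = translation_probabilities(pairs)
print(probs["eat"])
```

With these made-up counts, "khaan" receives probability 0.75 for "eat". A real system would first have to learn the alignments themselves from a large bilingual corpus.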

Statistical Based
- Disadvantages:
  1) Requires training on a huge amount of good-quality bilingual corpus data.
  2) Does not work for complex translations, because the process becomes too complex to handle.

Example Based
- Translation is done by analogy.
- It relies on past translation examples.
- Past translation examples are regarded as syntactically, grammatically and semantically accurate.
- The examples are kept in a store, also known as a corpus, so this approach is also 'corpus based'.
- The sentence to be translated is matched against the examples in the database.
- The closest match is selected and adapted to fit the input sentence.
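
Selecting the closest match can be sketched with a word-overlap (Jaccard) similarity. The talk does not say which similarity measure is used, so Jaccard is an assumption here, as are the function names and the second and third example sentences. Only the English side of each example is shown; the stored translation of the winning example would then be adapted to the input.

```python
def tokens(sentence):
    """Lower-case a sentence and return its set of words."""
    return set(sentence.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two sentences, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def closest_example(sentence, examples):
    """Return the stored example most similar to the input sentence."""
    return max(examples, key=lambda example: jaccard(sentence, example))

EXAMPLES = [
    "I am going to eat vegetables",  # from the talk
    "I am going to school",          # hypothetical
    "She reads a book",              # hypothetical
]

print(closest_example("I am going to eat fruit", EXAMPLES))
```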

Methodology
- In this paper we take the EBMT approach.
- Justification for choosing EBMT:
  1. It eliminates the problems of tractability, scalability and performance found in older MT strategies, which require extensive knowledge.
  2. Very little formal work has been done on the Punjabi language itself, which would be a barrier to a rule-based approach.
  3. EBMT needs correct and accurate example translations, a requirement that suits this situation.

Methodology
- The first step in EBMT is preparing the corpus.
- The corpus is designed as follows:

  Language  Id  Key    Left  Right
  English   1   eat    NP.P  N
  Punjabi   1   khaan  NP.N  P
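
One way to hold the corpus design above in code is a small record type. This is a sketch under assumptions: the class name `TemplateEntry`, the field names and the helper `find_pair` are hypothetical, and the talk does not describe how the corpus is actually stored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemplateEntry:
    language: str
    entry_id: int   # the "Id" column; links an English row to its Punjabi row
    key: str
    left: str       # slot pattern to the left of the key
    right: str      # slot pattern to the right of the key

CORPUS = [
    TemplateEntry("English", 1, "eat", "NP.P", "N"),
    TemplateEntry("Punjabi", 1, "khaan", "NP.N", "P"),
]

def find_pair(key, corpus=CORPUS):
    """Return the (English, Punjabi) entries sharing an Id, given an English key."""
    for source in corpus:
        if source.language == "English" and source.key == key:
            for target in corpus:
                if target.language == "Punjabi" and target.entry_id == source.entry_id:
                    return source, target
    return None

source, target = find_pair("eat")
print(source.left, source.key, source.right, "->", target.left, target.key, target.right)
```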

Methodology
- The sentence to be translated is "I am going to eat vegetables".
- First, the sentence is tokenized.
- Next come morphological analysis and part-of-speech tagging.
- Then the keys are matched against the 'corpus'.
- In this case the key is 'eat'.
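
The steps above can be sketched as follows. The tag table is a hand-written stand-in for a real tagger, and assigning the tag P to "am going to" is an assumption inferred from the template; the sketch performs no real morphological analysis. All function names are hypothetical.

```python
import re

KEYS = {"eat"}  # keys available in the toy corpus
# Hand-written tag table standing in for a real part-of-speech tagger (assumed).
TAGS = {"i": "NP", "am": "P", "going": "P", "to": "P", "vegetables": "N"}

def tokenize(sentence):
    """Split a sentence into lower-case word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def tag(tokens):
    """Attach a tag to each token; corpus keys are marked as KEY."""
    return [(t, "KEY" if t in KEYS else TAGS.get(t, "UNK")) for t in tokens]

def find_key(tokens):
    """Return the first token that is a key in the corpus, or None."""
    return next((t for t in tokens if t in KEYS), None)

words = tokenize("I am going to eat vegetables")
print(tag(words))
print(find_key(words))
```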

Methodology
- Based on the template, the input sentence is: NP.P eat N
- The output is: NP.N khaan P
- The template slots are then filled by lexicon lookup.
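
Filling the target template by lexicon lookup might look like the sketch below. The slot assignments and the phrase-level lexicon entry for "am going to" are assumptions read off the worked example in this talk, and `fill_template` is a hypothetical helper.

```python
# English template: NP.P eat N   ->   Punjabi template: NP.N khaan P
TARGET_TEMPLATE = ["NP", "N", "khaan", "P"]

# Slot fillers for the input sentence, as a tagger might produce them (assumed).
SLOTS = {"NP": "I", "P": "am going to", "N": "vegetables"}

# Toy lexicon; entries read off the worked example in the talk.
LEXICON = {"I": "Meh", "am going to": "lageaa hai", "vegetables": "savaji"}

def fill_template(target_template, slots, lexicon):
    """Walk the target template, translating slot fillers and keeping the key as is."""
    output = []
    for item in target_template:
        if item in slots:
            output.append(lexicon[slots[item]])  # translate the slot's filler
        else:
            output.append(item)  # the key's translation is fixed by the template
    return " ".join(output)

print(fill_template(TARGET_TEMPLATE, SLOTS, LEXICON))
```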

Methodology
- Input: “I am going to eat vegetables”
- Output: “Meh savaji khaan lageaa hai”

Analysis
- Not much linguistic knowledge is needed.
- Performance is faster, because little linguistic processing is involved.

Conclusion
- EBMT is suitable where little formal linguistic knowledge of the languages involved is available.
- EBMT works well where deep linguistic analysis is not needed.

References
- Hutchins, J. (1994), Research methods and system designs in machine translation: a ten-year review, International conference 'Machine translation: ten years on'.
- Arnold, D., Balkan, L., Humphreys, R. L., Meijer, S. & Sadler, L. (1994), Machine translation: an introductory guide, Blackwells/NCC, London.

Thank You

Q & A