
CONTENTS Abstract, Motivation, Literature Survey, Existing System, Proposed System Framework, Modules Description, Comparative Analysis, Experimental Results, Conclusion, References

ABSTRACT Machine translation is one of the major areas of NLP. When translating from English to Tamil, prepositions in English sentences must be rendered as postpositions in Tamil to produce meaningful sentences. This project mainly focuses on eliminating prepositional phrase attachment errors and orthographical errors.

MOTIVATION Machine translation quality has improved substantially in recent years. Prepositions play a significant role in producing meaningful translations in any language, and prepositional phrase errors are a major source of mistranslation. The motivation of this project is to improve English-Tamil translation quality by using semantic rules to correct prepositional errors.

LITERATURE SURVEY
Word Alignment Problem (Author & Year: Approach)
1. R. Harshawardhan et al., IJCSE, 2011: Linear Programming
2. S. Vetrivel and Diana Baby, ICN, 2010: HMM-Viterbi Algorithm
Sentence Simplification Problem (Author & Year: Approach)
1. R. Harshawardhan et al., IJCA, 2011: Concept Labeling
2. Thiruumeni P G et al., IJCA, 2011: Idioms and Phrasal Verbs
3. C. Poornima et al., IJCA, 2011: Rule Based

Contd…
Morphological Analyzer and Generator (Author & Year: Approach)
1. M. Selvam and A. M. Natarajan, IJCSE, 2009: Rule Based
2. V. Dhanalakshmi and S. Rajendran, IJCA, 2010: SVM Based
3. Anand Kumar M et al., IJCSE, 2010: Sequence Labeling
4. Antony P. J. and K. P. Soman, IJCSET, 2012: Suffix Stripping
POS Tagging (Author & Year: Approach)
1. D. Chandrakanth, IJCE, 2012: SVM Based
2. Selvam M et al., IJCPL, 2008: Phrase Structure Tree Bank
3. Adam R. Teichert et al., EMNLP, 2010: HMM Based

EXISTING SYSTEM

PROPOSED SYSTEM FRAMEWORK

POS TAGGING

WORD BY WORD TRANSLATION

WORD BY WORD TRANSLATION

MORPHOLOGICAL ANALYSIS

RULES OF PREPOSITIONAL PHRASE ATTACHMENT
Rules for the preposition "of":
1. <NN><IN><DT> or <NN><IN><NN>: "of" is translated as the postposition "udaiya/in".
2. <NN><IN><JJ>: "of" is translated as "kkaana".
3. <RB><IN><NNP>: "of" is translated as "il".
4. <VBN><IN><NN>: "of" is translated as "aal".
Rule for "by": <POSP1><IN><POSP2>: translated as "aal".
Rule for "on": <POSP1><IN><POSP2>: translated as "mele/il".
Rule for "in": <POSP1><IN><POSP2>: translated as "il".
Rule for "to": <POSP1><IN><POSP2>: translated as "kku".
Rule for "from": <POSP1><IN><POSP2>: translated as "irunthu".

PREPOSITIONAL PHRASE ATTACHMENT (NN IN NN → உடைய/இன், udaiya/in): A page of the book → புக்கினுடைய பக்கம்

PREPOSITIONAL PHRASE ATTACHMENT (NN IN JJ → க்கான, kkaana): Cotton is a crop of subtropical climate → பருத்தி ஒரு மித வெப்ப மண்டல காலநிலைக்கான பயிராகும்

PREPOSITIONAL PHRASE ATTACHMENT (RB IN NNP → இல், il): He lives south of London → அவர் தெற்கு லண்டனில் வசிக்கிறார்

PREPOSITIONAL PHRASE ATTACHMENT (VBN IN NN → ஆல், aal): Most tables are made of wood → பெரும்பாலான மேஜைகள் மரத்தால் செய்யப்பட்டது
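The "of" rules and the fixed mappings for the other prepositions amount to a lookup on the tags of the words on either side of the preposition. Below is a minimal Python sketch under two assumptions: the input carries Penn Treebank tags from some POS tagger, and <POSP1>/<POSP2> on the rules slide mean "any tag". The names `postposition_for`, `OF_RULES` and `DEFAULT_RULES` are hypothetical. This is not the authors' implementation.

```python
from typing import List, Optional, Tuple

# A sentence as (word, Penn Treebank tag) pairs, e.g. from any POS tagger.
Tagged = List[Tuple[str, str]]

# Rules for "of": (tag before, tag after) -> transliterated Tamil postposition.
OF_RULES = {
    ("NN", "DT"): "udaiya/in",
    ("NN", "NN"): "udaiya/in",
    ("NN", "JJ"): "kkaana",
    ("RB", "NNP"): "il",
    ("VBN", "NN"): "aal",
}

# Other prepositions: one postposition regardless of the neighboring tags
# (assumption: <POSP1>/<POSP2> on the rules slide stand for any tag).
DEFAULT_RULES = {
    "by": "aal",
    "on": "mele/il",
    "in": "il",
    "to": "kku",
    "from": "irunthu",
}


def postposition_for(tagged: Tagged, i: int) -> Optional[str]:
    """Return the postposition for the preposition at index i, or None if no rule matches."""
    word = tagged[i][0].lower()
    if word != "of":
        return DEFAULT_RULES.get(word)
    if i == 0 or i + 1 >= len(tagged):
        return None  # the "of" rules need a tagged word on each side
    left_tag, right_tag = tagged[i - 1][1], tagged[i + 1][1]
    return OF_RULES.get((left_tag, right_tag))


if __name__ == "__main__":
    page_of_book = [("A", "DT"), ("page", "NN"), ("of", "IN"), ("the", "DT"), ("book", "NN")]
    print(postposition_for(page_of_book, 2))  # udaiya/in
```

The sketch matches tags exactly as written on the rules slide. A plural noun (NNS) next to "of" therefore falls through to `None`, and a fuller system would need broader patterns or a fallback.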

ORTHOGRAPHICAL RULES Rule 1: Rule 2: Rule 3:

WORD REORDERING The English sentence "He went to shop" is reordered to follow Tamil word order, with the verb last: கடைக்கு அவன் சென்றான்
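Tamil is verb-final and uses postpositions where English uses prepositions. Under simplifying assumptions, the reordering step can be sketched as two moves: the verbs go to the end of the clause, and each preposition goes after its noun. This is a hypothetical Python sketch (`reorder_to_verb_final` is an illustrative name), not the authors' reordering rules. It always keeps the subject first, whereas the slide's Tamil output places the postpositional phrase first, an order Tamil also allows.

```python
from typing import List, Optional, Tuple

# A clause as (word, Penn Treebank tag) pairs.
Tagged = List[Tuple[str, str]]


def reorder_to_verb_final(tagged: Tagged) -> List[str]:
    """Move verbs to the end and turn each preposition into a postposition.

    Verbs are tokens whose tag starts with "VB"; a preposition (IN or TO) is
    held back and emitted right after the next noun (tag starting with "NN").
    """
    verbs = [word for word, tag in tagged if tag.startswith("VB")]
    others: List[str] = []
    pending_prep: Optional[str] = None
    for word, tag in tagged:
        if tag.startswith("VB"):
            continue  # already collected; appended after everything else
        if tag in ("IN", "TO"):
            pending_prep = word  # hold until its noun has been emitted
            continue
        others.append(word)
        if pending_prep is not None and tag.startswith("NN"):
            others.append(pending_prep)
            pending_prep = None
    if pending_prep is not None:
        others.append(pending_prep)  # no noun followed; keep the word anyway
    return others + verbs


if __name__ == "__main__":
    clause = [("He", "PRP"), ("went", "VBD"), ("to", "TO"), ("shop", "NN")]
    print(reorder_to_verb_final(clause))  # ['He', 'shop', 'to', 'went']
```

A later step would replace the held preposition with its Tamil postposition ("kku" for "to") and attach it to the noun.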

ENGLISH-TAMIL TRANSLATION

COMPARATIVE STUDY

EXPERIMENTAL RESULTS
Proposed System: 200 sentences, 1020 words, 970 translated words, 185 correct sentences, 940 correct words; P = 92%, R = 97%, F = 94%
TDIL Translate: 120 correct sentences, 610 correct words; P = 60%, R = 63%, F = 61%
Google Translate: 160 correct sentences, 820 correct words; P = 80%, R = 85%, F = 82%
P = Precision, R = Recall, F = F-measure
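The F column is consistent with the balanced F-measure, the harmonic mean F = 2PR / (P + R) of the reported precision and recall. The Python sketch below recomputes it for each system. The second part rests on an assumption. Dividing the proposed system's 940 correct words by its 1020 words gives its reported P, and dividing them by its 970 translated words gives its reported R. However, the slides do not state the formulas, and the usual convention divides precision by the size of the system output instead.

```python
def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Precision and recall as reported on the results slide (as fractions).
reported = {
    "Proposed System": (0.92, 0.97),
    "TDIL Translate": (0.60, 0.63),
    "Google Translate": (0.80, 0.85),
}

for system, (p, r) in reported.items():
    print(f"{system}: F = {f_measure(p, r):.0%}")

# Assumption: one reading of the proposed system's word counts that reproduces
# its reported P and R; the slides do not give the formulas explicitly.
correct_words, total_words, translated_words = 940, 1020, 970
p_words = correct_words / total_words       # about 0.92
r_words = correct_words / translated_words  # about 0.97
print(f"From counts: P = {p_words:.0%}, R = {r_words:.0%}, F = {f_measure(p_words, r_words):.0%}")
```

Running it prints F values of 94%, 61% and 82%, matching the table.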

CONCLUSION The proposed system improves significantly on the existing systems it was compared with. This work mainly focuses on identifying the exact meaning of each preposition from its context and position in the sentence for English-Tamil translation. The proposed translation system achieves a precision of 92%, a recall of 97% and an F-measure of 94%. These figures compare with 80%, 85% and 82% for Google Translate and 60%, 63% and 61% for TDIL Translate.

REFERENCES
1. R. Harshawardhan, Mridula Sara Augustine and K. P. Soman (2011), "A simplified approach to word alignment algorithm for English-Tamil translation", IJCSE, Vol. 2, No. 1, pp. 94-100.
2. S. Vetrivel and Diana Baby (2010), "English to Tamil statistical machine translation and alignment using HMM", Proceedings of the 12th International Conference on Networking, VLSI and Signal Processing, pp. 182-186.
3. R. Harshawardhan, Mridula Sara Augustine and K. P. Soman (2011), "Phrase based English-Tamil translation system by concept labeling using translation memory", IJCA, Vol. 20, No. 3, pp. 1-6.
4. Thiruumeni P G, Anand Kumar M, Dhanalakshmi V and Soman K P (2011), "An approach to handle idioms and phrasal verbs in English-Tamil machine translation system", IJCA, Vol. 26, No. 10, pp. 36-41.
5. Poornima C, Dhanalakshmi V, Anand Kumar M and Soman K P (2011), "Rule based sentence simplification for English to Tamil Machine Translation system", IJCA, Vol. 25, No. 8, pp. 38-42.
6. M. Selvam and A. M. Natarajan (2009), "Improvement of Rule Based Morphological Analysis and POS Tagging in Tamil Language via Projection and Induction Techniques", IJCSE, Vol. 3, pp. 357-367.
7. Dhanalakshmi V and Rajendran S (2010), "Natural Language processing tools for Tamil grammar learning and teaching", IJCA, Vol. 8, No. 14, pp. 26-30.
8. Lakshmana Pandian S and Kumanan Kadhirvelu (2012), "Machine translation from English to Tamil using Hybrid Technique", IJCA, Vol. 46, No. 16, pp. 36-42.
9. Anand Kumar M, Dhanalakshmi V, Soman K P and Rajendran S (2010), "A Sequence labeling approach to morphological analyzer for Tamil language", IJCSE, Vol. 2, No. 6, pp. 1944-1951.
10. Anand Kumar M, Dhanalakshmi V, Rekha R U, Soman K P and Rajendran S (2010), "A Novel data driven algorithm for Tamil morphological generator", IJCA, Vol. 6, No. 12, pp. 52-56.

CONTD…
11. Antony P J and Soman K P (2012), "Computational Morphology and Natural language parsing for Indian languages: A literature Survey", IJCSET, Vol. 3, No. 4, pp. 136-146.
12. D. Chandrakanth, M. Anand Kumar and S. Gunasekaran (2012), "Parts-of-Speech tagging for Tamil language", IJCE, Vol. 6, No. 6, pp. 88-93.
13. Dinesh Kumar and Gurpreet Singh Josan (2010), "Part of speech Taggers for morphologically rich Indian languages: A survey", IJCA, Vol. 6, No. 5, pp. 1-9.
14. Selvam M, Natarajan A M and Thangarajan R (2008), "Structural parsing of Natural Language text in Tamil using phrase structure Hybrid Language Model", IJCPL, World Academy of Science, Engineering and Technology, Vol. 22, No. 3, pp. 463-469.
15. Adam R. Teichert and Hal Daumé III (2010), "Unsupervised Part of Speech Tagging without a Lexicon", Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 1-6.
16. Antony P J and Soman K P (2011), "Parts of Speech tagging for Indian languages: A Literature Survey", IJCA, Vol. 34, No. 8, pp. 22-29.
17. S. Saraswathi, P. Kanivadhana, M. Anusiya and S. Sathiya (2011), "Bilingual Translation System", IJCSE, Vol. 3, No. 3, pp. 1168-1174.
18. Matt Post, Chris Callison-Burch and Miles Osborne (2012), "Constructing parallel corpora for six Indian languages via crowd sourcing", Proceedings of the 7th Workshop on Statistical Machine Translation, pp. 401-409.
19. Meera Subhash, Wilscy M and S A Shanavas (2012), "A Rule based approach for Root word identification in Malayalam language", IJCSIT, Vol. 4, No. 3, pp. 159-166.
20. Pushpak Bhattacharyya (2012), "Natural Language processing: A perspective from computation in presence of Ambiguity, Resource constraint and Multilinguality", CSI Journal of Computing, Vol. 1, No. 2, pp. 1-13.

CONTD…
21. Kuang-hua Chen and Hsin-Hsi Chen (1996), "A Rule based and Corpus-Oriented approach to prepositional phrase attachment", Proceedings of the 16th Conference on Computational Linguistics, Vol. 1, pp. 216-221.
22. Vincent Van Asch and Walter Daelemans (2009), "Prepositional phrase attachment in shallow parsing", Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing, pp. 12-17.
23. Rajat Kumar Mohanty, Ashish Francis Almeida and Pushpak Bhattacharyya (2005), "Prepositional Phrase attachment and Interlingua", Research on Computing Science, pp. 241-253.
24. Sudip Kumar Naskar and Sivaji Bandyopadhyay (2006), "Handling of prepositions in English to Bengali Machine translation", Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, Association for Computational Linguistics, pp. 89-94.
25. I. Dan Melamed, Ryan Green and Joseph P. Turian (2003), "Precision and Recall of Machine Translation", Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Vol. 2, pp. 61-63.

Thank You!