Published by Kory French. Modified over 6 years ago.
1
Machine Translation: Techniques, Technologies and Challenges
Presented By Bibekananda Kundu, CDAC Kolkata
2
Why do we need a Machine Translation System?
3
Translation ACK:
4
Translation ACK:
5
Why is Translation Difficult?
• Ambiguity
• Language Divergence
6
Morphological Ambiguity
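আমার এই জামাই চাই
Reading 1: "I want this son-in-law" (জামাই = son-in-law)
Reading 2: "I want this very shirt" (জামা "shirt" + emphatic suffix ই)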
7
Lexical Ambiguity
The pen was in the box.
The box was in the pen.
ACK: http://ttt.org/theory/mt4me/mtambiguity.html
8
ACK: http://specgram.com/CLIII.4/08.phlogiston.cartoon.zhe.html
Lexical Ambiguity
9
Syntactic and Semantic Ambiguity
Fruit flies like a banana
[Figure: two parse trees (S, NP, VP, PP) for the sentence, labelled "syntactic ambiguity" and "semantic ambiguity"]
So we have two possible readings, represented here pictorially, and two different kinds of ambiguity. First, there is semantic ambiguity. "Flies" could be a noun, meaning insects, or a verb, meaning to travel by air. And "like" could be a verb, meaning love, or a preposition, meaning similar to. ("Like" is a remarkable word: it can be used as a noun, verb, adverb, adjective, preposition, particle, conjunction, or interjection.) Then there is also a syntactic ambiguity. It could be that "fruit flies" is the subject and "like a banana" is the predicate, or that "fruit" is the subject and "flies like a banana" is the predicate. These kinds of ambiguities show up all over the place, and as humans we are so good at resolving them that we usually don't even notice them consciously. But sometimes unintentional ambiguities produce quite a comic effect.
ACK: nlp.stanford.edu/~wcmac/papers/symsys-100-nlp.pptx
10
ACK: http://specgram.com/CLIII.4/08.phlogiston.cartoon.zhe.html
11
Pronoun Reference Ambiguity
ACK:
12
Pronoun Reference Ambiguity
13
Why is Translation Difficult?
• Ambiguity
• Language Divergence
14
Agglutination Finnish: istahtaisinkohan
English: I wonder if I should sit down for a while
15
Agglutination
Finnish: istahtaisinkohan
• ist: "sit", verb stem
• ahta: verb derivation morpheme, "to do something for a while"
• isi: conditional affix
• n: 1st person singular suffix
• ko: question particle
• han: a particle for things like reminder (with declaratives) or "softening" (with questions and imperatives)
English: I wonder if I should sit down for a while
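The morpheme breakdown above can be reproduced with a very small segmenter. This is only a sketch: the morpheme list comes from the slide, while the function name `segment` and the strict left-to-right matching are assumptions made for illustration, not a real Finnish morphological analyzer.

```python
# Morphemes and glosses as listed on the slide; the matching strategy is an assumption.
MORPHEMES = [
    ("ist", "verb stem 'sit'"),
    ("ahta", "derivation: do something for a while"),
    ("isi", "conditional affix"),
    ("n", "1st person singular suffix"),
    ("ko", "question particle"),
    ("han", "reminder / softening particle"),
]


def segment(word, morphemes=MORPHEMES):
    """Split `word` by consuming the listed morphemes strictly in order."""
    pieces = []
    rest = word
    for form, gloss in morphemes:
        if not rest.startswith(form):
            raise ValueError(f"cannot match {form!r} at {rest!r}")
        pieces.append((form, gloss))
        rest = rest[len(form):]
    if rest:
        raise ValueError(f"unanalysed remainder: {rest!r}")
    return pieces


analysis = segment("istahtaisinkohan")
print(" + ".join(form for form, _ in analysis))
```

A single Finnish word therefore carries six units of meaning that English spreads over a whole clause, which is why word-for-word transfer breaks down for agglutinative languages.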
16
Free Word Order
The excellent novel has been published at the book fair.
The same sentence in four Bangla word orders:
বইমেলায় অসাধারণ উপন্যাসটা প্রকাশিত হয়েছে ৷
বইমেলায় প্রকাশিত হয়েছে অসাধারণ উপন্যাসটা ৷
অসাধারণ উপন্যাসটা বইমেলায় প্রকাশিত হয়েছে ৷
অসাধারণ উপন্যাসটা প্রকাশিত হয়েছে বইমেলায় ৷
17
Examples of English-Bangla Divergence
English | Bangla
Ram is a very good boy. | রাম [একটি] খুব ভাল ছেলে [হয়] ৷
I was abroad then. | আমি তখন বিদেশে [ছিলাম] ৷
A girl with beautiful eyes | সুন্দর চোখের একটি মেয়ে
A boy with high fever | প্রচন্ড জ্বরে আক্রান্ত একটি ছেলে
He wrote with a pen. | সে কলম দিয়ে লিখেছিল ।
He has two books. | তার দুটো বই গুলো আছে ।
18
Examples of English-Bangla Divergence
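English | Bangla
I eat rice. | আমি ভাত খাই ৷
I drink water. | আমি জল খাই ৷
He smoked a cigarette. | সে সিগারেটটা খেয়েছিল৷
It is raining. | বৃষ্টি হচ্ছে ।
There is a tiger in the forest. | বনে বাঘ আছে৷
(In the first three rows Bangla uses the same verb, খাওয়া "to eat", where English has eat, drink and smoke.)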
19
Looking Inside a Machine Translation System
20
Issues to Handle
Sentence: I went with my friend, John, to the bank to withdraw some money but was disappointed to find it closed.
• Parts of Speech
• Named Entity Recognition
• Word Sense Disambiguation
• Co-reference
• Subject Drop
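To make one of these issues concrete, here is a minimal sketch of word sense disambiguation for "bank" in the example sentence, using gloss overlap (a simplified form of the Lesk idea). The two glosses, the stop-word list and the function names are hypothetical and exist only for this illustration; a real system would draw senses from a proper lexicon.

```python
# Hypothetical mini sense inventory for "bank"; the glosses are illustrative only.
SENSES = {
    "bank/financial": "institution where people deposit and withdraw money",
    "bank/river": "sloping land beside a river or lake",
}

# Small hand-picked stop-word list (an assumption, not a standard resource).
STOPWORDS = {"i", "to", "the", "my", "with", "but", "was", "it", "a",
             "of", "or", "and", "where", "some"}


def content_words(text):
    """Lower-case the text, strip simple punctuation and drop stop words."""
    tokens = (tok.strip(".,").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda name: len(context & content_words(senses[name])))


sentence = ("I went with my friend, John, to the bank to withdraw some money "
            "but was disappointed to find it closed.")
print(disambiguate(sentence))
```

Here the words "withdraw" and "money" overlap with the financial gloss and nothing overlaps with the river gloss, so the financial sense is chosen.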
30
Machine Translation Trinity
MT Trinity
Level of Transfer: Interlingua based, Semantic Level, Syntactic Level, Direct Level
Approach: Rule based, Statistical, Example based, Hybrid
Language Pair: English-Bangla, Bangla-Hindi, English-Hindi
ACK:
31
Vauquois Triangle for MT
[Figure: Vauquois triangle showing the levels of transfer (Direct, Syntactic, Semantic, Interlingua based), built up step by step for the English-Bangla example below.]
Ram plays football | রাম ফুটবল খেলে / বাজায়
Ram play + s football | রাম ফুটবল খেল + ে / বাজা + য়
Ram/PN play/VBF + s football/CN | রাম/PN ফুটবল /CN খেল/VBF + ে / বাজা/VBF + য় / নাটক/CN
Syntactic level: parse trees (S, NP, VP) on both the English and the Bangla side.
Semantic level: the translation of "play" depends on the type of its object.
খেলে: Subject: Human; Object: thing (Ram, Football)
বাজায়: Subject: Human; Object: instrument (Guitar)
Type hierarchy: Physical object: animate (human); inanimate (thing, instrument)
Result: Ram plays football | রাম ফুটবল খেলে
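The semantic-level choice between খেলে and বাজায় can be sketched as a lookup keyed on the semantic classes of subject and object. The classes and the two verb forms come from the slide; the dictionary layout and the name `translate_play` are assumptions made for illustration.

```python
# Semantic class of each word, following the type hierarchy on the slide.
ONTOLOGY = {"Ram": "human", "football": "thing", "guitar": "instrument"}

# (subject class, object class) -> Bangla verb form for English "plays".
PLAY_RULES = {
    ("human", "thing"): "খেলে",
    ("human", "instrument"): "বাজায়",
}


def translate_play(subject, obj):
    """Select the Bangla verb for 'plays' from the classes of its arguments."""
    key = (ONTOLOGY[subject], ONTOLOGY[obj])
    return PLAY_RULES[key]


print(translate_play("Ram", "football"))
print(translate_play("Ram", "guitar"))
```

The point of moving up the triangle is visible here: at the direct or syntactic level both verbs remain candidates, while the object's semantic class removes the ambiguity.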
37
Vauquois Triangle for MT
Ram plays football
<aff {sub_np ( ram noun dont_care singular third [human] [rAma:m 8] [] [] ) } {obj1_np ( football noun neuter singular third [thing] [Putabala:m 8] [] [] ) } k1 {main_vp_active ( play_1 verb_2 normal normal dont_care singular third [Kela] 5 [] [] ) } > .
রাম ফুটবল খেলে
38
Major Machine Translation Systems for Indian Languages

AnglaBharati
Institutions: IIT Kanpur, CDAC Kol, Noida, Hyderabad, Tvm
Approach: Rule-based and Interlingua based
Languages: English to Hindi, Bangla, Urdu, Punjabi, Telugu, Malayalam, Assamese

Sampark
Institutions: IIIT Hyderabad, University of Hyderabad, IIT Bombay, IIT Kharagpur, CDAC Noida
Approach: Rule-based and dictionary-based algorithms with statistical machine learning. It uses Computational Paninian Grammar.
Languages: Hindi to Punjabi, Punjabi to Hindi, Telugu to Tamil and Urdu to Hindi

Anuvadaksh
Institutions: C-DAC Pune, IISc Bangalore, IIIT Hyderabad, C-DAC Mumbai, IIT Bombay, IIIT Allahabad
Approach: Four Machine Translation technologies: TAG (Tree-Adjoining-Grammar based MT), SMT (Statistical MT), AnalGen (Rule-based MT) and EBMT (Example-based MT)
Languages: English to Hindi, Marathi, Bengali, Urdu, Oriya, Tamil, Gujarati and Bodo

Mantra-RajBhasha
Institutions: CDAC Pune
Approach: Rule-based system using Augmented Transition Network (ATN) and Tree Adjoining Grammar (TAG) formalisms
Languages: Translates documents pertaining to the Personnel Administration, Finance, Small Scale Industries, Agriculture, Information Technology, HealthCare, Education and Banking domains from English to Hindi.
39
Building blocks of a Rule-based Machine Translation System
40
Block Diagram
[Block diagram: Input Sentence, Pre Processor, Exception Handler, Phrase Marker, Word Analyzer, Sentence Analyzer, Output Generator, Post Processor, Output Sentence(s); supporting components: Symbol table, Knowledge Base, PLIL]
Courtesy: Sudipta Debnath, CDAC Kolkata
41
Worked example: The price of access data base management is Rs. 100.
Input > The price of access data base management is Rs. 100.
Pre Processor > the price of access data base management is rrr01
Symbol table: rrr01/100 টাকা
Exception Handler >
Phrase Marker > the price of access_data_base_management is rrr01
Word Analyzer >
the: the
price: NOUN, neuter, singular, third, finance, দাম, …
of: of
access_data_base_management: NOUN, neuter, singular, third, activity, অ্যাকসেস ডেটাবেস ম্যানেজমেন্ট, …
is: is
rrr01: ADJECTIVE, *, NIL, rrr01, …
Sentence Analyzer > <aff {sub_np ( accessdatabasemanagement noun neuter singular third [activity] [Akasesa detAbesa myAnejamenta:m 8] [] [] ) ( of prep [ of ] ) ( the det [] [anda] [A] ) ( price noun neuter singular third [finance] [xAma:m 8] [] [] ) } ( ggg01 adjective any [NIL] [ggg01] [] [] ) {main_vp v1 } > . Sviram
Output Generator > অ্যাকসেস ডেটাবেস ম্যানেজমেন্টের দাম rrr01
Post Processor > অ্যাকসেস ডেটাবেস ম্যানেজমেন্টের দাম ১০০ টাকা
Output > অ্যাকসেস ডেটাবেস ম্যানেজমেন্টের দাম ১০০ টাকা ৷
52
Basic Steps Involved in Statistical Machine Translation
53
Parallel Aligned Sentences
ACK:
54
Word Alignment ACK:
55
Initial Probability
56
Expected Count
57
Revised Probability
58
Expected Count
59
Re-Revised Probability
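The cycle of initial probability, expected count and revised probability on the preceding slides matches the expectation-maximization loop used to learn word translation probabilities from sentence-aligned text. The sketch below assumes an IBM Model 1 style formulation without a NULL word; the slides do not name the model, and the three-sentence German-English corpus and the function name `train_model1` are hypothetical.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """Estimate t[(target_word, source_word)] by EM over sentence pairs."""
    src_vocab = {s for src, _ in corpus for s in src}
    tgt_vocab = {t for _, tgt in corpus for t in tgt}
    # Initial probability: uniform over the target vocabulary.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (e, f) co-translation
        total = defaultdict(float)   # expected count of f translating anything
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Revised probability: normalise the expected counts per source word.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t


# Hypothetical toy corpus of (source tokens, target tokens).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_model1(corpus)
print(round(t[("house", "haus")], 3), round(t[("the", "haus")], 3))
```

Each pass repeats the "expected count" and "revised probability" steps; because "das" co-occurs with "the" twice, probability mass drains away from the competing pairings and "haus" ends up preferring "house".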
60
Learning Phrase Tables from Word Alignments
English: Prof C.N.R. Rao was honored with the Bharat Ratna
Hindi: प्रोफेसर सी.एन.आर राव को भारतरत्न से सम्मानित किया गया
Central Idea: A consecutive sequence of aligned words constitutes a "phrase pair"
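The central idea can be sketched as the usual consistency check over a word alignment: a source span and the target span it links to form a phrase pair only if no word inside the target span is aligned to a word outside the source span. The sentence pair and the alignment links below are hypothetical, and the function does not extend phrases over unaligned words.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with `alignment`, a set of (src_i, tgt_j) links."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Consistency: nothing inside [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Hypothetical example; the alignment links are assumed, not taken from the slides.
src = ["Ram", "ate", "rice"]
tgt = ["राम", "ने", "चावल", "खाये"]
alignment = {(0, 0), (0, 1), (1, 3), (2, 2)}
pairs = extract_phrase_pairs(src, tgt, alignment)
for source_phrase, target_phrase in pairs:
    print(source_phrase, "|", target_phrase)
```

Note that "Ram ate" yields no pair: its target span would have to cover the word for "rice", which is aligned outside the source span.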
61
Example of Phrase Based Machine Translation
Ram ate rice with the spoon
Candidate translation options:
राम | खाये | धान | के साथ | यह | चमचा
राम ने | खा लिया | चावल | से | वह | चम्मच
राम को | खा लिया है | एक
राम से
चम्मच | चम्मच से | चम्मच के साथ
ACK:
62
Example of Phrase Based Machine Translation
खा लिया राम ने चम्मच से चावल चावल खाये चम्मच ACK:
63
Example of Phrase Based Machine Translation
Ram ate rice with the spoon
राम ने चम्मच से चावल खाये
ACK:
64
Factored Translation Model
Factors on both the input and the output side: word, lemma, parts of speech, morphology, word class
ACK:
65
Factored Translation Model
Factor | Input | Output
word | ছেলেগুলো | boys
lemma | ছেলে | boy
parts of speech | CN | CN
morphology | Plural | Plural
word class | Animate | Animate
ACK:
66
Conclusion
• Why do we need a Machine Translation System?
• Why is Translation Difficult?
• Looking Inside a Machine Translation System
• How to judge a Machine Translation System?
67
Thank You
68
Input: one sentence at a time
69
Pre Processor
Input: English sentence
Output: modified sentence; symbol table (used later in the post-processing phase)
It detects words and patterns listed in the "Knowledge Base" and replaces them with predefined symbols. It also stores those symbols, together with their translations, in the "symbol table" for later use.
70
Pre Processor (Sample Input Output)
Action | Input | Modified input | Symbol table
Initial/Final | hello, how are you ? | how are you ? | </হ্যালো,/…
Initial/Final | May I know your name, please ? | May I know your name ? | >/, অনুগ্রহ করে /…
71
Pre Processor (Sample Input Output)
Action | Input | Modified input | Symbol table
Expand | It’s 52 cm. long. | It is 52 centimeter long. | x
Expand | You can talk with Prof. D. Gupta. | You can talk with professor D. Gupta. |
Expand | He is d/o ram. | He is daughter of ram. |
72
Pre Processor (Sample Input Output)
Action | Input | Modified input | Symbol table
Date | Meeting will be on 24 th April . | Meeting will be on ddd01 . | ddd01/২৪ শে এপ্রিল
Date | Meeting will be on 24/07/2015 . | Meeting will be on ddd01 . | ddd01/২৪/০৭/ ২০১৫
Date | Meeting will be on April 24, | | ddd01/এপ্রিল ২৪, ২০১৫
73
Pre Processor (Sample Input Output)
Type | Input | Modified input | Symbol table
Time | Event will be on 9 pm. | ttt01 . | ttt01/রাত্রি ৯ টা
Number | I can give you 21 stick. | I can give you nnn01 stick. | nnn01/২১
Rupee | I can give you Rs. 21. | I can give you rrr01 . | rrr01/২১ টাকা
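The rupee and number rows of the sample tables can be approximated with two regular expressions. This is a sketch under assumptions: the symbol prefixes (rrr, nnn) and the Bangla output follow the slides, but the patterns, the function name `preprocess` and the lower-casing step (suggested by the worked example) are guesses at behaviour the slides do not spell out.

```python
import re

# Map ASCII digits to Bangla digits for the symbol-table translations.
BANGLA_DIGITS = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")


def preprocess(sentence):
    """Replace rupee amounts and bare numbers with symbols; return (text, symbol_table)."""
    symbol_table = {}
    counters = {"rrr": 0, "nnn": 0}

    def new_symbol(prefix, translation):
        counters[prefix] += 1
        symbol = f"{prefix}{counters[prefix]:02d}"
        symbol_table[symbol] = translation
        return symbol

    def rupee(match):
        return new_symbol("rrr", match.group(1).translate(BANGLA_DIGITS) + " টাকা")

    def number(match):
        return new_symbol("nnn", match.group(0).translate(BANGLA_DIGITS))

    # Rupee amounts first, so their digits are not also caught as plain numbers.
    text = re.sub(r"Rs\.\s*(\d+)", rupee, sentence)
    text = re.sub(r"\b\d+\b", number, text)
    return text.lower(), symbol_table


text, table = preprocess("The price of access data base management is Rs. 100.")
print(text)
print(table)
```

Replacing such tokens early keeps the later modules from having to analyse numbers, dates or currency, and the stored translation is substituted back only at the post-processing stage.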
74
Pre Processor
Similarly, this module also finds the following patterns as defined in the "Knowledge Base":
• Acronym: IIT, CDAC, etc.
• Bracket: ( { [
• Quotes: ‘ “
• URL: etc.
• Slash: school/college, boy/girl, etc.
• Dash: delhi – pune, 100 – 250, etc.
75
Exception Handler
Input: modified sentence
Output: translated output(s), or Not_Found
This process searches the "Knowledge Base" for a match. The match may be on the whole sentence or on a pattern. If a match is found, it produces the translated output; otherwise it produces "Not_Found".
76
Exception Handler (Sample Input Output)
Action | Input | Matched Rule | Output
Full | your faithfully | your faithfully#আপনার বিশ্বাসপাত্র | আপনার বিশ্বস্ত
Full | good morning | #সুপ্রভাত | সুপ্রভাত
77
Exception Handler (Sample Input Output)
Action: Template
Input: Happy birthday to you.
Matched Rule: happy <np1([]),*,*,*> {to} <np2([human]),*,*,*> # <np2([human]),*,*,*> কে জানাই শুভ <np1([]),*,*,*>
Output: তোমাকে জানাই শুভ জন্মদিন
78
Exception Handler (Sample Input Output)
Action: Template
Input: Depawali greetings and best wishes.
Matched Rule: <np1([]),*,*,*> greetings and <np2([]),*,*,*> # <np1([]),*,*,*> এর শুভকামনা এবং <np2([]),*,*,*>
Output: দীপাবলীর শুভকামনা এবং সবচেয়ে ভালো শুভেচ্ছা
79
Exception Handler (Sample Input Output)
Action | Input | Matched Rule | Output
Full + Template | I have a pen. | x | Not_Found
Full + Template | Book the ticket for me. | |
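The full-sentence case of the Exception Handler amounts to a table lookup that falls back to "Not_Found". The sketch below covers only that case, not template rules; the "source#target" rule syntax mirrors the slides, while the single rule entry, the normalisation steps and the function names are assumptions for illustration.

```python
# Hypothetical knowledge-base entry in the "source#target" format seen on the slides.
RULES = ["good morning#সুপ্রভাত"]


def load_rules(lines):
    """Parse 'source#target' lines into a lookup table keyed by the source text."""
    table = {}
    for line in lines:
        source, target = line.split("#", 1)
        table[source.strip().lower()] = target.strip()
    return table


def handle_exception(sentence, table):
    """Return the stored translation for a whole-sentence match, else 'Not_Found'."""
    key = sentence.strip().rstrip(".!?").strip().lower()
    return table.get(key, "Not_Found")


KNOWLEDGE_BASE = load_rules(RULES)
print(handle_exception("Good morning", KNOWLEDGE_BASE))
print(handle_exception("I have a pen.", KNOWLEDGE_BASE))
```

Sentences that return "Not_Found" continue to the Phrase Marker and the rest of the pipeline, so the handler acts as a shortcut for fixed expressions that rule-based analysis would translate badly.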
80
Phrase Marker
Input: modified sentence
Output: phrase-clubbed sentence
It detects and joins multiple words that must be treated as one phrasal unit to get a proper translation. Detection is done as per the "Knowledge Base". Word-by-word translation would otherwise be incorrect.
81
Phrase Marker (Sample Input Output)
Type | Input | Output
NP (Noun Phrase) | Abul salam international award for journalism will be announced this week. | Abul_salam_international_award_for_journalism will be announced this week.
NP (Noun Phrase) | We can use access data base management for this purpose. | We can use access_data_base_management for this purpose.
82
Phrase Marker (Sample Input Output)
Type | Input | Output
VP (Verb Phrase) | This budget tries to bring down the inflation. | This budget tries to bring_down the inflation.
VP (Verb Phrase) | He wants to cut down on extra steps. | He wants to cut_down on extra steps.
VP (Verb Phrase) | After being addicted he falls apart. | After being addicted he falls_apart.
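The joining step shown in these samples can be sketched as a longest-first replacement of known multiword units. The phrases are taken from the sample tables; keeping them in a flat list and the function name `mark_phrases` are assumptions, since the slides do not describe how the "Knowledge Base" stores them.

```python
import re

# Multiword units from the sample tables; the flat-list storage is an assumption.
PHRASES = ["access data base management", "bring down", "cut down", "falls apart"]


def mark_phrases(sentence, phrases=PHRASES):
    """Join each known multiword unit with underscores so it becomes one token."""
    # Longest phrases first, so a longer unit wins over a shorter one inside it.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        sentence = re.sub(pattern, phrase.replace(" ", "_"), sentence)
    return sentence


print(mark_phrases("We can use access data base management for this purpose."))
print(mark_phrases("This budget tries to bring down the inflation."))
```

After this step the Word Analyzer can look up `bring_down` as a single entry instead of translating "bring" and "down" separately.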
83
Word Analyzer
Input: modified sentence
Output: list of token (word) details
This process searches the "Knowledge Base" for each token (a word or a combined word block) and produces the details of each. Some important details are:
• Parts of speech
• Tense (present/past/future)
• Number (singular/plural)
• Meaning in the target language
84
Word Analyzer (Sample Input Output)
Actual input: Ram is a good boy.
Modified: uuu01 is a good boy.
Output:
unu01: NOUN, *, singular, third, human, uuu01, …
is: is
a: a
good: ADJ, positive, ভালো, …
boy: NOUN, masculine, singular, third, human, ছেলে, …
85
Word Analyzer (Sample Input Output)
Actual input: I have a pen.
Modified: i have a pen.
Output:
i: PRONOUN, *, singular, first, human, আমি, …
have: have
a: a
pen: NOUN, neuter, singular, third, thing, কলম, …
86
Sentence Analyzer
Input: list of token (word) details
Output: PLIL (Pseudo Lingual for Indian Languages)
It parses the sentence using the list of word details and produces output in a predefined textual structure called PLIL. PLIL is conceptually like a tree, arranged according to Indian language structure (SOV: Subject Object Verb).
87
Sentence Analyzer (Sample Input Output)
Sentence input: Ram is a good boy.
Input (at this module):
unu01: NOUN, *, singular, third, human, uuu01, …
is: is
a: a
good: ADJ, positive, ভালো, …
boy: NOUN, masculine, singular, third, human, ছেলে, …
Output PLIL: <aff {sub_np ( unu01 noun dont_care singular third [human] [unu01:m 8] [] [] ) } {pp } {obj1_np ( a det [ekati/{}] [tamil_a] [telugu_a] ) ( good adjective positive [NIL] [BAla] [] [] ) ( boy noun masculine singular third [human] [Cele:m 8] [] [] ) } {main_vp v1 } > . sviram
89
Sentence Analyzer (Sample Input Output)
Sentence input: I have a pen.
Input (at this module):
i: PRONOUN, *, singular, first, human, আমি, …
have: have
a: a
pen: NOUN, neuter, singular, third, thing, কলম, …
Output PLIL: <aff {sub_np ( i noun dont_care singular first [human] [Ami:m 8] [] [] ) } ke_pas {pp } {obj1_np ( a det [ekati/{}] [tamil_a] [telugu_a] ) ( pen noun neuter singular third [thing] [kalama:m 8] [] [] ) } {main_vp aux_have } > . sviram
91
Generate Output(s)
Input: PLIL
Output: translated output(s)
This process generates the target-language output(s) from the PLIL value. Some specific tasks are:
• Generate the actual translated value for NOUN, PRONOUN, VERB, PREPOSITION / POSTPOSITION, etc.
• Move words for better accuracy and language-specific style.
92
Generate Output(s) (Sample Input Output)
English input: Ram is a good boy.
PLIL: <aff {sub_np ( unu01 noun dont_care singular third [human] [unu01:m 8] [] [] ) } {pp } {obj1_np ( a det [ekati/{}] [tamil_a] [telugu_a] ) ( good adjective positive [NIL] [BAla] [] [] ) ( boy noun masculine singular third [human] [Cele:m 8] [] [] ) } {main_vp v1 } > . sviram
Output: unu01 ^একটি | {} ~ ভাল ছেলে {}
94
Generate Output(s) (Sample Input Output)
English input: I have a pen.
PLIL: <aff {sub_np ( i noun dont_care singular first [human] [Ami:m 8] [] [] ) } ke_pas {pp } {obj1_np ( a det [ekati/{}] [tamil_a] [telugu_a] ) ( pen noun neuter singular third [thing] [kalama:m 8] [] [] ) } {main_vp aux_have } > . sviram
Output: আমার ^একটি | {} ~ কলম আছে
96
Post Processor
Input: translated output, symbol table
Output: final output(s)
It applies final changes (if required) to each translated output. Some important tasks are:
• Replace the symbols placed by the Pre Processor (if any) with their entries from the symbol table.
• Move words for better accuracy and language-specific style.
97
Post Processor (Sample Input Output)
Action: Initial/
Input: আপনি কেমন আছেন
Symbol table: </হ্যালো,/…
Final output: হ্যালো, আপনি কেমন আছেন

Action: Symbol
Input: unu01 ^একটি | {} ~ ভাল ছেলে {}
Symbol table: unu01/রাম /…
Final output: রাম^একটি | {} ~ ভাল ছেলে {}
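Symbol restoration can be sketched as a pass over the symbol table. Reading the "<" and ">" entries as "prepend" and "append" is an assumption based on the Initial/Final samples; the dictionary form of the symbol table and the function name `postprocess` are likewise illustrative.

```python
def postprocess(translated, symbol_table):
    """Substitute stored translations back into the generated sentence."""
    for symbol, value in symbol_table.items():
        if symbol == "<":
            # Assumed meaning: text removed from the start of the input.
            translated = f"{value} {translated}"
        elif symbol == ">":
            # Assumed meaning: text removed from the end of the input.
            translated = f"{translated}{value}"
        else:
            translated = translated.replace(symbol, value)
    return translated


print(postprocess("অ্যাকসেস ডেটাবেস ম্যানেজমেন্টের দাম rrr01", {"rrr01": "১০০ টাকা"}))
print(postprocess("আপনি কেমন আছেন", {"<": "হ্যালো,"}))
```

Because the symbol table already holds target-language text, this stage needs no linguistic knowledge of its own.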
98
Sum up