FIRE 2013 By:- Hardik Joshi 1, Apurva Bhatt 1, Honey Patel 2 1 Department of Computer Science, Gujarat.

1 FIRE 2013 Presentation on: Transliterated Search using Syllabification Approach @FIRE, 4th Dec 2013
By: Hardik Joshi 1, Apurva Bhatt 1, Honey Patel 2 {hardikjjoshi,apurva.bhatt7,Honeypatel.39}@gmail.com
1 Department of Computer Science, Gujarat University, Ahmedabad, India.
2 L.J. College of Engineering, Ahmedabad, India.

2 Content
• Introduction
• Our Approach
• Syllabification
• Our Results
• Error and Analysis
• Conclusion

3 Introduction
• There is a need to provide local-language support in web-based applications, because many domains, such as e-commerce sites, currently require knowledge of English.
• The challenge in transliteration: the word “राष्ट्रपति” can be written in Roman script as “rashtrapati”, “rashtrapathi”, “raashtrapathy”, or “raashtrpati”, among other possible combinations, and deciding which one is correct is itself an issue.
• Transliteration tasks become difficult in the presence of out-of-vocabulary (OOV) words and noisy words.

4
• In both subtasks, transliteration was performed using a syllabification approach.
• In subtask 1, we performed morphological analysis of English words, then used a corpus-based approach to identify frequently occurring Hindi words.
• In subtask 2, queries were formulated in two forms for separate run submissions: one containing both Roman and Devanagari script, and one containing Roman script only.

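5 Syllabification Approach
• Linguists have observed that different languages place constraints on possible consonant and vowel sequences, and that these constraints characterize not only the word structure of the language but also its syllable structure.
• The vowel sits at the center of the syllable (nucleus).
• Consonants at the beginning form the onset.
• Consonants at the end form the coda.
[Diagram: a syllable divides into onset and rhyme; the rhyme divides into nucleus and coda]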

6  Word  Sprint Syllable Structure Example


8 Training Format
Source | Target
s u d a k a r | स ◌ु द ◌ा क र
c h h a g a n | छ ग ण
j i t e s h | ज ◌ि त ◌े श
n a r a y a n | न ◌ा र ◌ा य ण
s h i v | श ◌ि व
m a d h a v | म ◌ा ध व
m o h a m m a d | म ◌ो ह म ◌् म द
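
A character-separated format like the one above can be produced with a short Python sketch. It assumes that each token is a single Unicode code point, so Devanagari vowel signs become separate tokens; the helper names and the two sample pairs are illustrative, not the authors' training data or code.

```python
def to_token_line(word):
    """Return the word as space-separated Unicode code points."""
    return " ".join(word.strip())


def make_parallel_lines(pairs):
    """Build aligned source and target lines from (roman, devanagari) pairs."""
    source_lines = [to_token_line(src) for src, _ in pairs]
    target_lines = [to_token_line(tgt) for _, tgt in pairs]
    return source_lines, target_lines


# Hypothetical sample pairs in the style of the slide.
pairs = [("shiv", "शिव"), ("madhav", "माधव")]
src, tgt = make_parallel_lines(pairs)
for s, t in zip(src, tgt):
    print(s, "|", t)
```

Writing the source lines and target lines to two files with matching line order gives the kind of parallel corpus that statistical translation toolkits generally expect.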

9 Algorithm for Subtask 1
• Step 1: Look up each word in an English dictionary.
• Step 2: Perform spell-checking, stemming, and morphological analysis for English; if there is no spelling error and a match is found, label the word as English (\E).
• Step 3: If the word is not found as English, check it against an English corpus of US newspapers.
• Step 4: If the word is found there, check it against an English corpus of Indian newspapers.
• Step 5: If the word is found in the US newspaper corpus but not in the Indian newspaper corpus, label it as English (\E).

10
• Step 6: Steps 2 and 5 are applied in parallel to identify English words, which are labeled \E.
• Step 7: The remaining words are transliterated into Hindi and labeled \H.
• Step 8: Apply the Moses tool, which transliterates these words from Roman script into Hindi.
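
The labeling steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the dictionary and the two newspaper corpora are represented as small hypothetical word sets, and spell-checking, stemming, and morphological analysis are omitted. The function names are not from the original system.

```python
def label_word(word, english_dict, us_corpus, indian_corpus):
    """Return "E" for an English word, "H" for a word to transliterate."""
    w = word.lower()
    # Steps 1-2 (simplified): a dictionary match marks the word as English.
    if w in english_dict:
        return "E"
    # Steps 3-5: found in the US corpus but not in the Indian corpus.
    if w in us_corpus and w not in indian_corpus:
        return "E"
    # Step 7: everything else is treated as Hindi in Roman script.
    return "H"


def label_query(query, english_dict, us_corpus, indian_corpus):
    """Label every whitespace-separated token of a query."""
    return [
        (token, label_word(token, english_dict, us_corpus, indian_corpus))
        for token in query.split()
    ]


# Toy resources, purely hypothetical.
ENGLISH_DICT = {"song", "lyrics"}
US_CORPUS = {"song", "lyrics", "download", "rani"}
INDIAN_CORPUS = {"song", "rani", "kab"}

labels = label_query("rani song download kab", ENGLISH_DICT, US_CORPUS, INDIAN_CORPUS)
print(labels)
```

In this toy setup "rani" appears in both corpora and not in the dictionary, so it falls through to the Hindi label, which is the case the US versus Indian comparison is meant to catch.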

11 RESULT OF SUBTASK-1

12 Results for Subtask 2
• Run 1 (Devanagari and Roman script): “मेरे सापनोन कि रानी काब् आयेगी तु mere sapnon ki rani kab aayegi tu”.
• Run 2 (Roman script only): “mere sapnon ki rani kab aayegi tu”.
Metrics | Run-1 | Run-2 | Maximum Score | Median Score
nDCG@5 | 0.5627 | 0.5262 | 0.8052 | 0.5620
nDCG@10 | 0.5619 | 0.5232 | 0.8002 | 0.5608
MAP | 0.2546 | 0.2163 | 0.4236 | 0.2355
MRR | 0.5835 | 0.5730 | 0.8440 | 0.5884
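
For readers unfamiliar with the metrics in the table, the following Python sketch shows one common way to compute MRR and nDCG@k from per-query relevance lists. It assumes linear gain and computes the ideal ranking from the retrieved list only; the official FIRE evaluation may use a different variant, so the numbers above cannot be reproduced from this sketch alone.

```python
import math


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_relevances):
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r) for r in all_relevances) / len(all_relevances)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance judgments for two queries, in ranked order.
runs = [[0, 1, 0], [1, 0, 0]]
print(mean_reciprocal_rank(runs), ndcg_at_k(runs[0], 5))
```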

13 Error and Analysis
• Several problems in the transliteration decreased precision.
• Errors in the maatra (vowel sign): “sapnon” => “सापनोन”, “ki” => “की”, “kab” => “काब”, “main” => “मिन”, “mein” => “मीन”, “na” => “न”, “ka” => “क”.
• Multiple mappings for one letter, e.g. “t” can map to त or ट: “tera” => “टेरा”, “tum” => “तूम”, “to” => “टो”, “teri” => “टेरि”.
• Missing sounds (फ, ख, छ “chh”, “ksh”): for the word “accha” we got “आक्का”, and for “poochho” we got “पूछोट”.

14
• Multiple transliterations: “c”, “k”.
• Vowels are not transliterated correctly, e.g. “lo” => “लॉ”, “ho” => “होर”, “ko” => “कॉ”.
• Spelling variations (“shree”, “shri”).
• Conjunct formation (“kya” => “केया”).
• Missing vowels: “ak tr khan” (अक् तर खान).
• “y” as a vowel: “anthony” and “Shyam”.
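
The one-to-many mapping problem listed above can be made concrete with a small Python sketch that enumerates every candidate spelling for a Roman-script word. The mapping table is a toy, hypothetical fragment covering only three letters, and the function name is illustrative; a real system would need a full table and a way to score the candidates.

```python
from itertools import product

# Toy, hypothetical mapping: each Roman letter lists its Devanagari options.
TOY_CANDIDATES = {
    "t": ["\u0924", "\u091f"],  # dental ta or retroflex ta
    "u": ["\u0941", "\u0942"],  # short or long u vowel sign
    "m": ["\u092e"],            # ma
}


def candidate_transliterations(word, table):
    """Return every combination of per-letter options for the word.

    Raises KeyError for a letter that is missing from the table.
    """
    options = [table[ch] for ch in word.lower()]
    return ["".join(combo) for combo in product(*options)]


candidates = candidate_transliterations("tum", TOY_CANDIDATES)
print(len(candidates), candidates)
```

Even this three-letter word produces four candidates, including both “तुम” and the erroneous “तूम” reported on the slide, which shows why picking the most probable candidate matters.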

15 Conclusion
• We used the syllabification approach and considered the most probable term in the transliteration process. The word labeling task was performed on the assumption that each term belongs to either English or Hindi. We achieved high recall for English words because the labeling approach combined morphological analysis with a dictionary lookup. However, because of the syllabification model, the transliteration did not achieve high precision, which in turn lowered the metrics in the song lyrics retrieval task.

