FIRE 2013 By:- Hardik Joshi 1, Apurva Bhatt 1, Honey Patel 2 1 Department of Computer Science, Gujarat.

Transliterated Search using Syllabification
FIRE 2013, presented on 4th Dec 2013
By: Hardik Joshi¹, Apurva Bhatt¹, Honey Patel²
¹ Department of Computer Science, Gujarat University, Ahmedabad, India
² L.J. College of Engineering, Ahmedabad, India

Content
• Introduction
• Our Approach
• Syllabification
• Our Results
• Error and Analysis
• Conclusion

Introduction
• Web-based applications need to provide local-language support, because many domains, such as e-commerce sites, currently require knowledge of English.
• Transliteration is challenging because a single word has many possible romanizations. For example, the word “राष्ट्रपति” can be written as “rashtrapati”, “rashtrapathi”, “raashtrapathy” or “raashtrpati”, and deciding which one is correct is itself an issue.
• Transliteration becomes harder in the presence of out-of-vocabulary (OOV) words and noisy words.
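The spelling-variant problem can be made concrete with a small sketch. This is an illustration only, not the authors' method: it assumes a hypothetical "consonant skeleton" key (treat "th" as "t", read a word-final "y" as "i", drop the vowels a, e, i, o, u, collapse repeated letters) and shows that the four romanizations of “राष्ट्रपति” listed above then collapse to a single key.

```python
import re


def skeleton_key(roman: str) -> str:
    """Crude key that maps romanized spelling variants of a Hindi word together.

    Hypothetical rules, for illustration only:
      1. lowercase the word;
      2. treat aspirated "th" as plain "t";
      3. read a word-final "y" as the vowel "i";
      4. drop the vowels a, e, i, o, u;
      5. collapse runs of the same letter.
    """
    s = roman.lower()
    s = s.replace("th", "t")
    s = re.sub(r"y$", "i", s)
    s = re.sub(r"[aeiou]", "", s)
    s = re.sub(r"(.)\1+", r"\1", s)
    return s


variants = ["rashtrapati", "rashtrapathi", "raashtrapathy", "raashtrpati"]
for v in variants:
    print(v, "->", skeleton_key(v))
```

Such a key over-merges (for example "tum" and "tam" both become "tm"), so at best it could group candidate spellings before a finer-grained comparison.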

Our Approach
• In both subtasks, transliteration was performed using a syllabification approach.
• In Subtask-1, we performed morphological analysis of English words and then used a corpus-based approach to identify frequently occurring Hindi words.
• In Subtask-2, queries were formulated in two ways for separate run submissions: one containing both Roman and Devanagari script, and one containing Roman script only.
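As a rough illustration of the two Subtask-2 query formulations, the sketch below builds both variants from a Roman-script query. The function name `formulate_queries` and the toy word lexicon are hypothetical; a real system would plug in its transliteration output instead of the lookup table. Putting the Devanagari text first follows the Run 1 example on the results slide.

```python
from typing import Callable, Dict


def formulate_queries(roman_query: str,
                      transliterate: Callable[[str], str]) -> Dict[str, str]:
    """Build the two query variants (hypothetical helper).

    run1: Devanagari transliteration followed by the original Roman query
    run2: the Roman-script query alone
    """
    devanagari = " ".join(transliterate(w) for w in roman_query.split())
    return {"run1": f"{devanagari} {roman_query}", "run2": roman_query}


# Hypothetical word-level lookup standing in for the transliteration system;
# unknown words are passed through unchanged.
toy_lexicon = {"mere": "मेरे", "ki": "की", "rani": "रानी"}
queries = formulate_queries("mere sapnon ki rani", lambda w: toy_lexicon.get(w, w))
print(queries["run1"])
print(queries["run2"])
```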

Syllabification Approach
• Linguists observe that each language constrains the possible consonant and vowel sequences, and these constraints characterize not only the word structure of the language but also its syllable structure.
• A syllable has a center (the nucleus), a beginning (the onset) and an end (the coda).
• Structure: syllable = onset + rhyme, where rhyme = nucleus + coda.
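One way to split a romanized word into syllables is sketched below. This is a simple heuristic assumed for illustration, not necessarily the authors' syllabification model: maximal vowel runs are nuclei, a single consonant between two nuclei becomes the onset of the next syllable, and in a longer cluster the first consonant closes the previous syllable as its coda. The inventory of multi-letter consonant units (`UNITS`) is likewise hypothetical.

```python
import re
from typing import List

# Hypothetical multi-letter consonant units, longest first so "chh" wins over "ch".
UNITS = ["chh", "ksh", "sh", "ch", "th", "dh", "kh", "gh", "ph", "bh", "jh"]
TOKEN = re.compile("|".join(UNITS) + "|[a-z]")
VOWELS = set("aeiou")


def tokenize(word: str) -> List[str]:
    """Split a romanized word into letter units (single letters or the digraphs above)."""
    return TOKEN.findall(word.lower())


def syllabify(word: str) -> List[str]:
    units = tokenize(word)
    # Nuclei: maximal runs of vowel units, stored as (start, end) index pairs.
    nuclei, i = [], 0
    while i < len(units):
        if units[i] in VOWELS:
            j = i
            while j < len(units) and units[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if not nuclei:
        return ["".join(units)]
    # Place one cut between each pair of neighbouring nuclei.
    cuts = []
    for (_, end), (start, _) in zip(nuclei, nuclei[1:]):
        cluster = start - end  # number of consonant units between the nuclei
        # One consonant -> onset of the next syllable; a longer cluster -> its
        # first consonant stays behind as the coda of the previous syllable.
        cuts.append(end if cluster <= 1 else end + 1)
    bounds = [0] + cuts + [len(units)]
    return ["".join(units[a:b]) for a, b in zip(bounds, bounds[1:])]


for w in ["rashtrapati", "mohammad", "chhagan", "narayan"]:
    print(w, "->", ".".join(syllabify(w)))
```

With this rule "rashtrapati" splits as rash.tra.pa.ti. Assigning the whole cluster to the next onset would instead give ra.shtra.pa.ti, so the choice of rule directly changes the units a transliteration model learns from.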

Syllable Structure Example
[Figure: syllable structure of the word “sprint”]
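The onset/nucleus/coda decomposition from the previous slide can be sketched for a single written syllable. The sketch assumes that the letters a, e, i, o, u are the only vowels, which is a simplification (the slides themselves list ‘y’ acting as a vowel, as in ‘anthony’, as an error source).

```python
VOWELS = set("aeiou")


def parse_syllable(syllable: str) -> dict:
    """Split one written syllable into onset, nucleus and coda.

    Assumption: the nucleus is the first maximal run of the letters a, e, i, o, u;
    everything before it is the onset and everything after it the coda.
    """
    s = syllable.lower()
    i = 0
    while i < len(s) and s[i] not in VOWELS:
        i += 1
    j = i
    while j < len(s) and s[j] in VOWELS:
        j += 1
    onset, nucleus, coda = s[:i], s[i:j], s[j:]
    # The rhyme groups the nucleus with the coda.
    return {"onset": onset, "nucleus": nucleus, "coda": coda, "rhyme": nucleus + coda}


print(parse_syllable("sprint"))
```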

Training Format (Source → Target)
s u d a k a r → स ु द ा क र
c h h a g a n → छ ग ण
j i t e s h → ज ि त े श
n a r a y a n → न ा र ा य ण
s h i v → श ि व
m a d h a v → म ा ध व
m o h a m m a d → म ो ह म ् म द
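The training format pairs character-segmented Roman and Devanagari strings. A minimal sketch of producing such lines follows, assuming that each Unicode code point of the Devanagari word becomes one token (so matras and the virama are separate tokens, as on the slide) and that the two sides go into aligned files, one pair per line, which is the usual input layout for Moses training. The helper names and file paths are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def to_moses_pair(roman: str, devanagari: str) -> Tuple[str, str]:
    """Turn a (Roman, Devanagari) word pair into space-separated character tokens.

    Each Unicode code point becomes one token, so vowel signs (matras) and the
    virama appear as separate tokens.
    """
    return " ".join(roman.lower()), " ".join(devanagari)


def write_parallel_corpus(pairs: Iterable[Tuple[str, str]], src: Path, tgt: Path) -> None:
    """Write one tokenized pair per line into two aligned files (hypothetical paths)."""
    with src.open("w", encoding="utf-8") as fs, tgt.open("w", encoding="utf-8") as ft:
        for roman, dev in pairs:
            s, t = to_moses_pair(roman, dev)
            fs.write(s + "\n")
            ft.write(t + "\n")


print(to_moses_pair("sudakar", "सुदाकर"))
```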

Algorithm for Subtask-1
• Step 1: Look up each word in an English dictionary.
• Step 2: Perform spell-checking, stemming and morphological analysis for English; if the word has no spelling error and a match is found, label it as English (\E).
• Step 3: If the word is not found in the dictionary, check it against an English corpus of US newspapers.
• Step 4: If the word is found there, check it against an English corpus of Indian newspapers.
• Step 5: If the word is found in the US newspaper corpus but not in the Indian newspaper corpus, label it as English (\E).
• Step 6: Steps 2 and 5 are applied in parallel to identify English words, which are labeled \E.
• Step 7: The remaining words are transliterated into Hindi and labeled \H.
• Step 8: The Moses toolkit is used to transliterate these Roman-script words into Hindi (Devanagari).
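The labeling cascade can be sketched as follows. Everything here is an assumption for illustration: plain Python sets stand in for the English dictionary and the US and Indian newspaper corpora, a crude suffix-stripper (`crude_stem`) stands in for spell-checking, stemming and morphological analysis, and `transliterate` is a placeholder for the Moses-based transliterator. The output strings `word\E` and `word\H=<Devanagari>` are a hypothetical format built from the labels named in the steps. The checks run in sequence here, which gives the same \E decision as applying Steps 2 and 5 in parallel, since either one succeeding is enough.

```python
from typing import Callable, Set


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for stemming/morphological analysis."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def label_word(word: str, dictionary: Set[str], us_news: Set[str],
               indian_news: Set[str], transliterate: Callable[[str], str]) -> str:
    w = word.lower()
    # Steps 1-2: dictionary lookup on the word or its stem -> English.
    if w in dictionary or crude_stem(w) in dictionary:
        return f"{word}\\E"
    # Steps 3-5: seen in US news but not in Indian news -> English.
    if w in us_news and w not in indian_news:
        return f"{word}\\E"
    # Steps 7-8: everything else is treated as Hindi and transliterated.
    return f"{word}\\H={transliterate(w)}"


dictionary = {"play", "song"}
us_news = {"tonight", "raja"}
indian_news = {"raja"}
toy_lexicon = {"raja": "राजा", "rani": "रानी"}

for word in ["playing", "tonight", "raja", "rani"]:
    print(label_word(word, dictionary, us_news, indian_news,
                     lambda w: toy_lexicon.get(w, w)))
```

In this toy data "raja" occurs in both newspaper corpora, so its presence in US news is not taken as evidence of English and it falls through to the Hindi branch.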

RESULT OF SUBTASK-1

Results for Subtask 2
• Run 1 (Devanagari and Roman script): “मेरे सापनोन कि रानी काब् आयेगी तु mere sapnon ki rani kab aayegi tu”
• Run 2 (Roman script only): “mere sapnon ki rani kab aayegi tu”
[Table: Maximum Score, Median Score, MAP and MRR for Run-1 and Run-2]
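MAP and MRR, two of the metrics reported for the runs, follow standard definitions: the reciprocal rank of a query is 1/r for the rank r of its first relevant result, and average precision averages the precision at each rank where a relevant result appears, dividing by the total number of relevant documents. The sketch below computes both on made-up rankings; it is not the official FIRE evaluation script.

```python
from typing import Dict, List, Set, Tuple


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k of relevant documents.

    Relevant documents that are never retrieved still count in the denominator.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evaluate(runs: List[Tuple[List[str], Set[str]]]) -> Dict[str, float]:
    """runs holds one (ranking, relevant set) pair per query."""
    n = len(runs)
    return {"MAP": sum(average_precision(r, rel) for r, rel in runs) / n,
            "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n}


toy = [(["d1", "d3", "d2"], {"d1", "d2"}),
       (["d4", "d5"], {"d5", "d8"})]
print(evaluate(toy))
```

On the toy data the first query has AP = (1/1 + 2/3)/2 = 5/6 and the second AP = (1/2)/2 = 1/4, so MAP = 13/24 ≈ 0.542, while MRR = (1 + 1/2)/2 = 0.75.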

Error and Analysis
• Some problems in the transliteration decreased precision.
• Errors in the maatra (vowel sign): “sapnon” => “सापनोन”, “ki” => “की”, “kab” => “काब”, “main” => “मिन” and “mein” => “मीन”, “na” => “न” and “ka” => “क”.
• Multiple mappings of a Roman letter, e.g. T = त, ट: “tera” => “टेरा”, “tum” => “तूम”, “to” => “टो”, “teri” => “टेरि”.
• Missing sounds (फ, ख, छ ‘chh’, ksh): for “accha” we got “आक्का”, and for “poochho” we got “पूछोट”.
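The multiple-mapping problem can be illustrated by enumerating candidates. In the sketch below the letter options in `OPTIONS` and the corpus counts in `freq` are made up: "t" may map to त or ट and "u" to the short or long vowel sign, giving four candidates for "tum". Picking the candidate with the highest corpus count is one simple way of choosing a most probable term; without such evidence the choice among candidates is arbitrary, which is how outputs like “तूम” can arise.

```python
from itertools import product
from typing import Dict, List

# Made-up letter options: "t" is ambiguous (त or ट), and "u" may be the short
# vowel sign U+0941 or the long vowel sign U+0942.
OPTIONS: Dict[str, List[str]] = {
    "t": ["त", "ट"],
    "u": ["\u0941", "\u0942"],
    "m": ["म"],
}


def candidates(word: str) -> List[str]:
    """Enumerate every Devanagari string allowed by the letter options."""
    return ["".join(parts) for parts in product(*(OPTIONS[ch] for ch in word.lower()))]


def most_probable(word: str, freq: Dict[str, int]) -> str:
    """Choose the candidate with the highest (made-up) corpus count."""
    return max(candidates(word), key=lambda c: freq.get(c, 0))


freq = {"तुम": 950, "टुम": 1}
print(candidates("tum"))
print(most_probable("tum", freq))
```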

• Multiple transliterations for c and k.
• Vowels are not transliterated correctly, e.g. “lo” => “लॉ”, “ho” => “होर”, “ko” => “कॉ”.
• Spelling variations (shree, shri).
• Conjunct formation (“kya” => “केया”).
• Missing vowels: ‘ak tr khan’ (अक् तर खान).
• ‘y’ as a vowel: ‘anthony’ and ‘Shyam’.
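The conjunct error (“kya” => “केया”) arises when a vowel is inserted into a consonant cluster instead of the consonants being joined. The rule-based sketch below, which is not the authors' Moses-based system, shows the usual Unicode mechanism: a virama (U+094D) between two consonants forms a conjunct, so "kya" becomes क्या. The letter tables are a small hypothetical subset, word-initial vowels are not handled, and treating a word-final "a" as the long vowel sign "ा" is an assumption.

```python
from typing import List

# Small hypothetical subset of Roman -> Devanagari mappings.
CONSONANTS = {"k": "क", "kh": "ख", "g": "ग", "ch": "च", "j": "ज", "t": "त",
              "th": "थ", "d": "द", "n": "न", "p": "प", "b": "ब", "m": "म",
              "y": "य", "r": "र", "l": "ल", "v": "व", "sh": "श", "s": "स", "h": "ह"}
VOWEL_SIGNS = {"a": "", "aa": "\u093e", "i": "\u093f", "ee": "\u0940",
               "u": "\u0941", "oo": "\u0942", "e": "\u0947", "o": "\u094b"}
VIRAMA = "\u094d"


def split_units(word: str) -> List[str]:
    """Greedy longest-match split into consonant and vowel units."""
    keys = sorted([*CONSONANTS, *VOWEL_SIGNS], key=len, reverse=True)
    units, i = [], 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                units.append(key)
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in this sketch")
    return units


def to_devanagari(word: str) -> str:
    units = split_units(word.lower())
    if units and units[0] not in CONSONANTS:
        raise ValueError("word-initial vowels are not handled in this sketch")
    out = []
    for i, unit in enumerate(units):
        nxt = units[i + 1] if i + 1 < len(units) else None
        if unit in CONSONANTS:
            out.append(CONSONANTS[unit])
            if nxt in CONSONANTS:
                # Consonant directly followed by a consonant: virama forms a conjunct.
                out.append(VIRAMA)
        elif unit == "a" and nxt is None:
            out.append("\u093e")  # assumption: word-final "a" is the long vowel sign
        else:
            out.append(VOWEL_SIGNS[unit])  # medial "a" is the inherent vowel (no sign)
    return "".join(out)


for w in ["kya", "shree", "shri", "dost"]:
    print(w, "->", to_devanagari(w))
```

The same sketch also exposes the spelling-variation issue: "shree" gives श्री while "shri" gives श्रि, so variant spellings still need normalization or corpus evidence.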

Conclusion
We used the syllabification approach and selected the most probable term during transliteration. The word-labeling task assumed that each term belongs either to English or to Hindi. We obtained high recall for English words, because the labeling approach combined morphological analysis with a dictionary. However, the syllabification model did not achieve high transliteration precision, which in turn lowered the precision metrics in the song-lyrics retrieval task.