Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,

Slides:



Advertisements
Similar presentations
Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.
Advertisements

Spelling Correction for Search Engine Queries Bruno Martins, Mario J. Silva In Proceedings of EsTAL-04, España for Natural Language Processing Presenter:
Chapter 5: Introduction to Information Retrieval
Introduction to Information Retrieval
Multimedia Database Systems
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Multilingual Text Retrieval Applications of Multilingual Text Retrieval W. Bruce Croft, John Broglio and Hideo Fujii Computer Science Department University.
Information Retrieval in Practice
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
Aparna Kulkarni Nachal Ramasamy Rashmi Havaldar N-grams to Process Hindi Queries.
State of Connecticut Core-CT Project Query 4 hrs Updated 1/21/2011.
Search is not only about the Web An Overview on Printed Documents Search and Patent Search Walid Magdy Centre for Next Generation Localisation School of.
1 LOMGen: A Learning Object Metadata Generator Applied to Computer Science Terminology A. Singh, H. Boley, V.C. Bhavsar National Research Council and University.
CS344: Introduction to Artificial Intelligence Vishal Vachhani M.Tech, CSE Lecture 34-35: CLIR and Ranking in IR.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
POPULATION AND HOUSING CENSUSES IN SLOVAKIA ON THE WEBSITE Miroslav Hudec Pavol Büchler INFOSTAT – Bratislava MSIS Geneva
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
1 Query Operations Relevance Feedback & Query Expansion.
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
MIRACLE Multilingual Information RetrievAl for the CLEF campaign DAEDALUS – Data, Decisions and Language, S.A. Universidad Carlos III de.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Chapter 6: Information Retrieval and Web Search
IIIT Hyderabad’s CLIR experiments for FIRE-2008 Sethuramalingam S & Vasudeva Varma IIIT Hyderabad, India 1.
Information retrieval 1 Boolean retrieval. Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text)
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
LIS618 lecture 3 Thomas Krichel Structure of talk Document Preprocessing Basic ingredients of query languages Retrieval performance evaluation.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
Alexey Kolosoff, Michael Bogatyrev 1 Tula State University Faculty of Cybernetics Laboratory of Information Systems.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
1 FollowMyLink Individual APT Presentation Third Talk February 2006.
Web- and Multimedia-based Information Systems Lecture 2.
2005/12/021 Fast Image Retrieval Using Low Frequency DCT Coefficients Dept. of Computer Engineering Tatung University Presenter: Yo-Ping Huang ( 黃有評 )
An Iterative Approach to Extract Dictionaries from Wikipedia for Under-resourced Languages G. Rohit Bharadwaj Niket Tandon Vasudeva Varma Search and Information.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
1 CS 8803 AIAD (Spring 2008) Project Group#22 Ajay Choudhari, Avik Sinharoy, Min Zhang, Mohit Jain Smart Seek.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
Knowledge and Information Retrieval Dr Nicholas Gibbins 32/4037.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Information Retrieval in Practice
Ricardo EIto Brun Strasbourg, 5 Nov 2015
Search Engine Architecture
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
An Automatic Construction of Arabic Similarity Thesaurus
Information Retrieval and Web Search
MID-SEM REVIEW.
Multimedia Information Retrieval
The Anatomy of a Large-Scale Hypertextual Web Search Engine
אחזור מידע, מנועי חיפוש וספריות
Introduction to Search Engines
Basic Information Retrieval
MATERIAL Resources for Cross-Lingual Information Retrieval
Magnet & /facet Zheng Liang
Lecture 8 Information Retrieval Introduction
Chapter 5: Information Retrieval and Web Search
CS246: Information Retrieval
A Suite to Compile and Analyze an LSP Corpus
Dennis Zhao,1 Dragomir Radev PhD1 LILY Lab
Zhixiang Chen & Xiannong Meng U.Texas-PanAm & Bucknell Univ.
Information Retrieval and Web Design
Introduction to Search Engines
Introduction Dataset search
Presentation transcript:

Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone, today I'm going to present a paper which mainly talks about Cross-language Information Retrieval. Zheng Wang

Content Introduction Literature Review Problem Statement Proposed approach Conclusion Content This paper aims to give us a brief overview of cross language information retrieval and related techniques in this field. We will mainly discuss about five parts in this paper. First we’ll have an introduction of what is Cross Language Information Retrieval. Then, we will have a brief review of some papers in this field. After that we'll look at what problems that current system has. And then propose some approach to address these problems. In the end, we give the conclusion of our proposed approach.

Cross Language Information Retrieval permits the user to retrieve their documents in other language than the query.  Overview In general, cross language information retrieval permits the user to retrieve their documents in other language than the query. For example, you can use a Chinese query to retrieve both English and Chinese documents.

“ Query Expansion for Cross Language Information Retrieval Improvement ”  CLIR module:  The CLIR module adds a translation service to the IR engine.  QE module: It aims to provide query terms for some corresponding expansion terms. Literature Review In this part, we are going to review four important papers which address different aspect in this field. We will just go through the main concepts or techniques used in these papers instead of looking into every detail of them. The first paper is Query Expansion for … This paper mainly talks about how to overcome the problem of difference between translated data and human language which is induced by using Query expansion(QE). Basically, Query Expansion is the process of a search engine adding search terms to a user‘s weighted search in order to improve precision and/or recall. There are two important concepts in this paper: The first one is CLIR module, which adds a translation service to the IR engine. This module involves translating the contents before indexing them. The original text is kept in the memory of the engine in order to present the original version of texts to users. only the translation is indexed. The second concept is QE module, It aims … The engine searches the index of translated metadata with the expanded query from QE module

Literature Review “Enhanced Query Expansion in English-Arabic CLIR” Query Expansion proved to be effective  Using the top retrieved documents the concept of a query is enhanced by adding to its context related terms.  co-occurrence analysis assumption: if two terms co-occur then they tend to be related. Apply Disambiguation to reach optimum effectiveness: not all expanded terms are necessarily related to the query  Literature Review The paper tells that Query Expansion proved to be effective.  Using the top….The concept of a query Related terms are extracted from these documents using co-occurrence analysis. Co-occurrence analysis assumes ….. the final expanded query is formed by adding those terms.  Furthermore Optimum effectiveness could be reached by applying disambiguation: It assumes that not all …. In this case only the terms that are directly related to the initial query are considered and analyzed 

“Hindi - English Based Cross Language Information Retrieval System for Allahabad Museum”  Conversion of Hindi words to English words :a dictionary database  Retrieval of documents and images : Vector based model  Clustering: a method to process the data and classify documents into different classes and group  Spell checkers : Edit distance algorithm Conversion of English Documents to Hindi Documents  Literature Review Okay, the third paper is…. The paper talks about the system developed for CLIR  The documents and images for query processing were related to Allahabad museum. The languages used were Hindi and English.  The documents were stored in English. The user has a choice to input a query keyword in either Hindi or English.  The system retrieves the relevant documents and displays them in the desired language.  To convert Hindi words to English words: a dictionary database was used, for those not in the dictionary database, Hindi character to English character mapping has been used.  To retrieve documents and images, vector based model was used. all the documents were in English format. Preprocessing of documents is done by stemming and stop word removal. After that each documents is represented by vector in vector space.  Finally after retrieval of documents from the above retrieval algorithm they showed the output in the language of user choice. If it is English then there is no need to convert but if it is Hindi then conversion is required by using API.  In this paper they also used a technique called clustering, it is a method to process the data and classify documents into different classes and group so that we can know what object lies in which category and can find the similarity easily. By applying clustering in vector space, the complexity of search time was reduced  In this paper they have discussed the importance of spell checkers, They have worked with web as corpus in order to design flexible and dynamic spell checker. Edit distance algorithm has been used.

“Cross Language Information Retrieval: In Indian Language Perspective”  Literature Review This paper is relatively simple than the others. So we just quickly go through it. I just give the name of this paper as your reference. this paper review the work done in the field of Indian languages by various researchers for CLIR system.

The existing CLIR system for English Hindi has some of the following limitations  Translation Disambiguation  Out-of-Vocabulary problem  Translating phrases  Named entities  Problem Statement This paper focus on developing a better CLIR systems for Hindi language. Here are some limitations that current system has. They are trying to develop a new system that would overcome the existing limitations.

Proposed Approach Processing the Query Building the Index A dictionary based on particular domain. Index of English words and their corresponding Hindi words. Proposed Approach Okay, let’s see what approach they have proposed to develop a better system. Basically the project is divided in two parts: First Processing the Query and second Building the Index  query processing will be done to simplify the query. The query could be both English and Hindi. we will keep the count of Hindi and English words so during retrieval of document more weight can be given to the documents in the language with higher score. The second module consists of index which will contain the English words and their corresponding Hindi words and relationships.  The documents will be processed and stored in index.  All the terms will be stored in English as well as Hindi.  The English term will also contain the term id of its respective Hindi term and the documents that contain that English term and vice versa. So when user types query in English and wants answer in Hindi the Hindi term id will be selected and the documents in Hindi corresponding to that term will be retrieved.  Here is a architecture of the system.

For the input query for a particular domain in Hindi or English the documents will be retrieved. The documents can be both in English and Hindi. The Precision and Recall will be used as accuracy measures to evaluate the system.  Conclusion Here is the Conclusion of the new system. In summary, the documents could be both English and Hindi, the precision and recall will be used to evaluate the system.

Review What is Cross-language information retrieval system Query Expansion, CLIR module, QE module. Conversion between two languages, clustering. Problem and approach Review Okay finally we take a very brief review about what we have learn. First we talk about what is CLIR system. Then, we reviewed four papers where we got to know some concept like: Query Expansion, CLIR module, QE module. Then we talked about how to enhance the Query Expansion In the third paper, we have seen how to convert between two languages and how to reduce the search complexity by applying clustering in vector space. Finally we give some approaches regarding current problems.

Thanks! Hope it’s helpful.