Multilingual Information Access in a Digital Library. Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy. International Institute of Information Technology Hyderabad, India.


Multilingual Information Access in a Digital Library Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy International Institute of Information Technology Hyderabad, India

IIIT Hyderabad

Context
- Digital Library of India
  - 155,000 English books
  - 145,000 books in other languages
- Population of literates
  - 20% of India understand English
  - 80% cannot

Multilingual Access to Information
- Retrieve a book
  - By metadata
  - By keyword / content
  - Cross Lingual Information Retrieval
- Read a book
  - Help understand sentences in a language
  - Help understand sentences across languages
  - Machine Translation

Approaches to Multilingual Access
- Cross Lingual Retrieval
  - Translate Query to Document Language
  - Translate Document to Query Language
- Machine Translation
  - Knowledge Based Approaches
  - Corpus Based Approaches
  - Hybrid Approaches
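The "Translate Query to Document Language" approach can be illustrated with a minimal sketch. This is not the system described in the talk: the dictionary entries, the sample documents, and the function names (`build_index`, `translate_query`, `search`) are hypothetical, and a plain in-memory inverted index stands in for a real search engine.

```python
from collections import defaultdict

# Hypothetical toy bilingual dictionary (romanized Hindi -> English).
# Entries are illustrative only and are not taken from the talk.
BILINGUAL_DICT = {
    "kitab": ["book"],
    "itihas": ["history"],
    "bharat": ["india"],
}

# Hypothetical English document collection keyed by document id.
DOCUMENTS = {
    1: "a short history of india",
    2: "the book of birds",
    3: "history of science",
}


def build_index(docs):
    """Build a simple inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def translate_query(query, dictionary):
    """Replace each query word with its dictionary translations.

    Words missing from the dictionary are passed through unchanged.
    """
    translated = []
    for word in query.lower().split():
        translated.extend(dictionary.get(word, [word]))
    return translated


def search(query, dictionary, index):
    """Rank documents by the number of translated query terms they contain."""
    scores = defaultdict(int)
    for term in translate_query(query, dictionary):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCUMENTS)
results = search("bharat itihas", BILINGUAL_DICT, index)
```

Translating the query rather than the documents keeps the index monolingual, so only the short query passes through the dictionary at search time.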

Challenges in Multilingual Access
- Corpus Based Approaches
  - Unavailability of Parallel Corpus for pairs of languages
  - Unavailability of Computational Linguistics Resources
- Dictionary Based Approaches
  - Unavailability of multiple bilingual dictionaries

Resources
- Universal Dictionary
  - Conceived and implemented by Michael Shamos at CMU, USA
- ITRANS
  - A transcription scheme and associated tool built by IISc, IIIT and CMU
- Corpus
  - Data Entry by TTD and DLI project
  - TIDES project

Universal Dictionary

How are we doing it
- Cross Lingual Search (Identify Information)
  - Dictionary lookup
  - User feedback based
  - Lucene Search Engine
- Machine Translation (Understand Information)
  - Corpus based technique (EBMT)
  - Dictionary based word-word lookup
  - Good-enough translation vs Perfect translation
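A rough sketch of "dictionary based word-word lookup" shows what a good-enough translation can look like. The glossary entries and the `gloss_sentence` helper are hypothetical and only illustrate the idea; the actual system's dictionary and handling of unknown words are not described in the slides.

```python
# Hypothetical toy glossary (romanized Hindi -> English), illustrative only.
GLOSSARY = {
    "yah": "this",
    "ek": "a",
    "kitab": "book",
    "hai": "is",
}


def gloss_sentence(sentence, glossary):
    """Translate word by word, keeping the source word order.

    Words without a dictionary entry are kept in square brackets so the
    reader can see what was left untranslated.
    """
    glossed = []
    for word in sentence.split():
        glossed.append(glossary.get(word.lower(), f"[{word}]"))
    return " ".join(glossed)


print(gloss_sentence("yah ek purani kitab hai", GLOSSARY))
# prints: this a [purani] book is
```

The output keeps the source word order and leaves an unknown word untranslated, yet a reader can still work out the gist, which is the trade-off behind "good-enough translation vs perfect translation".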

Cross Lingual Retrieval

Cross Lingual Retrieval

Reading Assistant System

Reading Assistant

Status Today
- CLIR for 6 languages
- MT for 3 languages
  - Shakti (a knowledge based MT system)
  - Parallel Corpus for Hindi-Eng
- UDICT
  - About 40 Foreign Languages
  - 6 Indian Languages

What more is needed?
- UDICT
  - Improving coverage of existing languages
  - Adding new languages
- Machine Translation
  - Corpus acquisition
  - State-of-the-art techniques applied to Indian Languages
  - Multi-way parallel corpus development
- Textual format for the books
  - Books are currently in image formats
  - OCR should be developed for textual content

Thank You
Questions?