Multilingual Information Access in a Digital Library


Multilingual Information Access in a Digital Library
Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy
International Institute of Information Technology, Hyderabad, India

Context
- Digital Library of India
  - 155,000 English books
  - 145,000 books in other languages
- Population of literates
  - 20% of India understands English
  - 80% cannot
IIIT Hyderabad - http://dli.iiit.ac.in

Multilingual Access to Information
- Retrieve a book
  - By metadata
  - By keyword / content
  - Cross Lingual Information Retrieval
- Read a book
  - Help understand sentences in a language
  - Help understand sentences across languages
  - Machine Translation

Approaches to Multilingual Access
- Cross Lingual Retrieval
  - Translate Query to Document Language
  - Translate Document to Query Language
- Machine Translation
  - Knowledge Based Approaches
  - Corpus Based Approaches
  - Hybrid Approaches
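As an illustration of the "Translate Query to Document Language" option, here is a minimal sketch, not the system described in these slides. The dictionary entries, the documents, and the function names (`translate_query`, `build_index`, `search`) are all hypothetical, and a plain in-memory inverted index stands in for a real search engine.

```python
from collections import defaultdict

# Hypothetical English -> Hindi entries; a real system would load a full
# bilingual dictionary instead of this hand-written sample.
EN_HI = {
    "water": ["पानी", "जल"],
    "book": ["किताब", "पुस्तक"],
    "river": ["नदी"],
}

# Hypothetical document collection in the document language (Hindi).
DOCS = {
    "d1": "नदी का जल",
    "d2": "पुस्तक पढ़ना",
    "d3": "पानी पीना",
}


def translate_query(query, dictionary):
    """Map each query word to all of its translations; keep unknown words as-is."""
    return [dictionary.get(word, [word]) for word in query.lower().split()]


def build_index(docs):
    """Build a whitespace-tokenised inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def search(query, dictionary, index):
    """Rank documents by how many query words have a translation in them."""
    scores = defaultdict(int)
    for alternatives in translate_query(query, dictionary):
        matched = set()
        for term in alternatives:
            matched |= index.get(term, set())
        for doc_id in matched:
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
print(search("river water", EN_HI, INDEX))
```

Keeping every dictionary alternative for a query word, rather than picking one, trades precision for recall. Translating documents into the query language instead would move the dictionary lookup from query time to indexing time.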

Challenges in Multilingual Access
- Corpus Based Approaches
  - Unavailability of Parallel Corpus for pairs of languages
  - Unavailability of Computational Linguistics Resources
- Dictionary Based Approaches
  - Unavailability of multiple bilingual dictionaries

Resources
- Universal Dictionary
  - Conceived and implemented by Michael Shamos at CMU, USA
- ITRANS
  - A transcription scheme and associated tool built by IISc, IIIT and CMU
- Corpus
  - Data Entry by TTD and DLI project
  - TIDES project
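To give a feel for what an ASCII transcription scheme such as ITRANS does, the sketch below converts a small lowercase subset of ITRANS-style input into Devanagari. It is not the tool named in the slide. The mapping tables cover only a handful of letters, and the function name `itrans_to_devanagari` is an assumption made for this example.

```python
# Small, incomplete subset of ITRANS-style mappings (lowercase only).
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "s": "स", "h": "ह",
}

# vowel -> (independent letter, dependent sign used after a consonant)
VOWELS = {
    "a": ("अ", ""),          # inherent vowel: no sign needed
    "aa": ("आ", "\u093e"),
    "i": ("इ", "\u093f"),
    "ii": ("ई", "\u0940"),
    "u": ("उ", "\u0941"),
    "uu": ("ऊ", "\u0942"),
    "e": ("ए", "\u0947"),
    "o": ("ओ", "\u094b"),
}

VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant


def itrans_to_devanagari(text):
    """Convert the supported ITRANS subset to Devanagari, left to right."""
    out = []
    pending = False  # True while the last consonant still awaits its vowel
    i = 0
    while i < len(text):
        # Prefer two-letter tokens ("kh", "aa") over one-letter ones.
        for size in (2, 1):
            token = text[i:i + size]
            if token in CONSONANTS or token in VOWELS:
                break
        else:
            token = text[i]  # unsupported character, copied through unchanged

        if token in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster
            out.append(CONSONANTS[token])
            pending = True
        elif token in VOWELS:
            independent, sign = VOWELS[token]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)
            out.append(token)
            pending = False
        i += len(token)

    if pending:
        out.append(VIRAMA)  # word-final consonant without a vowel
    return "".join(out)


print(itrans_to_devanagari("namaste"))
```

Because the input is plain ASCII, text in this style can be typed, stored, and searched without script-specific keyboards or fonts, and rendered in the native script only for display.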

Universal Dictionary

How are we doing it
- Cross Lingual Search (Identify Information)
  - Dictionary lookup
  - User feedback based
  - Lucene Search Engine
- Machine Translation (Understand Information)
  - Corpus based technique (EBMT)
  - Dictionary based word-word lookup
- Good-enough translation vs Perfect translation
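The two machine translation items above can be illustrated with a deliberately simplified sketch. The "corpus based" step is reduced to an exact match against stored example pairs, which is far cruder than a real EBMT system, and the fallback is a word-to-word dictionary gloss. All entries and names (`HI_EN`, `EXAMPLES`, `gloss`, `translate`) are hypothetical.

```python
# Hypothetical Hindi -> English word entries.
HI_EN = {"नदी": "river", "का": "of", "जल": "water", "पानी": "water"}

# Hypothetical parallel examples (source sentence -> reference translation).
EXAMPLES = {"नदी का जल": "the water of the river"}


def gloss(sentence, dictionary):
    """Word-to-word lookup; unknown words are kept in brackets."""
    return " ".join(dictionary.get(w, f"[{w}]") for w in sentence.split())


def translate(sentence, examples, dictionary):
    """Use a stored example if the whole sentence matches, else gloss word by word."""
    key = " ".join(sentence.split())  # normalise whitespace before matching
    if key in examples:
        return examples[key]
    return gloss(key, dictionary)


print(translate("नदी का जल", EXAMPLES, HI_EN))   # matched example
print(translate("नदी का पानी", EXAMPLES, HI_EN))  # word-to-word fallback
```

The fallback output keeps the source word order, so it is not fluent English, but a reader can still get the gist. This is the "good-enough" end of the trade-off named in the slide.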

Cross Lingual Retrieval

Cross Lingual Retrieval

Reading Assistant System

Reading Assistant

Status Today
- CLIR for 6 languages
- MT for 3 languages
  - Shakti (a knowledge based MT system)
  - Parallel Corpus for Hindi-English
- UDICT
  - About 40 Foreign Languages
  - 6 Indian Languages

What more is needed?
- UDICT
  - Improving coverage of existing languages
  - Adding new languages
- Machine Translation
  - Corpus acquisition
  - State-of-the-art techniques applied to Indian Languages
  - Multi-way parallel corpus development
- Textual format for the books
  - Books are currently in image formats
  - OCR should be developed for textual content

Thank You
Questions?