Published by Marlene Haynes. Modified over 6 years ago.
1
Multilingual Information Access in a Digital Library
Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy
International Institute of Information Technology, Hyderabad, India
2
IIIT Hyderabad - http://dli.iiit.ac.in
Context
Digital Library of India
  155,000 English books
  145,000 books in other languages
Population of literates
  20% of India understand English
  80% cannot
3
Multilingual Access to Information
Retrieve a book
  By metadata
  By keyword / content
  Cross Lingual Information Retrieval
Read a book
  Help understand sentences in a language
  Help understand sentences across languages
  Machine Translation
4
Approaches to Multilingual Access
Cross Lingual Retrieval
  Translate Query to Document Language
  Translate Document to Query Language
Machine Translation
  Knowledge Based Approaches
  Corpus Based Approaches
  Hybrid Approaches
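The "Translate Query to Document Language" option can be sketched as follows. This is a minimal illustration, not the DLI implementation: the romanized Hindi dictionary entries, the three English "documents", and the function names `build_index`, `translate_query` and `search` are all assumptions made for the example, and a plain in-memory inverted index stands in for a real search engine.

```python
from collections import defaultdict

# Hypothetical bilingual dictionary (romanized Hindi -> English).
# Entries are illustrative only, not taken from the Universal Dictionary.
HI_EN = {
    "pustak": ["book"],
    "itihas": ["history"],
    "bharat": ["india"],
}

# Toy English collection standing in for book titles or metadata.
DOCS = {
    1: "a history of india",
    2: "the book of numbers",
    3: "modern india and its history",
}


def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def translate_query(query, dictionary):
    """Replace each query term with all of its dictionary translations.

    Terms missing from the dictionary are kept unchanged, which lets
    proper names pass through untranslated.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


def search(query, dictionary, index):
    """Rank documents by how many translated query terms they contain."""
    scores = defaultdict(int)
    for term in translate_query(query, dictionary):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
```

Translating the query rather than the documents keeps the index monolingual and only requires a dictionary lookup at query time, at the cost of ambiguity when a term has several translations.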
5
Challenges in Multilingual Access
Corpus Based Approaches
  Unavailability of Parallel Corpus for pairs of languages
  Unavailability of Computational Linguistics Resources
Dictionary Based Approaches
  Unavailability of multiple bilingual dictionaries
6
Resources
Universal Dictionary
  Conceived and implemented by Michael Shamos at CMU, USA
ITRANS
  A transcription scheme and associated tool built by IISc, IIIT and CMU
Corpus
  Data Entry by TTD and DLI project
  TIDES project
7
Universal Dictionary
8
How are we doing it?
Cross Lingual Search (Identify Information)
  Dictionary lookup
  User feedback based
  Lucene Search Engine
Machine Translation (Understand Information)
  Corpus based technique (EBMT)
  Dictionary based word-word lookup
  Good-enough translation vs Perfect translation
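The two machine translation ingredients named above can be combined in a rough sketch: reuse the translation of a stored example sentence when the input is close enough to it, and otherwise fall back to word-for-word dictionary lookup. This is a heavily simplified stand-in for EBMT, which normally matches and recombines sentence fragments. The example pairs, the word list, the `overlap` measure and the 0.9 threshold are all assumptions chosen for illustration.

```python
# Hypothetical parallel examples (romanized Hindi, English); illustrative only.
EXAMPLES = [
    ("yah ek pustak hai", "this is a book"),
    ("bharat ka itihas", "history of india"),
]

# Hypothetical word-level dictionary used for the fallback.
WORDS = {
    "yah": "this", "ek": "a", "pustak": "book", "hai": "is",
    "bharat": "india", "ka": "of", "itihas": "history", "nayi": "new",
}


def overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def translate(sentence, examples=EXAMPLES, words=WORDS, threshold=0.9):
    """Return a 'good-enough' translation of the sentence.

    Step 1 (example-based): find the closest stored source sentence and
    reuse its translation if the overlap reaches the threshold.
    Step 2 (fallback): translate word by word, keeping source word order
    and leaving unknown words unchanged.
    """
    sentence = sentence.lower().strip()
    best_src, best_tgt = max(examples, key=lambda ex: overlap(sentence, ex[0]))
    if overlap(sentence, best_src) >= threshold:
        return best_tgt
    return " ".join(words.get(w, w) for w in sentence.split())
```

For a sentence that has no close example, such as "yah ek nayi pustak hai", the fallback yields "this a new book is": ungrammatical English, but enough for a reader to grasp the meaning, which is the "good-enough" trade-off the slide refers to.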
9
Cross Lingual Retrieval
10
Cross Lingual Retrieval
11
Reading Assistant System
12
Reading Assistant
13
Status Today
CLIR for 6 languages
MT for 3 languages
  Shakti (a knowledge based MT system)
  Parallel Corpus for Hindi-English
UDICT
  About 40 Foreign Languages
  6 Indian Languages
14
What more is needed?
UDICT
  Improving coverage of existing languages
  Adding new languages
Machine Translation
  Corpus acquisition
  State-of-the-art techniques applied to Indian Languages
  Multi-way parallel corpus development
Textual format for the books
  Books are currently in image formats
  OCR should be developed for textual content
15
Thank You
Questions?