Multilingual Information Access in a Digital Library


1 Multilingual Information Access in a Digital Library
Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy
International Institute of Information Technology, Hyderabad, India

2 Context
Digital Library of India
  155,000 English books
  145,000 Other language books
Population of literates
  20% of India understand English
  80% cannot
IIIT Hyderabad - http://dli.iiit.ac.in

3 Multilingual Access to Information
Retrieve a book
  By metadata
  By keyword / content
  Cross Lingual Information Retrieval
Read a book
  Help understand sentences in a language
  Help understand sentences across languages
  Machine Translation

4 Approaches to Multilingual Access
Cross Lingual Retrieval
  Translate Query to Document Language
  Translate Document to Query Language
Machine Translation
  Knowledge Based Approaches
  Corpus Based Approaches
  Hybrid Approaches
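
As a rough illustration of the "Translate Query to Document Language" option, the sketch below expands each query term through a bilingual dictionary and looks the candidates up in an inverted index. Everything in it is an assumption made for illustration: the dictionary entries (romanized Hindi), the three toy documents, and the function names are hypothetical and are not taken from the DLI system.

```python
from collections import defaultdict

# Toy English -> Hindi (romanized) dictionary; illustrative entries only.
TOY_DICT = {
    "book": ["kitab", "pustak"],
    "water": ["pani", "jal"],
}

# Toy document collection in the document language (hypothetical data).
DOCS = {
    1: "pustak ka itihas",
    2: "jal aur pani",
    3: "itihas",
}


def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def translate_query(query, dictionary):
    """Map each query term to its candidate translations.

    Terms missing from the dictionary are passed through unchanged.
    """
    return [dictionary.get(term, [term]) for term in query.lower().split()]


def search(query, dictionary, index):
    """Rank documents by how many query terms they match after translation."""
    scores = defaultdict(int)
    for candidates in translate_query(query, dictionary):
        matched = set()
        for cand in candidates:
            matched |= index.get(cand, set())
        for doc_id in matched:
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
```

Keeping every candidate translation, rather than picking one, trades precision for recall; that is one simple way to cope with an ambiguous dictionary, not necessarily the choice made in the system described here.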

5 Challenges in Multilingual Access
Corpus Based Approaches
  Unavailability of Parallel Corpus for pairs of languages
  Unavailability of Computational Linguistics Resources
Dictionary Based Approaches
  Unavailability of multiple bilingual dictionaries

6 Resources
Universal Dictionary
  Conceived and implemented by Michael Shamos at CMU, USA
ITRANS
  A transcription scheme and associated tool built by IISc, IIIT and CMU
Corpus
  Data Entry by TTD and DLI project
  TIDES project

7 Universal Dictionary

8 How are we doing it
Cross Lingual Search (Identify Information)
  Dictionary lookup
  User feedback based
  Lucene Search Engine
Machine Translation (Understand Information)
  Corpus based technique (EBMT)
  Dictionary based word-word lookup
  Good-enough translation vs Perfect translation
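
The two machine translation routes named above can be sketched together: try a stored example first, and fall back to word-for-word dictionary lookup when no example matches. This is a deliberately simplified stand-in, since a real EBMT system matches and recombines sentence fragments rather than whole sentences. The example pair, the word list (romanized Hindi), and the function names are all hypothetical.

```python
# Toy example base (source sentence -> translation); hypothetical data.
EXAMPLES = {"yah kitab hai": "this is a book"}

# Toy Hindi (romanized) -> English word dictionary; hypothetical data.
WORDS = {"yah": "this", "kitab": "book", "hai": "is", "pani": "water"}


def normalize(sentence):
    """Lowercase and collapse whitespace so lookups are consistent."""
    return " ".join(sentence.lower().split())


def translate(sentence, examples=EXAMPLES, words=WORDS):
    """Return (translation, method).

    An exact match in the example base is preferred. Otherwise each word is
    looked up on its own, source word order is kept, and unknown words are
    left in brackets so the reader can see what was not translated.
    """
    key = normalize(sentence)
    if key in examples:
        return examples[key], "example"
    glossed = [words.get(tok, "[" + tok + "]") for tok in key.split()]
    return " ".join(glossed), "word-for-word"
```

The word-for-word output keeps the source word order and is not fluent English, which is the "good-enough" end of the trade-off on this slide: a reader can often still follow the sentence.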

9 Cross Lingual Retrieval

10 Cross Lingual Retrieval

11 Reading Assistant System

12 Reading Assistant

13 Status Today
CLIR for 6 languages
MT for 3 languages
  Shakti (a knowledge based MT system)
  Parallel Corpus for Hindi-Eng
UDICT
  About 40 Foreign Languages
  6 Indian Languages

14 What more is needed?
UDICT
  Improving coverage of existing languages
  Adding new languages
Machine Translation
  Corpus acquisition
  State of art techniques applied to Indian Languages
  Multi-way parallel corpus development
Textual format for the books
  Books currently are in Image formats
  OCR should be developed for textual content

15 Thank You
Questions?

