SANDHAN: Indian language search engine. SANDHAN consortium project: IIT Bombay (co-ordinator), CDAC Noida (co-coordinator), CDAC Pune, IIT Kharagpur.

Presentation transcript:

SANDHAN: Indian language search engine

SANDHAN: Consortium Project
  IIT Bombay (co-ordinator)
  CDAC Noida (co-coordinator)
  CDAC Pune
  IIT Kharagpur
  Jadavpur University
  ISI Kolkata
  IIIT Hyderabad
  AU-KBC
  AU-CEG
  Gauhati University
  DAIICT Gujarat
  IIIT Bhubaneswar
  TDIL

Introduction
  Cross Lingual Information Retrieval (CLIR) engine for Indian languages
  Input: query in one of nine Indian languages (Hindi, Marathi, Tamil, Telugu, Bengali, Punjabi, Assamese, Gujarati, Oriya)
  Output: in Hindi, English and the query language
  Currently in the second phase of the project
  Three new languages were added in the second phase: Assamese, Gujarati, Oriya
  Built on top of the Nutch framework
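
The cross-lingual step described above (a query in one language, results in another) can be illustrated with a minimal sketch of dictionary-based query translation. This is not Sandhan's actual implementation: the toy dictionary, the function names `translate_query` and `build_search_terms`, and the choice to pass unknown words through unchanged are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy Hindi-to-English dictionary; a real system would load a
# full bilingual dictionary instead of these three illustrative entries.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "ताज": ["taj"],
    "महल": ["palace", "mahal"],
    "इतिहास": ["history"],
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Translate a whitespace-tokenized query word by word.

    Every dictionary translation of a word is kept. Words missing from the
    dictionary are passed through unchanged (an assumption of this sketch;
    a real system might transliterate them instead).
    """
    target_terms: List[str] = []
    for token in query.split():
        translations = dictionary.get(token)
        if translations:
            target_terms.extend(translations)
        else:
            target_terms.append(token)
    return target_terms


def build_search_terms(query: str, dictionary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the terms to search in the query language and in English."""
    return {
        "query_language": query.split(),
        "english": translate_query(query, dictionary),
    }


if __name__ == "__main__":
    print(build_search_terms("ताज महल इतिहास", TOY_DICTIONARY))
```

Keeping all candidate translations of a word is the simplest policy. Choosing among them is a separate design decision that this sketch leaves out.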

Software Used
  Nutch v0.9: framework
  Hadoop: distributed crawling
  Lucene: indexing
  Moses/GIZA++: training models
  Tomcat: deployment

[Figure: diagram of system components. Component labels: Fetcher, Web, Analyzer, MWE Lookup, NE Lookup, Domain Identifier, Language Identifier, Font Transcoder, Indexer, CMLifier, UNL Index, Snippet Translation, Summary Generation, Snippet Generation, Translation/Transliteration, MWE Lookup, NE Lookup, Analyzer, Query Formulation, Index, Information Extraction]
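
One of the components named above is a language identifier. A rough first step for such a component can be sketched by mapping characters to their Unicode script blocks. This is only an assumption about how script detection could be done, not a description of the project's identifier. Script alone cannot separate languages that share a script (Hindi and Marathi both use Devanagari, Bengali and Assamese both use the Bengali script), so a real identifier needs more evidence. The names `SCRIPT_RANGES` and `identify_script` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Unicode block ranges for the scripts of the nine project languages.
SCRIPT_RANGES: Dict[str, Tuple[int, int]] = {
    "Devanagari": (0x0900, 0x097F),  # Hindi, Marathi
    "Bengali": (0x0980, 0x09FF),     # Bengali, Assamese
    "Gurmukhi": (0x0A00, 0x0A7F),    # Punjabi
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}


def identify_script(text: str) -> Optional[str]:
    """Return the most frequent Indic script in the text, or None if no
    character falls inside any of the listed blocks."""
    counts: Counter = Counter()
    for ch in text:
        code_point = ord(ch)
        for script, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ("हिन्दी", "தமிழ்", "ગુજરાતી", "hello"):
        print(sample, identify_script(sample))
```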

Resources Developed
  Language-specific analyzers
  Stop word list
  Bilingual dictionary (X-English, X-Hindi)
  NE list
  MWE list
  Transliteration models
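
To show how two of these resources might be used together, here is a minimal sketch of an analyzer step that groups multiword expressions (MWEs) with a greedy longest match and then drops stop words. The tiny word lists, the function name `analyze`, and the greedy matching strategy are assumptions for illustration and do not describe the project's analyzers.

```python
from typing import List, Set, Tuple

# Hypothetical toy resources; real lists would be far larger.
STOP_WORDS: Set[str] = {"का", "की", "के", "में", "है"}
MWE_LIST: Set[Tuple[str, ...]] = {("ताज", "महल"), ("नई", "दिल्ली")}


def analyze(text: str, stop_words: Set[str], mwes: Set[Tuple[str, ...]]) -> List[str]:
    """Tokenize on whitespace, merge known MWEs into single index terms
    (longest match first), and remove stop words from what remains."""
    tokens = text.split()
    max_len = max((len(m) for m in mwes), default=1)
    terms: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible MWE starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in mwes:
                terms.append(" ".join(candidate))
                i += n
                matched = True
                break
        if not matched:
            if tokens[i] not in stop_words:
                terms.append(tokens[i])
            i += 1
    return terms


if __name__ == "__main__":
    print(analyze("ताज महल का इतिहास", STOP_WORDS, MWE_LIST))
```

Matching MWEs before removing stop words keeps an expression intact even when one of its words would otherwise be filtered out.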

IIT Bombay Participation
  Marathi vertical
  Code integration and maintenance
  MWE identification
  Development of tracker
  Error analysis
  Relevance judgement

Action Plan
  April 14th, 2012: public release of the monolingual search engine for 5 languages (Bengali, Hindi, Marathi, Tamil, Telugu)
  August 15th, 2012: public release of monolingual search for the remaining 4 languages (Assamese, Gujarati, Oriya, Punjabi) and cross-lingual search for 5 languages (Bengali, Hindi, Marathi, Tamil, Telugu)

Horizontal Tasks Distribution
  Horizontal Task: Responsible Institute
  GUI: CDAC Pune
  Query Formulation: IIIT Hyderabad
  Language/Domain Identifier: IIIT Hyderabad
  Font-Transcoding: CDAC Pune
  Crawling: CDAC Pune
  Information Extraction: AU-KBC
  NE Identification: AU-KBC
  MWE Identification: IIT Bombay
  Ranking: IIT Kharagpur
  Indexing: IIT Kharagpur
  CMLifier: IIT Kharagpur
  Evaluation: ISI Kolkata

Distribution of Vertical Tasks
  Language: Responsible Institutes (Coordinating Institute)
  Hindi: IIT Bombay, IIIT Hyderabad, CDAC Noida (coordinating: CDAC Noida)
  Marathi: IIT Bombay, CDAC Pune (coordinating: IIT Bombay)
  Bengali: IIT Kharagpur, JU, ISI Kolkata (coordinating: IIT Kharagpur)
  Punjabi: CDAC Noida
  Tamil: AU-KBC, AU-CEG (coordinating: AU-KBC)
  Telugu: IIIT Hyderabad
  Assamese: Gauhati University
  Oriya: IIIT Bhubaneswar
  Gujarati: DAIICT

Key Achievements
  Organized the Forum for Information Retrieval Evaluation (FIRE) in 2008 and 2010, and a workshop for CLIR evaluation for Indian languages
  Demonstrated a basic integrated version of the system at IJCNLP 2008 and ELITEX
  Media coverage by ‘The Indian Express’ newspaper and ‘Hindustan Times’
  Development of a strong and connected research community around CLIR in Indian languages
  Publications in top IR and NLP forums