Vishal Vachhani, CFILT and DIL, IIT Bombay. CS 671: ICT for Development, 19th Sep 2008.



Agro Explorer: A Meaning Based Multilingual Search Engine

• Web site for Indian farmers
• Farmers can submit problems related to their crops
• Queries are answered by agricultural experts at KVK, Baramati
• Languages supported: Marathi, Hindi, English

Why Multilingual Search Is Needed
• A vast amount of information is available on the Web
• Almost 70% of that information is in English
• The Indian rural populace is not English-literate
• "A Big Language Barrier"
• Information has to be made available to them in their local languages

Why Meaning Based Search Is Needed
• Most current search engines are keyword based
• They do not consider the semantics of the query
• The result set contains a large number of extraneous documents
• Searching on the meaning of the query helps narrow down to the desired information quickly

[Figure: Query in Hindi → System → search over English and Marathi documents → Result in Hindi]

Same Keywords, Different Semantics
• "Moneylenders Exploit Farmers": Found 1 Result
• "Farmers Exploit Moneylenders": Found 0 Result

Provides both
• Meaning Based Search
• Cross-Lingual Information Access

System Architecture


Conclusion
• Provides two independent features:
  ◦ Multilinguality
  ◦ Meaning based search
• Because of UNL, the multilingual and meaning based properties can be incorporated together, rather than by adding separate language translators to search engines.
• The scheme lends itself to the integration of multiple languages in a seamless, scalable manner.

UNL: Universal Networking Language

[Figure: UNL at the centre, connected to English, French, Tamil, Marathi and Hindi]

• Direct translation
  ◦ Translation is done directly between each pair of languages
  ◦ N*(N-1) translators are needed for N languages
• Intermediate language
  ◦ An intermediate language is used for translation
  ◦ Only 2*N translators are required
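The two counts can be compared with a small sketch. The function names are illustrative, not from the slides; the formulas are the ones stated above.

```python
def direct_translators(n: int) -> int:
    """Direct translation: one translator per ordered pair of languages."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """Intermediate language: one converter into it and one out of it per language."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(f"N={n}: direct={direct_translators(n)}, "
              f"intermediate={interlingua_translators(n)}")
```

For N = 3 the two counts are equal (6 each); the intermediate-language approach needs fewer translators for every N above 3.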

• UNL is an acronym for "Universal Networking Language".
• UNL is a computer language that enables computers to process information and knowledge across language barriers.
• UNL is a language for representing the information and knowledge expressed in natural languages.
• Unlike natural languages, UNL expressions are unambiguous.

• Although UNL is a language for computers, it has all the components of a natural language.
• It is composed of Universal Words (UWs), Relations and Attributes.
• Knowledge is represented as a semantic graph
  ◦ Nodes: concepts
  ◦ Arcs: relations between concepts

• A UW represents a simple or compound concept. There are two classes of UWs:
  ◦ unit concepts
  ◦ compound structures of binary relations grouped together (indicated with Compound UW-Ids)
• A UW is made up of a character string (an English-language word) followed by a list of constraints.
  ◦ <UW> ::= <headword> [<constraint list>]
  ◦ Examples: state(icl>express), state(icl>country)
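A minimal sketch of splitting a UW string into its headword and constraint list, using the two `state(...)` examples from the slide. The `UniversalWord` type and `parse_uw` function are hypothetical names, and the comma as constraint separator is an assumption; this is not the parser used in the actual system.

```python
import re
from typing import NamedTuple, Tuple


class UniversalWord(NamedTuple):
    headword: str
    constraints: Tuple[str, ...]


# Headword, optionally followed by a parenthesised constraint list.
_UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<cons>.*)\))?$")


def parse_uw(text: str) -> UniversalWord:
    """Split 'state(icl>express)' into ('state', ('icl>express',))."""
    match = _UW_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {text!r}")
    cons = match.group("cons")
    # Assumption: multiple constraints are separated by commas.
    constraints = tuple(c.strip() for c in cons.split(",")) if cons else ()
    return UniversalWord(match.group("head").strip(), constraints)


if __name__ == "__main__":
    print(parse_uw("state(icl>express)"))
    print(parse_uw("state(icl>country)"))
```

The same headword with different constraints yields two distinct concepts, which is how the constraint list disambiguates "state".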

• A relation label is a string of three characters or fewer.
• Relations between UWs are binary: rel(UW1, UW2)
• Relations have different labels according to the roles they play.
• At present, there are 46 relations in UNL.
• For example: agt (agent), ins (instrument), pur (purpose), etc.
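One way to hold such binary relations in memory is as labelled edges of a graph. The sketch below is illustrative only: `Relation` and `build_graph` are hypothetical names, and the two sample relations (including the `obj` label) are assumptions chosen for the example rather than taken from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Relation(NamedTuple):
    label: str   # e.g. "agt", "ins", "pur"
    source: str  # UW1
    target: str  # UW2


def build_graph(relations: Iterable[Relation]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each UW1 to its outgoing (label, UW2) arcs."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for rel in relations:
        if not 1 <= len(rel.label) <= 3:
            raise ValueError(f"relation label must be 1-3 characters: {rel.label!r}")
        graph[rel.source].append((rel.label, rel.target))
    return dict(graph)


if __name__ == "__main__":
    # Illustrative relations, not copied from the slides.
    sample = [Relation("agt", "eat", "Ram"), Relation("obj", "eat", "rice")]
    print(build_graph(sample))
```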

• Attribute labels express additional information about the Universal Words that appear in a sentence.
  ◦ They show what is said from the speaker's point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.).

Example: Ram eats rice.
{unl} Ram) rice(icl>eatable)) {/unl}

[Figure: UNL graph for "Ram eats rice" with nodes Ram, eat, rice and arc labels plc, agt]

Example: The boy who works here went to school.
{unl} :01) n)) agt:01(work(icl>do), plc:01(work(icl>do),here) {/unl}

[Figure: UNL graph with nodes go, boy, school, work, here; arc labels agt, plt, agt, plc; scope :01]

[Figure: Source language → EnConverter → Intermediate language → DeConverter → Target language]

• The DeConverter is a language-independent generator.
• It can deconvert UNL expressions into a variety of native languages, using linguistic data such as a word dictionary and the grammatical rules of each language.
• The DeConverter transforms the sentence represented by a UNL expression into a natural-language sentence.


[Figure: DeConverter architecture. UNL Doc → UNL Parser → Case Marking Module → Morphology Module → Syntax Planning Module → Hindi Doc. Resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules. The legend distinguishes language-dependent modules from language-independent modules.]

The UNL parser module performs the following tasks:
– Check the input format of the UNL document
– Separate attributes from UWs
– Separate attributes from dictionary entries
– Replace UWs with Hindi root words
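Two of these tasks, separating attributes from UWs and replacing UWs with Hindi root words, might look roughly like the sketch below. It assumes the usual UNL notation in which attributes are appended to a UW as `.@name` (the notation is not shown in the slides). The function names, the dictionary keys and the attribute names in the usage line are hypothetical; the Hindi root words are the ones that appear in the deck's own "spray ... solution" example.

```python
from typing import Dict, List, Tuple


def split_attributes(node: str) -> Tuple[str, List[str]]:
    """Split 'uw.@a.@b' into ('uw', ['@a', '@b'])."""
    uw, *attrs = node.split(".@")
    return uw, ["@" + a for a in attrs]


def to_root_word(node: str, dictionary: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace the UW with its root word, keeping the attributes alongside."""
    uw, attrs = split_attributes(node)
    # Fall back to the UW itself when the toy dictionary has no entry.
    return dictionary.get(uw, uw), attrs


# Toy dictionary; the UW keys are assumptions made for this example.
TOY_DICTIONARY = {"spray(icl>do)": "छिड़क्", "solution": "घोल", "also": "भी"}

if __name__ == "__main__":
    print(to_root_word("spray(icl>do).@entry.@imperative", TOY_DICTIONARY))
```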

• Category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head.
• ने, पर, के, से, पे, etc.
• A rule base based on:
  ◦ UNL attributes
  ◦ lexical attributes from the dictionary

• Case marking is implemented using rules.
• All UNL attributes and dictionary attributes are analyzed to decide the next and previous case markers.
• The relation with the parent is also used to select the right case marker.

• agt:null:null:null: ने
• Structure
  ◦ relName :
  ◦ parent previous case marker :
  ◦ parent next case marker :
  ◦ child previous case marker :
  ◦ child next case marker
  ◦ The remaining four fields are of the form attr'REL'relationname
  ◦ Attributes are separated by #
  ◦ Relation names are also separated by #
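A rough sketch of reading one such rule line into its five colon-separated fields, using the `agt` rule above as input. It handles only the simple form shown in that example; the `attr'REL'relationname` and `#`-separated variants are not interpreted. `CaseRule` and `parse_case_rule` are hypothetical names, and treating `null` as "no marker" is an assumption.

```python
from typing import NamedTuple, Optional


class CaseRule(NamedTuple):
    relation: str
    parent_prev: Optional[str]
    parent_next: Optional[str]
    child_prev: Optional[str]
    child_next: Optional[str]


def parse_case_rule(line: str) -> CaseRule:
    """Parse 'relName:parentPrev:parentNext:childPrev:childNext'."""
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    relation, *markers = fields
    # Assumption: the literal 'null' means that no case marker is inserted.
    return CaseRule(relation, *(None if m == "null" else m for m in markers))


if __name__ == "__main__":
    print(parse_case_rule("agt:null:null:null: ने"))
```

Read this way, the example rule says that for the `agt` relation the child node is followed by the case marker ने.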

• What is morphology?
  ◦ The study of morphemes
  ◦ Their formation into words, including inflection, derivation and composition

• Noun, verb and adjective morphology
  ◦ Depends on the phonetic properties of the Hindi word
• Noun morphology
  ◦ Depends on the gender, number and vowel ending of the noun
• Adjective morphology
  ◦ अच्छा लडका, अच्छी लडकी, अच्छे लडके
  ◦ The adjective अच्छ changes; lexical attribute "AdjA"
• Verb morphology
  ◦ Depends on tense, gender, number, person, etc.

• Verbs are categorized by
  ◦ Tense (past, present, future)
  ◦ Gender (male, female)
  ◦ Person (1st, 2nd, 3rd)
  ◦ Number (sg, pl)
• Example
  ◦ Ladaka khana kha raha hai.
  ◦ The verb is in the present continuous tense, male, sg, 3rd person.

• Arranges words according to the structure of the language
• Rule-based module
• Priority-based graph traversal

Algorithm for Syntax Planning:
1) Start traversing the UNL graph from the entry node.
2) If the node has no children, add it to the final string.
3) If the node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
4) Mark the node as visited.
5) Repeat steps 3 and 4 until all the children of the node have been visited.
6) Once all the children of the node have been visited, add the node to the final string.
7) Repeat steps 2 to 6 until all the nodes have been traversed.

• Also, spray 5% Neemark solution.
[Figure: UNL graph with nodes spray, also, solution, Neemark, percent, 5 and arc labels man, qua, mod, obj. Relation priorities: obj:17, man:9, mod:5, qua:5]

[Figure sequence: the traversal starts at the entry node spray, whose arcs are obj:17 and man:9. It follows obj:17 to solution, then mod:5 to percent, then qua:5 to 5. The output is built up step by step: "5", "5 percent", "5 percent Neemark", "5 percent Neemark solution", "5 percent Neemark solution also", "5 percent Neemark solution also spray".]

Output: 5 percent Neemark solution also spray
5 प्रतिशत नीमअर्क घोल भी छिड़क् |
5 प्रतिशत नीमअर्क घोल भी छिड़को |
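The traversal walked through above can be sketched as a post-order walk in which children are visited in descending relation priority. This is a minimal reading of the slide algorithm, not the system's actual code. The priorities are the ones shown in the example; the graph encoding, the function name, and the choice to attach both percent and Neemark to solution with `mod` (ties resolved by listing order) are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Relation priorities as shown in the example.
PRIORITY = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# node -> list of (relation, child); an assumed encoding of the example graph.
Graph = Dict[str, List[Tuple[str, str]]]

EXAMPLE_GRAPH: Graph = {
    "spray": [("man", "also"), ("obj", "solution")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}


def plan_syntax(graph: Graph, entry: str) -> List[str]:
    """Return the nodes in output order, starting from the entry node."""
    output: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> None:
        visited.add(node)
        # Highest-priority relation first; sorted() is stable, so children
        # with equal priority keep their listed order.
        children = sorted(graph.get(node, []),
                          key=lambda edge: PRIORITY.get(edge[0], 0),
                          reverse=True)
        for _, child in children:
            if child not in visited:
                visit(child)
        # A node is emitted only after all of its children.
        output.append(node)

    visit(entry)
    return output


if __name__ == "__main__":
    print(" ".join(plan_syntax(EXAMPLE_GRAPH, "spray")))
```

Emitting each node after its children places the entry verb last, which matches the verb-final word order of the Hindi output.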

Input sentence: Its roots are affected by bacterial infection.

Module            Output
UNL parser        जड़् प्रभावित जीवाण्विक संक्रमण्
Case marking      जड़् प्रभावित जीवाण्विक संक्रमण् से
Morphology        इसकी जड़ें जीवाण्विक प्रभावित होती हैं संक्रमण से |
Syntax planning   जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |

Output: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |
