Linkage of Language Specific Synset Resource Center for Indian Language Technology Solutions Computer Science and Engineering.

Presentation transcript:

Linkage of Language Specific Synset Resource Center for Indian Language Technology Solutions Computer Science and Engineering Department, IIT Bombay

Outline
– What is a language-specific synset (LSS)?
– Why is LSS needed?
– Linkage of LSS
– Problems related to linkage
– Solutions to the problems

What is a language-specific synset?
A language-specific synset (LSS) is a synset based on a concept that is available only in a particular language and has no conceptual match in other languages.
e.g., सेल रोटी (sela rotii) in Nepali: a ring-shaped, deep-fried sweet roti made of rice flour.

Why is LSS needed?
LSS is needed to capture the following types of lexical items in a particular language, so as to retain the uniqueness of the language:
– Lexical uniqueness
– Lexical gap
– Cultural gap
– Pragmatic gap
– Lexical mismatch

Lexical uniqueness
Every language possesses a set of unique lexical items that refer to unique concepts and ideas for which no conceptual equivalents are available in other languages.
e.g., भुत्या (bhutyaa) in Marathi: a devotee of Bhavaani devii.

Lexical gap
This refers to a lack of lexical equivalence between two or more languages: the meanings of words in one language do not exactly fit the meanings of words in the other.
e.g., ‘challenge’ in English: there is no word, phrase, or multiword expression in Bangla that conveys its meaning.

Cultural gap
A cultural gap may originate from socio-cultural differences between language communities. A particular language community may observe socio-cultural rites, rituals, festivals, practices, etc. that are not known to members of another language community.
e.g., राजा (raajaa): a unique socio-cultural ritual practiced by Oriya language groups.

Pragmatic gap
This is caused by differences in lexicalization between languages. The basic concept is known to both languages but is not expressed in the same manner: one language expresses it in a single lexicalized form, while the other expresses it as a multiword expression (a phrase, idiom, etc.).
e.g., भानवस, भाणवस (bhaanavasa) in Marathi, against the multiword expression चूल्हे का पाट: a platform behind the village cooking stove.

Lexical mismatch
This is a linguistic phenomenon in which a lexical item refers to a particular concept in one language, while the same lexical item refers to a different concept in another language.
e.g., शिक्षा (shikshaa): ‘punishment’ in Marathi; ‘education, preachment, moral, etc.’ in Hindi.

Linkage of LSS
– A language group selects words that have language-specific concepts, creates synsets for these concepts in its own language and, in parallel, creates Hindi synsets for the same concepts.
– LSSs created in this manner are sent to IITB.
– The HWN (Hindi WordNet) group verifies the Hindi synsets and corrects grammatical errors, etc. Duplicate synsets are deleted.
– After verification and correction, the Hindi synsets are sent back to the language group, which checks whether the corrected synsets are right.
– If the language group gives the green signal, the synsets are loaded into the repository with their relations.
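
The review loop above can be read as a small state machine. The following is a minimal sketch of that reading, not the project's actual tooling: the `Status` names, the `ALLOWED` table, and the `LssSubmission` class are all hypothetical, and the sketch does not model what happens when the language group rejects a correction, since the slides do not say.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    SUBMITTED = "sent to IITB by the language group"
    VERIFIED = "Hindi synset verified and corrected by the HWN group"
    DUPLICATE = "deleted as a duplicate"
    APPROVED = "green signal given by the language group"
    LOADED = "loaded into the repository with its relations"


# Allowed transitions, in the order the steps are described on the slide.
ALLOWED = {
    Status.SUBMITTED: {Status.VERIFIED, Status.DUPLICATE},
    Status.VERIFIED: {Status.APPROVED},
    Status.APPROVED: {Status.LOADED},
}


@dataclass
class LssSubmission:
    """Hypothetical record tracking one LSS through the review loop."""
    language: str
    words: List[str]
    status: Status = Status.SUBMITTED
    history: List[Status] = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        """Move to new_status, refusing steps that skip the review order."""
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(
                f"cannot go from {self.status.name} to {new_status.name}"
            )
        self.history.append(self.status)
        self.status = new_status


submission = LssSubmission(language="Nepali", words=["सेल रोटी"])
for step in (Status.VERIFIED, Status.APPROVED, Status.LOADED):
    submission.advance(step)
print(submission.status.name)  # LOADED
```

Keeping the transitions in one table makes it hard to load a synset that the language group has not yet approved.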

Problems in linkage
– Duplication of synsets may occur, since a concept may exist in other languages as well and the lexicographer may not be familiar with it.
– Linkage of lexical relations, e.g., the antonymy relation.
– LSSs linked with the hypernymy-hyponymy relation.

Solutions
– Duplicate synsets will be nulled, as is already being done.
– An interface will be created to link lexical relations such as antonymy.
– Brijesh will give a suggestion for the third problem.
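
Before duplicates can be nulled they have to be found. One possible aid, sketched here under stated assumptions, is to compare the glosses that accompany each submitted synset and flag pairs with heavily overlapping wording for a lexicographer to review. The function names, the token-overlap (Jaccard) measure, and the 0.8 threshold are all illustrative choices, not the procedure used by the HWN group.

```python
import unicodedata
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

PUNCTUATION = ".,;:!?()[]\"'।"  # includes the Devanagari danda


def gloss_tokens(gloss: str) -> FrozenSet[str]:
    """Normalize a gloss and return its set of whitespace-separated tokens."""
    text = unicodedata.normalize("NFC", gloss).lower()
    stripped = (token.strip(PUNCTUATION) for token in text.split())
    return frozenset(token for token in stripped if token)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of tokens common to both glosses (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_candidates(
    glosses: Dict[str, str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return pairs of synset ids whose glosses overlap at or above threshold."""
    tokens = {sid: gloss_tokens(text) for sid, text in glosses.items()}
    pairs = []
    for left, right in combinations(sorted(tokens), 2):
        score = jaccard(tokens[left], tokens[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs


# Hypothetical ids; the gloss wording is taken from the examples above.
glosses = {
    "nep-001": "ring shaped deep fried sweet roti made of rice flour",
    "xx-017": "Ring shaped, deep fried sweet roti made of rice flour.",
    "mar-004": "a devotee of Bhavaani devii",
}
print(duplicate_candidates(glosses))
```

Tokens are split on whitespace rather than with a `\w` regular expression because Python's `\w` does not match Devanagari vowel signs, which would break Hindi words apart. The output is only a list of candidates; deciding whether two synsets really express the same concept remains a lexicographer's call.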

Linking of WordNets
– Language-specific synsets
  – Culture-specific: food items, places, traditions
  – Same concept in different languages?
– Lexical gap
  – Kashmiri does not have a lexeme for ‘Water’; however, there is a lexeme for ‘Drinking Water’.
  – Modification in hierarchy?

Creating language-specific synsets
– Use hypernymy to describe the gloss
– Try to distinguish between co-hyponyms
– Define the domain (Food Items, Place, etc.)
– Translate the gloss into Hindi and English
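
The checklist above maps naturally onto a record with one field per item. The sketch below is a hypothetical illustration: the `LssDraft` class, its field names, and the `missing_items` helper are assumptions, and the hypernym "roti" in the example is chosen only for demonstration. Distinguishing co-hyponyms needs human judgment and is not checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LssDraft:
    """Hypothetical draft of a language-specific synset."""
    language: str
    words: List[str]
    hypernym: str = ""       # used to anchor the gloss
    domain: str = ""         # e.g. "Food Items", "Place"
    gloss_hindi: str = ""    # gloss translated into Hindi
    gloss_english: str = ""  # gloss translated into English


def missing_items(draft: LssDraft) -> List[str]:
    """List the checklist fields that are still empty in the draft."""
    checks = {
        "hypernym": draft.hypernym,
        "domain": draft.domain,
        "gloss_hindi": draft.gloss_hindi,
        "gloss_english": draft.gloss_english,
    }
    return [name for name, value in checks.items() if not value.strip()]


draft = LssDraft(
    language="Nepali",
    words=["सेल रोटी"],
    hypernym="roti",  # illustrative assumption
    domain="Food Items",
    gloss_english="ring shaped deep fried sweet roti made of rice flour",
)
print(missing_items(draft))  # ['gloss_hindi']
```

A draft with an empty result is ready to be sent for verification; here the Hindi gloss is still missing.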

Common concept hierarchy

English   Gujarati                 Kannada
Uncle     Kaka (paternal uncle)    ‘Doddappa’ (father’s elder brother)
          Mama (maternal uncle)    ‘Chikkappa’ (father’s younger brother)

[Figure: concept hierarchy over the nodes Uncle, Kaka, Mama, Doddappa, and Chikkappa]
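
One way to picture a common hierarchy for these kinship terms is a tree in which the finer-grained terms sit below the coarser ones. The sketch below assumes a particular arrangement (Kaka and Mama under Uncle, Doddappa and Chikkappa under Kaka) that follows from the glosses in the table; the `PARENT` mapping and both function names are hypothetical.

```python
from typing import Dict, List, Optional

# Child -> parent links; None marks the root. The arrangement is an assumption
# derived from the glosses: Doddappa and Chikkappa are both paternal uncles.
PARENT: Dict[str, Optional[str]] = {
    "Uncle": None,         # English
    "Kaka": "Uncle",       # Gujarati, paternal uncle
    "Mama": "Uncle",       # Gujarati, maternal uncle
    "Doddappa": "Kaka",    # Kannada, father's elder brother
    "Chikkappa": "Kaka",   # Kannada, father's younger brother
}


def hypernym_chain(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, nearest first."""
    chain = []
    current: Optional[str] = concept
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def closest_common_concept(first: str, second: str) -> Optional[str]:
    """Return the most specific concept that covers both arguments."""
    second_chain = set(hypernym_chain(second))
    for candidate in hypernym_chain(first):
        if candidate in second_chain:
            return candidate
    return None


print(closest_common_concept("Doddappa", "Chikkappa"))  # Kaka
print(closest_common_concept("Doddappa", "Mama"))       # Uncle
```

With such a tree, a language lacking a term at one level can still be linked at the nearest shared level: the Kannada terms meet Gujarati at Kaka and meet English only at Uncle.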

Common index
Creating a common concept hierarchy for all languages:
– Use the concept hierarchy of Hindi as the starting point
– Add concepts and modify the hierarchy for each language
– Translate glosses into Hindi and English to compare synsets of two different languages

Thank you