Hindi Treebank
Dipti Misra Sharma
LTRC, International Institute of Information Technology, Hyderabad, India



Outline
- Hindi Language
- Some Problem Cases
- Our Approach
- The Team

Hindi Language
- Relatively simpler morphology, compared to other Indian languages
- Relatively flexible word order

For example:
1. a) baccaa phala khaataa hai
      ‘child’ ‘fruit’ ‘eats’
   b) phala baccaa khaataa hai
   c) phala khaataa hai baccaa
   d) baccaa khaataa hai phala

Basic Structure in PS

1 a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat’ ‘pres’

[S [NP [N baccaa]] [VP [NP [N phala]] [VP [V khaataa] [Aux hai]]]]

Subject: baccaa ‘child’
Object: phala ‘fruit’

PS for 1(b)

1 b) phala baccaa khaataa hai
     ‘fruit’ ‘child’ ‘eat’ ‘pres’

Topic: phala ‘fruit’
Subject: baccaa ‘child’
Object: t (trace)
Movement involved

Tree - I

Problems
- Complex tree
- In what ways is the subject (baccaa) different from the object (phala)?
- Agreement does not hold
- Position does not hold
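The claim that position does not identify the subject can be illustrated with a deliberately naive sketch. The function `subject_by_position` and the `NOUNS` lexicon are hypothetical, introduced only for illustration; they are not part of the slides or of any parser.

```python
# Hypothetical toy lexicon covering only the example sentences (an assumption).
NOUNS = {"baccaa", "phala"}

def subject_by_position(sentence):
    """Naive positional heuristic: call the first noun the subject."""
    return next(tok for tok in sentence.split() if tok in NOUNS)

print(subject_by_position("baccaa phala khaataa hai"))  # sentence 1(a)
print(subject_by_position("phala baccaa khaataa hai"))  # sentence 1(b)
```

For 1(a) the heuristic picks baccaa, but for 1(b) it picks phala, even though the child is still the one eating. A purely positional definition of subject therefore breaks as soon as the word order changes.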

How to Draw PSs for 1 (c-d)?

1 c) baccaa khaataa hai phala
     'child' 'eat+hab' 'pres' 'fruit'
1 d) phala khaataa hai baccaa
     'fruit' 'eat+hab' 'pres' 'child'

- Simple and perfectly natural sentences, yet difficult to handle in Phrase Structure
- Dependency structures make it easy

Dependency Structure

khaataa_hai
  k1: baccaa
  k2: phala

baccaa phala khaataa hai
‘child’ ‘fruit’ ‘eat’ ‘is’

phala baccaa khaataa hai
‘fruit’ ‘child’ ‘eat’ ‘is’

baccaa khaataa hai phala
‘child’ ‘eat’ ‘is’ ‘fruit’

phala khaataa hai baccaa
‘fruit’ ‘eat’ ‘is’ ‘child’

- One dependency structure for all of (1 a-d)
- An additional 'order' attribute can be included to capture the variation in word order
- Case and postpositions can be encoded in the role
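The idea of one dependency structure plus an 'order' attribute can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `ROLES` table and the `analyse` function are hypothetical, hard-wired to the four example sentences, and are not the treebank's actual data format.

```python
# Assumed role assignment, taken from the k1/k2 labels on the slide.
ROLES = {"baccaa": "k1", "phala": "k2"}

def analyse(sentence):
    """Return (dependencies, order) for one of the toy sentences."""
    units = []
    for tok in sentence.split():
        # Treat verb + auxiliary as a single node, like 'khaataa_hai' on the slide.
        if tok == "hai" and units and units[-1] == "khaataa":
            units[-1] = "khaataa_hai"
        else:
            units.append(tok)
    # Dependencies are (head, label, dependent) triples, independent of position.
    deps = frozenset(("khaataa_hai", ROLES[u], u) for u in units if u in ROLES)
    # The 'order' attribute records the surface order separately.
    order = tuple(units)
    return deps, order

sentences = [
    "baccaa phala khaataa hai",
    "phala baccaa khaataa hai",
    "baccaa khaataa hai phala",
    "phala khaataa hai baccaa",
]
analyses = [analyse(s) for s in sentences]
print(len({deps for deps, _ in analyses}))    # number of distinct dependency sets
print(len({order for _, order in analyses}))  # number of distinct surface orders
```

All four sentences yield the same set of dependency triples, while the order attribute differs for each, which is the separation the slide describes.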

Complex predicates
- Two different entities behave as a single unit
- Conjunct verb ‘prashna kiyaa’ below:

2. mohana ne ravi se prashna kiyaa
   'Mohan' 'erg' 'Ravi' 'to' 'question' 'did'
   'Mohan asked Ravi a question'

- A conjunct verb can have partial modification:

3. mohana ne eka prashna kiyaa thaa
   'Mohan' 'erg' 'one' 'question' 'do+perf' 'past'

- The elements in a complex predicate can also be discontiguous:

4. prashna to mohana ne kiyaa thaa
   'question' 'part' 'Mohan' 'erg' 'do+perf' 'past'

Elegant representations in PS are difficult
- 2 and 3 can still be captured
- 4 will need a complex solution
- Can be easily represented in a dependency structure

DS for Discontiguous Elements

prashna to mohana ne kiyaa thaa

- Disjoint conjunct verb
- Practically anything can come between 'prashna' and 'kiyaa'
- Will involve complex operations in PS
- Can be handled with ease in the dependency framework
- Use of POF (‘Part Of’ relation)

kiyaa
  k1: mohana
  POF: prashna
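A small sketch shows why the POF relation is indifferent to contiguity. The triple format and the helper names `complex_predicate` and `is_contiguous` are assumptions made for illustration, not the annotation scheme's own tooling.

```python
# (head, label, dependent) triples for the conjunct verb 'prashna kiyaa',
# following the labels on the slide.
deps = [("kiyaa", "k1", "mohana"), ("kiyaa", "POF", "prashna")]

def complex_predicate(head, deps):
    """Collect the parts of a complex predicate via its POF dependents."""
    parts = [dep for h, label, dep in deps if h == head and label == "POF"]
    return tuple(parts + [head])

def is_contiguous(tokens, parts):
    """Check whether the given parts are adjacent in the surface string."""
    idx = sorted(tokens.index(p) for p in parts)
    return idx == list(range(idx[0], idx[0] + len(idx)))

s2 = "mohana ne ravi se prashna kiyaa".split()   # sentence 2
s4 = "prashna to mohana ne kiyaa thaa".split()   # sentence 4

pred = complex_predicate("kiyaa", deps)
print(pred)
print(is_contiguous(s2, pred), is_contiguous(s4, pred))
```

The same POF triple recovers 'prashna kiyaa' as one unit in both sentences; only the surface check differs, with the parts adjacent in sentence 2 and separated in sentence 4.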

Selected Paninian Dependency Model

Paninian grammatical model
- Better for languages with flexible word order
- Works at the syntactico-semantic level
- Offers two levels of analysis
  - Relations:
    - Kaaraka: direct relations of nouns to a verb
    - Other relations: possessive, reason, cause, etc. (semantic dependencies)
  - Vibhaktis: relation markers

(For details, see Akshar Bharati et al., Natural Language Processing: A Paninian Perspective (1995))
k/index.html

The Team
- Rajeev Sangal
- Dipti Misra Sharma
- Lakshmi Bai
- Ramakrishmacharyulu
- Rafiya Begam
- Samar Husain
- Arun Dhwaj

Some More Phenomena
- jo-vo constructions
- mujhase gilaasa TuuTa gayaa
  'by me' 'glass' 'break' 'went'
  * 'The glass broke by me'
- raama sabase baDaa putra thaa dasharatha kaa
  'Ram' 'most' 'big' 'son' 'was' 'Dasharatha' 'of'
  'Ram was the eldest son of Dasharatha'

PS

mujhase gilaasa TuuTa gayaa
‘me-by’ ‘glass’ ‘break’ ‘went’

[S [NP [N mujhase]] [VP [NP [N gilaasa]] [VP TuuTa gayaa]]]

Subject: mujhase *
Object: gilaasa *
(* wrongly represented)

- 'TuuTa' is an intransitive verb
- 'gilaasa' is the subject
- 'mujhase' is the causer

DS

TuuTaa
  k1: gilaasa
  k3: mujhase

The dependency tree brings out the right nuance

Thank You