Hindi Treebank
Dipti Misra Sharma
LTRC, International Institute of Information Technology, Hyderabad, India
Outline: Hindi Language; Some Problem Cases; Our Approach; The Team
Hindi Language: relatively simple morphology compared to other Indian languages, and relatively flexible word order. For example:
1. a) baccaa phala khaataa hai 'child' 'fruit' 'eat' 'pres' ('The child eats fruit.')
b) phala baccaa khaataa hai
c) baccaa khaataa hai phala
d) phala khaataa hai baccaa
Basic Structure in PS
Tree for 1 a): [S [NP [N baccaa]] [VP [NP [N phala]] [VP [V khaataa] [Aux hai]]]]
1 a) baccaa phala khaataa hai 'child' 'fruit' 'eat' 'pres'
Subject: baccaa 'child'; Object: phala 'fruit'
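In a PS analysis, subject and object are read off the tree configuration: the subject is the NP directly under S, the object the NP inside VP. The sketch below encodes the tree for 1 a) as nested tuples and applies that configurational definition. The tuple encoding and the helper names `bracket`, `leaves` and `configurational_roles` are hypothetical, chosen only for illustration.

```python
# A minimal sketch: PS trees as nested tuples (label, child, child, ...).
# Leaves are plain strings. All helper names here are hypothetical.

tree_1a = (
    "S",
    ("NP", ("N", "baccaa")),
    ("VP",
        ("NP", ("N", "phala")),
        ("VP", ("V", "khaataa"), ("Aux", "hai"))),
)

def bracket(node) -> str:
    """Render a tree in labelled-bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

def leaves(node) -> list:
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for c in node[1:] for w in leaves(c)]

def configurational_roles(s_node) -> dict:
    """Subject = NP daughter of S; object = NP daughter of the top VP."""
    roles = {}
    for child in s_node[1:]:
        if child[0] == "NP":
            roles["subject"] = " ".join(leaves(child))
        elif child[0] == "VP":
            for grandchild in child[1:]:
                if grandchild[0] == "NP":
                    roles["object"] = " ".join(leaves(grandchild))
    return roles

print(bracket(tree_1a))
print(configurational_roles(tree_1a))  # {'subject': 'baccaa', 'object': 'phala'}
```

Because the roles depend on tree position, any reordering of baccaa and phala has to be modelled as a change in the tree itself, which is where movement and traces come in.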
PS for 1 b) (Tree I)
1 b) phala baccaa khaataa hai 'fruit' 'child' 'eat' 'pres'
Topic: phala 'fruit'; Subject: baccaa 'child'; Object: t (trace)
Movement is involved.
Problems: the tree becomes complex. In what ways is the subject (baccaa) different from the object (phala)? Agreement does not hold as a criterion, and neither does position.
How do we draw PSs for 1 c) and 1 d)?
1 c) baccaa khaataa hai phala 'child' 'eat+hab' 'pres' 'fruit'
1 d) phala khaataa hai baccaa 'fruit' 'eat+hab' 'pres' 'child'
These are simple and perfectly natural sentences, yet they are difficult to handle in phrase structure. Dependency structures make them easy to handle.
Dependency Structure
Tree: khaataa_hai -k1-> baccaa; khaataa_hai -k2-> phala (k1: karta, k2: karma)
baccaa phala khaataa hai 'child' 'fruit' 'eat' 'is'
phala baccaa khaataa hai 'fruit' 'child' 'eat' 'is'
baccaa khaataa hai phala 'child' 'eat' 'is' 'fruit'
phala khaataa hai baccaa 'fruit' 'eat' 'is' 'child'
One dependency structure serves for all of 1 a-d). An additional 'order' attribute can be included to capture the variation in word order. Case and postpositions can be encoded in the role labels.
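The claim that one dependency structure covers all four orders can be checked mechanically: if the structure is a set of (head, relation, dependent) arcs, word order never enters it, and the optional 'order' attribute is kept separately. The sketch assumes a hard-coded role table `ROLES` (hypothetical; in real annotation the roles come from annotators or a parser, not a word lookup) and treats khaataa_hai as a single verb-group node, as in the figure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    head: str
    relation: str   # Paninian kaaraka label, e.g. k1, k2
    dependent: str

# Hypothetical role table for this one example; not a real lexicon.
ROLES = {"baccaa": "k1", "phala": "k2"}
HEAD = "khaataa_hai"  # verb group treated as one node, as in the figure

SENTENCES = {
    "1a": "baccaa phala khaataa hai",
    "1b": "phala baccaa khaataa hai",
    "1c": "baccaa khaataa hai phala",
    "1d": "phala khaataa hai baccaa",
}

def analyse(sentence: str):
    """Return (arc set, order attribute) for one sentence."""
    words = sentence.split()
    arcs = frozenset(Arc(HEAD, ROLES[w], w) for w in words if w in ROLES)
    order = tuple(words)  # the optional 'order' attribute: surface order
    return arcs, order

analyses = {name: analyse(s) for name, s in SENTENCES.items()}
structures = {arcs for arcs, _ in analyses.values()}
orders = {order for _, order in analyses.values()}

print(len(structures))  # 1: one dependency structure for 1 a-d)
print(len(orders))      # 4: the variation lives only in the order attribute
```

Keeping order as a separate attribute means the relational analysis stays identical across the four sentences while nothing about the surface string is lost.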
Complex predicates: two different elements behave as a single unit, e.g. the conjunct verb 'prashna kiyaa' in (2).
2. mohana ne ravi se prashna kiyaa 'Mohan' 'erg' 'Ravi' 'to' 'question' 'did' ('Mohan asked Ravi a question.')
A conjunct verb can be partially modified: in (3), eka 'one' modifies only the noun part prashna.
3. mohana ne eka prashna kiyaa thaa 'Mohan' 'erg' 'one' 'question' 'do+perf' 'past' ('Mohan had asked a question.')
The elements of a complex predicate can also be discontiguous:
4. prashna to mohana ne kiyaa thaa 'question' 'part' 'Mohan' 'erg' 'do+perf' 'past'
Elegant representations in PS are difficult: (2) and (3) can still be captured, but (4) will need a complex solution. All of them can be easily represented in a dependency structure.
DS for Discontiguous Elements
prashna to mohana ne kiyaa thaa
This is a disjoint conjunct verb: practically anything can come between prashna and kiyaa. It would involve complex operations in PS, but it can be handled with ease in the dependency framework by using POF (the 'Part Of' relation).
Tree: kiyaa -k1-> mohana; kiyaa -POF-> prashna
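With a POF arc, recovering the complex predicate does not depend on adjacency: the noun part is simply the POF dependent of the verb, however much material intervenes. The sketch below assumes arcs over word indices and keeps only the arcs discussed here (k1 and POF); postpositions, the particle and the auxiliary are left unattached as a simplification. The arc lists and the function `complex_predicates` are hypothetical illustrations, and the k1 arc for (2) is assumed by analogy with (4).

```python
# Hypothetical arc lists: (head_index, relation, dependent_index).
WORDS_4 = ["prashna", "to", "mohana", "ne", "kiyaa", "thaa"]
ARCS_4 = [(4, "k1", 2), (4, "POF", 0)]   # kiyaa -k1-> mohana, kiyaa -POF-> prashna

WORDS_2 = ["mohana", "ne", "ravi", "se", "prashna", "kiyaa"]
ARCS_2 = [(5, "k1", 0), (5, "POF", 4)]   # contiguous conjunct verb for comparison

def complex_predicates(words, arcs):
    """Pair each POF dependent with its head verb and report the words in between."""
    preds = []
    for head, rel, dep in arcs:
        if rel == "POF":
            left, right = sorted((head, dep))
            preds.append((words[dep] + " " + words[head], words[left + 1 : right]))
    return preds

print(complex_predicates(WORDS_4, ARCS_4))  # [('prashna kiyaa', ['to', 'mohana', 'ne'])]
print(complex_predicates(WORDS_2, ARCS_2))  # [('prashna kiyaa', [])]
```

The contiguous case (2) and the discontiguous case (4) get the same arc, so no extra machinery is needed for the gap.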
Selected Model: Paninian Dependency Model
The Paninian grammatical model is better suited to languages with flexible word order. It works at the syntactico-semantic level and offers two levels of analysis:
Relations: kaaraka relations (direct relations of nouns to a verb) and other relations such as possessive, reason and cause (semantic dependencies).
Vibhaktis: relation markers.
(For details, see Akshar Bharati et al., Natural Language Processing: A Paninian Perspective (1995).)
k/index.html
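The two levels can be pictured as a mapping from the surface level (vibhakti) to candidate kaaraka relations. The chart below is a toy, hypothetical simplification: real Paninian charts are verb-specific and depend on tense-aspect-modality, which this sketch ignores. The function `assignments` only enumerates candidate labellings in which each kaaraka is used at most once.

```python
from itertools import product

# Toy, hypothetical vibhakti -> kaaraka chart ("0" = no postposition).
KAARAKA_CHART = {
    "0":  ["k1", "k2"],
    "ne": ["k1"],
    "ko": ["k2", "k4"],
    "se": ["k3", "k5"],
}

def assignments(groups):
    """Enumerate kaaraka labellings of (noun, vibhakti) groups, each label used at most once."""
    nouns = [noun for noun, _ in groups]
    options = [KAARAKA_CHART[vibhakti] for _, vibhakti in groups]
    result = []
    for combo in product(*options):
        if len(set(combo)) == len(combo):  # no kaaraka assigned twice
            result.append(dict(zip(nouns, combo)))
    return result

print(assignments([("mohana", "ne"), ("ravi", "se")]))
# [{'mohana': 'k1', 'ravi': 'k3'}, {'mohana': 'k1', 'ravi': 'k5'}]
print(assignments([("baccaa", "0"), ("phala", "0")]))
# [{'baccaa': 'k1', 'phala': 'k2'}, {'baccaa': 'k2', 'phala': 'k1'}]
```

For 1 a) the unmarked nouns leave two candidate labellings, so the vibhakti level alone does not decide; the sketch deliberately does not model the further information a full analysis would use.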
The Team: Rajeev Sangal, Dipti Misra Sharma, Lakshmi Bai, Ramakrishmacharyulu, Rafiya Begam, Samar Husain, Arun Dhwaj
Some More Phenomena
jo-vo (relative-correlative) constructions
mujhase gilaasa TuuTa gayaa 'by me' 'glass' 'break' 'went' (*'The glass broke by me'; roughly 'I accidentally broke the glass')
raama sabase baDaa putra thaa dasharatha kaa 'Ram' 'most' 'big' 'son' 'was' 'Dasharatha' 'of' ('Ram was the eldest son of Dasharatha.')
PS for mujhase gilaasa TuuTa gayaa
Tree: [S [NP [N mujhase]] [VP [NP [N gilaasa]] [VP TuuTa gayaa]]]
mujhase gilaasa TuuTa gayaa 'me-by' 'glass' 'break' 'went'
Subject: mujhase (*); Object: gilaasa (*). Both are wrongly represented: TuuTa is an intransitive verb, gilaasa is its subject and mujhase is the causer.
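DS: TuuTa -k1-> gilaasa; TuuTa -k3-> mujhase
The dependency tree brings out the right nuance: gilaasa is the k1 of the intransitive TuuTa, and mujhase is attached as k3 rather than as a subject.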
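The contrast between the two analyses can be made concrete: a position-based reading, which is what the PS tree encodes, picks the first noun as subject, while the DS answers the question directly through the k1 arc. Both helpers below (`positional_subject`, `dependent_of`) and the noun set are hypothetical illustrations, not part of the treebank tooling.

```python
# mujhase gilaasa TuuTa gayaa, with the DS arcs from the figure.
WORDS = ["mujhase", "gilaasa", "TuuTa", "gayaa"]
DS = [("TuuTa", "k1", "gilaasa"), ("TuuTa", "k3", "mujhase")]
NOUNS = {"mujhase", "gilaasa"}  # hypothetical: nouns marked by hand

def positional_subject(words, nouns):
    """PS-style reading: the first NP in the sentence is the subject."""
    return next(w for w in words if w in nouns)

def dependent_of(ds, head, relation):
    """DS reading: look up the dependent of head under the given relation."""
    return next(d for h, r, d in ds if h == head and r == relation)

print(positional_subject(WORDS, NOUNS))  # mujhase (the wrong analysis)
print(dependent_of(DS, "TuuTa", "k1"))   # gilaasa
```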
Thank You