Annotation for Hindi PropBank. Outline Introduction to the project Basic linguistic concepts – Verb & Argument – Making information explicit – Null arguments.


Annotation for Hindi PropBank

Outline
Introduction to the project
Basic linguistic concepts
– Verb & Argument
– Making information explicit
– Null arguments
Tasks to be carried out
Tools for annotation
Timesheets, tips
Practice

Creation of Resources
For machines rather than humans
Imagine a dictionary/thesaurus for computers
A requirement for Natural Language Processing
– Large annotated resources
Annotation implies addition of linguistic information
Tailored to language-specific requirements
Needs to be as consistent as possible
– Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation

Hindi-Urdu Treebank Project
One of the first efforts to make a large-scale resource for Hindi-Urdu
Similar resources exist for Chinese, Arabic and English
Three main components
– Hindi-Urdu dependency treebank
– Hindi-Urdu PropBank
– Hindi-Urdu phrase structure treebank [derived]

PropBank
PropBank resource creation at CU Boulder
We annotate semantic information on top of syntactic information
PropBank involves annotation of predicate argument structure
– Mainly concerned with verbs & their arguments
– And the semantic nature of the arguments

What are verbs?
Verbs are predicating elements, e.g. daud, pii, baras etc.
Encode (very broadly) actions and states
Also have two kinds of grammatical information
– Tense, aspect (present, future; perfect, continuous)
– Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)

What are arguments?
In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya:
– A verb, ‘eat’ or ‘khaa’: PREDICATE
– A person eating, ‘Raam’: ARGUMENT
– Thing eaten, ‘apple’ / ‘seb’: ARGUMENT
Without arguments, the meaning of the verb ‘ate’ is not realized completely
Together, they make up the predicate argument structure of the sentence

Arguments show what’s important
Raam ne jaldi se seb khaaya
– Raam, seb are arguments
– But ‘jaldi se’ is not
It’s all about the verb
– It projects its need for certain arguments
– Sift what’s mandatory from what’s optional

Like Unix commands
Some commands require only one argument; others require two.
– cd /home/student/ashwini
– cp hmwk1.txt hmwk2.txt
If the command is typed with too many or too few arguments…

Error!
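
The command analogy can be sketched in a few lines of Python. This is only an illustration: the table `REQUIRED_ARGS` and the function `check_arguments` are hypothetical names, and the argument counts given for `khaa` and `daud` are assumptions made for the example, not entries from the actual PropBank frames.

```python
# Hypothetical table: how many arguments each verb needs (assumed for illustration).
REQUIRED_ARGS = {
    "khaa": 2,  # the eater and the thing eaten
    "daud": 1,  # the runner
}


def check_arguments(verb, arguments):
    """Return True if the verb has the expected number of arguments, else raise ValueError."""
    expected = REQUIRED_ARGS[verb]
    if len(arguments) != expected:
        raise ValueError(
            f"{verb}: expected {expected} argument(s), got {len(arguments)}"
        )
    return True


# 'Raam ne seb khaaya' supplies both arguments, so the check passes.
check_arguments("khaa", ["Raam", "seb"])
```

Calling `check_arguments("khaa", ["Raam"])` raises a `ValueError`, much as a shell command complains when an operand is missing.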

Making information explicit
As speakers of Hindi or English, we already have knowledge of predicate argument structure
E.g. hari ___ pahuMcaa
– Capturing this knowledge for the machine is essential
– Ram ne seb khaaya aur paani piyaa
– Who drank the water?

Identify arguments
In PropBank, we first identify arguments of a verb
When explicitly present, they are called ARG
Further, they are numbered as ARG0, ARG1, ARG2 etc.
Often, you have ARG as well as ARG-M
– [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
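
One way to picture the labelled sentence is as a list of text spans, each carrying its label. The sketch below is a minimal illustration and not the format used by the annotation tool; the class `LabeledSpan` and the function `numbered_arguments` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class LabeledSpan:
    text: str   # the words covered by the label
    label: str  # e.g. "ARG0", "ARG1", "ARG-M"


def numbered_arguments(spans):
    """Keep only the numbered arguments (ARG0, ARG1, ...), leaving out ARG-M."""
    return [s for s in spans if s.label.startswith("ARG") and s.label[3:].isdigit()]


# Ram ne jaldi se seb khaaya, with the labels from the slide.
annotation = [
    LabeledSpan("Ram", "ARG0"),
    LabeledSpan("jaldi se", "ARG-M"),
    LabeledSpan("seb", "ARG1"),
]

core = numbered_arguments(annotation)  # the spans for 'Ram' and 'seb'
```

Keeping the numbered arguments apart from ARG-M mirrors the distinction between what the verb requires and what is optional.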

Null arguments
What if arguments are not explicit?
– E.g. Ram ne seb khaaya aur ___ paani piyaa
– Ram is also the person drinking water
– It can be dropped, because of the conjunction aur
– For the machine, it must be retrieved from the sentence
We also mark these missing or null arguments
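
The idea of recovering a dropped argument can be sketched as follows. This is a simplified illustration only: the function `insert_null_subject`, the dictionary layout, and the `NULL(...)` notation are all hypothetical, and labelling `paani` as ARG1 is an assumption made by analogy with `seb`. Real null argument insertion is done by annotators and covers more cases than conjunction.

```python
def insert_null_subject(clauses):
    """Give each clause that lacks an ARG0 a null ARG0 pointing at the first clause's ARG0."""
    shared = clauses[0].get("ARG0")
    completed = []
    for clause in clauses:
        clause = dict(clause)  # copy, so the input is left unchanged
        if "ARG0" not in clause and shared is not None:
            clause["ARG0"] = f"NULL({shared})"
        completed.append(clause)
    return completed


# Ram ne seb khaaya aur ___ paani piyaa
clauses = [
    {"verb": "khaa", "ARG0": "Ram", "ARG1": "seb"},
    {"verb": "pii", "ARG1": "paani"},  # the drinker is not stated
]

result = insert_null_subject(clauses)
```

After the call, the second clause carries an explicit marker showing that Ram is also the one drinking the water.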

Tasks to be carried out
Null argument insertion
Argument annotation

Tools to be used
Sanchay – GUI for annotators. We use it especially for null argument insertion
Use your verbs account to access Sanchay
Wiki for annotator resources

Timesheets & tips
Being honest about filling out timesheets is quite important
We can access the amount of time you spend on verbs
I will ask you to keep track of number of annotations per hour to cross check
Turn in the timesheets at my CINC mailbox in physical form, with your signature

Practice
We need to learn about four kinds of empty categories
Plan to proceed
– Recognizing syntactic constructions
– Getting familiar with the tool
– Practice with the corpus
– Q & A based on null argument insertion