A Maximum Entropy Based Honorificity Identification for Bengali Pronominal Anaphora Resolution Apurbalal Senapati and Utpal Garain Presented by Samik Some.
Introduction Pronominal anaphora resolution is important for several NLP tasks such as question answering and summarization. In English, pronouns carry gender information (he/she/it) that helps resolve them to the nouns they refer to. Bengali pronouns carry no gender information, but they do encode honorificity, e.g. the three forms of the verb খাওয়া: খা, খাও and খান. This honorific information is present in both written and spoken Bengali and can be exploited for resolving pronoun references. Because several features jointly determine the honorificity of a noun in Bengali, the authors adopt a maximum entropy model, which offers better performance than earlier methods.

Honorific Information in Bengali Bengali has three levels of honorificity, in both written and spoken form. The highest degree (SUP class) is used for people of higher status or generally respectable people such as doctors, teachers, and elders. The neutral form (NEU class) is used when referring to close family members, children, younger relatives, etc. The lowest level (INF class) is used for very close friends, very close relations, or people presumed to be of lower social status, such as rickshaw pullers or housemaids. Honorific information can be extracted from several sources, such as the title placed before or after a name, or the inflection of the main verb. Pronouns also conform to this honorificity and have distinct forms for the different levels.
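The two cue sources named above (adjacent titles and main-verb inflection) can be sketched as a small extraction routine. The word lists and the `honorificity_cues` helper below are hypothetical illustrations, not the paper's actual resources:

```python
# A minimal sketch of collecting honorificity cues for a person mention.
# The title set and verb table below are illustrative assumptions only.

# Hypothetical honorific addressing terms (titles placed before/after a name).
HONORIFIC_TITLES = {"ডাক্তার", "অধ্যাপক", "বাবু", "মহাশয়"}

# Hypothetical mapping from main-verb forms of খাওয়া to honorificity class.
VERB_CLASS = {"খা": "INF", "খাও": "NEU", "খান": "SUP"}

def honorificity_cues(tokens, person_idx, verb=None):
    """Collect cues for the person at tokens[person_idx]:
    titles immediately to the left/right, and the main verb's class."""
    cues = {}
    if person_idx > 0 and tokens[person_idx - 1] in HONORIFIC_TITLES:
        cues["title_left"] = True
    if person_idx + 1 < len(tokens) and tokens[person_idx + 1] in HONORIFIC_TITLES:
        cues["title_right"] = True
    if verb is not None and verb in VERB_CLASS:
        cues["verb_class"] = VERB_CLASS[verb]
    return cues
```

These cues are exactly the ingredients the feature functions in the later slides test for.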

Maximum Entropy Modelling A training sample is summarized by its empirical probability distribution p̃(x, y), where x is the context and y is the corresponding class. Events are defined in terms of binary-valued feature functions f(x, y), which return 1 if the context x contains some particular information and the corresponding class is y, and 0 otherwise. The statistic of interest is the expected value of each feature function with respect to the empirical distribution p̃(x, y).
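The equations shown on this slide did not survive in the transcript. Assuming the standard maximum entropy formulation they describe, the empirical distribution, feature functions, and empirical expectation are:

```latex
\tilde{p}(x, y) \equiv \frac{1}{N}\,\#\{\text{times } (x, y) \text{ occurs in the sample}\}
\qquad
f(x, y) =
\begin{cases}
1 & \text{if } x \text{ contains the cue and the class is } y\\
0 & \text{otherwise}
\end{cases}
\qquad
\tilde{p}(f) \equiv \sum_{x, y} \tilde{p}(x, y)\, f(x, y)
```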

Maximum Entropy Modelling (contd.) The expected value that the model p(y|x) assigns to a feature f is computed using the empirical distribution of contexts p̃(x). We require the model's expectation of each feature to match its empirical expectation; this constraint defines the solution space of admissible models. Among these, we select the most uniform model by maximizing the conditional entropy.
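Again the slide's equations are missing from the transcript. In the standard formulation, the model expectation, the constrained solution space, and the entropy being maximized are:

```latex
p(f) \equiv \sum_{x, y} \tilde{p}(x)\, p(y \mid x)\, f(x, y),
\qquad
\mathcal{C} \equiv \bigl\{\, p \;\bigm|\; p(f_i) = \tilde{p}(f_i) \text{ for } i \in \{1, \dots, n\} \,\bigr\},
```

```latex
H(p) \equiv -\sum_{x, y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x),
```

whose maximizer over \(\mathcal{C}\) has the familiar exponential (log-linear) form

```latex
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)} \exp\Bigl(\sum_{i} \lambda_i f_i(x, y)\Bigr),
\qquad
Z_{\lambda}(x) = \sum_{y} \exp\Bigl(\sum_{i} \lambda_i f_i(x, y)\Bigr).
```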

Feature functions A 10-dimensional feature vector is used.
f1(x, y) = 1 when y is SUP, x is the person, and an honorific addressing term is in the left position; 0 otherwise.
f2(x, y) = 1 when y is SUP, x is the person, and an honorific addressing term is in the right position; 0 otherwise.
f3(x, y) = 1 when y is SUP, x is the person, and honorific addressing terms are in both the left and right positions; 0 otherwise.
f4(x, y) = 1 when y is SUP, x is the person, an honorific addressing term is in the left position, and the main verb associated with x is in the SUP class; 0 otherwise.
f5(x, y) = 1 when y is SUP, x is the person, an honorific addressing term is in the right position, and the main verb associated with x is in the SUP class; 0 otherwise.

Feature functions (contd.)
f6(x, y) = 1 when y is SUP, x is the person, honorific addressing terms are in both the left and right positions, and the main verb associated with x is in the SUP class; 0 otherwise.
f7(x, y) = 1 when y is SUP, x is the person, and the main verb associated with x is in the SUP class; 0 otherwise.
f8(x, y) = 1 when y is NEU, x is the person, and the main verb associated with x is in the NEU class; 0 otherwise.
f9(x, y) = 1 when y is NEU, x is the person, there is neither a left nor a right honorific term, and the main verb is absent; 0 otherwise.
f10(x, y) = 1 when y is INF, x is the person, no honorific term is in either the left or the right position, and the main verb is in the INF class; 0 otherwise.
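The ten feature functions above translate directly into code. A sketch, representing the context x as a dict with hypothetical keys `title_left`, `title_right`, and `verb_class` (these key names are assumptions, not from the paper); note that f1/f2 and f3 can fire together, which is unproblematic for a maxent model since overlapping features are allowed:

```python
def features(x, y):
    """Return the 10-dimensional binary feature vector for (context x, class y),
    following the slide's definitions literally."""
    left = x.get("title_left", False)    # honorific term left of the person
    right = x.get("title_right", False)  # honorific term right of the person
    verb = x.get("verb_class")           # "SUP", "NEU", "INF", or None (absent)
    return [
        int(y == "SUP" and left),                                      # f1
        int(y == "SUP" and right),                                     # f2
        int(y == "SUP" and left and right),                            # f3
        int(y == "SUP" and left and verb == "SUP"),                    # f4
        int(y == "SUP" and right and verb == "SUP"),                   # f5
        int(y == "SUP" and left and right and verb == "SUP"),          # f6
        int(y == "SUP" and verb == "SUP"),                             # f7
        int(y == "NEU" and verb == "NEU"),                             # f8
        int(y == "NEU" and not left and not right and verb is None),   # f9
        int(y == "INF" and not left and not right and verb == "INF"),  # f10
    ]
```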

Training The feature functions are computed locally within a sentence, and the model was trained on a large Bengali corpus (35 million words). The training data is defined as pairs (x, y), where x is the sentence or phrase containing a person together with its context information, and y is the honorific class. [The slides showed the format of the training data and its coverage; these tables are not preserved in the transcript.]
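Training a conditional maxent model on such (x, y) pairs can be sketched as follows. The paper does not specify the parameter estimation algorithm on this slide; this sketch uses plain gradient ascent on the conditional log-likelihood as a simple stand-in for the GIS/IIS procedures commonly used with maxent models:

```python
import math

def train_maxent(data, classes, feat, n_feats, lr=0.5, iters=200):
    """Fit weights lambda_i by gradient ascent on the conditional
    log-likelihood of p(y|x) ∝ exp(sum_i lambda_i * f_i(x, y)).
    The gradient per feature is (empirical count - model expectation)."""
    w = [0.0] * n_feats
    for _ in range(iters):
        grad = [0.0] * n_feats
        for x, y in data:
            fy = feat(x, y)  # empirical feature counts for the observed class
            # Unnormalized model scores for every candidate class.
            scores = {c: math.exp(sum(wi * fi for wi, fi in zip(w, feat(x, c))))
                      for c in classes}
            z = sum(scores.values())  # partition function Z(x)
            for i in range(n_feats):
                grad[i] += fy[i]
                for c in classes:
                    grad[i] -= scores[c] / z * feat(x, c)[i]
        for i in range(n_feats):
            w[i] += lr * grad[i] / len(data)
    return w

def predict(w, feat, classes, x):
    """Pick the class with the highest log-linear score."""
    return max(classes, key=lambda c: sum(wi * fi for wi, fi in zip(w, feat(x, c))))
```

With the 10 feature functions from the previous slides plugged in as `feat`, this yields a classifier over {SUP, NEU, INF}.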

Results Tested on an extended version of the ICON 2011 annotated data for anaphora resolution in Bengali. [The slides showed result tables for coverage, honorificity detection, and pronominal anaphora resolution; these tables are not preserved in the transcript.]

Thank you