REFERENTIAL CHOICE AS A PROBABILISTIC MULTI-FACTORIAL PROCESS Andrej A. Kibrik, Grigorij B. Dobrov, Natalia V. Loukachevitch, Dmitrij A. Zalmanov

2 Referential choice in discourse  When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including:  Full noun phrase (NP): a proper name (e.g. Pushkin) or a common noun (with modifiers) = definite description (e.g. the poet)  Reduced NP, particularly a third person pronoun (e.g. he)

3 Example  Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because of a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin & Turner in Dallas. He said Tandy has done…  How is this choice made? (Labels in the slide graphic: full NP vs. pronoun; antecedent, coreference, anaphors)

4 Why is this important?  Reference is among the most basic cognitive operations performed by language users  It is the linguistic representation of what is known as attention and working memory in psychology  Reference constitutes the lion's share of all information in natural communication  Consider text manipulation according to the method of Biber et al. 1999:

5 Referential expressions marked in green  Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because of a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin & Turner in Dallas. He said Tandy has done…  (the green highlighting is lost in this transcript)

6 Referential expressions removed  (the version of the passage with referential expressions removed is not recoverable from this transcript)

7 Referential expressions kept  (the version of the passage with only referential expressions kept is not recoverable from this transcript)

8 Plan of talk  I. Referential choice as a multi-factorial process  II. The RefRhet corpus and the machine learning-based approach  III. The probabilistic character of referential choice

9 Multi-factorial character of referential choice  Many factors of referential choice:  Distance to antecedent: along the linear discourse structure; along the hierarchical discourse structure  Antecedent role  Referent animacy  Protagonisthood  None of these factors alone can explain referential choice

10 Factors integration  At every point in discourse the factors are somehow summed and give rise to an integral characterization – the referent's activation score  Activation score is the referent's status with respect to the speaker's working memory  Activation score predetermines referential choice  Low  full NP  Medium  full or reduced NP  High  reduced NP
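The low/medium/high mapping above can be sketched as a small function. The slides give only qualitative activation zones, so the numeric cut-offs 0.3 and 0.7 below are hypothetical illustrations, not values from the study.

```python
def referential_options(activation):
    """Map a referent's activation score (here normalized to 0..1) to the
    admissible referential expressions, per the three-zone scheme on the
    slide. Cut-offs 0.3 and 0.7 are hypothetical."""
    if activation < 0.3:                     # low activation
        return {"full NP"}
    if activation < 0.7:                     # medium activation
        return {"full NP", "reduced NP"}
    return {"reduced NP"}                    # high activation

print(referential_options(0.1))   # low zone: full NP only
print(referential_options(0.5))   # medium zone: both options admissible
print(referential_options(0.9))   # high zone: reduced NP only
```

The medium zone is where, as the later slides argue, the choice becomes genuinely probabilistic.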

11 Multi-factorial model of referential choice (Kibrik 1999)  Activation factors (various properties of the referent or discourse context)  Referent's activation score  Referential choice

12 Modeling multi-factorial processes: machine learning-based methods  Neural networks approach (Gruening and Kibrik 2005)  Machine learning algorithm Automatic selection of factors’ weights Automatic reduction of the number of factors («pruning»)  However: Small data set Single method of machine learning Low interpretability of results  Hence a new study  Large corpus  Implementation of several machine learning methods  Statistical model of referential choice

13 The RefRhet corpus  English  Business prose  Initial material – the RST Discourse Treebank  Annotated for hierarchical discourse structure  385 articles from Wall Street Journal  The added component – referential annotation  The RefRhet corpus  Over referential expressions

14 Example of a hierarchical graph

15 Scheme of referential annotation  The MMAX2 program  Krasavina and Chiarcos 2007  All markables are annotated, including:  Referential expressions  Their antecedents  Coreference relations are annotated  Features of referents and context that can potentially be factors of referential choice are annotated

16

17 Work on referential annotation  O. Krasavina  A. Antonova  D. Zalmanov  A. Linnik  M. Khudyakova  Students of the Department of Theoretical and Applied Linguistics, MSU

18 Current state of the RefRhet referential annotation  2/3 completed  Further results are based on the following data:  247 texts  110 thousand words  Markables: 7097 proper names, 8560 definite descriptions, 1797 third person pronouns  3756 reliable «anaphor – antecedent» pairs: proper names — 1623 (43%), definite descriptions — 971 (26%), pronouns — 1162 (31%)

19 Factors of referential choice  Properties of the referent:  Animacy  Protagonisthood  Properties of the antecedent:  Type of syntactic phrase (phrase_type)  Grammatical role (gramm_role)  Form of referential expression (np_form, def_np_form)  Whether it belongs to direct speech or not (dir_speech)

20 Factors of referential choice  Properties of the anaphor:  First vs. nonfirst mention in discourse (referentiality)  Type of syntactic phrase (phrase_type)  Grammatical role (gramm_role)  Whether it belongs to direct speech or not (dir_speech)  Distance between the anaphor and the antecedent:  Distance in words  Distance in markables  Linear distance in clauses  Hierarchical distance in elementary discourse units
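The factor inventory on the last two slides can be pictured as one feature vector per anaphor-antecedent pair. The sketch below encodes a single hypothetical pair; the key names loosely follow the slides' labels (gramm_role, dir_speech, etc.) and the values are invented for illustration.

```python
# One hypothetical anaphor-antecedent pair, encoded with the factor set
# listed on the slides. All values are invented for illustration.
pair = {
    "animacy": True,                  # properties of the referent
    "protagonisthood": False,
    "ante_phrase_type": "NP",         # properties of the antecedent
    "ante_gramm_role": "subject",
    "ante_np_form": "proper_name",
    "ante_dir_speech": False,
    "referentiality": "nonfirst",     # properties of the anaphor
    "ana_gramm_role": "subject",
    "ana_dir_speech": False,
    "dist_words": 7,                  # anaphor-antecedent distances
    "dist_markables": 3,
    "dist_clauses": 1,
    "dist_edus_hierarchical": 1.0,
}
print(len(pair), "features for one pair")
```

A table of such vectors, one row per reliable pair, is what the machine learning methods described next operate on.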

21 Goals for the machine learning-based study  Dependent variable:  Form of referential expression (np_form)  Binary prediction:  Full NP vs. pronoun  Three-way prediction:  Definite description vs. proper name vs. pronoun  Accuracy maximization:  Ratio of correct predictions to the overall number of instances
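The accuracy measure named above is simply the fraction of instances predicted correctly:

```python
def accuracy(predictions, gold):
    """Ratio of correct predictions to the overall number of instances,
    the maximization target named on this slide."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Two of three toy predictions match the gold labels -> accuracy 2/3.
print(accuracy(["pronoun", "full NP", "pronoun"],
               ["pronoun", "full NP", "full NP"]))
```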

22 Machine learning methods (Weka, a data mining system)  Easily interpretable methods:  Logical algorithms Decision trees (C4.5) Decision rules (JRip)  Higher quality:  Logistic regression  Quality control – the cross-validation method
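The study itself used Weka; the sketch below is an analogous setup in scikit-learn on synthetic data (the real RefRhet features are not reproduced here), pairing an interpretable decision tree with logistic regression and scoring both by cross-validation, as on this slide.

```python
# Analogue of the slide's setup in scikit-learn, with synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))               # stand-in activation factors
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # stand-in binary np_form

results = {}
for clf in (DecisionTreeClassifier(max_depth=3),   # easily interpretable
            LogisticRegression()):                 # usually higher quality
    scores = cross_val_score(clf, X, y, cv=10)     # 10-fold cross-validation
    results[type(clf).__name__] = scores.mean()
print(results)
```

Weka's C4.5 and JRip have no exact scikit-learn counterparts; the tree here only stands in for the "logical algorithms" family.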

23 Examples of decision rules generated by the JRip algorithm  (Antecedent’s grammatical role = subject) & (Hierarchical distance ≤ 1.5) & (Distance in words ≤ 7) => pronoun  (Animate) & (Distance in markables ≥ 2) & (Distance in words ≤ 11) => pronoun
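The two rules above can be transcribed one-to-one as a predicate; the dict key names are our own shorthand, not Weka's.

```python
def predicts_pronoun(feats):
    """The two JRip rules quoted on the slide, transcribed literally.
    `feats` is a dict of factor values; returns True when either rule
    fires (i.e. a pronoun is predicted)."""
    rule1 = (feats["ante_gramm_role"] == "subject"
             and feats["dist_hierarchical"] <= 1.5
             and feats["dist_words"] <= 7)
    rule2 = (feats["animate"]
             and feats["dist_markables"] >= 2
             and feats["dist_words"] <= 11)
    return rule1 or rule2

example = {"ante_gramm_role": "subject", "dist_hierarchical": 1.0,
           "dist_words": 5, "animate": False, "dist_markables": 1}
print(predicts_pronoun(example))  # True: rule 1 fires
```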

24 Main results  Accuracy  Binary prediction:  logistic regression – 86.1%  logical algorithms – 85%  Three-way prediction:  logistic regression – 74%  logical algorithms – 72%

25 Comparison of single- and multi-factor accuracy

Feature                       Three-way prediction   Binary prediction
The largest class             43%                    69%
Distance in words             55%                    76%
Hierarchical distance         53.5%                  74.8%
Anaphor’s grammatical role    45.2%                  70%
Anaphor in direct speech      43.8%                  70%
Animate                       47.3%                  71.5%
Combination of factors        74%                    86.1%

26 Referential choice is a probabilistic process  According to Kibrik 1999:

Potential referential expressions     Actual referential expressions
Full NP only (19%)                    Full NP (49%)
Full NP, ?pronoun (21%)
Pronoun or full NP (28%)
Pronoun, ?full NP (23%)               Pronoun (51%)
Pronoun only (9%)

27 Probabilistic character of referential choice in the RefRhet study  Prediction of referential choice cannot be fully deterministic  There is a class of instances in which referential choice is random  It is important to tune the model so that it can handle such instances in a special manner  We are working on this problem  Logistic regression generates estimates of probability for each referential option  This probability estimate can be interpreted as the activation score from the cognitive model
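The last point can be sketched directly: a fitted logistic regression exposes a per-option probability, which is what the slide proposes to read as the activation score. Synthetic data and scikit-learn stand in for the real RefRhet features and Weka.

```python
# Sketch: reading logistic regression's class probability as an
# activation score. Data and labels are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))          # stand-in factor values
y = (X[:, 0] > 0).astype(int)          # stand-in: 1 = pronoun, 0 = full NP

model = LogisticRegression().fit(X, y)
activation = model.predict_proba(X[:1])[0, 1]   # P(pronoun) for one instance
print(f"activation-like P(pronoun) = {activation:.2f}")
```

Instances whose probability sits near 0.5 are exactly the "random choice" class the slide describes.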

28 Probabilistic multi-factorial model of referential choice  Activation factors (various properties of the referent or discourse context)  Activation score = probability of using a certain referential expression  Referential choice

29 Conclusions about the RefRhet study  Quantity: a large corpus of referential expressions  Quality: a high level of prediction accuracy has already been attained  And this is not the limit  Theoretical significance: the following fundamental properties of referential choice are addressed:  Multi-factorial character of referential choice  Probabilistic character of referential choice  This approach can be applied to a wide range of linguistic and other behavioral choices