From Linguistic Annotations to Knowledge Objects
Bonnie Dorr, Saif Mohammad, Boyan Onyshkevych
11/14/2008

Overarching Goals
- Produce knowledge elements
- Build an explicit model of the world based on explicit and implicit language data
- Enable higher-order reasoning:
  - operate on knowledge units rather than on annotated raw text
  - infer relationships, states of affairs, sentiments, beliefs, etc.

Current and Next Phases
- Phase I (Linguistic Annotation of Semantics): Annotate raw text with entity descriptions, co-reference information, semantic categories, lexical-semantic relations, temporal information, thematic role information, modality, etc. This is local information on single sentences and documents.
- Phase II (Knowledge Units): Automatically produce language-independent structured representations of knowledge (entities, relations, events, opinions, scenarios, etc.) derived from unstructured text and speech in a wide variety of languages and genres. Knowledge units are based on aggregate information across multiple documents.

HLT COE Team: Project Organization
- BBN (Ramshaw, Habash): temporal annotation; coreference (complex)
- CMU (Mitamura, Levin, Nyberg): coreference; entity relations; committed belief
- JHU/CLSP (Yarowsky): latent property extraction (e.g., gender/age/1st-lang/occupation)
- Center of Excellence: meaning specification; assessment; coordination
- UMCP (Dorr, Mohammad): lexical semantic features (e.g., synonymy, antonymy) and use of belief for detection of contradiction, sentiment, entailment
- Columbia (Rambow, Passonneau): dialogic content; committed belief
- UMBC (Nirenburg, McShane): modality (epistemic, belief, volitive, etc.)
- Affiliated efforts: Ed Hovy, Martha Palmer, George Wilson (MITRE)

Definitions and Examples
Linguistic annotations: tags on raw data.
  From: <Email>EZB</Email> <Name>Ed Z Boss</Name>
  To: <Email>SAS</Email> <Name>Sara A Secky</Name>
  <Request-Action>Please <CB>request</CB> a meeting with <Name>Bob</Name> and <Name>Marla</Name> at <Time>2pm</Time> <Date>tomorrow</Date></Request-Action>.
Knowledge objects: representational entities over which a system may make inferences. Knowledge objects may be derived from linguistic annotations or from other indicators. Systems might ultimately infer person-person relationships, event-event relationships, sentiment, and other important information about the world.
  E1 = Meet(P2, P3, P4)
  T(E1) = <11/20/2001 14:00>
  P1 = Sara
  P2 = Ed
  Relation: Subordinate(P1, P2) [Conf=0.9]
  P3 = Bob
  P4 = Marla
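As a concrete illustration, here is a minimal Python sketch of how knowledge objects like E1, P1-P4, and the Subordinate relation might be represented so that a system can make inferences over them. The class and field names are illustrative assumptions, not the project's actual schema.

```python
# Minimal sketch of knowledge objects (illustrative schema, not the project's).
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class Person:
    pid: str
    name: str

@dataclass
class Event:
    eid: str
    predicate: str              # e.g., "Meet"
    participants: List[str]     # person ids
    time: Optional[datetime] = None

@dataclass
class Relation:
    rtype: str                  # e.g., "Subordinate"
    args: Tuple[str, str]
    confidence: float = 1.0     # confidence attached to the inferred relation

# Knowledge objects for the email example above.
people = [Person("P1", "Sara"), Person("P2", "Ed"),
          Person("P3", "Bob"), Person("P4", "Marla")]
e1 = Event("E1", "Meet", ["P2", "P3", "P4"], datetime(2001, 11, 20, 14, 0))
r1 = Relation("Subordinate", ("P1", "P2"), confidence=0.9)
```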

Phase I: Linguistic Annotation
- Modality: Sheikh Mohamed announced that "we want[modality=volitive; value=1] to make Dubai a new trading center."
- Temporal Types and Relations: Sheikh Mohamed announced[Past.Say, Before(<writer>)] that "we want[Present.State, After/Before/Concurrent(announced)] to make[Unspec.State, After(want)] Dubai a new trading center."
- Committed Belief: Sheikh Mohamed announced[CB] that "we want[NCB] to make[NA] Dubai a new trading center."
- Dialog Acts: Please let me know what you'd like me to do [Request: answer to [or(M1.5,M1.6)]] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug)
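The sketch below shows one assumed way to hold several of these annotation layers as stand-off labels over the same token sequence; the dictionary layout and span indices are illustrative, not the project's annotation format.

```python
# Stand-off annotation layers over one sentence (illustrative layout).
tokens = ["Sheikh", "Mohamed", "announced", "that", "we", "want",
          "to", "make", "Dubai", "a", "new", "trading", "center"]

# Each layer maps a (start, end) token span to its labels, independently of
# the other layers, mirroring the modality / temporal / belief examples above.
layers = {
    "modality": {(5, 6): {"type": "volitive", "value": 1.0}},
    "temporal": {(2, 3): {"type": "Past.Say", "relation": "Before(<writer>)"},
                 (5, 6): {"type": "Present.State",
                          "relation": "After/Before/Concurrent(announced)"},
                 (7, 8): {"type": "Unspec.State", "relation": "After(want)"}},
    "belief":   {(2, 3): "CB", (5, 6): "NCB", (7, 8): "NA"},
}

# Collect every label that touches the token "want" (index 5).
want_labels = {name: lab for name, layer in layers.items()
               for span, lab in layer.items() if span[0] <= 5 < span[1]}
print(want_labels)
```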

Annotation Data and Task
Selection and manual annotation of 10,000 words of English and Arabic covering a variety of topics/genres, with some parallel views on the same topics. Rule of thumb: when annotating an Arabic document and a parallel English translation, the representations produced for those two documents should be fundamentally the same. The representations should not be just a syntactic parse; they should contain meaning units that go beyond surface-form issues.

Annotation Corpora Features
- Multilingual: IAMTC, IBM hand-aligned corpus, Harmony
- Multi-translation: IAMTC
- Multi-document for same entities: Enron, AQUAINT, Arabic Gigaword
- Conversation: Switchboard, OntoNotes news conversation, Enron, Harmony
- Persuasion: Enron, Indianapolis museum request to join society
- Correspondence: Enron, Indianapolis museum request to join society
- Instructions: Enron, Harmony
- Opinions: Switchboard

Detailed Example: Modality
The minister, {who has his own website}, also said: "I want [TYPE=VOLITIVE, VALUE=.8] Dubai to be the best [TYPE=EVALUATIVE, VALUE=1] place in the world for state-of-the-art technology companies."
The minister {who has a personal website on the internet}, further said that he wanted [TYPE=VOLITIVE, VALUE=1] Dubai to become the best [TYPE=EVALUATIVE, VALUE=1] place in the world for the advanced (hitech) technological companies.
Equivalence: Strict = 50%, Loose = 100%. (Note: {} units omitted for simplicity.)
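The strict vs. loose figures quoted above can be reproduced with a small script. This is a sketch under the assumption, spelled out in the later agreement slides, that "strict" requires identical modality values and "loose" allows values to differ by up to 0.2.

```python
# Strict vs. loose modality equivalence (assumed 0.2 tolerance for "loose").
def modality_equivalence(units_a, units_b, tol=0.2):
    """units_*: {text unit: (modality type, value)} for each translation."""
    shared = set(units_a) & set(units_b)
    strict = loose = 0
    for u in shared:
        (type_a, val_a), (type_b, val_b) = units_a[u], units_b[u]
        if type_a == type_b:
            strict += val_a == val_b
            loose += abs(val_a - val_b) <= tol
    return strict / len(shared), loose / len(shared)

e1 = {"want": ("VOLITIVE", 0.8), "best": ("EVALUATIVE", 1.0)}
e2 = {"want": ("VOLITIVE", 1.0), "best": ("EVALUATIVE", 1.0)}
print(modality_equivalence(e1, e2))   # (0.5, 1.0) -> Strict = 50%, Loose = 100%
```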

Detailed Example: Temporal Parse
E1: The minister who has(يملك) his own website also said(واضاف) I want(اريد) Dubai to be the best place(تصبح) in the world for companies.
E2: The minister who has(يملك) a personal website further said(واضاف) he wanted(اريد) Dubai to become the best place(تصبح) in the world for companies.

Time Unit      Type              Relation                    Parent
has(يملك)      Present.State*    After/Before/Concurrent     said
said(واضاف)    Past.Say*         After                       <announced>
want(اريد)     Present.State*+   After/Before/Concurrent     said
place(تصبح)    Unspec.State*     After                       want

Equivalence      English-English   Arabic-English
Type Match       75%               0%
Parent Match     100%              75%
Relation Match   100%              100%

Notes: This is the second sentence in the Dubai corpus (with both English and Arabic tokens). The time units, types, relations, and parents are spelled out in the Unit/Type/Relation/Parent table; mismatches are marked with *. The table at the bottom presents the overall equivalences.
The two English cases have almost an exact overlap, with the exception of the temporal type of "want" (marked with +), resulting in 75% Type equivalence. Note that the first text uses "want" and the second text uses "wanted", so one of the two texts labeled this unit as Past.State.
English-Arabic overlaps significantly less. Regarding Type matches, a labeling error appears to have taken "said" to be Past.Event instead of Past.Say, so this is a Type mismatch. There are two additional type mismatches due to labeling errors: "want" and "has" are marked as Events rather than States. It is unclear whether the Arabic labeling is wrong here; in fact, in several other English documents words like this are labeled as Events, not States. Finally, "place" is marked as Future.State rather than Unspec.State (again, it is unclear which is correct). In any case, these four tokens are considered mismatches, resulting in a 0% Type equivalence.
Regarding Parent matches, "said" has a mismatched Parent label of Before(<writer>) in Arabic, due to confusion over whether this should be tied back to the "announced" in the previous sentence or to the general speaker context; English uses After(announced). This Parent mismatch results in a 75% Parent match for Arabic-English. The corresponding Relation mismatch is not counted, because units with Parent mismatches are tossed out before computing the Relation matches (so three cases remain). Finally, in Arabic the word "want" is labeled Concurrent with "said", which is a partial match with the English After/Before/Concurrent(said). Thus 3 out of 3 relations are considered matched (100%).
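The type/parent/relation equivalence percentages in the table can be computed along the following lines. This is an assumed reconstruction of the scoring described in the notes (relations are only compared for units whose parents match, and a non-empty overlap of relation values counts as a match); the data structures are illustrative.

```python
# Temporal-parse equivalence over shared time units (illustrative scoring).
def temporal_equivalence(parse_a, parse_b):
    """parse_*: {time unit: (temporal type, parent, set of relation values)}."""
    shared = set(parse_a) & set(parse_b)
    type_eq = sum(parse_a[u][0] == parse_b[u][0] for u in shared) / len(shared)
    same_parent = [u for u in shared if parse_a[u][1] == parse_b[u][1]]
    parent_eq = len(same_parent) / len(shared)
    # Relations are only scored where the parents matched.
    rel_eq = (sum(bool(parse_a[u][2] & parse_b[u][2]) for u in same_parent)
              / len(same_parent)) if same_parent else 0.0
    return type_eq, parent_eq, rel_eq

e1 = {"has":   ("Present.State", "said", {"After", "Before", "Concurrent"}),
      "said":  ("Past.Say", "<announced>", {"After"}),
      "want":  ("Present.State", "said", {"After", "Before", "Concurrent"}),
      "place": ("Unspec.State", "want", {"After"})}
e2 = dict(e1, want=("Past.State", "said", {"After", "Before", "Concurrent"}))
print(temporal_equivalence(e1, e2))   # (0.75, 1.0, 1.0), the English-English case
```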

Detailed Example: Committed Belief
E1: The minister, who has(يملك) his own website(موقعا), also said(واضاف): "I want(اريد) Dubai to be(تصبح) the best place in the world for state-of-the-art technology companies."
E2: The minister who has(يملك) a personal website(موقعا) on the internet, further said(واضاف) that he wanted(اريد) Dubai to become(تصبح) the best place in the world for the advanced (hitech) technological companies.

Belief Unit          Type
has(يملك)            CB
website(الانترنت)*    CB
said(واضاف)          CB
want(اريد)           NCB
be(تصبح)             NA

Equivalence          Arabic-English   English-English
Belief Unit Match    80%              100%
Belief Type Match    100%             100%

Notes: The two English texts match exactly on belief units and belief types. There is one belief unit (marked with *) that was not labeled in Arabic, resulting in 80% equivalence for belief units, but the belief types matched for all labeled units (100% equivalence). As with other representations seen so far, it would be easy to imagine superimposing belief units on TMRs, specifically on the "belief" (epistemic) modality.

Detailed Example: Dialog Acts
#<M1.5># Do you want me to pay Pasadena on Friday for these things?
#<M1.6># or do you want me to hold off until I finish July and August [Request-info-either/or: commission to pay Pasadena or to delay paying Pasadena] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug).
#<M1.11># Please let me know what you'd like me to do [Request: answer to [or(M1.5,M1.6)]] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug)

Overall Assessment of Linguistic Annotations
- Modality: 95% manual agreement; 75% multi-translation equivalence; multi-language NA (no Arabic)
- Temporal Type: 100% multi-translation equivalence; 62% multi-language equivalence
- Inherent Time: 96% manual agreement; 93% multi-translation equivalence; 85% multi-language equivalence
- Temporal Parent: 85% manual agreement; 92% equivalence
- Temporal Relations: 97% manual agreement
- Committed Belief: manual agreement NA (not doubly annotated); 90.5% multi-language equivalence
- Dialog Acts: NA (no multi-translation); NA (no parallel docs)

Notes: The focus was on determining the agreement percentages (in the doubly-annotated cases), the equivalence of representations across multiple English translations (in the doubly-translated cases), and the equivalence of representations between Arabic and English (in the multi-language cases). Issue: it was impossible to judge agreement or equivalences for dialog acts because there were no doubly annotated documents and no multi-translation or parallel documents in the relevant genre (conversational text). The latter part of this talk focuses on a genre that brings in dialog acts.

Relation Between Linguistic Annotations
Committed belief ties into the TMR modalities (belief, epistemic, etc.). It is possible to map a TMR onto a temporal parse to link concepts, modality, and time:
SAID(واضاف) [Past.Say, After(<announced>)]
  HAS(يملك) [Present.State, After(SAID)]
  WANT(اريد) [Present.State, After/Before/Concurrent(SAID), volitive=.8]
    PLACE(تصبح) [Unspec.State, After(WANT), evaluative=1]
Notes: It is possible to write out temporal parses in a style similar to the structure of TMRs (linking concepts, modality, and time), where indentation means "child" and siblings have a temporal ordering (with respect to their parent as well as to each other). Sentence 1: The minister who has(يملك) his own website also said(واضاف) I want(اريد) Dubai to be the best place(تصبح) in the world for companies. Sentence 2: The minister who has(يملك) a personal website further said(واضاف) he wanted(اريد) Dubai to become the best place(تصبح) in the world for companies.
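A small sketch of the indented TMR-style structure described above, where each node carries a concept, a temporal type, its relation to its parent, and any modality values, and nesting encodes the child relation. The class is an illustrative assumption, not the project's TMR format.

```python
# TMR-style temporal parse: nesting = "child", siblings are temporally ordered.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TMRNode:
    concept: str
    temporal_type: str
    relation_to_parent: str = ""
    modality: Dict[str, float] = field(default_factory=dict)
    children: List["TMRNode"] = field(default_factory=list)

place = TMRNode("PLACE", "Unspec.State", "After(WANT)", {"evaluative": 1.0})
want = TMRNode("WANT", "Present.State", "After/Before/Concurrent(SAID)",
               {"volitive": 0.8}, [place])
has = TMRNode("HAS", "Present.State", "After(SAID)")
said = TMRNode("SAID", "Past.Say", "After(<announced>)", children=[has, want])

def show(node, depth=0):
    mod = f", {node.modality}" if node.modality else ""
    print("  " * depth + f"{node.concept} [{node.temporal_type}, "
          f"{node.relation_to_parent}{mod}]")
    for child in node.children:
        show(child, depth + 1)

show(said)
```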

Preliminary Automatic Results (Committed Belief, Latent Properties)
Automatic annotation, Committed Belief (CMU):
(1) Baseline: P=55.80, R=27.37, F=36.73
(2) Best contextual features: P=57.66, R=34.69, F=43.32
(3) Adding lemma to (2): P=60.98, R=33.88, F=43.55
(4) Adding POS to (3): P=52.94, R=46.34, F=49.42
(5) Adding POS to (4): R=45.53, F=47.46
(6) Adding ngram features to (5): P=49.62, R=42.82, F=49.84
(7) Combining (6) and (4): P=54.43, R=46.61, F=50.22
(8) Adding up to 4-character ngrams from the beginning and end of words to (2): P=57.77, F=51.43
(9) Combining (8) and (4): P=55.94, R=48.51, F=51.96
(10) Like (9) but context is different: P=57.19, F=52.49

Automatic annotation, Latent Properties (JHU): Gender 97% accuracy; Age 94% accuracy.

Notes: We are nearing the end of Phase I, wherein our plan is to automate the annotation process using ML approaches. Committed belief and latent properties are among the first attempts at automatic ML-based annotation in this project. Issue 1: despite increases of 20 points, the committed-belief numbers (supervised SVM approach) indicate that more research is needed. Issue 2: results for latent properties (lexical and non-lexical models) are already very high; additional properties will be addressed: education, occupation, descriptive properties, social relations, rank, affiliation, etc. (This is a combination SVM/Weka approach that takes advantage of non-lexical features such as mean utterance length, speaker rate, syntactic structure, and passive/active usage.)
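The feature progression in rows (1)-(10) can be illustrated with a small supervised token classifier. The sketch below is an assumed scikit-learn pipeline in the spirit of that experiment (contextual tokens, POS, and up-to-4-character prefixes/suffixes feeding a linear SVM), not the CMU system itself; the toy training data is made up.

```python
# Sketch of a supervised committed-belief token classifier (assumed pipeline).
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def token_features(tokens, pos_tags, i):
    tok = tokens[i]
    feats = {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",        # contextual features
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    for n in range(1, 5):                                         # cf. row (8)
        feats[f"prefix{n}"], feats[f"suffix{n}"] = tok[:n].lower(), tok[-n:].lower()
    return feats

# Toy data: each token labelled CB / NCB / NA, as in the Phase I examples.
sent = (["Sheikh", "Mohamed", "announced", "that", "we", "want", "to", "make", "Dubai"],
        ["NNP", "NNP", "VBD", "IN", "PRP", "VBP", "TO", "VB", "NNP"],
        ["NA", "NA", "CB", "NA", "NA", "NCB", "NA", "NA", "NA"])

X = [token_features(sent[0], sent[1], i) for i in range(len(sent[0]))]
y = list(sent[2])

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)
print(model.predict([token_features(sent[0], sent[1], 2)]))   # label for "announced"
```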

Phase II: Moving Toward Knowledge Units
From linguistic annotations (lexical relations, dialog units, committed belief, modality, temporal units, attitudes, sentiment, beliefs, intention) to knowledge units (person, event, personal attributes, relations, time(line)), each carrying a CONFIDENCE.

Matrix: Linguistic Annotation and Knowledge Units

Matrix: Linguistic Annotation and Knowledge Units (continued)

Entity-Centric Presentation Metaphor
FROM linguistic annotations: discourse units, lexical relations, named entities, committed belief, modality.
TO knowledge objects: attributes, person-person/event-event relations, sentiment and/or attitudes, beliefs, intention/motivation, state of affairs.
TASK: Populate descriptive facets (either pre-defined or dynamically created) associated with a person or event by reasoning over knowledge objects (see the sketch after this list).
EXAMPLES:
- Person-person relations: oppose, support, contradict, refute (related to sentiment, attitudes)
- Event-event relations: precursor, causal, consequence, super-event, etc.
- Personal attributes: name, age, height, DOB
- Likes, dislikes [sentiment/attitudes, relations such as oppose, contradict, etc.]
- People you know [relational information, e.g., sibling, co-worker, etc.]
- Groups [purpose/attitudes/sentiments of groups]
- Status [state of affairs]
- Activities [purpose/attitudes/sentiments of groups]
- Goals [desired states of affairs, intentions/motivations]
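As a sketch of the task described above, the function below populates a few of the listed facets for a person by scanning a set of knowledge objects. The input shapes and facet names are illustrative assumptions.

```python
# Populating entity-centric facets from knowledge objects (illustrative shapes).
from collections import defaultdict

def entity_profile(person_id, attributes, relations, events):
    """attributes: {pid: {facet: value}}; relations: (rtype, (a, b), confidence);
    events: (eid, predicate, participant ids)."""
    profile = defaultdict(list)
    profile["Personal Attributes"] = dict(attributes.get(person_id, {}))
    for rtype, (a, b), conf in relations:
        if person_id in (a, b):
            other = b if a == person_id else a
            profile["People You Know"].append((rtype, other, conf))
    for eid, predicate, participants in events:
        if person_id in participants:
            profile["Activities"].append((eid, predicate))
    return dict(profile)

profile = entity_profile(
    "P1",
    {"P1": {"Name": "Sara"}},
    [("Subordinate", ("P1", "P2"), 0.9)],
    [("E1", "Meet", ["P2", "P3", "P4"])],
)
print(profile)   # P1's attributes and known people; no activities (P1 not in E1)
```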

Exercise in October
From dialogic text, can we derive:
- A set of entities
- A set of events
- A plan or desired state of affairs
- A set of relations among entities (or events)
- Times of events (or plans)
- Biographical information about entities

Derivation of knowledge units may involve:
- Mapping from a particular linguistic annotation type
- Mapping from combinations of linguistic annotation types
- Neither of the above: it might be possible to derive knowledge units directly from text (e.g., inferring personal attributes from lexical or non-lexical information)

Building a Plan from Knowledge Units
Email thread:
  Subject: Meeting with Bob and Marla | Date: Mon, 19 Nov 2001 08:38:25 | From: Ed Z. Boss | To: Sara A Secky
  "Please request a meeting with Bob and Marla at 2pm tomorrow. - Ed"
  Subject: Re: Meeting with Bob and Marla | Date: Mon, 19 Nov 2001 08:50:05 | From: Sara A Secky | To: Ed Z. Boss
  "How about Wednesday at 3pm? - Sara"
  Subject: Re: Meeting with Bob and Marla | Date: Mon, 19 Nov 2001 09:20:50 | From: Ed Z. Boss | To: Sara A Secky
  "Fine. See you then."
Derived knowledge units:
  P1 = Sara, P2 = Ed, P3 = Bob, P4 = Marla
  Relation: Subordinate(P1, P2) [Conf=0.9]
  Relation: Subordinate(P1, P3) [Conf=0.7]
  Relation: Subordinate(P1, P4) [Conf=0.7]
  E1 = Meet(P2, P3, P4)
  T(E1) = <11/20/2001 14:00>
  T(E1) = <11/21/2001 15:00>
  P1: Gender: M, Age: 50; P2: Gender: F, Age: 32
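As a sketch of how the plan's time slot for E1 gets its final value from this thread: the initial request proposes a time, the counter-proposal supersedes it, and the acceptance fixes it. The dialog-act labels below are illustrative; only the dates come from the thread above.

```python
# Revising the meeting time of E1 = Meet(P2, P3, P4) as the thread unfolds.
from datetime import datetime

# (dialog act, proposed time) in thread order; the act labels are illustrative.
moves = [
    ("request", datetime(2001, 11, 20, 14, 0)),   # "2pm tomorrow" (sent Nov 19)
    ("counter", datetime(2001, 11, 21, 15, 0)),   # "How about Wednesday at 3pm?"
    ("accept", None),                             # "Fine. See you then."
]

meeting_time = None
for act, proposed in moves:
    if act in ("request", "counter"):
        meeting_time = proposed        # the latest proposal supersedes earlier ones
    elif act == "accept":
        break                          # acceptance commits to the current proposal

print(meeting_time)                    # 2001-11-21 15:00 -> T(E1) = <11/21/2001 15:00>
```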

Corpus Example Involving Dialog Units
M1.1. Kim:
M1.2. I have completed the invoices for April, May and June
M1.3. and we owe Pasadena each month for a total of $3,615,910.62.
M1.4. I am waiting to hear back from Patti on May and June to make sure they are okay with her.
M1.5. Do you want me to pay Pasadena on Friday for these months
M1.6. or do you want me to hold off until I finish July and August?
M1.7. Again, I do not have all of the information for July and August,
M1.8. so I cannot give you any numbers.
M1.9. If I go by what is currently in the system as a guide, Pasadena would owe Enron a little over $1 mil.
M1.10. I need to forecast the money today,
M1.11. so please let me know what you'd like me to do.
M4.1. Patti is the one with the details,
M4.2. I'm just the deal maker
M4.3. and don't have access to any of the systems.
M4.4. All I know is what fixed priced baseload deals we have.
M4.9. Kim

Annotation Example for Exercise
Layers shown: Committed Belief, Discourse, Modality, and temporal parse (ID, Type, Time, Parent, TempRel).
138  #<M1.11>#  [Request: answer to [or(M1.5,M1.6)]] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug)
139  so
140  please
141  let       CB(141-143)      32  B  C  29
142  me
143  know                       33  U  A
144  what
145  you
146  would     CB
147  like      NA               34  BCA
148  to
149  do        volitive = .7    35  E
150  #</M1.11>#
Relation: Subordinate(P1, P2)
Legend: B = State, C = Current time, CB = committed belief; Request Modality = volitive(.7).
Question: What would the final plan be, and how do P1, P2, E1, and E2 play a role?

Resulting Knowledge Unit Diagram from October Exercise
Entities: P1 = Kim Ward (deal-maker, no access to the systems), P2 = Megan, P3 = Patti (uses the Enron financial systems), P4 = Janinie; each entity carries largely unfilled latent-property slots (age, gender, education, first language, nationality).
Organizations: Org1 = HQ Financial Services and forecaster (Loc: Pasadena); Org2 = Enron.
Relations among entities and organizations (with confidences, e.g., Subordinate at 0.6): Communicate-with, Subordinate, Employed-By, Refuse-request, Owe $X / Owe $Y.
Plan: Pay1(Org2, Org1, Amt1); NotTime(Now, Pay1); Time(> T1, Pay1); T1 = <Jul-Aug>; Amt1 = F($Y, $X).

Summary and Next Steps
- Phase I has resulted in linguistic annotations with relatively high language equivalences for the multi-translation and multi-language cases.
- Preliminary results for automatic annotation of latent properties and committed belief indicate that these are promising avenues for continuing research.
- Phase II will focus on automatically induced knowledge units that may be derived from linguistic annotations or from independent properties of the input text. We expect the automatically produced knowledge objects to be crucial for language analysis systems and to improve the performance of those systems.
- Areas of focus for the upcoming Phase II include personal attributes (latent properties), person-person relations, and state of affairs (which may include belief/intention/sentiment). Confidence values are another critical aspect of information that may enable a more focused analysis of incoming data.

References
- Mohammad, Saif, Bonnie J. Dorr, and Graeme Hirst. "Towards Antonymy-Aware Natural Language Applications." Proceedings of the NSF Symposium on Semantic Knowledge Discovery: Organization and Use, New York University, November 2008.
- McShane, Marjorie, Sergei Nirenburg, and Stephen Beale. "Paraphrasing for Memory Management in Conversational Agents." Proceedings of the AAAI Fall Symposium on Naturally Inspired AI, Arlington, VA, November 2008.
- Mohammad, Saif, Bonnie Dorr, and Graeme Hirst. "Computing Word-Pair Antonymy." Proceedings of EMNLP-2008.
- McShane, Marjorie, Sergei Nirenburg, and Stephen Beale. "Resolving Paraphrases to Support Modeling Language Perception in an Intelligent Agent." Presented at STEP 2008, Venice, September 2008.
- Dorr, Bonnie J., David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Owen Rambow, Florence Reeder, and Advaith Siddharthan. "Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation." Under review for JNLE, 2008.
- Nyberg, Eric, Eric Riebling, Richard C. Wang, and Robert Frederking. "Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP." LREC 2008, Marrakech, Morocco, May 31, 2008.

Reserve Slides

Detailed Example: Modality
The minister, {who has his own website}, also said: "I want Dubai to be the best place in the world for state-of-the-art technology companies."
  ASSERTIVE-ACT-70 (say)
  MINISTER-67 (minister)
  MODALITY-71 (want) [TYPE=VOLITIVE, VALUE=.8]
  MODALITY-6 () [TYPE=EVALUATIVE, VALUE=1]
  GEO-POL-ENT-74 (Dubai)
  FOR-PROF-CORP-79 (company)
  TECHNOLOGY-78 (technology)
The minister {who has a personal website on the internet}, further said that he wanted Dubai to become the best place in the world for the advanced (hitech) technological companies.
  ASSERTIVE-ACT-941 (say)
  MINISTER-936 (minister)
  DISCOURSE-940 (further)
  MODALITY-942 (want) [TYPE=VOLITIVE, VALUE=1]
  CHANGE-EVENT-944 (become)
  MODALITY-2 () [TYPE=EVALUATIVE, VALUE=1]
  GEO-POL-ENT-943 (Dubai)
  FOR-PROF-CORP-949 (company)
Equivalence: Strict = 50%, Loose = 100%. (Note: {} units omitted for simplicity.)

Detailed Example: Temporal Parse
E1: The minister who has(يملك) his own website also said(واضاف) I want(اريد) Dubai to be the best place(تصبح) in the world for companies.
E2: The minister who has(يملك) a personal website further said(واضاف) he wanted(اريد) Dubai to become the best place(تصبح) in the world for companies.

Time Unit      Type              Relation                    Parent
has(يملك)      Present.State*    After/Before/Concurrent     said
said(واضاف)    Past.Say*         After                       <announced> (prev sent)*
want(اريد)     Present.State*+   After/Before/Concurrent     said
place(تصبح)    Unspec.State*     After                       want

Equivalence      English-English   Arabic-English
Type Match       75%               0%
Parent Match     100%              75%
Relation Match   100%              100%

(Graphical form of the table: <announced>, said, has, want, and place arranged along a TIME axis.)

Notes: This is the second sentence in the Dubai corpus (with both English and Arabic tokens). The time units, types, relations, and parents are spelled out in the Unit/Type/Relation/Parent table; mismatches are marked with *. The table at the bottom presents the overall equivalences.
The two English cases have almost an exact overlap, with the exception of the temporal type of "want" (marked with +), resulting in 75% Type equivalence. Note that the first text uses "want" and the second text uses "wanted", so one of the two texts labeled this unit as Past.State.
English-Arabic overlaps significantly less. Regarding Type matches, a labeling error appears to have taken "said" to be Past.Event instead of Past.Say, so this is a Type mismatch. There are two additional type mismatches due to labeling errors: "want" and "has" are marked as Events rather than States. It is unclear whether the Arabic labeling is wrong here; in fact, in several other English documents words like this are labeled as Events, not States. Finally, "place" is marked as Future.State rather than Unspec.State (again, it is unclear which is correct). In any case, these four tokens are considered mismatches, resulting in a 0% Type equivalence.
Regarding Parent matches, "said" has a mismatched Parent label of Before(<writer>) in Arabic, due to confusion over whether this should be tied back to the "announced" in the previous sentence or to the general speaker context; English uses After(announced). This Parent mismatch results in a 75% Parent match for Arabic-English. The corresponding Relation mismatch is not counted, because units with Parent mismatches are tossed out before computing the Relation matches (so three cases remain). Finally, in Arabic the word "want" is labeled Concurrent with "said", which is a partial match with the English After/Before/Concurrent(said). Thus 3 out of 3 relations are considered matched (100%).

Detailed Example: Committed Belief
E1: The minister, who has(يملك) his own website(موقعا), also said(واضاف): "I want(اريد) Dubai to be(تصبح) the best place in the world for state-of-the-art technology companies."
E2: The minister who has(يملك) a personal website(موقعا) on the internet, further said(واضاف) that he wanted(اريد) Dubai to become(تصبح) the best place in the world for the advanced (hitech) technological companies.

Belief Unit          Type
has(يملك)            CB
website(الانترنت)*    CB
said(واضاف)          CB
want(اريد)           NCB
be(تصبح)             NA

Equivalence          Arabic-English   English-English
Belief Unit Match    80%              100%
Belief Type Match    100%             100%

Notes: The two English texts match exactly on belief units and belief types. There is one belief unit (marked with *) that was not labeled in Arabic, resulting in 80% equivalence for belief units, but the belief types matched for all labeled units (100% equivalence). As with other representations seen so far, it would be easy to imagine superimposing belief units on TMRs, specifically on the "belief" (epistemic) modality.

Detailed Example: Dialog Acts
#<M1.5># Do you want me to pay Pasadena on Friday for these things?
#<M1.6># or do you want me to hold off until I finish July and August [Request-info-either/or: commission to pay Pasadena or to delay paying Pasadena] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug).
#<M1.11># Please let me know what you'd like me to do [Request: answer to [or(M1.5,M1.6)]] #flink1.5 (commission to pay Pasadena now) or (commission to pay Pasadena after Jul-Aug)

Manual Agreement Percentages for Modality
Item / Strict / Loose* / Example:
- Belief (13): 96%, 46%, 100%; Expect = .9
- Epistemic (8): 94%, 63%, 88%; May = .8
- Epiteuctic (5): 91%, 34%, 95%; Get = 1
- Obligative (62): 99%, 98%; Should = .7
- Permissive (15): 60%; Allow = .8
- Potential (31): 74%, 87%; Can = .8
- Volitive (19): 37%; Prefer = .7
- Evaluative (11): Best = 1
- OVERALL (164): 97%, 52%

Notes: Agreement percentages were computed for modality on the basis of the documents that were double-annotated by UMBC (150 double-annotated sentences from the 10,000-word corpus). The numbers shown were computed over the entire set of texts. The "Item" column shows the percentage of units where the annotators agreed on the item to be labeled; this is an F-score, where one annotator is taken as the gold standard and the other is measured against that standard (it is possible to swap these, or to compute both and take an average). The "Strict" column requires an exact match of labels for those items that were agreed upon by the annotators; this number is based on the strict intersection of modality annotations over the number of labeled units. The "Loose" column requires labels that are within .2 for those items that were agreed upon; this number is based on a loose intersection of modality annotations over the number of labeled units. There is 95% "loose" agreement overall; most mismatches are due to minor differences in modality values, e.g., "want" is volitive = 0.7 vs. 0.8. Highest strict mismatches: Volitive ("want") and Potential ("can"). Highest loose mismatches: Epiteuctic ("get"), Obligative ("should"), Belief ("expect").
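A minimal sketch of the "Item" column described in the notes: an F-score over which units the two annotators chose to label, taking one annotator as the gold standard. This is an assumed reconstruction of the scoring, not the project's script.

```python
# F-score over the sets of units each annotator chose to label.
def item_fscore(units_gold, units_test):
    gold, test = set(units_gold), set(units_test)
    overlap = len(gold & test)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(test), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

annotator_a = {"expect", "may", "should", "want"}   # toy unit sets
annotator_b = {"expect", "may", "should", "can"}
print(item_fscore(annotator_a, annotator_b))        # 0.75
```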

Language Equivalences: Modality
Item / Strict / Loose* / Example:
- Potential/Permissive (1): 100%, 0%; Can/Allow = .8
- Volitive (2): 50%; Want = .7
- Evaluative (1): Best = 1
- OVERALL (4): 75%

Notes: UMBC annotated the two English versions of the Dubai text for modality (three sentences only; the Arabic version was not attempted). Results of MR equivalence across these translations are shown above. The "Item" column shows the percentage of units for which the documents contained the same items to be labeled; the modality units overlapped exactly on four items (100%). The "Strict" column requires an exact match of labels for those items agreed upon (strict intersection of modality annotations over the number of labeled units). The "Loose" column requires labels within .2 for those items agreed upon (loose intersection over the number of labeled units). Findings: overall loose agreement is 75%. Non-equivalence occurred for subtle language distinctions, permissive ("would be able") vs. potential ("would be possible").

Manual Agreement Percentages: Temporal Parsing

                     Matches   Clashes   Agreement
Temporal Type        104       6         94.5%
Inherent Time        106       4         96.4%
Parent Pointer       93        17        84.5%

                     Exact Match   Partial Match   Clashes   Exact Agree   Partial Agree
Temporal Relations   81            9               3         87.1%         96.8%

Notes: BBN doubly annotated four files for temporal parsing information: 410_nyt, 419_apw, 602CZ, ENRON. There were 110 cases that both annotators marked as time units; they agreed well over 90% of the time on which units to label. First table: sticking to the 110 units they agreed upon, it scores agreement on the choice of temporal type, inherent time, and parent node. The relatively low parent-pointer scores reflect some hard choices about when a sentence should link back to the previous sentence vs. linking to the general speaker context; BBN had trouble coming up with a clear definition of this distinction in the guidelines. Second table: for the 93 cases where the annotators picked the same parent node, it scores their choice of temporal relation type. Temporal relation values can be multiple; if the child's time is both before and concurrent with the parent, the relation is coded "BC". So "exact match" means the same set of values, while a "partial match" would be if one annotator said "B" and the other said "BC". Findings: very high agreement (about 97%); the largest contributing factor in mismatches was parent pointers, i.e., when a sentence should link back to the previous sentence vs. the general speaker context (the guidelines were unclear on this point).
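The exact vs. partial figures for temporal relations can be reproduced as below, treating each relation value like "BC" as a set (here {"B", "C"}); an exact match is set equality and a partial match is any non-empty overlap. The counts are taken from the table above.

```python
# Exact vs. partial agreement on (possibly multi-valued) temporal relations.
def relation_agreement(pairs):
    """pairs: list of (set_a, set_b) relation-value sets for shared units."""
    exact = sum(a == b for a, b in pairs)
    partial = sum(bool(a & b) for a, b in pairs)
    return exact / len(pairs), partial / len(pairs)

# 81 exact matches, 9 partial (e.g., {"B"} vs. {"B", "C"}), 3 clashes = 93 pairs.
pairs = ([({"A"}, {"A"})] * 81 + [({"B"}, {"B", "C"})] * 9 + [({"A"}, {"B"})] * 3)
exact, partial = relation_agreement(pairs)
print(f"{exact:.1%}, {partial:.1%}")   # 87.1%, 96.8%
```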

Language Equivalences: Temporal Parsing

                     Matches   Clashes   Equivalence
Temporal Type        21        6         77.8%
Inherent Time        24        3         88.9%
Parent Pointer       24        3         88.9%

                     Exact Match   Partial Match   Clashes   Exact Equiv   Partial Equiv
Temporal Relations   21            2               1         87.5%         95.8%

Notes: BBN annotated the two English versions of the Dubai text, and also the Arabic-English versions of the same text, for temporal type, inherent time, parent pointer, and temporal relations (three sentences only, as in the case of modality). The annotations agreed on 27 overlapping labeled units. First table: sticking to these 27 units, it scores the representational equivalence for temporal type, inherent time, and parent node. The relatively low parent-pointer scores reflect hard choices about when a sentence should link back to the previous sentence vs. the general speaker context; BBN had trouble coming up with a clear definition of this distinction in the guidelines. Second table: for the 24 cases where the parent node matched, it scores the overlap of temporal relation type. Temporal relation values can be multiple; if the child's time is both before and concurrent with the parent, the relation is coded "BC", so "exact match" means the same set of values, while a "partial match" would be if one annotation said "B" and the other said "BC". Findings: equivalence was much higher for multi-translations of the same document (around 100% for temporal type, inherent time, and parent pointer) and much lower for the multilingual English/Arabic case (as low as 54% for temporal type, and high 70s to low 80s for inherent time and parent pointers); the averages above hide this point. Temporal relations were 100% equivalent (both exact and partial) for the multi-translation case, but lower (70% exact, 90% partial) for the multilingual case; again, the averages above hide this point. Mismatches were primarily due to annotator confusion over (1) States vs. Events (e.g., "want" and "make" labeled as States in English but Events in Arabic), (2) Events vs. Say (e.g., "say" labeled as Say in English but Event in Arabic), (3) tense errors (e.g., "said" is Past in English but Present in Arabic), and unclear guidelines about parent pointers.

Language Equivalences: Committed Belief

                                           Labeled   Text 1 only   Text 2 only   F-Score
Multilingual (English/Arabic)              21        18            2             67.7%
Multi-Translation (English 1, English 2)   28        11                          83.6%
OVERALL                                    49        13                          76.0%

                                           Assigned   Match   Equivalence
Multilingual (English/Arabic)              21         19      90.5%
Multi-Translation (English 1, English 2)   28         28      100.0%
OVERALL                                    49         47      95.9%

Notes: CMU and Columbia annotated the Dubai text for committed belief in English and Arabic (multilingual) and in a second English translation of the Arabic (multi-translation). There were no doubly-annotated documents, so this analysis covers only the multi-translation and multi-language equivalence cases. First table: there were 21 cases labeled with "belief" in the multilingual texts, representing a 68% overlap of belief-labeled units in the multilingual case, and 28 cases labeled with "belief" in the multi-translation texts, representing an 84% overlap in the multi-translation case. (The recall, precision, and F-score numbers are computed by taking one annotator as the gold standard and the other as the test case measured against that standard; it is possible to swap these, or to compute both and take an average.) Second table: sticking to the overlapping 21 and 28 labeled units for the multilingual and multi-translation texts, respectively, it shows the degree of equivalence of the belief annotations. Findings: overlap was adversely impacted by cases where one of the two English texts omitted words, e.g., "the cost of X was estimated at Y dollars" vs. "X was estimated at Y dollars" (i.e., "cost" was labeled in one text but not the other). Some multi-language inconsistencies: English "estimated at" = CB in English, NCB in Arabic; English "be possible" = NA in English, NCB in Arabic. Language distinctions: different or omitted tokens for English "engaged in" and "including" caused mismatches.