
1 TDT 2004 Evaluation Workshop, NIST, December 2-3, 2004
Creating the TDT5 Corpus and 2004 Evaluation Topics at LDC
Stephanie Strassel, Meghan Glenn, Junbo Kong
Linguistic Data Consortium
{strassel, mlglenn, junbok}@ldc.upenn.edu
www.ldc.upenn.edu/Projects/TDT5

2 What’s new in TDT5?
- Same fundamental concepts
  - Story, event, topic
- New multilingual corpus
  - Much larger than previous corpora
  - Newswire only
- New topic selection strategy, more topics
  - 250 topics; ~25% multilingual
- New topic labeling strategy
  - Search-guided, but time-limited
- New annotation toolkit and infrastructure
  - Multilingual, multiplatform, database-backed
  - Highly customized for the TDT task

3 Basic Concepts
- STORY
  - In TDT2, a story is “a section containing at least two independent declarative clauses on same topic”
  - In TDT3, the definition was modified to capture annotators’ intuitions about what constitutes a story
    - Distinguish a “preview/teaser” from a complete news story
  - TDT4 preserves this content-based story definition
  - In TDT5, there is no manual story segmentation
    - Newswire comes with story boundaries; all documents are stories
- EVENT
  - A specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences
- TOPIC
  - An event or activity, along with all directly related events and activities

4 Corpus Overview
- April – September 2003
- Newswire only
- Translations provided by ISI (thanks to Ignacio Thayer & Kevin Knight)
- Distributed to sites in early September by LDC & NIST

5 Topic Selection Strategy
- Source/date-balanced “seed” story lists
  - 30,239 seeds generated (7.42% of corpus)
  - 12,415 reviewed
- Seeds that describe an event become candidate topics
  - 3,106 candidate topics identified
  - For all candidates, annotators record:
    - Title, seminal event, who/what/when/where
    - Estimated topic size
    - Multilingual potential
- Candidate topics reviewed for suitability as final topics
  - Exclude same-language exact duplicates
  - Hierarchical or overlapping topics are not avoided, but no extra effort is made to include them
  - Select a range of topic types and sizes
    - No avoidance of “singletons”
  - Also consider annotator preferences
    - Later feeds into topic definition
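The slide says only that seed lists were balanced by source and date. One plausible way to build such a list is stratified sampling: group stories into (source, date) buckets and draw the same number from each. The sketch below assumes that approach; the `Story` record, `balanced_seeds`, and its parameters are hypothetical and are not LDC's actual tooling.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Story(NamedTuple):
    doc_id: str
    source: str  # newswire source identifier
    date: str    # publication date, e.g. "2003-04-01"


def balanced_seeds(stories, per_bucket, rng_seed=0):
    """Draw up to `per_bucket` random seed stories from each (source, date) bucket.

    Bucketing by source and date keeps any one newswire or news day from
    dominating the seed list. `per_bucket` and `rng_seed` are hypothetical
    parameters, not values from the TDT5 process.
    """
    buckets = defaultdict(list)
    for story in stories:
        buckets[(story.source, story.date)].append(story)

    rng = random.Random(rng_seed)  # fixed seed so the list is reproducible
    seeds = []
    for key in sorted(buckets):    # deterministic bucket order
        group = buckets[key]
        seeds.extend(rng.sample(group, min(per_bucket, len(group))))
    return seeds
```

As a sanity check on the slide's figures, 30,239 seeds at 7.42% of the corpus implies roughly 30,239 / 0.0742 ≈ 407,500 stories in TDT5.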

6 2004 Evaluation Topics
- 250 final topics selected from candidates
- Equal balance across languages
  - Chart categories: English-Arabic, English-Chinese, English-Arabic-Chinese

7 Topic Research
- Annotator spends up to 1 hour per topic searching the web for information
  - Fills in missing details
  - Provides context and scope
- Annotators specialize in particular topics (of their choosing)
- Annotator creates a topic profile
  - Brief narrative plus information such as timelines, maps, keywords, named entities, and links to other online resources
  - Feeds directly into later annotation queries
- Completed for each evaluation topic

8 Topic Explication
- After topic research, the annotator provides a topic explication
- Apply a rule of interpretation to convert the event into a topic
  - 13 rules state, for each type of seminal event, what other types of events are related
- Example: rule 4, Natural Disasters (e.g., 30002: Hurricane Mitch)
  - Seminal events include: weather events (El Niño, tornadoes, hurricanes, floods, droughts), other natural events such as volcanic eruptions, wildfires, famines and the like, rescue efforts, and coverage of the economic or human impact of the disaster.
  - Topic includes: the causal (weather/natural) activity, including predictions thereof; the disaster itself; victims and other losses; evacuations; and rescue/relief efforts.

9 Topic Definition
- After topic research and topic explication are complete, the annotator creates the final topic definition
- Fixed format to enhance consistency
  - Seminal event
  - Who/what/when/where
  - Topic explication
  - Rule of interpretation link
  - Topic research link
  - Seed story link
- Feeds directly into topic annotation
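The fixed format lends itself to a simple record type. The sketch below is a hypothetical Python rendering of the fields listed on the slide (the class name, field names, and the `missing_fields` and `render` helpers are illustrative assumptions, not LDC's schema); a fixed schema makes it easy to flag definitions with blank fields before annotation begins and to print every topic in the same layout.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TopicDefinition:
    """Hypothetical fixed-format topic definition (field names are illustrative)."""
    topic_id: str
    seminal_event: str
    who: str
    what: str
    when: str
    where: str
    explication: str              # how the rule of interpretation applies here
    rule_of_interpretation: str   # link to the applicable rule
    topic_research: str           # link to the topic research profile
    seed_story: str               # link to the seed story

    def missing_fields(self):
        """Names of fields that are empty or whitespace-only, in schema order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Render the definition in one fixed layout for every topic."""
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )
```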

10 Annotation Strategy
- Overview
  - Search-guided annotation
  - One topic at a time
  - Multiple stages for each topic
  - Two-way topic labeling decision
  - Time-limited: no more than 3 hours per topic
    - Annotation may be incomplete for a given topic
- Relevance labels
  - YES: story discusses the topic in a substantial way
  - NO: story does not discuss the topic at all, or only mentions it in passing without giving any information about it
  - No BRIEF label in TDT4 or TDT5
  - “Difficult Decision” label for tricky decisions
- Completeness judgment
  - Each topic is also marked “complete” or “incomplete” at the conclusion of annotation
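The labeling scheme above is small enough to capture in a few types. This is a minimal sketch, assuming a per-topic record that stores two-way YES/NO judgments, the optional “Difficult Decision” flag, elapsed time against the 3-hour cap, and the final completeness judgment; `Relevance`, `Judgment`, `TopicAnnotation`, and their members are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum

TIME_LIMIT_MINUTES = 180  # no more than 3 hours per topic


class Relevance(Enum):
    YES = "YES"  # discusses the topic in a substantial way
    NO = "NO"    # not about the topic, or only a passing mention


@dataclass
class Judgment:
    doc_id: str
    label: Relevance
    difficult: bool = False  # "Difficult Decision" flag


@dataclass
class TopicAnnotation:
    topic_id: str
    judgments: list = field(default_factory=list)
    minutes_used: int = 0
    complete: bool = False  # completeness judgment, set when annotation ends

    def label(self, doc_id, relevance, difficult=False):
        self.judgments.append(Judgment(doc_id, relevance, difficult))

    def spend(self, minutes):
        """Log annotation time, refusing to go past the per-topic cap."""
        if self.minutes_used + minutes > TIME_LIMIT_MINUTES:
            raise ValueError("per-topic time limit exceeded")
        self.minutes_used += minutes

    def on_topic(self):
        return [j.doc_id for j in self.judgments if j.label is Relevance.YES]
```

Because there is no BRIEF label, `Relevance` has only two members; a story that merely mentions the topic in passing is simply NO.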

11 Annotation Search Stages
- Stage 1: Initial query (60 minutes)
  - Submit seed story as query to search engine
  - Read through the resulting relevance-ranked list of 200 documents
  - Label each story YES/NO
  - Stop after finding 5-10 on-topic (OT) stories, or
  - After reaching the “off-topic threshold”:
    - At least 2 off-topic stories read for every 1 OT story read, AND
    - The last 10 consecutive stories are off-topic
- Stage 2: Topic profile-based queries (45 minutes)
  - Issue a new query drawn from text within the topic research and topic definition
  - Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold
- Stage 3: Improved query using stories from Stages 1-2 (45 minutes)
  - Issue a new query using a concatenation of all or some known OT stories
  - Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold
- Stage 4: Creative searching (30 minutes)
  - Free (iterative) text query
  - Annotators instructed to use specialized knowledge and think creatively to find novel ways to identify additional OT stories
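The Stage 1 stopping rule can be expressed as a single pass over the ranked list. The sketch below assumes the judgments arrive in rank order as booleans (True = on-topic), reads the “5-10 on-topic stories” goal as a configurable target defaulting to 5, and requires both off-topic threshold conditions to hold at once; `stage1_stop_index`, `STAGE_BUDGETS`, and the parameter names are hypothetical. The per-stage budgets from the slide are included to show that they add up to the 3-hour per-topic cap.

```python
# Time budget per search stage, in minutes (60 + 45 + 45 + 30 = 180).
STAGE_BUDGETS = {
    "stage1_initial_query": 60,
    "stage2_profile_queries": 45,
    "stage3_improved_query": 45,
    "stage4_creative_search": 30,
}


def stage1_stop_index(labels, target_on_topic=5, ratio=2, run=10, max_docs=200):
    """Return how many ranked stories are read before Stage 1 stops.

    labels: judgments in rank order, True = on-topic (OT).
    Stops when `target_on_topic` OT stories have been found, or when the
    off-topic threshold is reached: at least `ratio` off-topic stories per
    OT story read AND the last `run` stories were all off-topic.
    """
    on_topic = off_topic = streak = 0
    for read, is_on_topic in enumerate(labels[:max_docs], start=1):
        if is_on_topic:
            on_topic += 1
            streak = 0           # an OT story breaks the off-topic run
        else:
            off_topic += 1
            streak += 1
        if on_topic >= target_on_topic:
            return read
        if off_topic >= ratio * on_topic and streak >= run:
            return read
    return min(len(labels), max_docs)  # list exhausted without stopping
```

For example, a list that opens with 3 on-topic stories followed only by off-topic ones stops after 13 stories read: the 10 off-topic stories satisfy both the 2:1 ratio (10 ≥ 6) and the 10-story run.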

12 “Hits” by Query Type

13 Annotation Time per Topic

14 Annotation Time, Topic Completeness & Topic Size

15 Additional Annotation & QC
- Top-Ranked Off-Topic Stories (TROTS)
  - By community consensus, not provided in 2004
- Precision
  - All on-topic (YES) stories reviewed by a senior annotator to identify false alarms
  - All “not easy” off-topic stories reviewed
- Adjudication
  - Review pooled site results and adjudicate cases where sites disagree with LDC annotators’ judgments
  - Pooled 4 sites’ tracking results
  - Reviewed all purported LDC false alarms (FAs)
  - Reviewed a portion of purported LDC misses
  - Priorities:
    - 4/4 sites disagree with LDC
    - 3/4 sites disagree with LDC
    - Incomplete topics
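The review priorities can be read as a sort key over pooled cases. This is a minimal sketch, assuming each case records the LDC judgment, one boolean per participating site, and whether the topic was marked complete. It further assumes (not stated on the slide) that the incomplete-topic tier covers cases where at least one site disagrees with LDC, and that cases outside all three tiers are not queued. `PooledCase`, `review_tier`, and `adjudication_queue` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PooledCase:
    topic_id: str
    doc_id: str
    ldc_label: bool      # LDC judgment, True = on-topic
    site_labels: tuple   # one judgment per site (4 sites pooled in 2004)
    topic_complete: bool


def disagreements(case):
    """Number of sites whose judgment differs from LDC's."""
    return sum(label != case.ldc_label for label in case.site_labels)


def review_tier(case):
    """0: all sites disagree; 1: all but one disagree; 2: incomplete topic; else None."""
    d = disagreements(case)
    if d == 0:
        return None              # every site agrees with LDC: nothing to adjudicate
    n_sites = len(case.site_labels)
    if d == n_sites:
        return 0
    if d == n_sites - 1:
        return 1
    if not case.topic_complete:
        return 2
    return None


def adjudication_queue(cases):
    """Cases to review, highest priority first; original order kept within a tier."""
    tiered = [(review_tier(c), c) for c in cases]
    tiered = [(t, c) for t, c in tiered if t is not None]
    tiered.sort(key=lambda pair: pair[0])  # list.sort is stable
    return [c for _, c in tiered]
```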

16 Topic Size and Adjudication Changes

17 Topic Hits and Adjudication Changes

18 Adjudication & Difficult Topics

19 Adjudication & Difficult Topics
- 55125-E. Sweden rejects Euro
- 55106-E. Bombing in Riyadh
- 55200-E. Iraq Antiquities

20 Adjudication & Difficult Topics
- 55125-E. Sweden rejects Euro
- 55106-E. Bombing in Riyadh
- 55200-E. Iraq Antiquities
- Annotations: many on-topics; overlapping; terrorism or MidEast

21 TDT Annotation Toolkit
- Diagram labels: Choose Task; Topic Selection; Topic Research, Definition; Topic Labeling, Go to Next Stage; Free Text Query; Topic Complete?
