TDT 2003 Evaluation Workshop, NIST, November 17-18, 2003

Creating the Annotated TDT-4 Y2003 Evaluation Corpus
Stephanie Strassel, Meghan Glenn
Linguistic Data Consortium - University of Pennsylvania
{strassel, mlglenn}@ldc.upenn.edu

Data Collection/Preparation
- Collection
  - Multiple sources, languages
  - October 2000 – July 2001
- TDT-4 Corpus V1.0
  - Arabic, Chinese, English only
  - October 2000 – January 2001
- Collection subsampled for annotation
  - Goal: reduce licensing, transcription and segmentation costs
  - Broadcast sources: select 4 of 7 or 3 of 5 days; stagger selection to maximize coverage by day
  - Newswire sources: sampling consistent with previous years
    - No down-sampling of Arabic NW
- Reference transcripts
  - Closed-caption text where available
  - Commercial transcription agencies otherwise
    - Spell-check names for English commercial transcripts
  - Provide initial story boundaries & timestamps
- ASR Output & Machine Translation
- TDT-4 Corpus V1.1
  - Incorporates patches to Mandarin ASR data to fix encoding; removes empty files
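The slides do not say how the broadcast day selection was staggered. As a rough illustration only, the sketch below assumes each source keeps a run of consecutive days within a repeating cycle and that the run's start is shifted by one day per source. The function names `staggered_days` and `coverage` are hypothetical and are not taken from LDC tooling.

```python
def staggered_days(num_days, keep, source_index):
    """Return the day-of-cycle indices kept for one broadcast source.

    Assumption (not from the slides): each source keeps `keep` consecutive
    days out of a `num_days` cycle, and the first kept day is shifted by
    the source's index so that sources do not all skip the same days.
    """
    start = source_index % num_days
    return sorted((start + i) % num_days for i in range(keep))


def coverage(num_days, keep, num_sources):
    """Count how many sources cover each day of the cycle."""
    counts = [0] * num_days
    for source_index in range(num_sources):
        for day in staggered_days(num_days, keep, source_index):
            counts[day] += 1
    return counts


if __name__ == "__main__":
    # "4 of 7" sampling for three hypothetical sources.
    for s in range(3):
        print(s, staggered_days(7, 4, s))
    # Per-day coverage when seven sources are staggered this way.
    print(coverage(7, 4, 7))
```

Under this assumed scheme, every day of the cycle is covered by the same number of sources once the number of sources equals the cycle length; with fewer sources, coverage is uneven but no day is skipped by all sources as long as the kept runs jointly span the cycle.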

TDT-4 Corpus Overview

TDT Concepts
- STORY
  - In TDT2, a story is “a section containing at least two independent declarative clauses on same topic”
  - In TDT3, definition modified to capture annotators’ intuitions about what constitutes a story
    - Distinction between “preview/teaser” and complete news story
  - TDT4 preserves this content-based story definition
    - Greater emphasis on consistent application of story definition among annotation crew
- EVENT
  - A specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences
- TOPIC
  - An event or activity, along with all directly related events and activities

Topics for 2003
- 40 new topics selected, defined, annotated for 2003 evaluation
  - 20 from Arabic seed stories
  - 10 each from Mandarin, English
- Topic selection strategy same as in 2002
- Arabic topics are somewhat different
  - Despite same selection strategy
  - First time we’ve had Arabic seed stories
- “Topic well” is running dry
  - 80 news topics with high likelihood of cross-language hits from 4-month span!

Selection Strategy
- Team leaders examine randomly-selected seed story
  - Potential seeds balanced across corpus (source/date/lang)
- Identify TDT-style seminal event within story
- Apply rule of interpretation to convert event to topic
  - 13 rules state, for each type of seminal event, what other types of events should be considered related
- No requirement that selected topics have cross-language hits
  - But team leaders use knowledge of corpus to select stories likely to produce hits in other language sources
- Handful of “easily confusable” topics

Rules of Interpretation
1. Elections, e.g., 30030: Taipei Mayoral Elections
   Seminal events include: a specific political campaign, election day coverage, inauguration, voter turnouts, election results, protests, reaction.
   Topic includes: the entire process, from announcements of a candidate's intention to run through the campaign, nominations, election process and through the inauguration and formation of a newly-elected official's cabinet or government.
2. Scandals/Hearings, e.g., 30038: Olympic Bribery Scandal
3. Legal/Criminal Cases, e.g., 30003: Pinochet Trial
4. Natural Disasters, e.g., 30002: Hurricane Mitch
5. Accidents, e.g., 30014: Nigerian Gas Line Fire
6. Acts of Violence or War, e.g., 30034: Indonesia/East Timor Conflict
7. Science and Discovery News, e.g., 31019: AIDS Vaccine Testing Begins
8. Financial News, e.g., 30033: Euro Introduced
9. New Laws, e.g., 30009: Anti-Doping Proposals
10. Sports News, e.g., 31016: ATP Tennis Tournament
11. Political and Diplomatic Meetings, e.g., 30018: Tony Blair Visits China
12. Celebrity/Human Interest News, e.g., 31036: Joe DiMaggio Illness
13. Miscellaneous News, e.g., 31024: South Africa to Buy $5 Billion in Weapons

Topic Research
- Provides context
- Annotators specialize in particular topics (of their choosing)
- Includes timelines, maps, keywords, named entities, links to online resources for each topic
- Feeds into annotation queries

Topic Definition
- Fixed format to enhance consistency
  - Seminal event lists basic facts: who/what/when/where
  - Topic explication spells out scope of topic and potential difficulties
  - Rule of interpretation link
  - Link to additional resources
- Feeds directly into topic annotation

Annotation Strategy
- Overview
  - Search-guided complete annotation
  - Work with one topic at a time
  - Multiple stages for each topic; multiple iterations of each stage
  - Two-way topic labeling decision
- Topic Labels
  - YES: story discusses the topic in a substantial way
  - NO: story does not discuss the topic at all, or only mentions the topic in passing without giving any information about the topic
  - No BRIEF in TDT-4
  - “Not Easy” label for tricky decisions
    - Triggers additional QC

Annotation Search Stages
- Stage 1: Initial query
  - Submit seed story or keywords as query to search engine
  - Read through resulting relevance-ranked list
    - Label each story as YES/NO
    - Stop after finding 5-10 on-topic (OT) stories, or
    - After reaching “off-topic threshold”
      - At least 2 off-topic stories for every 1 OT story read, AND
      - The last 10 consecutive stories are off-topic
- Stage 2: Improved query using OT stories from Stage 1
  - Issue new query using concatenation of all known OT stories
  - Read and annotate stories in resulting relevance-ranked list until reaching off-topic threshold
- Stage 3: Text-based queries
  - Issue new query drawn from topic research & topic definition documents plus any additional relevant text
  - Read and annotate stories in resulting relevance-ranked list until reaching off-topic threshold
- Stage 4: Creative searching
  - Annotators instructed to use specialized knowledge and think creatively to find novel ways to identify additional OT stories
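The off-topic threshold is a two-part stopping rule, and a small sketch may make it concrete. The code below is an illustration only, not LDC's annotation software: the function names, the YES/NO string encoding, and the optional `on_topic_target` parameter (standing in for the "5-10 on-topic stories" cutoff of Stage 1) are assumptions.

```python
def reached_off_topic_threshold(labels, ratio=2, tail=10):
    """Check the off-topic threshold for the stories read so far.

    `labels` is the list of "YES"/"NO" judgments in reading order.
    Both conditions from the slide must hold:
      1. at least `ratio` off-topic stories per on-topic story read, and
      2. the last `tail` consecutive stories are all off-topic.
    """
    on_topic = labels.count("YES")
    off_topic = labels.count("NO")
    ratio_met = off_topic >= ratio * on_topic
    tail_met = len(labels) >= tail and all(
        label == "NO" for label in labels[-tail:]
    )
    return ratio_met and tail_met


def read_until_stop(ranked_labels, on_topic_target=None):
    """Walk a relevance-ranked list and return the labels actually read.

    `on_topic_target` is a hypothetical knob for Stage 1, where reading
    may also stop once enough on-topic stories are found; leave it as
    None for Stages 2 and 3, which stop only at the threshold.
    """
    seen = []
    for label in ranked_labels:
        seen.append(label)
        if on_topic_target is not None and seen.count("YES") >= on_topic_target:
            break
        if reached_off_topic_threshold(seen):
            break
    return seen


if __name__ == "__main__":
    ranked = ["YES"] * 3 + ["NO"] * 12
    print(len(read_until_stop(ranked)))
    print(len(read_until_stop(ranked, on_topic_target=2)))
```

Note that the ratio condition matters when many on-topic stories have been found: ten trailing off-topic stories alone do not end the pass if fewer than twice as many off-topic as on-topic stories have been read.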

Additional Annotation & QC
- Top-Ranked Off-Topic Stories (TROTS)
  - Define search epoch
    - First 4 on-topic stories chronologically sorted
  - Find two highly-ranked off-topic documents for each topic-language
- Precision
  - All on-topic (YES) stories reviewed by senior annotator to identify false alarms
  - All “not easy” off-topic stories reviewed
- Adjudication
  - Review pooled site results and adjudicate cases of disagreement with LDC annotators’ judgments
    - Pooled 3 sites’ tracking results
    - Reviewed all purported LDC false alarms (FAs)
    - For purported LDC misses
      - English and Arabic: reviewed cases where all 3 sites disagreed with LDC
      - Mandarin: reviewed cases where 2 or more sites disagreed with LDC
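The adjudication criteria amount to a per-language rule for deciding which disagreements get a second look. The sketch below is one hedged reading of that rule. The names `MIN_DISAGREEING_SITES` and `needs_review` are hypothetical, and treating a purported false alarm as "LDC said YES and at least one site said NO" is an assumption, since the slides only say that all purported false alarms were reviewed.

```python
# Minimum number of pooled sites (out of 3) that must disagree with LDC
# before a purported LDC miss is sent for review, per the slide.
MIN_DISAGREEING_SITES = {"English": 3, "Arabic": 3, "Mandarin": 2}


def needs_review(language, ldc_label, site_labels):
    """Decide whether a story-topic judgment goes to adjudication.

    `ldc_label` and each entry of `site_labels` are "YES" or "NO".
    """
    disagreeing = sum(1 for label in site_labels if label != ldc_label)
    if ldc_label == "YES":
        # Purported LDC false alarm: assumed here to mean that at least
        # one site judged the story off-topic. All such cases are reviewed.
        return disagreeing > 0
    # Purported LDC miss: the review bar depends on the language.
    return disagreeing >= MIN_DISAGREEING_SITES[language]


if __name__ == "__main__":
    sites = ["YES", "YES", "NO"]
    print(needs_review("English", "NO", sites))
    print(needs_review("Mandarin", "NO", sites))
```

With two of three sites disagreeing, the same purported miss is reviewed for Mandarin but not for English or Arabic, which reflects the lower bar the slide sets for Mandarin.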
