TDT 2002 Straw Man TDT 2001 Workshop November 12-13, 2001.


1 TDT 2002 Straw Man
TDT 2001 Workshop
November 12-13, 2001

2 Corpus
• TDT-4
  - English and Mandarin required
  - Arabic optional, but encouraged
  - Provided in TREC-friendly format
  - Made substantially cheaper
• Topics
  - 60 new topics, as before
  - “Brief” redefined to be “more useful”
• Stand-off notation standard
  - So sites can provide useful annotations

3 Tasks
• Drop tracking
  - Transition to TREC filtering
  - Entry task if TREC does not pick it up
• Keep segmentation (sunset?)
• Focus on FSD and SLD
• Exploratory evaluations on clustering
• New event-based evaluation

4 Changes to “brief”
• Currently “brief” is “less than 10% is on topic”
  - LDC does this strictly
  - So 10.5% is “YES”!
• Prefer notion of whether topic is central to the story or not
  - If central topic, then YES
  - If mentioned in passing, then BRIEF
  - Requires a SHARED-CENTRAL possibility?
• This idea requires rethinking
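
To make the contrast on this slide concrete, here is a minimal sketch of the two labeling rules. The function names, the `NO` label for stories with no on-topic content, and the boolean inputs to the centrality rule are assumptions for illustration; the slide only states the 10% cutoff and the central versus in-passing distinction.

```python
def label_by_percentage(on_topic_fraction, threshold=0.10):
    """Strict percentage rule: below the threshold is BRIEF, otherwise YES.

    The NO case for stories with no on-topic content is an assumption;
    the slide only discusses the YES/BRIEF boundary.
    """
    if on_topic_fraction <= 0.0:
        return "NO"
    if on_topic_fraction < threshold:
        return "BRIEF"
    return "YES"


def label_by_centrality(mentions_topic, topic_is_central):
    """Proposed rule: the label depends on centrality, not on length.

    Both arguments are hypothetical annotator judgments.
    """
    if not mentions_topic:
        return "NO"
    return "YES" if topic_is_central else "BRIEF"


# A story that is 10.5% on topic is YES under the strict rule,
# even if the topic is only mentioned in passing.
print(label_by_percentage(0.105))        # YES
print(label_by_centrality(True, False))  # BRIEF
```

The sketch does not cover the SHARED-CENTRAL case raised on the slide, since its definition is left open there.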

5 Standoff annotation
• TDT-2 and TDT-3 distributed with:
  - ASR into text
  - SYSTRAN into English
  - Named entity tagging
• Standardize a means for sites to provide other annotations:
  - POS or parsings
  - Co-references for named entities
  - Time expressions with normalization
  - Alternate translations
  - Subject-like headings à la BBN’s tags
  - …
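
As an illustration of what standoff annotation means in practice, the sketch below keeps annotations in records separate from the story text and points into the text by character offsets. The record fields, the `kind` values, and the example sentence are all invented for this sketch; the slide does not specify a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One standoff record: a character span plus a label (hypothetical layout)."""
    start: int   # offset of the first character of the span
    end: int     # offset one past the last character of the span
    kind: str    # annotation layer, e.g. "entity" or "time"
    value: str   # label or normalized value supplied by the annotating site


def spans(text, annotations, kind):
    """Return (surface string, value) pairs for one annotation layer."""
    return [(text[a.start:a.end], a.value) for a in annotations if a.kind == kind]


# Invented example sentence; the source text itself is never modified.
text = "Clinton visited Beijing on Monday."
annotations = [
    Annotation(0, 7, "entity", "PERSON"),
    Annotation(16, 23, "entity", "LOCATION"),
    Annotation(27, 33, "time", "weekday=Monday"),
]

print(spans(text, annotations, "entity"))
print(spans(text, annotations, "time"))
```

Because each layer lives outside the text, several sites could contribute layers for the same story without having to agree on inline markup.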

6 Clustering evaluations
• Few people happy with clustering measures
• Many people unhappy with central idea of clustering
  - Partitioning of corpus (single topic stories)
  - No hierarchies permitted in results
• Allow exploration of new models for clustering
  - Perhaps inspired by IFE Bio and IFE Arabic?
• Both systems have UMass detection running
• Or new problems based on clustering
  - Linking clusters, describing their substructures, …

7 Event-based evaluation
• Most (not all) TDT approaches would work just as well for IR filtering or even IR document retrieval
• Force exploration of TDT-specific needs
• Topic is made of events
  - Seminal events and inevitable ones
• Event is something that happens somewhere at a particular time
  - Who, where, when, what
• Explicitly capture components of events?

8 Event-based straw man
• Based on link detection
• Given two stories:
  - Is the perpetrator (“who”) the same?
  - Do they describe events that take place at the same location?
  - …at the same time?
• Idea: If two stories talk about events at the same time, they’re more likely to be talking about the same event (obviously more than time needed)
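
The component-wise questions on this slide could be sketched as follows. Everything here is an assumption for illustration: the `EventDescription` record, exact string matching for "who" and "where", the one-day tolerance for "when", and the averaging used to combine the answers. The slide proposes only the questions, not how to answer or score them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EventDescription:
    """Event components extracted from one story (hypothetical record)."""
    who: Optional[str] = None
    where: Optional[str] = None
    when: Optional[date] = None


def compare_components(a, b, max_days_apart=1):
    """Answer each per-component question; None means it cannot be answered."""
    def same_text(x, y):
        if x is None or y is None:
            return None
        return x.strip().lower() == y.strip().lower()

    if a.when is None or b.when is None:
        same_time = None
    else:
        same_time = abs((a.when - b.when).days) <= max_days_apart

    return {
        "who": same_text(a.who, b.who),
        "where": same_text(a.where, b.where),
        "when": same_time,
    }


def same_event_score(a, b):
    """Fraction of answerable components that agree (a naive combination)."""
    known = [v for v in compare_components(a, b).values() if v is not None]
    return sum(known) / len(known) if known else 0.0


# Placeholder stories, not real data.
story_a = EventDescription(who="group x", where="Springfield", when=date(2001, 11, 12))
story_b = EventDescription(who="Group X", where="Shelbyville", when=date(2001, 11, 13))
print(compare_components(story_a, story_b))
print(same_event_score(story_a, story_b))
```

Keeping the per-component answers separate from the combined score mirrors the slide's point that matching time alone is not enough, and it would let an evaluation score each question independently.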

