Exploiting Topic Pragmatics for New Event Detection in TDT-2004


1 Exploiting Topic Pragmatics for New Event Detection in TDT-2004
TDT-2004 Evaluation Workshop, December 2-3, 2004. Ronald K. Braun, 1107 NE 45th St., Suite 310, Seattle, WA 98105.

2 Who We Are Stottler Henke is a small business specializing in AI consulting and R&D. Our Seattle office focuses on information retrieval and text mining. This work is part of a DARPA-sponsored Small Business Innovation Research (SBIR) contract (#DAAH01-03-C-R108).

3 Project Overview Leverage topic pragmatics and committee methods to increase accuracy on the new event detection (NED) task. Pragmatics: non-semantic structure arising from how a topic is reported through time. Committee methods: combining evidence from multiple perspectives (e.g., ensemble learning).

4 An Informal Experiment
Considered, case by case, the errors made on the TDT3 corpus (measured by topic-weighted CFSD, pMiss, and pFA). Examined 30 misses and 20 false alarms and asked what percentage of these errors is computationally tractable. 28% of the misses and 35% of the false alarms in our sample had computationally visible features. With copious caveats, we estimate a CFSD limit of 0.35 exists for the TDT3 corpus under current NED evaluation conditions. The limit might be greater due to one-topic bias.

5 Error Classes Annotation effects – limited annotation time, possible keyword biases. Lack of a priori topic definitions – topic structure is not computationally accessible. Lack of semantic knowledge – causality and abstraction relationships are not modeled. Multiple topics within a story – at the event level, a single topic per story may be the exception.

6 Error Classes (continued)
High overlap of entities due to subject marginality or class membership – “Podunk country syndrome”, topics in the same topic category. Topics joined in later stages of activity – the earliest event activities are ossified into shorthand tags. Sparseness of topical allusions – “a season of crashing banks, plunging rubles, bouncing paychecks, failing crops and rotating governments” == the Russian economic crisis. Outlier / peripheral events – human-interest stories.

7 TDT5 Corpus Effects The TDT5 corpus is an order of magnitude larger than TDT3 or TDT4. This reduced the evaluation in part to an exercise in scalability (only 9 seconds per story for all processing). Required lots of optimization. Threw out several techniques that relied on POS tagging, as the tagger was not sufficiently efficient.

8 TDT5 Corpus Effects (continued)
Performed worse on TDT5 than on TDT4 for the topic-weighted CFSD metric, suggesting the TDT5 topic set differs in some attribute.

9 TDT5 Corpus Effects (continued)
An increase in p(miss) rate was expected. Less annotation time per topic implies an increased likelihood of missed annotations. Possible conflation of stories due to ubiquitous Iraq verbiage.

Stem     DF (%)    Stem         DF (%)
unit     48.52     nation       27.58
iraq     46.06     iraqi        25.02
presid   36.99     china        23.94
minist   36.40     american     23.88
u.s.     34.95     washington   20.53

10 NED Classifiers Made use of three classifiers in our official submission: Vector Cosine (Baseline), Sentence Linkage, and Location Association.

11 Vector Cosine (Baseline)
Traditional full-text similarity. Stemmed, stopped bag-of-words feature vector. TF/IDF weighting, vector cosine distance. Non-incremental raw DF statistics, generated from all manual stories of TDT3.

Corpus          Story CFSD   Topic CFSD
TDT3 Newswire   0.6253       0.5117
TDT4 Newswire   0.5345       0.4546
TDT5            0.6177       0.7324
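A minimal Python sketch of this baseline: TF/IDF-weighted bag-of-words vectors compared by cosine similarity, with a story flagged as a first story when its best match against past stories falls below a threshold. The tokenization, the TF/IDF variant, and the 0.2 threshold are illustrative assumptions, not the tuned system values.

import math
from collections import Counter

def tfidf_vector(tokens, df, n_docs):
    """Build a sparse TF/IDF vector (log-TF * log-IDF variant assumed)."""
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * math.log(n_docs / df.get(t, 1))
            for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def is_new_event(story_vec, past_vecs, threshold=0.2):
    """Declare a first story if the best past match is below threshold."""
    best = max((cosine(story_vec, p) for p in past_vecs), default=0.0)
    return best < threshold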

13 Sentence Linkage Detects linking sentences: sentences that refer to events also described or referenced by previous or future stories. For TDT-2003, we used a temporal reference heuristic to identify event candidates. Sentence Linkage generalizes this technique by treating every sentence with >= 15 unique features and >= one capitalized feature as a potential event-reference candidate, as sketched below. Candidates of a new story are compared to all previous stories and all future stories.
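A minimal sketch of the candidate-selection rule, assuming whitespace tokenization stands in for the system's actual feature extraction:

def linkage_candidates(sentences):
    """Select event-reference candidate sentences: at least 15 unique
    features and at least one capitalized feature."""
    candidates = []
    for sent in sentences:
        feats = set(sent.split())                 # crude whitespace features
        caps = {f for f in feats if f[0].isupper()}
        if len(feats) >= 15 and caps:
            candidates.append((feats, caps))
    return candidates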

13 Sentence Linkage (continued)
If all capitalized features in the candidate occur in the story and >= a threshold of all unique features also overlap, the stories are linked (see the sketch below). Targets these error classes: multiple topics within a story, shared-event enforcement in stories with high entity overlap, linking across topic activities, and outlier / peripheral stories. Problems: contextual events, ambient events.

Corpus          Story CFSD   Topic CFSD
TDT3 Newswire   0.7586       0.8751
TDT4 Newswire   0.8579       0.8296
TDT5            0.7767       0.8658
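A sketch of the linkage test under the same assumptions; the slides do not give the tuned overlap threshold, so 0.5 is a placeholder:

def stories_linked(candidate_feats, candidate_caps, story_feats,
                   overlap_threshold=0.5):
    """Link two stories if every capitalized feature of the candidate
    sentence appears in the other story AND the unique-feature overlap
    meets the threshold."""
    if not candidate_caps <= story_feats:
        return False
    overlap = len(candidate_feats & story_feats) / len(candidate_feats)
    return overlap >= overlap_threshold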

14 Location Association Looks for pairs of strongly associated location entities and non-location words in a story. Co-occurrence frequencies are maintained for all BBN locations and non-location words over a moving window (the deferment window plus twice that span into the past). Association is computed from a 2x2 document-frequency table and filtered as sketched below:

                      DF with non-location word   DF without non-location word
DF with location      A                           B
DF without location   C                           D

Filters: A + B > 5, A + C > 5, assoc > 0.7
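A sketch of the pair filter. The slides give the count filters and the 0.7 association threshold but not the association measure itself; a Jaccard-style score A / (A + B + C) is assumed here purely for illustration.

def association(A, B, C, D):
    """Association between a location and a non-location word from the
    2x2 DF table. The exact measure is not given in the slides; a
    Jaccard-style score is assumed."""
    denom = A + B + C
    return A / denom if denom else 0.0

def interesting_pair(A, B, C, D):
    """Apply the slide's filters: A + B > 5, A + C > 5, assoc > 0.7."""
    return (A + B) > 5 and (A + C) > 5 and association(A, B, C, D) > 0.7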

15 Location Association (continued)
For each interesting pair in a story, the pair is added to the feature vector, and all of the location's words and the non-location word are removed. The feature's weight is the non-location word's TF/IDF weight plus the max TF/IDF weight of the words in the location (see the sketch below). Uses the Baseline TF/IDF methodology otherwise. Addresses the high-entity-overlap error class.

Corpus          Story CFSD   Topic CFSD
TDT3 Newswire   0.6408       0.5165
TDT4 Newswire   0.5676       0.5270
TDT5            0.6432       0.7548
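A sketch of the pair-feature weighting described above; the pair-naming convention is a hypothetical choice for illustration.

def pair_feature_weight(nonloc_word, location_words, tfidf):
    """Pair weight: the non-location word's TF/IDF weight plus the
    maximum TF/IDF weight among the location's words."""
    return tfidf[nonloc_word] + max(tfidf[w] for w in location_words)

def add_pair_features(vector, pairs, tfidf):
    """Insert pair features and remove their constituent words."""
    for loc_words, nonloc in pairs:
        name = "|".join(sorted(loc_words)) + "+" + nonloc   # hypothetical
        vector[name] = pair_feature_weight(nonloc, loc_words, tfidf)
        for w in list(loc_words) + [nonloc]:
            vector.pop(w, None)
    return vector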

16 Evidence Combination Authority voting – a single classifier is primary; other classifiers may override with a non-novel judgment based on their expertise (see the sketch below). Non-primary members of the committee are trained to low miss error rates. Confidence is the claimant's normalized confidence for non-FS (non-first-story) decisions and the least normalized confidence of all classifiers for FS decisions. Evaluation run of Baseline + Sentence Linkage:

Committee Configuration       TDT3     TDT4
Baseline only                 0.5117   0.4546
Baseline + Sentence Linkage   0.4912   0.4858
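A sketch of authority voting as described, assuming each classifier exposes a (is_novel, normalized_confidence) interface; that interface is an assumption for illustration.

def authority_vote(primary, overriders, story):
    """Authority committee: the primary classifier's decision stands
    unless a low-miss overrider asserts the story is non-novel."""
    novel, conf = primary(story)
    if not novel:
        return False, conf              # claimant is the primary
    confidences = [conf]
    for clf in overriders:
        o_novel, o_conf = clf(story)
        if not o_novel:
            return False, o_conf        # claimant is the overrider
        confidences.append(o_conf)
    # First story: least normalized confidence of all classifiers.
    return True, min(confidences)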

17 Evidence Combination (continued)
Majority voting – members of the committee are each polled for a NED judgment, and the majority decision becomes the system decision (see the sketch below). Trained all classifiers to minimize topic-weighted CFSD over TDT3 and TDT4. Confidence is the average normalized distance between each majority classifier's confidence value and its decision threshold. Ties: the maximal average normalized difference between the novel and non-novel voters decides the system decision. Used for our official submission SHAI1.
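A sketch of majority voting with the stated confidence and tie rules, assuming each classifier exposes a (is_novel, confidence, threshold) interface; that interface is an assumption for illustration.

def majority_vote(classifiers, story):
    """Majority committee: poll every classifier and take the majority
    NED decision; confidence is the mean normalized distance between
    each majority voter's confidence and its decision threshold."""
    votes = [clf(story) for clf in classifiers]
    novel = [(c, t) for is_n, c, t in votes if is_n]
    old = [(c, t) for is_n, c, t in votes if not is_n]

    def mean_margin(group):
        return sum(abs(c - t) for c, t in group) / len(group) if group else 0.0

    if len(novel) != len(old):
        winners = novel if len(novel) > len(old) else old
        return len(novel) > len(old), mean_margin(winners)
    # Tie: the side with the larger average margin from threshold wins.
    return mean_margin(novel) > mean_margin(old), \
           max(mean_margin(novel), mean_margin(old))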

18 Evaluation Results 5 runs: three singletons to gauge individual classifier performance and two committees.

NIST Code   Combo Method   Constituent Classifiers                                   Story CFSD   Topic CFSD
SHAI1       Majority       Vector Cosine + Sentence Linkage + Location Association   0.5672       0.7155
SHAI2       None           Vector Cosine                                             0.6177       0.7324
SHAI3       Authority      Vector Cosine + Sentence Linkage
SHAI4       None           Location Association                                      0.6432       0.7548
SHAI5       None           Sentence Linkage                                          0.7767       0.8658

19 Evaluation Results (continued)
The Authority committee was not useful. Explained by a poor threshold on the Baseline, making the Baseline promiscuous in issuing non-FS judgments. The Majority committee did surprisingly well given the non-optimized thresholds of its classifiers. Topic-weighted performance was worse than last year, but story-weighted performance improved. The committee outperformed all of its constituent classifiers again this year, suggesting less sensitivity to initial thresholding than was expected.

