On Evaluation of Adaptive Topic Tracking Systems
Tamer Elsayed and Douglas W. Oard
Computer Science Department / College of Information Studies / UMIACS, University of Maryland, College Park, USA

Motivation

[Figure: architecture of an adaptive topic tracking system. Incoming news stories are scored against a topic model built from initial training; a decision threshold labels each story on-topic or off-topic, and each on-topic delivery triggers a model adaptation step that updates the topic model.]

Delivering a story has two effects:
- an effect of the delivery itself on user satisfaction (present utility), and
- an effect of the model update on topic model quality (expected future utility).

Changing the threshold changes decisions, changed decisions change topic models, and changed topic models change scores. Scores may therefore be incomparable across topic models, even for the same topic.

Summative evaluation in TDT & TREC relies on:
- the DET curve, which assumes scores for all topics are comparable and is appropriate for non-adaptive tracking systems, and
- the utility curve, which convolves the effects of present utility, expected future utility, and document set evolution; its threshold optimizes long-term cost, and the key insight is that the system invests early in future model quality.

Proposed Formative Evaluation

Goal: support the design process.
Approach: characterize the effect of topic model adaptation on expected future utility in a way that is comparable over time (illustrative code sketches of the tracking loop and of this procedure follow the Conclusion):
1. Run the adaptive system S_ad once, initialized by the initial topic model M_0.
2. Sample the topic model at a predefined set of documents, ideally after each feedback in the adaptive run, obtaining r topic models M_i, each at a different time.
3. Run the non-adaptive system S_non-ad once per sampled model: S_non-ad(M_0), ..., S_non-ad(M_i), ..., S_non-ad(M_r).
4. Measure the quality of each topic model over a fixed document set by computing MAP of the ranked list of stories for each non-adaptive run.
5. Plot the MAP of each non-adaptive run to trace model quality over "time".

[Figure: MAP curve — MAP_0, ..., MAP_i, ..., MAP_r plotted over time, one point per sampled topic model M_0, ..., M_i, ..., M_r.]

Experiments

- Vector space topic tracking system with a static threshold.
- 15 TDT-2004 topics over a collection of 254,000 stories.
- Topics offer 10 positive-feedback opportunities, with at least 10 more on-topic stories remaining, which keeps MAP stable.

Findings: 5 to 7 positive-feedback instances are enough, a useful guideline for model initialization.

Conclusion

The proposed formative evaluation gives insight into the design process by tracing system performance in a fair, comparable manner.

Limitations:
- Decreases the number of possible testing topics.
- Requires r+2 runs per system.

Future work:
- Time-based averaging.
- Characterizing false alarms.
- Topics with fewer relevant stories, and more than 15 TDT topics.
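A minimal sketch of the adaptive tracking loop described under Motivation. The slides specify only a vector space system with a static threshold, so everything else here is an assumption: stories and topic models as L2-normalized bag-of-words vectors, cosine similarity scoring, and a simple centroid-style adaptation step. The function names (vectorize, score, adapt, run_adaptive) and the alpha and threshold values are illustrative, not the authors' implementation.

```python
# Sketch of the adaptive tracking loop: score, threshold, adapt.
# Vector representation and update rule are assumptions, not from the slides.

import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector with L2 normalization (illustrative choice)."""
    tf = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {t: v / norm for t, v in tf.items()} if norm else {}

def score(model, story_vec):
    """Cosine similarity between topic model and story (both normalized)."""
    return sum(w * story_vec.get(t, 0.0) for t, w in model.items())

def adapt(model, story_vec, alpha=0.1):
    """Hypothetical centroid-style update applied after an on-topic delivery."""
    updated = dict(model)
    for t, w in story_vec.items():
        updated[t] = updated.get(t, 0.0) + alpha * w
    norm = math.sqrt(sum(v * v for v in updated.values()))
    return {t: v / norm for t, v in updated.items()}

def run_adaptive(initial_model, stories, threshold=0.2):
    """Run S_ad once from M_0, snapshotting the model after each delivery."""
    model, snapshots = initial_model, [initial_model]
    for story in stories:
        story_vec = vectorize(story)
        if score(model, story_vec) >= threshold:   # decision threshold
            model = adapt(model, story_vec)        # model adaptation
            snapshots.append(model)                # sample M_i at feedback
    return snapshots                               # M_0, M_1, ..., M_r
```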
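Continuing that sketch, the formative evaluation itself might look like the following: each sampled model M_i is frozen, used to rank the fixed story set, and scored with average precision. MAP averages this over topics; a single topic is shown for brevity. run_nonadaptive, average_precision, and map_curve are hypothetical names, and the code reuses score and vectorize from the sketch above.

```python
# Sketch of the formative evaluation: one non-adaptive run per sampled
# model over a fixed document set, traced as a MAP curve over "time".

def run_nonadaptive(model, stories):
    """S_non-ad: rank a fixed story set with a frozen topic model."""
    return sorted(stories, key=lambda s: score(model, vectorize(s)),
                  reverse=True)

def average_precision(ranked, relevant):
    """AP of one ranked list; MAP averages this over topics."""
    hits, total = 0, 0.0
    for rank, story in enumerate(ranked, start=1):
        if story in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def map_curve(snapshots, stories, relevant):
    """MAP_0 ... MAP_r: one point per sampled model M_0 ... M_r."""
    return [average_precision(run_nonadaptive(m, stories), relevant)
            for m in snapshots]
```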