December 2, 2004. TDT-2004 Adaptive Topic Tracking at Maryland. Tamer Elsayed, Douglas W. Oard, David Doermann (University of Maryland, College Park); Gary Kuhn (National Security Agency).

Outline
Results
System design
Interpreting the results
Next steps

Non-Adaptive Topic Tracking
No score normalization
Cost=
Bottom left is better

Adaptive Topic Tracking
No score normalization, unjudged treated as firmly off-topic
Cost=0.2438

Adaptive Topic Tracking
Cost=
One-pass score normalization, unjudged treated as firmly off-topic
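For readers unfamiliar with the cost figures on these slides: TDT tracking is scored with a detection cost that combines miss and false-alarm rates, normalized so 1.0 equals a trivial system. A minimal sketch follows; the constants (C_miss=1, C_fa=0.1, P_target=0.02) are the usual TDT defaults, assumed here rather than taken from the slides:

```python
def tdt_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT detection cost: lower is better.

    p_miss: fraction of on-topic stories the tracker missed
    p_fa:   fraction of off-topic stories it flagged
    The raw cost is divided by the cost of the best trivial
    (always-yes or always-no) system, so 1.0 means no better
    than guessing.
    """
    raw = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return raw / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

With these defaults, a perfect system scores 0.0 and missing every on-topic story scores 1.0, which is why a reported cost of 0.2438 sits well inside the useful range.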

Non-Adaptive System Design
Training epoch: compute log-odds ngram weights
TDT-5 evaluation epoch: compute story scores

Log-Odds Term Weights
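The formula on this slide did not survive extraction. A common form of log-odds term weighting (assumed here; not necessarily the exact variant this system used) weights each ngram by the log ratio of its probability in on-topic versus off-topic training text, with add-one smoothing:

```python
import math
from collections import Counter

def log_odds_weights(on_topic_ngrams, off_topic_ngrams):
    """Weight each ngram g by log P(g|on-topic) - log P(g|off-topic),
    with add-one smoothing so ngrams unseen on one side stay finite."""
    on, off = Counter(on_topic_ngrams), Counter(off_topic_ngrams)
    vocab = set(on) | set(off)
    n_on = sum(on.values()) + len(vocab)
    n_off = sum(off.values()) + len(vocab)
    return {g: math.log((on[g] + 1) / n_on) - math.log((off[g] + 1) / n_off)
            for g in vocab}
```

Ngrams characteristic of the topic get positive weights; background ngrams get weights near or below zero.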

Computing Story Scores
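The scoring formula on this slide was also lost in extraction. A minimal sketch of the usual approach (an assumption, since the slide's exact normalization is not shown) scores a story by the average log-odds weight of its ngrams:

```python
def story_score(story_ngrams, weights):
    """Score a story as the mean log-odds weight of its ngrams.
    Averaging keeps long stories comparable with short ones;
    ngrams absent from the weight table contribute zero."""
    if not story_ngrams:
        return 0.0
    return sum(weights.get(g, 0.0) for g in story_ngrams) / len(story_ngrams)
```

A story is then declared on-topic when its score exceeds a tracking threshold.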

Non-Adaptive System Design
Training epoch: compute log-odds ngram weights
TDT-5 evaluation epoch: compute story scores

Adaptive System Design
Training epoch (TDT-4): compute log-odds ngram weights
Extended training epoch: compute normalization factor
TDT-5 evaluation epoch: compute story scores, normalize story scores

Interpreting Non-Adaptive Results
Lack of normalization probably hurt!
What can we say about the effect of incomplete judgments?

Interpreting Adaptive Results
Normalization hurt!
–One-pass design is the problem
DET has limitations
–Changing the threshold changes our topic model!
–Threshold selection is now a critical path item
How does judgment density affect the results?
(DET plot: not normalized vs. normalized)

Next Steps
Further explore normalization
–Implement continuous renormalization
–Tune parameters on devtest data
Decide between TDT-5 and TDT-4
–Is incomplete judging harmful?
Define richer training sets
–Explicit queries
–Many known on-topic/off-topic training stories
–Models of (imperfect) behavioral feedback

Our Favorite Quote of the Day
“It takes time to get the implementation correct” [Yiming]
We had 30 days from project initiation to non-adaptive submission.