Overview of the TDT-2003 Evaluation and Results
Jonathan Fiscus
NIST, Gaithersburg, Maryland
November 17-18, 2003

Outline
- TDT Evaluation Overview
- TDT-2003 Evaluation Result Summaries
  - New Event Detection
  - Topic Detection
  - Topic Tracking
  - Link Detection
- Other Investigations

TDT 101: “Applications for organizing text”
Five TDT applications, applied to terabytes of unorganized data:
- Story Segmentation
- Topic Tracking
- Topic Detection
- New Event Detection
- Link Detection

TDT’s Research Domain
- Technology challenge: develop applications that organize and locate relevant stories from a continuous feed of news stories
- Research driven by evaluation tasks
- Composite applications built from Automatic Speech Recognition, Story Segmentation, and Document Retrieval

Definitions
- An event is a specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences.
- A topic is an event or activity, along with all directly related events and activities.
- A broadcast news story is a section of transcribed text with substantive information content and a unified topical focus.

TDT-2003 Evaluation Corpus: the TDT4 Corpus
- TDT4 corpus, also used for last year’s evaluation
- October 1, 2000 to January 31, 2001
- 20 sources: 8 English, 5 Arabic, 7 Mandarin Chinese
- News stories plus 7513 non-news stories
- 80 annotated topics: 40 topics from 2002, 40 new topics
- See LDC’s presentation for more details

What was new in the new topics
- Same number of “on-topic” stories
- 20, 10, 10 seed stories for Arabic, English, and Mandarin respectively
- Many more Arabic “on-topic” stories
- Large influence on scores

Participants and their submissions per task (New Event Detection, Topic Detection, Topic Tracking, Link Detection):
- Carnegie Mellon Univ. (CMU): 2, 2, 6, 11
- Royal Melbourne Institute of Technology (RMIT): 1, 2
- Stottler Henke Associates, Inc. (SHAI): 1, 0
- Univ. of Massachusetts (UMass): 8, 3, 18, 17

TDT Evaluation Methodology
- Evaluation tasks are cast as detection tasks: YES there is a target, or NO there is not.
- Performance is measured in terms of detection cost, “a weighted sum of missed detection and false alarm probabilities”:
  C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
- C_Miss = 1 and C_FA = 0.1 are preset costs; P_target = 0.02 is the a priori probability of a target.

TDT Evaluation Methodology (cont’d)
- Detection cost is normalized to generally lie between 0 and 1:
  (C_Det)_Norm = C_Det / min{C_Miss * P_target, C_FA * (1 - P_target)}
- When based on the YES/NO decisions, it is referred to as the actual decision cost.
- Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between P_Miss and P_FA, making use of the likelihood scores attached to the YES/NO decisions.
- The minimum DET point is the best score a system could achieve with proper thresholds.
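
As a concrete illustration of the two formulas above, here is a minimal Python sketch (not the official TDT scoring software) that computes the actual normalized cost from a system's YES/NO decisions and sweeps its likelihood scores to find the minimum DET point. The helper names and toy trial data are hypothetical.

```python
# Minimal sketch of the detection-cost computation described on this slide.
# Constants and formulas follow the slide; helper names and toy data are illustrative only.

C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02

def detection_cost(p_miss, p_fa):
    """C_Det = C_Miss*P_Miss*P_target + C_FA*P_FA*(1 - P_target)."""
    return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)

def normalized_cost(p_miss, p_fa):
    """(C_Det)_Norm = C_Det / min(C_Miss*P_target, C_FA*(1 - P_target))."""
    return detection_cost(p_miss, p_fa) / min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))

def error_rates(decisions, labels):
    """P_Miss and P_FA from YES/NO decisions against the target labels."""
    targets = [d for d, is_target in zip(decisions, labels) if is_target]
    non_targets = [d for d, is_target in zip(decisions, labels) if not is_target]
    p_miss = sum(1 for d in targets if not d) / len(targets)
    p_fa = sum(1 for d in non_targets if d) / len(non_targets)
    return p_miss, p_fa

def minimum_det_cost(scores, labels):
    """Best normalized cost over all thresholds on the likelihood scores (minimum DET point)."""
    best = float("inf")
    for threshold in sorted(set(scores)) + [float("inf")]:
        decisions = [s >= threshold for s in scores]
        best = min(best, normalized_cost(*error_rates(decisions, labels)))
    return best

# Toy example: 3 target trials and 8 non-target trials with hypothetical likelihood scores.
scores = [0.9, 0.7, 0.2, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.01]
labels = [True, True, True, False, False, False, False, False, False, False, False]
decisions = [s >= 0.5 for s in scores]  # the system's actual YES/NO decisions
print("actual normalized cost :", normalized_cost(*error_rates(decisions, labels)))
print("minimum normalized cost:", minimum_det_cost(scores, labels))
```

With this toy data the actual decisions give a higher cost than the minimum DET point, illustrating the difference between the actual decision cost and the best score achievable with proper thresholds.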

TDT: Experimental Control
- Good research requires experimental controls
- Conditions that affect performance in TDT:
  - Newswire vs. broadcast news
  - Manual vs. automatic transcription of broadcast news
  - Manual vs. automatic story segmentation
  - Monolingual vs. multilingual language material
  - Topic training amounts and languages
  - Default automatic English translation vs. native orthography
  - Decision deferral periods
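
These controls are encoded in the compact condition strings that appear on the results and history slides (e.g., SR=nwt+bnasr TE=eng,nat boundary DEF=10). The sketch below decodes one such string; the field and value expansions are assumptions inferred from this presentation (e.g., "Nt=1 Nn=0" matches "1 training story, 0 negative training stories"), not an official TDT specification.

```python
# Hypothetical decoder for the evaluation-condition strings used on later slides.
# All abbreviation expansions here are assumptions inferred from the slides themselves.

FIELD_NAMES = {
    "SR": "source conditions",
    "TR": "topic training language",
    "TE": "test material (languages, translation)",
    "DEF": "decision deferral period (files)",
    "NT": "number of on-topic training stories",
    "NN": "number of negative training stories",
}
VALUE_NAMES = {
    "nwt": "newswire text",
    "bnasr": "broadcast news, automatic transcription (ASR)",
    "bnman": "broadcast news, manual transcription",
    "eng": "English", "man": "Mandarin", "arb": "Arabic",
    "nat": "native orthography",
    "boundary": "reference story boundaries",
    "noboundary": "automatic story segmentation",
}

def decode_condition(condition: str) -> dict:
    """Split a condition string into labeled fields with expanded values."""
    decoded = {}
    for token in condition.split():
        if "=" in token:
            key, value = token.split("=", 1)
            parts = [VALUE_NAMES.get(p, p) for p in value.replace("+", ",").split(",")]
            decoded[FIELD_NAMES.get(key.upper(), key)] = ", ".join(parts)
        else:
            decoded["segmentation"] = VALUE_NAMES.get(token, token)
    return decoded

print(decode_condition("SR=nwt+bnasr TE=eng,nat boundary DEF=10"))
```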

Outline
- TDT Evaluation Overview
- TDT-2003 Evaluation Result Summaries
  - New Event Detection (NED)
  - Topic Detection
  - Topic Tracking
  - Link Detection
- Other Investigations

New Event Detection Task
- System goal: to detect each new event, i.e., the first story discussing each topic
- Evaluates “part” of a Topic Detection system, i.e., when to start a new cluster
- (Slide diagram: a story stream containing new events on two topics and stories that are not first stories of events; legend: Topic 1, Topic 2.)
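
To make the “when to start a new cluster” decision concrete, here is a minimal sketch of a common first-story detection baseline, not the method of any evaluated system: a story is flagged as a new event when its maximum similarity to every previously seen story falls below a threshold. The bag-of-words representation, threshold value, and story texts are hypothetical.

```python
# Illustrative first-story detection baseline (not any evaluated system):
# a story starts a new event when it is not similar enough to anything seen so far.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def first_story_flags(stories, threshold=0.2):
    """Return True for each story judged to be the first story of a new event."""
    seen, flags = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        max_sim = max((cosine(vec, prev) for prev in seen), default=0.0)
        flags.append(max_sim < threshold)   # NEW if nothing seen is similar enough
        seen.append(vec)
    return flags

stream = ["earthquake strikes coastal city",
          "rescue teams search earthquake rubble in coastal city",
          "parliament debates new budget proposal"]
print(first_story_flags(stream))   # first and third stories flagged as new events
```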

TDT-03 Primary NED Results SR=nwt+bnasr TE=eng,nat boundary DEF=10

Primary NED Results: 2002 vs. 2003 Topics

Topic Detection Task
- System goal: to detect topics in terms of the (clusters of) stories that discuss them
- “Unsupervised” topic training: new topics must be detected as the incoming stories are processed; input stories are then associated with one of the topics
- (Slide diagram: a story stream being clustered into Topic 1 and Topic 2.)
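
The following sketch illustrates one simple way to realize this kind of unsupervised, on-line clustering: a single-pass algorithm that either assigns each incoming story to its most similar existing cluster or starts a new one. It is an illustrative baseline under assumed settings (word-set representation, Jaccard similarity, hypothetical threshold), not the approach of any evaluated system.

```python
# Illustrative single-pass topic detection sketch (not any evaluated system):
# each incoming story either joins its closest existing cluster or starts a new one.

def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two word sets."""
    return len(a & b) / len(a | b) if a or b else 0.0

def detect_topics(stories, threshold=0.15):
    """Return a cluster id per story; clusters are created as the stream is processed."""
    clusters = []          # one word set per cluster (union of its stories' words)
    assignments = []
    for text in stories:
        words = set(text.lower().split())
        sims = [jaccard(words, c) for c in clusters]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))   # join the closest existing topic
            clusters[best] |= words
        else:
            best = len(clusters)           # start a new topic cluster
            clusters.append(set(words))
        assignments.append(best)
    return assignments

stream = ["earthquake strikes coastal city",
          "rescue teams reach the coastal city after the earthquake",
          "parliament debates the new budget proposal"]
print(detect_topics(stream))   # e.g. [0, 0, 1]
```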

TDT-03 Topic Detection Results
Multilingual sources, English translations, reference boundaries, 10-file deferral period
(Result charts: Newswire + BNews ASR and Newswire + BNews manual transcripts; one system marked as not a primary system.)

Topic Tracking Task
- System goal: to detect stories that discuss the target topic, in multiple source streams
- Supervised training: given Nt sample stories that discuss a given target topic
- Testing: find all subsequent stories that discuss the target topic
- (Slide diagram: a source stream split into training data and test data, with stories marked on-topic or unknown.)
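
A minimal sketch of the supervised setup described above, assuming a simple centroid-style approach rather than any evaluated system: build a topic profile from the Nt training stories, then answer YES for each test story whose similarity to that profile clears a hypothetical threshold.

```python
# Illustrative topic tracking sketch (not any evaluated system): build a profile from
# the Nt on-topic training stories, then say YES for test stories similar enough to it.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def track_topic(training_stories, test_stories, threshold=0.2):
    """Return a YES/NO decision for each test story against the trained topic profile."""
    profile = Counter()
    for text in training_stories:            # supervised training: Nt sample stories
        profile.update(text.lower().split())
    return [cosine(Counter(text.lower().split()), profile) >= threshold
            for text in test_stories]

train = ["earthquake strikes coastal city"]                   # Nt = 1
test = ["aid arrives in the coastal city hit by the earthquake",
        "parliament debates the new budget proposal"]
print(track_topic(train, test))   # expected: [True, False]
```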

TDT-03 Primary TRK Results
Newswire + BNews human transcripts, multilingual sources, English translations, reference boundaries, 1 training story, 0 negative training stories
(Result charts: Newswire + BNews human transcripts, Nt=1 Nn=0; Newswire + BNews ASR, Nt=1 Nn=0. Systems: RMIT1, UMass01, CMU1.)

Primary Topic Tracking Results: 2002 vs. 2003 Topics (Minimum DET Cost)

Link Detection Task
- System goal: to detect whether a pair of stories discuss the same topic
- Can be thought of as a “primitive operator” used to build a variety of applications
- (Slide diagram: two stories connected by a “?”.)
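
As a simple illustration of this “primitive operator”, the sketch below decides whether a pair of stories discuss the same topic by thresholding their word overlap. The Dice-coefficient similarity, the threshold, and the example stories are assumptions for illustration, not the method of any evaluated system.

```python
# Illustrative link detection sketch (not any evaluated system): declare two stories
# "same topic" when their word overlap (Dice coefficient) clears a threshold.

def dice(a: set, b: set) -> float:
    """Dice coefficient between two word sets."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def same_topic(story_a: str, story_b: str, threshold=0.25) -> bool:
    """YES/NO link decision for a single story pair."""
    return dice(set(story_a.lower().split()), set(story_b.lower().split())) >= threshold

pair_1 = ("earthquake strikes coastal city",
          "rescue teams reach the coastal city after the earthquake")
pair_2 = ("earthquake strikes coastal city",
          "parliament debates the new budget proposal")
print(same_topic(*pair_1), same_topic(*pair_2))   # expected: True False
```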

TDT-03 Primary LNK Results Newswire+BNews ASR, Multilingual Sources, English or Native Translations, Reference Boundaries, 10 File Deferral Period

TDT-03 Primary LNK Results: 2002 vs. 2003 Topics (Topic-Weighted, Minimum DET Cost). Systems: UMass01, CMU1

Outline
- TDT Evaluation Overview
- 2003 TDT Evaluation Result Summaries
  - New Event Detection (NED)
  - Topic Detection
  - Topic Tracking
  - Link Detection
- Other Investigations

History of performance

Evaluation Performance History: Link Detection

Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng,nat DEF=10 | CMU |
2000 | SR=nwt+bnasr TE=eng+man,eng boundary DEF=10 | UMass |
2001 | SR=nwt+bnasr TE=eng+man,eng boundary DEF=10 | CMU |
2002 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | PARC |
2003 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | UMass | *
* on 2002 Topics

Evaluation Performance History: Tracking

Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TR=eng TE=eng+man,eng boundary NT=4 | BBN |
2000 | SR=nwt+bnman TR=eng TE=eng+man,eng boundary Nt=1 Nn=0 | IBM |
2001 | SR=nwt+bnman TR=eng TE=eng+man,eng boundary Nt=1 Nn=0 | LIMSI |
2002 | SR=nwt+bnman TR=eng TE=eng+man+arb,eng boundary Nt=1 Nn=0 | UMass |
2003 | SR=nwt+bnman TR=eng TE=eng+man+arb,eng boundary Nt=1 Nn=0 | UMass1 | .1949*
* on 2002 Topics

Evaluation Performance History: Topic Detection

Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng+man,eng boundary DEF=10 | IBM |
2000 | SR=nwt+bnasr TE=eng+man,eng noboundary DEF=10 | Dragon |
2001 | SR=nwt+bnasr TE=eng+man,eng noboundary DEF=10 | TNO1 (late) |
2002 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | UMass |
2003 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | CMU1 | .3035*
* on 2002 Topics

Evaluation Performance History: New Event Detection

Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | UMass |
2000 | SR=nwt+bnasr TE=eng,nat noboundary DEF=10 | UMass |
2001 | SR=nwt+bnasr TE=eng,nat noboundary DEF=10 | UMass |
2002 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | CMU |
2003 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | CMU1 | .5971*
* on 2002 Topics

Summary and Issues to Discuss
- TDT Evaluation Overview
- 2003 TDT Evaluation Results
  - The 2002 and 2003 topic sets are very different; the 2003 set was weighted more towards Arabic
  - Dramatic increase in error rates with the new topics in link detection, topic tracking, and new event detection
  - Need to calculate the effect of the topic set on topic detection
- TDT 2004
  - Release the 2003 topics and TDT4 corpus?
  - Ensure the 2004 evaluation will support Go/No-Go decisions
  - What tasks will 2004 include?