Results of the 2000 Topic Detection and Tracking Evaluation in Mandarin and English Jonathan Fiscus and George Doddington.

What’s New in TDT 2000
TDT3 corpus used in both 1999 and 2000
–120 topics used in the 2000 test: 60 topics from 1999, 60 new topics
  Of 44K news stories, 24% were at least singly judged YES or BRIEF
  1999 and 2000 topics are very different in terms of size and cross-language makeup
–Annotation of new topics using search engine-guided annotation: “Use a search engine with on-topic story feedback and interactive searching techniques to limit the number of stories read by annotators”
Evaluation Protocol Changes
–Only minor changes to the Tracking task (negative example stories)
–Link Detection test set selection changed in light of last year’s experience

Search-Guided Annotation: How will it affect scores?
Simulate search-guided annotation using 1999 topics, 1999 annotations and 1999 systems
[Figure labels: Probability of a human reading a judged story; Number of read stories; Stability in “Region of Interest”]

TDT Topic Tracking Task
7 Participants:
–Dragon, IBM, Texas A&M Univ., TNO, Univ. of Iowa, Univ. of Massachusetts, Univ. of Maryland
System Goal:
–To detect stories that discuss the target topic, in multiple source streams.
Supervised Training
–Given N_t sample stories that discuss a given target topic
Testing
–Find all subsequent stories that discuss the target topic
[Figure labels: training data; test data; on-topic; unknown]

Topic Tracking Results
Basic Condition: Newswire + BNews, reference story boundaries, English training: 1 On-topic
Challenge Condition: Newswire + BNews ASR, automatic story boundaries, English training: 4 On-topic, 2 Negative training
[Figure labels: With Negative; Without Negative]

Topic Tracking Results (Expanded Basic Condition DET Curve)

Topic Tracking Results (Expanded Challenge Condition DET Curve)
[Figure labels: With Negative; Without Negative]

Effect of Automatic Story Boundaries
Evaluation conditioned jointly by source and language
–Newswire, Broadcast News, English and Mandarin
Degradation due to story boundaries source for ASR
Test Condition: NWT+BNasr, 4 English Training Stories, Reference Boundaries
[Figure labels: IBM1; UMass1]

Variability of Tracking Performance Based on Training Stories
BBN ran their 1999 system on this year’s index files:
–Same topics, but different training stories
–One caveat: these results are based on different “test epochs”; the 2000 index files contain more stories
There could be several reasons for the difference … needs future investigation

NIST Speech Group
TDT Link Detection Task
One Participant: University of Massachusetts
System Goal:
–To detect whether a pair of stories discuss the same topic. (Can be thought of as a “primitive operator” to build a variety of applications)

2000 Link Detection Results
A lot was learned last year:
–The test set must be properly sampled
  “Linked” story pairs were selected by randomly sampling all possible on-topic story pairs
  “Unlinked” pairs were selected using all on-topic stories as one of the pair, with a randomly chosen story as the second
–This year, the task was made multilingual
–More story pairs were used
[Table: Link Detection Test Set Composition]
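The pair-selection rule described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the NIST selection tool: the function name `sample_link_pairs`, the `topic_of` mapping and the `seed` argument are hypothetical, and restricting the second story of an “unlinked” pair to stories that are not on the same topic is an assumption made here so that the pair really is unlinked.

```python
import random
from itertools import combinations


def sample_link_pairs(topic_of, n_linked, seed=0):
    """Build linked and unlinked story pairs for a link detection test set.

    topic_of maps a story id to its topic id, or to None for off-topic
    stories (hypothetical input format, assumed for this sketch).
    """
    rng = random.Random(seed)

    # Group the on-topic stories by topic.
    by_topic = {}
    for story, topic in topic_of.items():
        if topic is not None:
            by_topic.setdefault(topic, []).append(story)

    # "Linked": randomly sample from all possible same-topic story pairs.
    all_linked = [
        pair
        for stories in by_topic.values()
        for pair in combinations(sorted(stories), 2)
    ]
    linked = rng.sample(all_linked, min(n_linked, len(all_linked)))

    # "Unlinked": every on-topic story is used once as the first member,
    # paired with a randomly chosen story that is not on the same topic
    # (assumption: same-topic partners are excluded).
    all_stories = sorted(topic_of)
    unlinked = []
    for story in all_stories:
        topic = topic_of[story]
        if topic is None:
            continue
        candidates = [s for s in all_stories if topic_of[s] != topic]
        if candidates:
            unlinked.append((story, rng.choice(candidates)))

    return linked, unlinked
```

Fixing the seed makes the sampled test set reproducible, which matters when several sites are scored on the same pairs.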

Link Detection Results
Required Condition
–Multilingual texts
–Newswire + Broadcast News ASR
–Reference story boundaries
–10 file decision deferral
[Figure label: Overall]

TDT Topic Detection Task
Three Participants: Chinese Univ. of Hong Kong, Dragon, Univ. of Massachusetts
System Goal:
–To detect topics in terms of the (clusters of) stories that discuss them.
“Unsupervised” topic training
New topics must be detected as the incoming stories are processed.
Input stories are then associated with one of the topics.
[Figure label: a topic!]
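The system goal above (detect new topics as stories arrive, then associate each story with one topic) can be illustrated with a simple single-pass clustering sketch. This is not the method of any participating site: the bag-of-words cosine similarity, the summed-count centroids, the `threshold` value and the function names are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def detect_topics(stories, threshold=0.3):
    """Assign each incoming story to a cluster in a single pass.

    Returns (labels, starts_new_topic). A story whose best similarity to
    every existing cluster is below the threshold starts a new cluster;
    otherwise it joins the most similar cluster and updates its centroid.
    """
    clusters = []          # one summed term-count Counter per cluster
    labels = []            # cluster index assigned to each story
    starts_new_topic = []  # True where the story opened a new cluster

    for text in stories:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            clusters.append(Counter(vec))
            labels.append(len(clusters) - 1)
            starts_new_topic.append(True)
        else:
            clusters[best].update(vec)
            labels.append(best)
            starts_new_topic.append(False)

    return labels, starts_new_topic
```

In a sketch like this the threshold controls how readily new clusters are opened: raising it splits the stream into more, smaller clusters, and lowering it merges stories into fewer, larger ones.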

2000 Topic Detection Evaluation
Required Condition (in yellow)
–Multilingual Topic Detection
–Newswire+Broadcast News ASR
–Automatic Story Boundaries
Performance on the 1999 and 2000 topic sets is different
[Figure labels: Using English Translations for Mandarin; Using Native Orthography]

Effect of Topic Size on Detection Performance
The 1999 topics have more on-topic stories than the 2000 topics
Distribution of scores is related to topic size
–Bigger topics tend to have higher scores.
–Is this a behavior induced by setting a topic size parameter in training?
[Figure: Dragon1 results; NWT+BNasr, Reference Boundary, Multilingual Texts]

Fractional Components of Detection Cost
Evaluations conditioned on factors (like language) are problematic
Instead, compute the additive contributions to detection costs for different subsets of data.
[Figure: Dragon1 results; NWT+BNasr, Reference Boundary, Multilingual Texts; annotation: Interesting Reversal]
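The idea of additive contributions can be sketched as follows. The slide does not give the formula, so this example assumes a detection cost of the usual weighted form, C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target), with pooled (story-weighted) error rates. The constants `c_miss=1.0`, `c_fa=0.1` and `p_target=0.02`, the function name and the trial format are assumptions for illustration, not values taken from this presentation.

```python
def cost_contributions(trials, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Split a pooled detection cost into additive per-subset parts.

    trials is an iterable of (subset, is_target, said_yes) tuples, where
    subset is any label such as a language or a source type. Each miss
    or false alarm is charged to the subset it occurred in, but it is
    normalized by the counts over ALL trials, so the parts sum to the
    overall cost.
    """
    trials = list(trials)
    n_target = sum(1 for _, is_target, _ in trials if is_target)
    n_nontarget = len(trials) - n_target

    contrib = {}
    for subset, is_target, said_yes in trials:
        contrib.setdefault(subset, 0.0)
        if is_target and not said_yes:
            # A miss: weighted by the miss cost and the target prior.
            contrib[subset] += c_miss * p_target / n_target
        elif not is_target and said_yes:
            # A false alarm: weighted by its cost and the non-target prior.
            contrib[subset] += c_fa * (1.0 - p_target) / n_nontarget
    return contrib
```

Because every error is normalized by the same global counts, the per-subset values can be compared directly and add up to the overall cost, which is not true of costs computed separately on each subset.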

Effects of Automatic Boundaries on Detection Performance
–Multilingual Topic Detection
–Newswire+Broadcast News ASR
–Reference vs. Automatic Story Boundaries
19%, 21% and 41% relative increase in cost respectively

TDT Segmentation Task
One Participant: MITRE
(For TDT 2000, story segmentation is an integral part of the other tasks, not just a separate evaluation task)
System Goal:
–To segment the source stream into its constituent stories, for all audio sources.
[Figure labels: Transcription: text (words); Story; Non-story]

Story Segmentation Results
Required Condition:
–Broadcast News ASR

TDT First Story Detection (FSD) Task
Two Participants: National Taiwan University and University of Massachusetts
System Goal:
–To detect the first story that discusses a topic, for all topics.
Evaluating “part” of a Topic Detection system (i.e., when to start a new cluster)
[Figure labels: First Stories on two topics; Not First Stories; = Topic 1; = Topic 2]

First Story Detection Results
Required Condition:
–English Newswire and Broadcast News ASR transcripts
–Automatic story boundaries
–One file decision deferral

1999 and 2000 Topic Set Differences in FSD Evaluation
For UMass there is a slight difference, but a marked difference for the NTU system

Summary
Many, many things remain to be looked at
–Results appear to be a function of topic size and topic set in the detection task, but it’s unclear why.
  The re-usability of last year’s detection system outputs enables valuable studies
  Conditioned detection evaluation should be replaced with a “contribution to cost” model
–Performance variability on tracking training stories should be further investigated
–…and the list goes on
When should the annotations be released?
Need to find a cost-effective annotation technique
–Consider TREC ad-hoc style annotation via simulation