Topic Detection and Tracking Introduction and Overview.

TDT Task Overview *

5 R&D Challenges:
– Story Segmentation
– Topic Tracking
– Topic Detection
– First-Story Detection
– Link Detection

TDT3 Corpus Characteristics: †
– Two Types of Sources: Text, Speech
– Two Languages: English (30,000 stories), Mandarin (10,000 stories)
– 11 Different Sources:
   8 English: ABC, CNN, PRI, VOA, NBC, MNB, APW, NYT
   3 Mandarin: VOA, XIN, ZBN

* see for details
† see for details

TDT – Similarities with TREC

The TDT Tasks:
– Story Segmentation, Topic Detection, First-Story Detection, Link Detection: No equivalent task in TREC
– Topic Tracking: Similar to Ad hoc and Filtering, but…
   – Query versus Topic
   – Multiple streams
   – Multiple languages
   – Limited look-ahead
   – No supervised update

Preliminaries

A topic is … a seminal event or activity, along with all directly related events and activities.

A story is … a topically cohesive segment of news that includes two or more DECLARATIVE independent clauses about a single event.

Example Topic

Title: Mountain Hikers Lost
– WHAT: 35 or 40 young Mountain Hikers were lost in an avalanche in France around the 20th of January.
– WHERE: Orres, France
– WHEN: January 1998
– RULES OF INTERPRETATION: 5. Accidents

Topic Contrast – Year 2000 vs 1999

The Segmentation Task: To segment the source stream into its constituent stories, for all audio sources (for Radio and TV only).

[Figure: a transcription stream of text (words), divided into Story and Non-story segments.]

The Topic Tracking Task: To detect stories that discuss the target topic, in multiple source streams.

Find all the stories that discuss a given target topic:
– Training: Given a stream of sample stories, with Nt of them known to discuss a given target topic,
– Test: Find all subsequent stories that discuss the target topic.

[Figure: a story stream split into training data and test data. Stories tagged as being "on topic" are marked on-topic; the rest are marked unknown. Untagged stories are not guaranteed to be off-topic.]
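The training/test split described above can be sketched in a few lines of Python. This is only an illustration, not the method of any TDT participant: the bag-of-words vectors, the cosine score, the function names and the threshold of 0.2 are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens as a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track_topic(on_topic_training, test_stream, threshold=0.2):
    """Return one yes/no decision per test story, in stream order.

    The topic model is the summed term vector of the Nt on-topic
    training stories. It is built once and never updated from the
    test stream. The threshold is an arbitrary illustrative value.
    """
    topic = Counter()
    for story in on_topic_training:
        topic.update(bag_of_words(story))
    return [cosine(topic, bag_of_words(s)) >= threshold for s in test_stream]


training = ["hikers lost in avalanche in france"]  # Nt = 1
stream = [
    "rescue teams search for hikers after avalanche",
    "stock markets close higher on friday",
]
decisions = track_topic(training, stream)
print(decisions)
```

With Nt = 1 the topic model is a single story, which is why tracking under this condition comes close to comparing pairs of stories.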

Two Primary Test Conditions for the Topic Tracking Task:
– One on-topic training story
– Four on-topic training stories
   – And two off-topic stories that are algorithmically very close to the topic but that are certified as being "off-topic".

The Topic Detection Task: To detect topics in terms of the (clusters of) stories that discuss them.
– Unsupervised topic training: A meta-definition of topic is required, independent of topic specifics.
– New topics must be detected as the incoming stories are processed.
– Input stories are then associated with one of the topics.

[Figure callout: "a topic!"]

The First-Story Detection Task: To detect the first story that discusses a topic, for all topics.

First-Story Detection is essentially the same as Topic Detection. The difference is only in what the system outputs.

[Figure: stories on a time axis, marked as First Stories or Not First Stories, for Topic 1 and Topic 2.]
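One way to see why the two tasks differ only in their output is a single-pass clustering sketch. The code below is a hedged illustration rather than a system from the evaluation: the cosine similarity over word counts, the threshold of 0.2 and the name detect_topics are assumptions. Each incoming story either joins its best-matching existing cluster or starts a new one. The cluster assignment is the Topic Detection output, and the "started a new cluster" flag is the First-Story Detection output.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_topics(stream, threshold=0.2):
    """Process stories in arrival order; return (cluster_ids, first_story_flags)."""
    clusters = []           # one summed term vector per detected topic
    cluster_ids = []        # Topic Detection output
    first_story_flags = []  # First-Story Detection output
    for story in stream:
        vec = Counter(story.lower().split())
        scores = [cosine(vec, c) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].update(vec)     # story joins an existing topic
            cluster_ids.append(best)
            first_story_flags.append(False)
        else:
            clusters.append(Counter(vec))  # story starts a new topic
            cluster_ids.append(len(clusters) - 1)
            first_story_flags.append(True)
    return cluster_ids, first_story_flags


stream = [
    "hikers lost in avalanche in france",
    "stock markets close higher on friday",
    "rescue teams search for hikers after avalanche",
]
cluster_ids, first_story_flags = detect_topics(stream)
print(cluster_ids, first_story_flags)
```

No topic-specific training data is used anywhere in the loop; the only notion of "topic" is the similarity threshold, which plays the role of a meta-definition.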

The Link Detection Task: To detect whether a pair of stories discuss the same topic.
– The topic discussed is a free variable.
– Topic definition and annotation is unnecessary.
– The link detection task represents a basic functionality, needed to support all applications (including the TDT applications of topic detection and tracking).
– The link detection task is related to the topic tracking task, with Nt = 1.

[Figure callout: "same topic?"]

TDT3 Evaluation Methodology

All TDT3 tasks are cast as statistical detection (yes-no) tasks.
– Story Segmentation: Is there a story boundary here?
– Topic Tracking: Is this story on the given topic?
– Topic Detection: Is this story in the correct topic-clustered set?
– First-Story Detection: Is this the first story on a topic?
– Link Detection: Do these two stories discuss the same topic?

Performance is measured in terms of detection cost, which is a weighted sum of miss and false alarm probabilities:

$$C_{Det} = C_{Miss} \cdot P_{Miss} \cdot P_{target} + C_{FA} \cdot P_{FA} \cdot (1 - P_{target})$$

Detection cost is normalized so that it generally lies between 0 and 1 (0 for a perfect system, 1 for the better of the two trivial systems that always answer "no" or always answer "yes"):

$$(C_{Det})_{Norm} = \frac{C_{Det}}{\min\{C_{Miss} \cdot P_{target},\; C_{FA} \cdot (1 - P_{target})\}}$$
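The two cost formulas can be checked numerically with a short Python sketch. The function names are hypothetical, and the default values for C_Miss, C_FA and P_target are placeholders chosen for illustration; the slides do not state the values used in the evaluation.

```python
def error_rates(decisions, labels):
    """Miss and false-alarm probabilities from yes/no decisions and true labels."""
    targets = sum(1 for y in labels if y)
    non_targets = len(labels) - targets
    misses = sum(1 for d, y in zip(decisions, labels) if y and not d)
    false_alarms = sum(1 for d, y in zip(decisions, labels) if d and not y)
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / non_targets if non_targets else 0.0
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target).

    The default constants are illustrative assumptions.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """C_Det divided by the cost of the better trivial (all-no or all-yes) system."""
    raw = detection_cost(p_miss, p_fa, c_miss, c_fa, p_target)
    return raw / min(c_miss * p_target, c_fa * (1.0 - p_target))


labels = [True, True, False, False, False, False]
decisions = [True, False, True, False, False, False]
p_miss, p_fa = error_rates(decisions, labels)
print(p_miss, p_fa, normalized_detection_cost(p_miss, p_fa))
```

A system that always answers "no" has P_Miss = 1 and P_FA = 0, so with these placeholder constants its normalized cost is exactly 1. A system that does worse than that baseline, as in the small example above, scores above 1.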