Threshold Setting and Performance Monitoring for Novel Text Mining
Wenyin Tang and Flora S. Tsai
School of Electrical and Electronic Engineering, Nanyang Technological University
May 2,

Outline
Introduction
– Novel Text Mining (NTM) System
– Performance Evaluation of NTM
Adaptive Threshold Setting for NTM
– Motivations
– Our Method: Gaussian-based Adaptive Threshold Setting (GATS)
– Experimental Results
Conclusion

Overview of Novel Text Mining System
– Categorise each incoming document or sentence into its relevant topic bin.
– Detect novel yet relevant documents or sentences in each topic.
– Prepare a clean data matrix which can be easily processed by a computer.
– Interact with users: input documents, output novel information, preference setting and feedback.

Novel Text Mining Algorithm
Given a set of relevant documents in a specific topic, e.g. "football games", NTM retrieves the novel documents in a vector space as follows:
– Step 1: rank the documents in the topic in chronological order.
– Step 2: assign a novelty score to each document by comparing it with its history documents.
– Step 3: predict a document as "novel" if its novelty score is greater than the predefined novelty threshold.
Example, with documents D1, D2, D3, D4 arriving in order:
– D1 is "novel" because it is the first document.
– D2 is "novel" because it is dissimilar to D1.
– D3 is "novel" because it is dissimilar to its nearest neighbor, D2.
– D4 is "non-novel" because it is very similar to its nearest neighbor, D3.
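Below is a minimal sketch of Steps 1–3, assuming a bag-of-words vector space with cosine similarity (the deck names the vector space model but does not fix the similarity measure); detect_novel, the whitespace tokenization, and the 0.4 threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(cnt * b[t] for t, cnt in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_novel(docs, threshold=0.4):
    """Label each document of one topic as novel (True) or non-novel (False).

    docs: the topic's documents, already in chronological order (Step 1).
    Novelty score = 1 - similarity to the nearest history document (Step 2);
    a document is predicted novel if the score exceeds the threshold (Step 3).
    """
    history, labels = [], []
    for doc in docs:
        vec = Counter(doc.lower().split())
        if history:
            score = 1.0 - max(cosine(vec, h) for h in history)
        else:
            score = 1.0          # the first document is always novel
        labels.append(score > threshold)
        history.append(vec)
    return labels
```

On the D1–D4 example above, D1 gets score 1.0 and is always novel; each later document is scored against all of its history documents, as in Step 2.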

NTM Performance Evaluation
Given a set of documents D1, D2, ..., D10 relevant to some topic, each document is labelled novel or non-novel by the system (S) and by a human assessor (A); a document is matched (M) when both agree it is novel. For example, suppose the system retrieves 8 documents as novel, the assessor judges 5 to be truly novel, and 4 of the system's choices are matched (S = 8, A = 5, M = 4).
– Precision (P) reflects how likely the system-retrieved documents are truly novel: P = M/S = 4/8 = 0.5, i.e. 50% of the system-retrieved documents are truly novel.
– Recall (R) reflects how likely the truly novel documents are retrieved by the system: R = M/A = 4/5 = 0.8, i.e. 80% of the truly novel documents are retrieved by the system.
– The F_β score combines P and R into a single number: F_β = (1 + β²)·P·R / (β²·P + R).
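The P, R and F_β arithmetic above in runnable form; the document ids in the example call are hypothetical, chosen only so that S = 8, A = 5 and M = 4 as on the slide.

```python
def evaluate(system_novel, assessor_novel, beta=1.0):
    """P, R and F_beta from the sets of ids marked novel by system and assessor."""
    m = len(system_novel & assessor_novel)             # M: matched
    p = m / len(system_novel)                          # P = M / S
    r = m / len(assessor_novel)                        # R = M / A
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if m else 0.0
    return p, r, f

# hypothetical ids arranged to match the slide: S = 8, A = 5, M = 4
p, r, f1 = evaluate({1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 9})
print(p, r, f1)   # 0.5  0.8  ~0.615
```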

Threshold Setting vs. Users' Requirements
Different users may have different performance requirements:
– "I want to read the most novel information in a short time." A high-precision NTM system is desired.
– "I do not want to miss any novel information." A high-recall NTM system is desired.
– "I am not sure until I can see the documents."
The NTM system should therefore set the novelty threshold adaptively, based on the users' requirements.

Why Adaptive Threshold Setting?
Motivations:
1. The NTM system is a real-time system, so there is little or no training information in its initial stages; the threshold cannot be predefined with confidence.
2. The NTM system is an accumulating system, so more training information becomes available for threshold setting as the user gives feedback over time.
3. Different users may have different definitions of "novelty": one user may call a document with 50% novel information "novel", while another may require 90% novel information.

Gaussian-based Adaptive Threshold Setting (GATS)
Basic idea: GATS is a score-distribution-based threshold setting method.
– It models the score distributions of both novel and non-novel documents, based on user feedback.
– This parametric model provides global information about the data, from which we can construct an optimization criterion for the desired performance and search for the best threshold.

Novelty Score Distributions
Figure: empirical probability distributions of novelty scores and their Gaussian approximations, for the novel and non-novel classes of TREC 2004 Novelty Track topic N54.
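A sketch of the modelling step behind this figure, assuming a maximum-likelihood fit (sample mean and standard deviation) of one Gaussian per class to the feedback-labelled scores; the score values below are invented for illustration.

```python
import statistics

def fit_gaussian(scores):
    """Maximum-likelihood Gaussian fit (mean, std) to one class's scores."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores) or 1e-6   # guard against zero width
    return mu, sigma

# novelty scores split into two classes by user feedback (made-up values)
novel_scores     = [0.62, 0.71, 0.55, 0.80, 0.67]
non_novel_scores = [0.21, 0.35, 0.28, 0.40, 0.30]
mu1, s1 = fit_gaussian(novel_scores)       # novel class
mu0, s0 = fit_gaussian(non_novel_scores)   # non-novel class
```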

Optimization Criterion
The criterion must satisfy two conditions:
1. It is a function of the threshold: J = f(θ).
2. It is directly related to system performance: J = F_β(θ).

Optimization Criterion
Figure: the fitted score distributions S1 (novel) and S0 (non-novel), with the threshold θ separating them.
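One plausible construction of J = F_β(θ) from the two fitted Gaussians, sketched under the assumption that expected retrieval counts are taken from each Gaussian's tail above θ and that the best θ is found by grid search; the authors' exact optimization procedure may differ.

```python
from statistics import NormalDist

def expected_f_beta(theta, mu1, s1, n1, mu0, s0, n0, beta=1.0):
    """Estimate F_beta at threshold theta from the two fitted Gaussians.

    n1, n0: counts of novel / non-novel feedback documents; expected
    retrieval counts come from each Gaussian's tail probability above theta.
    """
    tail1 = 1.0 - NormalDist(mu1, s1).cdf(theta)   # P(score > theta | novel)
    tail0 = 1.0 - NormalDist(mu0, s0).cdf(theta)   # P(score > theta | non-novel)
    m = n1 * tail1                  # expected truly novel documents retrieved
    s = m + n0 * tail0              # expected total documents retrieved
    if m == 0 or s == 0:
        return 0.0
    p, r = m / s, m / n1
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def gats_threshold(mu1, s1, n1, mu0, s0, n0, beta=1.0, steps=100):
    """Grid search over [0, 1] for the theta maximizing the estimated F_beta."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda t: expected_f_beta(t, mu1, s1, n1, mu0, s0, n0, beta))
```

Note that with β > 1 the criterion weights recall more heavily and pushes θ down, while β < 1 weights precision and pushes θ up, matching the high-recall and high-precision requirements discussed earlier.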

Flow Chart of NTM with GATS
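As a rough substitute for the flow-chart figure, here is a sketch of one plausible end-to-end loop combining the earlier pieces (it reuses cosine, fit_gaussian and gats_threshold from the sketches above; the per-document feedback schedule is an assumption, corresponding to complete feedback — under partial feedback only a fraction of documents would be judged).

```python
from collections import Counter

def ntm_with_gats(docs, get_feedback, beta=1.0, initial_theta=0.5):
    """Process one topic's documents chronologically, adapting the threshold.

    get_feedback(doc, predicted) -> bool: the user's true novelty judgement.
    Little or no training data is available at first, so a default threshold
    is used until feedback accumulates; GATS then refits after each judgement.
    """
    history, novel_scores, non_novel_scores, predictions = [], [], [], []
    theta = initial_theta
    for doc in docs:
        vec = Counter(doc.lower().split())
        score = 1.0 if not history else 1.0 - max(cosine(vec, h) for h in history)
        predicted = score > theta
        predictions.append(predicted)
        if get_feedback(doc, predicted):               # user says: truly novel
            novel_scores.append(score)
        else:
            non_novel_scores.append(score)
        if novel_scores and non_novel_scores:          # refit both Gaussians
            mu1, s1 = fit_gaussian(novel_scores)
            mu0, s0 = fit_gaussian(non_novel_scores)
            theta = gats_threshold(mu1, s1, len(novel_scores),
                                   mu0, s0, len(non_novel_scores), beta)
        history.append(vec)
    return predictions
```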

Experimental Data
Sentence-level data: TREC 2004 Novelty Track data
The news providers of the document set are Xinhua English (XIE), New York Times (NYT), and Associated Press Worldstream (APW). The NIST assessors created 50 topics for this data; each topic consists of around 25 documents. These documents were ordered chronologically and then segmented into sentences; each sentence was given an identifier, and the sentences were concatenated to form the target sentence set. In this data, the overall percentage of novel sentences is around 41.4%. The statistics of the data are summarized in Table 1.

Table 1. Statistics of TREC 2004 Novelty Track data
          #Novel        #Non-novel    Sum
Relevant  3454 (41.4%)  4889 (58.6%)  8343

Experimental Data
Document-level data: APWSJ
APWSJ consists of news articles from the Associated Press (AP) and the Wall Street Journal (WSJ), which cover the same period, 1988 to 1990 [Zhang et al., 2002]. There are 50 TREC topics, Q101 to Q150, in this data; 5 topics (Q131, Q142, Q145, Q147, Q150) that lack non-novel documents are excluded from the experiments. The statistics of this data are summarized in Table 2.

Table 2. Statistics of APWSJ data
          #Novel          #Non-novel   Sum
Relevant  10,839 (91.1%)  1057 (8.9%)  11,896

Methods & Parameters
Baseline:
– Fixed threshold setting, with θ from 0.05 to 0.95 in equal steps.
Our method, GATS:
– Complete feedback: β from 0.1 to 0.9 in equal steps of 0.1.
– Partial feedback: β from 0.1 to 0.9 in equal steps of 0.1, with feedback percentages of 10%, 20%, 50% and 80%.

Experimental Results: Sentence-Level NTM on TREC 2004 Data
Figure: precision-recall curves.

Experimental Results: Document-Level NTM on APWSJ Data
Figure: redundancy-precision vs. redundancy-recall curves.

Comparison: GATS vs. Fixed Threshold
For the precision-recall tradeoff:
– A fixed threshold θ cannot reflect the tradeoff between precision and recall directly.
– The GATS parameter β reflects the relative weights of precision and recall directly.
Under various performance requirements, GATS is able to approximate the best fixed threshold.
Table 3. Comparison of F_β on TREC 2004 Novelty Track data

Experimental Results: Sentence-Level NTM on TREC 2004 Data
Figure: PR curves of GATS (tuned for F_β) with different percentages of the user's feedback.

Experimental Results: Document-Level NTM on APWSJ Data
Figure: R-PR (redundancy-precision vs. redundancy-recall) curves of GATS with different percentages of the user's feedback.

Conclusion
– A Gaussian-based Adaptive Threshold Setting (GATS) algorithm was proposed for the NTM system.
– GATS is a generic method that can be tuned to different performance requirements, varying from high precision to high recall.
– Experiments on both document-level and sentence-level datasets showed promising performance of GATS for a real-time NTM system.

Q & A