Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty. Gabrilovich et al., WWW 2004.

Presentation transcript:

Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty. Gabrilovich et al., WWW 2004

Main Contents
Identify novelty of news stories given preceding news a user has read
Newsjunkie: a set of algorithms for different (but related) tasks
Technique: text collection comparison
Tasks:
– Ranking news by novelty
– Personalized news updates
– Characterization of relevance types of articles
Evaluation or examples for each task

Review: Text Comparison
Syntactic differences between Web pages
– e.g. the AT&T Internet Difference Engine
Characteristic words
– e.g. genre classification
Language models for entire collections
– e.g. corpus linguistics
Comparing one set of documents to another
– e.g. MMR (Maximum Marginal Relevance)
– Newsjunkie

Research Problems
Focus on temporal aspects of content difference
– automatically assess the novelty over time of news articles coming from live newsfeeds
Look for documents most dissimilar from documents reviewed earlier
– limitation: outputs entire documents rather than the novel parts of multiple documents; extracting only the novel parts would be much harder, requiring information extraction and summarization

Difference of Text Content
KL divergence
Density of new named entities
– assumption: novelty is often conveyed by introducing new named entities
? Is normalization by document length reasonable? What we need is new information, regardless of how long the document is.
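
Below is a minimal sketch (in Python) of the two content-difference scores listed above, assuming unigram language models with additive smoothing and a pre-existing named-entity extractor. The smoothing constant, tokenization, and the length normalization that the question above challenges are illustrative assumptions, not the paper's exact implementation.

import math
from collections import Counter

def kl_divergence(article_tokens, background_tokens, alpha=0.01):
    # KL(article || background) over unigram language models; the background
    # model is additively smoothed so every article term has nonzero mass.
    article_counts = Counter(article_tokens)
    background_counts = Counter(background_tokens)
    vocabulary = set(article_counts) | set(background_counts)
    article_total = sum(article_counts.values()) or 1
    background_total = sum(background_counts.values()) + alpha * len(vocabulary)
    score = 0.0
    for term in vocabulary:
        p = article_counts[term] / article_total
        if p == 0.0:
            continue
        q = (background_counts[term] + alpha) / background_total
        score += p * math.log(p / q)
    return score

def new_entity_density(article_entities, background_entities, article_length):
    # Fraction of the article's tokens that are named entities not seen in the
    # background; this ratio is the length normalization questioned above.
    seen = set(background_entities)
    new_entities = [e for e in article_entities if e not in seen]
    return len(new_entities) / max(article_length, 1)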

Task 1: news ranking

Evaluation 1
Users evaluated 3 distance metrics on 12 topics
– KL divergence; density of named entities; chronological order
Each metric produced a set of 3 novel documents
Users judged which set was the most novel
Statistical significance tests on mean ranks
– KL & NE are superior to chronological order
– No significant difference between KL & NE
? Does not consider the order of the 3 articles, although the question is ranking!
? Statistical tests only on the mean; what about the variance?

Task 2: personalized news update
Task 2.1: single daily update
– articles from the preceding day serve as background
– the user specifies a novelty threshold
Future work: consider more previous articles, with weights decaying with age
No evaluation in this part
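
As a rough sketch, under the assumption that each article is a list of tokens, the daily update reduces to thresholding today's articles against yesterday's background. daily_update and novelty_score are hypothetical names; novelty_score stands in for either metric sketched earlier.

def daily_update(todays_articles, yesterdays_articles, novelty_score, threshold):
    # Yesterday's articles, concatenated, form the background collection.
    background = [token for article in yesterdays_articles for token in article]
    # Keep only today's articles whose novelty clears the user-chosen threshold.
    return [article for article in todays_articles
            if novelty_score(article, background) >= threshold]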

Task 2.2: breaking news report
Detect new information about a story
Preceding articles within a sliding window serve as background
– empirically, a window size of 40 articles
Filter out delayed reports and recaps
– given the nature of news reporting, these appear as narrow spikes in the distance graph
– a median filter removes narrow spikes
– empirically, a filter width of 5
? How were these parameters set?
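
The median filter itself is simple; below is an illustrative version (not the authors' code) operating on the list of per-article novelty scores inside the sliding window, with the width of 5 taken from the empirical setting above. Narrow spikes caused by delayed reports and recaps are flattened, while sustained rises that signal genuinely new developments survive.

def median_filter(scores, width=5):
    # Replace each score by the median of the scores in a window centred on it;
    # isolated spikes narrower than the window are smoothed away.
    half = width // 2
    smoothed = []
    for i in range(len(scores)):
        window = sorted(scores[max(0, i - half): i + half + 1])
        smoothed.append(window[len(window) // 2])
    return smoothed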

Task 2.2: example

Task 3: relevance type of articles
Four types of relevance to the background
– Recap: repeats old material
– Elaboration: adds new information
– Offshoot: mainly about another topic
– Irrelevant: a totally different topic
Identify them using intra-document dynamics

Task 3: intra-document dynamics
Estimate relevance of different parts within a document
Sliding window with a fixed size
Compare content within the window to the background
Plot the distance scores
Identify different patterns
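
A sketch of this profiling step, assuming the article is a list of tokens and distance is any of the collection-comparison functions sketched earlier; the window size and step are illustrative choices, since the slide does not fix them.

def intra_document_profile(article_tokens, background_tokens, distance,
                           window=100, step=25):
    # Slide a fixed-size window through the article and score each window
    # against the background; the resulting curve is the article's profile.
    profile = []
    last_start = max(len(article_tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = article_tokens[start:start + window]
        profile.append(distance(chunk, background_tokens))
    return profile

The shape of this curve is then matched against characteristic patterns, for example the uniformly high, flat curve of an irrelevant article described on the next slide.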

What will the graph of an irrelevant article look like? – Higher absolute scores, but a small dynamic range

Contributions
Novel novelty metric
– density of named entities
Evaluation by users
Breaking news detection
– novel adoption of a median filter
Characterization of article types
– intra-story novelty patterns

Limitations
Generalization of the named-entity metric
– works well in the news domain, but what about others?
User evaluation is too coarse
– does not consider the order of articles
– used old news that users had seen before the tests
Claimed to be "personalized", but only provides flexibility in the novelty threshold and, possibly, in selecting the article relevance type
It would be better if it could identify the novel parts
– or maybe not: keeping the integrity of a piece of news has value

Thank you!