Connecting the Dots Between News Articles

Connecting the Dots Between News Articles
KDD'10
Advisor: Jia Ling, Koh
Speaker: Yu Cheng, Hsieh

Outline
- Introduction
- Scoring a chain
- Formalize story coherence
- Measuring influence
- Finding a good chain
- Evaluation
- Interaction Model

Introduction
Users constantly struggle to keep up with the large amount of content published every day. With this much data, it is easy to miss the big picture. This work investigates methods for automatically connecting the dots between news articles.

Example: connecting the mortgage crisis to healthcare
- The chain should be coherent.
- The user should gain a better understanding of the progression of the story.

Scoring a chain  

Formalizing story coherence  

Formalizing story coherence
Advantages:
- Positions similar documents next to each other
- Rewards words that persist across long stretches of the chain
Disadvantages:
- Overlooks the importance of individual words
- Missing words
- Overlooks weak links
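
The weak-link problem can be illustrated with a minimal sketch. It assumes the objective assessed above is the number of words shared by consecutive articles, summed over the chain; the slide's formula was not preserved, so that form is an assumption. The articles are hypothetical word sets.

```python
def overlap(a, b):
    """Number of words two articles share."""
    return len(a & b)

def sum_coherence(chain):
    """Summed overlap of consecutive articles (assumed form of the objective)."""
    return sum(overlap(a, b) for a, b in zip(chain, chain[1:]))

def weakest_link(chain):
    """Smallest overlap over all transitions in the chain."""
    return min(overlap(a, b) for a, b in zip(chain, chain[1:]))

# Hypothetical toy articles, each reduced to a set of words.
smooth = [{"debt", "bank", "loan"}, {"bank", "loan", "rate"},
          {"loan", "rate", "tax"}, {"rate", "tax", "health"}]
broken = [{"debt", "bank", "loan", "credit"}, {"debt", "bank", "loan", "credit"},
          {"tax", "health", "care", "reform"}, {"tax", "health", "care", "reform"}]

print(sum_coherence(smooth), weakest_link(smooth))
print(sum_coherence(broken), weakest_link(broken))
```

In this toy data the summed score favors the `broken` chain even though its middle transition shares no words at all, which is the kind of weak link a sum can hide.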

Formalizing story coherence
Jitteriness: topics that appear and disappear throughout the chain.
- Only consider the longest continuous stretch of each word.
- This way, going back and forth between two topics provides no utility after the first topic switch.
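
The "longest continuous stretch" idea can be sketched as follows. This is an illustration only: articles are modeled as hypothetical word sets, and the function name is not taken from the slides.

```python
def longest_stretches(chain):
    """Map each word to the length of its longest run of consecutive articles."""
    best, current = {}, {}
    for words in chain:
        # Extend the run of every word present in this article;
        # words absent from it drop out, so their runs restart at 1 later.
        current = {w: current.get(w, 0) + 1 for w in words}
        for w, run in current.items():
            best[w] = max(best.get(w, 0), run)
    return best

# Hypothetical chains over the same two topics.
jittery = [{"bank"}, {"health"}, {"bank"}, {"health"}]
steady = [{"bank"}, {"bank"}, {"health"}, {"health"}]

print(longest_stretches(jittery))
print(longest_stretches(steady))
```

Both chains mention each topic twice, but only the steady chain earns a stretch of length two per word; switching back and forth adds nothing beyond the first stretch.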

Measuring influence  

Finding a good chain  

Finding a good chain
Linear programming:
- Chain restriction
- Smoothness
- Activation restriction
- Minmax objective

Linear Programming  

Linear Programming
Minmax objective:
- minedge is the minimum of all active edge scores; the program maximizes minedge, so a chain is judged by its weakest link.
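
To make the minmax objective concrete, here is a small sketch that replaces the linear program with exhaustive search over a toy graph. It is not the LP from the slides: edge scores are given as a hypothetical dictionary, and chronological ordering and activation constraints are ignored.

```python
from itertools import permutations

def best_chain(source, target, middle, k, edge_score):
    """Exhaustively pick the k-article chain whose weakest edge is strongest."""
    best, best_min = None, float("-inf")
    for inner in permutations(middle, k - 2):
        chain = (source, *inner, target)
        # minedge: the minimum score over the edges this chain uses.
        min_edge = min(edge_score.get((a, b), 0.0)
                       for a, b in zip(chain, chain[1:]))
        if min_edge > best_min:
            best, best_min = chain, min_edge
    return best, best_min

# Hypothetical edge scores between a source "s", a target "t" and three candidates.
scores = {("s", "a"): 0.9, ("a", "t"): 0.1,
          ("s", "b"): 0.5, ("b", "t"): 0.4,
          ("s", "c"): 0.3, ("c", "t"): 0.8}

print(best_chain("s", "t", ["a", "b", "c"], 3, scores))
```

Article `a` has the single strongest edge, yet `b` wins because its weaker edge (0.4) is stronger than the weakest edge of either alternative.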

Evaluation
- More than half a million real news articles were used.
- Major news stories of recent years were considered.
- For each story, an initial subset of 500 to 10,000 candidate articles was selected by keyword search.
- Named entities and noun phrases were extracted from each article; infrequent named entities and non-informative noun phrases were removed.

Evaluation
Story-linking techniques compared:
- Connecting-Dots
- Shortest-path
- Google News Timeline (GNT)
- Event threading (TDT)

Evaluation
Shortest path:
- Constructs a graph by connecting each document with its nearest neighbors, based on cosine similarity
Google News Timeline (GNT):
- Uses a query string to retrieve articles
- Constructs a query string for each story, based on s and t (the source and target articles)
- Picks K equally spaced documents between the dates of the original query articles
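
The shortest-path baseline can be sketched roughly as below, assuming bag-of-words vectors, a k-nearest-neighbor graph and breadth-first search. The value of k, the toy documents and the rule of skipping zero-similarity neighbors are assumptions made for illustration, not details from the slides.

```python
import math
from collections import Counter, deque

def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_graph(docs, k):
    """Undirected graph linking each document to its k most similar documents."""
    graph = {i: set() for i in range(len(docs))}
    for i, d in enumerate(docs):
        sims = [(cosine(d, docs[j]), j) for j in range(len(docs)) if j != i]
        for sim, j in sorted(sims, reverse=True)[:k]:
            if sim > 0:  # assumption: never link documents with no shared words
                graph[i].add(j)
                graph[j].add(i)
    return graph

def shortest_path(graph, s, t):
    """Breadth-first search; returns a list of node indices, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(graph[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

# Hypothetical toy corpus.
docs = [Counter("mortgage bank loan".split()),
        Counter("bank loan budget".split()),
        Counter("budget tax health".split()),
        Counter("tax health insurance".split())]

print(shortest_path(knn_graph(docs, 1), 0, 3))
print(shortest_path(knn_graph(docs, 2), 0, 3))
```

With one neighbor per document the toy graph splits into two components and no path exists; with two neighbors a path through the intermediate articles appears.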

Evaluation
- 18 users were each presented with a pair of source and target articles.
- Users' familiarity with those articles was gauged by asking whether they believed they knew a coherent story linking them together (on a scale of 1 to 5).
- Users were asked to indicate:
- Relevance
- Coherence
- Non-redundancy

Interaction Models
Refinement:
- Users might be especially interested in a specific part of the chain.
- A refinement may consist of adding a new article or replacing an existing one.
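
A refinement that adds an article could look like the following sketch. It tries every candidate in every gap of the chain and keeps the insertion that most improves the weakest edge. The scoring dictionary and the weakest-edge criterion are assumptions for illustration and may differ from the local search used in the talk.

```python
def refine_by_insertion(chain, candidates, edge_score):
    """Try each candidate in each gap; keep the chain with the best weakest edge."""
    def weakest(c):
        return min(edge_score.get((a, b), 0.0) for a, b in zip(c, c[1:]))

    best, best_score = list(chain), weakest(chain)
    for article in candidates:
        for pos in range(1, len(chain)):
            trial = list(chain[:pos]) + [article] + list(chain[pos:])
            score = weakest(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score

# Hypothetical scores: the m -> t transition is the weak part of the chain.
scores = {("s", "m"): 0.8, ("m", "t"): 0.2,
          ("m", "x"): 0.6, ("x", "t"): 0.7,
          ("m", "y"): 0.3, ("y", "t"): 0.9}

print(refine_by_insertion(["s", "m", "t"], ["x", "y"], scores))
```

Here inserting `x` between `m` and `t` raises the weakest edge from 0.2 to 0.6, while `y` would only raise it to 0.3.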

Evaluation
Refinement:
- Users were shown two chains obtained from the original chain by (1) our local search and (2) adding an article chosen randomly from a subset of candidate articles.
- Users preferred the local-search chains 72% of the time.

Evaluation
User interests:
- Users were shown two chains, one obtained from the other by increasing the importance of 2-3 words.
- They were then shown a list of ten words containing (1) the words whose importance was increased and (2) randomly chosen words, and asked which words they would pick in order to obtain the second chain from the first.
- The goal was to see whether users could identify at least some of the words.
- Users identified at least one word 63.3% of the time.

Conclusion & Future Work
- Described the problem of connecting the dots.
- Explored different desired properties of a good story and formalized the problem as a linear program.
- Provided an efficient algorithm to connect two articles.
- Future work: supporting more complex tasks.