1
Connecting the Dots Between News Articles
KDD'10
Advisor: Jia Ling, Koh
Speaker: Yu Cheng, Hsieh
2
Outline
- Introduction
- Scoring a chain
- Formalize story coherence
- Measuring influence
- Finding a good chain
- Evaluation
- Interaction Model
3
Introduction
- Users constantly struggle to keep up with the large amount of content published every day.
- With this much data, it is easy to miss the big picture.
- This work investigates methods for automatically connecting the dots.
4
Connecting the mortgage crisis to healthcare
- The chain should be coherent.
- The user should gain a better understanding of the progression of the story.
5
Scoring a chain
9
Formalizing story coherence
Advantages:
- Positions similar documents next to each other
- Rewards long stretches of words
Disadvantages:
- Overlooks the importance of individual words
- Missing words
- Overlooks weak links
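The slide's formula was not preserved in this transcript. As a minimal sketch, assume the measure being discussed counts the words shared by each pair of consecutive articles and sums the counts over the chain. The function name `overlap_coherence` and the toy word sets are illustrative assumptions, not the authors' code.

```python
def overlap_coherence(chain):
    """Sum, over consecutive article pairs, of the number of shared words.

    Assumption: each article is represented as a set of words.
    """
    return sum(len(a & b) for a, b in zip(chain, chain[1:]))


# A chain whose two transitions share 2 words and 1 word.
balanced = [
    {"mortgage", "crisis", "bank"},
    {"crisis", "bank", "bailout"},
    {"bailout", "congress"},
]

# A chain with one strong transition and one transition sharing nothing.
broken = [
    {"mortgage", "crisis", "bank"},
    {"mortgage", "crisis", "bank"},
    {"healthcare"},
]

print(overlap_coherence(balanced))
print(overlap_coherence(broken))
```

Both chains receive the same total under this summed measure even though the second contains a transition with no shared words, which illustrates the "overlooks weak links" disadvantage. Every word also counts equally, which illustrates the "importance of a word" disadvantage.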
14
Formalizing story coherence
Jitteriness: topics that appear and disappear throughout the chain
- Only consider the longest continuous stretch of each word.
- This way, going back and forth between two topics provides no utility after the first topic switch.
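The "longest continuous stretch" rule can be sketched as follows, assuming each article is a set of words. The helper name `longest_stretch` and the tie-breaking choice (keep the earliest stretch) are assumptions made for illustration.

```python
def longest_stretch(chain, word):
    """Return (start, end) indices of the longest run of consecutive
    articles containing `word`, or None if the word never appears.
    Ties are broken in favour of the earliest run (an assumption)."""
    best = None
    start = None
    for i, doc in enumerate(chain):
        if word in doc:
            if start is None:
                start = i  # a new run begins here
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
        else:
            start = None  # the run is interrupted
    return best


# A jittery chain that alternates between two topics.
jittery = [{"bank"}, {"health"}, {"bank"}, {"health"}]
print(longest_stretch(jittery, "bank"))

# A smooth chain where the topic persists before switching.
smooth = [{"bank"}, {"bank"}, {"bank", "health"}, {"health"}]
print(longest_stretch(smooth, "bank"))
```

In the jittery chain each word is credited for a single article only, so returning to a topic after leaving it adds nothing. In the smooth chain the word is credited for its full three-article stretch.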
16
Measuring influence
22
Finding a good chain
Linear Programming
- Chain Restriction
- Smoothness
- Activation Restriction
- Minmax Objective
23
Linear Programming
26
Linear Programming
Minmax Objective
- Minedge is the minimum of all active edge scores
- The objective maximizes minedge, so the weakest link of the chain is as strong as possible
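To make the minedge idea concrete, here is a small sketch that evaluates fixed candidate chains instead of solving the linear program. The edge scores, article ids, and the function name `min_edge` are made-up illustrations, not values from the talk.

```python
def min_edge(edge_scores, active_edges):
    """Minimum score among the edges that are active in a chain."""
    return min(edge_scores[e] for e in active_edges)


# Hypothetical scores for transitions between article ids.
scores = {
    (0, 1): 0.9,
    (1, 4): 0.1,
    (0, 2): 0.5,
    (2, 4): 0.4,
}

chain_a = [(0, 1), (1, 4)]  # one strong link, one very weak link
chain_b = [(0, 2), (2, 4)]  # two moderate links

# Prefer the chain whose weakest link is strongest.
best = max([chain_a, chain_b], key=lambda c: min_edge(scores, c))
print(best)
```

Under this criterion `chain_b` is preferred: its weakest link outscores the weakest link of `chain_a`, even though `chain_a` contains the single strongest edge.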
27
Evaluation
- More than half a million real news articles were used.
- Major news stories of recent years were considered.
- For each story, an initial subset of 500 to 10,000 candidate articles was selected by keyword search.
- Named entities and noun phrases were extracted from each article (infrequent named entities and non-informative noun phrases were removed).
28
Evaluation
Story-linking techniques:
- Connecting-Dots
- Shortest-path
- Google News Timeline (GNT)
- Event threading (TDT)
29
Evaluation
Shortest path
- Constructs a graph by connecting each document with its nearest neighbors, based on cosine similarity
Google News Timeline (GNT)
- Uses a query string to retrieve articles
- A query string is constructed for each story, based on s and t
- K equally spaced documents are picked between the dates of the original query articles
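The shortest-path baseline can be sketched roughly as below. The slide does not say how many neighbors are used or how edges are weighted, so this sketch assumes a parameter `k`, sparse term-weight dictionaries, and an unweighted breadth-first search. All names are illustrative.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(docs, k):
    """Undirected graph linking each document to its k most similar ones."""
    graph = {i: set() for i in range(len(docs))}
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        others.sort(key=lambda j: cosine(d, docs[j]), reverse=True)
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def shortest_path(graph, s, t):
    """Breadth-first search; returns the node list from s to t, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy articles as term-weight dicts (hypothetical data).
docs = [
    {"mortgage": 1, "bank": 1},
    {"bank": 1, "bailout": 1},
    {"bailout": 1, "congress": 1},
    {"congress": 1, "healthcare": 1},
]
path = shortest_path(knn_graph(docs, k=1), 0, 3)
print(path)
```

A path found this way only guarantees that neighboring articles are similar. Nothing in the search rewards a topic that persists along the whole chain.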
31
Evaluation
18 users were each presented with a pair of source and target articles. The study:
- Gauged users' familiarity with those articles
- Asked whether they believed they knew a coherent story linking them together (on a scale)
Users were then asked to indicate:
- Relevance
- Coherence
- Non-redundancy
34
Interaction Models
Refinement:
- Users might be especially interested in a specific part of the chain
- A refinement may consist of adding a new article or replacing an article
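One way to picture the "adding a new article" refinement is a simple local search: try each candidate at each interior position and keep the insertion that scores best. The transcript does not describe the scoring or the search actually used, so the weakest-link word-overlap score and the names `weakest_link` and `best_insertion` below are assumptions made purely for illustration.

```python
def weakest_link(chain):
    """Assumed score: the smallest word overlap between consecutive articles."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def best_insertion(chain, candidates, score=weakest_link):
    """Try every candidate at every interior position of the chain and
    return the (new_chain, score) pair with the highest score."""
    best_chain, best_score = None, float("-inf")
    for pos in range(1, len(chain)):
        for cand in candidates:
            trial = chain[:pos] + [cand] + chain[pos:]
            s = score(trial)
            if s > best_score:
                best_chain, best_score = trial, s
    return best_chain, best_score


# Hypothetical two-article chain and candidate pool (articles as word sets).
chain = [{"mortgage", "bank"}, {"congress", "healthcare"}]
candidates = [{"bank", "congress"}, {"sports"}]

refined, refined_score = best_insertion(chain, candidates)
print(refined, refined_score)
```

Here the candidate that shares a word with both neighbors is chosen, because it is the only insertion that leaves no transition without overlap.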
37
Evaluation
Refinement
- Users were shown two chains, each obtained from the original chain by (1) our local search or (2) adding an article chosen randomly from a subset of candidate articles
- Users preferred the local-search chains 72% of the time
38
Evaluation
User Interests
- Two chains were shown to users, one obtained from the other by increasing the importance of 2-3 words
- Users were then shown a list of ten words containing (1) the words whose importance we increased and (2) randomly chosen words
- Users were asked which words they would pick in order to obtain the second chain from the first
- The goal was to see whether users could identify at least some of the words
- Users identified at least one word 63.3% of the time
39
Conclusion & Future Work
- Described the problem of connecting the dots
- Explored different desired properties of a good story and formalized the problem as a linear program
- Provided an efficient algorithm to connect two articles
- Allowing more complex tasks