Published by Harvey Wilcox. Modified over 9 years ago.
1
Metro Maps of Science
Dafna Shahaf, Carlos Guestrin, Eric Horvitz
2
“The abundance of books is a distraction”
Lucius Annaeus Seneca, 4 BC – 65 AD
3
… and it does not get any better
129,864,880 books (Google estimate)
Research:
– PubMed: 19 million papers (one paper added per minute!)
– Scopus: 40 million papers
4
Papers Innovative Papers
5
So, you want to understand a research topic… Now what?
6
Search Engines are Great
But they do not show how it all fits together
7
Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]
8
Research is not Linear
9
Metro Map
A map is a set of lines of articles
Each line follows a coherent narrative thread
Temporal Dynamics + Structure
[Figure: example map; labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]
10
Map Definition
A map M is a pair (G, Π) where
– G = (V, E) is a directed graph
– Π is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line
[Figure: example map; labels: austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
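The definition above can be written down as a small data structure. This is a minimal sketch, not the authors' code: the class name `MetroMap`, its field names, and the integer article ids are all hypothetical, chosen only to illustrate the rule that every edge must lie on at least one metro line.

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    vertices: set  # V: article identifiers (hypothetical integer ids here)
    edges: set     # E: directed (u, v) pairs
    lines: list    # the set of paths (metro lines), each a list of vertices

    def line_edges(self):
        """Edges traversed by at least one metro line."""
        return {(u, v) for line in self.lines for u, v in zip(line, line[1:])}

    def is_valid(self):
        """Every line is a path in G, and every edge lies on some line."""
        endpoints_ok = all(u in self.vertices and v in self.vertices
                           for u, v in self.edges)
        return endpoints_ok and self.line_edges() == self.edges


# Two lines that share the stop 1 -> 2 and then diverge.
metro = MetroMap({1, 2, 3, 4}, {(1, 2), (2, 3), (2, 4)}, [[1, 2, 3], [1, 2, 4]])
# Same graph, but edge (2, 4) is on no line, so the map is not valid.
broken = MetroMap({1, 2, 3, 4}, {(1, 2), (2, 3), (2, 4)}, [[1, 2, 3]])
```

Comparing the two edge sets for equality checks both directions at once: lines only use edges of G, and no edge of G is left off every line.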
11
Game Plan: Objective, Algorithm, Does it work?
12
Properties of a Good Map 1. Coherence ???
13
Coherence: Main Idea
Connecting the Dots [Shahaf, Guestrin, KDD’10]
Coherence is not a property of local interactions
Incoherent: each pair of consecutive articles shares different words
[Figure: chain of five articles (1 to 5); labels: Greece, Europe, Italy, Republican, Protest, Debt default]
14
Coherence: Main Idea
Connecting the Dots [Shahaf, Guestrin, KDD’10]
A more-coherent chain
Coherent: a small number of words captures the story
[Figure: chain of five articles (1 to 5); labels: Greece, Austerity, Italy, Republican, Protest, Debt default]
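The contrast between the two chains can be illustrated with a toy score. This is a deliberately simplified stand-in, not the formulation from the cited KDD'10 paper: it picks the best set of at most `k` words and scores the chain by its weakest transition, counting only the chosen words that both neighbouring articles contain. The function name, the brute-force search, and the example articles are assumptions made for illustration.

```python
from itertools import combinations


def chain_coherence(chain, k):
    """Best weakest-link score over all sets of k words.

    chain: list of word sets, one per article, in order (at least two).
    For a candidate word set W, a transition between neighbours a and b
    scores |W & a & b|; the chain scores its weakest transition.
    Brute force, so only suitable for tiny vocabularies.
    """
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, min(k, len(vocab))):
        chosen = set(words)
        weakest = min(len(chosen & a & b) for a, b in zip(chain, chain[1:]))
        best = max(best, weakest)
    return best


# Hypothetical articles: every neighbouring pair shares a *different* word.
incoherent = [{"debt", "greece"}, {"greece", "europe"},
              {"europe", "italy"}, {"italy", "protest"}]
# Hypothetical articles: the same two words run through the whole chain.
coherent = [{"debt", "austerity", "greece"}, {"debt", "austerity", "europe"},
            {"debt", "austerity", "italy"}, {"debt", "austerity", "protest"}]
```

With `k = 2`, no pair of words survives every transition of the first chain, while "debt" and "austerity" carry the second chain from start to end.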
15
Words are too Simple
[Figure: chain of three papers (1 to 3); labels: Probability, Network, Cost, Sensor networks, Bayesian networks, Social networks]
16
Using the Citation Graph
Create a graph per word
– All papers mentioning the word
– Edge weight = strength of influence [El-Arini, Guestrin KDD‘11]
Where did paper 8 get the idea?
Do papers 8 and 9 mean the same thing?
[Figure: citation graph over papers 1 to 9 for the word "Network"]
17
Words are too Simple
Incoherent
[Figure: chain of three papers (1 to 3); labels: Probability, Network, Cost, Sensor networks, Bayesian networks, Social networks]
18
Properties of a Good Map 1. Coherence Is it enough?
19
Max-coherence Map Query: Reinforcement Learning
20
Properties of a Good Map
1. Coherence
2. Coverage
Should cover diverse topics important to the user
21
Coverage: What to Cover?
Perhaps words? Not enough:
1. "SVM in Oracle Database 10g", Milenova et al., VLDB '05
2. "Support Vector Machines in Relational Databases", Ruping, SVM '02
22
Similar Content
[Figure: papers 1 and 2]
23
Different Impact
Citing venues and authors:
Affected more authors/venues
Very little intersection
[Figure: papers 1 and 2]
24
What to Cover?
Instead of words… cover papers
A paper covers the papers that it had an impact on
High-coverage map: impact on a lot of the corpus
Why descendants?
Soft notion: coverage values in [0,1]
25
p has High Impact on q if…
There are many coherent paths between them (especially short ones)
Formalize with coherent random walks
[Figure: citation paths among papers p, q, and r; example citing sentences: "Note that our protocol is different from previous work…", "We use the algorithm of…"]
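The "many paths, especially short" intuition can be sketched as a random walk over citations. This is an assumption-laden toy, not the coherent random walk from the talk: it ignores coherence entirely, assumes an acyclic citation graph, and uses a hypothetical per-hop stop probability `stop`. The walk starts at q and follows citations backward in time; the impact of p on q is the probability that the walk reaches p.

```python
from functools import lru_cache


def make_impact(cites, stop=0.2):
    """Build impact(p, q) for a citation graph.

    cites[q] lists the papers that q cites; the graph is assumed acyclic.
    Each hop survives with probability (1 - stop), so short paths count
    more, and every extra path from q to p adds probability mass.
    """
    @lru_cache(maxsize=None)
    def impact(p, q):
        if p == q:
            return 1.0
        refs = cites.get(q, [])
        if not refs:
            return 0.0
        return (1.0 - stop) * sum(impact(p, r) for r in refs) / len(refs)

    return impact


# Hypothetical corpus: q reaches p through both of its references,
# r reaches p through only one of them.
cites = {"q": ["a", "b"], "r": ["a", "x"], "a": ["p"], "b": ["p"], "x": []}
impact = make_impact(cites)
```

Here `impact("p", "q")` comes out higher than `impact("p", "r")` because q has two paths to p and r has one. The result is always a value in [0,1].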
26
Map Coverage
Documents cover pieces of the corpus
[Figure: corpus and the regions covered by documents; labels: Corpus, Coverage]
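One way to turn per-paper soft coverage into a score for a whole map is a noisy-or combination: a corpus paper is left uncovered only if every map paper fails to cover it. The slides do not state the combination rule, so the noisy-or, the function name `map_coverage`, the optional `weights`, and the example numbers are all assumptions of this sketch.

```python
def map_coverage(cover, map_papers, weights=None):
    """Total soft coverage of the corpus by the papers on a map.

    cover[p][q] in [0, 1]: how much paper p covers corpus paper q.
    weights[q]: optional importance of q (defaults to 1 for every paper).
    """
    corpus = {q for row in cover.values() for q in row}
    total = 0.0
    for q in corpus:
        missed = 1.0
        for p in map_papers:
            missed *= 1.0 - cover.get(p, {}).get(q, 0.0)
        weight = 1.0 if weights is None else weights.get(q, 0.0)
        total += weight * (1.0 - missed)
    return total


# Hypothetical coverage values: p1 and p2 both partly cover paper "a".
cover = {"p1": {"a": 0.5, "b": 0.5}, "p2": {"a": 0.5, "c": 1.0}}
both = map_coverage(cover, ["p1", "p2"])
separate = map_coverage(cover, ["p1"]) + map_coverage(cover, ["p2"])
```

Because the two papers overlap on "a", `both` is smaller than `separate`. That diminishing-returns behaviour is what makes diverse maps score better than redundant ones.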
27
High-coverage, Coherent Map
28
Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity
29
Definition: Connectivity
Experimented with several formulations
Users do not care about connection type
Encourage connections between pairs of lines
30
Lines with No Intersection
Solution: Reward lines that had impact on each other
[Figure: lines that share no papers; paper labels: Perceptrons; Generalized Portrait Method; Kernel SVM; Kernel functions; Optimizing kernels; Applying perceptrons to facial feature location; View-based human face detection; Training SVMs for face detection; Face recognition by SVM; Automatic extraction of face features]
[Figure: resulting lines; labels: Perceptrons; SVM; Optimizing Kernels for SVM; Face Detection; SVM for Facial Recognition]
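Counting connected pairs of lines can be sketched as follows. The rule used here is an assumption for illustration: two lines count as connected if they share a paper, or, when an impact function is supplied, if some paper on one line has impact of at least `threshold` on a paper of the other. The function name, the threshold, and the impact values are hypothetical.

```python
from itertools import combinations


def connectivity(lines, impact=None, threshold=0.5):
    """Number of pairs of lines that are connected."""
    count = 0
    for a, b in combinations(lines, 2):
        if set(a) & set(b):
            count += 1  # the lines intersect at a shared paper
        elif impact is not None and any(
            max(impact(p, q), impact(q, p)) >= threshold
            for p in a for q in b
        ):
            count += 1  # no shared paper, but one line influenced the other
    return count


# Two lines with no paper in common (titles shortened from the slide).
kernel_line = ["Perceptrons", "Kernel SVM", "Optimizing kernels"]
face_line = ["View-based human face detection", "Face recognition by SVM"]

# Hypothetical impact scores; unlisted pairs have no impact.
scores = {("Kernel SVM", "Face recognition by SVM"): 0.7}


def toy_impact(p, q):
    return scores.get((p, q), 0.0)
```

With intersections alone the two lines score zero, while the impact-aware count rewards the link between the SVM work and its use in face recognition.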
31
Tying it all Together: Map Objective
– Coherence: either coherent or not, so it is a constraint
– Coverage: must have!
– Connectivity: nice to have
Consider all coherent maps with maximum possible coverage. Find the most connected one.
32
Game Plan: Objective, Algorithm, Does it work?
33
Approach Overview
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase Connectivity
34
Coherence Graph: Main Idea
Vertices correspond to short coherent chains
Directed edges between chains which can be conjoined and remain coherent
[Figure: short chains 1-2-3, 4-5-6, and 5-8-9; conjoined chain 1-2-3-5-8-9]
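The construction can be sketched for the chains in the figure. The assumptions are loud ones: chains are joined by plain concatenation, article ids are taken to be in chronological order, and `is_coherent` is a caller-supplied predicate. The stand-in used below (all articles share at least one word) is far cruder than a real coherence test, and the word sets are invented.

```python
def coherence_graph(chains, is_coherent):
    """Edges u -> v between short chains whose concatenation stays coherent.

    chains: tuples of article ids, assumed sorted chronologically.
    is_coherent: predicate on a tuple of article ids (supplied by the caller).
    """
    edges = set()
    for u in chains:
        for v in chains:
            disjoint = not set(u) & set(v)
            if disjoint and u[-1] < v[0] and is_coherent(u + v):
                edges.add((u, v))
    return edges


# Hypothetical word sets for the articles in the figure.
words = {1: {"a"}, 2: {"a"}, 3: {"a"}, 4: {"b"}, 5: {"a", "b"},
         6: {"b"}, 8: {"a"}, 9: {"a"}}


def shares_a_word(chain):
    # Stand-in coherence test: some word appears in every article.
    return bool(set.intersection(*(words[i] for i in chain)))


chains = [(1, 2, 3), (4, 5, 6), (5, 8, 9)]
edges = coherence_graph(chains, shares_a_word)
```

Only 1-2-3 followed by 5-8-9 survives: 4-5-6 follows a different word, and it overlaps 5-8-9 at article 5. A path through this graph then corresponds to one long coherent chain.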
35
Finding High-Coverage Chains
Paths correspond to coherent chains.
Problem: find a path of length K maximizing coverage of underlying articles
Cover( ) > Cover( ) ?
[Figure: coherence graph over chains 1-2-3, 4-5-6, and 5-8-9; the two paths being compared are 1-2-3-4-5-6 and 1-2-3-5-8-9]
36
Reformulation
Paths correspond to coherent chains.
Problem: find a path of length K maximizing coverage of underlying articles
Orienteering: maximize a function of the nodes visited
Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial time recursive greedy
– O(log OPT) approximation
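The recursive greedy idea can be sketched on a toy graph: guess a middle node and a budget split, solve the first half, then solve the second half for the gain it adds on top of the first. This is a simplified illustration, not a faithful implementation of [Chekuri and Pal, 2005]. It assumes unit-length edges on an acyclic graph, fixed endpoints, and a base case that accepts only a direct edge; the graph, the `covers` sets, and all names are hypothetical.

```python
def recursive_greedy(adj, f, s, t, budget, depth, seen=frozenset()):
    """Path from s to t with at most `budget` edges, or None.

    f: set function on nodes (assumed monotone submodular).
    seen: nodes already collected; only the gain on top of them counts.
    depth: recursion depth; a path can have at most 2**depth edges.
    """
    if s == t:
        base = [s]
    elif budget >= 1 and t in adj.get(s, ()):
        base = [s, t]
    else:
        base = None
    if depth == 0:
        return base

    def gain(path):
        return f(seen | frozenset(path)) - f(seen)

    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    best = base
    best_gain = gain(base) if base is not None else float("-inf")
    for v in sorted(nodes):              # guess the middle node
        for b1 in range(budget + 1):     # guess the budget split
            first = recursive_greedy(adj, f, s, v, b1, depth - 1, seen)
            if first is None:
                continue
            second = recursive_greedy(adj, f, v, t, budget - b1, depth - 1,
                                      seen | frozenset(first))
            if second is None:
                continue
            path = first + second[1:]
            if gain(path) > best_gain:
                best, best_gain = path, gain(path)
    return best


# Toy graph; covers[n] is the set of corpus items node n covers.
adj = {"s": ["a", "b"], "a": ["t", "c"], "b": ["t"], "c": ["t"]}
covers = {"s": {1}, "a": {1, 2}, "b": {3, 4}, "c": {6, 8}, "t": {7}}


def f(nodes):
    return len(set().union(*(covers[n] for n in nodes)))


short_path = recursive_greedy(adj, f, "s", "t", budget=2, depth=1)
long_path = recursive_greedy(adj, f, "s", "t", budget=3, depth=2)
```

With two edges the detour through b covers the most. With three edges the longer route through a and c wins. The nested guessing over middle nodes and budget splits at every level is where the quasipolynomial running time comes from.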
37
Approach Overview: Recap
Documents D …
1. Coherence graph G: encodes all coherent chains as graph paths
2. Coverage function f: f( ) = ?
Submodular orienteering [Chekuri & Pal, 2005]: quasipolynomial time recursive greedy, O(log OPT) approximation
3. Increase Connectivity
38
Example Map: Reinforcement Learning
[Figure: map for the query; line keywords: multi-agent, cooperative, joint, team, mdp, states, pomdp, transition, option, control, motor, robot, skills, arm, bandit, regret, dilemma, exploration, arm, q-learning, bound, optimal, rmax, mdp]
39
Example Map Detail: SVM
40
Game Plan: Objective, Algorithm, Does it work?
41
User Study
Tricky!
– No double-blind, no within-subject
– Domain: understandable yet unfamiliar
– Reinforcement Learning (RL)
42
User Study
30 participants
First-year grad student, Reinforcement Learning project
Update a survey paper from 1996
Identify research directions + relevant papers
– Google Scholar
– Map and Google Scholar
– Baselines: Map, Wikipedia
43
Results (in a nutshell)
Map users find better papers, and cover more important areas
[Figure: two charts comparing "Google" and "Us"; axis label: Better]
44
User Comments
Helpful
noticed directions I didn't know about
great starting point
… get a basic idea of what science is up to
why don't you draw words on edges?
Legend is confusing
hard to get an idea from paper title alone
45
Conclusions
Formulated metrics characterizing good maps for the scientific domain
Efficient methods with theoretical guarantees
User studies highlight the promise of the method
Website on the way!
Personalization
Thank you!