Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz.


1 Metro Maps of Science. Dafna Shahaf, Carlos Guestrin, Eric Horvitz

2 "The abundance of books is a distraction." Lucius Annaeus Seneca, 4 BC – 65 AD

3 … and it does not get any better
– 129,864,880 books (Google estimate)
– Research:
  – PubMed: 19 million papers (one paper added per minute!)
  – Scopus: 40 million papers

4 Papers Innovative Papers

5 So, you want to understand a research topic… Now what?

6 Search Engines are Great, but they do not show how it all fits together

7 Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

8 Research is not Linear

9 Metro Map
– A map is a set of lines of articles
– Each line follows a coherent narrative thread
– Temporal dynamics + structure
[Figure: example map; stop labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

10 Map Definition
A map M is a pair (G, Π) where
– G = (V, E) is a directed graph
– Π is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line
[Figure: example map; stop labels: austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
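
The map definition can be checked mechanically. The following is a minimal sketch, assuming vertices are article identifiers, edges are ordered pairs, and each metro line is a list of vertices; the function name `is_valid_map` and the data layout are illustrative assumptions, not taken from the talk.

```python
def is_valid_map(vertices, edges, lines):
    """Check the map definition: every line is a path in G, and every
    edge of G belongs to at least one line.

    vertices: set of node ids; edges: iterable of (u, v) pairs;
    lines: list of vertex lists (hypothetical layout for this sketch).
    """
    edge_set = set(edges)
    used = set()
    for line in lines:
        steps = list(zip(line, line[1:]))
        # A metro line must visit known vertices along existing directed edges.
        if any(v not in vertices for v in line):
            return False
        if any(step not in edge_set for step in steps):
            return False
        used.update(steps)
    # Each edge must belong to at least one metro line.
    return used == edge_set
```

An edge that no line travels along makes the map invalid under this definition, which is why the final comparison is an equality rather than a subset test.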

11 Game Plan Objective Algorithm Does it work?

12 Properties of a Good Map 1. Coherence ???

13 Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
– Coherence is not a property of local interactions
– Incoherent: each pair shares different words
[Figure: chain of articles 1–5; word labels: Greece, Europe, Italy, Republican, Protest, Debt default]

14 Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
– A more-coherent chain
– Coherent: a small number of words captures the story
[Figure: chain of articles 1–5; word labels: Greece, Austerity, Italy, Republican, Protest, Debt default]
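
The contrast between the two chains can be made concrete with a toy score. This is a deliberately simplified sketch, not the formulation from the cited KDD'10 paper: it assumes each article is a set of words, and it brute-forces a small budget `k` of "active" words that must carry every transition. The names `local_score` and `global_score` are hypothetical.

```python
from itertools import combinations

def local_score(chain):
    """Weakest link when every adjacent pair may use its own shared words.
    Expects a chain of at least two word sets."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))

def global_score(chain, k):
    """Weakest link when only k words, fixed for the whole chain, count.

    Brute force over word subsets; fine for toy vocabularies only.
    """
    vocab = sorted(set().union(*chain))
    best = 0
    for active in combinations(vocab, min(k, len(vocab))):
        active = set(active)
        score = min(len(a & b & active) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best
```

A chain in which each pair shares a different word looks fine to `local_score` but scores zero under `global_score` with a small `k`, which is the point of the slide: a coherent chain is one that a small number of words can explain end to end.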

15 Words are too Simple
[Figure: chain of papers 1–3; shared words: Probability, Network, Cost; the word "network" appears as sensor networks, Bayesian networks, and social networks]

16 Using the Citation Graph
Create a graph per word
– All papers mentioning the word
– Edge weight = strength of influence [El-Arini, Guestrin KDD‘11]
– Where did paper 8 get the idea?
– Do papers 8 and 9 mean the same thing?
[Figure: citation graph for the word "Network" over papers 1–9]
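
One way to read the per-word graph is sketched below. It assumes citations are given as a dictionary from `(cited, citing)` pairs to an influence weight and that two papers use a word in the same sense if they share an influential ancestor in that word's graph. The common-ancestor test, the `min_weight` threshold, and all function names are assumptions made for illustration; the talk does not spell out the exact rule.

```python
def word_graph(citations, words_of, word, min_weight=0.0):
    """Keep citation edges whose two endpoints both mention `word`.

    citations: dict mapping (cited, citing) -> influence weight.
    words_of: dict mapping paper -> set of words it mentions.
    """
    return {
        (u, v): w
        for (u, v), w in citations.items()
        if w >= min_weight and word in words_of[u] and word in words_of[v]
    }

def ancestors(graph, paper):
    """All papers that reach `paper` by following influence edges."""
    parents = {}
    for cited, citing in graph:
        parents.setdefault(citing, set()).add(cited)
    seen, stack = set(), [paper]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def same_sense(graph, a, b):
    """Toy test: do a and b inherit the word from a common source?"""
    return bool((ancestors(graph, a) | {a}) & (ancestors(graph, b) | {b}))
```

Filtering by word before looking for ancestors matters: two papers may both cite a common source that never mentions the word, and that source should not count as evidence that they mean the same thing by it.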

17 Words are too Simple
[Figure: chain of papers 1–3; shared words: Probability, Network, Cost; the word "network" appears as sensor networks, Bayesian networks, and social networks]
Incoherent

18 Properties of a Good Map 1. Coherence Is it enough?

19 Max-coherence Map Query: Reinforcement Learning

20 Properties of a Good Map 1. Coherence 2. Coverage Should cover diverse topics important to the user

21 Coverage: What to Cover?
Perhaps words? Not enough:
– Paper 1: "SVM in Oracle Database 10g", Milenova et al., VLDB '05
– Paper 2: "Support Vector Machines in Relational Databases", Ruping, SVM '02

22 Similar Content 1 1 2 2

23 Different Impact
Citing venues and authors:
– One paper affected more authors/venues
– Very little intersection
[Figure: papers 1 and 2 with their citing venues and authors]

24 What to Cover?
Instead of words… cover papers
– A paper covers papers that it had an impact on
– High-coverage map: impact on a lot of the corpus
– Why descendants?
– Soft notion: [0,1]

25 p has High Impact on q if…
– Many paths (especially short)
– The paths are coherent
– Formalize with coherent random walks
[Figure: papers p, r, q; citing sentences: "Note that our protocol is different from previous work…", "We use the algorithm of…"]
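
The "many paths, especially short" intuition can be sketched as a random walk over references. This sketch assumes an acyclic citation structure, a walker that starts at q, stops with probability `stop` at every step, and otherwise moves to a uniformly chosen reference; the impact of p on q is the probability that the walk reaches p. It omits the coherence restriction mentioned on the slide, and `make_impact`, `stop`, and the data layout are hypothetical.

```python
from functools import lru_cache

def make_impact(references, stop=0.2):
    """Build impact(p, q) in [0, 1] from a reference dictionary.

    references: dict mapping paper -> list of papers it cites
                (assumed acyclic for this sketch).
    """
    @lru_cache(maxsize=None)
    def impact(p, q):
        if p == q:
            return 1.0
        cited = references.get(q, [])
        if not cited:
            return 0.0
        # Survive the stop check, then follow one reference uniformly at random.
        reach = sum(impact(p, r) for r in cited) / len(cited)
        return (1.0 - stop) * reach

    return impact
```

Each extra hop multiplies the score by `1 - stop`, so short paths count more, and every additional reference of q that leads back to p raises the average, so many paths count more.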

26 Map Coverage Documents cover pieces of the corpus: Corpus Coverage
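
A soft notion of map coverage can be sketched as follows, assuming per-paper impact scores in [0,1] are already available and that a corpus paper is covered by the map unless every map paper fails to cover it. The function name `map_coverage`, the `impact` dictionary layout, and the optional `weight` argument are assumptions made for this illustration.

```python
def map_coverage(map_papers, impact, corpus, weight=None):
    """Sum over the corpus of 1 - prod(1 - impact(p, q)) for p in the map.

    impact: dict mapping (p, q) -> degree in [0, 1] to which p covers q.
    weight: optional dict mapping q -> importance (defaults to 1 each).
    """
    total = 0.0
    for q in corpus:
        miss = 1.0
        for p in map_papers:
            miss *= 1.0 - impact.get((p, q), 0.0)
        w = 1.0 if weight is None else weight.get(q, 0.0)
        total += w * (1.0 - miss)
    return total
```

Under this form, adding a paper that covers already-covered parts of the corpus gains little, which pushes a high-coverage map toward diverse papers.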

27 High-coverage, Coherent Map

28 Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity

29 Definition: Connectivity
– Experimented with formulations
– Users do not care about connection type
– Encourage connections between pairs of lines

30 Lines with No Intersection
[Figure: two lines that never meet; stops: Perceptrons, Generalized Portrait Method, Kernel SVM, Kernel functions, Optimizing kernels, Applying perceptrons to facial feature location, View-based human face detection, Training SVMs for face detection, Face recognition by SVM, Automatic extraction of face features]
Solution: reward lines that had impact on each other
[Figure: connected lines; stops: Perceptrons, SVM, Optimizing Kernels for SVM, Face Detection, SVM for Facial Recognition]
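
A pairwise connectivity count in this spirit might look like the sketch below. It assumes a pair of lines counts as connected if the lines share a paper or if some paper on one line had impact above a threshold on a paper of the other; the `threshold` value, the `impact` dictionary, and the function name are illustrative assumptions rather than the talk's exact formulation.

```python
from itertools import combinations

def connectivity(lines, impact, threshold=0.5):
    """Count pairs of lines that intersect or influenced each other.

    lines: list of lists of paper ids.
    impact: dict mapping (p, q) -> impact of p on q in [0, 1].
    """
    def connected(a, b):
        if set(a) & set(b):
            return True  # the lines intersect at a shared paper
        return any(
            max(impact.get((p, q), 0.0), impact.get((q, p), 0.0)) >= threshold
            for p in a
            for q in b
        )

    return sum(1 for a, b in combinations(lines, 2) if connected(a, b))
```

Counting pairs rather than individual links matches the slide's remark that users do not care about the connection type, only that lines are related.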

31 Tying it all Together: Map Objective
– Coherence: either coherent or not (constraint)
– Coverage: must have!
– Connectivity: nice to have
Consider all coherent maps with maximum possible coverage. Find the most connected one.
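
The ordering of the three criteria can be sketched as a lexicographic selection over candidate maps. This assumes each candidate already carries precomputed scores in a dictionary and that coherence is enforced through a threshold `tau`; both the representation and the name `pick_map` are hypothetical, and a real system would search the space of maps rather than enumerate it.

```python
def pick_map(candidates, tau):
    """Coherence is a constraint, coverage comes first, connectivity breaks ties.

    candidates: list of dicts with 'coherence', 'coverage', 'connectivity'.
    Returns the chosen candidate, or None if no candidate is coherent.
    """
    coherent = [m for m in candidates if m["coherence"] >= tau]
    if not coherent:
        return None
    best_coverage = max(m["coverage"] for m in coherent)
    top = [m for m in coherent if m["coverage"] == best_coverage]
    return max(top, key=lambda m: m["connectivity"])
```

Connectivity never trades off against coverage here: it only chooses among maps that already achieve the maximum coverage.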

32 Game Plan Objective Algorithm Does it work?

33 Approach Overview
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase connectivity

34 Coherence Graph: Main Idea
– Vertices correspond to short coherent chains
– Directed edges between chains which can be conjoined and remain coherent
[Figure: short chains 1→2→3, 4→5→6, and 5→8→9; conjoined chain 1→2→3→5→8→9]
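
The construction can be sketched as below, assuming short chains are tuples of document ids whose numeric order stands in for time, and assuming a caller-supplied predicate `is_coherent` decides whether a conjoined chain is still coherent. The disjointness and ordering checks are simplifications chosen for this sketch, and the function name is hypothetical.

```python
def build_coherence_graph(chains, is_coherent):
    """Vertices are short chains; an edge u -> v means u followed by v
    is still a coherent chain.

    chains: list of tuples of document ids (ids assumed to follow time order).
    is_coherent: callable taking a tuple of ids and returning a bool.
    """
    graph = {chain: [] for chain in chains}
    for u in chains:
        for v in chains:
            if u == v or set(u) & set(v):
                continue  # this sketch only conjoins disjoint chains
            if u[-1] < v[0] and is_coherent(u + v):
                graph[u].append(v)
    return graph
```

Any path in the resulting graph spells out a longer chain, so the search for good lines becomes a search for good paths.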

35 Finding High-Coverage Chains
– Paths correspond to coherent chains
– Problem: find a path of length K maximizing coverage of underlying articles
– Cover(1→2→3→4→5→6) > Cover(1→2→3→5→8→9)?
[Figure: coherence graph over short chains 1→2→3, 4→5→6, and 5→8→9]

36 Reformulation
– Paths correspond to coherent chains
– Problem: find a path of length K maximizing coverage of underlying articles
– Orienteering: a function of the nodes visited
– Submodular orienteering [Chekuri and Pal, 2005]
  – Quasipolynomial time recursive greedy
  – O(log OPT) approximation
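
The recursive greedy idea can be sketched as follows: guess a middle vertex and a budget split, solve the first half, then solve the second half to maximize the gain on top of what the first half already covers. This is a simplified rendering under stated assumptions (unit edge costs, a fixed recursion `depth`, fixed endpoints `s` and `t`, and a set function `f` over visited nodes); it is meant to show the shape of the recursion and makes no claim to reproduce the cited algorithm's guarantee.

```python
def recursive_greedy(graph, f, s, t, budget, covered=frozenset(), depth=2):
    """Search for an s -> t path with at most `budget` edges and high f.

    graph: dict mapping node -> list of successor nodes (every node is a key).
    f: set function on nodes, e.g. coverage of the visited nodes.
    covered: nodes already visited by earlier parts of the path.
    Returns a list of nodes, or None if no path is found at this depth.
    """
    def gain(path):
        return f(covered | set(path)) - f(covered)

    # Base candidates: stay put, or take the direct edge if it exists.
    if s == t:
        best = [s]
    elif budget >= 1 and t in graph.get(s, ()):
        best = [s, t]
    else:
        best = None
    if depth == 0 or budget < 2:
        return best

    for mid in graph:
        if mid == s or mid == t:
            continue
        for b1 in range(1, budget):
            left = recursive_greedy(graph, f, s, mid, b1, covered, depth - 1)
            if left is None:
                continue
            # The second half is scored relative to what the first half covers.
            right = recursive_greedy(graph, f, mid, t, budget - b1,
                                     covered | set(left), depth - 1)
            if right is None:
                continue
            path = left + right[1:]
            if best is None or gain(path) > gain(best):
                best = path
    return best
```

A recursion of depth d can assemble paths of up to 2^d edges, so the depth has to grow with the logarithm of the path length, which is where the quasipolynomial running time comes from.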

37 Approach Overview: Recap
Documents D …
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase connectivity
Callouts:
– Encodes all coherent chains as graph paths
– Submodular orienteering [Chekuri & Pal, 2005]: quasipoly time recursive greedy, O(log OPT) approximation

38 Example Map: Reinforcement Learning
[Figure: example map; line keywords: multi-agent, cooperative, joint, team, mdp, states, pomdp, transition, option, control, motor, robot, skills, arm, bandit, regret, dilemma, exploration, arm, q-learning, bound, optimal, rmax, mdp]

39 Example Map Detail: SVM

40 Game Plan Objective Algorithm Does it work?

41 User Study
Tricky!
– No double-blind, no within-subject
– Domain: understandable yet unfamiliar
– Reinforcement Learning (RL)

42 User Study
– 30 participants
– Scenario: first-year grad student, Reinforcement Learning project
– Task: update a survey paper from 1996; identify research directions + relevant papers
– Conditions:
  – Google Scholar
  – Map and Google Scholar
– Baselines: Map, Wikipedia

43 Results (in a nutshell)
Map users find better papers, and cover more important areas
[Figure: two charts comparing "Google" and "Us"; higher is better]

44 User Comments
– "Helpful"
– "noticed directions I didn't know about"
– "great starting point"
– "… get a basic idea of what science is up to"
– "why don't you draw words on edges?"
– "Legend is confusing"
– "hard to get an idea from paper title alone"

45 Conclusions
– Formulated metrics characterizing good maps for the scientific domain
– Efficient methods with theoretical guarantees
– User studies highlight the promise of the method
– Website on the way!
– Personalization
Thank you!

