Trains of Thought: Generating Information Maps Dafna Shahaf, Carlos Guestrin and Eric Horvitz.



“The abundance of books is a distraction.” Lucius Annaeus Seneca, 4 BC – 65 AD

So, you want to understand a complex topic… Now what?

Search Engines are Great, but they do not show how it all fits together

Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

Real Stories are not Linear

Metro Map: A set of lines. Each line follows a coherent narrative thread. Structure + multiple aspects. [Figure: example map with stops labeled austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Map Definition: A map M is a pair (G, Π) where – G=(V,E) is a directed graph – Π is a set of paths in G (metro lines) – Each e ∈ E must belong to at least one metro line. [Figure: example map with stops labeled austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
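
The definition above can be held in a small data structure. The following is a minimal sketch, not the authors' implementation; the class name `MetroMap` and its `validate` method are hypothetical names chosen for illustration. It checks the one constraint the slide states: every edge must lie on at least one metro line.

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    """Hypothetical container for a map: a directed graph plus metro lines."""
    vertices: set   # V
    edges: set      # E, as (u, v) pairs
    lines: list     # each line is a list of vertices forming a path in G

    def validate(self):
        """Check that lines are paths in G and that every edge is on a line."""
        covered = set()
        for line in self.lines:
            for u, v in zip(line, line[1:]):
                if (u, v) not in self.edges:
                    raise ValueError(f"line uses an edge not in G: {(u, v)}")
                covered.add((u, v))
        uncovered = self.edges - covered
        if uncovered:
            raise ValueError(f"edges on no metro line: {sorted(uncovered)}")
        return True


# Two lines that share the stops "b" and "c".
example = MetroMap(
    vertices={"a", "b", "c", "d"},
    edges={("a", "b"), ("b", "c"), ("d", "b")},
    lines=[["a", "b", "c"], ["d", "b", "c"]],
)
example.validate()
```

Storing lines as vertex lists keeps the shared stops explicit, which is what later makes it easy to count intersections between lines.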

Game Plan

Properties of a Good Map 1. Coherence ???

Coherence: Main Idea. Connecting the Dots [S, Guestrin, KDD’10]. Coherence is not a property of local interactions. Incoherent: each pair shares different words. [Figure: chain of articles with word labels Greece, Europe, Italy, Republican, Protest, Debt default]

Coherence: Main Idea. Connecting the Dots [S, Guestrin, KDD’10]. A more-coherent chain. Coherent: a small number of words captures the story. [Figure: chain of articles with word labels Greece, Austerity, Italy, Republican, Protest, Debt default]
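
The contrast between the two chains can be made concrete with a toy score. This is only a sketch under strong assumptions: articles are reduced to sets of words, and the function `k_word_coherence` is a hypothetical brute-force stand-in, not the formulation used in [KDD’10]. It asks how well a fixed budget of `k` words can support every transition at once, whereas `weakest_link` only looks at each pair locally.

```python
from itertools import combinations


def weakest_link(chain):
    """Local view: fewest words shared by any adjacent pair (chain has >= 2 docs)."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def k_word_coherence(chain, k):
    """Global view: pick k words once; score the weakest transition using only them."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, min(k, len(vocab))):
        chosen = set(words)
        score = min(len(chosen & a & b) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best


# Each adjacent pair shares a word, but a different one every time.
jittery = [{"greece", "europe"}, {"europe", "italy"},
           {"italy", "protest"}, {"protest", "republican"}]
# The same two words run through the whole chain.
steady = [{"greece", "debt", "default"}, {"greece", "debt", "austerity"},
          {"greece", "debt", "italy"}, {"greece", "debt", "protest"}]

print(weakest_link(jittery), k_word_coherence(jittery, 2))  # 1 0
print(weakest_link(steady), k_word_coherence(steady, 2))    # 2 2
```

The jittery chain looks acceptable pair by pair, yet no two words explain all of its transitions, which is the sense in which coherence is not a local property.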

Properties of a Good Map 1. Coherence Is it enough?

Max-coherence Map. Query: Clinton. Line 1: “Clinton visits Belfast”; “Clinton set for Dublin”; “High hopes for Clinton visit”. Line 2: “Clinton, Religious Leaders Share Thoughts”; “Church Leaders Praise Clinton's 'Spirituality'”; “Religion Leaders Divided on Clinton Moral Issue”; “Clinton Should Resign, 2 Religious Leaders Say”

Properties of a Good Map 1. Coherence 2. Coverage Should cover diverse topics important to the user

Coverage: Select a small set of diverse articles that covers the most important stories. Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09]. [Figure label: January 17, 2009]

Coverage: The Idea. Documents cover concepts. [Figure labels: Corpus, Coverage]
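
One way to turn "documents cover concepts" into a number is a soft, probabilistic coverage: a concept counts as covered unless every selected document misses it. The sketch below assumes that form, plus per-concept weights and a greedy selector; the names `coverage` and `greedy_select` and all the scores are illustrative assumptions, not values from the talk.

```python
from math import prod


def coverage(selected, doc_cover, weights):
    """Weighted soft coverage: sum over concepts of w * (1 - prod(1 - cover_d))."""
    total = 0.0
    for concept, w in weights.items():
        miss = prod(1.0 - doc_cover[d].get(concept, 0.0) for d in selected)
        total += w * (1.0 - miss)
    return total


def greedy_select(doc_cover, weights, k):
    """Pick k documents, each time adding the one that raises coverage most."""
    chosen, remaining = [], set(doc_cover)
    for _ in range(min(k, len(doc_cover))):
        best = max(sorted(remaining),
                   key=lambda d: coverage(chosen + [d], doc_cover, weights))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Made-up scores: d1 and d2 are near-duplicates about the strike.
doc_cover = {
    "d1": {"strike": 0.9},
    "d2": {"strike": 0.8},
    "d3": {"imf": 0.7},
}
weights = {"strike": 1.0, "imf": 1.0}
print(greedy_select(doc_cover, weights, 2))  # ['d1', 'd3']
```

Because a second strike article adds little once the first is chosen, the greedy pick moves on to a different concept, which is the diversity effect coverage is meant to encourage.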

High-coverage, Coherent Map. Line 1: “Greek Civil Servants Strike over Austerity Measures”; “Greece Paralyzed by New Strike”; “Greek Take to the Streets, but Lacking Earlier Zeal”. Line 2: “Infighting Adds to Merkel’s Woes”; “It’s Germany that Matters”; “UK Backs Germany’s Effort”. Line 3: “Germany says the IMF should Rescue Greece”; “IMF more Likely to Lead Efforts”; “IMF is Urged to Move Forward”

Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity

Definition: Connectivity – Experimented with formulations – Users do not care about connection type – Encourage connections between pairs of lines
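
A simple reading of "encourage connections between pairs of lines" is to count the pairs of lines that meet at all, ignoring how they meet. The sketch below assumes exactly that; the function name `connectivity` is hypothetical and lines are plain lists of article identifiers.

```python
from itertools import combinations


def connectivity(lines):
    """Number of unordered pairs of lines that share at least one article."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


lines = [["a", "b", "c"], ["d", "b", "e"], ["f", "g"]]
print(connectivity(lines))  # 1
```

Counting pairs rather than shared articles matches the observation that users do not care about the type of connection, only that lines are tied together.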

Tying it all Together: Map Objective. Coherence – Either coherent or not: Constraint. Coverage – Must have! Connectivity – Nice to have. Consider all coherent maps with maximum possible coverage. Find the most connected one.
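
The objective reads as a three-stage filter: coherence is a hard constraint, coverage is maximized first, and connectivity breaks ties. A minimal sketch of that ordering over an explicit list of candidate maps follows; enumerating candidates this way is an assumption made for illustration (the talk builds maps algorithmically), and `pick_map` and its arguments are hypothetical names.

```python
def pick_map(candidates, is_coherent, coverage, connectivity, tol=1e-9):
    """Among coherent candidates with maximal coverage, return the most connected."""
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    top = max(coverage(m) for m in coherent)
    finalists = [m for m in coherent if coverage(m) >= top - tol]
    return max(finalists, key=connectivity)


# Candidate maps described by precomputed, made-up scores.
candidates = [
    {"name": "A", "coherent": True, "cov": 0.9, "conn": 1},
    {"name": "B", "coherent": True, "cov": 0.9, "conn": 3},
    {"name": "C", "coherent": False, "cov": 1.0, "conn": 5},
    {"name": "D", "coherent": True, "cov": 0.7, "conn": 9},
]
best = pick_map(
    candidates,
    is_coherent=lambda m: m["coherent"],
    coverage=lambda m: m["cov"],
    connectivity=lambda m: m["conn"],
)
print(best["name"])  # B
```

Map C is dropped despite its scores because it violates the constraint, and D loses on coverage before connectivity is ever consulted.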

Game Plan

Approach Overview Documents D … 1. Coherence graph G 2. Coverage function f f( ) = ? 3. Increase Connectivity

Coherence Graph: Main Idea. Vertices correspond to short coherent chains. Directed edges connect chains which can be conjoined and remain coherent.

Finding Vertices: Vertices are short, coherent chains. Can use [KDD’10] – Expensive – Solving many LPs. Take advantage of simplicity of short stories – No topic drift – Sampling-based (fast) algorithm

Finding Edges. Problem: Combining several strong chains may result in a much-weaker chain. Discontinuity: Change of focus

m-Coherence: A chain is m-coherent if each sub-chain (d_i, …, d_{i+m}) is coherent. Control discontinuity points: m: size of the user's ‘history window’ – m=length(chain): standard coherence – m=1: optimize transitions without context
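
The definition amounts to a sliding-window check. The sketch below follows the slide's indexing literally, so each window runs from d_i to d_{i+m} and therefore spans m transitions; the coherence test itself is passed in as `is_coherent`, a hypothetical predicate, since the slide does not fix one. The example predicate (all documents in the window share a word) is a toy assumption.

```python
def is_m_coherent(chain, m, is_coherent):
    """True if every window d_i, ..., d_{i+m} of the chain passes is_coherent."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if len(chain) <= m + 1:
        # The window is as long as the chain: this is standard coherence.
        return is_coherent(chain)
    return all(is_coherent(chain[i:i + m + 1]) for i in range(len(chain) - m))


def shares_a_word(window):
    """Toy predicate: some word appears in every document of the window."""
    return bool(set.intersection(*window))


chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
print(is_m_coherent(chain, 1, shares_a_word))  # True
print(is_m_coherent(chain, 2, shares_a_word))  # False
```

With m=1 only individual transitions are judged, so the drifting chain passes; widening the window to the whole chain exposes the change of focus.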

Observation: If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent.

Using the Observation: If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent. Useful for divide and conquer: – Add edge if m-1 overlap
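
The edge rule can be sketched as a suffix/prefix match between short chains. Here chains are lists of article identifiers and the required overlap is left as a parameter (the slide uses m-1); `conjoin` and `build_edges` are hypothetical helper names, and the quadratic scan over all pairs is chosen for clarity rather than speed.

```python
def conjoin(left, right, overlap):
    """Glue right onto left if left's last `overlap` docs equal right's first ones."""
    if overlap < 1 or len(left) < overlap or len(right) < overlap:
        return None
    if left[-overlap:] != right[:overlap]:
        return None
    return left + right[overlap:]


def build_edges(chains, overlap):
    """Directed edges (i, j) between chain indices that can be conjoined."""
    return [(i, j)
            for i, a in enumerate(chains)
            for j, b in enumerate(chains)
            if i != j and conjoin(a, b, overlap) is not None]


chains = [["d1", "d2", "d3"], ["d2", "d3", "d4"], ["d5", "d6", "d7"]]
print(conjoin(chains[0], chains[1], 2))  # ['d1', 'd2', 'd3', 'd4']
print(build_edges(chains, 2))            # [(0, 1)]
```

Because each edge already guarantees the conjoined chain stays m-coherent, longer chains never need to be re-scored; they are read off as paths in the graph.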

Approach Overview Documents D … 1. Coherence graph G 2. Coverage function f f( ) = ? 3. Increase Connectivity

Finding High-Coverage Chains: Paths correspond to coherent chains. Problem: find a path of length K maximizing coverage of underlying articles. [Figure: two candidate paths, asking whether Cover( ) > Cover( ) ?]

Reformulation: Paths correspond to coherent chains. Problem: find a path of length K maximizing coverage of underlying articles. Submodular orienteering – [Chekuri and Pal, 2005] – Quasipolynomial time recursive greedy – O(log OPT) approximation. Orienteering: the objective is a function of the nodes visited
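
The recursive-greedy idea is to guess a midpoint and a budget split, solve the first half, then solve the second half for the gain it adds on top of the first. The sketch below is a simplified illustration in that style, not a faithful reproduction of the cited algorithm and with no claim to its approximation guarantee: edges have unit length, the base case is a single direct edge (so recursion depth d yields paths of at most 2^d edges), and `f` is any set function over visited nodes. All names are hypothetical.

```python
def gain(f, seen, path):
    """Marginal value of visiting path's nodes on top of those already seen."""
    return f(seen | set(path)) - f(seen)


def recursive_greedy(edges, f, s, t, budget, seen, depth):
    """Best s-to-t path found with at most `budget` edges, or None."""
    best = [s, t] if (s, t) in edges and budget >= 1 else None
    if depth == 0 or budget < 2:
        return best
    best_gain = gain(f, seen, best) if best else float("-inf")
    nodes = {u for e in edges for u in e}
    for mid in sorted(nodes - {s, t}):           # guess the midpoint
        for b1 in range(1, budget):              # guess the budget split
            first = recursive_greedy(edges, f, s, mid, b1, seen, depth - 1)
            if first is None:
                continue
            # The second half is scored relative to what the first half covers.
            second = recursive_greedy(edges, f, mid, t, budget - b1,
                                      seen | set(first), depth - 1)
            if second is None:
                continue
            path = first + second[1:]
            g = gain(f, seen, path)
            if g > best_gain:
                best, best_gain = path, g
    return best


# Toy graph; f counts the distinct topics covered by the visited nodes.
edges = {("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")}
topics = {"s": {1}, "a": {1, 2}, "b": {3, 4}, "t": {5}}
f = lambda nodes: len(set().union(*(topics[n] for n in nodes))) if nodes else 0

print(recursive_greedy(edges, f, "s", "t", 2, set(), 1))  # ['s', 'b', 't']
print(recursive_greedy(edges, f, "s", "t", 3, set(), 2))  # ['s', 'a', 'b', 't']
```

Trying every midpoint and split at every level is what makes the running time quasipolynomial rather than polynomial.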

Approach Overview: Recap. Documents D … 1. Coherence graph G 2. Coverage function f f( ) = ? 3. Increase Connectivity. Encodes all m-coherent chains as graph paths. Submodular orienteering [Chekuri & Pal, 2005]: quasipoly time recursive greedy, O(log OPT) approximation.

Example Map: Greece Debt

Game Plan

Evaluation: User study – Document selection: capturing important content? – Micro-knowledge: question-answering – Macro-knowledge: high-level summaries – Effect of structure. New York Times ( ) – 18K+ articles – Chile, Haiti, Greece

Document Selection: Experts compose a list of important events. Subtopic recall (% of events in the map). [Plot: subtopic recall vs. # lines]
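
Subtopic recall as described is the share of expert-listed events that appear in the map. A minimal sketch follows, assuming events can be matched by identifier; the function name and the event labels are made up for illustration and are not data from the study.

```python
def subtopic_recall(expert_events, map_events):
    """Fraction of the experts' events that the map captures."""
    expert = set(expert_events)
    if not expert:
        raise ValueError("expert list must not be empty")
    return len(expert & set(map_events)) / len(expert)


expert_events = ["earthquake", "rescue", "aid", "elections"]
map_events = ["earthquake", "aid", "cholera"]
print(subtopic_recall(expert_events, map_events))  # 0.5
```

Events in the map that the experts did not list (here "cholera") do not affect the score, since this is recall rather than precision.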

Micro-Knowledge (Question Answering): Mechanical Turk. Competitors: – Google News – Event threading (TDT) [Nallapati et al, 04] – Structureless maps. Results: minor gains – map structure helps. Example: “Question 2: How many miners were trapped?”

Macro-Knowledge (High-Level Summaries): Summarize a complex story in a paragraph – Maps vs. Google News – ~15 paragraphs per task. Mechanical Turk to evaluate paragraphs: – Which paragraph provided a more complete and coherent picture of the story? – Justification: Paragraph A is more… – ~300 evaluations per task

Macro-Knowledge: Results. Greece: 72% prefer maps – Justifications: [Chart: Maps vs. Google News]. Haiti: 59% prefer maps – Map users mostly summarized one story line. Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

Conclusions: Formulated metrics characterizing good maps. Efficient methods with theoretical guarantees. User studies highlight the promise of the method. Website on the way! Personalization. Thank you!

Finding Coherent Chains. Goal: represent all coherent chains. Problem: intractable. Divide and conquer: – Find short coherent chains – Concatenate to form longer coherent chains
