Metro Maps of Science. Dafna Shahaf, Carlos Guestrin, Eric Horvitz.

Presentation transcript:

"The abundance of books is a distraction." Lucius Annaeus Seneca, 4 BC – 65 AD

… and it does not get any better: 129,864,880 books (Google estimate). Research: – PubMed: 19 million papers (one paper added per minute!) – Scopus: 40 million papers

Papers Innovative Papers

So, you want to understand a research topic… Now what?

Search Engines are Great But do not show how it all fits together

Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

Research is not Linear

Metro Map: A map is a set of lines of articles. Each line follows a coherent narrative thread. Temporal Dynamics + Structure. [Figure labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Map Definition: A map M is a pair (G, Π) where – G = (V, E) is a directed graph – Π is a set of paths in G (metro lines) – Each e ∈ E must belong to at least one metro line. [Figure labels: austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
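The definition above can be sketched as a small data structure. This is only an illustration under assumptions, not the authors' code: the class name `MetroMap`, its fields, and the article ids `a1` to `a4` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    vertices: set   # article ids (hypothetical representation)
    edges: set      # directed edges as (u, v) tuples
    lines: list     # each metro line is a list of vertices forming a path

    def is_valid(self) -> bool:
        # Every edge must connect known vertices.
        if any(u not in self.vertices or v not in self.vertices
               for u, v in self.edges):
            return False
        covered = set()
        for line in self.lines:
            for u, v in zip(line, line[1:]):
                if (u, v) not in self.edges:
                    return False  # the line is not a path in G
                covered.add((u, v))
        # Each edge must belong to at least one metro line.
        return covered == self.edges


example = MetroMap(
    vertices={"a1", "a2", "a3", "a4"},
    edges={("a1", "a2"), ("a2", "a3"), ("a2", "a4")},
    lines=[["a1", "a2", "a3"], ["a1", "a2", "a4"]],
)
```

Two lines may share a prefix, as `a1 -> a2` does here; that shared stretch is where the lines intersect on the map.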

Game Plan Objective Algorithm Does it work?

Properties of a Good Map 1. Coherence ???

Coherence: Main Idea. Connecting the Dots [S, Guestrin, KDD’10]. Coherence is not a property of local interactions. Incoherent: each pair shares different words. [Figure labels: Greece, Europe, Italy, Republican, Protest, Debt default]

Coherence: Main Idea. Connecting the Dots [S, Guestrin, KDD’10]. A more-coherent chain. Coherent: a small number of words captures the story. [Figure labels: Greece, Austerity, Italy, Republican, Protest, Debt default]
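A toy version of this idea: pick at most k words for the whole chain and score the chain by its weakest transition. This is a deliberately simplified stand-in, not the formulation from Connecting the Dots; the function name `chain_coherence`, the brute-force search, and the word sets assigned to each article are assumptions made for illustration.

```python
from itertools import combinations


def chain_coherence(chain, k):
    """chain: list of word sets, one per article, in chain order.

    Returns the best achievable score of the weakest transition when
    at most k words may be chosen to describe the whole chain.
    """
    if len(chain) < 2:
        return 0
    # Words shared by each pair of consecutive articles.
    shared = [a & b for a, b in zip(chain, chain[1:])]
    vocab = sorted(set().union(*shared))
    size = min(k, len(vocab))
    if size == 0:
        return 0
    # Brute force over word choices; fine for toy inputs only.
    return max(
        min(len(set(words) & s) for s in shared)
        for words in combinations(vocab, size)
    )


# Each consecutive pair shares a different word: locally linked, globally incoherent.
incoherent = [{"greece", "europe"}, {"europe", "italy"},
              {"italy", "republican"}, {"republican", "protest"}]
# The same two words run through every transition.
coherent = [{"greece", "debt", "austerity"}, {"greece", "debt", "italy"},
            {"greece", "debt", "default"}, {"greece", "debt", "protest"}]
```

With a budget of two words, the first chain scores 0 because no two words cover all three transitions, while the second scores 2.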

Words are too Simple. [Figure labels: Probability, Network, Cost; Sensor networks, Bayesian networks, Social networks]

Using the Citation Graph Create a graph per word – All papers mentioning the word – Edge weight = strength of influence [El-Arini, Guestrin KDD‘11] Network Where did paper 8 get the idea? Do papers 8 and 9 mean the same thing?

Words are too Simple. Incoherent. [Figure labels: Probability, Network, Cost; Sensor networks, Bayesian networks, Social networks]

Properties of a Good Map 1. Coherence Is it enough?

Max-coherence Map Query: Reinforcement Learning

Properties of a Good Map 1. Coherence 2. Coverage Should cover diverse topics important to the user

Coverage: What to Cover? Perhaps words? Not enough: "SVM in Oracle Database 10g", Milenova et al., VLDB '05; "Support Vector Machines in Relational Databases", Ruping, SVM '

Similar Content

Different Impact. Citing Venues and Authors: Affected more authors/venues. Very little intersection.

What to Cover? Instead of words… Cover papers A paper covers papers that it had an impact on High-coverage map: impact on a lot of the corpus Why descendants? Soft notion: [0,1]

p has High Impact on q if… Many paths (especially short) connect them, and the paths are coherent. Formalize with coherent random walks. [Figure: papers p, q, r linked by citations, annotated with citing sentences "Note that our protocol is different from previous work…" and "We use the algorithm of…"]
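One way to turn "many paths, especially short ones" into a number in [0,1] is a damped random walk over the citation graph. The sketch below is a plain walk and ignores the coherence requirement, so it is not the coherent random walk the slide refers to; the function name `impact`, the damping factor `alpha`, and the assumption that the citation graph is acyclic are all choices made here for illustration.

```python
from functools import lru_cache


def impact(p, q, cites, alpha=0.5):
    """Probability that a damped walk starting at q reaches p.

    cites maps each paper to the list of papers it cites. At every step
    the walk survives with probability alpha and moves to a uniformly
    chosen cited paper. Assumes the citation graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def reach(node):
        if node == p:
            return 1.0
        cited = cites.get(node, [])
        if not cited:
            return 0.0
        return alpha * sum(reach(c) for c in cited) / len(cited)

    return reach(q)


# q cites p directly and also through r, so p has two routes to q.
cites = {"q": ["p", "r"], "r": ["p"], "p": []}
```

Longer paths are discounted by an extra factor of `alpha` per step, so a direct citation counts for more than an indirect one.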

Map Coverage Documents cover pieces of the corpus: Corpus Coverage
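A soft coverage score can be combined across the papers on a map in several ways. The sketch below assumes a probabilistic (noisy-or) combination, where a corpus paper stays uncovered only if every map paper fails to cover it; the function name `map_coverage`, the uniform weighting of the corpus, and the impact values are hypothetical.

```python
def map_coverage(selected, impact, corpus):
    """Average soft coverage of the corpus by the selected papers.

    impact[p][q] is a value in [0, 1]: how strongly paper p covers paper q.
    """
    total = 0.0
    for q in corpus:
        miss = 1.0
        for p in selected:
            miss *= 1.0 - impact.get(p, {}).get(q, 0.0)
        total += 1.0 - miss
    return total / len(corpus)


corpus = ["q1", "q2"]
impact_scores = {
    "p1": {"q1": 0.5},
    "p2": {"q1": 0.5, "q2": 1.0},
}
```

Under this assumption, adding `p1` to an empty map gains 0.25, but adding it to a map that already contains `p2` gains only 0.125. That diminishing return is what makes covering the same area twice unattractive.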

High-coverage, Coherent Map

Properties of a Good Map 1. Coherence 2. Coverage 3. Connectivity

Definition: Connectivity Experimented with formulations Users do not care about connection type Encourage connections between pairs of lines

Lines with No Intersection. [Figure labels: Perceptrons, Generalized Portrait Method, Kernel SVM, Kernel functions, Optimizing kernels, Applying perceptrons to facial feature location, View-based human face detection, Training SVMs for face detection, Face recognition by SVM, Automatic extraction of face features] Solution: Reward lines that had impact on each other. [Figure labels: Perceptrons, SVM, Optimizing Kernels for SVM, Face Detection, SVM for Facial Recognition]
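A minimal sketch of a connectivity score along these lines: count the pairs of lines that either share a paper or contain papers with impact on each other. The function name `connectivity`, the `threshold` parameter, and the paper ids and impact value in the example are assumptions, not the formulation used in the talk.

```python
from itertools import combinations


def connectivity(lines, impact, threshold=0.0):
    """Number of line pairs that intersect or influence each other.

    lines: list of lists of paper ids.
    impact[p][q]: soft impact of paper p on paper q, in [0, 1].
    """
    def linked(a, b):
        if set(a) & set(b):
            return True  # the lines intersect at a shared paper
        return any(
            impact.get(p, {}).get(q, 0.0) > threshold
            or impact.get(q, {}).get(p, 0.0) > threshold
            for p in a for q in b
        )

    return sum(1 for a, b in combinations(lines, 2) if linked(a, b))


lines = [
    ["perceptrons", "kernel_svm"],
    ["face_detection", "face_recognition_svm"],
    ["unrelated_paper"],
]
impact_scores = {"kernel_svm": {"face_recognition_svm": 0.4}}
```

The first two lines share no paper, yet they count as connected because one SVM paper influenced a face recognition paper.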

Tying it all Together: Map Objective. Coherence – either coherent or not: constraint. Coverage – must have! Connectivity – nice to have. Consider all coherent maps with maximum possible coverage. Find the most connected one.
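The objective is lexicographic: coherence filters, coverage ranks, connectivity breaks ties. A sketch over an explicit list of candidate maps, assuming the three scoring callables are supplied by the caller. The name `choose_map` and the tolerance `tol` are hypothetical, and enumerating candidates is only feasible for toy inputs.

```python
def choose_map(candidates, is_coherent, coverage, connectivity, tol=1e-9):
    """Pick the most connected map among coherent maps of maximum coverage."""
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    best = max(coverage(m) for m in coherent)
    top = [m for m in coherent if coverage(m) >= best - tol]
    return max(top, key=connectivity)


# Toy candidates carrying precomputed (made-up) scores.
candidates = [
    {"name": "A", "coherent": True, "cov": 0.9, "conn": 1},
    {"name": "B", "coherent": True, "cov": 0.9, "conn": 3},
    {"name": "C", "coherent": False, "cov": 1.0, "conn": 5},
    {"name": "D", "coherent": True, "cov": 0.5, "conn": 9},
]
chosen = choose_map(
    candidates,
    is_coherent=lambda m: m["coherent"],
    coverage=lambda m: m["cov"],
    connectivity=lambda m: m["conn"],
)
```

Map C is dropped despite its coverage because it is incoherent, and D loses despite its connectivity because coverage comes first.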

Game Plan Objective Algorithm Does it work?

Approach Overview Documents D … 1. Coherence graph G 2. Coverage function f f( ) = ? 3. Increase Connectivity

Coherence Graph: Main Idea Vertices correspond to short coherent chains Directed edges between chains which can be conjoined and remain coherent
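A brute-force sketch of such a graph, assuming that two chains can be conjoined when the tail of one equals the head of the other, and that a coherence test is available as a callable. The names `build_coherence_graph` and `is_coherent` are hypothetical, and enumerating every chain is exponential, so this only illustrates the structure.

```python
from itertools import combinations


def build_coherence_graph(docs, m, is_coherent):
    """docs: article ids in chronological order.

    Vertices are coherent chains of m articles. An edge c -> d exists when
    d continues c by one article and the conjoined chain stays coherent.
    """
    chains = [c for c in combinations(docs, m) if is_coherent(c)]
    graph = {c: [] for c in chains}
    for c in chains:
        for d in chains:
            if c[1:] == d[:-1] and is_coherent(c + d[-1:]):
                graph[c].append(d)
    return graph


# Toy coherence test: a chain is coherent if its ids are consecutive.
def consecutive(chain):
    return all(b - a == 1 for a, b in zip(chain, chain[1:]))


graph = build_coherence_graph([1, 2, 3, 4], m=2, is_coherent=consecutive)
```

A path through this graph, such as `(1, 2) -> (2, 3) -> (3, 4)`, spells out the longer chain `1, 2, 3, 4`.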

Finding High-Coverage Chains Paths correspond to coherent chains. Problem: find a path of length K maximizing coverage of underlying articles Cover( ) > Cover( ) ?

Reformulation Paths correspond to coherent chains. Problem: find a path of length K maximizing coverage of underlying articles Submodular orienteering – [Chekuri and Pal, 2005] – Quasipolynomial time recursive greedy – O(log OPT) approximation Orienteering a function of the nodes visited
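To make the problem concrete, here is a simple greedy path-extension heuristic. It is not the recursive greedy algorithm of Chekuri and Pal and carries no approximation guarantee; it only shows the shape of the search. The function name `greedy_path`, the toy graph, and the topic sets are hypothetical.

```python
def greedy_path(graph, k, coverage):
    """Heuristically find a path of k nodes with high coverage.

    graph maps each node to its successors; coverage scores a list of nodes.
    From every start node, repeatedly step to the successor that yields the
    highest coverage, then keep the best complete path.
    """
    best_path, best_value = [], float("-inf")
    for start in graph:
        path = [start]
        while len(path) < k:
            successors = [v for v in graph[path[-1]] if v not in path]
            if not successors:
                break
            path.append(max(successors, key=lambda v: coverage(path + [v])))
        if len(path) == k and coverage(path) > best_value:
            best_path, best_value = path, coverage(path)
    return best_path


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
topics = {"a": {1}, "b": {1}, "c": {2}, "d": {3}}


def topic_coverage(path):
    # Submodular toy objective: number of distinct topics on the path.
    return len(set().union(*(topics[n] for n in path)))


path = greedy_path(graph, 3, topic_coverage)
```

From `a`, the heuristic prefers `c` over `b` because `b` repeats a topic already covered. In the setting above the nodes would be short chains and the coverage would be computed over their underlying articles.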

Approach Overview: Recap. Documents D … 1. Coherence graph G: encodes all coherent chains as graph paths 2. Coverage function f, f( ) = ?: submodular orienteering [Chekuri & Pal, 2005], quasipoly time recursive greedy, O(log OPT) approximation 3. Increase Connectivity

Example Map: Reinforcement Learning. [Figure labels: multi-agent, cooperative, joint, team, mdp, states, pomdp, transition, option, control, motor, robot, skills, arm, bandit, regret, dilemma, exploration, arm, q-learning, bound, optimal, rmax, mdp]

Example Map Detail: SVM

Game Plan Objective Algorithm Does it work?

User Study Tricky! – No double-blind, no within-subject – Domain: understandable yet unfamiliar – Reinforcement Learning (RL)

User Study: 30 participants. First-year grad student, Reinforcement Learning project. Update a survey paper from 1996. Identify research directions + relevant papers. – Google Scholar – Map and Google Scholar – Baselines: Map, Wikipedia

Results (in a nutshell): Map users find better papers, and cover more important areas. [Chart labels: Better; Google, Us; Google, Us]

User Comments: "Helpful"; "noticed directions I didn't know about"; "great starting point"; "… get a basic idea of what science is up to"; "why don't you draw words on edges?"; "Legend is confusing"; "hard to get an idea from paper title alone"

Conclusions: Formulated metrics characterizing good maps for the scientific domain. Efficient methods with theoretical guarantees. User studies highlight the promise of the method. Website on the way! Personalization. Thank you!