Learning to Rank Typed Graph Walks: Local and Global Approaches

Presentation transcript:

Learning to Rank Typed Graph Walks: Local and Global Approaches Einat Minkov and William W. Cohen Language Technologies Institute and Machine Learning Department School of Computer Science Carnegie Mellon University

Did I forget to invite anyone to this meeting?
What is Jason's personal email address?
Who is the "Mike" mentioned in this email?

[Figure: a fragment of an email graph (CALO) — a message node with subject terms "graph", "proposal", "CMU", date nodes 6/17/07 and 6/18/07, and person/address nodes William and einat@cs.cmu.edu, connected by typed edges such as Has-Subject-Term and Sent-To.]

Q: "what are Jason's email aliases?"
[Figure: an example walk in the email graph, from the term "Jason" through messages (Msg 2, Msg 5, Msg 18) and the person Jason Ernst to the email addresses jernst@cs.cmu.edu and jernst@andrew.cmu.edu, along edges such as Has-Term-Inverse, Sent-To, Sent-From-Email, Alias, and Similar-To.]

Search via lazy random graph walks
An extended similarity measure via graph walks:
- Propagate "similarity" from the start nodes through the edges of the graph, accumulating evidence of similarity over multiple connecting paths.
- A fixed probability of halting the walk at every step means that shorter connecting paths have greater importance (exponential decay).
- The walk is finite and is applied through sparse matrix multiplication (estimated via sampling for large graphs).
- The result is a list of nodes, sorted by "similarity" to the input node distribution (the final node probabilities).
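One way to write such a measure (a sketch only; the exact formulation used in the talk may differ): with a row-stochastic transition matrix $T$ derived from the edge weights, stay probability $\gamma$, walk length $K$, and initial distribution $\mathbf{v}_q$ over the query nodes,

$$M = \gamma I + (1-\gamma)\,T, \qquad \mathbf{r}_q = \mathbf{v}_q\, M^{K},$$

so that evidence arriving over longer connecting paths is damped by higher powers of $(1-\gamma)$, and $\mathbf{r}_q$ gives the final node probabilities used for ranking.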

The graph
- Graph nodes are typed.
- Graph edges are directed and typed (adhering to the graph schema).
- Multiple relations may hold between two given nodes.
- Every edge type is assigned a fixed weight.
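A minimal sketch of such a typed graph in Python. The node/edge type names and the `EDGE_WEIGHTS` values are illustrative assumptions, not the actual schema or weights used in the talk:

```python
from collections import defaultdict

# Illustrative edge-type weights Theta (one fixed weight per edge type) -- assumed values.
EDGE_WEIGHTS = {
    "sent-from": 1.0,
    "sent-to": 1.0,
    "has-term": 0.5,
    "has-term-inverse": 0.5,
    "alias": 1.0,
}

class TypedGraph:
    """Directed graph with typed nodes and typed (labeled) edges."""

    def __init__(self):
        self.node_type = {}                 # node -> type ("person", "email-file", ...)
        self.out_edges = defaultdict(list)  # node -> [(edge_type, target_node), ...]

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, source, edge_type, target):
        # Multiple relations (edge types) may hold between the same pair of nodes.
        self.out_edges[source].append((edge_type, target))

# Example: a tiny email-graph fragment (hypothetical node identifiers).
g = TypedGraph()
g.add_node("msg5", "email-file")
g.add_node("jernst@cs.cmu.edu", "email-address")
g.add_node("jason", "term")
g.add_edge("msg5", "sent-from", "jernst@cs.cmu.edu")
g.add_edge("msg5", "has-term", "jason")
g.add_edge("jason", "has-term-inverse", "msg5")
```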

A query language
- A query Q specifies a distribution over start nodes and a requested output node type.
- A graph walk over the typed graph, controlled by the edge weights Θ, the walk length K, and the stay probability γ, returns a list of nodes of the requested type, ranked by the graph-walk probabilities.
- The probability of reaching y from x in one step is the sum of the weights of the edges from x to y, divided by the total outgoing weight from x.
- The transition matrix assumes a stay probability at the current node at every time step.
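A minimal sketch in Python of the one-step transition probability and a K-step "lazy" walk. This is a toy dictionary-based implementation; the sparse-matrix and sampling machinery mentioned above is omitted, the default `K` and `gamma` values are arbitrary, and the exact way the stay probability enters the walk here is an assumption:

```python
from collections import defaultdict

# graph: node -> list of (edge_type, target) pairs; edge_weights: edge_type -> weight.

def one_step_probs(graph, edge_weights, x):
    """P(y | x): total weight of edges x -> y, divided by the total outgoing weight from x."""
    totals = defaultdict(float)
    for edge_type, y in graph.get(x, []):
        totals[y] += edge_weights.get(edge_type, 0.0)
    z = sum(totals.values())
    return {y: w / z for y, w in totals.items()} if z > 0 else {}

def lazy_walk(graph, edge_weights, start_dist, K=3, gamma=0.5):
    """K-step walk; at each step the walker stays at the current node with probability gamma."""
    dist = dict(start_dist)                          # distribution over the query's start nodes
    for _ in range(K):
        nxt = defaultdict(float)
        for x, p in dist.items():
            nxt[x] += gamma * p                      # "lazy" stay at the current node
            for y, q in one_step_probs(graph, edge_weights, x).items():
                nxt[y] += (1.0 - gamma) * p * q      # move along an outgoing edge
        dist = dict(nxt)
    return sorted(dist.items(), key=lambda kv: -kv[1])  # nodes ranked by final probability
```

In the query setting, the ranked list would then be filtered to nodes of the requested output type.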

Tasks
- Person name disambiguation: [term "andy", file msgId] → "person"
- Threading: what are the adjacent messages in this thread? (a proxy for finding generally related messages) [file msgId] → "email-file"
- Alias finding: what are the email addresses of Jason? [term "Jason"] → "email-address"
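In code, these queries might be expressed as a start-node distribution plus a requested output type (a sketch only; the node identifiers are illustrative placeholders, and "msgId" stands for a concrete message node):

```python
# Each query: (initial node distribution, requested output node type).
queries = {
    "name_disambiguation": ({"term:andy": 0.5, "file:msgId": 0.5}, "person"),
    "threading":           ({"file:msgId": 1.0},                   "email-file"),
    "alias_finding":       ({"term:jason": 1.0},                   "email-address"),
}
```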

Learning to Rank Typed Graph Walks

Learning settings
[Figure: a task T (query class) provides training queries a, b, …, q, each with its relevant answers; the graph walk returns a ranked list of nodes for each query.]

Learning approaches
- Edge weight tuning: run the graph walk for the training task, update the edge weights to Θ*, and apply the graph walk with the tuned weights at test time.
- Node re-ordering: run the graph walk, generate features for the retrieved nodes, and train a re-ranking function; at test time, run the graph walk, generate features, and score the nodes with the re-ranker.

Learning approaches
Graph parameter tuning:
- Gradient descent (Chang et al., 2000): can be adapted from extended PageRank settings to finite graph walks.
- Exhaustive local search over edge types (Nie et al., 2005): hill climbing.
- Error backpropagation (Diligenti et al., IJCAI 2005): strong assumption of first-order Markov dependencies.
- Gradient descent approximation for partial-order preferences (Agarwal et al., KDD 2006).
Node re-ordering:
- A discriminative learner, using features that describe graph paths: re-ranking (Minkov, Cohen and Ng, SIGIR 2006). Loses some quantitative information in the feature decoding, but can represent edge sequences.

Error backpropagation, following Diligenti et al., 2005: define a cost function over the walk probabilities of the labeled training nodes, and update the edge-type weights by gradient descent on this cost.
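A hedged sketch of the general idea: a squared-error cost between a node's walk probability and its target value, minimized by gradient descent over the edge-type weights. This is a generic illustration using numerical (finite-difference) gradients, not the analytic backpropagation derivation of Diligenti et al.; `walk_prob` is an assumed callable (e.g., wrapping the toy walk sketched earlier), and the learning-rate and clipping choices are arbitrary:

```python
def squared_error_cost(walk_prob, theta, examples):
    """Cost = 0.5 * sum over labeled nodes of (p_theta(node) - target)^2.

    walk_prob(theta, query, node) -> walk probability of `node` under edge weights `theta`.
    examples: list of (query, node, target_probability) triples.
    """
    return 0.5 * sum((walk_prob(theta, q, n) - t) ** 2 for q, n, t in examples)

def tune_edge_weights(walk_prob, theta, examples, lr=0.1, eps=1e-4, iters=50):
    """Gradient descent on the edge-type weights, with finite-difference gradients."""
    theta = dict(theta)
    for _ in range(iters):
        base = squared_error_cost(walk_prob, theta, examples)
        grad = {}
        for etype in theta:
            bumped = dict(theta, **{etype: theta[etype] + eps})
            grad[etype] = (squared_error_cost(walk_prob, bumped, examples) - base) / eps
        for etype in theta:
            theta[etype] = max(1e-6, theta[etype] - lr * grad[etype])  # keep weights positive
    return theta
```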

Re-ranking, following closely Collins and Koo (Computational Linguistics, 2005): candidate nodes are scored by a linear function of the original walk score and the path-describing features, and the feature weights are adapted to minimize a boosted (exponential) loss.
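A hedged sketch of this idea: score each candidate by a linear combination of its original (log) walk score and binary path features, and update the feature weights to reduce a pairwise exponential loss between the correct candidate and the incorrect ones. The exact loss, feature-weight update, and candidate-set handling of Collins and Koo are not reproduced here; the data layout (one labeled correct candidate per query) is an assumption for illustration:

```python
import math

def score(candidate, weights, w0=1.0):
    """Linear re-ranking score: original log walk probability plus weighted binary features.

    candidate: {"log_walk_prob": float, "features": set of feature names}
    """
    return w0 * candidate["log_walk_prob"] + sum(
        weights.get(f, 0.0) for f in candidate["features"])

def exp_loss(train_queries, weights):
    """Sum over queries of exp(score(wrong) - score(correct)) for each wrong candidate."""
    total = 0.0
    for correct, others in train_queries:
        s_correct = score(correct, weights)
        total += sum(math.exp(score(x, weights) - s_correct) for x in others)
    return total

def update_weights(train_queries, weights, feature_set, lr=0.01, iters=100):
    """Plain gradient descent on the exponential loss over the feature weights."""
    weights = dict(weights)
    for _ in range(iters):
        grad = {f: 0.0 for f in feature_set}
        for correct, others in train_queries:
            s_correct = score(correct, weights)
            for x in others:
                margin = math.exp(score(x, weights) - s_correct)
                for f in feature_set:
                    # d/dw_f of exp(S(x) - S(correct)) = margin * (x_f - correct_f)
                    grad[f] += margin * ((f in x["features"]) - (f in correct["features"]))
        for f in feature_set:
            weights[f] -= lr * grad[f]
    return weights
```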

Path-describing features
- 'Edge unigram': was edge type l used in reaching x from Vq?
- 'Edge (n-)bigram': were edge types l1 and l2 traversed (in that order) in reaching x from Vq?
- 'Top edge (n-)bigram': the same, where only the top k contributing paths are considered.
- 'Source count': the number of different source nodes in the set of connecting paths.
[Example figure: the connecting paths reaching node x3 within K=2 steps, e.g. x2 → x1 → x3, x4 → x1 → x3, x4 → x2 → x3, x2 → x3.]
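A minimal sketch of extracting such features for one candidate node from its set of connecting paths. A path is represented here simply as its source node plus the sequence of edge types traversed; the representation and feature names are illustrative assumptions:

```python
def path_features(paths, top_k=None):
    """Binary path-describing features for one candidate node.

    paths: list of (source_node, [edge_type, edge_type, ...]) pairs connecting the
           query's start nodes to the candidate, assumed sorted by contribution so
           that the 'top' variants can keep only the top_k paths.
    """
    if top_k is not None:
        paths = paths[:top_k]
    features = set()
    sources = set()
    for source, edge_types in paths:
        sources.add(source)
        for l in edge_types:                               # edge unigrams
            features.add(("unigram", l))
        for l1, l2 in zip(edge_types, edge_types[1:]):     # edge bigrams (in order)
            features.add(("bigram", l1, l2))
    features.add(("source-count", len(sources)))           # number of distinct source nodes
    return features

# Example: hypothetical paths reaching a node within K=2 steps.
paths = [("x2", ["sent-to", "alias"]), ("x4", ["has-term", "alias"]), ("x2", ["alias"])]
print(path_features(paths))
```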

Learning to Rank Typed Graph Walks: Local vs. Global Approaches

Experiments
Methods: gradient descent (Θ0 → ΘG), re-ranking (R(Θ0)), and the combination (R(ΘG)).
[Table: tasks and corpora.]

The results (MAP)
[Figures: MAP bar charts for the name disambiguation, threading, and alias finding tasks.]

Our findings
- Re-ranking is often preferable due to its 'global' features:
  - It models relation sequences (e.g., threading: sent-from → sent-to-inverse).
  - It rewards nodes for which the set of connecting paths is diverse (the source-count feature is informative for complex queries).
- The two approaches are complementary.
- Future work: re-ranking's large feature space; re-ranking requires decoding at run time; domain-specific features.

Related papers
- Einat Minkov, William W. Cohen, Andrew Y. Ng. Contextual Search and Name Disambiguation in Email using Graphs. SIGIR 2006.
- Einat Minkov, William W. Cohen. An Email and Meeting Assistant using Graph Walks. CEAS 2006.
- Alekh Agarwal, Soumen Chakrabarti. Learning Random Walks to Rank Nodes in Graphs. ICML 2007.
- Hanghang Tong, Yehuda Koren, Christos Faloutsos. Fast Direction-Aware Proximity for Graph Mining. KDD 2007.

Thanks! Questions?