Learning to Rank Typed Graph Walks: Local and Global Approaches
Einat Minkov and William W. Cohen
Language Technologies Institute and Machine Learning Department, School of Computer Science, Carnegie Mellon University
Did I forget to invite anyone to this meeting?
What is Jason's personal email address?
Who is the "Mike" mentioned in this email?
[Figure: an example email graph. Message nodes (e.g., from the CALO corpus) connect via typed edges such as Has-Subject-Term and Sent-To to term nodes ("William", "graph", "proposal", "CMU"), date nodes (6/17/07, 6/18/07), and email-address nodes (einat@cs.cmu.edu).]
Q: "what are Jason's email aliases?"
[Figure: a graph walk starting at the term node "Jason", traversing Has-Term-inverse, Sent-To, Sent-From, and Alias edges through message nodes (Msg 2, Msg 5, Msg 18) and the person node Jason Ernst, to reach the email-address nodes jernst@cs.cmu.edu and jernst@andrew.cmu.edu (connected by a Similar-To edge).]
Search via lazy random graph walks
An extended similarity measure via graph walks:
- Propagate "similarity" from start nodes through edges in the graph, accumulating evidence of similarity over multiple connecting paths.
- Fixed probability of halting the walk at every step, i.e., shorter connecting paths have greater importance (exponential decay).
- Finite graph walk, applied through sparse matrix multiplication (estimated via sampling for large graphs).
- The result is a list of nodes, sorted by "similarity" to an input node distribution (final node probabilities).
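The finite lazy walk can be sketched as a few sparse matrix multiplications. This is a toy illustration, not the authors' implementation: the graph, the stay probability value, and a dense matrix stand in for the sparse, sampled version used on large graphs.

```python
import numpy as np

def lazy_graph_walk(M, v0, K=3, gamma=0.5):
    """K-step lazy random walk over a row-stochastic transition matrix M.
    At each step the walk stays at the current node with probability gamma,
    so evidence from longer connecting paths decays exponentially."""
    M_lazy = gamma * np.eye(M.shape[0]) + (1 - gamma) * M
    v = v0.copy()
    for _ in range(K):
        v = v @ M_lazy           # propagate the node distribution one step
    return v                     # final node probabilities = similarity scores

# toy 3-node chain graph: node 0 -> node 1 -> node 2 (node 2 self-loops)
M = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
v0 = np.array([1.0, 0.0, 0.0])   # the walk starts at node 0
scores = lazy_graph_walk(M, v0)
```

The output is a distribution over all nodes; ranking the nodes by `scores` gives the search result list.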
The graph
- Graph nodes are typed.
- Graph edges are directed and typed (adhering to the graph schema).
- Multiple relations may hold between two given nodes.
- Every edge type is assigned a fixed weight.
A query language: a query Q consists of a set of source nodes Vq and a target node type. The graph walk returns a list of nodes (of the target type), ranked by the graph walk probabilities.
The graph consists of typed nodes and labeled, weighted edges.
The graph walk is controlled by the edge weights Θ, the walk length K, and the stay probability γ.
The probability of reaching y from x in one step: the sum of edge weights from x to y, out of the total outgoing weight from x.
The transition matrix assumes a stay probability at the current node at every time step.
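The one-step transition probability can be sketched directly from its definition. The edge-type names and weights below are hypothetical, chosen only to illustrate the computation:

```python
def transition_prob(out_edges, weights, y):
    """P(y | x): total weight of edges from x to y, divided by the total
    outgoing weight from x. `out_edges` maps each neighbor of x to the
    list of edge types connecting x to that neighbor."""
    total = sum(weights[t] for types in out_edges.values() for t in types)
    return sum(weights[t] for t in out_edges.get(y, [])) / total

# hypothetical edge-type weights Theta
weights = {"sent-to": 2.0, "has-term": 1.0}
# node x has two outgoing edges: a sent-to edge and a has-term edge
out_edges = {"msg1": ["sent-to"], "jason": ["has-term"]}
p = transition_prob(out_edges, weights, "msg1")   # 2.0 / (2.0 + 1.0)
```

Tuning the per-edge-type weights in `weights` is exactly the "edge weight tuning" learning approach discussed later.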
Tasks
- Person name disambiguation: Q = { term "andy", file msgId }, target type "person".
- Threading: what are the adjacent messages in this thread? A proxy for finding generally related messages. Q = { file msgId }, target type "email-file".
- Alias finding: what are the email addresses of Jason? Q = { term "Jason" }, target type "email-address".
Learning to Rank Typed Graph Walks
Learning settings
For a task T (query class), training data consists of queries (query a, query b, ..., query q), each paired with its relevant answers. Running the graph walk on each query yields a ranked list of nodes (node rank 1, node rank 2, ..., node rank 50), over which learning is performed.
Learning approaches
- Edge weight tuning: graph walk → weight update → Θ*; at test time, a graph walk with Θ* solves the task.
- Node re-ordering: graph walk → feature generation → update re-ranker → re-ranking function; at test time, graph walk → feature generation → score by re-ranker.
Learning approaches
Graph parameter tuning:
- Gradient descent (Chang et al., 2000): can be adapted from extended PageRank settings to finite graph walks.
- Exhaustive local search over edge types (Nie et al., 2005): hill climbing.
- Error backpropagation (Diligenti et al., IJCAI-05): strong assumption of first-order Markov dependencies.
- Gradient descent approximation for partial-order preferences (Agarwal et al., KDD-06).
Node re-ordering:
- A discriminative learner, using features that describe graph paths.
- Re-ranking (Minkov, Cohen and Ng, SIGIR-06): loses some quantitative data in feature decoding; however, it can represent edge sequences.
Error backpropagation, following Diligenti et al., 2005: a cost function is defined over the final node scores, and the edge weights are updated by gradient descent on that cost.
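A generic sketch of this scheme, with assumed notation (not necessarily the slide's exact formulation): a squared-error cost over the final node probabilities,

```latex
E(\Theta) \;=\; \frac{1}{|N|} \sum_{z \in N} \frac{1}{2}\,
  \bigl( p_K(z \mid \Theta) - p_{opt}(z) \bigr)^2 ,
```

where $N$ is the set of labeled nodes, $p_K(z \mid \Theta)$ is the probability assigned to node $z$ after $K$ walk steps, and $p_{opt}(z)$ is its target score. Each edge-type weight is then updated by gradient descent,

```latex
\theta_\ell \;\leftarrow\; \theta_\ell \;-\; \eta \,
  \frac{\partial E}{\partial \theta_\ell} ,
```

with the gradient obtained by applying the chain rule backwards through the $K$ walk steps (hence "backpropagation"); the first-order Markov assumption enters because each step's transition probabilities depend only on the current node.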
Re-ranking follows closely on Collins and Koo (Computational Linguistics, 2005): each candidate node is scored by a linear function of its features, and the feature weights are adapted to minimize a boosted pairwise loss over the training rankings.
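A sketch in the style of Collins and Koo, with assumed notation: the score of a candidate node $x$ combines the log of its graph walk probability with $m$ weighted binary features,

```latex
F(x) \;=\; \alpha_0 \,\log p(x) \;+\; \sum_{k=1}^{m} \alpha_k \, f_k(x) ,
```

and the weight vector $\bar\alpha$ is adapted to minimize the boosted exponential loss over pairs of the correct answer $x_{i,1}$ and its competitors $x_{i,j}$ for each training query $i$,

```latex
\mathrm{ExpLoss}(\bar\alpha) \;=\;
  \sum_{i} \sum_{j=2}^{n_i} e^{-\bigl(F(x_{i,1}) - F(x_{i,j})\bigr)} .
```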
Path-describing features
- 'Edge unigram': was edge type l used in reaching x from Vq?
- 'Edge (n-)bigram': were edge types l1 and l2 traversed (in that order) in reaching x from Vq?
- 'Top edge (n-)bigram': same, where only the top k contributing paths are considered.
- 'Source count': indicates the number of different source nodes in the set of connecting paths.
[Figure: example connecting paths from source nodes x1, x2 to a target node x3 over walk steps K=0, 1, 2, and the paths considered for x3 at k=2.]
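Decoding these features from a node's set of connecting paths can be sketched as follows. The path representation and edge-type names are hypothetical; only the feature definitions come from the slide:

```python
def path_features(paths):
    """Decode binary path-describing features for one candidate node.
    Each path is (source_node, [edge_type_1, edge_type_2, ...]),
    listing the edge types traversed from a source node in Vq."""
    feats = set()
    sources = set()
    for source, edges in paths:
        sources.add(source)
        for l in edges:                        # edge unigram features
            feats.add(("unigram", l))
        for l1, l2 in zip(edges, edges[1:]):   # edge bigrams, in order
            feats.add(("bigram", l1, l2))
    feats.add(("source-count", len(sources)))  # diversity of source nodes
    return feats

# two connecting paths reaching a candidate message from two sources
paths = [("x1", ["sent-from", "sent-to-inv"]),
         ("x2", ["sent-from", "sent-to-inv"])]
feats = path_features(paths)
```

The `("bigram", "sent-from", "sent-to-inv")` feature is exactly the kind of edge sequence the findings slide credits for the threading task.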
Learning to Rank Typed Graph Walks: Local vs. Global Approaches
Experiments
Methods: gradient descent (Θ0 → ΘG), re-ranking R(Θ0), and the combined R(ΘG).
Tasks & corpora: name disambiguation, threading, and alias finding.
The results (MAP)
[Figure: MAP results charts for the name disambiguation, threading, and alias finding tasks, comparing Θ0, ΘG, R(Θ0), and R(ΘG); asterisks and plus signs mark statistically significant differences.]
Our findings
- Re-ranking is often preferable due to 'global' features:
  - It models relation sequences, e.g., threading: sent-from → sent-to-inv.
  - It rewards nodes for which the set of connecting paths is diverse; the source-count feature is informative for complex queries.
- The two approaches are complementary.
Future work:
- Re-ranking: large feature space; requires decoding at run-time.
- Domain-specific features.
Related papers
- Einat Minkov, William W. Cohen, Andrew Y. Ng. Contextual Search and Name Disambiguation in Email using Graphs. SIGIR 2006.
- Einat Minkov, William W. Cohen. An Email and Meeting Assistant using Graph Walks. CEAS 2006.
- Alekh Agarwal, Soumen Chakrabarti. Learning Random Walks to Rank Nodes in Graphs. ICML 2007.
- Hanghang Tong, Yehuda Koren, Christos Faloutsos. Fast Direction-Aware Proximity for Graph Mining. KDD 2007.
Thanks! Questions?