Mining Advisor-Advisee Relationships from Research Publication Networks. KDD 2010. Presenter: 徐晓旻.


INTRODUCTION
• Conduct a systematic investigation of mining advisor-advisee relationships between authors in a research publication network.
• Gain better insight into the research community.
• Provide additional semantic information on the links.

INTRODUCTION (cont.)
• The left figure shows the input: a temporal collaboration network, which consists of authors and papers.
• The middle figure shows the output of our analysis: an author network with solid arrows indicating the advising relationship.
• The right figure gives an example of visualized chronological hierarchies.

PROBLEM FORMULATION
• $G = (V = V_p \cup V_a, E)$, where $V_p = \{p_1, \dots, p_{n_p}\}$ is the set of publications, with $p_i$ published at time $t_i$; $V_a = \{a_1, \dots, a_{n_a}\}$ is the set of authors; and $E$ is the set of edges.
• Each edge $e_{ij} \in E$ associates the paper $p_i$ with the author $a_j$, meaning $a_j$ is one author of $p_i$.

Original network transformed
• The original network can be transformed into a network containing only authors.
• Let $G' = (V', E', \{py_{ij}\}_{e'_{ij} \in E'}, \{pn_{ij}\}_{e'_{ij} \in E'})$, where $V' = \{a_0, \dots, a_{n_a}\}$ is the set of authors (including a virtual node $a_0$). Each edge $e'_{ij} = (i, j) \in E'$ connects authors $a_i$ and $a_j$ if they have published together.
• Two vectors are associated with each edge: Pub_Year_vector $py_{ij}$ and Pub_Num_vector $pn_{ij}$.

Network transformed (cont.)
• Associate with each author $a_i$ two vectors, $py_i$ and $pn_i$, which record the publication years and the number of papers published in each of those years, respectively. The two vectors $py_i$ and $pn_i$ can be derived from $py_{ij}$ and $pn_{ij}$.
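The transformation from the paper-author network to the author-only network can be sketched as follows. This is a minimal illustration, not the authors' code: the toy `papers` list, the author names, and the helper names `build_author_network` and `to_vectors` are all hypothetical. The virtual node $a_0$ is left out, and the per-author vectors are computed directly from the papers rather than derived from the pair vectors.

```python
from collections import defaultdict

# Hypothetical toy input: (paper id, publication year, author list).
papers = [
    ("p1", 2000, ["ann", "bob"]),
    ("p2", 2001, ["ann", "bob", "cat"]),
    ("p3", 2001, ["ann", "bob"]),
    ("p4", 2003, ["cat"]),
]

def build_author_network(papers):
    """Collapse the paper-author network into per-pair and per-author year counts."""
    pair_counts = defaultdict(lambda: defaultdict(int))    # (a_i, a_j) -> year -> count
    author_counts = defaultdict(lambda: defaultdict(int))  # a_i -> year -> count
    for _pid, year, authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            author_counts[a][year] += 1
        # One undirected edge per co-author pair, keyed in sorted order.
        for idx, a in enumerate(unique):
            for b in unique[idx + 1:]:
                pair_counts[(a, b)][year] += 1
    return pair_counts, author_counts

def to_vectors(year_counts):
    """Split a year -> count mapping into (Pub_Year_vector, Pub_Num_vector)."""
    years = sorted(year_counts)
    return years, [year_counts[y] for y in years]

pair_counts, author_counts = build_author_network(papers)
py_ab, pn_ab = to_vectors(pair_counts[("ann", "bob")])   # edge vectors py_ij, pn_ij
py_c, pn_c = to_vectors(author_counts["cat"])            # author vectors py_i, pn_i
```

Keeping the two vectors aligned by index (year `py[k]` has `pn[k]` papers) makes later per-year measures a simple loop over positions.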

This problem is more complicated
• (i) One could have multiple advisors, such as master's advisors and PhD co-advisors.
• (ii) Some mentors from industry behave similarly to academic advisors if judged only by the collaboration history.
• (iii) One's advisor could be missing from the data set.

Construct subgraph $H'$
• Formally, we denote by $r_{ij}$ the probability of $a_j$ being the advisor of $a_i$.
• Construct a subgraph $H' \subset G'$ by removing some edges from $G'$ and directing the remaining edges from advisee to potential advisor.

Construct subgraph $H'$ (cont.)
• A simple way to predict: fetch the top $k$ potential advisors of $a_i$ and check whether $a_j$ is one of them while $r_{ij} > r_{i0}$ or $r_{ij} > \theta$, where $\theta$ is a threshold such as 0.5. We use $P@(k, \theta)$ to denote this method.
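The prediction rule above can be sketched in a few lines. This is one reading of the slide, stated as an assumption: a candidate is accepted only if it is among the top $k$ by score and its score beats the cutoff, where the cutoff is $r_{i0}$ when no threshold is given and the threshold otherwise. The function name `predict_advisors` and the use of key `0` for the virtual node $a_0$ are hypothetical.

```python
def predict_advisors(scores, k, theta=None):
    """Return the accepted advisors of one author a_i.

    scores maps candidate advisor id -> r_ij; key 0 stands for the
    virtual node a_0 (assumed to mean "no advisor in the data").
    """
    r_i0 = scores.get(0, 0.0)
    cutoff = r_i0 if theta is None else theta
    # Rank real candidates by score, highest first.
    ranked = sorted((j for j in scores if j != 0),
                    key=lambda j: scores[j], reverse=True)
    return [j for j in ranked[:k] if scores[j] > cutoff]

scores = {0: 0.2, 5: 0.55, 7: 0.25, 9: 0.0}
top2_vs_virtual = predict_advisors(scores, k=2)               # compares against r_i0
top2_vs_theta = predict_advisors(scores, k=2, theta=0.5)      # compares against theta
```

With the threshold variant, a weak second candidate is dropped even though it ranks within the top $k$.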

4. APPROACH
• The main idea is to leverage a time-constrained probabilistic factor graph model to decompose the joint probability of the unknown advisor of every author.
• By maximizing the joint probability of the factor graph, we can infer the relationship and compute a ranking score for each relation edge on the candidate graph.

4.1 Assumptions and Framework

Two-stage framework solution
• In stage 1, we preprocess the heterogeneous collaboration network to generate the candidate graph $H'$. This includes the transformation from $G$ to a homogeneous network $G'$, the construction of $H'$ from $G'$, and the estimation of the local likelihood on each edge of $H'$.
• In stage 2, these potential relations are further modeled with a probabilistic model. Local likelihood and time constraints are combined in the global joint probability of all the hidden variables. The joint probability is maximized, and the ranking scores of all the potential relations are computed together. The construction of $H$ is finished in this stage.

4.2 Stage 1: Preprocessing

Rule to detect advisor
• The Kulczynski measure reflects the correlation of the two authors' publications.
• IR is used to measure the imbalance between the occurrence of $a_j$ given $a_i$ and the occurrence of $a_i$ given $a_j$.
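The slide's formulas did not survive in the transcript, so the following is a sketch based on the standard pattern-mining definitions rather than the authors' exact expressions. The Kulczynski measure is taken as the average of the two conditional probabilities, and the imbalance ratio uses the common counts-based form. Signing the ratio so that a positive value means $a_j$ has more papers than $a_i$ is an assumption made here for illustration; the argument names are hypothetical.

```python
def kulczynski(n_ij, n_i, n_j):
    """Average of P(a_j | a_i) and P(a_i | a_j), estimated from paper counts.

    n_ij: papers co-authored by a_i and a_j; n_i, n_j: papers of each author.
    """
    return 0.5 * (n_ij / n_i + n_ij / n_j)

def imbalance_ratio(n_ij, n_i, n_j):
    """Signed imbalance ratio: (n_j - n_i) / (n_i + n_j - n_ij).

    Positive when a_j has published more than a_i (assumed sign convention).
    """
    return (n_j - n_i) / (n_i + n_j - n_ij)

# Hypothetical pair: a_i wrote 10 papers, a_j wrote 40, and 8 are joint.
kulc = kulczynski(8, 10, 40)
ir = imbalance_ratio(8, 10, 40)
```

A high Kulczynski value together with a clearly positive imbalance ratio is the kind of pattern one would expect when $a_j$ is the senior partner in the collaboration.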

Rule to detect advisor (cont.)
• When a pair of authors passes the test of the selected rules, we construct a directed edge from $a_i$ to $a_j$ in $H'$.
• We estimate the starting time and ending time of the advising, as well as the local likelihood $l_{ij}$ of $a_j$ being $a_i$'s advisor.
• The starting time $st_{ij}$ is estimated as the time they started to collaborate.

• The ending time $ed_{ij}$ can be estimated as either the time point when the Kulczynski measure starts to decrease, or the year that gives the largest difference between the Kulczynski measure before and after it.
• Local likelihood of $a_j$ being $a_i$'s advisor: $l_{ij}$.
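The time estimates can be sketched as below, covering only the first of the two options for the ending time. Two assumptions are made: "starts to decrease" is read as the last year before the first drop in the per-year Kulczynski series, and the last collaboration year is used as a fallback when the series never drops. The function name and inputs are hypothetical.

```python
def estimate_advising_period(years, kulc):
    """Estimate (st_ij, ed_ij) for one candidate pair.

    years: sorted years in which the pair co-authored.
    kulc:  Kulczynski measure of the pair for each of those years.
    """
    start = years[0]            # first year of collaboration
    end = years[-1]             # fallback: the measure never decreases
    for idx in range(1, len(kulc)):
        if kulc[idx] < kulc[idx - 1]:
            end = years[idx - 1]    # last year before the first drop
            break
    return start, end

period = estimate_advising_period(
    [1999, 2000, 2001, 2002, 2003],
    [0.3, 0.5, 0.7, 0.6, 0.4],
)
```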

Stage 2: TPFG Model
• Define the TPFG model.
• For each node $a_i$, there are three variables to decide: $y_i$, $st_i$, and $ed_i$.
• Local feature function: $g(y_i, st_i, ed_i)$.
• Joint probability of all the variables in the network.

Stage 2: TPFG Model (cont.)
• To find the most probable values of all the hidden variables, we need to maximize their joint probability.
• Exhaustive search is intractable.
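To make the cost of exhaustive search concrete, here is a brute-force sketch on a three-author toy graph. Everything in it is hypothetical: the candidate table, the scores, and in particular the time constraint, which is assumed to be "an author must finish being advised before starting to advise someone else" since the transcript does not spell the constraint out. The score of a configuration is taken to be the product of the chosen local likelihoods.

```python
from itertools import product

# Hypothetical candidate graph: author -> {candidate advisor: (l_ij, st_ij, ed_ij)}.
# Advisor 0 is the virtual node a_0 ("no advisor in the data").
candidates = {
    1: {0: (1.0, None, None)},
    2: {0: (0.2, None, None), 1: (0.8, 1995, 1999)},
    3: {0: (0.1, None, None), 1: (0.3, 2000, 2004), 2: (0.6, 1997, 2001)},
}

def is_consistent(assignment):
    """Assumed constraint: if y_x = i and i has an advisor, then ed_i < st_x."""
    for x, adv in assignment.items():
        if adv == 0:
            continue
        adv_of_adv = assignment[adv]
        if adv_of_adv == 0:
            continue
        ed_adv = candidates[adv][adv_of_adv][2]
        st_x = candidates[x][adv][1]
        if not ed_adv < st_x:
            return False
    return True

def exhaustive_map(candidates):
    """Try every joint choice of advisors; return the best consistent one."""
    authors = sorted(candidates)
    best, best_score, n_checked = None, -1.0, 0
    for choice in product(*(sorted(candidates[a]) for a in authors)):
        n_checked += 1
        assignment = dict(zip(authors, choice))
        if not is_consistent(assignment):
            continue
        score = 1.0
        for a, adv in assignment.items():
            score *= candidates[a][adv][0]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score, n_checked

best, best_score, n_checked = exhaustive_map(candidates)
```

In this toy case author 3's locally best candidate is author 2, but that choice conflicts with author 2 still being advised, so the joint optimum picks author 1 instead. The number of configurations is the product of the candidate-set sizes, which is why enumeration does not scale to a full publication network.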

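Decomposition of variable dependency
• Eliminate the variables $st_i$ and $ed_i$.
• Compute the likelihood that $a_j$ is $a_i$'s advisor, together with the conditions that must be satisfied (given by the indicator function $I$).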

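Decomposition of variable dependency (cont.)
• In this figure, the nodes connected to $f_1(\cdot)$ are $y_1$ and all the possible advisee nodes of node 1, which the figure shows to be nodes 2 and 3.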

4.4 Model Learning

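Sum-product
• The sum-product algorithm inherits the message-passing mechanism, but by introducing a factor graph it factorizes the global probability density function into a product of several local probability density functions.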

Single-i sum-product algorithm

Sum-product algorithm
• Note that $g_i(x_i)$ is a function of $x_i$ only, i.e. $g_i(x_i) = \mu_{x \to g_i}(x_i)$; $g_i(x_i)$ then follows from formula (5).
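A small worked example may help here. The sketch below runs sum-product on a cycle-free toy factor graph (a chain of three binary variables and two factors) and checks the resulting unnormalized marginal against brute-force summation. The factor tables are made up for illustration, and the example is unrelated to the TPFG factors themselves.

```python
from itertools import product

# Toy factor graph (a chain, so cycle-free): x1 -- fA -- x2 -- fB -- x3.
STATES = (0, 1)
f_a = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}   # fA(x1, x2)
f_b = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}   # fB(x2, x3)

def marginal_x2_sum_product():
    """Unnormalized marginal of x2 from the two incoming factor messages."""
    # Leaf variables x1 and x3 send the unit message, so each factor-to-x2
    # message is the factor summed over its other argument.
    msg_a_to_x2 = {x2: sum(f_a[(x1, x2)] for x1 in STATES) for x2 in STATES}
    msg_b_to_x2 = {x2: sum(f_b[(x2, x3)] for x3 in STATES) for x2 in STATES}
    # The marginal at a variable is the product of all incoming messages.
    return {x2: msg_a_to_x2[x2] * msg_b_to_x2[x2] for x2 in STATES}

def marginal_x2_brute_force():
    """Same marginal by summing the full product over every configuration."""
    totals = {x2: 0.0 for x2 in STATES}
    for x1, x2, x3 in product(STATES, repeat=3):
        totals[x2] += f_a[(x1, x2)] * f_b[(x2, x3)]
    return totals

by_messages = marginal_x2_sum_product()
by_enumeration = marginal_x2_brute_force()
```

Both routes give the same numbers, but the message-passing version only ever sums over one neighbor at a time, which is the saving the factorization buys.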

Single-i sum-product algorithm (cont.)

New TPFG Inference Algorithm
• The original sum-product algorithm runs into difficulty because it requires each node to wait for all-but-one messages to arrive. In TPFG, some nodes would therefore wait forever due to the existence of cycles.
• We arrange the message passing in a strict order determined by $H'$. Each node $a_i$ has a descendant set $Y_i^{-1}$ and an ascendant set $Y_i$.

Message passing: two-phase schema
• In the first phase, messages are passed from advisees to possible advisors; in the second, messages are passed back from advisors to possible advisees.
• The first phase: the message from $f_i()$ to $y_i$ is generated and sent only when all the messages from its descendants have arrived, and $y_i$ immediately sends it to all its ascendants $f_j()$, $j \in Y_i$.

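Two-phase schema (cont.)
• The second phase: each message travels in the reverse direction along an edge used in phase 1.
• Why compute $r_{ij}$ when we already have $l_{ij}$? Because $l_{ij}$ is the local support for $a_j$ being $a_i$'s advisor, whereas $r_{ij}$ is by definition a global support: it takes the other dependencies in the graph into account, and it does so through this propagation model.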

Two-phase schema (cont.)
• After the two phases of message propagation, we can collect the two messages on any edge and obtain the marginal function.
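The ordering of the two phases can be sketched as a scheduling problem, leaving the message contents aside. The toy `ascendants` table and the function name are hypothetical, and the sketch assumes the candidate graph is acyclic, which is what the strict order determined by $H'$ provides.

```python
from collections import deque

# Hypothetical candidate graph: advisee -> possible advisors (ascendant set Y_i).
ascendants = {1: [], 2: [1], 3: [1, 2], 4: [2]}

def two_phase_schedule(ascendants):
    """Return the (sender, receiver) order of messages in phase 1 and phase 2."""
    descendants = {i: [] for i in ascendants}          # descendant sets Y_i^{-1}
    for i, advisors in ascendants.items():
        for j in advisors:
            descendants[j].append(i)

    # Phase 1: a node sends upward only after hearing from all its descendants.
    waiting = {i: len(descendants[i]) for i in ascendants}
    ready = deque(sorted(i for i in ascendants if waiting[i] == 0))
    phase1 = []
    while ready:
        i = ready.popleft()
        for j in ascendants[i]:
            phase1.append((i, j))
            waiting[j] -= 1
            if waiting[j] == 0:
                ready.append(j)

    # Phase 2: the same edges, reversed, visited in reverse order.
    phase2 = [(j, i) for i, j in reversed(phase1)]
    return phase1, phase2

phase1, phase2 = two_phase_schedule(ascendants)
```

Because every node waits for a fixed, finite set of neighbors, no node waits forever, and each edge carries exactly one message per phase.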

Simplify the message propagation
• Eliminate the function nodes and the internal messages between a function node and a variable node.
• The improved message propagation is still separated into two phases.
• In the first phase, the messages $sent_i$, passed from each node to its ascendants, are generated in a similar order as before.
• In the second phase, the messages $recv_i$ returned from the ascendants are stored in each node.

Simplify the message propagation (cont.)

5. EXPERIMENTAL RESULTS
• Data set: DBLP Computer Science Bibliography Database.
• Test the accuracy of the discovered advisor-advisee relationships.
• Adopt three data sets: one is manually labeled by looking at the advisors' home pages, and the other two are crawled from the Mathematics Genealogy Project and the AI Genealogy Project.

Compare TPFG with baseline methods
• Evaluation aspects: two performance measurements, accuracy and scalability.

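5.2 Accuracy
• Effect of rules in TPFG: from Figure 5(a) we can see that R2/R3 has the highest suitability on the tested data.
• ROC curve: compare the known advisor-advisee pairs in the test data with the pairs computed by the algorithm. Sort the computed pairs by rank score in descending order; the horizontal axis is the top a% of the computed pairs, and the vertical axis is the size of the intersection of that top a% with the test-data pairs, divided by the size of the test data.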
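The curve described above can be computed as in the following sketch. The function name, the tuple layout, and the toy data are hypothetical; the sketch simply follows the description on this slide (top a% of ranked pairs on the horizontal axis, recovered fraction of known pairs on the vertical axis).

```python
def coverage_curve(ranked_pairs, true_pairs, percents):
    """Fraction of known pairs recovered within the top a% of computed pairs.

    ranked_pairs: list of (advisee, advisor, score) produced by the algorithm.
    true_pairs:   set of known (advisee, advisor) pairs from the test data.
    """
    ordered = sorted(ranked_pairs, key=lambda p: p[2], reverse=True)
    curve = []
    for a in percents:
        top_n = int(len(ordered) * a / 100)
        top = {(i, j) for i, j, _score in ordered[:top_n]}
        curve.append((a, len(top & true_pairs) / len(true_pairs)))
    return curve

predicted = [(2, 1, 0.9), (3, 1, 0.8), (3, 2, 0.4), (4, 2, 0.7)]
truth = {(2, 1), (4, 2)}
curve = coverage_curve(predicted, truth, [25, 50, 75, 100])
```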

Effect of network structure
• From Figure 5(c) we see that, for closures with different depths, TPFG achieves better accuracy as the depth increases.
• To compare TPFG with the exact maximal joint probability and with other approximate algorithms, JuncT and LBP are used.

Effect of training data
• Support Vector Machines (SVMs) are accurate supervised learning approaches.
• Reduce advisor mining to a classification problem.
• We combined the Kulczynski and IR measures with [...] as features.
• TPFG achieves accuracy comparable to, or even better than, a supervised method.

Effect of training data

5.3 Scalability Performance

5.4 Applications
• Visualization of genealogy: the visualized hierarchies of the research community, based on the relationships, can help us gain better insight into the community.

5.4 Applications (cont.)
• Expert finding and Bole search: Bole search is a specific expert-finding task that aims to identify the best supervisors.