Discovering Meta-Paths in Large Heterogeneous Information Network

Presentation transcript:

Discovering Meta-Paths in Large Heterogeneous Information Network Changping Meng (Purdue University) Reynold Cheng (University of Hong Kong) Silviu Maniu (Noah's Ark Lab, Huawei Technologies) Pierre Senellart (Télécom ParisTech) Wangda Zhang (University of Hong Kong)

Introduction: A Heterogeneous Information Network (HIN) is modeled as a directed graph. (Figure: example HIN with edge types such as citizenOf and hasChild, a class hierarchy, and numbered nodes.)

Introduction: Meta path [Han VLDB’11]. A meta path is a sequence of node class sets connected by edge types. Benefits of meta paths: they capture multi-hop relationships instead of only direct links, and they can combine multiple relationships.

Introduction: Similarity score for a node pair following a single meta-path. Path Count (PC) [Han ASONAM’11]: the number of paths following a given meta-path. Path Constrained Random Walk (PCRW) [Cohen KDD’11]: the transition probability of a random walk following a given meta-path. We propose a general form, Biased Path Constrained Random Walk (BPCRW). Similarity score for a node pair following a combination of multiple meta-paths: an aggregate function F combines the similarity scores for each single meta-path.
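The two single-meta-path scores and a linear aggregate can be sketched on a toy graph. This is a minimal illustration, not the authors' implementation: the graph, the node names, the `^-1` notation for inverse edges, and the weight passed to `combined_score` are all assumptions, and meta-paths are written as edge-type sequences only (node classes omitted).

```python
# Toy HIN as an adjacency map: (node, edge_type) -> list of neighbor nodes.
# All node and edge names are hypothetical illustration data.
EDGES = {}

def add_edge(src, edge_type, dst):
    """Store the edge and its inverse so meta-paths can walk both directions."""
    EDGES.setdefault((src, edge_type), []).append(dst)
    EDGES.setdefault((dst, edge_type + "^-1"), []).append(src)

for parent in ("Barack", "Michelle"):
    for child in ("Malia", "Sasha"):
        add_edge(parent, "hasChild", child)

def path_count(src, dst, meta_path):
    """PC: number of concrete paths from src to dst that follow meta_path."""
    counts = {src: 1}
    for edge_type in meta_path:
        nxt = {}
        for node, c in counts.items():
            for nb in EDGES.get((node, edge_type), []):
                nxt[nb] = nxt.get(nb, 0) + c
        counts = nxt
    return counts.get(dst, 0)

def pcrw(src, dst, meta_path):
    """PCRW: probability that a walk from src, constrained to meta_path, ends at dst."""
    probs = {src: 1.0}
    for edge_type in meta_path:
        nxt = {}
        for node, p in probs.items():
            nbs = EDGES.get((node, edge_type), [])
            for nb in nbs:
                # Uniform choice among the neighbors allowed by this edge type.
                nxt[nb] = nxt.get(nb, 0.0) + p / len(nbs)
        probs = nxt
    return probs.get(dst, 0.0)

def combined_score(src, dst, weighted_paths):
    """Linear aggregate F: weighted sum of single-meta-path PCRW scores."""
    return sum(w * pcrw(src, dst, mp) for mp, w in weighted_paths)

co_parent = ("hasChild", "hasChild^-1")
print(path_count("Barack", "Michelle", co_parent))  # 2
print(pcrw("Barack", "Michelle", co_parent))        # 0.5
```

Path count grows with the number of shared children, while PCRW normalizes by out-degree at every step, which is why the two scores can rank the same pairs differently.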

Introduction: Applications. 1. Query by example. When a user inputs example pairs of similar objects, we can model the user's preference and find more pairs. Input: <Barack Obama, Michelle Obama>. Output: <George Bush, Laura Bush>, <Bill Clinton, Hillary Clinton>.

Introduction: Applications. 2. Link prediction. Coauthor prediction (e.g. authors from DB and AI who may collaborate). Friendship prediction (online social networks).

Existing work. Meta-paths designed by experts [Han VLDB’11]: too complex for analyzing big data, and they do not consider user preference. Enumeration within a given length [Cohen ECML’11]: if the max length L is large, the result is redundant (curse of dimensionality); in Yago, L=3 yields 135 meta-paths and L=4 yields 2000 meta-paths. If L is small, some important meta-paths are missed.

Problem Definition: Meta Paths Generation. Input: example node pairs, e.g. (B. Obama, M. Obama) and (B. Clinton, H. Clinton), a knowledge base (Yago), and a similarity function (a linear function). Output: meta paths. Contributions: (1) Consider the user's preference by letting the user input example pairs. (2) Automatically generate meta paths without a max length, using heuristic search instead of enumeration.

Meta-path Generation: Two Phase Method. Challenge: each node has many class labels, and the number of candidate meta paths grows exponentially with the length. Inputs are the example node pairs, the knowledge base and the similarity function; meta-paths are generated by link type first and node type second. First, select the important meta paths based on links only. Next, refine the node types.

Generating Meta-Paths, Phase 1: Link-Only Path Generation. Forward Stagewise Path Generation (FSPG) iteratively generates the most related meta-path and updates the model. Flowchart: starting from the example pairs, GreedyTree gets the most related meta-path m; model training (an altered LARS model) produces the updated model; if the model has not converged (N), the loop repeats with the next meta-path; if it has converged (Y), the procedure finishes.
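The select-then-update loop can be illustrated with plain forward stagewise regression. This is a simplified sketch under stated assumptions, not FSPG itself: the candidate meta-paths are given up front as precomputed score columns (FSPG instead discovers them with GreedyTree), the model is a plain linear one rather than the altered LARS model, and the feature names, step size and tolerance are hypothetical.

```python
def forward_stagewise(features, y, step=0.01, max_iters=2000, tol=1e-3):
    """Fit sparse linear weights by repeatedly nudging the best-matching feature.

    features: dict mapping a meta-path name to its list of similarity scores,
              one score per example pair (assumed precomputed).
    y:        list of target labels, one per example pair.
    """
    weights = {}
    residual = list(y)  # gap between the truth and the current model
    for _ in range(max_iters):
        # Pick the candidate whose scores have the largest inner product
        # (in absolute value) with the residual.
        best_name, best_corr = None, 0.0
        for name, col in features.items():
            corr = sum(c * r for c, r in zip(col, residual))
            if abs(corr) > abs(best_corr):
                best_name, best_corr = name, corr
        if best_name is None or abs(best_corr) < tol:
            break  # converged: no candidate explains the residual any more
        delta = step if best_corr > 0 else -step
        weights[best_name] = weights.get(best_name, 0.0) + delta
        residual = [r - delta * c for r, c in zip(residual, features[best_name])]
    return weights

# Hypothetical scores for four node pairs; the first two are true examples.
features = {"hasChild.hasChild^-1": [1, 1, 0, 0], "citizenOf.citizenOf^-1": [1, 0, 1, 0]}
weights = forward_stagewise(features, [1, 1, 0, 0])
print(weights)
```

Only the feature that tracks the labels receives weight here; the other candidate never wins a selection round, which is the sparsity that makes a stagewise scheme attractive when most candidate meta-paths are irrelevant.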

Generating Meta-Paths: GreedyTree. A tree that greedily expands the node with the largest priority score. The priority score is related to the correlation between m and r, where m is the meta-path and r is the residual vector, which measures the gap between the ground truth and the current model.
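Expanding the node with the largest priority first is a best-first search, which a max-heap handles naturally. The sketch below shows only that mechanic: the priority values are hand-set hypothetical numbers standing in for the correlation-based score, and `children`, `priority` and `is_goal` are assumed callbacks rather than anything defined in the slides.

```python
import heapq
from itertools import count

def greedy_expand(root, children, priority, is_goal, max_steps=100):
    """Pop the highest-priority tree node, return it if complete, else expand it."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    heap = [(-priority(root), next(tie), root)]  # negate: heapq is a min-heap
    while heap and max_steps > 0:
        max_steps -= 1
        _, _, node = heapq.heappop(heap)
        if is_goal(node):
            return node
        for child in children(node):
            heapq.heappush(heap, (-priority(child), next(tie), child))
    return None

# Hypothetical priorities for link-only path prefixes (tuples of edge types).
SCORES = {
    (): 1.0,
    ("hasChild",): 0.9,
    ("citizenOf",): 0.4,
    ("hasChild", "hasChild^-1"): 0.8,
    ("hasChild", "citizenOf"): 0.1,
    ("citizenOf", "citizenOf^-1"): 0.3,
}
EDGE_TYPES = ["hasChild", "hasChild^-1", "citizenOf", "citizenOf^-1"]
COMPLETE = {("hasChild", "hasChild^-1"), ("citizenOf", "citizenOf^-1")}

def children(node):
    """Extend a prefix by one edge type, keeping only prefixes with a known score."""
    return [node + (t,) for t in EDGE_TYPES if node + (t,) in SCORES]

best = greedy_expand((), children, SCORES.get, COMPLETE.__contains__)
print(best)  # ('hasChild', 'hasChild^-1')
```

Because low-priority prefixes such as `("citizenOf",)` stay in the heap unexpanded, the search touches only a small part of the tree, in contrast to enumerating every path up to a fixed length.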

Generating Meta-Paths: GreedyTree. (Figure: example of GreedyTree expansion.)

Meta-path Generation, Phase 2: Node Class Generation. Why node classes are needed: a link-only meta path may introduce some unrelated result pairs, because it is less specific than a meta path that also constrains node classes. Solution 1: Lowest Common Ancestor (LCA). Record the LCA in the node of the GreedyTree.
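Finding the LCA of the class labels seen at one position of a meta-path can be sketched as follows. The hierarchy below is hypothetical and assumes every class has a single parent; a real knowledge-base hierarchy may allow several parents, which this sketch does not handle.

```python
# Hypothetical single-parent class hierarchy: child class -> parent class.
PARENT = {
    "politician": "person",
    "lawyer": "person",
    "person": "entity",
    "country": "entity",
    "entity": None,
}

def ancestors(label):
    """Return the chain from label up to the root, most specific first."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain

def lowest_common_ancestor(labels):
    """Most specific class that is an ancestor of (or equal to) every label."""
    labels = list(labels)
    common = ancestors(labels[0])
    for label in labels[1:]:
        others = set(ancestors(label))
        # Filtering keeps the most-specific-first order of the chain.
        common = [a for a in common if a in others]
    return common[0] if common else None

print(lowest_common_ancestor(["politician", "lawyer"]))   # person
print(lowest_common_ancestor(["politician", "country"]))  # entity
```

The second call shows the weakness of this choice: one unusual example pulls the result up to a very general class.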

Meta-path Generation: Solution 3: TFOF. tf is the frequency of a label in the positive examples; of is the overall count of that label in the KB. of can be pre-computed. (Slide example values: 2 and 42.)
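A label ranking built from these two counts might look like the sketch below. The slide does not spell out how tf and of are combined, so the ratio tf / of used here is an assumption (by analogy with TF-IDF), and the labels and counts are hypothetical.

```python
from collections import Counter

def tfof_rank(example_labels, overall_count):
    """Rank class labels by tf / of (assumed combination, see lead-in).

    example_labels: one collection of class labels per positive example node.
    overall_count:  precomputed dict, label -> number of KB nodes with that label.
    """
    # tf: in how many positive examples each label occurs.
    tf = Counter(label for labels in example_labels for label in set(labels))
    scores = {label: tf[label] / overall_count[label] for label in tf}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: both example nodes are persons and politicians.
example_labels = [{"person", "politician"}, {"person", "politician"}]
overall_count = {"person": 1000, "politician": 42}

ranked = tfof_rank(example_labels, overall_count)
print(ranked[0][0])  # politician
```

Both labels cover the two examples equally, but the rarer label scores higher, so the chosen class is more specific than a very common one.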

Experiments: Datasets and link-prediction evaluation. DBLP (four areas: Database, Data Mining, Artificial Intelligence and Information Retrieval): 14376 papers, 14475 authors, 8920 topics, 20 venues. Yago: a knowledge base derived from Wikipedia, WordNet and GeoNames. Core facts: 2.1 million nodes, 8 million edges, 125 edge types, 0.36 million node types. Link-prediction evaluation: select n pairs of a certain relationship as example pairs, then randomly select another m pairs on which to predict the links. Link prediction serves here as an evaluation task; it is not the only application.

Experiments: Effectiveness. Baseline method: enumerating all meta-paths within a given max length L. If L is small, recall is low; if L is large, precision is low. (Figure: ROC curves for link prediction; a larger area under the curve is better.)
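For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen true link is scored above a randomly chosen non-link. The sketch below computes it by direct pair counting; the scores and labels are made-up values, not results from the presentation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical similarity scores for four candidate links (1 = link exists).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one. The quadratic pair loop is fine for small evaluations; a sort-based computation scales better.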

Case study: Yago citizenOf. The generated meta-paths are better than the direct link (PCRW 1), better than the best PCRW 2 result, and better than PCRW 3 and 4. (Table: the 5 most relevant meta-paths for citizenOf.)

Experiments: Efficiency. In Yago, the running time is 2 orders of magnitude better. In DBLP, the running time is comparable to PCRW 5, but the accuracy is much better. (Figure: running time of FSPG.)

Conclusion: We examined a novel problem, meta-path generation, which is much needed for analyzing and querying knowledge bases. We proposed the FSPG algorithm and developed GreedyTree to facilitate its execution.

References
[Yu EDBT’13] C. Shi, P. Yu, “Relevance Search in Heterogeneous Network”, EDBT’13
[Han VLDB’11] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks”, VLDB’11
[Han ASONAM’11] Y. Sun, R. Barber, M. Gupta, C. Aggarwal, and J. Han, “Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks”, ASONAM’11
[Han CIKM’12] Y. Sun, R. Barber, M. Gupta, C. Aggarwal, and J. Han, “Meta-Path Selection with User Guided Object Clustering in Heterogeneous Information Networks”, CIKM’12
[Sun VLDB’11] L. Sun, R. Cheng, et al., “On Link-based Similarity Join”, VLDB 2011
[Han KDD’12] J. Han, Y. Sun, X. Yan, and P. S. Yu, “Mining Heterogeneous Information Networks”
[Cohen ECML’11] W. Cohen, N. Lao, “Relational Retrieval Using a Combination of Path-Constrained Random Walks”, ECML 2011
[Weinberger JMLR’09] Weinberger, “Distance Metric Learning for Large Margin Nearest Neighbor Classification”, Journal of Machine Learning Research 10 (2009) 207-244
[Chopra CVPR’05] S. Chopra, “Learning a similarity metric discriminatively, with application of face verification”, CVPR 2005

Thank you! Changping Meng meng40@purdue.edu

Introduction: Applications. 3. Recommendation. Promote movies to customers. Choose representatives for political or commercial negotiations.

Association rule mining: comparison with AMIE (Luis, WWW’13). AMIE does not consider the hierarchy of node types, so it fails to distinguish Ivy League alumni from the alumni of any other university.

Experiments: Class label selection. The TFOF method of generating class labels is better for high-precision queries; LCA is better than TFOF at higher recall rates.