Honglei Zhuang1, Jing Zhang2, George Brova1,

Slides:

Advertisements

Similar presentations

Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.

Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,

gSpan: Graph-based substructure pattern mining

A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date ： 2014/04/15 Source ： KDD’13 Authors ： Chi Wang, Marina Danilevsky, Nihit.

Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han University of.

EventCube Aviation Safety Data Analysis System Fangbo Tao, Xiao Yu, Jiawei Han 08/10/13.

Mining Top-K Large Structural Patterns in a Massive Network Feida Zhu 1, Qiang Qu 2, David Lo 1, Xifeng Yan 3, Jiawei Han 4, and Philip S. Yu 5 1 Singapore.

Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots Chao-Yeh Chen and Kristen Grauman University of Texas at Austin.

Reasoning and Identifying Relevant Matches for XML Keyword Search Yi Chen Ziyang Liu, Yi Chen Arizona State University.

Information Retrieval in Practice

On Community Outliers and their Efficient Detection in Information Networks Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei Han 1.

Expertise Networks in Online Communities: Structure and Algorithms Jun Zhang Mark S. Ackerman Lada Adamic University of Michigan WWW 2007, May 8–12, 2007,

Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.

33 rd International Conference on Very Large Data Bases, Sep. 2007, Vienna Towards Graph Containment Search and Indexing Chen Chen 1, Xifeng Yan 2, Philip.

Towards Scalable Critical Alert Mining Bo Zong 1 with Yinghui Wu 1, Jie Song 2, Ambuj K. Singh 1, Hasan Cam 3, Jiawei Han 4, and Xifeng Yan 1 1 UCSB, 2.

Query-Based Outlier Detection in Heterogeneous Information Networks Jonathan Kuck 1, Honglei Zhuang 1, Xifeng Yan 2, Hasan Cam 3, Jiawei Han 1 1 University.

Overview of Search Engines

1 A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search 1 Jie Tang, 2 Ruoming Jin, and 1 Jing Zhang 1 Knowledge.

Topic Modeling with Network Regularization Qiaozhu Mei, Deng Cai, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign.

SIGIR’09 Boston 1 Entropy-biased Models for Query Representation on the Click Graph Hongbo Deng, Irwin King and Michael R. Lyu Department of Computer Science.

CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.

Link Recommendation In P2P Social Networks Yusuf Aytaş, Hakan Ferhatosmanoğlu, Özgür Ulusoy Bilkent University, Ankara, Turkey.

1 A Discriminative Approach to Topic- Based Citation Recommendation Jie Tang and Jing Zhang Presented by Pei Li Knowledge Engineering Group, Dept. of Computer.

 An important problem in sponsored search advertising is keyword generation, which bridges the gap between the keywords bidded by advertisers and queried.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.

CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.

Page 1 Ming Ji Department of Computer Science University of Illinois at Urbana-Champaign.

Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.

Diversified Top-k Graph Pattern Matching 1 Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang.

Advisor-advisee Relationship Mining from Research Publication Network Chi Wang 1, Jiawei Han 1, Yuntao Jia 1, Jie Tang 2, Duo Zhang 1, Yintao Yu 1, Jingyi.

EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.

Friends and Locations Recommendation with the use of LBSN By EKUNDAYO OLUFEMI ADEOLA

Discovering Meta-Paths in Large Heterogeneous Information Network

P-Rank: A Comprehensive Structural Similarity Measure over Information Networks CIKM’ 09 November 3 rd, 2009, Hong Kong Peixiang Zhao, Jiawei Han, Yizhou.

The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign User Profiling in Ego-network: Co-profiling Attributes and Relationships.

Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC.

21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,

CIKM Opinion Retrieval from Blogs Wei Zhang 1 Clement Yu 1 Weiyi Meng 2 1 Department of.

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Mining Logs Files for Data-Driven System Management Advisor.

Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.

Mining Top-K Large Structural Patterns in a Massive Network Feida Zhu 1, Qiang Qu 2, David Lo 1, Xifeng Yan 3, Jiawei Han 4, and Philip S. Yu 5 1 Singapore.

Data Mining, ICDM '08. Eighth IEEE International Conference on Duy-Dinh Le National Institute of Informatics Hitotsubashi, Chiyoda-ku Tokyo,

Page 1 PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi.

Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.

Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.

Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.

CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Mining Advisor-Advisee Relationships from Research Publication.

Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He Joint work with: Kevin Chen-Chuan Chang, Jiawei Han Univ.

Query-based Graph Cuboid Outlier Detection

UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.

Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.

Vertical Search for Courses of UIUC Homepage Classification The aim of the Course Search project is to construct a database of UIUC courses across all.

University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G

Discovering Meta-Paths in Large Heterogeneous Information Network Changping Meng (Purdue University) Reynold Cheng (University of Hong Kong) Silviu Maniu.

Outlier Detection for Information Networks Manish Gupta 15 th Jan 2013.

MINING DEEP KNOWLEDGE FROM SCIENTIFIC NETWORKS

Mining Spatio-Temporal Reachable Regions over Massive Trajectory Data

Integrating Meta-Path Selection With User-Guided Object Clustering in Heterogeneous Information Networks Yizhou Sun†, Brandon Norick†, Jiawei Han†, Xifeng.

RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng,

Community Distribution Outliers in Heterogeneous Information Networks

Jiawei Han Department of Computer Science

Navigation-Aided Retrieval

A novel Web usage mining approach for search engines

Presentation transcript:

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Honglei Zhuang1, Jing Zhang2, George Brova1, Jie Tang2, Hasan Cam3, Xifeng Yan4, Jiawei Han1 1University of Illinois at Urbana-Champaign 2Tsinghua University 3US Army Research Lab 4University of California at Santa Barbara

Suppose we are given travel information of users, including: Flight info, Hotel booking info, Car rental info, … How can an analyst identify terrorists ring from the massive information? This scenario can be naturally extended to a more general problem: query-based subnetwork outlier detection.

Querying Subnetwork Outliers Input: A travel information network, a query Flights to Rio, Brazil Passenger Hotel Flight User poses a query: “Analyze passenger groups flying to Rio, Brazil”

Querying Subnetwork Outliers Input: A travel information network, a query Retrieve relevant subnetworks Flights to Rio, Brazil Passenger Hotel Flight User poses a query: “Analyze passenger groups flying to Rio, Brazil” Retrieve candidate subnetworks: connected and relevant to query

Querying Subnetwork Outliers Input: A travel information network, a query Retrieve relevant subnetworks Output: outlier subnetworks Outlier subnetwork Flights to Rio, Brazil Passenger Hotel Flight User poses a query: “Analyze passenger groups flying to Rio, Brazil” Retrieve candidate subnetworks: connected and relevant to query Identify outlier subnetworks: deviating significantly from others

Problem Definition Input: Output: A heterogeneous information network A query consisting of A set of queried vertices (entities) e.g. “Flight 123” Relationship from queried vertices to desired vertices e.g., “passengers on the flight” How they form subnetworks e.g., “traveling together” Output: Outlier subnetworks meta-path

Methodology 1 2 3 1 General Framework Retrieving relevant subnetworks Retrieve relevant subnetworks Calculate similarity between subnetworks Rank outlier subnetworks 1 Retrieving relevant subnetworks Can be handled by IR techniques Not our focus of this work Applying a simple retrieving strategy based on frequent pattern mining

Similarity Measure 2 Intuition: two subnetworks are similar when their members are from similar distribution over communities Basic idea: Calculate individual similarity by meta-path based similarity measure PathSim* Similarity measures (w.l.o.g, ) where is a set of pairs of vertices from two subnetworks, satisfying * Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. Pathsim: Meta-path based top-k similarity search in heterogeneous information networks. In VLDB, pages 992–1003, 2011.

Similarity Measure (cont’) 2 Example S1 S2 Desired 1.0 AvgSim 0.5 *MatchSim BMSim Desired 1.0 AvgSim 0.5 *MatchSim BMSim Desired <1 AvgSim 0.375 *MatchSim 0.5 BMSim * Z. Lin, M. R. Lyu, and I. King. Matchsim: a novel neighbor-based similarity measure with maximum neighborhood matching. In CIKM, pages 1613–1616, 2009.

Subnetwork Outliers 3 Intuition: Basic Ideas: Clustering subnetworks by either assigning a subnetwork with an “exemplar” subnetwork, or classifying the subnetwork as an outlier Basic Ideas: Calculate the outlierness by Automatically weighting multiple similarity measures instantiated by different meta-paths *B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007.

Subnetwork Outliers 3 Intuition: Basic Ideas: Clustering subnetworks by either assigning a subnetwork with an “exemplar” subnetwork, or classifying the subnetwork as an outlier Basic Ideas: Calculate the outlierness by Automatically weighting multiple similarity measures instantiated by different meta-paths How good j is an exempler Similarity between i and j *B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007.

Data Sets Synthetic + 2 real world data sets are employed #Vertices #Edges #Types Labels Synthetic 1,000 about 33,000 2 Inserted outliers Bibliography 3,701,765 24,639,131 4 Labeled for 5 queries Patent 2,317,360 11,051,283 6 N/A Synthetic + 2 real world data sets are employed Bibliography data set are constructed based on DBLP Patent data set are constructed based on US Patent data

Experimental Results Performance Baselines Ind: sum of individual outlierness NB: topic modeling with an “outlier” topic Data set Synthetic Bibliography Measure P@5 MAP AUC Ind 60.00 66.61 85.00 28.00 24.82 59.91 NB 75.00 75.76 93.68 30.20 67.87 Proposed 84.00 92.04 99.50 44.00 45.05 79.55

Case Study Query: outlier author subnetworks related to “topic modeling” Proposed Method \ Ind Ind \ Proposed Method Sanjeev Arora, Rong Ge, Ankur Moitra Theory group Tu Bao Ho, Khoat Than Data mining group Giovanni Ponti, Andrea Tagarelli Name ambiguity problem for Giovanni Ponti – could be an economics researcher or a data mining researcher Zhixin Li, Huifang Ma, Zhongzhi Shi Machine learning and data mining group

Summary Study a novel problem of query-based subnetwork outlier detection in heterogeneous information networks Propose a framework to tackle the problem Formalize the query Propose a subnetwork similarity Rank outlier subnetworks

Thanks 12/16/2014