Jiawei Han Department of Computer Science

Presentation transcript:

Data Mining: Principles and Algorithms. Introduction to Network Analysis. Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign (www.cs.uiuc.edu/~hanj). ©2012 Jiawei Han. All rights reserved. Acknowledgements: based on the slides by Sangkyum Kim and Chen Chen.

Introduction to Network Analysis: Measures and Metrics of Networks; Mining Information Networks

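Measure & Metrics: Degree Centrality; Eigenvector Centrality (not all neighbors are equal); Katz Centrality; PageRank. PageRank rests on two assumptions: (1) in the Web graph, the larger a page node's in-degree, the more important the page; (2) the in-links pointing to a page A carry different weights, and a higher-quality page contributes a larger weight to A. HITS distinguishes Authority pages from Hub pages: an Authority is a high-quality page, while a Hub is a page that points to Authorities. (1) A good Authority page is referenced by many Hub pages; (2) a good Hub page points to many good Authority pages.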
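The PageRank assumptions and HITS rules on this slide can each be read as a power iteration. The following is a minimal pure-Python sketch, not code from the slides; `pagerank`, `hits`, and `_nodes` are hypothetical helper names, and the damping factor d = 0.85, the uniform redistribution of rank from pages without out-links, and the L2 normalization in HITS are common conventions assumed here.

```python
import math

def _nodes(out_links):
    # Every page that appears as a source or as a target of a link
    return sorted(set(out_links) | {v for vs in out_links.values() for v in vs})

def pagerank(out_links, d=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank; out_links maps a page to the pages it links to."""
    nodes = _nodes(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - d) / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                # Assumption 1: every in-link adds to the target's score.
                # Assumption 2: the amount depends on the linking page's own rank.
                for v in targets:
                    new[v] += d * rank[u] / len(targets)
            else:
                # Page without out-links: spread its rank uniformly (one common convention)
                for v in nodes:
                    new[v] += d * rank[u] / n
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def hits(out_links, iters=100):
    """Mutually reinforcing hub and authority scores, L2-normalized each round."""
    nodes = _nodes(out_links)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iters):
        # Rule 1: a good authority is pointed to by many (good) hubs
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in out_links.get(u, []):
                auth[v] += hub[u]
        # Rule 2: a good hub points to many good authorities
        hub = {u: sum(auth[v] for v in out_links.get(u, [])) for u in nodes}
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for u in scores:
                scores[u] /= norm
    return hub, auth

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(links))
print(hits(links))
```

On this toy graph, page C, which collects links from A, B, and D, should receive the highest PageRank and the highest authority score, while A, which points to both B and C, should be the strongest hub.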

Measure & Metrics (2): Eigenvector Centrality
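Assuming the standard textbook definition, eigenvector centrality gives node i a score proportional to the sum of its neighbors' scores, x_i = (1/λ) Σ_j A_ij x_j, so x is the leading eigenvector of the adjacency matrix A. This is what "not all neighbors are equal" means: a link from a central node counts for more than a link from a peripheral one. A minimal power-iteration sketch (the helper name is hypothetical):

```python
import math

def eigenvector_centrality(adj, iters=1000, tol=1e-12):
    """adj maps each node to its neighbors (undirected: list each edge both ways)."""
    nodes = sorted(adj)
    x = {u: 1.0 for u in nodes}
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, but the shift keeps the
        # iteration from oscillating on bipartite graphs (eigenvalues +λ and -λ)
        new = {u: x[u] + sum(x[v] for v in adj[u]) for u in nodes}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {u: s / norm for u, s in new.items()}
        if max(abs(new[u] - x[u]) for u in nodes) < tol:
            return new
        x = new
    return x

# p and q both have degree 1, but p's neighbor (h) is far more central than q's (r)
graph = {
    "h": ["a", "b", "p", "r"], "a": ["h"], "b": ["h"], "p": ["h"],
    "r": ["h", "q"], "q": ["r"],
}
c = eigenvector_centrality(graph)
print(sorted(c, key=c.get, reverse=True))
```

Degree centrality would tie p and q; eigenvector centrality separates them because their neighbors differ in importance.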

Measure & Metrics (3): Katz Centrality
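Again assuming the standard definition, Katz centrality gives every node a baseline score β and adds α times the scores of the nodes linking to it: x_i = α Σ_j A_ji x_j + β, where A_ji = 1 if j links to i (one common convention). The iteration converges when α is smaller than 1/λ_max, the reciprocal of the largest eigenvalue of A. The baseline matters on directed acyclic graphs, where every eigenvalue of A is zero and eigenvector centrality degenerates. A sketch with a hypothetical helper name:

```python
def katz_centrality(in_links, alpha=0.1, beta=1.0, iters=1000, tol=1e-12):
    """in_links maps each node to the nodes that link to it.

    Converges when alpha < 1 / (largest eigenvalue of the adjacency matrix).
    """
    nodes = sorted(set(in_links) | {v for vs in in_links.values() for v in vs})
    x = {u: 0.0 for u in nodes}
    for _ in range(iters):
        # Baseline beta plus alpha times the scores of the nodes pointing at u
        new = {u: alpha * sum(x[v] for v in in_links.get(u, [])) + beta for u in nodes}
        if max(abs(new[u] - x[u]) for u in nodes) < tol:
            return new
        x = new
    return x

# A small DAG: a -> b, a -> c, b -> c (eigenvector centrality degenerates here)
dag_in = {"b": ["a"], "c": ["a", "b"]}
print(katz_centrality(dag_in))
```

With α = 0.1 and β = 1 the fixed point is x_a = 1, x_b = 0.1 · 1 + 1 = 1.1, and x_c = 0.1 · (1 + 1.1) + 1 = 1.21 (up to floating-point rounding).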

What Are Information Networks? Information network: a network where each node represents an entity (e.g., an actor in a social network) and each link (e.g., a tie) represents a relationship between entities. Each node or link may have attributes, labels, and weights, and links may carry rich semantic information. Homogeneous vs. heterogeneous networks. Homogeneous networks have a single object type and a single link type, e.g., single-mode social networks (friends) or the WWW, a collection of linked Web pages. Heterogeneous, multi-typed networks have multiple object and link types, e.g., a medical network (patients, doctors, diseases, contacts, treatments) or a bibliographic network (publications, authors, venues).
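To make the homogeneous vs. heterogeneous distinction concrete, here is a minimal typed-graph structure. The class name and the type labels ("person", "friend_of", "writes", "published_in") are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

class InfoNetwork:
    """Hypothetical minimal information network: typed nodes, typed weighted links."""

    def __init__(self):
        self.node_type = {}               # node id -> object type
        self.links = defaultdict(list)    # link type -> [(src, dst, weight), ...]

    def add_node(self, node, obj_type):
        self.node_type[node] = obj_type

    def add_link(self, src, dst, link_type, weight=1.0):
        self.links[link_type].append((src, dst, weight))

    def is_homogeneous(self):
        # Homogeneous: a single object type and a single link type
        return len(set(self.node_type.values())) <= 1 and len(self.links) <= 1

# Homogeneous: a single-mode friendship network
friends = InfoNetwork()
for person in ("ann", "bob", "cy"):
    friends.add_node(person, "person")
friends.add_link("ann", "bob", "friend_of")
friends.add_link("bob", "cy", "friend_of")

# Heterogeneous: a bibliographic network of papers, authors, and venues
bib = InfoNetwork()
bib.add_node("p1", "paper")
bib.add_node("alice", "author")
bib.add_node("VLDB", "venue")
bib.add_link("alice", "p1", "writes")
bib.add_link("p1", "VLDB", "published_in")

print(friends.is_homogeneous(), bib.is_homogeneous())  # True False
```

Keeping links grouped by type makes it easy to pull out one relation (e.g., only "writes") when a method needs a specific sub-network.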

Clustering and Ranking: Two Critical Functions. Clustering alone: not distinguishing objects in each cluster? Ranking alone: comparing apples and oranges? (Yelp: department store vs. restaurant.) A better solution: integrating clustering with ranking.
[Figure: example objects A through J shown clustered, ranked globally (1 to 10), and ranked within each cluster (1 to 5)]

RankClus: Integrating Clustering with Ranking. Simple solution: project the bi-typed network into a homogeneous conference network? Such a projection loses information!
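Why is the projection lossy? One common way to project a bi-typed conference-author network onto conferences is W·Wᵀ, where W[i][j] counts the papers author j published in conference i (an assumption; other projections exist). The sketch below, with a hypothetical `project` helper and made-up counts, shows two different bi-typed networks that collapse to the same conference network: one author with two papers in each of X and Y, versus four authors with one paper in each.

```python
def project(W):
    """Project a conference-by-author matrix onto conferences: W times W transposed."""
    return [[sum(a * b for a, b in zip(row_i, row_k)) for row_k in W] for row_i in W]

# Rows: conferences X and Y; columns: authors; entries: paper counts
W_one_author = [[2],             # a single author with 2 papers in X ...
                [2]]             # ... and 2 papers in Y
W_four_authors = [[1, 1, 1, 1],  # four authors, 1 paper each in X ...
                  [1, 1, 1, 1]]  # ... and 1 paper each in Y

print(project(W_one_author))     # [[4, 4], [4, 4]]
print(project(W_four_authors))   # [[4, 4], [4, 4]]
```

Once projected, nothing distinguishes the two cases, and the author type is gone entirely, so authors can no longer be ranked.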

A New Methodology: RankClus. Ranking as the feature of the cluster: ranking is conditional on a specific cluster (e.g., VLDB's rank in Theory vs. its rank in the DB area), and the distributions of ranking scores over objects are different in each cluster. Clustering and ranking are mutually enhanced. Better clustering: the rank distributions of the clusters are more distinguishable from each other. Better ranking: a better metric for objects is learned from the ranking. Not every object should be treated equally in clustering! Y. Sun et al., “RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis”, EDBT'09.

Simple Ranking vs. Authority Ranking. Simple ranking: proportional to the number of publications of an author or a conference; it considers only the immediate neighborhood in the network. But what about an author publishing 100 papers in very weak conferences? Authority ranking: more sophisticated “rank rules” are needed, which propagate the ranking scores in the network over different types.
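Simple ranking can be written as r_X(x) = Σ_j W(x, j) / Σ_i Σ_j W(i, j): each object's share of all links. The sketch below uses a hypothetical helper and made-up counts to show the weakness raised on the slide: an author with 100 papers in a very weak venue outranks an author with 10 papers in a top venue, because venue quality never enters the score.

```python
def simple_rank(W):
    """Simple ranking: each row object's share of all links (e.g., of all papers)."""
    total = sum(sum(row) for row in W)
    return [sum(row) / total for row in W]

# Rows: authors; columns: conferences (weak venue, top venue); hypothetical counts
W_author_conf = [
    [100, 0],  # author 0: 100 papers, all in a very weak venue
    [0, 10],   # author 1: 10 papers, all in a top venue
]
ranks = simple_rank(W_author_conf)
print(ranks)  # author 0 outranks author 1 by a factor of 10
```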

Rules for Authority Ranking. Rule 1: Highly ranked authors publish many papers in highly ranked conferences. Rule 2: Highly ranked conferences attract many papers from many highly ranked authors. Rule 3: The rank of an author is enhanced if he or she co-authors with many highly ranked authors.
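The three rules can be sketched as a mutual-reinforcement iteration, in the spirit of (but not copied from) the RankClus paper: r_X(i) ∝ Σ_j W_XY(i, j) r_Y(j) for Rule 2, r_Y(j) ∝ Σ_i W_XY(i, j) r_X(i) for Rule 1, and, for Rule 3, r_Y(j) ∝ α Σ_i W_XY(i, j) r_X(i) + (1 − α) Σ_k W_YY(j, k) r_Y(k), with scores normalized to sum to 1. The helper names, the value α = 0.95, and the toy counts are assumptions for illustration.

```python
def _normalize(v):
    s = sum(v)
    return [x / s for x in v] if s > 0 else list(v)

def authority_rank(W_xy, W_yy=None, alpha=0.95, iters=100):
    """W_xy[i][j]: papers by author j in conference i; W_yy: co-authorship counts."""
    n_x, n_y = len(W_xy), len(W_xy[0])
    r_y = [1.0 / n_y] * n_y
    r_x = [1.0 / n_x] * n_x
    for _ in range(iters):
        # Rule 2: conferences attracting many papers from highly ranked authors rank high
        r_x = _normalize([sum(W_xy[i][j] * r_y[j] for j in range(n_y)) for i in range(n_x)])
        # Rule 1: authors publishing many papers in highly ranked conferences rank high
        from_conf = [sum(W_xy[i][j] * r_x[i] for i in range(n_x)) for j in range(n_y)]
        if W_yy is None:
            r_y = _normalize(from_conf)
        else:
            # Rule 3: co-authoring with highly ranked authors raises an author's rank
            from_coauth = [sum(W_yy[j][k] * r_y[k] for k in range(n_y)) for j in range(n_y)]
            r_y = _normalize([alpha * a + (1 - alpha) * b
                              for a, b in zip(from_conf, from_coauth)])
    return r_x, r_y

# Hypothetical counts: rows are conferences A and B, columns are authors 0, 1, 2
W_xy = [[0, 2, 2],   # conference A: authors 1 and 2, two papers each
        [3, 1, 0]]   # conference B: author 0 (three papers) and author 1 (one)
W_yy = [[0, 0, 1],   # authors 0 and 2 co-author
        [0, 0, 0],
        [1, 0, 0]]
print(authority_rank(W_xy))
print(authority_rank(W_xy, W_yy))
```

Without W_yy this reduces to a HITS-style alternation between the two types. Adding the co-author link to the top-ranked author 0 should raise author 2's score, as Rule 3 intends.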

RankClus: Algorithm Framework (sub-network ranking and clustering). Initialization: randomly partition the target objects into clusters. Repeat: (1) Ranking: rank the objects in each sub-network induced from each cluster; (2) Generating a new measure space: estimate the mixture-model coefficients for each target object; (3) Adjusting clusters. Until stable.
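The loop above can be sketched in pure Python for a bi-typed network whose link matrix W has target objects (e.g., conferences) as rows and attribute objects (e.g., authors) as columns. This is a simplified, hypothetical skeleton under several assumptions, not the EDBT'09 implementation: ranking inside each sub-network uses simple ranking, the conditional rank of every target object is obtained by propagating its cluster's author ranking, the mixture coefficients come from a short EM loop over cluster priors, and clusters are adjusted by cosine similarity to the cluster centers. Any refinements the paper adds are omitted.

```python
import math
import random

def _norm(v):
    s = sum(v)
    return [x / s for x in v] if s > 0 else [0.0] * len(v)

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rankclus(W, K, init=None, seed=0, max_iter=20, em_iter=30):
    """W[i][j]: links between target object i and attribute object j (hypothetical API)."""
    n, m = len(W), len(W[0])
    if init is None:
        # Initialization: random partition with no empty cluster (needs n >= K)
        order = list(range(n))
        random.Random(seed).shuffle(order)
        labels = [0] * n
        for pos, i in enumerate(order):
            labels[i] = pos % K
    else:
        labels = list(init)
    total = sum(map(sum, W))
    for _ in range(max_iter):
        # (1) Ranking: simple ranking of attribute objects in each cluster's sub-network
        r_y = [_norm([sum(W[i][j] for i in range(n) if labels[i] == k) for j in range(m)])
               for k in range(K)]
        # Conditional rank of every target object w.r.t. cluster k, propagated from r_y
        r_x = [_norm([sum(W[i][j] * r_y[k][j] for j in range(m)) for i in range(n)])
               for k in range(K)]
        # (2) New measure space: EM for the cluster priors p(k) over all links
        p = [1.0 / K] * K
        for _ in range(em_iter):
            acc = [0.0] * K
            for i in range(n):
                for j in range(m):
                    if W[i][j]:
                        post = [r_x[k][i] * r_y[k][j] * p[k] for k in range(K)]
                        z = sum(post)
                        if z > 0:
                            for k in range(K):
                                acc[k] += W[i][j] * post[k] / z
            p = [a / total for a in acc]
        # Mixture coefficients pi[i][k] = p(k | x_i), proportional to r_x[k][i] * p(k)
        pi = [_norm([r_x[k][i] * p[k] for k in range(K)]) for i in range(n)]
        # (3) Adjust clusters: move each object to the most similar cluster center
        centers = []
        for k in range(K):
            members = [pi[i] for i in range(n) if labels[i] == k]
            centers.append([sum(col) / len(members) for col in zip(*members)])
        new_labels = [max(range(K), key=lambda k: _cosine(pi[i], centers[k]))
                      for i in range(n)]
        if new_labels == labels or len(set(new_labels)) < K:
            break  # stable, or a cluster emptied (simplification: stop, do not re-seed)
        labels = new_labels
    return labels, r_x, r_y

# Toy data: conferences 0 and 1 share authors 0 to 2; conferences 2 and 3 share authors 3 to 5
W_conf_author = [[3, 2, 1, 0, 0, 0], [1, 2, 3, 0, 0, 0],
                 [0, 0, 0, 3, 2, 1], [0, 0, 0, 1, 2, 3]]
labels, _, _ = rankclus(W_conf_author, K=2, init=[0, 0, 0, 1])
print(labels)
```

As with k-means, the outcome can depend on the initial partition, so the example fixes `init` rather than relying on the random split.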

RankClus: Clustering & Ranking CS Conferences. Top-10 conferences in 5 clusters using RankClus in DBLP (when k = 15). RankClus outperforms spectral clustering algorithms [Shi and Malik, 2000] on projected homogeneous networks.