Discovery of Blog Communities based on Mutual Awareness

Presentation transcript:

Discovery of Blog Communities based on Mutual Awareness Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun Tatemura, Belle Tseng [3rd Annual Workshop on the Weblogging Ecosystems] Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/06/19

Outline Introduction Analytical framework Community formation Extracting blog communities Performance metrics Experiment

Introduction Blogs Have become a prominent social medium Enable users to publish content quickly Blog communities differ from traditional web communities: they form based on mutual awareness

Introduction Mutual awareness Individual bloggers become aware of each other’s presence through interaction (e.g., comments, trackbacks). If no blogger is aware of the others’ actions, there is no community!

Introduction New approach to community extraction Two steps: (a) Analysis of mutual awareness from bloggers’ actions. (b) Ranking-based community extraction from mutual awareness.

Analytical framework (1/3) Community emerges through individual bloggers’ behavior Individual bloggers are aware of each other’s presence through interaction (Bi-directional property) Community needs to sustain over time

Analytical framework (2/3) Properties of blogs 1. Temporal dynamics: Blogs represent easily editable content. 2. Event locality: A typical blog entry is time sensitive. 3. Link semantics: A hyperlink can have different semantics. 4. Community centric: Communities consist of people who are interested in each other’s content.

Analytical framework (3/3) Community extraction problem Distinct from traditional ranking problems on the web. Key difference is the semantics of the hyperlinked structure. Hub Authority

Community Formation Acts of bloggers in the blogosphere 1. Surf/read 2. Create entries 3. Comment 4. Change blogroll

Extracting blog communities 1. Computing Mutual Awareness 2. Ranking-Based Clustering Method

Computing Mutual Awareness (1/7) Mutual awareness is affected by Type of action Number of actions for each type When the action occurred Mutual awareness depends on sustained actions

Computing Mutual Awareness (2/7) Graph G = (V, E), nodes = bloggers, edge = connection of nodes, wij = weight on edges is a function of mutual awareness Mutual awareness matrix M A weighted linear combination of action matrices

Computing Mutual Awareness (3/7) For each action type k at time t, compute Temporal action matrix Xk,t Each entry xij,k,t of matrix represents the number of times the kth action ak was performed by blogger i on blogger j. e.g. Blogger i leaves a comment on blogger j’s entry.
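
The counting step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the log format of (i, j, k, t) tuples, the function name build_action_matrices, and the action labels are all assumptions made for the example.

```python
def build_action_matrices(actions, n_bloggers):
    """Count actions per (action type k, time step t).

    `actions` is a hypothetical log of (i, j, k, t) tuples meaning
    "blogger i performed action type k on blogger j during time step t".
    Returns a dict mapping (k, t) to an n x n count matrix X[k, t],
    where entry [i][j] is the number of such actions.
    """
    matrices = {}
    for i, j, k, t in actions:
        if (k, t) not in matrices:
            matrices[(k, t)] = [[0] * n_bloggers for _ in range(n_bloggers)]
        matrices[(k, t)][i][j] += 1
    return matrices


# Illustrative log: blogger 0 comments twice on blogger 1 in week 1, etc.
log = [
    (0, 1, "comment", 1),
    (0, 1, "comment", 1),
    (1, 0, "trackback", 1),
    (2, 0, "comment", 2),
]
X = build_action_matrices(log, n_bloggers=3)
```

Keeping one matrix per (k, t) pair preserves both the action type and the time of the action, which the later decay and fusion steps rely on.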

Computing Mutual Awareness (4/7) [Figure: example interaction graph with edges colored by action type (type 1: yellow, type 2: blue, type 3: red) and the corresponding temporal action matrices X1,t and X2,t]

Computing Mutual Awareness (5/7) The effect of actions on mutual awareness diminishes gradually (λk is the decay factor for action type k).
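
The decay formula itself is not preserved in this transcript. The sketch below assumes one common form, an exponential weight exp(-λk (now - t)) applied to each time step's action matrix; the exact expression used by the authors may differ. The names decayed_sum, decay, and now are hypothetical.

```python
import math


def decayed_sum(matrices_by_time, decay, now):
    """Sum per-time-step count matrices, down-weighting older steps.

    Assumption (not taken from the slides): a time step t contributes
    with weight exp(-decay * (now - t)), so recent actions count more.
    `matrices_by_time` maps a time step t to an n x n count matrix.
    """
    n = len(next(iter(matrices_by_time.values())))
    total = [[0.0] * n for _ in range(n)]
    for t, X in matrices_by_time.items():
        weight = math.exp(-decay * (now - t))
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * X[i][j]
    return total


# Two bloggers, one action type, observed at time steps 1 and 3.
X_by_t = {
    1: [[0, 2], [0, 0]],
    3: [[0, 1], [1, 0]],
}
M_k = decayed_sum(X_by_t, decay=0.5, now=3)
```

With these inputs, the two older actions from step 1 are weighted by exp(-1), while the actions from step 3 count in full.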

Computing Mutual Awareness (6/7) Two specific types of actions: (a) creating an entry-to-entry link (mij = 0 if mij < λm); (b) sending a trackback (r denotes how likely the trackback receiver is to become aware of the trackback sender as a result of the trackback).

Computing Mutual Awareness (7/7) Fusion of Actions Assume actions ak in the action set are independent of each other.
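
Since the mutual awareness matrix M is described earlier as a weighted linear combination of action matrices, the fusion step can be sketched as below. The weights and the action names are illustrative assumptions; the slides do not state which values were used.

```python
def fuse_actions(per_action, weights):
    """Combine per-action awareness matrices into one matrix M.

    `per_action` maps an action name to an n x n matrix and `weights`
    maps the same names to scalar weights (hypothetical values here).
    M[i][j] is the weighted sum over all action types.
    """
    names = list(per_action)
    n = len(per_action[names[0]])
    M = [[0.0] * n for _ in range(n)]
    for name in names:
        w = weights[name]
        for i in range(n):
            for j in range(n):
                M[i][j] += w * per_action[name][i][j]
    return M


per_action = {
    "link": [[0, 1], [0, 0]],
    "trackback": [[0, 0], [2, 0]],
}
M = fuse_actions(per_action, {"link": 0.5, "trackback": 0.25})
```

A plain weighted sum matches the independence assumption on the slide: each action type contributes on its own, with no interaction terms.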

Ranking-Based Clustering Method Start from highly ranked blogs to extract dense subgraphs that include popular bloggers 1. Choose seeds based on PageRank (rt is the vector of ranking scores, α is the damping factor) 2. Find community members associated with the seeds 3. Iterate steps 1 and 2 to discover N communities, excluding blogs already assigned to existing communities
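
The three steps above can be sketched as follows. The PageRank part is a standard power iteration on the weighted graph. The membership rule in step 2 is not preserved in the transcript, so the sketch substitutes a simple stand-in: a blogger joins the seed's community when the weight in both directions reaches a hypothetical threshold min_weight. All function and parameter names are assumptions.

```python
def pagerank(W, alpha=0.85, iters=100):
    """Power-iteration PageRank on a weighted adjacency matrix W."""
    n = len(W)
    out = [sum(row) for row in W]
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - alpha) / n] * n
        for i in range(n):
            if out[i] == 0:
                # Dangling node: spread its score uniformly.
                for j in range(n):
                    nxt[j] += alpha * r[i] / n
            else:
                for j in range(n):
                    if W[i][j]:
                        nxt[j] += alpha * r[i] * W[i][j] / out[i]
        r = nxt
    return r


def extract_communities(W, n_communities, min_weight):
    """Seed-based extraction; the membership rule is an assumption."""
    n = len(W)
    rank = pagerank(W)
    assigned = set()
    communities = []
    for _ in range(n_communities):
        candidates = [v for v in range(n) if v not in assigned]
        if not candidates:
            break
        # Step 1: the highest-ranked unassigned blogger becomes the seed.
        seed = max(candidates, key=lambda v: rank[v])
        # Step 2 (stand-in rule): mutual weight with the seed >= min_weight.
        members = {seed}
        for v in candidates:
            if v != seed and W[seed][v] >= min_weight and W[v][seed] >= min_weight:
                members.add(v)
        communities.append(sorted(members))
        # Step 3: exclude these bloggers from later rounds.
        assigned |= members
    return communities


# Toy mutual awareness matrix: {0, 1, 2} and {3, 4} are tightly linked.
W = [
    [0, 3, 3, 0, 0],
    [3, 0, 3, 0, 0],
    [3, 3, 0, 1, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 0, 3, 0],
]
communities = extract_communities(W, n_communities=2, min_weight=2)
```

On this toy matrix the weak 2-3 link falls below the threshold, so the two dense groups come out as separate communities.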

Performance Metrics Coverage: measures the fraction of edges that are intra-community; higher coverage indicates higher quality. Conductance: smaller conductance indicates higher quality.
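
The slide's formulas are not preserved in this transcript, so the sketch below uses the standard graph-clustering definitions as an assumption: coverage is the share of total edge weight that falls inside communities, and the conductance of a community is the weight of edges leaving it divided by the smaller of the two side volumes. The function names are hypothetical.

```python
def coverage(W, communities):
    """Fraction of total edge weight that lies inside communities."""
    label = {}
    for c, members in enumerate(communities):
        for v in members:
            label[v] = c
    n = len(W)
    total = intra = 0.0
    for i in range(n):
        for j in range(n):
            total += W[i][j]
            # Count the edge only if both ends share an assigned community.
            if i in label and label.get(i) == label.get(j):
                intra += W[i][j]
    return intra / total if total else 0.0


def conductance(W, members):
    """Weight leaving `members` over the smaller side's total out-weight."""
    inside = set(members)
    n = len(W)
    cut = vol_in = vol_out = 0.0
    for i in range(n):
        for j in range(n):
            w = W[i][j]
            if i in inside:
                vol_in += w
                if j not in inside:
                    cut += w
            else:
                vol_out += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


W = [
    [0, 3, 3, 0, 0],
    [3, 0, 3, 0, 0],
    [3, 3, 0, 1, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 0, 3, 0],
]
cov = coverage(W, [[0, 1, 2], [3, 4]])
cond = conductance(W, [0, 1, 2])
```

Here 24 of the 26 units of edge weight are intra-community, and the community {0, 1, 2} has one unit of outgoing weight against a smaller-side volume of 7.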

Performance Metrics Interest Coefficient: measures how much a community member is interested in his or her assigned community. Two quantities are defined: the interest coefficient of an individual blogger m toward an assigned cluster Ck, and the interest coefficient of a cluster Ck.

Performance Metrics Sustainability Average sustainability for a set C of k extracted communities

Experimental Results NEC Blog Dataset: 127,467 entries over a period of 25 consecutive weeks; 584 seed blogs; 40,877 links; 2,898 trackback links

Experimental Results Compare the communities extracted by using two different features Baseline adjacency matrix (wij = # entry to entry links) Mutual awareness matrix (wij = mutual awareness score)

Experimental Results Experimental design

Experimental Results Sustainability of communities

Experimental Results Use the content of blog entries to validate communities (red: trackback links; blue: entry-to-entry links)

Experimental Results WWW 2006 Blog Workshop Dataset: 8.37 million entries; 1.43 million blog sites (narrowed down to 122K)
