Published by Dewi Cahyadi. Modified over 5 years ago.
1
Discovery of Blog Communities based on Mutual Awareness
Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun Tatemura, Belle Tseng
[3rd Annual Workshop on the Weblogging Ecosystems]
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/06/19
2
Outline
Introduction
Analytical framework
Community formation
Extracting blog communities
Performance metrics
Experiment
3
Introduction
Blogs
  Have become a prominent social medium
  Enable users to publish content quickly
Blog communities are different from traditional web communities
  They form based on mutual awareness
4
Introduction
Mutual awareness
Individual bloggers become aware of each other’s presence through interaction (e.g., comments, trackbacks).
If no blogger is aware of the others’ actions, the bloggers do not form a community.
5
Introduction
New approach to community extraction, in two steps:
(a) Analysis of mutual awareness from bloggers’ actions.
(b) Ranking-based community extraction from mutual awareness.
6
Analytical framework (1/3)
A community emerges through individual bloggers’ behavior.
Individual bloggers become aware of each other’s presence through interaction (bi-directional property).
A community needs to be sustained over time.
7
Analytical framework (2/3)
Properties of blogs
1. Temporal dynamics: blogs represent easily editable content.
2. Event locality: a typical blog entry is time sensitive.
3. Link semantics: a hyperlink can have different semantics.
4. Community centric: people are interested in each other’s content.
8
Analytical framework (3/3)
Community extraction problem
Distinct from traditional ranking problems on the web.
The key difference is the semantics of the hyperlink structure.
(Figure labels: Hub, Authority)
9
Community Formation
Acts of bloggers in the blogosphere
1. Surf/read
2. Create entries
3. Comment
4. Change blogroll
10
Extracting blog communities
1. Computing Mutual Awareness
2. Ranking-Based Clustering Method
11
Computing Mutual Awareness (1/7)
Mutual awareness is affected by
  Type of action
  Number of actions of each type
  When the action occurred
Mutual awareness depends on sustained actions.
12
Computing Mutual Awareness (2/7)
Graph G = (V, E)
  Nodes = bloggers
  Edges = connections between nodes
  wij = weight on edge (i, j), a function of mutual awareness
Mutual awareness matrix M
  A weighted linear combination of action matrices
13
Computing Mutual Awareness (3/7)
For each action type k at time t, compute the temporal action matrix Xk,t.
Each entry xij,k,t of the matrix is the number of times the kth action ak was performed by blogger i on blogger j at time t.
E.g., blogger i leaves a comment on blogger j’s entry.
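As a rough illustration of the counting step above, the sketch below builds the temporal action matrices from a log of actions. The event tuple layout, the blogger names, and the function name `build_action_matrices` are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def build_action_matrices(events):
    """Count actions into sparse matrices X[(k, t)][(i, j)].

    events: iterable of (actor, target, action_type, time_step) tuples
    (a hypothetical log format chosen for this sketch).
    """
    X = defaultdict(lambda: defaultdict(int))
    for i, j, k, t in events:
        # x_{ij,k,t}: how often blogger i performed action k on blogger j at time t
        X[(k, t)][(i, j)] += 1
    return X


# Hypothetical example log
events = [
    ("alice", "bob", "comment", 0),
    ("alice", "bob", "comment", 0),
    ("bob", "alice", "trackback", 0),
    ("alice", "carol", "comment", 1),
]
X = build_action_matrices(events)
```

A sparse dictionary is used here because most blogger pairs never interact; a dense array would work equally well for small examples.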
14
Computing Mutual Awareness (4/7)
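(Figure: example interaction graph with color-coded actions. Action type 1: yellow; action type 2: blue; action type 3: red. The corresponding matrices X1,t and X2,t were shown on the slide.)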
15
Computing Mutual Awareness (5/7)
The effect of actions on mutual awareness diminishes gradually (λk is the decay factor for action type k).
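The slide's decay formula is not reproduced in this transcript, so the following is only a sketch of one common way to model gradually diminishing influence: an exponential decay, where the accumulated score from earlier time steps is multiplied by exp(-λk) before the new counts are added. The exponential form and the function name `decayed_awareness` are assumptions, not the paper's stated definition.

```python
import math


def decayed_awareness(counts_by_time, lam):
    """Accumulate action counts for one blogger pair and one action type.

    counts_by_time: counts x_{ij,k,t} for t = 0..T
    lam: decay factor lambda_k (assumed to act as an exponential decay rate)
    Returns the accumulated score after each time step.
    """
    m = 0.0
    history = []
    for x in counts_by_time:
        # Older contributions shrink by exp(-lam) per time step (assumption)
        m = x + math.exp(-lam) * m
        history.append(m)
    return history


# With lam = ln 2, older contributions halve at every step
history = decayed_awareness([2, 0, 0, 1], lam=math.log(2))
```

Under this assumption a pair that stops interacting sees its score fade toward zero, which matches the idea that mutual awareness depends on sustained actions.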
16
Computing Mutual Awareness (6/7)
Two specific types of actions
(a) Creating an entry-to-entry link (mij = 0 if mij < λm)
(b) Sending a trackback (r denotes how likely the trackback receiver is to become aware of the trackback sender through the act of sending a trackback)
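The thresholding rule in (a) can be sketched as follows. The dictionary representation and the function name `apply_threshold` are assumptions for illustration; how r enters the trackback score is not recoverable from this transcript, so it is left out.

```python
def apply_threshold(M, lam_m):
    """Zero out weak entries: m_ij = 0 if m_ij < lam_m.

    M: dict mapping (i, j) blogger pairs to awareness scores
    lam_m: threshold value (named after the slide's lambda_m)
    """
    return {pair: (score if score >= lam_m else 0.0) for pair, score in M.items()}


# Hypothetical scores derived from entry-to-entry links
M_link = {("a", "b"): 0.8, ("b", "a"): 0.1, ("a", "c"): 0.3}
M_thresholded = apply_threshold(M_link, lam_m=0.3)
```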
17
Computing Mutual Awareness (7/7)
Fusion of Actions
Assume the actions ak in the action set are independent of each other.
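Since the mutual awareness matrix M is described earlier as a weighted linear combination of action matrices, the fusion step might look like the sketch below. The weight values, the action names, and the function name `fuse_actions` are hypothetical.

```python
def fuse_actions(action_matrices, weights):
    """Combine per-action awareness matrices into one matrix M.

    action_matrices: dict mapping action type k to a sparse matrix
                     {(i, j): score}
    weights: dict mapping action type k to its weight (hypothetical values)
    """
    M = {}
    for k, Mk in action_matrices.items():
        for pair, value in Mk.items():
            # Independence assumption: each action type contributes additively
            M[pair] = M.get(pair, 0.0) + weights[k] * value
    return M


action_matrices = {
    "link": {("a", "b"): 2.0, ("b", "a"): 1.0},
    "trackback": {("a", "b"): 1.0},
}
weights = {"link": 0.5, "trackback": 0.5}
M = fuse_actions(action_matrices, weights)
```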
18
Ranking-Based Clustering Method
Start from highly ranked blogs to extract dense subgraphs that include popular bloggers.
1. Choose seeds based on PageRank (rt is the vector of ranking scores r, α is the damping factor).
2. Find community members associated with the seeds.
3. Iterate steps 1 and 2 to discover N communities, excluding blogs already in existing communities.
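The three steps above can be sketched as follows. The PageRank part is a standard power iteration on the weighted graph; the membership rule in step 2 (a blogger joins if its weight with the seed is at least `min_weight` in both directions) is a stand-in chosen for this example, since the transcript does not say how members are associated with a seed. All function and parameter names are hypothetical.

```python
def pagerank(nodes, W, alpha=0.85, iters=100):
    """Power-iteration PageRank on directed weighted edges W = {(i, j): w}."""
    n = len(nodes)
    out = {u: 0.0 for u in nodes}
    for (i, _j), w in W.items():
        out[i] += w
    r = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly
        dangling = sum(r[u] for u in nodes if out[u] == 0.0)
        nxt = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for (i, j), w in W.items():
            if w > 0.0:
                nxt[j] += alpha * r[i] * w / out[i]
        r = nxt
    return r


def extract_communities(nodes, W, n_communities, min_weight=1.0):
    """Repeatedly pick the top-ranked remaining blog as seed and grow a community."""
    remaining = set(nodes)
    communities = []
    for _ in range(n_communities):
        if not remaining:
            break
        # Step 3: blogs in existing communities are excluded from the graph
        sub_W = {(i, j): w for (i, j), w in W.items()
                 if i in remaining and j in remaining}
        candidates = sorted(remaining)
        ranks = pagerank(candidates, sub_W)
        seed = max(candidates, key=lambda u: ranks[u])  # step 1
        # Step 2 (assumed rule): require awareness in both directions
        members = {seed} | {
            u for u in remaining
            if min(W.get((seed, u), 0.0), W.get((u, seed), 0.0)) >= min_weight
        }
        communities.append(members)
        remaining -= members
    return communities


# Hypothetical symmetric mutual-awareness weights
pairs = {("a", "b"): 2.0, ("a", "c"): 2.0, ("b", "c"): 1.0, ("d", "e"): 3.0}
W = {}
for (i, j), w in pairs.items():
    W[(i, j)] = w
    W[(j, i)] = w
communities = extract_communities(["a", "b", "c", "d", "e"], W, n_communities=2)
```

Requiring weight in both directions reflects the bi-directional property of mutual awareness; a different membership rule could be substituted without changing the seed selection.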
19
Performance Metrics
Coverage
  Measures the fraction of edges that are intra-community.
  Higher coverage indicates higher quality.
Conductance
  Smaller conductance indicates higher quality.
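The slide's formulas are not preserved in this transcript, so the sketch below uses the usual graph-clustering definitions as an assumption: coverage is the share of total edge weight that lies inside communities, and the conductance of a community S is the weight of edges leaving S divided by the smaller of the two volumes (sums of weighted degrees) of S and its complement. The function names and the example graph are hypothetical.

```python
def coverage(edges, communities):
    """Fraction of total edge weight that falls inside some community.

    edges: dict {(i, j): weight}, each undirected edge listed once
    communities: list of disjoint sets of nodes
    """
    label = {u: idx for idx, c in enumerate(communities) for u in c}
    total = sum(edges.values())
    intra = sum(w for (i, j), w in edges.items()
                if i in label and j in label and label[i] == label[j])
    return intra / total if total else 0.0


def conductance(edges, S):
    """Cut weight of S divided by min(vol(S), vol(complement of S))."""
    cut = vol_in = vol_out = 0.0
    for (i, j), w in edges.items():
        for u in (i, j):
            # Each endpoint contributes the edge weight to its side's volume
            if u in S:
                vol_in += w
            else:
                vol_out += w
        if (i in S) != (j in S):
            cut += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


edges = {("a", "b"): 2.0, ("a", "c"): 2.0, ("b", "c"): 1.0,
         ("c", "d"): 1.0, ("d", "e"): 3.0}
communities = [{"a", "b", "c"}, {"d", "e"}]
cov = coverage(edges, communities)             # 8/9 of the edge weight is internal
cond = conductance(edges, communities[0])      # one unit of weight leaves {a, b, c}
```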
20
Performance Metrics Interest Coefficient
Measures how much a community member is interested in his or her assigned community.
  Interest coefficient of an individual blogger m toward an assigned cluster Ck
  Interest coefficient of a cluster Ck
21
Performance Metrics Sustainability
Average sustainability for a set C of k extracted communities
22
Experimental Results NEC Blog Dataset
127,467 entries over a period of 25 consecutive weeks
584 seed blogs
40,877 links
2,898 trackback links
23
Experimental Results
Compare the communities extracted using two different features:
  Baseline adjacency matrix (wij = number of entry-to-entry links)
  Mutual awareness matrix (wij = mutual awareness score)
24
Experimental Results Experimental design
25
Experimental Results
26
Experimental Results Sustainability of communities
27
Experimental Results
Use the content of blog entries to validate communities.
(Figure legend: red = trackback links; blue = entry-to-entry links)
28
Experimental Results
29
Experimental Results WWW 2006 Blog Workshop Dataset
8.37 million entries
1.43 million blog sites (narrowed down to 122K)
30
Experimental Results
31
Experimental Results
32
Experimental Results