1 The EigenRumor Algorithm for Ranking Blogs
Advisor: Hsin-Hsi Chen
Speaker: Sheng-Chung Yen (嚴聖筌)
2 Outline
Motivation
Assumed Blog Structure
Classification of Blog Ranking
The EigenRumor Algorithm
  Community model
  Scores
  Algorithm
Mapping to Blog community
Experiments
Related Works
Conclusion
Future Work
References
3 Motivation
Approaches to page ranking:
  PageRank [2]
  HITS (Hypertext Induced Topic Selection) [3]
Issues:
  The number of links to a blog entry is generally very small.
  A new entry needs time to accumulate in-links before it can earn a high PageRank score.
4 Assumed Blog Structure
A blog consists of a top page and a set of blog entries.
A blog is generally updated and maintained by a single blogger.
The top page links to each blog entry, and each blog entry has a permanent URI.
Blog entries are added frequently, and update notifications are optionally sent to a ping server.
A mechanism for constructing trackbacks [3] is provided.
5 Classification of Blog Ranking
Subject of ranking
Space of ranking
Temporal space of ranking
Semantics of ranking
Source of evaluations collected
6 The EigenRumor Algorithm – Community model (1/2)
7 The EigenRumor Algorithm – Community model (2/2)
When agent i provides (posts) object j, a provisioning link is established from i to j.
When agent i evaluates the usefulness of an existing object j with the scoring value e_ij, an evaluation link is established from i to j.
The provisioning matrix P = [p_ij] represents all provisioning links in the universe.
The evaluation matrix E = [e_ij] represents all evaluation links in the universe.
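As a minimal sketch of the community model, the two matrices can be built from lists of links. The agent and object names, the helper `build_matrix`, and the scoring values are hypothetical illustrations and do not come from the slides; a provisioning link is assumed to be stored as the value 1.0.

```python
# Hypothetical toy community: 3 agents (bloggers) and 4 objects (blog entries).
agents = ["alice", "bob", "carol"]
objects = ["e0", "e1", "e2", "e3"]

# Provisioning links: agent i posted object j.
provisioning_links = [("alice", "e0"), ("alice", "e1"), ("bob", "e2"), ("carol", "e3")]

# Evaluation links: agent i scored object j with value e_ij (values are assumed).
evaluation_links = [("bob", "e0", 1.0), ("carol", "e0", 1.0), ("alice", "e2", 0.5)]


def build_matrix(n_agents, n_objects, entries):
    """Return an n_agents x n_objects matrix (list of rows) from (i, j, value) triples."""
    matrix = [[0.0] * n_objects for _ in range(n_agents)]
    for i, j, value in entries:
        matrix[i][j] = value
    return matrix


a_idx = {name: i for i, name in enumerate(agents)}
o_idx = {name: j for j, name in enumerate(objects)}

# P[i][j] = 1.0 when agent i provided object j (assumed encoding).
P = build_matrix(len(agents), len(objects),
                 [(a_idx[a], o_idx[o], 1.0) for a, o in provisioning_links])

# E[i][j] = e_ij when agent i evaluated object j, else 0.0.
E = build_matrix(len(agents), len(objects),
                 [(a_idx[a], o_idx[o], e) for a, o, e in evaluation_links])
```

Rows index agents and columns index objects, so column j of E collects every evaluation that object j received.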
8 The EigenRumor Algorithm – Scores
Authority score (agent property): the degree to which the objects agent i provided in the past followed the community direction.
Hub score (agent property): the degree to which the comments (evaluations) agent i submitted on other past objects followed the community direction.
Reputation score (object property): the level of support object j received from the agents.
9 The EigenRumor Algorithm – Algorithm (1/4)
Assumptions:
  Objects provided by a “good” authority will follow the direction of the community.
  Objects supported by a “good” hub will follow the direction of the community.
  Agents that provide objects that follow the community direction are “good” authorities of the community.
  Agents that evaluate objects that follow the community direction are “good” hubs of the community.
10 The EigenRumor Algorithm – Algorithm (2/4) Notations
11 The EigenRumor Algorithm – Algorithm (3/4)
12 The EigenRumor Algorithm – Algorithm (4/4)
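The formulas on the algorithm slides did not survive in this copy. The following is a sketch only, assuming the mutual-reinforcement form that the four assumptions suggest and that [1] is understood to use: a = P r, h = E r, and r = alpha * P^T a + (1 - alpha) * E^T h, with each vector rescaled to unit L2 norm on every round. The weight `alpha`, the iteration count, the helper names, and the toy matrices are all assumptions made for illustration.

```python
import math


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, v):
    """Multiply the transpose of M by vector v."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def normalize(v):
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def eigenrumor(P, E, alpha=0.5, iterations=50):
    """Iterate authority (a), hub (h) and reputation (r) scores for a fixed number of rounds."""
    r = normalize([1.0] * len(P[0]))
    for _ in range(iterations):
        a = normalize(mat_vec(P, r))   # authorities provide well-reputed objects
        h = normalize(mat_vec(E, r))   # hubs evaluate well-reputed objects
        from_authority = mat_t_vec(P, a)
        from_hub = mat_t_vec(E, h)
        r = normalize([alpha * x + (1 - alpha) * y
                       for x, y in zip(from_authority, from_hub)])
    return a, h, r


# Toy data (hypothetical): 3 agents x 4 objects.
P = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
E = [[0.0, 0.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]

a, h, r = eigenrumor(P, E)
print("authority:", a)
print("hub:", h)
print("reputation:", r)
```

In this toy case object 1 has no evaluation links at all, yet it still receives a nonzero reputation score because its provider (agent 0) earned authority from object 0.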
13 Mapping to Blog community (1/3)
Links from the top page of a blog site to its blog entries => information provisioning links.
Links to blog entries in other blogs => information evaluation links.
(Forward) trackbacks => treated as expressing the interest of the blogger.
(Backward) trackbacks => ignored, because they are often generated by spamming.
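The mapping rules can be sketched as a small link classifier. The URLs, the `kind` labels, the lookup tables, and the function `classify` are hypothetical; treating a forward trackback as an evaluation link is an assumption drawn from the "interest of the blogger" wording, not something the slide states outright.

```python
# Hypothetical crawl result: which blog owns each entry URL and each top page.
entry_owner = {
    "http://a.example/1": "blogA",
    "http://a.example/2": "blogA",
    "http://b.example/1": "blogB",
}
top_pages = {"http://a.example/": "blogA", "http://b.example/": "blogB"}


def classify(source, target, kind):
    """Map one link to ("provisioning" | "evaluation", blog, entry), or None if ignored.

    kind is "link", "trackback_forward" or "trackback_backward" (assumed labels).
    """
    if kind == "trackback_backward":
        return None  # ignored: often generated by spamming
    target_blog = entry_owner.get(target)
    source_blog = top_pages.get(source) or entry_owner.get(source)
    if target_blog is None or source_blog is None:
        return None  # not a link between known blog pages
    if source in top_pages and source_blog == target_blog:
        return ("provisioning", source_blog, target)
    if source_blog != target_blog:
        return ("evaluation", source_blog, target)
    return None  # links inside one blog carry no evaluation


links = [
    ("http://a.example/", "http://a.example/1", "link"),
    ("http://b.example/1", "http://a.example/1", "link"),
    ("http://a.example/2", "http://b.example/1", "trackback_forward"),
    ("http://b.example/1", "http://a.example/2", "trackback_backward"),
    ("http://a.example/1", "http://a.example/2", "link"),
]
classified = [classify(*link) for link in links]
```

Each non-None result names the agent (the blog) and the object (the entry), which is the information needed to fill one cell of P or E.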
14 Mapping to Blog community (2/3)
The basic algorithm does not normalize the information provisioning matrix P or the information evaluation matrix E.
Problem: a user who creates many blog accounts and interlinks them can inflate the scores.
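A toy illustration of the inflation problem, assuming every account starts with the same hub weight so that an object's raw support is simply the sum of its column in E. The functions `community` and `raw_support` are hypothetical and show only the unnormalized case.

```python
def community(n_sockpuppets):
    """Build E for one honest agent backing object 0 and n sockpuppets backing object 1."""
    honest = [[1.0, 0.0]]
    puppets = [[0.0, 1.0] for _ in range(n_sockpuppets)]
    return honest + puppets


def raw_support(E, j):
    """Unnormalized support for object j: the sum of column j of E."""
    return sum(row[j] for row in E)


for k in (1, 5, 20):
    E = community(k)
    print(k, raw_support(E, 0), raw_support(E, 1))
```

The support for object 1 grows linearly with the number of accounts the user creates, while the honestly evaluated object stays at 1.0.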
15 Mapping to Blog community (3/3) Solutions: Normalization function 1: Normalization function 2 (longevity factor):
16 Experiments (1/3)
In the database of this system, entries from blog sites (04/10/16 ~ 05/02/03).
Original:
  (16.3%) entries have one or more hyperlinks
  (1.25%) entries are linked to other blogs
  (1.15%) entries are referred to by other blogs
17 Experiments (2/3) Applying EigenRumor algorithm: bloggers have at least one blog entry linked from other blogs (9.28%) bloggers have nonzero authority scores => (9.28%) entries have nonzero reputation scores.
18 Experiments (3/3)
Face-to-face user survey (40 guests, Feb. 2005)
Best result   EigenRumor   In-link   TFIDF      Not determined
Queries       18 (45%)     2 (5%)    1 (2.5%)   19 (48%)
19 Related Works
iRank
Technorati provided a commercial blog search.
EigenRumor algorithm:
  Agent-to-object, instead of page-to-page or agent-to-agent.
  The normalization of links.
  Dynamic structure of links.
20 Conclusion
The important feature of the algorithm is that it widens the coverage of blog entries that are assigned a score, compared with static link analysis alone.
21 Future Work
The problem of spamming.
How to choose a better ranking algorithm for a specific keyword?
22 References
[1] K. Fujimura, T. Inoue, and M. Sugisaki, “The EigenRumor Algorithm for Ranking Blogs,” Nippon Telegraph and Telephone, 10 May
[2] S. Brin and L. Page, “The Anatomy of a Large-scale Hypertextual Web Search Engine,” in Proceedings of the 7th International World Wide Web Conference,
[3] Wikipedia,