1 The EigenRumor Algorithm for Ranking Blogs Advisor: Hsin-Hsi Chen Speaker: Sheng-Chung Yen ( 嚴聖筌 )

2 Outline
• Motivation
• Assumed Blog Structure
• Classification of Blog Ranking
• The EigenRumor Algorithm
  - Community model
  - Scores
  - Algorithm
• Mapping to Blog community
• Experiments
• Related Works
• Conclusion
• Future Work
• References

3 Motivation
• Approaches to page ranking
  - PageRank [2]
  - HITS (Hypertext Induced Topic Selection) [3]
• Issues
  - The number of links to a blog entry is generally very small.
  - A new entry needs time to accumulate in-links before it can earn a high PageRank score.

4 Assumed Blog Structure
• A blog consists of a top page and a set of blog entries. A blog is generally updated and maintained by a single blogger.
• The top page of the blog links to each blog entry, and each blog entry has a permanent URI.
• Blog entries are added frequently, and update notifications are optionally sent to a ping server.
• A mechanism for constructing trackbacks [3] is provided.

5 Classification of Blog Ranking
• Subject of ranking
• Space of ranking
• Temporal space of ranking
• Semantics of ranking
• Source of evaluations collected

6 The EigenRumor Algorithm – Community model (1/2)

7 The EigenRumor Algorithm – Community model (2/2)
• When agent i provides (posts) object j, a provisioning link is established from i to j.
• When agent i evaluates the usefulness of an existing object j with the scoring value e_ij, an evaluation link is established from i to j.
• The provisioning matrix P = [p_ij] represents all provisioning links in the universe.
• The evaluation matrix E = [e_ij] represents all evaluation links in the universe.
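The two matrices can be built directly from lists of links. The following is a minimal sketch, not the authors' code: the function name `build_matrices`, the integer agent and object indices, and the choice of p_ij = 1 for every provisioning link are assumptions made for illustration.

```python
def build_matrices(num_agents, num_objects, provisions, evaluations):
    """Return (P, E) as nested lists: one row per agent, one column per object.

    provisions  -- iterable of (agent, obj) pairs, one per provisioning link
    evaluations -- iterable of (agent, obj, score) triples, one per evaluation link
    """
    P = [[0.0] * num_objects for _ in range(num_agents)]
    E = [[0.0] * num_objects for _ in range(num_agents)]
    for agent, obj in provisions:
        P[agent][obj] = 1.0      # assumption: an unweighted provisioning link
    for agent, obj, score in evaluations:
        E[agent][obj] = score    # e_ij, the scoring value given by agent i to object j
    return P, E


# Hypothetical universe: 3 agents, 3 objects.
provisions = [(0, 0), (0, 1), (1, 2)]
evaluations = [(1, 0, 1.0), (2, 0, 1.0), (2, 2, 0.5)]
P, E = build_matrices(3, 3, provisions, evaluations)
```

Row i of each matrix collects everything agent i has posted or evaluated, and column j collects everything that points at object j.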

8 The EigenRumor Algorithm – Scores
• Authority score (agent property)
  - Indicates to what extent the objects agent i provided in the past followed the community direction.
• Hub score (agent property)
  - Indicates to what extent the comments (evaluations) agent i submitted on other past objects followed the community direction.
• Reputation score (object property)
  - Indicates the level of support object j received from the agents.

9 The EigenRumor Algorithm – Algorithm (1/4)
• Assumptions
  - Objects that are provided by a “good” authority will follow the direction of the community.
  - Objects that are supported by a “good” hub will follow the direction of the community.
  - Agents that provide objects that follow the community direction are “good” authorities of the community.
  - Agents that evaluate objects that follow the community direction are “good” hubs of the community.
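One way to read these four assumptions is as mutually reinforcing update rules: reputation flows to objects from the authorities that provide them and the hubs that evaluate them, and agents earn authority and hub scores from the reputation of the objects they touch. The sketch below iterates those rules until the scores settle. It is an illustration under stated assumptions, not the formulas from the slides: the mixing weight `alpha`, the L2 normalization, the fixed iteration count, and all function names are choices made here.

```python
import math


def mat_vec(M, v):
    """Multiply matrix M (agents x objects) by a vector over objects."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, v):
    """Multiply the transpose of M by a vector over agents."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def l2_normalize(v):
    """Scale v to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def eigenrumor_sketch(P, E, alpha=0.5, iterations=50):
    """Return (reputation, authority, hub) score lists.

    alpha weights provisioning against evaluation (an assumption of this sketch).
    """
    r = l2_normalize([1.0] * len(P[0]))
    a = h = [0.0] * len(P)
    for _ in range(iterations):
        a = l2_normalize(mat_vec(P, r))    # good authorities provide reputable objects
        h = l2_normalize(mat_vec(E, r))    # good hubs evaluate reputable objects
        from_a = mat_t_vec(P, a)           # reputation inherited from the provider
        from_h = mat_t_vec(E, h)           # reputation earned from evaluators
        r = l2_normalize([alpha * x + (1 - alpha) * y
                          for x, y in zip(from_a, from_h)])
    return r, a, h


# Hypothetical universe: agent 0 posts objects 0 and 1, agent 1 posts object 2;
# agents 1 and 2 evaluate object 0, and agent 2 also evaluates object 2.
P = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
E = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.5]]
r, a, h = eigenrumor_sketch(P, E)
```

In this toy universe object 1 has no evaluation links at all, yet it still receives a nonzero reputation score through its provider's authority, while object 0, which is also evaluated by two agents, scores higher.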

10 The EigenRumor Algorithm – Algorithm (2/4)  Notations

11 The EigenRumor Algorithm – Algorithm (3/4)

12 The EigenRumor Algorithm – Algorithm (4/4)

13 Mapping to Blog community (1/3)
• Links from the top page of a blog site to its blog entries => information provisioning links.
• Links to blog entries in other blogs => information evaluation links.
• (Forward) trackbacks => treated as expressing the interest of the blogger.
• (Backward) trackbacks => ignored, because they are often generated by spamming.
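The mapping above can be sketched as a small function that turns crawled blog data into provisioning and evaluation links. Everything concrete here is hypothetical: the name `map_blog_links`, the `entry_owner` dictionary standing in for the top-page links, and the example URIs. Skipping links between entries of the same blog follows from the slide's wording ("in other blogs"). Forward trackbacks could be fed in through the same `entry_links` list, and backward trackbacks are simply never added.

```python
def map_blog_links(entry_owner, entry_links):
    """Map blog data to (provisions, evaluations).

    entry_owner -- dict: entry URI -> blogger, i.e. whose top page links to the entry
    entry_links -- list of (source entry URI, target entry URI) hyperlinks
    """
    # Each top-page link to an entry is a provisioning link from blogger to entry.
    provisions = sorted((blogger, entry) for entry, blogger in entry_owner.items())

    evaluations = set()
    for source, target in entry_links:
        if source not in entry_owner or target not in entry_owner:
            continue  # one end of the link is outside the known blog space
        evaluator = entry_owner[source]
        if evaluator != entry_owner[target]:
            # Only links into other blogs count as evaluation links.
            evaluations.add((evaluator, target))
    return provisions, sorted(evaluations)


# Hypothetical data: two bloggers, three entries, three hyperlinks.
entry_owner = {"a/1": "alice", "a/2": "alice", "b/1": "bob"}
entry_links = [("b/1", "a/1"), ("a/2", "a/1"), ("b/1", "x/9")]
provisions, evaluations = map_blog_links(entry_owner, entry_links)
```

Here only the link from bob's entry to alice's entry becomes an evaluation link; alice's self-link and the link to an unknown entry are dropped.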

14 Mapping to Blog community (2/3)
• The basic algorithm does not normalize the information provisioning matrix P or the information evaluation matrix E.
• Problem: a user who creates many blog accounts and interlinks them can inflate the scores.

15 Mapping to Blog community (3/3)
• Solutions:
  - Normalization function 1:
  - Normalization function 2 (longevity factor):

16 Experiments (1/3)
• In the database of this system, entries from blog sites (04/10/16 ~ 05/02/03).
• Original:
  - (16.3%) entries have one or more hyperlinks
  - (1.25%) entries are linked to other blogs
  - (1.15%) entries are referred to by other blogs.

17 Experiments (2/3)
• Applying EigenRumor algorithm:
  - bloggers have at least one blog entry linked from other blogs
  - (9.28%) bloggers have nonzero authority scores
  - => (9.28%) entries have nonzero reputation scores.

18 Experiments (3/3)
• Face-to-face user survey (40 guests, Feb. 2005)

  Best result   EigenRumor   In-link   TFIDF      Not determined
  Queries       18 (45%)     2 (5%)    1 (2.5%)   19 (48%)

19 Related Works
• iRank
• Technorati provided a commercial blog search.
• EigenRumor algorithm:
  - Agent-to-object links, instead of page-to-page or agent-to-agent links.
  - Normalization of links.
  - Dynamic structure of links.

20 Conclusion
• The key feature of the algorithm is that it widens the coverage of blog entries that are assigned a score, compared with scoring based only on static link analysis.

21 Future Work
• The problem of spamming.
• How to choose a better ranking algorithm for a specific keyword?

22 References
[1] K. Fujimura, T. Inoue, and M. Sugisaki, “The EigenRumor Algorithm for Ranking Blogs,” Nippon Telegraph and Telephone, 10 May
[2] S. Brin and L. Page, “The Anatomy of a Large-scale Hypertextual Web Search Engine,” in Proceedings of the 7th International World Wide Web Conference,
[3] Wikipedia,