(hyperlink-induced topic search)

Presentation transcript:

Convergence of the HITS (hyperlink-induced topic search) algorithm, by Victor Boyarshinov

HITS was first introduced by Jon M. Kleinberg (1998). Assumption: the pages on a topic can be roughly divided into authorities, which are pages with good coverage of the topic, and hubs, which are directory-like pages with many hyperlinks to useful pages on the topic. The goal of HITS is to identify good authorities and hubs for a topic, which is usually defined by the user's query. Given a user query, the HITS algorithm first creates a neighborhood graph for the query. It then iteratively updates the authority value and the hub value of every page in that graph.

For each page p, the authority and hub values are computed as follows. The authority value of page p is the sum of the hub scores of all the pages that point to p. The hub value of page p is the sum of the authority scores of all the pages that p points to.
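The two update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names `hits_step` and `normalize`, the dictionary-based graph representation, and the rescaling of both score vectors to unit Euclidean length after each update are assumptions (the slides do not say how scores are normalized).

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (assumed; not stated in the slides)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: (v / norm if norm else 0.0) for p, v in scores.items()}


def hits_step(out_links, hub):
    """One HITS update on a graph given as {page: [pages it links to]}.

    Every linked page must also appear as a key of out_links.
    """
    # Authority of p: sum of the hub scores of all pages that point to p.
    auth = {p: 0.0 for p in out_links}
    for q, targets in out_links.items():
        for p in targets:
            auth[p] += hub[q]
    # Hub of p: sum of the authority scores of all pages that p points to.
    new_hub = {p: sum(auth[t] for t in out_links[p]) for p in out_links}
    return normalize(auth), normalize(new_hub)


# Hypothetical three-page graph: "a" and "b" both link to "c".
graph = {"a": ["c"], "b": ["c"], "c": []}
auth, hub = hits_step(graph, {p: 1.0 for p in graph})
print(auth)
print(hub)
```

In this toy graph the only page with incoming links, "c", collects all the authority weight, while "a" and "b" share the hub weight equally.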

Algorithm Complexity: The time complexity of one iteration of the HITS algorithm is O(|E(G)|).
Experimental Data: The algorithm was tested on uniformly generated directed random graphs. The edge probability was tuned so that the expected out-degree of every vertex was 10.
Convergence Criterion: Iterations were performed until the sum of the absolute values of the weight changes fell below a constant threshold (0.0000001).
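The setup described above could be reproduced roughly as follows. This is a sketch under stated assumptions rather than the original experiment code: the names `random_digraph` and `hits_iterations`, the edge probability p = 10 / (n - 1), the unit-length normalization, and summing the changes over both authority and hub weights are all choices made here for illustration.

```python
import math
import random


def random_digraph(n, expected_out_degree=10, seed=0):
    """Directed random graph: each ordered pair (u, v), u != v, is an edge with probability p."""
    rng = random.Random(seed)
    p = min(1.0, expected_out_degree / (n - 1))  # gives expected out-degree 10
    return [[v for v in range(n) if v != u and rng.random() < p] for u in range(n)]


def _unit(values):
    """Scale a list to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm if norm else 0.0 for x in values]


def hits_iterations(adj, threshold=1e-7, max_iter=10_000):
    """Run HITS until the summed absolute weight change drops below threshold.

    Returns (number of iterations, authority weights, hub weights).
    Each iteration touches every edge a constant number of times: O(|E|).
    """
    n = len(adj)
    auth, hub = [1.0] * n, [1.0] * n
    for iteration in range(1, max_iter + 1):
        new_auth = [0.0] * n
        for u, targets in enumerate(adj):
            for v in targets:
                new_auth[v] += hub[u]
        new_auth = _unit(new_auth)
        new_hub = _unit([sum(new_auth[v] for v in adj[u]) for u in range(n)])
        change = sum(abs(a - b) for a, b in zip(new_auth, auth))
        change += sum(abs(a - b) for a, b in zip(new_hub, hub))
        auth, hub = new_auth, new_hub
        if change < threshold:
            return iteration, auth, hub
    return max_iter, auth, hub


adj = random_digraph(200)
iterations, auth, hub = hits_iterations(adj)
print("iterations until convergence:", iterations)
```

Repeating the last three lines for increasing values of n (and several seeds) gives the iteration counts the experiments are after.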

Goal of the Experiments: Determine how the number of iterations increases with the size of the graph when the average degree of a vertex stays the same (10). Experimental results:

The attempt to estimate the convergence rate failed because too few tests were run (they are computationally expensive!). Another set of test examples was generated as follows: take two disjoint uniformly generated random graphs of size n, take any vertex v from the first graph and any vertex u from the second graph, and connect the two components by adding the edges (u, v) and (v, u). Experimental results:
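The second test set described above could be built along these lines. This is only a sketch of the graph construction, with assumptions labeled: the helper names `random_digraph` and `joined_pair`, the edge probability p = 10 / (n - 1) carried over from the first experiment, and picking u and v uniformly at random are choices made here, not details given in the slides.

```python
import random


def random_digraph(n, rng, expected_out_degree=10):
    """Directed random graph on vertices 0..n-1 as adjacency lists."""
    p = min(1.0, expected_out_degree / (n - 1))  # assumed, as in the first experiment
    return [[v for v in range(n) if v != u and rng.random() < p] for u in range(n)]


def joined_pair(n, seed=0):
    """Two disjoint random graphs of size n, connected by edges (u, v) and (v, u)."""
    rng = random.Random(seed)
    first = random_digraph(n, rng)
    second = random_digraph(n, rng)
    # Relabel the second graph's vertices as n..2n-1 so the graphs stay disjoint.
    adj = first + [[w + n for w in targets] for targets in second]
    v = rng.randrange(n)        # vertex from the first graph
    u = n + rng.randrange(n)    # vertex from the second graph
    adj[u].append(v)            # edge (u, v)
    adj[v].append(u)            # edge (v, u)
    return adj, u, v


adj, u, v = joined_pair(50)
print("vertices:", len(adj), "bridge:", (u, v))
```

The resulting adjacency lists can be fed to any HITS implementation that accepts this representation.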