1
HITS Hypertext-Induced Topic Selection
BÜŞRA İPEK, SELİME IŞIK
2
OUTLINE
- Introduction
- PageRank Algorithm
- HITS Algorithm
- HITS Example
- HITS vs PageRank
- Conclusion
3
Search Engines
1. Crawler: retrieves the contents of web pages
2. Indexer: stores and indexes information on the retrieved pages
3. Ranker: determines the importance of the web pages returned
4. Retrieval Engine: performs lookups on index tables
4
Ranking
- Today's search engines may return millions of pages for a single query.
- A user cannot preview all of the returned results.
- Ranking therefore helps the user reach the most important results first.
5
Rankers
Rankers are classified into two groups:
1. Content-based rankers
   - number of matched terms
   - frequency of terms
   - location of terms
2. Connectivity-based rankers
   - links that point to the pages
6
Link Analysis
There are two famous link analysis methods:
1. PageRank Algorithm
2. HITS Algorithm
7
PageRank
- Originally formulated by Sergey Brin and Larry Page
- Does not rank web sites as a whole; a rank is determined for each page individually
- Pages are ranked according to their authoritativeness: if an authoritative web page A links to page B, then B is also authoritative
8
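PageRank (2)
- Recursive formula
- PageRank is initially 1 for all nodes
- Values are normalized
- Iteration stops when the difference between two successive calculations is very small

$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$

Here $T_1, \dots, T_n$ are the pages that link to A, $C(T_i)$ is the number of outgoing links of $T_i$, and $d$ is the damping factor.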
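The recursive formula above can be sketched in Python as follows. This is a minimal illustration rather than the presenters' implementation: the function name `pagerank`, the input format, the damping factor d = 0.85, the tolerance, and the three-page example graph are all assumptions made for the example.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to (assumed input format).
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # PageRank is initially 1 for all nodes
    for _ in range(max_iter):
        new_pr = {}
        for p in pages:
            # Sum PR(T) / C(T) over every page T that links to p
            incoming = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new_pr[p] = (1 - d) + d * incoming
        # Stop when two successive calculations differ very little
        delta = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:
            break
    return pr


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

In this form of the formula the scores sum to the number of pages as long as every page has at least one outgoing link.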
9
HITS
Kleinberg's hypertext-induced topic selection (HITS) algorithm was also developed for ranking documents based on the link information among a set of documents.
10
Authorities and hubs
The algorithm distinguishes two types of pages:
- Authority: pages that provide important, trustworthy information on a given topic
- Hub: pages that contain links to authorities
Authorities and hubs exhibit a mutually reinforcing relationship: a better hub points to many good authorities, and a better authority is pointed to by many good hubs.
11
Authorities and hubs (2)
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
[Figure: nodes 2, 3, and 4 point to node 1; node 1 points to nodes 5, 6, and 7]
12
Definitions
- Authority: a page that provides important, trustworthy information on a given topic
- Hub: a page that contains links to authorities
- Indegree: the number of incoming links to a given node, used to measure authoritativeness
- Outdegree: the number of outgoing links from a given node, used here to measure hubness
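As a small illustration of the last two definitions, indegree and outdegree can be counted directly from an edge list. The edge list below is a hypothetical example chosen for this sketch, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical edge list of (source, target) pairs:
# pages 2, 3, 4 link to page 1, and page 1 links to pages 5, 6, 7.
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]

indegree = Counter(target for _, target in edges)   # incoming links per page
outdegree = Counter(source for source, _ in edges)  # outgoing links per page

print(indegree[1], outdegree[1])  # 3 3
print(indegree[5], outdegree[5])  # 1 0
```

Page 1 has both a high indegree and a high outdegree here, so by these simple measures it looks like both an authority and a hub.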
13
HITS Algorithm
- Hubs point to lots of authorities.
- Authorities are pointed to by lots of hubs.
Together they form a bipartite graph:
[Figure: bipartite graph of Hubs and Authorities]
14
Step By Step HITS - 1
- The algorithm first determines a base set S.
- Let the set of documents returned by a standard search engine be called the root set R.
- Initialize S to R.
15
Step By Step HITS - 2
- Add to S all pages pointed to by any page in R.
- Add to S all pages that point to any page in R.
- Maintain for each page p in S:
   - Authority score: $a_p$ (vector $a$)
   - Hub score: $h_p$ (vector $h$)
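The construction of the base set in steps 1 and 2 can be sketched as follows. The function name `build_base_set`, the dictionary-based link format, and the example graph are assumptions made for illustration.

```python
def build_base_set(root_set, links):
    """Expand the root set R into the base set S.

    links maps each page to the set of pages it points to (assumed format).
    """
    base = set(root_set)  # initialize S to R
    for page in root_set:
        base |= set(links.get(page, ()))  # pages pointed to by a page in R
    for page, targets in links.items():
        if any(t in root_set for t in targets):
            base.add(page)  # pages that point to a page in R
    return base


# Hypothetical link graph: 1 -> 5, 2 -> 1, 6 -> 2, 7 -> 8
links = {1: {5}, 2: {1}, 6: {2}, 7: {8}}
base_set = build_base_set({1, 2}, links)
print(sorted(base_set))  # [1, 2, 5, 6]
```

Pages 7 and 8 stay outside S because they neither point to nor are pointed to by a page in R.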
16
Step By Step HITS - 3
- For each node, initialize $a_p$ and $h_p$ to $1/n$.
- In each iteration, calculate the authority weight for each node in S as the sum of the hub weights of the pages that point to it: $a_p = \sum_{q \to p} h_q$
17
Step By Step HITS - 4
- In each iteration, calculate the hub weight for each node in S as the sum of the authority weights of the pages it points to: $h_p = \sum_{p \to q} a_q$
Note: The hub weights are computed from the current authority weights, which were computed from the previous hub weights.
18
Step By Step HITS - 5
After new weights are computed for all nodes, the weights are normalized.
19
Convergence of HITS Algorithm
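Let $A$ be the adjacency matrix of $S$: $A_{ij} = 1$ for $i \in S$, $j \in S$ if and only if $i \to j$.
Authority and hub updates, where $\varphi_k$ and $\psi_k$ are normalization constants:
$a_k = \varphi_k A^T h_{k-1}$; $h_k = \psi_k A a_k$
Combining both formulas gives:
$a_k = \varphi_k \psi_{k-1} A^T A a_{k-1}$ for $k > 1$
$h_k = \psi_k \varphi_k A A^T h_{k-1}$ for $k > 0$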
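The combined formulas follow by substituting each update rule into the other. A short derivation, using only the authority and hub update rules stated above:

```latex
\begin{aligned}
a_k &= \varphi_k A^T h_{k-1}
     = \varphi_k A^T \left( \psi_{k-1} A \, a_{k-1} \right)
     = \varphi_k \psi_{k-1} \, A^T A \, a_{k-1}, \\
h_k &= \psi_k A \, a_k
     = \psi_k A \left( \varphi_k A^T h_{k-1} \right)
     = \psi_k \varphi_k \, A A^T h_{k-1}.
\end{aligned}
```

Each iteration therefore multiplies the authority vector by $A^T A$ and the hub vector by $A A^T$, up to a scaling constant.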
20
Convergence of HITS Algorithm-2
The algorithm converges to a fixed point if iterated indefinitely, and the resulting authority and hub vectors satisfy
$a^* = (1/\mu^*) A^T A a^*$; $h^* = (1/\mu^*) A A^T h^*$
The authority vector $a^*$ is an eigenvector of $A^T A$; the iterates $a_k$ converge to the principal eigenvector of $A^T A$.
The hub vector $h^*$ is an eigenvector of $A A^T$; the iterates $h_k$ converge to the principal eigenvector of $A A^T$.
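The fixed-point condition can be checked numerically on a small example. The sketch below uses plain Python lists; the 3-page adjacency matrix, the helper names, and the choice to normalize so the entries sum to 1 are assumptions made for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


# Hypothetical adjacency matrix: A[i][j] = 1 if and only if page i links to page j
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
AtA = matmul(transpose(A), A)

# Repeatedly apply A^T A and rescale (assumed normalization: entries sum to 1)
a = [1 / 3] * 3
for _ in range(200):
    a = matvec(AtA, a)
    total = sum(a)
    a = [x / total for x in a]

# At the fixed point, A^T A a = mu * a for a single constant mu
image = matvec(AtA, a)
mu = sum(image) / sum(a)
residual = max(abs(image[i] - mu * a[i]) for i in range(3))
print(a, mu, residual)
```

The residual is close to zero, which is the eigenvector relation $A^T A a^* = \mu^* a^*$; the same check applies to the hub vector with $A A^T$.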
21
The Pseudocode of HITS
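Steps 3 to 5 above can be sketched in Python as follows. This is not the slide's original pseudocode: the function name `hits`, the input format, the fixed iteration count, and in particular the normalization (each vector is scaled to sum to 1) are assumptions made for this example.

```python
def hits(links, iterations=50):
    """Sketch of the HITS iteration on a base set.

    links maps each page to the pages it points to (assumed input format).
    Returns (authority, hub) dictionaries.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    auth = {p: 1.0 / n for p in pages}  # initialize a_p to 1/n
    hub = {p: 1.0 / n for p in pages}   # initialize h_p to 1/n

    for _ in range(iterations):
        # Authority weight: sum of the previous hub weights of pages pointing to p
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub weight: sum of the current authority weights of pages p points to
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize (assumption: scale each vector so its entries sum to 1)
        a_total = sum(new_auth.values()) or 1.0
        h_total = sum(new_hub.values()) or 1.0
        auth = {p: v / a_total for p, v in new_auth.items()}
        hub = {p: v / h_total for p, v in new_hub.items()}
    return auth, hub


# Hypothetical graph: pages 2, 3, 4 point to page 1; page 1 points to 5, 6, 7
authority, hubness = hits({2: [1], 3: [1], 4: [1], 1: [5, 6, 7]})
print(authority)
print(hubness)
```

On this example, page 1 receives the largest authority weight, and pages 1 to 4 share the hub weight equally because each of them points to pages with the same total authority.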
22
HITS Example
- Root set R = {1, 2, 3, 4}
- Extend it to form the base set S
23
HITS Example Results
[Figure: authority and hubness weights for the example; series labeled Authority and Hubness]
24
HITS vs PageRank
- HITS emphasizes mutual reinforcement between authority and hub web pages, while PageRank does not attempt to capture the distinction between hubs and authorities; it ranks pages by authority alone.
- HITS is applied to the local neighborhood of pages surrounding the results of a query, whereas PageRank is applied to the entire web.
- HITS is query-dependent, whereas PageRank is query-independent.
25
HITS vs PageRank (2)
- Both HITS and PageRank correspond to matrix computations.
- Both can be unstable: changing a few links can lead to quite different rankings.
- PageRank does not handle pages with no out-edges well, because they decrease the overall PageRank.
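The effect of pages with no out-edges can be seen in a small experiment. The sketch below assumes the non-normalized formula PR(A) = (1 - d) + d * sum(PR(T) / C(T)) with d = 0.85; the function name `total_pagerank` and both example graphs are hypothetical.

```python
def total_pagerank(links, d=0.85, iterations=100):
    """Sum of all PageRank values after iterating the recursive formula."""
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / len(out) for t, out in links.items() if p in out)
            for p in links
        }
    return sum(pr.values())


# Hypothetical graphs: in the second one, page C has no out-edges
closed = {"A": ["B"], "B": ["C"], "C": ["A"]}
dangling = {"A": ["B"], "B": ["C"], "C": []}

print(total_pagerank(closed))    # stays at the number of pages
print(total_pagerank(dangling))  # noticeably smaller
```

In the second graph, the rank that flows into C is never passed on, so the total over all pages drops well below 3.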
26
Conclusion
- HITS is a general algorithm for calculating authority and hub scores in order to rank retrieved data.
- The basic aim of the algorithm is to induce a focused portion of the Web graph from the set of pages found by a search on a given topic (query).
- The results demonstrate that it is good at calculating authority and hubness.
28
THANK YOU
29
ANY QUESTIONS?