HITS Hypertext-Induced Topic Selection


1 HITS Hypertext-Induced Topic Selection
Büşra İpek, Selime Işık

2
OUTLINE
- Introduction
- PageRank Algorithm
- HITS Algorithm
- HITS Example
- HITS vs PageRank
- Conclusion

3
Search Engines
1. Crawler: retrieves the contents of web pages
2. Indexer: stores and indexes information on the retrieved pages
3. Ranker: determines the importance of the returned web pages
4. Retrieval Engine: performs lookups on index tables
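The indexer and retrieval engine can be illustrated with a minimal inverted index. This is a sketch under simplifying assumptions: the `pages` dictionary is hypothetical data standing in for crawler output, terms are split on whitespace, and a lookup returns the pages that contain every query term. The ranker, which orders these matches, is the subject of the rest of this presentation.

```python
from collections import defaultdict

def build_index(pages):
    """Indexer: map each lowercase term to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def lookup(index, query):
    """Retrieval engine: return ids of pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get() does not insert missing keys into the defaultdict.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical pages standing in for what a crawler would have fetched.
pages = {
    1: "hubs point to authorities",
    2: "authorities are pointed to by hubs",
    3: "pagerank ranks pages by authority",
}
index = build_index(pages)
print(sorted(lookup(index, "hubs authorities")))  # [1, 2]
```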

4
Ranking
Today's search engines may return millions of pages for a given query. A user cannot preview all the returned results, so ranking them is helpful.

5
Rankers
Rankers fall into two groups:
1. Content-based rankers, which use the number of matched terms, the frequency of terms, and the location of terms
2. Connectivity-based rankers, which use the links that point to a page
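A content-based ranker can be sketched with the three signals listed above. The tuple ordering used here (matched terms first, then term frequency, then earliest position) is an assumption for illustration, not a standard scoring formula, and `content_score`, `rank_by_content`, and the sample pages are hypothetical.

```python
def content_score(query, text):
    """Toy content-based score: (matched terms, term frequency, -first position)."""
    words = text.lower().split()
    terms = set(query.lower().split())
    matched = sum(1 for t in terms if t in words)      # number of matched terms
    frequency = sum(1 for w in words if w in terms)    # frequency of terms
    positions = [i for i, w in enumerate(words) if w in terms]
    first = positions[0] if positions else len(words)  # location of terms
    # Tuples compare lexicographically: more matched terms win, then higher
    # frequency, then an earlier first occurrence (smaller position).
    return (matched, frequency, -first)

def rank_by_content(query, pages):
    """Return page ids sorted from best to worst content score."""
    return sorted(pages, key=lambda p: content_score(query, pages[p]), reverse=True)

# Hypothetical pages.
pages = {
    "a": "link analysis ranks pages using links",
    "b": "pages about link analysis and more link analysis",
    "c": "cooking recipes",
}
print(rank_by_content("link analysis", pages))  # ['b', 'a', 'c']
```

Note that "links" in page "a" is not counted as a match for "link": this toy ranker does no stemming.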

6
Link Analysis
There are two well-known link analysis methods:
1. PageRank Algorithm
2. HITS Algorithm

7
PageRank
- Originally formulated by Sergey Brin and Larry Page
- Does not rank web sites as a whole; it is determined for each page individually, according to the page's authoritativeness
- If an authoritative web page A links to page B, then B is also authoritative

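8
PageRank (2)
- Recursive formula
- PageRank is initially 1 for all nodes
- The values are recomputed (and normalized) until the difference between two successive calculations is very small

$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$

where $T_1, \dots, T_n$ are the pages that link to $A$, $C(T_i)$ is the number of outgoing links of $T_i$, and $d$ is a damping factor between 0 and 1 (Brin and Page suggest $d = 0.85$).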
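The recursive formula can be iterated directly. This is a minimal sketch, assuming the graph is given as a dictionary from each page to the pages it links to; `pagerank`, `tol`, and `max_iter` are illustrative names. Pages with no outgoing links simply contribute nothing to the sum.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T linking to A.

    links maps each page to the list of pages it links to (its out-links).
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    out_degree = {p: len(links.get(p, [])) for p in pages}  # C(T)
    pr = {p: 1.0 for p in pages}  # PageRank is initially 1 for all nodes
    for _ in range(max_iter):
        new = {p: 1.0 - d for p in pages}
        for t in pages:
            for target in links.get(t, []):
                new[target] += d * pr[t] / out_degree[t]
        diff = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if diff < tol:  # successive iterations barely differ: stop
            break
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links, d=0.5)
print({p: round(v, 4) for p, v in sorted(pr.items())})
# {'A': 1.0769, 'B': 0.7692, 'C': 1.1538}
```

With $d = 0.5$ this graph converges to PR(A) = 14/13, PR(B) = 10/13, PR(C) = 15/13, which can be checked by solving the three linear equations by hand.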

9
HITS
Kleinberg's Hypertext-Induced Topic Selection (HITS) algorithm was also developed to rank documents based on the link information among a set of documents.

10
Authorities and hubs
The algorithm identifies two types of pages:
- Authority: a page that provides important, trustworthy information on a given topic
- Hub: a page that contains links to authorities
Authorities and hubs exhibit a mutually reinforcing relationship: a better hub points to many good authorities, and a better authority is pointed to by many good hubs.

11 Authorities and hubs (2)
$a(1) = h(2) + h(3) + h(4)$
$h(1) = a(5) + a(6) + a(7)$
[Figure: pages 2, 3, and 4 link to page 1; page 1 links to pages 5, 6, and 7]
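One update of the figure's scores can be written out explicitly. The edge list encodes the figure as read from the two equations, and the starting score values are hypothetical.

```python
# Edges read off the figure: pages 2, 3, 4 point to page 1; page 1 points to 5, 6, 7.
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]
nodes = range(1, 8)

# Hypothetical current scores (any values would do).
h = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 0.0, 6: 0.0, 7: 0.0}
a = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 1.0, 6: 2.0, 7: 3.0}

# a(p): sum of hub scores of the pages that point to p.
new_a = {p: sum(h[q] for q, r in edges if r == p) for p in nodes}
# h(p): sum of authority scores of the pages that p points to.
new_h = {p: sum(a[r] for q, r in edges if q == p) for p in nodes}

print(new_a[1], new_h[1])  # 6.0 6.0
```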

12
Definitions
- Authority: a page that provides important, trustworthy information on a given topic
- Hub: a page that contains links to authorities
- Indegree: the number of incoming links to a given node; used here to measure authoritativeness
- Outdegree: the number of outgoing links from a given node; used here to measure hubness
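Indegree and outdegree are the column and row sums of the adjacency matrix. Below is a small sketch on a hypothetical four-page graph; `degrees` is an illustrative helper. Counting links this way treats every link equally; HITS refines it by weighting each link by the hub or authority score of the page at its other end.

```python
def degrees(adj):
    """Return (indegree, outdegree) for a 0/1 matrix with adj[i][j] = 1 iff i -> j."""
    n = len(adj)
    outdegree = [sum(row) for row in adj]                            # row sums
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    return indegree, outdegree

# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 3 -> 2
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
indeg, outdeg = degrees(adj)
print(indeg)   # [0, 1, 3, 0]
print(outdeg)  # [2, 1, 0, 1]
```

By these naive measures, page 2 is the most authoritative and page 0 the best hub.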

13
HITS Algorithm
Hubs point to many authorities, and authorities are pointed to by many hubs. Together they form a bipartite graph:
[Figure: bipartite graph with hubs on one side and authorities on the other]

14
Step-by-Step HITS (1)
Determine a base set S:
- Let the set of documents returned by a standard search engine be called the root set R
- Initialize S to R

15
Step-by-Step HITS (2)
- Add to S all pages pointed to by any page in R
- Add to S all pages that point to any page in R
- Maintain for each page p in S:
  - Authority score: $a_p$ (vector $a$)
  - Hub score: $h_p$ (vector $h$)
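The construction of the base set can be sketched as follows. The graph representation (`out_links` as a dictionary of outgoing links, with `in_links` derived by inverting it) and the helper names are assumptions; in practice the incoming links would have to come from a search engine or a crawl, since a page does not list who links to it.

```python
def invert(out_links):
    """Build the in-link map {page: set of pages linking to it} from out-links."""
    in_links = {}
    for p, targets in out_links.items():
        for q in targets:
            in_links.setdefault(q, set()).add(p)
    return in_links

def base_set(root, out_links, in_links):
    """Expand root set R into base set S."""
    s = set(root)                       # initialize S to R
    for p in root:
        s.update(out_links.get(p, ()))  # pages pointed to by a page in R
        s.update(in_links.get(p, ()))   # pages that point to a page in R
    return s

# Hypothetical link graph; pages 7 and 8 are unrelated to the root set.
out_links = {1: [2, 5], 2: [3], 6: [1], 7: [8]}
S = base_set({1, 2}, out_links, invert(out_links))
print(sorted(S))  # [1, 2, 3, 5, 6]
```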

16
Step-by-Step HITS (3)
- For each node, initialize $a_p$ and $h_p$ to $1/n$, where $n$ is the number of pages in S
- In each iteration, calculate the authority weight of each node p in S:

$$a_p = \sum_{q:\, q \to p} h_q$$

17
Step-by-Step HITS (4)
- In each iteration, calculate the hub weight of each node p in S:

$$h_p = \sum_{q:\, p \to q} a_q$$

- Note: the hub weights are computed from the current authority weights, which were computed from the previous hub weights.

18
Step-by-Step HITS (5)
After new weights are computed for all nodes, the weights are normalized.
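Steps 3 to 5 combine into one loop. The slide does not state which norm is used; this sketch assumes Euclidean normalization (sum of squares equal to 1), the choice made in Kleinberg's original paper, and a fixed iteration count stands in for a convergence test. `hits` is an illustrative name, and the example graph is the one from slide 11.

```python
import math

def hits(nodes, edges, iterations=50):
    """Iterate the HITS authority/hub updates on a base set.

    nodes: list of page ids in S; edges: list of (p, q) pairs meaning p -> q.
    """
    n = len(nodes)
    a = {p: 1.0 / n for p in nodes}  # authority scores, initialized to 1/n
    h = {p: 1.0 / n for p in nodes}  # hub scores, initialized to 1/n
    for _ in range(iterations):
        # Authority update: a_p = sum of h_q over all q with q -> p.
        a = {p: sum(h[q] for q, r in edges if r == p) for p in nodes}
        # Hub update, using the authority scores just computed: h_p = sum of a_q over p -> q.
        h = {p: sum(a[r] for q, r in edges if q == p) for p in nodes}
        # Normalize both vectors to unit Euclidean length (assumed norm);
        # "or 1.0" avoids dividing by zero on a graph with no edges.
        a_norm = math.sqrt(sum(v * v for v in a.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in h.values())) or 1.0
        a = {p: v / a_norm for p, v in a.items()}
        h = {p: v / h_norm for p, v in h.items()}
    return a, h

# Graph from slide 11: pages 2, 3, 4 point to page 1; page 1 points to 5, 6, 7.
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]
a, h = hits(list(range(1, 8)), edges)
```

On this graph, page 1 ends with authority $\sqrt{3}/2 \approx 0.866$, pages 5 to 7 with $1/\sqrt{12} \approx 0.289$, and pages 1 to 4 share a hub score of 0.5.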

19 Convergence of HITS Algorithm
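Let $A$ be the adjacency matrix of S: for $i, j \in S$, $A_{ij} = 1$ if and only if $i \to j$, and $A_{ij} = 0$ otherwise.
Authority and hub updates, with normalization constants $\varphi_k$ and $\psi_k$:

$$a_k = \varphi_k A^T h_{k-1}, \qquad h_k = \psi_k A a_k$$

Combining both formulas gives:

$$a_k = \varphi_k \psi_{k-1} A^T A\, a_{k-1} \quad \text{for } k > 1$$
$$h_k = \psi_k \varphi_k A A^T h_{k-1} \quad \text{for } k > 0$$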

20 Convergence of HITS Algorithm-2
If iterated indefinitely, the algorithm converges to a fixed point, and the resulting authority and hub vectors satisfy

$$a^* = \frac{1}{\mu^*} A^T A\, a^*, \qquad h^* = \frac{1}{\mu^*} A A^T h^*$$

- The authority vector $a^*$ is an eigenvector of $A^T A$ (the principal eigenvector, with eigenvalue $\mu^*$)
- The hub vector $h^*$ is an eigenvector of $A A^T$ (the principal eigenvector, with the same eigenvalue $\mu^*$)
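The eigenvector characterization can be checked numerically. This sketch uses plain power iteration (to stay free of third-party libraries) on $A^T A$ and $A A^T$ for a hypothetical four-page adjacency matrix; all helper names are illustrative. Power iteration on $A^T A$ is the HITS authority update with the hub step folded in, so it converges to the principal eigenvector when the largest eigenvalue is strictly dominant, as it is here.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    # Plain-Python matrix product x @ y.
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def principal_eigenvector(m, iterations=200):
    """Power iteration: apply m and renormalize; return (unit vector, Rayleigh quotient)."""
    v = normalize([1.0] * len(m))
    for _ in range(iterations):
        v = normalize(matvec(m, v))
    mu = sum(x * y for x, y in zip(v, matvec(m, v)))  # v has unit length
    return v, mu

# Hypothetical adjacency matrix: A[i][j] = 1 iff page i links to page j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
At = transpose(A)
a_star, mu = principal_eigenvector(matmul(At, A))    # authority vector a*
h_star, mu_h = principal_eigenvector(matmul(A, At))  # hub vector h*
print(round(mu, 4), round(mu_h, 4))  # 4.5616 4.5616
```

Both calls return the same eigenvalue $\mu^* = (5 + \sqrt{17})/2 \approx 4.56$, since $A^T A$ and $A A^T$ share their nonzero eigenvalues; page 2, linked from three pages, receives the largest authority score.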

21
The Pseudocode of HITS

22
HITS Example
Root set $R = \{1, 2, 3, 4\}$; extend it to form the base set S.

23
HITS Example Results
[Figure: authority and hubness weights of the pages in the example]

24
HITS vs PageRank
- HITS emphasizes mutual reinforcement between authority and hub web pages, while PageRank does not attempt to capture the distinction between hubs and authorities; it ranks pages by authority alone.
- HITS is applied to the local neighborhood of pages surrounding the results of a query, whereas PageRank is applied to the entire web.
- HITS is query-dependent, whereas PageRank is query-independent.

25
HITS vs PageRank (2)
- Both HITS and PageRank correspond to matrix computations.
- Both can be unstable: changing a few links can lead to quite different rankings.
- PageRank does not handle pages with no outgoing links (dangling pages) very well, because such pages decrease the total PageRank.
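The dangling-page problem shows up in a single application of the PageRank formula from slide 8. Starting with every page at 1, the total rank stays equal to the number of pages only if every page has an outgoing link; each dangling page lets $d$ times its rank leak away. The two link dictionaries and `pagerank_step` are hypothetical.

```python
def pagerank_step(links, pr, d=0.85):
    """One application of PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    new = {p: 1.0 - d for p in pr}
    for t, targets in links.items():
        for target in targets:
            new[target] += d * pr[t] / len(targets)
    return new

links_closed = {"A": ["B"], "B": ["A"], "C": ["A"]}      # every page has an out-link
links_dangling = {"A": ["B", "C"], "B": ["A"], "C": []}  # C is a dangling page

pr = {"A": 1.0, "B": 1.0, "C": 1.0}
totals = [sum(pagerank_step(links, pr).values()) for links in (links_closed, links_dangling)]
print([round(t, 2) for t in totals])  # [3.0, 2.15]
```

The missing 0.85 in the second total is exactly $d \cdot PR(C)$, the rank that page C receives but never passes on.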

26
Conclusion
HITS is a general algorithm that calculates authority and hub scores in order to rank retrieved data. Its basic aim is to induce a subgraph of the Web graph by finding a set of pages with a search on a given topic (query). The results demonstrate that it is good at identifying authority and hub nodes.


28
THANK YOU

29
ANY QUESTIONS?


Download ppt "HITS Hypertext-Induced Topic Selection"

Similar presentations


Ads by Google