Link Analysis on the Web
An Example: Broad-topic Queries
Xin
Problem
Specific queries: “Does Netscape support the JDK 1.1 code-signing API?”
Broad-topic queries: “Find information about the Java programming language.”
Authority is important in broad-topic queries.
[Figure: Web → Query: “java” → numbered list of result pages; surviving entry: http://sunsite.unc.edu/java faq/javafaq.html, 3.…]
Why use link analysis rather than content information alone?
Query: Harvard
  Harvard homepage: “Harvard” occurs 4 times
  Other page introducing Harvard: “Harvard” occurs 8 times
Query: Search engines
  Yahoo! homepage: “Search engines” occurs 0 times
  Other page introducing search engines: “Search engines” occurs 4 times
In both cases, counting query terms alone would rank the less authoritative page first.
Graph Representation
G = (V, E)
V: pages
E: links (in-links and out-links)
[Figure: adjacency matrix with rows and columns p1, p2, p3, p4]
Given a query, how can we find the most authoritative page from this link information?
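As a rough illustration of the representation above, the following sketch builds an adjacency matrix from an edge list and reads off in-degree and out-degree. The four-page graph and its edges are hypothetical, chosen only for illustration; they are not the matrix from the slide.

```python
# Hypothetical four-page graph; these edges are an assumption for illustration.
pages = ["p1", "p2", "p3", "p4"]
edges = [("p1", "p2"), ("p2", "p1"), ("p2", "p4"), ("p3", "p1"), ("p4", "p1")]

# Map each page name to a row/column index.
index = {page: i for i, page in enumerate(pages)}
n = len(pages)

# adjacency[i][j] == 1 means page i links to page j.
adjacency = [[0] * n for _ in range(n)]
for src, dst in edges:
    adjacency[index[src]][index[dst]] = 1

# Out-degree is a row sum; in-degree is a column sum.
out_degree = {p: sum(adjacency[index[p]]) for p in pages}
in_degree = {p: sum(row[index[p]] for row in adjacency) for p in pages}
```

Row i of the matrix lists the out-links of page i, and column j lists the in-links of page j, so both link directions are available from one structure.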
Overview
[Figure: Web → Query: “java” → ranked list of result pages]
1. Sub-graph construction
2. Hubs and authorities computation
Step 1: Sub-graph Construction
Challenge: construct a sub-graph that
– Is small in size
– Is rich in relevant pages
– Contains most of the strongest authorities
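One way to pursue these three goals, sketched here under assumptions, is the root-set expansion described in Kleinberg's HITS paper: take a small set of top text-search results (the root set), then add the pages they link to and a capped number of pages linking to them. The function name, the `out_links` and `in_links` dictionaries, and the `max_in_links` cap are hypothetical stand-ins for whatever search engine and link index are available.

```python
def build_base_set(root_set, out_links, in_links, max_in_links=50):
    """Expand a root set into a base set (a sketch, not a definitive method).

    root_set:     iterable of page ids returned by a text search (assumed input)
    out_links:    dict mapping a page id to the pages it links to
    in_links:     dict mapping a page id to the pages that link to it
    max_in_links: cap on in-linking pages added per root page, to keep the
                  sub-graph small even when a page is linked from very many pages
    """
    base = set(root_set)
    for page in root_set:
        # Every page a root page points to is a candidate authority.
        base.update(out_links.get(page, ()))
        # Pages pointing at a root page are candidate hubs; keep at most the cap.
        base.update(sorted(in_links.get(page, ()))[:max_in_links])
    return base


# Toy usage with made-up page ids.
example = build_base_set(
    ["a"],
    out_links={"a": ["b", "c"]},
    in_links={"a": ["x", "y", "z"]},
    max_in_links=2,
)
```

The cap keeps the sub-graph small, while following links in both directions is what gives strong authorities a chance to be included even when they were not in the text-search results.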
Step 2: Hubs and Authorities
Basic idea: in-degree
Problem:
Step 2: Hubs and Authorities
An Iterative Algorithm:
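A minimal sketch of an iterative hub and authority computation, assuming the standard mutually reinforcing updates: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. Normalizing each score vector to sum to 1, the uniform 1/n starting values, the fixed iteration count, and the example graph are all assumptions made for this sketch.

```python
def normalize(scores):
    """Scale scores so they sum to 1; leave an all-zero vector unchanged."""
    total = sum(scores)
    return [s / total for s in scores] if total else list(scores)


def hits(adjacency, iterations=50):
    """Return (hub, authority) score lists for a 0/1 adjacency matrix.

    adjacency[i][j] == 1 means page i links to page j.
    """
    n = len(adjacency)
    hub = [1.0 / n] * n
    auth = [1.0 / n] * n
    for _ in range(iterations):
        # Authority of j: sum of hub scores over pages i that link to j.
        auth = normalize(
            [sum(hub[i] for i in range(n) if adjacency[i][j]) for j in range(n)]
        )
        # Hub of i: sum of (updated) authority scores over pages j that i links to.
        hub = normalize(
            [sum(auth[j] for j in range(n) if adjacency[i][j]) for i in range(n)]
        )
    return hub, auth


# Hypothetical four-page graph: p1->p2, p2->p1, p2->p4, p3->p1, p4->p1.
graph = [[0, 1, 0, 0], [1, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]]
hub_scores, auth_scores = hits(graph)
```

On this graph the first page collects the most in-links from good hubs and ends up with the highest authority score, while the third page, which nothing links to, keeps an authority score of zero.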
Simple Example 1
(x, y): x = hub score, y = authority score
Initial scores: (1/4, 1/4)
Simple Example 2
Starting from (1/4, 1/4):
Hub:       1: 1/4    2: 1/4 + 1/4    3: 1/4    4: 1/4
Authority: 1: 1/4 + 1/4 + 1/4    2: 1/4    3: 0    4: 1/4
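The sums above can be reproduced with exact fractions. The edge list below is an assumption: it is one four-node graph whose in-degrees and out-degrees match the number of 1/4 terms in each sum, not necessarily the graph drawn on the slide. Both updates read the initial scores, as the sums suggest.

```python
from fractions import Fraction

# Hypothetical edges (source, target), chosen to match the term counts above.
edges = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]
nodes = [1, 2, 3, 4]

# Every node starts with hub score 1/4 and authority score 1/4.
hub = {v: Fraction(1, 4) for v in nodes}
auth = {v: Fraction(1, 4) for v in nodes}

# New hub score: sum of the authority scores of the pages a node links to.
new_hub = {
    v: sum((auth[dst] for src, dst in edges if src == v), Fraction(0))
    for v in nodes
}
# New authority score: sum of the hub scores of the pages linking to a node.
new_auth = {
    v: sum((hub[src] for src, dst in edges if dst == v), Fraction(0))
    for v in nodes
}
```

Node 1 receives three in-links and so gets authority 3/4, while node 3 has no in-links and gets 0; node 2 is the only node with two out-links, so its hub score is 1/2.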
PageRank