Authoritative Sources in a Hyperlinked Environment
Jon M. Kleinberg
Presented By: Talin Kevorkian
Summer, June 2015
Overview
- Why Do We Care?
- Introduction Information
- Objective
- Approaches and Observed Results
- Related Work
- Generalization
- Conclusion
- Evaluation of Pros and Cons
18 June Authoritative Sources in a Hyperlinked Environment
Why Do We Care?
- Complexity of the WWW as a Hypertext Corpus
- Nature of the Hyperlinked Environment's Structure
- Efficiency (Longer Response Times) and Storage Problems Caused by the Huge Number of Results Returned to the User
Introduction Information: Query Types
- Specific Queries, e.g. “Does Windows 7 Support Oracle 10g?”: Scarcity Problem (very few pages contain the required information)
- Broad-Topic Queries, e.g. “SQL Programming Language”: Abundance Problem (far too many relevant pages), which calls for a Notion of Authority
- Similar-Page Queries, e.g. “Similar Pages to Oracle.com”
Introduction Information: Link-Based Model
- Encoding Latent Human Judgment
- Conferred Authority: a link from one page to another confers authority on the target page
- Creating a Balance Between Popularity and Relevance
- Relation Between Authorities and Hubs
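The relation between authorities and hubs can be written as a pair of mutually reinforcing update rules. The block below restates them in the operation names the later slides use (I and O), assuming x holds authority weights, y holds hub weights, E is the link set of the focused subgraph, and A is its adjacency matrix (A_{pq} = 1 when page p links to page q). With normalization after every step, the paper shows the iterates converge to the principal eigenvectors of A^T A and A A^T.

```latex
\begin{aligned}
x_p &\leftarrow \sum_{q:\,(q,p)\in E} y_q && \text{(operation } I\text{)}\\
y_p &\leftarrow \sum_{q:\,(p,q)\in E} x_q && \text{(operation } O\text{)}\\
x &\leftarrow A^{\top} y, \qquad y \leftarrow A\,x\\
x &\leftarrow (A^{\top}A)\,x, \qquad y \leftarrow (A\,A^{\top})\,y
\end{aligned}
```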
Objective
- Presenting a Link-Based Model for the Conferral of Authority
- Identifying Authoritative WWW Sources on a Global Scale
Approaches and Observed Results
- Focused Subgraph Algorithm for the WWW
- Authorities and Hubs Computation
- Approach for Similar-Page Queries
- Sample Observed Results
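Focused Subgraph Algorithm for the WWW
Inputs:
- Query String σ
- Text-Based Search Engine
Outputs: a Set of Hyperlinked Pages, Viewed as a Directed Graph G(V, E)
- Root Set Rσ: the top results the search engine returns for σ
- Base Set Sσ: Rσ expanded with pages that link to, or are linked from, pages in Rσ; it should be
  - Relatively Small in Size
  - Rich in Relevant Pages
  - Containing Most of the Strongest Authorities
- Link Types in G[Sσ]:
  - Transverse: links between pages on different domains
  - Intrinsic: links between pages on the same domain (removed, since they mostly serve site navigation)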
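The construction of Sσ can be sketched in a few lines. This is a minimal sketch, not the paper's code: it assumes the search-engine results and the link data are already available as plain Python values (`root` and `out_links` are hypothetical inputs), uses `d = 50` as an illustrative cap on how many in-linking pages are added per root page, and treats a link as intrinsic when both URLs share a host name.

```python
from urllib.parse import urlparse


def invert(out_links):
    """Build an in-link map {page: [pages linking to it]} from an out-link map."""
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            in_links.setdefault(dst, []).append(src)
    return in_links


def domain(url):
    """Host name of a URL, used to tell transverse links from intrinsic ones."""
    return urlparse(url).netloc.lower()


def focused_subgraph(root, out_links, d=50):
    """Grow root set R into base set S and keep only transverse links of G[S].

    root      -- hypothetical list of top-t URLs from a text-based search engine
    out_links -- hypothetical crawl data: {url: [urls it links to]}
    d         -- cap on in-linking pages added per root page (illustrative)
    """
    in_links = invert(out_links)
    base = set(root)
    for p in root:
        base.update(out_links.get(p, []))      # every page p points to
        base.update(in_links.get(p, [])[:d])   # at most d pages pointing to p
    edges = {
        (p, q)
        for p in base
        for q in out_links.get(p, [])
        if q in base and domain(p) != domain(q)  # drop intrinsic (same-host) links
    }
    return base, edges


root = ["http://a.com/1", "http://b.com/x"]
out_links = {
    "http://a.com/1": ["http://a.com/2", "http://c.com/"],
    "http://b.com/x": ["http://c.com/"],
    "http://d.com/": ["http://a.com/1"],
    "http://e.com/": ["http://z.com/"],
}
base, edges = focused_subgraph(root, out_links)
print(len(base), len(edges))  # 5 3
```

The same-host link `a.com/1 -> a.com/2` keeps `a.com/2` in the base set but is dropped from the edge set, while `e.com` never enters because it is not adjacent to the root set.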
Authorities and Hubs Computation
- Simple Approach: Ordering Pages by Their In-Degree
- Problem: Confusion Between Strong “Authorities” and “Universally Popular” Pages
- Solution: a Mutually Reinforcing Relationship Between Hubs and Authorities
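The in-degree heuristic only counts links, so it cannot tell a query-relevant authority from a page that everyone links to. A minimal sketch, assuming the focused subgraph is given as `(source, target)` pairs; the page names in the example are hypothetical.

```python
from collections import Counter


def rank_by_in_degree(edges, c=3):
    """Naive baseline: order pages purely by how many links point to them.

    edges -- (source, target) link pairs from the focused subgraph
    c     -- how many top pages to report
    """
    in_degree = Counter(target for _, target in edges)
    ranked = sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [page for page, _ in ranked[:c]]


# Hypothetical subgraph for an SQL query: four pages also link to a
# universally popular portal, which then outranks the on-topic pages.
links = [("s1", "portal"), ("s2", "portal"), ("s3", "portal"), ("s4", "portal"),
         ("s1", "sql-ref"), ("s2", "sql-ref"), ("s3", "sql-tutorial")]
print(rank_by_in_degree(links))  # ['portal', 'sql-ref', 'sql-tutorial']
```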
Authorities and Hubs Computation
Iterate Algorithm
- Input: Set of n Linked Pages Gσ
- Outputs: Updated Authority Weights (Through Operation I) and Updated Hub Weights (Through Operation O)
Filter Algorithm
- Input: Set of n Linked Pages Gσ
- Outputs: Pages With the Top c Authority Weights and Pages With the Top c Hub Weights
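The Iterate and Filter steps can be sketched as follows. This is an illustrative sketch, assuming the focused subgraph arrives as a node list plus `(source, target)` pairs; `k = 20` iterations and `c = 5` results are illustrative defaults. As in the paper's description, the O operation uses the authority weights just produced by I, and both weight vectors are normalized so their squares sum to 1.

```python
import math


def normalize(weights):
    """Scale weights so their squares sum to 1 (an all-zero vector is left as zeros)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {p: (w / norm if norm else 0.0) for p, w in weights.items()}


def iterate(nodes, edges, k=20):
    """Apply the I and O operations k times; return (authority weights x, hub weights y)."""
    x = dict.fromkeys(nodes, 1.0)
    y = dict.fromkeys(nodes, 1.0)
    for _ in range(k):
        new_x = dict.fromkeys(nodes, 0.0)
        for q, p in edges:          # I operation: x_p <- sum of y_q over links q -> p
            new_x[p] += y[q]
        new_y = dict.fromkeys(nodes, 0.0)
        for p, q in edges:          # O operation: y_p <- sum of x_q over links p -> q
            new_y[p] += new_x[q]    # uses the freshly computed authority weights
        x, y = normalize(new_x), normalize(new_y)
    return x, y


def filter_top(nodes, edges, k=20, c=5):
    """Filter: report the c pages with the largest authority and hub weights."""
    x, y = iterate(nodes, edges, k)

    def top(weights):
        return sorted(weights, key=lambda p: (-weights[p], p))[:c]

    return top(x), top(y)


nodes = ["h1", "h2", "a1", "a2", "a3"]
edges = [("h1", "a1"), ("h1", "a2"), ("h1", "a3"), ("h2", "a1"), ("h2", "a2")]
authorities, hubs = filter_top(nodes, edges, c=2)
print(authorities, hubs)  # ['a1', 'a2'] ['h1', 'h2']
```

`h1` becomes the strongest hub because it points to every authority, and `a1`/`a2` outrank `a3` because both hubs endorse them.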
Approach for Similar-Page Queries
- First Step: Find What WWW Users Considered Related to a Page When They Created Pages and Hyperlinks Pointing to It
- Second Step: Apply the Link Structure to the Concept of “Similarity”
- Third Step: Use the Concept of Authorities and Hubs
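The three steps can be combined into one small sketch. It assumes crawl data is available as a hypothetical `out_links` dict and, for brevity, simplifies the base set: it only adds the out-links of the root pages and skips the in-link expansion and intrinsic-link removal of the full focused-subgraph construction. `t`, `k` and `c` are illustrative parameters.

```python
import math


def similar_pages(p, out_links, t=200, k=20, c=10):
    """Sketch of a similar-page query: rank authorities around the pages linking to p.

    p         -- the starting page
    out_links -- hypothetical crawl data: {page: [pages it links to]}
    t         -- how many pages linking to p form the root set
    k, c      -- iterations and number of results, as in Iterate/Filter
    """
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            in_links.setdefault(dst, []).append(src)
    root = in_links.get(p, [])[:t]            # step 1: pages whose authors linked to p
    nodes = set(root)
    for r in root:
        nodes.update(out_links.get(r, []))    # step 2: what those authors also linked to
    edges = [(a, b) for a in sorted(nodes) for b in out_links.get(a, []) if b in nodes]
    x = dict.fromkeys(nodes, 1.0)
    y = dict.fromkeys(nodes, 1.0)
    for _ in range(k):                        # step 3: hubs and authorities on this subgraph
        new_x = dict.fromkeys(nodes, 0.0)
        for a, b in edges:
            new_x[b] += y[a]
        new_y = dict.fromkeys(nodes, 0.0)
        for a, b in edges:
            new_y[a] += new_x[b]
        nx = math.sqrt(sum(v * v for v in new_x.values())) or 1.0
        ny = math.sqrt(sum(v * v for v in new_y.values())) or 1.0
        x = {n: v / nx for n, v in new_x.items()}
        y = {n: v / ny for n, v in new_y.items()}
    return sorted(nodes, key=lambda n: (-x[n], n))[:c]


links = {"h1": ["p", "q", "r", "s"], "h2": ["p", "q", "r"], "h3": ["p", "q", "r"]}
print(similar_pages("p", links, c=3))  # ['p', 'q', 'r']
```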
Sample Observed Results (For Broad-Topic Queries)
Query String: “Search Engine”
Authorities (Description): Yahoo!; Excite; Welcome to Magellan!; Lycos Home Page; AltaVista: Main Page
Query String: “Gates”
Authorities (Description): Bill g.htm; Gates: The Road Ahead; Welcome to Microsoft
Sample Observed Results (For Similar-Page Queries)
Query String: “
Authorities (Description): Welcome Honda; Ford Motor Company; BMW of North America, Inc.; VOLVO; Welcome to the Saturn Web Site; NISSAN - ENJOY THE RIDE; Audi Homepage; 1997 Dodge Site; Welcome to Chrysler
Related Work
Link Structure Is Related to:
- Definitions of the Standing, Impact and Influence Concepts
- WWW Ranking Techniques
- Data Clustering
Standing, Impact and Influence Concepts
Social Networks: Proposed Standing Measures
- Katz Theory: Based on Path Counting
- Hubbell Theory: Based on Weight Propagation Among Nodes
Scientific Citations: Proposed Impact/Influence Measures
- Garfield's Impact Theory
- Pinski-Narin Influence Theory
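Path counting in the spirit of Katz's standing measure can be sketched as follows: a node's score adds up the walks of every length k that end at it, each attenuated by alpha**k. This is an illustrative sketch: `alpha = 0.5` and the truncation at `max_len` are assumptions (the infinite sum only converges when alpha is below one over the largest eigenvalue of the adjacency matrix).

```python
def katz_status(nodes, edges, alpha=0.5, max_len=50):
    """Truncated Katz-style status: sum over k of alpha**k times the walks of length k ending at each node."""
    score = dict.fromkeys(nodes, 0.0)
    walks = dict.fromkeys(nodes, 1.0)   # length-0 walks: one starting at each node
    for k in range(1, max_len + 1):
        nxt = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            nxt[dst] += walks[src]      # extend every walk ending at src by one link
        for n in nodes:
            score[n] += alpha ** k * nxt[n]
        walks = nxt
    return score


print(katz_status(["a", "b", "c"], [("a", "b"), ("b", "c")]))
# {'a': 0.0, 'b': 0.5, 'c': 0.75}
```

In the chain a → b → c, node c collects one direct endorsement (weight 0.5) plus one indirect endorsement through b (weight 0.25), which is the path-counting idea.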
WWW Ranking Techniques
Proposed Ranking Measures:
- Botafogo-Rivlin-Shneiderman Theory
- Carrière-Kazman Theory
- Brin-Page Theory, and Its Contrast With This Paper's Approach
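To make the contrast concrete: a Brin-Page style ranking computes one query-independent score per page over the whole graph, whereas this paper computes two scores (hub and authority) on a query-specific subgraph. The sketch below is a minimal PageRank-style power iteration, assuming a damping factor of 0.85 (a commonly used value, chosen here for illustration) and spreading the rank of pages without out-links evenly over all pages.

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Query-independent PageRank-style scores via power iteration (illustrative sketch)."""
    n = len(nodes)
    out_deg = dict.fromkeys(nodes, 0)
    for src, _ in edges:
        out_deg[src] += 1
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        # rank held by pages with no out-links is spread evenly over all pages
        dangling = sum(rank[p] for p in nodes if out_deg[p] == 0)
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for src, dst in edges:
            new[dst] += damping * rank[src] / out_deg[src]   # share rank along each link
        rank = new
    return rank


cycle = pagerank(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print({p: round(v, 3) for p, v in cycle.items()})  # {'a': 0.333, 'b': 0.333, 'c': 0.333}
```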
Data Clustering
Clustering Needs:
- Similarity Functions
  - Bibliographic Coupling
  - Co-Citation
- Cluster-Producing Functions
  - Small-Griffith Approach
  - Dimension Reduction
  - Spectral Graph Partitioning
  - Centroid Scaling
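The two similarity functions can be computed directly from link pairs: bibliographic coupling counts the pages two pages both link to, and co-citation counts the pages that link to both. These counts are exactly the off-diagonal entries of A A^T and A^T A, the matrices behind hub and authority weights. A minimal sketch with hypothetical page names:

```python
from collections import defaultdict
from itertools import combinations


def coupling_and_cocitation(edges):
    """Count bibliographic coupling and co-citation for every pair of pages.

    coupling[(p, q)]   -- number of pages that both p and q link to
    cocitation[(p, q)] -- number of pages that link to both p and q
    """
    out_links, in_links = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    pages = sorted(set(out_links) | set(in_links))
    coupling, cocitation = {}, {}
    for p, q in combinations(pages, 2):   # pairs in sorted order, so p < q
        coupling[(p, q)] = len(out_links[p] & out_links[q])
        cocitation[(p, q)] = len(in_links[p] & in_links[q])
    return coupling, cocitation


links = [("x", "a"), ("x", "b"), ("y", "a"), ("y", "b"), ("y", "c")]
coupling, cocitation = coupling_and_cocitation(links)
print(coupling[("x", "y")], cocitation[("a", "b")])  # 2 2
```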
Generalization
- Specific Queries: the Diffusion Concept
- Sets of Hubs and Authorities Can Be Separated From Each Other Because the Query String:
  - Has Different Meanings, Like “Jaguar”
  - Is a Highly Polarized Subject, Like “Abortion”
  - Is Used in Multiple Communities, Like “Randomized Algorithms”
Generalization: Sample Results
Query String: “Jaguar”
2nd Non-Principal Vector, Positive End (Authorities, Description): otball/nfl/jax.html; Official Jacksonville Jaguars NFL Website; Jacksonville Jaguars Home Page
3rd Non-Principal Vector, Positive End (Authorities, Description): Jaguar Cars Global Home Page; The Jaguar Collection
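Non-principal vectors are further eigenvectors of A^T A, each orthogonal to the earlier ones, and different communities tend to show up at their ends. The sketch below finds them by power iteration with Gram-Schmidt deflation; it is an illustrative method choice (the paper does not prescribe this routine), and `count` and `iters` are assumed parameters. Eigenvector signs are arbitrary, which is why results are reported per "end" of each vector.

```python
import math


def authority_eigenvectors(nodes, edges, count=4, iters=200):
    """Leading eigenvectors of A^T A via power iteration with Gram-Schmidt deflation.

    Index 0 is the principal (ordinary authority) vector; index i >= 1 is the
    i-th non-principal vector, kept orthogonal to all earlier ones.
    """
    idx = {p: i for i, p in enumerate(nodes)}
    pairs = [(idx[s], idx[d]) for s, d in edges]
    n = len(nodes)

    def apply_ata(v):
        av = [0.0] * n
        for s, d in pairs:          # (A v)_s = sum of v_d over links s -> d
            av[s] += v[d]
        out = [0.0] * n
        for s, d in pairs:          # (A^T w)_d = sum of w_s over links s -> d
            out[d] += av[s]
        return out

    def deflate(v, basis):
        for u in basis:             # remove the component along each earlier vector
            c = sum(a * b for a, b in zip(v, u))
            v = [a - c * b for a, b in zip(v, u)]
        return v

    vectors = []
    for _ in range(count):
        v = [1.0] * n
        for _ in range(iters):
            v = deflate(apply_ata(deflate(v, vectors)), vectors)
            norm = math.sqrt(sum(a * a for a in v))
            if norm == 0.0:
                break               # nothing left outside the earlier vectors
            v = [a / norm for a in v]
        vectors.append(v)
    return [dict(zip(nodes, v)) for v in vectors]


# Two hypothetical communities: {x1, x2} endorsed by h1, h2, and {y1} endorsed by h3.
nodes = ["h1", "h2", "h3", "x1", "x2", "y1"]
edges = [("h1", "x1"), ("h1", "x2"), ("h2", "x1"), ("h2", "x2"), ("h3", "y1")]
principal, first_non_principal = authority_eigenvectors(nodes, edges, count=2)
print(sorted(principal, key=lambda p: -abs(principal[p]))[:2])  # ['x1', 'x2']
print(max(first_non_principal, key=lambda p: abs(first_non_principal[p])))  # y1
```

The principal vector captures the denser community, while the first non-principal vector isolates the smaller one, mirroring how “Jaguar” splits into separate groups of authorities.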
Conclusion
Basic Elements of the Paper's Approach:
- Applying the Notion of Authoritative Sources
- Selecting High-Quality Results
- Dealing With the Scale Problem
- Exploring the Structure of Hubs and Authorities
Evaluation of Pros and Cons
Pros:
- Clearly Describes the Algorithms and Applied Approaches
- Provides Tangible Examples and Results
- Makes Sufficient Connections to Related Work
Cons:
- Ignores the Textual Content of Pages
- Complexity in the Nature of Quality Judgment
- Concentrates Mostly on Broad-Topic Queries
Q & A