Authoritative Sources in a Hyperlinked Environment. Jon M. Kleinberg. Presented by: Talin Kevorkian, Summer 2010. 18 June 2015.


Presentation transcript:


Overview
- Why Do We Care?
- Introduction Information
- Objective
- Approaches and Observed Results
- Related Work
- Generalization
- Conclusion
- Evaluation of Pros and Cons

Why Do We Care?
- Complexity of the WWW as a Hypertext Corpus
- Nature of the Hyperlinked Environment Structure
- Efficiency (Longer Response Time) and Storage Problems Caused by the Huge Number of Results Returned to the User

Introduction Information
Query Types
- Specific
  - E.g. “Does Windows 7 Support Oracle 10g?”
  - Scarcity Problem
- Broad-Topic
  - E.g. “SQL Programming Language”
  - Abundance Problem
  - Authority Notion
- Similar-Page
  - E.g. “Similar Pages to Oracle.com”

Introduction Information
Link-Based Model
- Encoding Latent Human Judgment
  - Conferred Authority
- Creating Balance Between Popularity and Relevance
- Relation Between Authorities and Hubs

Objective
- Presenting the Link-Based Model for the Conferral of Authority
- Exploring Authoritative WWW Sources in the Global Range

Approaches and Observed Results
- Focused Subgraph Algorithm for WWW
- Authorities and Hubs Computation
- Approach for Similar-Page Queries
- Sample Observed Results

Focused Subgraph Algorithm for WWW
Inputs:
- Query String σ
- Text-based Search Engine
Outputs:
- Set of Hyperlinked Pages as a Directed Graph G(V,E)
- Root Set Rσ
- Base Set Sσ
  - Relatively Small in Size
  - Containing Most of the Relevant Pages
  - Covering Most of the Strongest Authorities
Link Types in G[Sσ]
- Transverse
- Intrinsic
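The construction on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact procedure: `out_links` and `in_links` are hypothetical dictionaries standing in for link data that a search engine or crawler would supply, the function names are invented for illustration, and the cap `d` reflects the idea of limiting how many in-linking pages a single root page may contribute. Links between pages on the same host are treated here as intrinsic and dropped, while links across hosts are treated as transverse and kept; comparing hosts is only an approximation of comparing domain names.

```python
from urllib.parse import urlparse


def build_base_set(root_set, out_links, in_links, d=50):
    """Grow the root set into a base set.

    out_links / in_links map a URL to the URLs it points to / is pointed to by.
    Both are hypothetical stand-ins for data from a crawler or search engine.
    """
    base = set(root_set)
    for page in root_set:
        # Add every page the root page links to.
        base.update(out_links.get(page, ()))
        # Add at most d pages that link to the root page. Sorting only makes
        # the choice deterministic; any d in-linking pages would do.
        base.update(sorted(in_links.get(page, ()))[:d])
    return base


def transverse_edges(base, out_links):
    """Keep links inside the base set whose endpoints are on different hosts."""
    edges = set()
    for src in base:
        for dst in out_links.get(src, ()):
            same_host = urlparse(src).netloc == urlparse(dst).netloc
            if dst in base and not same_host:
                edges.add((src, dst))
    return edges


# Toy data, invented for illustration.
out_links = {
    "http://a.com/1": ["http://a.com/2", "http://b.com/1"],
    "http://c.com/1": ["http://a.com/1"],
}
in_links = {"http://a.com/1": ["http://c.com/1", "http://d.com/1"]}

base = build_base_set(["http://a.com/1"], out_links, in_links, d=1)
edges = transverse_edges(base, out_links)
```

With `d=1`, only one of the two in-linking pages joins the base set, and the same-host link from `http://a.com/1` to `http://a.com/2` is discarded as intrinsic.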

Authorities and Hubs Computation
- Addresses the Problem with Ordering Pages by Their In-degree
  - Confusion Between Strong “Authorities” and “Universally Popular” Pages
- Built on the Concept of a Mutually Reinforcing Relationship

Authorities and Hubs Computation
Iterate Algorithm
- Input: Set of n Linked Pages Gσ
- Outputs:
  - Updated Authority Weight (through Operation I)
  - Updated Hub Weight (through Operation O)
Filter Algorithm
- Input: Set of n Linked Pages Gσ
- Outputs:
  - Pages with the Top c Authority Weights
  - Pages with the Top c Hub Weights
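The two procedures above can be sketched in a few lines. This is an illustrative sketch rather than the author's code: operation I sets each page's authority weight to the sum of the hub weights of the pages linking to it, operation O sets each page's hub weight to the sum of the authority weights of the pages it links to, and both vectors are then normalized. The function names, the fixed iteration count `k`, and the toy graph are assumptions made for the example.

```python
import math


def normalize(weights):
    """Scale the weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    return {page: v / norm for page, v in weights.items()}


def iterate(nodes, edges, k=20):
    """Run k rounds of the I and O operations on a directed graph."""
    auth = {p: 1.0 for p in nodes}
    hub = {p: 1.0 for p in nodes}
    for _ in range(k):
        # I operation: authority of p = sum of hub weights of pages linking to p.
        new_auth = {p: 0.0 for p in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # O operation: hub of p = sum of (updated) authority weights p links to.
        new_hub = {p: 0.0 for p in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def filter_top(auth, hub, c):
    """Report the c pages with the largest authority and hub weights."""
    def top(weights):
        # Ties are broken by page name so the result is deterministic.
        return sorted(weights, key=lambda p: (-weights[p], p))[:c]
    return top(auth), top(hub)


# Toy graph, invented for illustration: (source, destination) pairs.
nodes = ["h1", "h2", "a1", "a2"]
edges = [("h1", "a1"), ("h2", "a1"), ("h1", "a2")]

auth, hub = iterate(nodes, edges)
top_authorities, top_hubs = filter_top(auth, hub, c=1)
```

In this toy graph `a1` receives links from both hubs and `h1` links to both authorities, so they come out on top of their respective lists.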

Approach for Similar-Page Queries
- First Step: What Do Users of the WWW Consider to Be Related to a Page When They Create Pages and Hyperlinks?
- Second Step: Applying Link Structure to the Concept of “Similarity”
- Third Step: Using the Concept of Authorities and Hubs
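In Kleinberg's paper, a similar-page query for a page p starts from a root set made of pages that link to p, in place of text-search results; the base-set expansion and the hub and authority computation then proceed as for a broad-topic query. A minimal sketch of that first step follows. The `in_links` dictionary, the function name, the page names, and the limit `t` are all hypothetical choices for illustration.

```python
def similar_page_root_set(page, in_links, t=200):
    """Return up to t pages that link to `page`, to be used as the root set.

    in_links maps a page to the pages that point to it (hypothetical data).
    Sorting only makes the selection deterministic.
    """
    return sorted(in_links.get(page, ()))[:t]


# Toy data, invented for illustration.
in_links = {"p": ["portal", "directory", "review-site"]}
root = similar_page_root_set("p", in_links, t=2)
```

The resulting root set would then be expanded and ranked in the same way as the root set of a text query.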

Sample Observed Results (For Broad-Topic Queries)
Query String: “Search Engine”
Authorities (Description):
- Yahoo!
- Excite
- Welcome to Magellan!
- Lycos Home Page
- AltaVista: Main Page
Query String: “Gates”
Authorities (Description):
- Bill g.htm Gates: The Road Ahead
- Welcome to Microsoft

Sample Observed Results (For Similar-Page Queries)
Query String: “
Authorities (Description):
- Welcome Honda
- Ford Motor Company
- BMW of North America, Inc.
- VOLVO
- Welcome to the Saturn Web Site
- NISSAN - ENJOY THE RIDE
- Audi Homepage
- 1997 Dodge Site
- Welcome to Chrysler

Related Work
Link Structure is Related to:
- Definition of Standing, Impact and Influence Concepts
- WWW Ranking Techniques
- Data Clustering

Standing, Impact and Influence Concepts
Social Networks
- Proposed Standing Measure
  - Katz Theory: Based on Path-Counting
  - Hubbell Theory: Based on Node Weight-Propagation
Scientific Citations
- Proposed Impact/Influence Measure
  - Garfield’s Impact Theory
  - Pinski-Narin Influence Theory

WWW Ranking Techniques
Ranking Measure Proposals:
- Botafogo-Rivlin-Shneiderman Theory
- Carrière-Kazman Theory
- Brin-Page Theory and Contrast with This Paper's Approach

Data Clustering
Clustering Needs:
- Similarity Functions
  - Bibliographic Coupling
  - Co-Citation
- Cluster Producer Functions
  - Small-Griffith Approach
  - Dimension-Reduction
  - Spectral Graph Partitioning
  - Centroid Scaling
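The two similarity functions named above have simple link-counting definitions: bibliographic coupling counts the documents that two documents both cite, and co-citation counts the documents that cite both of them. The sketch below illustrates these definitions on a toy citation map; the `out_links` dictionary, the function names, and the document names are assumptions made for the example.

```python
def bibliographic_coupling(p, q, out_links):
    """Number of documents cited by both p and q."""
    return len(set(out_links.get(p, ())) & set(out_links.get(q, ())))


def co_citation(p, q, out_links):
    """Number of documents that cite both p and q."""
    return sum(1 for cited in out_links.values() if p in cited and q in cited)


# Toy citation data, invented for illustration: document -> documents it cites.
out_links = {
    "paper1": ["x", "y", "z"],
    "paper2": ["y", "z"],
    "paper3": ["x"],
}

coupling = bibliographic_coupling("paper1", "paper2", out_links)  # both cite y and z
cocited = co_citation("y", "z", out_links)  # cited together by paper1 and paper2
```

In matrix terms, if A is the adjacency matrix of the link graph, the entries of A Aᵀ give bibliographic coupling and the entries of Aᵀ A give co-citation, which is how these measures connect to the hub and authority weights.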

Generalization
Specific Queries
- Diffusion Concept
Sets of Hubs and Authorities Can Be Separated from Each Other Because:
- Query String Has Several Meanings, like “Jaguar”
- Query String Is a Highly Polarized Subject, like “Abortion”
- Query String Arises in Multiple Communities, like “Randomized Algorithms”

Generalization Sample Results
Query String: “Jaguar”
2nd non-principal vector, positive end
- otball/nfl/jax.html
- Official Jacksonville Jaguars NFL Website
- Jacksonville Jaguars Home Page
3rd non-principal vector, positive end
- Jaguar Cars Global Home Page
- The Jaguar Collection
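The “non-principal vectors” in these results are, in Kleinberg's paper, non-principal eigenvectors of the matrix Aᵀ A, where A is the adjacency matrix of the focused subgraph; pages with large coordinates in such a vector point to a further community of authorities. The sketch below shows one way to obtain them on a toy graph. Power iteration with deflation is a technique chosen here for illustration, not one the slides attribute to the paper, and the function names and graph are hypothetical.

```python
import math


def authority_matrix(nodes, edges):
    """Return M = A^T A as nested dicts, where A is the adjacency matrix.

    M[p][q] counts the pages that link to both p and q.
    """
    targets_of = {}
    for src, dst in edges:
        targets_of.setdefault(src, set()).add(dst)
    m = {p: {q: 0.0 for q in nodes} for p in nodes}
    for targets in targets_of.values():
        for p in targets:
            for q in targets:
                m[p][q] += 1.0
    return m


def top_eigenvectors(nodes, m, count=2, steps=200):
    """Power iteration with deflation: the `count` leading eigenvectors of m."""
    found = []
    for _ in range(count):
        v = {p: 1.0 for p in nodes}
        for _step in range(steps):
            # Remove the components along eigenvectors already found.
            for u in found:
                dot = sum(v[p] * u[p] for p in nodes)
                v = {p: v[p] - dot * u[p] for p in nodes}
            # Multiply by m and renormalize.
            v = {p: sum(m[p][q] * v[q] for q in nodes) for p in nodes}
            norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
            v = {p: x / norm for p, x in v.items()}
        found.append(v)
    return found


# Toy graph with two unrelated communities, invented for illustration.
nodes = ["h1", "h2", "h3", "a1", "a2", "g1", "g2", "b1"]
edges = [
    ("h1", "a1"), ("h2", "a1"), ("h3", "a1"), ("h1", "a2"), ("h2", "a2"),
    ("g1", "b1"), ("g2", "b1"),
]

principal, second = top_eigenvectors(nodes, authority_matrix(nodes, edges))
best_principal = max(nodes, key=lambda p: abs(principal[p]))
best_second = max(nodes, key=lambda p: abs(second[p]))
```

In this toy graph the principal vector concentrates on the larger community around `a1`, while the second vector picks out `b1`, the authority of the smaller community.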

Conclusion
Basic Elements of the Paper's Approach
- Applying the Notion of Authoritative Sources
- Selecting High-Quality Results
- Dealing with the Scale Problem
- Exploring the Structure of Hubs and Authorities

Evaluation of Pros and Cons
Pros:
- Clearly Describes the Algorithms and Applied Approaches
- Provides Tangible Examples and Results
- Sufficient Connection to Related Work
Cons:
- Ignores the Textual Content of Pages
- Complexity in the Nature of Quality Judgment
- Concentrates Mostly on Broad-Topic Queries

Q & A