Informetrics, Webometrics and Web Use metrics


Informetrics, Webometrics and Web Use metrics Huimin Lu 10/21/2004

Outline
History
Article 1: Bibliometrics and the World Wide Web
Article 2: Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace
Article 3: Authoritative Sources in a Hyperlinked Environment
Article 4: ParaSite: Mining Structural Information on the Web
Conclusion

History
The term “bibliometrics” was introduced by Pritchard in 1969.
Pritchard’s explanation: “the application of mathematical and statistical methods to books and other media of communication”.

A1: Bibliometrics and the World Wide Web
By Don Turnbull
Bibliometrics
Bibliometric laws
Applying bibliometrics to the WWW
Metrics design

A1: Bibliometrics
Classic citation analysis
Refined classic bibliometrics
  - Standard formula for impact: n citations to the journal / n citable articles published
  - Basic formula for the immediacy index of influence: n citations received by articles during the year / total number of citable articles published
Bibliographic coupling
  - Measures the number of references two papers have in common to test for similarity
Cocitation analysis
  - Measures the relations between cited documents
Common errors
  - Multiple authors lost, self-citation, similar author names, human error, etc.
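
The ratios and counts on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the toy reference lists, and the numbers are assumptions made for the example, not taken from Turnbull's article.

```python
def impact_factor(citations, citable_articles):
    """Citations to a journal divided by the citable articles it published."""
    return citations / citable_articles


def immediacy_index(citations_this_year, citable_articles):
    """Citations received during the year divided by citable articles published."""
    return citations_this_year / citable_articles


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling: number of references two papers have in common."""
    return len(set(refs_a) & set(refs_b))


def cocitation_count(citing_papers, doc_x, doc_y):
    """Cocitation: number of papers whose reference lists contain both documents."""
    return sum(1 for refs in citing_papers.values()
               if doc_x in refs and doc_y in refs)


# Hypothetical citing papers mapped to the documents they reference.
papers = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "C"},
}

shared = coupling_strength(papers["P1"], papers["P2"])  # references B and C
cocited = cocitation_count(papers, "A", "C")            # cited together by P1 and P3
```

Coupling compares the reference lists of two citing papers, while cocitation counts how often two cited documents appear together, so the two measures look at the same data from opposite directions.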

A1: Bibliometric Laws
Bradford’s Law of Scattering
  - Clustering method: $R a^n$ ($n$ from 0; $a < 1$), sum $= R/(1-a)$
Lotka’s Law
  - Inverse square
Zipf’s Law
  - Familiar words occur with high frequency (the $n$th word occurs about $k/n$ times)
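
A small numeric sketch of the three formulas may help. The function names and sample constants are assumptions for illustration; the geometric series is checked against its closed form R/(1-a), Zipf's law is taken as k/n occurrences for the word of rank n, and Lotka's inverse square is taken as c/n^2 authors with n publications.

```python
def geometric_total(R, a, terms=None):
    """Sum of R * a**n for n = 0, 1, 2, ...; the full series equals R / (1 - a)."""
    if not abs(a) < 1:
        raise ValueError("the series only converges for |a| < 1")
    if terms is None:
        return R / (1 - a)
    return sum(R * a ** n for n in range(terms))


def zipf_frequency(k, rank):
    """Zipf's law: the word of the given rank occurs about k / rank times."""
    return k / rank


def lotka_authors(c, n):
    """Lotka's law: about c / n**2 authors publish n papers (c = authors with one)."""
    return c / n ** 2


closed_form = geometric_total(100, 0.5)
partial_sum = geometric_total(100, 0.5, terms=50)
```

With R = 100 and a = 0.5, fifty terms of the series already agree with the closed form to well within rounding error.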

A1: Applying Bibliometrics to the Web
Web surveys
  - Georgia Tech Graphics, Visualization, and Usability Web Surveys
Web servers
Add programming logic
  - Inaccurate data gathered: standard procedures are skipped, state information between usage hits is missed, and server hits by themselves don’t represent true usage.

A1: Metrics Design
Configure the Web server to gather comprehensive metrics
Manage log files
  - Enhance reliability: back up regularly, store logs and log file analysis results, start new logs in a timely manner, post results and log information for comparison.
  - Log analysis tools: Analog, WWWStat, GetStats, Perl scripts.
  - Standardization: Extended Log File Format by the WWW Consortium Standards Committee
Downie’s analysis attempts: user-based, request-based, byte-based
Optimal Web content setup & external bibliometric gathering
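
As a rough sketch of log analysis, the Python below reads a log in the style of the W3C Extended Log File Format, where a `#Fields:` directive names the columns. It then tallies hits per client, hits per URI, and bytes per URI. The sample log, the chosen field list, and the mapping of these tallies onto user-based, request-based, and byte-based analysis are assumptions for illustration, not Downie's actual procedure.

```python
from collections import Counter

# Hypothetical log excerpt; the field list is one possible configuration.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri sc-status bytes
2004-10-21 09:00:01 10.0.0.1 GET /index.html 200 5120
2004-10-21 09:00:05 10.0.0.2 GET /papers.html 200 2048
2004-10-21 09:01:12 10.0.0.1 GET /papers.html 200 2048
2004-10-21 09:02:30 10.0.0.3 GET /missing.html 404 512
"""


def parse_extended_log(text):
    """Return one dict per entry, keyed by the names in the #Fields directive."""
    fields = []
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only #Fields changes how entries are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        entries.append(dict(zip(fields, line.split())))
    return entries


def summarize(entries):
    """Tally hits per client, hits per URI, and bytes transferred per URI."""
    hits_per_client = Counter(e["c-ip"] for e in entries)
    hits_per_uri = Counter(e["cs-uri"] for e in entries)
    bytes_per_uri = Counter()
    for e in entries:
        bytes_per_uri[e["cs-uri"]] += int(e["bytes"])
    return hits_per_client, hits_per_uri, bytes_per_uri


entries = parse_extended_log(SAMPLE_LOG)
clients, uris, byte_totals = summarize(entries)
```

Counting by client address only approximates users, which is one reason server hits alone do not represent true usage.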

A2: Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace
By Ray R. Larson
Analysis of 30G Web pages collected by Inktomi “Web Crawler”
Cocitation analysis using DEC AltaVista search engine

A2: Growth and Usage of the Web

A2: Cocitation Analysis of the Web
Attempt: Map the intellectual structure of the Web
Question: Can cocitation techniques be applied to charting the contents of cyberspace?

A2: Methods
Selection of a core set of items for study
Retrieval of cocitation frequency information
Compilation of the raw cocitation frequency matrix
Correlation analysis to convert the raw frequencies into correlation coefficients
Multivariate analysis of the correlation matrix
Interpretation of the resulting “map” and validation
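
The middle steps (raw cocitation matrix, then correlation coefficients) can be sketched as below. The core set, the citing pages, and the choice to leave diagonal cells at zero are assumptions made for this toy example. The multivariate analysis step is not shown, and Larson's actual data came from AltaVista queries rather than an in-memory list.

```python
from itertools import combinations
from math import sqrt


def cocitation_matrix(core, citing_pages):
    """Count, for each pair of core items, the pages that link to both."""
    index = {item: i for i, item in enumerate(core)}
    matrix = [[0] * len(core) for _ in core]
    for links in citing_pages:
        present = sorted(index[x] for x in set(links) if x in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def pearson(xs, ys):
    """Pearson correlation of two equal-length rows (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Convert raw cocitation frequencies into row-by-row correlations."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


# Hypothetical core set and the outgoing links of four citing pages.
core = ["a", "b", "c", "d"]
citing = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a", "b", "x"}]

raw = cocitation_matrix(core, citing)
corr = correlation_matrix(raw)
```

The correlation matrix is what a multivariate technique would then take as input to produce the "map".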

A2: Results

A3: Authoritative Sources in a Hyperlinked Environment
By Jon M. Kleinberg
A new method for automatically extracting certain types of information about a hypermedia environment from its link structure.

A3: Goal
Types of query search and problem
  - Specific queries: scarcity problem
  - Broad-topic queries: abundance problem
  - Similar-page queries
Synthesize the unreliable information contained in the presence of individual links to provide a set of authoritative pages relevant to an initial query.

A3: Common Approaches
Only S
  - Define S to be the top k pages indexed by AltaVista
  - Rank pages according to their in-degree
S -> T
  - Define the same root set S
  - Grow S to a larger base set T
  - Rank pages by their in-degree
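
A minimal sketch of the S -> T approach follows. It assumes that T is grown by adding pages that the root set links to and pages that link into the root set; the slide does not spell out the growth rule, so treat that as an assumption. The link graph and function names are hypothetical.

```python
def grow_base_set(root, out_links):
    """Grow root set S into base set T using links out of and into S."""
    base = set(root)
    for page, targets in out_links.items():
        if page in root:
            base.update(targets)      # pages that S points to
        elif root & set(targets):
            base.add(page)            # pages that point into S
    return base


def rank_by_in_degree(pages, out_links):
    """Rank pages by the number of links they receive from other pages in the set."""
    in_degree = {p: 0 for p in pages}
    for source, targets in out_links.items():
        if source not in pages:
            continue
        for target in set(targets):
            if target in pages and target != source:
                in_degree[target] += 1
    # Highest in-degree first; ties broken alphabetically for a stable order.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link graph: page -> pages it links to.
links = {
    "s1": ["t1", "s2"],
    "s2": ["t1"],
    "h1": ["s1", "t1", "t2"],
    "z": ["z2"],
}
root_set = {"s1", "s2"}

base_set = grow_base_set(root_set, links)
ranking = rank_by_in_degree(base_set, links)
```

In this toy graph, `t1` ranks first because three pages in the base set link to it.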

A3: Their Approach
Extract small core sets of a community of hubs and authorities from T
Authoritative pages
  - A novel type of quality measure of a document in hypermedia, obtained by algorithmic means.
  - Large in-degree and considerable overlap in the sets of pages that point to them
Hub pages
  - Have links to multiple relevant authoritative pages

A3: Algorithm and Output
Method: Iteratively propagates “authority weight” and “hub weight” across the links of the web graph, converging simultaneously to steady states for both types of weights
Output: a pair of sets (X, Y), where X is a small set of authorities and Y is a small set of hubs, referred to by the author as a community of hubs and authorities
Claim: authoritative pages can be identified as belonging to dense bipartite communities in the link graph of the WWW via this algorithm.
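
The iterative propagation can be sketched as follows. This is a simplified reading of the method, not the paper's exact procedure: the fixed iteration count, the Euclidean normalization, and the toy graph are assumptions, and the final step of selecting the top few hubs and authorities is left out.

```python
from math import sqrt


def normalize(weights):
    """Scale weights so that their squares sum to 1 (left unchanged if all zero)."""
    norm = sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: w / norm for page, w in weights.items()}


def hits(out_links, iterations=50):
    """Propagate hub and authority weights across links for a fixed number of rounds."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of pages that link to it.
        new_auth = {p: 0.0 for p in pages}
        for source, targets in out_links.items():
            for target in set(targets):
                new_auth[target] += hub[source]
        # Hub weight: sum of the authority weights of the pages it links to.
        new_hub = {p: 0.0 for p in pages}
        for source, targets in out_links.items():
            new_hub[source] = sum(new_auth[t] for t in set(targets))
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


# Hypothetical graph: three hub-like pages pointing at two candidate authorities.
graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"], "a1": []}
auth, hub = hits(graph)
top_authority = max(auth, key=auth.get)
```

Pages with many incoming links from good hubs end up with high authority weight, and pages linking to many good authorities end up with high hub weight, which is the mutual reinforcement the slide describes.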

A4: ParaSite: Mining Structural Information on the Web
By Ellen Spertus
Varieties of link information on the Web
How the Web differs from conventional hypertext
How the links can be exploited to build useful applications

A4: Classical Hypertext vs. Web
Classical hypertext
  - Links don’t cross site, or even document, boundaries
  - Documents are limited to a single topic
  - A manual answers each question in exactly one place or in none
  - Hardly changes
Web
  - Links can cross site and document boundaries
  - Multiple topics are permitted in one Web page
  - An answer could appear any number of times on the Web
  - Constantly changing

A4: Mining Links
Naïve link geometry
  - A useful technique for finding pages on a given set of topics
Hypertext links (example)
  - Categorized into upward, downward, crosswise, and outward
Directory links
  - Directory structure relates pages in the absence of hypertext links
Structure within a page
  - A page can be considered a tree of nodes, each with attached text and links embedded in the text
Other
  - Domain names, relationships between concepts represented by words and phrases, paths traveled through Web sites by visitors
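
One way to picture the four hypertext link categories is a small URL-based classifier. The definitions used here are assumptions for illustration: outward means a different host, upward means the target sits in an ancestor directory, downward means a descendant directory, and everything else on the same host (including the same directory) is crosswise. The URLs are made up.

```python
import posixpath
from urllib.parse import urlparse


def _directory(path):
    """Directory part of a URL path, always ending with a slash."""
    directory = posixpath.dirname(path)
    return directory if directory.endswith("/") else directory + "/"


def classify_link(source_url, target_url):
    """Label a link as outward, upward, downward, or crosswise."""
    src, dst = urlparse(source_url), urlparse(target_url)
    if src.netloc.lower() != dst.netloc.lower():
        return "outward"                      # leaves the site
    src_dir, dst_dir = _directory(src.path), _directory(dst.path)
    if src_dir == dst_dir:
        return "crosswise"                    # same directory (assumed crosswise)
    if src_dir.startswith(dst_dir):
        return "upward"                       # target is in an ancestor directory
    if dst_dir.startswith(src_dir):
        return "downward"                     # target is in a descendant directory
    return "crosswise"                        # elsewhere on the same site


page = "http://www.example.edu/dept/people/alice.html"
up = classify_link(page, "http://www.example.edu/dept/index.html")
down = classify_link("http://www.example.edu/dept/index.html", page)
cross = classify_link(page, "http://www.example.edu/library/hours.html")
out = classify_link(page, "http://www.example.org/")
```

The same directory comparison also illustrates directory links: two pages can be related through their paths even when no hyperlink connects them.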

A4: Application
Finding moved pages
Finding related pages
  - Exploiting hyperlinks
  - Exploiting directory links
  - Collaborative filtering
  - Given a set of similar pages the user has already found, ParaSite can find the page (A) that has the most links to those pages and return the other pages referenced by A.
A person finder
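
The related-page idea can be sketched as below: find the page A that links to the most pages the user already has, then return whatever else A references. The in-memory link table and the function name are hypothetical; ParaSite itself worked against search engine data rather than a local dictionary.

```python
def related_pages(known_pages, out_links):
    """Return (A, other pages referenced by A), where A links to most known pages."""
    known = set(known_pages)
    best_page, best_overlap = None, 0
    for page in sorted(out_links):            # sorted for a deterministic tie-break
        if page in known:
            continue
        overlap = len(known & set(out_links[page]))
        if overlap > best_overlap:
            best_page, best_overlap = page, overlap
    if best_page is None:
        return None, []
    return best_page, sorted(set(out_links[best_page]) - known)


# Hypothetical link table: page -> pages it links to.
link_table = {
    "listA": ["p1", "p2", "p3", "p4"],
    "listB": ["p1", "q1"],
    "listC": ["q2"],
}

page_a, suggestions = related_pages(["p1", "p2"], link_table)
```

Here `listA` links to both known pages, so its remaining links become the suggestions.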

Conclusion
Information on the World Wide Web is increasing exponentially, and the Internet’s architecture is becoming more complicated.
Applying bibliometrics to the Web will help us control and manage Web information wisely.

Example of Hypertext Link