Informetrics, Webometrics and Web Use metrics


1 Informetrics, Webometrics and Web Use metrics
Huimin Lu 10/21/2004

2 Outline
History
Article 1: Bibliometrics and the World Wide Web
Article 2: Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace
Article 3: Authoritative Sources in a Hyperlinked Environment
Article 4: ParaSite: Mining Structural Information on the Web
Conclusion

3 History
The term “bibliometrics” was introduced by Pritchard in 1969.
Pritchard’s explanation: “the application of mathematical and statistical methods to books and other media of communication”.

4 A1: Bibliometrics and the World Wide Web
By Don Turnbull
Bibliometrics
Bibliometric laws
Applying bibliometrics to the WWW
Metrics design

5 A1: Bibliometrics
Classic citation analysis
Refined classic bibliometrics
- Standard formula for impact: n journal citations / n citable articles published
- Basic formula for immediacy index of influence: n citations received by article during the year / total number of citable articles published
Bibliometric Coupling
- Measure the number of references two papers have in common to test for similarity
Cocitation Analysis
- Measure the relations between cited documents
Common Errors
- multiple authors lost, self-citation, similar author names, human error, etc.
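The measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from Turnbull's article: the function names, the sample reference lists, and the reading of cocitation as "how often two documents appear in the same reference list" are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def impact_factor(journal_citations, citable_articles):
    """Impact as stated on the slide: citations / citable articles published."""
    return journal_citations / citable_articles


def immediacy_index(citations_in_year, citable_articles_in_year):
    """Immediacy as stated on the slide: same-year citations / citable articles."""
    return citations_in_year / citable_articles_in_year


def coupling_strength(refs_a, refs_b):
    """Coupling: number of references two papers have in common."""
    return len(set(refs_a) & set(refs_b))


def cocitation_counts(reference_lists):
    """Cocitation: how often each pair of documents is cited together."""
    counts = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers (P1..P3) and the documents they cite (A..D).
PAPERS = {
    "P1": ["A", "B", "C"],
    "P2": ["B", "C", "D"],
    "P3": ["A", "C"],
}

shared = coupling_strength(PAPERS["P1"], PAPERS["P2"])
cocited = cocitation_counts(PAPERS)
```

With the sample data, P1 and P2 share two references (B and C), and the pair (A, C) is cocited twice.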

6 A1: Bibliometric Laws
Bradford’s Law of Scattering
- clustering method: $R a^n$ ($n$ from 0; $a < 1$), sum $= R/(1-a)$
Lotka’s Law
- inverse square
Zipf’s Law
- familiar words with high frequency (nth word: k/n times)
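A small numeric sketch may help make the three laws concrete. It takes the formulas as written on the slide; the function names and the constants passed in are assumptions chosen for illustration, not values from the article.

```python
def geometric_partial_sum(R, a, terms):
    """Partial sum of R * a**n for n = 0 .. terms - 1 (the slide's series)."""
    return sum(R * a**n for n in range(terms))


def geometric_limit(R, a):
    """Closed form R / (1 - a), valid only for a < 1."""
    if not a < 1:
        raise ValueError("the series only converges for a < 1")
    return R / (1 - a)


def lotka_authors(single_paper_authors, n):
    """Inverse-square law: authors with n papers ~ authors with 1 paper / n**2."""
    return single_paper_authors / n**2


def zipf_frequency(k, rank):
    """Zipf's law as on the slide: the nth most frequent word occurs k/n times."""
    return k / rank


# Hypothetical constants, chosen only to make the arithmetic easy to follow.
partial = geometric_partial_sum(R=10, a=0.5, terms=60)
limit = geometric_limit(R=10, a=0.5)
```

The partial sum approaches the closed-form limit as more terms are added, which is the sense in which the slide's "sum = R/(1-a)" holds.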

7 A1: Applying Bibliometrics to the Web
Web surveys
- Georgia Tech Graphics, Visualization, and Usability Web Surveys
Web servers
Add programming logic
- Inaccurate data gathered: users skip standard procedures, state information between usage hits is missed, and server hits themselves don’t represent true usage.

8 A1: Metrics Design
Configure Web server to gather comprehensive metrics
Manage log files
- Enhance reliability: back up regularly, store log file analysis results and logs, begin new logs on a timely basis, post results and log information for comparison.
- Log analysis tools: Analog, WWWStat, GetStats, Perl scripts.
- Standardization: Extended Log File Format by WWW Consortium Standards Committee
Downie’s attempted analyses: user-based, request, byte-based
Optimal Web content setup & External bibliometric gathering
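As a rough sketch of what log analysis along these lines could look like, the snippet below parses server log lines and derives one user-based, one request-based, and one byte-based figure. It assumes the widely used Common Log Format rather than the Extended Log File Format named on the slide. The sample lines are invented, and treating distinct hosts as a stand-in for users is a simplification, not Downie's method.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize_log(lines):
    """Return user-, request-, and byte-based counts for the given log lines."""
    hosts = set()
    requests_per_path = Counter()
    total_bytes = 0
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        hosts.add(match["host"])
        parts = match["request"].split()
        path = parts[1] if len(parts) >= 2 else match["request"]
        requests_per_path[path] += 1
        if match["size"] != "-":  # "-" means no body was sent
            total_bytes += int(match["size"])
    return {
        "distinct_hosts": len(hosts),            # crude user-based measure
        "requests_per_path": requests_per_path,  # request-based measure
        "total_bytes": total_bytes,              # byte-based measure
    }


# Invented sample data using documentation-range IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [21/Oct/2004:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '192.0.2.1 - - [21/Oct/2004:10:00:05 -0400] "GET /logo.gif HTTP/1.0" 200 2048',
    '192.0.2.7 - - [21/Oct/2004:10:01:00 -0400] "GET /index.html HTTP/1.0" 304 -',
    'this line is malformed and is ignored',
]

summary = summarize_log(SAMPLE_LOG)
```

The distinct-host count also illustrates the caveat from the previous slide: proxies and shared machines mean that hits and hosts only approximate true usage.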

9 A2: Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace
By Ray R. Larson
Analysis of 30G Web pages collected by Inktomi “Web Crawler”
Cocitation analysis using DEC AltaVista search engine

10 A2: Growth and Usage of Web
WWW

11 A2: Cocitation Analysis of Web
Attempt: Map the intellectual structure of Web Question: Can cocitation techniques be applied to charting the contents of cyberspace?

12 A2: Methods
- Selection of core set of items for study
- Retrieval of cocitation frequency information
- Compilation of the raw cocitation frequency matrix
- Correlation analysis to convert the raw frequencies into correlation coefficients
- Multivariate analysis of the correlation matrix
- Interpretation of the resulting “map” and validation
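The matrix and correlation steps can be sketched as follows. The matrix values are invented, and the handling of the diagonal (leaving out columns i and j when correlating rows i and j) is one common convention assumed here, not necessarily the one Larson used. The multivariate step is left out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(raw):
    """Convert a symmetric raw cocitation matrix into correlation coefficients."""
    n = len(raw)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: ignore the cells involving i and j themselves.
            cols = [k for k in range(n) if k not in (i, j)]
            r = pearson([raw[i][k] for k in cols], [raw[j][k] for k in cols])
            corr[i][j] = corr[j][i] = r
    return corr


# Hypothetical raw cocitation frequencies for five sites (A..E).
LABELS = ["A", "B", "C", "D", "E"]
RAW = [
    [0, 9, 7, 1, 0],
    [9, 0, 8, 0, 1],
    [7, 8, 0, 2, 1],
    [1, 0, 2, 0, 6],
    [0, 1, 1, 6, 0],
]

CORR = correlation_matrix(RAW)
```

In the sample data, A and B are cocited with the same other sites and correlate positively, while A and E have opposite profiles and correlate negatively. This kind of contrast is what a later clustering or scaling step would turn into a "map".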

13 A2: Results

14 A3: Authoritative Sources in a Hyperlinked Environment
By Jon M. Kleinberg
A new method for automatically extracting certain types of information about a hypermedia environment from its link structure.

15 A3: Goal
Types of query search and problem
- Specific queries: scarcity problem
- Broad-topic queries: abundance problem
- Similar-page queries
Synthesize the unreliable information contained in the presence of individual links to provide a set of authoritative pages relevant to an initial query.

16 A3: Common Approaches
Only S
- Define S to be the top k pages indexed by AltaVista
- Rank pages according to their in-degree
S -> T
- Define same root set S
- Grow S to a larger base set T
- Rank pages by their in-degree
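The two baseline approaches can be sketched on a toy link graph. The graph, the page names, and the growth rule (add every page that a root page links to and every page that links to a root page) are assumptions for illustration; a real search engine query for the root set S is replaced by a hard-coded set.

```python
def grow_base_set(links, root):
    """Grow root set S into base set T using outgoing and incoming links."""
    base = set(root)
    for page, targets in links.items():
        if page in root:
            base.update(targets)          # pages that S points to
        elif any(t in root for t in targets):
            base.add(page)                # pages that point into S
    return base


def rank_by_in_degree(links, pages):
    """Rank pages by the number of links they receive from other pages in the set."""
    in_degree = {p: 0 for p in pages}
    for source, targets in links.items():
        if source not in pages:
            continue
        for target in set(targets):
            if target in pages and target != source:
                in_degree[target] += 1
    # Highest in-degree first; ties broken alphabetically for determinism.
    return sorted(pages, key=lambda p: (-in_degree[p], p))


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "s1": ["a", "b"],
    "s2": ["a"],
    "h1": ["s1", "a"],
    "x": ["y"],
}
ROOT = {"s1", "s2"}

BASE = grow_base_set(LINKS, ROOT)
RANKING = rank_by_in_degree(LINKS, BASE)
```

Page "a" is not in the root set but ends up ranked first once the set is grown, which is the motivation for the S -> T variant.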

17 A3: Their Approach
Extract small core sets of community of hubs and authorities from T
Authoritative pages
- A novel type of quality measure of the document in hypermedia by algorithmic means.
- Large in-degree & considerable overlap in sets of pages that point to them
Hub Pages
- have links to multiple relevant authoritative pages

18 A3: Algorithm and Output
Method: Iteratively propagates “authority weight” and “hub weight” across links of the web graph, converging simultaneously to steady states for both types of weights
Output: a pair of sets (X, Y) (X: a small set of authorities, Y: a small set of hubs), referred to by the author as a community of hubs and authorities
Claim: authoritative pages can be identified as belonging to dense bipartite communities in the link graph of the WWW via this algorithm.
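The iterative propagation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy graph, the fixed iteration count, and the Euclidean normalization are choices made for this example, and the code is not Kleinberg's implementation.

```python
from math import sqrt


def normalize(weights):
    """Scale weights so that their squares sum to 1."""
    norm = sqrt(sum(w * w for w in weights.values())) or 1.0
    return {page: w / norm for page, w in weights.items()}


def hubs_and_authorities(links, iterations=50):
    """Propagate authority and hub weights across links for a fixed number of rounds."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in set(targets):
                new_auth[target] += hub[source]
        # Hub weight: sum of authority weights of the pages it points to.
        new_hub = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in set(targets):
                new_hub[source] += new_auth[target]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def top_k(weights, k):
    """Return the k pages with the largest weight."""
    return sorted(weights, key=lambda p: (-weights[p], p))[:k]


# Hypothetical base set: three hub-like pages pointing at two candidate authorities.
GRAPH = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"], "a1": []}

AUTH, HUB = hubs_and_authorities(GRAPH)
X = top_k(AUTH, 1)  # small set of authorities
Y = top_k(HUB, 2)   # small set of hubs
```

Taking the top few pages by each weight gives the pair (X, Y) from the slide. In the toy graph, a1 is the strongest authority, and h1 and h2 are stronger hubs than h3 because they point to both authorities.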

19 A4: ParaSite: Mining Structural Information on the Web
By Ellen Spertus
Varieties of link information on the Web
How the web differs from conventional hypertext
How the links can be exploited to build useful applications

20 A4: Classical Hypertext vs. Web
Classical hypertext
- links don’t cross site, or even document, boundaries
- documents limited to a single topic
- a manual answers each question in exactly one place or not at all
- hardly changes
Web
- links can cross site and document boundaries
- multiple topics permitted in one web page
- an answer could appear any number of times on the web
- constantly changing

21 A4: Mining Links
Naïve Link Geometry
- A useful technique for finding pages on a given set of topics
Hypertext Links (example)
- Categorized into upward, downward, crosswise, and outward
Directory Links
- Directory structure relates pages in the absence of hypertext links
Structure within a Page
- Page can be considered a tree of nodes, each with attached text and links embedded in the text
Other
- Domain names, relationships between concepts represented by words and phrases, paths traveled through Web sites by visitors
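One possible way to assign the four hypertext link categories is to compare the host and directory of the linking page with those of the target. The definitions used below (outward = different host; upward or downward = target directory is an ancestor or descendant of the source directory; crosswise = anything else on the same host) are an assumed reading of the slide, and the URLs are invented.

```python
import posixpath
from urllib.parse import urlparse


def directory_parts(url_path):
    """Split the directory portion of a URL path into its segments."""
    return [part for part in posixpath.dirname(url_path).split("/") if part]


def classify_link(source_url, target_url):
    """Label a link as upward, downward, crosswise, or outward."""
    source, target = urlparse(source_url), urlparse(target_url)
    if source.netloc != target.netloc:
        return "outward"
    src_dir = directory_parts(source.path)
    dst_dir = directory_parts(target.path)
    if len(dst_dir) < len(src_dir) and src_dir[:len(dst_dir)] == dst_dir:
        return "upward"      # target sits higher in the same directory branch
    if len(dst_dir) > len(src_dir) and dst_dir[:len(src_dir)] == src_dir:
        return "downward"    # target sits lower in the same directory branch
    return "crosswise"       # same host, neither above nor below


# Hypothetical pages on example hosts.
PAGE = "http://www.example.org/courses/ir/notes.html"
LABELS = {
    "up": classify_link(PAGE, "http://www.example.org/courses/index.html"),
    "down": classify_link(PAGE, "http://www.example.org/courses/ir/week1/slides.html"),
    "cross": classify_link(PAGE, "http://www.example.org/people/staff.html"),
    "out": classify_link(PAGE, "http://www.example.com/"),
}
```

Links between two pages in the same directory fall under "crosswise" in this sketch, which is a design choice rather than something the slide specifies.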

22 A4: Application
Finding Moved Pages
- Exploiting hyperlinks
- Exploiting directory links
Finding Related Pages
- Collaborative filtering
- Given a set of similar pages the user has already found, ParaSite can find the page (A) that has the most links to those pages and return the other pages referenced by A.
A Person Finder
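The related-pages idea can be sketched directly from the description on the slide. The link graph and page names below are hypothetical, and the function is an illustration of the idea, not ParaSite's code.

```python
def related_pages(links, known_pages):
    """Find the page A with the most links to known_pages and return A's other links."""
    known = set(known_pages)
    best_page, best_overlap = None, 0
    for page in sorted(links):            # sorted for deterministic tie-breaking
        if page in known:
            continue
        overlap = len(known & set(links[page]))
        if overlap > best_overlap:
            best_page, best_overlap = page, overlap
    if best_page is None:
        return None, []
    return best_page, sorted(set(links[best_page]) - known)


# Hypothetical link graph: page -> pages it links to.
WEB = {
    "listA": ["p1", "p2", "p3", "p4"],
    "listB": ["p1", "q1"],
    "p1": ["listA"],
}

A, SUGGESTIONS = related_pages(WEB, ["p1", "p2"])
```

Here "listA" links to both pages the user already has, so its remaining links (p3 and p4) are returned as related pages.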

23 Conclusion
Information on the World Wide Web is increasing exponentially, and the Internet’s architecture is becoming more complicated. Applying bibliometrics to the Web will help us control and manage Web information wisely.

24 Example of Hypertext Link

