1
The Range of Webometrics: Forms of Digital Social Utility as Tools
Professor Peter Ingwersen, Ph.D.
Information Interaction & Information Architecture
Royal School of LIS, Denmark
2
Table of Contents
Webometrics – Cybermetrics – a framework
Link topology & structural conceptions in webometrics
Overview of potentials
Search engine analysis
Link analyses – Web Impact Factor (Web-IF)
Dataset Usage Indicators
Web mining – Trend analyses (blog contents)
Concluding remarks
2010 Åbo Ingwersen
3
Webometrics
The study of quantitative aspects of the construction and use of information resources, structures and technologies on the Web, drawing on bibliometric and informetric methods:
  search engine performance
  link structures, e.g., WIFs, cohesiveness of link topologies, etc.
  users’ information behaviour (searching, browsing, etc.)
  web page contents – knowledge mining – blog trends
  dataset analyses & impact
Cybermetrics: quantitative studies of the whole Internet, i.e. chat, mailing lists, news groups, MUDs, etc., and the WWW
Lennart Björneborn 2001
4
infor-/biblio-/sciento-/cyber-/webo-/metrics
L. Björneborn & P. Ingwersen 2003
[Figure: overlapping fields labelled informetrics, bibliometrics, scientometrics, cybermetrics, webometrics]
5
corona model (Björneborn 2004)
SCC: Strongly Connected Component
IN: traversable to SCC
OUT: reachable from SCC
IN-Tendrils: connected from IN
Tube: connecting IN to OUT
OUT-Tendrils: connected to OUT
Disconnected

I constructed this corona model to show how the more than 7000 subsites were located in different graph components. The core of the model is the Strongly Connected Component, in which any pair of subsites can reach each other by directed link paths. I won’t describe the other components here. In the project I focused on the Strongly Connected Component. The corona model is modified after the bow-tie model developed by American scientists. I modified the model because the bow-tie does not show that many links actually point directly from the IN to the OUT component.

The IN component consists of subsites that can reach the SCC through link paths but cannot in turn be reached from the SCC. Correspondingly, subsites in the OUT component can be reached from the SCC but cannot reach back. Subsites in the Tendrils and Tube are connected with the IN and OUT components but cannot reach the SCC or be reached from it. The remaining Disconnected components are not connected in any way with the main corona.
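The component definitions above can be made concrete with a small sketch. It assumes that one node known to lie in the core is given (`core_node`, a hypothetical parameter) and it lumps Tendrils, Tube and Disconnected nodes together as `OTHER`. The toy link list and subsite names are invented for illustration and are not data from the project.

```python
from collections import defaultdict

def reachable(start, adjacency):
    """Return every node reachable from `start` along directed links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def corona_components(links, core_node):
    """Split nodes into SCC, IN, OUT and OTHER relative to `core_node`."""
    forward = defaultdict(set)   # source -> targets
    backward = defaultdict(set)  # target -> sources
    nodes = set()
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        nodes.update((src, dst))
    descendants = reachable(core_node, forward)   # reachable from the core
    ancestors = reachable(core_node, backward)    # able to reach the core
    scc = descendants & ancestors
    return {
        "SCC": scc,
        "IN": ancestors - scc,
        "OUT": descendants - scc,
        "OTHER": nodes - ancestors - descendants,  # tendrils, tube, disconnected
    }

# Toy link graph (hypothetical subsite names).
links = [("a", "b"), ("b", "c"), ("c", "a"),  # a, b, c reach each other: the core
         ("in1", "a"), ("c", "out1"),
         ("in1", "out1"),                     # a direct IN -> OUT link
         ("x", "y")]                          # a disconnected pair
parts = corona_components(links, core_node="a")
```

The direct `in1 -> out1` link is the kind of connection the corona model is meant to show and the bow-tie drawing hides.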
6
Source: www.cybergeography.org
7
Link terminology basic concepts
L. Björneborn & P. Ingwersen 2003
B has an outlink to C; outlinking: ~ reference
B has an inlink from A; inlinked: ~ citation
B has a selflink; selflinking: ~ self-citation
A has no inlinks; non-linked: ~ non-cited
E and F are reciprocally linked
A is transitively linked with H via B – D
H is reachable from A by a directed link path
A has a transversal link to G: short cut
C and D are co-linked from B, i.e. have co-inlinks or shared inlinks: co-citation
B and E are co-linking to D, i.e. have co-outlinks or shared outlinks: bibliographic coupling
[Figure: link graph with nodes A, B, C, D, E, F, G, H; co-links marked]
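As a rough illustration of the co-link concepts, the sketch below counts shared inlinks (the co-citation analogue) and shared outlinks (the bibliographic coupling analogue) for pairs of nodes. The edge list is an assumption reconstructed from the statements on the slide; the original figure may contain further links.

```python
from collections import Counter
from itertools import combinations

def co_inlink_counts(links):
    """For each pair of targets, count the sources that link to both."""
    targets_by_source = {}
    for src, dst in links:
        if src != dst:  # ignore selflinks
            targets_by_source.setdefault(src, set()).add(dst)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts

def co_outlink_counts(links):
    """For each pair of sources, count the targets that both link to."""
    # Reversing every link turns shared outlinks into shared inlinks.
    return co_inlink_counts([(dst, src) for src, dst in links])

# Edge list assumed from the slide text (A -> B, B selflink, and so on).
edges = [("A", "B"), ("B", "B"), ("B", "C"), ("B", "D"), ("D", "H"),
         ("E", "D"), ("E", "F"), ("F", "E"), ("A", "G")]

co_in = co_inlink_counts(edges)    # C and D share the inlinking node B
co_out = co_outlink_counts(edges)  # B and E share the outlink target D
```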
8
Levels of web nodes Lennart Björneborn 2002
3 basic levels of web nodes: pages, sites, TLDs
different levels of selflinks and outlinks:
  a = page selflink
  b = page outlink and site selflink
  c = site outlink and TLD selflink
  d = TLD outlink
more levels: frames (page sections), sub-sites, sub-TLDs ...
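The four link levels a–d can be sketched as a small classifier. It rests on two simplifying assumptions that are not from the slide: a "site" is taken to be the full host name, and the TLD is taken to be the last label of the host name. Sub-sites, frames and sub-TLDs are ignored, and all URLs in the example except www.db.dk/pi are hypothetical.

```python
from urllib.parse import urlsplit

def link_level(source_url, target_url):
    """Classify a link as level a, b, c or d (page / site / TLD)."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    src_host = (src.hostname or "").lower()
    dst_host = (dst.hostname or "").lower()
    if src_host == dst_host and src.path == dst.path:
        return "a"  # page selflink
    if src_host == dst_host:
        return "b"  # page outlink and site selflink
    # Assumption: the TLD is the last dot-separated label of the host.
    if src_host.rsplit(".", 1)[-1] == dst_host.rsplit(".", 1)[-1]:
        return "c"  # site outlink and TLD selflink
    return "d"      # TLD outlink

levels = [
    link_level("http://www.db.dk/pi/index.htm", "http://www.db.dk/pi/index.htm#top"),
    link_level("http://www.db.dk/pi/", "http://www.db.dk/lb/"),
    link_level("http://www.db.dk/pi/", "http://www.ku.dk/"),
    link_level("http://www.db.dk/pi/", "http://www.yahoo.com/"),
]
```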
9
Search engine analyses
See e.g. Judith Bar-Ilan’s excellent longitudinal analyses
Mike Thelwall et al. in several case studies
Scientific material on the Web:
  Lawrence & Giles (1999): approx. 6 % of Web sites contain scientific or educational contents
  Increasingly: the Web is a web of uncertainty
  Allen et al. (1999) – biology topics from 500 Web sites assessed for quality:
    46 % of sites were ”informative” – but:
    10-35 % inaccurate; % misleading
    48 % unreferenced
10
11
12
Ingwersen Knoxville 2010
13
Possible types of Web-IF:
E-journal Web-IF
  Calculated by in-links
  Calculated as traditional JIF (citations)
Scientific web site – IF (by link analyses)
  National – regional (some URL-problems in TLD)
  Institutions – single sites
  Other entities, e.g. domains
Best denominator: no. of staff, beds – or simply use external inlinks (Thelwall et al., 2002)
Blog IF: no. of external inlinks / blog entries
Twitter IF: no. of external inlinks / twitter entries (Holmberg, 2009)
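All of these variants share one shape: a count of external inlinks divided by some measure of size. A minimal sketch follows; the function name and all figures are hypothetical and chosen only to show the arithmetic.

```python
def web_impact_factor(external_inlinks, denominator):
    """External inlinks divided by a size measure (pages, entries, staff ...)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return external_inlinks / denominator

# Hypothetical figures, for illustration only.
blog_if = web_impact_factor(external_inlinks=120, denominator=40)   # per blog entry
staff_if = web_impact_factor(external_inlinks=120, denominator=60)  # per staff member
```

The same inlink count gives different values depending on the denominator, which is why the choice of denominator has to be stated alongside the result.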
14
The only valid webometric tool: Site Explorer Yahoo Search …
If one enters (old valid) commands like:
  Link:URL
  Domain: topdomain (edu, dk)
  Site:URL
you are transferred to:
Or find it via this URL
The same facilities are available in click-mode, as one starts with a given URL:
  Finding ‘all’ web pages in a site
  Finding ‘all’ inlinks to that site/those pages
  Also without selflinks! – this implies …
15
… to calculate Web Impact Factors
But one should be prudent in interpretations.
Note that external inlinks are the best indicator of recognition (see sample)
Take care how many sub-domains (and pages) are included in the click analysis.
Results can be downloaded
16
Consequences for Yahoo Site Expl.
Take care which domain level you are on: does not contain sub-domains like maps.yahoo.com – only those below its name directly. Yahoo.com will thus contain maps…
Also beware of the path structure
Minor tests suggest that the inlink number probably counts inlinks, not inlinking web pages.
17
Search sample: www.db.dk/pi
18
Without selflinks …
19
The Web-Impact Factor Ingwersen, 1998
Intuitively (naively?) believed to be similar to the Journal Impact Factor
Demonstrates recognition by other web sites – or simply impact – not necessarily quality
Central issue: are web sites similar to journals and web pages similar to articles?
Are in-links similar to citations – or simply road signs?
What is really calculated?
DEFINE WHAT YOU ARE CALCULATING: site or page IF
20
Web-links like citations?
Kleinberg (1998) between citation weights and Google’s PageRank:
  Hubs ~ review articles: have many outlinks (refs) to:
  Authority pages ~ influential (highly cited) documents: have many inlinks from Hubs!
Typical: Web index pages = homepage with self-inlinks = Table of contents
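The mutual reinforcement between hubs and authorities is usually computed by the iteration known as HITS. The sketch below is a minimal plain-Python version, not a reproduction of any particular implementation; the page names are hypothetical.

```python
def hits(links, iterations=50):
    """Iteratively score pages as hubs (good outlinks) and authorities (good inlinks)."""
    nodes = sorted({n for edge in links for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in links:
            auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in links:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Normalise both score vectors so the values stay bounded.
        for scores in (hub, auth):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical pages: an index page linking to three documents.
links = [("index", "p1"), ("index", "p2"), ("index", "p3"), ("other", "p1")]
hub, auth = hits(links)
top_hub = max(hub, key=hub.get)     # the index page, like a review article
top_auth = max(auth, key=auth.get)  # p1, the most inlinked document
```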
21
Reasons for outlinking …
Out-links mainly for functional purposes
Navigation – interest spaces…
Pointing to authority in certain domains? (Latour: rhetoric reasons for references-links)
Normative reasons for linking? (Merton)
Do we have negative links?
We do have non-linking (commercial sites)
22
Some additional reasons for providing links
In part analogous to providing references (recognition)
And, among others:
  emphasising one’s own position and relationship (professional, collaboration, self-presentation etc.)
  sharing knowledge, experience, associations …
  acknowledging support, sponsorship, assistance
  providing information for various purposes (commercial, scientific, education, entertainment)
  drawing attention to questions of individual or common interest and to information provided by others (the navigational purpose)
23
Other differences between references, citations & links
The time issue:
Aging of sources is different on the Web:
  Birth, Maturity & Obsolescence happen faster
  Decline & Death of sources occur too – but
  Marriages – Divorce – Re-marriage – Death & Resurrection … & similar liberal phenomena are found on the Web! (Wolfgang Glänzel)
24
Dataset usage indicators: a novel webometric approach
Biodiversity datasets are:
  Searchable
  Downloadable
  … in Open access
See e.g. GBIF website and 2009 publication:
Vishwas S Chavan and Peter Ingwersen, BMC Bioinformatics, 2009, 10(Suppl 14):S2
25
Example: Denmark – GBIF dataset providers
DanBioInfoFacility – many datasets
HerbariumUA: only two datasets
Comparable US dataset provider: OBIS – Ocean Bio Info System
26
Data Capture Procedure
Go to:
Click on ’Datasets’ (top right on page)
’Data Providers’ in alphabetical order. – Select, e.g. OBIS
You will get to
Below the map you observe OBIS’ 181 datasets.
Below that is a link: ’View Event log for Ocean …’ – Click on link
On log console select:
  Datasets: ALL
  Events: Usage - ALL – 3. Level: ALL
  Start date: 1st Jan
  End date: 31st Jan 2010
Click on ’REFRESH’
27
Data capture cont.
You obtain the logs of usage of all OBIS datasets
One may ’Download these logs’
From the logs one may obtain all the data necessary to create the Dataset Usage Indicators by importing them into Excel.
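The same aggregation can be done in a script instead of a spreadsheet. The sketch below is only an illustration: the layout of the downloaded logs is not described here, so the column names (`dataset`, `event`, `records`), the event labels and the sample rows are all hypothetical and would need to be adapted to the real export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log excerpt; the real export format may differ.
SAMPLE_LOG = """dataset,event,records
Dataset A,search,120
Dataset A,download,30
Dataset B,search,15
Dataset A,search,80
"""

def usage_totals(log_file):
    """Sum the record counts per dataset and per event type."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(log_file):
        totals[row["dataset"]][row["event"]] += int(row["records"])
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {dataset: dict(events) for dataset, events in totals.items()}

totals = usage_totals(io.StringIO(SAMPLE_LOG))
```

For a real file, `open(path, newline="")` can be passed in place of the `io.StringIO` object.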
28
DanBIF distribution of datasets – sample selection sorted by Search Events
Ingwersen, P. & Chavan, V. (under review): Indicators for a data usage index: an incentive for publishing primary biodiversity data through a global information infrastructure. BMC Bioinformatics.
29
Sample of Dataset Usage Indicators (DUI)
30
Issue tracking – Web mining
Adequate sampling requires knowledge of the structure and properties of the population – the Web space to be sampled
Issue tracking of known properties / issues may help
Web mining the unknown is more difficult, due to:
  the dynamic, distributed & diverse nature
  the variety of actors and minimum of standards
  the lack of quality control of contents
Web archeology – study of the past Web
31
Nielsen Blog Pulse – social utility indicator
Observes blogs worldwide by providing:
  Trend search – development over time of terms/concepts – user selection!
  Featured trends – predefined categories
  Conversation tracker – blog conversations
  BlogPulse profiles – blog profiles
Look into:
32
Home > Tools Trend Search
33
Informetric methods useful
Co-occurrence analyses (terms; names…)
Co-link and co-linking analyses
Bradford-like (skewed) distributions of links probably found in sectors of web space
In order to define the strong ties … between top-frequency web objects in two sectors of topical difference
Weak (low frequency) ties – Small-Worlds – Serendipity between objects in the two sectors: UNEXPECTED relations may occur
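A co-occurrence analysis of terms can be sketched in a few lines: count how often each pair of terms appears in the same document, then read the high counts as strong ties and the low counts as candidate weak ties. The term lists below are hypothetical, and reading a count of one as a "weak tie" is a simplification for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how many documents each unordered pair of terms shares."""
    counts = Counter()
    for terms in documents:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Hypothetical term lists, e.g. extracted from three blog posts.
posts = [
    ["webometrics", "links", "blogs"],
    ["links", "blogs"],
    ["webometrics", "citations"],
]
pairs = cooccurrence(posts)
strongest = pairs.most_common(1)[0]                     # most frequent pair
weak = sorted(p for p, n in pairs.items() if n == 1)    # low-frequency pairs
```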
34
Source: www.cybergeography.org
Weak Tie!
35
Concluding remarks
One should be somewhat cautious about Web-IF applications without careful sampling via robots, owing to their lack of comprehensiveness and the uncertainty about what they actually signify
One might also investigate further the behavioural aspects of providing and receiving links, to understand what the impact might mean and how/why links are made
Better to understand the Web space information structure
Design workable robots, downloading & local analyses
Move into the social media and open access genres with social utility indicators
36
Concluding remarks - 2
Issue tracking and web mining are: applications of Web IR / Webometrics
Combined IR and informetric methods seem promising:
  co-occurrence analyses – mapping – clustering
  co-links and co-linking – transversal links
Knowledge discovery and use in diversified web spaces
37
Additional References
Adamic, L. (1999). The small world Web. Lecture Notes in Computer Science, 1696:
Almind, T.C. and Ingwersen, P. Informetric analyses on the World Wide Web: Methodological approaches to ”Webometrics”. Journal of Documentation, 53 (1997),
Björneborn, L. (2001). Small-world linkage and co-linkage. Proceedings of the 12th ACM Conference on Hypertext, pp
Björneborn, L. & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1):
Björneborn, L. & Ingwersen, P. (2004). Towards a basic framework of webometrics. (submitted)
Broder, A. et al. (2000). Graph structure in the Web. Computer Networks, 33(1-6):
Chakrabarti, S. et al. (1999). Mining the Web’s link structure. IEEE Computer, 32(8):
Chavan, Vishwas S. & Ingwersen, P. (2009). Towards a data publishing framework for primary biodiversity data. BMC Bioinformatics, 10(Supp. 14): S2
Granovetter, M.S. (1973). The strength of weak ties. American Journal of Sociology, 78(6):
38
References -2
Ingwersen, P. The calculation of Web Impact Factors. Journal of Documentation, 54 (1998),
Kousha, K. & Thelwall, M. (2007). How is Science cited on the Web? A classification of Google unique Web citations. Journal of the American Society for Information Science and Technology, 58(11):
Matthews, R. (1998). Six degrees of separation. New Scientist, June 6.
Newman, M.E.J. (2001). The structure of scientific collaboration networks. PNAS, 98(2):
Rousseau, R. Daily time series of common single word searches in AltaVista and Northern Light. Cybermetrics, 2/3, paper 2. ISSN:
Small, H. (1999). A passage through science: crossing disciplinary boundaries. Library Trends, 48(1):
Swanson, D.R. (1986). Undiscovered public knowledge. Library Quarterly, 56(2):
Thelwall, M. Web impact factors and search engine coverage. Journal of Documentation, 56 (2000),
Watts, D. J. & Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393 (June 4):