Analysing the link structures of the Web sites of national university systems
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
Why analyse university link structures?
  Obtain evidence of the online impact of work
  Identify trends in informal scholarly communication
  Basic research: the Web is important and a valid object for scientific study
Methodology: Data collection
  Web crawler
  AltaVista advanced queries
    host:york.ac.uk AND link:wlv.ac.uk
  AllTheWeb advanced queries
  Google
    Does not support the same level of Boolean querying
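The query pattern on this slide can be generated for every ordered pair of universities. The sketch below only builds the query strings. The function names are hypothetical, and it assumes the `host:` / `link:` syntax shown above. It does not contact any search engine.

```python
from itertools import permutations


def link_query(source_domain: str, target_domain: str) -> str:
    """Query for pages hosted on source_domain that link to target_domain."""
    return f"host:{source_domain} AND link:{target_domain}"


def all_pair_queries(domains):
    """One query per ordered pair of distinct domains (hypothetical helper)."""
    return {(s, t): link_query(s, t) for s, t in permutations(domains, 2)}


if __name__ == "__main__":
    for pair, query in all_pair_queries(["york.ac.uk", "wlv.ac.uk"]).items():
        print(pair, query)
```

For n universities this gives n(n-1) queries, one per direction of each pair.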
Methodology: Data analysis 1
  Link counts to target universities
  Inter-site links only
  Colink counts
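A minimal sketch of these counts, working from a list of (source URL, target URL) pairs such as a crawler might produce. Two assumptions are made here and are not from the slides: a university site is identified by the last three labels of the hostname (which suits names like wlv.ac.uk but not every country), and a colink between two sites is read as one page on a third site linking to both.

```python
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Assumed site identifier: last three hostname labels, e.g. wlv.ac.uk."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-3:])


def inter_site_links(links):
    """Keep only links whose source and target are on different sites."""
    return [(s, t) for s, t in links if site_of(s) != site_of(t)]


def inlink_counts(links):
    """Number of inter-site links pointing at each target site."""
    return Counter(site_of(t) for _, t in inter_site_links(links))


def colink_counts(links):
    """For each pair of sites, count the pages elsewhere that link to both."""
    targets_by_page = defaultdict(set)
    for source, target in inter_site_links(links):
        targets_by_page[source].add(site_of(target))
    counts = Counter()
    for sites in targets_by_page.values():
        for pair in combinations(sorted(sites), 2):
            counts[pair] += 1
    return counts
```

Site self-links are dropped first, so both counts reflect inter-site links only.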
Methodology: Data analysis 2
  Alternative Document Models (ADMs)
    Aggregate Web pages into documents based upon sites, domains or directories for link counting
    Can’t be done easily from search engine data
    Produces better results in some situations than simple link counting
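One way to picture the aggregation step: map each URL to a "document" at the chosen level, then count each distinct source-document to target-document link once. This is a sketch under assumptions, not the exact procedure behind the results. The level names and the three-label site rule are illustrative choices.

```python
from collections import Counter
from urllib.parse import urlsplit


def document_id(url: str, level: str) -> str:
    """Map a URL to its document under the given aggregation level."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    if level == "page":
        return host + path
    if level == "directory":
        return host + path.rsplit("/", 1)[0] + "/"
    if level == "domain":
        return host
    if level == "site":
        # Assumption: a university site is the last three hostname labels.
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown level: {level}")


def adm_inlink_counts(links, level: str):
    """Inter-site inlinks per target site, counting each document pair once."""
    unique = set()
    for source, target in links:
        target_site = document_id(target, "site")
        if document_id(source, "site") == target_site:
            continue  # ignore links within one university site
        unique.add((document_id(source, level), document_id(target, level), target_site))
    return Counter(site for _, _, site in unique)
```

With the page level, two pages in the same directory linking to the same target count twice. With the directory level they count once, which damps the effect of heavily replicated links.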
Methodology: Data analysis 3
  Statistical techniques for evaluating results
    Correlation with known research performance measures
    Factor analysis, Multi-Dimensional Scaling, Cluster analysis for patterns
    Techniques from Communication Networks research
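The correlation step can be illustrated with a rank correlation between inlink counts and a research score. The slides do not say which coefficient was used. Spearman's is an assumption here, chosen because link counts tend to be highly skewed. The sketch uses only the standard library.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Calling `spearman(inlink_counts, research_scores)` on two lists in the same university order gives a value between -1 and 1. Testing it for significance needs a further step not shown here.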
Methodology: Data analysis 4
  Simple graphical techniques
    Display linkages above a certain threshold
  Community identification techniques from computer science
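The thresholding idea can be sketched as a filter over pairwise link counts, followed by a text export that a graph-drawing tool could render. The function names are hypothetical, and the choice of Graphviz DOT as the output format is an assumption made for illustration.

```python
def edges_above(link_counts, threshold):
    """Keep only site pairs whose link count is strictly above the threshold."""
    return {pair: n for pair, n in link_counts.items() if n > threshold}


def to_dot(edges):
    """Render directed, count-labelled edges as Graphviz DOT text."""
    lines = ["digraph links {"]
    for (source, target), n in sorted(edges.items()):
        lines.append(f'  "{source}" -> "{target}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative numbers only, not real data.
    counts = {("york.ac.uk", "wlv.ac.uk"): 12, ("wlv.ac.uk", "york.ac.uk"): 3}
    print(to_dot(edges_above(counts, 5)))
```

Raising the threshold thins the diagram until only the strongest linkages remain.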
Results 1: Links are associated with research
  Counts of links to UK, Australian and Taiwanese universities correlate significantly with measures of research productivity
  Counts of links in China appear not to
  Results are better with ADMs for the UK but not for Taiwan
Results 2: Most links are only loosely related to research
  A random sample of links between UK university sites revealed that over 90% had some connection with scholarly activity, including teaching and research
  Less than 1% were equivalent to citations
Results 3: Links are related to geography
  Interlinking between universities in the UK decreases with geographic distance
Results 4: Universities cluster by geographic region
  This is clearest for Scotland but also holds for other groupings, including Manchester-based universities
  Coherent clusters are difficult to extract because of overlapping trends
Results 5: Linguistic factors in EU communication
  English is the dominant language for Web sites in the Western EU
  In a typical country, 50% of pages are in the national language(s) and 50% in English
  Non-English-speaking countries interlink extensively in English
  {Research with Rong Tang, SUNY Albany}
Results 6: Power laws in the Web
  Academic Webs have a topology dominated by power laws, including
    Inlink counts
    Outlink counts
    Directed component sizes
    Undirected component sizes
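A quick way to check whether a set of inlink counts looks like a power law is to fit a straight line to the frequency distribution on log-log axes. The sketch below does this with ordinary least squares. That is a rough diagnostic chosen for brevity, not necessarily the method behind these results, and maximum-likelihood fitting is generally more reliable.

```python
from collections import Counter
from math import log


def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(count).

    For data following frequency ~ count**(-a), the slope is about -a.
    Zero counts are dropped because log(0) is undefined.
    """
    freq = Counter(c for c in counts if c > 0)
    xs = [log(k) for k in freq]
    ys = [log(freq[k]) for k in freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The same function applies to outlink counts or component sizes, since each is just a list of positive integers.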
Results 7: Academic Web Topology
Criticism
  What do the statistics mean?
  A variety of factors influence link creation, mainly informal
  About 90% of inter-site links have some connection to research
  Links are an informal scholarly communication soup, from which patterns can be sieved out
The future
  Results of research leading into:
    Improved Web-related policy making in the EU
    Improved Web information retrieval algorithms
    Improved understanding of informal scholarly communication on the Web
  It is easy to get some statistics, but very hard to get meaningful statistics