An Overview of Link Analysis Techniques for Academic Web Sites Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK.


Funded by the European Union WISER Project (Web indicators for scientific, technological and innovation research).

Contents
1. Data collection
2. Data processing
3. Analysis
4. Results

Why analyse university link structures?
- Analogies with citation studies
- Ensure that the Web is efficiently used for research communication
- Identify trends in informal scholarly communication
- Suggest improvements in search tools
- Exploratory research: the Web is important and a valid object for scientific study

Methodologies: Data collection
- Web crawler
- Google: does not support an adequate level of Boolean querying
- AllTheWeb advanced queries
- AltaVista advanced queries, e.g. host:wlv.ac.uk AND link:edu.cn (results of this query are on the next page…)
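The query on this slide can be generated for many source/target pairs. The sketch below only builds query strings in the host:/link: syntax shown above; it does not contact any search engine, and the function names `link_query` and `queries_for` are hypothetical helpers introduced for illustration.

```python
from itertools import product


def link_query(source_domain: str, target_domain: str) -> str:
    """Build a query for pages hosted under source_domain that link to target_domain.

    Uses the host:/link: syntax from the slide; other engines may differ.
    """
    return f"host:{source_domain} AND link:{target_domain}"


def queries_for(sources, targets):
    """Return one query per ordered (source, target) pair, skipping self-pairs."""
    return {
        (s, t): link_query(s, t)
        for s, t in product(sources, targets)
        if s != t
    }


# The example from the slide:
print(link_query("wlv.ac.uk", "edu.cn"))  # host:wlv.ac.uk AND link:edu.cn
```

Skipping self-pairs reflects the later restriction to inter-site links.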

host:wlv.ac.uk AND link:edu.cn

Results shown: Yunnan Agricultural University; Shanghai University; Dalian University of Foreign Languages

Methodologies: Data processing 1
- Link counts to target universities
- Inter-site links only
- Colink counts: B and C are colinked
- Couplings: D and E are coupled
[Diagram: sites A, B, C, D, E, F]
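The three counts on this slide can be sketched in a few lines. The link arrangement below is an assumption, since the diagram did not survive in the transcript: A is taken to link to both B and C (making them colinked), and D and E are both taken to link to F (making them coupled). The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed inter-site links as (source, target) pairs; not taken from the slide.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]


def inlink_counts(links):
    """Count inter-site links received by each target (self-links ignored)."""
    return Counter(t for s, t in links if s != t)


def _group(links, key_index):
    """Map each site at key_index to the set of sites at the other end."""
    grouped = {}
    for pair in links:
        if pair[0] != pair[1]:
            grouped.setdefault(pair[key_index], set()).add(pair[1 - key_index])
    return grouped


def colinked_pairs(links):
    """Pairs of sites that are both linked to by the same third site."""
    pairs = set()
    for targets in _group(links, 0).values():
        pairs.update(combinations(sorted(targets), 2))
    return pairs


def coupled_pairs(links):
    """Pairs of sites that both link to the same third site."""
    pairs = set()
    for sources in _group(links, 1).values():
        pairs.update(combinations(sorted(sources), 2))
    return pairs


print(inlink_counts(links))
print(colinked_pairs(links))   # {('B', 'C')}
print(coupled_pairs(links))    # {('D', 'E')}
```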

Methodologies: Data processing 2
- Alternative Document Models
- E.g. count links between domains (ignoring multiple links) instead of pages
[Diagram: pages P1, P2, P3, P4, P5, P6]
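As a minimal sketch of the domain-level counting described here, the example below counts the same set of links twice: once with pages as the unit and once with domains as the unit, so that repeated links between the same pair of domains collapse to one. The URLs are hypothetical, and treating a URL's hostname as its "domain" is an assumption made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical page-level links as (source URL, target URL) pairs.
page_links = [
    ("http://a.example.ac.uk/p1.html", "http://b.example.ac.uk/p4.html"),
    ("http://a.example.ac.uk/p2.html", "http://b.example.ac.uk/p5.html"),
    ("http://a.example.ac.uk/p3.html", "http://b.example.ac.uk/p5.html"),
]


def page_of(url):
    """Page-level document model: each URL is its own unit."""
    return url


def domain_of(url):
    """Domain-level document model: group pages by hostname (an assumption)."""
    return urlsplit(url).hostname


def count_links(links, unit=page_of):
    """Count distinct links between units, ignoring duplicates and self-links."""
    return len({(unit(s), unit(t)) for s, t in links if unit(s) != unit(t)})


print(count_links(page_links))                  # 3 page-to-page links
print(count_links(page_links, unit=domain_of))  # 1 domain-to-domain link
```

Passing the unit as a function keeps the counting logic identical across document models, so other groupings can be swapped in without touching `count_links`.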

Methodologies: Data analysis
- Statistical techniques for evaluating results
- Correlation with known research performance measures
- Factor analysis, Multi-Dimensional Scaling and Cluster analysis for patterns
- Simple graphical techniques
- Techniques from Communication Networks research / Geography
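The correlation step can be illustrated with a rank correlation. Spearman's coefficient is used here as one common choice for skewed count data; the slides do not say which coefficient was used, so this is an assumption. The numbers are made-up placeholders, not results from the research.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only: inlink counts and a research score per university.
inlinks = [120, 450, 80, 300]
research_score = [2.1, 5.3, 2.4, 4.0]
print(spearman(inlinks, research_score))
```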

Results section 1 – Patterns of links between university Web sites

Results 1: Links associate with research
- Counts of links to universities within a country can correlate significantly with measures of research productivity

Links to UK universities counted by domain

Results 2: Links between universities in a country can be related to geography

Results 3: Universities cluster by geographic region
- This is clearest for Scotland but also holds for other groupings, including Manchester-based universities
- Coherent clusters are difficult to extract because of overlapping trends

A pathfinder network of UK university interlinking with geographic clusters indicated

Results section 2: Links and subject areas

Results 4: Links to departments associate with research
- In the US, links to chemistry and psychology departments from other departments associate with total research impact
- No evidence of a significant geographic trend
- Disciplinary differences in the extent of interlinking: history Web use is very low
{Research with Rong Tang}

Results 5: Links for precision, colinks and couplings for recall
- For the UK academic Web, about 42% of domains connected by links alone are similar, as are about 43% of those connected by links, colinks and couplings
- But over 100 times more domains are colinked or coupled than are directly linked
- Colinks and couplings can help with the task of finding additional subject-based pages

Results 6: Most links are only loosely related to research
- A random sample of links between UK university sites revealed that over 90% had some connection with scholarly activity, including teaching and research
- Fewer than 1% were equivalent to citations

Results section 3: International academic links

Results 7: Linguistic factors in EU communication
- English is the dominant language for Web sites in the Western EU
- In a typical country, 50% of pages are in the national language(s) and 50% in English
- Non-English-speaking countries interlink extensively in English
{Research with Rong Tang}

Results 8: Can map patterns of international communication
- Counts of links between Asia-Pacific universities are represented by arrow thickness.
{Research with Alastair Smith, VUW, NZ}

The future
Results of research leading into:
- Improved Web-related policy making
- Improved Web information retrieval algorithms
- Improved understanding of informal scholarly communication on the Web
- More effective use of the Web by scholars, e.g. via PhD training