Link analysis as a social science technique
Mike Thelwall
Statistical Cybermetrics Research Group, University of Wolverhampton, UK

Link Analysis Manifesto
Links are a wonderful new source of information about relationships between people, organisations and information, and an easy-to-collect data source. But results should be interpreted with care.

Talk Structure
Part 1: Academic link analysis, mainly from an information science perspective
Part 2: Software demonstration
Part 3: A social science link analysis methodology

Link Analysis: Motivation
Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators. Counts of large numbers of hyperlinks may reflect wider underlying social processes. Links may reflect phenomena that have previously been difficult to study, opening up new research areas (e.g., informal scholarly communication).

Part 1: Academic Hyperlink Analysis
Aim: to map patterns of communication between researchers in a country based upon university web sites. Patterns of communication are also mapped using journal citations or journal title words; such maps provide useful information about the structure and evolution of research fields and can identify previously unknown connections between fields. Web analysis could illustrate wider and more current patterns.

Data Collection
Web crawler.
AltaVista advanced queries, e.g., links from the University of Wolverhampton to the University of Oxford: domain:wlv.ac.uk AND linkdomain:ox.ac.uk
Google link queries, which find links to specific URLs, e.g., links to the Institute home page: link:
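A personal crawler, the first data source listed, has to pull the links out of each page it downloads and, for inter-site counting, discard links that stay on the same site. The sketch below shows that step using Python's standard-library `html.parser`. It assumes each host name counts as one site, which is a simplification (a study might instead treat a whole university domain such as wlv.ac.uk as one site). It is not SocSciBot's implementation, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inter_site_links(page_url, html):
    """Return absolute http(s) link targets whose host differs from page_url's.

    Assumption: one host name = one site.
    """
    parser = LinkExtractor()
    parser.feed(html)
    source_host = urlparse(page_url).hostname
    targets = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname != source_host:
            targets.append(absolute)
    return targets


page = """<html><body>
<a href="/staff.html">Staff</a>
<a href="http://www.ox.ac.uk/research/">Oxford research</a>
<a href="mailto:someone@example.org">Email</a>
</body></html>"""

print(inter_site_links("http://www.wlv.ac.uk/index.html", page))
# ['http://www.ox.ac.uk/research/']
```

The relative link resolves to another www.wlv.ac.uk page and the mailto: link is not a web link, so only the Oxford link survives as an inter-site link.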

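Types of link count
Direct link counts: inter-site links only.
Co-inlink counts: B and C are co-inlinked because a third page (A) links to both of them.
Co-outlink counts: D and E are co-outlinked because both link to a third page (F).
[Diagram: example network of pages A, B, C, D, E and F]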
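The three counts can be computed from a plain list of site-to-site links. This is a minimal sketch, assuming each hyperlink has already been reduced to a (source_site, target_site) pair. It counts a co-inlink once for each site linking to both members of a pair, and a co-outlink once for each site that both members link to. The site labels mirror the A to F diagram; `link_counts` is a hypothetical helper, not a published tool.

```python
from collections import Counter, defaultdict
from itertools import combinations


def link_counts(links):
    """Return direct, co-inlink and co-outlink counts for a list of links.

    links: one (source_site, target_site) pair per hyperlink.
    """
    # Direct link counts: inter-site links only, so drop links within a site.
    inter = [(s, t) for s, t in links if s != t]
    direct = Counter(inter)

    # Which sites each site links to, and is linked from (ignoring repeats).
    outlinks, inlinks = defaultdict(set), defaultdict(set)
    for s, t in set(inter):
        outlinks[s].add(t)
        inlinks[t].add(s)

    # Co-inlinks: one count for each site that links to both members of a pair.
    co_in = Counter()
    for targets in outlinks.values():
        co_in.update(combinations(sorted(targets), 2))

    # Co-outlinks: one count for each site that both members of a pair link to.
    co_out = Counter()
    for sources in inlinks.values():
        co_out.update(combinations(sorted(sources), 2))

    return direct, co_in, co_out


links = [("A", "B"), ("A", "B"), ("A", "C"), ("D", "F"), ("E", "F"), ("A", "A")]
direct, co_in, co_out = link_counts(links)
print(direct[("A", "B")], co_in[("B", "C")], co_out[("D", "E")])  # 2 1 1
```

Because co-inlinks and co-outlinks only need a shared third site, they connect far more pairs than direct links do, which is the effect the later "Stretching links further" slide reports.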

Alternative Document Models
A method to ignore multiple similar links, e.g., the domain ADM counts links between domains instead of between pages.
[Diagram: pages P1 to P6]
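One way to apply an ADM is to map every page URL to a coarser "document" and then count each linked pair of documents once. The sketch below assumes three levels (page, directory, domain) and assumes that a page's directory is its path up to the last "/"; both the rule and the helper names are illustrative, not a definitive specification.

```python
from urllib.parse import urlparse


def to_adm(url, level):
    """Map a page URL to its 'document' under a page, directory or domain ADM."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if level == "page":
        return host + parsed.path
    if level == "directory":
        # Assumed rule: a page belongs to the directory ending at its last "/".
        return host + parsed.path.rsplit("/", 1)[0] + "/"
    if level == "domain":
        return host
    raise ValueError(f"unknown ADM level: {level!r}")


def adm_link_count(page_links, level):
    """Count linked document pairs, so repeated links between them count once."""
    pairs = set()
    for source, target in page_links:
        a, b = to_adm(source, level), to_adm(target, level)
        if a != b:  # ignore links that stay inside one document
            pairs.add((a, b))
    return len(pairs)


page_links = [
    ("http://www.wlv.ac.uk/a/p1.html", "http://www.ox.ac.uk/x/p4.html"),
    ("http://www.wlv.ac.uk/a/p2.html", "http://www.ox.ac.uk/x/p4.html"),
    ("http://www.wlv.ac.uk/b/p3.html", "http://www.ox.ac.uk/y/p5.html"),
    ("http://www.wlv.ac.uk/b/p3.html", "http://www.ox.ac.uk/y/p6.html"),
]
for level in ("page", "directory", "domain"):
    print(level, adm_link_count(page_links, level))
# page 4
# directory 2
# domain 1
```

The same four page-level links shrink to two directory links and one domain link, so a cluster of near-duplicate links from one part of a site no longer dominates the count.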

Some Inter-University Hyperlink Patterns
Mainly for the UK and Europe

Citation-Style Hyperlink Analysis
Citation counts are known to be reasonable indicators of research quality, but is the same true for inlink counts? Counts of links to universities within a country can correlate significantly with measures of research productivity. The significance of this result is that it gives ‘permission’ to investigate the use of inter-university links for researching scholarly communication.
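A correlation test of this kind pairs each university's inlink count with its productivity score. Link counts are typically highly skewed, so a rank correlation is one reasonable choice; the sketch below computes Spearman's rho (Pearson correlation of average ranks, so ties are handled). The slide does not say which test the original studies used, and the five universities' figures are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (sx * sy)


inlinks = [120, 45, 300, 80, 15]          # hypothetical inlink counts
productivity = [2.1, 1.0, 3.5, 0.9, 0.4]  # hypothetical productivity scores
print(round(spearman(inlinks, productivity), 3))  # 0.9
```

A significant positive rho supports using the links as a data source, but, as the next slides stress, it does not show that links measure research quality.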

Most links are only loosely related to research
90% of links between UK university sites have some connection with scholarly activity, including teaching and research, but fewer than 1% are equivalent to citations. So link counts do not measure research dissemination; they are more a natural by-product of scholarly activity. Link counts cannot be used to assess research, but they can be used to track an aspect of communication.

Links to UK universities against their research productivity
The reason for the strong correlation is the quantity of Web publication, not its quality. This is different from citation analysis.

Universities tend to link to neighbours

Universities cluster geographically

Language is a factor in international interlinking
English is the dominant language for Web sites in the Western EU. In a typical country, 50% of pages are in the national language(s) and 50% in English. Non-English-speaking countries extensively interlink in English. (Research with Rong Tang & Liz Price)

Can map patterns of international communication
Counts of links between EU universities in Swedish are represented by arrow thickness.

Counts of links between EU universities in French are represented by arrow thickness.

Which language???

Linking patterns vary enormously by discipline
There is no evidence of a significant geographic trend. There are disciplinary differences in the extent of interlinking, e.g., Web use is very low in history but very high in chemistry. Individual research projects can have an enormous impact upon individual departments, e.g., arts web sites are often for specific exhibitions or for digital media projects. Links are not frequent enough to reliably reveal patterns of interdisciplinarity.

The next slide is a (Kamada-Kawai) network of the interlinking of the “top” 5 universities in AEAN countries (Asia and Europe). Arrows represent at least 100 links, and universities that are not connected have been removed. (Research with Han Woo Park)

Clustering using links

Background: Power laws in Academic Webs
Academic Webs have a topology dominated by power laws, including counts of links to pages (inlink counts), counts of links from pages (outlink counts) and the sizes of groups of interconnected pages. Power laws mean that link creation obeys the ‘rich get richer’ law, and that “communities” of pages or sites are rarely pure but tend to overlap with several others.
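The ‘rich get richer’ idea can be illustrated with a toy preferential-attachment model: each new page links to one existing page chosen with probability proportional to its inlinks plus one. This is a standard textbook model used here as an assumption, not a model of how academic webs were actually measured. It nevertheless reproduces the typical shape: most pages get no inlinks while a few collect many.

```python
import random
from collections import Counter


def simulate_rich_get_richer(n_pages=2000, seed=1):
    """Toy preferential attachment: each new page links to one earlier page,
    chosen with probability proportional to (its inlinks + 1)."""
    rng = random.Random(seed)
    inlinks = [0]
    # Each page appears once in 'pool' plus once per inlink it has received,
    # so a uniform draw from 'pool' picks a page with probability ~ inlinks + 1.
    pool = [0]
    for new_page in range(1, n_pages):
        target = rng.choice(pool)
        inlinks[target] += 1
        pool.append(target)      # the target's weight grows by one
        inlinks.append(0)
        pool.append(new_page)    # the new page starts with weight one
    return inlinks


inlinks = simulate_rich_get_richer()
distribution = Counter(inlinks)  # inlink count -> number of pages with that count
for k in sorted(distribution)[:5]:
    print(k, distribution[k])
print("most inlinks to one page:", max(inlinks))
```

Plotting `distribution` on log-log axes gives a roughly straight line, the usual visual signature of a power law.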

Page Outlinks

Topological component sizes: “pure link communities”

Community Identification Algorithm: “impure communities”
Can be applied to pages, directories and domains, giving complementary results: a “layered approach”.

Stretching links further: co-inlinks, co-outlinks
More interlinked does not imply more similar. For the UK academic Web, about 42% of domains connected by links alone host similar disciplines, as do about 43% of domains connected by links, co-inlinks and co-outlinks. Any type of link can be used to look for similar sites, and over 100 times more domains are co-inlinked or co-outlinked than are directly linked. However, links in any form are less than 50% reliable as indicators of subject similarity.
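Percentages like these are shares of connected pairs whose two domains host a similar discipline. A minimal sketch of the calculation follows, assuming each domain already carries a subject label (e.g., from manual classification). The domain names and labels are invented, and exact label equality stands in for "similar disciplines", which is a simplification.

```python
def similarity_share(pairs, discipline):
    """Proportion of connected domain pairs whose two domains share a discipline.

    pairs: (domain_a, domain_b) pairs connected by a link, co-inlink or co-outlink.
    discipline: domain -> subject label, e.g. from manual classification.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no connected pairs to assess")
    same = sum(1 for a, b in pairs if discipline[a] == discipline[b])
    return same / len(pairs)


discipline = {"chem-1": "chemistry", "chem-2": "chemistry",
              "hist-1": "history", "hist-2": "history"}
pairs = [("chem-1", "chem-2"), ("chem-1", "hist-1"),
         ("hist-1", "hist-2"), ("chem-2", "hist-2")]
print(similarity_share(pairs, discipline))  # 0.5
```

Running the same function separately on directly linked, co-inlinked and co-outlinked pairs allows the reliability of each link type to be compared.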

Summary
Studies of the relatively restricted subdomain of university web sites produce direct research results. For Web information retrieval (e.g., search engines), they also help refine methodologies and help build intuition about web structure.

Part 2: Software Demonstration
SocSciBot: web crawler for social sciences research.
SocSciBot Tools: link analyser for SocSciBot data.
Cyclist: search engine with some corpus linguistics capability (e.g., word frequency lists for each site).
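A per-site word frequency list like the one mentioned for Cyclist simply keeps one counter per site. The sketch below is a generic illustration under that assumption, not Cyclist's actual method. It uses a crude tokeniser (runs of letters, lower-cased) and invented page texts; `site_word_frequencies` is a hypothetical name.

```python
import re
from collections import Counter


def site_word_frequencies(pages):
    """Build one word frequency list per site from (site, page_text) pairs."""
    freqs = {}
    for site, text in pages:
        words = re.findall(r"[a-z]+", text.lower())  # crude tokeniser: letter runs
        freqs.setdefault(site, Counter()).update(words)
    return freqs


pages = [
    ("wlv.ac.uk", "Link analysis and link counts"),
    ("wlv.ac.uk", "Counts of links"),
    ("ox.ac.uk", "Research links"),
]
freqs = site_word_frequencies(pages)
print(freqs["wlv.ac.uk"].most_common(2))  # [('link', 2), ('counts', 2)]
```

A real corpus tool would also need proper tokenisation, stop-word handling and possibly stemming (here "link" and "links" are counted separately).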

Part 3: A General Social Science Link Analysis Methodology
A general framework for using link counts in social sciences research, either for research into link creation or, together with other sources, for research into other online or offline phenomena. It is applicable when there are enough links relevant to the research question to count, either for collections of large web sites or for large collections of small web sites.

Nine stages for a research project
1. Formulate an appropriate research question, taking into account existing knowledge of web structure.
2. Conduct a pilot study.
3. Identify web pages or sites that are appropriate to address the research question.
4. Collect link data from a commercial search engine or a personal crawler, taking appropriate accuracy safeguards.
5. Apply data cleansing techniques to the links, if possible, and select an appropriate counting method.
6. Partially validate the link count results through correlation tests, if possible.
7. Partially validate the interpretation of the results through a link classification exercise.
8. Report results with an interpretation consistent with the link classification exercise, including either a detailed description of the classification or exemplars to illustrate the categories.
9. Report the limitations of the study and the parameters used in data collection and processing.

Interpreting link counts
For most research, researchers need to be able to place an interpretation on link counts, e.g., “A links to B more than to C, therefore…” or “A is inlinked more than B, therefore…”. Do links ‘measure’ visibility, luminosity, authority, information exports/imports, communication, impact, online impact, quality, importance, interpersonal communication, nothing, random actions,…?

Classifying random samples of links can help decide how to interpret them, e.g., “Links predominantly reflect…”. Correlation tests are also useful as a form of triangulation, e.g., “Link counts associate with…”.
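A link classification exercise starts from a random sample of links, which a human then codes by apparent creation motivation. The sketch below draws a reproducible simple random sample without replacement and summarises the coded categories with an approximate 95% margin of error. The link list, category names and coded counts are all invented, and the normal-approximation margin is one simple choice that becomes rough for small samples or very rare categories.

```python
import random
from collections import Counter
from math import sqrt


def sample_for_classification(links, k, seed=0):
    """Draw a simple random sample of k links, without replacement, for manual coding."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(list(links), k)


def category_shares(labels):
    """Share of each category, with an approximate 95% margin of error
    (normal approximation to the binomial; rough for small samples)."""
    n = len(labels)
    shares = {}
    for category, count in Counter(labels).items():
        p = count / n
        shares[category] = (p, 1.96 * sqrt(p * (1 - p) / n))
    return shares


links = [(f"source{i}", f"target{i}") for i in range(500)]
sample = sample_for_classification(links, 100)
# Hypothetical categories a human coder might assign to the 100 sampled links.
labels = ["research"] * 30 + ["teaching"] * 50 + ["other"] * 20
for category, (share, margin) in category_shares(labels).items():
    print(f"{category}: {share:.0%} +/- {margin:.0%}")
```

Reporting the margin alongside each share makes it clearer how far a claim such as "links predominantly reflect…" is supported by the sample size.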

The theoretical perspective for link counting
In order to be able to reliably interpret link counts, all links should be created individually and independently, by humans, through equivalent gravity judgments (e.g., about the quality of the information in the target page). Additionally, links to a site should target pages created by the site owner or somebody else closely associated with the site.

Summary
Link counts are an information source that may reveal new insights into online and offline phenomena. They can be used in conjunction with other data sources to address many research questions and, with existing tools, are relatively easy to use in research.