Bibliometrics and Scientometrics: Tools and Techniques By Dr. Samir Kumar Jalal Deputy Librarian Central Library, IIT Kharagpur.


1 Bibliometrics and Scientometrics: Tools and Techniques By Dr. Samir Kumar Jalal Deputy Librarian Central Library, IIT Kharagpur

2 Contents Part-I – Bibliometric Laws – Bibliometric Techniques – Bibliometric Indicators Part-II – Statistical Techniques for Data Analysis Part-III – OSS for Scientometric Analysis of Faculty Publications

3 Definitions Bibliometrics – Pritchard said that bibliometrics is the application of mathematics and statistical methods to books and other media of communication. Scientometrics – Nalimov and Mulchenko defined scientometrics as “the application of quantitative methods with the analysis of science viewed as an information process” – Scientometrics is the study of measuring and analysing science, technology and innovation

4 Definitions Informetrics: “the study of the quantitative aspects of information in any form, not just records, or bibliographies, and in any social group, not just scientists” (Tague-Sutcliffe, 1992). Cybermetrics: covers webometrics together with quantitative studies of the non-web parts of the Internet, such as e-mail and discussion groups. Webometrics: the quantitative analysis of web phenomena, drawing upon informetric methods. It includes webpage content analysis, web link structure analysis, web technology analysis and web usage analysis.

5 Types of Bibliometrics Descriptive: It implies the analysis of documents using simple bibliometric indicators. Example: simple counting of publications against authors. Evaluative: It uses citation techniques to assess the impact of scholarly works. Example: comparison of the contribution of two authors. Relational: Bibliometric methods to examine relations in science through ISI data. Example: co-citation as a measure of similarity and co-authorship pattern

6 Relation between Bibliometrics, Scientometrics, Informetrics, Cybermetrics and Webometrics Fig-1: The relation among Bibliometrics, Informetrics, Scientometrics, Cybermetrics and Webometrics explained in the above diagram (Björneborn & Ingwersen, 2004).

7 Bibliometric Laws Bradford’s Law: Law of Scattering – Bradford's Law serves as a general guideline to librarians in determining the number of core journals in any given field. Lotka’s Law: Scientific productivity – Lotka's Law describes the frequency of publication by authors in a given field. Zipf’s Law: Word Frequency – Zipf's Law is often used to predict the frequency of words within a text.

8 Bradford’s Law Bradford’s Law may be stated as follows: “If scientific journals are arranged in order of decreasing productivity on a given subject, they may be divided into a nucleus of journals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus when the numbers of periodicals in the nucleus and the succeeding zones will be as $1 : b : b^2 : \dots$” (Bradford, Engineering, 1934)

9 Lotka’s Law

10 How to build Lotka’s Law?

Pair   Articles (x)   Authors observed (y)   X = log x   Y = log y   XY      X²
1      30             1                      1.477       0           0       2.1815
2      24             2                      1.380       0.30        0.414   1.904
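
The log-transformed columns (X, Y, XY, X²) are the ingredients of a least-squares line fit. A minimal sketch, assuming the usual form of Lotka's Law, y = C / x^n, where x is the number of articles per author and y the number of authors: taking logs gives log y = log C − n log x, so the exponent n is the negated slope. The function name `fit_lotka` and the sample data are hypothetical and chosen only for illustration.

```python
import math


def fit_lotka(pairs):
    """Fit log10(y) = log10(C) - n * log10(x) by ordinary least squares.

    pairs: list of (x, y) with x = articles per author, y = number of authors.
    Returns the tuple (n, C).
    """
    X = [math.log10(x) for x, _ in pairs]
    Y = [math.log10(y) for _, y in pairs]
    N = len(pairs)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))   # the XY column
    sum_x2 = sum(a * a for a in X)              # the X² column
    slope = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / N
    return -slope, 10 ** intercept


# Hypothetical data that follows y = 100 / x^2 exactly (not from the slides).
sample = [(1, 100), (2, 25), (4, 6.25), (5, 4)]
n, C = fit_lotka(sample)
```

With real author counts the fit will not be exact, and the estimated n is usually compared with Lotka's original value of about 2.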

11 Zipf’s Law

12 Bibliometric Techniques Publication Counts Citation Counts Citations Per Publication (CPP) Literature Usage Counts Impact Per Publications (IPP) Citation Analysis Bibliographic Coupling Co-citation Analysis Co-word Analysis

13 Publication Counts Paper counts, which measure productivity, are the most basic bibliometric measure and provide the raw data for all citation analysis. Ranking institutions in terms of paper counts helps to compare the productivity and volume of research output among various institutions. The number of researchers at an institution should be taken into account when comparing publication counts across institutions. Characteristics of the papers, such as document type, publication year, and categorization method, should also be considered.

14 Citation Counts A citation count is the number of citations that a particular journal or article receives during a particular period. The citation count technique is applied to determine how many citations are received by – A document – An author over a period of time – An institution

15 Citation Counts: Type of Study Authorship study Type of document Used Ranking of Journals Self-citation study Obsolescence study Half-life study

16 Authorship Study Authors may write articles independently or in collaboration. The study focuses on how single-author publications are cited in comparison with collaborative works. – Rate of single author citation – Rate of multi-author citation – Verification of Lotka’s Law

17 Ranking of Journals Ranking of Journals can be done in three different ways: – Ranking journals by no. of citations; – Ranking of journals by IF; – Ranking of Journals by Immediacy Index

18 Impact Factor (IF) The impact factor was devised by Eugene Garfield, the founder of the Institute for Scientific Information. For the year 2015, IF can be calculated as $IF = A / B$, where A = no. of citations received in 2015 by the articles that Journal X published in 2013 and 2014; B = total number of citable articles published by Journal X in 2013 and 2014. http://www.citefactor.org/journal-impact-factor-list-2014.html
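
The ratio of A to B can be sketched in a few lines. This is only an illustration: the function name `impact_factor` and the counts used below are hypothetical, and the two-year window follows the definitions of A and B given on this slide.

```python
def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Return A / B for a journal.

    citations_in_year: A, citations received this year to items published
        in the two preceding years.
    citable_items_prev_two_years: B, citable items published in those two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("the journal must have at least one citable item")
    return citations_in_year / citable_items_prev_two_years


# Hypothetical journal: 600 citations in 2015 to 240 items from 2013 and 2014.
example_if = impact_factor(600, 240)
```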

19 Immediacy Index (I)

20 Half-Life The half-life of the literature is the time by which one half of the currently published literature becomes obsolete. Example: a resource was acquired by a library in 1995, and its circulation was recorded over a period of 16 years. The book was issued 83 times in total, so the median (half of the total, rounded up) is 42 issues. The cumulative number of issues reaches 42 in the year 2000, and 2000 minus 1995 equals five years. So the half-life of the item acquired in 1995 is 5 years.

21 Period: 1995-2010 i.e. 16 years; Median Use: 60; Half life= 4 years
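
The cumulative-use procedure for half-life can be sketched as follows. The yearly issue counts are hypothetical, chosen only so that they total 83 over 1995-2010; the function name `usage_half_life` is likewise an assumption.

```python
def usage_half_life(acquired_year, yearly_use):
    """Years from acquisition until cumulative use reaches half of total use.

    yearly_use: dict mapping year -> number of issues (loans) in that year.
    """
    total = sum(yearly_use.values())
    if total == 0:
        raise ValueError("no usage recorded")
    running = 0
    for year in sorted(yearly_use):
        running += yearly_use[year]
        if running * 2 >= total:      # cumulative use has reached the median
            return year - acquired_year
    raise ValueError("unreachable for non-negative counts")


# Hypothetical circulation data for a book acquired in 1995 (total = 83 issues).
issues = {1995: 5, 1996: 8, 1997: 9, 1998: 8, 1999: 7, 2000: 6}
issues.update({year: 4 for year in range(2001, 2011)})
half_life = usage_half_life(1995, issues)
```

Here cumulative use first reaches half of 83 in 2000, so the sketch reports a half-life of five years.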

22 Obsolescence Obsolescence is the process whereby materials become no longer useful or reliable. The rate of obsolescence varies with the discipline. It may be applied when weeding out library material.

23 Citations Per Publication (CPP)

24 Literature Usage Counts Counting of literature usage is very important technique in Bibliometrics because it is important to know whether subscribed or procured documents are used or not.

25 SCImago Journal Rank (SJR) SCImago Journal Rank (SJR indicator) is a measure of scientific influence of scholarly journals that accounts for both the number of citations received by a journal and the importance or prestige of the journals where such citations come from.

26 Eigenfactor Eigenfactor is a rating of the total importance of a scientific journal. Journals are rated according to the number of incoming citations, with citations from highly ranked journals weighted to make a larger contribution to the Eigenfactor than those from poorly ranked journals. The Eigenfactor score is influenced by the size of the journal. Eigenfactor scores and Article Influence scores are calculated by eigenfactor.org, where they can be freely viewed.

27 Eigenfactor, SJR & Article Influence Eigenfactor excludes all journal self-citations, while SJR limits journal self-citations to 33%. Eigenfactor uses a five-year citation window; SJR uses a three-year citation window. The Article Influence score measures the average influence of articles in the journal, and is therefore comparable to the traditional impact factor.

28 Citation Analysis Citation analysis is a technique to find out important journals in a particular field like physics, chemistry, mathematics etc. Citation analysis examines pattern of distribution like authorship pattern, publication pattern, citation pattern etc.

29 Bibliometric Techniques

30 Source Normalized Impact per Paper (SNIP) SNIP measures the impact of a paper within a subject field. SNIP measures contextual citation impact by weighting citations based on the total number of citations in a subject field.

31 Calculation of IPP and SNIP
Let’s take a Journal ‘X’:

Year   IPP     SNIP    Citations   Papers
2014   3.379   1.477   2394        808
2013                   2672        760
2012                   2566        746
2011                   2722        850
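
The 2014 IPP of 3.379 can be reproduced from the other figures for Journal ‘X’ under one reading of the table, which is an assumption here: the citation counts listed against 2011, 2012 and 2013 are citations received in 2014 by papers published in those years, and IPP divides their sum by the number of papers published in the same three years. The function name `impact_per_publication` is hypothetical.

```python
def impact_per_publication(citations_by_pub_year, papers_by_pub_year):
    """Citations received in the target year divided by papers published
    in the preceding window (here assumed to be three years)."""
    return sum(citations_by_pub_year.values()) / sum(papers_by_pub_year.values())


# Figures for Journal 'X'; interpretation of the citation column is assumed.
citations_in_2014 = {2011: 2722, 2012: 2566, 2013: 2672}
papers_published = {2011: 850, 2012: 746, 2013: 760}

ipp_2014 = impact_per_publication(citations_in_2014, papers_published)
```

That is 7960 citations over 2356 papers, which rounds to the 3.379 shown for 2014.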

32 Bibliographic Coupling Kessler (1963) proposed a technique, known as bibliographic coupling to measure the similarity between two scientific documents in terms of number of references they make in common. The strength of bibliographic coupling is measured by no. of references that are in common.

33 Bibliographic Coupling [Figure: diagram linking documents to the references they share; nodes A, B, C, D, E, F]
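
Bibliographic coupling strength, as defined by Kessler, is the number of references two documents have in common, which maps directly onto a set intersection. A minimal sketch with hypothetical reference lists (the labels do not reproduce the diagram):

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two documents cite in common."""
    return len(set(refs_a) & set(refs_b))


# Hypothetical reference lists for two documents A and B.
references = {
    "A": {"C", "D", "E"},
    "B": {"D", "E", "F"},
}
strength_ab = coupling_strength(references["A"], references["B"])
```

Documents A and B share the references D and E, so their coupling strength is 2.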

34 Co-citation Analysis Small (1973) proposed another technique, co-citation, which measures the similarity between two documents as the number of documents that cite both of them. Co-citation analysis has become the dominant method for the empirical study of the structures of scientific communication. The objective of co-citation analysis is to map the topical relatedness of clusters of authors, journals or articles.

35 Co-citation Analysis [Figure: diagram linking documents to the later documents that cite them together; nodes A, B, C, D, E, F]
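
Co-citation is the mirror image of bibliographic coupling: instead of comparing what two documents cite, it counts how many later documents cite both. A minimal sketch with a hypothetical citation map (the labels do not reproduce the diagram):

```python
def cocitation_count(cited_by_doc, doc_x, doc_y):
    """Number of citing documents whose reference lists contain both
    doc_x and doc_y.

    cited_by_doc: dict mapping each citing document to the set of
        documents it cites.
    """
    return sum(
        1 for refs in cited_by_doc.values() if doc_x in refs and doc_y in refs
    )


# Hypothetical citing documents C to F and the documents they cite.
citing = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
    "F": {"B"},
}
cocited_ab = cocitation_count(citing, "A", "B")
```

Only C and D cite both A and B, so the co-citation count for the pair is 2.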

36 Co-citation Analysis

37 Co-word Analysis Co-word analysis involves identifying keywords and their co-occurrence in order to generate a map of papers linked by the degree of co-occurrence of their keywords. Co-word frequencies are used to construct a co-word structure.
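
Counting keyword co-occurrences is the first step towards such a map. A minimal sketch, assuming each paper is represented by a list of keywords; the keyword lists and the function name `coword_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how many papers each unordered pair of keywords appears in."""
    counts = Counter()
    for keywords in keyword_lists:
        # sorted(set(...)) removes duplicates and gives each pair one fixed order
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists for three papers.
papers = [
    ["bibliometrics", "citation", "h-index"],
    ["bibliometrics", "citation"],
    ["citation", "webometrics"],
]
pairs = coword_counts(papers)
```

The resulting pair frequencies form the co-word matrix that clustering or mapping tools take as input.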

38 Clustering “the process of organizing objects into groups whose members are similar in some way” A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.

39 Citation Databases Scopus Web of Science Google Scholar Any other Bibliographic databases

40 Scopus Temporal coverage: from 1995 to present. Subject coverage: Health Sciences (32%), Physical Sciences (29%), Social Sciences (24%), Life Sciences (15%). It covers nearly 22,000 titles from over 5,000 publishers, of which 21,500 are peer-reviewed journals. Update frequency: Scopus is the only leading database that is updated daily rather than just weekly. Source: www.elsevier.com/solutions/scopus/content

41 Scopus: Coverage Over 60 million records [63% post-1995 + 37% before] More than 27 million patent records Over 7.2 million conference papers “Articles-in-Press” from over 5,000 journals More than 116,000 books, with 10,000 added each year Cited references: in a project started in March 2014, Scopus has been adding cited references for pre-1996 content, going back to 1970. As of December 2015, Scopus has added over 93 million pre-1996 cited references to nearly 5 million articles.

42 Web of Science Web of Science, previously known as ISI Web of Knowledge, is an online subscription-based scientific citation indexing database. Publisher: Thomson Reuters Temporal coverage: 1900 to present No. of records: 90 million Web of Science consists of: 7 online databases

43 Google Scholar Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. Launched: November 20, 2004 Publisher: Google Inc.

44 Bibliometric Indicators Impact factor h-index g-index p-index i-index

45 Impact Factors (IF)

46 h-index Hirsch (2005) introduced the concept of the h-index, an author-level metric. It is the number of articles h such that each of those h articles has received at least h citations (e.g. h = 5 means five articles with at least five citations each). Formally: a scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.

47 h-index Characteristics – The h-index increases over time; – The h-index is based on citation counts; – Linear relationship between the value of the h-index and time. Advantages – A single indicator gives an idea of scientific productivity; – The h-index does not depend on the total number of citations; – The h-index can easily be obtained from WoS and Scopus. Disadvantages – The h-index is time dependent; – The h-index is size independent [e.g. two scientists may have the same h-index value although their performance is not the same].

48 g-index

49 p-index

50 i-index In July 2011, Google introduced this indicator of academic publications in Google Scholar, where it is known as the i10-index. A scientist has index i if i of his or her N papers have at least 10 citations each. If an author has published 50 papers and has an i-index value of 15, it means that 15 of his/her 50 papers have received at least 10 citations each.

51 Calculation of h-index, g-index and i-index for an author (no. of publications: 42)

Rank   Citations   Rank²   Cumulative sum   Index reached
1      22          1       22               i-index = 1
2      9           4       31
3      9           9       40
4      8           16      48
5      6           25      54
6      6           36      60               h-index = 6
7      4           49      64
8      4           64      68               g-index = 8
9      2           81      70
10     2           100     72
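
The three indices can be computed directly from a list of citation counts sorted in descending order. A minimal sketch using the ten most-cited papers from the worked example; the function names are hypothetical, and the g-index uses the usual definition (the largest g for which the top g papers together have at least g² citations).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers have at least g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c                    # cumulative sum column
        if total >= rank * rank:      # compare with the rank-squared column
            g = rank
    return g


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


# Citation counts of the ten top-ranked papers in the worked example.
top_cited = [22, 9, 9, 8, 6, 6, 4, 4, 2, 2]
h, g, i10 = h_index(top_cited), g_index(top_cited), i10_index(top_cited)
```

For these counts the sketch gives h = 6, g = 8 and i = 1, matching the worked example.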

52 Webometrics Webometrics measures web-related phenomena. Webometrics includes – Webpage content analysis (e.g. automatic categorization of webpages and texts); – Web link structure analysis (e.g. categorization of hyperlinks as inlinks, self-links and external links); – Web usage analysis (e.g. exploitation of log files for users’ searching and browsing behavior); and – Web technology analysis (e.g. performance of search engines).

53 Web Impact Factor (WIF)

54 Altmetrics Altmetrics are an alternative to traditional metrics. The term ‘altmetrics’ was proposed in 2010, basically for developing article-level metrics. Aspects such as how often an article is viewed, discussed, downloaded, saved, cited or recommended are taken into consideration when measuring impact in altmetrics.

55 Altmetric Score The altmetric score provides an idea of how important an article is by the quantitative value of attention that it receives. It is calculated through the weighted counts of the values of different social media sources such as newspaper stories, tweets, google+, blogs, comments etc.

56 Article Level Metrics Four major providers aggregate and provide article-level metrics. PLOS (http://article-level-metrics.plos.org/): a non-profit publisher that provides article-level metrics free of charge. It covers only articles. ImpactStory (http://impactstory.it/): a non-profit article-level metrics provider that provides metrics free of charge. It covers not only articles but also code, software, presentations and datasets. Altmetric (http://altmetrics.org/): a non-profit article-level metrics provider that provides metrics free of charge. It covers not only articles but also books, code, software and datasets. Plum Analytics: a non-profit article-level metrics provider that provides metrics free of charge. It covers only articles.

57 Part-II Some Statistical Techniques

58 Some Statistical Techniques

59 Calculation of Correlation & SD

60 Rank Correlation Suppose a group of N individuals is arranged in order of merit with respect to two characteristics A and B. The ranks in the two characteristics will generally be different. Pearson’s coefficient of correlation between the ranks $x_i$ and $y_i$ is called the rank correlation coefficient: $\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$, where D = difference between ranks and N = no. of individuals.
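
The rank correlation coefficient can be computed from the rank differences D. A minimal sketch, assuming the standard Spearman formula for untied ranks, rho = 1 − 6ΣD² / (N(N² − 1)); the two rank lists are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two equal-length lists of untied ranks."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("at least two individuals are needed")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks of five individuals on characteristics A and B.
same_order = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reverse_order = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

Identical rankings give +1 and exactly reversed rankings give −1; real data falls between the two.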

61 Example to solve the repeat rank

Score, highest to lowest (x)   Raw rank   Actual rank
80                             1          1
75                             2          2.5
75                             3          2.5
68                             4          4
64                             5          6
64                             6          6
64                             7          6
55                             8          8
50                             9          9
40                             10         10

62 How to solve the problem in same rank under Rank Correlation?
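
One common way to handle repeated values is to give every tied item the mean of the raw ranks it occupies. A minimal sketch of that averaging step, using the ten scores from the repeat-rank example (the tied scores of 64 are inferred from their shared rank of 6); the function name `average_ranks` is hypothetical, and any further tie correction applied to the correlation formula itself is not covered here.

```python
def average_ranks(scores):
    """Rank scores from highest to lowest, giving tied scores the mean
    of the raw ranks they occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the group while the next score equals the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2   # mean of raw ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


scores = [80, 75, 75, 68, 64, 64, 64, 55, 50, 40]
actual_ranks = average_ranks(scores)
```

The two scores of 75 share rank 2.5 and the three scores of 64 share rank 6, as in the example.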

63 Chi-square Test Chi-square is used to compare observed data with data we would expect to obtain according to a specific hypothesis. Chi-square is the sum of the squared difference between observed value (o) and the expected value (e) divided by the expected value (e) in all possible categories.

64 Chi-square

65 Is there a statistically significant relationship between citation impact and number of authors of articles? Is there any relationship between citation impact and the number of references cited in research articles? Is there any relationship between citation impact and the numbers of significant words in titles of articles?

66 Chi-square (observed counts, expected counts in parentheses)

Number of citations   Number of words                                No. of articles
                      0          1-3         4-6        7 or more
1-5                   44 (29)    27 (42)     20 (24)    25 (21)      116
6-10                  56 (…)     111 (…)     55 (…)     44 (…)       266
11 or more            16         30          23         16           85
No. of articles       116        168         98         85           467
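
The expected counts in parentheses follow from the row and column totals (row total × column total / grand total), and the chi-square statistic sums (o − e)² / e over all cells. A minimal sketch using the observed counts from this contingency table; the function name `chi_square` is hypothetical, and comparing the statistic against a critical value is left to a statistics table or library.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Observed counts: rows are citation bands, columns are word-count bands.
observed = [
    [44, 27, 20, 25],
    [56, 111, 55, 44],
    [16, 30, 23, 16],
]
statistic, dof, expected = chi_square(observed)
first_row_expected = [round(e) for e in expected[0]]
```

Rounded, the first row of expected counts comes out as 29, 42, 24 and 21, matching the values in parentheses, and a 3 × 4 table has 6 degrees of freedom.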

67 Part-III Publication Analysis

68 Publication Analysis: Data Sources Annual Reports Individual Faculty IDRs Manual Data collection From Bibliographic databases/ citation databases

69 Unit of Analysis Individual Level Institutional Level – Intra Institution – Inter Institution Country Level – Intra Country – Inter Country Subject Level

70 Publication Analysis Individual – Single author – Single Country – Single Institutions/Universities Collaboration – Multiple Authors – Multiple Institutions/Universities – Multiple Countries

71 Publication Growth Analysis: Models Exponential Growth Model Linear Growth Model Logistic Growth Model Relative Growth Rate

72 Exponential Growth Model

73 Linear Growth Model

74 Logistic Growth Model

75 Relative Growth Rate

76 The figure shows the trend in nanotechnology research during the last 10 years. The results show that India made remarkable progress, keeping pace with the world growth rate and to some extent exceeding it. Figure 1: Growth of literature during 2002-2011

77 Doubling Time Doubling time (Dt) is directly related to the Relative Growth Rate (RGR). It is the time required for the number of articles or citations to double. Mathematically, doubling time may be calculated as $Dt = \ln 2 / RGR \approx 0.693 / RGR$.

78 Calculation of RGR
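
A minimal sketch of the RGR and doubling-time calculation, assuming the formulation commonly used in growth studies: RGR = (ln W2 − ln W1) / (T2 − T1), where W1 and W2 are the cumulative numbers of articles at times T1 and T2, and Dt = ln 2 / RGR. The function names and the sample counts are hypothetical.

```python
import math


def relative_growth_rate(w1, w2, t1, t2):
    """Growth in the natural log of cumulative output per unit of time."""
    return (math.log(w2) - math.log(w1)) / (t2 - t1)


def doubling_time(rgr):
    """Time needed for output to double at a constant relative growth rate."""
    return math.log(2) / rgr


# Hypothetical cumulative article counts: 100 in 2010 and 200 in 2011.
rgr = relative_growth_rate(100, 200, 2010, 2011)
dt = doubling_time(rgr)
```

Because the count doubles in one year in this example, the RGR equals ln 2 (about 0.693) and the doubling time is one year.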

79 Thank You

