Bibliometrics and preference modelling Thierry Marchant Ghent University.


Some academic rankings

[Table: “Top 5% Authors, as of April 2008”, ranked by average rank score]

Outline Why rank? Which attributes? Some popular rankings. How can we motivate a ranking? The axiomatic approach. Comparing peers and apples.

Why rank?

Why rank universities? To choose one for studying (bachelor student). To attract good students (good university). To obtain subsidies (good university). To allocate subsidies (government). To allocate students to the various universities according to their score at an exam (government). ...

Why rank departments? To choose one for studying (doctoral student). To attract good students (good department). To obtain subsidies (good department). To allocate subsidies (government). To allocate students to the various departments according to their score at an exam (government). ...

Why rank scientists? To determine the salary (university). To award a scientific distinction (scientific society). To hire a new scientist (university). To choose a thesis director (student). To evaluate a department or university (...). To evaluate a journal (...). To allocate subsidies (government). ...

Why rank journals? To choose one for publishing (scientist). To maximize the dissemination of one’s results. To maximize one’s value. To evaluate a scientist (...). To evaluate a department (...). To evaluate a university (...). To improve one’s image (good publisher). ...

Why rank articles? To select articles (scientist). To evaluate a scientist (...). To evaluate a department (...). To evaluate a university (...). To evaluate a journal (...). ...

Focus in this talk Rankings of scientists Rankings of departments Rankings of universities Rankings of journals Rankings of articles

Which attributes?

Many relevant attributes. Quality: evaluation by peers; quality of the journals; citations (number, citing authors, citing journals, positive/negative); coauthors; patents; awards; budget. Quantity: number of papers; number of books; number of pages; number of coauthors; number of patents; number of citations; awards; budget; number of thesis students. Various: age; career length; country; nationality; discipline; century; university.

Bibliometric attributes: the subset of the attributes above that can be extracted from publication and citation databases (highlighted on the original slides).

Bibliometric attributes. Why use bibliometric attributes? They are cheap. Objective? Reliable?

Some popular rankings of scientists

Some popular rankings: number of publications; total number of citations; maximal number of citations; number of publications with at least a given number of citations; average number of citations; the same ones weighted by the number of authors, the number of pages or the impact factor; the same ones corrected for age; h-index, g-index, hc-index, hI-index, R-index, A-index, …

The h-index. Published in 2005 by the physicist J. E. Hirsch; 462 citations by March 2009, 1267 by May 2013. Adopted by the Web of Science (ISI, Thomson). The h-index of a scientist is the largest natural number x such that at least x of his/her papers have at least x citations each. Example on the slide: h-index = 6.
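The definition translates directly into code. A minimal sketch, with an invented citation record that reproduces the slide’s example value of 6:

```python
def h_index(citations):
    """Largest x such that at least x of the papers have at least x citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
    return h

# Hypothetical record: six papers with at least six citations each, two with fewer.
print(h_index([10, 8, 7, 6, 6, 6, 2, 1]))  # → 6
```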

How to justify a ranking? THE true and universal ranking does not exist. (Consider two departments: one with 50 scientists and 2000 citations, the other with 3 scientists and 180 citations; which one is better depends on what the ranking is for.)

If one knows the true ranking, one may compute a correlation between the true ranking and a candidate one. Example: Assessing the Accuracy of the h- and g-Indexes for Measuring Researchers’ Productivity, Journal of the American Society for Information Science and Technology, 64(6):1224–1234: “The analysis quantifies the shifts in ranks that occur when researchers’ productivity rankings by simple indicators such as the h- or g-indexes are compared with those by more accurate FSS.”

Or assume a law, possibly probabilistic, linking the numbers of papers and citations to the quality of the scientist (an unobserved variable) and to his or her age, and then derive an estimate of the quality of a scientist from his or her data (papers and citations).

Or analyze the mathematical properties of rankings: the axiomatic approach.

Characterization of scoring rules

Definitions.
Set of journals: J = {j, k, l, …}.
Paper: a paper in journal j with x citations and a coauthors is represented by the triple (j, x, a).
Scientist: a mapping f from J × N × N to N; the number f(j, x, a) is the number of publications of author f in journal j with x citations and a coauthors.
Set of scientists: the set X of all mappings f from J × N × N to N such that Σ_{j∈J} Σ_{x∈N} Σ_{a∈N} f(j, x, a) is finite.
Bibliometric ranking: a weak order ≥ on X (a complete and transitive relation).

Scoring rules.
A bibliometric ranking ≥ is a scoring rule if there exists a real-valued mapping u defined on J × N × N such that f ≥ g iff Σ_j Σ_x Σ_a f(j, x, a) u(j, x, a) ≥ Σ_j Σ_x Σ_a g(j, x, a) u(j, x, a).
Examples:
u(j, x, a) = 1: number of papers.
u(j, x, a) = x: number of citations.
u(j, x, a) = x/(a+1): number of citations weighted by the number of authors.
u(j, x, a) = IF(j): number of papers weighted by the impact factor.
…
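A scoring rule is easy to compute once a scientist is encoded as a multiset of (journal, citations, coauthors) triples. A sketch with the u functions from the slide; the impact-factor values are invented for the example:

```python
# A scientist f is encoded as {(journal, citations, coauthors): number of such papers}.
def score(f, u):
    return sum(count * u(j, x, a) for (j, x, a), count in f.items())

IF = {"j": 2.5, "k": 0.8}                  # hypothetical impact factors
u_papers    = lambda j, x, a: 1            # number of papers
u_citations = lambda j, x, a: x            # number of citations
u_weighted  = lambda j, x, a: x / (a + 1)  # citations weighted by number of authors
u_impact    = lambda j, x, a: IF[j]        # papers weighted by impact factor

f = {("j", 10, 1): 2, ("k", 3, 0): 1}  # two papers in j (10 citations, 1 coauthor), one in k
print(score(f, u_papers), score(f, u_citations), score(f, u_weighted))  # → 3 23 13.0
```

The ranking itself then simply compares score(f, u) with score(g, u).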

Axioms.
Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}. (Adding one identical paper, in journal j with x citations and a coauthors, to two scientists does not change their relative order.)
Not satisfied by the maximal number of citations or by the h-index; with the h-index, a reversal can even occur when adding 2 papers to each scientist.
Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng. (However far e starts behind h, adding enough copies of the stronger record f to e and of the weaker record g to h eventually brings e + nf at least up to h + ng.)
Not satisfied by the maximal number of citations, the h-index, or the lexicographic ranking.
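The reversal claimed for the h-index can be exhibited with small invented records: g beats f, yet after both scientists receive the same two additional papers the order flips, so Independence fails.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

f = [5, 5]       # h-index 2
g = [3, 3, 3]    # h-index 3, so g > f
extra = [5, 5]   # the same two papers, 5 citations each, added to both scientists
print(h_index(g), h_index(f))                  # → 3 2
print(h_index(g + extra), h_index(f + extra))  # → 3 4: the order is reversed
```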

Result. Theorem: A bibliometric ranking satisfies Independence and Archimedeanness iff it is a scoring rule. Furthermore, u is unique up to a positive affine transformation.
Proof sketch: (X, +, ≥) is an extensive measurement structure as in [Luce, 2000]. (X, +) is a cancellative monoid (f + g = f + h implies g = h). It can be extended to a group (X′, +) by the Grothendieck construction. (X′, +, ≥) is an Abelian, Archimedean, linearly ordered group, hence isomorphic to a subgroup of the ordered additive group of the real numbers (Hölder’s theorem).

Special case: u(j, x, a) = x/(a+1).
Transfer: for all j in J, all x, y, a in N, 1_{j,x,a} + 1_{j,y+1,a} ~ 1_{j,x+1,a} + 1_{j,y,a} (u affine in the number of citations).
Condition Zero: for all j in J, all a in N, there is f in X such that f + 1_{j,0,a} ~ f (u linear in the number of citations).
Journals Do Not Matter: for all j, j′ in J, all a, x in N, 1_{j,x,a} ~ 1_{j′,x,a} (u independent of the journal).
No Reward for Association: for all j in J, all m, x in N with m > 1, 1_{j,x,0} ~ m · 1_{j,x,m−1} (u inversely proportional to the number of authors).

Characterization of conjugate scoring rules for scientists and departments

Introduction Consider two departments each consisting of two scientists. The scientists in department A both have 4 papers, each one cited 4 times. The scientists in department B both have 3 papers, each one cited 6 times. Both scientists in department A have an h-index of 4 and are therefore better than both scientists in department B, with an h-index of 3. Yet, department A has an h-index of 4 and is therefore worse than department B with an h-index of 6. Hence, the “best” department contains the “worst” scientists.
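The paradox on this slide can be replayed directly, treating a department’s publication list as the union of its members’ lists (a sketch; the h-index function is the standard definition):

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

a1 = a2 = [4, 4, 4, 4]   # department A: two scientists, 4 papers with 4 citations each
b1 = b2 = [6, 6, 6]      # department B: two scientists, 3 papers with 6 citations each
print(h_index(a1), h_index(b1))            # → 4 3  (A's scientists rank higher)
print(h_index(a1 + a2), h_index(b1 + b2))  # → 4 6  (yet department A ranks lower)
```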

Definitions.
Scientist: a mapping f from N to N; the number f(x) is the number of publications of scientist f with x citations.
Set of scientists: the set X of all mappings f from N to N such that Σ_{x∈N} f(x) is finite.
Ranking of scientists: a weak order ≥_s on X.
Department: a vector of scientists; the set of all departments is denoted by Y.
Ranking of departments: a weak order ≥_d on Y.

Scoring rules.
A ranking of scientists ≥_s is a scoring rule if there exists a real-valued mapping u defined on N such that f ≥_s g iff Σ_x f(x) u(x) ≥ Σ_x g(x) u(x).
A ranking of departments ≥_d is a scoring rule if there exists a real-valued mapping v defined on N such that (f_1, …, f_k) ≥_d (g_1, …, g_l) iff Σ_i Σ_x f_i(x) v(x) ≥ Σ_j Σ_x g_j(x) v(x).
Conjugate scoring rules: ≥_s and ≥_d are conjugate scoring rules if u = v.

Axioms.
Consistency: if f_i ≥_s g_i for i = 1, …, k, then (f_1, …, f_k) ≥_d (g_1, …, g_k); in addition, if f_i >_s g_i for some i, then (f_1, …, f_k) >_d (g_1, …, g_k).
Totality: if (f_1, …, f_k) and (g_1, …, g_l) are such that Σ_i f_i = Σ_j g_j, then (f_1, …, f_k) ~_d (g_1, …, g_l).
Dummy: (f_1, …, f_k) ~_d (f_1, …, f_k, 0), where 0 is the scientist with no publications.

Result. Theorem: ≥_s and ≥_d satisfy Consistency, Totality, Dummy and Archimedeanness of ≥_s iff they are conjugate scoring rules. Furthermore, u is unique up to a positive affine transformation.

Discussion

Axiomatic analysis of more rankings is needed. Axiomatic analysis of indices is different but also relevant. Consistency across levels is important (e.g. between the h-index for scientists and the impact factor for journals).

Literature Scientometrics Journal of Informetrics Journal of the American Society for Information Science and Technology

Comparing peers and apples

Comparing scientists of different ages. [Figure: the h-indices of two scientists of different ages]

Comparing scientists of different ages. Instead of the h-index, use an index that is independent of time, for instance the average number of citations per paper, Σ_{x∈N} x f(x) / Σ_{x∈N} f(x). Problem: suppose f has one paper with 50 citations and g has 10 papers with 40 citations each; the average then favours f, although g’s record is arguably stronger. Or divide the h-index by the length of the career. Problem: the h-index is not a linear function of time.

Comparing across disciplines. The average number of citations per paper is 80 times larger in medicine than in mathematics. Any comparison of scientists across disciplines using an index based on citations is therefore flawed. Field normalization: for a given index, compute the distribution of the index in each field (medicine, physics, economics, mathematics, literature, …) and define the normalized index of a scientist as his/her percentile within the field. Problem: the definition of a field is arbitrary. The average number of citations per paper is 20 times larger in physics than in mathematics, but only 2–3 times larger in theoretical physics.
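Percentile-based field normalization amounts to locating a scientist in the empirical distribution of the index within his or her own field. A sketch, with invented per-paper citation averages for two fields:

```python
from bisect import bisect_right

def field_percentile(value, field_values):
    """Fraction of scientists in the field whose index value is at most `value`."""
    ordered = sorted(field_values)
    return bisect_right(ordered, value) / len(ordered)

medicine    = [40, 55, 80, 120, 200]   # hypothetical averages (citations per paper)
mathematics = [0.5, 1, 1.5, 2, 4]
# Very different raw values, identical normalized positions within each field:
print(field_percentile(80, medicine), field_percentile(1.5, mathematics))  # → 0.6 0.6
```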

Source field normalization. Papers in medicine are often cited because papers in medicine have long reference lists (every citation received is an entry in some citing paper’s reference list); papers in mathematics have short reference lists. Instead of defining disciplines or fields, use the length of the reference list to normalize: divide the number of citations received by a paper by the length of the citing papers’ reference lists.
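One way to implement this (an interpretation of the slide, not spelled out on it): count each citation as 1/L, where L is the length of the citing paper’s reference list. Fields with long reference lists then no longer inflate citation counts. The reference-list lengths below are invented:

```python
def fractional_citations(ref_list_lengths):
    """Total credit for a paper: each citing paper contributes 1/L,
    with L the length of that citing paper's reference list."""
    return sum(1.0 / L for L in ref_list_lengths)

# Hypothetical: a mathematics paper cited by 3 papers with short reference lists,
# and a medicine paper cited by 12 papers with 40-item reference lists.
maths = fractional_citations([10, 8, 12])
medicine = fractional_citations([40] * 12)
print(round(maths, 3), round(medicine, 3))  # → 0.308 0.3
```

After normalization the two papers receive comparable credit, although their raw citation counts (3 vs. 12) differ by a factor of four.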

Distributions

Lotka’s law. Proportion of scientists with n papers: F(n) = C / n^a, with a ≃ 2 (the exact exponent depends on the field) and C a normalizing constant.
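A small sketch of the law, truncating at a maximum number of papers and choosing C so that the proportions sum to 1 (a simplifying assumption for this finite version; a = 2 here):

```python
def lotka_proportions(n_max=1000, a=2.0):
    """Proportion of scientists with n papers under F(n) = C / n**a, n = 1..n_max."""
    weights = [1.0 / n**a for n in range(1, n_max + 1)]
    C = 1.0 / sum(weights)
    return [C * w for w in weights]

p = lotka_proportions()
print(round(p[0], 2))         # ≈ 0.61: most scientists have a single paper
print(round(p[0] / p[1], 1))  # → 4.0: four times fewer scientists with two papers
```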

Non-universal power law (Peterson, Pressé and Dill, Proceedings of the National Academy of Sciences, 107). Direct citations: the probability that a new paper will randomly cite paper A is P_direct = 1/N, with N the total number of published papers. Indirect citations: the author of the new paper may first find a paper B and learn of paper A via B’s reference list; P_indirect = k/(N n), with k the number of existing citations to A and n the average length of a reference list.

Non-universal power law (continued). Fraction of the N papers with k citations:
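The direct/indirect mechanism can be simulated (a sketch under invented parameters: growth paper by paper, a fixed reference-list length, and a fixed probability of finding a reference indirectly). Picking a target via an existing citation edge makes the chance of being found proportional to the citations already received, which is the k/(Nn) term above, and it produces the heavy-tailed citation distributions the model is designed to explain:

```python
import random
from collections import Counter

def simulate(n_papers=3000, refs=10, p_indirect=0.9, seed=0):
    rng = random.Random(seed)
    edges = []            # one entry per citation made so far, holding its target
    counts = Counter()    # counts[paper] = citations received
    for new in range(refs, n_papers):   # the first `refs` papers are uncited seeds
        targets = set()
        while len(targets) < refs:
            if edges and rng.random() < p_indirect:
                targets.add(rng.choice(edges))    # indirect: found via a reference list
            else:
                targets.add(rng.randrange(new))   # direct: uniform over existing papers
        for t in targets:
            counts[t] += 1
            edges.append(t)
    return counts

counts = simulate()
mean = sum(counts.values()) / len(counts)
print(mean, max(counts.values()))   # the maximum far exceeds the mean (about 10)
```

With p_indirect = 0 the counts stay close to uniform; raising it concentrates citations on a few early papers, so the shape of the distribution depends on the mix of the two mechanisms rather than being a single universal power law.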