Bibliometric research methods Faculty Brown Bag IUPUI Cassidy R. Sugimoto
Overview: Vocabulary; Citation analysis; Citation indices; Bibliometric laws; Impact factor; Applications
Vocabulary: Scholarly communications (formal and informal); Scientometrics (scientific communication); Informetrics (thinking beyond scholarly “texts”); Webometrics (the web); Bibliometrics (application of statistical and mathematical methods; formal channels)
Citation analysis: Why do people cite? Why are some articles not cited? What does a citation mean? A (citing document) → B (cited document): B is cited by A; A references B.
Who’s on first? Embedded citation index from ʻEn mishpat: Babylonian Talmud (1546) (Weinberg, 1997); Shepard’s Citation Index (1873) (Shapiro, 1992)
Institute for Scientific Information (ISI)
Scopus
Google Scholar
Comparison: Distribution of unique and overlapping citations in Scopus and Web of Science (n=8,549). Overlap: 57% (4,892); Scopus only: 29% (2,441); Web of Science only: 14% (1,216). Totals: Scopus n=7,333 (86%); Web of Science n=6,108 (71%).
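The percentages on this slide can be rechecked from the three raw counts. The sketch below assumes the 29% and 14% figures are the citations unique to each index (the counts only add up to n=8,549 under that reading); the variable names are illustrative.

```python
# Counts as given on the slide. Treating the Scopus and Web of Science
# figures as "unique to that index" is an assumption that the totals support.
overlap = 4892
scopus_only = 2441
wos_only = 1216

total = overlap + scopus_only + wos_only   # all distinct citations
scopus_total = overlap + scopus_only       # everything Scopus found
wos_total = overlap + wos_only             # everything Web of Science found


def pct(part, whole):
    """Share of `whole` represented by `part`, as a rounded percentage."""
    return round(100 * part / whole)


shares = {
    "overlap": pct(overlap, total),
    "scopus_only": pct(scopus_only, total),
    "wos_only": pct(wos_only, total),
    "scopus_total": pct(scopus_total, total),
    "wos_total": pct(wos_total, total),
}
print(total, scopus_total, wos_total, shares)
```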
Are you a citation index?
Bibliometric research OR “Why I love good indexes”
Citation analysis: A (citing document) → B (cited document): B is cited by A; A references B.
Citation analysis: methods Not just articles…
Variable: PRODUCERS
Variable: ARTIFACTS
Variable: CONCEPTS
Hybrid approaches Chaomei Chen:
h-index: Hirsch (2005). A scientist has index h if h of [his/her] $N_p$ papers have at least h citations each, and the other $(N_p - h)$ papers have at most h citations each.
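Hirsch's definition translates directly into a short computation: rank the papers by citation count and find the last rank at which the count is still at least the rank. A minimal sketch, with a hypothetical function name and made-up citation counts:

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here, so stop
    return h


# Illustrative (invented) citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```

In this example the author has four papers with at least four citations each, and the remaining paper has fewer, so h is 4.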
Bibliometric laws: Lotka’s Law (1926). The number (of authors) making n contributions is about 1/n² of those making one, and the proportion of all contributors that make a single contribution is about 60 percent (60, 15, 7 … 6 > 10). Not statistically exact. May be changing with the current model of scholarship.
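The inverse-square form can be turned into expected shares of authors. The sketch below assumes the idealized law holds exactly for every n, in which case the shares must sum to 1 and the single-contribution share works out to 6/π² (roughly the "about 60 percent" on the slide). The function names are illustrative.

```python
import math

# The sum of 1/n^2 over n = 1, 2, 3, ... equals pi^2 / 6, so under an exact
# inverse-square law the share of single-contribution authors is 6 / pi^2.
SINGLE_SHARE = 6 / math.pi ** 2


def lotka_share(n):
    """Expected share of authors making exactly n contributions."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return SINGLE_SHARE / n ** 2


def lotka_share_more_than(k):
    """Expected share of authors making more than k contributions."""
    return 1 - sum(lotka_share(n) for n in range(1, k + 1))


for n in (1, 2, 3):
    print(n, round(100 * lotka_share(n), 1))
print(">10", round(100 * lotka_share_more_than(10), 1))
```

Real author counts deviate from these values, which is the sense in which the law is not statistically exact.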
Bibliometric laws: Bradford’s law (1934). Journals in a field can be divided into three parts: 1) Core: relatively few journals producing 1/3 of all articles; 2) Zone 2: the same number of articles, but more journals; 3) Zone 3: the same number of articles, but more journals still. The ratio of the number of journals in the core to the number in Zone 2 is a constant n, and to the number in Zone 3 it is n², giving 1:n:n². Not statistically exact. General power law distribution (akin to Pareto’s law in economics).
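One way to see the 1:n:n² pattern is to rank journals by productivity and cut the ranked list into three zones holding equal numbers of articles. The following is a rough sketch under that assumption; the function name and the article counts are invented so that the zones come out exactly, which real data will not do.

```python
def bradford_zones(article_counts, zones=3):
    """Split journals (given as per-journal article counts) into zones
    that each hold roughly the same number of articles."""
    ranked = sorted(article_counts, reverse=True)  # most productive first
    total = sum(ranked)
    result = [[] for _ in range(zones)]
    cumulative = 0
    zone = 0
    for count in ranked:
        # Move to the next zone once the current one has its share of articles.
        while zone < zones - 1 and cumulative * zones >= total * (zone + 1):
            zone += 1
        result[zone].append(count)
        cumulative += count
    return result


# Invented data: 2 journals with 18 articles, 6 with 6, and 18 with 2.
counts = [18] * 2 + [6] * 6 + [2] * 18
sizes = [len(z) for z in bradford_zones(counts)]
print(sizes)  # journals per zone; here the ratio is 1:3:9, so n = 3
```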
Bibliometric laws: Zipf’s Law (1935). Not statistically exact. General power law probability distribution. Listing the words occurring within a text in order of decreasing frequency, the rank of a word on that list multiplied by its frequency will equal a constant. The equation for this relationship is r × f = k, where r is the rank of the word, f is the frequency, and k is the constant. Example: James Joyce's Ulysses. 10th most frequent word: 2,653 times; 100th most frequent: 265 times; 200th most frequent: 133 times. The rank of the word multiplied by the frequency of the word equals a constant that is approximately 26,500.
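The Ulysses figures can be checked by multiplying each rank by its frequency, and the same rank-frequency table can be built for any tokenized text. A small sketch; the function name and the toy word list are illustrative, and only the three Ulysses data points come from the slide.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(words)
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Rank -> frequency pairs quoted on the slide for Ulysses.
ulysses = {10: 2653, 100: 265, 200: 133}
products = {rank: rank * freq for rank, freq in ulysses.items()}
print(products)  # each product is close to 26,500

# Toy text, only to show the shape of the table.
print(rank_frequency("a a a a b b c".split()))
```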
Bibliometric laws: Other power law probability distributions. Pareto’s law (economics): 80/20 rule, law of the vital few, principle of factor sparsity. PageRank (Google). The Long Tail (markets).
Journal impact factors
As a research method… Reliability? Validity? Limitations?
Applications? Finding and use; Collection development; Reference services; Collection evaluation; Use studies; Information retrieval algorithms; Diffusion of ideas; Domain areas and interdisciplinarity; Mapping science
Writing your paper…