N-gram Topic Models for Bibliometric Analysis
Gideon Mann, David Mimno, and Andrew McCallum

N-gram Topic Models for Bibliometric Analysis Gideon Mann, David Mimno, and Andrew McCallum Can topic models provide better measurements of the impact of research literature?

Bibliometrics and Scientometrics
Typically analyze patterns of citations in research literature.
Derek de Solla Price: “Little Science, Big Science”
Eugene Garfield: Science Citation Index, Journal Citation Reports

Comparing apples to apples: top journals by citations
Biochemistry and molecular biology: J. Biol. Chem.; Cell; Biochem.-US (96809)
Mathematics: Lect. Notes Math. (6926); T. Am. Math. Soc. (6469); J. Math. Anal. Appl. (6004)
Source: Journal Citation Reports (2004)

What’s wrong with grouping by journal?
10 of the 200 most cited papers in CiteSeer are unpublished technical reports, and 15% of the most cited papers are from conference proceedings.
Open-access publication is increasing, but venue information is often not available.
Hand-entered ISI citation data is noisy.
An article has only one venue, but journals cover many topics.

A topic model for N-grams
Determine whether the next word will be part of an n-gram based on the current word and the current hidden topic.
“White house” is a collocation in politics, but may not be one in real estate.
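The decision described on this slide can be pictured as a per-word switch whose probability depends on the current topic and the current word. The following is only a toy sketch of that idea, not the authors' model: the table `CONTINUE_PROB`, its values, the fallback `DEFAULT_PROB`, and both function names are invented for illustration.

```python
import random

# Hypothetical probabilities that the word after `prev_word` continues an
# n-gram, keyed by (topic, prev_word). All values are invented for illustration.
CONTINUE_PROB = {
    ("politics", "white"): 0.9,
    ("real_estate", "white"): 0.1,
}
DEFAULT_PROB = 0.05  # assumed fallback for unseen (topic, word) pairs


def continuation_probability(topic, prev_word):
    """Look up the chance that the next word joins an n-gram with prev_word."""
    return CONTINUE_PROB.get((topic, prev_word), DEFAULT_PROB)


def sample_ngram_indicator(topic, prev_word, rng):
    """Draw the switch: True means the next word is part of an n-gram."""
    return rng.random() < continuation_probability(topic, prev_word)


rng = random.Random(0)
draws = [sample_ngram_indicator("politics", "white", rng) for _ in range(1000)]
politics_rate = sum(draws) / len(draws)
```

With these made-up numbers, “white” is followed by an n-gram continuation far more often under the politics topic than under the real estate topic, which is the contrast the “white house” example points at.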

Sample n-gram topics
1. Digital Libraries (102): digital, electronic, library, metadata, access; “digital libraries”, “digital library”, “electronic commerce”, “dublin core”, “cultural heritage”
2. WWW (129): web, site, pages, page, www, sites; “world wide web”, “web pages”, “web sites”, “web site”, “world wide”
3. Ontologies (186): semantic, ontology, ontologies, rdf, semantics, meta; “semantic web”, “description logics”, “rdf schema”, “description logic”, “resource description framework”
4. Web services (184): web, services, service, xml, business; “web services”, “web service”, “markup language”, “xml documents”, “xml schema”

Assigning topics to documents
1. Build a 200-topic n-gram topic model on 300k documents.
2. Remove stopword or methodological topics (e.g. “efficient, fast, speed”).
3. For each document d, assign d to topic t if more than 10% of d’s tokens are assigned to t and those tokens number more than two.
Each topic is now an intellectual “domain” that includes some number of documents. We can substitute topic for journal in most traditional bibliometric indicators. We can also now define several new indicators.
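The assignment rule in step 3 can be sketched as a short function. This is a minimal illustration, assuming each document arrives as a list of per-token topic ids. The function name, the parameter names, and the choice to compute the 10% share over all tokens (including tokens in removed topics) are assumptions, not details from the slides.

```python
from collections import Counter


def assign_topics(token_topics, excluded_topics=frozenset(),
                  min_share=0.10, min_tokens=2):
    """Return the set of topics a document is assigned to.

    token_topics: one topic id per token of the document.
    excluded_topics: stopword or methodological topics removed in step 2.
    A topic qualifies if it covers more than min_share of the tokens
    and more than min_tokens tokens in absolute terms.
    """
    total = len(token_topics)
    if total == 0:
        return set()
    counts = Counter(token_topics)
    return {
        topic
        for topic, count in counts.items()
        if topic not in excluded_topics
        and count / total > min_share
        and count > min_tokens
    }


# Toy document of 20 tokens: 5 in topic 138, 2 in topic 102, 13 in topic 7.
doc = [138] * 5 + [102] * 2 + [7] * 13
assigned = assign_topics(doc, excluded_topics={7})
```

Here topic 102 fails both thresholds (exactly 10% and exactly two tokens), and topic 7 is dropped as an excluded topic, so only topic 138 is assigned. A document can belong to several topics, or to none.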

Impact Factor
Journal Impact Factor: citations from articles published in 2004 to articles in Cell published in [years missing in source], divided by the number of articles published in Cell in [years missing in source].
Impact factors from JCR: Nature [value missing in source]; Cell [value missing in source]; JMLR 5.952; Machine Learning 3.258
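As a rough illustration of the ratio defined above, the sketch below counts citations made in a given year to items published in a preceding window and divides by the number of items published in that window. The two-year default for `window` is an assumption, as are the function name and the input shapes. The same function applies to a topic if the citations and article counts are collected per topic instead of per journal.

```python
def impact_factor(citations, article_counts, year, window=2):
    """Impact factor of one journal (or topic) for `year`.

    citations: iterable of (citing_year, cited_year) pairs, one per
        citation pointing at the journal or topic.
    article_counts: dict mapping publication year to number of articles.
    window: number of preceding years counted (assumed to be 2 here).
    """
    target_years = range(year - window, year)
    cites = sum(
        1
        for citing_year, cited_year in citations
        if citing_year == year and cited_year in target_years
    )
    articles = sum(article_counts.get(y, 0) for y in target_years)
    return cites / articles if articles else 0.0


# Toy data: three of the five citations fall inside the 2004 window.
toy_citations = [(2004, 2003), (2004, 2002), (2004, 2003),
                 (2004, 2001), (2003, 2002)]
toy_articles = {2002: 2, 2003: 2}
toy_if = impact_factor(toy_citations, toy_articles, 2004)
```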

Topic Impact Factor

Broad Impact: Diffusion
Journal Diffusion: the number of journals citing Cell divided by the total number of citations to Cell over a given time period, times 100.
Problem: relatively brittle at low citation counts. If a topic or journal is cited only twice, by two different topics or journals, it will have high diffusion.
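The brittleness is easy to see in a small sketch of the diffusion ratio. The function name and the input format (one citing journal or topic label per citation) are assumptions made for illustration.

```python
def diffusion(citing_sources):
    """Distinct citing journals (or topics) per 100 citations.

    citing_sources: one label per citation, naming the citing journal or topic.
    """
    if not citing_sources:
        return 0.0
    return 100.0 * len(set(citing_sources)) / len(citing_sources)


# Cited only twice, by two different sources: the maximum possible score.
rarely_cited = diffusion(["A", "B"])

# Cited 100 times by the same two sources: a much lower score.
heavily_cited = diffusion(["A"] * 50 + ["B"] * 50)
```

The first case scores 100 and the second scores 2, even though both are cited by exactly two sources, so the measure mostly rewards having few citations.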

Broad Impact: Diversity
Topic Diversity: entropy of the distribution of citing topics.
Diversity is better at capturing the broad end of the impact spectrum: the high-diffusion topics are identical to the least frequently cited topics.
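Topic diversity can be sketched as the Shannon entropy of the citing-topic counts. The slides do not state a logarithm base, so the natural log used here is an assumption, as are the function name and input format.

```python
import math
from collections import Counter


def topic_diversity(citing_topics):
    """Entropy (in nats) of the distribution of citing topics.

    citing_topics: one topic label per citation received.
    """
    counts = Counter(citing_topics)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


# All citations from one topic: no diversity.
narrow = topic_diversity(["ir"] * 4)

# Citations spread evenly over four topics: entropy of log(4).
broad = topic_diversity(["ir", "ml", "web", "db"])
```

Unlike diffusion, two citations from two topics give an entropy of only log(2), so a rarely cited topic cannot reach the top of the ranking just by being small. Passing the citing topics of a single paper gives the per-paper version of the measure.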

Broad Impact: Diversity
Topic Diversity: entropy of the distribution of citing topics.
Topic diversity can also be measured for papers.

Longevity: Cited Half Life
Two views:
1. Given a paper, what is the median age of citations to that paper?
2. What is the median age of citations from current literature?
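Both views reduce to a median over citation ages. The sketch below assumes that age is measured in whole years as the gap between the citing year and the cited paper's publication year. The function names and inputs are illustrative, not taken from the slides.

```python
from statistics import median


def cited_half_life(publication_year, citing_years):
    """View 1: median age of the citations a paper has received.

    citing_years: publication years of the papers that cite it.
    """
    return median(year - publication_year for year in citing_years)


def citing_half_life(current_year, cited_years):
    """View 2: median age of the references made by current literature.

    cited_years: publication years of the papers being cited.
    """
    return median(current_year - year for year in cited_years)


# A 1990 paper cited in 1991, 1993 and 2000: ages 1, 3 and 10.
received = cited_half_life(1990, [1991, 1993, 2000])

# 2004 papers citing work from 2003, 2001, 1994 and 1980: ages 1, 3, 10, 24.
made = citing_half_life(2004, [2003, 2001, 1994, 1980])
```

Note that `statistics.median` raises `StatisticsError` on an empty input, so uncited papers need separate handling.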

History: Topical Precedence
Within a topic, what are the earliest papers that received more than n citations?
Information Retrieval (138):
On Relevance, Probabilistic Indexing and Information Retrieval, Kuhns and Maron (1960)
Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems, Cooper (1968)
Relevance feedback in information retrieval, Rocchio (1971)
Relevance feedback and the optimization of retrieval effectiveness, Salton (1971)
New experiments in relevance feedback, Ide (1971)
Automatic Indexing of a Sound Database Using Self-organizing Neural Nets, Feiten and Gunzel (1982)
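The question on this slide amounts to a filter followed by a sort. The sketch below assumes each paper in a topic is a dict with `title`, `year`, and `citations` keys. The function name, the cutoff `k`, and the placeholder records are hypothetical and are not data from the study.

```python
def topical_precedence(papers, n, k=5):
    """Return the k earliest papers in a topic with more than n citations.

    papers: list of dicts with 'title', 'year' and 'citations' keys,
        all belonging to the same topic.
    """
    qualifying = [p for p in papers if p["citations"] > n]
    return sorted(qualifying, key=lambda p: p["year"])[:k]


# Placeholder records with invented citation counts.
toy_papers = [
    {"title": "Paper A", "year": 1971, "citations": 40},
    {"title": "Paper B", "year": 1960, "citations": 12},
    {"title": "Paper C", "year": 1965, "citations": 3},
    {"title": "Paper D", "year": 1968, "citations": 25},
]
earliest = [p["title"] for p in topical_precedence(toy_papers, n=10, k=2)]
```

Raising n trades recall for confidence: a higher threshold skips early papers that were rarely cited, such as Paper C here.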

Sharing: Topical Transfer