Mining Citation Data Using the Web of Science API: A Data Gold Rush
Phil White
Earth Sciences & Environment Librarian
philip.white@colorado.edu
The Question: How would I conduct a citation analysis? How could it be done more efficiently?
New Methods
  Web of Science
  Web of Science API

How do I do this?
  API = Application Programming Interface
  SOAP API: runs on XML
  The API has a URL
  Send the API URL an XML message, and it will send an XML message in return
  One at a time, using API tools (Postman, Hurl.it)
  Programmatically, using a programming language such as Ruby, Python, or R
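To make the "send XML, get XML back" pattern concrete, here is a minimal sketch using only the Python standard library. The endpoint URL, the service namespace, the `citedReferences` operation name, and the child element names are placeholders assumed for illustration; they are not confirmed details of the Web of Science API, so take the real values from the official API documentation. The accession number in the usage lines is made up.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: placeholder namespace and endpoint, not the real service values.
SERVICE_NS = "http://example.org/woksearch"
ENDPOINT = "https://example.org/soap/WokSearch"


def build_request(accession_number: str, first_record: int = 1, count: int = 100) -> bytes:
    """Build a SOAP envelope asking for the cited references of one record."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical operation and element names, for illustration only.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}citedReferences")
    ET.SubElement(operation, "uid").text = accession_number
    params = ET.SubElement(operation, "retrieveParameters")
    ET.SubElement(params, "firstRecord").text = str(first_record)
    ET.SubElement(params, "count").text = str(count)
    return ET.tostring(envelope, encoding="utf-8")


def send_request(payload: bytes, url: str = ENDPOINT) -> ET.Element:
    """POST the XML message to the API URL and parse the XML reply."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": ""},
    )
    with urllib.request.urlopen(request) as response:
        return ET.fromstring(response.read())


# Build (but do not send) a request for a made-up accession number.
payload = build_request("WOS:000123456700001", count=50)
print(payload.decode("utf-8"))
```

A real session would probably also need an authentication step before any search call; that step is left out of this sketch.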
Test Case: Geological Sciences @ CU
Downloaded bibliography of Geoscience faculty pubs at CU for past 5 years
  Symplectic Elements (CSV)
  Each faculty publication comes with a Web of Science accession number
  421 publications indexed by WOS
Developed Python script: https://github.com/outpw/WOKapiscripts
  Opens CSV containing each WOS accession number
  Sends XML message requesting all cited references for each accession number
  Compiles each response into one XML document (24,448 citations)
  About 9 minutes (bye bye student workers)
Cleaned data in OpenRefine
  Standardized journal names using OpenRefine clustering tools
  Matched citation data to local holdings data using OpenRefine reconciliation tool
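The three script steps listed above (open the CSV, request cited references per accession number, compile one XML document) can be sketched roughly as follows. This is not the code from the linked repository. The CSV column name `accession_number`, the `<reference>` element name, and the `fake_fetch` stand-in for the network call are all assumptions made so the example runs offline.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List


def read_accession_numbers(csv_file: Iterable[str], column: str = "accession_number") -> List[str]:
    """Collect non-empty accession numbers from one CSV column (column name assumed)."""
    reader = csv.DictReader(csv_file)
    return [(row.get(column) or "").strip() for row in reader if (row.get(column) or "").strip()]


def compile_responses(accession_numbers: Iterable[str], fetch: Callable[[str], str]) -> ET.Element:
    """Call fetch() once per accession number and merge the replies into one XML tree."""
    combined = ET.Element("citedReferences")
    for uid in accession_numbers:
        response_root = ET.fromstring(fetch(uid))
        publication = ET.SubElement(combined, "publication", {"uid": uid})
        # Assumption: each cited reference arrives as a <reference> element.
        publication.extend(list(response_root.iter("reference")))
    return combined


def fake_fetch(uid: str) -> str:
    """Hypothetical stand-in for the API call; returns a canned XML reply."""
    return (
        "<response>"
        f"<reference>Cited work A for {uid}</reference>"
        f"<reference>Cited work B for {uid}</reference>"
        "</response>"
    )


sample_csv = io.StringIO("title,accession_number\nPaper one,WOS:0001\nPaper two,WOS:0002\n")
uids = read_accession_numbers(sample_csv)
combined = compile_responses(uids, fake_fetch)
print(ET.tostring(combined, encoding="unicode"))
```

Passing the fetch function in as a parameter keeps the compile step testable without network access; a real run would swap `fake_fetch` for a function that sends the XML request.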
Test Case: Geological Sciences @ CU
Results:
  CU provides access to 92% of items cited 5 times or more
  80% of all citations go to just 10% of all items cited (50% to just 1%)
  Discovered gaps in library collection
  Identified core collection of Geoscience serials (and the opposite)
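For readers who want to reproduce this kind of summary on their own data, the two headline figures (share of citations going to the most-cited items, and access coverage for items cited 5 times or more) could be computed along these lines. The function names and the toy data are hypothetical and are not the test-case data; what counts as an "item" (here, a title string) is also an assumption.

```python
import math
from collections import Counter
from typing import Iterable


def share_of_citations_to_top_items(cited_items: Iterable[str], top_fraction: float) -> float:
    """Fraction of all citations that go to the most-cited top_fraction of distinct items."""
    ranked = sorted(Counter(cited_items).values(), reverse=True)
    if not ranked:
        return 0.0
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


def coverage_of_frequent_items(cited_items: Iterable[str], held_items: Iterable[str],
                               min_citations: int = 5) -> float:
    """Fraction of items cited at least min_citations times that the library holds."""
    counts = Counter(cited_items)
    frequent = [item for item, n in counts.items() if n >= min_citations]
    if not frequent:
        return 0.0
    held = set(held_items)
    return sum(1 for item in frequent if item in held) / len(frequent)


# Toy data: 10 distinct items and 24 citations in total.
citations = ["A"] * 11 + ["B"] * 5 + list("CDEFGHIJ")
holdings = ["A", "C"]

print(share_of_citations_to_top_items(citations, 0.2))
print(coverage_of_frequent_items(citations, holdings))
```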
Next Steps
I’m not done!
Refine methods: the test case matched data sets on serial titles. Very close now to matching on ISSNs, which will speed up the process dramatically.
Integrate other APIs into workflow: OCLC, Crossref
Total time for test case: about 40–50 hours. Could be as fast as 1 day.
Current work:
  New science faculty at CU
  Evaluate all sciences at CU
Future work:
  Cross-institution comparison
  …?
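One reason ISSN matching can be faster than title matching is that an ISSN can be normalized and validated mechanically, so no clustering of name variants is needed. The sketch below shows one possible approach, assuming both the citation data and the holdings data carry ISSN strings. The function names are hypothetical, and the sample ISSNs are ordinary published examples rather than data from the test case. The check-digit rule (weights 8 down to 2, modulo 11, with `X` standing for 10) is the standard ISSN scheme.

```python
import re
from typing import Iterable, List, Optional, Tuple


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as 'NNNN-NNNC', or None if it is malformed or fails the check digit."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits, weights 8 down to 2.
    total = sum(int(digit) * weight for digit, weight in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"


def match_to_holdings(cited_issns: Iterable[str],
                      held_issns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split cited ISSNs into those found in the holdings list and those not found."""
    held = {issn for issn in map(normalize_issn, held_issns) if issn}
    matched: List[str] = []
    unmatched: List[str] = []
    for raw in cited_issns:
        issn = normalize_issn(raw)
        (matched if issn in held else unmatched).append(raw)
    return matched, unmatched


matched, unmatched = match_to_holdings(["0028-0836", "1476-4687", "bad"], ["00280836"])
print(matched, unmatched)
```

Invalid or missing ISSNs land in the unmatched list, where they could still fall back to title matching.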
Implications
A revolution for citation analysis and collection assessment?
  Speed
  Scale
Thank You!
Want to collaborate?
Scripts: https://github.com/outpw/WOKapiscripts
More: http://slides.com/philipwhite/datamining
Phil White
Earth Sciences & Environment Librarian
philip.white@colorado.edu