Published by Ethelbert Anderson. Modified over 6 years ago.
1
Data and text mining: facilitating our researchers’ needs in the 21st century. A case study
Hui Hua Chua
ALCTS CRS College & Research Libraries IG
ALA Annual Conference · Chicago, IL · June 25, 2017
Speaker notes: Moving from the macro-level presentation to a micro-level case study. Speaking about Michigan State University Libraries’ implementation of the LexisNexis Web Services Kit (WSK) for TDM: specifically, the decision and process of creating a web-based interface (known as Text Assembler) that gives campus users unmediated access to search and download news content from the LexisNexis corpus for text mining.
2
What is Text Assembler?
Project context
Implementation process
Lessons learned
3
What is Text Assembler?
Speaker note: I am not a programmer, and this is not a technical presentation.
Components, in order: TA user interface, Text Assembler (TA), WSK API, LexisNexis news corpus
The first two components were created by MSU Libraries; the last two are licensed from LexisNexis (LN) and make up the LN WSK.
User interface: authenticated users query and view sample search results, queue queries for processing, and download results.
TA: passes the query to the API; retrieves and stores results for delivery; processes results to create a plain text file; manages multiple queries and query processing based on WSK parameters.
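The flow described for the TA layer (queue a query, pass it to the API, store the results, build a plain text file) can be sketched roughly as follows. This is an illustrative sketch only, not the MSU Libraries code: `TextAssemblerSketch`, `QueuedQuery`, `fake_search_api` and the `page_size` parameter are all hypothetical names, and the real WSK API calls and limits are not shown in the slides.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class QueuedQuery:
    """One user query waiting to be processed (hypothetical structure)."""
    user: str
    query: str
    documents: List[str] = field(default_factory=list)
    done: bool = False


class TextAssemblerSketch:
    """Hypothetical stand-in for the middle layer; not the actual TA code."""

    def __init__(self, search_api: Callable[[str, int, int], List[str]],
                 page_size: int = 10) -> None:
        # search_api(query, offset, limit) is an assumed signature,
        # standing in for whatever the licensed API really exposes.
        self.search_api = search_api
        self.page_size = page_size
        self.queue: Deque[QueuedQuery] = deque()

    def enqueue(self, user: str, query: str) -> QueuedQuery:
        """Add a query to the end of the processing queue."""
        item = QueuedQuery(user, query)
        self.queue.append(item)
        return item

    def process_next(self) -> Optional[QueuedQuery]:
        """Fetch all result pages for the oldest queued query and store them."""
        if not self.queue:
            return None
        item = self.queue.popleft()
        offset = 0
        while True:
            page = self.search_api(item.query, offset, self.page_size)
            item.documents.extend(page)
            if len(page) < self.page_size:  # a short page means no more results
                break
            offset += self.page_size
        item.done = True
        return item

    @staticmethod
    def to_plain_text(item: QueuedQuery) -> str:
        """Join stored documents into one plain text string for download."""
        return "\n\n".join(doc.strip() for doc in item.documents)


def fake_search_api(query: str, offset: int, limit: int) -> List[str]:
    """Invented in-memory corpus of 25 articles, used only for illustration."""
    corpus = [f"Article {i} about {query}" for i in range(25)]
    return corpus[offset:offset + limit]
```

Injecting the search function keeps the sketch independent of any real API: the same class could be pointed at a test stub, as here, or at a wrapper around a licensed service.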
4
Michigan State University
Fall 2016: 50,344 students enrolled
Undergraduate: 39,090 (77.6%); graduate: 11,254 (22.4%)
Top five colleges by enrollment: Business (7,686), Social Science (6,562), Natural Science (6,192), Engineering (6,075), Agriculture & Natural Resources (4,588)
Speaker note: MSU’s public land-grant status and history; in contrast, A/L + RCAH = 2,115

College                                     Fall 2016 headcount    Rank
Communication Arts and Sciences             3,628                  6
Education                                   3,544                  7
Lyman Briggs College                        1,968                  8
Arts and Letters                            1,829                  9
Osteopathic Medicine                        1,434                  10
James Madison College                       1,129                  11
Nursing                                     1,116                  12
Human Medicine                              1,061                  13
Veterinary Medicine
MSU College of Law
Music
Residential College in Arts & Humanities
5
TDM landscape in 2014
In flux and developing
Libraries and publishers were receiving researcher requests, but often had no established business, technical or access models
Ad-hoc access to data, requested from the publisher on a case-by-case basis
License or purchase a corpus and host it in-house
Speaker note: Project began in 2014
6
MSU Libraries’ role in TDM
Facilitate data access (license negotiation, purchase, data hosting, work with APIs)
Consult or provide assistance with tools, methods or data
Almost always driven by a specific researcher request
Speaker note: Only just beginning to think about incorporating TDM language into existing and new licenses.
7
LexisNexis Web Services Kit (WSK)
Product: API plus access to content; enables larger downloads of data than LexisNexis Academic
Content: current news with updates
Business model: subscription
No specific user request or demand
“Let a hundred flowers bloom”
Speaker notes: Given this environment, what was different, unique or compelling to MSU about WSK?
Content: we had already licensed or worked with historical news and government content, journal articles, citation data, and Google Books for TDM.
Business model: earlier acquisitions were one-time purchases.
Experiment with different products and business models: let a hundred flowers bloom.
8
Implementation team
Digital Scholarship Librarian: Thomas Padilla
Digital Library Programmer: Devin Higgins
Programmer: Megan Schanz
Liaison to School of Journalism: Hui Hua Chua
9
Implementation timeline & process
7/2014: WSK acquired
8/ /2014: Fact-finding & testing
In-house: read documentation; use test API; internal discussions
LN sales and technical staff: content, license and technical use restrictions, test API, documentation
Other universities that had licensed WSK (Temple University): how WSK is used and how access is managed; resources required
Potential MSU users and use cases (MSU political science faculty): text-mining needs and use cases for current news/legal content; a specific current research question
Decision made to develop an unmediated web-based user interface. Why? Technical considerations and a large potential user base
10
Implementation timeline & process
11/2014-1/2015: Direct API use to answer 2 specific research questions
12/2014: Project request submitted to MSUL Systems department
2/2015: Programmer assigned; system developed (1.5 months)
8/2015: Usability testing of UI
9/2015: Public launch of Text Assembler
12/2016: Permission received from MSU Technologies to share code with acknowledgement
Source code:
Speaker notes: One major software update since launch. Ongoing maintenance time is very low: it falls within the time spent maintaining all of our servers, with nothing specific to Text Assembler. Maybe a few hours per month in total.
13
Lessons learnt: before implementation
Better coordination of acquisition and implementation processes
Fact-finding could have been completed before purchase, shortening the time to public launch
Speaker notes: This was somewhat limited because documentation was only available after purchase, but it would still have saved some of the 1 year it took to launch.
14
Lessons learnt: post implementation
How to balance system constraints with user behavior or preferences
System update (1 week development time)
Work with users to formulate more targeted queries
Hourly query rotation
More guidance needed with query formulation
More explicit search instructions, specifically for selecting appropriate data sources and field searching
Work with users individually
Takes a while for a new system to be adopted and used
Speaker notes: Without specific use cases we were learning about users as they used the system, so the lessons learnt were mainly about our users. 10-page Boolean query. How many of you here are familiar with LN Academic? The LN Academic data structure is not easy to understand. Oct first query queued.
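The slides do not say how the hourly query rotation works. One plausible reading is a round-robin schedule in which each queued query gets one time slot (an hour, say) and then goes to the back of the line, so that a very large search cannot block smaller ones. The sketch below illustrates that reading only; `rotate_queries`, `per_turn_budget` and the example numbers are assumptions, not details of Text Assembler.

```python
from collections import deque
from typing import Dict, List, Tuple


def rotate_queries(remaining: Dict[str, int],
                   per_turn_budget: int) -> List[Tuple[int, str, int]]:
    """Round-robin schedule: each turn (e.g. one hour) serves one query.

    remaining maps a query id to the number of results still to download.
    per_turn_budget is the assumed maximum number of results per turn.
    Returns a list of (turn, query_id, results_fetched) tuples.
    """
    if per_turn_budget <= 0:
        raise ValueError("per_turn_budget must be positive")

    queue = deque(remaining.items())
    schedule: List[Tuple[int, str, int]] = []
    turn = 0
    while queue:
        query_id, left = queue.popleft()
        fetched = min(left, per_turn_budget)
        schedule.append((turn, query_id, fetched))
        left -= fetched
        if left > 0:
            # Unfinished queries go to the back so others get a turn.
            queue.append((query_id, left))
        turn += 1
    return schedule


# A large query and a small one share the queue; the small one finishes
# on the second turn instead of waiting for the large one to complete.
example = rotate_queries({"big": 25, "small": 5}, per_turn_budget=10)
```

With these invented numbers, `example` is `[(0, "big", 10), (1, "small", 5), (2, "big", 10), (3, "big", 5)]`.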
15
Usage
Queued searches since system went live: 121
Number of different users in the last 3 months: 5
Average number of results per search: 24,855
Largest search: 299,146
Speaker note: Usage statistics and privacy
16
Usage continued
Based on review of specific queries and user contacts
Primarily used as a research tool; not the undergraduate teaching tool we had hoped for
Heaviest use by/in the Social Sciences
Library relationship with TA users is arm’s-length
Speaker notes: Similar to any third-party database: we hear from users when they have problems. This contrasts with other libraries, such as Vanderbilt, that have adopted a different approach: collaboration with users on research projects, skills and community building. An arm’s-length relationship is OK if the system is receiving heavy use, but it isn’t.
17
Further questions
How is success defined?
How do libraries define, assess and demonstrate value in TDM services?
Potential assessment of contribution to research outputs and use in teaching
Speaker note: Questions I have that we can maybe discuss in Q&A later
18
Thank you!
Hui Hua Chua
Collections & User Support Librarian
Michigan State University Libraries
TA source code:
Technical questions? I can pass them on to the appropriate person.