GSLIS Research Showcase, April 9, 2010

1 GSLIS Research Showcase, April 9, 2010
Moving from science experiment to library service
John Unsworth, Dean, Graduate School of Library and Information Science

2 Funded for two years (2007-2009) by the Andrew W. Mellon Foundation
Focus: apply text-mining tools to digital libraries in the humanities; facilitate “reading at library scale.”
Involved faculty and staff at Illinois (GSLIS and NCSA), Northwestern, Nebraska, Maryland, Alberta, and McMaster.
Content (150M words of literary text) contributed by Virginia, Indiana, UNC, ProQuest, and Cengage.
Coverage: literature of many genres, in English, from s
2010 GSLIS Research Showcase

3 In 2009, project was complete, meaning:
All texts had been normalized to TEI-A markup and modern spelling (with old spellings preserved), part-of-speech tagged, provided with enhanced item-level metadata, and ingested into a database.
A web-based user interface had been produced, allowing users to define collections, select analytic routines and parameters, and examine or export results as tables or visualizations.
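As a rough illustration of what modern spelling "with old spellings preserved" plus part-of-speech tagging can look like at the level of a single word, here is a minimal sketch. The `Token` class, its field names, the helper functions, and the sample tags are all hypothetical; they are not taken from MONK or from the TEI-A schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of text. All field names are illustrative assumptions."""
    original: str  # spelling as printed in the source edition
    regular: str   # modernized spelling used for searching and counting
    pos: str       # part-of-speech tag (placeholder tag set)


def count_by_regular_spelling(tokens):
    """Count words by modern spelling so that variant spellings are grouped."""
    return Counter(t.regular.lower() for t in tokens)


def original_spellings(tokens, regular):
    """Return the set of original spellings recorded for one modern form."""
    return {t.original for t in tokens if t.regular.lower() == regular.lower()}


# Hypothetical sample data: two spellings of "love" and one of "do".
tokens = [
    Token("loue", "love", "NOUN"),
    Token("love", "love", "NOUN"),
    Token("doe", "do", "VERB"),
]
counts = count_by_regular_spelling(tokens)
```

Keeping both spellings on each token means that counts can be pooled across variants while the printed form stays available for display or for studying spelling variation.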

5 However, one of the deliverables promised in the original grant proposal was:
Beta installations of MONK alongside several large collections provided by libraries or publishers, hosted on their servers.
Tim Cole at the UIUC Library had been keeping up with the project as it progressed, and after wrapping up the work on interface and content, I worked with him and Mike Grady (CITES) to produce this deliverable.

7 Authentication was required because the texts from ProQuest and Cengage (about 100M of our 150M words) were licensed to CIC universities…. All the CIC libraries licensed EEBO and ECCO, but only about half licensed ProQuest’s 19th-century fiction collection. ProQuest agreed to allow all CIC institutions access to these texts in MONK, so all we needed was a mechanism for authentication. CIC CIOs had recently agreed to deploy a federated identity management system, called InCommon, and they were willing to provide some funding for a proof-of-concept integration of MONK with InCommon, allowing MONK to be presented as a library service.

9 “In an email last week, I hinted that we had some collective work to do to complete the Shibboleth access protocol for MONK text analysis functionality. For Shibboleth to work, authenticated user information has to be pre-loaded. That’s relatively easy when we have a couple of hundred known users as is the case for CICme, but is a project of different order when we’re trying to authenticate 400,000 users. Somehow, someway, the University of Illinois Library—which manages the MONK servers—needs to know who on each of the CIC campuses “deserves” to be granted access to MONK.”

10 -- email from Mark Sandler, CIC
“That list will look a lot like your library circulation or e-license authenticated user lists, but in this case you’d be transferring the data to another agency and that will make some of your campus colleagues—registrars and H.R. folks—nervous. They’ll want to know who is getting the data and why, what the privacy policies are for the University of Illinois, what “attributes” (name, address, campus status, etc.) are being released, plans to refresh the data, etc.”
-- email from Mark Sandler, CIC

11 At this point, we’re still working on getting access opened up to all CIC users. The problems have much more to do with policy, law, and institutional process than they have to do with technical challenges, although there are certain technicalities in MONK that raise policy problems: for example, the need to have identity information persist, so that work can be conducted in multiple sessions.
Moral of the story?
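The persistence requirement on this slide can be made concrete with a small sketch. One common way to let work carry over between sessions without storing a name or address is to key saved work on a stable, opaque identifier. Everything below is an assumption made for illustration: the `persistent_id` function, the `WorkStore` class, the example identity-provider URL, and the secret are hypothetical, and this is not how MONK or Shibboleth is actually implemented. A real federation deployment would more likely rely on a persistent identifier released by the user's home institution.

```python
import hashlib
import hmac


def persistent_id(secret: bytes, idp_entity_id: str, local_user_id: str) -> str:
    """Derive a stable, opaque key for one user at one institution.

    The same inputs always give the same key, so saved work can be found
    again in a later session, but the key itself reveals no personal data.
    """
    message = f"{idp_entity_id}!{local_user_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class WorkStore:
    """In-memory stand-in for wherever saved work would actually live."""

    def __init__(self):
        self._saved = {}

    def save(self, user_key, name, workset):
        self._saved.setdefault(user_key, {})[name] = workset

    def load(self, user_key, name):
        return self._saved.get(user_key, {}).get(name)


# Hypothetical values, for illustration only.
secret = b"example-secret-not-for-production"
idp = "https://idp.example.edu/shibboleth"
store = WorkStore()

# Session 1: the user logs in and saves a collection.
key_session_1 = persistent_id(secret, idp, "jdoe")
store.save(key_session_1, "workset-1", ["text-a", "text-b"])

# Session 2: a later login yields the same key, so the work is found again.
key_session_2 = persistent_id(secret, idp, "jdoe")
restored = store.load(key_session_2, "workset-1")
```

A design like this reduces what has to be stored, but it does not remove the policy questions raised above: someone still has to decide which attributes are released and who holds the secret.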

