THROUGH OR AROUND? SCIENTIFIC RESEARCH DATA AND THE INSTITUTIONAL REPOSITORY
Panel Presentation for the International Conference on University Libraries
Universidad Nacional Autónoma de México
November 6, 2013
Christopher Stewart, Ed.D.
Assistant Professor, Graduate School of Library and Information Science
Dominican University
Enabling Access to Research Data
Not a new issue for universities and academic libraries, but a rapidly developing one.

Data Archiving Requirements
Agency                Nation/Region
NSF                   United States
NIH                   United States
INSPIRE               European Union
UK Research Council   United Kingdom
ARC                   Australia
EUR-OCEANS            France
CIHR                  Canada
Fondazione Cariplo    Italy
Source: SHERPA/JULIET
Expanding the Mandate
U.S. Office of Science and Technology Policy directive, February 22, 2013*
*Requires each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan for increasing public access to the results of the research it funds.
Research Data can be:
- Heterogeneous
- Often "raw," unless accompanying a publication
- Highly idiosyncratic
- Characterized by implied rather than explicit description
- Small and big
Big Data can be:
- Unstructured
- Unsuited for traditional (e.g., hierarchical, relational) database models
- Complete, not sampled
- Linked
Goals for Describing Scientific Research Data
- Access
- Re-use
- Context
- Content, not container
(Yarmey & Yarmey, 2013)
Research Lifecycle
Source: University of Virginia Library, Data Consulting Group
Describing Scientific Research Data: Semantic Modeling
- Shared vocabularies provide metadata across a range of subjects
- Ontologies allow for contextual relationships
- Linked data enable multiple types of data, documents, etc. to be viewed as one database
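The "viewed as one database" point can be illustrated with a minimal sketch in plain Python. It assumes two hypothetical sources, one describing a dataset and one describing an article, each expressed as subject-predicate-object statements. All identifiers (ex:dataset1, ex:article1, ex:researcher1, ex:describedIn) are invented for illustration, and a real deployment would more likely use an RDF library and a triple store.

```python
# Two hypothetical sources, each a set of (subject, predicate, object) statements.
dataset_triples = {
    ("ex:dataset1", "dc:title", "Stream temperature readings"),
    ("ex:dataset1", "ex:describedIn", "ex:article1"),
}
article_triples = {
    ("ex:article1", "dc:title", "Warming trends in alpine streams"),
    ("ex:article1", "dc:creator", "ex:researcher1"),
}


def merge(*graphs):
    """Combine any number of triple sets into one graph."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return every object linked to a subject by a predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def creators_of_dataset(graph, dataset):
    """Follow dataset -> article -> creator links across the merged graph."""
    found = []
    for article in objects(graph, dataset, "ex:describedIn"):
        found.extend(objects(graph, article, "dc:creator"))
    return found


graph = merge(dataset_triples, article_triples)
print(creators_of_dataset(graph, "ex:dataset1"))
```

Neither source alone can answer "who created the work this dataset supports?"; the question only becomes answerable once shared identifiers let the two sets of statements be queried together.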
Data Description Schemes (Greenberg, 2012)
- Simple: interoperable, easy to generate, low barrier, multidisciplinary, agnostic, flat, general, properties
- Simple/moderate: interoperability with specific needs, requires expertise and greater domain focus, extensible, granular
- Complex: hierarchical and granular, domain-centered, extensive, 100+ properties
Are Research Data Collections?
- Selecting: partially, though the volume and scope of data challenge current digital collection development frameworks
- Acquiring: partially, though data are not "owned"
- Describing: yes, although some content may reside elsewhere
- Organizing: yes, but not with "traditional" IR taxonomies
How Academic Libraries are Working with Research Data Now
- Institutional repositories are meant for all types of data, but are clearly set up for research publications (Salo, 2010)
- Most institutional repositories rely on Dublin Core, which OAI-PMH requires as the minimum for interoperability, but most research and exchange standards use XML/RDF as their base (Salo, 2010)
- Geared for output, not context
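To make the "output, not context" limitation concrete, here is a minimal sketch of the kind of flat, unqualified Dublin Core record that OAI-PMH exchanges, built with Python's standard library. The record values are hypothetical, and the helper name dublin_core_record is an assumption for illustration, not part of any repository platform.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialize a dict of element name -> list of values as an oai_dc record."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            # Simple Dublin Core has no slot for this kind of description.
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description.
example = dublin_core_record({
    "title": ["Stream temperature readings"],
    "creator": ["Example, A."],
    "type": ["Dataset"],
    "date": ["2013"],
})
print(example)
```

Contextual details a researcher would need for re-use, such as the instrument or units of measurement, have no dedicated element here; they would have to be folded into free-text fields like description.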
Primary Metadata Use in Institutional Repositories
Standard      Percent of Use
Dublin Core   68%
OAI-PMH       46%
MARC          40%
Source: Simons & Richardson, 2012
Challenges for Current Data Curation Models in Academic Libraries
- Beyond project-level metadata, dataset-level metadata provides some context for data, but it can be limited (Yarmey & Yarmey, 2013)
- Discoverability in institutional repositories is generally limited to library websites, catalogs, and Google Scholar (Burns, Lana, & Budd, 2013)
Content in Institutional Repositories
Content Type                   Number of Repositories Holding   Response Rate
Courseware                     14                               31%
Data sets                      23                               51%
Other                          25                               56%
Books                          29                               64%
Book chapters                  35                               78%
Tech reports, working papers   39                               87%
Conference articles            40                               89%
Presentations                  41                               91%
Theses and dissertations       43                               43%
Journal articles               44                               44%
Source: Burns, S. L., Lana, A., & Budd, J. M. (2013). Institutional Repositories: Exploration of Costs and Value. D-Lib Magazine, 19(1/2).
Domain Repositories
- Existing and developing metadata standards (e.g., Dryad/DCAM, ICPSR/DDI)
- Centralized or distributed (e.g., DataONE)
- Evidence suggests that scholars who deposit materials in subject repositories prefer them over institutional repositories, and are not likely to use both (Xia, 2008)
- Built around communities of interest
- Cost sharing for cloud services
Data Management: Education and Programming Opportunities for Academic Libraries
- Training and support for data management plans
- Data librarianship
- Data literacy
An Evolving Model
Subject/Domain Data Repository   Institutional Repository
"Raw" data                       Published data
Linked                           Hierarchical
Open Data                        Open Access
Complex description              Basic description
Multi-type data                  Multi-type documents
References
Burns, S. L., Lana, A., & Budd, J. M. (2013). Institutional Repositories: Exploration of Costs and Value. D-Lib Magazine, 19(1/2).
Greenberg, J. (2012, August 22). Metadata for Managing Scientific Research Data. Presented at the NISO/DCMI Webinar.
Salo, D. (2010). Retooling Libraries for the Data Challenge. Ariadne, (64).
Simons, N., & Richardson, J. (2012). New Roles, New Responsibilities: Examining Training Needs of Repository Staff. Journal of Librarianship and Scholarly Communication, 1(2).
Xia, J. (2008). A Comparison of Subject and Institutional Repositories in Self-Archiving Practices. Journal of Academic Librarianship, 34(6), 489–495.
Yarmey, K. A., & Yarmey, L. R. (2013). All in the Family: A Dinner Table Conversation about Libraries, Archives, Data, and Science. Archive Journal, Summer 2013(3). Retrieved from archives-data-and-science/
Image Credits
Slide 7: what-is-linked-data-and-why-does-it-matter-to-journalists-and-publishers/
Slide 10: