Data Infrastructure Services for Data Curation
Jian Qin
School of Information Studies, Syracuse University, Syracuse, New York
ALA 2015, San Francisco, CA
Data infrastructure
“a sustainable data infrastructure that will be discoverable, searchable, accessible, and usable to the entire research and education community.”
“usable by multiple scientific disciplines…”
“…that can support and provide data solutions to a broader range of scientific disciplines while reducing duplicative efforts.”
Nature of an infrastructure
– Embeddedness. Infrastructure is sunk into, inside of, other structures, social arrangements, and technologies.
– Transparency. Infrastructure does not have to be reinvented each time or assembled for each task, but invisibly supports those tasks.
– Reach or scope beyond a single event or a local practice.
– Learned as part of membership.
– Links with conventions of practice.
– Embodiment of standards.
– Built on an installed base.
– Becomes visible upon breakdown.
– Is fixed in modular increments, not all at once or globally.
Star, S.L. & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research, 7(1):
Capability of infrastructural services
– Safe haven
– DOI minting service or related support
– Active data storage
– Data catalog
– Data repository
– Institutional repository
– Code of good research practice
– Institutional research strategy
– Data governance / Access committee
– Institutional RDM policy or aspirational statement
– Open access policy
Source:
Data infrastructure services
[Diagram labels: Library & info services; Data science; IT management; Data infrastructure services; Data services; Data infrastructure; Library; IT]
Data repositories / Publication repositories
Scenario: data repositories, institutional repositories, community repositories, subject repositories
Links between data and publications
– Separate or combined?
– Relations?
– May be at institutional and/or community levels
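One way to make the "Relations?" question concrete is to treat each link between a dataset and a publication as its own record, whether the two live in separate or combined repositories. The sketch below is only an illustration under stated assumptions: the `Link` class, the `links_for` helper, the identifiers, and the `level` field are all hypothetical, and the relation names are merely written in the style of common relation vocabularies rather than validated against any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One typed relation between two repository records (hypothetical model)."""
    source_id: str   # identifier of the dataset or publication the link starts from
    relation: str    # illustrative relation name, e.g. "IsSupplementTo"
    target_id: str   # identifier of the related record
    level: str       # where the link is maintained: "institutional" or "community"


def links_for(record_id, links):
    """Return every link in which record_id appears as source or target."""
    return [link for link in links if record_id in (link.source_id, link.target_id)]


# Made-up identifiers, for illustration only.
links = [
    Link("dataset:001", "IsSupplementTo", "article:A", "institutional"),
    Link("dataset:002", "IsCitedBy", "article:A", "community"),
    Link("dataset:002", "IsSupplementTo", "article:B", "community"),
]

for link in links_for("article:A", links):
    print(link.source_id, link.relation, link.target_id, link.level)
```

Keeping the links as separate records means the same lookup works whether data and publications sit in one repository or several, which is one possible answer to the "separate or combined?" question rather than the only one.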
A broader view of RDM: data science
“An emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information.”
Stanton, J. (2012). Introduction to Data Science. 3/DataScienceBook1_1.pdf
Building data infrastructure services
– To change in composition or structure (what we are/what we do)
– To change the outward form or appearance (how we are viewed/understood)
– To change in character or condition (how we do it)
The keyword for data infrastructure services is:
Institutionalization: policy, procedures, training, best practice, compliance, IP protection and rights
Infrastructure: networks, systems, databases, software tools, data services
Standards: data format standards, metadata standards, ontologies, controlled vocabularies/taxonomies
Start it the right way
– Repeatable
– Sustainable financially and technically
– A community of practice
– Institutionalization
– Collaboration and coordination
– Conformance to regulations and laws
Capacity building: RDM human capital
– Deep Subject, Process, or Technical Expertise
– Deep Service Commitment
– Commitment to Research and Development
– Commitment to Assessment and Evaluation
– Communication and Marketing Skills
– Project Development and Management Skills
– Political Engagement
– Resource Development Skills
– Commitment to Rigor
– Entrepreneurial Spirit
– Commitment to Collaboration
– Leadership/Inspirational Capacity
MS in Library and Information Science
CAS in Data Science
Thank you!