1
Launching the Texas Data Repository
How to Implement TDR at Your Institution Kristi Park, Director, Texas Digital Library Santi Thompson, University of Houston Since September 2015, the Texas Digital Library (TDL) Dataverse Implementation Working Group (DIWG) has worked with Texas Digital Library staff to pilot and implement a consortial repository for sharing and archiving research data, as well as to develop policies and workflow documentation associated with a data repository service. We are ready to begin working with our members to implement the service on their campuses. In this webinar, we’ll discuss the work that we’ve done and how our members can start getting ready to offer the service to faculty and staff.
2
Today’s agenda Background of the Texas Data Repository
Capabilities of Dataverse How the Service Works Local Implementation Wrap Up & Questions After discussing some background of the Texas Data Repository, Santi will talk about the capabilities of Dataverse -- the repository software that we’re using -- and how the overall service will work. Then I will talk about local implementation -- the information member institutions will need to get the service set up on their campuses -- and wrap things up. We should have time to answer questions at the end, and I encourage you, as we go along, to enter those questions in the chat window as you think of them, and we’ll address them as we have a moment.
3
Part 1: Background
4
The Texas Digital Library is a consortium of Texas higher education institutions that builds capacity for preserving, managing, and providing access to unique digital collections of enduring value. The Texas Digital Library has, since its inception, supported greater access to scholarly communication through a number of hosted services -- hosted institutional repositories, open journals, and ETD publishing. We have 22 member institutions of varying sizes and flavors from across the state of Texas. And indeed, in the past several years, our members have consistently expressed their need for support from TDL in providing infrastructure and services for research data management. Those expressed needs have been driven by funders, governments, advocacy groups, and others pushing for improved accessibility and usability of research data (as well as other research outputs). This increased focus on data sharing and re-use was famously accelerated by the 2013 OSTP Directive that required plans from federal agencies to support increased public access to the results of the research they fund.
5
Goals for TDR: Provide infrastructure (and other) support that will help researchers: comply with funding mandates receive greater recognition for research data and researchers (DOIs, discoverability) produce better research (more reproducible, more efficient) And so, the goals for any repository service that we developed were to, first and foremost, help researchers comply with funding mandates -- providing them with infrastructure for compliance at the end of the research process and also for data management planning at the beginning of the process. Additionally, we wanted the service to facilitate greater recognition for research data and researchers through the assignment of DOIs and enhanced discoverability. And finally, we hope that improved infrastructure for data sharing will ultimately produce better research by enabling sharing of reproducible results and making the research process more efficient.
6
Goals for TDR (cont.): Provide infrastructure (and other) support that will help libraries: Become an integral part of the Research Data Lifecycle. Provide services at whatever level they are able. Benefit from collaborative infrastructure/management and mutual support. Showcase institutional collections of research data. Those are the goals for end users of the service. For our direct stakeholders -- academic libraries -- we wanted to offer a service that would enable their participation in the research data life cycle in a way that is flexible and mutually beneficial -- so that any institution, regardless of its own staff resources, could use the service. And finally, we want to enable libraries to help their institutions showcase the research outputs produced by faculty, staff, and students on their campuses.
7
TDL Data Management Working Group
Maria Esteva (TACC); Colleen Lyon (UT Austin); Jeremy Donald (Trinity); Martha Buckbee (UT Southwestern); Christie Peters, Santi Thompson (UH); Kristi Park, Ryan Steans (TDL) Bruce Herbert, Texas A&M (Chair) Charge: Help the TDL determine what kinds of data management services it could provide at a consortial level. Develop criteria Evaluate proposed projects Investigate issues Document findings Make recommendations for services To get ourselves started, we convened an exploratory working group in 2013 to explore possibilities for consortial services for RDM. Its objectives included: Articulating criteria for selecting pilot projects Evaluating proposed projects based on those criteria Selecting no more than three projects to implement Investigating issues related to storage and accessibility of data sets Documenting findings and recommendations for services This group issued a report last fall recommending the use of Dataverse as a platform for a consortial data repository.
8
TDL Dataverse Implementation Working Group
Santi Thompson (Chair) Working Group members TDL member institutions 14 1 7 TDL Dataverse Implementation Working Group Charge: Pilot test, assess, and launch a consortial repository for research data archiving and management. Members: Denyse Rodgers (Baylor); Bruce Herbert, Sean Buckner, Wendi Kaspar, Cecilia Smith (TAMU); Ray Uzwyshyn, Todd Peters (Texas State); Christopher Starcher (Texas Tech); Jeremy Donald (Trinity); Kristi Park, Ryan Steans, Nick Lauland, Laura Waugh (TDL) Formed as a direct result of the previous group’s recommendations, the TDL Dataverse Implementation Working Group began in September 2015 to address all aspects of building a consortial data repository, including policies and workflows, technology implementation, business/funding model, metadata, and outreach. This has been a big effort involving much individual and institutional support -- not all of it represented here by these numbers. I want to thank all our members who participated in both these groups for contributing their hard work and expertise to this process. We most certainly could not have done any of this work alone.
9
Group timeline September 2015: Working Group formed
April-May 2016: Repository Pilot Project September 2016: Soft Launch November 15-16: Data Symposium/TDR training December 2016: Formal Launch The Dataverse Implementation Working Group formed a year ago. It organized into subgroups that were focused on addressing four objectives: budget and business plan, policy and governance, technical configuration, and workflows and outreach. They divided their work into two broad phases, with the completion and assessment of a pilot project serving as a unifying activity across the phases. In the first phase, the group devised draft workflows and policies and configured the repository software in preparation for a pilot project. We pursued and received approval from the institutional review boards of our respective institutions to proceed with the pilot project, which took place in the spring of 2016. You can read more about that project and the results in the working group’s report, which was released last week. We are now in a “soft launch” phase in which we are talking to all our members -- like today -- about the service and getting people ready to use the service. We will use this fall to educate our members and refine the service and have planned an event in November -- the first of what we hope will be a regular fall symposium -- to start building a community of practitioners around research data management. We are working towards a formal launch date of December 1.
10
Texas Data Repository Centrally hosted, collectively managed
Built in open-source Dataverse De-identified data only And so next we’ll talk about the products of all this work: the Texas Data Repository, a platform for publishing and archiving datasets (and other data products) created by faculty, staff, and students at Texas higher education institutions. The TDR is a single repository hosted by TDL staff, but managed collectively by our member institutions. It is built in open-source Dataverse, originally developed and in use by Harvard University, and I will turn it over to Santi next to talk more about the capabilities of that platform as we have implemented it here for TDL members. Importantly, the repository ONLY accepts de-identified data. It is not intended for data with sensitive or confidential information like social security numbers, health information, etc., even if that data remains unpublished.
11
Part 2: Capabilities of Dataverse
12
Benefits Overview Data storage and sharing Data versioning
Permissions and Terms of Use Measuring impact and tracking use Discipline-based metadata schema Data visualization
13
Dataverse and Datasets
We will start our discussion with some definitions. To understand the functionality and capabilities of the repository, you first must come to know the term Dataverse. Most broadly, a Dataverse is a home for your research project, your community, and your data. One instance of the term reflects the name of the software that underlies the Texas Data Repository. Another instance of the term corresponds to a folder of content. This kind of Dataverse is represented by the cubes on the screen. A Dataverse is a container for datasets (which are made up of your research data, code, documentation, and metadata). A Dataverse can also contain other Dataverses, which can be set up for individual researchers, departments, journals, and organizations. For those familiar with DSpace, think of a Dataverse as a collection.
14
Add Data: Organize, publish, and archive
Share Data: Share data with trusted group Version Data: Maintain multiple versions of data With definitions behind us, we can examine more of the benefits of the repository software. The primary function of the Texas Data Repository is to provide a mechanism for users to upload data and share their datasets as widely as they desire, from completely open to limited access with a trusted group that they designate. Part of the storing and dissemination of data includes the user’s ability to generate versions of the uploaded data, ensuring that the evolution of their work is documented and, if so desired, made available.
15
In this screenshot, we see the versions interface, which provides information on three different versions of the John Doe dataset. The user has the ability to select various versions and view the differences between them, as the green highlighted areas to the left and right of the slide show. Additionally, those browsing can view details about individual versions.
16
In addition to tracking versions, the repository also maintains and displays any terms of use or permissions granted to the dataset. This information is accessed through the Terms tab, as shown by the upper red arrow. By default, published data is assigned a CC0 license, so that others may freely access and build upon the work. This designation is highlighted by the lower red arrow on the screen. According to the Creative Commons website, CC0 indicates that “The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.” Note that researchers can alter this license and create custom terms of use for their data if appropriate.
17
Audience members should also note that the interface reminds the user that any data reused from the Texas Data Repository should be fairly cited. The TDR Community Norms spell out the pieces of such a citation in detail. Citation information is also included in the blue shaded box on every dataset page. Using this screenshot as an example, the lower red arrow indicates the citation, including a DOI, which is a unique and persistent identifier. Because the Texas Data Repository emphasizes the citation of data in the repository, and it uses unique and persistent identifiers as part of this process, it is possible to track the impact of your research. Additionally, the repository displays the number of downloads for each dataset, as shown by the upper red arrow on the screen.
18
The Texas Data Repository also contains a “guestbook” feature.
Guestbooks allow you to collect data about who is downloading the files from your datasets within a Dataverse for which you are the administrator. You can decide to collect account information (username, given name and last name, affiliation, etc.) as well as create custom questions (e.g., “What do you plan to use this data for?”). You are also able to download the data collected from the enabled guestbooks as Excel files to store and use outside of Dataverse. The green highlighted area on the screen shows the guestbook question customization interface.
19
No repository would be beneficial without metadata, and the Texas Data Repository contains several metadata schema to help depositors describe their datasets in the most efficient and effective manner. The most basic metadata fields generate a citation for the dataset. Pictured on this slide is a screenshot of some citation metadata. Additionally, TDR supports Geospatial, Social Science and Humanities, Astronomy and Astrophysics, and Life Sciences metadata fields and schema. These disciplinary-based metadata fields are based on established standards, vocabularies, and best practices, including fields that are compliant with the DDI codebook, DataCite 3.1, Dublin Core’s DCMI Metadata Terms, ISO language codes, the International Virtual Observatory Alliance’s (IVOA) VOResource Schema, the OBI Ontology and the NCBI Taxonomy for Organisms.
20
The repository software will also offer visualization options for certain types of datasets.
For example, browsers can use TwoRavens, a visual tool for manipulating statistical data in tabular format, to visualize CSV, Rdata, and dta files. An “Explore” button will automatically appear next to the published datasets in the data list. Here is an image of a small, rather simplistic dataset visualized using the TwoRavens tool. Other visualization tools take advantage of geo-referenced and astronomy data.
21
Other functionality includes the ability to contact the owner of any dataverse or dataset in the Texas Data Repository through the repository software.
22
Other Benefits Integration with OJS
DOI and Citation Metadata Aggregated by SHARE APIs Additional benefits of the Texas Data Repository software include the ability to integrate with OJS, so the online publication can link the dataset that informed the publication, and vice-versa. Metadata in TDR will also be harvested by SHARE, which “gathers, cleans, links, and enhances metadata that describes research activities and outputs—from data management plans and grant proposals to preprints, presentations, journal articles, and research data.” Developers will also be able to build tools to interoperate with TDR by taking advantage of the repository’s APIs.
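As a rough illustration of what building on the repository's APIs could look like, here is a minimal Python sketch that assembles a query URL for the Dataverse Search API (`/api/search` with the `q`, `type`, and `per_page` parameters) and pulls dataset names out of a response. The base URL `https://dataverse.example.edu` is a placeholder, not the TDR address, and the helper names `build_search_url` and `dataset_titles` are invented for this example. Which endpoints are available depends on the Dataverse version installed, so treat this as a sketch to check against the API guide rather than a description of the TDR setup.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Return a Dataverse Search API URL for the given query.

    base_url is a placeholder for a Dataverse installation; the helper
    itself is hypothetical and only assembles the query string.
    """
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def dataset_titles(response_json):
    """Extract item names from an already-parsed Search API response.

    Assumes the response shape {"data": {"items": [{"name": ...}, ...]}};
    missing keys simply yield an empty list.
    """
    items = response_json.get("data", {}).get("items", [])
    return [item.get("name") for item in items]


if __name__ == "__main__":
    # No network call is made here; the URL is only printed.
    print(build_search_url("https://dataverse.example.edu", "water quality"))
```

A tool would fetch that URL with any HTTP client, parse the JSON body, and pass it to `dataset_titles`.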
23
Secure infrastructure
Amazon Web Services Controlled Access Data back ups Finally, a major benefit of the Texas Data Repository is that it offers a secure environment to store and share research data. The Texas Data Repository is hosted in Amazon Web Services, which provides cloud security services and support, network monitoring and protection, and identity management and authentication. Only TDL staff maintain system-wide access to the repository. A controlled number of library personnel will have special privileges for administering an institutional collection of research data deposited by users from their university. These personnel will be limited to one user account per participating member institution. The TDL backs up data in the Texas Data Repository according to its organization-wide backup policy and also maintains copies of data in three distinct locations.
24
Part 3: How the Service Works
Now that we have a better understanding of some of the benefits of the repository, I will next discuss how the Texas Digital Library and its membership will provide services around the Texas Data Repository.
25
Scope & Collecting policies
Research data and products Any discipline; any file type Mid-sized dataset size Free of confidential or sensitive information Policies that identify the scope of the repository and the kinds of content/collections accepted into it are critical for the long-term sustainability and maintenance of the Texas Data Repository. The repository will focus on research data and products, such as codebooks. The repository is not intended for published papers, which are frequently found in institutional or discipline-based repositories. The repository will accept data from any discipline and with any file type. It is designed for regular to mid-sized datasets. As such, individual file sizes up to 2 GB and research projects with up to 10 GB total per project will be accepted. Because datasets are intended to be accessed by any number of people, no confidential or sensitive information should be contained in the data. For more information, see TDR website:
26
Texas Data Repository Texas Digital Library (technology)
Steering Committee (TDL & Data Repository Librarians) Member Libraries (service & outreach) Researchers (deposit, search, publish) The Texas Data Repository will be administered under a hybrid model. There is a single Dataverse repository hosted by the Texas Digital Library. TDL provides organizational support in the form of training, tech support, and limited coordination. Member institutions will provide services based on their local needs and resources. Their roles are as follows. Each member institution will supply a data repository librarian to manage the institutional dataverse, act as a local expert for the repository, and serve on a TDL steering committee of data repository librarians. The steering committee will help TDL recognize trends in research data management, address repository issues, and recommend future groups needed for sustaining data management support. Any data curation support will also be provided by the member institution. The user is responsible for depositing data. This hybrid model is flexible enough to accommodate different processes at different institutions (some may require library intervention during ingest, some may not). Kristi will talk more about the roles and responsibilities of TDL and its membership in the next section.
27
Texas Data Repository Texas Digital Library (technology)
Steering Committee (TDL & Data Repository Librarians) Member Libraries (service & outreach) Researchers (deposit, search, publish) By default, as represented in this slide, researchers will be responsible for depositing research data. They can ingest data to personalized dataverses they have created, to their institutional dataverse and/or to the “root” Dataverse, which is the Texas Data Repository. Audience members should note that this self-deposit process can be customized on an institution-by-institution basis
28
Texas Data Repository Texas Digital Library (technology)
Steering Committee (TDL & Data Repository Librarians) Member Libraries (service & outreach) Researchers (deposit, search, publish) For example, institutions could make it so researchers do not have permission to “deposit” and the library would have to be involved in the deposit process, as represented in the following slide. Institutions will need to contact TDL if they would like to modify the researcher self-deposit model.
29
Every TDL member institution that participates in the Texas Data Repository will have its own institutional dataverse, which will contain all the dataverses and datasets produced at that institution. Institutional dataverses are a way to showcase the data deposited at your institutions. TDL will rely on the designated data repository librarians to maintain your institutional dataverses. The scrolling carousel, outlined in the red box on the screen, gives some examples of the institutional dataverses that will be available for those browsing for content. Kristi will now discuss the next steps for implementing the Texas Data Repository at your respective institutions.
30
Part 4: Implementation
31
Contracts and Cost Launch TDR under existing contracts with members, using an MOU to articulate the roles and responsibilities Member institutions must sign an MOU to use the repository in No additional charge for use of the repository in the first year ( ) Limits on storage: 2GB per file and 10GB per project First: As we roll the service out this year and make any necessary tweaks, we will do so under existing member contracts and without any additional member fees. However, a couple of things to note: We will ask members to sign a Memorandum of Understanding that governs the various roles and responsibilities of TDL and member libraries. While there will be no additional charges this year, we will be monitoring costs and reviewing the service. It is likely that a service fee will be required for the service in future years, especially as the repository grows. There are a couple of size limits to be aware of as we start out, which I mentioned before. The system currently can only handle file sizes of 2GB or less per file. And we are limiting the overall size of “projects” in the repository to 10GB. So, for instance, any given research project might contain multiple data files (each under 2GB), but the overall volume of those files shouldn’t exceed 10GB. Anything going beyond those storage limits would require additional conversation with TDL (and some method for recouping costs).
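To make the two storage limits concrete, here is a small Python sketch of a pre-deposit check that a researcher or librarian might run locally. It is an illustration only, not part of the repository software: the function name `check_project` is invented, and treating 1 GB as 1024^3 bytes is an assumption, since the presentation does not say how the limits are measured.

```python
GB = 1024 ** 3                 # assumption: binary gigabytes
MAX_FILE_BYTES = 2 * GB        # limit per file, as stated in the talk
MAX_PROJECT_BYTES = 10 * GB    # limit per project, as stated in the talk


def check_project(file_sizes):
    """Return a list of limit violations for one research project.

    file_sizes maps a file name to its size in bytes. An empty list
    means the project fits within both limits.
    """
    problems = []
    # Check each file against the per-file limit.
    for name, size in sorted(file_sizes.items()):
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds the 2 GB per-file limit")
    # Check the combined size against the per-project limit.
    if sum(file_sizes.values()) > MAX_PROJECT_BYTES:
        problems.append("project: exceeds the 10 GB per-project limit")
    return problems


if __name__ == "__main__":
    print(check_project({"survey.csv": 1 * GB, "images.zip": 3 * GB}))
```

A project that trips either check is the case where, as noted above, a conversation with TDL would be needed before deposit.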
32
Memorandum of Understanding
Purpose: To clearly identify the roles and responsibilities of the member institution and Texas Digital Library (TDL) as they relate to the Texas Data Repository. Definitions Roles & Responsibilities Member Institutions Data Repository Librarians Texas Digital Library Ownership (UT Austin as lead agency for TDL) Signatures (Library Administrator and Data Repository Librarian) Next we’ll talk in more depth about the MOU, which is intended to clearly identify the roles and responsibilities of the member institution and Texas Digital Library (TDL) as they relate to the Texas Data Repository. This slide lists out the sections of the MOU, which include a set of definitions and a section on roles and responsibilities of various parties. It articulates the “ownership” of the repository, which is essentially UT Austin as the lead agency for TDL. And it has an area for signatures of representatives from the member institution and from TDL. The MOU is available in the Working Group report, or you can contact us and we’ll send you a copy.
33
Texas Data Repository Texas Digital Library (technology)
Steering Committee (TDL & Data Repository Librarians) Member Libraries (service & outreach) Researchers (deposit, search, publish) The most important part of the MOU is the roles & responsibilities section, which fleshes out the different groups Santi talked about in this diagram. We’ll use these layers to discuss in more depth the roles & responsibilities articulated in the MOU.
34
TDR Roles & Responsibilities
TDL Technology implementation, hosting, backup Training & community development Coordination of steering committee Technical support (helpdesk) Steering Committee: composed of Data Repository Librarians, provides advice and coordination of repository service Data Repository Librarian Serve as local liaison for repository users (registry of local experts) Maintain institutional collections of data (i.e. an “institutional dataverse”) Serve on repository Steering Committee Optionally, “value added” services to campus researchers assisting with data curation, data management planning, etc. Member Libraries Identification of “Data Repository Librarian” Local Campus Outreach Communication with TDL on institutional requirements (e.g. liaise with campus IT) Establishment of relevant institutional policies TDR Roles & Responsibilities Researchers Deposit, share, and/or publish data Maintain self-deposited research data Adhere to TDR Terms of Use Because of the hybrid nature of the Texas Data Repository service, it is essential that institutions participating in the service understand the roles and responsibilities of all parties. At the top is TDL, which provides stewardship, technological oversight, and upgrades of the data repository software infrastructure and its associated websites. It will assure access to and secure backup of data submitted to the repository. It will also coordinate a membership-wide steering committee of data repository librarians, provide technical support for all TDR users via the TDL Helpdesk, and provide training and professional development opportunities to data repository librarians. At the bottom of the stack are researchers, who deposit, share, and (maybe) publish data. They are responsible for maintaining any data that they’ve self-deposited and for adhering to the TDR Terms of Use. In between, in the red area, you see two different sets of responsibilities related to our member institutions, who provide the true “service layer” for the repository.
We ask that our members appoint someone to serve as “Data Repository Librarian”, promote and support the TDR service locally, establish institutional policies around copyright inquiries, takedown requests, and other rights decisions, and inform TDL when repository actions are required. Finally, we need our member institutions to help us connect with any necessary departments on campus to get the repository service set up. Notably, we will need to integrate the repository with authentication systems on each campus via Shibboleth and will need someone to liaise with central IT offices, or whoever is responsible for managing authentication locally. The Data Repository Librarian role will be responsible for serving as a local contact for repository users -- they will be listed as a “repository expert” on the TDR website -- and for maintaining institutional dataverses. They will also serve on the repository steering committee, which is the governance layer operating between TDL and the member institution -- providing advice and guidance to TDL on all matters related to the TDR.
35
Authentication and User Accounts
Researchers/depositors will log in via local institutional accounts (via Shibboleth) Default permission to deposit, publish, create “dataverses” in: Institutional dataverse “Root” dataverse Not other institutions’ dataverses I mentioned on the previous slide the need to integrate the repository with local authentication systems at each institution, and I want to talk a little more about that. We use Shibboleth for this purpose, as we do for both our DSpace hosting and Vireo hosting services. By doing this, researchers who use the service will be able to log in via their local institution accounts. From a security standpoint, it allows us to control who has accounts and who, by default, has privileges to deposit in the repository.
36
Other types of user accounts
Data Repository Librarians will log in via specially created accounts. Researchers can invite non-affiliated researchers to create accounts for sharing/collaboration. These user accounts do not have deposit permissions by default. Not everybody will log in via Shibboleth. There will be other methods for account creation -- so that researchers can invite non-TDL-affiliated users to share/collaborate in the system. But those users will not have deposit privileges, at least not by default. I won’t go into detail on this process here, but we will cover it in training and in the technical documentation.
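The default deposit rules described on the last two slides can be summarized in a few lines of Python. This is only a sketch of the policy as stated in the talk, not how Dataverse implements permissions: the `User` class, the `ROOT` constant, and `can_deposit_by_default` are hypothetical names, and the model deliberately ignores nested dataverses, Data Repository Librarian accounts, and any per-institution customization.

```python
from dataclasses import dataclass
from typing import Optional

ROOT = "root"  # hypothetical identifier for the top-level TDR dataverse


@dataclass(frozen=True)
class User:
    name: str
    institution: Optional[str]  # None for invited, non-affiliated users
    via_shibboleth: bool        # True when logged in with a campus account


def can_deposit_by_default(user, target_dataverse):
    """Apply the default rules from the slides.

    Shibboleth users may deposit in their own institutional dataverse
    and in the root dataverse, but not in another institution's.
    Invited, non-affiliated accounts have no deposit rights by default.
    """
    if not user.via_shibboleth or user.institution is None:
        return False
    return target_dataverse in (ROOT, user.institution)


if __name__ == "__main__":
    researcher = User("A. Researcher", "uh", True)
    print(can_deposit_by_default(researcher, "uh"))      # own institution
    print(can_deposit_by_default(researcher, "baylor"))  # other institution
```

An institution that routes deposits through the library would replace the default rule with its own, which is the customization TDL asks members to request.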
37
Repository Training TDL Data Symposium (November 15-16)
December webinar User guide ( Speaking of training, we will be offering a full TDR training workshop as part of the TDL Data Symposium on November 15-16. For the most part, the folks attending that event have been identified and invitations have been sent. If you have any questions about whether you should be attending, please let us know as soon as possible. We will be hosting another webinar in December as part of the formal launch of the repository and will provide additional instruction for repository use there. And, of course, we have the user guide available on the TDR site.
38
Part 5: Wrapping Up I’ll spend the last few minutes reviewing a few things and talking about our future plans for TDR, and then we’ll have some time for Q&A. Please be thinking of questions as we finish up the prepared material and feel free to go ahead and enter those q’s in the chat window.
39
More information Texas Data Repository Website Information Sheets
Info Sheet For Researchers: Sheet.pdf Info Sheet For Librarians: Sheet.pdf Info Sheet For Administrators: Information-Sheet.pdf Other information User guide: Metadata dictionary: Dictionary.pdf Policies (external): TDL Dataverse Implementation Working Group Report: As you can probably tell, we’ve put together a good bit of documentation around the TDR service, and this slide lists the places on the TDR website where you can find them.
40
More information, cont. November 2016: TDL Data Symposium
On November 15-16, 2016, the TDL will hold its first ever Data Symposium at Baylor University Libraries. The two-day Symposium is designed to support the development of a core community of TDL member librarians who provide (or intend to provide) research data services on their campuses. Along with opportunities to discuss and learn about research data management generally, the Symposium will offer a half-day training workshop on Dataverse and the Texas Data Repository service. December 2016: Formal launch webinar After working throughout the fall of 2016 to onboard and train participating libraries, the TDL will formally launch the Texas Data Repository in December with a press release and a second webinar for TDL members (date TBD). And as I mentioned we’re planning a couple of events throughout the fall to make sure everyone has the information they need to get started.
41
Looking Ahead Digital preservation
Integrating with other TDL Applications Including data in local discovery systems Accommodating “big” data Re-architecting to scale Looking ahead, over the next year, we have a list of things we want to improve and build upon with the repository. The Dataverse working group, in its report, made a series of recommendations related to digital preservation of data, and we intend to look into those and make some decisions related to long-term preservation policy and process. We also want to look into integration with other TDL applications (like Vireo, for instance) and with local discovery systems -- so that, for instance, your institution can make its deposited data products discoverable through your ILS. Finally, we want to work on improving the scalability of the system, by developing workflows for accommodating large data stores -- most likely by using the dataverse repository as a metadata record pointing to big datasets hosted elsewhere -- and by making sure that the systems are architected in ways that will scale up effectively as the overall volume of data in the repository grows.
42
Get started Contact the TDL to begin implementation for your institution Get TDL in contact with relevant IT personnel to implement Shibboleth. Review the documentation on the Texas Data Repository site at Review and sign the TDR MOU. If your institution is ready to begin the implementation process for your campus, we ask that you contact us through the TDL Helpdesk at When this happens, we will provide you with the TDR MOU and get in contact with the relevant IT personnel on your campus to begin Shibboleth integration.
43
Questions and Discussion