Implementing a Data Publishing Service via DSpace Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert May 20, 2009.

Presentation transcript:

Implementing a Data Publishing Service via DSpace Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert May 20, 2009

Outline
• IUScholarWorks
• Massive Data Storage Service
• Example of the data publishing need
• What is the data publishing service
• Conceptual overview of DSpace implementation

IUScholarWorks
• IUScholarWorks: Indiana University’s (IU’s) scholarly communication services
• IUScholarWorks Team: members from IU Libraries and the Digital Library Program
• Current services:
  – A DSpace-based institutional repository (IR): articles, papers, technical reports, etc.
  – An Open Journal Systems-based scholarly journal hosting service

Overview of MDSS
• Massive Data Storage System (MDSS)
• Current system for research data storage
• Installed in 1998
• Based on IBM-developed High Performance Storage System (HPSS) software
• Offers over 2.8 petabytes of disk- and tape-based storage
• Distributed between the Indianapolis and Bloomington campuses

Distributed between IUB and IUPUI
[Diagram: MDSS architecture. Labels shown: IUB Subsystem and IUPUI Subsystem, each with HPSS Movers, Disk Arrays, Tape Library, a campus network (IUB Campus Network, IUPUI Campus Network), and a Research Network serving Bloomington Users and Indianapolis Users; HPSS Core Servers; FC SAN; TCP/IP Wide Area Network.]

Transferring Files in MDSS
• Fastest methods
  – hsi
  – GridFTP
  – pftp_client
  – Kerberized FTP
• Convenient methods
  – SFTP
  – HTTPS
  – Samba
  – hpssfs

Example of Data Publishing Need
• Linked Environments for Atmospheric Discovery (LEAD)
  – Weather forecasting experiments
  – Want to capture the entire workflow from an experiment
  – Each workflow is ~10 GB
  – They are looking for a mechanism to preserve the workflows and make them available to others

IUScholarWorks Data
• A new service of the IUScholarWorks repository
• Allows for the publishing of datasets
• Data will have a persistent URL so it can be linked to publications
• The service will combine our DSpace repository with IU’s Massive Data Storage system (MDSS), a system that researchers already use
• If a file is over a certain size, it will be stored in MDSS
• Allows discovery over the Web
• Preservation: bit level

Collaborative effort
• IU Libraries
• Research Technologies division of University Information Technology Services (UITS), IU’s central IT organization
• Digital Library Program (a collaboration between the Libraries and UITS)
• IU’s Office of the Vice-Provost for Research

Current Activities
• Two-phase implementation
  – Phase one: more manual on the part of the DSpace administrator and user
  – Phase two: more automated system
• Convene focus groups
• Metadata requirements
• DSpace/MDSS integration

Two scenarios
• Researcher already uses MDSS to store their data
• Researcher does not use MDSS to store their data

Classes of Files
1. Small data files: would go directly into DSpace, in the underlying asset store, as bitstreams
2. Large data files
   1. Preexisting datasets in MDSS account directory
   2. User needs to upload new datasets to MDSS
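The three classes above amount to a small routing decision. The sketch below is only an illustration of that decision, not the service's implementation: the size cutoff is a made-up value (the slides say only "over a certain size"), and `DataFile`, `classify`, and the returned labels are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the slides do not state the actual size limit.
SIZE_THRESHOLD_BYTES = 2 * 1024**3


@dataclass
class DataFile:
    name: str
    size_bytes: int
    already_in_mdss: bool = False


def classify(data_file: DataFile, threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Return which of the three file classes a data file falls into."""
    if data_file.size_bytes <= threshold:
        # Class 1: small file, stored as a bitstream in the DSpace asset store.
        return "dspace_bitstream"
    if data_file.already_in_mdss:
        # Class 2.1: large dataset that already sits in the user's MDSS account.
        return "mdss_existing"
    # Class 2.2: large dataset the user still has to upload to MDSS.
    return "mdss_upload"
```

For example, `classify(DataFile("notes.csv", 4096))` falls into the first class, while a 10 GB workflow archive not yet in MDSS falls into the last.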

Conceptual overview of DSpace implementation

IUScholarWorks Data in DSpace
• Recap of the primary goals of the service:
  – Discovery and access of datasets and related publications through the IUScholarWorks Repository service
  – Facilitating the submission process for both the researcher and collection manager

IUScholarWorks Data in DSpace
• Discovery and access of datasets and related publications through the IUScholarWorks Repository service
  – DSpace records that are searchable, indexed, harvested, and available at stable URLs
  – DSpace records that contain DSpace bitstreams for small datasets
  – DSpace records that link to large datasets in IU MDSS

IUScholarWorks Data: Linking to MDSS and delivery via HTTP
[Diagram. Labels shown: Item record with URLs of datasets in MDSS; MDSS web server (HTTP server, hpssfs filesystem); IU MDSS.]
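The linking approach above stores, in the item record, an HTTP URL for each dataset held in MDSS. A minimal sketch of that mapping follows, assuming the MDSS web server exposes one directory tree under one base URL. Both `MDSS_HTTP_BASE` and `MDSS_EXPORT_ROOT` are placeholder values invented for illustration; the slides do not give the real host or paths.

```python
from urllib.parse import quote

# Placeholder values, not the real IU host or MDSS directory layout.
MDSS_HTTP_BASE = "https://mdss.example.edu/iusw-data"
MDSS_EXPORT_ROOT = "/hpss/iusw/data"


def dataset_url(mdss_path: str,
                base: str = MDSS_HTTP_BASE,
                root: str = MDSS_EXPORT_ROOT) -> str:
    """Map a dataset path inside MDSS to the URL stored in the item record."""
    prefix = root.rstrip("/") + "/"
    if not mdss_path.startswith(prefix):
        raise ValueError(f"{mdss_path!r} is outside the exported tree {root!r}")
    relative = mdss_path[len(prefix):]
    if ".." in relative.split("/"):
        raise ValueError("parent-directory segments are not allowed")
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return base.rstrip("/") + "/" + quote(relative)
```

Keeping the mapping in one function means the stored URLs can be regenerated if the web server's host name or export root changes.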

IUScholarWorks Data in DSpace
• Facilitating the submission process for both the researcher and collection manager
  – Because some datasets are external in MDSS, this is inherently an asynchronous process for both
  – We will facilitate the process for submitters via the DSpace Configurable Submission system
  – We will facilitate the data collection manager’s process via steps in the DSpace workflow system

IUScholarWorks Data: Item submission user interface
[Diagram: DSpace Configurable Submission System. Steps shown: Instructions and preparation; Describe item metadata form(s); Review step; File upload step; MDSS and dataset info and form; Finalize/Accept License.]
Annotations on the diagram:
• More instructions… leave service, go to MDSS to move files to drop box
• Perhaps a step with no form or action that outlines the process to come?
• Also establish relationship to other items or published works using metadata here

IUScholarWorks Data: File management in IU MDSS
[Flowchart. Boxes shown:]
• Save item progress in personal workspace
• Preexisting datasets in MDSS account? If YES: move files to IUSW Data drop box
• Upload new datasets to MDSS? If YES: upload new files to IUSW Data drop box
• Resume submission process from workspace

IUScholarWorks Data: Item submission user interface
[Diagram: DSpace Configurable Submission System. Steps shown: Instructions and preparation; Describe item metadata form(s); Review step; File upload step; MDSS and dataset info and form; Finalize/Accept License.]
Annotations on the diagram:
• More instructions… offline interaction with MDSS to move files around would happen here
• Perhaps a step with no form or action that outlines the process to come?
• Also establish relationship to other items or published works using metadata here
• Submitter can still add small files directly to this item if desired
• Submitter lists locations of any files in the drop box
• Item progresses to edit/accept workflow

IUScholarWorks Data: Collection Manager Workflow
[Flowchart. Boxes shown:]
• Enter workflow queue
• Claim workflow task from queue
• Gather file location info
• Files exist in drop box? If NO: contact submitter, resolve issues
• Move datasets from drop box to IUSW account
• Query MDSS technical metadata
• Link item to MDSS datasets
• Edit IUSW Data item metadata
• Verify item accuracy and dataset accessibility
• Accept submission into IUSW Data Service
Annotation on the diagram: Still need this step to make sure everything happened correctly
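The "Files exist in drop box?" decision in the workflow above can be pictured as a comparison between the locations the submitter listed and what is actually in the drop box. This is a hypothetical sketch of that one check: the function name, the action labels, and the idea that the drop box listing arrives as a plain list of paths are all assumptions, not part of the described service.

```python
def review_submission(listed_paths, drop_box_paths):
    """Decide the collection manager's next action for a claimed task.

    listed_paths:   file locations the submitter entered during submission
    drop_box_paths: paths actually found in the drop box (assumed to be
                    obtained separately, e.g. from a directory listing)
    """
    missing = sorted(set(listed_paths) - set(drop_box_paths))
    if missing:
        # Some listed files are absent: go back to the submitter first.
        return {"action": "contact_submitter", "missing": missing}
    # Everything listed is present: datasets can be moved out of the drop box.
    return {"action": "move_to_iusw_account", "missing": []}
```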

IUScholarWorks Data: Item submission user interface (Phase 2, automated workflow)
[Diagram: DSpace Configurable Submission System connected to IU MDSS. Steps shown: Instructions and preparation; Describe item metadata form(s); Review step; File upload step; MDSS and dataset info/form; Finalize/Accept License.]
Non-interactive processing steps:
• Initiate MDSS actions (move datasets, etc.)
• Query MDSS technical metadata (checksum, etc.)
• Update metadata
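As an illustration of the kind of technical metadata the non-interactive steps would record, the sketch below computes a size and checksum for a file readable through a local path (for example, via a mounted filesystem). It does not query MDSS itself. The choice of SHA-256 and the dictionary keys are assumptions; the slides say only "checksum, etc."

```python
import hashlib
import os


def technical_metadata(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return size and checksum for a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte datasets.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "size_bytes": os.path.getsize(path),
        "checksum_algorithm": algorithm,
        "checksum": digest.hexdigest(),
    }
```

Reading in chunks matters here because the datasets in question (roughly 10 GB per workflow in the LEAD example) would not fit comfortably in memory.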

End result…
• End result is a published data item that contains:
  – Descriptive metadata
  – Links to related publications
  – Actual DSpace bitstreams for small datasets
  – URL links to large datasets in IU MDSS
  – Technical metadata about both classes of datasets

Beyond linking via URL…
• Storage abstraction layers to get to IU MDSS
  – DSpace support for Storage Resource Broker (SRB)
  – Akubra, a low-level storage API from Topaz and Fedora Commons
• Direct mounting of MDSS directories on the DSpace server
  – Configure a separate DSpace asset store using a network-mounted filesystem from MDSS

Beyond linking via URL…
• These solutions would all imply the same thing: configuring additional DSpace asset stores and performing item registration
  – We don’t want to use one of those methods for the default asset store and upload very large files through the DSpace web interface

Beyond linking via URL…
• But… item registration of existing files is a batch-oriented, command-line process
  – It assumes ready-to-go packages with descriptive metadata, just like importing items
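To make "ready-to-go packages" concrete, here is a rough sketch that writes one item directory in the style of DSpace's Simple Archive Format: a `dublin_core.xml` file plus a `contents` file. The `-r -s <store> -f <path>` line reflects the registration syntax described in the DSpace item importer documentation; treat it as an assumption and verify it against the installed DSpace version. The function name and its arguments are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def write_registration_package(item_dir, title, asset_store, relative_paths):
    """Write a minimal item package that registers files already in an asset store.

    asset_store:    number of the (additional) asset store holding the files
    relative_paths: file paths relative to that asset store's root
    """
    item = Path(item_dir)
    item.mkdir(parents=True, exist_ok=True)

    # Descriptive metadata: a single Dublin Core title, for illustration only.
    root = ET.Element("dublin_core")
    dcvalue = ET.SubElement(root, "dcvalue", element="title", qualifier="none")
    dcvalue.text = title
    ET.ElementTree(root).write(
        str(item / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # One line per file to register in place (assumed syntax, see lead-in).
    lines = [f"-r -s {asset_store} -f {p}" for p in relative_paths]
    (item / "contents").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return item
```

A package like this still has to be prepared ahead of time and fed to a command-line import run, which is the batch-oriented limitation the slide points out.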

Beyond linking via URL…
• We lose the convenience of the submission interface to facilitate the service
• The ideal solution would be to connect to IU MDSS as an alternative asset store and be able to register files to items through the submission interface, versus just being able to register files as new items

Questions, opinions, or comments?