Dataverse at Scholars Portal
Alan Darnell, Director, Scholars Portal
Ontario Council of University Libraries (OCUL)
Scholars Portal is the technology support service of OCUL …
Tackle problems that are too big for any one institution
21 libraries, 450,000 FTE
Numeric Data
Published data, highly curated
Geospatial Data
Published data, highly curated
SP Research Data Repository
Thank you, IQSS!
dataverse.scholarsportal.info
Dataverse 3
Open to any researcher
– 77 published datasets
– 472 studies
– 6,357 files
Slow but growing uptake from libraries
– 12 institutional dataverses
Wide range of file formats
– WARC files, Twitter feeds, spreadsheets, documents, historical census data, survey files, image files, weather data, etc.
Dataverse 4
Stronger institutional focus
DataCite DOIs
Shibboleth (Canadian Access Federation)
Internationalization (coming soon)
September 2016
A wish list
Ontario Library Research Cloud (OLRC)
Uses existing network and data center facilities at Ontario universities to build a PB-scale distributed storage network with OpenStack Object Storage (Swift) and commodity storage hardware
Cost-effective long-term storage for digital assets
5 nodes / 370 TB: Ottawa, Queen's, York, Toronto, Guelph
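Swift decides where an object lives by hashing its path onto a "ring" of partitions and assigning each partition to storage devices. The following is a deliberately simplified sketch of that idea, not Swift's actual ring builder: it assumes a 256-partition ring, three replicas (a common Swift configuration), and a naive round-robin mapping of partitions onto the five campus nodes listed above. Real Swift rings also mix in a cluster-specific hash suffix and balance partitions by device weight and zone.

```python
import hashlib

# Hypothetical 5-node cluster using the campus names from the slide.
NODES = ["Ottawa", "Queen's", "York", "Toronto", "Guelph"]
PART_POWER = 8  # assumption: 2**8 = 256 partitions (real rings usually use more)
REPLICAS = 3    # assumption: three copies of every object


def partition_for(path: str, part_power: int = PART_POWER) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes as a big-endian unsigned int and keep the top bits.
    return int.from_bytes(digest[:4], "big") >> (32 - part_power)


def nodes_for(path: str) -> list[str]:
    """Place REPLICAS copies on distinct nodes (simplified round-robin, not Swift's)."""
    part = partition_for(path)
    return [NODES[(part + i) % len(NODES)] for i in range(REPLICAS)]


if __name__ == "__main__":
    # Swift object paths take the form /account/container/object.
    print(nodes_for("/AUTH_library/dataverse/study-123/file.csv"))
```

Because placement is a pure function of the path, any node can work out where an object's replicas live without consulting a central index.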
Wish 1: Big Data
Support for in-place ingestion of files stored in the cloud
Storage model that supports block and object services
– OpenStack Swift & S3
– ownCloud and Dropbox
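In-place ingestion means the repository records a reference to bytes that already live in cloud storage instead of copying them into its own store. A minimal sketch, assuming a hypothetical `StorageDriver` interface; all class and function names here are illustrative and are not a Dataverse API, and an in-memory dictionary stands in for a Swift or S3 container:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class StorageDriver(ABC):
    """Hypothetical interface the repository codes against, whatever the backend."""

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def size(self, key: str) -> int: ...


class InMemoryObjectStore(StorageDriver):
    """Stand-in for an object store such as Swift or S3: keys map to byte blobs."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def exists(self, key: str) -> bool:
        return key in self._objects

    def size(self, key: str) -> int:
        return len(self._objects[key])


@dataclass
class FileRecord:
    """Repository metadata that points at the bytes rather than holding a copy."""
    storage_id: str
    key: str
    size: int


def ingest_in_place(drivers: dict[str, StorageDriver], storage_id: str, key: str) -> FileRecord:
    """Register an existing cloud object without copying it (hypothetical helper)."""
    driver = drivers[storage_id]
    if not driver.exists(key):
        raise FileNotFoundError(f"{storage_id}:{key}")
    return FileRecord(storage_id=storage_id, key=key, size=driver.size(key))


if __name__ == "__main__":
    store = InMemoryObjectStore()
    store.put("crawls/2016/site.warc.gz", b"x" * 1024)
    print(ingest_in_place({"olrc-swift": store}, "olrc-swift", "crawls/2016/site.warc.gz"))
```

Supporting another backend (block storage, ownCloud, Dropbox) would then mean writing one more driver class, leaving the ingestion logic unchanged.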
Dataverse > Archivematica > Swift
[Figure panels: Storage Service Dashboard; OLRC. Image credit: Julie Allinson, University of York]
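Each hop in a Dataverse > Archivematica > Swift chain is a point where a file could be corrupted or truncated, so preservation workflows typically re-verify checksums when files are received. A minimal sketch, assuming each stage passes along MD5 and SHA-256 digests; the helper names are hypothetical, and which algorithms each system actually records depends on its configuration:

```python
import hashlib


def checksums(data: bytes) -> dict[str, str]:
    """Compute the digests a hand-off might record (assumption: MD5 and SHA-256)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify_handoff(expected: dict[str, str], received: bytes) -> bool:
    """Return True only if every recorded digest matches the received bytes."""
    actual = checksums(received)
    return all(actual[alg] == value for alg, value in expected.items())


if __name__ == "__main__":
    original = b"survey responses, wave 1"
    manifest = checksums(original)                     # recorded by the sending stage
    print(verify_handoff(manifest, original))          # intact copy
    print(verify_handoff(manifest, original + b"!"))   # altered copy
```

For a simple (non-segmented) Swift upload, the object's ETag is the MD5 of its content, so an MD5 recorded upstream can also be compared with what Swift reports.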
Wish 2: Digital Preservation
PREMIS
– Standard vocabulary to record preservation actions such as ingest and transformation
PRONOM
– Enhanced file identification: DROID, Siegfried, FIDO
METS
– Structural representation of complex digital objects
Native XML export
– Concern about JSON as a preservation format
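A PREMIS event records one preservation action (what happened, when, with what outcome, to which object) as schema-defined XML. A minimal sketch using only the standard library, assuming the PREMIS 3 namespace and a local object identifier; the `premis_event` helper is hypothetical, and the `"ingestion"` event type string is illustrative, since production systems usually take it from a controlled vocabulary:

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS 3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_event(event_type: str, object_id: str, outcome: str) -> ET.Element:
    """Build a minimal PREMIS 3 <event> linked to one object (hypothetical helper)."""
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(timezone.utc).isoformat()

    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome

    # Link the event to the object it acted on (assumption: a local identifier).
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_id
    return event


if __name__ == "__main__":
    print(ET.tostring(premis_event("ingestion", "file-0001", "success"), encoding="unicode"))
```

Because the output is plain XML that can be validated against a published schema, it stays readable without the software that produced it, which is the property behind the preference for native XML export.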
Wish 3: Plugin Architecture
Allow domain specialists to extend file support through a plugin architecture
– Encourage and enable community contributions
Methods
– Describe
– Thumbnail
– View
– Download
– Explore
– Transform
[Diagram label: My New File Format]
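The six methods on the slide map naturally onto an interface that each format plugin implements, with the repository looking plugins up by media type. A minimal sketch, assuming a hypothetical `FormatPlugin` base class and an in-process registry; these names, the media-type keying, and the CSV example are illustrative and are not a Dataverse API:

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class FormatPlugin(ABC):
    """Hypothetical contract a domain specialist implements for one file format."""

    media_types: Tuple[str, ...] = ()

    @abstractmethod
    def describe(self, data: bytes) -> dict: ...

    def thumbnail(self, data: bytes) -> Optional[bytes]:
        return None  # optional capability: no preview image by default

    def view(self, data: bytes) -> str:
        return f"{len(data)} bytes"  # fallback rendering

    def download(self, data: bytes) -> bytes:
        return data  # default: serve the original bytes unchanged

    def explore(self, data: bytes) -> dict:
        raise NotImplementedError("exploration not supported for this format")

    def transform(self, data: bytes, target: str) -> bytes:
        raise NotImplementedError(f"no conversion to {target}")


_REGISTRY: Dict[str, FormatPlugin] = {}


def register(plugin: FormatPlugin) -> FormatPlugin:
    """Make a plugin available for every media type it declares."""
    for media_type in plugin.media_types:
        _REGISTRY[media_type] = plugin
    return plugin


def plugin_for(media_type: str) -> Optional[FormatPlugin]:
    return _REGISTRY.get(media_type)


class CsvPlugin(FormatPlugin):
    """Example contribution: only 'describe' is format-specific here."""

    media_types = ("text/csv",)

    def describe(self, data: bytes) -> dict:
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
        header = rows[0] if rows else []
        return {"columns": header, "rows": max(len(rows) - 1, 0)}


register(CsvPlugin())
```

Default implementations let a contributor start with a single method and add richer capabilities (Explore, Transform) later without touching the core repository code.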
Wish 4: Tools for Analysis
Jupyter and Zeppelin are interactive, web-based tools used for analysis of a wide range of data formats
Use of Apache Spark as a processing engine for big data