An Overview of Data-PASS Shared Catalog


An Overview of Data-PASS Shared Catalog Micah Altman, Harvard University

Contents: collaboration components; what the Shared Catalog does; how it works; syndicated storage integration.

Collaboration Components
Partnership agreement: agreement to establish good practice; preservation copies of data collected; transfer protocol providing support in case of archival failure.
Operations: central database of leads for acquisition; development of shared procedures; review of acquisitions.
Documentation of procedure: identification and selection; metadata; security; confidentiality.
Catalog: discovery; layered services.

Finding Data
Search across partners' entire catalogs; find studies collected for Data-PASS; simple and fielded search; browse by subject, date, or source.

What Does It Look Like?

Viewing Study Information
Study information: author, title, abstract, citation, …; permanent citation (optional).
Provenance: data author, producer, distributor; chain of responsibility for metadata.
Files: link to that study at the partner site; list of files (optional).
Extended cataloging information: full catalog record; DDI record; variable-level information and descriptive statistics (optional).

Delivering Data
Through partners' sites: Shared Catalog results always give a link to the data at the partner's site. If no file information is supplied to the catalog, this is the only option.
Through the Shared Catalog: the catalog server may cache a copy of the data for performance; the catalog can bundle requests for multiple files.
Through analysis services: if the partner site runs VDC (or a data access proxy), analysis and extraction are available. Users can download data in multiple formats; extract subsets, in multiple formats, with citations and UNFs; run descriptive statistics and crosstabs; and run advanced analysis using dozens of statistical models.

Current Participation (table)
Columns: Studies Listed; Files Listed; Files Available; Analysis Available.
Rows: HMDC/Murray Archives; ICPSR; NARA; ROPER; ODUM.
Cell values as extracted, in order: All; From Catalog; Yes; From Archive; If hosting DataVerse software locally; Planned; Planned: selected files made available through The DataWeb.

Advanced Analysis
The Dataverse Network and Zelig make it easy to provide access to any statistical model available in the R statistical language. The architecture is specification-driven: describe the model to run the model. Configuration is currently provided for more than 25 models, including regression, limited dependent variables, factor analysis, event counts, and duration models.

Supporting Technologies
Metadata: OAI + DDI + XSL. Citation and validation: Handle + UNF. Workflow, repository, and analysis: Dataverse Network (VDC).

Metadata Harvesting
Each partner catalog is exposed via OAI through one of: a Dataverse Network (VDC); another OAI server, running on-site; or a proxy OAI server, running at HMDC.
Harvested metadata: an ad hoc XSL cross-walk is applied, and the result is made available through OAI.
A DDI-lite schema subset is used for exchange. The Data Documentation Initiative (DDI) is an international effort to establish a specification schema for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences. It covers provenance and structural metadata, including document description (meta-metadata), study description, file description, and variable description. See http://www.icpsr.org/DDI/
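As a rough illustration of the harvesting step, the sketch below builds an OAI-PMH ListRecords request and pulls record identifiers out of a response. It is not the Data-PASS harvester: the base URL, the "ddi" metadata prefix, and the sample identifiers are all assumptions made for the example, and the response is a hand-written sample rather than output from a partner server.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="ddi", set_spec=None):
    """Build a ListRecords request URL.

    The default prefix "ddi" is an assumption; a real server advertises
    its supported prefixes through the ListMetadataFormats verb.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def record_identifiers(oai_xml):
    """Return the identifier of every record header in an OAI-PMH response."""
    root = ET.fromstring(oai_xml)
    return [h.findtext(OAI_NS + "identifier") for h in root.iter(OAI_NS + "header")]


# Hand-written sample response; the identifiers are hypothetical.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>hdl:EXAMPLE/1</identifier><datestamp>2007-01-01</datestamp></header></record>
    <record><header><identifier>hdl:EXAMPLE/2</identifier><datestamp>2007-01-02</datestamp></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(record_identifiers(SAMPLE))
```

A full harvester would also follow resumptionToken elements for large result sets and apply the XSL cross-walk to each harvested record; both are left out here.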

UNF: Universal Numeric Fingerprints
The same UNF results regardless of hardware, operating system, statistical software, database, or spreadsheet software. UNFs combine generalized rounding (desiccation), normalization (canonicalization), and fingerprinting (cryptographic hash, e.g. SHA256). Available for: C++, R, Stata, SAS, S-Plus. See: http://cran.r-project.org/src/contrib/Descriptions/UNF.html
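The three ingredients named above (rounding, canonicalization, hashing) can be sketched in a few lines. This is a toy illustration only and does not produce real UNF values: the canonical string form, the separator, and the function names are assumptions, and the published UNF specification defines its own exact encoding.

```python
import base64
import hashlib


def canonical(value, digits=7):
    """Round to `digits` significant digits and render in a fixed exponential form."""
    return format(float(value), f".{digits - 1}e")


def toy_fingerprint(values, digits=7):
    """Hash the canonical form of each value; NOT a real UNF, just the same idea."""
    h = hashlib.sha256()
    for v in values:
        h.update(canonical(v, digits).encode("utf-8"))
        h.update(b"\n")  # separator so adjacent values cannot run together
    return base64.b64encode(h.digest()).decode("ascii")


if __name__ == "__main__":
    a = toy_fingerprint([1.0, 2.5])
    b = toy_fingerprint([1.00000000001, 2.5])  # differs only below 7 significant digits
    c = toy_fingerprint([1.0, 2.6])
    print(a == b, a == c)
```

Because the values are rounded and rendered identically before hashing, differences below the chosen precision (such as those introduced by a change of storage format) do not change the fingerprint, while a substantive change to the data does.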

Technologies: Dataverse Network (http://thedata.org)
The Dataverse Network includes integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on firmer ground. It facilitates the public preservation and distribution of persistent, authorized, and verifiable research data, with powerful but easy-to-use technology, whether or not the data are in the public domain. The project increases scholarly recognition (including formal scholarly citations to articles and to data sets) and distributed control for authors, journals, and others who make data available; facilitates data access and analysis by the scholarly community; and still enables professional archives to seamlessly provide extensive preservation and other services.
Shared Catalog support: provides an ingest and dissemination framework and a web-based GUI; provides an OAI server; provides a Data Services Broker, which supports identification and format conversion.

Dataverse Network site: http://dvn.iq.harvard.edu/dvn

Architectural Overview (diagram)
Component labels: data mirror; metadata catalog; harvester; online catalog; online analysis; XSL crosswalk; proxy; OAI.
Access paths: search the Shared Catalog; view information on data through the catalog; link to data at the partner site; access data with extraction and analysis, through the catalog; go direct to partner sites.

Metadata Standards
Study level: title, author, abstract, id, usage info, … [Required]
Files: What is it? Description, URI [Required for backups]. Is it valid? UNF (universal numeric fingerprint), MD5, … [Improves backup reliability]
Variables: description, ID, location, … [Enables on-line analysis]
Metadata is described in detail on the Data-PASS site.
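One way to read the three metadata levels is as a ladder: each level a partner supplies unlocks another service. The sketch below models that reading with hypothetical class and field names (StudyRecord, FileEntry, supported_services); it is an illustration of the slide, not the catalog's actual schema, which is DDI.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileEntry:
    description: str = ""
    uri: Optional[str] = None   # required for backups
    unf: Optional[str] = None   # either checksum improves backup reliability
    md5: Optional[str] = None


@dataclass
class StudyRecord:
    title: str
    author: str
    abstract: str
    study_id: str
    usage_info: str
    files: List[FileEntry] = field(default_factory=list)
    variables: List[dict] = field(default_factory=list)


def supported_services(record):
    """List the services a record's metadata is complete enough to support."""
    required = [record.title, record.author, record.abstract,
                record.study_id, record.usage_info]
    if not all(required):
        return []  # study-level metadata is required for everything else
    services = ["catalog listing"]
    if record.files and all(f.uri for f in record.files):
        services.append("backup")
        if all(f.unf or f.md5 for f in record.files):
            services.append("verified backup")
    if record.variables:
        services.append("online analysis")
    return services


if __name__ == "__main__":
    study = StudyRecord("Example Study", "A. Author", "An abstract.",
                        "hdl:EXAMPLE/1", "Public use")
    print(supported_services(study))
```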

Distributed Preservation: Prototype
Study level: include usage metadata; include identifying tags; include the scanned usage agreement as a file.
Files: provide URIs in metadata; allow access by the catalog harvester; for more reliability, include MD5 or UNFs.
Copies: the current version of tagged studies will be mirrored at HMDC; resources may also be cached for speed.
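To show why including an MD5 makes mirroring more reliable, here is a minimal sketch of a mirror audit: each mirrored copy is re-hashed and compared with the checksum published in the metadata. The function names and the `fetch` callable (which stands in for however copies are retrieved) are assumptions for the example, not part of the prototype described above.

```python
import hashlib


def md5_hex(data):
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def verify_copy(data, expected_md5):
    """True if the mirrored bytes match the checksum published in the metadata."""
    return md5_hex(data) == expected_md5.strip().lower()


def audit_mirror(entries, fetch):
    """Return the URIs whose mirrored copy fails verification.

    `entries` is a list of (uri, expected_md5) pairs taken from file metadata;
    `fetch` is a hypothetical callable that returns the mirrored bytes for a URI.
    """
    return [uri for uri, expected in entries if not verify_copy(fetch(uri), expected)]


if __name__ == "__main__":
    store = {"http://example.org/a": b"study data",
             "http://example.org/b": b"corrupted"}
    entries = [("http://example.org/a", md5_hex(b"study data")),
               ("http://example.org/b", md5_hex(b"codebook"))]
    print(audit_mirror(entries, store.__getitem__))
```

Without a checksum in the metadata, the mirror can only confirm that a file exists at the URI, not that its contents are intact.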

Distributed Backup: Potential Research
Research: schemas to express inter-archival preservation commitments; asymmetric mirroring to match the distribution of holdings across partners; preservation of versioned resources.
Syndicated storage technology integration: LOCKSS; SRB/iRODS; Distributed Data Manager.

More Information
Shared Catalog: http://vdc.hmdc.harvard.edu/dataverse/DATAPASS/
Dataverse Network software: http://TheData.Org
Data citations, UNFs: http://www.dlib.org/dlib/march07/altman/03altman.html
Metadata and other partnership documentation: http://www.icpsr.umich.edu/DATAPASS/about.html