Data Wrangling at Rice University Denis Galvin Rice University MetaArchive Annual Membership Meeting Houston Texas.


ETDs at Rice: DSpace; collection in a database driven by programming; 42,581 G; brief and full records

ETD Structure: Brief; Full: ?show=full; PDFs: /13401/ PDF?sequence=1

Testing: all testing done on CentOS using VMware; plugintool testing; Run One Daemon; copying other sites' plugins

Manifest Page

Dublin Core: request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
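
The request on this slide is a standard OAI-PMH ListRecords call asking for Dublin Core (oai_dc) records from one set. A minimal sketch of how such a URL could be assembled follows; the host name is a placeholder and the helper name list_records_url is hypothetical, while the oai/request path, verb, metadata prefix, and set name are taken from the slides.

```python
from urllib.parse import urlencode


def list_records_url(oai_endpoint: str, set_spec: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords URL for one set (hypothetical helper)."""
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    )
    return f"{oai_endpoint}?{query}"


# Placeholder host; only the "oai/request" path and the set name come from the slides.
url = list_records_url("https://repository.example.edu/oai/request", "hdl_1911_8299")
print(url)
```

Putting a link like this on the manifest page is what lets the crawler collect the descriptive metadata alongside the PDFs.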

Sub-Manifest Page: links to ETDs within DSpace

Plugin Configuration parameters: Base URL; for the sub-manifest pages: Part (integer)

Crawl Rules

Crawl rules explained: include master manifest page; include sub-manifest page; include items under /bitstream; include OAI-PMH link

Crawl rules explained: include full record; OAI-PMH link on the master manifest pulls in Dublin Core: oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
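
The include rules described on the two slides above can be sketched as an ordered list of patterns checked against each URL relative to the base URL. This is an illustration in Python, not the LOCKSS plugin syntax and not the actual rules from the Rice plugin: the host, the manifest file names, and the handle/ path are assumptions, while /bitstream, ?show=full, and the oai/request link come from the slides.

```python
import re

BASE_URL = "https://repository.example.edu/"  # placeholder host (assumption)

# Each pattern is matched from the start of the path that follows BASE_URL.
INCLUDE_PATTERNS = [
    r"manifest\.html$",                # master manifest page (hypothetical name)
    r"manifest-\d+\.html$",            # sub-manifest page, one per integer "part" (hypothetical name)
    r"bitstream/",                     # items under /bitstream (the PDFs)
    r"handle/.*\?show=full$",          # full record view (handle/ path is an assumption)
    r"oai/request\?verb=ListRecords",  # OAI-PMH link that pulls in Dublin Core
]


def is_included(url: str, base_url: str = BASE_URL) -> bool:
    """Return True if the URL is under base_url and matches an include pattern."""
    if not url.startswith(base_url):
        return False
    rest = url[len(base_url):]
    return any(re.match(pattern, rest) for pattern in INCLUDE_PATTERNS)


# Illustrative URLs only; none of these paths are taken from the real site.
for candidate in (
    BASE_URL + "manifest-3.html",
    BASE_URL + "bitstream/example/thesis.pdf?sequence=1",
    BASE_URL + "browse?type=author",
):
    print(candidate, is_included(candidate))
```

Anything that matches no include pattern, such as the browse page in the last example, is left out of the crawl.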

Collection Sizes: recommended AU size between 1G and 10G; 5 AUs between 7G and 10G; create new AUs as the collection grows
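
One way to keep each AU under the recommended 10G ceiling is to fill AUs in ingest order and start a new one when the next item would push the current AU over the limit. The sketch below assumes that approach; the slides do not say how items were actually assigned to AUs, and the function name plan_aus and the sample sizes are hypothetical.

```python
from typing import Iterable, List

GIB = 1024 ** 3  # bytes in one gibibyte


def plan_aus(item_sizes: Iterable[int], max_bytes: int = 10 * GIB) -> List[List[int]]:
    """Group item sizes (in bytes) into consecutive AUs no larger than max_bytes.

    An item that is itself larger than max_bytes still gets its own AU.
    """
    aus: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in item_sizes:
        # Close the current AU if adding this item would exceed the cap.
        if current and current_total + size > max_bytes:
            aus.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        aus.append(current)
    return aus


# Made-up sizes for illustration only.
sizes = [4 * GIB, 5 * GIB, 3 * GIB, 6 * GIB]
for number, au in enumerate(plan_aus(sizes), start=1):
    print(f"AU {number}: {sum(au) / GIB:.1f}G in {len(au)} items")
```

Because items are never moved between AUs once assigned, new material only ever extends the last AU or opens a new one, which fits the advice to create new AUs as the collection grows.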

Tips: don’t trust testing with the plugin tool; read the documentation; test with Run One Daemon; test on the caches; use expert mode to write the plugin

Questions?