ALA Annual June 2008 CONTENTdm in ConTEXT Geri Ingram OCLC Digital Collection Services Manager, Customer Services
Who should attend this morning? To get the most from the next hour and a half, you should have either: Experience building CONTENTdm collections OR Attended CONTENTdm Training (Hands-on: on-site or online; Demonstration only: Basic Use Webinar)
Outline Part One: Review Software architecture Collections and Projects Part Two: Demonstration Importing and searching full text Research papers Yearbooks Postcards Books
CONTENTdm Architecture
  Acquisition Stations or “clients”; JPEG2000 Extension; OCR Extension
  CONTENTdm Server: Unix (Linux, Solaris) or Windows (2000, 2003)
  Administration tools; Statistics; Authorization settings; Exporting to WorldCat
  CONTENTdm site pages; Custom Web interfaces; Web-based ‘Add’
  Archival repository
  OCLC Connexion ‘digital import’
  Search engines, e.g., Google®; WorldCat.org; WorldCat Local
Configuring a collection What’s a Collection? A group of objects (items) that Share the same metadata schema Live on the same CONTENTdm server How many Collections can I have? Up to 200 collections per server How many items can be in a collection? 16 million items per collection
Populating a collection Through the use of a “Project” What’s a CONTENTdm Project? A workspace on your personal computer Into which you import up to 5000 items at a time Where items reside until you upload to the server A group of settings that are applied to the items E.g., image display resolution, file format, branding E.g., automatic metadata input How many Projects can I have at one time? Limited only by your disk space on the workstation
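The 5000-item import limit can be illustrated with a short sketch. This is not a CONTENTdm tool or API; it only shows, under the assumption that your scans are listed as file paths, how a large set might be divided into groups that each fit one import. The function name `split_into_imports` and the file names are hypothetical.

```python
from typing import Iterator, List, Sequence

# Limit quoted on the slide: up to 5000 items imported into a Project at a time.
MAX_ITEMS_PER_IMPORT = 5000


def split_into_imports(
    paths: Sequence[str], limit: int = MAX_ITEMS_PER_IMPORT
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `limit` file paths (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(paths), limit):
        yield list(paths[start:start + limit])


if __name__ == "__main__":
    # Made-up file names, purely for illustration.
    scans = [f"scan_{n:05d}.tif" for n in range(12000)]
    batches = list(split_into_imports(scans))
    print([len(batch) for batch in batches])  # [5000, 5000, 2000]
```

Each group could then be imported into its own Project, or into the same Project in successive loads.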
Relationship of Collection to Projects: A single Collection, many Projects (Project 1, Project 2, Project 3)
What’s a CONTENTdm object or item? CONTENTdm can store/index/search items in various formats Display any file format: Viewed with a Web browser natively or viewed via a plug-in Including: JPEG, JPEG2000, TIFF, PDF, WAV or MP3 audio, AVI or MPEG video, html, MrSID ® Simple items—e.g., images, sound files, research papers (We’ll load papers today as PDF items.) Compound objects—multiple simple items assembled together
CONTENTdm Compound Objects CONTENTdm defined classes Documents We will load a section of a yearbook Postcards We will load a handwritten postcard with a typescript Monographs (Structured documents) We will load a book with chapters Picture Cube (six-sided views)
Dublin Core metadata element set
Review: Basics of CONTENTdm Simple and Qualified Dublin Core element sets offered 100 fields per collection Only DC.Title required to create a record Dublin Core is basis for cross-collection searching Text is stored in a metadata field 128,000 characters per “full text search” field 200 collections/server, i.e., 200 different metadata schemas
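As a rough illustration of these limits, the sketch below checks one record before loading. It is an assumption-laden example, not part of CONTENTdm: the field names "Title" and "Full Text", the helper `check_record`, and the choice to count Python characters (rather than bytes) are all hypothetical. The slide does not say what happens when a transcript exceeds the limit, so the sketch only reports the problem.

```python
from typing import Dict, List

# Limits quoted on the slide.
MAX_FIELDS_PER_COLLECTION = 100
FULL_TEXT_FIELD_LIMIT = 128_000  # characters per "Full text search" field


def check_record(record: Dict[str, str], full_text_field: str = "Full Text") -> List[str]:
    """Return a list of problems found in one metadata record (hypothetical helper)."""
    problems: List[str] = []
    # Only the Title is required to create a record.
    if not record.get("Title", "").strip():
        problems.append("Title is empty, but it is the one required field")
    if len(record) > MAX_FIELDS_PER_COLLECTION:
        problems.append(
            f"{len(record)} fields exceeds the limit of {MAX_FIELDS_PER_COLLECTION}"
        )
    text_length = len(record.get(full_text_field, ""))
    if text_length > FULL_TEXT_FIELD_LIMIT:
        problems.append(
            f"{full_text_field} has {text_length} characters; "
            f"the limit is {FULL_TEXT_FIELD_LIMIT}"
        )
    return problems


if __name__ == "__main__":
    record = {"Title": "Long transcript", "Full Text": "x" * 130_000}
    for problem in check_record(record):
        print(problem)
```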
Providing searchable text
Remember: metadata fields can be made searchable.
In addition, full text extracted from the digital object itself can be stored in a metadata field designated as “Full text search” data type, in any of three ways:
  1. Extracted (by the server) from PDFs, if the text was embedded to begin with
  2. Imported as a .txt transcript, either typed from handwritten originals or OCR’d in advance (external OCR engine)
  3. Generated by OCR “on-the-fly” (integrated ABBYY FineReader®)
Review: Populating collections Acquisition Station Projects (PC client) Add from CONTENTdm Administration (Browser-based) Connexion digital import (WorldCat cataloging client function)
Review: 1. Acquisition Station—PC client Project workspace Project settings Tools to manage Image settings Metadata settings
Review: 2. Add (Web-based function) Platform independent Simple item add function may be used for single import of: Images: .jpg, .jp2, .tif (if bandwidth allows) PDF: single and multi-page Audio Video
Review: 3. Connexion digital import function
Simple items—some examples that carry text Reformatted materials e.g., books, documents, posters, broadsides, memos—scans may all contain text Born digital files e.g., PDFs, single or multi-page Single-page PDFs viewed as items May opt for ‘in-line’ Adobe viewer Multi-page PDFs may be handled as if compound object of type “document” Server side conversion Import as simple item regardless of conversion choice
Excerpted from Creating and managing text collections using CONTENTdm
First things first. Recap: Prepare the Collection
For importing searchable text items, whether singly or in batch, at minimum:
  1. One empty, searchable field is configured as “Full text search” data type to hold text.
  2. Collection is configured to treat PDFs as compound objects.
  3. Collection is configured to provide Full Resolution file management.
  4. Other fields are made searchable, hidden, moved, or added, as needed.
  5. OPTIONAL: the Web templates are adjusted to suppress display of components of compound objects in search results.
Recap: Prepare the items These PDFs have been created with searchable text embedded. Beware: Not all PDFs are created equal!
Demonstration 1a--Simple items One simple item—PDF with ‘hidden’ text Acquisition Station Import file Web-based Add
Demonstration 1b--Multiple simple items (Acquisition Station) A batch of simple items, two ways: Method A: Import a batch of simple digital items stored in folders (where only Template Creator is used to automatically generate metadata) Method B: Import a tab-delimited text file naming and describing the digital items (where the metadata also resides in the imported tab-delimited file)
Recap: Behind the scenes: prepare the items, organize folders Method A: PDFs had been created with text (Adobe, Word conversion) For importing a batch of PDFs in one load, all PDFs were stored in one folder.
Recap: Behind the scenes: prepare the items, organize folders Method B: PDFs had been created with text (Adobe, Word conversion) For importing a batch of PDFs in one load, all PDFs were stored in one folder. For loading with tab-delimited files: Prepare a .txt file of metadata Place it in a directory different from the .pdf files
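Preparing the Method B metadata file might look like the sketch below. It is a minimal illustration, not the format CONTENTdm mandates: the column headings "Title" and "File Name", the helper `write_metadata_file`, and the rule of deriving a title from the file name are assumptions made for the example. What does come from the slide is that the metadata is tab-delimited text and that the .txt file lives in a different directory from the PDFs.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata_file(pdf_dir: Path, out_file: Path) -> int:
    """Write one tab-delimited row per PDF in pdf_dir; return the row count.

    Hypothetical helper: the column names are assumptions, not a required layout.
    """
    pdf_dir = pdf_dir.resolve()
    out_file = out_file.resolve()
    # Per the slide, keep the metadata file out of the PDF folder.
    if out_file.parent == pdf_dir:
        raise ValueError("place the metadata file in a different directory from the PDFs")
    pdfs = sorted(p for p in pdf_dir.iterdir() if p.suffix.lower() == ".pdf")
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["Title", "File Name"])  # assumed column headings
        for pdf in pdfs:
            # Placeholder title derived from the file name; edit before loading.
            writer.writerow([pdf.stem.replace("_", " "), pdf.name])
    return len(pdfs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf_dir = Path(tmp) / "pdfs"
        meta_dir = Path(tmp) / "metadata"
        pdf_dir.mkdir()
        meta_dir.mkdir()
        (pdf_dir / "research_paper.pdf").write_bytes(b"%PDF-1.4")
        count = write_metadata_file(pdf_dir, meta_dir / "batch.txt")
        print(count, "row(s) written")
        print((meta_dir / "batch.txt").read_text(encoding="utf-8"))
```

The `csv` module is used rather than joining strings by hand so that any value containing a tab or quotation mark is quoted consistently.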
Demonstration 2—Single Compound objects Yearbook (OCR’d transcript produced on the fly) Handwritten Postcard (with a previously created typescript file) Book (Separate transcript produced in advance)
Text: Newspapers Newspapers Wissahickon Valley Public Library. (PA) Ambler Gazette Collection. [AccessPA consortium]. Freeport News. (NY) [LILRC consortium] Summit Memory (Ohio) “The Ohio Informer” Lehigh University. (PA) Brown and White newspaper. [AccessPA consortium] [article segmentation]
Text: PDF documents Arizona Memory Project [PDF document accessible via “abstract” – no full text within CONTENTdm – requires secondary search] (Search ‘arizona visitor industry’) Duquesne University, PA [PDF full text all within CONTENTdm – single search of all full text] (Search ‘bermuda’)
Questions & Answers Getting help with Text User Support Center Downloading the appropriate Acquisition Station JPEG2000 Installing, activating the OCR extension Tutorials to study Help files related to text works Write
Questions?
Collections of documents: Text-based letters, newspapers, diaries, yearbooks, PDFs, and more
60-Day Free CONTENTdm Evaluation
Contact: Ron Gardner, OCLC For more information about CONTENTdm…