Published by Gavin Houston. Modified over 8 years ago.
1
Post-ALA Annual, July 11, 2008
Pre-Conference Workshop: The Care and Feeding of Compound Objects
Geri Ingram
OCLC Digital Collection Services Manager, User Services
Fifth Annual Midwest Users Group, April 7, 2010
2
Welcome, CONTENTdm users
To get the most out of this session, you should either:
  Have experience building CONTENTdm collections, OR
  Have attended recent CONTENTdm training
3
Agenda
8:30 – 9:00    Context-setting
9:00 – 9:30    Data file organization, logic, and naming
9:30 – 9:45    Break
9:45 – 11:45   Exercises
11:45 – Noon   Q&A
4
Context-setting
Real-world application: it’s all about the users and what collections you have
  What are the materials you have?
  How do your users want to access them digitally?
You must understand how the tools work with YOUR data
  Which tools are appropriate for your materials?
  What do different wizards expect by way of file naming and organization?
5
What are your users looking for? Yearbooks, newspapers…
6
Photograph collections…
7
Historical postcards…
8
Archival papers…
9
How are they looking for these materials?
Browsing
Searching: across collections, subgroups?
  Known-item searching, and/or
  Total recall by topic, name, etc.
Do you have text-rich materials?
  If so, your users hope for full-text searchability across the repository.
10
Formats: Born-digital
Papers, videos, audio files
In CONTENTdm, these are natively simple items, not compound objects, e.g.: .pdf, .mp3, .avi
11
Formats: Digitized (reformatted)
If still to be digitized
  You have control over the project specification
    File name and organization
    Metadata automatically and manually created
If already digitized
  You choose among the tools for the one that best fits your data organization
12
CONTENTdm Compound Objects
CONTENTdm-defined classes, used when 2 or more simple items are bound together by logic (and XML):
Documents: “flat”, a series of related items
  We will load multiple letters, two ways
Postcards: exactly two digital files; two-sided items
Monographs: “hierarchical”, items related in a hierarchy
  We will load a single book (a section of a book with chapters) two ways
Six-sided views: exactly six digital files (known as “picture cube”)
13
Providing searchable text from image files
Remember: metadata fields can be made searchable or not
In addition, full text extracted from the digital object itself can be stored in a metadata field in any of three ways:
1. Generated by OCR “on-the-fly” (integrated ABBYY FineReader®)
2. Imported as a .txt transcript
   Typed up from handwritten originals, or OCR’d in advance (external OCR engine)
3. Extracted (by server) from PDFs (if text has been created from the image to begin with)
14
First things first: collection configuration choices
If your materials have searchable text, you will need
  One empty, searchable field configured as the “Full text search” data type to hold the text
If users expect to see only “top”-level records in the search result set:
  Set the CISOSUPPRESS parameter to suppress display of components of compound objects in search results.
15
Additionally: Processing a .pdf to optimize indexing, search and display
Collection configuration option: “Convert PDF to compound object”
  What it does and does NOT do
  How/when you might override it
Effect on the end users’ view
  A multi-page PDF will call the compound object viewer if it has been processed as if it were a compound object
  A one-page PDF will ignore the setting and call the item viewer to display
  NB: The collection should be set for the ‘in-line’ Adobe viewer
Remember: PDFs are simple files that can be converted to compound objects, and they are still counted AND added as simple items
16
What PDF conversion does and does NOT do
It DOES allow very large PDF files to be indexed, searched and retrieved quickly: EACH page can have 128,000 characters.
It DOES allow end users to search for text across huge volumes of materials without having to re-execute the search inside each document using the Adobe Reader “binoculars” search.
It DOES allow the end user to choose three more views from the ‘compound-object’ viewer, including subset and complete Adobe Reader view.
It does NOT allow you to “nest” compound objects. I.e., you can assemble multiple PDFs as a compound object, but you cannot then take advantage of the page-level indexing, display, etc., within each “page” of the compound object.
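The per-page figure above lends itself to a quick pre-flight check on extracted page text. The following is only a minimal sketch: the function name is hypothetical, it is not part of CONTENTdm, and it assumes the 128,000 figure counts characters rather than bytes.

```python
# Per-page figure quoted on the slide; treating it as a character count
# (not bytes) is an assumption of this sketch.
PAGE_CHAR_LIMIT = 128_000


def pages_over_limit(page_texts, limit=PAGE_CHAR_LIMIT):
    """Return the 1-based numbers of pages whose text exceeds the limit."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text) > limit
    ]


if __name__ == "__main__":
    # Page 2 is deliberately too long; pages 1 and 3 are fine.
    sample_pages = ["short page", "x" * 130_000, ""]
    print(pages_over_limit(sample_pages))
```

Running a check like this before loading can flag pages whose text may not be fully indexed, under the assumption stated above.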
17
9:00 File organization and naming
Preparing to use the Project Client wizards
  File organization for adding materials commonly assembled as compound objects, e.g., yearbooks, papers, postcards, books
  Adding single and multiple compound objects
Add compound objects:
  Having components in organized file folders only, or also
  Having text files that describe the objects and the items
18
File and folder facts: regardless of the wizard used in the Add compound object function, SCANS are held together in one folder
For all compound object classes
  Document, Monograph, Postcard, Six-sided view
For each compound object
  All digital files must reside in one directory/folder
  This is true whether you are adding multiple compound objects or a single compound object.
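The one-folder-per-object rule can be checked before opening a wizard. The sketch below assumes a layout that the slide does not spell out: a parent folder holding one subfolder per compound object. The function name is hypothetical, and the wizards themselves do not use this code.

```python
from pathlib import Path
import tempfile


def group_scans_by_object(parent):
    """Group files by the object folder (direct subfolder of `parent`) holding them.

    Returns (objects, strays): `objects` maps each subfolder name to the sorted
    relative paths of the files under it; `strays` lists files sitting directly
    in `parent`, which belong to no object folder.
    """
    parent = Path(parent)
    objects, strays = {}, []
    for entry in sorted(parent.iterdir()):
        if entry.is_dir():
            objects[entry.name] = sorted(
                p.relative_to(entry).as_posix()
                for p in entry.rglob("*")
                if p.is_file()
            )
        else:
            strays.append(entry.name)
    return objects, strays


if __name__ == "__main__":
    # Build a throwaway layout: two letters, plus one misplaced scan.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for folder, page in [("letter1", "p1.tif"), ("letter1", "p2.tif"), ("letter2", "p1.tif")]:
            (root / folder).mkdir(exist_ok=True)
            (root / folder / page).write_bytes(b"")
        (root / "stray.tif").write_bytes(b"")
        print(group_scans_by_object(root))
```

Any name reported in `strays` is a scan that is not held in an object's folder and would need to be moved before loading.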
19
Example: a single Document using Add compound object
Directory Structure wizard
Object List wizard
20
Example: a single Monograph using the Compound Object wizard
Where structured by folder organization
Where structured by a tab-delimited text file
21
ALSO: when you add multiple compound objects using tab-delimited (tab-d) files:
Their nature and placement change
Got page-level metadata? Each object needs its own file.
Got only object-level metadata? All objects share one file.
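The rule on this slide can be written as a small planning helper. This is a sketch only: the function name and the "shared" label are hypothetical, and it deliberately says nothing about where the files are placed, since the slide text does not state that.

```python
def plan_metadata_files(object_names, has_page_level_metadata):
    """Map each tab-delimited file (by label) to the objects it describes.

    Page-level metadata: one file per object, labelled with the object's name.
    Object-level metadata only: a single file, labelled "shared", for all objects.
    """
    if has_page_level_metadata:
        return {name: [name] for name in object_names}
    return {"shared": list(object_names)}


if __name__ == "__main__":
    letters = ["letter1", "letter2", "letter3"]
    print(plan_metadata_files(letters, has_page_level_metadata=True))
    print(plan_metadata_files(letters, has_page_level_metadata=False))
```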
22
Compound objects using tab-d files, depending upon the object class:
The structure of the .txt file changes
Document: all “columns” are field attributes
Monograph: two new “columns” define structure
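The difference between the two layouts can be illustrated by generating both from the same rows. Everything named here is an assumption: the field columns and the two structure columns are placeholders, not CONTENTdm's actual headings, and putting the structure columns first is an arbitrary choice for the sketch.

```python
import csv
import io

# Hypothetical field columns; a real collection defines its own fields.
FIELD_COLUMNS = ["Title", "Date", "File name"]
# Placeholder names for the two structure-defining columns of a Monograph.
STRUCTURE_COLUMNS = ["Structure 1", "Structure 2"]


def build_tab_file(rows, object_class):
    """Return tab-delimited text laid out for the given object class."""
    if object_class == "Document":
        columns = FIELD_COLUMNS
    elif object_class == "Monograph":
        columns = STRUCTURE_COLUMNS + FIELD_COLUMNS
    else:
        raise ValueError(f"unsupported object class: {object_class}")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n", restval=""
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    pages = [{"Title": "Page 1", "Date": "1901", "File name": "p1.tif"}]
    print(build_tab_file(pages, "Document"))
    print(build_tab_file(pages, "Monograph"))
```

For a Document every column is a metadata field; for a Monograph the same rows gain two extra columns, left empty here, that would carry the hierarchy.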
23
Questions & Answers
Getting help with compound objects
  User Support Center
    Tutorials to study
    Installing, activating the OCR extension
    Help files related to text works
  Write to contentdmsupport@oclc.org
24
Questions? Geri Ingram ingramg@oclc.org