An Alternative Approach to Interoperability Testing: The Use of Special Diagnostic Records in the Context of Z39.50 and Online Library Catalogs. William E. Moen




Similar presentations
Canada The Bath Profile and The Journey To Interoperability Carrol D Lunau Bath Profile Maintenance Agency July 7, 2003

Z39.50 Profiles The Bath Profile ZIG Meeting Leuven, Belgium July 2000 William E. Moen School of Library and Information Sciences University.
Barriers to Interoperability Technical and Not So Technical William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge.
Module 5a: Authority Control and Encoding Schemes IMT530: Organization of Information Resources Winter 2007 Michael Crandall.
The OCLC Metadata Switch Project Jean Godby, Thomas Hickey, Diane Vizine-Goetz OCLC Office of Research Digital Library Federation May 14, 2003.
Corey A Harper DC2006 October 4, 2006 Authority Control for the Semantic Web Encoding Library of Congress Subject Headings (LCSH) in SKOS.
Cataloging: Millennium Silver and Beyond Claudia Conrad Product Manager, Cataloging ALA Annual 2004.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
31 Aug 2003 Talking Systems Janice Sim Technical Services Manager University of Wales College, Newport.
Batch-conversion of Non-standard Multiscript Records by XSLT Lucas Mak Metadata and Catalog Librarian Michigan State University Catalog Management Interest.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
OCLC Local Holdings Records (LHRs) for the UCs CAMCIG Training October 20, 2009 Presenter: Sara Shatford Layne.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
The MERIC Prototype A Proof of Concept for the MERIC Vision William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge.
Vended Authority Control --Procedures and issues.
Positioning Z39.50 in the Networked Library Standards for Building Sustainable Services William E. Moen School of Library and Information Sciences Texas.
Z39.50 for Finding It All William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge University of North Texas Denton,
Lucas Mak and Dao Rong Gong Michigan State University Millennium and XML: Repurposing and Customizing Metadata May , 2009.
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Getting Started with CONTENTdm Corey Harper, University of Oregon Terry Reese, Oregon State University OLA - April 8, 2005.
OCLC Online Computer Library Center Kathy Kie December 2007 OCLC Cataloging & Metadata Services an introduction.
OpenURL Link Resolvers 101
Testing and Improving Interoperability The Z39.50 Interoperability Testbed William E. Moen School of Library and Information Sciences Texas Center for.
Cataloging 12.3 to 14.2 Seminar. Cataloging 2 -New check routines -Cataloging authorizations -Other innovations -Fix and expand routines -Floating keyboard.
ZLOT Prototype Assessment John Carlo Bertot Associate Professor School of Information Studies Florida State University.
MARC Content Designation Utilization: Inquiry and Analysis Can Empirical Evidence Help Shape the Future of MARC? Amy Eklund, Research Asst., MCDU Project;
Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials Arwen Hutt, University of Tennessee.
Optimizing Resource Discovery Service Interfaces in Statewide Virtual Libraries: The Library of Texas Challenge William E. Moen, Ph.D. Texas Center for.
MARC Content Designation and Utilization Future of MARC: Challenges and Opportunities of 21 st Century Cataloging William E. Moen School of Library and.
A centre of expertise in digital information management RDN, e-Prints UK and NOF- Digitise: a (very) small sample of UK OAI activity Andy.
Implementation scenarios, encoding structures and display Rob Walls Director Database Services Libraries Australia.
Introduction to Web Services Eric Lease Morgan University Libraries of Notre Dame June 24, 2005.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
Evolving MARC 21 for the future Rebecca Guenther CCS Forum, ALA Annual July 10, 2009.
Discovery Metadata for Special Collections Concepts, Considerations, Choices William E. Moen School of Library and Information Sciences Texas Center for.
Extending Access To Information Resource Discovery Service William E. Moen, Ph.D. Kathleen R. Murray, Ph.D. School of Library and Information Sciences.
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
MARC Content Designation and Utilization Examining MARC Records as Artifacts Reflecting Metadata Utilization Decisions William E. Moen School of Library.
The physical parts of a computer are called hardware.
Radioactive Metadata Records An Interoperability Testing Approach Based on Metadata Utilization William E. Moen School of Library and Information Sciences.
Functional Requirements for Bibliographic Records The Changing Face of Cataloging William E. Moen Texas Center for Digital Knowledge School of Library.
MARC Content Designation and Utilization Learning from Artifacts: Metadata Utilization Analysis William E. Moen School of Library and Information Sciences.
Z39.50 & The Z Texas Profile William E. Moen School of Library and Information Sciences University of North Texas Denton, TX.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Metadata Interaction, Integration, and Interoperability MODS, MARC and Metadata Interoperability, ALA Conference, June 27, 2005, Chicago, IL William E.
Overviews of the Library of Texas & ZLOT Project Dr. William E. Moen Principal Investigator.
A Resource Discovery Service for the Library of Texas Requirements, Architecture, and Interoperability Testing William E. Moen, Ph.D. Principal Investigator.
Interoperability, Z39.50 Profiles & Testing William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge University of.
COMMON COMMUNICATION FORMAT (CCF). Dr.S. Surdarshan Rao Professor Dept. of Library & Information Science Osmania University Hyderabad
Next Generation Z39.50 A Web Services Approach for Search and Retrieve 6 th Annual State GILS Conference, March 31 – April 3, 2004, Raleigh, NC William.
No Longer Under Our Control? The Nature and Role of Standards in the 21 st Century Library William E. Moen School of Library and Information Sciences Texas.
An Inquiry and Analysis of Metadata Utilization A Case Study of MARC 2005 ASIS&T Annual Meeting, November 1, 2005, Charlotte, North Carolina William E.
Award Number IUG 2004 Boston, MA Integrating Digital Libraries and Traditional Libraries Sue Cody Arlene Hanerfeld Dan Pfohl University of North.
Research and Projects: Z, M, and Beyond! William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge University of North.
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Users and Metasearch Applications: New Challenges for Usability Assessment William E. Moen, Ph.D. Texas Center for Digital Knowledge University of North.
Z39.50 and the ZING Initiatives: MAVIS Users Conference, 2003 November 6, 2003 Larry E. Dixson Library of Congress.
Placing All Information Within Our Control? Standards, Information Organization, and the 21 st Century Library William E. Moen Texas Center for Digital.
The ZLOT Project An Overview of Activities and Results William E. Moen, Ph.D. Principal Investigator University of North Texas ZLOT Special Meeting December.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
MARC Content Designation Use I mplications for indexing & interoperability William E. Moen School of Library and Information Sciences Texas Center for.
The ___ is a global network of computer networks Internet.
A centre of expertise in digital information management UKOLN is supported by: Metadata – what, why and how Ann Chapman.
OAI metadata: why and how Jenn Riley Metadata Librarian Indiana University.
A Complex Standard and Its Use Results from an empirical analysis of MARC 2004 Texas Library Association Annual Conference, March 18, 2004, San Antonio,
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
European Network of e-Lexicography
Cataloging Tips and Tricks
Some Options for Non-MARC Descriptive Metadata
Presentation transcript:

An Alternative Approach to Interoperability Testing: The Use of Special Diagnostic Records in the Context of Z39.50 and Online Library Catalogs
William E. Moen and JungWon Yoon
School of Library and Information Sciences, Texas Center for Digital Knowledge, University of North Texas, Denton, TX
ASIS&T Annual Meeting, November 1, 2005, Charlotte, North Carolina

Moen ASIS&T Charlotte, NC-- November 1,
Interoperability projects
Funded by: the U.S. federal Institute of Museum and Library Services (IMLS)
Z39.50 Interoperability Testbed, Phases 1 & 2
- Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
- Establish and operate a testbed for interoperability testing of Z39.50 clients and servers with library catalogs (Phase 1: )
- Explore an alternative approach using Radioactive MARC Records (Phase 2: )

Factors affecting interoperability
- Multiple and disparate systems: information retrieval systems, search functionality, etc.
- Multiple protocols: Z39.50, HTTP, SOAP, SRW/U, etc.
- Multiple data formats, syntaxes, and metadata schemes: MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
- Multiple vocabularies, ontologies, and disciplines: LCSH, MeSH, AAT
- Multiple languages and multiple character sets
- Indexing, word normalization, and word extraction policies

Z-Interop Phase 1
- Test dataset: 400,000 MARC 21 records from OCLC
- Z39.50 reference implementations: Z-client, Z-server, and information retrieval system, configured to the profile specifications
- Test scenarios and searches: searches with known result records from the dataset
- Benchmarks: results of the test searches against the reference implementations
- Finding: interoperability improved dramatically when profile specifications and common indexing policies were used
- Issue: the approach is not suitable for interoperability testing of individual, local library systems

Phase 1 interop testing (diagram)
- Reference Z39.50 client: configured to support the profile specifications
- Vendor Z39.50 server: configured by the vendor for conformance to the profile; indexed by the vendor according to the vendor's specifications
- Test dataset: loaded by the vendor or library
- Test searches are issued against the vendor server, and the retrieval results are compared to the retrieval benchmarks
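The final comparison step can be sketched as a set comparison per test search. This is a minimal illustration, not the Z-Interop tooling: it assumes each benchmark and each retrieval result is reduced to a set of record identifiers, and the names `compare_to_benchmarks`, `Comparison`, and the search IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Outcome of one test search compared against its benchmark."""
    search_id: str
    expected: frozenset   # record IDs returned by the reference implementation
    retrieved: frozenset  # record IDs returned by the server under test

    @property
    def missing(self) -> frozenset:
        return self.expected - self.retrieved

    @property
    def extra(self) -> frozenset:
        return self.retrieved - self.expected

    @property
    def passed(self) -> bool:
        return not self.missing and not self.extra


def compare_to_benchmarks(benchmarks: dict, results: dict) -> list:
    """Pair every benchmarked search with the server's result set.

    A search the server never answered is treated as an empty result set.
    """
    comparisons = []
    for search_id, expected in sorted(benchmarks.items()):
        retrieved = results.get(search_id, set())
        comparisons.append(Comparison(search_id, frozenset(expected), frozenset(retrieved)))
    return comparisons
```

Reporting `missing` and `extra` separately matters: a server that over-retrieves (for example, because it indexes more fields than the profile expects) fails differently from one that drops records.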

Z-Interop Phase 2
- Radioactive MARC Records: specially designed diagnostic records
- A set of test searches and an automated testing script that issues searches, retrieves records, and produces reports on the search and retrieval results
- A database of MARC documentation that enables automatic identification of the types of searches to issue


Radioactive MARC records
- Specially designed diagnostic records
- Legitimate instances of the MARC record structure
- Fields/subfields contain content-rich tokens. A token is a string of characters with a specific structure and semantics that serves as a "word" or other data value in a specific field/subfield.
- Multiple sets of RadMARC records, distinguished by the amount of content designation populated

Structure of RadMARC tokens
1. Left-hand padding: a single alpha character. Value = r
2. Record type: a single alpha character indicating the format of the material being described or the type of record. Value = selected values as defined in MARC Leader/06 (Type of Record) or Leader/07 (Bibliographic Level)
3. Field tag: three numbers. Value = as defined in the MARC 21 specifications
4. Field occurrence: a single integer indicating the occurrence number of the field tag. Value = sequential number starting with 1
5. Subfield code: a single alpha character. Value = as defined in the MARC 21 specifications
6. Offset: a single integer indicating the token's position within the subfield. Value = 1 for the first token in the subfield, 2 for the second, 3 for the third, etc.
7. Right-hand padding: a single alpha character. Value = r
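The seven-part layout above maps directly onto a string builder. The sketch below is illustrative only (the project's own tooling was a Perl module); it assumes that "a single integer" means one digit (1 to 9) and that record-type and subfield codes are lowercase ASCII letters. `make_token` is a hypothetical name.

```python
import re


def make_token(record_type: str, tag: str, occurrence: int, subfield: str, offset: int) -> str:
    """Assemble a RadMARC token: padding, record type, tag, occurrence,
    subfield code, offset, padding."""
    # Assumption: type and subfield codes are single lowercase ASCII letters,
    # and occurrence/offset are single digits starting at 1.
    if not re.fullmatch(r"[a-z]", record_type):
        raise ValueError("record type must be one lowercase letter (Leader/06 or /07 value)")
    if not re.fullmatch(r"\d{3}", tag):
        raise ValueError("field tag must be three digits")
    if not re.fullmatch(r"[a-z]", subfield):
        raise ValueError("subfield code must be one lowercase letter")
    for label, n in (("occurrence", occurrence), ("offset", offset)):
        if not 1 <= n <= 9:
            raise ValueError(f"{label} must be a single digit starting at 1")
    return f"r{record_type}{tag}{occurrence}{subfield}{offset}r"
```

For instance, `make_token("a", "245", 1, "a", 1)` yields `ra2451a1r`, the token used as the example in this deck. Because every component has a fixed width, a token is guaranteed to be a single "word" that no real catalog record is likely to contain.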

Token example: ra2451a1r
- r: left-hand padding
- a: type of record (this is a Language Material type record)
- 245: field tag (Title Statement)
- 1: first occurrence of the field in the record
- a: subfield code
- 1: offset within the subfield, where 1 = first token in the subfield
- r: right-hand padding
[RadMARC example record]
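Reading a token back is what makes the records diagnostic: whichever token a search matched identifies the exact field, occurrence, and subfield the target system indexed. A minimal parser, under the same single-digit and lowercase-letter assumptions as the token layout; `parse_token`, `find_tokens`, and `RadToken` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# r + record type + 3-digit tag + occurrence digit + subfield code + offset digit + r
TOKEN_RE = re.compile(r"r([a-z])(\d{3})([1-9])([a-z])([1-9])r")
TOKEN_IN_TEXT_RE = re.compile(r"\br[a-z]\d{3}[1-9][a-z][1-9]r\b")


class RadToken(NamedTuple):
    record_type: str
    tag: str
    occurrence: int
    subfield: str
    offset: int


def parse_token(token: str) -> RadToken:
    """Decode one token into its components; reject anything else."""
    m = TOKEN_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a RadMARC token: {token!r}")
    record_type, tag, occurrence, subfield, offset = m.groups()
    return RadToken(record_type, tag, int(occurrence), subfield, int(offset))


def find_tokens(text: str) -> List[RadToken]:
    """Pull every RadMARC token out of a displayed or exported record."""
    return [parse_token(t) for t in TOKEN_IN_TEXT_RE.findall(text)]
```

Applied to a brief display such as `Title: ra2451a1r ra2451a2r / rc1001a1r`, `find_tokens` returns three decoded tokens, so a display or export can be checked field by field without consulting the source record.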

Test scripts
- Automate interoperability testing and reporting
- Test searches defined by the Bath Profile and the U.S. National Z39.50 Profile for Library Applications
- RadioMARC Perl module:
  - Automatically generates Z39.50 queries with tokens as search terms
  - Sends searches to target servers known to contain copies of specific records
  - Generates reports based on whether or not the expected record(s) are present in the result set
[Sample output of testing]
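The generate, send, and report loop can be sketched without a live Z39.50 connection by passing in the search function. This is not the RadioMARC module's API. It is a hedged illustration that renders queries in Prefix Query Format (PQF) with Bib-1 attributes modeled on Bath-style keyword searches (verify the exact values against the profile text). `ACCESS_POINT_BY_TAG`, `run_tests`, and `fake_search` are hypothetical, and `fake_search` stands in for a real target.

```python
from typing import Callable, Dict, List, Set

# Bib-1 attributes: Use, Relation=3 (equal), Position=3 (any position in field),
# Structure=2 (word), Truncation=100 (do not truncate), Completeness=1
# (incomplete subfield). Modeled on Bath keyword searches; check the profile.
KEYWORD_ATTRS = {
    "author": [(1, 1003), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)],
    "title": [(1, 4), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)],
    "subject": [(1, 21), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)],
}

# Hypothetical tag -> access point map; the project looked this up in MARCdocs.
ACCESS_POINT_BY_TAG = {"100": "author", "245": "title", "650": "subject"}


def pqf_query(access_point: str, term: str) -> str:
    """Render a single-term query in Prefix Query Format."""
    attrs = " ".join(f"@attr {kind}={value}" for kind, value in KEYWORD_ATTRS[access_point])
    return f"{attrs} {term}"


def run_tests(tokens: List[str], expected_id: str,
              search: Callable[[str], Set[str]]) -> List[Dict]:
    """Search for each token and record whether the expected record came back."""
    report = []
    for token in tokens:
        access_point = ACCESS_POINT_BY_TAG.get(token[2:5])  # characters 3-5 hold the tag
        if access_point is None:
            continue  # no profile-defined search for this tag in the toy map
        query = pqf_query(access_point, token)
        report.append({"token": token, "access_point": access_point,
                       "query": query, "found": expected_id in search(query)})
    return report


def fake_search(query: str) -> Set[str]:
    """Hypothetical target that indexes only field 245 for title searches."""
    term = query.split()[-1]
    if query.startswith("@attr 1=4 ") and term.startswith("ra245"):
        return {"rad-001"}
    return set()


if __name__ == "__main__":
    for row in run_tests(["ra2451a1r", "ra1001a1r", "ra6501a1r"], "rad-001", fake_search):
        print("PASS" if row["found"] else "FAIL", row["access_point"], row["token"])
```

Injecting `search` as a parameter keeps the query-building and reporting logic testable; a real run would replace `fake_search` with a call into a Z39.50 client library.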


MARCdocs database
- Pilot effort aimed at structuring MARC 21 documentation into a relational database
- Stores information about all content designation available in the MARC 21 Format for Bibliographic Data specifications
- Stores additional information about the profile-defined searches needed by the automated test scripts
- Implementation uses MySQL and PHP
[Example display from MARCdocs]
[Special data in RadioMARCdocs]
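To show how a relational layout lets the test scripts ask "which subfields feed a title search?", here is a small schema sketch. The table and column names are hypothetical (the actual MARCdocs schema is not given here), SQLite stands in for MySQL so the example runs without a server, and only a handful of illustrative MARC 21 rows are loaded.

```python
import sqlite3

# Hypothetical schema: fields, their subfields, and profile-defined search mappings.
SCHEMA = """
CREATE TABLE field (
    tag TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    repeatable INTEGER NOT NULL
);
CREATE TABLE subfield (
    tag TEXT NOT NULL REFERENCES field(tag),
    code TEXT NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (tag, code)
);
CREATE TABLE profile_search (
    tag TEXT NOT NULL,
    code TEXT NOT NULL,
    access_point TEXT NOT NULL,  -- e.g. 'author', 'title', 'subject'
    FOREIGN KEY (tag, code) REFERENCES subfield(tag, code)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative MARC 21 rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO field VALUES (?, ?, ?)", [
        ("100", "Main Entry--Personal Name", 0),
        ("245", "Title Statement", 0),
        ("650", "Subject Added Entry--Topical Term", 1),
    ])
    con.executemany("INSERT INTO subfield VALUES (?, ?, ?)", [
        ("100", "a", "Personal name"),
        ("245", "a", "Title"),
        ("245", "b", "Remainder of title"),
        ("650", "a", "Topical term or geographic name entry element"),
    ])
    con.executemany("INSERT INTO profile_search VALUES (?, ?, ?)", [
        ("100", "a", "author"),
        ("245", "a", "title"),
        ("245", "b", "title"),
        ("650", "a", "subject"),
    ])
    return con


def subfields_for(con: sqlite3.Connection, access_point: str) -> list:
    """List the (tag, code) pairs whose tokens should be searched via an access point."""
    rows = con.execute(
        "SELECT s.tag, s.code FROM subfield s JOIN profile_search p "
        "ON s.tag = p.tag AND s.code = p.code "
        "WHERE p.access_point = ? ORDER BY s.tag, s.code",
        (access_point,),
    )
    return rows.fetchall()
```

A test script can then iterate over `subfields_for(con, "title")` to decide which RadMARC tokens to submit as title searches.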

Question space for Z-Interop2
- Profile conformance level: addresses interoperability between the Z-client and the Z-server
- Information retrieval (IR) system level: addresses the capabilities of the IR system underlying the online catalog application (e.g., types of searching)
- Metadata record level: concerned with how the IR system indexes fields in the metadata record
- Data content level: addresses normalization of data, hyphenated words, special characters and diacritics, etc.

So far, so good…
- Verified procedures and test scripts with the Z-Interop reference implementation server
- Completed testing with a local library: loaded RadMARC records successfully and used the test script and procedures to issue searches
- Created two sets of RadMARC records

RadMARC record sets
What content designation should be populated in RadMARC records to support interoperability testing?
- MARC 21 defines approximately 2,000 structures for holding data
- Z-Interop2 approach: develop multiple RadMARC record sets with an increasing amount of content designation populated, informed by MARC content designation analysis
- More on this analysis in the Metadata Quality and Evaluation Panel, Tuesday, 1:30pm

Fields used in Z-Interop dataset
[Table: MARC 21 field groups (00x, 0xx, and the remaining groups) by number of fields Currently Defined, Obsolete, Unlikely, Used, and Total]

Occurrence summary
[Table: number of fields/subfields and % of all occurrences, by frequency band (more than 600,000; 500,000 to 599,999; 400,000 to 499,999; 300,000 to 399,999; 200,000 to 299,999; 100,000 to 199,999 occurrences). Total: 36 fields/subfields, 79.5% of all occurrences]
- Total number of field/subfield instances in the dataset: 13,849,499
- Only 4% of all fields/subfields account for 80% of all occurrences; conversely, the other 96% of fields/subfields account for only 20% of all occurrences
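The "4% of fields/subfields account for 80% of occurrences" figure is a cumulative-share calculation: rank fields/subfields by frequency and take the smallest prefix that reaches 80%. A minimal sketch; `pareto_core` is a hypothetical name, and the counts in the test are invented toy numbers, not the Z-Interop data.

```python
from typing import Dict, List, Tuple


def pareto_core(counts: Dict[str, int], share: float = 0.8) -> Tuple[List[str], float]:
    """Return the smallest set of fields/subfields (most frequent first) whose
    occurrences reach `share` of all occurrences, plus that set's fraction of
    all distinct fields/subfields."""
    if not counts:
        return [], 0.0
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    core, running = [], 0
    for name, n in ranked:
        if running >= share * total:
            break  # threshold already reached
        core.append(name)
        running += n
    return core, len(core) / len(counts)
```

Run over real occurrence counts, the first returned value is the candidate list for a minimal RadMARC record set, and the second is the "percentage of fields/subfields" quoted on the slide.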

Indexing & MARC
Indexing Guidelines to Support Z39.50 Profile Searches (available on the Z-Interop website)
- Identified all MARC 21 fields/subfields that can contain author, title, or subject data:
  - Author-related fields/subfields: 119
  - Author/title-related fields/subfields: 21
  - Title-related fields/subfields: 253
  - Subject-related fields/subfields: 144
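Guidelines like these amount to a map from (tag, subfield) to access points, which an IR system applies when it builds its indexes. The sketch below shows why a RadMARC token is informative at the metadata record level: if a token lands in the title index, the field it encodes was indexed for title. `INDEXING_MAP` is a hypothetical, heavily abbreviated map (the guidelines cover 537 fields/subfields), and whitespace word splitting is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily abbreviated field-to-index map for illustration only.
INDEXING_MAP: Dict[Tuple[str, str], List[str]] = {
    ("100", "a"): ["author"],
    ("245", "a"): ["title"],
    ("245", "b"): ["title"],
    ("650", "a"): ["subject"],
    ("700", "a"): ["author"],
}

Record = List[Tuple[str, str, str]]  # (tag, subfield code, subfield value)


def build_indexes(record: Record) -> Dict[str, Dict[str, Set[Tuple[str, str]]]]:
    """Build word-level inverted indexes per access point, remembering the
    (tag, subfield) each word came from."""
    indexes: Dict[str, Dict[str, Set[Tuple[str, str]]]] = defaultdict(lambda: defaultdict(set))
    for tag, code, value in record:
        for access_point in INDEXING_MAP.get((tag, code), []):
            for word in value.lower().split():  # naive word extraction
                indexes[access_point][word].add((tag, code))
    return indexes
```

A field that is missing from the map (for example, 520 here) never reaches any index, so a search for its token returns nothing; that absence is exactly the kind of result the RadMARC reports surface.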

Occurrences in test dataset
- 537 fields/subfields can contain author, title, or subject data
- 381 of these actually occur in the Z-Interop dataset
- Total occurrences of the 381 = 4,397,…
- 19 of the 381 (5%) account for 80% of all occurrences: 9 of the 19 are subject-related, 5 are author-related, and 5 are title-related
- Preliminary testing using only these 19 indexed fields: 95% to 100% of correct records retrieved
[The 19 fields/subfields]

Initial RadMARC sets
Set 1: 10 records
- Populate the 19 most frequently occurring author, title, and subject fields
- Distinguished by types of materials cataloged
Set 2: 4 records (100, 110, 111, 130 main entry fields)
- Populate the author, title, and subject fields occurring 1,000 or more times (approximately 50 fields/subfields populated)
[Sample Set 2 RadMARC record]
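Populating a record set is mechanical once the list of fields/subfields is chosen: each subfield receives tokens that encode its own position, and repeated tags get increasing occurrence numbers. A sketch under stated assumptions: `build_radmarc_record` is a hypothetical name, the field lists passed to it are illustrative rather than the actual Set 1 or Set 2 lists, and two tokens per subfield is an arbitrary default.

```python
from typing import List, Tuple

Field = Tuple[str, List[Tuple[str, str]]]  # (tag, [(subfield code, value), ...])


def make_token(record_type: str, tag: str, occurrence: int, subfield: str, offset: int) -> str:
    """Token layout: r + type + tag + occurrence + subfield + offset + r."""
    return f"r{record_type}{tag}{occurrence}{subfield}{offset}r"


def build_radmarc_record(record_type: str, fields: List[Tuple[str, str]],
                         tokens_per_subfield: int = 2) -> List[Field]:
    """Fill each (tag, subfield codes) entry with self-describing tokens.

    A tag listed more than once becomes a repeated field with occurrence 1, 2, ...
    """
    seen = {}
    record: List[Field] = []
    for tag, codes in fields:
        seen[tag] = seen.get(tag, 0) + 1
        occurrence = seen[tag]
        subfields = []
        for code in codes:  # codes is a string such as "ab"
            words = [make_token(record_type, tag, occurrence, code, i)
                     for i in range(1, tokens_per_subfield + 1)]
            subfields.append((code, " ".join(words)))
        record.append((tag, subfields))
    return record
```

For a Set 2 style series, one might call `build_radmarc_record("a", [(main_entry, "a"), ("245", "ab"), ("650", "a")])` once for each of 100, 110, 111, and 130, varying only the main entry field.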

Extensibility of RadMARC
- Records can be as simple or as complex as needed
- Custom records can interrogate system behavior for a library that wants a specific assessment of its indexing or other policies
- Assess normalization of characters
- Test transformations from one metadata scheme to another: MARC record → MARCXML transformation → MODS transformation → DC transformation
- Other metadata environments?
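For transformation testing, the tokens work as provenance tags: after a crosswalk runs, every token in the output names its MARC source, and any source whose tokens vanish was dropped. The crosswalk below (`marc_to_dc`) is a deliberately tiny, hypothetical mapping used only to demonstrate the tracing. It is not the Library of Congress MARC-to-Dublin Core crosswalk, and `trace` and `lost_sources` are illustrative names.

```python
import re
from typing import Dict, List, Set, Tuple

TOKEN = re.compile(r"\br([a-z])(\d{3})([1-9])([a-z])([1-9])r\b")
Record = List[Tuple[str, List[Tuple[str, str]]]]  # (tag, [(code, value), ...])


def marc_to_dc(record: Record) -> Dict[str, List[str]]:
    """Hypothetical minimal crosswalk: 100$a -> creator, 245$a -> title, 650$a -> subject."""
    dc: Dict[str, List[str]] = {"creator": [], "title": [], "subject": []}
    for tag, subfields in record:
        for code, value in subfields:
            if tag == "100" and code == "a":
                dc["creator"].append(value)
            elif tag == "245" and code == "a":
                dc["title"].append(value)
            elif tag == "650" and code == "a":
                dc["subject"].append(value)
    return dc


def trace(dc: Dict[str, List[str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each DC element to the MARC (tag, subfield) sources of its tokens."""
    provenance = {}
    for element, values in dc.items():
        sources = set()
        for value in values:
            for m in TOKEN.finditer(value):
                sources.add((m.group(2), m.group(4)))
        provenance[element] = sources
    return provenance


def lost_sources(record: Record, dc: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """MARC (tag, subfield) pairs whose tokens do not survive the crosswalk."""
    before = {(tag, code) for tag, subfields in record for code, _ in subfields}
    after = set().union(*trace(dc).values())
    return before - after
```

With a record carrying tokens in 100$a, 245$a, 245$b, and 650$a, `lost_sources` reports 245$b: this toy crosswalk drops the remainder of title. The same check applies unchanged to each hop in a MARC → MARCXML → MODS → DC chain.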

Concluding thoughts
- Exploring an innovative conceptual and technical approach to interoperability testing
- Conducting a proof of concept for a radioactive metadata record approach to diagnosing interoperability factors in an identified question space
- Extensible in terms of the current focus
- Extensible to other application environments, metadata schemes, and protocols

References
- Z39.50 Interoperability Testbed
- MARC Content Designation Utilization Project
- Indexing Guidelines to Support Z39.50 Profile Searches
- RadioMARC Perl module
- MARCdocs Database (public interface)