LIS512 lecture 4: the MARC format structure, leader, directory.



MARC 21 MARC 21 is an important example of a record format used by the library community. Integrated Library Systems (ILSs) all – import MARC 21 records into relational database systems – export MARC 21 records from relational database systems. MARC 21 records describe records from library catalogues.

documentation I use the documentation provided by the Library of Congress at

warning: the blank The MARC record format uses blanks as values. Since blanks are not usually seen on a slide, I may denote them here by a special sign ␢. Most of the time ‘ ’ does the job. In the documentation at the LoC, they use the # character.

MARC 21 formats There are different formats of MARC 21 records – MARC 21 format for bibliographic data – MARC 21 format for authority data – MARC 21 format for holdings data – MARC 21 format for classification data We are only studying the first one here and call it the “MARC format” in what follows.

MARC format The MARC format is very complicated. The basic structure is – leader – directory – control fields – data fields The leader and the directory entries are fixed fields. That means they have a fixed length. The control and data fields are called variable fields. That means they have variable length.

fixed fields The leader is 24 characters/bytes long. It cannot be repeated. Each directory entry is 12 characters/bytes long. Both parts can only contain ASCII characters. There may be many directory entries. Therefore the total length of the directory is not fixed.

MARC leader Described in der.html The leader is 24 bytes long. Each byte houses one ASCII character, so we can also say that it is 24 characters long. We count them from 0 to 23.

grouping of leader bytes I have grouped the bytes into “boring” and “interesting” bytes. Boring bytes contain no information. Their values can be calculated by a computer. Interesting bytes contain some information. They have to be filled in by a human.

sub-classification of boring bytes Boring bytes can be classified as fixed boring bytes and variable boring bytes. Fixed boring bytes always contain the same value for any MARC record. Variable boring bytes contain potentially different values for every MARC record. I write “leader/??” when I want to refer to the byte ?? in the leader.

fixed boring byte positions Here are all fixed boring bytes – leader/10 always contains ‘2’ – leader/11 always contains ‘2’ – leader/20 always contains ‘4’ – leader/21 always contains ‘5’ – leader/22 always contains ‘0’ – leader/23 always contains ‘0’ Here is an example with fixed boring bytes underlined: “01178cam a a 4500”
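
The list of fixed boring bytes can be turned into a small check. This is a minimal sketch, not part of the original slides: the function name is made up for illustration, and the sample leader uses an invented base address of 00325 in positions 12–16.

```python
# Hypothetical sample leader; the digits 00325 in positions 12-16 are
# invented for illustration and do not come from the slides.
SAMPLE_LEADER = "01178cam a2200325 a 4500"

# Fixed boring bytes as listed on the slide: position -> expected character.
FIXED_BYTES = {10: "2", 11: "2", 20: "4", 21: "5", 22: "0", 23: "0"}


def fixed_byte_problems(leader: str) -> list[str]:
    """Return a list of problems found; an empty list means the leader passes."""
    if len(leader) != 24:
        return [f"leader has {len(leader)} characters, expected 24"]
    return [
        f"leader/{pos:02d} is {leader[pos]!r}, expected {want!r}"
        for pos, want in FIXED_BYTES.items()
        if leader[pos] != want
    ]


print(fixed_byte_problems(SAMPLE_LEADER))
```

A leader that passes this check is not necessarily valid. The check only confirms that the six fixed positions hold the values the slide lists.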

fixed boring bytes For fixed boring bytes, we need to be able to see where they are. We do not really need to know what they mean since we can’t change them anyway. For the curious, I have explanations in the appendix.

Variable boring bytes These bytes contain length data about the record. Of course, for most records you will find different values. But the values hardly contain any useful information. The boring positions are 00–04 and 12–16. Both are 5 positions long and we study them in turn.

leader positions 00–04 At that position, there appears the length of the entire MARC record. This is the number of bytes, not the number of characters! There are 5 numerical characters. The number is right justified and unused characters contain zeros, as in say “00234”
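
The difference between bytes and characters, and the zero-padded encoding, can be sketched as follows. The function name and the sample word are assumptions made for this illustration. The sketch assumes a record encoded in UTF-8, where a character outside ASCII takes more than one byte.

```python
def encode_length(n_bytes: int) -> str:
    """Format a byte count the way leader/00-04 stores it: 5 digits, zero-padded."""
    if not 0 <= n_bytes <= 99999:
        raise ValueError("a length must fit into five digits")
    return f"{n_bytes:05d}"


word = "G\u00f6del"                      # five characters, one of them non-ASCII
n_characters = len(word)                 # counts characters
n_bytes = len(word.encode("utf-8"))      # counts bytes, which is what MARC wants

print(n_characters, n_bytes)
print(encode_length(234))
```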

leader positions 12–16 This contains the position of the start of the data, after the leader and the directory. It contains the length of the leader plus the length of the directory, plus 1 for the field terminator that ends the directory. It is encoded like positions 00–04. The number should be much smaller than the number encountered at 00–04. If you subtract one, you should find a multiple of 12.
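
The arithmetic behind positions 12–16 can be written out as a short sketch. The function names and the sample leader (with an invented base address of 00325) are assumptions for illustration only.

```python
LEADER_LENGTH = 24   # the leader is always 24 bytes
ENTRY_LENGTH = 12    # each directory entry is 12 bytes


def base_address(n_fields: int) -> str:
    """Leader/12-16 for a record with n_fields variable fields."""
    return f"{LEADER_LENGTH + ENTRY_LENGTH * n_fields + 1:05d}"


def count_directory_entries(leader: str) -> int:
    """Work backwards from leader/12-16 to the number of directory entries."""
    base = int(leader[12:17])
    entries, remainder = divmod(base - 1 - LEADER_LENGTH, ENTRY_LENGTH)
    if entries < 0 or remainder != 0:
        raise ValueError(f"base address {base} does not fit the 24 + 12n + 1 pattern")
    return entries


# Hypothetical leader: 325 - 1 = 324 = 12 * 27, that is 2 * 12 for the
# leader plus 25 directory entries of 12 bytes each.
print(base_address(25))
print(count_directory_entries("01178cam a2200325 a 4500"))
```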

finished with boredom Here is an example with all boring bytes underlined: “01178cam a a 4500” In the interesting bytes, we find codings that give us information about the MARC record. If we create a MARC record, we need to fill them out. Sometimes a system may prepopulate them but we still need to know what they are.

interesting code positions From the example, you will see that the interesting code positions come in a rear part 17–19 and a front part 05–09.

leader position 05 The record status indicates the relationship of the record to a set of records in a file. – ‘a’ –> Increase in encoding level. The record has been changed to a higher encoding level as recorded in position 17. – ‘c’ –> Corrected or revised. An addition/change other than in the encoding level code has been made to the record. – ‘d’ –> Deleted. The record has been deleted. – ‘n’ –> New. The record is newly input. – ‘p’ –> Increase in encoding level from prepublication.

leader position 6, slide 1 This gives the type of material. Values are – ‘a’ –> language material – ‘t’ –> manuscript language material – ‘c’ –> notated music – ‘d’ –> manuscript notated music – ‘e’ –> cartographic material – ‘f’ –> manuscript cartographic material – ‘g’ –> projected medium

leader position 6, slide 2 Further values are – ‘i’ –> nonmusical sound recording – ‘j’ –> musical sound recording – ‘k’ –> a two-dimensional non-projectable graphic – ‘m’ –> a computer file – ‘o’ –> a kit – ‘p’ –> mixed materials – ‘r’ –> a three-dimensional artifact or naturally occurring object

leader position 7 Bibliographic level of the description – ‘a’ –> monographic component part – ‘b’ –> serial component part – ‘c’ –> collection – ‘d’ –> a subunit – ‘i’ –> an integrating resource – ‘m’ –> a monograph – ‘s’ –> a serial

leader position 8 This gives the type of control. There are two valid values – ‘ ’ –> not specified – ‘a’ –> the item is described according to archival descriptive rules, which focus on the contextual relationships between items and on their provenance rather than on bibliographic detail. All forms of material can be controlled archivally.

leader position 9 This field indicates the character set and encoding scheme used in the record. There are two valid values – ‘ ’ –> MARC-8 – ‘a’ –> the UTF-8 encoding of UCS/Unicode

conclusion about the front part Normally, if we are cataloging books, and keep changing the records, we expect something like ‘cam a’. You are allowed to use that for your record provided you don’t go into multi-part resources.
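
The front part can be decoded with plain lookup tables. The following is a minimal sketch: the table and function names are made up, the short descriptions are abbreviated from the slides, the codes are lowercase as in the Library of Congress documentation, and the sample leader contains an invented base address.

```python
# Code tables abbreviated from the slides on leader positions 05-09.
RECORD_STATUS = {
    "a": "increase in encoding level",
    "c": "corrected or revised",
    "d": "deleted",
    "n": "new",
    "p": "increase in encoding level from prepublication",
}
TYPE_OF_MATERIAL = {
    "a": "language material",
    "t": "manuscript language material",
    "c": "notated music",
    "d": "manuscript notated music",
    "e": "cartographic material",
    "f": "manuscript cartographic material",
    "g": "projected medium",
    "i": "nonmusical sound recording",
    "j": "musical sound recording",
    "k": "two-dimensional non-projectable graphic",
    "m": "computer file",
    "o": "kit",
    "p": "mixed materials",
    "r": "three-dimensional artifact or naturally occurring object",
}
BIBLIOGRAPHIC_LEVEL = {
    "a": "monographic component part",
    "b": "serial component part",
    "c": "collection",
    "d": "subunit",
    "i": "integrating resource",
    "m": "monograph",
    "s": "serial",
}
TYPE_OF_CONTROL = {" ": "not specified", "a": "archival"}
CHARACTER_CODING = {" ": "MARC-8", "a": "UCS/Unicode (UTF-8)"}

# Leader position, label, and table for each byte of the front part.
FRONT_PART = [
    (5, "record status", RECORD_STATUS),
    (6, "type of material", TYPE_OF_MATERIAL),
    (7, "bibliographic level", BIBLIOGRAPHIC_LEVEL),
    (8, "type of control", TYPE_OF_CONTROL),
    (9, "character coding", CHARACTER_CODING),
]


def describe_front_part(leader: str) -> dict[str, str]:
    """Translate leader/05-09 into words; codes not in the tables are flagged."""
    return {
        label: table.get(leader[pos], "code not in table")
        for pos, label, table in FRONT_PART
    }


# Hypothetical leader whose front part is 'cam a'.
for label, meaning in describe_front_part("01178cam a2200325 a 4500").items():
    print(f"{label}: {meaning}")
```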

leader position 17, slide 1 This is the encoding level. It indicates the fullness of the bibliographic information in the record. – ‘ ’ –> “full level”. It is a complete MARC record created from information derived from an inspection of the physical item. For serials, at least one issue of the serial is inspected. – ‘1’ –> “full level, material not examined”. Created from information derived from an extant description of the item, without reinspection of the item. – ‘2’ –> “less-than-full level, material not examined”. Created from an extant description of the material without reinspection of the item.

leader position 17, slide 2 More allowed values for position 17 are – ‘3’ –> “abbreviated level”, meaning a brief record that does not meet minimal level cataloging specifications. – ‘4’ –> “core level”, a less-than-full but greater-than-minimal level cataloging record. – ‘5’ –> “partial (preliminary) level”, a preliminary cataloging level record that is not considered final.

leader position 17, slide 3 More allowed values for position 17 are – ‘7’ –> “minimal level”. A record that meets the U.S. National Level Bibliographic Record minimal level cataloging specifications and is considered final. – ‘8’ –> “prepublication level”. Includes records created in cataloging in publication programs. – ‘u’ –> “unknown” – ‘z’ –> “not applicable”. The concept of encoding level does not apply to the record.

leader position 18 This field says what cataloging rules were applied in the descriptive part. Allowed values are – ‘ ’ –> the description does not follow the International Standard Bibliographic Description (ISBD) cataloging and punctuation provisions. – ‘a’ –> the description uses the AACR2 – ‘c’ –> the description follows the ISBD, but the end of subfield punctuation is omitted. – ‘i’ –> the description follows the ISBD including end of subfield punctuation – ‘u’ –> unknown

leader position 19 This has the multipart resource record level. This pertains to the situation where the record describes part of a resource or a resource that has many parts. – ‘ ’ –> not specified or not applicable – ‘a’ –> the record describes a set of resources – ‘b’ –> the record describes a resource which is part of a set. The resource has a title that allows it to be independent of the set. – ‘c’ –> the record describes a resource that is part of a set. The resource does not have a title that makes it understandable separately.

conclusion about the rear part Normally, when we download a MARC record, we expect the rear part to be ‘ a ’, with two blanks surrounding an ‘a’. In the record that we create for class we make it ‘7a ’. Let’s avoid complications with multi-part resources. ;-)
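
Putting the front part, the rear part and the boring bytes together, a whole leader can be assembled mechanically. This is only a sketch under stated assumptions: the function name and its parameters are hypothetical, and the record length and base address used in the example are invented numbers, not values from a real record.

```python
def build_leader(record_length: int, base_address: int,
                 front: str = "cam a", rear: str = "7a ") -> str:
    """Assemble a 24-character leader from its interesting and boring parts."""
    if len(front) != 5 or len(rear) != 3:
        raise ValueError("the front part needs 5 characters, the rear part 3")
    if not (0 <= record_length <= 99999 and 0 <= base_address <= 99999):
        raise ValueError("lengths must fit into five digits")
    return (
        f"{record_length:05d}"   # 00-04: record length, variable boring
        f"{front}"               # 05-09: interesting front part
        "22"                     # 10-11: fixed boring
        f"{base_address:05d}"    # 12-16: base address, variable boring
        f"{rear}"                # 17-19: interesting rear part
        "4500"                   # 20-23: fixed boring
    )


# Rear part as used for the class record ('7a ').
print(repr(build_leader(1178, 325)))
# Rear part as expected in a downloaded record (' a ').
print(repr(build_leader(1178, 325, rear=" a ")))
```

In practice a system fills in the two length values only after the directory and the fields have been written, since both depend on the finished record.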

Thank you for your attention.

MARC directory The MARC directory follows the leader. The directory contains fixed-length information about the variable-length fields. Each directory entry has 12 bytes numbered 0 to 11. For each variable field appearing in the record, there is a directory entry. Like the boring bytes of the leader, the MARC directory contains no information that a human has to fill in. A computer can calculate it.

appendix This is an appendix to the slides. It contains a description of the MARC directory. It contains a partial rationale for the fixed boring bytes.

MARC directory bytes 0–2 Bytes 00–02 give the field name. The field names used in MARC are numbers. So you will find three ASCII numeric characters at that place.

the MARC directory bytes 03–06 They give the field length. This is 4 (four) ASCII numeric characters. The fact that there are four of them is determined by character 20 in the MARC leader. The number is right justified and filled by 0s if required. The field length is the length of the variable field, including the field indicators, subfield codes, data, and the field terminator character. The field name is not counted, since it is stored in the directory.

the MARC directory bytes 07–11 Bytes 07–11 give the starting character position. These are five ASCII numeric characters that specify the starting character position of the variable field relative to the base address. The fact that there are five of them is determined by character 21 in the MARC leader. The number is right justified and filled by 0s if required.
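
The three parts of a directory entry can be pulled apart by slicing. The sketch below is an illustration only: the function name is made up, and the two directory entries are invented rather than taken from a real record. It assumes the directory ends with the MARC field terminator, the control character 1E (hexadecimal), as the MARC record structure specifies.

```python
FIELD_TERMINATOR = "\x1e"   # control character that ends the directory
ENTRY_LENGTH = 12


def parse_directory(directory: str) -> list[tuple[str, int, int]]:
    """Split a directory into (field name, field length, starting position)."""
    directory = directory.removesuffix(FIELD_TERMINATOR)
    if len(directory) % ENTRY_LENGTH != 0:
        raise ValueError("directory length is not a multiple of 12")
    entries = []
    for start in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[start:start + ENTRY_LENGTH]
        name = entry[0:3]            # bytes 00-02: field name
        length = int(entry[3:7])     # bytes 03-06: field length
        position = int(entry[7:12])  # bytes 07-11: starting character position
        entries.append((name, length, position))
    return entries


# Invented directory with two entries: field 001 (13 bytes, starting at 0)
# and field 245 (30 bytes, starting at 13).
SAMPLE_DIRECTORY = "001001300000" + "245003000013" + FIELD_TERMINATOR

for name, length, position in parse_directory(SAMPLE_DIRECTORY):
    print(name, length, position)
```

To locate a field in the record, add its starting position to the base address found in leader/12–16.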

leader position 10 This is the indicator count. It says how many bytes are used by the indicators of a field. The indicator count is always 2.

leader position 11 This indicates the number of character positions used for each subfield code in a variable data field. This is always ‘2’. There are two characters in each subfield code. First there is the delimiter, usually displayed as ‘$’, and then there is a lowercase letter or a digit as the code for the subfield.

leader position 20 This is the length of the length-of-field portion. It says how many characters a directory entry uses to record the length of a field. It is always “4”.

leader position 21 Length of the starting-character-position portion. At this position, we always find the value “5”. This says that in each entry in the MARC directory the starting character position part is 5 digits long.

leader position 22 This gives the length of the implementation-defined part of each directory entry. This is always 0. This means that the record is done according to the specification and there is no additional part that is custom made for use on the information system you are working with. If it were there, it would have to be short!

leader position 23 This is the last position. The meaning of the data at this position is undefined. I suppose it could be defined later when we find a need for it. This position is always occupied by a 0.