Development & Testing of BIBFRAME at the Library of Congress

Presented by Randall K. Barry
Presentation to the Council on East Asian Libraries, Seattle, WA, March 31, 2016

Good morning. My name is Randall Barry. I’ve worked at the Library of Congress for 39 years, 23 of them in cataloging. As was just mentioned, I’ve studied and worked with a variety of languages, mostly European, but also Arabic and Mongolian. Although I am not an expert in East Asian languages, I have worked with standards related to their writing systems, which are among the most complex that exist. I spent years helping to develop non-roman character sets and transliteration tables for languages that don’t use the roman alphabet. I was also a MARC standards specialist for 16 years, so you can see why BIBFRAME, sometimes called the “replacement for MARC”, interests me. I hope my presentation helps explain two things: 1) What is BIBFRAME? and, perhaps more importantly, 2) Why are libraries so interested in developing and implementing it?

Presentation Outline
Background
General Overview of BIBFRAME
Relationship to the MARC formats
Testing and Pilot Project at the Library of Congress
Challenges for Developing BIBFRAME
Project Steps
Scenario for Replacement of MARC
Handling of Nonroman Data (CJK, and others)
BIBFRAME Pilot Phase II (2017)
Conclusion

The outline above gives you an idea of what I hope to cover in my talk. After describing what BIBFRAME is in general terms, I will talk about its relationship to the Machine-Readable Cataloging (MARC) formats. I think some background information will help you understand BIBFRAME. My talk is based on one given recently by Sally McCallum, the person at the Library of Congress who has been a leader in the development and maintenance of the MARC formats for 40 years. Now Sally is one of the leaders of BIBFRAME development. I’ll talk about the challenges of developing BIBFRAME, discuss the steps being taken in this development process, and describe how BIBFRAME could replace MARC. Finally, I will describe how information in “our” languages, those that do not use the roman alphabet, will be handled in BIBFRAME.

Background
1989: The World Wide Web is developed by Tim Berners-Lee to connect separate document systems for sharing of information
2006: Four Linked Data principles are articulated by Tim Berners-Lee
1) Name things using Uniform Resource Identifiers (URIs)
2) Encode things to be compatible with the Hypertext Transfer Protocol (HTTP)
3) Provide information about what a thing is using the Resource Description Framework (RDF) and SPARQL (a query language for RDF data)
4) Include useful URI links in things to other things (a Semantic Web so that data can be reused across platforms)

Before saying anything about BIBFRAME, it is important to explain where the World Wide Web came from. In 1989, Tim Berners-Lee, a researcher working at the European Center for Nuclear Research (CERN), developed a way for documents to be shared between separate text systems. The people he worked with needed to be able to share their research. That required a common way of “marking up” the text, and a “protocol” for searching and retrieving it from another system. The Hyper-Text Markup Language (HTML) and Hyper-Text Transfer Protocol (HTTP) were standards that grew out of his work. The letters “HTTP” should be familiar to you, since you use them to identify the protocol when you do web searches. The information you get back in your browser (Internet Explorer on your PC or Safari on your iPhone) is sent in HTML markup. Browsers can read HTML, which is how they can display information for you. In 2006, Berners-Lee went even further and proposed four principles for what he called “linked data”. They are: naming things (with URIs), encoding them (with HTTP), describing them (using RDF), and finally linking related things to each other (using hypertext links).
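To make the four principles a little more concrete, here is a minimal sketch in Python of what "describing things with RDF" amounts to: every statement is a three-part "triple" of subject, property, and value, and a value can itself be a URI that links to another thing. Everything under example.org, along with the names Triple, to_ntriples, and links_from, is made up for illustration; none of it is an actual LC identifier, vocabulary, or tool.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # URI naming the thing being described
    predicate: str  # URI naming the property
    obj: str        # URI of another thing, or a plain text value


def is_uri(value: str) -> bool:
    """Treat anything that starts with an HTTP scheme as a link."""
    return value.startswith("http://") or value.startswith("https://")


def to_ntriples(triple: Triple) -> str:
    """Write one triple as a single N-Triples-style line."""
    if is_uri(triple.obj):
        obj = f"<{triple.obj}>"
    else:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


def links_from(subject: str, triples: List[Triple]) -> List[str]:
    """Return the URIs that a given thing links out to."""
    return [t.obj for t in triples if t.subject == subject and is_uri(t.obj)]


# Hypothetical identifiers; a real dataset would use published URIs.
BOOK = "http://example.org/works/1"
AUTHOR = "http://example.org/people/42"

triples = [
    Triple(BOOK, "http://example.org/vocab/title", "An Example Title"),
    Triple(BOOK, "http://example.org/vocab/creator", AUTHOR),
    Triple(AUTHOR, "http://example.org/vocab/name", "Example, Author"),
]

if __name__ == "__main__":
    for t in triples:
        print(to_ntriples(t))
```

The second triple is the interesting one: the book does not carry the author's name as text. It carries a link to the author, and the name is recorded once, at the author's own URI.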

Background - continued
2007: On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control
Recommended use of technology to promote broader use of library data, particularly vocabularies (name authorities, subject headings, code lists)
Recommended replacing the MARC formats with a web-compatible interchange framework, specifically something compatible with RDF
2009: LC’s Linked Data Service is created (id.loc.gov)
Library of Congress Subject Headings (LCSH) is made available as the first “curated vocabulary”
Name authorities are made available as the next vocabulary (9 million records)
Other vocabularies: language codes; country codes; geographic area codes; content, media, carrier types; genre/form terms, etc.

As librarians, we have been familiar with using the web to find things for many years. The technology behind the web is somewhat hidden from view, but it works for us, at least for searching the web for information. In 2007 a study was done at LC to consider the future of bibliographic control in libraries. Use of the web in libraries helped drive this study. One of the things reported was that, despite the popularity of using the web to find information, most of the information in libraries’ online catalogs was NOT easily found on the web. To search a library catalog, you often have to find the catalog on the library’s web site, and then dig your way into that catalog. The 2007 report also suggested the need to replace the MARC formats for library data with something more compatible with the web. In 2009, the Library of Congress took a first step toward making its library data available directly to web searches, without the need to first find a link to its online catalog, by making some of its controlled “vocabularies” available in the same way other information is on the web. LC created ID.LOC.GOV. LC started with the Library of Congress Subject Headings, a rich thesaurus of subject terms. LC’s database of subject headings was transformed to give each heading a URI and to give the information an RDF-compatible structure, so browsers could search, find, and display headings.

General Overview of BIBFRAME
2012: Start of the Bibliographic Framework Initiative (BIBFRAME)
Replacement for the MARC formats. Why?
Age of the standard and its inherent structure (from 1969)
Adoption of Resource Description & Access (RDA) for describing and providing access to bibliographic materials
Modelling being done by the museum and archival community
Shift in materials themselves to electronic (digital) formats
Provide for the creation of and access to bibliographic data as “linked data” (following the W3C model)
Provide better exposure to library data on the internet
End the isolation of library data in MARC-based systems

In 2012, work on LC’s vocabularies led to the Bibliographic Framework Initiative (BIBFRAME) to support linked bibliographic data. So why would we want to replace the MARC formats and records? Well, the MARC formats are based on a record structure from 1969. Bibliographic records in MARC format are complex packages of information that include descriptive elements, access points, and useful notes of all kinds. Adoption of the Resource Description & Access (RDA) cataloging rules, based on the Functional Requirements for Bibliographic Records (FRBR), introduced the concept of a high-level “work”, from which all other information leads. It was agreed that this was a better model. Similar modeling was going on in the museum and archival communities. Library materials themselves were also going through a transformation with the appearance of digital publication formats. Bibliographic information in MARC is hidden from the web. BIBFRAME provides for the creation of and access to bibliographic information as linked data, following the models currently supported by the W3C. This would provide better exposure to library data on the internet, and end the isolation of library cataloging data in MARC-based systems.

General Overview of BIBFRAME - continued
Establishes four main classes of data: creative works, instances, authorities, and annotations
Instance combines elements of the FRBR and RDA expression and manifestation entities
Annotations is where holdings and item-level information would be found

To give you a little better understanding of the BIBFRAME model for bibliographic information, I will describe the classes of information defined in BIBFRAME. There are four:
There is the work: someone’s creation
There is an instance: an edition, version, or printing of that work
There are authoritative access points to that work and instance (who created it, who contributed to it)
Finally, there are annotations (notes, information about who has copies of the work or instance) that are useful to library patrons

For any of you who are familiar with FRBR and RDA, the BIBFRAME concept of “instance” combines the FRBR and RDA concepts of expression and manifestation. (I won’t go into more detail here.) Something important to notice in this illustration is the linkage between these things. Remember, a key concept for BIBFRAME is “linked data”. Creating packages of MARC cataloging data as isolated records is not compatible with the BIBFRAME model. Instead, catalogers of the future will create linkages between some descriptive elements and information in existing vocabularies. Something as basic as the title of a work would be something you link to in a vocabulary somewhere, rather than something you type into a new record.
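The four classes and the links between them can be sketched as a small data structure. This is only an illustration, assuming made-up example.org URIs and simplified field names chosen for this sketch (creator_uris, instance_of, about); the actual BIBFRAME vocabulary defines its own properties, and it was still being revised at the time of this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Authority:
    uri: str    # identifier for a person, organization, subject, etc.
    label: str  # the authorized form of the name or heading


@dataclass
class Work:
    uri: str
    title: str
    creator_uris: List[str] = field(default_factory=list)  # links to Authorities


@dataclass
class Instance:
    uri: str
    instance_of: str  # link to the Work this is an edition or printing of
    edition: str


@dataclass
class Annotation:
    uri: str
    about: str  # link to the Work or Instance being annotated
    body: str   # e.g. a holdings statement or a note


def instances_of(work: Work, instances: List[Instance]) -> List[Instance]:
    """Follow the links backwards: which instances point at this work?"""
    return [i for i in instances if i.instance_of == work.uri]


# Hypothetical data for illustration only.
author = Authority("http://example.org/authorities/1", "Example, Author")
work = Work("http://example.org/works/1", "An Example Title", [author.uri])
first = Instance("http://example.org/instances/1", work.uri, "1st ed.")
second = Instance("http://example.org/instances/2", work.uri, "2nd ed.")
holding = Annotation("http://example.org/annotations/1", second.uri,
                     "Held by Example Library")

if __name__ == "__main__":
    for inst in instances_of(work, [first, second]):
        print(inst.uri, inst.edition)
```

Notice that nothing is copied from one object into another. The two instances share one work, and the work shares one authority, purely through URIs.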

Relationship to the MARC Formats
BIBFRAME, based on RDF, will replace the MARC record as the vehicle for sharing (providing access to) information about library material and specific holdings
The rich set of MARC data elements will be used to help shape the vocabularies needed for linked library data
MARC-based library systems will evolve to support the creation and update of, and access to, BIBFRAME (linked) data
Existing MARC-based information (cataloging data) will be transformed to be compatible with BIBFRAME and BIBFRAME-based systems
Libraries will move beyond the limitations of MARC
The rich access provided by MARC will be enhanced

So, what is the relationship between BIBFRAME and MARC? First of all, MARC was designed as a container for bibliographic information, a way to share that information with other libraries. BIBFRAME will replace MARC as the vehicle for providing access to that information. The MARC record container won’t be needed. The vocabularies will be shared by libraries linking to them. There will be some information that is specific to instances and “copies” of items, but that information will be structured according to RDF so as to make it accessible on the web. What is in MARC records will not be lost, however. MARC is a rich set of more than 2,600 data elements. The library information that is now packaged in MARC records can be transformed for BIBFRAME, hopefully with no loss of information. Since the limitations of MARC and current library catalogs will be removed, there will be better access to this information once it’s transformed. Sharing of data will be done via links, as opposed to electronically transmitting MARC records and files.

Testing & Pilot Project at the Library of Congress
The first BIBFRAME pilot was begun to test what works
Transformation of existing MARC data: merging and splitting of data originally stored in 18 million individual bibliographic and 8 million authority records
Development of an editor (cataloging interface) for librarians creating BIBFRAME-compatible data for new items
Enhancement of id.loc.gov to support the pilot project
Training of staff to understand linked data and the BIBFRAME editor
Keep in touch with other linked data pilot projects

At this point I want to tell you what the Library of Congress has been doing to develop and test BIBFRAME. In September of 2015 the first BIBFRAME pilot project was begun. Selected staff (volunteers in various areas) were trained for BIBFRAME. Part (not all) of LC’s existing database of MARC bibliographic and authority records was transformed into linked data for the Phase I test. This involved merging and splitting of data and creating links (by machine) for data drawn from 18 million bibliographic records and 8 million authority records. For this phase of testing, it was not possible to transform all 18 million bibliographic records. (I won’t go into detail as to why.) Mappings of data in MARC records to links and other entities in BIBFRAME are still being worked on. A basic BIBFRAME “editor” (input system) was developed for the test. Existing web-accessible vocabularies in ID.LOC.GOV were enhanced. LC developers of the BIBFRAME vocabulary and editor have consulted with developers at other institutions as work has progressed. LC has been using this first phase of testing to learn more about things like transformation of MARC data, features of a BIBFRAME editor, and techniques and policies for linked data creation for newly cataloged items.
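The "merging and splitting" mentioned above can be pictured with a toy example. The sketch below is not LC's transformation, which is far more involved. It assumes each MARC record has been reduced to a flat Python dictionary keyed by a few familiar tags (100 for the main personal name, 245 for the title, 250 for the edition statement), ignores subfields and indicators entirely, and mints made-up example.org URIs. Each record is split into a work and an instance, while records that share a name or a name/title pair are merged onto a single authority or work.

```python
from typing import Dict, List, Tuple


def transform(records: List[Dict[str, str]]):
    """Split flat MARC-like records into linked authorities, works, instances."""
    authorities: Dict[str, str] = {}                  # name -> authority URI
    works: Dict[Tuple[str, str], Dict[str, str]] = {}  # (name, title) -> work
    instances: List[Dict[str, str]] = []

    for rec in records:
        name, title = rec["100"], rec["245"]

        # Merge: one authority node per distinct name, however many records use it.
        if name not in authorities:
            authorities[name] = f"http://example.org/authorities/{len(authorities) + 1}"

        # Merge: one work per distinct name/title pair.
        key = (name, title)
        if key not in works:
            works[key] = {
                "uri": f"http://example.org/works/{len(works) + 1}",
                "title": title,
                "creator": authorities[name],  # a link, not a copy of the name
            }

        # Split: edition-level detail goes on its own instance, linked to the work.
        instances.append({
            "uri": f"http://example.org/instances/{len(instances) + 1}",
            "instanceOf": works[key]["uri"],
            "edition": rec.get("250", ""),
        })

    return authorities, works, instances


# Hypothetical records: two editions of one work, plus a second work by the same author.
sample = [
    {"100": "Example, Author", "245": "First Title", "250": "1st ed."},
    {"100": "Example, Author", "245": "First Title", "250": "2nd ed."},
    {"100": "Example, Author", "245": "Second Title"},
]

if __name__ == "__main__":
    auths, wrks, insts = transform(sample)
    print(len(auths), "authority,", len(wrks), "works,", len(insts), "instances")
```

Three records go in; one authority, two works, and three instances come out. Even in this toy form you can see why mappings are hard: real records do not agree so neatly on what counts as "the same" name or title.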

Status of Testing & Pilot Project at LC
Training for the first phase completed in August 2015
Staff handling print material started first
Staff create cataloging data using LC’s MARC-based library system and repeat the process to create linked data using the BIBFRAME editor and techniques
Standard RDA instructions are followed for all data regardless of the system or framework used
For monographs, serials, music and cartographic material, the Phase I pilot will end March 31, 2016
Testing for sound recordings and audiovisual materials will continue through May 31, 2016
Testing for prints & photographs will end July 31, 2016

Training for BIBFRAME took a bit longer to get started and completed than originally expected. Initial training was completed in August 2015. Staff working with print material were the first to begin testing, including people working on monographs, serials, printed music, and maps. A decision was made to have catalogers create MARC bibliographic and authority records first in LC’s legacy system (Voyager). Linked data for the same items is then created using the BIBFRAME editor. Keep in mind that, since BIBFRAME involves creating links, not necessarily entering data, some techniques for cataloging are new. Some information must be entered into the editor, especially information specific to an “instance”, that is, the copy of a book in hand. BIBFRAME catalogers apply the same RDA instructions when creating linked bibliographic data as when creating MARC records. Phase I testing for printed items will end on March 31, 2016. Testing for sound recordings and video recordings started later and will continue until July 31, 2016. The feedback from staff has been encouraging. Catalogers understand what they are doing, since the concept of linking one thing to another makes sense to those familiar with the internet.

Challenges for Developing BIBFRAME
Revisions to BIBFRAME’s RDF vocabulary are needed
Incorporating community input
Benefitting from expert advice
Guided by the experiences of the pilot
Updates to the BIBFRAME editor will be made based on the experiences of the pilot and the next BIBFRAME vocabulary and model (Version 2.0)
New/improved MARC-to-BIBFRAME transformations will be developed based on Version 2.0
Vocabulary lists within id.loc.gov will be enhanced with more linkages
A second BIBFRAME pilot will be performed (not before October 2016)

Testing BIBFRAME with professional catalogers doing their regular work has had its challenges, but it has given developers valuable information. The testing has shown where revisions and improvements to the RDF-based vocabularies are needed. Remember: catalogers are creating clusters of links between data about specific items (books, serials, music, maps) and vocabularies representing access points. Input is coming from testers, the library community, and experts. Updates to the BIBFRAME editor and vocabulary will be made based on experiences in the Phase I test. New and improved MARC-to-BIBFRAME transformations will be developed, this time based on Version 2.0 of the vocabulary. Since traditional MARC-style records have been added to LC’s existing databases of bibliographic and authority records during the test, new transformations of those data can be done to “refresh” BIBFRAME data for the Phase II test. The BIBFRAME vocabularies in LC’s linked data service (ID.LOC.GOV) will get even more lists of things to link to over time. LC does not expect to begin a Phase II test before October 2016.

Project Steps
Updated BIBFRAME vocabularies will be made available to implementers (at LC and elsewhere)
An updated BIBFRAME editor to support the input and update of linked library data will be made available
Training of staff to test BIBFRAME will continue
Encourage library system vendors to get involved
Share information about LC’s BIBFRAME pilot project with other testers and with those who have a vested interest in the technology

You are probably curious about the order in which BIBFRAME development will occur. Before any further testing is done, LC needs to complete work on updates to the BIBFRAME vocabularies. Vocabulary 2.0 is close to being finished now. Updates and enhancements to the BIBFRAME editor will be made also. The current editor does not support the update of linked data once it is created. This was done on purpose to protect the test data during the Phase I pilot. It is still unclear whether the Phase II editor will allow updates. Of course, training will continue for staff involved in testing. Eliciting feedback from these testers is a valuable part of the process. In Phase II, LC hopes to encourage more involvement from library system vendors, since most libraries depend on them now. Their support of BIBFRAME in the systems they market to libraries could prove essential to the success of the new model. LC will share information about its pilot project with others who have a vested interest in the linked data environment and technology in libraries.

Cooperation in Research & Development
Linked Data for Production (LD4P): Mellon Foundation funded projects to begin the transition to BIBFRAME
Core members: Columbia University, Cornell University, Harvard University, the Library of Congress, Princeton University, and Stanford University
The group will develop a Linked Open Data (LOD) bibliographic dataset based on BIBFRAME
The overarching goal is to integrate library data with the much larger world of data in the semantic web
Similar work in other communities for linked data includes: Schema.org, CIDOC-Conceptual Reference Model, and the Europeana Data Model (EDM)

Beyond LC, there are other things going on related to BIBFRAME too. With funds from the Mellon Foundation, several projects are being planned under the umbrella of “Linked Data for Production” (LD4P). Core members of LD4P include Columbia, Cornell, Harvard, Princeton, and Stanford Universities, in addition to the Library of Congress. The group would like to develop a Linked Open Data (LOD) bibliographic dataset based on BIBFRAME. NOTE: I did not call this a database, since it would not be packaged in records like MARC. The goal, of course, is to integrate rich library data within the larger world of browsable information, in what is referred to as the “semantic web”. Besides the BIBFRAME project, there are similar and related research and development projects being conducted, such as:
OCLC’s work with Schema.org
The CIDOC-Conceptual Reference Model
The Europeana Data Model

The archival and museum communities are also interested in the concepts of linked data for the unique objects they collect. They have always had needs similar to those of libraries, but the uniqueness of their collections kept them from adopting MARC as a data standard.

Replacement of MARC
Work on BIBFRAME Vocabulary 2.0 is ongoing
Long-term plans will focus on several aspects of library data
Transform MARC record data to BIBFRAME linked data
18 million bibliographic records at LC
8 million authority records at LC
Over 300 million bibliographic records in OCLC databases
Enhance BIBFRAME for non-print material (AV, etc.)
Shift the huge MARC-based automation infrastructure to one that is compatible with BIBFRAME
Duplicate the acceptance and success of MARC

BIBFRAME was first suggested as a standard framework for linked data to get libraries beyond (away from) MARC. MARC has been an extremely successful and broadly implemented standard within libraries. Frankly, it’s a hard act to follow. Long-term plans to develop and implement a new standard (or model) involve:
Transforming MARC data into linked data consistent with the BIBFRAME model; this means 18 million bibliographic records at LC, 8 million authority records at LC, and as many as 300 million records for OCLC (if they choose to transform all of their database records to linked data)
Enhancing BIBFRAME to accommodate non-print material better, especially sound recordings and still and moving images
Helping the hugely successful and important MARC-based automation infrastructure to evolve so as to embrace BIBFRAME

Hopefully, the success of MARC will be duplicated and surpassed.

Handling of Nonroman Data (CJK, and others)
The BIBFRAME editor accommodates nonroman data in most elements of bibliographic description and access
There is nothing equivalent to the “880” embedded alternate graphic representation field in BIBFRAME
Creators of linked bibliographic data have been imaginative in how and where they record information in nonroman scripts
Test data should not be interpreted as prescriptive for how nonroman data will be represented once BIBFRAME is finalized
Policies on handling data involving nonroman scripts (or any script) need to be agreed upon and implemented

BIBFRAME testing includes staff working with nonroman data, such as that created for Chinese, Japanese, Korean, and other languages. The BIBFRAME model accommodates information in any language or script, just as the World Wide Web does, thanks to Unicode. Unicode is the universal character standard that now allows us to create, search, display, and print information in almost any modern script. A link can go to any information available through a browser. The BIBFRAME model and editor allow for nonroman data to be included, but there is nothing like the MARC 880 “embedded field” technique for including alternate graphic representations of specific elements. BIBFRAME guidelines specifying which elements to use for data in the original script and in roman transliteration haven’t been developed yet. Testers are often choosing different places to record each. The nonroman data you find in LC’s test data right now is NOT intended to be prescriptive. It will guide developers in establishing policy. Decisions will be made related to the application of RDA instructions, use of the BIBFRAME model, and how best to handle information needing to be represented in more than one script. Since linked data can be accessed anywhere, and libraries may want to link to data globally, what script is recorded where is an important factor.
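One general RDF technique that is relevant here, offered only as an illustration and not as BIBFRAME policy (which, as noted above, has not been decided), is the language-tagged literal: a text value can carry a language tag, and that tag can include a script subtag, so the original-script and romanized forms of a title can sit side by side on the same element. The sketch below assumes nothing beyond that idea; the names TaggedLiteral, to_literal, and pick are made up for this example.

```python
from typing import List, NamedTuple, Optional


class TaggedLiteral(NamedTuple):
    text: str
    lang: str  # language tag, e.g. "ja" or "ja-Latn" (Japanese in Latin script)


def to_literal(lit: TaggedLiteral) -> str:
    """Write a value in the RDF style of a quoted string plus @language-tag."""
    escaped = lit.text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lit.lang}'


def pick(literals: List[TaggedLiteral],
         preferred_tags: List[str]) -> Optional[TaggedLiteral]:
    """Return the first variant matching the caller's tag preferences, in order."""
    for tag in preferred_tags:
        for lit in literals:
            if lit.lang == tag:
                return lit
    return None


# One title, recorded once in the original script and once romanized.
title_variants = [
    TaggedLiteral("源氏物語", "ja"),
    TaggedLiteral("Genji monogatari", "ja-Latn"),
]

if __name__ == "__main__":
    for variant in title_variants:
        print(to_literal(variant))
    # A display that prefers romanized text asks for the Latin-script form first.
    print(pick(title_variants, ["ja-Latn", "ja"]).text)
```

Compared with the MARC 880 technique, the pairing here is implicit: both variants are values of the same element, and the tag says which is which. Whether BIBFRAME adopts this, or something else, is exactly the kind of policy decision still to be made.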

BIBFRAME Pilot Phase II (2017)
BIBFRAME Vocabulary 2.0 will be used
2.0 is expanded for library material, cultural heritage environment, & unique material (archival collections and museum objects)
Testing will focus more on non-print material (especially still images and cartographic material) and serials
Incorporate experience and input from partners in LD4P
Explore the relationships between BIBFRAME & RDA
Make improvements in the BIBFRAME editor to facilitate creation of linked data in BIBFRAME
Improvements for handling of nonroman data will be made

I’ve mentioned the second BIBFRAME pilot several times. BIBFRAME Pilot Phase II is planned for the Library of Congress fiscal year 2017, which begins October 1, 2016, so Phase II will not begin before October 2016. After the Phase I test, LC needs time to consider the results. As mentioned, enhancements to the BIBFRAME vocabulary are needed. Vocabulary 2.0 will be used in Phase II. The next phase of testing will likely expand to include other library material and the cultural heritage environment (archival and museum objects). There will be more focus on non-print material, whose description and linkages are complex and often now involve links to digital objects. Phase II will also incorporate what we learn from LD4P partners. The relationships between BIBFRAME and RDA need to be explored. RDA is also a developing standard, although further along. Improvements in the BIBFRAME editor will facilitate creation of linked data. Improved policies and techniques for creating and linking to nonroman data will be developed.

Conclusion
BIBFRAME:
Represents a new distributed model for sharing linked bibliographic data through the web
Will allow libraries to get away from centralized, monolithic data stores (cataloging databases)
Traditional authority control will yield to broad-based identity management
Cataloging will shift from primarily data transcription to focus on the linking of useful information
Libraries and service providers must work together to develop and implement standards and protocols to promote the shift to BIBFRAME and linked data

In conclusion: I hope that I’ve helped you to understand what BIBFRAME is and why libraries are working to develop it. BIBFRAME represents a new distributed model for sharing linked data, not just between libraries, but with any users of the World Wide Web. It will finally allow libraries to free their cataloging data from the centralized, isolated, monolithic data stores (OPACs) that have been built since the MARC formats were introduced in 1969. MARC databases have a wealth of information that has not been fully exploited. BIBFRAME will allow that to happen. Name and subject authority control as we know it now will yield to broad-based “identity management”. Cataloging, as an art, will shift from being primarily the transcription of information to focusing on creating links. A key to the success of BIBFRAME will be the cooperation of libraries and service providers (like library system vendors).

For more information about ongoing work at LC
Contact information:
Randall Barry
Library of Congress
Asian & Middle Eastern Division
101 Independence Ave., SE
Washington, DC

For the BIBFRAME Pilot Project:
Sally McCallum
Beacher Wiggins

If you’d like to learn more about LC’s testing of BIBFRAME, you can write to me, but I am not a BIBFRAME expert. I’ve included my contact information here, but there are two other important people in the development of BIBFRAME at LC whom I recommend. Beacher Wiggins is the director of Acquisition & Bibliographic Access (ABA). That is the large unit at LC where material is acquired and cataloged. He is also acting chief of LC’s Policy & Standards Division (PSD), where cataloging policy is developed. BIBFRAME could have a large impact on ABA and cataloging. Testing of BIBFRAME is being done in ABA now. BIBFRAME and RDA are key to PSD’s work, thus Beacher’s involvement. Sally McCallum is chief of the Network Development & MARC Standards Office. As the name of her office suggests, it has been the maintenance agency for the MARC formats for decades. It made sense that Sally head up the BIBFRAME development effort. She and her staff are handling the technical development of the BIBFRAME model, vocabularies, and the editor used for testing. Feel free to contact us. THANK YOU ALL FOR LISTENING!

Website – http://www.loc.gov/bibframe/
This last slide has an image of the official BIBFRAME website, where you can find links to a variety of things related to BIBFRAME and testing. I highly recommend you visit this site if you want to know more. THANK YOU FOR GIVING ME THE OPPORTUNITY TO TALK ABOUT BIBFRAME.
