OCLC Online Computer Library Center
Virtual International Authority File
Ed O’Neill
Prepared with the assistance of Rick Bennett
Australian Committee on Cataloguing Seminar
Sydney, Australia, January 31, 2005
Background
The IFLA Section on Cataloguing recognized the need for an international authority file:
– Linking authority records from the world’s national bibliographic agencies
– Available via the Internet
– A practical expansion of the concept of universal bibliographic control
– Building on the work done by each national bibliographic agency
– Allowing national or regional variations in authorized form to co-exist
– Supporting worldwide users’ needs for variations in preferred language, script, and spelling
Background
– The VIAF could be one of the basic building blocks for a “semantic web” when combined with other controlled vocabularies and authority files from such sources as abstracting and indexing services, archives, museums, and publishers
– Libraries now have an opportunity to make a great contribution to this future and should help make this vision a reality
– The VIAF should be made freely available on the Web to users worldwide
Joint Project
A project to test the concept of a VIAF is being jointly undertaken by:
– Die Deutsche Bibliothek (DDB)
– The Library of Congress (LC)
– OCLC Online Computer Library Center (OCLC)
VIAF Formally Approved in Berlin
Beacher Wiggins, Barbara Tillett, Christel Hengel-Dittrich, Renate Gömpel, Elisabeth Niggemann, Jay Jordan, Ed O’Neill
Project Goal
Demonstrate the feasibility of the VIAF by linking the personal name authority records between:
– Personennormdatei (PND)
– Library of Congress Name Authority File (LCNAF)
What is the VIAF?
– The VIAF will be a file of metadata that links users from records in one national bibliographic agency’s personal name authority file to matching records in other national authority files
– The VIAF will provide web access through a specially designed user interface
– The VIAF will support multilingual and multi-script capability
– The VIAF will use Open Archives Initiative (OAI) protocols to harvest metadata from the agencies’ authority files; the harvested metadata will then be added to the shared servers to keep the file updated
– The system is being designed so that any number of authority files can be linked
The Problem
In the LCNAF and PND authority files:
– A person may have the same established form in both authority files
– Different people may be assigned the same established form
– Different forms of the name may be established for the same person
– A particular person may not be established in both files
Two People – One Name
Adams, Mike
– In the PND, the name is established for a golfer
– In LCNAF, the name is established for an author of a Beatles collector's guide
Two Names – One Person
– LC: Morel, Pierre
– PND: Morellus, Petrus
Brief LC Authority
010 n DLC $c DLC $d DLC Larson, Jack. 670 Thomson, V. The cat, c1982: $b t.p. (Jack Larson)
Information in Bibliographic Records
From the bibliographic records we gain significant additional information about Jack Larson:
– He is a lyricist
– His primary subject area is music
– He was published in the 80s and 90s by G. Schirmer and Belwin Mills in New York
– He worked with Virgil Thomson and Gerhard Samuel
– Jack Larson is the only name he has used on his publications
– Etc.
Project Phases
– Phase 1: Build enhanced authority files for both PND and LC personal names
– Phase 2: Match PND and LC enhanced authority records to create the initial version of the VIAF
– Phase 3: Build OAI server
– Phase 4: Ongoing maintenance and metadata harvesting using OAI protocols
– Phase 5: Build end user interface with Unicode displays
Phase 1: Building the Enhanced Authority Files
– Authority records generally include very few, if any, details about the person and/or their publishing history
– The information is rarely sufficient to determine whether two different authority records represent the same person
– To match authority records for the same author unambiguously, information from bibliographic records is used to enhance the authority record
Enhancing the Authorities
[Diagram: Bibliographic Record, Derived Authority Record, Enhanced Authority]
Mining the Bibliographic Record
LDR 00826ccm a ocm s1982 nyuuua n eng 10 $a $a DLC $c DLC 19 $a $c $ $a $b G. Schirmer 45 2 $b d $b d $b va01 $b ve01 $a ka $a M $b.T $a Thomson, Virgil, $d $a The cat : $b duet for soprano and baritone / $c Virgil Thomson ; [words by Jack Larson]. 260 $a New York : $b G. Schirmer, $c c $a 1 score (11 p.) ; $c 31 cm. 500 $a For soprano, baritone, and piano $a Vocal duets with piano $a Larson, Jack $x Musical settings $a Larson, Jack.
Information mined from the record: Authors, LC Control Number, LC Classification, Title, Material Type, Publisher, Place of Publication, Language, Date of Publication, Usage
Derived Authority Record
00525nz n xlc OCoLC nneanz||abbn n and d 4 40 $a OCoLC $b eng $c OCoLC $f viaf $a Larson, Jack $a $a the cat $b duet for soprano and baritone $a g schirmer $a nyu $a jack larson $a eng $a $a 198x $a cm $a thomson, virgil $d 1896
– All text is normalized
– Subjects are grouped into broad subject areas
– Material type is coded
– Publication date is by decade
– Coauthor
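The normalization and decade grouping noted on this slide can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `normalize_text` and `decade_label` are hypothetical, and the exact rules (lowercasing, stripping diacritics and punctuation) are assumptions inferred from the sample values "the cat", "g schirmer", and "198x". It is aimed at title and publisher strings; name headings in the sample record keep their comma ("thomson, virgil"), so they would need a slightly different rule.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Lowercase, drop diacritics and punctuation, collapse whitespace.

    Assumed rules, inferred from the sample derived record.
    """
    decomposed = unicodedata.normalize("NFKD", value)
    # Remove combining marks left behind by the decomposition.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace every run of non-word, non-space characters with a space.
    cleaned = re.sub(r"[^\w\s]+", " ", without_marks.lower())
    return " ".join(cleaned.split())


def decade_label(year: int) -> str:
    """Reduce a publication year to its decade, e.g. 1982 becomes '198x'."""
    return f"{year // 10}x"


print(normalize_text("G. Schirmer"))  # g schirmer
print(normalize_text("The cat :"))    # the cat
print(decade_label(1982))             # 198x
```

Coarsening values in this way makes attributes from different catalogues comparable even when punctuation, capitalization, or exact dates differ.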
90x Control numbers
901 ISBN $a Numeric portion of ISBN
902 ISSN $a Numeric portion of ISSN
903 LCCN $a Numeric portion of LCCN
91x Title fields
910 Title from 245, Subfields a & b
911 Abbreviated title from 210, Subfields a & b
913 Uniform title from 240, Subfields a & b
914 Translated title from 242, Subfields a & b
915 Collective uniform title from 243, All subfields
916 Variant title from 246, Subfields a & b
917 Uniform Title Extracted from Name/Title authorities, field 100 $t
92x Publisher fields
920 Publisher number (Publisher number from ISBN)
921 Publisher name (Publisher name from the 260 $b or 533 $c)
922 Place of publication (Country of publication code from 008 field)
93x Usage
930 Name Usage (Form of name found in the statement of responsibility, 245 subfield $c)
94x Attributes
940 Language (Language code from the 008 or 041 subfield $a)
941 Author's role (Relator code from 700, subfields $e and/or $4)
942 North American Title Count subject (NATC survey line number)
943 Decade of publication
944 Format (Type and bib level)
945 Broader Subject Area
95x Joint Authors
950 Personal Authors (From either the 100 or 700 fields)
951 Corporate Authors
96x Names as Subjects
960 Name as Subject
99x Number of Records
999 Number of Associated bibliographic records
– $a Total number of associated bibliographic records
– $b Bibliographic Record Control Number
– $2 Source of Bibliographic Record
Enhanced Authority Record 00824nz n oca n| acannaab| |n aaa ||| 3 10 $a n $a DLC $c DLC $d DLC $a Larson, Jack $a Thomson, V. The cat, c1982: $b t.p. (Jack Larson) $a $ $a $ $a the cat $b duet for soprano and baritone $ $a sun like $b on a poem by jack larson $ $a g schirmer $ $a belwin mills publ corp $ $a nyu $ $a jack larson $ $a eng $ $a 234 $ $a 198x $ $a 197x $ $a cm $ $a thomson, virgil $d 1896 $ $a samuel, gerhard $9 1
LC Bibliographic Records
Number of records: 7,612,979
Personal names assigned: 6,318,094
Unique personal names: 2,554,266
LCNAF Personal Name Authorities
Differentiated names: 3,834,162
Undifferentiated names: 37,990
Total authority records: 3,872,152
LC Names
Established names: 3,834,162
Names from bib records: 2,554,266
Uncontrolled names: 394,951
Orphaned names: 1,674,847
Active established names: 2,159,315
DDB Bibliographic Records
Die Deutsche Bibliothek (DDB): 6,316,675
Bibliotheksverbund Bayern (BVB): 5,022,316
Total number of records: 11,338,991
Number of assignments: 12,080,387
Number of unique names: 2,371,461
DDB Names
Established names: 2,498,071
Names from bib records: 2,371,461
Uncontrolled names: 313,931
Orphaned names: 440,541
Active established names: 2,057,530
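The LC and DDB name counts are internally consistent, which the short check below illustrates. The relationships used here are inferred from the figures rather than stated on the slides: active established names are taken to be the names found in bibliographic records minus the uncontrolled ones, and orphaned names are taken to be the established names that are not active. The function name `name_counts` is hypothetical.

```python
def name_counts(established: int, from_bib: int, uncontrolled: int) -> tuple[int, int]:
    """Return (active, orphaned) under the assumed relationships.

    active   = names from bib records - uncontrolled names
    orphaned = established names - active established names
    """
    active = from_bib - uncontrolled
    orphaned = established - active
    return active, orphaned


lc = name_counts(established=3_834_162, from_bib=2_554_266, uncontrolled=394_951)
ddb = name_counts(established=2_498_071, from_bib=2_371_461, uncontrolled=313_931)

print(lc)   # (2159315, 1674847)
print(ddb)  # (2057530, 440541)
```

Both results agree with the active and orphaned figures reported for LC and DDB.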
Phase 2 Matching the Enhanced Authorities
Linking Retrospective Files
[Diagram: Enhanced LCNAF Authorities, Enhanced PND Authorities, Matching Algorithms, VIAF Authorities]
Matching Objectives
Each distinct author should be uniquely identified.
– Author: An individual person responsible for the intellectual or artistic content of a work.
– Established Name: A symbol (character string) used to represent an author. Names will not necessarily be the same in the LCNAF and the PND authority files.
Matching
[Diagram: LCNAF, PND]
Name Matching
To be considered for a match, two names must be consistent:
– “Smith, J. William” and “Smith, John” are consistent
– “Smith, J. William” and “Smith, John Q.” are inconsistent
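One way to read the consistency rule is sketched below. The slide gives only the two examples, so the rule itself is an assumption: surnames must be equal, forenames are compared position by position, an initial is compatible with any forename starting with that letter, and a forename missing from one name is not treated as a conflict. The function names are hypothetical.

```python
def _clean(token: str) -> str:
    """Lowercase a name part and drop surrounding spaces and a trailing period."""
    return token.strip().rstrip(".").lower()


def _split(name: str) -> tuple[str, list[str]]:
    """Split 'Surname, Forename ...' into a surname and a list of forenames."""
    surname, _, forenames = name.partition(",")
    parts = [_clean(t) for t in forenames.split()]
    return _clean(surname), [p for p in parts if p]


def _compatible(a: str, b: str) -> bool:
    """An initial matches any forename with the same first letter."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_consistent(x: str, y: str) -> bool:
    """Assumed rule: same surname and no conflicting forename positions."""
    surname_x, forenames_x = _split(x)
    surname_y, forenames_y = _split(y)
    if surname_x != surname_y:
        return False
    # zip stops at the shorter list, so a missing forename never conflicts.
    return all(_compatible(a, b) for a, b in zip(forenames_x, forenames_y))


print(names_consistent("Smith, J. William", "Smith, John"))     # True
print(names_consistent("Smith, J. William", "Smith, John Q."))  # False
```

Under this reading, "J." can stand for "John", while "William" and "Q." cannot refer to the same middle name, which reproduces both examples from the slide.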
Strong Matching Attributes
– A work (title) in common
– Common control numbers (ISBN, ISSN, or LCCN)
– Dates: the combination of birth and death years; a moderate match score is given for matching birth dates
– Joint authors
– A distinct alternate form of the name. For example, LC has 100 Schade, Peter, $d Mosellanus, Petrus, $d while PND has 100 Mosellanus, Petrus, $d Schade, Peter, $d
Weaker Attributes
– Role (author, illustrator, composer, etc.)
– Subject area of publications
– Format (books, films, musical scores, etc.)
– Language
– Country
– Date of publications
Similarity Measure
– The total similarity measure is a weighted sum of the individual attribute matches
– A similarity measure is only computed for consistent names
– The weighting factor is lower for the weaker attributes and higher for the stronger attributes
– Care is taken to avoid double counting or using scores that are correlated
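A weighted sum of this kind can be sketched as follows. The weights, the attribute names, and the normalization by the total weight of the attributes compared are all illustrative assumptions; the slides do not give the actual values, and this sketch is not expected to reproduce any score shown in the presentation.

```python
from typing import Optional

# Hypothetical weights: higher for strong attributes, lower for weak ones.
WEIGHTS = {
    "title": 5.0,
    "control_number": 5.0,
    "dates": 4.0,
    "joint_author": 3.0,
    "alternate_name": 3.0,
    "role": 1.0,
    "subject_area": 1.0,
    "format": 0.5,
    "language": 0.5,
    "country": 0.5,
    "publication_decade": 0.5,
}


def similarity(matches: dict[str, float], names_are_consistent: bool) -> Optional[float]:
    """Weighted sum of attribute match scores, each between 0 and 1.

    Returns None for inconsistent names, since no measure is computed for
    them. Dividing by the total weight of the compared attributes is an
    assumption made here to keep the result between 0 and 1.
    """
    if not names_are_consistent:
        return None
    total = sum(WEIGHTS[attr] * score for attr, score in matches.items())
    possible = sum(WEIGHTS[attr] for attr in matches)
    return total / possible if possible else 0.0


score = similarity({"title": 1.0, "language": 0.0}, names_are_consistent=True)
print(score)
```

In this example a shared title outweighs a language mismatch because the title carries ten times the weight, which is the behavior the slide describes for strong versus weak attributes.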
Similarity Metric
[Side-by-side comparison of the enhanced LCNAF (DLC) and PND (DDB) authority records for “Tarrant, John”. The LCNAF record carries the title “the light inside the dark: zen soul and the spiritual life” (harpercollins publishers, nyu, eng); the PND record carries “the light inside the dark” and “licht im herzen der dunkelheit: die nacht der seele und der weg zur erleuchtung” (goldmann, gw, ger).]
Similarity Metric = 0.89
Future of VIAF?
If the proof-of-concept is successful, the VIAF will be expanded:
– To include other authority files for personal names
– To include other types of authorities
  – Corporate names
  – Geographic names
  – etc.
First VIAF Record
Phase 3: Build OAI Server LCNAF DDB/PND OAI Server(s) Slide Courtesy of Barbara Tillett, Library of Congress
Phase 4: Ongoing maintenance and metadata harvesting using OAI protocols Slide Courtesy of Barbara Tillett, Library of Congress
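Incremental harvesting over OAI-PMH is driven by plain HTTP requests, and the sketch below shows how such a request could be built. The `ListRecords` verb and the `metadataPrefix`, `from`, and `resumptionToken` arguments are defined by the OAI-PMH specification; the base URL, the function name `list_records_url`, and the choice of the `oai_dc` prefix are assumptions for illustration, since the slides do not describe the project's server configuration.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    A resumption token is an exclusive argument: when continuing a partial
    list, only the verb and the token are sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Harvest only records added or changed since this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
print(list_records_url("https://example.org/oai", from_date="2005-01-31"))
```

A harvester would repeat the request with each returned resumption token until the repository stops supplying one, then remember the date of the run for the next `from` value.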
Phase 5: Build End User Interface with Unicode displays
User’s cookie specifies Hangul is preferred. Display 700 form, building on local system’s authority structure
Slide Courtesy of Barbara Tillett, Library of Congress
Questions? Thank you