1 OCLC Online Computer Library Center
Virtual International Authority File
Ed O’Neill
Prepared with the assistance of Rick Bennett
Australian Committee on Cataloguing Seminar
Sydney, Australia, January 31, 2005

2 Background
The IFLA Section on Cataloguing recognized the need for an international authority file:
- Where authority records from the world’s national bibliographic agencies could be linked
- Would be available via the Internet
- Would be a practical expansion of the concept of universal bibliographic control
- Would build on the work done by each national bibliographic agency
- Allowing national or regional variations in authorized form to co-exist
- Supporting worldwide users’ needs for variations in preferred language, script, and spelling

3 Background
- The VIAF could be one of the basic building blocks for a “semantic web” when combined with other controlled vocabularies and authority files from such sources as abstracting and indexing services, archives, museums, publishers, etc.
- Libraries now have an opportunity to make a great contribution to this future and should help make this vision a reality
- The VIAF should be made freely available on the Web to users worldwide

4 Joint Project
A project to test the concept of a VIAF is being jointly undertaken by:
- Die Deutsche Bibliothek (DDB)
- The Library of Congress (LC)
- OCLC Online Computer Library Center (OCLC)

5 VIAF Formally Approved in Berlin
Pictured: Beacher Wiggins, Barbara Tillett, Christel Hengel-Dittrich, Renate Gömpel, Elisabeth Niggemann, Jay Jordan, Ed O’Neill

6 Project Goal
Demonstrate the feasibility of VIAF by linking the personal name authority records between:
- Personennormdatei (PND)
- Library of Congress Name Authority File (LCNAF)

7 What is the VIAF?
- The VIAF will be a file of metadata to link users from records in one national bibliographic agency’s personal name authority file to matching records in other national authority files
- The VIAF will provide for web access through a specially designed user interface
- The VIAF will support multilingual and multi-script capability
- The VIAF will use Open Archives Initiative (OAI) protocols to harvest metadata from the agencies’ authority files, which would then be added to the shared servers to keep the file updated
- The system is being designed so that any number of authority files can be linked
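The OAI harvesting step can be illustrated with a request URL. This is a minimal sketch, assuming the OAI Protocol for Metadata Harvesting (OAI-PMH) `ListRecords` verb with a `from` date for incremental updates. The base URL is hypothetical, and the `marc21` metadata prefix is an assumption: the slides name neither, and a real repository advertises its supported prefixes through `ListMetadataFormats`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the agencies' OAI base URLs.
BASE_URL = "https://authority.example.org/oai"


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for an incremental harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec    # optional subset of the repository
    return base_url + "?" + urlencode(params)


# "marc21" is an assumed prefix, used here for illustration only.
url = list_records_url(BASE_URL, "marc21", from_date="2005-01-01")
print(url)
```

A harvester would issue this request on a schedule and advance `from_date` after each successful run, so that only new or changed authority records are fetched.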

8 The Problem
In the LCNAF and PND authority files:
- A person may have the same established form in both authority files
- Different people may be assigned the same established form
- Different forms of the name may be established for the same person
- A particular person may not be established in both files

9 Two People – One Name
Adams, Mike
- In the PND, the name is established for a golfer
- In LCNAF, the name is established for an author of a Beatles collector's guide

10 Two Names – One Person
LC: Morel, Pierre
PND: Morellus, Petrus

11 Brief LC Authority
    010 n 84044261
    040 DLC $c DLC $d DLC
    100 1 Larson, Jack.
    670 Thomson, V. The cat, c1982: $b t.p. (Jack Larson)

12 Information in Bibliographic Records
From the bibliographic records we gain significant additional information about Jack Larson:
- He is a lyricist
- His primary subject area is music
- He was published in the 80s and 90s by G. Schirmer and Belwin Mills in New York
- He worked with Virgil Thomson and Gerhard Samuel
- Jack Larson is the only name he has used on his publications
- Etc.

13 Project Phases
Phase 1: Build enhanced authority files for both PND and LC personal names
Phase 2: Match PND and LC enhanced authority records to create the initial version of the VIAF
Phase 3: Build OAI server
Phase 4: Ongoing maintenance and metadata harvesting using OAI protocols
Phase 5: Build end-user interface with Unicode displays

14 Phase 1: Building the Enhanced Authority Files
- Authority records generally include very few, if any, details about the person or their publishing history
- The information is rarely sufficient to determine whether two different authority records represent the same person
- To match authority records for the same author unambiguously, information from bibliographic records is used to enhance the authority record

15 Enhancing the Authorities
[Diagram labels: Bibliographic Record, Derived Authority Record, Enhanced Authority]

16 Mining the Bibliographic Record
    LDR 00826ccm 2200289 a 4500
    1 ocm10025532
    5 20031229650847.0
    8 840627s1982 nyuuua n eng
    10 $a 84758340
    40 $a DLC $c DLC
    19 $a 17706440
    20 $c $2.95
    28 22 $a 48418 $b G. Schirmer
    45 2 $b d198006 $b d198007
    48 $b va01 $b ve01 $a ka01
    50 00 $a M1529.3 $b.T
    100 1 $a Thomson, Virgil, $d 1896-
    245 14 $a The cat : $b duet for soprano and baritone / $c Virgil Thomson ; [words by Jack Larson].
    260 $a New York : $b G. Schirmer, $c c1982.
    300 $a 1 score (11 p.) ; $c 31 cm.
    500 $a For soprano, baritone, and piano.
    650 0 $a Vocal duets with piano.
    600 10 $a Larson, Jack $x Musical settings.
    700 1 $a Larson, Jack.
Callouts on the slide: Authors, LC Control Number, LC Classification, Title, Material Type, Publisher, Place of Publication, Language, Date of Publication, Usage
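Mining a field starts with splitting it into subfields. The following is a rough sketch, assuming the `$`-delimited display form shown on the slide; it is not the project's actual extraction code, and the function name `parse_subfields` is hypothetical.

```python
import re


def parse_subfields(field_body):
    """Split a '$'-delimited field body into (code, value) pairs.

    Limitation: a literal '$' in the data (as in the price "$c $2.95")
    would be misread as a subfield delimiter by this simple pattern.
    """
    pairs = re.findall(r"\$(\w)\s*([^$]*)", field_body)
    return [(code, value.strip()) for code, value in pairs]


# Field bodies copied from the 260 and 245 fields of the record above.
imprint = parse_subfields("$a New York : $b G. Schirmer, $c c1982.")
title = parse_subfields(
    "$a The cat : $b duet for soprano and baritone / "
    "$c Virgil Thomson ; [words by Jack Larson]."
)

print(imprint)
print(dict(title)["c"])
```

The publisher (260 `$b`), date (260 `$c`), and statement of responsibility (245 `$c`) pulled out this way correspond to the Publisher, Date of Publication, and Usage callouts.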

17 Derived Authority Record
    00525nz 2200229n 4500
    0 1 xlc 1
    1 3 OCoLC
    2 5 20040721111415.0
    3 8 040721nneanz||abbn n and d
    4 40 $a OCoLC $b eng $c OCoLC $f viaf
    5 100 1 $a Larson, Jack.
    6 903 $a 84758340
    7 910 14 $a the cat $b duet for soprano and baritone
    8 921 $a g schirmer
    9 922 $a nyu
    10 930 $a jack larson
    11 940 $a eng
    12 942 $a 234
    13 943 $a 198x
    14 944 $a cm
    15 950 1 $a thomson, virgil $d 1896
Callouts on the slide:
- All text is normalized
- Subjects are grouped into broad subject areas
- Material type is coded
- Publication date is by decade
- Coauthor
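Two of the callouts, normalized text and publication date by decade, can be sketched in a few lines. The exact normalization rules are not given in the slides, so the rules below (lowercase, punctuation replaced by spaces, whitespace collapsed) are an assumption chosen to reproduce the 910, 921, and 943 values shown; the function names are hypothetical.

```python
import re


def normalize_text(value):
    """Lowercase, replace punctuation runs with one space, trim the ends.

    Assumed rule set; handling of diacritics is not described in the slides,
    so accented letters are simply kept as they are.
    """
    return re.sub(r"[\W_]+", " ", value.lower()).strip()


def decade_of(date_text):
    """Return a decade code such as '198x' from the first 4-digit year found."""
    match = re.search(r"\d{4}", date_text)
    return match.group(0)[:3] + "x" if match else None


print(normalize_text("G. Schirmer,"))   # publisher from 260 $b
print(normalize_text("The cat :"))      # title from 245 $a
print(decade_of("c1982."))              # date from 260 $c
```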

18 90x Control numbers
901 ISBN: $a Numeric portion of ISBN
902 ISSN: $a Numeric portion of ISSN
903 LCCN: $a Numeric portion of LCCN

19 91x Title fields
910 Title from 245, subfields a & b
911 Abbreviated title from 210, subfields a & b
913 Uniform title from 240, subfields a & b
914 Translated title from 242, subfields a & b
915 Collective uniform title from 243, all subfields
916 Variant title from 246, subfields a & b
917 Uniform title extracted from name/title authorities, field 100 $t

20 92x Publisher fields
920 Publisher number (publisher number from ISBN)
921 Publisher name (publisher name from the 260 $b or 533 $c)
922 Place of publication (country of publication code from 008 field)

21 93x Usage
930 Name usage (form of name found in the statement of responsibility, 245 subfield $c)

22 94x Attributes
940 Language (language code from the 008 or 041 subfield $a)
941 Author's role (relator code from 700, subfields $e and/or $4)
942 North American Title Count subject (NATC survey line number)
943 Decade of publication
944 Format (type and bibliographic level)
945 Broader subject area

23 95x Joint Authors
950 Personal authors (from either the 100 or 700 fields)
951 Corporate authors

24 96x Names as Subjects
960 Name as subject

25 99x Number of Records
999 Number of associated bibliographic records
  - $a Total number of associated bibliographic records
  - $b Bibliographic record control number
  - $2 Source of bibliographic record

26 Enhanced Authority Record
    00824nz 2200301n 4500
    0 1 oca01144962
    1 5 19840809154202.7
    2 8 840702n| acannaab| |n aaa |||
    3 10 $a n 84044261
    4 40 $a DLC $c DLC $d DLC
    5 100 1 $a Larson, Jack.
    6 670 $a Thomson, V. The cat, c1982: $b t.p. (Jack Larson)
    7 903 $a 84758340 $9 1
    8 903 $a 93710923 $9 1
    9 910 11 $a the cat $b duet for soprano and baritone $9 1
    10 910 11 $a sun like $b on a poem by jack larson $9 1
    11 921 $a g schirmer $9 1
    12 921 $a belwin mills publ corp $9 2
    13 922 $a nyu $9 2
    14 930 $a jack larson $9 1
    15 940 $a eng $9 2
    16 942 $a 234 $9 2
    17 943 $a 198x $9 1
    18 943 $a 197x $9 1
    19 944 $a cm $9 2
    20 950 11 $a thomson, virgil $d 1896 $9 1
    21 950 11 $a samuel, gerhard $9 1

27 LC Bibliographic Records
Number of records: 7,612,979
Personal names assigned: 6,318,094
Unique personal names: 2,554,266

28 LCNAF Personal Name Authorities
Differentiated names: 3,834,162
Undifferentiated names: 37,990
Total authority records: 3,872,152

29 LC Names
Established names: 3,834,162
Names from bib records: 2,554,266
Uncontrolled names: 394,951
Orphaned names: 1,674,847
Active established names: 2,159,315

30 DDB Bibliographic Records
Die Deutsche Bibliothek (DDB): 6,316,675
Bibliotheksverbund Bayern (BVB): 5,022,316
Total number of records: 11,338,991
Number of assignments: 12,080,387
Number of unique names: 2,371,461

31 DDB Names
Established names: 2,498,071
Names from bib records: 2,371,461
Uncontrolled names: 313,931
Orphaned names: 440,541
Active established names: 2,057,530
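The figures on the "LC Names" and "DDB Names" slides are arithmetically consistent with one reading of the labels, which is an interpretation and not stated in the slides: orphaned names are established names that appear in no bibliographic record, and uncontrolled names are names in bibliographic records with no authority record. Under that assumption the active count can be derived two ways, as this short check shows.

```python
# Counts copied from the "LC Names" and "DDB Names" slides.
lc = {"established": 3_834_162, "from_bib": 2_554_266,
      "uncontrolled": 394_951, "orphaned": 1_674_847, "active": 2_159_315}
ddb = {"established": 2_498_071, "from_bib": 2_371_461,
       "uncontrolled": 313_931, "orphaned": 440_541, "active": 2_057_530}


def active_two_ways(counts):
    """Derive the active count from each side of the assumed relationship."""
    via_authorities = counts["established"] - counts["orphaned"]
    via_bib_records = counts["from_bib"] - counts["uncontrolled"]
    return via_authorities, via_bib_records


for label, counts in (("LC", lc), ("DDB", ddb)):
    print(label, active_two_ways(counts), counts["active"])
```

Both derivations agree with the reported active totals for each file, which supports the reading above without proving it.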

32 Phase 2 Matching the Enhanced Authorities

33 Linking Retrospective Files
[Diagram labels: Matching Algorithms, Enhanced LCNAF Authorities, Enhanced PND Authorities, VIAF Authorities]

34 Matching Objectives
Each distinct author should be uniquely identified.
- Author: An individual person responsible for the intellectual or artistic content of a work.
- Established name: A symbol (character string) used to represent an author. Names will not necessarily be the same in the LCNAF and the PND authority files.

35 Matching
[Diagram labels: LCNAF, PND]

36 Name Matching
To be considered for a match, two names must be consistent:
- "Smith, J. William" and "Smith, John" are consistent
- "Smith, J. William" and "Smith, John Q." are inconsistent
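One way to read the consistency rule is that surnames must agree and forenames must not contradict each other position by position, with an initial compatible with any name starting with that letter. The sketch below implements that reading; it is an assumption that happens to reproduce the two Smith examples, not the project's documented algorithm, and all function names are hypothetical.

```python
def split_name(name):
    """Split 'Surname, Forenames' into a surname and lowercase forename tokens."""
    surname, _, forenames = name.partition(",")
    tokens = [token.strip(".").lower() for token in forenames.split()]
    return surname.strip().lower(), [token for token in tokens if token]


def tokens_compatible(a, b):
    """An initial is compatible with any token that starts with the same letter."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_consistent(name1, name2):
    """Surnames must match; forenames are compared position by position.

    Extra trailing forenames on one side are treated as missing information,
    not as a conflict.
    """
    surname1, forenames1 = split_name(name1)
    surname2, forenames2 = split_name(name2)
    if surname1 != surname2:
        return False
    return all(tokens_compatible(a, b) for a, b in zip(forenames1, forenames2))


print(names_consistent("Smith, J. William", "Smith, John"))
print(names_consistent("Smith, J. William", "Smith, John Q."))
```

In the second pair the middle names "William" and "Q." conflict, so the names are rejected before any similarity score is computed.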

37 Strong Matching Attributes
- A work (title) in common
- Common control numbers (ISBN, ISSN, or LCCN)
- Dates: the combination of birth and death year (a moderate match score value is given for matching birth dates)
- Joint authors
- Distinct alternate form of the name
For example, LC has
    100 Schade, Peter, $d 1493-1524
    400 Mosellanus, Petrus, $d 1493-1524
while PND has
    100 Mosellanus, Petrus, $d 1493-1524
    400 Schade, Peter, $d 1493-1524
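The Schade/Mosellanus example amounts to checking whether one file's established form (field 100) appears among the other file's variant forms (field 400). A minimal sketch follows, with the record layout (a dict with `established` and `variants` keys) invented for illustration.

```python
# Record layout is hypothetical; the values come from the example above.
lc_record = {
    "established": "Schade, Peter, 1493-1524",        # LC field 100
    "variants": {"Mosellanus, Petrus, 1493-1524"},    # LC field 400
}
pnd_record = {
    "established": "Mosellanus, Petrus, 1493-1524",   # PND field 100
    "variants": {"Schade, Peter, 1493-1524"},         # PND field 400
}


def cross_reference_match(record_a, record_b):
    """True if either record's established form is a variant in the other."""
    return (record_a["established"] in record_b["variants"]
            or record_b["established"] in record_a["variants"])


print(cross_reference_match(lc_record, pnd_record))
```

In practice the strings would be normalized before comparison, since punctuation and date formatting can differ between files.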

38 Weaker Attributes
- Role (author, illustrator, composer, etc.)
- Subject area of publications
- Format (books, films, musical scores, etc.)
- Language
- Country
- Date of publications

39 Similarity Measure
- The total similarity measure is a weighted sum of the individual attribute matches
- A similarity measure is only computed for consistent names
- The weighting factor is lower for the weaker attributes and higher for the stronger attributes
- Care is taken to avoid double counting or using scores that are correlated
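The weighted sum can be sketched as follows. The weights, the attribute names, and the division by the total weight (to keep the score between 0 and 1) are all assumptions made for illustration; the slides say only that stronger attributes carry higher weights, and this sketch does not attempt to reproduce any score reported in the deck.

```python
# Hypothetical weights: stronger attributes get larger values.
WEIGHTS = {
    "title": 4, "control_number": 4, "joint_author": 3, "dates": 3,
    "language": 1, "format": 1,
}


def similarity(matches, names_consistent, weights=WEIGHTS):
    """Weighted sum of per-attribute match scores, scaled to the range 0..1.

    `matches` maps attribute names to a score between 0.0 and 1.0; missing
    attributes count as 0. Returns None when the names are inconsistent,
    because no measure is computed in that case.
    """
    if not names_consistent:
        return None
    total_weight = sum(weights.values())
    weighted = sum(w * matches.get(attr, 0.0) for attr, w in weights.items())
    return weighted / total_weight


example = similarity({"title": 1.0, "language": 1.0, "format": 1.0},
                     names_consistent=True)
print(example)
```

Avoiding double counting, as the last bullet requires, would mean choosing attributes that are not redundant with each other, for example not scoring both an ISBN match and the publisher number derived from that same ISBN.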

40 Similarity Metric
Left record (LC):
    1 001 oca04693556
    2 005 19980327132122.5
    3 008 980327n| acannaab| |n aaa |||
    4 010 n 98029633
    5 040 DLC $c DLC $d DLC
    6 100 1 Tarrant, John, $d 1949-
    7 670 The light inside the dark, 1998: $b CIP t.p. (John Tarrant) data sheet (John M. Tarrant; b. 1949)
    8 901 006017219 $9 1
    9 903 98017676 $9 1
    10 910 11 the light inside the dark $b zen soul and the spiritual life $9 1
    11 920 0-06 $9 1
    12 921 harpercollins publishers $9 1
    13 922 nyu $9 1
    14 930 john tarrant $9 1
    15 940 eng $9 1
    16 942 26 $9 1
    17 943 199x $9 1
    18 944 am $9 1
    19 999 1 $b ocm38948253 $2 DLC
Right record (DDB/PND):
    1 001 12231638X
    2 003 DDB
    3 005 20000926224921.0
    4 008 000825|||az|nnaa|||||||||||| a|aba|||| d
    5 016 12231638X $2 GyFmDB
    6 040 DDB $b ger $d 9999 $f RAK-PND
    7 100 1 Tarrant, John
    8 901 344221568 $9 1
    9 910 11 licht im herzen der dunkelheit $b die nacht der seele und der weg zur erleuchtung $9 1
    10 913 11 the light inside the dark $9 1
    11 920 3-442 $9 1
    12 921 goldmann $9 1
    13 922 gw $9 1
    14 930 john tarrant $9 1
    15 940 ger $9 1
    16 943 200x $9 1
    17 944 am $9 1
    18 999 1 $b 959703160 $2 DDB
Highlighted on the slide:
- Names: 100 1 Tarrant, John, $d 1949- / 100 1 Tarrant, John
- Titles: the light inside the dark $b zen soul and the spiritual life / the light inside the dark
- Publishers: harpercollins publishers / goldmann
Similarity Metric = 0.89

41 Future of VIAF?
If the proof-of-concept is successful, the VIAF will be expanded:
- To include other authority files for personal names
- To include other types of authorities
  - Corporate names
  - Geographic names
  - etc.

42 First VIAF Record

43 Phase 3: Build OAI Server
[Diagram labels: LCNAF, DDB/PND, OAI Server(s)]
Slide courtesy of Barbara Tillett, Library of Congress

44 Phase 4: Ongoing maintenance and metadata harvesting using OAI protocols Slide Courtesy of Barbara Tillett, Library of Congress

45 Phase 5: Build End User Interface with Unicode displays
User’s cookie specifies Hangul is preferred. Display 700 form, building on local system’s authority structure.
Slide courtesy of Barbara Tillett, Library of Congress

46 Questions?
Thank you
oneill@oclc.org
http://www.oclc.org/research/projects/viaf

