Authorities Futures. T. Hickey, OCLC
Why authorities?
Searching
Browsing
Variations on Tchaikovsky. NACO: Tchaikovsky, Peter Ilich; German: Čajkovskij, Pëtr I.; French: Čajkovskij, Piotr Ilʹič; Cyrillic: Чайковский, Пётр Ильич
More ways to say Tchaikovsky: Chajkowskii; Ciaikovsky, Piotr Ilic; Tschaikowsky, Peter Iljitch; Tchaikowsky, Peter Iljitch; Ciaikovsky, Pjotr Iljc; Cajkovskij, P. I; Tsjaikovsky, Peter Iljitsj; Czajkowski, Piotr; Chaikovsky, P. I; Csajkovszkij, Pjotr Iljics; Tsjaïkovskiej, Pjotr Iljietsj; Tjajkovskij, Pjotr Ilitj; Čaikovskis, P; Chaĭkovskiĭ, Petr Ilʹich; Tchaikovski, P; Tchaikovski, Piotr Ilyitch; Chaĭkovskiĭ, P; Tchaikovsky, P; Tchaïkovsky, Piotr Ilitch; Tschaikowsky, Pjotr Iljitsch; Tschajkowskij, Pjotr Iljitsch; Tchaïkovski, P. I; Ciaikovskij, Piotr; Ciaikovskji, Piotr Ilijich; Tschaikowski, P. I; Tschaikowski, Peter Illic; Tjajkovskij, Peter; Chaĭkovski, Pʹotr Ilich; Tschaikousky; Tschaijkowskij, P. I; Tschaikowsky, P. I; Chaĭkovski, P. I; Tchaïkovski, Petr Ilitch; Ciaikovski, Peter Ilic; Tschaikowski, Pjotr; Tchaikowsky, Pyotr; Sinopov, P; Tchaikovskij, Piotr Ilic
Wider coverage: published, unpublished, objects, licensed, archival. Multiple sources. Machine generated. Information professionals, scholars, researchers, enthusiasts. Broader use of APIs. Multiple views. Better context. Better navigation. More mashups. Authorities touch everything.
33 nodes, 132 CPUs, 528 gigabytes memory, 33 terabytes disk. 100-fold speed-up: 1 hour to <1 minute; 1 day to 15 minutes; 1 month to 8 hours.
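The cluster figures and speed-ups can be checked with a little arithmetic. A minimal sketch, assuming a 30-day month and treating "<1 minute" as exactly 60 seconds (so that first ratio is a lower bound):

```python
# Cluster totals from the slide.
NODES, CPUS, MEMORY_GB, DISK_TB = 33, 132, 528, 33

# (before, after) job durations in seconds, from the slide.
# Assumptions: a month is 30 days; "<1 minute" is counted as 60 s.
HOUR, DAY = 3600, 86400
JOBS = {
    "1 hour -> <1 minute": (HOUR, 60),
    "1 day -> 15 minutes": (DAY, 15 * 60),
    "1 month -> 8 hours": (30 * DAY, 8 * HOUR),
}


def per_node():
    """Return (CPUs, GB of memory, TB of disk) per node."""
    return CPUS // NODES, MEMORY_GB // NODES, DISK_TB // NODES


def speedups():
    """Return the speed-up factor (before / after) for each job."""
    return {name: before / after for name, (before, after) in JOBS.items()}


if __name__ == "__main__":
    print(per_node())  # (4, 16, 1)
    for name, factor in speedups().items():
        print(f"{name}: {factor:.0f}x")
```

The three ratios come out at 60, 96 and 90, consistent with the slide's rough "100-fold" figure.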
Controlling WorldCat; Virtual International Authority File; WorldCat Identities
Controlling names in WorldCat. Has been done semi-manually – Encourages review of all links. For Identities we did this automatically – Research copy of WorldCat – Very aggressive matching. How to move links to WorldCat?
Pretend you are a Connexion client. Program to: – Log in – Search for record – Verify heading hasn't changed – Insert authorized form – Add link – Do replace
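Each step on the slide maps onto one call in a loop body. A minimal sketch: `FakeCatalog`, its methods and the record layout are hypothetical stand-ins, not the Connexion client's actual interface, which the slide does not describe.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a Connexion session.
# None of these names come from OCLC software.
@dataclass
class FakeCatalog:
    records: dict  # record number -> {"heading": str, "link": str or None}
    logged_in: bool = False
    replaced: list = field(default_factory=list)

    def log_in(self):
        self.logged_in = True

    def search(self, recnum):
        # Return a copy, as a client would receive its own working copy.
        return dict(self.records[recnum])

    def replace(self, recnum, record):
        self.records[recnum] = record
        self.replaced.append(recnum)


def control_heading(catalog, recnum, expected, authorized, link):
    """Apply one heading update following the slide's steps.

    Returns False (and changes nothing) if the heading in the live
    record no longer matches what the link was computed against.
    """
    if not catalog.logged_in:
        catalog.log_in()                    # Log in
    record = catalog.search(recnum)         # Search for record
    if record["heading"] != expected:       # Verify heading hasn't changed
        return False
    record["heading"] = authorized          # Insert authorized form
    record["link"] = link                   # Add link
    catalog.replace(recnum, record)         # Do replace
    return True
```

The verify step is what makes it safe to compute links on a research copy and apply them later: any record edited in the meantime is skipped rather than overwritten.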
Then just replace 26 million records. Each update takes two transactions – Retrieve the record – Replace the record. If it takes 2 seconds/update – 52,000,000 seconds – ~600 days, nearly 2 years
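A quick check of the single-client estimate, assuming a 365-day year and the slide's 2 seconds per retrieve-and-replace update:

```python
RECORDS = 26_000_000
SECONDS_PER_UPDATE = 2  # retrieve + replace, per the slide's assumption


def serial_duration(records=RECORDS, secs=SECONDS_PER_UPDATE):
    """Return (seconds, days, years) for one client doing every update."""
    seconds = records * secs
    days = seconds / 86_400
    years = days / 365
    return seconds, days, years


if __name__ == "__main__":
    seconds, days, years = serial_duration()
    print(f"{seconds:,} s, {days:.0f} days, {years:.2f} years")
    # 52,000,000 s, 602 days, 1.65 years
```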
But we can run multiple clients. Connexion can handle 40+ of these clients – ~20 records/second. Offline processing has limited capacity – Run 32 clients for 12 hours at 16 updates/second – ~700,000 overnight – Up to a million/day, 3 million/week. 2-3 months elapsed time.
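The parallel figures follow from the same 2 seconds per update. A minimal sketch, assuming throughput scales linearly with the number of clients (an idealization; real contention would lower it):

```python
import math

SECONDS_PER_UPDATE = 2   # per client, from the previous slide
RECORDS = 26_000_000


def throughput(clients, secs_per_update=SECONDS_PER_UPDATE):
    """Updates per second if every client works independently."""
    return clients / secs_per_update


def overnight(clients=32, hours=12):
    """Updates completed in one overnight processing window."""
    return throughput(clients) * hours * 3600


def weeks_needed(per_week=3_000_000, records=RECORDS):
    """Whole weeks needed to update every record at a weekly rate."""
    return math.ceil(records / per_week)


if __name__ == "__main__":
    print(throughput(40))    # 20.0
    print(throughput(32))    # 16.0
    print(overnight())       # 691200.0
    print(weeks_needed())    # 9
```

Nine weeks is a little over two months, matching the slide's 2-3 month elapsed time.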
Virtual International Authority File
VIAF sources: DNB Bib & Authority; BnF Bib & Authority; LC Bib & Authority. VIAF: ~7.5 million personal name authority records; ~25 million bibliographic records; ~1.2 million links between files.
Match on: names and dates in headings; standard numbers; titles; coauthors; publishers; personal name as subject.
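The signals above can be pictured as set comparisons between two authority clusters that share a name. A minimal sketch only: the `Cluster` fields and the `is_match` rule (any shared signal, no date conflict) are hypothetical and are not VIAF's actual matching rules or thresholds; names are assumed to be already normalized to one script.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical summary of one authority record and its linked bib records."""
    name: str                                        # normalized heading name
    dates: tuple = ()                                # e.g. ("1840", "1893")
    std_numbers: set = field(default_factory=set)    # ISBNs, LCCNs, ...
    titles: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)
    publishers: set = field(default_factory=set)
    as_subject: set = field(default_factory=set)     # works about this person

SIGNALS = ("std_numbers", "titles", "coauthors", "publishers", "as_subject")


def match_evidence(a, b):
    """List the kinds of evidence two same-name clusters share."""
    if a.name != b.name:
        return []
    evidence = []
    if a.dates and a.dates == b.dates:
        evidence.append("dates")
    for label in SIGNALS:
        if getattr(a, label) & getattr(b, label):   # non-empty intersection
            evidence.append(label)
    return evidence


def is_match(a, b):
    """Hypothetical rule: at least one shared signal and no date conflict."""
    conflict = bool(a.dates and b.dates and a.dates != b.dates)
    return not conflict and bool(match_evidence(a, b))
```

Returning the list of shared signals, rather than a bare yes/no, makes it possible to count matches by the kind of evidence that produced them.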
Matching situations
Hickey, Thomas Butler, 1947-
Dempsey, Lorcan
Tchaikovsky, Peter Ilich; Čajkovskij, Pëtr I.; Čajkovskij, Pëtr I. / Tchaikovsky, Peter Ilich / Чайковский, Пётр Ильич
Fournier, Marcel; Fournier, Marcel, 1946-; Fournier, Marcel, 1945-
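Before headings like these can be compared, they have to be folded to a common key. A minimal sketch, assuming a simple scheme (strip diacritics via Unicode NFKD decomposition, lowercase, drop punctuation, split off trailing dates); `normalize_heading` is a hypothetical helper, not a documented VIAF routine:

```python
import re
import unicodedata


def normalize_heading(heading):
    """Fold a personal-name heading to (name key, dates)."""
    # Decompose accented letters (Č -> C + combining caron), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Split off trailing dates such as ", 1946-" or ", 1840-1893".
    dates = ()
    m = re.search(r",?\s*(\d{4})-(\d{4})?\s*$", no_marks)
    if m:
        dates = tuple(d for d in m.groups() if d)
        no_marks = no_marks[:m.start()]
    # Lowercase, keep letters/digits/commas/spaces, collapse whitespace.
    key = re.sub(r"[^\w,\s]", "", no_marks.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return key, dates
```

The two dated Fournier headings fold to the same name key with different dates, which is exactly the case where dates must keep them apart. NFKD also splits Cyrillic й into и plus a breve, so this folding suits Latin-script forms; linking the Cyrillic form would need other evidence, such as the shared titles in the match counts.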
What makes a match?
Matches    Evidence
1,338,606  Title
526,234    Double date
67,749     Joint author
47,499     LCCN
15,867     Partial date and partial title
6,454      Partial date and publisher
4,673      Partial title and publisher
4,116      Name as subject
2,158      Standard number
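The relative weight of each kind of evidence is easier to see as percentages. This sketch restates the table and treats the nine counts as non-overlapping, which the slide does not state explicitly:

```python
# Matches by deciding evidence, copied from the table.
EVIDENCE_COUNTS = {
    "Title": 1_338_606,
    "Double date": 526_234,
    "Joint author": 67_749,
    "LCCN": 47_499,
    "Partial date and partial title": 15_867,
    "Partial date and publisher": 6_454,
    "Partial title and publisher": 4_673,
    "Name as subject": 4_116,
    "Standard number": 2_158,
}


def shares(counts=EVIDENCE_COUNTS):
    """Return each evidence type's percentage of all listed matches."""
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


if __name__ == "__main__":
    for kind, pct in shares().items():
        print(f"{pct:5.1f}%  {kind}")
```

On these counts, title evidence alone accounts for about two thirds of the matches, and title plus double date for about 93%.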
Next steps for VIAF: merged display; better documentation; more participants; geographics.
New Zealand Identities (in WorldCat)
82,868  Mahy, Margaret
73,871  Mansfield, Katherine
53,779  Marsh, Ngaio
52,876  Cowley, Joy
23,009  Frame, Janet
11,986  Park, Ruth
Australian Identities (in WorldCat)
51,399  Keneally, Thomas
42,679  Fox, Mem
30,301  Travers, P. L.
28,998  Lindsay, Jack
19,179  Marsden, John
16,688  Stead, Christina
15,041  Malouf, David
14,717  Jennings, Paul
13,769  Lawson, Henry
12,612  Winton, Tim
Editing
Merged result: immediately visible in Identities; persistent in Identities; information fed into established channels.
Name Finder
Implementation: SRU/SRW server (Z39.50 for the Web); XML returned; XSLT style sheets transform it to HTML.
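An SRU request is an ordinary HTTP GET carrying the search as URL parameters. A minimal sketch using the standard SRU 1.1 searchRetrieve parameter names; the base URL and the CQL index `local.Name` are hypothetical placeholders, not Identities' real endpoint or index. The optional `stylesheet` parameter asks an SRU server to point its XML response at an XSLT stylesheet; whether Identities used it is not stated on the slide.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, start=1, stylesheet=None):
    """Build an SRU 1.1 searchRetrieve URL for the HTTP GET binding."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,            # a CQL query string
        "startRecord": start,          # 1-based position of first record
        "maximumRecords": max_records,
    }
    if stylesheet:
        params["stylesheet"] = stylesheet  # XSLT the response should reference
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and index name, for illustration only.
    print(sru_search_url("http://example.org/sru", 'local.Name = "Tchaikovsky"', 5))
```

Because the response is XML, the same result can feed a browser (via XSLT to HTML) or another program, which is what makes the syndication options below cheap to add.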
Syndication: searchable via SRU and OpenURL; sitemaps for harvesters; HTML for harvesters and mobile devices; links in Wikipedia.
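Sitemaps tell harvesters which pages exist without their having to crawl the search interface. A minimal sketch of the sitemaps.org format; the page URLs are hypothetical, and the 50,000-URL cap per file comes from the sitemaps.org protocol:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into groups small enough for one sitemap file each."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return a sitemaps.org <urlset> document listing the given page URLs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A site with millions of identity pages would write one such file per chunk and list the files in a sitemap index.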
More Identities
Gods
Other identities
Thomas Hickey, Chief Scientist, OCLC