IFLA - Lyon, France, 19 August 2014. Janifer Gatenby: Multilingualism in WorldCat and VIAF. Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz
WorldCat Today Resources in nearly all languages Contributed by more than 20,000 libraries worldwide More than half the database is for works not in English
Bibliographic Records – Hybrid records – Parallel records Clustered at Work level (FRBR) WorldCat Today
Existing Architecture [Diagram; labels: Authors, Subjects, Classification, Holdings, Bibliographic record, Work cluster, Content cluster, Manifestation cluster]
Complementary Initiatives: Work Level Record; GLIMIR Manifestation & Content Clusters; Multilingual Bibliographic Structure
Objective: Work Level Record Create a consolidated metadata summary for the content of a work
Work Level Record Coming Q1 2015
GLIMIR: Objective Create better work presentations
GLIMIR The Content Cluster – Enables better work record displays by reducing the number of lines that display for large works – Enables a choice of format and presents the formats that could be acceptable substitutes – Consolidates holdings for identical content The Manifestation Cluster is important – Consolidates holdings at manifestation level – In the short term allows the record catalogued in the language of the interface to be chosen for display – Reduces apparent duplication – Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records) [Callouts: Users like; Cataloguers & scholars like]
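The consolidation described on this slide can be illustrated with a minimal Python sketch. It is not the GLIMIR implementation: the record IDs, cluster IDs, field names and holdings counts below are all invented for illustration, and the rule "prefer the record catalogued in the interface language, else the first record" is an assumed simplification.

```python
from collections import defaultdict

# Hypothetical records: every ID, language code and count here is invented.
records = [
    {"id": "r1", "manifestation_id": "m1", "cataloguing_lang": "fre", "holdings": 40},
    {"id": "r2", "manifestation_id": "m1", "cataloguing_lang": "eng", "holdings": 25},
    {"id": "r3", "manifestation_id": "m2", "cataloguing_lang": "ger", "holdings": 10},
]


def consolidate(records, interface_lang):
    """Group records by manifestation cluster and sum their holdings."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["manifestation_id"]].append(rec)

    summary = {}
    for cluster_id, members in clusters.items():
        # Prefer the record catalogued in the interface language;
        # fall back to the first record in the cluster (assumed rule).
        preferred = next(
            (r for r in members if r["cataloguing_lang"] == interface_lang),
            members[0],
        )
        summary[cluster_id] = {
            "display_record": preferred["id"],
            "holdings": sum(r["holdings"] for r in members),
            "record_count": len(members),
        }
    return summary


summary = consolidate(records, "eng")
```

Counting the keys of `summary` gives the number of manifestations rather than the number of records, which is the distinction the slide draws.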
Manifestation Clustering So far 103 million records processed (about 30%)
Manifestation Cluster Opened
SRU Search: Loti, Pêcheur d’Islande (Work ID ) Records / Holdings: Work18148 Content14143 Manifestation7115
Objective: Improve displays; surface translations Multilingual Bibliographic Structure Project
Creates true multi-lingual displays – At work and manifestation levels – Using all available data instead of “most appropriate record” – Generates data Corrects many of the 28 million records coded “und” Better control and linking of translations Input to refinement of work clusters Smarter data storage Multilingual Bibliographic Structure Project
“Most appropriate” questioned. WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond. The end result will not be very satisfactory from a multilingual viewpoint; here’s why.
Which record is better to present to a German speaker?
Incomplete Swedish Record
Hybrid record
Build the display from all available data Most appropriate display
Multilingual Bibliographic Structure Project. Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through from the short to the fuller versions of the metadata. The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records.
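As a rough illustration of building a display from all available data rather than from one "most appropriate" record, the Python sketch below pools language-tagged fields from every record in a work and picks a value per field. The field names, sample strings and the fallback order (user language, then an assumed default of "eng", then whatever exists) are hypothetical, not taken from WorldCat.

```python
def build_work_display(records, user_lang, fallback_lang="eng"):
    """Pool language-tagged fields from all records, then pick per field."""
    pooled = {}
    for record in records:
        for field, by_lang in record.items():
            for lang, text in by_lang.items():
                # Keep the first value seen for each (field, language) pair.
                pooled.setdefault(field, {}).setdefault(lang, text)

    display = {}
    for field, by_lang in pooled.items():
        for lang in (user_lang, fallback_lang):
            if lang in by_lang:
                display[field] = (lang, by_lang[lang])
                break
        else:
            # Neither preferred language is available: show what exists.
            display[field] = next(iter(by_lang.items()))
    return display


# Hypothetical records for one work; the texts are placeholders.
records = [
    {"title": {"swe": "Titel"}},                                   # incomplete record
    {"title": {"eng": "Title"}, "summary": {"eng": "Summary in English"}},
    {"summary": {"ger": "Zusammenfassung"}, "subject": {"fre": "Sujet"}},
]

display = build_work_display(records, "ger")
```

Here a German-speaking user gets the German summary even though no single record carries every field, which is the gap that selecting one record cannot close.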
Proposed new architecture [Diagram; labels: Work, Manifestations (eng, fre), Authors, Subjects, Classification, Notes, Contents, Holdings, Translations (Language of work); elements tagged by language: eng, fre, ger, jpn]
Important principles Language tagging of elements, particularly – Summaries (MARC 21 field 520) – Subject headings Display in script preferred by the user if data is available Improve translated interfaces Show consolidated holdings as appropriate
Surfacing the “cream” Translations
The cream of the world’s cultural and knowledge heritage is shared by being translated WorldCat contains many rich cataloguing records for these translations Great works are translated GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data
Ιλιάδα The Iliad 紅樓夢 Dream of the Red Chamber Война и миръ War and Peace ঘরে বাইরে The Home and the World સત્યના પ્રયોગો અથવા આત્મકથા The Story of My Experiments with Truth [Gandhi autobiography] The Tale of Genji דער בעל-תשובה The Penitent زقاق المدق Midaq Alley
Translations Leo Tolstoy: 32 languages; Homer: 28 languages; Rabindranath Tagore: 21 languages; Isaac Bashevis Singer: 17 languages; Najīb Maḥfūẓ: 12 languages; Cao Xueqin: 9 languages; Mahatma Gandhi: 7 languages; Murasaki Shikibu: 7 languages
Inconsistencies cause work clusters to be incomplete resulting in less than optimal search results – Titles without subtitles – Missing or different forms of uniform title – Inverted title – Different coding of original and translated information Improving work clustering Generated uniform title authority records will overcome most of these differences without needing to edit individual records
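To make the listed inconsistencies concrete, here is a small Python sketch of a title-normalisation key. Note that this is a different, simpler technique than the generated uniform title authority records the slide proposes; it only shows why titles with and without subtitles, or in inverted form, can fail to match as raw strings. The function name and the handling of English articles only are assumptions.

```python
import re
import unicodedata


def clustering_key(title):
    """Return a simplified matching key for a title (illustrative only)."""
    # Drop a subtitle introduced by ":" or ";".
    main = re.split(r"\s*[:;]\s*", title, maxsplit=1)[0].strip()

    # Undo a simple inversion such as "Iliad, The" (English articles only).
    inverted = re.match(r"^(.*),\s*(the|a|an)$", main, flags=re.IGNORECASE)
    if inverted:
        main = f"{inverted.group(2)} {inverted.group(1)}"

    # Remove a leading English article.
    main = re.sub(r"^(the|a|an)\s+", "", main, flags=re.IGNORECASE)

    # Case-fold, strip diacritics, drop punctuation, collapse whitespace.
    decomposed = unicodedata.normalize("NFKD", main.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", "", no_marks).split())


keys = [clustering_key(t) for t in ("The Iliad", "Iliad, The", "Iliad : a new translation")]
```

All three sample titles reduce to the same key, so they would fall into one cluster under this simplified rule. A key like this cannot link a translated title to its original, which is one reason the slide points to uniform title authority records instead.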
Addition of xR records to VIAF Before After
UNESCO Translation Database
xR VIAF Record VIAF ID for Author Translated title Translator
VIAF Linked Data New Information
Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977; IsTranslationOf:
Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: ; IsTranslationOf:
Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation:
Title: Tây du ký bình khảo; Language: Vietnamese; Translator: Phan Quân; Date: 1980; IsTranslationOf:
Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986; IsTranslationOf:
Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983; IsTranslationOf:
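The IsTranslationOf / HasTranslation pairing on this slide can be sketched in Python as follows. The titles, languages, translators and dates come from the slide; the identifiers ("w0", "t1", and so on) and the dictionary layout are hypothetical, since the slide does not show how VIAF identifies these records. The sketch derives HasTranslation by inverting the IsTranslationOf links, so only one direction needs to be stored.

```python
# The original work as shown on the slide; "w0" is an invented identifier.
original = {"id": "w0", "title": "西遊記", "language": "Chinese",
            "author": "吳承恩", "created": 1592}

# Translations from the slide; IDs are invented, None marks a date the slide leaves blank.
translations = [
    {"id": "t1", "title": "Journey to the West", "language": "English",
     "translator": "Anthony C. Yu", "date": 1977, "is_translation_of": "w0"},
    {"id": "t2", "title": "Journey to the West", "language": "English",
     "translator": "W. J. F. Jenner", "date": None, "is_translation_of": "w0"},
    {"id": "t3", "title": "Tây du ký bình khảo", "language": "Vietnamese",
     "translator": "Phan Quân", "date": 1980, "is_translation_of": "w0"},
    {"id": "t4", "title": "西遊記", "language": "Japanese",
     "translator": "中野美代子", "date": 1986, "is_translation_of": "w0"},
    {"id": "t5", "title": "Monkeys Pilgerfahrt", "language": "German",
     "translator": "Georgette Boner", "date": 1983, "is_translation_of": "w0"},
]


def has_translation_index(translations):
    """Invert IsTranslationOf links into a HasTranslation lookup."""
    index = {}
    for item in translations:
        index.setdefault(item["is_translation_of"], []).append(item["id"])
    return index


links = has_translation_index(translations)
languages = sorted({item["language"] for item in translations})
```

Two English translations share a title here, so the translator is what distinguishes them; that is why the record carries it alongside the title.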
Markup for the Semantic Web
# Original Work (in Chinese)
a schema:CreativeWork;
    schema:creator ; # "Gao, Xingjian"
    schema:inLanguage "zh";
    schema:name " 靈山
# Translated Work (in English)
a schema:CreativeWork;
    schema:creator ; # "Gao, Xingjian"
    [new]:translator ; # "Lee, Mabel"
    schema:inLanguage "en";
    schema:name "Soul ;
    [new]:translationOfWork "
Understanding information sharing across cultures What percentage of non-English works are translations of English works, and vice-versa? Which authors are translated the most? Which works have been translated into the most languages? Which countries translate the most English works, the most non-English works? Which countries translate a new work the fastest? Etc.
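Once translation links exist as data, questions like these reduce to simple aggregations. The Python sketch below assumes a hypothetical table of (author, work, original language, translation language) rows; the rows are placeholders, not WorldCat figures, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (author, work, original language, translation language).
rows = [
    ("Author A", "Work 1", "rus", "eng"),
    ("Author A", "Work 1", "rus", "fre"),
    ("Author A", "Work 2", "rus", "eng"),
    ("Author B", "Work 3", "eng", "ger"),
]


def languages_per_key(rows, key_index):
    """Count distinct translation languages per author (0) or work (1)."""
    targets = defaultdict(set)
    for row in rows:
        targets[row[key_index]].add(row[3])
    # Most languages first; ties broken alphabetically by key.
    return sorted(((key, len(langs)) for key, langs in targets.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def share_translated_from(rows, source_lang):
    """Fraction of translation rows whose original language is source_lang."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[2] == source_lang) / len(rows)


by_author = languages_per_key(rows, 0)
by_work = languages_per_key(rows, 1)
english_share = share_translated_from(rows, "eng")
```

The country-level questions would need place-of-publication and date fields added to each row, which this sketch leaves out.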
Where are we now? Clustering: – Work clusters done; ongoing refinement – GLIMIR clustering done for all [simple] text; 103 million records have GLIMIR IDs – Working on collected works Displays: – Working on VIAF expression displays – Work level displays in WorldCat.org ++ Data mining for translations
Explore. Share. Magnify. Janifer Gatenby EMEA Program Manager Metadata