Published by Katrina Olivia Hall. Modified over 8 years ago.
1
Multilingualism in WorldCat and VIAF
Janifer Gatenby
IFLA - Lyon, France, 19 August 2014
Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz
2
WorldCat Today
– Resources in nearly all languages
– Contributed by more than 20,000 libraries worldwide
– More than half the database is for works not in English
3
WorldCat Today
Bibliographic Records
– Hybrid records
– Parallel records
Clustered at Work level (FRBR)
4
Existing Architecture (diagram labels: Authors; Subj; Classif; Holdings; Bibliographic record; Manifestation cluster; Content cluster; Work cluster)
5
Complementary Initiatives
– Work Level Record
– GLIMIR Manifestation & Content Clusters
– Multi-lingual Bibliographic Structure
6
Work Level Record. Objective: Create a consolidated metadata summary for the content of a work
7
Work Level Record http://www.oclc.org/research/activities/workrecs.html Coming Q1 2015
8
GLIMIR. Objective: Create better work presentations
9
GLIMIR
The Content Cluster (users like)
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content
The Manifestation Cluster is important (cataloguers & scholars like)
– Consolidates holdings at manifestation level
– In the short term allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
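The manifestation-cluster behaviour listed above (consolidated holdings, and display of the record catalogued in the interface language) can be sketched roughly as follows. This is only an illustration, not OCLC's implementation: the `BibRecord` fields, the cluster identifiers and the fallback to the first record are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BibRecord:
    record_id: str
    manifestation_id: str      # hypothetical cluster identifier
    cataloguing_language: str  # language of cataloguing, e.g. "eng"
    holdings: int              # number of holdings attached to this record


def consolidate(records, interface_language):
    """Group records by manifestation cluster, sum their holdings and
    pick one record per cluster for display."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec.manifestation_id].append(rec)

    summary = {}
    for cluster_id, members in clusters.items():
        # Prefer the record catalogued in the interface language;
        # fall back to the first record in the cluster (an assumption).
        preferred = next(
            (r for r in members if r.cataloguing_language == interface_language),
            members[0],
        )
        summary[cluster_id] = {
            "display_record": preferred.record_id,
            "holdings": sum(r.holdings for r in members),
            "record_count": len(members),
        }
    return summary


# Hypothetical sample data: three parallel records for one manifestation.
records = [
    BibRecord("r1", "m1", "eng", 40),
    BibRecord("r2", "m1", "fre", 25),
    BibRecord("r3", "m1", "ger", 10),
    BibRecord("r4", "m2", "eng", 5),
]
result = consolidate(records, "fre")
```

Counting clusters rather than records (`len(result)`) is what gives the more accurate manifestation count mentioned on the slide.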
10
Manifestation Clustering: So far 103 million records processed (about 30%)
11
Manifestation Cluster Opened
12
SRU Search: Loti, Pêcheur d'Islande (Work ID 21536567) Records Holdings Work 18148 Content 14143 Manifestation 7115
13
Multilingual Bibliographic Structure Project. Objective: Improve displays; surface translations
14
Multilingual Bibliographic Structure Project
Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data
Corrects many of the 28 million records coded “und”
Better control and linking of translations
Input to refinement of work clusters
Smarter data storage
15
“Most appropriate” questioned
WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond.
The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why
16
Which record is better to present to a German speaker?
17
Incomplete Swedish Record
18
Hybrid record
19
Most appropriate display: Build the display from all available data
20
Multilingual Bibliographic Structure Project
Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through the short to fuller versions of the metadata.
The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records.
21
Proposed new architecture (diagram labels: Work, Authors, Subj and Classif, each tagged by language: eng, fre, ger, jpn; Translations (Language of work); Manif eng and Manif fre, with Notes, Contents and Holding)
22
Important principles
Language tagging of elements, particularly
– Summaries (MARC 21 field 520)
– Subject headings
Display in script preferred by the user if data is available
Improve translated interfaces
Show consolidated holdings as appropriate
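Once elements such as summaries carry a language tag, choosing what to display becomes a simple preference lookup. The sketch below is a minimal illustration under stated assumptions: the function name, the dictionary layout and the fallback rule (first available language in sorted order) are hypothetical and not taken from the project.

```python
def pick_by_language(tagged_values, preferences):
    """Return (language, text) for the first preferred language that has data.

    tagged_values: dict mapping a language code to the text in that language.
    preferences:   language codes in the user's order of preference.
    """
    for lang in preferences:
        if lang in tagged_values:
            return lang, tagged_values[lang]
    # Assumed fallback: show something rather than nothing.
    if tagged_values:
        lang = sorted(tagged_values)[0]
        return lang, tagged_values[lang]
    return None


# Hypothetical language-tagged summaries for one work.
summaries = {
    "eng": "Summary in English",
    "fre": "Résumé en français",
}

choice = pick_by_language(summaries, ["ger", "fre", "eng"])
```

Here a German-speaking user with French as second preference is shown the French summary, because no German summary is available.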
27
Translations: Surfacing the “cream”
28
Great works are translated
The cream of the world’s cultural and knowledge heritage is shared by being translated.
WorldCat contains many rich cataloguing records for these translations.
GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data
29
Ιλιάδα: The Iliad
紅樓夢: Dream of the Red Chamber
Война и миръ: War and Peace
ঘরে বাইরে: The Home and the World
સત્યના પ્રયોગો અથવા આત્મકથા: The Story of My Experiments with Truth [Gandhi autobiography]
源氏物語: The Tale of Genji
דער בעל-תשובה: The Penitent
زقاق المدق: Midaq Alley
30
Translations
Leo Tolstoy: 32 languages
Homer: 28 languages
Rabindranath Tagore: 21 languages
Isaac Bashevis Singer: 17 languages
Najīb Maḥfūẓ: 12 languages
Cao Xueqin: 9 languages
Mahatma Gandhi: 7 languages
Murasaki Shikibu: 7 languages
31
Improving work clustering
Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information
Generated uniform title authority records will overcome most of these differences without needing to edit individual records
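To see why the inconsistencies listed above split clusters, it helps to look at a simple title-matching key. The sketch below is not the uniform-title approach described on the slide; it is a hypothetical normalisation, with an assumed list of leading articles, showing how subtitles, inverted titles and diacritics can be neutralised so that variant forms compare equal.

```python
import unicodedata

# Assumed, deliberately short list of leading articles.
LEADING_ARTICLES = {"the", "a", "an", "le", "la", "les", "der", "die", "das"}


def clustering_key(title):
    """Reduce a title to a rough matching key."""
    # Drop the subtitle: keep only the text before the first colon.
    main = title.split(":", 1)[0]

    # Undo an inverted title such as "Iliad, The".
    if "," in main:
        head, tail = main.rsplit(",", 1)
        if tail.strip().lower() in LEADING_ARTICLES:
            main = f"{tail.strip()} {head.strip()}"

    # Drop a leading article.
    words = main.split()
    if words and words[0].lower() in LEADING_ARTICLES:
        words = words[1:]

    # Case-fold, then strip diacritics and punctuation.
    decomposed = unicodedata.normalize("NFKD", " ".join(words).casefold())
    kept = "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


key_a = clustering_key("War and Peace : a novel")
key_b = clustering_key("Iliad, The")
key_c = clustering_key("Pêcheur d’Islande")
```

A key like this only addresses titles in one language and script; linking a translation to its original still needs the uniform title or an explicit translation link, which is the point the slide makes.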
32
Addition of xR records to VIAF (screenshots: before and after)
33
UNESCO Translation Database
35
XR VIAF Record (screenshot callouts: VIAF ID for author; translated title; translator)
39
VIAF Linked Data: New Information
40
Original work
Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation: each of the translations below
Translations (each IsTranslationOf: 西遊記, the Chinese original)
– Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977
– Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: 1982-1984
– Title: Tây du ký bình khảo; Language: Vietnamese; Translator: Phan Quân; Date: 1980
– Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986
– Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983
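The two link directions on this slide are inverses of each other, so only one needs to be recorded. The following sketch derives the HasTranslation side from IsTranslationOf links, using the slide's own data. The identifier `work:xiyouji` and the dictionary field names are hypothetical stand-ins, not VIAF identifiers or property names.

```python
from collections import defaultdict

# Hypothetical identifier for the Chinese original, 西遊記.
ORIGINAL = "work:xiyouji"

# Translations from the slide, each pointing at the original.
translations = [
    {"title": "Journey to the West", "language": "English",
     "translator": "Anthony C. Yu", "is_translation_of": ORIGINAL},
    {"title": "Journey to the West", "language": "English",
     "translator": "W. J. F. Jenner", "is_translation_of": ORIGINAL},
    {"title": "Tây du ký bình khảo", "language": "Vietnamese",
     "translator": "Phan Quân", "is_translation_of": ORIGINAL},
    {"title": "西遊記", "language": "Japanese",
     "translator": "中野美代子", "is_translation_of": ORIGINAL},
    {"title": "Monkeys Pilgerfahrt", "language": "German",
     "translator": "Georgette Boner", "is_translation_of": ORIGINAL},
]


def has_translation_index(items):
    """Invert the is_translation_of links: original -> list of translations."""
    index = defaultdict(list)
    for item in items:
        index[item["is_translation_of"]].append(item)
    return dict(index)


index = has_translation_index(translations)
translators = [t["translator"] for t in index[ORIGINAL]]
```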
41
Markup for the Semantic Web

# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .

# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .
42
Understanding information sharing across cultures
– What percentage of non-English works are translations of English works, and vice-versa?
– Which authors are translated the most?
– Which works have been translated into the most languages?
– Which countries translate the most English works, the most non-English works?
– Which countries translate a new work the fastest?
– Etc.
http://www.oclc.org/research/activities/multilingual-bib-structure.html
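With translation links in place, questions like "which works have been translated into the most languages?" reduce to counting. The sketch below is only an illustration of that idea: the (original, language) pairs are taken from the two examples earlier in this deck, and the flat list layout is an assumption rather than the actual WorldCat or VIAF data model.

```python
from collections import defaultdict

# (original work, language of translation) pairs from the earlier slides.
translation_links = [
    ("西遊記", "English"),     # Anthony C. Yu
    ("西遊記", "English"),     # W. J. F. Jenner
    ("西遊記", "Vietnamese"),
    ("西遊記", "Japanese"),
    ("西遊記", "German"),
    ("靈山", "English"),       # Soul Mountain
]


def languages_per_work(links):
    """Rank works by the number of distinct languages they are translated into."""
    langs = defaultdict(set)
    for work, language in links:
        langs[work].add(language)
    return sorted(
        ((work, len(found)) for work, found in langs.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


ranking = languages_per_work(translation_links)
```

Using a set per work means that two English translations of the same work count as one language, which matches the per-language counts shown on the earlier "Translations" slide.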
43
Where are we now?
Clustering
– Work clusters done; ongoing refinement
– GLIMIR clustering done for all [simple] text; 103 million records have GLIMIR IDs
– Working on collected works
Displays
– Working on VIAF expression displays
– Work level displays in WorldCat.org ++
Data mining for translations
44
Explore. Share. Magnify.
Janifer Gatenby, EMEA Program Manager, Metadata
Janifer.gatenby@oclc.org
oclc.org