IFLA - Lyon, France 19 August 2014 Janifer Gatenby Multilingualism in WorldCat and VIAF Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress,


1 IFLA - Lyon, France 19 August 2014 Janifer Gatenby Multilingualism in WorldCat and VIAF Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz

2 WorldCat Today
– Resources in nearly all languages
– Contributed by more than 20,000 libraries worldwide
– More than half the database is for works not in English

3 WorldCat Today
Bibliographic Records
– Hybrid records
– Parallel records
Clustered at Work level (FRBR)

4 Existing Architecture [diagram labels: Authors; Subj; Classif; Holdings; Bibliographic record; Work cluster; Content cluster; Manifestation cluster]

5 Complementary Initiatives
– Work Level Record
– GLIMIR Manifestation & Content Clusters
– Multi-lingual Bibliographic Structure

6 Objective: Work Level Record Create a consolidated metadata summary for the content of a work

7 Work Level Record http://www.oclc.org/research/activities/workrecs.html Coming Q1 2015

8 GLIMIR: Objective Create better work presentations

9 GLIMIR
The Content Cluster
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content
The Manifestation Cluster is important
– Consolidates holdings at manifestation level
– In the short term allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
Users like [symbol]; Cataloguers & scholars like [symbol]
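A minimal Python sketch of two of the manifestation-cluster behaviours listed on this slide: consolidating holdings and choosing the record catalogued in the interface language. The field names (`cluster_id`, `cat_lang`, `holdings`) and all sample values are hypothetical and are not GLIMIR's actual data model.

```python
from collections import defaultdict

# Hypothetical records: each carries a manifestation cluster ID, the
# language of cataloguing, and a holdings count (illustrative values only).
records = [
    {"id": "r1", "cluster_id": "m1", "cat_lang": "eng", "holdings": 40},
    {"id": "r2", "cluster_id": "m1", "cat_lang": "fre", "holdings": 12},
    {"id": "r3", "cluster_id": "m2", "cat_lang": "ger", "holdings": 5},
]


def consolidate(records, interface_lang):
    """Group records by cluster, sum holdings, and pick a display record."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["cluster_id"]].append(rec)

    summary = {}
    for cluster_id, members in clusters.items():
        # Prefer a record catalogued in the interface language; otherwise
        # fall back to the first record in the cluster.
        display = next(
            (r for r in members if r["cat_lang"] == interface_lang),
            members[0],
        )
        summary[cluster_id] = {
            "holdings": sum(r["holdings"] for r in members),
            "display_record": display["id"],
            "record_count": len(members),
        }
    return summary


result = consolidate(records, interface_lang="fre")
print(result)
```

Counting clusters rather than records (`len(result)` versus `len(records)`) is what gives the more accurate manifestation count the slide mentions.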

10 Manifestation Clustering So far 103 million records processed (about 30%)

11 Manifestation Cluster Opened

12 SRU Search: Loti Pêcheur d’islande (Work ID 21536567) RecordsHoldings Work18148 Content14143 Manifestation7115

13 Multilingual Bibliographic Structure Project
Objective: Improve displays; surface translations

14 Multilingual Bibliographic Structure Project
Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data
Corrects many of the 28 million records coded “und”
Better control and linking of translations
Input to refinement of work clusters
Smarter data storage

15 “Most appropriate” questioned
WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond.
The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why

16 Which record is better to present to a German speaker?

17 Incomplete Swedish Record

18 Hybrid record

19 Most appropriate display
Build the display from all available data

20 Multilingual Bibliographic Structure Project
Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through from the short to the fuller versions of the metadata.
The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records.

21 Proposed new architecture [diagram labels: Work; Manif eng; Manif fre; Notes; Contents; Holding; Subj; Classif; Authors; Translations (Language of work); language tags eng, fre, ger, jpn repeated on several elements]

22 Important principles
Language tagging of elements, particularly
– Summaries (MARC 21 field 520)
– Subject headings
Display in script preferred by the user if data is available
Improve translated interfaces
Show consolidated holdings as appropriate
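The language-tagging principle can be sketched in Python as follows: once each element carries a language tag, the display can pick the value in the user's preferred language and fall back when none exists. The function name `pick_value`, the dictionary layout and the sample summary strings are assumptions made for illustration, not part of the project's design.

```python
# Hypothetical work-level element: one value per language tag, mined from
# all records in the cluster (sample strings are illustrative only).
summaries = {
    "eng": "A novel about Breton fishermen.",
    "fre": "Un roman sur les pêcheurs bretons.",
}


def pick_value(tagged_values, preferred_langs):
    """Return (lang, value) for the first preferred language that has data."""
    for lang in preferred_langs:
        if lang in tagged_values:
            return lang, tagged_values[lang]
    # No preferred language available: fall back to any value, in a
    # deterministic order, rather than showing nothing.
    if tagged_values:
        lang = sorted(tagged_values)[0]
        return lang, tagged_values[lang]
    return None, None


german_user = pick_value(summaries, ["ger", "eng"])
french_user = pick_value(summaries, ["fre"])
print(german_user)
print(french_user)
```

The same lookup would apply to script preference if the tags carried a script component as well as a language.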

23

24

25

26

27 Surfacing the “cream” Translations

28 Great works are translated
– The cream of the world’s cultural and knowledge heritage is shared by being translated
– WorldCat contains many rich cataloguing records for these translations
GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data

29
– Ιλιάδα: The Iliad
– 紅樓夢: Dream of the Red Chamber
– Война и миръ: War and Peace
– ঘরে বাইরে: The Home and the World
– સત્યના પ્રયોગો અથવા આત્મકથા: The Story of My Experiments with Truth [Gandhi autobiography]
– The Tale of Genji
– דער בעל-תשובה: The Penitent
– زقاق المدق: Midaq Alley

30 Translations
– Leo Tolstoy: 32 languages
– Homer: 28 languages
– Rabindranath Tagore: 21 languages
– Isaac Bashevis Singer: 17 languages
– Najīb Maḥfūẓ: 12 languages
– Cao Xueqin: 9 languages
– Mahatma Gandhi: 7 languages
– Murasaki Shikibu: 7 languages

31 Improving work clustering
Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information
Generated uniform title authority records will overcome most of these differences without needing to edit individual records
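To make the listed inconsistencies concrete, here is a small Python sketch of a title match key that ignores subtitles, inverted articles, case and punctuation. This is plain string normalisation, not the generated uniform title authority records the slide proposes; the function name `title_key` and its rules are assumptions chosen only to illustrate the kind of variation involved.

```python
import unicodedata


def title_key(title):
    """Build a crude match key: drop the subtitle, undo inversion, normalise."""
    # Drop a subtitle introduced by " : " (ISBD-style punctuation).
    main = title.split(" : ")[0]
    # Undo an inverted title such as "Iliad, The".
    if ", " in main:
        head, tail = main.rsplit(", ", 1)
        if tail.lower() in {"the", "a", "an"}:
            main = f"{tail} {head}"
    # Case-fold and strip punctuation so trivial differences do not matter.
    main = unicodedata.normalize("NFKC", main).casefold()
    kept = "".join(ch for ch in main if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


variants = ["War and Peace : a novel", "WAR AND PEACE.", "War and peace"]
keys = {title_key(v) for v in variants}
print(keys)
```

A key like this only handles English articles and one subtitle convention, which is one reason an authority-record approach scales better across languages.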

32 Addition of xR records to VIAF Before After

33 UNESCO Translation Database

34

35 XR VIAF Record [callouts: VIAF ID for Author; Translated title; Translator]

36

37

38

39 VIAF Linked Data New Information

40 Original work
Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation: (each translation below)
Translations (each IsTranslationOf: the original work)
– Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977
– Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: 1982-1984
– Title: Tây du ký bình khảo; Language: Vietnamese; Translator: Phan Quân; Date: 1980
– Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986
– Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983

41 Markup for the Semantic Web
# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .
# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .
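The markup on this slide can be read as a small set of subject / predicate / object triples. The Python sketch below holds a subset of them in a plain list and follows the translation link in the inverse direction, from the original work to its translations. The `new:` prefix stands in for the slide's `[new]:` placeholder and is not a published vocabulary term; the helper names `value` and `translations_of` are hypothetical.

```python
# Triples taken from the slide. "new:translationOfWork" stands in for the
# slide's "[new]:" placeholder and is not a published vocabulary term.
ORIGINAL = "http://worldcat.org/entity/work/id/1215997"
TRANSLATION = "http://worldcat.org/entity/work/id/145209748"

triples = [
    (ORIGINAL, "schema:inLanguage", "zh"),
    (ORIGINAL, "schema:name", "靈山"),
    (TRANSLATION, "schema:inLanguage", "en"),
    (TRANSLATION, "schema:name", "Soul Mountain"),
    (TRANSLATION, "new:translationOfWork", ORIGINAL),
]


def value(subject, predicate):
    """Return the first object for a subject/predicate pair, or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def translations_of(work):
    """Follow translationOfWork links backwards to find translations."""
    return [s for s, p, o in triples
            if p == "new:translationOfWork" and o == work]


names = [value(t, "schema:name") for t in translations_of(ORIGINAL)]
print(names)
```

Storing the link once, on the translation, and deriving the inverse on demand avoids keeping two statements in sync.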

42 Understanding information sharing across cultures
– What percentage of non-English works are translations of English works, and vice-versa?
– Which authors are translated the most?
– Which works have been translated into the most languages?
– Which countries translate the most English works, the most non-English works?
– Which countries translate a new work the fastest?
– Etc.
http://www.oclc.org/research/activities/multilingual-bib-structure.html
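Questions of this kind reduce to simple counts once translations are linked to their original work. As a rough Python illustration, the sketch below uses only the translation records shown on slide 40 (Journey to the West) and counts translations per language; the dictionary layout is an assumption, and the sample is far too small to say anything about WorldCat as a whole.

```python
from collections import Counter

# Translation records as listed on slide 40 (Journey to the West).
translations = [
    {"title": "Journey to the West", "lang": "English",
     "translator": "Anthony C. Yu"},
    {"title": "Journey to the West", "lang": "English",
     "translator": "W. J. F. Jenner"},
    {"title": "Tây du ký bình khảo", "lang": "Vietnamese",
     "translator": "Phan Quân"},
    {"title": "西遊記", "lang": "Japanese", "translator": "中野美代子"},
    {"title": "Monkeys Pilgerfahrt", "lang": "German",
     "translator": "Georgette Boner"},
]

# Number of translations per target language.
per_language = Counter(t["lang"] for t in translations)

# Number of distinct languages the work has been translated into.
distinct_languages = len(per_language)

# Language with the most translations in this sample.
top_language, top_count = per_language.most_common(1)[0]

print(per_language)
print(distinct_languages, top_language, top_count)
```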

43 Where are we now?
Clustering
– Work clusters done; ongoing refinement
– GLIMIR clustering done for all [simple] text; 103 million records have GLIMIR IDs
– Working on collected works
Displays
– Working on VIAF expression displays
– Work level displays in WorldCat.org
++ Data Mining for translations

44 Explore. Share. Magnify. Janifer Gatenby, EMEA Program Manager Metadata, Janifer.gatenby@oclc.org, oclc.org

