IFLA - Lyon, France 19 August 2014 Janifer Gatenby Multilingualism in WorldCat and VIAF Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress,


Presentation transcript:

IFLA - Lyon, France, 19 August 2014
Janifer Gatenby
Multilingualism in WorldCat and VIAF
Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz

WorldCat Today
– Resources in nearly all languages
– Contributed by more than 20,000 libraries worldwide
– More than half the database is for works not in English

WorldCat Today
Bibliographic Records
– Hybrid records
– Parallel records
Clustered at Work level (FRBR)

Existing Architecture (diagram; labels: Authors, Subj, Classif, Holdings, Bibliographic record, Manifestation cluster, Content cluster, Work cluster)

Complementary Initiatives
– Work Level Record
– GLIMIR Manifestation & Content Clusters
– Multi-lingual Bibliographic Structure

Objective: Work Level Record Create a consolidated metadata summary for the content of a work

Work Level Record Coming Q1 2015

GLIMIR: Objective Create better work presentations

GLIMIR
The Content Cluster
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content
The Manifestation Cluster is important
– Consolidates holdings at manifestation level
– In the short term, allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
Callouts: "Users like"; "Cataloguers & scholars like"
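The two manifestation-cluster behaviours listed above (consolidated holdings, and choosing the record catalogued in the interface language) can be illustrated with a minimal sketch. The `BibRecord` type, its field names, the cluster identifiers and the fallback rule (most holdings wins) are all assumptions made for illustration; they are not GLIMIR's actual data model or selection logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BibRecord:
    record_id: str
    manifestation_cluster: str  # hypothetical cluster identifier
    cataloguing_language: str   # language of cataloguing, e.g. "eng", "fre"
    holdings: int


def consolidate(records, interface_language):
    """Group records by manifestation cluster, sum holdings, pick a display record."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec.manifestation_cluster].append(rec)

    summary = {}
    for cluster_id, members in clusters.items():
        # Prefer records catalogued in the interface language; if there are
        # none, fall back to all members (assumed rule: most holdings wins).
        preferred = [r for r in members if r.cataloguing_language == interface_language]
        display = max(preferred or members, key=lambda r: r.holdings)
        summary[cluster_id] = {
            "display_record": display.record_id,
            "total_holdings": sum(r.holdings for r in members),
            "record_count": len(members),
        }
    return summary


# Illustrative records only, not WorldCat data.
records = [
    BibRecord("r1", "m1", "eng", 60),
    BibRecord("r2", "m1", "fre", 40),
    BibRecord("r3", "m1", "ger", 15),
    BibRecord("r4", "m2", "eng", 5),
]
result = consolidate(records, interface_language="fre")
print(result["m1"])
```

With a French interface the French-language record is shown for cluster `m1`, while the holdings count covers all three records in the cluster.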

Manifestation Clustering So far 103 million records processed (about 30%)

Manifestation Cluster Opened

SRU Search: Loti, Pêcheur d’Islande (Work ID )

Cluster level    Records   Holdings
Work             18        148
Content          14        143
Manifestation    7         115

Multilingual Bibliographic Structure Project
Objective: Improve displays; surface translations

Multilingual Bibliographic Structure Project
Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data
Corrects many of the 28 million records coded “und” (language undetermined)
Better control and linking of translations
Input to refinement of work clusters
Smarter data storage

“Most appropriate” questioned
WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond. The end result will not be very satisfactory from a multi-lingual viewpoint; here’s why.

Which record is better to present to a German speaker?

Incomplete Swedish Record

Hybrid record

Most appropriate display: build the display from all available data

Multilingual Bibliographic Structure Project
Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through the short to fuller versions of the metadata.
The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records.

Proposed new architecture (diagram; labels: Work, Authors, Subj, Classif, Notes, Contents, Translations (language of work), Manif eng, Manif fre, Holding; language variants eng, fre, ger, jpn)

Important principles
Language tagging of elements, particularly
– Summaries (MARC 21 field 520)
– Subject headings
Display in script preferred by the user if data is available
Improve translated interfaces
Show consolidated holdings as appropriate
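To make the language-tagging principle concrete, here is a minimal sketch of how a display layer might choose among language-tagged summaries or subject headings. The `TaggedText` type, the two-letter language tags, the sample texts and the fallback rule are assumptions for illustration only; the slides do not specify how WorldCat stores or selects tagged elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    text: str
    language: str  # language tag such as "en", "fr", "ja" (assumed convention)
    field: str     # e.g. "summary" (MARC 21 520) or "subject"


def pick_for_user(elements, field, preferred_languages):
    """Return the element of `field` that best matches the user's ordered preferences."""
    candidates = [e for e in elements if e.field == field]
    for lang in preferred_languages:
        for element in candidates:
            if element.language == lang:
                return element
    # Nothing in a preferred language: show any available value rather than nothing.
    return candidates[0] if candidates else None


# Illustrative elements mined from several records of one work.
elements = [
    TaggedText("A novel about Breton fishermen.", "en", "summary"),
    TaggedText("Roman sur les pêcheurs bretons.", "fr", "summary"),
    TaggedText("Fishers", "en", "subject"),
]

chosen = pick_for_user(elements, "summary", ["de", "fr"])
print(chosen.text)
```

A user who prefers German, then French, receives the French summary because no German one is available; a field with no data at all yields `None`.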

Surfacing the “cream” Translations

Great works are translated
– The cream of the world’s cultural and knowledge heritage is shared by being translated
– WorldCat contains many rich cataloguing records for these translations
GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data

Ιλιάδα The Iliad
紅樓夢 Dream of the Red Chamber
Война и миръ War and Peace
ঘরে বাইরে The Home and the World
સત્યના પ્રયોગો અથવા આત્મકથા The Story of My Experiments with Truth [Gandhi autobiography]
The Tale of Genji
דער בעל-תשובה The Penitent
زقاق المدق Midaq Alley

Translations
Leo Tolstoy: 32 languages
Homer: 28 languages
Rabindranath Tagore: 21 languages
Isaac Bashevis Singer: 17 languages
Najīb Maḥfūẓ: 12 languages
Cao Xueqin: 9 languages
Mahatma Gandhi: 7 languages
Murasaki Shikibu: 7 languages

Improving work clustering
Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information
Generated uniform title authority records will overcome most of these differences without needing to edit individual records
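As an illustration of why these title inconsistencies split clusters, the sketch below reduces title variants to a shared match key. It is not OCLC's clustering algorithm, and it is not the approach the slide describes (generated uniform title authority records); the `match_key` function, its article list and its normalisation steps are assumptions, tuned for Latin-script titles only.

```python
import re
import unicodedata

# Small illustrative list of leading articles; a real list would be per-language.
ARTICLES = ("the", "a", "an", "le", "la", "les", "der", "die", "das")
_ARTICLE_PATTERN = "|".join(ARTICLES)


def match_key(title: str) -> str:
    """Reduce a title to a key that ignores subtitle, inversion, case and accents."""
    # Decompose accented letters and drop the combining marks, then fold case.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).casefold()
    # Drop the subtitle: everything after the first colon.
    text = text.split(":", 1)[0]
    # Replace punctuation with spaces, keeping commas for the inversion check.
    text = re.sub(r"[^\w\s,]", " ", text).strip()
    # Inverted title ("Iliad, The"): remove the trailing article.
    text = re.sub(rf",\s*(?:{_ARTICLE_PATTERN})$", "", text)
    # Direct form ("The Iliad"): remove the leading article.
    text = re.sub(rf"^(?:{_ARTICLE_PATTERN})\s+", "", text)
    # Remaining commas become spaces; collapse runs of whitespace.
    return " ".join(text.replace(",", " ").split())


variants = ["The Iliad", "Iliad, The", "War and peace : a novel", "War and Peace"]
for variant in variants:
    print(repr(variant), "->", repr(match_key(variant)))
```

Records whose titles share a key become candidates for the same work cluster; further evidence such as author and language would be needed before merging them.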

Addition of xR records to VIAF Before After

UNESCO Translation Database

xR VIAF Record: VIAF ID for Author; Translated title; Translator

VIAF Linked Data: New Information

Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977; IsTranslationOf:
Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: ; IsTranslationOf:
Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation:
Title: Tây du ký bình khảo; Language: Vietnamese; Translator: Phan Quân; Date: 1980; IsTranslationOf:
Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986; IsTranslationOf:
Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983; IsTranslationOf:
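The IsTranslationOf / HasTranslation relationships on this slide can be modelled as pointers from each translation to the original, with the reverse links derived rather than stored. The sketch below uses the titles, translators and dates from the slide; the `Record` type and the keys (`w1`, `x1`, ...) are hypothetical stand-ins for VIAF identifiers, which are not given in the transcript.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str                                 # hypothetical identifier
    title: str
    language: str
    author: Optional[str] = None
    translator: Optional[str] = None
    date: Optional[str] = None               # None where the slide gives no date
    is_translation_of: Optional[str] = None  # key of the original work


def link_translations(records):
    """Derive HasTranslation lists (original key -> translation keys)."""
    has_translation = defaultdict(list)
    for rec in records:
        if rec.is_translation_of is not None:
            has_translation[rec.is_translation_of].append(rec.key)
    return dict(has_translation)


records = [
    Record("w1", "西遊記", "Chinese", author="吳承恩", date="1592"),
    Record("x1", "Journey to the West", "English", translator="Anthony C. Yu",
           date="1977", is_translation_of="w1"),
    Record("x2", "Journey to the West", "English", translator="W. J. F. Jenner",
           is_translation_of="w1"),
    Record("x3", "Tây du ký bình khảo", "Vietnamese", translator="Phan Quân",
           date="1980", is_translation_of="w1"),
    Record("x4", "西遊記", "Japanese", translator="中野美代子",
           date="1986", is_translation_of="w1"),
    Record("x5", "Monkeys Pilgerfahrt", "German", translator="Georgette Boner",
           date="1983", is_translation_of="w1"),
]

links = link_translations(records)
by_key = {rec.key: rec for rec in records}
target_languages = sorted({by_key[k].language for k in links["w1"]})
print(links["w1"], target_languages)
```

Storing only the translation-to-original pointer keeps the data in one place; the original's list of translations is recomputed whenever records are added.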

Markup for the Semantic Web

# Original Work (in Chinese)
a schema:CreativeWork;
schema:creator ; # "Gao, Xingjian"
schema:inLanguage "zh";
schema:name "靈山"

# Translated Work (in English)
a schema:CreativeWork;
schema:creator ; # "Gao, Xingjian"
[new]:translator ; # "Lee, Mabel"
schema:inLanguage "en";
schema:name "Soul ;
[new]:translationOfWork
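The resource identifiers in the markup above did not survive in the transcript, so the following sketch shows how a complete pair of descriptions of this shape could be serialised as Turtle using plain string formatting. Every URI under `example.org` is a placeholder, the `new:` prefix stands in for the slide's "[new]" properties, and the English title "Soul Mountain" is filled in as an assumption because the slide text is truncated after "Soul".

```python
PREFIXES = {
    "schema": "http://schema.org/",
    "new": "http://example.org/new#",  # hypothetical namespace for the "[new]" terms
}


def literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value: str) -> str:
    return f"<{value}>"


def prefix_block() -> str:
    return "".join(f"@prefix {name}: <{iri}> .\n" for name, iri in PREFIXES.items())


def describe(subject: str, statements) -> str:
    """Serialise one subject with a list of (predicate, object) pairs."""
    body = " ;\n".join(f"    {predicate} {obj}" for predicate, obj in statements)
    return f"{uri(subject)}\n{body} .\n"


# Placeholder identifiers; the real URIs are not recoverable from the transcript.
AUTHOR = "http://example.org/person/gao-xingjian"
TRANSLATOR = "http://example.org/person/mabel-lee"
ORIGINAL = "http://example.org/work/lingshan"
TRANSLATION = "http://example.org/work/soul-mountain"

original = describe(ORIGINAL, [
    ("a", "schema:CreativeWork"),
    ("schema:creator", uri(AUTHOR)),
    ("schema:inLanguage", literal("zh")),
    ("schema:name", literal("靈山")),
])
translation = describe(TRANSLATION, [
    ("a", "schema:CreativeWork"),
    ("schema:creator", uri(AUTHOR)),
    ("new:translator", uri(TRANSLATOR)),
    ("schema:inLanguage", literal("en")),
    ("schema:name", literal("Soul Mountain")),  # assumed full title
    ("new:translationOfWork", uri(ORIGINAL)),
])

document = prefix_block() + "\n" + original + "\n" + translation
print(document)
```

In practice an RDF library would handle escaping and serialisation; the hand-rolled version is used here only to keep the sketch free of dependencies.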

Understanding information sharing across cultures
– What percentage of non-English works are translations of English works, and vice-versa?
– Which authors are translated the most?
– Which works have been translated into the most languages?
– Which countries translate the most English works, the most non-English works?
– Which countries translate a new work the fastest?
– Etc.
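Once translations are linked to their originals, questions such as "which authors are translated the most" reduce to counting distinct target languages. The sketch below shows one way to do that; the row layout and the sample rows are invented for illustration and are not WorldCat figures.

```python
from collections import defaultdict


def rank_by_languages(rows, key):
    """Rank values of `key` ("author" or "work") by number of distinct translation languages."""
    languages = defaultdict(set)
    for row in rows:
        languages[row[key]].add(row["language"])
    # Most languages first; ties broken alphabetically for a stable result.
    return sorted(
        ((name, len(found)) for name, found in languages.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# One row per translation record (illustrative data only).
rows = [
    {"author": "Leo Tolstoy", "work": "War and Peace", "language": "eng"},
    {"author": "Leo Tolstoy", "work": "War and Peace", "language": "eng"},  # second English translation
    {"author": "Leo Tolstoy", "work": "War and Peace", "language": "fre"},
    {"author": "Leo Tolstoy", "work": "War and Peace", "language": "ger"},
    {"author": "Leo Tolstoy", "work": "Anna Karenina", "language": "eng"},
    {"author": "Leo Tolstoy", "work": "Anna Karenina", "language": "jpn"},
    {"author": "Homer", "work": "The Iliad", "language": "eng"},
    {"author": "Homer", "work": "The Iliad", "language": "fre"},
]

author_ranking = rank_by_languages(rows, "author")
work_ranking = rank_by_languages(rows, "work")
print(author_ranking)
print(work_ranking)
```

Using a set per author or work means several translations into the same language count once, which matches the "number of languages" figures quoted earlier in the talk.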

Where are we now?
Clustering
– Work clusters done; ongoing refinement
– GLIMIR clustering done for all [simple] text; 103 million records have GLIMIR IDs
– Working on collected works
Displays
– Working on VIAF expression displays
– Work level displays in WorldCat.org
Data Mining for translations

Explore. Share. Magnify. Janifer Gatenby EMEA Program Manager Metadata