6th Annual Hong Kong Innovative Users Group Meeting
8-9 December 2005, Hong Kong

HKIUG’s Unicode Projects: Untangling the Chaotic Codes

Philip Wong, City University of Hong Kong Library
K.T. Lam, Hong Kong University of Science and Technology Library

6th HKIUG Meeting, Dec , Lingnan University. HKIUG's Unicode Project, Philip Wong and KT Lam

Content
- Chaos in 2003
- Collaborative effort at HKIUG
- HKIUG CJK Code Table
- TSVCC linking
- Towards native Unicode catalog

Chaos in 2003
- Local libraries were using the BIG5 Chinese character encoding system
- INNOPAC was in transition towards Unicode support, with the development of the Millennium software
- Dual Web OPAC interfaces existed: Big5 and UTF-8 (Unicode)
- Some libraries (HKUST and CUHK) began releasing the UTF-8 Web OPAC to their users

Chaos in 2003 [cont.]
INNOPAC’s EACC to Unicode mapping is problematic:
- multiple mappings
- incorrect mappings
- missing codes
- duplicated EACC and CCCII
- mapping to different EACCs in BIG5 and UTF-8 interfaces

Chaos in 2003 [cont.]
- CJK support in the Millennium software was buggy
  - Millennium Editor: involuntarily replaced characters with the preferred EACC
- Communication between individual libraries and the vendor was not fruitful; fixes arrived piecemeal
- Some libraries conducted their own CJK / Unicode studies and proposed to the vendor how to tackle these problems, again without much progress
  - HKUST (April 2003)
  - City University of Hong Kong (July 2003)

Collaboration Effort at HKIUG
- June 2003: HKIUG Standing Committee agreed that a joint proposal was essential for gaining acceptance from the vendor
- July 2003: seminar organized by CUHK to solicit ideas and comments
- July 2003: III-UTF-8 Working Group established; members were catalogers and systems librarians from CITYU, CUHK, HKUST and HKU

Collaboration Effort at HKIUG [cont.]
- Sep 2003: Working Group completed the study and submitted the proposal to the vendor, together with an HKIUG version of the EACC to Unicode Mapping Table
- Oct 2003: vendor accepted the proposal
- Dec 2003: presentation of the work at the 4th Annual HKIUG Meeting
- Jan 2004: HKUST representative was invited to the vendor’s headquarters to help resolve outstanding CJK issues

Collaboration Effort at HKIUG [cont.]
Results of the HKIUG effort, by February 2004:
- Millennium Editor problem fixed
- HKIUG Code Table for CJK Characters adopted
- Began development of TSVCC Linking
25 February 2005: established the HKIUG Unicode Task Force to maintain the Unicode and TSVCC code tables and to assist the vendor on Unicode migration; members from CUHK, CITYU, HKUST and HKU.

Millennium Editor Problem
The EACC to Unicode Mapping Table failed in round-trip crosswalk. Case “li”:
- EACC-based INNOPAC Catalog: 历 (simplified form of 曆)
- Unicode-based Millennium Editor: U+5386 历
- Written back to the catalog: 27462A 历 (simplified form of 歷). Incorrect!

Millennium Editor Problem [cont.]
- Problem: an EACC character in the INNOPAC Catalog would be incorrectly replaced by 27462A when it was saved in Millennium Editor
- Fixed by suppressing Millennium Editor from converting the character (i.e. a non-preferred code in a multi-mapping) to U+5386 when it was retrieved from the catalog for editing
- This is done by using a one-to-one mapping table
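The round-trip failure and the one-to-one fix can be sketched in a few lines of Python. This is only an illustration, not the vendor's implementation: the two EACC codes and U+5386 are taken from the slides (the pairing of 274349 with 曆 is inferred from the later TSVCC slide), while the dictionary and function names are hypothetical.

```python
# Illustrative tables only. The EACC codes and U+5386 come from the slides;
# the names and the choice of "preferred" code are assumptions.
EACC_TO_UNICODE = {
    "274349": "\u5386",  # 历 as simplified form of 曆 (pairing inferred)
    "27462A": "\u5386",  # 历 as simplified form of 歷
}

# A reverse table can hold only one EACC code per Unicode character,
# so one of the two sources has to be chosen as the preferred code.
UNICODE_TO_EACC = {"\u5386": "27462A"}


def round_trip(eacc_code: str) -> str:
    """Convert EACC -> Unicode -> EACC, as an editor does on open and save."""
    return UNICODE_TO_EACC[EACC_TO_UNICODE[eacc_code]]


def unsafe_codes() -> list[str]:
    """Return the EACC codes that do not survive the round trip."""
    return sorted(c for c in EACC_TO_UNICODE if round_trip(c) != c)


def to_editor_text(eacc_code: str) -> str:
    """One-to-one display: unsafe codes stay as braced codes, not characters."""
    if eacc_code in unsafe_codes():
        return "{" + eacc_code + "}"
    return EACC_TO_UNICODE[eacc_code]


if __name__ == "__main__":
    for code in EACC_TO_UNICODE:
        print(code, "->", round_trip(code), "| editor shows", to_editor_text(code))
```

Keeping the non-preferred code as a braced code means the editor never converts it to U+5386, so saving the record cannot silently replace it with 27462A.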

Millennium Editor Problem [cont.]
Side effect: the affected character is displayed as a braced code, not as a character, in the Editor


HKIUG CJK Code Table
First released in September 2003; last revised in August 2005
Contains:
- EACC characters
- 7043 pure CCCII characters
- 160 multi-mapping linked cases
- 49 multi-mapping unlinked cases

HKIUG CJK Code Table [cont.]
- Mapping for EACC characters: follows LC as much as possible
- Does not contain CCCII characters that have an EACC equivalent: sites adopting the HKIUG CJK code table must convert these CCCII characters in their catalog to the EACC equivalents
- Contains 7043 “pure CCCII” characters that have no EACC equivalent: they are included to avoid too many missing characters

HKIUG CJK Code Table [cont.]
Multiple mappings
- Linked case: “ling”
- Unlinked case: “li”
- HKIUG decides on the preferences

HKIUG CJK Code Table [cont.]
Also available in XML format, conforming to LC’s code tables schema
Implementation:
- November 2003: pilot testing at HKUST
- February 2004: CUHK
- July 2004: PolyU
- October 2004: CityU, HKU
- November 2004: LU, HKBU
- March 2005: HKIED
- December 2005: HKAPA (scheduled)

TSVCC Linking
TSVCC stands for “Traditional, Simplified and Variant Chinese Characters”.
Example “guo”:
- 國 (U+570B): traditional form of “country”
- 国 (U+56FD): simplified form of “country”
- 囯 (U+56EF): variant form of “country” (used in Japanese)
Example “xi”:
- 係 (U+4FC2): traditional form of “relationship”
- 繫 (U+7E6B): traditional form of “linking”
- 系 (U+7CFB): traditional form of “system”, simplified form of “relationship”, and simplified form of “linking”
Why TSVCC?

TSVCC Linking [cont.]
In EACC, traditional, simplified and variant characters can be linked by internal codes:
- “gan”: 乾 (21304C) linked to 干 (27304C)
- “feng”: 峰 (213B78) linked to 峯 (2D3B78) and 峄 (393B78)
However, some multi-mapping cases remain unlinked:
- “gan”: 干 (27304C) not linked to 干 (273C67)
- “li”: 历 (274349) not linked to 历 (27462A)
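In the examples on this slide, linked characters share the last four hexadecimal digits of their EACC codes and differ only in the first two. The sketch below groups codes on that basis and reports characters that appear under more than one group, which are the unlinked multi-mapping cases. The grouping rule is inferred from these few examples and the function names are hypothetical, so treat it as an illustration rather than a description of how INNOPAC links characters.

```python
from collections import defaultdict

# EACC codes quoted on the slide. The grouping rule is an inference
# from these examples, not a documented INNOPAC algorithm.
EACC_CHARS = {
    "21304C": "乾", "27304C": "干", "273C67": "干",
    "213B78": "峰", "2D3B78": "峯", "393B78": "峄",
    "274349": "历", "27462A": "历",
}


def link_key(eacc_code: str) -> str:
    """Return the trailing four hex digits shared by linked characters."""
    return eacc_code[2:].upper()


def linked_groups(codes) -> dict:
    """Group EACC codes that share the same link key."""
    groups = defaultdict(list)
    for code in sorted(codes):
        groups[link_key(code)].append(code)
    return dict(groups)


def unlinked_duplicates(table: dict) -> list:
    """Find characters that occur under more than one link key."""
    keys_by_char = defaultdict(set)
    for code, char in table.items():
        keys_by_char[char].add(link_key(code))
    return sorted(ch for ch, keys in keys_by_char.items() if len(keys) > 1)


if __name__ == "__main__":
    for key, codes in linked_groups(EACC_CHARS).items():
        print(key, [f"{c} {EACC_CHARS[c]}" for c in codes])
    print("unlinked multi-mapping:", unlinked_duplicates(EACC_CHARS))
```

On this data, 干 and 历 are reported because each is reachable through two different groups that EACC itself does not connect.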

TSVCC Linking [cont.]
Consider the following multi-mapping case:
- EACC: 历 (simplified form of 曆) and 27462A 历 (simplified form of 歷)
- Unicode: U+5386 历
Searching 历法 (27462A)(21472A) will not retrieve 曆法 (2D4349)(21472A)

TSVCC Linking [cont.]
Native Unicode catalog: all internal linkings will be gone
- 乾 (U+4E7E), 干 (U+5E72)
- 峰 (U+5CF0), 峯 (U+5CEF), 峄 (U+5CC4)
- 历 (U+5386), 曆 (U+66C6), 歷 (U+6B77)
How to maintain the linkings?

TSVCC Linking [cont.]
In October 2004, HKIUG constructed the TSVCC Linking Tables and proposed them to the vendor
- Table M: linking relationship is not purely from EACC
  曆 | 历 | 2D4349 暦 | 21462A 歷 | 27462A 历 | 4B462A 歴 | #U+5386 multi-mapped 27462A,
- Table V: linking relationship is purely from EACC
  21306C 仇 | 2D306C 讎 | 33306C 讐 | 4B306C 雠
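One way to picture how such linking tables are used at search time: every character in a linked group is reduced to a single representative before the query and the indexed text are compared. The sketch below uses the characters from the slides, but the group contents, the choice of the first character as representative, and all function names are assumptions made for illustration. It is not the vendor's indexing code.

```python
# Each list is one linked group. The first entry is used as the
# representative, which is an arbitrary choice for this sketch.
LINK_GROUPS = [
    ["曆", "歷", "历"],        # merged through the multi-mapped U+5386
    ["仇", "讎", "讐", "雠"],  # linked purely through EACC
]


def build_normalizer(groups) -> dict:
    """Map every character in a group to the group's representative."""
    canonical = {}
    for group in groups:
        for ch in group:
            canonical[ch] = group[0]
    return canonical


def normalize(text: str, canonical: dict) -> str:
    """Replace each linked character by its representative."""
    return "".join(canonical.get(ch, ch) for ch in text)


def search(query: str, titles: list, canonical: dict) -> list:
    """Return titles whose normalized form contains the normalized query."""
    key = normalize(query, canonical)
    return [t for t in titles if key in normalize(t, canonical)]


if __name__ == "__main__":
    titles = ["曆法", "歷史", "历法"]
    canon = build_normalizer(LINK_GROUPS)
    print("TSVCC on :", search("历法", titles, canon))
    print("TSVCC off:", search("历法", titles, {}))
```

With linking on, a search for 历法 also retrieves 曆法. Because 曆 and 歷 fall into one group through U+5386, a query for 歷法 would match as well, which is where the extra recall comes from.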

TSVCC Linking [cont.]
Implementation:
- October 2004: created the TSVCC Tables; installed on HKUST’s testing database
- November 2004: endorsed by HKIUG, first release
- November 2004: TSVCC linking capability was enabled at CityU and HKU (using the vendor’s original tables, i.e. not HKIUG’s version)
- Lingnan uninstalled it after a short trial period because of the high recall rate
- August 2005: HKIUG second release
- November 2005: CityU installed the second release

TSVCC Linking [cont.]
- HKALL has also enabled the TSVCC Linking feature, but using hybrid EACC/Unicode tables (using normalized EACC values to maintain default ordering for CJK)
- Drawback: Unicode is a much bigger set than EACC; and again, the legacy EACC mappings need to be maintained
- The vendor should put in programming effort to support a Unicode version of the TSVCC tables.

TSVCC Linking [cont.]
Results of implementation
- Improvement in searching
- Trade-off: higher recall, lower precision

TSVCC Linking [cont.]
Results: improvement in searching
Search 历法 “Li fa”

TSVCC Linking [cont.]
Results: higher recall, lower precision
Search 甦齋 “Suzhai”, comparing TSVCC on with TSVCC off (results marked as relevant or irrelevant)

TSVCC Linking [cont.]
Problems found during testing and implementation
These are not problems of TSVCC itself, but software problems that require enhancements from the vendor

TSVCC Linking [cont.]
Problem 1: incorrect “duplicate headings error” in authority heading verification

Duplicate authority RECORDS
> FIELD: |a何迺欣
INDEXED AS AUTHOR: 何乃欣
MESSAGE: DUPLICATE AUTHORITY
FROM: a x

- 何乃欣 and 何迺欣 are actually two different authors
- 乃 {21303A} and 迺 {33303A} are linked in EACC
- but this problem does not happen in non-TSVCC indexing

TSVCC Linking [cont.]
Problem 2: interfiling of indexed characters becomes worse in TSVCC when recall is higher. The ideal is to separate indexing and sorting.
- U+5386 历
- U+66C6 曆
- U+6B77 歷
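Separating indexing from sorting can be illustrated as follows: the linked (normalized) form is used only to decide which headings match, while the original characters decide the filing order, so headings written with the same form stay together. This is a sketch under stated assumptions. The mapping, the function names, and sorting by Unicode code point are all illustrative choices; a real catalog would more likely sort CJK by stroke count or pronunciation.

```python
# Linked forms reduced to one representative for matching only.
# The mapping and the choice of representative are assumptions.
CANONICAL = {"历": "曆", "歷": "曆", "曆": "曆"}


def index_key(heading: str) -> str:
    """Key used for matching: linked characters collapse together."""
    return "".join(CANONICAL.get(ch, ch) for ch in heading)


def sort_key(heading: str) -> list:
    """Key used for filing: the original characters, by code point."""
    return [ord(ch) for ch in heading]


def browse(query: str, headings: list) -> list:
    """Match on the index key, then file the hits by the sort key."""
    wanted = index_key(query)
    hits = [h for h in headings if index_key(h).startswith(wanted)]
    return sorted(hits, key=sort_key)


if __name__ == "__main__":
    headings = ["歷史", "历法", "曆法", "历史"]
    print(browse("历", headings))
```

If the index key were also used for sorting, all four headings would begin with the same representative character and the three written forms would interfile.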

Towards Native Unicode Catalog
How far are we?
- LC has issued MARC-8 to Unicode mapping tables
- OCLC Connexion client 1.5 begins to support MARC record import and export in UTF-8 encoding
- Intensive discussion of Unicode implementation in MARC at the UNICODE-MARC Discussion List ( )
- Most ILS vendors claim to support Unicode

Towards Native Unicode Catalog [cont.]
INNOPAC is almost there, but not fully ready yet.
- There is an option for sites to convert their catalogs to Unicode (e.g. HKALL did so in Oct 2004)
- The HKALL catalog shows that the implementation of Unicode is only partially complete: there are still EACC dependencies in the data store and indexes
- INNOPAC/Millennium does not yet support exporting and importing records in UTF-8
- CJK searching and sorting require more work

Towards Native Unicode Catalog [cont.]
Bibliographic data interchange involves multiple partners.
Diagram: OCLC, Library Catalog 1 and Library Catalog 2, with encodings labelled EACC/Unicode, EACC and Unicode.
Round-trip crosswalk failure:
- Step 1: 系 (simplified of 繫)
- Step 2: U+7CFB 系
- Step 3: 21506E 系 or 系 or 系? (Traditional 系, or simplified of 係 or 繫?)

Towards Native Unicode Catalog [cont.]
The failure of round-trip crosswalk between systems will continue to be a problem until all systems are capable of importing and exporting data in Unicode and no one is interchanging MARC records in a non-Unicode encoding

Thank You!
Contact Information
Philip Wong
K.T. Lam