City University of Hong Kong Chinese University of Hong Kong The HKIUG Unicode Project, Philip Wong (CityU) & Ho Yee Ip (CUHK), Fourth HKIUG Meeting,

Presentation transcript:

City University of Hong Kong, Chinese University of Hong Kong. The HKIUG Unicode Project, Philip Wong (CityU) & Ho Yee Ip (CUHK), Fourth HKIUG Meeting, 8-9 Dec 2003

Slide 1: The HKIUG Unicode Project
Fourth Annual HKIUG Meeting, 8-9 Dec 2003, Lingnan University, Hong Kong
Philip WONG, CityU Library
HO Yee Ip, CUHK Library

Slide 2: Overview
Part I
- Background
- Problems
- Objective & Methodology
- Procedures
- Deliverables and Actions
Part II
- Follow-up
- Are the problems solved
- Future work

Slide 3: The HKIUG Unicode Project - Part I
by Philip Wong, City University of Hong Kong Library
December 8, 2003

Slide 4: Background
- There are different character sets that support CJK.
- Big5 is common in Hong Kong and Taiwan; GB is used in Mainland China.
- CCCII and EACC are mainly used in libraries. EACC is the LC standard.
- Unicode is widely supported in operating systems, applications and by the W3C.

Character set | No. of CJK char | Code space | Released | Support | Provide linking feature
BIG5 | 13,053 | 14, | | Traditional | No
GB | , | million | 2000 | Trad. & Simplified | No
CCCII | 75,684 | 830, | | Trad. & Simplified | Yes
EACC | 15,728 | 830, | | Trad. & Simplified | Yes
Unicode | 82, | million | 2000 (v. 3) | Trad. & Simplified | No

Reference: KT Lam, "Overview of Chinese Character Encoding"

Slide 5: Background
Different character sets assign different code points to the same character (more precisely, the same glyph).

Character set | Code point for 余 (yu)
BIG5 | A745
GB |
CCCII |
EACC |
Unicode | 4F59
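
The point of this slide can be checked for the two encodings whose values survive in the table. The following is a minimal sketch using Python's built-in codecs; Python is not part of the original presentation, and EACC and CCCII are left out because the standard library has no codec for them.

```python
# The same character carries a different code value in each encoding.
ch = "余"

unicode_cp = f"{ord(ch):04X}"                  # Unicode code point
big5_bytes = ch.encode("big5").hex().upper()   # Big5 double-byte code
utf8_bytes = ch.encode("utf-8").hex().upper()  # UTF-8 byte sequence for the same code point

print("Unicode:", unicode_cp)
print("Big5:   ", big5_bytes)
print("UTF-8:  ", utf8_bytes)
```

Note that UTF-8 is only a byte serialization of the Unicode code point, so its value differs again from the code point itself.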

Slide 6: Background
- Innovative supports CJK by storing the CJK internally in EACC and CCCII.
- The internal code is not Unicode based.

Record display:
| |aYu, Guangzhong,|d
| |aYu Guangzhong shi xuan
| /$1|a 余光中,|d
| /$1|a 余光中詩選

Edit mode (Ctrl-W), showing the internal codes:
| |aYu, Guangzhong,|d
| |aYu Guangzhong shi xuan
| /$1|a{213131}{213272}{213034},|d
| /$1 |a{213131}{213272}{213034}{21585c}{215c4f}
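
The braced form shown in edit mode is easy to process mechanically. The sketch below pulls the internal codes out of such a string; it assumes every code is six hexadecimal digits inside braces, as in the slide's example, and the function name `internal_codes` is hypothetical.

```python
import re

# One braced internal code: six hex digits, e.g. {213131}
BRACED = re.compile(r"\{([0-9A-Fa-f]{6})\}")

def internal_codes(field: str) -> list[str]:
    """Return the internal codes found in a braced edit-mode string, upper-cased."""
    return [code.upper() for code in BRACED.findall(field)]

# Title subfield from the slide: 余光中詩選
title = "{213131}{213272}{213034}{21585c}{215c4f}"
codes = internal_codes(title)
print(codes)
```

Upper-casing normalizes the mixed case seen in the slides (for example `{21585c}`), so that codes can be compared directly.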

Slide 7: Background
- A mapping table is required to convert internal codes to and from client encodings.
- Once a good solution, it has also created many problems.
- Many issues have been raised and discussed over the years:
  - Seminar on Chinese Information Processing in Libraries, HKUST, Jan 1998
  - Good discussion list: LIB-CHINESE Listserv

Interface | Client encoding | Internal code
Telnet Big5, WebPAC Big5 | Big5 | EACC/CCCII
Millennium, WebPAC UTF-8 | UTF-8 | EACC/CCCII

Slide 8: Problems
Problem 1: Multiple mapping of internal codes to one client code
- The code searched for or input may not be the one desired.
- The order of mappings may differ among local sites, giving inconsistent results in Z39.50 searching.
- In the III UTF-8 table there are 1150 multiple mapping cases (2232 characters), including EACC and CCCII, some with high usage frequency, e.g. 台 (U+53F0), 漢 (U+6F22).

Multiple mapping of 台 (tai) in UTF-8:
EACC/CCCII | Unicode | Meaning
283b7d | 53F0 | simplified form of the tai in "table" 檯
27605d | 53F0 | simplified form of the tai in "typhoon" 颱
 | 53F0 | "tai" in its proper form
27542b | 53F0 | simplified form of the tai in "Taiwan" 臺
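
Multiple mapping cases like the one above can be found by grouping table rows by Unicode code point. This is a minimal sketch: the slides do not show the file layout of the III table, so the `(internal code, code point)` tuple form and the function name are assumptions. The sample rows use only codes quoted in the slides.

```python
from collections import defaultdict

# (internal code, Unicode code point) rows in table order.
# Assumed in-memory form; the real diac.utf8 layout is not shown in the slides.
ROWS = [
    ("283B7D", 0x53F0),  # 台
    ("27605D", 0x53F0),  # 台
    ("27542B", 0x53F0),  # 台
    ("212F30", 0x3007),  # 〇, mapped once
]

def multiple_mappings(rows):
    """Return {code point: [internal codes]} for code points mapped more than once."""
    by_cp = defaultdict(list)
    for internal, cp in rows:
        by_cp[cp].append(internal)
    return {cp: codes for cp, codes in by_cp.items() if len(codes) > 1}

multi = multiple_mappings(ROWS)
for cp, codes in multi.items():
    print(f"U+{cp:04X} {chr(cp)}: {', '.join(codes)}")
```

Keeping the internal codes in table order matters, because the client software picks an entry by position.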

Slide 9: Problems
Problem 2: In multi-mapping cases, there may be overlapping use of EACC and CCCII
- Overlapping introduces more multiple mappings.
- It creates workload when exchanging records with international bibliographic services which only accept EACC.

E.g. in UTF-8:
 (CCCII) U+65E6 旦 (dan)
27565A (EACC) U+65E6 旦

E.g. in Big5:
 (CCCII) A745 余 (yu)
 (EACC) A745 余

Slide 10: Problems
Problem 3: The III mapping table contains other problems
In UTF-8 (Release 2002 Phase 3):
- Errors: 27615F is mapped to U+53CB 友; it should be U+53D1 发.
- Missing cases: 212F30 for U+3007 〇 is missing.
- Wrong types: (U+53F0; 台) is typed as non-EACC; it should be EACC.

Slide 11: Problems
Analysis done by local sites on the UTF-8 mapping between April and June 2003

Total entries: 23,669 (Rel Phase 3)
Category | According to III | Studied by local sites
EACC | 15,290 (65%) | 15,665 (66%)*; multi-mapping linked: 224; multi-mapping unlinked: 47
Non-EACC | 7,954 (34%) | 8,004 (33%)*; 954 have EACC equivalents
"may be invalid internal code" | 425 (1%) | EACC 188#; Non-EACC 237
* by UST, # by CityU

Questions:
- Can preferences be selected by local sites for multiple mappings?
- Can non-EACC codes be abandoned, and those with EACC equivalents be converted to EACC in the database?
- Can the correct type of EACC/CCCII be re-assigned based on the standard?

Slide 12: Problems
Problem 4: What triggered the HKIUG Unicode Project is the inconsistent software mapping between Big5 and UTF-8 in multiple mapping cases:
- Big5 client: mapped to the first entry
- UTF-8 client: mapped to the last entry

Slide 13: Problems (software inconsistency, cont.)
Searching 才 (cai) in WebPAC Big5 (or Telnet Big5): mapped to the first entry

Internal | Big5
213f7b | A47E
28736d | A47E

Slide 14: Problems (software inconsistency, cont.)
Searching 才 (cai) in WebPAC UTF-8 (or Millennium): mapped to the last entry

Internal | UTF-8
213f7b | 624D
28736d | 624D
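
The inconsistency between the two clients can be illustrated with a small simulation. This is a simplified sketch, not Innovative's code: it uses a single list keyed on the Unicode code point, whereas the real system has separate Big5 and UTF-8 tables, and the `lookup` function and its `pick` parameter are hypothetical.

```python
# Entries for 才 in table order, using the internal codes from the slides.
ENTRIES = [("213F7B", 0x624D), ("28736D", 0x624D)]

def lookup(table, code_point, pick):
    """Resolve a client character to one internal code.

    pick="first" imitates the Big5 clients, pick="last" the UTF-8 clients.
    Returns None when the character is not in the table.
    """
    matches = [internal for internal, cp in table if cp == code_point]
    if not matches:
        return None
    return matches[0] if pick == "first" else matches[-1]

big5_result = lookup(ENTRIES, ord("才"), "first")
utf8_result = lookup(ENTRIES, ord("才"), "last")
print(big5_result, utf8_result)
```

The same search term resolves to two different internal codes, so a search from a Big5 client and one from a UTF-8 client can retrieve different records.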

Slide 15: Objective & Methodology
- A seminar was organized by CUHK in July.
- An HKIUG Working Group on the Unicode Project was formed. Members: CUHK, CityU, HKU, HKUST.

Objective
- Solve the software inconsistency between Big5 and UTF-8
- Decide on one-to-one mapping or many-to-one mapping
- Decide on pure EACC or EACC and CCCII
- Clean up errors, wrong types and missing cases
- Prepare to transfer to a Unicode based database

Slide 16: Objective & Methodology (cont.)
The working group further decided:
- Not to fix the Big5 table (small character set, supports only traditional Chinese, more multiple mappings, etc.)
- Propose a new UTF-8 mapping table to Innovative
- For EACC mapping, follow the LC standard
- Allow multiple mappings of EACC; for unlinked cases, decide on the preferences
- For multiple mappings of EACC and CCCII, remove the CCCII
- Convert CCCII in the database to EACC equivalents
- Avoid missing characters: include pure CCCII (though a low percentage in the database)

Slide 17: Procedures (diac.utf8.hkiug created)
[Flow diagram: LC EACC tables and III diac.utf8 (EACC/CCCII) combined into diac.utf8.hkiug]
- From LC: EACC tables merged; 66 substitutes for missing (U+3013) subtracted; 287 PUA remapped; preferences selected in multi-mapping linked and unlinked cases; LC mappings corrected.
- From diac.utf8: 7999 CCCII extracted; 955 with EACC equivalent subtracted (list prepared for CCCII to EACC data conversion); 7044 pure CCCII added.

Slide 18: Procedures (source from LC)
Merged tables from LC's EACC to UCS/Unicode Mappings

Slide 19: Procedures (source from diac.utf8)
Included pure CCCII from the UTF-8 table (Rel 2002 Phase 3)

Type | Example | Count | Action
CCCII with no EACC equivalents (pure CCCII) | 坓; 22483E 洣 | 7,044 | Added to new table
CCCII with EACC equivalents | (CCCII) 余, (EACC) 余 | 955 | Excluded from new table. Sent to III for data conversion
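
The split described on this slide amounts to partitioning the extracted CCCII codes by whether an EACC equivalent exists. The sketch below shows that step under stated assumptions: the function name is hypothetical, and apart from `22483E` (the pure CCCII example on the slide) the codes are made-up placeholders, because the real pairs did not survive in the transcript.

```python
def split_cccii(cccii_codes, eacc_equivalent):
    """Partition CCCII codes into (pure, convertible).

    pure:        codes with no EACC equivalent, kept in the new table
    convertible: {CCCII code: EACC code}, listed for data conversion
    """
    pure = [c for c in cccii_codes if c not in eacc_equivalent]
    convertible = {c: eacc_equivalent[c] for c in cccii_codes if c in eacc_equivalent}
    return pure, convertible

# "22483E" comes from the slide; "CCCII-A", "EACC-A", etc. are hypothetical placeholders.
cccii_codes = ["22483E", "CCCII-A", "CCCII-B"]
eacc_equivalent = {"CCCII-A": "EACC-A", "CCCII-B": "EACC-B"}

pure, convertible = split_cccii(cccii_codes, eacc_equivalent)
print(pure, convertible)

# Counts reported in the slides: 7,999 extracted, 955 convertible, 7,044 pure.
assert 7999 - 955 == 7044
```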

Slide 20: Procedures (re-mapped PUA)
Re-mapped 297 Private Use Area (PUA) code points to suggested alternates

Slide 21: Procedures (selected preference)
Selected preference in EACC multiple mapping

Linked (same lower order bytes)
- Example: 4B3178 倩, 倩
- Number of cases: 160 (320 char)
- Enhanced indexing? Yes
- Labeled as: "multi-mapping linked"
- Preference: does not matter

Unlinked (different lower order bytes)
- Example: 283B7D 台, 27605D 台, 台, 27542B 台
- Number of cases: 49 (108 char)
- Enhanced indexing? No
- Labeled as: "multi-mapping unlinked"
- Preference: selected case by case (based on HKUST study on word frequency & meaning)
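
The linked/unlinked distinction can be tested by comparing the lower order bytes of the codes. In this sketch, "lower order bytes" is taken to mean the last two bytes (four hex digits) of the three-byte code, which is an assumption about the slide's wording; the function name is hypothetical.

```python
def is_linked(codes):
    """True if all codes share the same lower-order two bytes (last four hex digits)."""
    tails = {code.upper()[2:] for code in codes}
    return len(tails) == 1

# Unlinked example from the slide: three of the internal codes for 台.
unlinked_example = ["283B7D", "27605D", "27542B"]

# A pair quoted elsewhere in the talk, {274349} and {214349}, differs only in the
# first byte; it is used here only to show what shared lower order bytes look like.
shared_tail_example = ["274349", "214349"]

print(is_linked(unlinked_example))     # False
print(is_linked(shared_tail_example))  # True
```

For linked cases the choice of preferred entry does not affect retrieval, which is why only the unlinked cases needed case-by-case decisions.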

Slide 22: Procedures (selected preference, cont.)
Selected preference in EACC multiple mapping, linked cases: HKIUG preference indicated

Slide 23: Procedures (selected preference, cont.)
Selected preference in EACC multiple mapping, unlinked cases: HKIUG preference indicated

Slide 24: Procedures (updated LC mapping)
Updated LC mappings, referenced from other sources:
- Unihan
- OCLC
- USMARC Character Set for Chinese, Japanese, Korean (printed)

Examples:
- 273C67: LC mapped to U+E9D8; remapped to U+5E72 (干)
- 4B3C2b: LC mapped to U+E9C7; remapped to U+67C3 (柃)
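
Both LC targets in these examples fall in Unicode's Private Use Area, which is why they were remapped. A minimal sketch of how such mappings could be flagged, using the general category `Co` from Python's `unicodedata` module (the helper name is hypothetical and this is not the working group's own tooling):

```python
import unicodedata

def is_private_use(code_point: int) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"

# (EACC code, LC mapping, HKIUG remapping) from the slide.
EXAMPLES = [
    ("273C67", 0xE9D8, 0x5E72),
    ("4B3C2B", 0xE9C7, 0x67C3),
]

for eacc, lc_cp, new_cp in EXAMPLES:
    print(f"{eacc}: LC U+{lc_cp:04X} private use={is_private_use(lc_cp)}, "
          f"remapped U+{new_cp:04X} private use={is_private_use(new_cp)}")
```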

Slide 25: Procedures (list for conversion)
Prepared list for data conversion: CCCII with EACC equivalents (CCCII to EACC)

Slide 26: Deliverables and Actions
Deliverables to Innovative
1. diac.utf8.hkiug - HKIUG version of UTF-8 mapping table (EACC 15,673; pure CCCII 7,044; total 22,717)
2. hasEACC.txt - CCCII with EACC equivalents
3. Final Report - Hong Kong Innovative Users Group (HKIUG) III-UTF8 Working Group Report

Actions for Innovative
1. Endorse and install diac.utf8.hkiug
2. Replace CCCII listed in hasEACC.txt with their EACC equivalents in the database

Note: local sites have the choice to implement the above actions or not (e.g. while adopting the new table, CUHK chose to run their own data conversion)

Slide 27: The HKIUG Unicode Project - End of Part I

Slide 28: The HKIUG Unicode Project - Part II
by Ho Yee Ip, CUHK University Library Systems
December 8, 2003

Slide 29: Are the problems solved
- Resolve Big5 and UTF-8 software inconsistency? Yes (if Big5 interfaces are abandoned)
- Use the same preferred mappings among local sites? Yes (if all sites adopt the new table)
- Able to search the desired code in multiple mapping? Yes (if added entries are created)
- No overlapping of EACC and CCCII in multiple mapping? Yes
- Clear up all errors and missing cases? No (on-going job)
- Switch 100% to Millennium? No (unfortunately, 2002 Phase 3 created more problems …)

Slide 30: Are the problems solved (new problem)
New problems in Release 2002 Phase 3
- Millennium Edit implicitly converts non-preferred entries to the preferred entry (may be an old problem in Phase 2).
- Worse, this "preferred" entry may not be the HKIUG preferred one. It is always mapped to the 2nd entry, which is wrong for multiple mappings with more than 2 entries.

Testing
1. In Millennium Cataloguing, input 台 in braced code {283B7D}
2. Save the record
3. Check in telnet edit mode (Ctrl-W): still {283B7D}
4. Re-save the record in Millennium with no further editing
5. Re-check in telnet: it becomes {27542b}

Note: Global update or amending attached records will not invoke this conversion.
Millennium is not yet ready for CJK editing!

Slide 31: Are the problems solved (installed sites)
Report from sites that have installed the new UTF-8 mapping table and run the data conversion:
- successful?
- failed?
- unexpected outcome?

Slide 32: Follow-up (mapping table)
- Continue to clean up and supplement the mapping table
- Recommend updates and changes of EACC mapping to LC and III
- There are 169 different mappings between III and LC. HKIUG followed LC
- Consider this case:
  - III choice: 2D552E U+82FA 苺
  - LC choice: 2D552E U+8393 莓
  - Obviously different
  - Consult: USMARC character set for Chinese, Japanese, Korean. Washington, D.C. : Library of Congress: the glyph of 2D552E is 苺 (the same as III)
  - Is III right or LC right?
- Others: 232D42, 396B33, 23355C

Slide 33: Follow-up (mapping table, cont.)
Other differences between LC and III

232D42
- III choice: 232D42 U+8842 衂
- LC choice: 232D42 U+4610 (2 dots)
- Minor variation
- US MARC (printed): 232D42 衂 (same as III)

396B33
- III choice: 396B33 U+524F 剏
- LC choice: 396B33 U+5259 剙 (2 dots)
- Minor variation
- US MARC (printed): 396B33 剏 (same as III)

23355C
- III choice: 23355C U+8C63 豣
- LC choice: 23355C U+86C3 蛃
- Obviously different
- US MARC (printed): 23355C 豣 (same as III)

Slide 34: Follow-up (mapping table, cont.)
Continue to clean up and supplement the mapping table
- Supplement diac.utf8.hkiug with additional CCCII
  - source: latest Unihan database file (e.g. ftp://ftp.unicode.org/Public/4.0-Update1/Unihan d3b.zip)
- Amend diac.utf8.hkiug when LC updates its code standard
  - source: LC MARC 21 code standard

Slide 35: Follow-up (added entries)
Change of cataloguing practice
- Provide added entries for unlinked multi-mapping codes
- Source data may not be the preferred code (by meaning)
- Transcription should be faithful to the source
- Added entries enhance retrieval

e.g. 历 U+5386 (source: 万年历)
- 历 {274349}, linked to 曆 {214349}
- 历 {27462A} (preferred), linked to 歷 {21462A}

Slide 36: Follow-up (added entries, cont.)
Source: 万年历
- 历 {274349}, linked to 曆 {214349}
- 历 {27462A} (preferred), linked to 歷 {21462A}

Input the non-preferred one in braced format
- Data input: 万年{274349}
- Data stored: {274F22}{213C65}{274349}
- Retrieval by glyphs: 萬年曆 (i.e. by traditional glyphs: {214F22}{213C65}{214349})
- Hit? Yes

Create the added entry by inputting the glyphs
- Data input: 万年历
- Data stored: {274F22}{213C65}{27462A}
- Retrieval by glyphs: 万年历 (i.e. by simplified glyphs: {274F22}{213C65}{27462A})
- Hit? Yes

Action: about 29 of the 49 unlinked cases need attention.

Slide 37: Follow-up (staff mode)
- Since the Big5 mapping table is not fixed, Telnet Big5 mode cannot be used any more; explore software such as AnzioWin and PuTTY.
- In Telnet mode, the INNOPAC UTF-8 port cannot support full screen editing; only line editing is feasible.
- CJK display is corrupted in full screen editing.

Slide 38: Follow-up (staff mode, cont.)
- Some local sites, e.g. CUHK, use AnzioWin.
- When AnzioWin is set to CCCII mode, its mapping table CCCII.UNI can be used for Unicode mapping.
- Deficiency: CCCII.UNI is one-to-one, so non-preferred entries cannot be included, e.g.

  # D1 # not preferred
  274C7B 53D1

- Better to use the Innopac UTF-8 port when it is ready for editing.

Slide 39: Future
- To migrate to a pure Unicode environment…
- Abandoning EACC/CCCII will lose the linking of traditional, simplified and variant forms. How to link 历 U+5386, 曆 U+66C6 and 歷 U+6B77?
- Linking information is available from the Unihan website.
- Migration can be considered only if this linking is maintained by the vendor.
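
One way the Unihan linking information could be used is sketched below. It assumes input lines in the tab-separated `code point, field, value` layout used by Unihan data files, with the `kTraditionalVariant` and `kSimplifiedVariant` fields; the three sample lines are illustrative stand-ins for the real file, and `parse_variants` is a hypothetical helper, not a vendor feature.

```python
from collections import defaultdict

# Illustrative sample in the assumed Unihan layout: U+XXXX <tab> field <tab> values
SAMPLE = (
    "U+5386\tkTraditionalVariant\tU+66C6 U+6B77\n"
    "U+66C6\tkSimplifiedVariant\tU+5386\n"
    "U+6B77\tkSimplifiedVariant\tU+5386\n"
)

FIELDS = ("kTraditionalVariant", "kSimplifiedVariant")

def parse_variants(text):
    """Build {character: set of linked characters} from variant lines."""
    links = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, field, targets = line.split("\t")
        if field not in FIELDS:
            continue
        src = chr(int(source[2:], 16))
        for target in targets.split():
            links[src].add(chr(int(target[2:], 16)))
    return links

links = parse_variants(SAMPLE)
print(sorted(links["历"]))
```

A search for 历 could then be expanded to 曆 and 歷, which approximates the linking that the EACC/CCCII code structure provides implicitly.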

Slide 40: Thank You
{21387D} {215938}
U+591A U+8B1D
多 謝
The HKIUG Unicode Project - The End