Last revised: 8 April 2006 EACC to Unicode Migration Ki Tat LAM Head of Library Systems The Hong Kong University of Science and Technology Library

Presentation transcript:

Last revised: 8 April 2006 EACC to Unicode Migration Ki Tat LAM Head of Library Systems The Hong Kong University of Science and Technology Library OCLC CJK Users Group 2006 Annual Meeting April , San Francisco

Slide 2: Contents
- Migrating systems from EACC to Unicode environments
  - Why migrate?
  - What has been done?
  - HKIUG Unicode Initiatives
- Issues
  - EACC/Unicode mapping table
  - Round-trip crosswalk
  - Improving searching with TSVCC Linking
  - Font display

Slide 3: An Observation …

Slide 4: [Calendar] [History] Simplified form of [character] and [character] [System for determining the beginning, length and divisions of a year]

Slide 5: [character] was incorrectly displayed as [character]. Is it a data entry error? A display problem? Or what?

Slide 6: Why Migrate?
- EACC (East Asian Character Code, ANSI Z ) was introduced into the CJK library community by RLG in the early 1980s (known as REACC at that time).
- It was an important milestone: for the first time, we had a C-J-K unified standard with a relatively large character set (about 16,000 characters) for use in bibliographic records.

Slide 7: Why Migrate? [cont.]
- By adopting EACC as an alternate character set in MARC 21 (called USMARC at that time), libraries with East Asian collections were able to share and use CJK cataloging records via the OCLC and RLIN cataloging platforms.
- However, great effort is required for integrated library systems (ILS) to make use of the EACC-based CJK data in the records.

Slide 8: Why Migrate? [cont.]
- Communicating in EACC is extremely difficult because EACC was never supported in the mainstream IT environment.
  - You can hardly find EACC supported by operating systems, fonts, input methods, editors, etc., either in the old days or today.
  - EACC is also unlikely to be supported in web browsers in the current Internet era.
- Why? EACC's three-byte coding structure is alien to the binary computing world.

Slide 9: Why Migrate? [cont.]
- Because of its unpopularity, EACC became a frozen standard, with no way to fix errors or add characters.
- If EACC is stored natively in the bibliographic database, then to input and display CJK characters at the application layer (such as the OPAC and the record editor), the ILS has to rely on lossy mapping tables to map EACC to other character encodings (e.g. BIG5, GB, JIS, KSC and UTF-8).

Slide 10: Why Migrate? [cont.]
- Unicode comes to the rescue:
  - A single standard for written texts of almost all languages in the world
  - Has more than 96,000 characters, most of which are CJK
  - An active standard, with constant updates
  - Widely adopted and supported in the current IT environment: major operating systems and web browsers, plus many devices and applications, speak the Unicode language

Slide 11: Why Migrate? [cont.]
- After more than 25 years of EACC influence, it is unlikely that all library systems and data can be migrated overnight to the Unicode mainstream.
- A period of parallel operation is anticipated, with co-existing EACC and Unicode bibliographic data interchanged among systems, resulting in confusion and data loss.
- Even after systems have migrated to Unicode, there are still problems that require attention.

Slide 12: What has been done?
- MARC 21 specifications for the MARC-8 and UCS/Unicode environments
- LC's code tables for mapping between MARC-8 and Unicode
- OCLC WorldCat migration to a Unicode platform
- OCLC Connexion's Unicode support
- LC's Voyager upgrade
- INNOPAC/Millennium
- HKIUG Unicode Initiatives

Slide 13: MARC 21 Specifications
- In 2000, the Library of Congress issued specifications to distinguish the encoding of MARC 21 records in the original (MARC-8) environment and in the new UCS/Unicode environment.
- MARC-8 means characters are encoded in one 8-bit byte (e.g. ASCII) or three 8-bit bytes (e.g. EACC).

Slide 14: [Screenshot] A MARC 21 bibliographic record in ISO 2709 format viewed in Notepad, showing CJK characters encoded in EACC in the MARC-8 environment

Slide 15: MARC 21 Specifications [cont.]
- UCS/Unicode Environment:
  - Uses UTF-8 as the character encoding
  - Leader position 9 contains value "a"
  - Field 066 (Character Sets Present) is not needed
  - The script identification information in subfield 6 (Linkage) can be dropped
  - Lengths are specified by number of 8-bit bytes, rather than number of characters
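Two of the points on this slide, the Leader position 9 value and byte-based lengths, can be illustrated with a small Python sketch. The function names and the sample leader string are made up for this example; they are assumptions, not part of any MARC library or of the original slides.

```python
def is_unicode_record(leader: str) -> bool:
    """Return True when Leader position 9 (counting from 0) holds 'a'."""
    # A MARC 21 leader is 24 characters long.
    return len(leader) == 24 and leader[9] == "a"


def field_length_in_bytes(field_text: str) -> int:
    """Length as counted in the UCS/Unicode environment: UTF-8 bytes."""
    return len(field_text.encode("utf-8"))


# Hypothetical leader, invented for illustration only.
sample_leader = "00714cam a2200205 a 4500"
print(is_unicode_record(sample_leader))

# U+5386 is one character but occupies three bytes in UTF-8.
text = "\u5386"
print(len(text), field_length_in_bytes(text))
```

The gap between character count and byte count is why a record converted from MARC-8 to UTF-8 needs its length values recalculated rather than copied.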

Slide 16: MARC 21 Specifications [cont.]
- Unicode combining rule for diacritics: combining marks follow, rather than precede, the character they modify
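The reordering that this rule implies can be sketched in Python. This is only an illustration under a simplifying assumption: the diacritics are taken to be already mapped to Unicode combining code points but still sitting in MARC-8 order (mark first, base character second). The function name `marks_after_base` is invented for this example.

```python
import unicodedata


def marks_after_base(text: str) -> str:
    """Move each run of combining marks from before its base character to after it."""
    out = []
    pending = []  # combining marks waiting for their base character
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # stray marks with no following base are kept at the end
    return "".join(out)


marc8_order = "\u0301e"            # combining acute accent, then "e"
unicode_order = marks_after_base(marc8_order)
print(unicode_order == "e\u0301")  # base first, mark second
print(unicodedata.normalize("NFC", unicode_order) == "\u00e9")
```

Once the marks follow their base character, standard Unicode normalization works as expected, which it would not on the MARC-8 ordering.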

Slide 17: [Screenshot] A MARC 21 bibliographic record in ISO 2709 format viewed in Notepad, showing CJK characters encoded in UTF-8 in the UCS/Unicode environment

Slide 18: MARC 21 Specifications [cont.]
- LC issued code tables for mapping between MARC-8 and UCS/Unicode:
  - Not only for EACC, but also for other Latin and non-Latin scripts such as ANSEL, Hebrew, Cyrillic, Arabic and Greek
  - Provide essential information for an ILS's Unicode implementation

Slide 21: MARC 21 Specifications [cont.]
- UNICODE-MARC Discussion List
  - Since July 2005
  - Active discussion on issues concerning Unicode implementation in MARC 21
  - Some of the discussion was summarized as a MARC Proposal, "Technique for conversion of Unicode to MARC-8", which MARBI approved in January 2006, with changes

Slide 22: OCLC WorldCat and Connexion
- WorldCat: migrated to Oracle with Unicode support
- Released Connexion client software
  - Unicode-based, running on Windows
  - Comprehensive CJK support
  - Relies on Windows IME for input of CJK characters
  - Export and import of records in both MARC-8 and UCS/Unicode environments

Slide 23: LC's Catalog
- Its Voyager system was upgraded recently to provide Unicode support
  - Capable of displaying and searching CJK data in 880 fields
  - Allows export of records in MARC-8 and Unicode environments
- Issued a cataloging policy position paper for the Unicode implementation at LC (March 2006), with details on current implementation and future opportunities

Slide 24: INNOPAC/Millennium
- INNOPAC has supported EACC, and CJK in general, since its implementation at HKUST Library 15 years ago
- Millennium clients run on Windows XP with Unicode support
- CJK records are stored in EACC internally, but there is an option to migrate the storage to Unicode
- The HKIUG Unicode Task Force is working with the vendor to improve the Unicode storage

Slide 25: HKIUG Unicode Initiatives
- HKIUG: Hong Kong Innovative Users Group
  - Founded in 1996
  - Members from all 15 INNOPAC libraries in Hong Kong and Macau, including the eight Hong Kong government-funded universities
- HKIUG Unicode Initiatives: since 2003, working closely with the ILS vendor (Innovative Interfaces Inc.) to improve INNOPAC/Millennium's CJK support

Slide 26: HKIUG Unicode Initiatives [cont.]
- Achievements:
  - Developed the HKIUG Version of the EACC to Unicode mapping table
  - Resolved the EACC to Unicode multi-mapping problem
  - Developed TSVCC (Traditional, Simplified, Variant Chinese Characters) linking tables
- HKIUG Unicode Task Force: maintains the Unicode and TSVCC tables and assists the vendor on Unicode migration; members from CUHK, CITYU, HKUST and HKU

Slide 27: Migration Issues
- The need for an EACC/Unicode mapping table
- Multi-mapping and round-trip failure problems
- TSVCC linking
- Font display problem

Slide 28: HKIUG EACC/Unicode Table
- First released in September 2003; last revised in August 2005
- Contains:
  - EACC characters
  - 7043 pure CCCII characters
- Mapping for EACC characters follows LC as much as possible
- The 7043 pure CCCII characters have no EACC equivalent; they are included to avoid too many missing characters

Slide 31: HKIUG EACC/Unicode Table [cont.]
- Identified:
  - 160 multi-mapping linked cases
  - 49 multi-mapping unlinked cases
- These cause failure in round-trip crosswalk
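A rough Python sketch of how multi-mapping cases might be detected in a mapping table follows. It assumes the table can be loaded as a dictionary from EACC code to Unicode code point; the function name is invented, and the table holds only the two EACC codes quoted in this presentation for U+5386 plus one clearly made-up placeholder row.

```python
from collections import defaultdict


def find_multi_mappings(eacc_to_unicode):
    """Return the Unicode code points that more than one EACC code maps to."""
    by_unicode = defaultdict(list)
    for eacc, code_point in eacc_to_unicode.items():
        by_unicode[code_point].append(eacc)
    return {
        code_point: sorted(codes)
        for code_point, codes in by_unicode.items()
        if len(codes) > 1
    }


table = {
    "274349": 0x5386,  # quoted in the slides
    "27462A": 0x5386,  # quoted in the slides
    "XXXXXX": 0x4E00,  # placeholder row, NOT a real table entry
}

for code_point, codes in find_multi_mappings(table).items():
    print(f"U+{code_point:04X} <- {codes}")
```

Every code point reported this way is a candidate for round-trip failure, because the reverse conversion has to pick one EACC code among several.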

Slide 32: Round-trip Crosswalk Failure
[Diagram labels: Library, EACC, Import to OCLC, OCLC WorldCat, Unicode, Export from OCLC, Step 2: U+7CFB]
1. Library contributes [character] in EACC {274349}, which is the simplified form of [character]
2. Connexion finds {274349} in mapping table and stores in Unicode U+5386
3. Connexion finds {274349} and {27462A} in mapping table and decides to output in EACC {27462A}
4. Library receives [character] in EACC {27462A}, which is the simplified form of [character]
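The four steps above can be reproduced with a minimal Python sketch. The two EACC codes and the code point U+5386 come from the slide; how Connexion actually chooses among candidates is not stated, so the "last row wins" rule in `build_reverse_table` is purely an assumption made to reproduce the reported outcome. All function names are invented for this example.

```python
# Two EACC codes that the slide reports as mapping to the same Unicode code point.
EACC_TO_UNICODE = {
    "274349": 0x5386,
    "27462A": 0x5386,
}


def build_reverse_table(forward):
    """Build a Unicode-to-EACC table; a later row silently replaces an earlier one.

    This overwrite rule is an assumption for illustration, not Connexion's logic.
    """
    reverse = {}
    for eacc, code_point in forward.items():
        reverse[code_point] = eacc
    return reverse


UNICODE_TO_EACC = build_reverse_table(EACC_TO_UNICODE)


def round_trip(eacc):
    """Convert an EACC code to Unicode and back again."""
    return UNICODE_TO_EACC[EACC_TO_UNICODE[eacc]]


def round_trip_failures(forward):
    """List the EACC codes that do not survive the round trip."""
    return [eacc for eacc in forward if round_trip(eacc) != eacc]


print(round_trip("274349"))
print(round_trip_failures(EACC_TO_UNICODE))
```

Whatever selection rule is used, one of the two EACC codes cannot come back unchanged, since the Unicode side keeps no record of which code was contributed.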

Slide 33: [Screenshot] U+5386

Slide 34: [Screenshot] Export

Slide 35: [Screenshot] Export output is { A}: incorrect!

Slide 36: TSVCC Linking
- When searching "Li fa", you will prefer to retrieve records that have either [characters] or [characters], where [character] and [character] have a Traditional-Simplified relationship
- Similarly, when searching [character], you will prefer to retrieve its Variant [character]
- This requires linking T, S, V forms during searching

Slide 37: [Screenshot] In LC's Online Catalog, searching title [characters] will retrieve 3 hits.

Slide 38: [Screenshot] Searching with the simplified form of [character] will, however, retrieve 3 other hits.

Slide 39: [Screenshot] [characters]? Excuse me, are they typos? Shouldn't it be [characters]?

Slide 40: [Screenshot] Google is capable of linking [character] and [character]

Slide 41: TSVCC Linking [cont.]
- The HKIUG Unicode Task Force constructed two versions of the TSVCC Linking tables, for ILSs that store characters in EACC and in Unicode respectively:
  - EACC Version [released November 2004]
  - Unicode Version [draft created March 2006]

Slide 42: TSVCC Linking [cont.]
- EACC Version
  - Table M (80 entries): linking relationship is not purely from EACC, e.g.
    | | 2D4349 | 21462A | 27462A | 4B462A | #U+5386 multi-mapped 27462A,
  - Table V (3065 entries): linking relationship is purely from EACC, e.g.
    C | 2D306C | 33306C | 4B306C

Slide 45: TSVCC Linking [cont.]
- Unicode Version
  - Still in draft construction
  - So far has 3061 entries, e.g.
    U+5C5B | U+5C4F | U+6452 | #EACC link ([27/21]415A) AND Variant form of U+5C4F is U+5C5B
    U+965D | U+965C | U+9655 | #EACC link ([23/29]4A44) AND Simplified form of U+965D is U+9655
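One way a linking table of this shape could be used at search time is sketched below in Python. The two rows are the examples quoted on the slide; the parsing rules (cells separated by "|", a comment after "#") are inferred from those rows, and using the first code point in each row as the search key is an assumption of this sketch, not a documented feature of the HKIUG table. All function names are invented.

```python
ROWS = [
    "U+5C5B | U+5C4F | U+6452 | #EACC link ([27/21]415A) AND Variant form of U+5C4F is U+5C5B",
    "U+965D | U+965C | U+9655 | #EACC link ([23/29]4A44) AND Simplified form of U+965D is U+9655",
]


def parse_row(row):
    """Return the linked characters in a row, ignoring the trailing comment."""
    data = row.split("#", 1)[0]
    cells = [cell.strip() for cell in data.split("|")]
    return [chr(int(cell[2:], 16)) for cell in cells if cell]


def build_key_map(rows):
    """Map every linked form to the first form in its row (assumed search key)."""
    key_map = {}
    for row in rows:
        forms = parse_row(row)
        for form in forms:
            key_map[form] = forms[0]
    return key_map


def search_key(text, key_map):
    """Replace each character by its search key so linked forms compare equal."""
    return "".join(key_map.get(ch, ch) for ch in text)


KEY_MAP = build_key_map(ROWS)
# U+5C4F and U+5C5B sit in the same row, so they produce the same search key.
print(search_key("\u5c4f", KEY_MAP) == search_key("\u5c5b", KEY_MAP))
```

Indexing both the stored text and the query through `search_key` makes a search on any one form retrieve records containing the others; the original text should still be kept for display.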

Slide 48: TSVCC Linking [cont.]
- Plan to include linking of New/Old forms in the TSVCC Unicode Version

Slide 49: TSVCC Linking [cont.]
- Results of implementing TSVCC Linking:
  - Improvement in searching: higher recall
  - Trade-off: lower precision
  - If search results are sorted or displayed in TSVCC normalized form, misleading and inaccurate display may occur, such as the OCLC Connexion browse list display problem mentioned previously

Slide 50: Font Issues
- Do not believe in "What you see is what you have", because what you see varies with fonts!
- For example, the following glyphs have different code points in EACC: [glyph images]

Slide 51: Font Issues
- But in Unicode, they are assigned the same code points. Depending on the font in use, you will see different glyphs: [glyph images]

Slide 52: Conclusion
- How far are we?
  - Both LC and OCLC have done enormous work in enabling and promoting the use of Unicode in MARC records
  - ILS vendors are working very hard to implement and enhance Unicode support
  - Libraries and CJK experts are providing advice and suggesting solutions

Slide 53: Conclusion [cont.]
- We have reviewed various migration issues:
  - The need for an accurate EACC/Unicode mapping table
  - Extending to non-EACC characters
  - Multi-mappings and round-trip failure
  - TSVCC Linking
  - Font display issues

Slide 54: Conclusion [cont.]
- The failure of round-trip crosswalk between systems will continue to be a problem until everyone interchanges MARC records purely in Unicode. This will only happen when the majority of systems store and use data natively in Unicode.
- Unlike EACC, Unicode does not have a built-in linking relationship. Implementing TSVCC is essential for improving searching.

Slide 55: Additional References
- Assessment of Options for Handling Full Unicode Character Encodings in MARC. Part 1: New Scripts (January 2004) and Part 2: Issues (June 2005).
- Aliprand, Joan M. The structure and content of MARC 21 records in the Unicode environment. Information technology and libraries, v.24, no.4, December 2005, p
- Wong, Philip and K.T. Lam. HKIUG's Unicode projects: untangling the chaotic codes. HKIUG Annual Meeting

Slide 56: Thank You!