Published by Jocelyn Lucas. Modified over 10 years ago.
1
CJK Character Validation – Impact from EACC to Unicode Migration
2006 CEAL Conference, Committee on Technical Processing
Ai-lin Yang, East Asian Library, UC Berkeley
April 5, 2006
2
EACC/MARC21 and Unicode
East Asian Character Code (EACC) is the MARC-8 CJK character set in MARC21
Migration to Unicode:
  Library of Congress database
  RLG's Union Catalog database
  OCLC's WorldCat database
CJK bibliographic records are restricted to EACC characters
3
Microsoft IME Variants
Non-MARC21 characters:
  Duplicate CJK characters (e.g., U+F937 and U+8DEF)
  Close variants (e.g., U+6B65 and U+6B69)
  Typically one of the characters in each pair is a MARC21 character
CJK character validation errors in OCLC:
  OCLC XWC (Extended WorldCat), held in an Oracle database, is built on Unicode
  OCLC online cataloging follows MARC21 standards
  CJK scripts are input using Microsoft Global Input Method Editors (IMEs)
  Non-MARC21 characters cause CJK character validation errors
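The two kinds of pair on this slide behave differently under Unicode normalization, which can be checked with Python's standard unicodedata module. This is a minimal sketch, assuming normalization is a fair stand-in for what the slide calls a "duplicate"; it does not test whether either character belongs to the MARC21 repertoire, and the helper name describe is only illustrative.

```python
import unicodedata

def describe(ch):
    """Return (code point, code point after NFC normalization) for one character."""
    nfc = unicodedata.normalize("NFC", ch)
    return f"U+{ord(ch):04X}", f"U+{ord(nfc):04X}"

# Duplicate pair from the slide: U+F937 is a compatibility ideograph
# whose canonical form is U+8DEF.
dup = describe("\uF937")

# Close-variant pair from the slide: both are separate unified ideographs,
# so normalization leaves each one unchanged.
var_a = describe("\u6B65")
var_b = describe("\u6B69")

for source, normalized in (dup, var_a, var_b):
    print(source, "->", normalized)
```

Normalization therefore folds the duplicate pair together but not the close variants, so close variants need a separate lookup.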
4
OCLC Connexion / IME Online Cataloging Examples
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 2 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: (traditional )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
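The messages above report the field, the subfield, and the position of the offending character. A rough sketch of that kind of check follows. It is not OCLC's validator: the ALLOWED set is a hypothetical two-character stand-in for the MARC21 repertoire, the function names are made up for illustration, and counting positions from 1 is an assumption based on the messages shown.

```python
def find_invalid(text, allowed):
    """Return the 1-based positions of characters that are not in `allowed`."""
    return [pos for pos, ch in enumerate(text, start=1) if ch not in allowed]

def report(tag, subfield, text, allowed):
    """Build one message per invalid character, imitating the wording on the slide."""
    return [
        f"{tag} (non-Latin) occurrence 1, ${subfield} occurrence 1, "
        f"position {pos} - invalid character"
        for pos in find_invalid(text, allowed)
    ]

# Hypothetical repertoire: only these two characters count as valid here.
ALLOWED = {"\u8DEF", "\u6B65"}

# A two-character title whose second character (U+F937) is outside the set.
title = "\u6B65\uF937"
messages = report("245", "a", title, ALLOWED)

for message in messages:
    print(message)
```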
5
OCLC Connexion / IME Online Cataloging Examples
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
(traditional )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: can only be found in the traditional list; this character does not exist in the simplified list
6
Solutions
  Unihan Database
  CJK Compatibility Database
  OCLC CJK E-dictionary
7
Unihan Database
http://www.unicode.org/charts/unihan.html
Indexes:
  Unihan database index
  Unihan grid index
  Unihan radical-stroke index
Unihan database information:
  (I) Several different glyphs for the character
  (N) Different representations of the character's scalar value
  (N) Mappings to the IRG sources for the character
  (I) Mappings to major industrial and national standards and other character collections
  (N) Positions in the four dictionaries used by the IRG
  (I) Positions in other commonly-used dictionaries
  (I) Radical-stroke counts as derived from different sources
  (I) Phonetic data derived from various sources
  (I) Other dictionary data
  (I) Variants (with links to the variant forms)
  Compounds containing the character
  (I) Other information contained in the Unihan database
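The variant links listed above can also be read offline from the Unihan data files, which use tab-separated lines of code point, field name, and value. The following is a minimal sketch under assumptions: the sample lines and the field name kExampleVariant are made up for illustration and are not copied from the database, and the handling of a "<source" suffix reflects how some variant values are annotated.

```python
from collections import defaultdict

SAMPLE = """\
# Illustrative lines in the tab-separated Unihan layout; not real database rows.
U+6236\tkExampleVariant\tU+6237
U+6237\tkExampleVariant\tU+6236
"""

def parse_variants(text):
    """Map each character to the set of characters listed as its variants."""
    variants = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, field, value = line.split("\t", 2)
        if not field.endswith("Variant"):
            continue  # keep only variant-type fields
        for item in value.split():
            # Drop any "<source" annotation that follows the code point.
            target = item.split("<", 1)[0]
            variants[chr(int(code[2:], 16))].add(chr(int(target[2:], 16)))
    return variants

table = parse_variants(SAMPLE)
for ch, targets in table.items():
    print(f"U+{ord(ch):04X}", "->", sorted(f"U+{ord(t):04X}" for t in targets))
```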
8
Unihan Database Search (U+6237)
9
Unihan Database Search (U+6236)
10
CJK Compatibility Database
http://www.loc.gov/ils/cjk_search/cjk_cpso.html
Replace a non-MARC21 character with its MARC21 equivalent
Steps for using the CJK compatibility database:
  1) Copy the invalid character from your bibliographic record
  2) Open the CJK Compatibility Page
  3) Paste the invalid character in the white box and use the index "Invalid character"
  4) Click "Submit"
  5) Copy and paste the valid alternative into your bibliographic record
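The manual copy, look up, and paste steps above amount to a table-driven replacement. A minimal sketch of the same idea, assuming a locally held lookup table: the REPLACEMENTS mapping is hypothetical (one pair taken from the duplicate example earlier in the deck, with the direction assumed) and is not data from the Library of Congress database.

```python
# Hypothetical lookup table: invalid character -> assumed valid alternative.
REPLACEMENTS = {"\uF937": "\u8DEF"}

def replace_invalid(text, table):
    """Replace characters found in `table` and record each change made."""
    fixed = []
    changes = []
    for pos, ch in enumerate(text, start=1):
        new = table.get(ch, ch)  # unchanged when no replacement is listed
        if new != ch:
            changes.append((pos, f"U+{ord(ch):04X}", f"U+{ord(new):04X}"))
        fixed.append(new)
    return "".join(fixed), changes

fixed_title, changes = replace_invalid("\uF937\u6B65", REPLACEMENTS)

for pos, old, new in changes:
    print(f"position {pos}: {old} replaced with {new}")
```

Keeping the list of changes makes it possible to review each substitution before the record is saved.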
11
CJK Compatibility Database Search
12
OCLC CJK E-Dictionary
13
OCLC CJK E-Dictionary Search
15
CJK Character Validation Thank you!