1 Character Codes Related Problems - UNICODE OPAC and Millennium at WASEDA Univ. Library - Tsutomu SUZUKI Waseda University.


1 Character Codes Related Problems - UNICODE OPAC and Millennium at WASEDA Univ. Library - Tsutomu SUZUKI, Waseda University Library. 4th Hong Kong INNOPAC Users Group Meeting, December 2003

2 WASEDA University Overview
- Founded in 1882
- Now has: undergraduate schools graduate schools
-- 5 large campus libraries & 27 small libraries
-- 2 university museums
-- 44,576 undergraduate and 6,147 graduate students (as of end April, 2002)

3 Library Overview (as of March 31, 2002)
- 4,705,597 books (2,980,352 CJK books + 1,725,245 western books)
- 49,615 journal titles (currently subscribing to 19,509)
- 879,336 items checked out / year
- ILL transactions
-- 13,951 requests to other libraries
-- 18,491 requests received from other libraries
- Total number of Central Library visits: 1,197,731 ( – )

4 Current Status of Our INNOPAC
Recent record numbers (Oct. 29, 2003) from M-I-F-S
- 1,752,690 bibliographic records
- 3,434,122 item records
- 52,133 check-in records
Public Catalog Searches from “ANALYZE patron searches”
- 5,149,322 searches ( )

5 Unicode Port on WEBPAC
- On November 17th, the Unicode OPAC was released to the public. (Some character code troubles still remain.)
- Downloading Chinese & Korean bib data from OCLC.
- Record Maintenance: AnzioWin
- Number of the C & K bib records (as of 11th Nov.)
-- 15,971 bibs of Chinese materials
-- 157 bibs of Korean materials

6 Appearance - Chinese record -

7 Appearance - Korean record -

8 Character code issues
Table columns: Display | Search | Glyph
Case1: Mapping Error, NG
Case2: Shift_JIS to EACC issue, NG
Case3: EACC layers related issue, NG
Case4: Duplication codes in EACC, NG
Case5: Not Unified character in UNICODE, NG

9 Case1: Mapping Error The screen below shows my patron record on Millennium Circulation. One Katakana character, “Zu”, is not displayed properly.

10 Case1: Mapping Error If I search for “suzuki” on the Unicode OPAC, “zu” is ignored and records for “suki” are hit.

11 Case1: Mapping Error
Diagram columns: SJIS | EACC | UNICODE
SJIS: 253A; EACC: 69253A; UNICODE: 30BA
This EACC character is NOT mapped to any UNICODE character. It should be mapped to 30BA in UNICODE.
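
The effect of the missing mapping in Case 1 can be sketched as follows. This is a minimal illustration, not Innopac's actual conversion code: only the pair 69253A / 30BA comes from the slide, the other two EACC codes are assumed by analogy, and the behavior of silently dropping an unmapped code is an assumption chosen to reproduce the “suzuki” becoming “suki” symptom.

```python
# Hypothetical excerpt of an EACC -> Unicode table.
# Only 69253A -> U+30BA is taken from the slide; the other EACC codes are assumed.
EACC_TO_UNICODE_BROKEN = {
    0x692539: 0x30B9,  # KATAKANA LETTER SU (assumed EACC code)
    0x69252D: 0x30AD,  # KATAKANA LETTER KI (assumed EACC code)
    # 0x69253A is absent: this stands in for the mapping error.
}

# The same table with the missing entry restored.
EACC_TO_UNICODE_FIXED = {**EACC_TO_UNICODE_BROKEN, 0x69253A: 0x30BA}


def eacc_to_text(codes, table):
    """Convert EACC codes to text, silently dropping any code not in the table."""
    return "".join(chr(table[code]) for code in codes if code in table)


# "su", "zu", "ki" as EACC codes.
SUZUKI = [0x692539, 0x69253A, 0x69252D]

print(eacc_to_text(SUZUKI, EACC_TO_UNICODE_BROKEN))  # "zu" is lost
print(eacc_to_text(SUZUKI, EACC_TO_UNICODE_FIXED))   # all three kana survive
```

With the broken table the converted string has two characters, which matches the observed search behavior; adding the single missing entry restores the full name.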

12 Case2: Shift-JIS to EACC Issue When I search for this hanji on the Shift_JIS OPAC, Innopac returns only 9 records.

13 Case2: Shift-JIS to EACC Issue
Diagram columns: SJIS | EACC | UNICODE
SJIS: 97E9; EACC:
The EACC character “215D58” is not assigned any glyph, according to the OCLC CJK. But the mapping from S-JIS to EACC works fine.

14 Case2: Shift-JIS to EACC Issue On the other hand, when I searched for this hanji on the Unicode OPAC, Innopac returned more than 2,000 records!

15 Case2: Shift-JIS to EACC Issue
Diagram columns: SJIS | EACC | UNICODE
UNICODE: 6FDB; EACC:
SJIS: 97E9; EACC:
No relationship
These Shift_JIS and Unicode characters have the same glyph, but Innopac stored them at two different EACC code positions. Therefore we can NOT search for both characters at once.

16 Case2: Shift-JIS to EACC Issue
Diagram columns: SJIS | EACC | UNICODE
UNICODE: 6FDB; EACC:
SJIS: 97E9; EACC:
One of the solutions: change the mapping of this Shift_JIS character from to

17 Case3: EACC Layers Related Issue Shift_JIS Telnet Screen Sample (my record). The data is displayed correctly.

18 Case3: EACC Layers Related Issue
Diagram columns: SJIS | EACC | UNICODE
SJIS: 97E9; EACC: 215D58
In the Shift_JIS environment, there is no trouble in searching for and displaying this character.

19 Case3: EACC Layers Related Issue We can see the same data properly on Millennium. {69253a} is a separate problem, already mentioned in Case 1.

20 Case3: EACC Layers Related Issue Reviewing the same data AFTER editing an element (NOTE) on Millennium. EACC character codes are displayed directly in one of the name fields and in the address.

21 Case3: EACC Layers Related Issue We can see the data correctly on Millennium even after editing.

22 Case3: EACC Layers Related Issue
Diagram columns: SJIS | EACC | UNICODE
SJIS: 97E9; EACC: 215D58; EACC: 4B5D58; UNICODE: 9234
Relationship: same code position on other layers

23 Case3: EACC Layers Related Issue
Diagram columns: SJIS | EACC | UNICODE
SJIS: 97E9; EACC: 215D58; EACC: 4B5D58; UNICODE: 9234
No character assigned: {4B5D58}
Relationship: same code position on other layers
If records including this character are saved on Millennium, this hanji is NOT stored as the original EACC code (215D58).
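
The “same code position on other layers” relationship in Case 3 can be made concrete with a small sketch. It assumes that the first byte of a three-byte EACC code identifies the layer and the remaining two bytes the position within it; the function name is hypothetical.

```python
def split_eacc(code):
    """Split a 3-byte EACC code into (layer, position).

    Assumption: the high byte is the layer, the low two bytes the position.
    """
    layer = (code >> 16) & 0xFF
    position = code & 0xFFFF
    return layer, position


ORIGINAL = 0x215D58    # code stored in the Shift_JIS environment (from the slide)
AFTER_SAVE = 0x4B5D58  # code seen after saving on Millennium (from the slide)

orig_layer, orig_pos = split_eacc(ORIGINAL)
new_layer, new_pos = split_eacc(AFTER_SAVE)

print(f"layers: {orig_layer:02X} vs {new_layer:02X}")
print(f"positions: {orig_pos:04X} vs {new_pos:04X}")
print("same position, different layer:", orig_pos == new_pos and orig_layer != new_layer)
```

Under that assumption the two codes differ only in the layer byte (21 versus 4B), which is consistent with the slide's description of the saved record.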

24 Case4: Duplication codes in EACC

25 Case4: Duplication codes in EACC There are more than 1,000 records by “matsu” on Shift_JIS OPAC.

26 Case4: Duplication codes in EACC There is ONLY one record by “matsu” on Unicode OPAC. (The screen below shows the direct-hit result.)

27 Case4: Duplication codes in EACC
Diagram columns: SJIS | EACC | UNICODE
UNICODE: 677E; EACC: 21442D
SJIS: 8FBC; EACC:
We can DISPLAY both 21442D and in Unicode OPAC, but only is searchable. Because of this EACC code duplication, the search results are NOT the same between Shift_JIS OPAC and Unicode OPAC.
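
One way such a duplication could produce different hit counts is sketched below. This is an illustration under stated assumptions, not a description of Innopac internals: the second EACC code is not legible in the transcript, so `EACC_B` is a placeholder string, and the record data and function names are hypothetical.

```python
MATSU = "\u677e"    # U+677E, from the slide
EACC_A = "21442D"   # from the slide
EACC_B = "EACC-B"   # placeholder: the second code is missing in the transcript

# Display direction: both EACC codes map to the same Unicode character.
forward = {EACC_A: MATSU, EACC_B: MATSU}

# Search direction: a one-to-one reverse table can keep only one EACC code.
reverse = {}
for eacc, uni in forward.items():
    reverse[uni] = eacc  # a later entry overwrites an earlier one

# Hypothetical records, each stored with one of the two codes.
records = {"rec1": [EACC_A], "rec2": [EACC_B], "rec3": [EACC_A]}


def search_one_to_one(char):
    """Search using the single EACC code kept in the reverse table."""
    target = reverse[char]
    return sorted(r for r, codes in records.items() if target in codes)


def search_all_duplicates(char):
    """Search using every EACC code that displays as the given character."""
    targets = {eacc for eacc, uni in forward.items() if uni == char}
    return sorted(r for r, codes in records.items() if targets & set(codes))


print(search_one_to_one(MATSU))
print(search_all_duplicates(MATSU))
```

The one-to-one search finds only the records stored under whichever code survived in the reverse table, while expanding the query to all duplicate codes finds every record.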

28 Case5: Not Unified characters in UNICODE Do you think these two characters are the same or not? UNICODE: 5618; UNICODE: 5653

29 Case5: Not Unified characters in UNICODE The result of searching “uso” on Shift_JIS OPAC.

30 Case5: Not Unified characters in UNICODE The same search on Unicode OPAC. The result does not seem correct.

31 Case5: Not Unified Characters in UNICODE When the other “uso” is input by picking it from the code table, the result is the same as on Shift_JIS OPAC.

32 Case5: Not Unified Characters in UNICODE
Diagram columns: SJIS | EACC | UNICODE
UNICODE: 5618; EACC: 21373B; UNICODE: 5653; SJIS: 8952
NOT HIT!

33 Case5: Not Unified Characters in UNICODE
Diagram columns: SJIS | EACC | UNICODE
UNICODE: 5618; EACC: 21373B; UNICODE: 5653; SJIS: 8952
This 5618 should be normalized as 5653 in searching.
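
The normalization proposed for Case 5 can be sketched as a search-key fold applied to both the query and the indexed text. The table below holds only the one pair named on the slide; the names `VARIANT_FOLD` and `search_key` are hypothetical, and a real table would need many more entries.

```python
# Fold table for characters that Unicode does not unify.
# Only the pair from the slide is listed: U+5618 is folded to U+5653.
VARIANT_FOLD = str.maketrans({"\u5618": "\u5653"})


def search_key(text):
    """Return the text with known variant characters folded to one form."""
    return text.translate(VARIANT_FOLD)


query_a = "\u5618"  # one form of "uso"
query_b = "\u5653"  # the other form of "uso"

# Both forms produce the same key, so either query matches either record.
print(search_key(query_a) == search_key(query_b))
```

A custom table is assumed here because the standard Unicode normalization forms leave these two code points distinct.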

34 Normalization issue Some special characters are ignored when searching on Unicode OPAC. In this sample, “Cho-on”, the Japanese prolonged sound symbol, does not work. The search term is “Harry Potter” in Katakana.
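
The slide does not say why the prolonged sound symbol is ignored. The sketch below shows one plausible mechanism, offered only as an assumption: a search-key filter that keeps characters of Unicode general category Lo would drop U+30FC, because that character is classified as Lm. Both function names are hypothetical.

```python
import unicodedata


def naive_key(text):
    """Keep only characters of general category Lo (hypothetical filter)."""
    return "".join(ch for ch in text if unicodedata.category(ch) == "Lo")


def safer_key(text):
    """Keep every letter category (Lo, Lm, ...), which retains U+30FC."""
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))


HARRY = "\u30cf\u30ea\u30fc"  # "Harry" in Katakana, ending with the Cho-on mark

print(unicodedata.category("\u30fc"))  # category of the Cho-on mark
print(len(naive_key(HARRY)), len(safer_key(HARRY)))
```

Whatever the real cause, a normalization scheme for Japanese searching has to treat the Cho-on mark as significant rather than as ignorable punctuation.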

35 Example of NOT unified characters (Case5) Unicode: 6236, 6237, 6238

36 Related Documents & Information
- The Library of Congress Homepage: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media -- CHARACTER SETS: Part 3 -- Code Table 9: EAST ASIAN (June 16, 2003)
- The Unicode Standard Version 3.0. The Unicode Consortium. ISBN (Version 4.0 released now)
- OCLC CJK and its contents in HELP

37 Unicode OPAC in Japan
- University of Tokyo: Multilingual OPAC, the University of Tokyo
- National Diet Library: NDL Asian Language Materials OPAC

38 Thank you!! The Best Solution: Unicode + normalization scheme