Health Information Standardization and Asian Languages Michio Kimura M.D. Ph.D. Director and Professor of Medical Informatics Department Hamamatsu University.

Presentation transcript:

Health Information Standardization and Asian Languages Michio Kimura M.D. Ph.D. Director and Professor of Medical Informatics Department Hamamatsu University School of Medicine HL7 Japan chair

Three types of representation -- We have 2 patient names in HIS
- Alphabetic
- Ideographic
- Phonetic
  - Ideographic names
    - have many ways to pronounce
    - are difficult to sort

Multi-Byte Character Codes in Use in Asia
- Korea: KS X 1001, and 1001 annex 3
  - Hanguls (phonetic) and ideographics
- China (PR): GB
- Taiwan (ROC): CNS 11643, and Big-5
- Japan: JIS X
  - Katakana, Hiragana (phonetic) and ideographics
  - Junior school pupils must read/write 810 letters.
- Varieties: 6879 (JIS) to 48711 (CNS)

ISO Multi-Byte Extension Technique
- Base set is usually 1-byte ASCII (ISO 646)
- Defines ESCAPE sequences to set a character set to G0 or G2
  - Not necessarily multi-byte; to set ISO 8859-1: ESC . A
  - If the set is 2-byte, it is assumed that the following codes are recognized as 2 bytes each.
  - To set JIS X 0208: ESC $ B
  - To set KS C 5601: ESC $ ( C
  - To set GB 2312: ESC $ A
  - To come back to ASCII: ESC ( B
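The escape-sequence switching described above can be illustrated with a minimal Python sketch. It assumes a 7-bit stream that starts in ASCII and only handles the G0 designations listed on the slide (the G2 designation for ISO 8859-1 is left out). The names `ESCAPES` and `split_segments` are hypothetical, and the sample string is made up for illustration.

```python
# G0 escape sequences as listed on the slide (ESC is byte 0x1B).
ESCAPES = {
    b"\x1b$B": "JIS X 0208 (2-byte)",
    b"\x1b$(C": "KS C 5601 (2-byte)",
    b"\x1b$A": "GB 2312 (2-byte)",
    b"\x1b(B": "ASCII (1-byte)",
}


def split_segments(data: bytes):
    """Split a 7-bit ISO 2022 stream into (charset label, raw bytes) pairs.

    Assumption: the stream starts in ASCII and uses only the escape
    sequences in ESCAPES; anything else raises ValueError.
    """
    segments = []
    current = "ASCII (1-byte)"
    start = 0
    i = 0
    while i < len(data):
        if data[i] != 0x1B:
            i += 1
            continue
        for seq, label in ESCAPES.items():
            if data.startswith(seq, i):
                if i > start:
                    segments.append((current, data[start:i]))
                current = label
                i += len(seq)
                start = i
                break
        else:
            raise ValueError(f"unknown escape sequence at offset {i}")
    if start < len(data):
        segments.append((current, data[start:]))
    return segments


# Python's built-in ISO-2022-JP codec produces the escape sequences for us.
encoded = "Name: 木村".encode("iso2022_jp")
for label, raw in split_segments(encoded):
    print(label, raw)
```

The first segment is plain ASCII; the second holds two bytes per character, bracketed in the stream by ESC $ B and ESC ( B.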

Byte-wise Representation of ISO2022

RFC 1468: Japanese Character Encoding for Internet Messages
- ISO-2022-JP
- Within 7-bit, safe for most nodes
- Every line starts/ends with ASCII
  - No carryover shifting
- ISO-2022-KR is also used in Korea
- Same method is in DICOM (Supplement 9) and HL7 v.2.3.1
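The two properties claimed for ISO-2022-JP (7-bit only, and every line starting and ending in ASCII) can be checked with a short sketch using Python's built-in `iso2022_jp` codec. The helper names and the sample text are hypothetical.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. is safe for 7-bit mail nodes."""
    return all(b < 0x80 for b in data)


def decode_line_by_line(data: bytes) -> list[str]:
    """Decode each line independently.

    This only works because no shift state is carried across a line break:
    each line starts in ASCII and returns to ASCII before the newline.
    """
    return [line.decode("iso2022_jp") for line in data.split(b"\n")]


text = "患者: 木村\nWard 3\n"  # made-up sample message
encoded = text.encode("iso2022_jp")

print(is_seven_bit(encoded))
print(decode_line_by_line(encoded))
```

Splitting on the newline byte is safe here because the two-byte codes only use byte values 0x21 to 0x7E, so a newline never appears inside a character.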

UNICODE: ISO10646
- “Allocating 2 bytes for every character, UNICODE can represent every character in the world without any status nor shifting technique.”
- 16 bits = 65,536
  - -> CJK unified ideographics

CJK Unified Ideographics

Why we do not use UNICODE as Message?
(I know it is used inside, but we do not like it to go outside as a message format.)
- If Chinese “Bone” and our “Bone” are to be recognized as the same, because of symmetry, how about using these?
- UNICODE consortium says “Introduction of Language information”.
  - We cannot write a “Chinese language textbook written in Japanese”.
  - We cannot accommodate Koreans living in Japan with their name written properly in Korean letters, while their address is in Japanese, of course.
  - Original UNICODE dream is gone.
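The unification of “Bone” can be observed directly. The sketch below, using only Python's standard codecs, converts the character from a Chinese legacy encoding and from a Japanese legacy encoding into Unicode and shows that both arrive at the same code point, so the information about which language the text came from is no longer in the character data. The function name `to_unicode` is hypothetical.

```python
import unicodedata

BONE = "\u9aa8"  # the "Bone" ideograph

def to_unicode(raw: bytes, legacy_codec: str) -> str:
    """Decode bytes from a national legacy encoding into a Unicode string."""
    return raw.decode(legacy_codec)

chinese_bytes = BONE.encode("gb2312")   # bytes as stored by a GB 2312 system
japanese_bytes = BONE.encode("euc_jp")  # bytes as stored by a JIS X 0208 system

from_chinese = to_unicode(chinese_bytes, "gb2312")
from_japanese = to_unicode(japanese_bytes, "euc_jp")

print(chinese_bytes != japanese_bytes)   # the legacy byte values differ
print(from_chinese == from_japanese)     # the Unicode code point is shared
print(unicodedata.name(from_chinese))
```

Which glyph shape is drawn for that code point is then decided by the font or by language information carried outside the character code.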

UTF-8: Transformation format of UNICODE
- UNICODE is originally 2 bytes for every character.
- 0000-007F: 0xxxxxxx
- 0080-07FF: 110xxxxx 10xxxxxx
- 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
- 1 byte: ASCII
- 2 bytes: Latin extensions, Greek, Russian, Arabic, etc.
- 3 bytes: Thai, Hangul, Katakana, Hiragana, CJK ideographics
- ASCII characters stay compatible with ASCII, so ASCII users can say “we are universal, because we use UNICODE,” at the expense of ideographic users.
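The bit patterns for code points up to FFFF can be written out as a minimal sketch and compared with Python's built-in UTF-8 encoder. The function name `utf8_encode_bmp` is hypothetical, and the sketch deliberately covers only the 16-bit range discussed on the slide.

```python
def utf8_encode_bmp(code_point: int) -> bytes:
    """Encode one code point in the range 0000-FFFF using the UTF-8 bit patterns."""
    if code_point < 0x80:
        # 0000-007F: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 0080-07FF: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("only 16-bit code points are handled in this sketch")


# One sample each: ASCII, Latin extension, Hiragana, Hangul, CJK ideograph.
for ch in "A\u00e9\u304b\ud55c\u9aa8":
    encoded = utf8_encode_bmp(ord(ch))
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

Printing the byte counts makes the cost difference between ASCII text and East Asian text visible for each sample character.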


HL7 Japan’s answer to HL7 v.3
- In XML, UNICODE will be default in
- Even in UNICODE v3.1, the “over-unification” problem is not solved.
- But with XML schema and XML namespace, font information can be set in each tag.
  - By this, a Korean name in a Japanese address can be described.
- Original UNICODE dream (all languages at the same time) is gone, but “many 1 byte languages + one 2 byte language” is not bad.
  - Pokémon
- Answer: “UNICODE can be default, provided that we can continue to use each local practice now being used.”
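One possible way to tag each element with its language is the standard `xml:lang` attribute. The slides do not say which mechanism HL7 Japan had in mind, so the following is only a sketch under that assumption; the element names `patient`, `name` and `address`, the helper `build_patient`, and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Qualified name of the standard xml:lang attribute in ElementTree notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_patient(name: str, name_lang: str,
                  address: str, address_lang: str) -> ET.Element:
    """Build a record whose name and address carry separate language tags."""
    patient = ET.Element("patient")
    name_el = ET.SubElement(patient, "name", {XML_LANG: name_lang})
    name_el.text = name
    address_el = ET.SubElement(patient, "address", {XML_LANG: address_lang})
    address_el.text = address
    return patient


# Made-up sample: a Korean name with a Japanese address.
patient = build_patient("김철수", "ko", "浜松市半田町", "ja")
xml_text = ET.tostring(patient, encoding="unicode")
print(xml_text)
```

A receiving system could read the attribute on each element to choose a suitable font, which is how unified ideographs in the name and the address could be rendered according to their own language.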

Language representation is not the only issue
- Language used in:
  - Conversation with patients
  - School education
    - Medical, Nurse, Technicians
  - Medical record
    - Signs and symptoms
    - Reports
- Structure of data types
  - Address
    - 250 Wu-Hsing street
    - Handa cho

Final Remarks
- Some OS (Windows NT 4.0 or later) are using UNICODE inside.
- I do not blame their ignorance, maybe they just didn’t know.
- I oppose any proposals with “UNICODE is the only way”.
- When using UNICODE, pay attention to each language’s proper fonts.
- Let’s collaborate and agree on XML namespace for language to be used, and submit to standards.
- Please take part in APAMI census for healthcare languages.