OUTREACHING WITH PRINT RESOURCES IN THE DIGITAL AGE Jidong Yang University of Michigan.

Problems with CJK encodings
Chinese encodings have expanded slowly: from GB 2312 (about 6,700 characters) and Big 5 (13,000 characters) to GBK (22,000 characters), GB (27,000 characters), GB (more than 70,000 characters) and Unicode Version 5 (similar to GB ). Not all computers have all the characters. Many existing databases are built on earlier encodings. The mainstream Japanese encodings, JIS and EUC, each have fewer than 7,000 kanji.
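
The coverage gap between encodings can be checked directly. The following is a minimal sketch, assuming Python's built-in codecs (`gb2312`, `gbk`, `gb18030`, `big5`) are an acceptable stand-in for the encodings named on the slide; the helper name `encodable` and the sample characters are illustrative choices, not part of the original presentation.

```python
# Sketch: test whether a single character can be represented in a given
# legacy encoding, using Python's built-in codecs as stand-ins.

def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be encoded with `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Illustrative sample characters (assumptions, chosen for this sketch):
SAMPLES = {
    "common character": "\u4e2d",              # U+4E2D
    "traditional form": "\u570b",              # U+570B
    "CJK Extension B character": "\U00020000",  # U+20000
}

CODECS = ["gb2312", "gbk", "gb18030", "big5"]

for label, ch in SAMPLES.items():
    supported = [c for c in CODECS if encodable(ch, c)]
    print(f"{label} (U+{ord(ch):04X}): {supported}")
```

A database built on one of the earlier encodings cannot store a character that the encoding lacks, so such characters are typically replaced or dropped at data-entry time, and no later search can recover them.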

The issue of OCR accuracy
Even with contemporary Chinese and Japanese publications in good condition, the best OCR software can hardly achieve an accuracy rate better than 95%. With pre-modern CJK texts, OCR accuracy drops to 30-40% or even lower. Many database companies keep their OCR accuracy rates secret.
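
To make these percentages concrete, here is a minimal sketch of one common way to score OCR output: character accuracy computed from the edit distance against a proofread reference text. The slide does not say how its accuracy figures were measured, so this definition is an assumption, and the function names are hypothetical.

```python
# Sketch: character-level OCR accuracy via Levenshtein edit distance.
# Assumption: accuracy = 1 - (edit distance / reference length).

def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def char_accuracy(reference: str, ocr_output: str) -> float:
    """Share of the reference text that the OCR output got right."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))

def expected_errors(num_chars: int, accuracy: float) -> int:
    """Rough number of wrong characters in a text of `num_chars`."""
    return round(num_chars * (1 - accuracy))

# A 1,000-character page at the accuracy rates mentioned on the slide:
for rate in (0.95, 0.40, 0.30):
    print(f"{rate:.0%} accuracy: about {expected_errors(1000, rate)} errors")
```

Under this definition, even a 95% rate leaves dozens of wrong characters on a 1,000-character page, and each one is a potential missed hit in a full-text search.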

The early stage of digital scholarship
New research methods and tools suited to digital resources are still rare and need to be invented. A great number of research tools in print format retain their value, at least for now.

Databases vs. print indexes
How do you find information about Kumārajīva in the Gaoseng zhuan? Searching for Jiumoluoshi is not enough. Also try Jiumoluoqipo, Shi, Shigong, Shishi, Tongshou, and Luoshi. All of these can be found in Ryō kōsō den sakuin, compiled by Makita Tairyō. Databases are not necessarily better than print indexes.
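
The point about name variants can be illustrated with a small sketch: a full-text search finds only the strings it is given, so the list of variants has to come from somewhere, and here it comes from the print index. The alias list is taken from the slide; the function name `find_mentions` and the sample passages are hypothetical placeholders, not quotations from the Gaoseng zhuan.

```python
import re

# Name variants for Kumārajīva, as listed on the slide (romanized).
ALIASES = ["Jiumoluoshi", "Jiumoluoqipo", "Shi", "Shigong",
           "Shishi", "Tongshou", "Luoshi"]

def find_mentions(passages, names):
    """Return the indexes of passages containing any of `names`
    as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [i for i, text in enumerate(passages) if pattern.search(text)]

# Hypothetical placeholder passages, for illustration only.
passages = [
    "Jiumoluoshi arrived in the capital.",
    "Tongshou translated the text.",
    "Shigong lectured to the assembly.",
    "An unrelated monk is described here.",
]

print(find_mentions(passages, ["Jiumoluoshi"]))  # single-name search
print(find_mentions(passages, ALIASES))          # search with all variants
```

The single-name search misses every passage that uses another form of the name. Whole-word matching is used because a short variant such as Shi would otherwise match inside unrelated romanized words.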

Conclusion
The computer still cannot match the book in presenting the full range of East Asian languages and cultures. Print resources are still necessary for most serious research on East Asia. It is our job to make the value of our print collections known to our patrons.