The Creation of "Yaolan.com": A Site for Pre-natal and Parenting Education in Chinese
by James Caldwell, DAE Interactive Marketing, a Web Connection Company, chinadotcom corporation
© Copyright 2000 Unicode, Inc. and DAE Interactive Marketing, Inc.
17th International Unicode Conference San Jose, CA September
A Monolingual Chinese Web Site: Benefits from Unicode!
We can add more language versions in the future.
We can mix platforms and applications, each with its own language preferences.
Mixing and Matching
We built the site on an English NT Server to facilitate software/hardware maintenance by the client's English-speaking engineers.
The software tools each used different language settings, locales, and sorting algorithms, e.g., Microsoft Internet Information Server, Microsoft Index Server, Site Server, and Commerce Server.
Part I -- Setting Defaults
Installation: Unicode was not an option for the default character encoding or sort order of any system or application we used.
ISO , the default encoding, was the best compromise:
– It matches Unicode within its range.
– NT4 maps it to Unicode for internal processing.
– Data Transformation Services (DTS) on the system will be predictable -- defaulting either to ISO for non-Unicode data or to Unicode for Unicode data.
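The first point in the list can be checked directly. The following is a minimal sketch in Python, assuming the default encoding referred to here is ISO 8859-1 (Latin-1). The slide text does not give the number in full, so that choice is an assumption, as are the function and variable names.

```python
# Assumption: the "ISO" default encoding is ISO 8859-1 (Latin-1).
# Each Latin-1 byte value decodes to the Unicode code point with the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
same_values = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))


def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be stored as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False
```

Western European text passes `fits_latin1`, while Chinese text does not, which is why the Chinese data needs Unicode column types rather than the installation default.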
Data Table Settings
We stored all Chinese data in SQL Server 7.
We set Chinese data table columns to nchar, nvarchar, or ntext -- the Unicode data types.
We created matching index keys for English and Chinese.
Data Indexing
Index table lookups by English for programming ease and quality checking.
Index full-text fields to search in Unicode with a binary or a Chinese sort order.
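The pairing of English lookup keys with Chinese text can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite module as a stand-in for SQL Server 7, with a Unicode TEXT column playing the role of nvarchar. The table name, column names, and sample rows are hypothetical.

```python
import sqlite3
from typing import Optional

# Stand-in for the SQL Server tables; SQLite TEXT columns hold Unicode text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topic ("
    " key_en TEXT PRIMARY KEY,"  # English lookup key used by the program code
    " label_zh TEXT NOT NULL)"   # Chinese text shown to the user
)
conn.executemany(
    "INSERT INTO topic (key_en, label_zh) VALUES (?, ?)",
    [("nutrition", "营养"), ("sleep", "睡眠"), ("health", "健康")],
)


def chinese_label(key_en: str) -> Optional[str]:
    """Look up the Chinese label by its English key."""
    row = conn.execute(
        "SELECT label_zh FROM topic WHERE key_en = ?", (key_en,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the key in English means program code and quality checks never have to type or compare Chinese strings; only the stored label is Chinese.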
ASP Form Options
Forms with pull-down menus were programmed to show the user Chinese but to set query values in English to match the English indexes.
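The idea can be sketched in Python rather than ASP. The menu entries and the `render_select` helper below are hypothetical and only show the shape of the markup: the option value is the English key and the visible label is Chinese.

```python
from html import escape
from typing import List, Tuple

# Hypothetical menu entries: (English query value, Chinese display text).
MENU: List[Tuple[str, str]] = [
    ("nutrition", "营养"),
    ("sleep", "睡眠"),
    ("health", "健康"),
]


def render_select(name: str, options: List[Tuple[str, str]]) -> str:
    """Build a <select> with English values and Chinese labels."""
    lines = [f'<select name="{escape(name)}">']
    for value, label in options:
        lines.append(f'  <option value="{escape(value)}">{escape(label)}</option>')
    lines.append("</select>")
    return "\n".join(lines)


html_menu = render_select("topic", MENU)
```

Because the submitted value is plain ASCII, the query reaches the server intact whatever encoding the browser uses for the page itself.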
ASP Menu Display
Part II -- Cataloging and Indexing Text and Data
Index and Search Tools
IIS: Internet Information Server and Microsoft Search Services use the default encoding and sort order as installed for all full-text file searches. It is not possible to index, sort, or search two languages on one server without Unicode!
T-SQL (Transact-SQL) within SQL Server indexes char and varchar columns according to the default (installation) language/encoding settings.
Site Server 3.0's Search Catalog feature builds upon the above.
Therefore, we built parallel (linked) indexes in English and Chinese for all keywords and specified a Chinese sort order within full-text searches of explicitly defined Unicode ntext fields.
Result: Because we stored Chinese in Unicode, we can submit a query in English (or Chinese) and get results back in Chinese, sorted in Chinese sort order.
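The difference between a binary sort and a Chinese sort order can be seen with a few characters. In the Python sketch below, the "Chinese" order is approximated by comparing GB2312 byte values, which follow pinyin for the common (level 1) characters. That is an illustrative stand-in chosen for this example, not the collation SQL Server applies, and the sample results are hypothetical.

```python
# Hypothetical search results, already held as Unicode strings.
results = ["大", "车", "白", "安"]

# Binary order: compare by Unicode code point.
binary_order = sorted(results)


def gb2312_key(text: str) -> bytes:
    """Sort key based on GB2312 bytes (pinyin order for common characters)."""
    return text.encode("gb2312")


# Approximate Chinese (pinyin) order via the GB2312 key.
chinese_order = sorted(results, key=gb2312_key)
```

Here the binary order is 大, 安, 白, 车, while the GB2312 key gives 安, 白, 车, 大 (an, bai, che, da). The stored data is the same Unicode text in both cases; only the comparison rule changes.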
Part III -- Data Translation between the Web Page and the SQL Database (round trip)
Since NT 4.0 does everything internally in Unicode, we might expect to have a plug and play solution.
We can turn on Autotranslation so that client requests are translated to Unicode for processing and translated back for display to the client.
However, it will become a plug and pray solution unless we know exactly what is happening at every stage in the process.
Data Translation Issues -- continued
This means discovering the client's browser settings and setting a session code page before autotranslating to or from Unicode.
We can assume that Chinese users in China will be using GB , or can switch to it themselves, but that assumption is often not valid.
SQL Server should store no more than one encoding (plus Unicode).
MSDE (Microsoft Data Engine), using ODBC, will translate client data to Unicode before processing a query. The client, in this case, is IIS.
The IIS (Web Server) forms must get input data from the client browser, unless the input comes from a drop-down menu.
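What goes wrong when the assumed browser encoding is not the real one can be shown in a few lines of Python. The encodings here (Big5 sent, GB2312 assumed) are chosen for illustration and are not taken from the project.

```python
# Two characters as a browser set to Big5 would send them.
original = "中文"
sent_bytes = original.encode("big5")

# Wrong assumption: treat the incoming bytes as GB2312.
# errors="replace" keeps the sketch from raising on invalid byte pairs.
misread = sent_bytes.decode("gb2312", errors="replace")

# Correct handling: decode with the encoding the browser actually used.
recovered = sent_bytes.decode("big5")
```

The misread text is stored as valid Unicode, so nothing fails at the database; the damage only shows when the user sees the wrong characters.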
Data Translation Issues -- continued
The web forms must know the character encoding used by the client's browser in order to accept data in the proper format, convert it to Unicode, process the query, sort the results according to the client's preferences, and display the result in the encoding and locale of the client's browser.
The more explicit you can make each stage, the more pleasing the results. Any unknowns will eventually produce unexpected results.
Conclusion: Use Unicode data types with one other default locale. Use, but do not depend upon, plug and play automatic data transformation services to handle the necessary conversions.
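The round trip described above can be sketched with every stage made explicit. This Python outline is an assumption-laden illustration, not the site's ASP code: the `round_trip` function, the sample records, and the use of plain code point order as the sort are all hypothetical.

```python
from typing import List


def round_trip(raw_query: bytes, client_charset: str, records: List[str]) -> bytes:
    """Decode the request, filter and sort in Unicode, encode the reply."""
    # Stage 1: client bytes -> Unicode, failing loudly on a wrong charset.
    query = raw_query.decode(client_charset, errors="strict")
    # Stage 2: process the query on Unicode strings only.
    hits = [record for record in records if query in record]
    # Stage 3: sort; code point order is a placeholder for a locale-aware sort.
    hits.sort()
    # Stage 4: Unicode -> the client's own encoding for display.
    return "\n".join(hits).encode(client_charset, errors="strict")


RECORDS = ["婴儿营养", "孕期营养", "婴儿睡眠"]  # hypothetical stored text
reply = round_trip("营养".encode("gb2312"), "gb2312", RECORDS)
```

Using strict error handling at both ends means a wrong charset guess usually surfaces as an exception at a known stage instead of as silently corrupted text.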
Conclusions from the Creation of Yaolan.com
The most obvious value of Unicode on the WWW is confirmed: it lets you display multiple languages at once and easily add new languages to old web sites without reprogramming everything.
If your web solution can be 100% Unicode compliant and globalized (using I18N rules) from end to end, then you could simply translate, localize, and publish a new language version.
In today's world, when we cannot be 100% global, Unicode will help to integrate your back-end tools and services to ensure that your translated, localized content will not fail.