18th International Unicode Conference
Documentum Proprietary
Documentum and UTF-8: Converting Content Management Software Product Line to Unicode
27 April 2001
Donald Ziff
Agenda
- What is Documentum?
- Documentum's I18N Problem
- How Unicode UTF-8 Saved the Day
- Other Success Factors
- Demo
About Documentum
- Documentum: NASDAQ DCTM
- The Leader in Web and Enterprise Content Management Solutions
- > $128M in revenue
- > 800 employees
- Over 900 Global 2000 customers with strong vertical focus
- Over 25 offices in 10+ countries
DCTM's I18N Problem
- Everyone agrees: we need I18N to fuel growth, especially in Asia
- Asian-certified product much more important than multi-lingual
  - Although demand for multi-lingual is growing…
- So why not I18N?
I18N Perception Problems
- Too Difficult: won't fit into a development cycle
- Too much Overhead: multiplies QA and Support
- Not Sexy: no new functionality
- Let's look at these problems…
I18N is too difficult
Product Layers:
- Server (built on RDBMS + Verity)
- DMCL: Client Library (C++)
- DFC: Foundation Classes (Java)
- DTC: Desktop Client, the Win32 end-user client
- WDK: Web Development Kit
- RightSite: Legacy Web-Server Integration
- Web Publisher: Web Content Management App
- Legacy clients: Workspace (Win32), Intranet
History Lesson
- Server v3.1.6.INT, created by consultants for the Japanese market, was expensive and time-consuming
  - 3.1.6.INT attempted to internationalize all the layers in the DCTM architecture at once
- 4.0 was released without the I18N changes
- When 4.1 followed, the INT deltas became hard to apply…
I18N requires too much overhead
- The DCTM server requires pharmaceutical-strength certification
- Dimensions of certification:
  - 3 RDBMS platforms: Oracle, Sybase, SQL Server
  - 4 Server OSs: NT, Solaris, HP-UX, AIX
- The INT architecture introduced new dimensions, leading us to…
Certification Hell!
- New certification dimensions:
  - 5 DCTM Server code-pages
  - 5 RDBMS code-pages
- Market requires another dimension:
  - 5 Server OS Localizations
- 125 new combinations × 12 old = 1500 certs!
- Exaggeration, of course… But still…
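The arithmetic behind the 1500 figure can be sketched as a product over the certification dimensions. This is only an illustration: the dictionary keys and the `cert_count` helper are hypothetical names, and the dimension sizes are the ones quoted on the slides.

```python
from math import prod

def cert_count(dimensions):
    """Number of combinations if every value of every dimension is certified."""
    return prod(dimensions.values())

# Dimensions that existed before the INT architecture (from the slides).
old = {"rdbms_platforms": 3, "server_os": 4}

# Dimensions the INT architecture and the market added (from the slides).
new = {"server_code_pages": 5, "rdbms_code_pages": 5, "os_localizations": 5}

print(cert_count(old))                # 12
print(cert_count(new))                # 125
print(cert_count({**old, **new}))     # 1500

# Illustration only: fixing one dimension to a single value (for example,
# one internal server code-page) divides the matrix by that dimension's size.
print(cert_count({**old, **new, "server_code_pages": 1}))  # 300
```

The last line is not a figure from the talk; it only shows why removing a dimension shrinks the matrix multiplicatively rather than additively.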
I18N not sexy
- DCTM is a growth company, needs sizzle as well as steak
- I18N grows markets, but doesn't add much to the marketing message
- To be fair: new functionality is not just sexy, it is essential to DCTM's continued survival
- Other priorities will move to the top…
DCTM's I18N Requirements
- Crucial need: support Asia from the main code-line
- One binary for the world
- Backward compatibility essential
- Multi-lingual features would be a side-benefit
- High on the wish list for a few key customers
- I18N project must be scoped down to be achievable
How UTF-8 Saved the Day
- UTF-8 moves safely through the server because anything that looks like ASCII actually is ASCII
- Standardizing on UTF-8 as the only supported internal code-page cuts down the certification matrix
Lessons from Double-Byte Experiments
- EUC-KR: 4.1 server works (basically)
- SJIS: problems! Some double-byte characters have second bytes that are ASCII: \ ` |
- Lessons:
  - Non-ASCII moves through the server safely
  - String handling need not be double-byte aware, if ASCII always means ASCII
- Solution: UTF-8!
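The difference between the three encodings can be checked directly. The sketch below uses Python's built-in codecs, not anything from the Documentum code base, and the helper name `ascii_lookalike_bytes` is made up for illustration. It reports every byte in the ASCII range that belongs to the encoding of a non-ASCII character.

```python
def ascii_lookalike_bytes(text, encoding):
    """Return (character, byte) pairs where a non-ASCII character's
    encoded form contains a byte in the ASCII range (0x00-0x7F)."""
    hits = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # real ASCII characters are supposed to be ASCII bytes
        for b in ch.encode(encoding):
            if b < 0x80:
                hits.append((ch, b))
    return hits

# Three katakana whose Shift_JIS second bytes are \ (0x5C), ` (0x60), | (0x7C).
sample = "ソチポ"

print(ascii_lookalike_bytes(sample, "shift_jis"))  # [('ソ', 92), ('チ', 96), ('ポ', 124)]
print(ascii_lookalike_bytes(sample, "utf-8"))      # []
print(ascii_lookalike_bytes("한글", "euc_kr"))      # []
```

UTF-8 returns an empty list for any input, because every byte of a multi-byte UTF-8 sequence has its high bit set. EUC-KR has the same property for its double-byte characters, which is consistent with the 4.1 server basically working with it.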
UTF-8: ASCII is ASCII
- No need for special string handling
  - Server INT replaced all standard C string handling with calls to a 3rd-party library
  - With UTF-8, we stick with the standard: yacc and other legacy tools work fine
- Greatly improved perception (and reality) of how difficult I18N would be
  - Now, it's relatively low-impact
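Why byte-oriented code keeps working can be shown with a delimiter split, the kind of operation a C routine or a generated lexer performs without knowing about character boundaries. This is a minimal sketch in Python rather than the server's C++; `split_fields` and the sample record are hypothetical.

```python
def split_fields(raw):
    """Split a byte string on the ASCII '|' byte, with no knowledge of
    the text encoding (comparable to a plain C delimiter scan)."""
    return raw.split(b"|")

record = "ポスト|ソフト"  # two fields separated by a real '|'

utf8_fields = split_fields(record.encode("utf-8"))
sjis_fields = split_fields(record.encode("shift_jis"))

# UTF-8: the only 0x7C byte is the real delimiter, so both fields survive.
print(len(utf8_fields))                          # 2
print([f.decode("utf-8") for f in utf8_fields])  # ['ポスト', 'ソフト']

# Shift_JIS: the second byte of 'ポ' is also 0x7C, so the first field is
# cut in the middle of a character and an extra field appears.
print(len(sjis_fields))                          # 3
```

With UTF-8 the unaware code gives the right answer by construction; with Shift_JIS the same code would need to be made double-byte aware.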
It's UTF-8, dummy!
- Use UTF-8 everywhere, cut down on certification dimensions
- Provides safe character-handling for Asia
- Even though multi-lingual is not a requirement
- Easier to support
Other Success Factors
- Rely on RDBMS services to translate between RDBMS code-page and UTF-8
- Market research cut back on OS localization constraints
- Transcoding infrastructure
RDBMS transcodes to/from UTF-8
- Oracle and Sybase transcode automatically; SQL Server is a problem
- No need for new transcoding calls between Server and RDBMS: lower impact
- Upgrade customers have non-Unicode RDBMS: no need for them to convert
- One less certification dimension!
Cut back on Localized OS certs
- Limit RDBMS for Asia: for 4.2, just Oracle
- Localized OS certification not necessary for Europe
Transcoding Infrastructure
- Server must be aware of interface code-pages
- Transcoding done at the interfaces
- 3rd-party transcoding used: Uniscape's GlobalC
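The idea of transcoding only at the interfaces can be sketched as a pair of boundary functions around a UTF-8 core. The sketch uses Python's built-in codecs as a stand-in for the third-party transcoding library; the function names and the choice to substitute `?` for unrepresentable characters are assumptions, not details from the talk.

```python
INTERNAL_CODE_PAGE = "utf-8"  # the single internal code-page

def to_internal(data, interface_code_page):
    """Inbound boundary: convert bytes from a client's code-page to UTF-8."""
    return data.decode(interface_code_page).encode(INTERNAL_CODE_PAGE)

def to_interface(data, interface_code_page):
    """Outbound boundary: convert UTF-8 bytes to a client's code-page.
    Characters the client code-page cannot represent become '?'
    (an assumed policy for this sketch)."""
    text = data.decode(INTERNAL_CODE_PAGE)
    return text.encode(interface_code_page, errors="replace")

# A legacy client that speaks Shift_JIS sends a string in.
from_client = "ソフト".encode("shift_jis")
stored = to_internal(from_client, "shift_jis")

# Inside the core, no byte of the stored text is in the ASCII range.
print(all(b >= 0x80 for b in stored))                 # True

# The same client gets back exactly what it sent.
print(to_interface(stored, "shift_jis") == from_client)  # True
```

Everything between the two functions sees only UTF-8, which is why the interface code-page has to be known at the boundary and nowhere else.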
New I18N Architecture
[Architecture diagram; component labels as recovered from the slide]
- RDBMS (Unicode), Verity, File System
- e-Content Server (UTF8)
- DMCL 4.2 (UTF8), DFC (Unicode), WDK (Unicode)
- Intranet Client, Administrator, Web Publisher, WorkSpace, Custom WebApp
- ARP (NCS), Web Cache, RightSite (NCS), DMCL 4.1 (NCS), Desktop Client
- Legend: Unicode; National Character Set (NCS)
Demo
- Demo: multilingual WDK
- If there's time, a quick look at the localized Desktop Client (Win32 client)
Conclusion
UTF-8 was a crucial technology in DCTM's I18N strategy:
- Provided an easy path for legacy C++
- Supported specific Asian languages consistently, minimizing certifications
- Prepared infrastructure for multi-lingual requirements