Published by Stephany Cobble. Modified over 9 years ago.
1
ICU Overview: The Open-Source Unicode Library, v3.0
Markus Scherer, ICU Manager, IBM Globalization Center of Competency
26th Internationalization and Unicode Conference, San José, CA, September 2004
2
Agenda
  Background
  What is ICU?
  Architecture Overview
  ICU Features and recent additions
  References
  Q and A
3
Why Globalization?
4
Unicode
  All world languages
  Efficient and effective processing
  Lossless data exchange
  Enables single-binary global software
  But… all languages ⇒ large, complex standard
    – 1,400 pages + Annexes + additional standards
    – 90,000+ characters
    – Major update every 3 years
    – 70 character properties, many multi-valued
    – Affects many processes: display, line-break, regex, …
5
Locales
  Features vary widely across languages & countries
    – Sorting, line breaks, date/time/number/currency formatting, codepage conversion, …
    – Performance is key: easy to do the right thing; hard to do it fast
6
What is ICU?
  Globalization / Unicode / Locales
  Mature, widely used set of C/C++ and Java libraries
    – Basis for Java 1.1 internationalization, but goes far beyond
  Very portable: identical results on all platforms / programming languages
    – C/C++: 30+ platforms/compilers
    – Java: IBM & Sun JDK
  Full threading model; customizable; modular
  Open source, but not viral
  ICU 3.0: 78 languages; 118 countries; 870 codepages
7
Who uses ICU? Products
  Within IBM
    – PSD Print Architecture, DB2, COBOL, Host Access Client, InfoPrint Manager, Informix GLS version 4.0, iSeries, Lotus Notes, Lotus Extended Search, Lotus Workplace, MQ Integrator Endeavour, NUMA-Q, OTI, Pervasive Computing WECMS, SS&S Websphere Banking Solutions, Tivoli Presentation Services, WBI Adapter/Connect/Modeler and Monitor/Solution Technology Development/WBI-Financial TePI, Websphere Application Server/Studio Workload Simulator/Transcoding Publisher, XML Parser
  Other Companies and Organizations
    – Adobe, Apple (Mac OS X), Avaya, BEA, BroadJump, Business Objects, caris, CERN, Cognos, Debian, Gentoo, HP, Inktomi, JD Edwards, Jikes, Macromedia, Mathworks, Mozilla, NCR, OpenOffice, Parrot, PayPal, Python, QNX, Rogue Wave, SAP, Siebel, SIL, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Virage, webMethods, Wine, Leica Geosystems GIS & Mapping, LLC.
8
ICU Features
  Unicode text handling
  Charset conversions (870+)
  Collation & Searching
  Locales (170+)
  Resource Bundles
  Calendar & Time zones
  Complex-text layout engine
  Unicode Regular Expressions
  Breaks: word, line, …
  Formatting
    – Date & time
    – Messages
    – Numbers & currencies
  Transforms
    – Normalization
    – Casing
    – Transliterations
9
Architecture Overview 1
  Locale Based Services
    – Locale is an identifier, not a container
    – Keywords for variants: de@collation=phonebook
  Resource inheritance: shared resources
    [Diagram: locale tree rooted at "root", with levels labeled Language, Script, Country. Languages: en, de, zh. Scripts under zh: Hant, Hans. Countries: US, IE (under en); DE, CH (under de); TW, CN, TW, CN (under zh).]
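The resource inheritance described on this slide can be sketched as a lookup that walks from the most specific locale ID toward root. This is a minimal illustration in plain Java, not ICU's implementation: the class name, the in-memory `bundles` map, and the sample strings are all hypothetical, and keywords after `@` are simply dropped from the chain here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of locale resource inheritance; not ICU code. */
final class LocaleFallbackSketch {

    /** Returns the lookup chain for an ID, e.g. "de_CH" gives [de_CH, de, root]. */
    static List<String> fallbackChain(String localeId) {
        // Assumption: keywords such as "@collation=phonebook" pick a variant
        // and do not take part in the inheritance chain.
        int at = localeId.indexOf('@');
        String id = at >= 0 ? localeId.substring(0, at) : localeId;

        List<String> chain = new ArrayList<>();
        while (!id.isEmpty() && !id.equals("root")) {
            chain.add(id);
            int cut = id.lastIndexOf('_');
            id = cut >= 0 ? id.substring(0, cut) : "";
        }
        chain.add("root");
        return chain;
    }

    /** Finds a key in the most specific bundle that defines it, or null. */
    static String lookup(Map<String, Map<String, String>> bundles,
                         String localeId, String key) {
        for (String id : fallbackChain(localeId)) {
            Map<String, String> bundle = bundles.get(id);
            if (bundle != null && bundle.containsKey(key)) {
                return bundle.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample data: only the differing entries live in child bundles.
        Map<String, Map<String, String>> bundles = new HashMap<>();
        bundles.put("root", Map.of("greeting", "Hello", "ok", "OK"));
        bundles.put("de", Map.of("greeting", "Hallo"));
        System.out.println(fallbackChain("zh_Hant_TW"));
        System.out.println(lookup(bundles, "de_CH", "greeting"));
        System.out.println(lookup(bundles, "de_CH", "ok"));
    }
}
```

Because child bundles store only what differs from their parent, shared entries exist once, in the nearest common ancestor.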
10
Architecture Overview 2
  Open and Close Service Model
    – Better performance by avoiding setup costs per operation
    – Warning: use properly for maximum performance
  ICU Threading Model
    – Multiple versions in use simultaneously
    – Large resources shared in read-only cache
11
Architecture Overview 3
  Data Driven Services
    – Customize at build-time or run-time
    – Interchange with other platforms; same results on each
    – Rule-based: Collation, Word-breaks, Transforms
    – Pattern-based: Formats, UnicodeSet
    – Table-based: Character Conversion
12
Architecture Overview – ICU4C
  Simple Error Handling
    – C++ subset for portability
    – Support for multi-threaded environment
  Version Management
    – Multiple versions at the same time
    – Data and library versioning
  String Buffer Management
    – Preflighting and overflow protection
  Misc: Load/Unload ICU
  Recent Additions:
    – Runtime-settable memory allocation and mutex functions
13
Architecture Overview – ICU4J
  Supplement for Java
  Core globalization (no char. conversion, no GUI components)
    – We do supply complex text support for Sun
  Modularized: products may add just needed functionality
14
ICU4J vs. JDK
  CLDR 1.1 (Common Locale Data Repository)
  Up-to-date globalization: standards-compliant; latest Unicode
    – Supplementary characters (GB 18030, JIS X 0213, HKSCS)
    – Full properties; JDK has only a fraction
    – Local calendars (Thailand, Japan, …); ISO dates
    – Currencies, String Search, Int’l Domain Names
    – Transforms: Case, Scripts, Normalization
  Much faster turn-around on bug-fixes, enhancements
15
Unicode Text Handling
  C
    – UChar*: null-terminated or with length
  C++
    – UnicodeString: full featured string class
  Java
    – Uses normal JDK String, adds utilities
  All handle supplementary characters
    – Required for GB 18030/JIS X 0213/HKSCS repertoires
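Since the Java side works on the normal JDK String, handling supplementary characters means stepping by code point instead of by UTF-16 code unit. The sketch below uses only JDK methods; the class and method names are made up for illustration, and ICU4J's own utilities are not shown.

```java
/** Hypothetical sketch: code-point-aware handling of a JDK String. */
final class SupplementarySketch {

    /** Counts code points; a surrogate pair counts as one character. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Reverses by code point so that surrogate pairs stay intact. */
    static String reverseByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp); // 2 for supplementary, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+20000 is a supplementary CJK ideograph, stored as two code units.
        String s = "a\uD840\uDC00b";
        System.out.println(s.length());
        System.out.println(codePointLength(s));
        System.out.println(reverseByCodePoint(s).equals("b\uD840\uDC00a"));
    }
}
```

A naive char-by-char reversal would swap the two surrogates and produce ill-formed UTF-16, which is the kind of bug code-point iteration avoids.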
16
Unicode Text Handling 2
  All Unicode 4.0 properties
    – Direct API
        Values, names, enumerations
    – UnicodeSet
        Fast, compact set operations
        Pattern-based (both Perl & POSIX syntax for properties)
          – \p{greek} vs. [:greek:]
        All properties:
          – [\p{lowercase}-[a-z]]
          – [\p{greek} & \p{uppercase}]
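The two set expressions on this slide, a difference and an intersection of property sets, can be approximated with JDK regular-expression character classes. This is a stand-in, not UnicodeSet itself: java.util.regex writes intersection as `&&` and has no set-difference operator, so the difference is expressed as an intersection with a negated class. The class and field names are illustrative.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: property-based character sets via java.util.regex. */
final class PropertySetSketch {

    // Approximates [\p{lowercase}-[a-z]]: lowercase characters except ASCII a-z.
    static final Pattern LOWER_NON_ASCII =
            Pattern.compile("[\\p{IsLowercase}&&[^a-z]]");

    // Approximates [\p{greek} & \p{uppercase}]: uppercase characters in Greek script.
    static final Pattern GREEK_UPPER =
            Pattern.compile("[\\p{IsGreek}&&\\p{IsUppercase}]");

    /** True if the whole string is a single member of the set. */
    static boolean contains(Pattern set, String ch) {
        return set.matcher(ch).matches();
    }

    public static void main(String[] args) {
        System.out.println(contains(LOWER_NON_ASCII, "\u03B1")); // Greek small alpha
        System.out.println(contains(LOWER_NON_ASCII, "a"));
        System.out.println(contains(GREEK_UPPER, "\u03A9"));     // Greek capital omega
        System.out.println(contains(GREEK_UPPER, "\u03C9"));     // Greek small omega
    }
}
```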
17
Data: Recent Additions
  Conforms to CLDR 1.1
    – 50% more data than CLDR 1.0: adds many translated terms for languages, scripts, countries, currencies, and time zones
    – Improved collation for Eastern Europe, Chinese pinyin
  Reduced multiplatform install image size
  Improved XLIFF-ICU conversion tools
  Locale canonicalization spec defined and implemented (C+J)
    – Provides interoperability with POSIX and .NET locale IDs, more RFC 3066 support
18
Character Set Conversion
  Precise alias information:
    – When you ask for “SJIS”, you can request the precise definition by platform: windows, ibm, solaris, …
  Buffer management
    – Automatically handles characters that cross buffers
  Customizations allowed for:
    – illegal sequences
    – undefined characters
  Unicode Text Compression
    – SCSU, BOCU
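The idea of customizing how a converter treats illegal sequences and undefined characters can be illustrated with the JDK's own charset decoder, used here as a stand-in for an ICU converter. The sketch decodes UTF-8 and lets the caller choose between reporting and replacing bad input; the class and method names are made up, and ICU's callback mechanism is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: configurable error handling during conversion. */
final class ConversionSketch {

    /**
     * Decodes UTF-8 bytes. With REPLACE, bad input becomes U+FFFD;
     * with REPORT, bad input raises IllegalArgumentException.
     */
    static String decodeUtf8(byte[] bytes, CodingErrorAction onError) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(onError)        // illegal byte sequences
                .onUnmappableCharacter(onError);  // characters with no mapping
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 0x41, (byte) 0xFF, 0x42 }; // 0xFF never occurs in UTF-8
        System.out.println(decodeUtf8(bad, CodingErrorAction.REPLACE));
        try {
            decodeUtf8(bad, CodingErrorAction.REPORT);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Which policy fits depends on the caller: a display path usually prefers a visible replacement character, while a data-import path usually prefers to fail early.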
19
Collation and Searching
  Fast international comparison and string search; fully UCA compliant
    – Compressed sort keys, optimized string comparison, sublinear string search
    – Incremental sortkeys for radix-sort
  Precise binary sortkey stability over time
  Fully data driven
  API / rule customizations
    – strength, normalization, upper vs. lowercase first, ignore punctuation, …
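Strength settings and sort keys can be tried out with the JDK's java.text.Collator, which the earlier "What is ICU?" slide describes as derived from ICU. The sketch below is a JDK-only stand-in with made-up class and method names; it shows a strength customization and sorting by precomputed keys, and it does not exercise ICU-specific features such as compressed or incremental sort keys.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: collation strength and sort keys with the JDK. */
final class CollationSketch {

    /** Creates a collator once so it can be reused for many comparisons. */
    static Collator collator(Locale locale, int strength) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(strength);
        return c;
    }

    /** Sorts by precomputed keys, so each string is analyzed only once. */
    static String[] sortWithKeys(Collator c, String... words) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = c.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // key comparison is a plain binary comparison
        String[] out = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = keys[i].getSourceString();
        }
        return out;
    }

    public static void main(String[] args) {
        Collator primary = collator(Locale.ENGLISH, Collator.PRIMARY);
        Collator tertiary = collator(Locale.ENGLISH, Collator.TERTIARY);
        System.out.println(primary.compare("a", "A") == 0);  // case ignored
        System.out.println(tertiary.compare("a", "A") == 0); // case matters
        System.out.println(Arrays.toString(
                sortWithKeys(tertiary, "cherry", "Banana", "apple")));
    }
}
```

Sort keys pay off when the same strings are compared many times, for example in a database index; for a one-off comparison, calling `compare` directly is usually cheaper.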
20
Collation and Searching: Recent Additions
  Numeric sorting: sequences of digits can be sorted numerically instead of alphabetically
    – e.g., filenames would sort "ab-2" < "ab-10"
    – without material performance cost
    – with reduced sortkey length
  Significantly improved sorting orders for many other languages
  Data in separate tree, for easier modularization and maintenance
  getFunctionalEquivalent API allows for better caching and UI support
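The "ab-2" < "ab-10" ordering can be demonstrated with a small hand-written comparator. This is only an illustration of the ordering rule: it compares ASCII digit runs by value and everything else by UTF-16 code unit, whereas ICU builds numeric ordering into the collator itself. All names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: digit runs compare by numeric value, not by character. */
final class NumericOrderSketch {

    static final Comparator<String> NUMERIC = NumericOrderSketch::compare;

    static int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = digitRunEnd(a, i), endB = digitRunEnd(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with text left over sorts after the one that ran out.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9'; // ASCII digits only in this sketch
    }

    private static int digitRunEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    /** Compares two digit strings by value without parsing (no overflow). */
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String d) {
        int k = 0;
        while (k < d.length() - 1 && d.charAt(k) == '0') k++;
        return d.substring(k);
    }

    public static void main(String[] args) {
        String[] names = { "ab-10", "ab-2", "ab-1" };
        Arrays.sort(names, NUMERIC);
        System.out.println(Arrays.toString(names));
    }
}
```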
21
Calendar & Time Zones
  International Calendars: Arabic, Buddhist, Hebrew, Japanese
    – Required for correct presentation of dates in some countries
  Olson timezone support, with localizations
  Recent Additions:
    – RFC822 time zone format support in DateFormat (C+J) for compatibility
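Olson zone IDs and the RFC 822 offset format can both be seen in a short JDK example. The sketch uses java.text.SimpleDateFormat, where the pattern letter `Z` produces an RFC 822 style offset such as -0800; it is a JDK stand-in rather than ICU4J's DateFormat, and the class and method names are made up.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch: Olson zone IDs with an RFC 822 style offset. */
final class TimeZoneSketch {

    /** Formats an instant in the given Olson zone, ending with a +hhmm/-hhmm offset. */
    static String formatRfc822(long epochMillis, String olsonId) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm Z", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone(olsonId));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The same instant (the Unix epoch) rendered in two zones.
        System.out.println(formatRfc822(0L, "America/Los_Angeles"));
        System.out.println(formatRfc822(0L, "Europe/Berlin"));
    }
}
```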
22
Formatting
  Date & time: 8 formats per locale
  Messages
    – Completely localizable, Plural support
  Numbers & currencies
    – Scientific Notation, Spelled-out (checks, etc.)
    – Full Orthogonal Currency support
        INR in Hindi:
        INR in English: Rs. 1,234.57
        INR in German: Rs. 1.234,57
  Recent Additions
    – POSIX migration library
    – Allows parsing multiple currencies with one formatter
    – Short and stand-alone month/day names
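The difference between 1,234.57 and 1.234,57 comes purely from the locale passed to the formatter. The sketch below shows this with the JDK's NumberFormat, and a count-dependent message with the JDK's MessageFormat using a `choice` sub-format as a simple stand-in for plural handling. Everything here is JDK API; the class names, method names, and message text are illustrative, and currency formatting is left out.

```java
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical sketch: locale-driven number and message formatting. */
final class FormattingSketch {

    /** Formats with exactly two fraction digits using the locale's separators. */
    static String number(double value, Locale locale) {
        NumberFormat f = NumberFormat.getNumberInstance(locale);
        f.setMinimumFractionDigits(2);
        f.setMaximumFractionDigits(2);
        return f.format(value);
    }

    /** Picks a message variant by count; the pattern itself is localizable data. */
    static String fileCount(int n) {
        MessageFormat mf = new MessageFormat(
                "{0,choice,0#no files|1#one file|1<{0,number,integer} files}",
                Locale.US);
        return mf.format(new Object[] { n });
    }

    public static void main(String[] args) {
        System.out.println(number(1234.567, Locale.US));
        System.out.println(number(1234.567, Locale.GERMANY));
        System.out.println(fileCount(0));
        System.out.println(fileCount(1));
        System.out.println(fileCount(3));
    }
}
```

Keeping the pattern string in a resource bundle, rather than in code as done here for brevity, is what makes the message translatable.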
23
Transforms
  Unicode Normalization
    – Highly optimized for performance
    – Performance utilities: concatenation, detection, comparison
  Casing (upper, lower, title, folding)
  General Transforms
    – Script transliterations
    – Half-width/Full-width, Hex, etc.
    – Chain transforms together, filter source characters
    – Rule-based, customizable at runtime
  IDNA: International Domain Names
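Three of the transforms named on this slide (normalization, locale-sensitive casing, and IDNA) have JDK counterparts that make the behavior easy to see. The sketch below is a JDK-only illustration with made-up class and method names; ICU's transliterators and its optimized normalization utilities are not shown.

```java
import java.net.IDN;
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical sketch: normalization, casing, and IDNA with JDK classes. */
final class TransformSketch {

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Two strings are canonically equivalent if their NFD forms are equal. */
    static boolean canonicallyEquivalent(String a, String b) {
        return nfd(a).equals(nfd(b));
    }

    /** Lowercases using the rules of the given language (BCP 47 tag). */
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    /** Converts an internationalized domain name to its ASCII form. */
    static String toAsciiDomain(String domain) {
        return IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // "a" + combining acute accent versus precomposed a-acute.
        System.out.println(canonicallyEquivalent("a\u0301", "\u00E1"));
        // Turkish lowercases "I" to dotless i (U+0131); English gives "i".
        System.out.println(lower("I", "tr").equals("\u0131"));
        System.out.println(toAsciiDomain("b\u00FCcher.example"));
    }
}
```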
24
Segmentation: word, line & sentence
  Fast state-table implementation
  Customizable
    – Rule-based, customizable at runtime
    – Special customizations, e.g. Thai
  Recent Additions:
    – Greatly improved performance when going backwards (common case when doing line break)
    – Java
        The rules syntax has been extended
        Rules can now return information about the types of characters they encountered
        Common compiled (binary) rule format with ICU4C
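Word segmentation can be tried with the JDK's java.text.BreakIterator, used here as a stand-in for the ICU class of the same name. The sketch walks the boundaries and keeps only segments that start with a letter or digit, since the iterator also reports punctuation and whitespace as segments. The class and method names are made up, and ICU's rule customization and rule-status reporting are not shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: extracting words by iterating over word boundaries. */
final class WordBreakSketch {

    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments made of spaces or punctuation.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.ENGLISH));
    }
}
```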
25
Unicode Regular Expressions
  Full Regex Implementation
    – C only: Java 1.4 has own package (though not as powerful)
  All Unicode 4.0 Properties
    – supported through UnicodeSet
  Good performance
    – competitive with non-Unicode regex
  Recent Additions
    – Now features a C API, instead of just C++
26
Complex-text layout engine
  Glyph processing, positioning & adjustment
    – ligature substitution, contextual forms, kerning, accent placement, Bidi scripts, etc.
  Support for:
    – Drawing
    – Caret Display
    – Hit Testing
    – Selection Highlighting
    – Caret Movement
    – Layout Metrics
    – Line Break
  ICU 3.0: Canonical Equivalence: a + ´ or á
27
References
  ICU main site:
    – http://oss.software.ibm.com/icu/
    – Links to Download ICU, User Guide, Technical FAQ, Support, Bug Reports
  Unicode Consortium
    – http://www.unicode.org
    – Unicode glossary, Unicode character database
28
Questions and Answers