Cupertino, CA, USA / September, 2000
First ICU Developer Workshop

ICU Low-level Utilities and Resource Management
Vladimir Weinstein
Globalization Center of Competency
IBM Emerging Technology Center

Agenda
What is the locale model in ICU?
What is the string-related interface in ICU and how does one use it?
A quick summary of the other low-level utility interfaces that provide overall support to ICU components.
A quick summary of resource bundle support:
–What kind of data can be stored and retrieved natively?
–What is the system locale format?
–What is the future direction of this support?
–Migration from the 1.4 model.
–What else can one use resource bundles for?

Introduction
Applications have their own resources.
If we want to globalize an application, we need to have a way to easily switch between different languages and customs.
Translators should be able to customize an application independently for different markets.
A way to uniquely identify the place of execution is also needed.

Locale Model in ICU
A locale is synonymous with a user community.
In ICU, a locale is specified by a Locale object.
A Locale object is just an identifier, as opposed to the POSIX locale concept.
The default locale is the locale that users have set for their machine.
Applications should not change the system default locale.
A locale can also be specified at run time, and users should be able to switch locales dynamically.
Many functions of ICU are locale dependent.

Locale Naming Conventions
"Language_Country_Variant".
Language is an ISO-639 identifier – en, es, fr.
Country is an ISO-3166 identifier – ES, MX, US, FR.
Variant is user defined – EURO, NY.
Example: en, en_IE, en_IE_EURO.
The more parts a locale name has, the more specific it is.

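The naming convention can be illustrated with a small sketch that splits a locale name into its parts. This uses only the C++ standard library; LocaleParts and splitLocaleName are hypothetical names chosen for illustration and are not part of the ICU API.

```cpp
#include <string>

// Hypothetical helper type for illustration only (not an ICU type).
struct LocaleParts {
    std::string language;  // ISO-639 code, e.g. "en"
    std::string country;   // ISO-3166 code, e.g. "IE" (may be empty)
    std::string variant;   // user-defined, e.g. "EURO" (may be empty)
};

// Splits "Language_Country_Variant" at the first two underscores.
// Missing parts are left empty; no validation of the codes is attempted.
LocaleParts splitLocaleName(const std::string& name) {
    LocaleParts parts;
    std::size_t first = name.find('_');
    parts.language = name.substr(0, first);  // whole name if no underscore
    if (first == std::string::npos) {
        return parts;
    }
    std::size_t second = name.find('_', first + 1);
    if (second == std::string::npos) {
        parts.country = name.substr(first + 1);
        return parts;
    }
    parts.country = name.substr(first + 1, second - first - 1);
    parts.variant = name.substr(second + 1);
    return parts;
}
```

For "en_IE_EURO" this yields language "en", country "IE" and variant "EURO"; for "en" only the language is filled in.
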
Example
root                      Root locale
    en                    Language
        US                Country
        IE
            EURO          Variant
    de
        DE
            EURO
        AT
            EURO
    ja
        JP
    ru
        RU

Resource Bundle Concept
A resource bundle is a repository of data that an application uses.
Resource bundles contain application data specific to different locales.
All the items in a resource bundle can be accessed by the application.
Resource bundle support is implemented in C (through the UResourceBundle structure), while the C++ class (ResourceBundle) is only a thin wrapper.

Fallback
A resource bundle should contain only the data specific to the locale it is used for.
The root resource bundle contains all the data, including data that does not need to be localized.
All locales descend from root and override information in root.
Two types of fallback:
–Locale level
–Resource level

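The two kinds of fallback can be sketched with standard C++ containers. This is a simplified model under stated assumptions, not ICU's implementation: a bundle is modelled as a map from key to string, and fallbackChain, Bundle and lookup are hypothetical names. Real ICU lookup involves more detail than shown here.

```cpp
#include <map>
#include <string>
#include <vector>

// Builds the chain of locales to try, most specific first, ending at root.
// Each step drops the last "_part" of the name.
std::vector<std::string> fallbackChain(std::string locale) {
    std::vector<std::string> chain;
    while (!locale.empty() && locale != "root") {
        chain.push_back(locale);
        std::size_t cut = locale.rfind('_');
        locale = (cut == std::string::npos) ? std::string()
                                            : locale.substr(0, cut);
    }
    chain.push_back("root");
    return chain;
}

// Hypothetical stand-in for a compiled resource bundle: key -> string.
using Bundle = std::map<std::string, std::string>;

// Looks a key up along the fallback chain. Returns nullptr if no bundle
// in the chain, including root, defines the key.
const std::string* lookup(const std::map<std::string, Bundle>& bundles,
                          const std::string& locale,
                          const std::string& key) {
    for (const std::string& name : fallbackChain(locale)) {
        auto bundle = bundles.find(name);
        if (bundle == bundles.end()) {
            continue;  // locale-level fallback: no bundle for this locale
        }
        auto item = bundle->second.find(key);
        if (item != bundle->second.end()) {
            return &item->second;
        }
        // resource-level fallback: bundle exists but lacks this key
    }
    return nullptr;
}
```

With this model, a request for "en_IE_EURO" tries en_IE_EURO, en_IE, en and root in that order, so a bundle only needs to store the items in which it differs from its parent.
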
Resource Bundles Support
Several native data types:
–Complex types:
    Tables
    Arrays
–Scalar types:
    Strings
    Integers
    Binaries (imported files and hex strings)

Resource Bundle Format
Resource bundles use their own format at the moment.
At the top level, there is a table:
    locale_name {
        … //data
    }
An XML format is planned for the future.

Resource Bundle Format
Tables:
    menu {
        file {
            name { "&File" }
            items {
                open { "&Open" }
                save { "&Save" }
                exit { "&Exit" }
            }
        }
    }

Resource Bundle Format
Arrays:
    errors {
        "Invalid Command",
        "Bad Value",
        ...
        "Read the Manual"
    }

Resource Bundle Format
The most frequent type of data in a resource bundle is a string, an array of Unicode characters (UChar).
    simplestring { "A string" }
    escapedstring { "This string has some unicode characters: \u0408\u0443\u043d\u0438\u043a\u043e\u0434" }
    encodedstring { "This string has some encoded characters Јуникод" }

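To show what an escaped string stands for, here is a minimal sketch that turns \uXXXX escapes into UTF-16 code units. It is an illustration only and not the code genrb uses; unescapeUnicode and hexValue are hypothetical names, and the input outside the escapes is assumed to be plain ASCII.

```cpp
#include <string>

// Returns the value of one hexadecimal digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces every well-formed \uXXXX escape with the code unit it names.
// Anything else, including a malformed escape, is copied through unchanged.
std::u16string unescapeUnicode(const std::string& text) {
    std::u16string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '\\' && i + 6 <= text.size() && text[i + 1] == 'u') {
            int value = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                int digit = hexValue(text[k]);
                if (digit < 0) { ok = false; break; }
                value = value * 16 + digit;
            }
            if (ok) {
                out.push_back(static_cast<char16_t>(value));
                i += 6;  // skip the backslash, the 'u' and four digits
                continue;
            }
        }
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(text[i])));
        ++i;
    }
    return out;
}
```

Each well-formed escape becomes exactly one UChar-sized unit, so seven escapes produce a seven-unit string. An escape that lacks the u after the backslash is not recognized and is copied through character by character.
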
Resource Bundle Format
Integers:
    versionInfo {          // a table
        major:int { 1 }    // of integers
        minor:int { 4 }
        patch:int { 7 }
    }

Resource Bundle Format
Binaries:
–Imported files
    splash:import { "splash_root.gif" }
–Typed values
    pgpkey:bin { a1b2c3d4e5f67890 }

Using Resource Bundles
Initialization and disposal
C++
    UErrorCode status = U_ZERO_ERROR;
    ResourceBundle rb1((char *)0, Locale("root"), status);  // open the root bundle
    ResourceBundle rb2 = rb1.get("Countries", status);      // by key
    ResourceBundle rb3 = rb1.get(5, status);                // by index
    ResourceBundle rb4 = rb1.getNext(status);               // by iteration
C
    UErrorCode status = U_ZERO_ERROR;
    UResourceBundle *rb1 = ures_open(NULL, "root", &status);
    UResourceBundle *rb2 = ures_getByKey(rb1, "Countries", NULL, &status);
    ures_getByIndex(rb1, 5, rb2, &status);    /* reuses rb2 as the fill-in */
    ures_getNextResource(rb1, rb2, &status);  /* reuses rb2 as the fill-in */
    /*... */
    ures_close(rb1);
    ures_close(rb2);

Using Resource Bundles
Accessing individual strings:
C++
    UnicodeString s;
    UErrorCode status = U_ZERO_ERROR;
    s = rb.getStringEx("LocaleString", status);
    s = rb.getStringEx(5, status);
    s = rb.getString(status);      // works only on string resources
    s = rb.getNextString(status);  // iteration
C
    const UChar *s;
    const char *key = NULL;
    int32_t len = 0;
    UErrorCode status = U_ZERO_ERROR;
    s = ures_getStringByKey(rb, "LocaleString", &len, &status);
    s = ures_getStringByIndex(rb, 5, &len, &status);
    s = ures_getString(rb, &len, &status);            /* works only on string resources */
    s = ures_getNextString(rb, &len, &key, &status);  /* iteration */

Using Resource Bundles
Accessing resources within complex resources by key (can be used only on tables):
C++
    ResourceBundle rb2 = rb1.get("Countries", status);
    UnicodeString s = rb1.getStringEx("LocaleString", status);
C
    UResourceBundle *rb2 = ures_getByKey(rb1, "Countries", NULL, &status);
    s = ures_getStringByKey(rb1, "LocaleString", &len, &status);

Using Resource Bundles
Accessing resources within complex resources by index:
C++
    int32_t size = rb1.getSize();
    for (int32_t i = 0; i < size; i++) {
        ResourceBundle rb3 = rb1.get(i, status);
        UnicodeString s = rb1.getStringEx(i, status);  // if resource is a string
    }
C
    int32_t size = ures_getSize(rb1);
    int32_t i = 0;
    for (i = 0; i < size; i++) {
        ures_getByIndex(rb1, i, rb2, &status);
        s = ures_getStringByIndex(rb1, i, &len, &status);  /* if resource is a string */
    }

Using Resource Bundles
Iterating over complex resources:
C++
    rb1.resetIterator();
    while (rb1.hasNext()) {
        rb2 = rb1.getNext(status);
        // or UnicodeString us = rb1.getNextString(status);
    }
C
    ures_resetIterator(rb1);
    while (ures_hasNext(rb1)) {
        ures_getNextResource(rb1, rb2, &status);
        /* or uc = ures_getNextString(rb1, &len, &key, &status); */
    }

Using Resource Bundles
Accessing other scalar types:
C++
    int32_t len = 0;
    rb2 = rb1.get("A_binary_resource", status);
    const uint8_t *binaryData = rb2.getBinary(len, status);
C
    int32_t len = 0;
    const uint8_t *binaryData = NULL;
    int32_t number;
    ures_getByKey(rb1, "A_binary_resource", rb2, &status);
    binaryData = ures_getBinary(rb2, &len, &status);
    ures_getByKey(rb1, "An_integer_resource", rb2, &status);
    number = ures_getInt(rb2, &status);

Preparing Resource Bundles
During the build process, resource bundles have to be compiled from the source format into the binary format.
The tool to use is genrb.
Syntax:
    genrb [-s source_directory] [-d destination_directory] [-e encoding] [-v] [-V] [-h] genrb_source_file
Example:
    genrb -s /home/weiv/dev/icu/data -d /home/weiv/icuinstall/data root.txt

Preparing Resource Bundles
Encoding of resource bundle files:
–Invariant characters plus Unicode values written as \uXXXX escapes (ICU data files are stored like this).
–UTF-16LE, UTF-16BE or UTF-8, provided that the BOM is written at the very beginning of the resource bundle file.
–After ICU is built, resource bundles can use any encoding that ICU supports. The encoding must be specified during the build process.
genrb compiles a resource bundle from .txt format to .res format, which is directly usable by ICU.
Furthermore, .res files can be packed with other data files into memory-mapped files or DLLs.

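The BOM requirement can be illustrated with a short sketch that recognizes the three encodings from the first bytes of a file. The byte values are the standard Unicode byte order marks; detectBomEncoding is a hypothetical helper and does not reflect how genrb is actually written. UTF-32 marks are deliberately not handled.

```cpp
#include <cstddef>
#include <string>

// Inspects the first bytes of a buffer and names the encoding its BOM
// indicates. Returns an empty string when no recognized BOM is present,
// in which case the encoding would have to be supplied some other way.
std::string detectBomEncoding(const unsigned char* data, std::size_t length) {
    if (length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return "UTF-8";
    }
    if (length >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return "UTF-16BE";
    }
    if (length >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return "UTF-16LE";
    }
    return "";
}
```

A file without a BOM gives an empty result here, which corresponds to the cases above where the encoding must be known or specified separately.
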
Migration from 1.4 model
    const UnicodeString* get2dArrayItem(const char *resourceTag,
                                        int32_t rowIndex,
                                        int32_t columnIndex,
                                        UErrorCode& err) const;
Equivalent code (error checking intentionally omitted):
    int32_t row = 3, col = 4;
    ResourceBundle zonestrings = rb1.get("zoneStrings", status);
    ResourceBundle zone = zonestrings.get(row, status);
    UnicodeString data = zone.getStringEx(col, status);

Migration from 1.4 model
    const UnicodeString** get2dArray(const char *resourceTag,
                                     int32_t& rowCount,
                                     int32_t& columnCount,
                                     UErrorCode& err) const;
Equivalent code (for rectangular array):
    int32_t zonesize = 0, zoneis = 0, j = 0, i = 0;
    UnicodeString** zones = NULL;
    ResourceBundle zonestrings = rb1.get("zoneStrings", status);
    zonesize = zonestrings.getSize();
    zones = new UnicodeString*[zonesize];
    for (i = 0; i < zonesize; i++) {
        ResourceBundle zone = zonestrings.get(i, status);
        zoneis = zone.getSize();
        zones[i] = new UnicodeString[zoneis];
        for (j = 0; j < zoneis; j++) {
            zones[i][j] = zone.getStringEx(j, status);
        }
    }

Migration from 1.4 model
    const UnicodeString* getTaggedArrayItem(const char *resourceTag,
                                            const UnicodeString& itemTag,
                                            UErrorCode& err) const;
Equivalent code:
    const char *item = "US";
    ResourceBundle countries = rb1.get("Countries", status);
    UnicodeString countryUS = countries.getStringEx(item, status);

    void getTaggedArray(const char *resourceTag,
                        UnicodeString*& itemTags,
                        UnicodeString*& items,
                        int32_t& numItems,
                        UErrorCode& err) const;
Equivalent code:
    ResourceBundle tagarray = rb1.get("Countries", status);
    int32_t tagsize = tagarray.getSize();
    UnicodeString *items = new UnicodeString[tagsize];
    const char **itemTags = new const char*[tagsize];
    int32_t i = 0;
    for (i = 0; i < tagsize; i++) {
        ResourceBundle tagitem = tagarray.get(i, status);
        items[i] = tagitem.getString(status);
        itemTags[i] = tagitem.getKey();
    }

Resource Bundle Usage
Rewrite array getters to use iteration
Short complete example

Low-level Utility Interface
ICU applications can use different kinds of data through the udata API.
Its purpose is portable and fast data access.
Several ways of organizing data:
–memory-mapped files
–DLLs
Data should be written out according to the portability rules.

Low-level Utility Interface
Data is accessed through the following APIs:
–UDataMemory* udata_open(const char *path, const char *type, const char *name, UErrorCode *pErrorCode)
–UDataMemory* udata_openChoice(const char *path, const char *type, const char *name, UDataMemoryIsAcceptable *isAcceptable, void *context, UErrorCode *pErrorCode)
–void udata_close(UDataMemory *pData)
–const void* udata_getMemory(UDataMemory *pData)
–void udata_getInfo(UDataMemory *pData, UDataInfo *pInfo)
–void udata_setCommonData(const void *data, UErrorCode *err)

Udata API usage
Short complete example, writing out data and retrieving it.