Published by Eric Bryant. Modified over 9 years ago.
1
Siebel CRM Unicode Conversion – The DBA Perspective
Brian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
brian.hitchcock@sun.com
brhora@aol.com
DCSIT Technical Services DBA Brian Hitchcock, September 15, 2004, www.brianhitchcock.net
2
CRM Unicode Conversion
Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application
3
How Did I Get Involved?
Sleeping in a meeting…
Heard someone say
– “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”
Woke up, said
– “I’ve done that before…”
– See “Case of the Missing Kanji”
Don’t wake up in meetings…
4
What’s The Issue?
Existing Siebel CRM system
– Oracle 8.1.7.4
– Single-byte character set (WE8ISO8859P1)
Interface systems
– Multi-byte character set(s) (UTF8)
– Handle data between single-byte and multi-byte apps
Want to convert to Unicode
– Siebel, database, interfaces all should be UTF8
– Eliminate interface systems
5
What we had
[Diagram: Siebel CRM Oracle db; Custdb Apac, Custdb Emea, Custdb Amer; Tcustdb Apac, Tcustdb Emea; users in Amer, Emea, Apac; Ordering System. Character set labels shown: UTF8, WE8ISO8859P1, 8859P1.]
6
What we wanted
[Diagram: Siebel CRM Oracle db; Custdb Apac, Custdb Emea, Custdb Amer; users in Amer, Emea, Apac; Ordering System. Character set labels shown: WE8ISO8859P1, UTF8, AL32UTF8. The Tcustdb databases no longer appear.]
7
What We Wanted
All data in one database
– All languages
– Unicode
Eliminate interface systems
– Reduce support costs
Support increased CRM functionality
– All data in one place
– Supports new business functionality
8
Would you like fries with that?
Unicode conversion includes
– Oracle db
    Convert to AL32UTF8 character set
        Required by Siebel for Unicode
    Upgrade to 9.2.0.4
        Required to get AL32UTF8 character set
– Remove Tcustdb databases
    Modify triggers that link source db to Tcustdb
9
And A Shake?
And, while you’re at it…
– Application GUI
    Retrieve different data, multi-byte, local language
– Clients
    Upgrade to Oracle 9.2.0.4 (SQL*Plus)
Lots of changes all at once
– Testing
– How to know impact of each change?
10
Converting to Unicode
It’s easy – right?
– Siebel CRM
    Make some configuration changes
– Oracle database
    Export from single-byte database
    Import into new db created with UTF8 char set
– Testing
– Done
This is the ‘management’ view
11
What Is Unicode?
International standard
Collection of characters
– Covers most of the world’s languages
    Chinese poetry?
– All characters have unique byte-code
Application developers
– Support Unicode
– No need to worry about specific languages
12
You Make This Stuff Up!
What follows can be found in
– Oracle9i Database Globalization Support Guide
– Release 2 (9.2)
– Part Number A96529-01
Or, you can trust me…
Character sets, Unicode
– Consist of a set of characters
– Encoding of the characters to byte-codes
13
Single Byte Encoding Schemes
7-bit encoding schemes
– Single-byte 7-bit, up to 128 characters
– Normally support just one language
– US7ASCII
8-bit encoding schemes
– Single-byte 8-bit, up to 256 characters
– Often support a group of related languages
– WE8ISO8859P1
14
8859P1 Character set
[Code chart: Oracle character set WE8ISO8859P1]
Hex 0x41 is A
15
Multi-byte Encoding Schemes
Fixed-width
– Each character occupies a fixed number of bytes
– Faster text processing
– AL16UTF16
Variable-width
– One or more bytes to represent a single character
– Saves disk space (typically lots of disk space)
– UTF8, AL32UTF8
Shift-sensitive variable-width
– Use control codes to differentiate between single-byte and multi-byte characters with the same code values
16
UTF8 Byte Storage
Different characters occupy 1, 2, 3 or 4 bytes
17
AL32UTF8
UTF8
– Supports Unicode 3.0 since 8.1.7.4
– Up to 3 bytes per character
– Supplementary characters
    Pairs of 3-byte character codes
AL32UTF8
– Supports Unicode 3.1 (latest version?), since 9i
– Up to 4 bytes per character
    Supplementary characters
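The byte counts above can be checked with a short Python sketch. It assumes that Python's standard "utf-8" codec is a fair stand-in for AL32UTF8, and that Oracle's UTF8 stores a supplementary character as two surrogate code units of 3 bytes each; the function names and sample characters are illustrative and not part of the original presentation.

```python
# Sample characters: ASCII, accented Latin, a Japanese kanji, and a
# supplementary character (U+1D11E, the musical G clef).
SAMPLES = ["A", "\u00e1", "\u65e5", "\U0001D11E"]


def utf8_len(ch: str) -> int:
    """Bytes used by standard UTF-8 (assumed to model AL32UTF8)."""
    return len(ch.encode("utf-8"))


def surrogate_pair_len(ch: str) -> int:
    """Bytes used when each UTF-16 code unit is encoded separately.

    Assumed to model Oracle's UTF8, where a supplementary character
    becomes a pair of 3-byte codes.
    """
    units = ch.encode("utf-16-be")
    total = 0
    for i in range(0, len(units), 2):
        code_unit = int.from_bytes(units[i:i + 2], "big")
        # "surrogatepass" lets a lone surrogate be encoded as 3 bytes.
        total += len(chr(code_unit).encode("utf-8", "surrogatepass"))
    return total


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} bytes vs {surrogate_pair_len(ch)} bytes")
```

Only the supplementary character differs between the two columns: 4 bytes in one encoding, 6 bytes (2 x 3) in the other.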
18
Confused?
Unicode, a set of characters
Character set, encoded set of characters
Encoding scheme, UTF-8, ISO standard for variable width encoding of Unicode character set
UTF8, Oracle implementation of UTF-8
If you’re not confused, you aren’t paying attention!
19
Changing Character Set
You can simply alter the database (right?)
Only works if
– New character set is strict superset of existing character set
– For all characters in existing character set
    All exist in new character set
    All have exact same code in new character set
Example
– WE8MSWIN1252 (superset, includes euro)
– WE8ISO8859P1 (subset)
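A rough way to see the superset relationship is to compare the two code pages in Python. This sketch assumes Python's "latin-1" and "cp1252" codecs approximate WE8ISO8859P1 and WE8MSWIN1252; it only checks the printable ISO 8859-1 range and is not an Oracle tool.

```python
def same_code(ch: str) -> bool:
    """True if ch is stored as the same byte in both code pages."""
    return ch.encode("latin-1") == ch.encode("cp1252")


def encodable(ch: str, codec: str) -> bool:
    """True if the codec has a code for ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Printable ISO 8859-1 characters: 0x20-0x7E and 0xA0-0xFF.
printable = [chr(c) for c in list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))]
all_match = all(same_code(ch) for ch in printable)

euro = "\u20ac"
print("printable codes identical:", all_match)
print("euro in cp1252:", encodable(euro, "cp1252"))
print("euro in latin-1:", encodable(euro, "latin-1"))
```

Every printable ISO 8859-1 character keeps its code, and the euro sign exists only in the larger set, which is the condition the slide describes.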
20
Complexities
Even for the same character
– Different encoding in different character set
Example
– Latin (Western European) character á
– E1 in WE8ISO8859P1
– C3A1 in UTF8
If existing character not in new char set
– ? (replacement character) displayed
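The example can be reproduced in Python, assuming the "latin-1" and "utf-8" codecs stand in for WE8ISO8859P1 and UTF8. The replacement behaviour shown is Python's own (errors="replace"), used here only as an analogy for what a character set conversion does with an unmappable character.

```python
a_acute = "\u00e1"  # Latin small letter a with acute

latin1_hex = a_acute.encode("latin-1").hex().upper()
utf8_hex = a_acute.encode("utf-8").hex().upper()
print("WE8ISO8859P1-like code:", latin1_hex)
print("UTF-8 code:", utf8_hex)

# A kanji has no code in the single-byte set, so the conversion
# substitutes a replacement character.
kanji = "\u65e5"
replaced = kanji.encode("latin-1", errors="replace")
print("kanji converted to single-byte set:", replaced)
```

The same character needs one byte in the single-byte set and two bytes in UTF-8, while the kanji is lost as "?" when forced into the single-byte set.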
21
Cure
Create new database
– Using new character set
Extract data from old database
Insert data into new database
Export/import is most often used
– Could use other methods
    Extract data to flat files
    SQL*Loader
22
Database Conversion
Serial
– Upgrade source, export, drop schemas, import
Parallel
– Create target
– Export source
– Import to target
Chose Parallel
– Source still available after target in use
    User tablespace issue for example
23
Impact of Unicode
Table columns must be widened
Existing column
– Holds up to 20 Latin characters
– WE8ISO8859P1, each Latin character 1 byte
– VARCHAR2(20)
New column
– UTF8
– Each non-ASCII Latin character (such as á) occupies 2 bytes
– Need VARCHAR2(40)
24
Impact of Unicode
Worst case
– UTF-8 encoding (AL32UTF8) can have up to 4 bytes per character
– For all existing character columns
– Need to expand by 4x
Disk space
– CHAR – 4x disk space
– VARCHAR2 – 1x to 4x
    Depends on specific characters inserted
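How much a VARCHAR2 value actually grows depends on its contents. The sketch below compares the stored size of a value in a single-byte set and in UTF-8, again assuming Python's "latin-1" and "utf-8" codecs as stand-ins; the sample values and function names are hypothetical.

```python
def expansion(value: str) -> float:
    """Ratio of UTF-8 bytes to single-byte (latin-1) bytes for a value."""
    return len(value.encode("utf-8")) / len(value.encode("latin-1"))


def worst_case_bytes(n_chars: int, max_bytes_per_char: int = 4) -> int:
    """Column width in bytes if every character needs the maximum."""
    return n_chars * max_bytes_per_char


for value in ["Smith", "Z\u00fcrich", "\u00e9\u00e9\u00e9"]:
    print(repr(value), round(expansion(value), 2))

print("20-character column, worst case:", worst_case_bytes(20), "bytes")
```

Pure ASCII values do not grow at all, mixed values grow slightly, and only values made entirely of non-ASCII characters double; the 4x figure is the sizing bound for a column that may later hold any Unicode character.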
25
Impact of Unicode
Tables
– Columns must be wider
– Each character can be up to 4 bytes
Triggers, PL/SQL code
– Modify to handle multi-byte data
End-user front-end (browser)
– Reconfigure to display multi-byte data, accept multi-byte data
All app components must handle Unicode
26
User Impact
VARCHAR2, AL32UTF8
– 4000 byte limit
How many characters can I enter?
– Latin (2 bytes each), 2000
– Japanese, 4000/3 (about 1333)
    If moving from Japanese character set
    2 bytes per character
    Max characters reduced by 1/3
– Supplementary characters, 1000
    Characters like ‘treble clef’
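The character limits follow from dividing the 4000-byte limit by the bytes each character needs. A minimal sketch, assuming standard UTF-8 byte lengths apply to AL32UTF8 (the constant and function names are illustrative):

```python
LIMIT_BYTES = 4000  # VARCHAR2 byte limit quoted on the slide


def max_chars(sample_char: str, limit: int = LIMIT_BYTES) -> int:
    """Characters that fit if every character is like sample_char."""
    return limit // len(sample_char.encode("utf-8"))


print("ASCII:", max_chars("A"))
print("Accented Latin:", max_chars("\u00e9"))
print("Japanese:", max_chars("\u65e5"))
print("Supplementary (G clef):", max_chars("\U0001D11E"))
```

A field that held 2000 Japanese characters at 2 bytes each in a Japanese character set now holds about 1333, which is the one-third reduction mentioned above.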
27
Disk Space
How much multi-byte data do you have?
– We found all of ours
– Typically, 5-10%
– See 2) Multi-byte data in the existing CRM db
Compute disk space requirement
– If you have 5% multi-byte character data
– Need maximum of 20% more disk space
Will you add more multi-byte data?
– Once you have converted to Unicode…
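One way to sketch the estimate is below. It rests on a simplified model that is an assumption, not the author's stated method: the multi-byte share of the character data grows from 1 byte to 4 bytes per character and everything else stays the same size.

```python
def extra_space_fraction(multibyte_fraction: float,
                         bytes_per_char: int = 4) -> float:
    """Extra space needed, as a fraction of the current data size.

    Assumes the multi-byte share grows from 1 byte per character to
    bytes_per_char and the remaining data is unchanged.
    """
    return multibyte_fraction * (bytes_per_char - 1)


def simple_upper_bound(multibyte_fraction: float,
                       bytes_per_char: int = 4) -> float:
    """Looser bound: share of multi-byte data times bytes per character."""
    return multibyte_fraction * bytes_per_char


print(f"model estimate: {extra_space_fraction(0.05):.0%}")
print(f"simple bound:   {simple_upper_bound(0.05):.0%}")
```

Under this model 5% multi-byte data needs about 15% more space, while the simpler product 5% x 4 gives the 20% maximum quoted on the slide; the target database described later was sized with 15% more space than the source.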
28
Expanding Columns
Need to expand lots of columns
– Individual SQL statements
– Lots of SQL to generate
How to make Oracle do this for us?
– Export existing database
– New database has init.ora parameter
    NLS_LENGTH_SEMANTICS = CHAR
– Import into new database
    All character columns widened as tables created
    VARCHAR(10) becomes VARCHAR(40)
29
Character Semantics – 9i
Change column data types
– VARCHAR2(10 byte)
– VARCHAR2(10 char)
– Requires SQL statement for each column
NLS_LENGTH_SEMANTICS
– Init.ora parameter
– What happens if init.ora changed?
– BYTE or CHAR
– All character columns created with byte or char
– Handles PL/SQL code as well
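The per-column alternative means generating one statement per column. The sketch below builds such statements as text in Python; the table and column names are hypothetical, and in practice the list would come from the data dictionary rather than being typed in. It is an illustration, not the Siebel-supplied script.

```python
from typing import List, Tuple

# (table, column, length in characters) -- hypothetical examples only.
COLUMNS: List[Tuple[str, str, int]] = [
    ("CUSTOMER", "LAST_NAME", 20),
    ("CUSTOMER", "CITY", 30),
    ("ORDER_NOTE", "NOTE_TEXT", 200),
]


def widen_statement(table: str, column: str, length: int) -> str:
    """Build an ALTER TABLE that switches a column to CHAR semantics."""
    return f"ALTER TABLE {table} MODIFY ({column} VARCHAR2({length} CHAR));"


statements = [widen_statement(t, c, n) for t, c, n in COLUMNS]
for stmt in statements:
    print(stmt)
```

With thousands of character columns this list gets long quickly, which is why setting NLS_LENGTH_SEMANTICS for the whole database is attractive.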
30
The Siebel Process
Create target database
Shutdown app
Upgrade Oracle client
Source db character set
Run migrate.sh script
Full export source
Import to target db
Modify target db
31
Create target database
Oracle 9.2.0.4
Character set AL32UTF8
Character semantics CHAR
Tablespace names same as source db
– 15% more space than source db
Locally managed, uniform 130k
Auto UNDO, tablespace
32
Shutdown app
Shutdown various app servers
Shutdown source db
Cold backup
Upgrade source db to 9.2.0.4
– Migrate 8.1.7.4 to 9.2.0.4
33
Upgrade Oracle client
Upgrade Oracle client software to 9.2.0.4
– For all machines that have SQL*Plus
– Upgrade to 9.2.0.4
– Install 9.2.0.4
    Client install only
– Tar up 9.2.0.4 client ORACLE_HOME
– ftp, untar on machines that need SQL*Plus
34
Source db character set
Fix any user tablespace issues
– Import won’t fix them for you
Change source db character set
– WE8MSWIN1252
    Siebel requirement
    Contains euro symbol
    Is a strict superset of WE8ISO8859P1
35
Run migrate.sh script
Siebel supplied script
– Generates various scripts
    Expand.ksh
        Widen columns for Unicode
    Impexp06.ksh
        Import individual tables for large dbs
        We use full export/import instead
Run sun_expand.sql
– Widen columns in tables outside Siebel schemas
36
Export Source, Import Target
Full export of source db
– Source db is now 9.2.0.4
    NLS_LANG AMERICAN_AMERICA.AL32UTF8
Import into target db
– Target db created as 9.2.0.4
    NLS_LANG AMERICAN_AMERICA.AL32UTF8
37
The conversion setup
[Diagram: source db (WE8ISO8859P1, changed to WE8MSWIN1252) is exported and then imported into the target db (AL32UTF8).]
38
Modify target db
Run impexp06.ksh
– Handles sequences etc.
Run check_schema.sql
– Find columns that didn’t get widened
Various changes on Siebel App side
Verify db links to Custdb databases
39
Conversion Complete?
Siebel process is done
Fix any data issues
– Multi-byte character data in source db
– Convert properly to AL32UTF8
Testing Unicode changes
– GUI changes
– Performance
    Unicode processing
    Users accessing different data
40
Multi-byte Data In Source Db?
Source db is WE8ISO8859P1
– Single-byte character set
– Doesn’t support multi-byte characters
That’s the official story
The reality is somewhat different
What, if any, multi-byte data is in source db?
– How to determine correct character set?
– How to find, how to fix?
– Japanese, Russian, others?
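As a rough illustration of the "how to find" question, the sketch below flags values whose bytes form valid multi-byte UTF-8 even though they were read through a single-byte character set. This is a heuristic written for this transcript, not the method from the second presentation; it assumes the stray data is UTF-8 and would not catch other encodings such as Shift-JIS.

```python
def looks_like_utf8(value: str) -> bool:
    """Guess whether a single-byte string really holds UTF-8 bytes.

    value is assumed to be text as read from a WE8ISO8859P1-style
    column, so every character maps back to exactly one byte.
    """
    raw = value.encode("latin-1")
    if all(b < 0x80 for b in raw):
        return False  # pure ASCII tells us nothing
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # ordinary single-byte accented text
    return True


# Simulate Japanese text stored byte-for-byte in a single-byte column.
stored = "\u65e5\u672c".encode("utf-8").decode("latin-1")

print(looks_like_utf8(stored))
print(looks_like_utf8("caf\u00e9"))
print(looks_like_utf8("plain ascii"))
```

Genuine Western European text almost never forms valid UTF-8 sequences, so a hit is a strong hint that a client inserted multi-byte data into the single-byte database.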
41
CRM Unicode Conversion
Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application