Siebel CRM Unicode Conversion 2 – The DBA Perspective Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems DCSIT Technical.


1 Siebel CRM Unicode Conversion 2 – The DBA Perspective
Brian Hitchcock, OCP 8, 8i, 9i DBA, Sun Microsystems
brian.hitchcock@sun.com, brhora@aol.com
DCSIT Technical Services DBA, Brian Hitchcock, November 11, 2004, www.brianhitchcock.net

2 CRM Unicode Conversion
• Three separate presentations
  – 1) The overall conversion process
    • What we had, what we wanted, how to get there
    • Issues that come up during conversion
  – 2) Multi-byte data in the existing CRM db
    • What’s the issue, how did it happen
    • A general method to find and fix this problem
  – 3) The actual conversion
    • What really happened
    • Issues that came up and how they were resolved
• Focus on DBA issues, not Siebel application

3 How Did I Get Involved?
• Sleeping in a meeting…
• Heard someone say
  – “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”
• Woke up, said
  – “I’ve done that before…”
  – See “Case of the Missing Kanji”
• Don’t wake up in meetings…

4 What’s The Issue?
• Existing Siebel CRM system
  – Oracle 8.1.7.4
  – Single-byte character set (WE8ISO8859P1)
• Interface systems
  – Multi-byte character set(s) (UTF8)
  – Handle data between single-byte and multi-byte apps
• Want to convert to Unicode
  – Siebel, database, interfaces all should be UTF8
  – Eliminate interface systems

5 What We Had
[Diagram. Labels: Siebel CRM Oracle Db; Custdb Apac, Custdb Emea, Custdb Amer; Tcustdb Apac, Tcustdb Emea; Users in Amer, Emea, Apac; Ordering System; character sets shown: UTF8, WE8ISO8859P1]

6 What We Wanted
[Diagram. Labels: Siebel CRM Oracle Db; Custdb Apac, Custdb Emea, Custdb Amer; Users in Amer, Emea, Apac; Ordering System; character sets shown: WE8ISO8859P1, UTF8, AL32UTF8]

7 What We Wanted
• All data in one database
  – All languages
  – Unicode
• Eliminate interface systems
  – Reduce support costs
• Support increased CRM functionality
  – All data in one place
  – Supports new business functionality

8 Multi-byte Data In Source Db?
• Source db is WE8ISO8859P1
  – Single-byte character set
  – Doesn’t support multi-byte characters
• That’s the official story
• The reality is somewhat different
• What, if any, multi-byte data is in source db?
  – How to determine correct character set?
  – How to find, how to fix?
  – Japanese, Chinese, others?

9 But Wait, There’s More…
• Not just multi-byte data to look for
• Non-P1 character data also
  – Non multi-byte character data
  – Could be WE P1 (western European)
    • German, Italian, French etc.
  – Could be WE Pn
    • Polish, Greek, Russian etc.
• How to find?

10 How Polish Was Handled
• Use separate app that sends Polish (P2) to CRM database
• Stored in P1 db
• Triggers move this Polish data to TWCD
• Triggers in TWCD
  – Know that it’s Polish (P2)
  – Convert to UTF8 and send to WCD db
• Therefore, multiple languages in Siebel P1 db

11 What’s the Problem?
• Character data from multiple languages
  – Stored in Oracle db
  – Db configured for P1
    • P1 supports multiple WE languages
    • Does not support Polish, Russian, etc.
• Need to find all such character data
• Non-P1 can be
  – Single-byte (Polish, Russian, etc.)
  – Multi-byte (Japanese, Chinese, etc.)

12 Single-byte Character Sets
• All Pn (8859-1, 8859-2, etc.) character sets
  – Share same range of byte codes, 0 to 255
  – From 0xA1 (decimal 161) up
    • The same byte code can represent different characters
• Example
  – WE8ISO8859P1 (8859-1)
    • Byte code 0xA3 (decimal 163) is character £
  – EE8ISO8859P2 (8859-2)
    • Same byte code, 0xA3, is character Ł
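The 0xA3 example can be reproduced outside the database. This is a minimal sketch in Python (not part of the original scripts); it assumes Python's `iso8859_1` and `iso8859_2` codecs correspond to Oracle's WE8ISO8859P1 and EE8ISO8859P2.

```python
# One stored byte, two readings: the meaning depends entirely on which
# ISO 8859 part the reader assumes.
raw = bytes([0xA3])

as_p1 = raw.decode("iso8859_1")  # assumed equivalent of WE8ISO8859P1
as_p2 = raw.decode("iso8859_2")  # assumed equivalent of EE8ISO8859P2


def describe(ch: str) -> str:
    """Return the Unicode code point, e.g. 'U+00A3', to avoid console encoding issues."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    print("byte 0xA3 as 8859-1:", describe(as_p1))  # pound sign
    print("byte 0xA3 as 8859-2:", describe(as_p2))  # L with stroke
```

Nothing in the byte itself says which reading is right, which is why the country of origin has to come from somewhere else.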

13 Finding Non-P1 Char Data?
• Logically
  – Examine db design, Siebel docs, figure out which tables designed to store language specific (local language) data
  – Some column (country code) in these tables to tell you which country data is from
  – Determine correct character set for data from each country
  – Convert these tables manually to AL32UTF8 as part of overall Unicode conversion process

14 Not Good
• Want general method
  – No need to analyze the meaning of existing data
  – Need automated way to find all non-P1 char data
• Can’t do it
  – No general way to determine if char data is P1 or P2 or Pn
    • As shown before, byte code 0xA3 (decimal 163)
      – Character £ in P1
      – Character Ł in P2

15 Good
• But, can find non-ASCII data in general
  – And then find multi-byte character data
• Use separate approach to find non-P1
• Use PL/SQL code
  – Examine every table
  – Examine every column that holds character data
  – Determine which rows, if any, are ASCII
  – Rows that aren’t ASCII are ‘suspect’
  – Identify tables that have any non-ASCII character data

16 Why Look For ASCII?
• Character data that is ASCII
  – Only 7 bits used to encode character
  – 8th bit of every byte is 0
  – For non-ASCII, 8th bit is set in at least one byte
    • WE8ISO8859Pn
    • Multi-byte, Japanese, Chinese, etc.
• By eliminating all tables that are ASCII
  – No need to ask are they P1, P2, Pn or multi-byte
  – Greatly reduces the task
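The 8th-bit test is easy to state in code. The following is an illustrative Python sketch, not the PL/SQL used in the project; `is_ascii` is a hypothetical helper name.

```python
def is_ascii(data: bytes) -> bool:
    """True when no byte has its 8th (high) bit set, i.e. every byte is below 0x80."""
    return all((b & 0x80) == 0 for b in data)


plain = b"ABCDE"
german = "B\u00fccher".encode("iso8859_1")      # one byte with the high bit set
japanese = "\u30b5\u30f3".encode("euc_jp")      # every byte has the high bit set

if __name__ == "__main__":
    for label, value in [("plain", plain), ("german", german), ("japanese", japanese)]:
        print(label, is_ascii(value))
```

A row that passes this test reads the same in ASCII, in every 8859 part, and in UTF-8, so it needs no further analysis.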

17 How To Find Non-ASCII?
• Use SQL function convert
  – Convert a given column to ASCII character set
  – Compare resulting string with original
  – If original string is all ASCII
    • Will match converted string
  – If not a match
    • Column value is non-ASCII
      – Could be WE8ISO8859Pn
      – Could be multi-byte

18 Example Finding Non-ASCII
• In WE8ISO8859P1 database

    create table Psycho_Acircle (text VARCHAR2(100));
    insert into Psycho_Acircle values (chr(197)||'BCDE');
    insert into Psycho_Acircle values ('ABCDE');

    select * from Psycho_Acircle;

    TEXT
    -----
    ÅBCDE
    ABCDE

    select convert(text,'US7ASCII','WE8ISO8859P1') from Psycho_Acircle;

    CONVERT(TEXT,'US7ASCII','WE8ISO8859P1')
    ---------------------------------------
    ?BCDE
    ABCDE

• ÅBCDE is not the same as ?BCDE
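The convert-and-compare test can be imitated in Python for experimenting away from the database. This is only a rough stand-in: it assumes that every character without an ASCII equivalent becomes `?`, as in the output shown on this slide, and the function names are hypothetical.

```python
def to_us7ascii(value: str) -> str:
    """Approximate CONVERT(text, 'US7ASCII', 'WE8ISO8859P1'):
    any character outside ASCII is replaced with '?' (assumed behavior)."""
    return value.encode("ascii", errors="replace").decode("ascii")


def is_suspect(value: str) -> bool:
    """A value is suspect when converting it to ASCII changes it."""
    return value != to_us7ascii(value)


# The two rows of the Psycho_Acircle example: chr(197) is A with a ring above.
rows = [chr(197) + "BCDE", "ABCDE"]
suspect_rows = [r for r in rows if is_suspect(r)]

if __name__ == "__main__":
    print(len(suspect_rows), "of", len(rows), "rows are non-ASCII")
```

A value that already contains a literal `?` still compares equal to itself, so it is not flagged by mistake.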

19 Not Included
• Did not scan
  – LONG datatype columns
  – CLOB datatype columns
    • Didn’t have any in schema
  – PL/SQL code in database
    • Dev team determined this wasn’t needed

20 Scripts Strategy
• Eliminate as much as possible
  – Identify all ASCII only tables
  – Left with set of non-ASCII tables
• For remaining tables
  – Find likely Japanese character data
  – Verify it is Japanese
  – Copy to separate table
  – Remove from non-ASCII tables
• Repeat for other languages
  – How to identify byte patterns for each language?

21 PL/SQL scripts
• Scripts used
  – Scan_Table_1_Gen_Column_Info.sql
  – Scan_Table_2_Gen_Nonascii_rows_Info.sql
  – Scan_Table_3_Gen_NonasciiTables_NoLong.sql
  – Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql
  – Scan_Table_5_Gen_NonasciiTables_YesLong.sql
  – Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql
  – Scan_Table_7_Gen_NA_EUCJP_Tables.sql
  – Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql

22 Scripts
• Each script generates table(s)
  – Output of each script stored in table(s)
    • Next script uses tables
• Lots of intermediate data stored
  – Helped develop scripts
  – Each script simpler
  – Provided extra output for developers, analysts to help them verify results
    • Is this data really Japanese?

23 What Does Each Script Do?
• Scan_Table_1_Gen_Column_Info.sql
  – Scans all tables in a schema
  – Creates two tables
    • Table_Gen_Info
      – Info on all tables
    • Table_Column_Info
      – Info on character columns
      – Which contain any non-ASCII strings
      – Doesn’t include LONG columns
      – Can’t use SQL functions on LONG datatype
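The shape of the first script (walk every table, walk every character column, count the non-ASCII rows) can be tried out on a laptop. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Oracle; the catalog queries, the `is_non_ascii` SQL function and the `scan_schema` name are all assumptions made for illustration and do not appear in the original PL/SQL.

```python
import sqlite3


def scan_schema(conn: sqlite3.Connection) -> dict:
    """Return {(table, column): number of rows holding non-ASCII text}
    for every character column in the database."""
    # Register a SQL-callable helper; NULLs and non-text values count as ASCII.
    conn.create_function(
        "is_non_ascii", 1,
        lambda v: int(isinstance(v, str) and not v.isascii()),
    )
    results = {}
    tables = [row[0] for row in conn.execute(
        "select name from sqlite_master where type = 'table' order by name")]
    for table in tables:
        columns = conn.execute(f'pragma table_info("{table}")').fetchall()
        for _cid, name, decl_type, *_rest in columns:
            decl = (decl_type or "").upper()
            if "CHAR" not in decl and "TEXT" not in decl:
                continue  # only character columns are scanned
            count = conn.execute(
                f'select count(*) from "{table}" where is_non_ascii("{name}")'
            ).fetchone()[0]
            results[(table, name)] = count
    return results


# Demo on the two-row example table used earlier in the deck.
conn = sqlite3.connect(":memory:")
conn.execute("create table psycho_acircle (id integer, text varchar(100))")
conn.executemany(
    "insert into psycho_acircle values (?, ?)",
    [(1, chr(197) + "BCDE"), (2, "ABCDE")],
)
report = scan_schema(conn)

if __name__ == "__main__":
    for (table, column), n in report.items():
        print(table, column, n)
```

As in the original scripts, storing the per-column counts makes the later steps cheap: only columns with a non-zero count need to be looked at again.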

24 What Does Each Script Do?
• Scan_Table_2_Gen_Nonascii_rows_Info.sql
  – Use table Table_Column_Info
  – Examine tables with non-ASCII character data
  – Creates two tables
    • Table_NonAscii_info
      – Number of rows, columns with non-ASCII data
    • Table_NonAscii_SQL
      – SQL to extract non-ASCII data from each table
      – Useful for developers, analysts to extract data from other environments

25 What Does Each Script Do?
• Scan_Table_3_Gen_NonasciiTables_NoLong.sql
  – Use tables table_gen_info, table_nonascii_sql
  – Create copies of tables that have non-ASCII data
  – Copies contain only the non-ASCII rows
    • Have all character columns of original table
    • Helps identify which country data is from
  – Creates tables as select * from
    • Doesn’t work on tables with LONG column
    • Tables named NONASCII_ followed by the base table name

26 What Does Each Script Do?
• Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql
  – Similar to third (previous) script
  – Table copies only contain columns that have non-ASCII data
  – Does handle tables with LONG column
  – Creates tables named NA_CO_ followed by the base table name
    • Set of tables containing all non-ASCII data in the schema

27 What Does Each Script Do?
• Scan_Table_5_Gen_NonasciiTables_YesLong.sql
  – Creates copies of tables having non-ASCII data
  – Copy tables have all char columns of base table
  – Only copies tables that have LONG column
  – Companion to third script
    • Deals with tables that have LONG column
    • Tables named NONASCII_ followed by the base table name
  – Now have complete set of tables
    • Have all non-ASCII char columns of base tables

28 Katakana, Hiragana?
• How to find Japanese character data?
  – Look at hex dump of character data and see lots of ¥_¥ and ¤_¤
  – In WE8ISO8859P1 the byte code of ¤ is A4, ¥ is A5
  – Many Japanese transliterated terms (company names) start with these bytes
  – Typical of EUCJP character set
  – Find rows that contain '%¥_¥%' or '%¤_¤%'
  – Repeated ¥ or ¤ means EUCJP more likely
  – Verify that these rows are indeed Japanese
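Why the `'%¥_¥%'` pattern works can be seen by encoding a short katakana string as EUC-JP and viewing the bytes as if they were 8859-1. This is a Python sketch for illustration; the regular expression is an assumed byte-level equivalent of the two LIKE patterns, and the sample strings are made up.

```python
import re

# Byte-level equivalent of  LIKE '%¥_¥%'  or  LIKE '%¤_¤%'  in a single-byte
# database: 0xA5 (¥) or 0xA4 (¤), any one byte, then the same byte again.
EUCJP_HINT = re.compile(rb"\xa5.\xa5|\xa4.\xa4", re.DOTALL)


def looks_like_eucjp(data: bytes) -> bool:
    """True when the bytes contain the repeated lead-byte pattern."""
    return EUCJP_HINT.search(data) is not None


katakana = "\u30b5\u30f3".encode("euc_jp")            # two katakana characters
hiragana = "\u306b\u307b\u3093".encode("euc_jp")      # three hiragana characters
german = "B\u00fccher".encode("iso8859_1")
price = "\u00a5100".encode("iso8859_1")               # a single yen sign

if __name__ == "__main__":
    # Each two-byte katakana starts with 0xA5, each hiragana with 0xA4.
    print(katakana.hex(), hiragana.hex())
```

A lone yen sign in Western text does not match, because the pattern needs the same lead byte twice with exactly one byte in between.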

29 What Does Each Script Do?
• Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql
  – For table copies with non-ASCII columns only
  – Look for specific pattern of '%¥_¥%'
  – Or '%¤_¤%'
  – Creates tables
    • Table_NA_EUCJP_Info
    • Table_NA_EUCJP_SQL
    • Table_NA_EUCJP_COL_INFO

30 6th Script
• What does each table contain?
  – Table_NA_EUCJP_Info
    • Number of EUCJP rows in each non-ASCII table
  – Table_NA_EUCJP_SQL
    • SQL to extract EUCJP rows
  – Table_NA_EUCJP_COL_INFO
    • Number of EUCJP rows in each column

31 What Does Each Script Do?
• Scan_Table_7_Gen_NA_EUCJP_Tables.sql
  – Create two copies of each table that has EUCJP
    • Contain rows that have EUCJP
    • First table, all char columns
    • Second, only EUCJP columns
  – Tables created have names
    • EUCJP_
    • EUCJP_CO_

32 After 7th Script
• We have identified EUCJP rows
  – In non-ASCII tables
  – Copied these rows to separate tables
• Delete these rows from the non-ASCII tables
• As we identify rows from a specific char set
  – Remove them from the non-ASCII tables
  – Smaller and smaller set of unknown rows

33 What Does Each Script Do?
• Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql
  – Find rows containing ¥ or ¤
  – Could be Japanese
  – Could be WE

34 Results
• For each script
  – Time to run
  – Output
  – % of total db that is non-ASCII
  – Demonstrates power of this approach
  – No attempt to speed up
    • Only need to scan once, no need for speed
  – Copy prod data to separate environment
  – Run scripts there, develop the SQL to correctly convert the non-ASCII data as needed
    • Apply to prod as part of Unicode conversion

35 Results
• Scripts run against copy of production db
• Database
  – 25 GB total, but 13 GB free space
  – 12 GB of actual data to scan
  – (Be skeptical when people tell you they support multi-terabyte dbs; it is the size of the actual data that counts)
• Scripts create tables in the same schema they run in

36 Results
• Script 1
  – 2 hours
  – Scanned 12 GB of data
  – 2483 tables, 63138 columns
  – Created two tables
    • Table_gen_info
    • Table_column_info

37 1st Script Results

    SQL> select * from Table_Gen_Info where rownum <=10;

    TABLENAME        NUMROWS  NUMCOLS  NUMCHARCOLS  NUMCLOBCOLS  NUMLONGCOLS
    ---------------  -------  -------  -----------  -----------  -----------
    ACCNT_STAT         15775        5            3            0            0
    AMER_AR_OWNER    1085497        7            6            0            0
    AMER_AR_T           1060        3            2            0            0
    APAC_AR_OWNER       2770        6            6            0            0
    AR_ADMIN            5578       35           31            0            0
    AR_CON              3573       22           17            0            0
    AR_STAT            88652        7            5            0            0
    AUDIT_TABLE        53301       29           26            0            0
    CONT_CREATED      515126        2            2            0            0
    CON_CREATED       184744        2            2            0            0

38 1st Script Results

    SQL> select * from Table_Column_Info where rownum <=20;

    TABLENAME   NUMROWS  NUMCHARCOLS  CHARCOLNUM  CHARCOLNAME  NUMNONASCIIROWS
    ----------  -------  -----------  ----------  -----------  ---------------
    ACCNT_STAT    15775            3           1  WCD                        0
    ACCNT_STAT    15775            3           2  STATUS                     0
    ACCNT_STAT    15775            3           3  R4_STATUS                  0
    ...
    AR_ADMIN       5578           31           1  R4_ID                      0
    AR_ADMIN       5578           31           2  R4_SR_NUM                  0
    AR_ADMIN       5578           31           3  X_DESC                    72

    20 rows selected.

39 2nd Script Results
• 12 minutes
  – 68 tables that have non-ASCII char data
  – 68 SQL statements
• Overall
  – We have 12 GB of data
  – 68/2483 tables have any non-ASCII char data
  – Only 3% of the tables
    • But they’re some of the biggest tables
• Schema analysis much easier on 68 tables

40 2nd Script Results

    SQL> select * from Table_NonAscii_Info where rownum <= 10;

    TABLENAME         NUMROWS  NUMNONASCIIROWS  NUMCOLS  NUMNONASCIICOLS
    ----------------  -------  ---------------  -------  ---------------
    AR_ADMIN             5578              692       35                6
    AR_CON               3573              107       22                3
    AUDIT_TABLE         53301               17       29                1
    CX_S_ADDR_ORG_XM    69470              275       19                5
    C_ACCOUNT           17897             1114       20                1
    C_ACT                6562              933       21                6
    C_ADDRESS           25590             5490       28                6
    C_AR                88638             3760       26                6
    C_CONTACT           52574            10401       20                3
    C_OPTY               2139              119       25                4

41 2nd Script Results

    SQL> select * from Table_NonAscii_SQL where rownum <= 10;

    TABLENAME: AR_ADMIN    LENGTHNONASCIISQL: 445
    NONASCIISQL:
      select count(*) from AR_ADMIN where 1=0
      or X_DESC != CONVERT (X_DESC, 'US7ASCII', 'WE8ISO8859P1')
      or LAST_NAME != CONVERT (LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or FST_NAME != CONVERT (FST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or ACCOUNT != CONVERT (ACCOUNT, 'US7ASCII', 'WE8ISO8859P1')
      or OWNER_LAST_NAME != CONVERT (OWNER_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or R3_CREATED_LAST_NAME != CONVERT (R3_CREATED_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')

    TABLENAME: AR_CON    LENGTHNONASCIISQL: 233
    NONASCIISQL:
      select count(*) from AR_CON where 1=0
      or OWNER_LAST != CONVERT (OWNER_LAST, 'US7ASCII', 'WE8ISO8859P1')
      or OWNER_FST != CONVERT (OWNER_FST, 'US7ASCII', 'WE8ISO8859P1')
      or R3_X_NOTES != CONVERT (R3_X_NOTES, 'US7ASCII', 'WE8ISO8859P1')

    TABLENAME: AUDIT_TABLE    LENGTHNONASCIISQL: 100
    NONASCIISQL:
      select count(*) from AUDIT_TABLE where 1=0
      or FIELD2 != CONVERT (FIELD2, 'US7ASCII', 'WE8ISO8859P1')

42 3rd Script Results
• 10 minutes
  – Create copies of non-ASCII tables
  – Copies contain all character columns
    • LONG columns not included
  – Creates 65 tables

    SQL> select table_name from user_tables
         where table_name like 'NONASCII%'
         and table_name not like '%_ORIG'
         and rownum <= 5;

    TABLE_NAME
    ------------------------------
    NONASCII_AR_ADMIN
    NONASCII_AR_CON
    NONASCII_AUDIT_TABLE
    NONASCII_CX_S_ADDR_ORG_XM
    NONASCII_C_ACCOUNT

43 4th Script Results
• 7 minutes
  – Create copies of non-ASCII tables
  – Copies contain only non-ASCII columns
  – Creates 68 tables

    SQL> select table_name from user_tables
         where table_name like 'NA_CO_%'
         and rownum <= 5;

    TABLE_NAME
    ------------------------------
    NA_CO_AR_ADMIN
    NA_CO_AR_CON
    NA_CO_AUDIT_TABLE
    NA_CO_CX_S_ADDR_ORG_XM
    NA_CO_C_ACCOUNT

44 5th Script Results
• 1 minute
  – Create copies of non-ASCII tables
  – Copies contain all character columns
    • LONG column included
  – Creates 3 tables
    • Only 3 non-ASCII tables have LONG column

    TABLE_NAME
    ------------------------------
    NONASCII_EIM_ACCNT_DTL
    NONASCII_EIM_OPTY_DTL
    NONASCII_S_CS_QUEST_LANG

45 6th Script Results
• 27 minutes
  – Scan non-ASCII tables
  – Find '%¥_¥%' or '%¤_¤%'
  – Very likely EUCJP character set
  – Create three tables
    • Table_NA_EUCJP_Info (68 tables)
    • Table_NA_EUCJP_SQL (5 tables)
    • TABLE_NA_EUCJP_COL_INFO (213 columns)
  – 5 tables have EUCJP character data

46 6th Script Results

    SQL> select * from Table_NA_EUCJP_Info where rownum <= 10;

    TABLENAME               NUM_NONASCII_ROWS  NUM_NA_EUCJP_ROWS  NUM_NONASCII_COLS  NUM_NA_EUCJP_COLS
    ----------------------  -----------------  -----------------  -----------------  -----------------
    NA_CO_AR_ADMIN                       5578                  9                  6                  1
    NA_CO_AR_CON                         3573                  4                  3                  1
    NA_CO_AUDIT_TABLE                   53301                  0                  1                  0
    NA_CO_CX_S_ADDR_ORG_XM              69470                  0                  5                  0
    NA_CO_C_ACCOUNT                     17897                  0                  1                  0
    NA_CO_C_ACT                          6562                  0                  6                  0
    NA_CO_C_ADDRESS                     25590                  0                  6                  0
    NA_CO_C_AR                          88638                  0                  6                  0
    NA_CO_C_CONTACT                     52574                  0                  3                  0
    NA_CO_C_OPTY                         2139                  0                  4                  0

47 6th Script Results

    SQL> select * from Table_NA_EUCJP_SQL;

    TABLENAME: NA_CO_AR_ADMIN    LEN_NA_EUCJP_SQL: 91
    NA_EUCJP_SQL:
      select count(*) from NA_CO_AR_ADMIN where 1=0
      or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%'

    TABLENAME: NA_CO_AR_CON    LEN_NA_EUCJP_SQL: 97
    NA_EUCJP_SQL:
      select count(*) from NA_CO_AR_CON where 1=0
      or R3_X_NOTES like '%¥_¥%' or R3_X_NOTES like '%¤_¤%'

    TABLENAME: NA_CO_S_ADDR_ORG    LEN_NA_EUCJP_SQL: 97
    NA_EUCJP_SQL:
      select count(*) from NA_CO_S_ADDR_ORG where 1=0
      or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%'

    TABLENAME: NA_CO_S_CONTACT    LEN_NA_EUCJP_SQL: 142
    NA_EUCJP_SQL:
      select count(*) from NA_CO_S_CONTACT where 1=0
      or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%'
      or X_DEPT like '%¥_¥%' or X_DEPT like '%¤_¤%'

    TABLENAME: NA_CO_S_SRV_REQ    LEN_NA_EUCJP_SQL: 200
    NA_EUCJP_SQL:
      select count(*) from NA_CO_S_SRV_REQ where 1=0
      or X_NOTES like '%¥_¥%' or X_NOTES like '%¤_¤%'
      or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%'
      or X_EMAIL_NOTES like '%¥_¥%' or X_EMAIL_NOTES like '%¤_¤%'

48 6th Script Results

    SQL> select * from TABLE_NA_EUCJP_COL_INFO where rownum <=10;

    TABLENAME          NUMNONASCIIROWS  NUMNACOLS  NACOLNUM  NAEUCJPCOLNAME        NUMNAEUCJPROWS
    -----------------  ---------------  ---------  --------  --------------------  --------------
    NA_CO_AR_ADMIN                5578          6         1  X_DESC                             9
    NA_CO_AR_ADMIN                5578          6         2  LAST_NAME                          0
    NA_CO_AR_ADMIN                5578          6         3  FST_NAME                           0
    NA_CO_AR_ADMIN                5578          6         4  ACCOUNT                            0
    NA_CO_AR_ADMIN                5578          6         5  OWNER_LAST_NAME                    0
    NA_CO_AR_ADMIN                5578          6         6  R3_CREATED_LAST_NAME               0
    NA_CO_AR_CON                  3573          3         1  OWNER_LAST                         0
    NA_CO_AR_CON                  3573          3         2  OWNER_FST                          0
    NA_CO_AR_CON                  3573          3         3  R3_X_NOTES                         4
    NA_CO_AUDIT_TABLE            53301          1         1  FIELD2                             0

49 7th Script Results
• 6 minutes
  – Create two copies of each EUCJP table
  – First copy has all character columns of table
  – Second copy has only the EUCJP columns
  – Tables named
    • EUCJP_
    • EUCJP_CO_

50 7th Script Results

    SQL> select table_name from user_tables where table_name like 'EUCJP_%'
         minus
         select table_name from user_tables where table_name like 'EUCJP_CO_%';

    TABLE_NAME
    ------------------------------
    EUCJP_AR_ADMIN
    EUCJP_AR_CON
    EUCJP_S_ADDR_ORG
    EUCJP_S_CONTACT
    EUCJP_S_SRV_REQ

    SQL> select table_name from user_tables where table_name like 'EUCJP_CO_%';

    TABLE_NAME
    ------------------------------
    EUCJP_CO_AR_ADMIN
    EUCJP_CO_AR_CON
    EUCJP_CO_S_ADDR_ORG
    EUCJP_CO_S_CONTACT
    EUCJP_CO_S_SRV_REQ

51 7th Script Results
• EUCJP rows selected
• Reviewed by dev team
  – EUCJP of all rows verified
• Make copies of these tables for reference
• Delete the EUCJP rows from the non-ASCII tables
• Further scanning of the non-ASCII tables won’t consider the EUCJP rows

52 8th Script Results
• 47 minutes
  – Scan non-ASCII tables (again)
  – Find '%¥%' or '%¤%'
  – Could be EUCJP character set
    • Could also be WE character data
  – Create three tables
    • Table_NA_EUCJP_2_Info
    • Table_NA_EUCJP_2_SQL
    • TABLE_NA_EUCJP_2_COL_INFO
  – 3 tables have EUCJP character data

53 8th Script Results
• Possible EUCJP rows selected
• Reviewed by dev team
  – EUCJP of all rows verified
• Make copies of these tables for reference
• Delete these EUCJP rows from the non-ASCII tables
• Further scanning of the non-ASCII tables won’t consider these EUCJP rows

54 Next Steps
• What I had planned
• With the EUCJP rows verified and removed
• Scan non-ASCII tables (yet again)
• Look for 8859Pn character data
  – How?
  – WE languages, single isolated 8-bit byte code with ASCII (7-bit) byte codes on either side
  – Example: Bücher
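The "isolated 8-bit byte" heuristic planned here can be sketched as a byte-level regular expression. This is an illustration in Python, not something the original scripts did; the pattern treats the start and end of the string like an ASCII neighbor, which is an assumption added for the sketch.

```python
import re

# One byte with the high bit set whose neighbors (if any) are both 7-bit.
ISOLATED_HIGH_BYTE = re.compile(rb"(?<![\x80-\xff])[\x80-\xff](?![\x80-\xff])")


def looks_single_byte_we(data: bytes) -> bool:
    """True when at least one 8-bit byte stands alone between 7-bit bytes."""
    return ISOLATED_HIGH_BYTE.search(data) is not None


german = "B\u00fccher".encode("iso8859_1")                # u-umlaut between 'B' and 'c'
japanese = b"ab" + "\u30b5\u30f3".encode("euc_jp") + b"cd"  # high bytes come in runs
plain = b"Buecher"

if __name__ == "__main__":
    for label, value in [("german", german), ("japanese", japanese), ("plain", plain)]:
        print(label, looks_single_byte_we(value))
```

The heuristic is only a hint: a Western word with two accented letters side by side would be missed, so matches still need review by someone who reads the language.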

55 Next Steps
• Select likely WE rows from non-ASCII tables
  – Review with dev team
  – Determine source country for each row
    • Schema has ‘country code’
    • Select each row using character set of country
  – Verify rows with fluent speaker for each country
  – Remove rows from non-ASCII tables as verified
• What to do with remaining rows
  – Not sure…

56 What Really Happened?
• After 8 scripts
• Dev team was able to
  – Identify likely country for each non-ASCII row
  – I identified likely character set for each country
  – I selected rows for each country
    • Using identified character set
  – Fluent speaker from each country verified
    • Rows as selected were correct
  – Wrote SQL to correctly convert rows to Unicode

57 Conversion
• How to convert non-ASCII rows to Unicode?
  – New db uses AL32UTF8 character set
• With correct character set identified
• After importing into new 9i database
  – Convert back to WE8MSWIN1252
  – Convert to AL32UTF8
  – Example:
    • UPDATE SET = CONVERT (, WE8MSWIN1252, AL32UTF8);
    • UPDATE SET = CONVERT (, AL32UTF8, );
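The two-step repair (back to the single-byte form, then forward again using the character set the data was really written in) can be followed byte by byte in Python. This is a sketch under assumptions: Python's `cp1252` and `iso8859_2` codecs stand in for WE8MSWIN1252 and EE8ISO8859P2, the Polish sample word is made up, and `repair` is a hypothetical helper.

```python
def repair(mis_converted: str, true_charset: str) -> str:
    """Undo a wrong single-byte interpretation.

    Step 1: encode back to cp1252 to recover the bytes as originally stored.
    Step 2: decode those bytes with the character set they were written in.
    """
    original_bytes = mis_converted.encode("cp1252")
    return original_bytes.decode(true_charset)


intended = "\u0141\u00f3d\u017a"             # Polish city name, as the user typed it
stored = intended.encode("iso8859_2")        # bytes that landed in the P1 database
after_import = stored.decode("cp1252")       # what a blind import turns them into
fixed = repair(after_import, "iso8859_2")    # the two-step repair
as_utf8 = fixed.encode("utf-8")              # final form in the Unicode database

if __name__ == "__main__":
    print(stored.hex(), "->", as_utf8.hex())
```

The repair only works if the first conversion was lossless, which is why the correct character set has to be identified per row before anything is updated.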

58 Script Summary
• 8 scripts, scanning 12 GB of data
  – Run times
    • 2 hours
    • 12 minutes
    • 10 minutes
    • 7 minutes
    • 1 minute
    • 27 minutes
    • 6 minutes
    • 47 minutes
• Total run time
  – 230 minutes, about 4 hours
  – Very slow development machine

59 Conclusions
• For character set conversion
  – From any 8-bit character set (WE8ISO8859Pn)
  – To Unicode
  – Accept that some of the existing data may not be in the database character set
  – Don’t assume, verify
• Use PL/SQL scripts to identify non-ASCII character data
• Decide how to evaluate the non-ASCII data
• Document, test, communicate
  – Make sure everyone knows how data from each character set is identified

60 Books Used
• Oracle PL/SQL By Example
  – Rosenzweig, Silvestrova, Prentice Hall 2004
  – I needed lots of examples
    • Multiple nested cursors
  – Needed to get going fast
• Got help from experienced PL/SQL developer
  – Quotes issue
  – Even they couldn’t explain why the specific number of quotes works… but it did

61 CRM Unicode Conversion
• Three separate presentations
  – 1) The overall conversion process
    • What we had, what we wanted, how to get there
    • Issues that come up during conversion
  – 2) Multi-byte data in the existing CRM db
    • What’s the issue, how did it happen
    • A general method to find and fix this problem
  – 3) The actual conversion
    • What really happened
    • Issues that came up and how they were resolved
• Focus on DBA issues, not Siebel application

62 PL/SQL Notes
• Quotes of quotes
  – Hard to know how many you need
  – Experiment
  – Test
    • PL/SQL that generates SQL that contains quoted strings
• Keep it simple
• Break up the task into multiple scripts
• Generate tables of results, next script uses table(s) as input
  – Tables provide documentation of intermediate results
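The "quotes of quotes" problem follows one rule: each time a string is wrapped as a SQL literal, every single quote inside it is doubled. A small Python sketch makes the counting visible; `sql_literal`, the table names `T` and `S`, and the column `C` are hypothetical and only serve the illustration.

```python
def sql_literal(text: str) -> str:
    """Wrap text as a SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


# Level 1: a statement that contains a quoted pattern.
inner = "select count(*) from T where C like " + sql_literal("%x_x%")

# Level 2: an insert that stores that whole statement as a string value,
# so the quotes around the pattern are doubled.
outer = "insert into S values (" + sql_literal(inner) + ")"

if __name__ == "__main__":
    print(inner)
    print(outer)
```

Generating the nesting with a helper, rather than typing the quotes by hand, removes the need to experiment with how many are required. For values that come from users, bind variables are the safer choice.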

63 PL/SQL Notes
• Second script
  – Looping to build up select SQL
  – Selects data from all non-ASCII columns
• Initial select SQL has to be
  – NonAsciiSQL_stmt := 'select count(*) from '||TableName||' where 1=0'
  – Subsequent SQL of form NonAsciiSQL_stmt := NonAsciiSQL_stmt||' or '||TableCharColName|| ...
  – Needed ‘where 1=0’ so we could append further OR clauses
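The `where 1=0` trick can be shown with a short string builder. This Python sketch mirrors the statement text recorded in Table_NonAscii_SQL; the function name and its parameters are hypothetical, and it only builds the text, it does not run it.

```python
def build_nonascii_sql(table, columns, source_charset="WE8ISO8859P1"):
    """Build the count query for one table.

    Starting from 'where 1=0' means every column can be appended as
    ' or ...' without special handling for the first one.
    """
    stmt = f"select count(*) from {table} where 1=0"
    for col in columns:
        stmt += f" or {col} != CONVERT ({col}, 'US7ASCII', '{source_charset}')"
    return stmt


audit_sql = build_nonascii_sql("AUDIT_TABLE", ["FIELD2"])
ar_con_sql = build_nonascii_sql("AR_CON", ["OWNER_LAST", "OWNER_FST", "R3_X_NOTES"])

if __name__ == "__main__":
    print(len(audit_sql), audit_sql)
    print(len(ar_con_sql), ar_con_sql)
```

With an empty column list the statement is still valid SQL and simply counts zero rows, which is the other benefit of the always-false starting condition.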

64 PL/SQL Notes
• LONG datatype
  – Third script created tables as select * from
    • Can’t do this when table has LONG column
  – Fourth script creates tables by building up the create table SQL one column at a time
    • Skip the LONG column, if present in base table
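Building the column list one column at a time, and leaving out LONG, might look like the sketch below. It is written in Python for illustration only; the function name, the `(name, data_type)` input shape and the exact statement text are assumptions, not the fourth script's actual output.

```python
def build_copy_sql(table, columns, prefix="NA_CO_"):
    """Build a 'create table ... as select' statement that names each
    column explicitly and skips any LONG column.

    columns is a list of (column_name, data_type) pairs.
    """
    kept = [name for name, data_type in columns if data_type.upper() != "LONG"]
    if not kept:
        raise ValueError(f"no copyable columns in {table}")
    return f"create table {prefix}{table} as select {', '.join(kept)} from {table}"


copy_sql = build_copy_sql(
    "EIM_OPTY_DTL",
    [("X_DESC", "VARCHAR2"), ("X_BODY", "LONG"), ("X_NOTES", "VARCHAR2")],
)

if __name__ == "__main__":
    print(copy_sql)
```

Naming the columns explicitly is what removes the `select *` restriction: the LONG column never appears in the generated statement.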

65 PL/SQL Notes
• DBMS_OUTPUT limitations
  – Only works for so long
  – Has limit of 1M characters
• Scripts are not commercial grade
  – Testing statements are left in
    • Commented out
  – No error trapping
  – Still development scripts
  – They work, but they aren’t pretty

66 PL/SQL Notes
• Scripts set up to
  – Run in SQL*Plus user’s schema
  – Output tables created in user’s schema
• Could easily change scripts
  – Store output tables in separate schema
  – Take a schema as input
    • Scan tables in specified schema

67 PL/SQL Script Example
• Show PL/SQL of first script
  – Cursors with definitions that depend on loop variable of outer loop
  – Quotes and more quotes
  – Generating insert statements that are inserting strings of SQL

68 6th Script Text

    set serveroutput on size 1000000;
    declare
      cursor C_EucJpTabNames is
        select table_name from user_tables where table_name like 'NA_CO_%';
      cursor C_EucJpTabCols (i_table_name varchar2) is
        select column_name from user_tab_columns
        where table_name = i_table_name order by column_id;
      TableName VARCHAR2(100);
      TableRowCount NUMBER;
      ColCount NUMBER;
      TableCharColName VARCHAR(100);
      NumAsciiPlusNon NUMBER;
      TableCharColNum NUMBER;
      Num_NA_EUCJP_Rows NUMBER;
      TabNum_NA_EUCJP_Rows NUMBER;
      Len_NA_EUCJP_SQL_stmt NUMBER;
      TabNum_NA_EUCJP_Cols NUMBER;
      CurNum_NA_EUCJP_Cols NUMBER;
      Sql_stmt VARCHAR2(4000);
      Sql_stmt2 VARCHAR2(4000) := 'COMMIT';
      NA_EUCJP_SQL_stmt VARCHAR2(4000);
      NA_EUCJP_SQL_stmt_insert VARCHAR2(4000);
      NAColCount NUMBER;
    BEGIN
      --dbms_output.disable;
      Sql_stmt := 'create table Table_NA_EUCJP_Info (TableName VARCHAR2(30), NUM_NONASCII_ROWS NUMBER, NUM_NA_EUCJP_ROWS NUMBER, NUM_NONASCII_COLS NUMBER, NUM_NA_EUCJP_COLS NUMBER)';
      execute immediate Sql_stmt;
      Sql_stmt := 'create table Table_NA_EUCJP_SQL (TableName VARCHAR2(30), Len_NA_EUCJP_SQL NUMBER, NA_EUCJP_SQL VARCHAR2(4000))';
      execute immediate Sql_stmt;

69 6th Script Text

      Sql_stmt := 'create table Table_NA_EUCJP_Col_Info (TableName VARCHAR2(30), NUMNONASCIIROWS NUMBER, NUMNACOLS NUMBER, NACOLNUM NUMBER, NAEUCJPCOLNAME VARCHAR2(30), NUMNAEUCJPROWS NUMBER)';
      execute immediate Sql_stmt;
      open C_EucJpTabNames;
      LOOP
        FETCH C_EucJpTabNames into TableName;
        Exit when C_EucJpTabNames%NOTFOUND;
        NA_EUCJP_SQL_stmt := 'select count(*) from '||TableName||' where 1=0';
        NA_EUCJP_SQL_stmt_insert := '''select count(*) from '||TableName||' where 1=0';
        execute immediate 'select count(*) from user_tab_columns where table_name = ''' || TableName || '''' into NAColCount;
        dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
        dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));
        dbms_output.put_line('table name is '||TableName);
        execute immediate 'select count(*) from '||TableName into TableRowCount;
        TableCharColNum := 0;
        CurNum_NA_EUCJP_Cols := 0;
        open C_EucJpTabCols (TableName);
        LOOP
          FETCH C_EucJpTabCols into TableCharColName;
          Exit when C_EucJpTabCols%NOTFOUND;
          dbms_output.put_line('This is column '||TableCharColName);
          TableCharColNum := TableCharColNum + 1;
          -- compute the number of EUCJP rows for this column...
          execute immediate 'select count(*) from '||TableName||
            ' where '||TableCharColName||' like ''%¥_¥%'' or '
            ||TableCharColName||' like ''%¤_¤%''' into Num_NA_EUCJP_Rows;
          dbms_output.put_line('This column has '||Num_NA_EUCJP_Rows||' NA_EUCJP_ rows');
          IF Num_NA_EUCJP_Rows != 0 THEN
            NA_EUCJP_SQL_stmt := NA_EUCJP_SQL_stmt||' or '||TableCharColName||
              ' like ''%¥_¥%'' or '||TableCharColName||' like ''%¤_¤%''';
            NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||' or '||TableCharColName||
              ' like ''''%¥_¥%'''' or '||TableCharColName||' like ''''%¤_¤%''''';

70 6th Script Text
        CurNum_NA_EUCJP_Cols := CurNum_NA_EUCJP_Cols + 1;
        dbms_output.put_line('This is NA_EUCJP_Column number '||CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('here is CurNum_NA_EUCJP_Cols');
        dbms_output.put_line(CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('SQL statement appended...');
      END IF;
      -- insert column info...
      --Dummy_col_count := 999;
      Sql_stmt := 'insert into Table_NA_EUCJP_Col_Info values ('''||TableName||''', '||TableRowCount||
        ', '||NAColCount||', '||TableCharColNum||', '''||TableCharColName||''','||Num_NA_EUCJP_Rows||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Column info insert completed...');
    End Loop;
    NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||'''';
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));

71 6th Script Text
    TabNum_NA_EUCJP_Cols:= CurNum_NA_EUCJP_Cols;
    dbms_output.put_line('here is TabNum_NA_EUCJP_Cols');
    dbms_output.put_line(TabNum_NA_EUCJP_Cols);
    -- update number of NAEUCJP columns...
    --Sql_stmt := 'update Table_NA_EUCJP_Col_Info set NUMNAEUCJPCOLS = TabNum_NA_EUCJP_Cols
    --where TableName = '''||TableName||'';
    --execute immediate Sql_stmt;
    --dbms_output.put_line('Number of NAEUCJP columns updated...');
    Close C_EucJpTabCols;
    Len_NA_EUCJP_SQL_stmt := LENGTH (NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt||'',1,255));
    --this has already been done above...
    --execute immediate 'select count(*) from '||TableName into TableRowCount;
    execute immediate 'select count(*) from user_tab_columns where table_name = ''' || TableName || '''' into ColCount;

72 6th Script Text
    --NA_EUCJP_SQL_stmt := 'testing';
    TabNum_NA_EUCJP_Rows := 0;
    execute immediate NA_EUCJP_SQL_stmt into TabNum_NA_EUCJP_Rows;
    dbms_output.put_line('Number of NA_EUCJP_ rows... '||TabNum_NA_EUCJP_Rows);
    --Len_NA_EUCJP_SQL_stmt := 0;
    dbms_output.put_line('Num rows in the table '||TableRowCount);
    dbms_output.put_line('Num columns in the table '||ColCount);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Num NAEUCJP_ Rows '||TabNum_NA_EUCJP_Rows);
    dbms_output.put_line('Num NAEUCJP_ Columns '||TabNum_NA_EUCJP_Cols);
    Sql_stmt := 'insert into Table_NA_EUCJP_Info values ('''||TableName||''', '||TableRowCount||
      ', '||TabNum_NA_EUCJP_Rows||', '||ColCount||', '||TabNum_NA_EUCJP_Cols||')';
    execute immediate Sql_stmt;
    dbms_output.put_line('First insert completed...');
    -- If number of EUCJP rows is non-zero, insert select SQL into SQL table
    IF TabNum_NA_EUCJP_Rows != 0 THEN
      Sql_stmt := 'insert into Table_NA_EUCJP_SQL values ('''||TableName||''', '||Len_NA_EUCJP_SQL_stmt||
        ', '||NA_EUCJP_SQL_stmt_insert||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Second insert completed...');

73 6th Script Text
    End If;
    execute immediate Sql_stmt2;
  End Loop;
  Close C_EucJpTabNames;
End;
/
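
Once the 6th script has finished, its findings sit in the three tables it creates (Table_NA_EUCJP_Info, Table_NA_EUCJP_Col_Info, Table_NA_EUCJP_SQL). The queries below are only a sketch of how those tables might be read back; they are not part of the original presentation, and the filters and ordering are assumptions. The table and column names are taken from the create table statements in the script.

```sql
-- Tables with at least one suspected EUC-JP row, largest counts first.
select TableName, NUM_NONASCII_ROWS, NUM_NA_EUCJP_ROWS, NUM_NA_EUCJP_COLS
  from Table_NA_EUCJP_Info
 where NUM_NA_EUCJP_ROWS > 0
 order by NUM_NA_EUCJP_ROWS desc;

-- Columns that matched the EUC-JP patterns, listed per table.
select TableName, NAEUCJPCOLNAME, NUMNAEUCJPROWS
  from Table_NA_EUCJP_Col_Info
 where NUMNAEUCJPROWS > 0
 order by TableName, NACOLNUM;

-- The generated count query saved for each affected table.
select TableName, Len_NA_EUCJP_SQL, NA_EUCJP_SQL
  from Table_NA_EUCJP_SQL
 order by TableName;
```

Because the script issues create table on every run, the three tables would presumably need to be dropped before it is run a second time.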

