Siebel CRM Unicode Conversion 2 – The DBA Perspective Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems DCSIT Technical.


Siebel CRM Unicode Conversion 2 – The DBA Perspective
Brian Hitchcock, OCP 8, 8i, 9i DBA
Sun Microsystems
DCSIT Technical Services DBA, Brian Hitchcock, November 11, 2004
Page 1

Page 2: CRM Unicode Conversion
Three separate presentations
  – 1) The overall conversion process
      – What we had, what we wanted, how to get there
      – Issues that come up during conversion
  – 2) Multi-byte data in the existing CRM db
      – What’s the issue, how did it happen
      – A general method to find and fix this problem
  – 3) The actual conversion
      – What really happened
      – Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application

Page 3: How Did I Get Involved?
Sleeping in a meeting…
Heard someone say
  – “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”
Woke up, said
  – “I’ve done that before…”
  – See “Case of the Missing Kanji”
Don’t wake up in meetings…

Page 4: What’s The Issue?
Existing Siebel CRM system
  – Oracle
  – Single-byte character set (WE8ISO8859P1)
Interface systems
  – Multi-byte character set(s) (UTF8)
  – Handle data between single-byte and multi-byte apps
Want to convert to Unicode
  – Siebel, database, interfaces all should be UTF8
  – Eliminate interface systems

Page 5: What We Had
[Diagram. Labels: Siebel CRM Oracle Db; Custdb Apac, Custdb Emea, Custdb Amer; Tcustdb Apac, Tcustdb Emea; Users: Amer, Emea, Apac; Ordering System; character sets shown: UTF8, WE8ISO8859P1]

Page 6: What We Wanted
[Diagram. Labels: Siebel CRM Oracle Db; Custdb Apac, Custdb Emea, Custdb Amer; Users: Amer, Emea, Apac; Ordering System; character sets shown: WE8ISO8859P1, UTF8, AL32UTF8]

Page 7: What We Wanted
All data in one database
  – All languages
  – Unicode
Eliminate interface systems
  – Reduce support costs
Support increased CRM functionality
  – All data in one place
  – Supports new business functionality

Page 8: Multi-byte Data In Source Db?
Source db is WE8ISO8859P1
  – Single-byte character set
  – Doesn’t support multi-byte characters
That’s the official story
The reality is somewhat different
What, if any, multi-byte data is in source db?
  – How to determine correct character set?
  – How to find, how to fix?
  – Japanese, Chinese, others?

Page 9: But Wait, There’s More…
Not just multi-byte data to look for
Non-P1 character data also
  – Non multi-byte character data
  – Could be WE P1 (Western European)
      – German, Italian, French etc.
  – Could be WE Pn
      – Polish, Greek, Russian etc.
How to find?

Page 10: How Polish Was Handled
Use separate app that sends Polish (P2) to CRM database
Stored in P1 db
Triggers move this Polish data to TWCD
Triggers in TWCD
  – Know that it’s Polish (P2)
  – Convert to UTF8 and send to WCD db
Therefore, multiple languages in Siebel P1 db

Page 11: What’s the Problem?
Character data from multiple languages
  – Stored in Oracle db
  – Db configured for P1
      – P1 supports multiple WE languages
      – Does not support Polish, Russian, etc.
Need to find all such character data
Non-P1 can be
  – Single-byte (Polish, Russian, etc.)
  – Multi-byte (Japanese, Chinese, etc.)

Page 12: Single-byte Character Sets
All Pn (8859-1, 8859-2, etc.) character sets
  – Share same range of byte codes, 0 to 255
  – From 0xA1 (decimal 161) upward
      – Same byte codes represent different characters
Example
  – WE8ISO8859P1 (8859-1)
      – Byte code 0xA3 (decimal 163) is character £
  – EE8ISO8859P2 (8859-2)
      – Same byte code, 0xA3, is character Ł
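
The point of the 0xA3 example can be checked from SQL*Plus. The query below is a sketch, not something from the talk, and it assumes a WE8ISO8859P1 database: it takes the single byte 0xA3 and asks Oracle to convert it to UTF-8 twice, once treating the byte as 8859-1 and once as 8859-2. DUMP is used so the result does not depend on the client's display settings. If the two conversions behave as documented, the two hex dumps should differ even though the stored byte is identical.

```sql
-- Assumes a WE8ISO8859P1 database. CHR(163) is the single byte 0xA3.
SELECT DUMP(CHR(163), 16) AS stored_byte,
       -- read 0xA3 as 8859-1: £ is U+00A3, whose UTF-8 encoding is C2 A3
       DUMP(CONVERT(CHR(163), 'AL32UTF8', 'WE8ISO8859P1'), 16) AS read_as_p1,
       -- read 0xA3 as 8859-2: Ł is U+0141, whose UTF-8 encoding is C5 81
       DUMP(CONVERT(CHR(163), 'AL32UTF8', 'EE8ISO8859P2'), 16) AS read_as_p2
FROM   dual;
```

Nothing in the stored byte says which reading is the right one; that information has to come from outside the data.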

Page 13: Finding Non-P1 Char Data?
Logically
  – Examine db design, Siebel docs, figure out which tables are designed to store language-specific (local language) data
  – Some column (country code) in these tables tells you which country the data is from
  – Determine correct character set for data from each country
  – Convert these tables manually to AL32UTF8 as part of overall Unicode conversion process

Page 14: Not Good
Want general method
  – No need to analyze the meaning of existing data
  – Need automated way to find all non-P1 char data
Can’t do it
  – No general way to determine if char data is P1 or P2 or Pn
      – As shown before, byte code 0xA3 (decimal 163)
          – Character £ in P1
          – Character Ł in P2

Page 15: Good
But, can find non-ASCII data in general
  – And then find multi-byte character data
Use separate approach to find non-P1
Use PL/SQL code
  – Examine every table
  – Examine every column that holds character data
  – Determine which rows, if any, are ASCII
  – Rows that aren’t ASCII are ‘suspect’
  – Identify tables that have any non-ASCII character data

Page 16: Why Look For ASCII?
Character data that is ASCII
  – Only 7 bits used to encode character
  – 8th bit of every byte is 0
  – For non-ASCII, 8th bit is set in at least one byte
      – WE8ISO8859Pn
      – Multi-byte, Japanese, Chinese, etc.
By eliminating all tables that are ASCII
  – No need to ask are they P1, P2, Pn or multi-byte
  – Greatly reduces the task
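
The "8th bit" test can be written directly as a byte check. The function below is a hypothetical helper, not one of the scripts from the talk (which use CONVERT instead). It looks at the raw bytes of a string and reports whether any of them is 128 or higher, which is the same as asking whether the 8th bit is set anywhere.

```sql
-- Hypothetical helper, not part of the talk's scripts.
CREATE OR REPLACE FUNCTION is_ascii (p_text IN VARCHAR2) RETURN NUMBER IS
  v_raw  RAW(32767) := UTL_RAW.CAST_TO_RAW(p_text);
  v_byte PLS_INTEGER;
BEGIN
  FOR i IN 1 .. NVL(UTL_RAW.LENGTH(v_raw), 0) LOOP
    -- one byte of the string as a number between 0 and 255
    v_byte := TO_NUMBER(RAWTOHEX(UTL_RAW.SUBSTR(v_raw, i, 1)), 'XX');
    IF v_byte >= 128 THEN      -- 8th bit is set: not ASCII
      RETURN 0;
    END IF;
  END LOOP;
  RETURN 1;                    -- every byte is below 128 (NULL input also returns 1)
END is_ascii;
/
```

A query such as `select count(*) from some_table where is_ascii(some_column) = 0` would then count the suspect rows; `some_table` and `some_column` are placeholders.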

Page 17: How To Find Non-ASCII?
Use SQL function CONVERT
  – Convert a given column to ASCII character set
  – Compare resulting string with original
  – If original string is all ASCII
      – Will match converted string
  – If not a match
      – Column value is non-ASCII
          – Could be WE8ISO8859Pn
          – Could be multi-byte

Page 18: Example Finding Non-ASCII
In WE8ISO8859P1 database

    create table Psycho_Acircle (text VARCHAR2(100));
    insert into Psycho_Acircle values (chr(197)||'BCDE');
    insert into Psycho_Acircle values ('ABCDE');

    select * from Psycho_Acircle;

    TEXT
    ÅBCDE
    ABCDE

    select convert(text,'US7ASCII','WE8ISO8859P1') from Psycho_Acircle;

    CONVERT(TEXT,'US7ASCII','WE8ISO8859P1')
    ?BCDE
    ABCDE

ÅBCDE is not the same as ?BCDE
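
Turning that comparison into a filter gives the rows of interest directly. This is a small sketch against the example table above; the DUMP column is an addition for inspection and is not part of the original example. It relies on CONVERT changing any character that has no US7ASCII equivalent, as the output on the slide shows.

```sql
-- Rows whose text changes when converted to US7ASCII contain non-ASCII bytes.
SELECT text,
       DUMP(text, 16) AS bytes   -- hex bytes, to see which byte is above 7F
FROM   Psycho_Acircle
WHERE  text != CONVERT(text, 'US7ASCII', 'WE8ISO8859P1');
```

Rows where the column is NULL are skipped by the comparison, which is fine here because an empty value has nothing to convert.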

Page 19: Not Included
Did not scan
  – LONG datatype columns
  – CLOB datatype columns
      – Didn’t have any in schema
  – PL/SQL code in database
      – Dev team determined this wasn’t needed

Page 20: Scripts Strategy
Eliminate as much as possible
  – Identify all ASCII-only tables
  – Left with set of non-ASCII tables
For remaining tables
  – Find likely Japanese character data
  – Verify it is Japanese
  – Copy to separate table
  – Remove from non-ASCII tables
Repeat for other languages
  – How to identify byte patterns for each language?

Page 21: PL/SQL scripts
Scripts used
  – Scan_Table_1_Gen_Column_Info.sql
  – Scan_Table_2_Gen_Nonascii_rows_Info.sql
  – Scan_Table_3_Gen_NonasciiTables_NoLong.sql
  – Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql
  – Scan_Table_5_Gen_NonasciiTables_YesLong.sql
  – Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql
  – Scan_Table_7_Gen_NA_EUCJP_Tables.sql
  – Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql

Page 22: Scripts
Each script generates table(s)
  – Output of each script stored in table(s)
      – Next script uses tables
Lots of intermediate data stored
  – Helped develop scripts
  – Each script simpler
  – Provided extra output for developers, analysts to help them verify results
      – Is this data really Japanese?

Page 23: What Does Each Script Do?
Scan_Table_1_Gen_Column_Info.sql
  – Scans all tables in a schema
  – Creates two tables
      – Table_Gen_Info
          – Info on all tables
      – Table_Column_Info
          – Info on character columns
          – Which contain any non-ASCII strings
          – Doesn’t include LONG columns
          – Can’t use SQL functions on LONG datatype

Page 24: What Does Each Script Do?
Scan_Table_2_Gen_Nonascii_rows_Info.sql
  – Use table Table_Column_Info
  – Examine tables with non-ASCII character data
  – Creates two tables
      – Table_NonAscii_info
          – Number of rows, columns with non-ASCII data
      – Table_NonAscii_SQL
          – SQL to extract non-ASCII data from each table
          – Useful for developers, analysts to extract data from other environments

Page 25: What Does Each Script Do?
Scan_Table_3_Gen_NonasciiTables_NoLong.sql
  – Use tables table_gen_info, table_nonascii_sql
  – Create copies of tables that have non-ASCII data
  – Copies contain only the non-ASCII rows
      – Have all character columns of original table
      – Helps identify which country data is from
  – Creates tables as select * from <table name>
      – Doesn’t work on tables with LONG column
      – Tables named NONASCII_<table name>

Page 26: What Does Each Script Do?
Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql
  – Similar to third (previous) script
  – Table copies only contain columns that have non-ASCII data
  – Does handle tables with LONG column
  – Creates tables of form NA_CO_<table name>
      – Set of tables containing all non-ASCII data in the schema

Page 27: What Does Each Script Do?
Scan_Table_5_Gen_NonasciiTables_YesLong.sql
  – Creates copies of tables having non-ASCII data
  – Copy tables have all char columns of base table
  – Only copies tables that have LONG column
  – Companion to third script
      – Deals with tables that have LONG column
      – Tables named NONASCII_<table name>
  – Now have complete set of tables
      – Have all non-ASCII char columns of base tables

Page 28: Katakana, Hiragana?
How to find Japanese character data?
  – Look at hex dump of character data and see lots of ¥_¥ and ¤_¤
  – The byte code of ¤ is A4, ¥ is A5
  – In EUCJP, A4 is the first byte of every hiragana character and A5 is the first byte of every katakana character
  – Many Japanese transliterated terms (company names) start with these bytes
  – Typical of EUCJP character set
  – Find rows that contain '%¥_¥%' or '%¤_¤%'
  – Repeated ¥ or ¤ means EUCJP more likely
  – Verify that these rows are indeed Japanese
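
Typing ¥ and ¤ into a script depends on the client character set being right, so one way to write the same search is with CHR. This is a sketch that assumes a single-byte (WE8ISO8859P1) database, where the LIKE wildcard `_` matches exactly one byte; NA_CO_AR_ADMIN and X_DESC are taken from the scan results later in the deck and are used here only as an example.

```sql
-- Assumes a single-byte database: CHR(165) is byte A5 (¥), CHR(164) is byte A4 (¤).
SELECT x_desc,
       DUMP(x_desc, 16) AS bytes
FROM   NA_CO_AR_ADMIN
WHERE  x_desc LIKE '%' || CHR(165) || '_' || CHR(165) || '%'   -- A5 xx A5: two katakana in a row
   OR  x_desc LIKE '%' || CHR(164) || '_' || CHR(164) || '%';  -- A4 xx A4: two hiragana in a row
```

The pattern works because each EUCJP kana is two bytes, so two kana in a row put the same lead byte at positions one and three.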

Page 29: What Does Each Script Do?
Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql
  – For table copies with non-ASCII columns only
  – Look for specific pattern of '%¥_¥%'
  – Or '%¤_¤%'
  – Creates tables
      – Table_NA_EUCJP_Info
      – Table_NA_EUCJP_SQL
      – Table_NA_EUCJP_COL_INFO

Page 30: 6th Script
What does each table contain?
  – Table_NA_EUCJP_Info
      – Number of EUCJP rows in each non-ASCII table
  – Table_NA_EUCJP_SQL
      – SQL to extract EUCJP rows
  – Table_NA_EUCJP_COL_INFO
      – Number of EUCJP rows in each column

Page 31: What Does Each Script Do?
Scan_Table_7_Gen_NA_EUCJP_Tables.sql
  – Create two copies of each table that has EUCJP
      – Contain rows that have EUCJP
      – First table, all char columns
      – Second, only EUCJP columns
  – Tables created have names
      – EUCJP_<table name>
      – EUCJP_CO_<table name>

Page 32: After 7th Script
We have identified EUCJP rows
  – In non-ASCII tables
  – Copied these rows to separate tables
      – Delete these rows from the non-ASCII tables
As we identify rows from a specific char set
  – Remove them from the non-ASCII tables
  – Smaller and smaller set of unknown rows
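
The "remove as you identify" step might look like the following once the copies exist and the rows have been verified. It is a sketch only: the table and column come from the 6th script's output shown later, the predicate is the one that script generated (written here with CHR), and it assumes the seventh script has already saved the matching rows.

```sql
-- Run only after the EUCJP copies have been created and verified.
DELETE FROM NA_CO_AR_CON
WHERE  R3_X_NOTES LIKE '%' || CHR(165) || '_' || CHR(165) || '%'
   OR  R3_X_NOTES LIKE '%' || CHR(164) || '_' || CHR(164) || '%';

COMMIT;

-- What is left is the still-unexplained non-ASCII data for this table.
SELECT COUNT(*) FROM NA_CO_AR_CON;
```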

Page 33: What Does Each Script Do?
Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql
  – Find rows containing ¥ or ¤
  – Could be Japanese
  – Could be WE

Page 34: Results
For each script
  – Time to run
  – Output
  – % of total db that is non-ASCII
  – Demonstrates power of this approach
  – No attempt to speed up
      – Only need to scan once, no need for speed
  – Copy prod data to separate environment
  – Run scripts there, develop the SQL to correctly convert the non-ASCII data as needed
      – Apply to prod as part of Unicode conversion

Page 35: Results
Scripts run against copy of production db
Database
  – 25 GB total, but 13 GB free space
  – 12 GB of actual data to scan
  – (be skeptical when people tell you they support multi-terabyte dbs, size of actual data counts)
Scripts create tables in the same schema they run in

Page 36: Results
Script 1
  – 2 hours
  – Scanned 12 GB of data
  – 2483 tables, columns
  – Created two tables
      – Table_gen_info
      – Table_column_info

Page 37: 1st Script Results

    SQL> select * from Table_Gen_Info where rownum <=10;

    TABLENAME NUMROWS NUMCOLS NUMCHARCOLS NUMCLOBCOLS NUMLONGCOLS
    ACCNT_STAT
    AMER_AR_OWNER
    AMER_AR_T
    APAC_AR_OWNER
    AR_ADMIN
    AR_CON
    AR_STAT
    AUDIT_TABLE
    CONT_CREATED
    CON_CREATED

    [numeric column values not captured]

Page 38: 1st Script Results

    SQL> select * from Table_Column_Info where rownum <=20;

    TABLENAME NUMROWS NUMCHARCOLS CHARCOLNUM CHARCOLNAME NUMNONASCIIROWS
    ACCNT_STAT WCD 0
    ACCNT_STAT STATUS 0
    ACCNT_STAT R4_STATUS 0
    ...
    AR_ADMIN R4_ID 0
    AR_ADMIN R4_SR_NUM 0
    AR_ADMIN X_DESC

    rows selected.

    SQL>

Page 39: 2nd Script Results
12 minutes
  – 68 tables that have non-ASCII char data
  – 68 SQL statements
Overall
  – We have 12 GB of data
  – 68/2483 tables have any non-ASCII char data
  – Only 3% of the tables
      – But they’re some of the biggest tables
      – Schema analysis much easier on 68 tables

Page 40: 2nd Script Results

    SQL> select * from Table_NonAscii_Info where rownum <= 10;

    TABLENAME NUMROWS NUMNONASCIIROWS NUMCOLS NUMNONASCIICOLS
    AR_ADMIN
    AR_CON
    AUDIT_TABLE
    CX_S_ADDR_ORG_XM
    C_ACCOUNT
    C_ACT
    C_ADDRESS
    C_AR
    C_CONTACT
    C_OPTY

    [numeric column values not captured]

Page 41: 2nd Script Results

    SQL> select * from Table_NonAscii_SQL where rownum <= 10;

    TABLENAME LENGTHNONASCIISQL NONASCIISQL

    AR_ADMIN 445
    select count(*) from AR_ADMIN where 1=0
      or X_DESC != CONVERT (X_DESC, 'US7ASCII', 'WE8ISO8859P1')
      or LAST_NAME != CONVERT (LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or FST_NAME != CONVERT (FST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or ACCOUNT != CONVERT (ACCOUNT, 'US7ASCII', 'WE8ISO8859P1')
      or OWNER_LAST_NAME != CONVERT (OWNER_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')
      or R3_CREATED_LAST_NAME != CONVERT (R3_CREATED_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')

    AR_CON 233
    select count(*) from AR_CON where 1=0
      or OWNER_LAST != CONVERT (OWNER_LAST, 'US7ASCII', 'WE8ISO8859P1')
      or OWNER_FST != CONVERT (OWNER_FST, 'US7ASCII', 'WE8ISO8859P1')
      or R3_X_NOTES != CONVERT (R3_X_NOTES, 'US7ASCII', 'WE8ISO8859P1')

    AUDIT_TABLE 100
    select count(*) from AUDIT_TABLE where 1=0
      or FIELD2 != CONVERT (FIELD2, 'US7ASCII', 'WE8ISO8859P1')

Page 42: 3rd Script Results
10 minutes
  – Create copies of non-ASCII tables
  – Copies contain all character columns
      – LONG columns not included
  – Creates 65 tables

    SQL> select table_name from user_tables where table_name like 'NONASCII%' and table_name not like '%_ORIG' and rownum <= 5;

    TABLE_NAME
    NONASCII_AR_ADMIN
    NONASCII_AR_CON
    NONASCII_AUDIT_TABLE
    NONASCII_CX_S_ADDR_ORG_XM
    NONASCII_C_ACCOUNT

Page 43: 4th Script Results
7 minutes
  – Create copies of non-ASCII tables
  – Copies contain only non-ASCII columns
  – Creates 68 tables

    SQL> select table_name from user_tables where table_name like 'NA_CO_%' and rownum <= 5;

    TABLE_NAME
    NA_CO_AR_ADMIN
    NA_CO_AR_CON
    NA_CO_AUDIT_TABLE
    NA_CO_CX_S_ADDR_ORG_XM
    NA_CO_C_ACCOUNT

Page 44: 5th Script Results
1 minute
  – Create copies of non-ASCII tables
  – Copies contain all character columns
      – LONG column included
  – Creates 3 tables
      – Only 3 non-ASCII tables have LONG column

    TABLE_NAME
    NONASCII_EIM_ACCNT_DTL
    NONASCII_EIM_OPTY_DTL
    NONASCII_S_CS_QUEST_LANG

Page 45: 6th Script Results
27 minutes
  – Scan non-ASCII tables
  – Find '%¥_¥%' or '%¤_¤%'
  – Very likely EUCJP character set
  – Create three tables
      – Table_NA_EUCJP_Info (68 tables)
      – Table_NA_EUCJP_SQL (5 tables)
      – TABLE_NA_EUCJP_COL_INFO (213 columns)
  – 5 tables have EUCJP character data

Page 46: 6th Script Results

    SQL> select * from Table_NA_EUCJP_Info where rownum <= 10;

    TABLENAME NUM_NONASCII_ROWS NUM_NA_EUCJP_ROWS NUM_NONASCII_COLS NUM_NA_EUCJP_COLS
    NA_CO_AR_ADMIN
    NA_CO_AR_CON
    NA_CO_AUDIT_TABLE
    NA_CO_CX_S_ADDR_ORG_XM
    NA_CO_C_ACCOUNT
    NA_CO_C_ACT
    NA_CO_C_ADDRESS
    NA_CO_C_AR
    NA_CO_C_CONTACT
    NA_CO_C_OPTY

    [numeric column values not captured]

Page 47: 6th Script Results

    SQL> select * from Table_NA_EUCJP_SQL;

    TABLENAME LEN_NA_EUCJP_SQL NA_EUCJP_SQL

    NA_CO_AR_ADMIN 91
    select count(*) from NA_CO_AR_ADMIN where 1=0
      or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%'

    NA_CO_AR_CON 97
    select count(*) from NA_CO_AR_CON where 1=0
      or R3_X_NOTES like '%¥_¥%' or R3_X_NOTES like '%¤_¤%'

    NA_CO_S_ADDR_ORG 97
    select count(*) from NA_CO_S_ADDR_ORG where 1=0
      or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%'

    NA_CO_S_CONTACT 142
    select count(*) from NA_CO_S_CONTACT where 1=0
      or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%'
      or X_DEPT like '%¥_¥%' or X_DEPT like '%¤_¤%'

    NA_CO_S_SRV_REQ 200
    select count(*) from NA_CO_S_SRV_REQ where 1=0
      or X_NOTES like '%¥_¥%' or X_NOTES like '%¤_¤%'
      or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%'
      or X_ _NOTES like '%¥_¥%' or X_ _NOTES like '%¤_¤%'

Page 48: 6th Script Results

    SQL> select * from TABLE_NA_EUCJP_COL_INFO where rownum <=10;

    TABLENAME NUMNONASCIIROWS NUMNACOLS NACOLNUM NAEUCJPCOLNAME NUMNAEUCJPROWS
    NA_CO_AR_ADMIN X_DESC 9
    NA_CO_AR_ADMIN LAST_NAME 0
    NA_CO_AR_ADMIN FST_NAME 0
    NA_CO_AR_ADMIN ACCOUNT 0
    NA_CO_AR_ADMIN OWNER_LAST_NAME 0
    NA_CO_AR_ADMIN R3_CREATED_LAST_NAME 0
    NA_CO_AR_CON OWNER_LAST 0
    NA_CO_AR_CON OWNER_FST 0
    NA_CO_AR_CON R3_X_NOTES 4
    NA_CO_AUDIT_TABLE FIELD2 0

    [only TABLENAME, NAEUCJPCOLNAME and NUMNAEUCJPROWS values captured]

Page 49: 7th Script Results
6 minutes
  – Create two copies of each EUCJP table
  – First copy has all character columns of table
  – Second copy has only the EUCJP columns
  – Tables named
      – EUCJP_<table name>
      – EUCJP_CO_<table name>

Page 50: 7th Script Results

    SQL> select table_name from user_tables where table_name like 'EUCJP_%'
         minus
         select table_name from user_tables where table_name like 'EUCJP_CO_%';

    TABLE_NAME
    EUCJP_AR_ADMIN
    EUCJP_AR_CON
    EUCJP_S_ADDR_ORG
    EUCJP_S_CONTACT
    EUCJP_S_SRV_REQ

    SQL> select table_name from user_tables where table_name like 'EUCJP_CO_%';

    TABLE_NAME
    EUCJP_CO_AR_ADMIN
    EUCJP_CO_AR_CON
    EUCJP_CO_S_ADDR_ORG
    EUCJP_CO_S_CONTACT
    EUCJP_CO_S_SRV_REQ

DCSIT Technical Services DBA Brian Hitchcock November 11, 2004Page th Script Results  EUCJP rows selected  Reviewed by dev team – EUCJP of all rows verified  Make copies of these tables for reference  Delete the EUCJP rows from the non-ASCII tables  Further scanning of the non-ASCII tables won’t consider the EUCJP rows

DCSIT Technical Services DBA Brian Hitchcock November 11, 2004Page th Script Results  47 minutes – Scan non-ASCII tables (again) – Find '%¥%' or '%¤%‘ – Could be EUCJP character set  Could also be WE character data – Create three tables  Table_NA_EUCJP_2_Info  Table_NA_EUCJP_2_SQL  TABLE_NA_EUCJP_2_COL_INFO – 3 tables have EUCJP character data

DCSIT Technical Services DBA Brian Hitchcock November 11, 2004Page th Script Results  Possible EUCJP rows selected  Reviewed by dev team – EUCJP of all rows verified  Make copies of these tables for reference  Delete these EUCJP rows from the non-ASCII tables  Further scanning of the non-ASCII tables won’t consider these EUCJP rows

DCSIT Technical Services DBA Brian Hitchcock November 11, 2004Page 54 Next Steps  What I had planned  With the EUCJP rows verified and removed  Scan non-ASCII tables (yet again)  Look for 8859Pn character data – How? – WE languages, single isolated 8-bit byte code with ASCII (7-bit) byte codes on either side – Example: Bücher

Page 55: Next Steps
Select likely WE rows from non-ASCII tables
  – Review with dev team
  – Determine source country for each row
      – Schema has ‘country code’
      – Select each row using character set of country
  – Verify rows with fluent speaker for each country
  – Remove rows from non-ASCII tables as verified
What to do with remaining rows
  – Not sure…

Page 56: What Really Happened?
After 8 scripts
Dev team was able to
  – Identify likely country for each non-ASCII row
  – I identified likely character set for each country
  – I selected rows for each country
      – Using identified character set
  – Fluent speaker from each country verified
      – Rows as selected were correct
  – Wrote SQL to correctly convert rows to Unicode

Page 57: Conversion
How to convert non-ASCII rows to Unicode?
  – New db uses AL32UTF8 character set
With correct character set identified
After importing into new 9i database
  – Convert back to WE8MSWIN1252
  – Convert to AL32UTF8
  – Example:
      UPDATE <table> SET <column> = CONVERT(<column>, 'WE8MSWIN1252', 'AL32UTF8');
      UPDATE <table> SET <column> = CONVERT(<column>, 'AL32UTF8', '<correct character set>');
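
Filled in for the Japanese rows, the two-step conversion might look like this. It is a sketch under several assumptions: AR_ADMIN.X_DESC and the EUCJP_AR_ADMIN reference table come from the scan results, JA16EUC is Oracle's name for the EUC-JP character set, and the ROW_ID key used to pick out the verified rows is hypothetical, since the slides do not say how those rows were matched back to the base table.

```sql
-- Step 1: undo the conversion applied during import, recovering the original bytes.
UPDATE AR_ADMIN
SET    X_DESC = CONVERT(X_DESC, 'WE8MSWIN1252', 'AL32UTF8')
WHERE  ROW_ID IN (SELECT ROW_ID FROM EUCJP_AR_ADMIN);   -- ROW_ID is an assumed key

-- Step 2: read those bytes as EUC-JP and convert them to the database character set.
UPDATE AR_ADMIN
SET    X_DESC = CONVERT(X_DESC, 'AL32UTF8', 'JA16EUC')
WHERE  ROW_ID IN (SELECT ROW_ID FROM EUCJP_AR_ADMIN);

COMMIT;
```

Kana take two bytes in EUC-JP and three in UTF-8, so it is worth checking on a copy that the converted values still fit the column before running anything like this in production.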

Page 58: Script Summary
8 scripts, scanning 12 GB of data
  – Run times
      – 2 hours
      – 12 minutes
      – 10 minutes
      – 7 minutes
      – 1 minute
      – 27 minutes
      – 6 minutes
      – 47 minutes
Total run time
  – 230 minutes, about 4 hours
  – Very slow development machine

Page 59: Conclusions
For character set conversion
  – From any 8-bit character set (WE8ISO8859Pn)
  – To Unicode
  – Accept that some of the existing data may not be in the database character set
  – Don’t assume, verify
Use PL/SQL scripts, identify non-ASCII character data
Decide how to evaluate the non-ASCII data
Document, test, communicate
  – Make sure everyone knows how data from each character set is identified

Page 60: Books Used
Oracle PL/SQL By Example
  – Rosenzweig, Silvestrova, Prentice Hall, 2004
  – I needed lots of examples
      – Multiple nested cursors
  – Needed to get going fast
Got help from experienced PL/SQL developer
  – Quotes issue
  – Even they couldn’t explain why the specific number of quotes works… but it did

Page 61: CRM Unicode Conversion
Three separate presentations
  – 1) The overall conversion process
      – What we had, what we wanted, how to get there
      – Issues that come up during conversion
  – 2) Multi-byte data in the existing CRM db
      – What’s the issue, how did it happen
      – A general method to find and fix this problem
  – 3) The actual conversion
      – What really happened
      – Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application

Page 62: PL/SQL Notes
Quotes of quotes
  – Hard to know how many you need
  – Experiment
  – Test
PL/SQL that generates SQL that contains quoted strings
Keep it simple
Break up the task into multiple scripts
Generate tables of results, next script uses table(s) as input
  – Tables provide documentation of intermediate results
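
The quote counts follow one rule: inside a string literal, two single quotes stand for one quote character, and every extra level of nesting doubles them again. The block below is an illustration, not code from the scripts; `demo_sql` is a made-up table name and nothing is executed against it. It builds one statement by hand and then uses REPLACE to do the doubling for the second level, which avoids counting quotes.

```sql
SET SERVEROUTPUT ON
DECLARE
  v_sql    VARCHAR2(200);
  v_insert VARCHAR2(400);
BEGIN
  -- Level 1: '' inside the literal becomes one quote in the text.
  v_sql := 'select count(*) from dual where dummy = ''X''';
  DBMS_OUTPUT.PUT_LINE(v_sql);
  -- text is: select count(*) from dual where dummy = 'X'

  -- Level 2: an insert statement whose value is that SQL text.
  -- REPLACE turns each quote into two, which is the extra doubling this level needs.
  v_insert := 'insert into demo_sql (stmt) values (''' ||
              REPLACE(v_sql, '''', '''''') || ''')';
  DBMS_OUTPUT.PUT_LINE(v_insert);
  -- text is: insert into demo_sql (stmt) values ('select count(*) from dual where dummy = ''X''')
END;
/
```

This is why the hand-written insert strings in the scripts carry four quotes where the directly executed strings carry two.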

Page 63: PL/SQL Notes
Second script
  – Looping to build up select SQL
  – Selects data from all non-ASCII columns
Initial select SQL has to be
  – NonAsciiSQL_stmt := 'select count(*) from '||TableName||' where 1=0';
  – Subsequent SQL of form NonAsciiSQL_stmt := NonAsciiSQL_stmt||' or '||TableCharColName||
  – Needed 'where 1=0' so that every column clause, including the first, can be appended as ' or ...' without changing the result
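
A compact version of that loop is sketched below. It is modelled on the generated statements shown in the 2nd script results, not copied from the script itself: it appends a clause for every CHAR and VARCHAR2 column of one table (AR_CON, chosen as an example), whereas the real script only adds columns already known to hold non-ASCII data.

```sql
SET SERVEROUTPUT ON
DECLARE
  NonAsciiSQL_stmt VARCHAR2(4000);
BEGIN
  -- 'where 1=0' is always false, so it adds no rows and lets every clause start with ' or '.
  NonAsciiSQL_stmt := 'select count(*) from AR_CON where 1=0';

  FOR c IN (SELECT column_name
            FROM   user_tab_columns
            WHERE  table_name = 'AR_CON'
            AND    data_type IN ('CHAR', 'VARCHAR2')
            ORDER  BY column_id)
  LOOP
    NonAsciiSQL_stmt := NonAsciiSQL_stmt || ' or ' || c.column_name ||
      ' != CONVERT (' || c.column_name || ', ''US7ASCII'', ''WE8ISO8859P1'')';
  END LOOP;

  -- SUBSTR keeps the line within the DBMS_OUTPUT line limit, as the original script does.
  DBMS_OUTPUT.PUT_LINE(SUBSTR(NonAsciiSQL_stmt, 1, 255));
END;
/
```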

Page 64: PL/SQL Notes
LONG datatype
  – Third script created tables as select * from
      – Can’t do this when table has LONG column
  – Fourth script creates tables by building up the create table SQL one column at a time
      – Skip the LONG column, if present in base table
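
Building the column list one column at a time could be done along these lines. This is a simplified sketch: it keeps every column except LONG and LONG RAW, while the fourth script keeps only the non-ASCII columns. S_CS_QUEST_LANG is one of the three tables reported as having a LONG column; the target name DEMO_S_CS_QUEST_LANG is made up, and the statement is only printed, not run.

```sql
SET SERVEROUTPUT ON
DECLARE
  v_cols VARCHAR2(4000);
BEGIN
  FOR c IN (SELECT column_name
            FROM   user_tab_columns
            WHERE  table_name = 'S_CS_QUEST_LANG'
            AND    data_type NOT IN ('LONG', 'LONG RAW')   -- skip the LONG column
            ORDER  BY column_id)
  LOOP
    IF v_cols IS NOT NULL THEN
      v_cols := v_cols || ', ';
    END IF;
    v_cols := v_cols || c.column_name;
  END LOOP;

  -- Print the statement (truncated to the DBMS_OUTPUT line limit) instead of executing it.
  DBMS_OUTPUT.PUT_LINE(SUBSTR(
    'create table DEMO_S_CS_QUEST_LANG as select ' || v_cols ||
    ' from S_CS_QUEST_LANG', 1, 255));
END;
/
```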

Page 65: PL/SQL Notes
DBMS_OUTPUT limitations
  – Only works for so long
  – Has limit of 1M characters
Scripts are not commercial grade
  – Testing statements are left in
      – Commented out
  – No error trapping
  – Still development scripts
  – They work, but they aren’t pretty

Page 66: PL/SQL Notes
Scripts set up to
  – Run in SQL*Plus user’s schema
  – Output tables created in user’s schema
Could easily change scripts
  – Store output tables in separate schema
  – Take a schema as input
      – Scan tables in specified schema
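
Taking a schema as input mostly means reading ALL_TAB_COLUMNS with an owner filter instead of USER_TAB_COLUMNS. The query below is a sketch of that starting point; `scan_schema` is a made-up SQL*Plus substitution variable, and the generated statements would also need the owner prefixed to each table name.

```sql
-- SQL*Plus prompts for scan_schema; the account running this needs access to that schema's tables.
SELECT table_name, column_name, data_type
FROM   all_tab_columns
WHERE  owner = UPPER('&scan_schema')
AND    data_type IN ('CHAR', 'VARCHAR2')
ORDER  BY table_name, column_id;
```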

PL/SQL Script Example
• Show PL/SQL of first script
  – Cursors with definitions that depend on loop variable of outer loop
  – Quotes and more quotes
  – Generating insert statements that are inserting strings of SQL

Script Text

set serveroutput on size ;
declare
  -- tables to scan
  cursor C_EucJpTabNames is
    select table_name from user_tables where table_name like 'NA_CO_%';
  -- columns of one table, in column order; depends on the outer loop's table name
  cursor C_EucJpTabCols (i_table_name varchar2) is
    select column_name from user_tab_columns
    where table_name = i_table_name order by column_id;
  TableName VARCHAR2(100);
  TableRowCount NUMBER;
  ColCount NUMBER;
  TableCharColName VARCHAR(100);
  NumAsciiPlusNon NUMBER;
  TableCharColNum NUMBER;
  Num_NA_EUCJP_Rows NUMBER;
  TabNum_NA_EUCJP_Rows NUMBER;
  Len_NA_EUCJP_SQL_stmt NUMBER;
  TabNum_NA_EUCJP_Cols NUMBER;
  CurNum_NA_EUCJP_Cols NUMBER;
  Sql_stmt VARCHAR2(4000);
  Sql_stmt2 VARCHAR2(4000) := 'COMMIT';
  NA_EUCJP_SQL_stmt VARCHAR2(4000);
  NA_EUCJP_SQL_stmt_insert VARCHAR2(4000);
  NAColCount NUMBER;
BEGIN
  --dbms_output.disable;
  -- create the output tables
  Sql_stmt := 'create table Table_NA_EUCJP_Info (TableName VARCHAR2(30), '||
              'NUM_NONASCII_ROWS NUMBER, NUM_NA_EUCJP_ROWS NUMBER, '||
              'NUM_NONASCII_COLS NUMBER, NUM_NA_EUCJP_COLS NUMBER)';
  execute immediate Sql_stmt;
  Sql_stmt := 'create table Table_NA_EUCJP_SQL (TableName VARCHAR2(30), '||
              'Len_NA_EUCJP_SQL NUMBER, NA_EUCJP_SQL VARCHAR2(4000))';
  execute immediate Sql_stmt;

  Sql_stmt := 'create table Table_NA_EUCJP_Col_Info (TableName VARCHAR2(30), '||
              'NUMNONASCIIROWS NUMBER, NUMNACOLS NUMBER, NACOLNUM NUMBER, '||
              'NAEUCJPCOLNAME VARCHAR2(30), NUMNAEUCJPROWS NUMBER)';
  execute immediate Sql_stmt;
  open C_EucJpTabNames;
  LOOP
    FETCH C_EucJpTabNames into TableName;
    Exit when C_EucJpTabNames%NOTFOUND;
    -- statement to execute, seeded with an always-false condition
    NA_EUCJP_SQL_stmt := 'select count(*) from '||TableName||' where 1=0';
    -- same statement with an opening quote and doubled quotes, for insertion as a string
    NA_EUCJP_SQL_stmt_insert := '''select count(*) from '||TableName||' where 1=0';
    execute immediate 'select count(*) from user_tab_columns where table_name = '''
      || TableName || '''' into NAColCount;
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));
    dbms_output.put_line('table name is '||TableName);
    execute immediate 'select count(*) from '||TableName into TableRowCount;
    TableCharColNum := 0;
    CurNum_NA_EUCJP_Cols := 0;
    open C_EucJpTabCols (TableName);
    LOOP
      FETCH C_EucJpTabCols into TableCharColName;
      Exit when C_EucJpTabCols%NOTFOUND;
      dbms_output.put_line('This is column '||TableCharColName);
      TableCharColNum := TableCharColNum + 1;
      -- compute the number of EUCJP rows for this column...
      execute immediate 'select count(*) from '||TableName||
        ' where '||TableCharColName||' like ''% ¥ _ ¥ %'' or '
        ||TableCharColName||' like ''% ¤ _ ¤ %''' into Num_NA_EUCJP_Rows;
      dbms_output.put_line('This column has '||Num_NA_EUCJP_Rows||' NA_EUCJP_ rows');
      IF Num_NA_EUCJP_Rows != 0 THEN
        NA_EUCJP_SQL_stmt := NA_EUCJP_SQL_stmt||' or '||TableCharColName||
          ' like ''% ¥ _ ¥ %'' or '||TableCharColName||' like ''% ¤ _ ¤ %''';
        NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||' or '||TableCharColName||
          ' like ''''% ¥ _ ¥ %'''' or '||TableCharColName||' like ''''% ¤ _ ¤ %''''';

        CurNum_NA_EUCJP_Cols := CurNum_NA_EUCJP_Cols + 1;
        dbms_output.put_line('This is NA_EUCJP_Column number '||CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('here is CurNum_NA_EUCJP_Cols');
        dbms_output.put_line(CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('SQL statement appended...');
      END IF;
      -- insert column info...
      --Dummy_col_count := 999;
      Sql_stmt := 'insert into Table_NA_EUCJP_Col_Info values ('''||TableName||''', '||TableRowCount||
        ', '||NAColCount||', '||TableCharColNum||', '''||TableCharColName||''','||Num_NA_EUCJP_Rows||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Column info insert completed...');
    End Loop;
    NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||'''';
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));

    TabNum_NA_EUCJP_Cols:= CurNum_NA_EUCJP_Cols;
    dbms_output.put_line('here is TabNum_NA_EUCJP_Cols');
    dbms_output.put_line(TabNum_NA_EUCJP_Cols);
    -- update number of NAEUCJP columns...
    --Sql_stmt := 'update Table_NA_EUCJP_Col_Info set NUMNAEUCJPCOLS = TabNum_NA_EUCJP_Cols
    --where TableName = '''||TableName||'';
    --execute immediate Sql_stmt;
    --dbms_output.put_line('Number of NAEUCJP columns updated...');
    Close C_EucJpTabCols;
    Len_NA_EUCJP_SQL_stmt := LENGTH (NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt||'',1,255));
    --this has already been done above...
    --execute immediate 'select count(*) from '||TableName into TableRowCount;
    execute immediate 'select count(*) from user_tab_columns where table_name = '''
      || TableName || '''' into ColCount;

    --NA_EUCJP_SQL_stmt := 'testing';
    TabNum_NA_EUCJP_Rows := 0;
    execute immediate NA_EUCJP_SQL_stmt into TabNum_NA_EUCJP_Rows;
    dbms_output.put_line('Number of NA_EUCJP_ rows... '||TabNum_NA_EUCJP_Rows);
    --Len_NA_EUCJP_SQL_stmt := 0;
    dbms_output.put_line('Num rows in the table '||TableRowCount);
    dbms_output.put_line('Num columns in the table '||ColCount);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Num NAEUCJP_ Rows '||TabNum_NA_EUCJP_Rows);
    dbms_output.put_line('Num NAEUCJP_ Columns '||TabNum_NA_EUCJP_Cols);
    Sql_stmt := 'insert into Table_NA_EUCJP_Info values ('''||TableName||''', '||TableRowCount||
      ', '||TabNum_NA_EUCJP_Rows||', '||ColCount||', '||TabNum_NA_EUCJP_Cols||')';
    execute immediate Sql_stmt;
    dbms_output.put_line('First insert completed...');
    -- If number of EUCJP rows is non-zero, insert select SQL into SQL table
    IF TabNum_NA_EUCJP_Rows != 0 THEN
      Sql_stmt := 'insert into Table_NA_EUCJP_SQL values ('''||TableName||''', '||Len_NA_EUCJP_SQL_stmt||
        ', '||NA_EUCJP_SQL_stmt_insert||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Second insert completed...');

    End If;
    execute immediate Sql_stmt2;
  End Loop;
  Close C_EucJpTabNames;
End;
/