This is a short story about an Oracle data migration.
The story is simple: I had to move data from one Oracle database (charset Latin1, version 8i) to another Oracle database (charset Unicode, version 10g). It was not as easy as I first thought, because I discovered the source database stored strange characters. Somehow the client application had recorded characters that cannot be represented in the Latin1 character set.
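To illustrate the mechanism (a Python sketch of my own, not part of the migration itself): when a client sends WIN1250 or UTF-8 bytes and the database believes they are Latin1, the bytes are stored unchanged but read back as different characters.

```python
# A Hungarian "ő" (U+0151) exists in WIN1250 but not in Latin1.
client_text = "ő"
raw = client_text.encode("cp1250")   # the client sends the WIN1250 byte 0xF5
stored = raw.decode("latin-1")       # the database interprets it as Latin1
print(stored)                        # prints "õ" -- a "strange character"

# The same happens with UTF-8 clients: one "é" becomes two Latin1 characters.
print("é".encode("utf-8").decode("latin-1"))   # prints "Ã©"
```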
The database stored these characters as raw data, but if somebody had wanted to read them, they would have gotten a strange format. Here are a few examples (we had more than 7,700):
I discovered four different client character sets stored in the Latin1 database: ISO8859P1, WIN1250, WIN1252 and Unicode. I do not want to go into more detail about how to change a database character set; here are a few useful links:
I had to solve the strange-character problem, so here are my solution steps:
1. Export the schema from the source database, which uses the WE8ISO8859P1 character set
log in to the source host and shut down the database
start the database in restricted mode and export the schema
exp system/… owner=… file=… log=…
check the export log, stop the database and transfer the dump file to a temporary location
2. Move the export dump into a temporary database and create a new export dump using the WE8MSWIN1252 character set
create tablespaces and the schema user in the temporary db
create tablespace … datafile '/oracle/k2/….dbf' size 1000m autoextend on next 10m maxsize 3G;
create temporary tablespace SYS_TMP_1 tempfile '/oracle/k2/SYS_TMP_1.dbf' size 1000m autoextend on next 10m maxsize 3G;
create user AUDITM identified by AUDITM default tablespace AUDITMAN temporary tablespace SYS_TMP_1;
grant connect to AUDITM;
alter user AUDITM account unlock;
grant create table to AUDITM;
grant create any table to AUDITM;
grant unlimited tablespace to AUDITM;
install the character set conversion tool (provided by Oracle) in the temporary db
sqlplus / as sysdba
import the data from the source into the temporary db, using the same character set as the source db (WE8ISO8859P1)
imp SYSTEM/oracle file=/tmp/exp_data/data.dmp full=yes ignore=yes log=~/imp_k2.log
run the character set diagnostic tool to discover the characters that differ from WE8ISO8859P1, that is, characters that are not part of the standard Latin1 character set
csscan system/oracle FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=WE8ISO8859P1 LOG=WE8_TO_WE8 CAPTURE=Y ARRAY=1000000
check the lossy records to see which characters are responsible for losing data
select * from csmig.csmv$errors;
Create a buffer table that we can use during the transport process to help the conversion
create table GY_CONVERT_CHAR (
  owner_name  VARCHAR2(30) not null,
  table_name  VARCHAR2(30) not null,
  column_name VARCHAR2(30) not null,
  …
);
Create and execute a PL/SQL procedure that fills the buffer table
This procedure contains the main part of the process. Here I analyzed, step by step, the characters that do not fit the Latin1 code map, and stored their character codes in the buffer table, because these values would be changed during the character conversion. For more details, please check the procedure and its comments.
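The core decision of that analysis can be sketched roughly in Python (the function name is mine and the ordering is simplified; in reality, telling WIN1250 text from WIN1252 text needs per-character heuristics, since most byte values are valid in both):

```python
def guess_decoding(raw: bytes):
    """Try the client code pages found in the source data, most specific first.

    UTF-8 is tried first because arbitrary single-byte text almost never forms
    valid multi-byte UTF-8 sequences; cp1250/cp1252 accept nearly every byte,
    so this ordering is only a rough sketch of the real analysis.
    """
    for codepage in ("utf-8", "cp1250", "cp1252"):
        try:
            return codepage, raw.decode(codepage)
        except UnicodeDecodeError:
            continue
    # Fall back to the character set the database believed it was storing.
    return "latin-1", raw.decode("latin-1")

print(guess_decoding(b"\xc5\x91"))   # UTF-8 bytes of "ő"
print(guess_decoding(b"\xf5"))       # the WIN1250 byte of "ő"
```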
Run csscan again on the temporary db, with different parameters, to prepare the conversion to the WE8MSWIN1252 character set, since there is lossy data
I had to do this step in the middle of the whole process, because a direct Unicode conversion failed.
csscan system/oracle FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=WE8WIN_to_WE8WIN CAPTURE=Y ARRAY=1000000
Check the scanning result
select * from csmig.csmv$errors;
(seems to be fine; all character-type application data remains the same in the new character set)
Change the temp database character set to WE8MSWIN1252
select * from v$NLS_PARAMETERS;
( NLS_CHARACTERSET WE8MSWIN1252)
Run csscan on the temporary db to prepare the conversion to the AL32UTF8 character set.
(We won’t be using the CSALTER method to do the conversion but this step will identify any data that is labeled TRUNCATION.)
csscan system/oracle FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=AL32UTF8 LOG=WE8TOUTF8 CAPTURE=Y ARRAY=1000000
Truncation data is data that, when expanded from single-byte to multi-byte, would no longer fit in the column length, and would therefore be truncated on import.
— Truncation data is resolved by setting NLS_LENGTH_SEMANTICS = CHAR.
— In our case it is necessary because we have … rows where truncation occurs
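Why truncation happens, illustrated with Python (the sample word is my own): a string that fits a VARCHAR2(9 BYTE) column under WE8MSWIN1252 needs more bytes under AL32UTF8.

```python
word = "Könyvelés"                       # 9 characters (Hungarian for "accounting")
assert len(word) == 9
assert len(word.encode("cp1252")) == 9   # WE8MSWIN1252: one byte per character
assert len(word.encode("utf-8")) == 11   # AL32UTF8: "ö" and "é" take 2 bytes each
# A VARCHAR2(9 BYTE) column would truncate this on import; with
# NLS_LENGTH_SEMANTICS=CHAR the column is created as VARCHAR2(9 CHAR) instead.
```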
Make a full schema Data Pump export of the temporary database.
expdp system/oracle schemas=AUDITM dumpfile=exp_auditm.dmp logfile=exp_auditm.log
Export the buffer table
expdp system/oracle tables=system.GY_CONVERT_CHAR dumpfile=exp_gy_table.dmp logfile=exp_gy_table.log
Check the export logs, stop the database and transfer the new dump files to the target location
3. Move the new export dump into the target database, which uses the AL32UTF8 character set; then import, clean, correct and consolidate the characters
In the target database we have to set the NLS_LENGTH_SEMANTICS parameter, so that when the import process creates the new tables, the columns are long enough
sqlplus / as sysdba
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=CHAR scope=spfile;
— Bug 1488174: don't use ALTER SYSTEM SET NLS_LENGTH_SEMANTICS scope=both;
Create tablespace in target db where we import
create tablespace AUDITMAN datafile '/oracle/krumplee/AUDITMAN01.dbf' size 1000m autoextend on next 10m maxsize 3G;
Import the schema data to the target database
cp ../exp_auditm.dmp …
(or create an Oracle directory object)
impdp system/oracle schemas=AUDITM dumpfile=exp_auditm.dmp logfile=imp_auditm.log
Import the buffer table
cp …/exp_gy_table.dmp …
impdp system/oracle tables=system.GY_CONVERT_CHAR dumpfile=exp_gy_table.dmp logfile=imp_gy_table.log
Check the import logs
Execute the server-side part of filling the buffer table (it takes a very long time, so it ran as an Oracle job)
Here, after the data had been moved (including the buffer working table), I read every stored raw character and recorded it in the right (Unicode) format.
sys.dbms_job.submit(job => n, what => 'GY_COMPARE;');
select count(*) from GY_CONVERT_CHAR t where column_new_content is not null;
Write the cleaned data back to the right place
And finally, with a short script, I put the right version in the right place.
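The idea behind that short script, reduced to a few lines of Python (a sketch with a name of my own; the real fix ran as SQL updates driven by the buffer table): re-encode the mis-decoded text back to its raw bytes, then decode those bytes with the code page the client actually used.

```python
def fix_mojibake(stored: str, real_codepage: str) -> str:
    # The database decoded the client's raw bytes as Latin1; undo that,
    # then decode with the code page the client really used.
    return stored.encode("latin-1").decode(real_codepage)

print(fix_mojibake("õ", "cp1250"))    # prints "ő"
print(fix_mojibake("Ã©", "utf-8"))    # prints "é"
```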
4. Check the result
Just one example; it shows us everything:
– First, we had the unrecognized character format in the database
– Second, the data was converted into the target database and the strange characters were corrected
– And finally, I checked the result. Because some fields include the English version of the text, I used Google Translate to demonstrate what it looks like