Migration steps from Latin1 data to Unicode transport (based on a true story :)

This is a short story about an Oracle data migration: from garbled, unreadable characters to clean, readable Unicode text.
The story is simple: I had to move data from one Oracle database (character set Latin1, version 8i) to another Oracle database (character set Unicode, version 10g). It was not as easy as I first thought, because I noticed that the source database stored strange characters. Somehow the client applications had recorded characters that cannot be represented in the Latin1 character set.

The database stored these characters as raw data, but anybody who tried to read them got them back in a strange format. Here are a few examples (we had more than 7700 of them):

I discovered four different client character sets stored in the Latin1 database: ISO8859P1, WIN1250, WIN1252 and Unicode. I do not want to go into more detail about how to change a database character set; here are a few useful links:
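To see why the data looked garbled, it helps to consider how the same character becomes different bytes depending on the client code page. This is a minimal Python sketch of the effect, not part of the original migration; the Hungarian letter "ő" is my own example:

```python
# A sketch of why the "strange characters" appear: the same letter,
# written by clients using different code pages, decodes to different
# characters when the database blindly treats the bytes as Latin1.

hungarian = "ő"  # U+0151, a character that does not exist in Latin1

win1250_bytes = hungarian.encode("cp1250")   # b'\xf5'
utf8_bytes = hungarian.encode("utf-8")       # b'\xc5\x91'

# Reading those raw bytes back as Latin1 (what the 8i database did):
print(win1250_bytes.decode("latin-1"))  # 'õ'  -- the wrong letter
print(utf8_bytes.decode("latin-1"))     # 'Å\x91' -- two junk characters
```

The bytes themselves are untouched; only the interpretation is wrong, which is why the data could still be recovered later.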

I had to solve the strange character problem, so here are the steps of my solution:

1. Export the schema from the source database, which uses the WE8ISO8859P1 character set
Log in to the source host and shut down the database

shutdown immediate;
Start the database in restricted mode and export the schema
startup restrict;
exp system/… owner=… file=… log=…

Check the export log, stop the database and transfer the dump file to a temporary location

2. Load the export dump into a temporary database and create a new export dump that uses the WE8MSWIN1252 character set
Create the tablespaces and the schema user in the temporary DB

create tablespace … datafile '/oracle/k2/….dbf' size 1000m autoextend on next 10m maxsize 3G;
create temporary tablespace SYS_TMP_1 tempfile '/oracle/k2/SYS_TMP_1.dbf' size 1000m autoextend on next 10m maxsize 3G;
create user AUDITM identified by AUDITM default tablespace AUDITMAN temporary tablespace SYS_TMP_1;
grant connect to AUDITM;
alter user AUDITM account unlock;
grant create table to AUDITM;
grant create any table to AUDITM;
grant unlimited tablespace to AUDITM;

Install the character set conversion tool (provided by Oracle) in the temporary DB

sqlplus / as sysdba
spool csminst.log
@?/rdbms/admin/csminst.sql
Import the data from the source into the temporary DB, which uses the same character set as the source DB (WE8ISO8859P1)
imp SYSTEM/oracle file=/tmp/exp_data/data.dmp full=yes ignore=yes log=~/imp_k2.log

Run the character set diagnostic tool to discover the characters that differ from WE8ISO8859P1, i.e. the characters that are not part of the standard Latin1 character set. Scanning with identical FROMCHAR and TOCHAR values makes csscan report such data as lossy.

csscan system/oracle FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=WE8ISO8859P1 LOG=WE8_TO_WE8 CAPTURE=Y ARRAY=1000000

Check the lossy records to see which characters are responsible for the data loss

select * from csmig.csmv$errors;

Create a buffer table that we can use during the transport process to help the conversion

create table GY_CONVERT_CHAR (
  owner_name VARCHAR2(30) not null,
  table_name VARCHAR2(30) not null,
  column_name VARCHAR2(30) not null,
  data_rowid VARCHAR2(100),
  column_type VARCHAR2(9),
  column_content CLOB,
  column_dump CLOB,
  column_new_content CLOB,
  column_new_dump_utf8 CLOB,
  column_new_dump_win1250 CLOB,
  column_new_dump_win1252 CLOB,
  rec_key VARCHAR2(300),
  rec_key_value VARCHAR2(300),
  rec_type VARCHAR2(300),
  rec_description VARCHAR2(4000)
);

Create and execute a PL/SQL procedure that fills the buffer table
This procedure contains the main part of the process. Here I analyzed, step by step, the characters that do not fit the Latin1 code map, and I also stored their character codes in the table, because the character conversion would change these values. For more details please check the procedure body and its comments.


execute gy_fill_error_table;
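The source of gy_fill_error_table is not included in the post, so the following is only a hypothetical sketch of the kind of classification it performs on each raw value, written in Python rather than PL/SQL; the guess_charset function is my own assumption, not the author's code:

```python
# Hypothetical sketch of the detection step performed by
# gy_fill_error_table: given the raw bytes of a column value,
# guess which client character set most likely produced them.
# The real procedure works in PL/SQL with DUMP(); this only
# illustrates the logic.

def guess_charset(raw: bytes) -> str:
    """Return a best guess for the code page that wrote these bytes."""
    if raw.isascii():
        return "ASCII"          # safe in every charset involved
    try:
        raw.decode("utf-8")
        return "UTF-8"          # valid multi-byte UTF-8 sequences
    except UnicodeDecodeError:
        pass
    # Bytes 0x80-0x9F are control codes in ISO-8859-1 but printable
    # letters and punctuation in the Windows code pages, so their
    # presence suggests a Windows client.
    if any(0x80 <= b <= 0x9F for b in raw):
        return "WIN1250/WIN1252"
    return "ISO8859P1"

print(guess_charset("ő".encode("utf-8")))    # UTF-8
print(guess_charset("ő".encode("cp1250")))   # ISO8859P1 (0xF5 is ambiguous)
```

As the last line shows, some single bytes are ambiguous between the Latin code pages, which is why the real procedure also records the character codes and surrounding context for manual review.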

Run csscan again on the temporary DB with different parameters (the same command as above, but with TOCHAR=WE8MSWIN1252), to prepare the conversion to the WE8MSWIN1252 character set, since there is lossy data.
I had to add this step in the middle of the whole process, because the direct Unicode conversion failed.


Check the scanning result

select * from csmig.csmv$errors;

(seems to be fine; all character-type application data remains the same in the new character set)

Change the temporary database character set to WE8MSWIN1252

shutdown immediate
startup restrict
select * from v$NLS_PARAMETERS;


Run csscan on the temporary DB to prepare the conversion to the AL32UTF8 character set.
(We won't be using the CSALTER method to do the conversion, but this step will identify any data that is labeled TRUNCATION.)


Truncation data is data that, when expanded from single-byte to multi-byte, would no longer fit in the column length, and would therefore be truncated on import.
– Truncation data is resolved by setting NLS_LENGTH_SEMANTICS = CHAR.
– In our case it is necessary because truncation occurs for … of our data.
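A quick sketch of why truncation happens, with a Hungarian word of my own choosing as the example (not from the post): accented Latin1 characters grow from one byte to two in UTF-8, so byte-sized columns can overflow.

```python
# Why single-byte data can overflow byte-sized columns after
# conversion to UTF-8: accented characters double in size.

word = "árvíz"  # 5 characters, all representable in Latin1

latin1_len = len(word.encode("latin-1"))  # 5 bytes
utf8_len = len(word.encode("utf-8"))      # 7 bytes: á and í take 2 bytes each

print(latin1_len, utf8_len)  # 5 7
# A VARCHAR2(5 BYTE) column that held this value in the Latin1 database
# is too small for the UTF-8 version. NLS_LENGTH_SEMANTICS=CHAR sizes
# columns in characters instead of bytes and avoids the truncation.
```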

Make a full schema Data Pump export of the temporary database.

expdp system/oracle schemas=AUDITM dumpfile=exp_auditm.dmp logfile=exp_auditm.log

Export the buffer table

expdp system/oracle tables=system.GY_CONVERT_CHAR dumpfile=exp_gy_table.dmp logfile=exp_gy_table.log

Check the export logs, stop the database and transfer the new dump files to the target location

3. Load the new export dump into the target database, which uses the AL32UTF8 character set; import, clean, correct and consolidate the characters
In the target database we have to set the NLS_LENGTH_SEMANTICS parameter, so that when the import process creates the new tables, the columns are long enough

sqlplus / as sysdba
-- Bug 1488174: don't use ALTER SYSTEM SET NLS_LENGTH_SEMANTICS scope=both;
shutdown immediate

Create a tablespace in the target DB to import into

create tablespace AUDITMAN datafile '/oracle/krumplee/AUDITMAN01.dbf' size 1000m autoextend on next 10m maxsize 3G;

Import the schema data to the target database

cp ../exp_auditm.dmp …
(or create an Oracle directory object)
impdp system/oracle schemas=AUDITM dumpfile=exp_auditm.dmp logfile=imp_auditm.log

Import the buffer table

cp …/exp_gy_table.dmp …
impdp system/oracle tables=system.GY_CONVERT_CHAR dumpfile=exp_gy_table.dmp logfile=imp_gy_table.log

Check the import logs

Execute the server-side part of filling the buffer table (it takes a very long time, so I ran it as an Oracle job)

Here, after the data was moved (including the buffer working table), I read every stored raw character code and recorded it in the correct (Unicode) format.


var n number
exec sys.dbms_job.submit(job => :n, what => 'GY_COMPARE;')
commit;

for monitoring:

select count(*) from GY_CONVERT_CHAR t where column_new_content is not null;
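The source of the GY_COMPARE job is not shown in the post, but the repair it performs boils down to re-interpreting the stored bytes with the correct code page. A hypothetical Python sketch of that idea (the repair function is my assumption, not the author's code):

```python
# Hypothetical sketch of the repair performed by the GY_COMPARE job:
# the value was interpreted as Latin1 when stored, so encoding it back
# to Latin1 recovers the original client bytes, which are then decoded
# with the code page that actually wrote them.

def repair(mojibake: str, real_charset: str) -> str:
    """Undo a wrong Latin1 interpretation of client bytes."""
    original_bytes = mojibake.encode("latin-1")
    return original_bytes.decode(real_charset)

garbled = "ő".encode("utf-8").decode("latin-1")  # 'Å\x91', as seen in the DB
print(repair(garbled, "utf-8"))   # ő
print(repair("õ", "cp1250"))      # ő  ('õ' was really a WIN1250 ő byte)
```

This only works because the Latin1 round trip is lossless for all 256 byte values, so the raw client bytes survived the original import intact.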

Update the cleaned data into the right place
And finally, with a short script, I put the corrected version in the right place.


4. Check the result
Just one example, which shows us everything:
– First, the unrecognized character format in the source database

– Second, the data converted into the target database, with the strange characters corrected

– And finally, the checked result. Because some of the fields contain the English version of the text, I used Google Translate to demonstrate how it looks


Author: gyzszabo

IT Architect ...

One thought on “Migration steps from Latin1 data to Unicode transport (based on a true story :)”

  1. Just a small comment: it is recommended to execute csscan with SYSDBA privileges, in the following way:
    [o112@vic ~]$ csscan FULL=Y TONCHAR=UTF8 LOG=check CAPTURE=Y ARRAY=1000000 PROCESS=2

    Character Set Scanner v2.2 : Release – Production on Mon Nov 28 15:26:27 2011

    Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

    Username: sys as sysdba   <-------- !!!!!!
    Password: …..

    Connected to:
    Oracle Database 11g Enterprise Edition Release – 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
