combining marks are not in canonical order

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sid19
Participant
Posts: 64
Joined: Mon Jun 18, 2007 12:17 am
Location: kolkata

combining marks are not in canonical order

Post by sid19 »

Hi,

We have a simple job that extracts data from Oracle and loads it into Netezza through the Netezza connector. The Oracle source contains Arabic data (with combining characters). The job is aborting with the following error message:

---------------------------------
Reason: [SQLCODE=HY008][Native=51] Operation canceled; [SQLCODE=HY000][Native=46] ERROR: External Table : count of bad input rows reached maxerrors limit (CC_NZCommon::checkThreadStatusThrow, file CC_NZCommon.cpp, line 424)
----------------------------------


we have looked into NZLOG file and found the following


Found bad records

bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic, "text consumed"[last char examined]
----------------------------------------------------------------------------------------------------------------------------
1: 3(50) [2, NVARCHAR(4000)] not NFC - combining marks are not in canonical order



When we tried to insert the same record directly into Netezza, the insert also failed. But it loaded successfully after applying nzconvert -f utf8 -nfc to the data.

How can we implement the same conversion in DataStage so that we can load the data?
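[Editor's note: for readers unfamiliar with the error, the following Python sketch only illustrates what nzconvert -f utf8 -nfc does to the data; the sample Arabic string is a made-up example, not taken from the poster's table. In DataStage the equivalent normalization would have to happen in the job itself, as discussed below.]

```python
import unicodedata

# ARABIC LETTER BEH (U+0628) followed by SHADDA (U+0651, combining class 33)
# and then FATHA (U+064E, combining class 30). Canonical (NFC) order requires
# the lower combining class first, so this sequence is "not NFC" and Netezza
# rejects it as a bad input row.
raw = "\u0628\u0651\u064E"

print(unicodedata.is_normalized("NFC", raw))    # False (needs Python 3.8+)

# NFC normalization reorders the combining marks into canonical order --
# this is the same transformation nzconvert -f utf8 -nfc applies.
fixed = unicodedata.normalize("NFC", raw)

print(unicodedata.is_normalized("NFC", fixed))  # True
print(fixed == "\u0628\u064E\u0651")            # True: FATHA now precedes SHADDA
```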

Thanks
Sid
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Looking up nzconvert says that it is used "to convert between any two encodings, between these encodings and UTF-8, and from UTF-32, UTF-16, or UTF-8 to NFC, for loading with the nzload command or external tables". So I would imagine the solution in DataStage would be to use the proper character set so the conversion can happen automatically. What NLS settings are you using in the job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sid19
Participant
Posts: 64
Joined: Mon Jun 18, 2007 12:17 am
Location: kolkata

Post by sid19 »

Hi Craig,

Our source (Oracle) and target (Netezza) are both UTF-8, so we are using the project default UTF-8 NLS setting in our DataStage job.

Thanks
Sid
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Start by reading this W3 article on canonical normalization issues. It may give some insight into where in your data the problem lies.
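[Editor's note: to locate which rows carry out-of-order combining marks before loading, a quick pre-load scan along these lines can help. This is a hedged Python sketch: the row layout, field index, and sample data are all assumptions for illustration, not part of the poster's job.]

```python
import unicodedata

def find_non_nfc(rows, field_index):
    """Yield (row_number, value) for field values that are not in NFC form."""
    for n, row in enumerate(rows, start=1):
        value = row[field_index]
        # A value already in NFC is unchanged by normalization.
        if unicodedata.normalize("NFC", value) != value:
            yield n, value

# Hypothetical sample: the last row has SHADDA (U+0651) before FATHA (U+064E),
# which is not canonical order, matching the "not NFC" diagnostic in the NZLOG.
rows = [("1", "ok"), ("2", "ok"), ("3", "\u0628\u0651\u064E")]
for n, v in find_non_nfc(rows, 1):
    print("row", n, [hex(ord(c)) for c in v])
```

Reporting the code points (rather than the rendered glyphs) makes the mark ordering visible, since NFC-equivalent strings usually display identically.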
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply