Teradata Multi-load fails with character error

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Teradata Multi-load fails with character error

Post by kurics40 »

Hi everybody,

I have multiload error.

I have UTF8 input -> Teradata multiload UTF8 default charset is ok BUT it fails with 6706 or 6705 error which says character error . As I know it is unicode error character such as U-FFFD or in ASCII 0x1a. How can I clear them from the txt?

I haven't get used to datastage well. Unfortunately I can not investigate from Teradata side with the translate_chk function. I don't have rights on the client side.

In this case I need Basic language solution.
Can somebody give me a good idea?
\
*Note - Subect Title Changed to be more descriptive - Andy*
Last edited by kurics40 on Thu Jan 21, 2010 6:40 pm, edited 1 time in total.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you really think that's your solution, use Convert() along with the decimal value of the character to strip it from the string:

Code: Select all

Convert(CHAR(26),"",YourField)
:idea: Next time please do a better job of choosing a thread subject. We know your question would be about "DataStage" so make it pertinent to your actual issue. This one could have been "Multiload character error" or something along those lines.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Post by kurics40 »

Thank you so much.

I guess I found out the source of the problem. U - 0000 FFFD = ef bf bd

This is the error character of the unicode and Teradata doesn't accept it.
How can I eliminate them with datatstage?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted above.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You don't eliminate them - they are your customer's data!

You figure out how to handle them, typically by specifying the correct character mappings in both DataStage and Teradata.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can only "mend" what you call "bad" characters if you know what they are and what they should be. You can, of course, eliminate them (with permission).

Oconv() supports NLSmapname as its second argument - perhaps you could investiate this?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Post by kurics40 »

Allright. Sorry for the comment. I see I did a mistake.

Okay lets investigate the issue:

There is one typical problematic row.

"357 243 267" =  makes the 99% the problem

cat 203row
|Suspect_GSFAISLPLFI2|FIC1303835|FIA1176256|SYTO INTERNATIONAL BV|SYTO INTERNATIONAL BV

od -c 203row - ASCII way
0000000 357 243 267 | S u s p e c t _ G S F A
0000020 I S L P L F I 2 | F I C 1 3 0 3
0000040 8 3 5 | F I A 1 1 7 6 2 5 6 | S
0000060 Y T O I N T E R N A T I O N A
0000100 L B V | S Y T O I N T E R N
0000120 A T I O N A L B V \n
0000133


od -x 203row
0000000 efa3 b77c 5375 7370 6563 745f 4753 4641
0000020 4953 4c50 4c46 4932 7c46 4943 3133 3033
0000040 3833 357c 4649 4131 3137 3632 3536 7c53
0000060 5954 4f20 494e 5445 524e 4154 494f 4e41
0000100 4c20 4256 7c53 5954 4f20 494e 5445 524e
0000120 4154 494f 4e41 4c20 4256 0a00
0000133




Teradata says:
6705 An illegally formed character string was
encountered during translation.
..
Unicode Any single-byte character (SBC).

It would be good if I can at least transform the unicode character till the | sign ASCII or LATIN if it is single-byte (damaged) Unicode.
Last edited by kurics40 on Thu Jan 21, 2010 5:07 pm, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I don't know the answer, because it will depend on how your data are actually encoded. And there's no way I can know this, so I suggest a more general approach.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Post by kurics40 »

Might be the answer is the recode UTF8 to UTF16 and after to UTF8 ...

- I know it can be a temporaly solution to replace those characters, but what if I will receive in the future different type of strange characters? -
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

possible solution

Post by kurics40 »

Hi all,

Might be I found a solution:

http://www.teradata.com/DownloadCenter/ ... 123-1.aspx

The other can be to write Inmod Multiload.
If somebody knows a DataStage solution for this it would be great.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Did you ask your official support provider?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Post by kurics40 »

chulett wrote:Did you ask your official support provider?
I don't understand what you mean. This is the offical Teradata state in this topic.

Btw
I am looking for DataStage solution.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Ask whomever is officially providing you with DataStage support.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Post by kurics40 »

Aha! I see. No I haven't asked the offical Datastage support. Good idea! :)
That will be the next step.
kurics40
Premium Member
Premium Member
Posts: 61
Joined: Wed Nov 18, 2009 10:01 am

Workaround

Post by kurics40 »

As some of my colleague asked more workarounds:

You can create error table in teradata of any table:

Syntax
CREATE ERROR TABLE FOR tbl_name;

What I recognized that many quick and dirty developers try to load the data in ansi mode because it gives no error. If the data is in UTF8 it is highly not recommended to load in ANSI. UTF8 can represent a character in 1 or 2 or 3 or 4 bytes as a multibyte character. If ANSI latin mode mapps each byte as a valid character than it screws up the data in irreversible way in the target, unless the UTF8 data content matches in a special scenario with ANSI latin characterset.
Post Reply