Junk characters when moving data from VARCHAR to CHAR

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
deesh
Participant
Posts: 193
Joined: Mon Oct 08, 2007 2:57 am

Junk characters when moving data from VARCHAR to CHAR

Post by deesh »

Hi,

My source having CHAR datatype and target has VARCHAR datatype. While moving data I am getting junk character into target.

Same thing s happening while moving data from VARCHAR to CHAR.

One of the options is to trim the column through out the job, but I need a better option.

Please suggest how to remove the junk char.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, let's start with trying to explain what exactly your "junk" characters are. The vast majority of the time, they're simply perfectly good but misunderstood and mistranslated data, i.e. a characterset conversion issue. Can you post some before and after examples, please, so we can get you some proper help? I'd also like to know what exactly your source and target are and if you have any idea what characterset they are.
-craig

"You can never have too many knives" -- Logan Nine Fingers
deesh
Participant
Posts: 193
Joined: Mon Oct 08, 2007 2:57 am

Post by deesh »

It is DB2 data, let me show an example

VARCHAR(5) data "ABCD" moved to CHAR(5) now data "ABCD " in the 5th character junk will happen instead of space.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to tell us what "junk" means, what the value of this unwanted character is. There must be a DB2 function that will give you that, on Oracle is it called DUMP. Also, what is your APT_STRING_PADCHAR environment variable set to? Left at its default, it can look like "junk" to some folks / systems.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Because you have not identified what kind of "junk" you are getting - my best guess is that your project is still using the default value 0x0 (Hex 0) for padding CHAR fields. CHAR fields are always automatically padded to their max length. Your job is probably using the default value from $APT_STRING_PADCHAR in the project environment variable settings.

Hex 0 is a somewhat problematic default. The 0x0 character is invisible in most cases unless you use "cat -tv", "od" or some other hex viewing mechanism.

The quick fix is to put $APT_STRING_PADCHAR into your job and set it to a space.

Note - if other jobs are also using that 0x0 default pad character, you might have issues with data from those jobs failing joins with data from your jobs.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Post Reply