Job Control Code Characterset Issue

premkishore1983
Participant
Posts: 13
Joined: Mon Oct 08, 2007 4:41 am
Location: India

Post by premkishore1983 »

Hi All,

I'm currently migrating DataStage code from DS 9.1 (on AIX) to DS 11.3 (on Red Hat 4.4.7-16).

The generic code below scrubs a fixed range of character positions in an incoming EBCDIC file by overwriting them with spaces.

It works fine in DataStage 9.1 on AIX, and the downstream system processes the files without any issues.

But when I run the same code against the same input EBCDIC file in DataStage 11.3 on Linux, the output file is about 2.5 times the size of the input, and downstream processing of the output file generated by this code is impacted.

I also tried commenting out the conversion code and redirecting the input data straight to the output file, but the output file is still about 2.5 times larger.

Any help or suggestions in this regard would be really helpful.

Code: Select all

FILE_NAME = SOURCE_DIR:"/":SOURCE_NAME:".new"
GOSUB INIT
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".src" TO F.SRC ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.src',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".new" TO F.NEW ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
CNT = 1
EOF = 0
LOOP UNTIL EOF DO
  READBLK RECORD FROM F.SRC,LENGTH THEN
    RECORD[76,6] = STR(CHAR(32),6);                               
    WRITEBLK RECORD ON F.NEW ELSE
      Call DSLogInfo('Unable to write to ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
      ErrorCode = 1
      GOTO 10000
    END
    CNT = CNT + 1
    IF CNT/100000 = INT(CNT/100000) THEN
      Call DSLogInfo(CNT:' records processed...',"Output")
    END
  END ELSE 
    EOF = 1
  END
  *
REPEAT
*
      Call DSLogInfo(CNT:' records processed...',"Output")

CLOSESEQ F.SRC
CLOSESEQ F.NEW
*
GOTO 10000
*
INIT: * - - Initialize Unix Flat File - - *
*
  Command = "rm ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "touch ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "chmod 660 ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
RETURN
10000
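
For comparison, here is a minimal Python sketch of what the loop above is meant to do when the file is treated as raw bytes: read fixed-length blocks, overwrite six bytes starting at position 76 with spaces, and write each block back out. The function name `scrub_records` and the record length in the usage lines are assumptions (the posted code does not show how LENGTH is set). The fill byte is 0x20, matching CHAR(32) in the BASIC code.

```python
import io

def scrub_records(src, dst, record_length, start=76, width=6, fill=b" "):
    """Copy fixed-length records from src to dst, blanking a byte range.

    start is 1-based, mirroring RECORD[76,6] in the BASIC code.
    Returns the number of records processed.
    """
    count = 0
    while True:
        record = src.read(record_length)
        if not record:
            break  # end of file
        buf = bytearray(record)
        lo = start - 1
        hi = min(lo + width, len(buf))  # never grow a short final record
        if lo < hi:
            buf[lo:hi] = fill * (hi - lo)
        dst.write(bytes(buf))
        count += 1
    return count


# Usage with in-memory streams; record_length=100 is a made-up value.
source = io.BytesIO(bytes([0xF1]) * 250)   # two full records plus a short one
target = io.BytesIO()
records = scrub_records(source, target, record_length=100)
output = target.getvalue()
```

In byte mode the output is always exactly as large as the input, which makes this a handy baseline when checking what DataStage writes.
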
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Without looking at the code itself, the first thing that comes to mind when someone says "the output file size is getting increased by 2.5 times" is that there is a codepage / characterset issue. Can you confirm / deny that?
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by premkishore1983 »

Thanks for the reply Chuck.

What do you mean by the codepage/characterset issue?
Can you please elaborate?

NLS Mapping was set to Project default (ISO8859-1) in both the environments.

Post by chulett »

Craig, actually... not Chuck. :wink:

I'm wondering what character encoding the file is being created with and if, perhaps, you've gone from a single-byte encoding to a multi-byte one. ISO 8859-1 is an 8-bit single-byte coded graphic character set, so I'm guessing that's not the issue.
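
A quick way to reason about a size increase like this is to look at how many bytes UTF-8 needs per character. The Python sketch below only illustrates the arithmetic, not DataStage internals, and the sample record is made up. It shows that transcoding ISO 8859-1 to UTF-8 can at most double a file, so growth beyond 2x would point to something else, for example bytes being replaced by the 3-byte UTF-8 replacement character U+FFFD.

```python
# Made-up sample: ten EBCDIC digits (0xF0-0xF9) followed by ten EBCDIC spaces (0x40).
sample = bytes(range(0xF0, 0xFA)) + bytes([0x40]) * 10

# Case 1: bytes interpreted as ISO 8859-1 and written out as UTF-8.
# Every byte >= 0x80 becomes a 2-byte sequence; bytes < 0x80 stay 1 byte.
as_utf8 = sample.decode("iso-8859-1").encode("utf-8")

# Case 2: every byte >= 0x80 is replaced by U+FFFD, which is 3 bytes in UTF-8.
replacement = "\ufffd".encode("utf-8")          # b'\xef\xbf\xbd'
as_replaced = b"".join(replacement if b >= 0x80 else bytes([b]) for b in sample)

growth_transcoded = len(as_utf8) / len(sample)      # 30 / 20
growth_replaced = len(as_replaced) / len(sample)    # 40 / 20

# Upper bounds when every single byte is >= 0x80.
max_transcoded = len(bytes([0xFF]).decode("iso-8859-1").encode("utf-8"))
max_replaced = len(replacement)
```
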

Have you compared the two files, either visually or through your tool of choice? Can you determine where the increase in file size is coming from? If not, it might help to post a couple of records from each version, wrapped in [code] tags so we can see them.

Post by chulett »

How about in a hex dump format? od -h should shed more light on this, I would think.
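
One thing to keep in mind when reading `od -h` output: it prints 2-byte words in the byte order of the machine it runs on, so on a little-endian host each pair of bytes appears swapped. The Python sketch below mimics both views for a few sample bytes; the helper names are made up for illustration. A byte-wise dump such as `od -t x1` avoids the ambiguity.

```python
import struct

def dump_bytes(data):
    """Hex dump one byte at a time, similar in spirit to `od -t x1`."""
    return " ".join(f"{b:02x}" for b in data)

def dump_words(data, little_endian=True):
    """Hex dump as 16-bit words, similar in spirit to `od -h`.

    Assumes an even number of bytes to keep the sketch short.
    """
    fmt = ("<" if little_endian else ">") + "H" * (len(data) // 2)
    return " ".join(f"{w:04x}" for w in struct.unpack(fmt, data))

sample = bytes([0xF0, 0xF1, 0xF1, 0xF6])   # EBCDIC digits "0116"

by_byte = dump_bytes(sample)                        # 'f0 f1 f1 f6'
words_le = dump_words(sample, little_endian=True)   # 'f1f0 f6f1'
words_be = dump_words(sample, little_endian=False)  # 'f0f1 f1f6'
```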

Post by premkishore1983 »

Input :

Code: Select all

0000000 f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0
0000020 f0f8 f9f9 f1f0 f4f5 f6f6 f1f4 f0f0 f0f5
0000040 f1f1 d9f6 c5c4 d6d4 e3d5 c9c5 d5c1 c440
0000060 4040 f2f0 f0f4 f0f0 02f1 8120 0c01 5001
0000100 3053 c33c f3f2 d2f4 40c3 2040 2020 2020
0000120

AIX Output :

Code: Select all

0000000 f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0
0000020 f0f8 f9f9 f1f0 f4f5 f6f6 f1f4 f0f0 f0f5
0000040 f1f1 d9f6 c5c4 d6d4 e3d5 c9c5 d5c1 c440
0000060 4040 f2f0 f0f4 f0f0 02f1 8120 0c01 5001
0000100 3053 c33c f3f2 d2f4 40c3 4040 4040 4040
0000120

LINUX Output :

Code: Select all

0000000 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000020 bdbf bfef 40bd 4040 bfef efbd bdbf bfef
0000040 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000060 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000100 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000120 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000140 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000160 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000200 efbd bdbf ef40 bdbf 4040 bfef efbd bdbf
0000220 bfef efbd bdbf bfef efbd bdbf bfef 02bd
0000240 ef20 bdbf 0c01 5001 3053 ef3c bdbf bfef
0000260 efbd bdbf bfef efbd bdbf bfef 40bd 2040
0000300 2020 2020
0000304
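
Reading the dumps above byte by byte suggests a pattern. The Python sketch below assumes the `od -h` dumps were taken on the little-endian Linux host (so each 16-bit word is shown byte-swapped) and that the data is EBCDIC code page 037; both are assumptions, as are the helper names. Under those assumptions, the Linux output is exactly the input with every byte of 0x80 or above replaced by `EF BF BD`, the UTF-8 encoding of the replacement character U+FFFD, which accounts for the growth from 80 to 196 bytes.

```python
def parse_od_h(dump):
    """Turn `od -h` text into bytes, assuming little-endian 16-bit words."""
    out = bytearray()
    for line in dump.strip().splitlines():
        for word in line.split()[1:]:          # skip the offset column
            value = int(word, 16)
            out += bytes([value & 0xFF, value >> 8])
    return bytes(out)

INPUT_DUMP = """
0000000 f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0
0000020 f0f8 f9f9 f1f0 f4f5 f6f6 f1f4 f0f0 f0f5
0000040 f1f1 d9f6 c5c4 d6d4 e3d5 c9c5 d5c1 c440
0000060 4040 f2f0 f0f4 f0f0 02f1 8120 0c01 5001
0000100 3053 c33c f3f2 d2f4 40c3 2040 2020 2020
"""

LINUX_DUMP = """
0000000 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000020 bdbf bfef 40bd 4040 bfef efbd bdbf bfef
0000040 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000060 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000100 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000120 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000140 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000160 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000200 efbd bdbf ef40 bdbf 4040 bfef efbd bdbf
0000220 bfef efbd bdbf bfef efbd bdbf bfef 02bd
0000240 ef20 bdbf 0c01 5001 3053 ef3c bdbf bfef
0000260 efbd bdbf bfef efbd bdbf bfef 40bd 2040
0000300 2020 2020
"""

source = parse_od_h(INPUT_DUMP)
linux_out = parse_od_h(LINUX_DUMP)

# Predict the output if every byte >= 0x80 were replaced by U+FFFD in UTF-8.
REPLACEMENT = "\ufffd".encode("utf-8")          # b'\xef\xbf\xbd'
predicted = b"".join(REPLACEMENT if b >= 0x80 else bytes([b]) for b in source)

matches = predicted == linux_out                # True for the dumps above
sizes = (len(source), len(linux_out))           # (80, 196)
first_field = source[:16].decode("cp037")       # first 16 bytes as EBCDIC text
```

If that reading is right, the scrubbing logic itself is not at fault: the bytes appear to be treated as text and re-encoded as UTF-8 somewhere between read and write, so the character map or locale in effect for sequential file I/O in the 11.3 environment would be the first thing to check.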

Post by premkishore1983 »

Hi All,

Good Morning,
Any suggestions to resolve this issue would be really helpful.

Post by chulett »

Well... still obviously some kind of codeset issue, though I couldn't tell you what you got there on the LINUX side. For both input and AIX, this:

Code: Select all

"f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0" = "1061TA M  00500"

This, the alleged equivalent on the LINUX side:

Code: Select all

"bfef efbd bdbf bfef efbd bdbf bfef efbd"

I have no idea what that is. Sorry. Hopefully someone else can be more helpful.

I'm wondering if it is a Big Endian (AIX) vs. Little Endian (LINUX I believe) issue? :?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I think it might be a byte order issue, as Craig suspects.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by premkishore1983 »

Thanks for the reply, Craig & Ray.

Can you please let me know where I could check for the codeset / byte order issue? I mean, does this have to be checked at the DataStage level or the OS level?

Post by ray.wurlod »

If the source is a file, there may be a Byte Order Mark at the beginning of the file. If not, there is no easy way to check the byte order, but you could try converting the file using the dos2unix or unix2dos command, as appropriate, and determining whether the result is more sensible.
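
Checking for a Byte Order Mark only takes a look at the first few bytes of the file. A minimal Python sketch, using the BOM constants from the standard `codecs` module; the function name `detect_bom` is made up for illustration. Note that a BOM only applies to Unicode text encodings, so a raw EBCDIC file would not be expected to carry one.

```python
import codecs

# Longer marks first: the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def detect_bom(head):
    """Return the encoding name implied by a leading BOM, or None."""
    for mark, name in BOMS:
        if head.startswith(mark):
            return name
    return None

with_bom = detect_bom(codecs.BOM_UTF16_BE + b"\x00A")
ebcdic_like = detect_bom(bytes([0xF0, 0xF1, 0xF1, 0xF6]))
```

For a real file, passing the first four bytes is enough, for example the result of `open(path, "rb").read(4)` with `path` pointing at the file in question.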

Post by chulett »

Ah... that's what BOM stands for. Duh. :wink:

There are many resources available online regarding the conversion from one 'endian' to another, for example this one. I'm wondering how the file gets 'downstream' from the Linux system; perhaps this transfer mechanism (something like Connect:Direct, I imagine) could handle the conversion. Do you have any friendly System Administrator types who might be able to help with this?

Post by premkishore1983 »

Thanks, Craig, for sharing the information on how little endian and big endian work.

We get the input file from the mainframe via Connect:Direct to our AIX server. Since the connection between the mainframe and Linux is not established yet, we scp'ed the input file from AIX to Linux to test our jobs, and that is how we identified this issue.

Post by premkishore1983 »

I'm trying to convert the byte order to big endian by using FORMAT.CONV, as shown below:

Code: Select all

FILE_NAME = SOURCE_DIR:"/":SOURCE_NAME:".new"
FORMAT.CONV -u FILE_NAME

GOSUB INIT
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".src" TO F.SRC ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.src',"Output")
  ErrorCode = 1
  GOTO 10000
END
*

OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".new" TO F.NEW ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
CNT = 1
EOF = 0
LOOP UNTIL EOF DO
  READBLK RECORD FROM F.SRC,LENGTH THEN
    RECORD[76,6] = STR(CHAR(32),6);                             
    WRITEBLK RECORD ON F.NEW ELSE
      Call DSLogInfo('Unable to write to ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
      ErrorCode = 1
      GOTO 10000
    END
    CNT = CNT + 1
    IF CNT/100000 = INT(CNT/100000) THEN
      Call DSLogInfo(CNT:' records processed...',"Output")
    END
  END ELSE 
    EOF = 1
  END
  *
REPEAT
*
      Call DSLogInfo(CNT:' records processed...',"Output")

FORMAT.CONV -u FILE_NAME

CLOSESEQ F.SRC
CLOSESEQ F.NEW
*
GOTO 10000
*
INIT: * - - Initialize Unix Flat File - - *
*
  Command = "rm ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "touch ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "chmod 660 ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
RETURN
10000

But I'm getting the following error:


Warning
0018 FORMAT.CONV -u FILE_NAME
^
Variable Name (UNDEFINED) unexpected, Was expecting: Assignment Operator
0057 FORMAT.CONV -u FILE_NAME
^
Variable Name (UNDEFINED) unexpected, Was expecting: Assignment Operator

2 Errors detected, No Object Code Produced

Am I missing something here? Any suggestions would be helpful.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

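FORMAT.CONV is a command rather than a BASIC statement, which is why the compiler reads it as a variable name and expects an assignment. You would need to issue the line like "EXECUTE 'FORMAT.CONV -u ':FILE_NAME CAPTURING ScreenIO RETURNING ErrorCode" in order to do that.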