Job Control Code Characterset Issue

Posted: Thu Jun 30, 2016 11:51 am
by premkishore1983
Hi All,

I'm currently migrating DataStage code from DS 9.1 (on AIX) to DS 11.3 (on Red Hat 4.4.7-16).

The general-purpose code below scrubs specific character positions of an incoming EBCDIC file by converting them to spaces.

It works fine in DataStage 9.1 on AIX, and the downstream process handles the files without any issues.

But when I run the same code against the same input EBCDIC file in DataStage 11.3 on Linux, the output file comes out about 2.5 times larger, and downstream processing of the output file generated by this code is affected.

I also tried commenting out the conversion code and writing the input data directly to the output file, but the output file is still 2.5 times larger.

Any help or suggestions in this regard would be really appreciated.

Code: Select all

FILE_NAME = SOURCE_DIR:"/":SOURCE_NAME:".new"
GOSUB INIT
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".src" TO F.SRC ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.src',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".new" TO F.NEW ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
CNT = 1
EOF = 0
LOOP UNTIL EOF DO
  READBLK RECORD FROM F.SRC,LENGTH THEN
    RECORD[76,6] = STR(CHAR(32),6);                               
    WRITEBLK RECORD ON F.NEW ELSE
      Call DSLogInfo('Unable to write to ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
      ErrorCode = 1
      GOTO 10000
    END
    CNT = CNT + 1
    IF CNT/100000 = INT(CNT/100000) THEN
      Call DSLogInfo(CNT:' records processed...',"Output")
    END
  END ELSE 
    EOF = 1
  END
  *
REPEAT
*
Call DSLogInfo(CNT:' records processed...',"Output")

CLOSESEQ F.SRC
CLOSESEQ F.NEW
*
GOTO 10000
*
INIT: * - - Initialize Unix Flat File - - *
*
  Command = "rm ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "touch ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "chmod 660 ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
RETURN
10000

Posted: Thu Jun 30, 2016 12:19 pm
by chulett
Without looking at the code itself, the first thing that comes to mind when someone says "the output file size is getting increased by 2.5 times" is that there is a codepage / character set issue. Can you confirm or deny that?

Posted: Thu Jun 30, 2016 1:14 pm
by premkishore1983
Thanks for the reply Chuck.

What do you mean by a codepage / character set issue?
Can you please elaborate?

NLS mapping is set to the project default (ISO8859-1) in both environments.

Posted: Thu Jun 30, 2016 1:35 pm
by chulett
Craig, actually... not Chuck. :wink:

I'm wondering what character encoding the file is being created with and if, perhaps, you've gone from a single-byte encoding to a multi-byte one. ISO 8859-1 is an 8-bit single-byte coded graphic character set, so I'm guessing that's not the issue.

Have you compared the two files, either visually or through your tool of choice? Can you determine where the increase in file size is coming from? If not, it might help to post a couple of records from each version, wrapped in [code] tags so we can see them.
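
To illustrate the single-byte versus multi-byte point, here is a small Python sketch. Python is used only for illustration and is not part of the DataStage job, and the sample text is made up rather than taken from the real file. It shows that the same characters take one byte each in ISO 8859-1 but can take more in a multi-byte encoding such as UTF-8.

```python
# Illustration only: made-up sample text, not data from the real file.
text = "caf\u00e9 \u00fcber"  # 9 characters, two of them outside 7-bit ASCII

latin1_bytes = text.encode("iso-8859-1")  # single-byte: 1 byte per character
utf8_bytes = text.encode("utf-8")         # multi-byte: each non-ASCII character takes 2 bytes here

print(len(text), len(latin1_bytes), len(utf8_bytes))  # prints: 9 9 11
```

The more non-ASCII bytes a record contains, the more the file grows when it is re-encoded this way.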

Posted: Fri Jul 01, 2016 12:31 pm
by chulett
How about in a hex dump format? od -h should shed more light on this, I would think.

Posted: Fri Jul 01, 2016 1:38 pm
by premkishore1983
Input :

Code: Select all

0000000 f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0
0000020 f0f8 f9f9 f1f0 f4f5 f6f6 f1f4 f0f0 f0f5
0000040 f1f1 d9f6 c5c4 d6d4 e3d5 c9c5 d5c1 c440
0000060 4040 f2f0 f0f4 f0f0 02f1 8120 0c01 5001
0000100 3053 c33c f3f2 d2f4 40c3 2040 2020 2020
0000120

AIX Output :

Code: Select all

0000000 f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0
0000020 f0f8 f9f9 f1f0 f4f5 f6f6 f1f4 f0f0 f0f5
0000040 f1f1 d9f6 c5c4 d6d4 e3d5 c9c5 d5c1 c440
0000060 4040 f2f0 f0f4 f0f0 02f1 8120 0c01 5001
0000100 3053 c33c f3f2 d2f4 40c3 4040 4040 4040
0000120

LINUX Output :

Code: Select all

0000000 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000020 bdbf bfef 40bd 4040 bfef efbd bdbf bfef
0000040 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000060 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000100 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000120 efbd bdbf bfef efbd bdbf bfef efbd bdbf
0000140 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000160 bdbf bfef efbd bdbf bfef efbd bdbf bfef
0000200 efbd bdbf ef40 bdbf 4040 bfef efbd bdbf
0000220 bfef efbd bdbf bfef efbd bdbf bfef 02bd
0000240 ef20 bdbf 0c01 5001 3053 ef3c bdbf bfef
0000260 efbd bdbf bfef efbd bdbf bfef 40bd 2040
0000300 2020 2020
0000304
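
A note on reading these dumps: od -h prints 2-byte words, so the order of the two bytes inside each word depends on the machine that ran od. The Python sketch below (illustrative only, not part of the job) turns the first two lines of the LINUX dump back into a byte stream, assuming od was run on a little-endian host. That is an assumption; the thread does not say where the dumps were taken. The helper name od_h_to_bytes is made up for this example. Under that assumption the repeating pattern is ef bf bd, which is also the UTF-8 encoding of U+FFFD, the Unicode replacement character. The final offset in each dump is an octal byte count, so it gives the record size directly.

```python
def od_h_to_bytes(dump: str, little_endian: bool = True) -> bytes:
    """Rebuild the byte stream from 'od -h' text (hypothetical helper).

    little_endian=True assumes od ran on a little-endian host, so the two
    bytes inside every printed word are swapped back.
    """
    out = bytearray()
    for line in dump.strip().splitlines():
        fields = line.split()
        for word in fields[1:]:          # fields[0] is the octal offset
            pair = bytes.fromhex(word)
            out += pair[::-1] if little_endian else pair
    return bytes(out)

# First two lines of the LINUX dump posted above.
linux_head = """
0000000 bfef efbd bdbf bfef efbd bdbf bfef efbd
0000020 bdbf bfef 40bd 4040 bfef efbd bdbf bfef
"""

data = od_h_to_bytes(linux_head)
replacement = "\ufffd".encode("utf-8")   # b'\xef\xbf\xbd'

print(" ".join(f"{b:02x}" for b in data[:12]))
print("ef bf bd occurrences in 32 bytes:", data.count(replacement))

# The last offsets are octal: 0000120 for the input, 0000304 for the LINUX output.
print(0o120, 0o304)  # prints: 80 196
```

An 80-byte record growing to 196 bytes is a factor of 2.45, which lines up with the "2.5 times" reported earlier.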

Posted: Tue Jul 05, 2016 9:53 am
by premkishore1983
Hi All,

Good Morning,
Any suggestions to resolve this issue would be really helpful.

Posted: Tue Jul 05, 2016 10:49 am
by chulett
Well... still obviously some kind of codeset issue, though I couldn't tell you what you got there on the LINUX side. For both input and AIX, this:

Code: Select all

"f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0" = "1061TA M  000500"

This, the alleged equivalent on the LINUX side:

Code: Select all

"bfef efbd bdbf bfef efbd bdbf bfef efbd"

I have no idea what that is. Sorry. Hopefully someone else can be more helpful.

I'm wondering if it is a Big Endian (AIX) vs. Little Endian (LINUX I believe) issue? :?
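
To make the byte order question concrete, here is a short Python sketch (not DataStage code) that lays out the eight 16-bit words from the first line of the input dump in both byte orders and decodes each result as EBCDIC. The cp037 code page is an assumption; the thread does not say which EBCDIC variant the mainframe file uses.

```python
import struct

# The eight 16-bit words from the first line of the input dump.
words = [int(w, 16) for w in "f1f0 f6f1 e3c1 40d4 4040 f0f0 f0f5 f0f0".split()]

# Lay the same words out in memory in each byte order.
big_endian = struct.pack(">8H", *words)     # high byte of each word first
little_endian = struct.pack("<8H", *words)  # low byte of each word first

# cp037 is assumed here; the real file may use another EBCDIC code page.
print(repr(big_endian.decode("cp037")))     # '1061TA M  000500'
print(repr(little_endian.decode("cp037")))  # '0116ATM   005000'
```

Which of the two readings matches the real record depends on the machine od was run on, since od -h groups bytes into 2-byte words before printing them.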

Posted: Tue Jul 05, 2016 5:00 pm
by ray.wurlod
I think it might be a byte order issue, as Craig suspects.

Posted: Wed Jul 06, 2016 8:55 am
by premkishore1983
Thanks for the reply Craig & Ray.

Can you please let me know where I could check the codeset / byte order issue? I mean, does this have to be checked at the DataStage level or at the OS level?

Posted: Wed Jul 06, 2016 5:01 pm
by ray.wurlod
If the source is a file, there may be a Byte Order Mark at the beginning of the file. If not, there is no easy way to check the byte order, but you could try converting the file using the dos2unix or unix2dos command, as appropriate, and determining whether the result is more sensible.
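
As a rough way to look for a Byte Order Mark, the Python sketch below compares the first bytes of a file against the BOM constants in the standard codecs module. It is illustrative only; the function name detect_bom is made up for this example, and the first test value is the start of the input record under one possible reading of the hex dump.

```python
import codecs

# UTF-32 marks are listed first because the UTF-32 LE mark starts with the
# same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def detect_bom(head: bytes):
    """Return the name of the BOM that 'head' starts with, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None

print(detect_bom(bytes.fromhex("f0f1f1f6")))  # EBCDIC digits, no BOM: None
print(detect_bom(codecs.BOM_UTF8 + b"abc"))   # UTF-8
```

For a real file, the first four bytes can be read with open(path, "rb").read(4) and passed to the function.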

Posted: Wed Jul 06, 2016 10:59 pm
by chulett
Ah... that's what BOM stands for. Duh. :wink:

There are many resources available online regarding the conversion from one 'endian' to another, for example this one. I'm wondering how the file gets 'downstream' from the Linux system; perhaps this transfer mechanism (something like Connect:Direct, I imagine) could handle the conversion. Do you have any friendly System Administrator types that might be able to help with this?

Posted: Thu Jul 07, 2016 11:57 am
by premkishore1983
Thanks, Craig, for sharing the information on how little endian and big endian work.

We get the input file from the mainframe via Connect:Direct to our AIX server. Since the connection between the mainframe and Linux has not been established yet, we scp'ed the input file from AIX to Linux to test our jobs, and that is how we identified this issue.

Posted: Wed Jul 20, 2016 4:19 pm
by premkishore1983
I'm trying to convert the byte order to big endian by using FORMAT.CONV, as shown below:

Code: Select all

FILE_NAME = SOURCE_DIR:"/":SOURCE_NAME:".new"
FORMAT.CONV -u FILE_NAME

GOSUB INIT
*
OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".src" TO F.SRC ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.src',"Output")
  ErrorCode = 1
  GOTO 10000
END
*

OPENSEQ SOURCE_DIR:"/":SOURCE_NAME:".new" TO F.NEW ELSE
  Call DSLogInfo('Unable to open ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
  ErrorCode = 1
  GOTO 10000
END
*
CNT = 1
EOF = 0
LOOP UNTIL EOF DO
  READBLK RECORD FROM F.SRC,LENGTH THEN
    RECORD[76,6] = STR(CHAR(32),6);                             
    WRITEBLK RECORD ON F.NEW ELSE
      Call DSLogInfo('Unable to write to ':SOURCE_DIR:'/':SOURCE_NAME:'.new',"Output")
      ErrorCode = 1
      GOTO 10000
    END
    CNT = CNT + 1
    IF CNT/100000 = INT(CNT/100000) THEN
      Call DSLogInfo(CNT:' records processed...',"Output")
    END
  END ELSE 
    EOF = 1
  END
  *
REPEAT
*
Call DSLogInfo(CNT:' records processed...',"Output")

FORMAT.CONV -u FILE_NAME

CLOSESEQ F.SRC
CLOSESEQ F.NEW
*
GOTO 10000
*
INIT: * - - Initialize Unix Flat File - - *
*
  Command = "rm ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "touch ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
  Command = "chmod 660 ":FILE_NAME
  Call DSLogInfo("Command = ":Command,"Command")
  CMD = 'sh -c "':Command:'"'
  Call DSExecute("UV", CMD, Output, SystemReturnCode)
  Call DSLogInfo("Output ":Output, "Output")
  Call DSLogInfo("System Returncode ":SystemReturnCode, "SysCode")
  *
RETURN
10000

But I'm getting the following error:


Warning
0018 FORMAT.CONV -u FILE_NAME
^
Variable Name (UNDEFINED) unexpected, Was expecting: Assignment Operator
0057 FORMAT.CONV -u FILE_NAME
^
Variable Name (UNDEFINED) unexpected, Was expecting: Assignment Operator

2 Errors detected, No Object Code Produced

Am I missing something here? Any suggestions would be helpful.

Posted: Thu Jul 21, 2016 2:58 am
by ArndW
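FORMAT.CONV is an engine command, not a BASIC statement, so it cannot be written on a line of its own in the job control code. You would need to issue a line like "EXECUTE 'FORMAT.CONV -u ':FILE_NAME CAPTURING ScreenIO RETURNING ErrorCode" in order to do that.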