COBOL file without copybook

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi,

While loading metadata in the CFF stage, should we select all the group columns? Also, there is an option for "Create Filler"; I assume this should not be checked.

In the next window for loading metadata, there are three options:
1) Flatten selective arrays
2) All arrays
3) As is

Which option should be selected?
Rumu
IT Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

rumu wrote: I used the command on the command line; it did not work as it returned 0 rows.
Figured. You've got an EBCDIC file, so a simple grep for ASCII characters wasn't going to work.

:idea: Always test your filter commands outside of DataStage first.
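For example, one quick way to test a record-type filter against an EBCDIC file outside of DataStage is to encode the search literal to EBCDIC first and count raw byte matches. A minimal Python sketch, assuming code page cp037 and placeholder file/pattern names:

Code: Select all

# Count occurrences of a record-type literal in an EBCDIC file by
# encoding the ASCII pattern to EBCDIC bytes first (cp037 assumed).
# "input.dat" and "01MST" are placeholders.
pattern = "01MST".encode("cp037")

with open("input.dat", "rb") as f:
    data = f.read()

print(data.count(pattern), "matches")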
-craig

"You can never have too many knives" -- Logan Nine Fingers
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Craig,
That's correct... My file is in EBCDIC, and I requested that it be converted to ASCII during the FTP, but the mainframe team did not agree.
Without converting to ASCII, is there a way to filter records before passing them to CFF?
Rumu
IT Consultant
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Rumu,

There's too much going on here for you to keep up with the details. I strongly recommend sitting with your mainframe developer and going through an exercise in simplification.

On the copybook, as they created it, get help with flattening the groups manually. Edit the copybook before importing it to DataStage. This gives you two advantages: you don't have to rely on DS to make decisions about groups, and you have a clear set of fields to test your input data.

Ideally, when you finish, you'll have a series of record types and layouts, with just the REDEFINES at the highest level to separate them.

You do not need to convert to ASCII if your mainframe team can be engaged to do any data manipulation before you use FTP and CFF. It is just better to do it there for many reasons, starting with the fact that the mainframe processes created the file and are best suited to manipulating it.

One choice I might test: rather than one complex multi-record-type file, break it down into one file for each record type. This is not difficult to do in COBOL (or with most macro-based utilities they might have). DataStage is nice to leverage for the things it is prepared to handle, and not nice when those things don't fit what it expects of them.

In short, if you can't make it work easily, you need to put in the work to simplify.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Frank,

I suggested to the mainframe team that they split the files for each of the record types, but they said they can't do it. So I was trying to leverage the 'Constraint' tab on the Output tab. I checked single record type and used the HeaderRecordID field to select the one type of record for which metadata is defined.
But it seems this did not work, as I am getting warnings like 'Short input record' or 'Record overrun', leading to import errors.
If we select single record type, does this constraint tab not work?


While loading the columns, I unchecked the group columns and checked the Create Filler option. I selected all the columns, then used the Flatten selective arrays option.


Are these settings ok?


My main issue is receiving no help from the mainframe team, as they are a third party. They are just sending the file. No one from the mainframe team is available to discuss it.

I have some queries on your following suggestion:

Code: Select all

On the copybook, as they created it, get help with flattening the groups manually. Edit the copybook before importing it to DataStage. This gives you two advantages: you don't have to rely on DS to make decisions about groups, and you have a clear set of fields to test your input data. 

Ideally, when you finish, you'll have a series of record types and layouts, with just the REDEFINES at the highest level to separate them.

Different record types are defined at level 01, so how can I get a series of record types and layouts using the REDEFINES clause? I may be missing your point, so it would be good if you could explain a bit more.
I know this has been going on for a long time, but this really is a cumbersome task that I have been assigned.
No one else on my project can help me out, so I am coming back to you all again and again. Thanks for bearing with me for such a long time.
Rumu
IT Consultant
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Rumu,

The lack of cooperation with you is a serious problem. I'm sorry that you have to deal with it.

You can try to do the split yourself. It won't be easy. If you have a consistent record length for each record type, the "split" job would just identify the record type and write it to its own file. You could then use CFF to read each file based on the copybook section for the record type.

If the records don't have a consistent length, I'm not sure what you would do, but there may be methods others here can suggest.

The basic approach would be a two column read. First column would be the record type field, the second column would be the rest of the record. I would use transformer constraints to split the records to their files. Each of the split files could then be read by their own CFF stages with the matching copybook section.
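To validate the approach before building the jobs, the same split logic can be prototyped outside DataStage. A minimal Python sketch, assuming fixed 16336-byte records, a 10-byte EBCDIC record-type field, code page cp037, and placeholder file names:

Code: Select all

import collections

REC_LEN = 16336    # assumed fixed record length
TYPE_LEN = 10      # assumed record-type field length

counts = collections.Counter()
outputs = {}       # one output file per record type

with open("input.dat", "rb") as src:
    while True:
        rec = src.read(REC_LEN)
        if not rec:
            break
        if len(rec) != REC_LEN:
            raise ValueError("short record of %d bytes at end of file" % len(rec))
        rec_type = rec[:TYPE_LEN].decode("cp037").strip()
        counts[rec_type] += 1
        if rec_type not in outputs:
            outputs[rec_type] = open("split_%s.dat" % rec_type, "wb")
        outputs[rec_type].write(rec)

for out in outputs.values():
    out.close()
print(counts)

If the per-type counts come out sensible, the same constraints should work in the Transformer; if not, the record length or type-offset assumptions are wrong before DataStage is even involved.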
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Frank,

In one of my previous posts, I mentioned that I am trying to read the file using a Sequential File stage with one column as VarBinary and then split the record into columns using a Transformer stage. There I faced an issue: while converting the first 6 bytes (which hold the record ID for each type of record) using the RawToString function, I could view the record ID in the DataStage viewer and in Unix, but the Transformer stage could not read it as a string and could not split it...
You suggested reading the entire record as Char, but that leaves the job hanging and generating warnings. So I went back to the mainframe team to split the files on their side, but they said no.
As the file is in EBCDIC format, what exact data type should be used for splitting the files in DataStage?
Rumu
IT Consultant
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi All,

Can you please let me know how I can read the COBOL binary record as one record and split it based on the record identifier? I need help mainly with data type selection.

I used VarBinary to read each record as a single column, and then in the Transformer stage applied the RawToString function to the column and took the first 5 characters to compare with the record-type string. But the equality check is not working...
My input data is 16326 bytes long and the record ID is 10 bytes long, so I read it into one VarBinary column of length 16336. Do I need to make any specific changes on the Format tab, like record delimiter, string type, etc.?
If I want to cut bytes 11 to 16326 as raw data to feed to a subsequent CFF stage, then which data type should be used?
Rumu
IT Consultant
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Assumptions:

Input record length is 16336 bytes. The first 10 bytes are the record type identifier.
Every record is consistently of that length.
You must preserve the EBCDIC character set.

Sample table definition:

Code: Select all

REC_TYPE Char(10).
DATA_COLUMN Binary(16326).
You can use a Sequential File stage for this. Set the format of the input file the same as it would be set in CFF. Your Transformer output links would go to Sequential File stages, and each link constraint would select a record type.

It shouldn't matter what SQL type DATA_COLUMN is, because you will not attempt to use any functions on it in the split job. Each output file would then be read with the copybook section for its record type, and CFF would be best for that.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Frank... I followed the design. I set Character set to EBCDIC, Byte order to Big Endian, Data format to Binary, Rounding to Nearest value, and Record delimiter to Unix newline. These are the formats set in the CFF stage. In CFF there was one more field called Separator, which was set to the project default.

I kept REC_TYPE as Char(10) and DATA_COLUMN as Binary(16326). In the following Transformer I used the constraint REC_TYPE[1,5]='01MST' to filter out the first type of data.
The output sequential file has one column, DATA_COLUMN, as Binary(16326).
When I ran the job, it aborted with warnings that REC_TYPE lacks a whitespace delimiter at offset 10... The same warning appeared for 50 records, and then the job aborted because the warning limit is set to 50 at the project level.
Rumu
IT Consultant
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

If you have a delimiter setting, you should delete it. That's the only reason I can think of for the warning.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Reviewing this thread. A truly frustrating situation when the mainframe folks aren't communicating. It has probably been said in earlier parts of this thread, but be ABSOLUTELY CERTAIN you are working with a quality editor that allows you optionally to see every row in its full HEX format. That's critical for situations like this.

I can't help but wonder if perhaps the Sequential Stage is getting tripped up when looking for the end of record. I've seen it happen where a truly binary value (a 4-byte integer, perhaps) "just happens" to have a value equal to a line feed (hex 0A or decimal 10). There may be ways to force it to blindly not look, but over the years I have seen too many cases like this where the EE Sequential Stage is led astray. In those situations I have turned to a very unorthodox but sometimes very useful solution --- a Server Job. Its Sequential Stage "tends" to be a bit more forgiving when asked to deal with "random amounts of bytes". It wouldn't be hard to try --- and I would only suggest giving it a spin for this initial "split" purpose.

Still --- even with this or a Server Job, have you been able to get the data to consistently retrieve the correct number of rows, even WITHOUT the 10 byte record type column? Meaning --- just have the one column, with a length of the FULL 16336. If you know there are (say) 22,535 rows in the file, can you run it and send that same number of rows to another sequential stage?

If you can't, then it's not your record type logic that is failing... either the file is not truly that length, as Franklin and I are assuming, or else possibly the Stage is getting caught by some other end-of-record identifier.
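One way to test that assumption outside DataStage is to check whether the file size is an exact multiple of the expected record length, and whether any line-feed bytes are embedded in the binary data. A minimal Python sketch (the file name and length are placeholders):

Code: Select all

import os

REC_LEN = 16336              # assumed fixed record length
path = "input.dat"           # placeholder file name

size = os.path.getsize(path)
print("file size:", size)
print("exact multiple of %d:" % REC_LEN, size % REC_LEN == 0)
print("implied record count:", size // REC_LEN)

# Count embedded line feeds (hex 0A): if any are present, a stage
# configured with a newline record delimiter can be tripped up.
with open(path, "rb") as f:
    print("embedded 0x0A bytes:", f.read().count(b"\n"))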

Ernie
Ernie Ostic

blogit!
Open IGC is Here! https://dsrealtime.wordpress.com/2015/0 ... ere/
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Frank,

I tried the job after removing the property

Record Delimiter=UnixNewline

and received the same warnings.

I guess that since the record is contiguous (without field delimiters), using two columns to read the input record gives an error, as the stage expects a delimiter after the first REC_TYPE column which is not there in the record. REC_TYPE is 10 bytes and the actual data starts at the 11th byte.
I then tried to read all 16336 bytes (10+16326) into one Binary column and split that column, using the RawToString function to read the first 6 bytes for the record type and mapping the entire 16336 bytes to the data column so that it can be read with the copybook later. But this job now throws warnings such as:

Code: Select all

Sequential_File_11: When checking operator: When validating import schema: Unrecognized top level format property: round=round_inf

Sequential_File_11: When checking operator: When validating import schema: Unrecognized top level format property: packed

Sequential_File_11: When checking operator: When validating import schema: Unrecognized top level format property: julian
It then reads 90% of the file and then starts issuing fatal messages about an input buffer overrun, and the job aborts.

I tried using the following settings under the Format tab:
Decimal
Packed=yes
Rounding=nearest value
Date=IsJulian

Still the warnings persist.

Also, I came to know that the input file has 11 types of records, each with a different length, e.g. type '01MST' with length 16326, type '02CHK' with length 56778, etc.
In that case, if I use the highest record length for the data column, will that work, or will it leave garbage at the trailing end of the shorter records?
Rumu
IT Consultant
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

At this point, I have an unhelpful observation to make: you are not dealing with standard COBOL formatting. You are dealing with undisciplined formatting out of COBOL coding.

They have set for you an impossible obstacle: reading a file with inconsistent record lengths. One more thing you might check, and I ask you to forgive me if I describe things you already know.

You have a series of variable length records. In COBOL, the prefix bytes of each record are defined explicitly, and the prefix contains the actual length of the following record. In the object oriented world, the prefix is always implied, never defined explicitly.

So, if you can see the prefix for each record -- and you will likely need the help of the mainframe developers for that -- it's possible to set up a VarChar column, scale undefined or set to the maximum record length in the file, which would accurately read each record.
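If the length prefix really is present in the transferred file, which only the mainframe team can confirm, walking the records outside DataStage is straightforward. A minimal Python sketch, assuming a standard 4-byte prefix whose first two bytes are a big-endian length that includes the prefix itself, plus a 10-byte type field in EBCDIC cp037 (file name is a placeholder):

Code: Select all

import struct

# Walk variable-length records preceded by an assumed 4-byte prefix:
# 2-byte big-endian length (including the prefix) plus 2 zero bytes.
with open("input.dat", "rb") as f:
    while True:
        rdw = f.read(4)
        if len(rdw) < 4:
            break
        (rec_len,) = struct.unpack(">H", rdw[:2])
        payload = f.read(rec_len - 4)              # record without its prefix
        rec_type = payload[:10].decode("cp037")    # assumed type field
        print(rec_type.strip(), rec_len)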

I have little experience with variable length records out of COBOL processes, because the first standard for a COBOL file is that every record in a file, without exception, has the same record length as every other record. The mainframe developer should be padding the records in your file to make them all the same length; they should all be the length of the longest possible record. Since they are not cooperating, your only option is to do the padding yourself (you've tried and failed so far) or to create a process, before you try reading the actual data, that splits the records into separate files where every record has a consistent length. Ernie has a good suggestion to look at Server stages that might work. I expect you may need to build a process outside of DataStage.

I recommend not using a binary format to read the data. The RawTo functions are very limited in scope. I wish I could be more helpful, but the lack of cooperation is the main obstacle, and neither of us has any control over that. :(
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Frank, Ernie

I am unable to read the post that Ernie posted. Could anyone please share the entire post here?

I am trying to read the binary file first and split it into separate files per record type code. But normal Unix tools will not help, as the file is binary.
I am trying Python, but I do not know the language, so I am still researching.
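For reference, a minimal sketch of that kind of split in Python, assuming each record type has a fixed total length that can be taken from its copybook section. The type codes and lengths below are illustrative placeholders to confirm against the data dictionary, and are assumed to include the 10-byte type field:

Code: Select all

# Split a binary EBCDIC file into one file per record type, assuming a
# fixed, known total length per type. Codes and lengths are placeholders.
REC_LENGTHS = {
    "01MST": 16326,
    "02CHK": 56778,
}
TYPE_LEN = 10    # assumed record-type field length, included in the totals above

outputs = {}
with open("input.dat", "rb") as src:
    while True:
        head = src.read(TYPE_LEN)
        if not head:
            break
        rec_type = head.decode("cp037")[:5]
        rec_len = REC_LENGTHS[rec_type]         # KeyError flags an unexpected type
        body = src.read(rec_len - TYPE_LEN)
        if rec_type not in outputs:
            outputs[rec_type] = open("split_%s.dat" % rec_type, "wb")
        outputs[rec_type].write(head + body)

for out in outputs.values():
    out.close()

Each split file could then be read by its own CFF stage with the matching copybook section, as suggested earlier in the thread.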
I received one set of copybooks for the different record types, but they show fixed record lengths whereas, in reality, each record type is variable.
My expectation was that the optional segments should have a clause like

Code: Select all

HRSK-SEG OCCURS 0 TO 1 TIMES 
DEPENDING ON HRSK_SEG_CNT 
then followed by the actual structure of this segment.
But the copybook does not include that OCCURS 0 TO 1 TIMES clause. When I refer to the corresponding data dictionary, I find that HRSK-SEG is optional and may occur 0 to 1 times.
The same situation repeats for the other variable segments.
I tried to modify the copybook, adding the above syntax.
Upon importing the metadata, I can see that the OCCURS DEPENDING ON clause is not highlighted for the segments that repeat 0 to 1 times, but for other segments, which repeat 0 to 4 times, it does show in the COBOL layout.
Is there any specific reason why DataStage does not recognize the 0 to 1 clause and instead shows 1 under the OCCURS column in the column grid?
Rumu
IT Consultant