Flat File with Parent Child relation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Flat File with Parent Child relation

Post by gssr »

I am having a fixed width file with different record layout.
The file is having a hierarchical structure with a parent Child Relation.

Example:
000HEADER
001PARENT1_COL1_COL2
010CHILD1_COL4_COL5_COL6
020CHILD2_COL7
030CHILD3_COL_8_COL9
001PARENT2_COL1_COL2
010CHILD1_COL4_COL5_COL6
020CHILD2_COL7
030CHILD3_COL_8_COL9
..........
999TRAILER


i need to extract the file as
001PARENT1_COL1_COL2_COL4_COL5_COL6_COL7_COL_8_COL9
001PARENT2_COL1_COL2_COL4_COL5_COL6_COL7_COL_8_COL9
...........
I tried it with CFF stage, but i cannot able to preserver the parent child relation.
Any suggestion is mostly appriciated
Thanks in advance
RAJ
shank
Participant
Posts: 18
Joined: Wed Mar 25, 2009 3:11 am

Post by shank »

Do you have any record field identifier which says it is a parent/Child record ?
I guess there should be some. (Lets say .. First three bytes in your example. If it is '001' it is parent record else its a child record)
In that case, use Sequential file to read the file and a Filter on that Identifier field to write Parent and Child files seperately.

Hope this helps..
Regards,
Shank
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Post by gssr »

Shank,
I dont want to read the parent and Child records seperatly.

the first three characters represents the record type.
Parent record - 001
Child record - 010,020,030...

Here , i am getting set of records with thhis combination.
and i donk have any key to differentiate each group

In the ablove example,
Below is the first set of record,
001PARENT1_COL1_COL2
010CHILD1_COL4_COL5_COL6
020CHILD2_COL7
030CHILD3_COL_8_COL9

And i have to preserver the relation and write this as below
001PARENT1_COL1_COL2_COL4_COL5_COL6_COL7_COL_8_COL9

Similarly , i have to extract each set of records.
RAJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Read the entire record as a single VarChar and parse it within the Transformer stage.

Use stage variables to detect the arrival of a new parent and to accumulate the output record.

Use a downstream RemoveDuplicates stage to preserve the last of each group (with the same parent record).

You didn't mention needing to do anything with the header and trailer records, so I haven't discussed these.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Some assumptions:

That the number of bytes between the first byte of the 001 "parent" and the last byte prior to the next 001 is always the same.

That there is always a fixed-width (padded if necessary) set of 010, 020 and 030 children.

Then, a variation of Ray's suggestion will work for you out of a CFF stage:

Record keys are 000 for header, 999 for trailer (they can be dropped or have their own output links) and 001 for details, and the record is parsed into a table definition for the "parent" and "child" columns.

I put quotes around the terms because what you've described is standard for a Cobol-generated source.

The most important implication is that every record -- 000, 001 and 999 -- is fixed on the same width. If the header and trailer are different lengths, you have a different problem.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Post by gssr »

its working good now.
I missed to add and information that, there could be multiple child records with same identifier (Say there could be more than 010 record.)

The job design is,
1. CFF Stage to extract all type of records(Header, Parent, Child and Trailer)
2. Transformer Stage (run in sequence) to generate a column to identify each new parent record.
The stage variable derivation is,

Code: Select all

vChk ==>  if DSLink118.Field001 = "001" Then vChk+1 else vChk
3.Used different links for each type of record
4.Join all the Links with key column as vChk

Thanks for all your responces
RAJ
Post Reply