Splitting Concatenated Strings

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Splitting Concatenated Strings

Post by JezT »

I have a file consisting of multiple rows containing multiple Surrogate Keys concatenated together against a single Natural Key.

For example,

Code: Select all

 Surr Key     Nat Key
1:2:3:4       ABC
5:6:7:8       DEF
I am now wanting to split this concatenated string up so that for every instance of a surrogate key, a row is created, as below.

Code: Select all

 Surr Key     Nat Key
          1                ABC
          2                ABC
          3                ABC
          4                ABC
          5                DEF
          6                DEF
          7                DEF
          8                DEF
I am wanting to either action this in a transformer or within an INSERT statement on DB2 if possible as the data will be loaded into a DB2 table.

Any suggestions ?

Jez
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Pivot Stage.
sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

Post by sonia jacob »

Hi,

you could try replacing the ":" with Nat Key : Char(013) : Char(010).

ereplace(Surr Key : ":" , ":", trim(Nat Key) : Char(013) : Char(010)).

This would work only if you are sure about the delimitor in the Surr Key.

If a output file required is a delimited one then following is to be done
1. Concatinate the delimitor required, in the ereplace statement itself. eg
ereplace(Surr Key : ":" , ":", '|' : trim(Nat Key) : Char(013) : Char(010)). If | (pipe symbol) is the delimitor
2. In the format of the output file (sequential stage) the delimitor is to be supressed (000)

If a output file required is a fixed width one then following is to be done
1. Put appropriate padding for the columns

Hope it helps.

Sonia
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Read your file, and write to a hash file. NatKey will be the primary key in the hash file. SurrKey will be a data column in the hash file.

Set the derivation for SurrKey to

Code: Select all

Convert(":", @VM, SurrKey)
This converts the colons to value marks.

After the hash file hash been loaded, read it, but on the Output/General tab of the hash file stage, set the Normalize on drop box to SurrKey, your surrogate key column.

View data, and you will see the results you want. :wink:
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

better solution

Post by 1stpoint »

IF this is a flat file, it is usually better to pre-process the incoming flat file outside of datastage and create a new flat file. Yes, you can use DataStage by creating a Hash stage, etc. but the processing may be slower than a simple script (perl or python (I prefer python)) that will preprocess the file and create an input file for you.

You can then call the script from your DataStage Batch/Sequence.
Post Reply