Possible ways to write to a Sequential File stage in PARALLEL?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

Hi all,

There are alternative ways to read a sequential file, using options such as the number of readers. Are there similar options for WRITING too? I am referring to a single file, not a file pattern. I'd appreciate your thoughts.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Short answer? No.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

You can split the file into X chunks with a filenamechunknumber naming format and open each one in its own reader. Then write X chunks again and use cat or something similar to reassemble them. For writing, you can also write a dataset and use orchadmin to convert it to a flat file at the end; for reading, I don't see a way around splitting the file externally. Is the file reading the actual bottleneck? How long does it take a dumb job to read the file and write it to another file under another name: no processing, no implicit data conversions, just a pass-through RCP job?
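
For anyone who wants to see the "write X chunks, then reassemble" idea outside of DataStage, here is a minimal Python sketch. It is only an illustration of the pattern, not something DataStage does for you: the function names, the `.partN` suffix and the round-robin split are all assumptions made up for the example.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def write_chunk(path, rows):
    # Each worker owns exactly one file, so no file ever has two writers.
    with open(path, "w", newline="") as f:
        for row in rows:
            f.write(row + "\n")
    return path


def write_in_chunks(rows, target, workers=4):
    # Deal the rows round-robin into one list per worker
    # (a hypothetical stand-in for the job's partitions).
    partitions = [rows[i::workers] for i in range(workers)]
    part_paths = [f"{target}.part{i}" for i in range(workers)]

    # Write all chunk files concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(write_chunk, part_paths, partitions))

    # Reassemble serially, the equivalent of `cat target.part* > target`.
    with open(target, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
            os.remove(part)
    return target
```

Note that the final file holds the rows chunk by chunk, so the original row order is not preserved, and the reassembly step is still a single serial writer.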
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That would be the longer answer... write the file out in chunks and then cat the chunks back together once they are all complete. However, I've never personally seen a situation where doing that made sense, never mind the (albeit temporary) need for twice the amount of disk space.

FWIW, the question is strictly about writing in parallel.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

Thanks all for the responses. But as chulett mentioned, the workaround given by UCDI is a little different from what I asked. I was hoping to find a built-in DataStage feature, like the readers option, rather than manual logic, which could be done in many ways.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You're fighting the nature of the beast... it's called a sequential file for a reason. They can support multiple readers but always support only a single writer, which is why you're not going to find any such option built into DataStage.
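
To illustrate why multiple readers are possible at all, here is a rough Python sketch in which each reader opens its own handle, seeks to a byte offset and reads only the lines that start inside its range. This is a generic technique and an assumption for illustration, not a description of how DataStage implements its readers option; `read_range` and `read_with_readers` are made-up names. The same trick does not work for writing, because a writer cannot know the byte offset of its first record until every earlier record has been written.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, start, end):
    # Return the lines whose first byte lies in [start, end).
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and consume up to the next newline, so we
            # land on the first line that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode())
    return lines


def read_with_readers(path, readers=4):
    # Split the file into `readers` byte ranges of roughly equal size.
    size = os.path.getsize(path)
    bounds = [size * i // readers for i in range(readers + 1)]
    with ThreadPoolExecutor(max_workers=readers) as pool:
        parts = pool.map(read_range, [path] * readers, bounds[:-1], bounds[1:])
        # Results come back in range order, so the file order is kept.
        return [line for part in parts for line in part]
```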
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Although your question has already been answered, think of using DataSets, which are exactly what you are looking for. You can have parallel processes writing to datasets (which are nothing but glorified parallel sequential files) in "Append" mode in each process.

The result is effectively a sequential file, although the order of records is non-deterministic.