
possible ways to write a Sequential File stage in PARALLEL?

Posted: Mon Apr 02, 2018 10:29 am
by mouthou
Hi all,

There are ways to read a sequential file in parallel using options such as the number of readers. Are there similar options for WRITING too? I'm referring to a single file, not a file pattern. Appreciate your thoughts.

Posted: Mon Apr 02, 2018 12:05 pm
by chulett
Short answer? No.

Posted: Mon Apr 02, 2018 1:25 pm
by UCDI
You can split the file into X chunks using a filename-plus-chunk-number naming scheme and open each one with its own reader, then write X chunks again and use cat or something similar to reassemble them. You can also write to a dataset and use orchadmin to convert it to a flat file at the end, which covers the writing side; for reading, I don't see a way around splitting the file externally. Is reading the file actually the bottleneck? How long does it take a dumb job to read the file and write it to another file under a different name: no processing, no implicit data conversions, just a pass-through RCP job?
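
A rough sketch of that split-and-reassemble idea at the shell level (the paths, file names, and chunk count below are made up purely for illustration, and the -d numeric-suffix flag assumes GNU split):

    # Hypothetical example only: split the source, let the parallel job process
    # each chunk, then stitch the written chunks back into one sequential file.
    INPUT=/data/landing/customers.txt
    CHUNKS=4

    # Split by line count so individual records are never cut in half.
    total=$(wc -l < "$INPUT")
    lines=$(( (total + CHUNKS - 1) / CHUNKS ))
    split -l "$lines" -d "$INPUT" /data/work/customers_chunk_

    # ... the job reads customers_chunk_00..03 and writes out_chunk_00..03 ...

    # Reassemble the output chunks, in order, into a single file.
    cat /data/work/out_chunk_0[0-3] > /data/output/customers_final.txt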

Posted: Mon Apr 02, 2018 3:08 pm
by chulett
That would be the longer answer... write the file out in chunks and then cat the chunks back together once they are all complete. However, I've never personally seen a situation where doing that made sense, never mind the (albeit temporary) need for twice the amount of disk space.

FWIW, the question is strictly about writing in parallel.

Posted: Mon Apr 02, 2018 8:39 pm
by mouthou
Thanks all for the responses. As chulett mentioned, though, the workaround UCDI describes is a bit different from what I was asking about. I was hoping for a built-in DataStage option, like the multiple-readers one, rather than manual logic that could be implemented in any number of ways.

Posted: Mon Apr 02, 2018 10:16 pm
by chulett
You're fighting the nature of the beast... it's called a sequential file for a reason. They can support multiple readers but always support only a single writer, which is why you're not going to find any such option built into DataStage.

Posted: Tue Apr 03, 2018 1:45 am
by ArndW
Although your question has already been answered, consider using DataSets, which are exactly what you are looking for. You can have parallel processes writing to a dataset (which is nothing but a glorified parallel sequential file) in "Append" mode.

The result is effectively a sequential file, although the order of records is non-deterministic.
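
For what it's worth, a minimal sketch of the dataset route mentioned above (the install path, dataset name, and output path are assumptions for illustration; orchadmin's dump options differ between releases, so check the usage printed by your own install):

    # Source the engine environment so orchadmin can find APT_ORCHHOME / APT_CONFIG_FILE.
    # (Install path is an assumption; adjust to your environment.)
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv

    # Dump the records of the parallel dataset to a single flat file.
    # Formatting options (delimiters, field selection, record counts) vary by release.
    orchadmin dump /data/work/customers.ds > /data/output/customers_final.txt

As ArndW notes, the records come out in whatever order the partitions deliver them, so sort afterwards if order matters.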