Splitting a Dataset

Inquisitive
Charter Member
Posts: 88
Joined: Tue Jan 13, 2004 3:07 pm

Splitting a Dataset

Post by Inquisitive »

Is it possible to split a Dataset from Unix and use the split files as inputs to a DataStage PX job?

If so, what precautions does one need to take?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Umm... What do you mean by "dataset" here? In the PX environment we usually refer to a dataset as something created by DataStage PX. Also, a fileset is something similar, the main difference being that a dataset is in memory to the greatest degree possible, while a fileset exists in persistent storage.
Can you please clarify? If it's a PX dataset, there's no need to split it; the PX control file (the one with a ".ds" suffix) contains the information about how and where it's split.
If you mean a data set in the sense of something like the result set of a query, then simply running the query within the PX environment will cause the data to be "fanned out" over the available resources.
All this is well explained in the Parallel Job Developer's Guide.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sandy
Participant
Posts: 24
Joined: Sun Feb 01, 2004 1:14 am

Post by sandy »

If I understand Inquisitive correctly, the aim is to split a Dataset just like a flat file on Unix.

No, you cannot do this, because the data in a Dataset is already split and stored across the nodes.

If you want to distribute your data into various datasets, do it in the job itself, by having multiple output links from your last stage and writing into more than one DataSet stage.

Hope I answered your query.

Regards,
Sandyla.
lcduddridge
Participant
Posts: 8
Joined: Tue Nov 16, 2004 12:34 pm

Re: Splitting a Dataset

Post by lcduddridge »

You can split a dataset in two ways:

1) Create a small job that reads the dataset and writes to another dataset. When you run it, supply an $APT_CONFIG_FILE that describes how the data should be split (you have to create a suitable configuration file).

2) If you are familiar with Unix, use the orchadmin command.
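
For option 1, the configuration file is what decides how many partitions the rewritten dataset ends up with. Below is a rough sketch in Python that renders such a file for a given number of logical nodes. The host name (etlhost), the directory paths, and the helper name render_config are all made-up placeholders, and the layout follows the usual node / fastname / pools / resource structure as I understand it, so check it against your own installation before relying on it.

```python
# Sketch only: renders a parallel configuration file with N logical nodes.
# The host name and directory paths are hypothetical placeholders.

def render_config(node_count, fastname="etlhost",
                  disk_dir="/data/datasets", scratch_dir="/data/scratch"):
    """Return the text of a configuration file with `node_count` nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'    node "node{i}"',
            "    {",
            f'        fastname "{fastname}"',
            '        pools ""',
            f'        resource disk "{disk_dir}" {{pools ""}}',
            f'        resource scratchdisk "{scratch_dir}" {{pools ""}}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a four-node file; save the output and point $APT_CONFIG_FILE at it.
    print(render_config(4), end="")
```

Save the output to a file, set $APT_CONFIG_FILE to that path when running the small copy job, and the target dataset should be written with one partition per node defined in the file.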

Siva
(for Luke)

