Query regarding the sorting of data



vigneshra


Post by vigneshra »

Hi

My job is like this.

Source1 DB2 -> Sorter1 -> A
Source2 DB2 -> Sorter2 -> A
A -> Sorter3 -> Transformer -> Target DB2



I sort the records from both source DB2 plug-ins on the key key1 before the join (A). After the join, I sort the records again on the same key, key1, using Sorter3. My question: do I really need this second sort, or can I send the join output directly to the Transformer? The Transformer performs calculations using stage variables, and those calculations require the data to arrive in sorted order. Please clarify.
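To illustrate why the Transformer's stage variables depend on sort order, here is a rough Python analogy (not DataStage syntax). It assumes a hypothetical running total per key1 that resets whenever the key changes, a common stage-variable pattern; the function name and the (key1, amount) row layout are made up for illustration.

```python
def running_totals(rows):
    """Python analogy for stage-variable logic (hypothetical, not DataStage code).

    Each row is (key1, amount). A running total is kept per key1 and reset
    whenever key1 differs from the previous row's key. This only yields one
    correct total per key if rows arrive sorted (grouped) on key1.
    """
    out = []
    prev_key = None   # like a stage variable holding the previous row's key
    total = 0         # like a stage variable holding the running total
    for key, amount in rows:
        if key != prev_key:
            total = 0  # key changed: start a new group
        total += amount
        out.append((key, total))
        prev_key = key
    return out
```

With sorted input, `running_totals([("a", 1), ("a", 2), ("b", 5)])` accumulates key "a" to 3. With unsorted input such as `[("a", 1), ("b", 5), ("a", 2)]`, the total for "a" restarts when "a" reappears, which is exactly the kind of silent error unsorted data causes in this pattern.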

Thanks in advance!

Regards,
Vignesh.
T42

Post by T42 »

Uh... why sort?

Do not sort unless there is a specific case where you have to sort.

I am presuming that A is a Join stage. Looking at the help file for the Join stage, we see:
The data sets input to the Join stage must be key partitioned and sorted.

...

Choosing the auto partitioning method will ensure that partitioning and sorting is done.
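For readers unfamiliar with "key partitioned and sorted", the sketch below shows the generic idea in Python: hash-partition both inputs on the key so equal keys land in the same partition, sort each partition on the key, then run a sort-merge join per partition. This is a textbook sort-merge join, not DataStage's internal implementation; the helper names, the CRC32-based partitioner, and the (key, value) row layout are assumptions for illustration. In this sketch each partition's output comes out in key order, but whether the Join stage guarantees the same for its output is something to confirm in the DataStage documentation rather than assume.

```python
import zlib


def hash_partition(rows, num_partitions):
    """Key partitioning: every row with the same key goes to the same partition.

    Rows are (key, value); CRC32 of the key's string form is used so the
    partition assignment is deterministic (a hypothetical choice of hash).
    """
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(str(row[0]).encode()) % num_partitions].append(row)
    return parts


def merge_join(left, right):
    """Inner sort-merge join of two lists already sorted on key (index 0)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append((lk, lrow[1], rrow[1]))
            i, j = i_end, j_end
    return out


def partitioned_join(left_rows, right_rows, num_partitions=4):
    """Partition both inputs on the key, sort each partition, join per partition."""
    by_key = lambda row: row[0]
    lparts = hash_partition(left_rows, num_partitions)
    rparts = hash_partition(right_rows, num_partitions)
    return [merge_join(sorted(lp, key=by_key), sorted(rp, key=by_key))
            for lp, rp in zip(lparts, rparts)]
```

Note that the result is a list of partitions: rows are in key order within each partition, not globally across partitions.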
So relax. Don't sort manually; let DataStage do it for you. Some stages do need their input sorted beforehand, but in this case I see no reason to, as long as the inputs are database tables or sequential files. Datasets, on the other hand, are interesting:
If sorting and partitioning are carried out on separate stages before the Join stage, DataStage in auto mode will detect this and not repartition.
In this case, if you sorted the data in a previous job and wrote it to a dataset, the Join stage will use that existing sort order as is rather than re-sorting, so you will have to do that sort yourself, upstream.
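Because pre-sorted dataset input is used as is, it can help to know what "correctly partitioned and sorted" means when checking such data outside the job. The Python sketch below is a hypothetical checker, not a DataStage feature: given data already split into partitions of (key, value) rows, it reports any partition that is not sorted on the key and any key that appears in more than one partition.

```python
def check_presorted(partitions, key_index=0):
    """Report problems in partitioned data that a downstream stage would trust as-is.

    Hypothetical helper: `partitions` is a list of lists of row tuples.
    Checks (1) each partition is sorted on the key and (2) no key is split
    across partitions (i.e. the data is key partitioned).
    """
    problems = []
    first_seen = {}  # key -> index of the first partition it appeared in
    for p, rows in enumerate(partitions):
        keys = [row[key_index] for row in rows]
        if keys != sorted(keys):
            problems.append(f"partition {p} is not sorted on the key")
        for k in dict.fromkeys(keys):  # unique keys, in first-seen order
            if k in first_seen and first_seen[k] != p:
                problems.append(f"key {k!r} appears in partitions {first_seen[k]} and {p}")
            first_seen.setdefault(k, p)
    return problems
```

An empty list means both conditions hold for the data given; it says nothing about whether the sort key matches the key the Join stage is configured to use.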

Fun stuff, isn't it?