Sort Stage - (Don't Sort) Performance Impact

sensiva
Premium Member
Posts: 21
Joined: Tue Aug 22, 2017 10:39 am

Post by sensiva »

Hello,

I would like your views on using multiple Sort stages purely to create key change columns, without actually sorting the data.

Here is the scenario:

Sort Stage 1 - Sort Mode - Sort for all columns
Sorting columns A, B, C, D, E

Sort Stage 2 - Sort Mode - (Don't Sort Previously sorted)
Sorting columns A, B, C 
Create a key change column 1

Sort Stage 3 - Sort Mode - (Don't Sort Previously sorted)
Sorting columns A
Create a key change column 2

I did read in the Knowledge Center that Don't Sort does not use much memory, but I am still hesitant to use multiple Sort stages on input data that will probably contain around 3 million records. Is it advisable to use the Sort stage just for creating key change columns, or should I do it in a Transformer by comparing each row with the previous record?
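
To make the question concrete, here is a rough Python sketch (not DataStage code, and not a claim about how the engine implements it) of what a key change column amounts to on input that is already sorted: one pass that compares each row's key with the previous row's key. The function name `add_key_change` and the sample values are made up for illustration.

```python
def add_key_change(rows, key_cols):
    """Yield (row, flag) for rows that are already sorted on key_cols.

    flag is 1 on the first row of each key group and 0 otherwise.
    Only the previous key is remembered, so memory use does not grow
    with the number of rows.
    """
    prev_key = object()  # sentinel that never equals a real key tuple
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        yield row, 1 if key != prev_key else 0
        prev_key = key


# Sample data (hypothetical), already sorted on A, B, C, D, E.
rows = [
    {"A": "FR", "B": "IDF", "C": 7, "D": "p1", "E": "x"},
    {"A": "US", "B": "CO", "C": 1, "D": "p1", "E": "x"},
    {"A": "US", "B": "CO", "C": 1, "D": "p2", "E": "x"},
    {"A": "US", "B": "CO", "C": 2, "D": "p1", "E": "x"},
    {"A": "US", "B": "FL", "C": 3, "D": "p1", "E": "x"},
]

# Key change column 1 on (A, B, C), key change column 2 on (A).
key_change_abc = [flag for _, flag in add_key_change(rows, ["A", "B", "C"])]
key_change_a = [flag for _, flag in add_key_change(rows, ["A"])]

for row, f1, f2 in zip(rows, key_change_abc, key_change_a):
    print(row["A"], row["B"], row["C"], row["D"], f1, f2)
```

Both flags come from the same sorted stream, which is why the data only needs to be physically sorted once.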

Any pointers would be of great help.

Thanks
Sen
sen
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, it's been a while, but can't you sort and create the key change column at the same time? Meaning two rather than three Sort stages. And if the sort handles the key change, I'm not sure there's a need to have a transformer do it post-sort, unless there are rules to it that would need stage variables to handle properly... seeing as how the data needs to be sorted regardless.

Regardless, I don't think you need to be too concerned about the performance impact of "Don't Sort" stages but curious what others think. And 3M isn't really a large amount to sort IMHO unless your infrastructure is not up to the task.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sensiva
Premium Member
Posts: 21
Joined: Tue Aug 22, 2017 10:39 am

Post by sensiva »

Thanks for your reply
chulett wrote:Sorry, it's been awhile but can't you sort and create the key change column at the same time? Meaning two rather than three Sort stages.
Yes, Sort does create a keyChange while sorting, but I don't want the keyChange on all 5 keys (A,B,C,D,E). I want one keyChange on the (A,B,C) keys and another on just A.

Say, for example, A = COUNTRY, B = STATE, C = ORDER, D = PRODUCTS, E = xxxx

I need to sort on all these keys to process the data, and then I need one key change down to ORDER and another key change just for COUNTRY, so that I can route and process the rows differently.
chulett wrote:I don't think you need to be too concerned about the performance impact of "Don't Sort" stages but curious what others think. And 3M isn't really a large amount to sort IMHO unless your infrastructure is not up to the task.
Our infrastructure is well built, and I could still ask for more CPU if need be. But I would really like my design to be sound so that I can put forth my points when asking for it.

I will go ahead and implement it with 3 Sort stages, 2 of them needed just for the keyChange.

And definitely, as said, it would be great to have others' views as well.

Thanks
Sen
sen
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I think I would go with 1 sort stage.

Partition by A
Sort by A,B,C,D,E
LastRowInGroup(C) transformer function will give you key breaks on A,B,C
LastRowInGroup(A) transformer function will give you key break on A

Partition by A since you likely want all of the A rows passing through the same processing node.

That's all from memory as I haven't used the LastRowInGroup() function for some time now.

Mike
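
For readers who want to see the shape of that design outside DataStage, below is a minimal Python sketch under stated assumptions: it hash-partitions on A, sorts each partition once on A,B,C,D,E, and then flags the last row of each (A,B,C) group and of each A group. It only imitates the idea of a last-row-in-group test; `partition_by`, `last_row_flags`, and the sample rows are hypothetical names and data, not DataStage APIs.

```python
from zlib import crc32


def partition_by(rows, key_col, n_parts):
    """Hash-partition rows so equal key_col values share one partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = crc32(str(row[key_col]).encode("utf-8")) % n_parts
        parts[idx].append(row)
    return parts


def last_row_flags(sorted_rows, key_cols):
    """Return True for each row that is the last one of its key_cols group."""
    keys = [tuple(r[c] for c in key_cols) for r in sorted_rows]
    last = len(keys) - 1
    return [i == last or keys[i] != keys[i + 1] for i in range(len(keys))]


SORT_COLS = ["A", "B", "C", "D", "E"]

# Unsorted sample input (hypothetical values).
rows = [
    {"A": "US", "B": "CO", "C": 2, "D": "p1", "E": "x"},
    {"A": "FR", "B": "IDF", "C": 7, "D": "p1", "E": "x"},
    {"A": "US", "B": "CO", "C": 1, "D": "p2", "E": "x"},
    {"A": "US", "B": "FL", "C": 3, "D": "p1", "E": "x"},
    {"A": "US", "B": "CO", "C": 1, "D": "p1", "E": "x"},
]

for part in partition_by(rows, "A", n_parts=2):
    # One sort per partition; both break flags are derived from it.
    part.sort(key=lambda r: tuple(r[c] for c in SORT_COLS))
    end_of_abc = last_row_flags(part, ["A", "B", "C"])
    end_of_a = last_row_flags(part, ["A"])
    for row, f_abc, f_a in zip(part, end_of_abc, end_of_a):
        print(row["A"], row["B"], row["C"], row["D"], f_abc, f_a)
```

Partitioning on A alone is enough here because every (A,B,C) group sits inside a single A group, so no group is ever split across partitions.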
sensiva
Premium Member
Premium Member
Posts: 21
Joined: Tue Aug 22, 2017 10:39 am

Post by sensiva »

Thanks Mike, your solution worked great, and just one Sort stage was enough.
sen