Sort and remove duplicates based on different keys

leandrohmvieira
Participant
Posts: 44
Joined: Wed Sep 02, 2015 7:19 am
Location: Brasilia, Brazil

Post by leandrohmvieira »

Greetings experts.

I already searched for something similar here, but I don't understand why I am receiving the following error:

User inserted sort does not fulfill the sort requirements of the downstream operator remove_duplicates

The business requirement is:

I have three columns: ID_HIST, DOC_TYPE and ID_PERSON. I must keep one row per ID_PERSON, choosing first by the lowest DOC_TYPE and then by the highest ID_HIST.

ID_HIST is the table PK.

So I designed a flow like this:


ora_connector--------->sort--------->remove_dup

In the Sort stage, I sorted by DOC_TYPE ASC and ID_HIST DESC, and in the Remove Duplicates stage, I kept the first row.

It works, more or less, but we can't deploy a job with warnings, so I changed the solution to SQL. Still, this has been haunting me since yesterday.

Can someone explain how to achieve this in DataStage without getting warnings?

PS: I tried Same, Hash, and Auto partitioning modes between the Sort and Remove Duplicates stages.
Leandro Vieira

Data Expert - Brasilia, Brazil
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sort the keys in the same order as they are specified in the Remove Duplicates stage.
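
One way to read this advice, assuming ID_PERSON is the Remove Duplicates key (the original post does not say so explicitly): put ID_PERSON first in the sort, followed by DOC_TYPE ascending and ID_HIST descending, then keep the first row per ID_PERSON. The sketch below is only a Python analogy of that logic, not DataStage code; the function name and sample rows are hypothetical, and it assumes DOC_TYPE and ID_HIST are numeric.

```python
from itertools import groupby
from operator import itemgetter


def keep_first_per_person(rows):
    """Keep one row per ID_PERSON: lowest DOC_TYPE, then highest ID_HIST."""
    # The dedup key (ID_PERSON) leads the sort, so all rows for one person
    # are adjacent. Negating ID_HIST gives descending order (numeric only).
    ordered = sorted(
        rows,
        key=lambda r: (r["ID_PERSON"], r["DOC_TYPE"], -r["ID_HIST"]),
    )
    # groupby only merges adjacent rows, which is why the sort must start
    # with the same key used for deduplication.
    return [
        next(group)
        for _, group in groupby(ordered, key=itemgetter("ID_PERSON"))
    ]


# Hypothetical sample data, not taken from the original post.
rows = [
    {"ID_HIST": 1, "DOC_TYPE": 2, "ID_PERSON": 10},
    {"ID_HIST": 2, "DOC_TYPE": 1, "ID_PERSON": 10},
    {"ID_HIST": 3, "DOC_TYPE": 1, "ID_PERSON": 10},
    {"ID_HIST": 4, "DOC_TYPE": 3, "ID_PERSON": 20},
]
result = keep_first_per_person(rows)
for row in result:
    print(row)
```

If the sort started with DOC_TYPE instead, rows for the same person would no longer be adjacent and a keep-first step could return more than one row per ID_PERSON, which is the kind of mismatch the warning appears to point at.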
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.