Lookup stage outputs duplicate rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ASU_ETL_DEV
Premium Member
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Lookup stage outputs duplicate rows

Post by ASU_ETL_DEV »

Hello,
I am transitioning from the server to the parallel world so bear with me...

I have two oracle tables: one is used as stream and the other as reference. I need to find out which rows from the stream are NOT in the reference and output them to a target file.

I am using the Oracle connector for both stream and reference (row count on each is under 400) and I am using a Lookup stage with a reject link to capture the failed lookups. I coded for a sparse lookup in the reference Oracle connector.
Since the Lookup stage does not allow me to code only the reject link I have also an output stream link to a flat file for the lookup hits.
What I am seeing is that the job outputs 200k+ rows to the output stream link. In looking at this data I find that there are lots of duplicates (500+) for each input stream row.

Can you please explain this behavior of the Lookup stage?

Thanks.
Marco
ASU Developer
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Duplicates in the stream input link.
Duplicates in the reference input link.
Both of the above.

Why are you using sparse lookup for such a small volume?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ASU_ETL_DEV
Premium Member
Premium Member
Posts: 43
Joined: Tue Sep 09, 2008 1:56 pm

Post by ASU_ETL_DEV »

Hi Ray,
There were duplicates in the lookup table. I changed the SELECT to DISTINCT and also changed the lookup type to normal instead of sparse and now it is working correctly.
Thanks.
Marco
ASU Developer
yugee
Participant
Posts: 34
Joined: Fri Feb 04, 2011 5:54 pm

Post by yugee »

Though your reference link has duplicates, unless you specify "multiple rows returned from link" for the reference link, you should not get the duplicates. If your main link has duplicates, then it is possible to get the duplicates.
Post Reply