Check if multiple rows returned from Lookup Stage

sensiva · Post by **sensiva** » Tue Jan 15, 2019 8:45 am

Hello All,

Not sure if this type of requirement is common, but i felt it was a little strange.

The requirement is that, a lookup needs to be done on a table which could potentially return multiple rows. If one row is returned, then process the records, but if multiple rows are returned, then ignore the record.

I planned of using a lookup stage with multiple rows returned option, and then thought of using sort stage with key column change option. But still with that I would know, if i get multiple records only from the second record, which means I would have processed the first record by then. I am thinking of other methods, but its getting complex.

Any ideas of how else could this be achieved. I was thinking of remove duplicates as well.

UCDI · Post by **UCDI** » Tue Jan 15, 2019 11:56 am

can you rig some sort of having clause into your lookup query so that if it has more than 1 it returns nothing?

myerdsdupp · Post by **myerdsdupp** » Tue Jan 15, 2019 4:37 pm

Use an aggregate stage after a lookup and get the count out of it. You can use filter or transformer stage to pick only records where count = 1.

sensiva · Post by **sensiva** » Wed Jan 16, 2019 1:19 am

Thanks for your replies

I see both similar as in the first case, we filter with oracle and the other with datastage.

I was asking this question to myself for a while now, there are operations that could be done with oracle and with datastage also, like sort, group by, distinct etc.. My collegues have always been saying to retrive all records from oracle and do all the other functionalities using datastage as its more performant and is considered as a best practise. But not sure if its true always. As our oracle is well partitioned and indexed.

As in this case, if we could implement the having clause, we would treat the same number of records as we receive from the main link, where-as in case of using the lookup stage, we would multiply the number of records to treat(if we have 1million records to treat, and if we have atleast 3 matches for each line we end up treating 2-3 million records, grouping them by storing them in memory and then filtering.

your views.

I am little inclined to using having option. Lets see how the team reacts

myerdsdupp · Post by **myerdsdupp** » Wed Jan 16, 2019 8:48 pm

I would prefer using a having clause in oracle.

If your data is huge in the reference table then lookup is not the right option.

You can always do a trail and run with your design but try to follow the approach with avoids the bottleneck on the server.

mouthou · Post by **mouthou** » Sun Jan 20, 2019 11:15 pm

I think your colleagues are not entirely correct and working up entirely in Datastage is a selective scenario. Best practice is to push the operation to DB so the minimal preprocessing like yours happens while selecting the data itself.

'Multiple rows returned option" did what it is supposed to in your case. If that is not your objective, you could try co-related queries similar to your having clause option except that GroupBY is kept separate so that you can select key and all other columns in the outer query)

SELECT <MAINTAB.requiredcolumns>
FROM MAINTAB, (SELECT keycol1 from MAINTAB GROUP BY KEYCOL1 HAVING COUNT(*) = 1) MAINTABuniq
WHERE MAINTAB.keycol1 = MAINTABuniq.keycol1

OR use RANK function if you are familiar with the syntax