Hello All,
Not sure if this type of requirement is common, but i felt it was a little strange.
The requirement is that, a lookup needs to be done on a table which could potentially return multiple rows. If one row is returned, then process the records, but if multiple rows are returned, then ignore the record.
I planned of using a lookup stage with multiple rows returned option, and then thought of using sort stage with key column change option. But still with that I would know, if i get multiple records only from the second record, which means I would have processed the first record by then. I am thinking of other methods, but its getting complex.
Any ideas of how else could this be achieved. I was thinking of remove duplicates as well.
Check if multiple rows returned from Lookup Stage
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 4
- Joined: Thu Oct 25, 2018 8:24 pm
Thanks for your replies
I see both similar as in the first case, we filter with oracle and the other with datastage.
I was asking this question to myself for a while now, there are operations that could be done with oracle and with datastage also, like sort, group by, distinct etc.. My collegues have always been saying to retrive all records from oracle and do all the other functionalities using datastage as its more performant and is considered as a best practise. But not sure if its true always. As our oracle is well partitioned and indexed.
As in this case, if we could implement the having clause, we would treat the same number of records as we receive from the main link, where-as in case of using the lookup stage, we would multiply the number of records to treat(if we have 1million records to treat, and if we have atleast 3 matches for each line we end up treating 2-3 million records, grouping them by storing them in memory and then filtering.
your views.
I am little inclined to using having option. Lets see how the team reacts
I see both similar as in the first case, we filter with oracle and the other with datastage.
I was asking this question to myself for a while now, there are operations that could be done with oracle and with datastage also, like sort, group by, distinct etc.. My collegues have always been saying to retrive all records from oracle and do all the other functionalities using datastage as its more performant and is considered as a best practise. But not sure if its true always. As our oracle is well partitioned and indexed.
As in this case, if we could implement the having clause, we would treat the same number of records as we receive from the main link, where-as in case of using the lookup stage, we would multiply the number of records to treat(if we have 1million records to treat, and if we have atleast 3 matches for each line we end up treating 2-3 million records, grouping them by storing them in memory and then filtering.
your views.
I am little inclined to using having option. Lets see how the team reacts
sen
-
- Premium Member
- Posts: 4
- Joined: Thu Oct 25, 2018 8:24 pm
I think your colleagues are not entirely correct and working up entirely in Datastage is a selective scenario. Best practice is to push the operation to DB so the minimal preprocessing like yours happens while selecting the data itself.
'Multiple rows returned option" did what it is supposed to in your case. If that is not your objective, you could try co-related queries similar to your having clause option except that GroupBY is kept separate so that you can select key and all other columns in the outer query)
SELECT <MAINTAB.requiredcolumns>
FROM MAINTAB, (SELECT keycol1 from MAINTAB GROUP BY KEYCOL1 HAVING COUNT(*) = 1) MAINTABuniq
WHERE MAINTAB.keycol1 = MAINTABuniq.keycol1
OR use RANK function if you are familiar with the syntax
'Multiple rows returned option" did what it is supposed to in your case. If that is not your objective, you could try co-related queries similar to your having clause option except that GroupBY is kept separate so that you can select key and all other columns in the outer query)
SELECT <MAINTAB.requiredcolumns>
FROM MAINTAB, (SELECT keycol1 from MAINTAB GROUP BY KEYCOL1 HAVING COUNT(*) = 1) MAINTABuniq
WHERE MAINTAB.keycol1 = MAINTABuniq.keycol1
OR use RANK function if you are familiar with the syntax