DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
sensiva



Group memberships:
Premium Members

Joined: 22 Aug 2017
Posts: 12

Points: 235

Post Posted: Tue Jan 15, 2019 8:45 am Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hello All,

Not sure if this type of requirement is common, but i felt it was a little strange.

The requirement is that, a lookup needs to be done on a table which could potentially return multiple rows. If one row is returned, then process the records, but if multiple rows are returned, then ignore the record.

I planned of using a lookup stage with multiple rows returned option, and then thought of using sort stage with key column change option. But still with that I would know, if i get multiple records only from the second record, which means I would have processed the first record by then. I am thinking of other methods, but its getting complex.

Any ideas of how else could this be achieved. I was thinking of remove duplicates as well.

_________________
sen
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 362

Points: 3744

Post Posted: Tue Jan 15, 2019 11:56 am Reply with quote    Back to top    

can you rig some sort of having clause into your lookup query so that if it has more than 1 it returns nothing?
Rate this response:  
Not yet rated
myerdsdupp



Group memberships:
Premium Members

Joined: 25 Oct 2018
Posts: 4

Points: 32

Post Posted: Tue Jan 15, 2019 4:37 pm Reply with quote    Back to top    

Use an aggregate stage after a lookup and get the count out of it. You can use filter or transformer stage to pick only records where count = 1.
Rate this response:  
Not yet rated
sensiva



Group memberships:
Premium Members

Joined: 22 Aug 2017
Posts: 12

Points: 235

Post Posted: Wed Jan 16, 2019 1:19 am Reply with quote    Back to top    

Thanks for your replies

I see both similar as in the first case, we filter with oracle and the other with datastage.

I was asking this question to myself for a while now, there are operations that could be done with oracle and with datastage also, like sort, group by, distinct etc.. My collegues have always been saying to retrive all records from oracle and do all the other functionalities using datastage as its more performant and is considered as a best practise. But not sure if its true always. As our oracle is well partitioned and indexed.

As in this case, if we could implement the having clause, we would treat the same number of records as we receive from the main link, where-as in case of using the lookup stage, we would multiply the number of records to treat(if we have 1million records to treat, and if we have atleast 3 matches for each line we end up treating 2-3 million records, grouping them by storing them in memory and then filtering.

your views.

I am little inclined to using having option. Lets see how the team reacts

_________________
sen
Rate this response:  
Not yet rated
myerdsdupp



Group memberships:
Premium Members

Joined: 25 Oct 2018
Posts: 4

Points: 32

Post Posted: Wed Jan 16, 2019 8:48 pm Reply with quote    Back to top    

I would prefer using a having clause in oracle.

If your data is huge in the reference table then lookup is not the right option.

You can always do a trail and run with your design but try to follow the approach with avoids the bottleneck on the server.
Rate this response:  
Not yet rated
mouthou
Participant



Joined: 04 Jul 2004
Posts: 208

Points: 1991

Post Posted: Sun Jan 20, 2019 11:15 pm Reply with quote    Back to top    

I think your colleagues are not entirely correct and working up entirely in Datastage is a selective scenario. Best practice is to push the operation to DB so the minimal preprocessing like yours happens while selecting the data itself.

'Multiple rows returned option" did what it is supposed to in your case. If that is not your objective, you could try co-related queries similar to your having clause option except that GroupBY is kept separate so that you can select key and all other columns in the outer query)

SELECT <MAINTAB.requiredcolumns>
FROM MAINTAB, (SELECT keycol1 from MAINTAB GROUP BY KEYCOL1 HAVING COUNT(*) = 1) MAINTABuniq
WHERE MAINTAB.keycol1 = MAINTABuniq.keycol1

OR use RANK function if you are familiar with the syntax
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours