usage of join,merge and lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

usage of join,merge and lookup

Post by legendkiller »

in what scenario we used join, lookup or merge, I guess they serve common purpose
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

You have enough details in Parallel job developer guide. Anyway, lookups have to be done only if the data to be looked up is very small. In that case, lookups' performance will be better than counter parts. On the other hand, if the data volume is high, you have to go for join or merge, depending on what the requirement is.

Regards,
Vignesh.
mandyli
Premium Member
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Hi


the same topic coverd many times. any way please check previous post.

or check in ParallelDevel.pdf doc.


Thanks
man
legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

Post by legendkiller »

Hi

Thanks for response.

But what generally defines the small volume of data and what range defines the large volumns of data.

Are there are some standard ranges of volumn of data, so that based on range we clearly say if this is volume of data use lookup other wise join like that.
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

Although lookup, join and merge may serve the same purpose it depends on the scenario as well as the volume of data that you are dealing with.

A lookup is nothing but a left outer join and is used for less volume. A join has more options other than left outer join but it cannot handle rejects. The merge stage is similar to the join stage and is capable of handling rejects.

HTH
--Rich

Pride comes before a fall
Humility comes before honour
legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

Post by legendkiller »

Thanks again,

but agin my question remains unanswerd. Are there are certain ranges of volume of data for which we can say if this is range use join. if this is range then use lookup etc.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

As its name suggests, lookups are used for looking up some values based on key values. Generally dimension tables are looked up. Volume of data lookups can handle depends on your hardware configuration of the server. Generally tables or files that do not grow enormously over a period of time are looked up. Also look up will return only one row even if there are more than one row available for the match (ODBC multirow lookup is an exception).

Vignesh.
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Re: usage of join,merge and lookup

Post by rajeev_prabhuat »

Hi Vignesh,

If the reference to a Lookup is directly from a table, and the number of input rows is significantly more than (eg. 1:100 or more), and if job is small enough to fit into available physical memory, then go for lookup. Else go for join or Merge Stage.
legendkiller wrote:in what scenario we used join, lookup or merge, I guess they serve common purpose
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Thanks for the info rajeev !!
mandyli
Premium Member
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Hi Rajeev,


U r correct.


Thanks
Man
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

legendkiller wrote:but agin my question remains unanswerd. Are there are certain ranges of volume of data for which we can say if this is range use join. if this is range then use lookup etc.
It should be based on the amount of memory you have available to you at that particular moment. This is not a cut n' dry answer, but with some smart calculations, and careful observation of the job during its run, you can determine what your general limits are.

And be careful -- I'm referring to production machines. That is why it is always best to have an equivalent machine for development with PX -- easy to tune without risking a production box due to mischeft.
Post Reply