Comparison between Database and dataset

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Comparison between Database and dataset

Post by shiva459 »

Hi All,

I am working in a project where is we use Datastage 7.5.2 and db2 as database.I have a senario where in I have to do a lookup on the code tables many times.
I want to know is it better to use database everytime or load the table in a dataset and then do a look up on that dataset.
If anyone has experience on tihs please let me know.

Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.

"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.

Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

Ray Thanks for your reply.The lookup that I use is static. and there will be max 500 records in it.But I use this many times.So in this case also 'Sparse' lookup is better than dataset right?
ray.wurlod wrote:In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.

"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.

Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If your lookup contains only 500 records then you are much better off using a normal lookup instead of a static one.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

So I dont have to load the data in dataset and then do a lookup right?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No - as Ray already stated, if you use a database for a lookup the engine will load that result into a virtual dataset and do the reference from that.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Take a look at the job score. Your Lookup stage generates a "composite operator" (two operators called LUT_CreateOp and LUT_ProcessOp). The first of these loads data from your table into a virtual Data Set and builds an index on it. The second actually performs the lookups, via the index, into the virtual Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

I understand that if i use the database for a lookup it creates virtual dataset.My problem is I am doing this lookup many times and almost in all the jobs.So is it good to use the database everytime?


Thanks
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

I understand that if i use the database for a lookup it creates virtual dataset.My problem is I am doing this lookup many times and almost in all the jobs.So is it good to use the database everytime?


Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the previous posts is that a "sparse" lookup into a table is to be avoided unless absolutely necessary.
shiva459
Premium Member
Premium Member
Posts: 100
Joined: Wed Jun 11, 2003 1:49 am
Location: Bangalore

Post by shiva459 »

Thanks a lot ArndW and Ray for helping me in this.
I will go ahead and design my job lin this way.

Thanks
Shiva
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

ArndW wrote:If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the ...
Why not load database into lookupfileset instead of dataset :?: According to my understanding lookup fileset will give you that additional throughput as compared to a dataset :idea:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The performance benefits that I've seen with lookup filesets isn't very significant, but lookup filesets have the drawback of not letting you view the data using any tool and need to have the keys that are going to be used hardcoded. I have used Datasets for development in the past and then, once all checking has been completed, converted them to lookup filesets.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The main benefit of using a Lookup File Set is that the index is built when the Lookup File Set is created, and stored within its structure, rather than having to be built "on the fly" by the LUT_CreateOp operator.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply