Page 1 of 1

Comparison between Database and dataset

Posted: Thu Feb 07, 2008 9:05 pm
by shiva459
Hi All,

I am working in a project where is we use Datastage 7.5.2 and db2 as database.I have a senario where in I have to do a lookup on the code tables many times.
I want to know is it better to use database everytime or load the table in a dataset and then do a look up on that dataset.
If anyone has experience on tihs please let me know.

Thanks in advance

Posted: Thu Feb 07, 2008 11:51 pm
by ray.wurlod
In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.

"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.

Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.

Posted: Fri Feb 08, 2008 1:05 am
by shiva459
Ray Thanks for your reply.The lookup that I use is static. and there will be max 500 records in it.But I use this many times.So in this case also 'Sparse' lookup is better than dataset right?
ray.wurlod wrote:In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.

"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.

Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.

Posted: Fri Feb 08, 2008 2:08 am
by ArndW
If your lookup contains only 500 records then you are much better off using a normal lookup instead of a static one.

Posted: Fri Feb 08, 2008 2:40 am
by shiva459
So I dont have to load the data in dataset and then do a lookup right?

Posted: Fri Feb 08, 2008 3:09 am
by ArndW
No - as Ray already stated, if you use a database for a lookup the engine will load that result into a virtual dataset and do the reference from that.

Posted: Fri Feb 08, 2008 3:18 am
by ray.wurlod
Take a look at the job score. Your Lookup stage generates a "composite operator" (two operators called LUT_CreateOp and LUT_ProcessOp). The first of these loads data from your table into a virtual Data Set and builds an index on it. The second actually performs the lookups, via the index, into the virtual Data Set.

Posted: Fri Feb 08, 2008 4:56 am
by shiva459
I understand that if i use the database for a lookup it creates virtual dataset.My problem is I am doing this lookup many times and almost in all the jobs.So is it good to use the database everytime?


Thanks

Posted: Fri Feb 08, 2008 4:57 am
by shiva459
I understand that if i use the database for a lookup it creates virtual dataset.My problem is I am doing this lookup many times and almost in all the jobs.So is it good to use the database everytime?


Thanks

Posted: Fri Feb 08, 2008 5:01 am
by ArndW
If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the previous posts is that a "sparse" lookup into a table is to be avoided unless absolutely necessary.

Posted: Fri Feb 08, 2008 5:12 am
by shiva459
Thanks a lot ArndW and Ray for helping me in this.
I will go ahead and design my job lin this way.

Thanks
Shiva

Posted: Sat Feb 09, 2008 1:13 am
by DSDexter
ArndW wrote:If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the ...
Why not load database into lookupfileset instead of dataset :?: According to my understanding lookup fileset will give you that additional throughput as compared to a dataset :idea:

Posted: Sat Feb 09, 2008 3:35 am
by ArndW
The performance benefits that I've seen with lookup filesets isn't very significant, but lookup filesets have the drawback of not letting you view the data using any tool and need to have the keys that are going to be used hardcoded. I have used Datasets for development in the past and then, once all checking has been completed, converted them to lookup filesets.

Posted: Mon Feb 11, 2008 10:27 pm
by ray.wurlod
The main benefit of using a Lookup File Set is that the index is built when the Lookup File Set is created, and stored within its structure, rather than having to be built "on the fly" by the LUT_CreateOp operator.