Comparison between Database and dataset
Moderators: chulett, rschirm, roy
Comparison between Database and dataset
Hi All,
I am working in a project where is we use Datastage 7.5.2 and db2 as database.I have a senario where in I have to do a lookup on the code tables many times.
I want to know is it better to use database everytime or load the table in a dataset and then do a look up on that dataset.
If anyone has experience on tihs please let me know.
Thanks in advance
I am working in a project where is we use Datastage 7.5.2 and db2 as database.I have a senario where in I have to do a lookup on the code tables many times.
I want to know is it better to use database everytime or load the table in a dataset and then do a look up on that dataset.
If anyone has experience on tihs please let me know.
Thanks in advance
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.
"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.
Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.
"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.
Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray Thanks for your reply.The lookup that I use is static. and there will be max 500 records in it.But I use this many times.So in this case also 'Sparse' lookup is better than dataset right?
ray.wurlod wrote:In general, unless you explicitly specify "sparse" lookup, you will be performing your lookup against a Data Set - a virtual Data Set. There's one associated with every link.
"Sparse" lookups - directly to DB2 or Oracle tables - will be substantially slower than to a virtual Data Set (which is in local memory). However, there is an implied (and enforced) requirement that the reference Data Set must be able to be loaded into memory.
Therefore, with a very large reference table on which you can not be selective, look for alternatives - perhaps a join in the database, perhaps a Join stage in DataStage, perhaps a "sparse" lookup.
If your lookup contains only 500 records then you are much better off using a normal lookup instead of a static one.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
No - as Ray already stated, if you use a database for a lookup the engine will load that result into a virtual dataset and do the reference from that.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Take a look at the job score. Your Lookup stage generates a "composite operator" (two operators called LUT_CreateOp and LUT_ProcessOp). The first of these loads data from your table into a virtual Data Set and builds an index on it. The second actually performs the lookups, via the index, into the virtual Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the previous posts is that a "sparse" lookup into a table is to be avoided unless absolutely necessary.
The point of the previous posts is that a "sparse" lookup into a table is to be avoided unless absolutely necessary.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Why not load database into lookupfileset instead of dataset According to my understanding lookup fileset will give you that additional throughput as compared to a datasetArndW wrote:If you use the same SELECT for the lookup several times then it is best to perform the database lookup once to a dataset or lookup and then use that for the 2nd and further lookups.
The point of the ...
The performance benefits that I've seen with lookup filesets isn't very significant, but lookup filesets have the drawback of not letting you view the data using any tool and need to have the keys that are going to be used hardcoded. I have used Datasets for development in the past and then, once all checking has been completed, converted them to lookup filesets.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The main benefit of using a Lookup File Set is that the index is built when the Lookup File Set is created, and stored within its structure, rather than having to be built "on the fly" by the LUT_CreateOp operator.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.