Writing into hashed files is taking a lot of time

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bharathappriyan
Participant
Posts: 47
Joined: Fri Sep 23, 2005 6:01 pm

Writing into hashed files is taking a lot of time

Post by bharathappriyan »

Hi,

We have an ETL job that creates a hashed file and uses it for lookups within the same job. If a record is missing from the lookup, it is written to the file through another link. The input record count is 16 million. The job is now taking around 2 hours just to create the file; it used to take 45 minutes to 1 hour. Nothing has changed in the ETL job. It is a daily process.

The job design is like this:

Code:

 Lookup table ---> Hashed File(File1) (Disabled, Lock for updates)
                          |
                          v
 Source Table -------> Transformer ---------> Target Table
                          |
                          +-------> Hashed File(File1)
The hashed file is created in the account.

1. Is this because the UV database is reaching its limit? Is there any limit at all?
2. If I add an IPC stage, the job takes only 45 minutes. What is the impact of IPC here? Is this the right approach?
3. I disabled 'Allow stage write cache' when I created the file, but the job behaves differently: some records are missing. What is the role of 'Allow stage write cache' here?

Thanks,
Bharathappriyan
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First question in this situation is always: have you done anything to 'size' the hashed file properly for that volume, meaning not left it with the default Minimum Modulus value of 1? You should have something called the Hashed File Calculator, which Ray wrote, somewhere in an 'Unsupported' directory on your install media. It will help you with those calculations... 'help' as in do them for you. :wink:
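To give a feel for the arithmetic the Hashed File Calculator automates, here is a rough back-of-envelope sketch in Python. The formula and numbers are illustrative assumptions, not the tool's exact method: a static hashed file stores records in groups of (separation × 512) bytes, and the minimum modulus should give you enough groups to hold your data at some target fill level.

```python
def estimate_min_modulus(row_count, avg_row_bytes, separation=4, target_fill=0.8):
    """Rough estimate of a minimum modulus for a static hashed file.

    Assumes groups of (separation * 512) bytes and a target fill factor;
    these defaults are illustrative, not authoritative.
    """
    group_bytes = separation * 512           # bytes per group
    data_bytes = row_count * avg_row_bytes   # total data to store
    return int(data_bytes / (group_bytes * target_fill)) + 1

# 16 million rows at a hypothetical ~100 bytes each:
print(estimate_min_modulus(16_000_000, 100))
```

With the default modulus of 1, all 16 million rows hash into one group and pile up in overflow, which is exactly the kind of degradation that makes writes slow down over time as the file grows.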

Write Caching does just what it sounds like: it allows records to be written to memory first and flushed to disk later, at the end. Nothing about that would normally cause 'missing' records. However, I would not suggest using it in a design of this nature, as it could cause cache misses depending on how both hashed file stages are configured. Just leave both without any caching enabled and you'll be fine. I also never had a need to turn on 'Lock for updates'; it seemed to cause more problems than it (allegedly) solved.
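The cache-miss risk in a read-and-write-the-same-file design can be sketched with a toy model (this is an illustration of the general buffering hazard, not DataStage internals): rows sitting in the write cache are not yet on disk, so a lookup that reads only from disk can't see them.

```python
disk = {}         # what the lookup side of the job reads
write_cache = []  # rows written but not yet flushed to disk

def cached_write(key, row):
    # With write caching on, the row is buffered in memory, not visible on disk.
    write_cache.append((key, row))

def flush():
    # Cache is flushed to disk later / at the end.
    disk.update(write_cache)
    write_cache.clear()

def lookup(key):
    # The lookup reads disk only, never the peer stage's cache.
    return disk.get(key)

cached_write("A", "row A")
print(lookup("A"))   # None - written, but still sitting in the cache
flush()
print(lookup("A"))   # "row A" - visible only after the flush
```

With caching disabled on both stages, every write goes straight to the file, so a row written on one link is immediately visible to the lookup on the other.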

And I personally was never a big fan of the IPC stage. It's no magic bullet you add to a job to make it go faster, and it will eventually cause your job to fail with timeout issues. People have written about it here, pros and cons; an exact search should turn those up... I believe it was Ernie.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ps. Use code tags for your job designs, not quote tags. Code tags preserve whitespace and keep everything lined up; everything else trims the 'extra' whitespace and left-justifies the text.

I fixed your example for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers