size of the logs affecting job performance

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

anton
Premium Member
Posts: 20
Joined: Wed Jul 19, 2006 9:32 am

size of the logs affecting job performance

Post by anton »

Before I go into more detail, I just wanted to see if any of you know, off the top of your head, of any issues that stem from the size of the logs.

We keep a few days' worth of logs around, and we have noticed that as the log grows, jobs take longer and longer to complete (an extra 10-20 minutes on top of the usual runtimes of 30-40 minutes).

We are talking about several thousand entries (sometimes up to 10K+). We use a sequencer to kick off the jobs; it calls a custom BASIC routine that deals with parameters, then kicks off the job and checks its return status.
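
Roughly, the routine does something like the following (a simplified sketch only, not our exact code - the argument names and the Name=Value parameter format are illustrative, while DSAttachJob/DSSetParam/DSRunJob/DSWaitForJob/DSGetJobInfo are the standard DataStage BASIC job-control calls):

      * Simplified sketch of the job-control routine.
      * JobName and ParamList ("Name1=Value1,Name2=Value2,...") are the
      * routine arguments; the result is returned in Ans.
      $INCLUDE DSINCLUDE JOBCONTROL.H

      hJob = DSAttachJob(JobName, DSJ.ERRFATAL)

      * push each Name=Value pair into the job
      FOR i = 1 TO DCOUNT(ParamList, ",")
         Pair = FIELD(ParamList, ",", i)
         Dummy = DSSetParam(hJob, FIELD(Pair, "=", 1), FIELD(Pair, "=", 2))
      NEXT i

      * kick off the job and wait for it to finish
      Dummy = DSRunJob(hJob, DSJ.RUNNORMAL)
      Dummy = DSWaitForJob(hJob)

      * check the return status
      Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      IF Status = DSJS.RUNOK OR Status = DSJS.RUNWARN THEN Ans = 0 ELSE Ans = Status

      Dummy = DSDetachJob(hJob)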

If the logs are reset or the job is recompiled right before the run, it often performs much faster. If left alone, it gets slower and slower with every run.

Nothing special, everything is pretty standard, and IBM support could not find any relevant cases. Just looking for some pointers.

Thank you.

P.S. A bit unrelated, but sometimes when trying to check logs using dsjob from the command line, or from Director, the client process will hang, while dsapi_slave processes hang around for days eating quite a lot of CPU until we kill them. We even put a script together to monitor them.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

It's true that the size of the logs affects job performance. It's always good practice to clean out log entries regularly. Here we maintain logs for one week; anything older than that is auto-purged. There are quite a few posts here discussing this, which you can find by searching.
Kris

Where's the "Any" key? - Homer Simpson
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As Kris notes - yes. This may sound obvious, but we've also found that the disk you are installed on can make an enormous difference as well. We found out our initial setup put the DataStage directory on some old, slow SCSI drives that just couldn't keep up with all the requests they were getting. Migrating that to 'primary' enterprise-class storage solved that particular problem.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Active stages insert lines into the log when they start and finish. If the log is extremely large or poorly sized (logs are dynamic hashed files, after all), these messages can take a seriously long time to be added to the hashed file. The more active stages in the job, the longer it takes for the job to start and stop, not to mention the startup and wrap-up messages from the main job controller.

The point is, your project needs to be on an efficient file system, your logs have to be kept small, and you MUST periodically clear out the logs entirely.

Just purging is not sufficient. The files get upsized to a high-water mark; only a CLEAR.FILE actually removes the storage and sets the hashed file back to its initial size. Consider grabbing the utilities in the members area of my website to clear the logs via a job. Use that utility to truncate the logs, then the other utility to reset the log purge settings on the jobs afterwards.
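
The core of such a utility is small. Something along these lines (a rough sketch only - the RT_LOGnn naming is standard, but the assumption that field 5 of the DS_JOBS record holds the job number is from memory, so verify it on your system):

      * Rough sketch: clear one job's log file from a server routine.
      * JobName is the routine argument; field 5 of DS_JOBS is assumed
      * to hold the job number nn used in the RT_LOGnn file name.
      Ans = 0
      OPEN "DS_JOBS" TO fJobs THEN
         READ JobRec FROM fJobs, JobName THEN
            JobNo = JobRec<5>
            EXECUTE "CLEAR.FILE RT_LOG" : JobNo CAPTURING Output
            Ans = 1
         END ELSE
            * job not found in the repository
            Ans = -1
         END
      END ELSE
         * could not open DS_JOBS
         Ans = -2
      END

Remember that the CLEAR.FILE also wipes out that job's purge settings, which is why the second utility is needed to put them back afterwards.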
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
anton
Premium Member
Posts: 20
Joined: Wed Jul 19, 2006 9:32 am

Post by anton »

Kenneth, thank you. So you are saying that having a project-level purging setting, and even clearing the logs in Director, is not enough - I should do an extra step to clear them?

Others: we do have purging set up - we used to keep 3 days, but that was slowing things down. Now, if we purge the logs right before the run every night, things go smoothly. Your comments make sense and we were already following all of these recommendations. What we are seeing, however, is that keeping several days of logs results in jobs running 10-20 minutes slower than usual. That goes beyond the common-sense "big logs cause some performance problems" issue.

Thank you for the input.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Dynamic hashed files have a modulus value that increases as the file grows. There are only two ways to change the modulus of a hashed file:

CLEAR.FILE: removes all data and resets the file to the minimum modulus

RESIZE.FILE: analyzes the current file contents and adjusts the modulus to an optimal setting

Purging deletes the contents, similar to a SQL DELETE statement. No space is reclaimed; the file stays the same size and the modulus doesn't adjust.

Adding rows can then potentially grow the file even larger, even though it holds almost no "real" data - just empty space.

A manual purge that truly clears the file must issue the CLEAR.FILE statement and then write the purge-settings row back into the log file. Because you would have to do this by hand for every log file, I wrote utility jobs to help facilitate this.
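
In outline, that manual clear looks something like this (a sketch; the "//PURGE.SETTINGS" record id is what I recall the control record being called - check an actual RT_LOG file before relying on it):

      * Sketch: clear a log file but preserve its purge settings.
      * JobNo is assumed to have been looked up from DS_JOBS as in the
      * earlier sketch; "//PURGE.SETTINGS" is an assumed record id.
      LogFile = "RT_LOG" : JobNo
      OPEN LogFile TO fLog THEN
         READ PurgeRec FROM fLog, "//PURGE.SETTINGS" ELSE PurgeRec = ""
         EXECUTE "CLEAR.FILE " : LogFile CAPTURING Output
         IF PurgeRec NE "" THEN WRITE PurgeRec ON fLog, "//PURGE.SETTINGS"
      END ELSE
         * could not open the log file; nothing to do
         NULL
      END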
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... up to version 7.5.2 at least.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anton
Premium Member
Posts: 20
Joined: Wed Jul 19, 2006 9:32 am

Re: size of the logs affecting job performance

Post by anton »

Thanks for your input, guys. IBM acknowledged the problem and provided a patch.

In some cases the log files get corrupted, and then performance drops through the floor. This seems to be inherent to deficiencies in the UniVerse DB file hashing that stores the logs.

If this bites you, ask support for a patch.

Thank you.