Abnormal Termination-add_to_heap() Unable to allocate memory

Post by rsrikant »

Hi,

I get these two errors in my Director log when I run my jobs.

1. Abnormal termination of stage. (Error)

This job has a lot of hash file lookups - around 10 of them. The job reads from a sequential file, looks up against these 10 hash files, and loads into Oracle as well as a couple of hash files.

The same job runs on our DEV and TEST boxes, but in PROD I get this abnormal termination error and the job aborts.

I split the job into two, with 5 hash file lookups in each. After splitting, both jobs complete without aborting.

Any idea why this error occurs on one box alone?



2. add_to_heap() - Unable to allocate memory (Warning)

This job reads from Oracle and loads into two hash files. Each hash file has around 15 million records.

Once it reaches around 2 million records it gives this warning and the job continues.

Both hash files are dynamic and write caching is enabled.

This warning comes on all three boxes: DEV, TEST and PROD.

Any idea what settings I need to change to avoid this warning? Are there any memory / kernel / shared memory settings I can work on to avoid it?


Thanks,
Srikanth

Post by kduke »

Srikanth

What is the size of the file? Usually this error points to a hash file that is corrupted, which typically happens after a system crash or when you run out of disk space. If your hash file is created in the account, then at TCL do

DELETE.FILE MyHashFile
or
CLEAR.FILE MyHashFile

If it is a repository file like DS*, RT* or something important like VOC, then you have problems. If a file is corrupt then you can usually tell by counting records:

COUNT MyHashFile
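
A couple of extra checks at TCL can help confirm whether the file is healthy (just a sketch - it assumes the file has a VOC pointer in the project, and MyHashFile is a placeholder name):

ANALYZE.FILE MyHashFile
FILE.STAT MyHashFile

If COUNT or either of these commands errors out or hangs, the file is almost certainly damaged and needs to be dropped and recreated.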
Mamu Kim

Post by chulett »

rsrikant wrote:2. add_to_heap() - Unable to allocate memory (Warning)

This job reads from Oracle and loads into two hash files. Each hash file has around 15 million records.

Once it reaches around 2 million records it gives this warning and the job continues.

Both hash files are dynamic and write caching is enabled.

This warning comes on all three boxes: DEV, TEST and PROD.

Any idea what settings I need to change to avoid this warning? Are there any memory / kernel / shared memory settings I can work on to avoid it?
Turn OFF write caching. As best we can tell, this annoying "warning" shows up once the cache fills. The other option would be to bump your default write cache size for the Project up high enough to stop the message from appearing - but that change will affect all jobs.
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by kduke »

If your hash file is over 2GB then you need to add the 64-bit option. Look at the OS level and add up the sizes of DATA.30 and OVER.30. I would say make it 64-bit no matter what - this is a large hash file, so why worry about it? If it does this on all 3 boxes then it has to be too big for a 32-bit hash file.
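
For example, you could check the size like this (just a sketch - the path is a placeholder for wherever your hashed file actually lives):

ls -l /path/to/MyHashFile/DATA.30 /path/to/MyHashFile/OVER.30

If the two together are getting close to 2GB, convert the file to 64-bit internal addressing at TCL. From what I recall, for a dynamic file with a VOC pointer the command is:

RESIZE MyHashFile * * * 64BIT

Back the file up first, make sure no job has it open, and allow plenty of time - on a 15-million-row file the resize is not quick.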
Mamu Kim

Post by ray.wurlod »

Or find some way to make it less than 2GB, such as loading it with only the columns and rows that are actually required in the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by rsrikant »

Hi,

Thanks for the replies.

The hash file is not that big. It has very few columns and it is below 1 GB. The warnings start when the hash file reaches 100 MB and keep coming every so often until the job completes.

Craig - how do I increase the write cache at the project level? If I turn off write caching the performance is very slow.

Kim - are you talking about the abnormal termination error? If yes, I deleted the hash files from the command prompt and tried running the job, but I still get the abnormal termination on the PROD box alone.

For the abnormal termination error, the job runs fine on the DEV and TEST boxes; only on the PROD box do I get the error. Once I split the job into two to distribute the hash lookups between them, the problem went away.
I wanted to know why it occurs on the PROD box alone. Is it some memory setting on the PROD box that does not allow 10 hash files to be open at a time, or something similar?


Thanks,
Srikanth

Post by chulett »

rsrikant wrote:Craig - how do I increase the write cache at the project level? If I turn off write caching the performance is very slow.
I hardly ever have write caching turned on and I can get very high speed performance from hashed file writes - provided the hashed file is properly precreated.
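
If you do want to precreate the hashed file yourself, something along these lines at TCL would do it (a sketch only - MyHashFile and the modulus value are placeholders; size the minimum modulus from your own row counts and average record size so the file does not have to keep splitting groups while you load it):

CREATE.FILE MyHashFile DYNAMIC MINIMUM.MODULUS 40000

Then point the Hashed File stage at the precreated file rather than letting the job create it with default settings.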

That being said, if you want to try bumping the write cache size, it is done via the Administrator from the Tunables tab of each Project, from what I recall. The effect of a change there is immediate; nothing 'extra' needs to be done.
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by kduke »

That is really odd - it works in DEV and TEST but not in PROD? Did you change your Oracle client? If you did, then I would suggest you have different versions on the different machines. Maybe you have a memory leak in your Oracle client.
Mamu Kim

Post by rsrikant »

Kim - I believe the problem is with the number of hash files and not with the Oracle client, because once I split the job into two and reduced the hash lookups per job, the jobs run in PROD as well. Is my understanding wrong?

Thanks, Craig. I found where to change the write cache limit in Administrator.

Thanks,
Srikanth

Post by ray.wurlod »

Maybe there are enough rows in production to exceed the hashed file cache size limit (default 128MB) but not in development or test environments?

Try increasing the read and write cache sizes in production, using the Administrator client's Tunables tab.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by kduke »

Srikanth

The cache size can only be set to what the shmtest command will allow; beyond that you get these kinds of errors. shmtest will tell you what to set the parameters to in uvconfig. You must then do a uvregen, and then stop and restart DataStage.

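The sequence looks roughly like this (a sketch from memory - $DSHOME is your engine directory, the exact utility locations can vary by release, and you should check with support before changing uvconfig):

cd $DSHOME
. ./dsenv                  # pick up the engine environment
bin/shmtest                # reports what shared memory the OS will actually allow
vi uvconfig                # adjust the tunables shmtest points you at
bin/uvregen                # regenerate the binary config from the edited uvconfig
bin/uv -admin -stop        # then stop DataStage (with no jobs running) ...
bin/uv -admin -start       # ... and restart it so the new settings take effect
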
You should open a ticket on this with IBM support; it should not be happening. Your DEV and TEST boxes are different machines from PROD, so uvconfig will be different, as will the results of shmtest.

You are correct in splitting the file, because if the file exceeds the cache limit then it gives you a warning and reads it from disk instead of memory. That is a very clever solution. If you have a natural way to split your keys, then why not have 2 files in memory? It will run a lot faster with 2 files in memory and 2 lookups than with one lookup from disk.

I will look for my notes on shmtest. I am doing all this from memory and my memory is not as good as it used to be. Maybe we can get a full Wurlod.

No matter. You are on the right track. Let us know how you solve it.

If you reinstalled or upgraded DataStage then your uvconfig file got overwritten and went back to the defaults, which are not optimal.
Mamu Kim