Heap allocation failed. Code 134

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Heap allocation failed. Code 134

Post by bobyon »

I am getting the following messages:

Code:

main_program: orchgeneral: loaded
orchsort: loaded
orchstats: loaded
APT_BadAlloc: Heap allocation failed.
RT_SC310/OshExecuter.sh: line 25: 14911 Aborted                 (core dumped) $APT_ORCHHOME/bin/osh "$@" 2>&1
Parallel job reports failure (code 134)
Job CopyOfpj_cth_rmd_parse_clob_combine_test1 aborted.
Resetting Job CopyOfpj_cth_rmd_parse_clob_combine_test1.
Job CopyOfpj_cth_rmd_parse_clob_combine_test1 has been reset.
I've searched the forum and reviewed dozens of entries, but nothing so far resolves the issue.

As you can see, I've tried resetting the job in Director to see if additional messages are provided: none.
I've set disable combination ($APT_DISABLE_COMBINATION) to True: no help.
I've replaced the target sequential file with a Peek: nothing changed.
We do not have NLS turned on.
The sequential files are small, so I don't think it is a question of too many duplicates on the left side of the join.

The job is not complex, but it might be a little difficult to describe here. There are four source sequential files. The first two files are sorted, hash-partitioned, and joined. The result of that join is then joined with the other two sequential files, which are also sorted and hash-partitioned.

Said differently: a two-file join followed by a three-"file" join (two sequential files plus the output leg of the first join).

The second join feeds a Transformer (TX) followed by the Peek.

If I eliminate the second join and the following Transformer and Peek, and just send the result of the first join to a Peek, the job runs successfully. So I've narrowed it down a bit, but not to a complete resolution.

Any ideas?

Can someone decipher the following:

Code:

RT_SC310/OshExecuter.sh: line 25: 14911 Aborted                 (core dumped) $APT_ORCHHOME/bin/osh "$@" 2>&1
We have a ticket open with our support provider, but I thought someone here might provide some ideas a little quicker.

Thanks,
Bob
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Have you read this one?

http://www-01.ibm.com/support/docview.w ... wg21411997

I know it is AIX specific, but it does have some valid points.

Also - have you checked to make sure all the files are "well formatted"? I seem to remember running out of memory because one of my sequential files wasn't formatted the way it was supposed to be, and the entire file was being loaded without parsing (i.e., as one big string). Note: View Data is NOT your friend in this. It will "adjust" if it can to make the View work, but the job won't make the same adjustments.
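
One quick way to check that from the command line (a sketch only: it assumes a pipe-delimited file and uses a made-up file name, so adjust the delimiter and name to match yours) is to count the fields per line; a well-formatted file should report a single field count:

Code:

# Count how many lines have each number of fields; a healthy delimited file shows one value.
# Assumes '|' as the delimiter and a hypothetical file name.
awk -F'|' '{ counts[NF]++ } END { for (n in counts) print counts[n] " line(s) with " n " fields" }' source_file_1.txt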

Oh - the OshExecuter failed because it couldn't allocate the memory required for the osh process (i.e., the bad heap allocation caused the core dump). The bad heap allocation is the root cause. Something is chewing through memory.
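
If it helps, the 134 itself is just the shell's report of a fatal signal: 128 plus signal 6 (SIGABRT), the abort that produced the core dump when the allocation failed. You can confirm the mapping with plain shell on the server (nothing DataStage-specific here, just a sketch):

Code:

# 134 - 128 = 6, and signal 6 is SIGABRT (abort), hence the "Aborted (core dumped)" in the log
echo $((134 - 128))   # prints 6
kill -l 6             # prints ABRT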
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Thanks, Andy. That was a big help.

I did discover that one of the sequential files has 150+ columns defined as varchar(255). However, current testing uses only 8 records, so I'm not certain that alone would cause the issue.

I also discovered that the output of the second join includes all 150+ varchar columns, except that 4 of them were converted to unbounded longvarchar columns.

I'm not sure how much memory those longvarchars will consume, but I am assuming they are REALLY big.
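
A rough back-of-the-envelope for the bounded columns alone (just arithmetic on the 150-column figure above; the real record layout may differ):

Code:

# 150 varchar(255) columns, all full: about 38 KB per record before any longvarchar data
echo $((150 * 255))   # prints 38250 (bytes)
# The 4 unbounded longvarchar columns have no such ceiling - each one holds the
# whole CLOB/XML payload, so those drive the real per-record memory use.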

I recommended to the developer that she use better record schema definitions, but she explained that, since she is building XML records, the CLOBs are required.

I also passed on your suggestion to pre-sort the files in a shell script.
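
Something along these lines is what I passed on for the pre-sort (a sketch only: the file names, the '|' delimiter and the key column are placeholders, not the real job's):

Code:

# Pre-sort the source files on the join key before the job reads them.
# Assumes pipe-delimited files with the join key in column 1.
sort -t'|' -k1,1 source_file_1.txt > source_file_1.sorted.txt
sort -t'|' -k1,1 source_file_2.txt > source_file_2.sorted.txt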

The odd thing about all of this is that this same job, with the same files, runs fine in another environment, and similar jobs run successfully in production. So I'm still thinking there must be a configuration difference somewhere; I just have not found it yet.
Bob
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Put a "ulimit -a" in the Before Job ExecSH and run it on both environments. See if any settings are different between the two systems.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020