Column Analysis failing due to java heap space

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

Novak
Participant
Posts: 97
Joined: Mon May 21, 2007 10:08 pm
Location: Australia


Post by Novak »

Hi experts,

We are running on a Windows system with 32 GB RAM, 8 CPUs and a 4-node configuration file.
We have a flat file with 96 columns needing to be analyzed.
Even on a sample of 20 records it runs incredibly slowly, analyzing only a few columns before eventually failing with the error message "java.lang.OutOfMemoryError: Java heap space".

We have also noticed that Director's log shows 10 jobs being run for a single column analysis. We are guessing that is what makes the process so heavy.

Does anyone know how we can fix this?

Regards,

Novak
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Is your DataStage 32-bit or 64-bit?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Novak

Post by Novak »

Hi Craig,

It is 32-bit.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You can increase the size of the Java heap space. Search here and/or the IBM Information Center for "Xmx".
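The "Xmx" setting is the standard JVM option `-Xmx<size>`, where the size is a number optionally followed by a `k`, `m` or `g` suffix. As a minimal sketch (not anything shipped with Information Analyzer), the hypothetical helper `parse_xmx` below converts such an option into bytes so that candidate values can be compared. Where the option is actually set for Information Analyzer's JVM is not covered here; the values shown are purely illustrative.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xmx (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_xmx(option: str) -> int:
    """Return the maximum heap size, in bytes, encoded in an option such as '-Xmx1024m'.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", option.strip())
    if match is None:
        raise ValueError(f"not an -Xmx option: {option!r}")
    digits, unit = match.groups()
    return int(digits) * _UNITS[unit.lower()]


if __name__ == "__main__":
    # Illustrative values only, not recommendations.
    for opt in ("-Xmx512m", "-Xmx1536m", "-Xmx2g"):
        print(opt, parse_xmx(opt) // 1024**2, "MiB")
```

Running it prints 512, 1536 and 2048 MiB for the three sample options, which makes it easy to see how close a proposed value sits to the 32-bit limits discussed in this thread.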

If you run your column analyses with "preserve scripts" enabled, you will be able to look at the jobs in DataStage Director, including their logs, to ascertain what some of these processes do.

Note, too, that Information Analyzer will break up a column analysis request into multiple requests, each of which doesn't process too many columns. So for your 96 columns it's no real surprise that the workload was split into ten units, each processing 10 (or 9) columns.
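The arithmetic behind "ten units of 10 (or 9) columns" can be reproduced with a short sketch. It assumes a cap of 10 columns per unit and an even spread of the remainder; Information Analyzer's real batching rule is not documented in this thread, so `split_into_units` and its `max_per_unit` parameter are hypothetical.

```python
def split_into_units(columns, max_per_unit=10):
    """Split columns into the fewest units of at most max_per_unit columns,
    keeping unit sizes within one of each other.

    Hypothetical illustration; max_per_unit=10 is an assumption.
    """
    if not columns:
        return []
    n_units = -(-len(columns) // max_per_unit)  # ceiling division
    base, extra = divmod(len(columns), n_units)
    units, start = [], 0
    for i in range(n_units):
        size = base + (1 if i < extra else 0)  # first `extra` units get one more column
        units.append(columns[start:start + size])
        start += size
    return units


if __name__ == "__main__":
    columns = [f"col{i:02d}" for i in range(1, 97)]  # 96 made-up column names
    units = split_into_units(columns)
    print(len(units), [len(u) for u in units])
```

For 96 columns this prints `10 [10, 10, 10, 10, 10, 10, 9, 9, 9, 9]`: six units of 10 and four of 9, which matches the ten jobs seen in Director.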
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett

Post by chulett »

I only ask because a 64-bit system gives you more ability to increase the heap size than a 32-bit one does, given the memory limits that come with 32-bit. There are several Technotes out there on that subject; here is one example.
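The 32-bit limit comes down to address-space arithmetic: a 32-bit pointer can name 2^32 bytes (4 GiB) no matter how much physical RAM the machine has, and the JVM heap has to fit inside that space alongside everything else in the process. The sketch below illustrates this. The 2 GiB default user-mode space for 32-bit Windows processes is a known default; the 1.5 GiB "practical ceiling" is a hypothetical round figure chosen for illustration, not a documented JVM limit.

```python
GIB = 1024**3


def address_space_bytes(pointer_bits: int) -> int:
    """Total virtual address space a process with pointer_bits-wide addresses can name."""
    return 2 ** pointer_bits


def heap_fits(requested_heap_bytes: int, ceiling_bytes: int) -> bool:
    """True if a requested maximum heap stays at or under the given ceiling."""
    return requested_heap_bytes <= ceiling_bytes


if __name__ == "__main__":
    print(address_space_bytes(32) / GIB)  # 4.0 GiB in total, whatever the installed RAM
    # ASSUMPTION: a 32-bit Windows process gets 2 GiB of user-mode address space by default,
    # and the heap must be contiguous within it, so the usable ceiling is lower still.
    # 1.5 GiB below is a hypothetical figure for illustration only.
    practical_32bit_ceiling = int(1.5 * GIB)
    print(heap_fits(1024 * 1024**2, practical_32bit_ceiling))  # 1 GiB heap: True
    print(heap_fits(4 * GIB, practical_32bit_ceiling))  # 4 GiB heap: False
```

On a 64-bit JVM the same check is bounded by physical memory and operating-system settings rather than by the 4 GiB address space, which is why moving to 64-bit gives more room to raise `-Xmx`.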
-craig

"You can never have too many knives" -- Logan Nine Fingers
Novak

Post by Novak »

Thanks a lot guys.

We will almost definitely upgrade to 64-bit on Linux, hopefully within a couple of months. This is the second time I have run IA on Windows and it has been painful, to say the least. Not just because of this failure, but because of the overall end-user response times.

Until then, and on the advice of IBM Support, we have continued our data profiling on 2 nodes rather than 8. We have had hardly any failures since then.
The run times of the two configurations are not that different, so we can live with it.

Cheers,

Novak