Memory usage by lookup stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell

So, I told a developer that the (non-sparse) Lookup stage should typically be used only when the reference data comes from smaller tables, so that the whole lookup structure fits into memory.

He asked two very good questions for which I didn't have an answer.

1) Does it use standard virtual memory to buffer the lookup records, or does it have a "reserved" buffer area?

2) If it has a reserved area, what is its size, and is there any way to set or increase it?

He's probably asking because we're running in a BigIntegrate (DataStage on YARN) environment with really large (5 GB) container sizes, so if the memory is there and available, why not use it for a larger lookup buffer?

I think it's using general virtual memory, but wanted to confirm...
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky

I do not know for sure but strongly suspect that it uses standard memory allocation, which via your operating system also provides virtual memory.

I also think there is more going on behind the scenes than the classroom training or the documentation tells us.

For instance, I once had a normal lookup with about 9 GB of data on the reference link, which should have been fine given that my server had more than 40 GB of RAM free. Yet the job spent over 45 minutes on disk I/O before the reference data was actually loaded into memory. DataStage was building some sort of intermediate memory-mapped file on disk for the lookup to consume (which I assume was also indexed in some way), then reading that file back and loading it into memory. So even though we had the RAM, some unexpected, undocumented disk activity killed the performance for us.
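To make the two-phase behavior described above more concrete, here is a minimal Python sketch of the general pattern: sort the reference rows, write them to a file on disk, then memory-map that file and search it by key. This is only an illustration under stated assumptions (fixed-width integer keys and values). It is not DataStage's actual implementation, whose internals are undocumented. The names `build_lookup_file`, `MappedLookup` and `RECORD` are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record layout (assumption): 8-byte signed key, 8-byte signed value.
RECORD = struct.Struct("<qq")


def build_lookup_file(pairs, path):
    """Phase 1: sort the reference rows by key and write them to an on-disk file."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


class MappedLookup:
    """Phase 2: memory-map the sorted file and binary-search it by key."""

    def __init__(self, path):
        self._file = open(path, "rb")
        size = os.path.getsize(path)
        # mmap cannot map an empty file, so treat zero rows as a special case.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ) if size else None
        self._count = size // RECORD.size

    def get(self, key, default=None):
        lo, hi = 0, self._count
        while lo < hi:  # find the first record whose key is >= the search key
            mid = (lo + hi) // 2
            if RECORD.unpack_from(self._map, mid * RECORD.size)[0] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < self._count:
            found_key, value = RECORD.unpack_from(self._map, lo * RECORD.size)
            if found_key == key:
                return value
        return default

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "reference.bin")
        build_lookup_file([(3, 30), (1, 10), (2, 20)], path)
        lookup = MappedLookup(path)
        print(lookup.get(2), lookup.get(5))  # 20 None
        lookup.close()
```

Even in this toy version, the sort-and-write phase has to process every reference row and write it to disk before the first lookup can be answered. That is consistent with the kind of up-front I/O delay described above when the reference data is large.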
Choose a job you love, and you will never have to work a day in your life. - Confucius