AIX vs Linux DataStage performance

thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

I've used different versions of DataStage on AIX, Linux and Windows over the years.
However, I've never compared the relative performance of the different platforms.
The underlying hardware is likely to be different for AIX and Linux (although you could run Linux or AIX on an IBM Power CPU box).
I know real world performance will depend on lots of things, not least the specifics of what our ETL jobs do and what they are connecting with, but does anyone have any metrics to compare performance between the platforms when using DataStage?

What I am really interested in finding out is the relative performance against cost. Not only is the hardware priced quite differently, but the PVU licensing costs seem to be weighted in favour of running Power8 (or previous Power generation) CPUs. For example, I've seen a 4-CPU, 24-core Power8 box compare favourably to a 4-CPU, 70-core Xeon box (although DataStage wasn't being used).
The PVU difference between the two is significant.
The PVU rating of a Power system running Linux is also significantly lower than that of one running AIX, which makes me wonder whether running Linux on Power carries a significant performance disadvantage.

Any thoughts or real world comparisons welcome please.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

It would be interesting to see some benchmarks like that.

The IBM PVU calculator does have a drop-down choice for Linux on any POWER system with a ratio of 70 value units per core. That matches up with some of the other POWER8 server models that also have a ratio of 70 which could also run AIX. Yet there are still other POWER models with ratios of 80, 100, or 120. Perhaps it represents some sort of Linux discount?
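
To show how those per-core ratios add up, here is a minimal sketch (not an IBM tool). It assumes full-capacity licensing, where the PVU total is the number of licensed cores multiplied by the per-core rating. The function name is made up for illustration, and the 24-core count is simply borrowed from the Power8 example earlier in the thread; real ratings should be taken from the IBM PVU table for the exact server model.

```python
def total_pvus(cores: int, pvu_per_core: int) -> int:
    """Return the PVU total for a server: licensed cores times the per-core rating."""
    if cores < 0 or pvu_per_core < 0:
        raise ValueError("cores and pvu_per_core must be non-negative")
    return cores * pvu_per_core


# Per-core ratios mentioned above; which one applies depends on the model and OS.
RATINGS = (70, 80, 100, 120)

# Assumption: a 24-core box, as in the Power8 example earlier in the thread.
CORES = 24

pvu_table = {rating: total_pvus(CORES, rating) for rating in RATINGS}

for rating, total in pvu_table.items():
    print(f"{CORES} cores at {rating} PVU/core = {total} PVUs")
```

On 24 cores, the 70 rating comes to 1,680 PVUs against 2,880 at the 120 rating, so the per-core ratio alone can change the licence count by more than 70% on the same core count.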
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Another criterion to think about is the availability of features and support.

DataStage features vary by operating system. For example, BDFS was initially launched only for Linux; it was made available for AIX more than a year later, and even then only for connectivity to BigInsights.
These variations are not really highlighted and are easy to overlook. Do ask about them to make sure that DataStage on AIX meets your business needs.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

In my experience, the performance of a DataStage job is throttled not by CPU but by I/O to and from your data sources.

Disk, database, or SFTP: that has mainly been the deciding factor in job speed. Poor job design plays a part as well.

I personally prefer RHEL.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Thanks for your thoughts.
I've spent a while trying to find some metrics / benchmarks / comparisons without much success.
I did find a paper where Intel and IBM ran some tests to show the overhead of running in a virtual environment: a 5 to 10% drop compared with the physical server.

I've still not been able to find a comparison between AIX and Linux, though I am waiting to see if IBM have any data available to help size a new Linux installation. In the past they have sized a server based on the volume of input data and an assumption about how that grows through the ETL, e.g. a 1GB file may generate 3GB of interim data before being loaded.
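
That kind of volume-based sizing can be sketched roughly as below. This is only an illustration of the arithmetic, not IBM's sizing method: the function name is hypothetical, and the default factor of 3 is just the example figure above, so it should be replaced with a value measured from your own jobs.

```python
def interim_storage_gb(input_gb: float, expansion_factor: float = 3.0) -> float:
    """Estimate interim (scratch/dataset) space in GB for a given input volume.

    expansion_factor is an assumption: 3.0 mirrors the 1GB -> 3GB example,
    it is not a measured or vendor-supplied value.
    """
    if input_gb < 0 or expansion_factor < 0:
        raise ValueError("input_gb and expansion_factor must be non-negative")
    return input_gb * expansion_factor


# The example from the post: a 1GB input file.
one_gb_estimate = interim_storage_gb(1.0)
print(f"1 GB input -> about {one_gb_estimate} GB interim data")
```

If several files are in flight at the same time, their estimates need to be added together, since the interim data for all of them exists on disk at once.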

So let me open up the question and ask if anyone has any metrics for their DataStage installs running on Linux? What server specs do you have and how much data do you process an hour for example?
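
To keep any figures people post comparable, something like the following could be used to turn a job's input volume and elapsed time into GB per hour. It is a rough sketch: the function names are made up, the sample job figures are hypothetical, and the 5 to 10% adjustment simply reuses the virtualisation overhead from the paper mentioned above.

```python
def gb_per_hour(gigabytes: float, elapsed_seconds: float) -> float:
    """Convert a processed volume and elapsed wall-clock time into GB per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return gigabytes / (elapsed_seconds / 3600.0)


def virtualised_range(physical_rate: float,
                      low_drop: float = 0.05,
                      high_drop: float = 0.10) -> tuple[float, float]:
    """Return the (worst, best) rate expected on a virtual server.

    The 5% and 10% defaults are the drop quoted earlier in this post;
    treat them as an assumption rather than a guarantee.
    """
    return (physical_rate * (1.0 - high_drop), physical_rate * (1.0 - low_drop))


# Hypothetical job: 50 GB of input processed in 30 minutes on a physical server.
rate = gb_per_hour(50.0, 30 * 60)
worst, best = virtualised_range(rate)
print(f"physical: {rate:.1f} GB/hour, virtual: {worst:.1f} to {best:.1f} GB/hour")
```

Posting the server spec (cores, memory, storage type) alongside a figure like this would make it easier to compare installs.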