I've used different versions of DataStage on AIX, Linux and Windows over the years.
However, I've never compared the relative performance of the different platforms.
The underlying hardware is likely to be different for AIX and Linux (although you could run Linux or AIX on an IBM Power CPU box).
I know real world performance will depend on lots of things, not least the specifics of what our ETL jobs do and what they are connecting with, but does anyone have any metrics to compare performance between the platforms when using DataStage?
What I am really interested in finding out is the relative performance against cost. Not only is the hardware priced quite differently, but the PVU licensing costs seem to be weighted in favour of running Power8 (or previous Power generation) CPUs. For example, I've seen a 4-CPU, 24-core Power8 compare favourably to a 4-CPU, 70-core Xeon box (although DataStage wasn't being used).
The PVU difference between the two is significant.
The PVU of a Power system running Linux is also significantly lower than that of one running AIX, which makes me wonder whether running Linux on Power carries a significant performance disadvantage.
Any thoughts or real world comparisons welcome please.
AIX vs Linux DataStage performance
It would be interesting to see some benchmarks like that.
The IBM PVU calculator does have a drop-down choice for Linux on any POWER system with a ratio of 70 value units per core. That matches up with some of the other POWER8 server models that also have a ratio of 70 which could also run AIX. Yet there are still other POWER models with ratios of 80, 100, or 120. Perhaps it represents some sort of Linux discount?
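To make the PVU arithmetic concrete, here is a rough sketch of how the per-core ratios mentioned above turn into a licence footprint for a 24-core box like the one in the original question. The function names are made up for illustration, the throughput figure is something you would have to measure yourself, and the price per PVU is left out entirely since it depends on your agreement with IBM.

```python
def total_pvu(cores: int, pvu_per_core: int) -> int:
    """Total PVUs to license: number of cores times the per-core ratio."""
    if cores < 0 or pvu_per_core < 0:
        raise ValueError("cores and pvu_per_core must be non-negative")
    return cores * pvu_per_core


def throughput_per_pvu(gb_per_hour: float, cores: int, pvu_per_core: int) -> float:
    """Measured throughput divided by total PVUs (a crude value-for-licence figure).

    gb_per_hour is hypothetical here; it has to come from your own job runs.
    """
    return gb_per_hour / total_pvu(cores, pvu_per_core)


if __name__ == "__main__":
    # Per-core ratios quoted in this thread for POWER systems.
    for ratio in (70, 80, 100, 120):
        print(f"24 cores at {ratio} PVU/core = {total_pvu(24, ratio)} PVUs")
```

With the same core count, the gap between a ratio of 70 and a ratio of 120 is large enough that any benchmark comparison really needs to be expressed per PVU rather than per core.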
Another criterion to think about is the availability of features and support.
There are variations in DataStage features based on operating system. For example, BDFS was initially launched only for Linux and was made available for AIX about a year or more later, and even then only for connectivity to BigInsights.
These variations are not really highlighted and are easy to overlook. Do ask these questions to ensure that DataStage on AIX meets your business needs.
Thanks for your thoughts.
I've spent a while trying to find some metrics / benchmarks / comparisons without much success.
I did find a paper where Intel and IBM ran some tests to show the overhead of running in a virtual environment: a 5 to 10% drop compared with the physical server.
I've still not been able to find a comparison between AIX and Linux, though I am waiting to see if IBM have any data available to help size a new Linux installation. In the past they have sized a server based on the volume of input data and an assumption about how that grows through the ETL, e.g. a 1GB file may generate 3GB of interim data before being loaded.
So let me open up the question and ask if anyone has any metrics for their DataStage installs running on Linux. What server specs do you have, and how much data do you process an hour, for example?
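For anyone wanting to share numbers in a comparable form, the sizing arithmetic described above can be sketched roughly as follows. This is only an illustration: the 3x expansion factor is the example assumption quoted above, the 5 to 10% virtualisation overhead is the range from the paper I mentioned, and the function names are my own.

```python
def interim_volume_gb(input_gb: float, expansion_factor: float = 3.0) -> float:
    """Estimate interim data generated by the ETL from the input volume.

    The default factor of 3.0 mirrors the "1GB in, 3GB interim" example;
    real jobs may expand far more or far less.
    """
    if input_gb < 0 or expansion_factor < 0:
        raise ValueError("input_gb and expansion_factor must be non-negative")
    return input_gb * expansion_factor


def virtualized_throughput(physical_gb_per_hour: float, overhead: float) -> float:
    """Scale a physical-server throughput down by a fractional overhead.

    overhead is a fraction, e.g. 0.05 to 0.10 for a 5 to 10% drop.
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return physical_gb_per_hour * (1.0 - overhead)


if __name__ == "__main__":
    print("Interim data for a 1GB file:", interim_volume_gb(1.0), "GB")
    # Hypothetical 100 GB/hour physical throughput, best and worst case overhead.
    for overhead in (0.05, 0.10):
        print(f"{overhead:.0%} overhead:", virtualized_throughput(100.0, overhead), "GB/hour")
```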