Physical Vs. Virtual Head nodes on a grid

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon

We are in the process of designing our architecture for a new 9.1.2 install on a grid.

What major considerations are there, other than cost, when deciding between virtual and physical head nodes? All of the compute nodes will be physical servers, but there is some debate here over whether performance will improve or degrade if we use virtual servers for the head nodes.

All else being equal (network, storage, memory, etc.), which will give better performance: virtual or physical head nodes?
Bob
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur

Performance may be lower, because you can't predict how the workload of other virtual instances on the same hardware host will affect yours.

The good thing about a grid is that jobs don't run on the head node, so it's no big deal whether you use a virtual or a physical server for the head node.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL

I'm not a big fan of virtuals for my ETL environment. The ETL environment needs to PUSH DATA, and fast. Head nodes in general tend to be IO intensive for a number of reasons: people target them for FTP transfers, people write scripts to massage data, people do (un)zips on the head node... the list goes on. For that reason I prefer a beefy head node that is as close to the network card as can be. If you are the only host on the box, you know you don't have to share the network card, memory, etc.


You've spent a ton on this product if you are running a grid, so the COST of the head node server should not be a factor. Your environment is already mostly non-virtual, so your internal company politics can be overcome.

You most likely have the virtual server folks harping about how CLOUD needs to be the wave of the future... since you are setting up your environment, you COULD throw them a bone and set up an Active/Passive server off to the side. Make it your target FTP / scripts / file backup server. Anything you want to farm off the head node.


Don't put the engine binaries on it, since you don't have to license DataStage on it for FTPs. Just mount the application teams' data mounts.

Those boxes could be virtual and you'd satisfy the cloud people and remove precious IO off the Head Node.
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon

Paul,

Thanks for your comments. It's obvious you have been around this block a time or two.

You are correct about the IO on the head node. We have experienced problems in the past due to excessive IO on the head node, and when the head node gets bogged down, so does the entire grid. We are currently dodging that bullet thanks to an upgraded filer and pipe to it, but we have not yet done as you suggest and moved non-DataStage activity off the head node.

And, as you suggest, we would like to avoid "noisy neighbors" on shared resources, or being that noisy neighbor to others. We see lots of good reasons for getting as close to shared-nothing as possible.

However, as you said, our challenge is not cost. The challenge, and the reason we are seeking as much information as we can find that supports non-shared resources, is that our data center is trying to reduce physical space and power consumption as much as possible as we grow toward its capacity.

Again, thanks for your input.

Oh, and yes, we are also hearing the clamoring for cloud, so again your comments are very helpful. We will certainly consider your suggestions.
Bob