Difference between Cluster & Grid Architecture

ppgoml
Participant
Posts: 58
Joined: Mon Aug 20, 2007 11:00 pm

Post by ppgoml »

Hi all,

Maybe this is a stupid question; however, I'm a little confused about the DataStage cluster environment versus the DataStage grid environment. Could someone tell me whether they are the same concept and, if not, what the differences are?

I know the DataStage engine can be distributed across other machines to build an MPP environment. Is that a DataStage cluster or a grid?

Thanks for clarifying this for me.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As far as DataStage is concerned, MPP = cluster, which has a fixed number of machines. A grid has a dynamically variable number of machines, with available resources being managed by some form of grid management software.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm

Post by daignault »

A cluster is a group of DataStage servers where the admin/designer writes the APT file by hand to use the compute node resources available.

A grid uses software to auto-generate the APT file so that a job uses a subset of resources, chosen according to compute node usage.

Suppose you have a group of 25 compute nodes and the designer writes his APT file to use 4 of them. If those nodes are busy, DataStage will still try to use them. The grid software, by contrast, looks at the utilisation of the compute nodes on the grid, picks the machines that are underused, and dispatches the job to those nodes.

In addition, you can have different software running on the same grid compute nodes, such as SAS, DataStage, etc. So a SAS batch job could be dispatched to 2 of the compute nodes, which would take them out of contention for a DataStage job.
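
To make the idea concrete, here is a rough Python sketch of the selection step described above: pick the least-used compute nodes and write a parallel configuration file for just those hosts. This is not how any real grid manager does it, only an illustration. The host names, the utilisation figures, the disk paths, and the function names `pick_least_used` and `render_apt_config` are all made up for the example.

```python
from typing import Dict, List


def pick_least_used(utilisation: Dict[str, float], count: int) -> List[str]:
    """Return the `count` host names with the lowest utilisation.

    Ties are broken by host name so the result is deterministic.
    """
    if count > len(utilisation):
        raise ValueError("not enough compute nodes available")
    ranked = sorted(utilisation, key=lambda host: (utilisation[host], host))
    return ranked[:count]


def render_apt_config(hosts: List[str],
                      disk: str = "/data/ds",          # hypothetical path
                      scratch: str = "/scratch/ds") -> str:  # hypothetical path
    """Render a parallel configuration file with one logical node per host."""
    lines = ["{"]
    for index, host in enumerate(hosts, start=1):
        lines += [
            f'  node "node{index}"',
            "  {",
            f'    fastname "{host}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Invented utilisation snapshot (fraction of capacity in use per host).
utilisation = {
    "compute01": 0.92,
    "compute02": 0.15,
    "compute03": 0.40,
    "compute04": 0.05,
    "compute05": 0.88,
}

chosen = pick_least_used(utilisation, 2)
config_text = render_apt_config(chosen)
print(config_text)
```

In a cluster, the equivalent file would be written once by hand and would name the same hosts on every run, busy or not. Here the host list is recomputed each time from the utilisation snapshot, so nodes already occupied by other work (a SAS batch job, say) simply rank lower and are skipped.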

Cheers,

Ray D
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

With DataStage in a grid environment, the APT configuration file is generated dynamically. That means a job may have run on node2 and node4 yesterday, but you don't know which nodes the same job will run on today. The Grid Enablement Toolkit wouldn't work in a clustered environment, since nothing is shared.
ppgoml
Participant
Posts: 58
Joined: Mon Aug 20, 2007 11:00 pm

Post by ppgoml »

Thanks for all your input. It's much clearer to me now.
Terala
Premium Member
Posts: 73
Joined: Wed Apr 06, 2005 3:04 pm

Post by Terala »

lstsaur wrote: DataStage in a grid environment, APT_Configurations file is generated dynamically that means a job was run on node2 and node4 yesterday, but you don't know this same job will be running on which nodes today.
lstsaur: how are Data Sets and File Sets managed when a job runs on different nodes every time, depending on which compute nodes are available?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Data Set and File Set descriptor files include the configuration with which they were written. When it comes time to read them, a virtual "read only" descriptor is created from this. (You can accomplish the same thing using the -x option of the orchadmin command.)
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Also remember that in a grid (grid computing), EVERYTHING is shared.