Hi all,
Maybe this is a stupid question, but I'm a little confused by the DataStage cluster environment and the DataStage grid environment. Could someone tell me whether they are the same concept, and if not, what the differences between them are?
I know the DataStage engine can be distributed across other machines to build an MPP environment. Is that a DataStage cluster or a grid?
Thanks for clarifying this for me.
Difference between Cluster & Grid Architecture
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
As far as DataStage is concerned, MPP = cluster, which has a fixed number of machines. A grid has a dynamically variable number of machines, with available resources being managed by some form of grid management software.
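To make the "fixed number of machines" point concrete: in a cluster (MPP) setup, the parallel configuration file named by `APT_CONFIG_FILE` lists the same set of nodes on every run. The minimal sketch below writes such a file from a fixed host list. The hostnames and the `/ds/data` and `/ds/scratch` paths are hypothetical placeholders, and a real configuration file usually carries more per-node settings than shown here.

```python
# Hypothetical fixed cluster: these hostnames and paths are placeholders.
CLUSTER_HOSTS = ["etl01", "etl02", "etl03", "etl04"]
DATA_DIR = "/ds/data"        # assumed resource disk path
SCRATCH_DIR = "/ds/scratch"  # assumed scratch disk path


def build_apt_config(hosts, data_dir=DATA_DIR, scratch_dir=SCRATCH_DIR):
    """Return configuration-file text with one logical node per host."""
    lines = ["{"]
    for i, host in enumerate(hosts, start=1):
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{host}"',
            '    pools ""',
            f'    resource disk "{data_dir}" {{pools ""}}',
            f'    resource scratchdisk "{scratch_dir}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # In a cluster this text is written once and reused for every run,
    # regardless of how busy each machine is at run time.
    print(build_apt_config(CLUSTER_HOSTS))
```

Because the host list is fixed, the same nodes are used whether or not they are busy, which is exactly the behaviour the grid approach is meant to avoid.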
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
A cluster is a group of DataStage servers where the administrator/designer writes the APT configuration file to use the compute-node resources available.
A grid uses software to auto-generate the APT configuration file so that it uses a subset of resources based on compute-node usage.
If you have a group of 25 compute nodes, the designer might write the APT file to use 4 of them. If those nodes are busy, DataStage will still try to use them. Under the grid software, the utilisation of the compute nodes on the grid is examined, the underused machines are picked, and the job is dispatched to those nodes.
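The contrast above (a hard-coded node list versus choosing the least-busy machines at submit time) can be sketched as follows. The utilisation figures and node names are made up, and the "pick the k least-loaded nodes" rule is an illustrative assumption; real grid resource managers apply their own scheduling policies.

```python
from typing import Dict, List


def pick_least_used(utilisation: Dict[str, float], k: int) -> List[str]:
    """Return the k nodes with the lowest utilisation (ties broken by name)."""
    if k > len(utilisation):
        raise ValueError("not enough compute nodes on the grid")
    ranked = sorted(utilisation, key=lambda node: (utilisation[node], node))
    return ranked[:k]


# Hypothetical snapshot of 25 compute nodes (fraction of CPU busy).
grid = {f"cn{i:02d}": ((i * 37) % 100) / 100 for i in range(1, 26)}

# Cluster-style: the config always names the same 4 nodes, busy or not.
static_nodes = ["cn01", "cn02", "cn03", "cn04"]

# Grid-style: choose the 4 least-used nodes for this run only; these
# would then be written into a freshly generated APT configuration file.
dynamic_nodes = pick_least_used(grid, 4)

print("static :", static_nodes)
print("dynamic:", dynamic_nodes)
```

With this made-up snapshot the dynamic choice is `cn19`, `cn11`, `cn03` and `cn22`, while the static list keeps using `cn01` and `cn02` even though they are among the busier nodes.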
In addition, you can have different software running on the same grid compute nodes, such as SAS, DataStage, etc. So a SAS batch job could be dispatched to 2 of the compute nodes, which would remove them from contention for a DataStage job.
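The mixed-workload point can be sketched as a shared pool: once another application's batch job (SAS in this example) is dispatched to some nodes, those nodes are unavailable when the next DataStage job asks for resources. The class and node names are hypothetical, and the first-come allocation rule is an assumption for illustration only.

```python
from typing import Dict, List


class SharedGridPool:
    """Toy pool of compute nodes shared by several applications."""

    def __init__(self, nodes: List[str]):
        self.free: List[str] = list(nodes)
        self.busy: Dict[str, List[str]] = {}  # job name -> nodes it holds

    def dispatch(self, job: str, count: int) -> List[str]:
        """Give `count` free nodes to `job`, or raise if too few are free."""
        if count > len(self.free):
            raise RuntimeError(f"{job}: only {len(self.free)} node(s) free")
        taken, self.free = self.free[:count], self.free[count:]
        self.busy[job] = taken
        return taken

    def release(self, job: str) -> None:
        """Return a finished job's nodes to the free list."""
        self.free.extend(self.busy.pop(job))


pool = SharedGridPool([f"cn{i:02d}" for i in range(1, 7)])
sas_nodes = pool.dispatch("sas_batch", 2)     # SAS takes cn01, cn02
ds_nodes = pool.dispatch("datastage_job", 4)  # DataStage gets the rest
print(sas_nodes, ds_nodes)
pool.release("sas_batch")                     # cn01, cn02 are free again
print(pool.free)
```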
Cheers,
Ray D
lstsaur wrote: In a DataStage grid environment, the APT configuration file is generated dynamically. That means a job that ran on node2 and node4 yesterday may run on entirely different nodes today.
lstsaur: How are Data Sets and File Sets managed when a job runs on different nodes every time, depending on which compute nodes are available?
Data Set and File Set descriptor files include the configuration with which they were written. When it comes time to read them, a virtual "read-only" descriptor is created from this. (You can accomplish the same thing using the -x option with the orchadmin command.)
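A toy model of that idea: the descriptor records the node layout in force when the Data Set was written, and a later reader locates the partitions from that record rather than from whatever configuration the grid generated for today's run. The class, function and field names and the segment-file paths are hypothetical; this does not reflect the actual descriptor file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataSetDescriptor:
    """Hypothetical descriptor: node -> data files written on that node."""
    name: str
    written_config: Dict[str, List[str]]


def write_dataset(name: str, config_nodes: List[str]) -> DataSetDescriptor:
    # One partition (segment file) per node in the configuration in force.
    layout = {node: [f"/ds/data/{name}.{node}.seg"] for node in config_nodes}
    return DataSetDescriptor(name, layout)


def read_view(desc: DataSetDescriptor, todays_nodes: List[str]) -> List[str]:
    """Return the nodes a reader must visit to find the data.

    todays_nodes is deliberately not used for locating data: the partitions
    live where the descriptor says they were written.
    """
    return sorted(desc.written_config)


yesterday = write_dataset("customers", ["node2", "node4"])
print(read_view(yesterday, todays_nodes=["node1", "node3"]))
```

Here the Data Set written on node2 and node4 is still read from node2 and node4, even though today's generated configuration lists node1 and node3.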