Grid storage NAS vs SAN

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

A user's HOME directory does not need to be a shared mount.

It does need SSH keys set up (assuming you are using ssh rather than rsh) between the head node (HN) and the compute nodes (CNs).
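
A minimal sketch of that key setup, assuming OpenSSH and a hypothetical user dsadm and compute node compute1:

# on the HN, generate a passwordless key pair for the user (if one doesn't exist)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# push the public key to each CN so processes can start there without a prompt
ssh-copy-id dsadm@compute1

# verify: this should print the CN's hostname with no password prompt
ssh dsadm@compute1 hostname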


NAS vs SAN: as long as the data mounts, project mounts, and engine mounts are shared and visible to every host in question, it doesn't matter.

I can say from experience that I prefer NAS over shared SAN.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Paul,
I'm still not clear on what is required of the file system and what the issues with having a SAN might be.
We have two head nodes sharing the compute nodes, and some parts of the file system use GPFS, e.g. for datasets. I thought this was to allow all the nodes access to specific parts of the file system.

Is it possible to set up an Information Server grid using a SAN without having a clustered file system?
Are there any downsides or issues in not having a clustered file system in a setup where multiple head nodes (for different environments) share compute nodes?
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

The file system must support files being shared between servers, with locking. That basically means NFS, IBM GPFS (rebranded as Spectrum Scale), or Red Hat GFS2 with the Resilient Storage add-on. If you don't put in a locking file system, you are going to have corruption.
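
For NFS, that means making sure locking is actually in play on every mount. Here is a sketch of an /etc/fstab entry; the NAS server name, export, and mount point are hypothetical, and the key points are that the mount is identical on the HN and every CN and that the nolock option is never used:

# hypothetical shared project mount, identical on every host in the grid;
# NFS locking is on by default -- do NOT add the "nolock" option
nasserver:/export/dstage_projects  /opt/IBM/InformationServer/Server/Projects  nfs  rw,hard  0 0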
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

The DataStage binaries (install path) must be shared between HN and CNs.
The Project Directories must be shared between HN and CNs.
The Data must be shared (for obvious reasons as well).

The batch IDs' home paths do NOT have to be shared, but it's handy if they are.

The Grid Job Dir path (used to store the dynamically created APT configuration file) must be shared; see the sketch at the end of this post.
The grid FIFO directory must be local to the HN.

TMPDIR should be local to the HN, though that's not mandatory; it can be a shared mount, but the shared aspect is not a factor. Speed is what matters on this mount, and IBM will advise you to make it local.
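
For instance, a one-line setting in the engine's dsenv file would do it (the path /local/tmp here is a hypothetical local file system):

# point DataStage temp space at fast local disk (hypothetical path)
TMPDIR=/local/tmp; export TMPDIR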

"Shared" can be done in many ways, so as long as you poke a file on HN and it's the same physical file on the CNs, mission accomplished.


SCRATCH disk should be local (and fast) on the HN and each CN. It CAN be a shared mount, but the shared aspect typically slows down data access, so given that you want scratch as fast as possible, make it local to each host.

WebSphere does not need to be exposed to the compute nodes.
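
To make the shared-versus-local split concrete, below is a minimal sketch of the kind of APT configuration file the grid toolkit generates dynamically. The host names (headnode, compute1) and paths (/shared/datasets on a shared mount, /local/scratch on each host's local disk) are hypothetical; the resource disk path must resolve to the same physical storage on every host, while each scratchdisk path should be a fast local file system on its own host:

{
  node "node1"
  {
    fastname "headnode"
    pools "conductor"
    resource disk "/shared/datasets" {pools ""}
    resource scratchdisk "/local/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "compute1"
    pools ""
    resource disk "/shared/datasets" {pools ""}
    resource scratchdisk "/local/scratch" {pools ""}
  }
}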
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

PaulVL wrote: WebSphere does not need to be exposed to the compute nodes.
My personal opinion is nothing should EVER be exposed to WebSphere... :-)
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

There are some stages that use Java JAR files which are only present on the WebSphere path, so IBM made some "assumptions" and never tested the IRULE or ILOG stuff in a cluster/grid environment. I ran into that and basically had to copy a JAR file over in order to make the environment work. The support PMR basically said "So you have it working now... can we close the ticket?" instead of ensuring that the JAR file would be on the compute nodes in a grid setting.
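
The workaround amounts to something like the following; the JAR name and both paths are hypothetical, since the exact file depends on the stage involved:

# all names hypothetical: stage the JAR from the WebSphere path onto a
# shared engine mount so the CNs can see it
cp /opt/IBM/WebSphere/AppServer/lib/ilog-rules.jar /opt/IBM/InformationServer/ASBNode/lib/java/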