DataStage 8.0.1
OS: AIX 5.3.0.0
When trying to write to a dataset, I'm getting the following errors:
########################
FATAL :
########################
APT_CombinedOperatorController(7),4: Write to dataset on [fd 17] failed (Error 0) on node node5, hostname <Server name>
APT_CombinedOperatorController(7),4: Orchestrate was unable to write to any of the following files:
APT_CombinedOperatorController(7),4: /DataStage/data/<filename>
APT_CombinedOperatorController(7),0: Write to dataset on [fd 17] failed (Error 0) on node node1, hostname <Server name>
APT_CombinedOperatorController(7),0: Orchestrate was unable to write to any of the following files:
APT_CombinedOperatorController(7),0: /DataStage/data/<filename>
APT_CombinedOperatorController(7),4: Block write failure. Partition: 4
<Filename>,4: Failure during execution of operator logic.
APT_CombinedOperatorController(7),4: Fatal Error: File data set, file "/DataStage/data/<Filename>.ds".; output of "<Filename>": DM getOutputRecord error.
APT_CombinedOperatorController(7),0: Block write failure. Partition: 0
<Filename>,0: Failure during execution of operator logic.
APT_CombinedOperatorController(7),0: Fatal Error: File data set, file "/DataStage/data/<Filename>.ds".; output of "<Filename>": DM getOutputRecord error.
node_node1: Player 67 terminated unexpectedly.
node_node5: Player 64 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 67 - Unexpected exit status 1.
<Filename 2>,0: Failure during execution of operator logic.
<Filename 2>,0: Fatal Error: Unable to allocate communication resources
node_node1: Player 42 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 42 - Unexpected exit status 1. (...)
<Filename 2>,4: Failure during execution of operator logic.
<Filename 2>,4: Fatal Error: Unable to allocate communication resources
main_program: Step execution finished with status = FAILED.
########################
Failed to execute job :<Job Name>. Return Code : 16
The same log also contains:
Message:: main_program: The open files limit is 2000; raising to 2147483647.
I do not know whether this is normal.
Another log, in /DataStage/MetaData/<project_name>/&PH&/, shows:
"DataStage Job 1035 Phantom 20950
readSocket() returned 16
DataStage Phantom Finished."
The settings have not changed, and we have the required Unix permissions on the directories.
We now have this problem on 3 servers (2 of them Production), always with the same error message and always in an old job.
We found that replacing a Join with a Lookup made that job work, but the same error then appeared in the next job.
All these jobs ran without problems for a long time. We have too many jobs to change them all, and a Lookup cannot always be used in place of a Join.
We checked the ulimit settings. The values are the same on all 3 servers.
From a shell on the Unix box:
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) unlimited
But from SH -c "ulimit -a" (run through DataStage Administrator):
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 1572864
stack(kbytes) 4194304
memory(kbytes) unlimited
coredump(blocks) 0
nofiles(descriptors) unlimited
There are 2 differences between the two outputs (data and coredump). I do not know why.
For information, we added the following to the dsenv script months ago:
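To make this comparison repeatable across the 3 servers, the two "ulimit -a" outputs can be saved to files and compared with a small helper. This is only a sketch: the function name diff_limits and the file names are hypothetical, and it assumes each line has the "name(unit) value" layout shown above.

```shell
#!/bin/sh
# Print the limits whose values differ between two saved "ulimit -a" outputs.
# diff_limits is a hypothetical helper name, not a DataStage or AIX command.
# Usage: diff_limits login_limits.txt engine_limits.txt
diff_limits() {
    awk '
        # First file: remember the value of each limit, keyed by its name.
        NR == FNR { first[$1] = $2; next }
        # Second file: report limits present in both files with different values.
        ($1 in first) && first[$1] != $2 {
            printf "%s: %s -> %s\n", $1, first[$1], $2
        }
    ' "$1" "$2"
}
```

With the two listings above saved as login_limits.txt and engine_limits.txt (hypothetical names), `diff_limits login_limits.txt engine_limits.txt` would report only the data(kbytes) and coredump(blocks) lines.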
ulimit -d unlimited
ulimit -m unlimited
# ulimit -s unlimited
ulimit -f unlimited
There is nothing else in DSPARAMS.
--------
This is the <Server name>.apt file:
{
node "node1"
{
fastname "<Server name>"
pools ""
resource disk "/DataStage/data/PX1/<project name>/DS" {pools ""}
resource scratchdisk "/DataStage/data/PX1/<project name>/SCRATCH" {pools ""}
}
...
node "node6"
{
fastname "<Server name>"
pools ""
resource disk "/DataStage/data/PX6/<project name>/DS" {pools ""}
resource scratchdisk "/DataStage/data/PX6/<project name>/SCRATCH" {pools ""}
}
}
We have enough disk space: we monitored the file systems while the job was running and saw no significant change.
We have 6 file systems, one per node, each with more than 30 GB free.
We also checked the tmp directory: no disk space problem there.
Do you have any ideas?
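For reference, the disk check can be scripted so that it covers exactly the directories named in the configuration file. The sketch below is an assumption about how one might do it, not what we actually ran: config_paths and check_space are hypothetical helper names, and the parsing assumes each resource line looks like the ones in the .apt file above.

```shell
#!/bin/sh
# List every "resource disk" and "resource scratchdisk" path in a config file.
# config_paths and check_space are hypothetical helper names.
config_paths() {
    awk '
        $1 == "resource" && ($2 == "disk" || $2 == "scratchdisk") {
            # Split the line on double quotes; the second piece is the path.
            n = split($0, parts, "\"")
            if (n >= 3) print parts[2]
        }
    ' "$1"
}

# Show free space (in KB) for each directory found in the config file.
check_space() {
    config_paths "$1" | while IFS= read -r dir; do
        df -k "$dir"
    done
}
```

Assuming APT_CONFIG_FILE points at the file above, `check_space "$APT_CONFIG_FILE"` could be run repeatedly while the job executes to watch the disk and scratchdisk file systems of every node.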
Thanks for your help.