waitForWriteSignal(): Premature EOF on node

js103755
Participant
Posts: 7
Joined: Tue Apr 14, 2015 4:16 am
Location: USA

Post by js103755 »

Hello everyone,

I've been struggling with this issue for the last couple of days. I have a job that combines six datasets using a Funnel stage, and the funnel's output feeds an Aggregator stage. The aggregator groups the other columns by one id column and writes the output to a dataset.

The job has been running fine for the last couple of months, but since the last migration it's failing with the error log below:

AGG_REV,0: Failure during execution of operator logic.
AGG_REV,0: Input 0 consumed 315815 records.
AGG_REV,0: Output 0 produced 2729 records.
AGG_REV,0: Fatal Error: Unable to allocate communication resources
STLMNT_UPD,0: Failure during execution of operator logic.
STLMNT_UPD,0: Input 0 consumed 0 records.
STLMNT_UPD,0: Output 0 produced 0 records.
STLMNT_UPD,0: Fatal Error: waitForWriteSignal(): Premature EOF on node fanucci Bad file descriptor
node_node1: Player 9 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 9 - Unexpected termination by Unix signal 11(SIGSEGV).
node_node1: Player 10 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 10 - Unexpected exit status 1.
APT_PMsectionLeader(1, node1), player 8 - Unexpected exit status 1.
APT_PMsectionLeader(2, node2), player 10 - Unexpected exit status 1.
main_program: Step execution finished with status = FAILED.


AGG_REV is the aggregator stage name and STLMNT_UPD is the target dataset name.

The job runs fine in dev, but it's failing in QA.
I've looked through the forums but didn't find a solution. I checked the target directory and there is no descriptor file, yet the error still says "Bad file descriptor". Help please.
- Jay
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

What happened before these errors?

Was there a warning before your very first log file line here? Any previous errors at all? It looks like this section of the log is actually after your real problem.
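
One way to find where the trouble starts is to scan the log for the earliest "Fatal Error" entry rather than the last one. Below is a minimal Python sketch of that idea, using a few lines copied from the log posted above. The `first_fatal` helper and the `ENTRY` pattern are assumptions made for illustration and are not part of DataStage, and the earliest fatal entry is only a starting point for investigation, not proof of the root cause.

```python
import re

# Sample lines copied from the log posted above.
LOG = """\
AGG_REV,0: Failure during execution of operator logic.
AGG_REV,0: Input 0 consumed 315815 records.
AGG_REV,0: Output 0 produced 2729 records.
AGG_REV,0: Fatal Error: Unable to allocate communication resources
STLMNT_UPD,0: Failure during execution of operator logic.
STLMNT_UPD,0: Fatal Error: waitForWriteSignal(): Premature EOF on node fanucci Bad file descriptor
"""

# Assumed layout of one entry: "<source>: <message>".
ENTRY = re.compile(r"^(?P<source>[^:]+):\s*(?P<message>.*)$")


def first_fatal(log_text):
    """Return (line number, source, message) of the earliest fatal entry, or None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = ENTRY.match(line)
        if match and match.group("message").startswith("Fatal Error"):
            return number, match.group("source"), match.group("message")
    return None


print(first_fatal(LOG))
```

On this sample the earliest fatal entry comes from AGG_REV, which suggests the STLMNT_UPD message may only be a downstream symptom of the aggregator failing first.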
js103755
Participant
Posts: 7
Joined: Tue Apr 14, 2015 4:16 am
Location: USA

Post by js103755 »

I didn't get any other error or warning messages before the first line, just the usual messages; see below.

Parallel job initiated
Parallel job default NLS map UTF-8, default locale OFF
main_program: IBM InfoSphere DataStage Enterprise Edition 11.5.0.7555 
Copyright (c) 2001, 2005-2015 IBM Corporation. All rights reserved
main_program: The timezone environment variable TZ is currently not set in your environment which can lead to significant performance degradation. It is recommended that you set TZ=:/etc/localtime in your environment.
main_program: conductor uname: -s=Linux; -r=2.6.32-573.22.1.el6.x86_64; -v=#1 SMP Thu Mar 17 03:23:39 EDT 2016; -n=fanucci; -m=x86_64
main_program: orchgeneral: loaded
orchsort: loaded
orchstats: loaded
main_program: APT_SortedGroup2Operator::describeOperator nkeys: 1
main_program: APT configuration file: /opt/Infosphere/InfoServer/Server/Configurations/default.apt
{
	node "node1"
	{
		fastname "fanucci"
		pools ""
		resource disk "/opt/tempdata/datasets1" {pools ""}
		resource disk "/opt/tempdata/datasets2" {pools ""}
		resource disk "/opt/tempdata/datasets3" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch1" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch2" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch3" {pools ""}
	}
	node "node2"
	{
		fastname "fanucci"
		pools ""
		resource disk "/opt/tempdata/datasets4" {pools ""}
		resource disk "/opt/tempdata/datasets5" {pools ""}
		resource disk "/opt/tempdata/datasets6" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch4" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch5" {pools ""}
		resource scratchdisk "/opt/tempdata/scratch6" {pools ""}
	}
}
- Jay
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Is this load bigger than what you had been running?
Maybe try setting the agg stage to sort mode?
js103755
Participant
Posts: 7
Joined: Tue Apr 14, 2015 4:16 am
Location: USA

Post by js103755 »

The load is the usual size we've been running. Also, the aggregator was already set to sort mode.
- Jay
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You've told the aggregator to expect sorted data. So have you partitioned and sorted the data by the keys that the aggregator is using? Have you set up your funnel stage to preserve the sort order if you have indeed sorted the data upstream?
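
To illustrate why the sort order matters, here is a minimal Python sketch, not DataStage code. The `sorted_mode_aggregate` function is an assumption made for illustration: it mimics an aggregation that emits a group every time the key changes, so it only gives one row per key when equal keys arrive next to each other. The sample rows are invented.

```python
from itertools import groupby
from operator import itemgetter


def sorted_mode_aggregate(rows):
    """Sum amounts per id, closing a group whenever the key changes.

    Only one group is held at a time, so equal keys must be adjacent.
    """
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


# Two inputs, each sorted by id on its own (invented sample data).
input_a = [(1, 10), (2, 20), (3, 30)]
input_b = [(1, 5), (3, 7)]

# A funnel that simply appends one input to the other loses the overall order.
funnelled = input_a + input_b

unsorted_result = sorted_mode_aggregate(funnelled)
sorted_result = sorted_mode_aggregate(sorted(funnelled, key=itemgetter(0)))

print(unsorted_result)  # ids 1 and 3 appear twice
print(sorted_result)    # one row per id
```

Without the re-sort, ids 1 and 3 each come out as two separate groups; with it, every id is aggregated once.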

Mike
js103755
Participant
Posts: 7
Joined: Tue Apr 14, 2015 4:16 am
Location: USA

Post by js103755 »

I'm sorting the data on the input of the aggregator stage; before that it's left as is. On the aggregator's input I've used hash partitioning and an ascending sort. On the input of the target dataset I've used Same partitioning to collect the data.

It should work this way as well, right?
- Jay
js103755
Participant
Posts: 7
Joined: Tue Apr 14, 2015 4:16 am
Location: USA

Post by js103755 »

It's working now. I kept one column at a time in the aggregation and tested the job. After testing all columns, I found that two columns cause the problem when Preserve Type is set to true: one is a bigint, the other a decimal. When I set Preserve Type to false and specify the type in the stage, the job runs without any issue. This is very strange; there is no way the type would change in the incoming data.

I was browsing the IBM support page and found the exact issue:
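
The one-column-at-a-time test can be sketched roughly as follows in Python. Everything here is hypothetical: `run_job` stands for whatever launches a test run with the given aggregation columns and reports success, `fake_run_job` is a stand-in that imitates the symptom, and the column names are invented.

```python
def find_failing_columns(columns, run_job):
    """Run the job once per column and collect the columns whose run fails."""
    failing = []
    for column in columns:
        try:
            succeeded = run_job([column])
        except Exception:
            # Treat a crash the same as a reported failure.
            succeeded = False
        if not succeeded:
            failing.append(column)
    return failing


# Hypothetical column names and types, for illustration only.
COLUMN_TYPES = {
    "TXN_COUNT": "bigint",
    "REV_AMT": "decimal",
    "REGION_CD": "varchar",
    "POST_DT": "date",
}


def fake_run_job(columns):
    """Stand-in for a real run: fails for bigint and decimal columns."""
    return all(COLUMN_TYPES[name] not in {"bigint", "decimal"} for name in columns)


failing = find_failing_columns(list(COLUMN_TYPES), fake_run_job)
print(failing)
```

Testing columns individually costs one run per column, but it isolates each offender without assuming only a single column is at fault.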

JR53787: PARALLEL JOB FAILS WHEN AGGREGATOR STAGE USES BIGINT OR DOUBLE DATA TYPES WITH PRESERVE TYPE PROPERTY SET AT TRUE.
http://www-01.ibm.com/support/docview.w ... wg1JR53787

There is a patch for this, but our DataStage install is the latest version and I assume the fix would have been included in it. Anyway, I will analyse the input data and try to find out whether anything in the data might be causing Preserve Type to fail.
Thanks all for your input.
- Jay
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

js103755 wrote: There is a patch install for this, but our DataStage install is the latest version and I assume they would have included these patch fixes in the latest version of install.
I would make no such assumption. :wink:

Verify.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sjfearnside
Premium Member
Posts: 278
Joined: Wed Oct 03, 2007 8:45 am

Post by sjfearnside »

Please let us know whether the patch solved your issue or whether it turned out to be something else.

Thanks