Unexpected Job Failure

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Unexpected Job Failure

Post by saur_classic »

Hi,

I am having a very tough time with DataStage.

A DataStage job loads from a flat file into a Teradata table. The job runs fine for today's file, but when I recompile it and run the same job with yesterday's file, it fails. The metadata for both files is the same.

The job has one Sequential File stage, one Transformer, and one Teradata Enterprise stage.

The exact series of errors goes like this:

1. STG_TSDP_TSM_AUDIT,0: Failure during execution of operator logic.
2. STG_TSDP_TSM_AUDIT,0: Input 0 consumed 0 records.
3. STG_TSDP_TSM_AUDIT,0: Fatal Error: setupDataTransfer: column descriptor count does not match field count
4. node_node2a: Player 3 terminated unexpectedly.
5. main_program: Unexpected exit status 1
6. TF_TSMAudit,0: Failure during execution of operator logic.
7. TF_TSMAudit,0: Input 0 consumed 190 records.
8. TF_TSMAudit,0: Output 0 produced 190 records.
9. TF_TSMAudit,0: Fatal Error: Unable to allocate communication resources
10. node_node2a: Player 2 terminated unexpectedly.
11. main_program: Unexpected exit status 1
12. STG_TSDP_TSM_AUDIT: Teradata write operator was run from a step which did not run to completion.The recovery which will take place depends on how far the write progressed. Please see the messages which follow for direction on manual recovery and cleanup
13. TeraWrite died before any recoverable work was completed. Automatic cleanup will be performed. The following tables will be deleted:
ORCH_WORK_5b1a8c06
Database.ERR_1528466438_1
Database.ERR_1528466438_2
14. STG_TSDP_TSM_AUDIT: TeraGenericQuery Error: Request Close failed: Error Code = 305 Session return code = 305 , DBC return code = 305 DBC message: CLI2: NOREQUEST(305): Specified request does not exist.
15. main_program: Step execution finished with status = FAILED.

I have tried almost everything, but I am still not able to find the exact cause of the error.

Can anyone help me out with this?

Regards,
Saur
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Does the metadata for the table change from yesterday?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic, although you stated that "the Metadata for both files is the same", the error message from DataStage is
...column descriptor count does not match field count...
So there is a difference between the format of the two files.
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Unexpected job failure

Post by saur_classic »

ArndW wrote:saur_classic, although you stated that "the Metadata for both files is the same", the error message from DataStage is
...column descriptor count does not match field count...
So there is a difference between the format of the two files.

Dear keshav0307, ArndW,

I have manually checked the metadata of the first 10 records and it looks pretty much the same.

I also tried loading the whole file in pieces, and as strange as it may sound, the job ran successfully for the first 148 records. Then I inserted another 50 records, recompiled, and ran the job, but this time it failed.

Does this mean the metadata is changing for records 148 to 200?

Is there some way to catch hold of these kinds of records?

Because if the metadata of the source file is changing in between, then I have to grab the necks of the people responsible for it.

I have wasted 4 days trying to figure out the exact cause.

Please suggest.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Re: Unexpected job failure

Post by ArndW »

saur_classic wrote:...Does this mean the metadata is changing for records 148 to 200?...
Probably. If you change your job to run on 1 node or in sequential mode, and output to a Peek stage instead of to Teradata, you should find the row that is causing your problems.
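
If the file is simply delimited, a quick check outside DataStage can also narrow it down. Below is a rough sketch (the file name and the pipe delimiter are only assumptions; adjust them to your format) that flags any record whose field count differs from the first record's:

# field_count_check.py - rough sketch, not a DataStage utility
# Assumes a pipe-delimited flat file; change DELIM and FILENAME to suit.
DELIM = "|"
FILENAME = "tsm_audit_yesterday.dat"   # placeholder name

with open(FILENAME, "r") as f:
    expected = None
    for lineno, line in enumerate(f, start=1):
        count = len(line.rstrip("\n").split(DELIM))
        if expected is None:
            expected = count           # first record sets the reference count
        elif count != expected:
            print(f"Record {lineno}: {count} fields, expected {expected}")

Any record it reports (an embedded delimiter, a missing field, a stray line break) is a likely candidate for the "column descriptor count does not match field count" error.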
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Isn't a reject option available in the Sequential File stage? I think there is one in the CFF stage.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Re: Unexpected job failure

Post by saur_classic »

ArndW wrote:
saur_classic wrote:...Does this mean the metadata is changing for records 148 to 200?...
Probably. If you change your job to run on 1 node or in sequential mode, and output to a Peek stage instead of to Teradata, you should find the row that is causing your problems.
Hi ArndW,

The job ran successfully once for yesterday's file.

After a few hours I recompiled the job and tried running it again, but this time it failed with the same errors.

I know it doesn't make sense, but how is it possible that, without any changes to the job or the source file, the job runs once but fails later?

Is it something to do with project settings/environment variables, etc.?

Regards,
Saur

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Saur_classic - The job should run the same way with the same input data each time. If you are 100% certain that the source file is the same and the Teradata table is in the same state, then you are right to look for the problem in DataStage. Have you tried a 1-node configuration or sequential execution, writing only to a sequential file or Peek stage, to see if the error is reproducible?
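
For reference, a single-node configuration file is just a cut-down copy of your normal one. A sketch is below; the fastname and the resource paths are placeholders that you would copy from your existing configuration, and you point $APT_CONFIG_FILE at the new file for the test run:

{
    node "node1"
    {
        fastname "your_server_name"
        pools ""
        resource disk "/path/to/datasets" {pools ""}
        resource scratchdisk "/path/to/scratch" {pools ""}
    }
}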
MaheshKumar Sugunaraj
Participant
Posts: 84
Joined: Thu Dec 04, 2003 9:55 pm

Re: Unexpected job failure

Post by MaheshKumar Sugunaraj »

The job ran successfully once for yesterday's file. After a few hours I recompiled the job and tried running it again, but this time it failed with the same errors.

Hi Saurabh

As per the above statement, once the job has run successfully with the data, why do you have to recompile the job again? Unless you make some changes, recompiling is not necessary.

The issue could be that the data you are trying to load does not match the Teradata table definition. You might have some kind of conversion happening; you need to check whether you are trying to insert a different datatype from what has been defined, and if so, you could use the CAST function.

Hope the above is useful.

With Regards
M
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Re: Unexpected job failure

Post by saur_classic »


The issue could be that the data you are trying to load does not match the Teradata table definition. You might have some kind of conversion happening; you need to check whether you are trying to insert a different datatype from what has been defined, and if so, you could use the CAST function.


Hi All,

Yes, I am using some conversions like changing String to Timestamp, null handling, etc., but that's not the cause of the problem.

The main question is: why does the job run successfully once or twice but fail at other times?

I have not touched the Designer or the source file, and the job was not recompiled between the different runs. The 1st run was successful; the 2nd run failed, so I reset the job from Director and ran it again; the 3rd time it failed with the same errors and was reset again; the 4th time it ran successfully.

I am using a 4-node configuration file and have also tried populating the table in sequential mode.

Am I missing some Teradata/DataStage settings that I should have set?
Having posted this problem on such a good forum, the lack of a concrete solution amazes me.

Is there anyone who can make sense of this behaviour of DataStage?

Regards,
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic, I'm amazed that you are amazed that nobody has answered your question. As volunteers, the members here mainly suggest paths to try in order to analyze problems. In addition, your job as a developer also includes testing different settings to see if you can find the cause.
It might be less amazing if you were to actually read some of the responses, including:
ArndW wrote:...Have you tried a 1-node configuration or sequential execution, writing only to a sequential file or Peek stage, to see if the error is reproducible...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you using TENACITY? If not, a process that cannot get a Teradata session will abort. It may be this that you are seeing, and it would explain the apparent randomness of the behaviour: sometimes the processes can all get sessions, occasionally one can't. TENACITY will allow such sessions to wait.

(Ask your Teradata DBA for more information about TENACITY.)
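
For what it's worth, TENACITY and SLEEP are settings of the Teradata load utilities themselves; I am not certain how (or whether) the Enterprise stage exposes them directly, so treat the following as an illustration only. In a standalone FastLoad script they are given before the LOGON statement, with illustrative values:

SESSIONS 8;     /* maximum number of sessions to request          */
TENACITY 4;     /* hours to keep retrying when no session is free */
SLEEP 6;        /* minutes to wait between retries                */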
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Post by saur_classic »

ray.wurlod wrote:Are you using TENACITY? If not, a process that cannot get a Teradata session will abort. It may be this that you are seeing, and it would explain the apparent randomness of the behaviour: sometimes ...

Thanks ArndW, Ray,

ArndW, I have tried writing to a Peek stage with a 1-node configuration file and the job ends OK. It never fails.

Ray, the parameters we are putting in the Teradata Enterprise stage are requestedsessions=8, sessionsperplayer=2, synctimeout=160, on a four-node configuration file.
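(If I understand those options correctly, requestedsessions=8 divided by sessionsperplayer=2 should work out to 8 / 2 = 4 player processes, i.e. one per node on the four-node configuration; please correct me if that reading is wrong.)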

I am not sure about TENACITY. How do I set it for the Teradata Enterprise stage?

Regards
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I believe the timeout variable represents Tenacity.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic wrote:...I have tried writing to a Peek stage with a 1-node configuration file and the job ends OK. It never fails...
It does seem to be related to the Teradata output. If you change back to your normal configuration file, run the job a couple of times, and it still doesn't fail, you can be fairly certain which stage is causing the problem.