Unexpected Job Failure

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Unexpected Job Failure

Post by saur_classic »

Hi,

I am having a very tough time with DataStage.

A DataStage job loads from a flat file into a Teradata table. The job runs fine for today's file, but when I recompile it and run the same job with yesterday's file, it fails. The metadata for both files is the same.

The job has one Sequential File stage, one Transformer, and one Teradata Enterprise stage.

The exact series of errors goes like this:

1. STG_TSDP_TSM_AUDIT,0: Failure during execution of operator logic.
2. STG_TSDP_TSM_AUDIT,0: Input 0 consumed 0 records.
3. STG_TSDP_TSM_AUDIT,0: Fatal Error: setupDataTransfer: column descriptor count does not match field count
4. node_node2a: Player 3 terminated unexpectedly.
5. main_program: Unexpected exit status 1
6. TF_TSMAudit,0: Failure during execution of operator logic.
7. TF_TSMAudit,0: Input 0 consumed 190 records.
8. TF_TSMAudit,0: Output 0 produced 190 records.
9. TF_TSMAudit,0: Fatal Error: Unable to allocate communication resources
10. node_node2a: Player 2 terminated unexpectedly.
11. main_program: Unexpected exit status 1
12. STG_TSDP_TSM_AUDIT: Teradata write operator was run from a step which did not run to completion.The recovery which will take place depends on how far the write progressed. Please see the messages which follow for direction on manual recovery and cleanup
13. TeraWrite died before any recoverable work was completed. Automatic cleanup will be performed. The following tables will be deleted:
ORCH_WORK_5b1a8c06
Database.ERR_1528466438_1
Database.ERR_1528466438_2
14. STG_TSDP_TSM_AUDIT: TeraGenericQuery Error: Request Close failed: Error Code = 305 Session return code = 305 , DBC return code = 305 DBC message: CLI2: NOREQUEST(305): Specified request does not exist.
15. main_program: Step execution finished with status = FAILED.

I have tried almost everything, but I am still not able to find the exact cause of the error.

Can anyone help me out with this?

Regards,
Saur
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Does the metadata for the table change from yesterday?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic, although you stated that "the Metadata for both files is the same", the error message from DataStage is
...column descriptor count does not match field count...
So there is a difference between the format of the two files.
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Unexpected job failure

Post by saur_classic »

ArndW wrote:saur_classic, although you stated that "the Metadata for both files is the same", the error message from DataStage is
...column descriptor count does not match field count...
So there is a difference between the format of the two files.

Dear keshav0307, ArndW,

I have manually checked the metadata of the first 10 records and it looks pretty much the same.

I also tried loading the whole file in pieces, and as strange as it may sound, the job ran successfully for the first 148 records. Then I inserted another 50 records, recompiled, and ran the job, but this time it failed.

Does this mean the metadata is changing for records 148 to 200?

Is there some way to catch hold of these kinds of records?

Because if the metadata of the source file is changing in between, then I have to grab the necks of the people responsible for it.

I have wasted 4 days trying to figure out the exact cause.

Please suggest.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Re: Unexpected job failure

Post by ArndW »

saur_classic wrote:...Does this mean the metadata is changing for records 148 to 200?...
Probably. If you change your job to run on 1 node or in sequential mode, and output to a Peek stage instead of to Teradata, you should find the row that is causing your problems.
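
If the file is simply delimited, a quick check outside DataStage can also narrow it down. Below is a rough sketch (the file name and the pipe delimiter are only assumptions; adjust them to your format) that flags any record whose field count differs from the first record's:

# field_count_check.py - rough sketch, not a DataStage utility
# Assumes a pipe-delimited flat file; change DELIM and FILENAME to suit.
DELIM = "|"
FILENAME = "tsm_audit_yesterday.dat"   # placeholder name

with open(FILENAME, "r") as f:
    expected = None
    for lineno, line in enumerate(f, start=1):
        count = len(line.rstrip("\n").split(DELIM))
        if expected is None:
            expected = count           # first record sets the reference count
        elif count != expected:
            print(f"Record {lineno}: {count} fields, expected {expected}")

Any record it reports (an embedded delimiter, a missing field, a stray line break) is a likely candidate for the "column descriptor count does not match field count" error.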
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Isn't a reject option available in the Sequential File stage? I think there is one in the CFF stage.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Re: Unexpected job failure

Post by saur_classic »

ArndW wrote:
saur_classic wrote:...Does this mean the metadata is changing for records 148 to 200?...
Probably. If you change your job to run on 1 node or in sequential mode, and output to a Peek stage instead of to Teradata, you should find the row that is causing your problems.
Hi ArndW,

The job ran successfully once for yesterday's file.

After a few hours I recompiled the job and tried running it again, but this time it failed with the same errors.

I know it doesn't make sense, but how is it possible that, without any changes to the job or the source file, the job runs once but fails later?

Is it something to do with project settings/environment variables, etc.?

Regards,
Saur

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Saur_classic - The job should run the same way with the same input data each time. If you are 100% certain that the source file is the same and the Teradata table is in the same state, then you are right to look for the problem in DataStage. Have you tried a 1-node configuration or sequential execution, writing only to a sequential file or Peek stage, to see if the error is reproducible?
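
For reference, a single-node configuration file is just a cut-down copy of your normal one. A sketch is below; the fastname and the resource paths are placeholders that you would copy from your existing configuration, and you point $APT_CONFIG_FILE at the new file for the test run:

{
    node "node1"
    {
        fastname "your_server_name"
        pools ""
        resource disk "/path/to/datasets" {pools ""}
        resource scratchdisk "/path/to/scratch" {pools ""}
    }
}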
MaheshKumar Sugunaraj
Participant
Posts: 84
Joined: Thu Dec 04, 2003 9:55 pm

Re: Unexpected job failure

Post by MaheshKumar Sugunaraj »

The job ran successfully once for yesterday's file. After a few hours I recompiled the job and tried running it again, but this time it failed with the same errors.

Hi Saurabh

As per the above statement, once the job has run successfully with the data, why do you have to recompile the job again? Unless you make some changes, recompiling is not necessary.

The issue could be that the data you are trying to load does not match the Teradata table definition. You might have some kind of conversion happening; you need to check whether you are trying to insert a different datatype from what has been defined, and if so, you could use the CAST function.

Hope the above is useful.

With Regards
M
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Re: Unexpected job failure

Post by saur_classic »


The issue could be that the data you are trying to load does not match the Teradata table definition. You might have some kind of conversion happening; you need to check whether you are trying to insert a different datatype from what has been defined, and if so, you could use the CAST function.


Hi All,

Yes, I am using some conversions like changing String to Timestamp, null handling, etc., but that's not the cause of the problem.

The main question is: why does the job run successfully once or twice but fail at other times?

I have not touched the Designer or the source file, and the job was not recompiled between the different runs. The 1st run was successful; the 2nd run failed, so I reset the job from Director and ran it again; the 3rd time it failed with the same errors and was reset again; the 4th time it ran successfully.

I am using a 4-node configuration file and have also tried populating the table in sequential mode.

Am I missing some Teradata/DataStage settings that I should have set?
Having posted this problem on such a good forum, the lack of a concrete solution amazes me.

Is there anyone who can make sense of this behaviour of DataStage?

Regards,
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic, I'm amazed that you are amazed that nobody has answered your question. As volunteers, the members here mainly suggest paths to try in order to analyze problems. In addition, your job as a developer also includes testing different settings to see if you can find the cause.
It might be less amazing if you were to actually read some of the responses, including:
ArndW wrote:...Have you tried a 1-node configuration or sequential execution, writing only to a sequential file or Peek stage, to see if the error is reproducible...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you using TENACITY? If not, a process that cannot get a Teradata session will abort. It may be this that you are seeing, and it would explain the apparent randomness of the behaviour: sometimes the processes can all get sessions, occasionally one can't. TENACITY will allow such sessions to wait.

(Ask your Teradata DBA for more information about TENACITY.)
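
For what it's worth, TENACITY and SLEEP are settings of the Teradata load utilities themselves; I am not certain how (or whether) the Enterprise stage exposes them directly, so treat the following as an illustration only. In a standalone FastLoad script they are given before the LOGON statement, with illustrative values:

SESSIONS 8;     /* maximum number of sessions to request          */
TENACITY 4;     /* hours to keep retrying when no session is free */
SLEEP 6;        /* minutes to wait between retries                */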
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saur_classic
Participant
Posts: 9
Joined: Tue Aug 01, 2006 9:43 pm

Post by saur_classic »

ray.wurlod wrote:Are you using TENACITY? If not, a process that cannot get a Teradata session will abort. It may be this that you are seeing, and it would explain the apparent randomness of the behaviour: sometimes ...

Thanks ArndW, Ray,

ArndW, I have tried writing to a Peek stage with a 1-node configuration file and the job ends OK. It never fails.

Ray, the parameters we are putting in the Teradata Enterprise stage are requestedsessions=8, sessionsperplayer=2, synctimeout=160, on a four-node configuration file.
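(If I understand those options correctly, requestedsessions=8 divided by sessionsperplayer=2 should work out to 8 / 2 = 4 player processes, i.e. one per node on the four-node configuration; please correct me if that reading is wrong.)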

I am not sure about TENACITY. How do I set it for the Teradata Enterprise stage?

Regards
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I believe the timeout variable represents Tenacity.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

saur_classic wrote:...I have tried writing to a Peek stage with a 1-node configuration file and the job ends OK. It never fails...
It does seem to be related to the Teradata output. If you change back to your normal configuration file, run the job a couple of times, and it still doesn't fail, you can be fairly certain which stage is causing the problem.