Problems with DS Calls
Hi All,
We are facing a couple of problems while trying to call some DataStage jobs or sequencers from Control M using command files.
1.) When we call a job from Control M using a command file, every job is first reset and then run. Sometimes the reset doesn't finish and keeps running. A wait of 12 seconds is given between the reset and the run. Because the reset is still running when the run is issued, my job eventually fails with a DSBadState = 2. Any idea why this happens? At this point I have to stop the reset and rerun the job. The next time, the job runs OK.
2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. There has been an occurrence where a sequencer trying to call some jobs suddenly crashed with the following error.
Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]
Any idea why this has happened?
Any help would be great.
Cheers,
In my experience, resetting a job can take 30 seconds if the system is overloaded. A sequence should handle it for you. I expect that you need an Otherwise link in your sequence. Not all situations are trapped in a sequence: if all you have is an OK (successful) link and an error link, the sequence ends if a job finishes with a warning. You need either an error link plus an Otherwise link, or an OK link with the Otherwise link handling the errors.
Mamu Kim
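For callers that drive the reset and the run from a command file, one alternative to a fixed 12-second sleep is to poll until the reset has actually finished. The following is only a sketch under stated assumptions: `wait_until` is a hypothetical helper, and the `dsjob` lines in the trailing comments (project and job names, the status text matched by `grep`) are illustrative and need to be checked against what `dsjob -jobinfo` prints on your installation. The documented `-wait` option of `dsjob -run` may make a loop like this unnecessary.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it exits 0 or until
# roughly TIMEOUT seconds have elapsed.
# Returns 0 if COMMAND succeeded, 1 if the timeout was reached.
# INTERVAL must be at least 1, otherwise the loop never advances.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Illustrative usage (not executed here; names and pattern are assumptions):
#   reset_finished() { ! dsjob -jobinfo "$1" "$2" | grep -q "RUNNING"; }
#   dsjob -run -mode RESET MyProject JobA
#   wait_until 120 5 reset_finished MyProject JobA && dsjob -run MyProject JobA
```

With a 120-second ceiling and a 5-second interval, a reset that takes 30 seconds on a loaded box is still picked up, and a reset that never finishes is reported as a timeout instead of a failed run.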
I had the same problem this morning. It might be because you are using the "reset if required" option. I had inadvertently used it in my sequence and ran into the same issue. Try taking it out. It seems to have solved mine, but I'm not 100% sure, as my jobs are still running.
Also, are you by any chance making heavy use of hash files in any of those jobs? There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
kiran_kom wrote: There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kiran_kom wrote: There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
ray.wurlod wrote: Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
Umm, no. This is a known issue with DS windows (well, known only to Ascential folks, I guess). We have jobs that make heavy use of hash files, and multiple instances of the same job run at once.
This sometimes causes DS to crash. The error manifests itself as a "User limit reached" message in the &PH& directory. Ascential is working on a fix for it.
Yesterday when I was having the above mentioned problem, I also ran into this problem with hash files. I was
I don't think the above problem is related to this hash file issue, because just now (5 minutes ago) my jobs failed with the same controller problem (and no, they didn't have "reset if required" turned on). I didn't find any "User limit reached" messages in &PH&, so I guess this is a separate issue.
kiran_kom wrote: There is a bug in DS windows that causes it to crash if you are using lots of hash file stages at the same time.
ray.wurlod wrote: Can you please elaborate, ideally providing a reference to the support case number? Or is it just that you didn't set the T30FILE tunable large enough?
The support case number is 385112*WES
Re: Problems with DS Calls
Viswanath wrote: Hi All,
2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. There has been an occurrence where a sequencer trying to call some jobs suddenly crashed with the following error.
Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]
Did you ever resolve #2? We have the same problem occasionally and Ascential is pointing us to the shared memory parameters on our Solaris box. If you look on page 3-4 of the installation guide, they list the minimum recommended values.
You can see those values on a Solaris system by running /etc/sysdef and grepping for the parameter you're looking for, e.g. /etc/sysdef | grep SHMMNI.
I was told that SHMMNI was probably the culprit on our system. I'll let you know if it helps after we've made the change.
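The sysdef check described above can be wrapped in a small helper so that only the value is printed. This is a sketch, not a verified tool: `sysdef_value` is a hypothetical name, and it assumes each sysdef line has the form "value, description, (NAME)", which should be confirmed against the output on your own Solaris release. The minimum to compare against is whatever your installation guide lists; no value is assumed here.

```shell
#!/bin/sh
# sysdef_value NAME
# Reads sysdef-style output on standard input and prints the first field
# of every line that contains "(NAME)".
# Assumption: lines look like "<value> <description> (NAME)".
sysdef_value() {
    awk -v name="($1)" 'index($0, name) { print $1 }'
}

# Illustrative usage on Solaris (not executed here):
#   /etc/sysdef | sysdef_value SHMMNI
```

Piping through a function that reads standard input keeps the helper usable on a saved copy of the sysdef output as well, which helps when comparing two machines.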
Re: Problems with DS Calls
A little bit off topic, but my hat off to the guesses of Ascential support. And that's meant ironically.
They'd better build some more tracing into DataStage, as is the case in Oracle, for example. When something goes wrong in Oracle, you open an iTar and 99% of the time you get a real fix very soon.
Ogmios
Re: Problems with DS Calls
Viswanath wrote: Hi All,
2.) We faced this as a one-off problem, but I am still trying to figure out the root cause. There has been an occurrence where a sequencer trying to call some jobs suddenly crashed with the following error.
Controller problem: Error calling DSRunJob(JobA), code=-14
[Timed out while waiting for an event]
rdy wrote: Did you ever resolve #2? We have the same problem occasionally and Ascential is pointing us to the shared memory parameters on our Solaris box. If you look on page 3-4 of the installation guide, they list the minimum recommended values.
You can see those values on a Solaris system by running /etc/sysdef and grepping for the parameter you're looking for, e.g. /etc/sysdef | grep SHMMNI.
I was told that SHMMNI was probably the culprit on our system. I'll let you know if it helps after we've made the change.
Hello
I was wondering whether you fixed issue #2. We have been getting this for a week and Ascential has not been able to solve it. We are using DataStage 6.x running on Solaris. I will try your suggestion too and see what happens. Also, what value should SHMMNI be set to?
Thank you!
smohamme
Re: Problems with DS Calls
At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on the recommendation of Ascential, but only after we sent them our truss file of the job in action.
I don't remember anymore which shared libraries were involved or what their order was.
Ogmios
Re: Problems with DS Calls
ogmios wrote: At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on the recommendation of Ascential, but only after we sent them our truss file of the job in action.
I don't remember anymore which shared libraries were involved or what their order was.
Ogmios
Thank you! If you can, please provide more details, such as the shared libraries and their order.
smohamme
Re: Problems with DS Calls
ogmios wrote: At one site where they ran DataStage on Solaris, we fixed this problem by changing the order of some of the shared libraries in the dsenv file on the recommendation of Ascential, but only after we sent them our truss file of the job in action.
I don't remember anymore which shared libraries were involved or what their order was.
Ogmios
smohamme wrote: Thank you! If you can, please provide more details, such as the shared libraries and their order.
We have added the following library path in the dsenv file:
LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
and we have changed the DSRunJob.B file since it was corrupt. This did not fix our "Timed out..." issue.
Ascential also informed us that the Production ETL box is over-utilized. We normally run 20 instances of our jobs at once, so we reduced this to 12 instances and, after these were complete, started another run with the remaining 8. This has worked, although it does not explain why the full 20 instances run fine on our Dev box, which has less processing power and memory.
The differences between the two boxes in the uvconfig file are:
    Tunable       Production   Development
1.  MFILES        200          50
2.  T30FILE       2000         500
3.  UVSYNC        0            1
4.  64BIT_FILES   1            0
smohamme
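A comparison like the table above can be produced directly from the two uvconfig files. The sketch below rests on assumptions: `tunable` and `compare_tunables` are hypothetical helper names, the file paths in the usage comment are placeholders, and the parsing assumes each tunable sits on its own line as "NAME value" with comment lines starting with "#". Check that against your own uvconfig before relying on it.

```shell
#!/bin/sh
# tunable FILE NAME
# Prints the value of the first "NAME value" line found in FILE.
tunable() {
    awk -v name="$2" '$1 == name { print $2; exit }' "$1"
}

# compare_tunables FILE1 FILE2 NAME...
# For each NAME whose value differs between the two files, prints
# "NAME value1 value2". A tunable missing from a file shows as "unset".
compare_tunables() {
    file1=$1
    file2=$2
    shift 2
    for name in "$@"; do
        v1=$(tunable "$file1" "$name")
        v2=$(tunable "$file2" "$name")
        if [ "$v1" != "$v2" ]; then
            printf '%s %s %s\n' "$name" "${v1:-unset}" "${v2:-unset}"
        fi
    done
}

# Illustrative usage (paths are placeholders, not executed here):
#   compare_tunables /path/to/prod/uvconfig /path/to/dev/uvconfig \
#       MFILES T30FILE UVSYNC 64BIT_FILES
```

Listing the tunables explicitly keeps the output limited to the settings under discussion, so unrelated differences between the two files do not clutter the result.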