Zombies UVSH
Hi ALL,
It appears that in some situations DataStage server processes become zombies. It's quite difficult to reproduce this problem (even with Ascential support), but maybe there's an answer somewhere?
It can happen on any platform, but mostly on W2K.
If a job that has a large write cache enabled is stopped (via Director) in the middle of flushing its buffers (you can see via the Monitor that the pumping of records has stopped and the rate is decreasing), the job status is set to aborted instead of stopped - probably the UVSH becomes a zombie!
Or a lookup with a large cache enabled is stopped (via Director) in the middle of building its RAM structure -
Or an ODBC stage that runs a heavy query is stopped before a result set has been received.
Another variation is that the job gets a stopped status but its UVSH is still running (sometimes you get log messages after the stopped entry) and eventually terminates, but only after a long delay. One can then think that the job has really stopped and, for some good reason, issue a recompilation (even though there's a UVSH still on the air) - and you can get very complicated situations like synchronization errors, and sometimes even "successful" jobs that don't really run, etc.
Any help on this one?
Last edited by ariear on Mon Dec 01, 2003 2:44 pm, edited 1 time in total.
Anytime you stop jobs via Director, or jobs abort, you should always make sure that the failed jobs don't leave processes out there.
Every job has a DSD.RUN process you can consider the "MAIN" process, and every active stage within the job shows up as DSD.StageRun. Any time a job has an issue, you should check that there are no DSD.StageRun processes left active. This is simple on unix; in fact, you can write a shell script against "ps" to make sure every DSD.StageRun process has a corresponding DSD.RUN parent.
The DSD.StageRun processes are your mysterious zombies. You can safely kill them. You should also recompile or clear the status of the jobs after doing so. Sometimes a zombie will interfere with the next run of the job: the job will start up and finish immediately, with no work done and a successful status.
Code: Select all
$ ps -ef |grep phantom
radnet 11779 11776 0 09:02:02 ? 0:14 phantom DSD.StageRun loadupdIRCashIVDayAg. loadupdIRCashIVDayAg.xfm 3 0/0
radnet 1761 1760 2 08:56:27 ? 23:18 phantom DSD.RUN Batch::MasterControlIROrderDetail. 0 ParameterFile=/var/opt/dat
radnet 4992 18088 0 10:14:57 pts/14 0:00 grep phantom
radnet 11776 1761 0 09:02:02 ? 0:00 phantom DSD.RUN loadupdIRCashIVDayAg. 0/0 SourceFileDirectory=/var/opt/datastag
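Building on the "ps" check described above, here's a minimal sketch (my own, not a DataStage-supplied tool) of a shell function that flags DSD.StageRun processes whose parent PID no longer belongs to a live DSD.RUN. The `ps` option names shown in the comment are assumptions for a modern UNIX; adjust them for your platform.

```shell
# Sketch only: find_orphans reads "pid ppid command" lines on stdin and
# prints the PIDs of DSD.StageRun processes whose parent is not a DSD.RUN.
find_orphans() {
    awk '
        /DSD\.RUN/      { run[$1] = 1 }     # remember every DSD.RUN pid
        /DSD\.StageRun/ { ppid[$1] = $2 }   # stage pid -> its parent pid
        END {
            for (pid in ppid)
                if (!(ppid[pid] in run))    # parent is not a live DSD.RUN
                    print pid
        }
    '
}

# On a live system (ps option spellings vary by platform):
# ps -e -o pid= -o ppid= -o args= | find_orphans
```

Note that `/DSD\.RUN/` cannot accidentally match a DSD.StageRun line, since "StageRun" has lowercase letters where the pattern requires "RUN".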
Last edited by kcbland on Fri Nov 28, 2003 9:19 pm, edited 1 time in total.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
phantom
remember this, it is the phrase to get in to the special meetings. Say it to the doorman: "The phantom squawks at midnight"
--
DataStage Phantom Squawk
You're on Windoze, so if you don't have a unix command interpreter like the MKS Toolkit you won't have a process status command like "ps". If you have the NT Resource Kit, the closest equivalent is pview.exe. It will let you see the individual processes in a job.
As for the swordfish reference, I had to do a google search. That reference is a little before my time. I always liked the Stooges better. Still don't know the phantom reference, though.
As far as best practices go, there's nothing like standard recovery procedures. If your job dies because of one of the following, expect zombies to occasionally occur. For SERVER jobs, this has been my experience for 5+ years:
(1) Job database connection was dropped midstream
(2) Job database instance had a peculiar error
(3) User STOPped the job from Director in the middle of a database query
(4) DBA killed the job's query in the database
(5) DataStage project filesystem filled to capacity
(6) Job attempted a mathematical expression where one of the equation components contained a NULL value
(7) Job (DS 5+) modified an argument value in a passed user-defined FUNCTION call without copying the argument to a local variable
(8) Job (DS 5+) used the BASIC STATUS() function - this one was weird, it worked in some versions and not others
So, compile a list of the types of job crashes that produce zombies. Then develop a script (or whatever suits you) to clean the process table of DSD.StageRun processes without DSD.RUN parents. Or just whack them by hand. That's difficult if your ETL application is turned over to a 24x7 operations center, so you're better off with the script approach.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote:As for the swordfish reference, I had to do a google search. That reference is a little before my time. I always liked the Stooges better. Still don't know the phantom reference, though.
The Marx Brothers may have started before the Three Stooges (who were originally a vaudeville act - did you know that?), but they were still going - and therefore contemporaries - and, dare I suggest, just a tad more cerebral in their humour? But that's starting off a low base!
Earlier this year a US project manager of a large project in India showed three Stooges films (off DVD) one lunchtime to the great bemusement of all present.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland wrote:Still don't know the phantom reference, though.
DSD.RUN invokes DSD.StageRun with
Code: Select all
PHANTOM SQUAWK DSD.StageRun { command_line_options }
Code: Select all
COPY FROM file1 TO file2 ALL SQUAWK
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
O.K.
Is there any sense in writing a daemon that checks the UniVerse process table for orphaned DSD.StageRun processes? Or a check after each job that terminates under Batch/Sequencer control? If I understand this architecture correctly, a batch job will appear only as DSD.RUN.
Is the dslictool clean_lic -a command applicable to PHANTOM processes as well?
Thanks, guys!
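On the daemon question: one alternative to a long-running daemon is a cron-driven check. The following is a sketch of my own (not a documented DataStage facility) that reuses the "ps" idea from earlier in the thread and only reports orphans with a timestamp; killing is left to the operator. The script name and log path in the crontab comment are hypothetical.

```shell
# Sketch: report orphaned DSD.StageRun processes with a timestamp, suitable
# for running once a minute from cron. Detection only -- no automatic kill.
report_orphans() {
    # $1: timestamp string; stdin: "pid ppid command" lines from ps
    awk -v now="$1" '
        /DSD\.RUN/      { run[$1] = 1 }     # live DSD.RUN pids
        /DSD\.StageRun/ { ppid[$1] = $2 }   # stage pid -> parent pid
        END {
            for (pid in ppid)
                if (!(ppid[pid] in run))
                    print now " orphan DSD.StageRun pid " pid
        }
    '
}

# Hypothetical crontab entry, assuming the function is saved as a script at
# /usr/local/bin/report_orphans and logged to a path of your choosing:
# * * * * * ps -e -o pid= -o ppid= -o args= | report_orphans "$(date)" >> /var/log/ds_orphans.log
```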
Re: phantom
PhantomSquawk wrote:remember this, it is the phrase to get in to the special meetings. Say it to the doorman: "The phantom squawks at midnight"
Thanks for the info Ray, but this is the reference I don't get. Care to clue me in?
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle