Zombies UVSH

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Hi all,
It appears that in some situations DataStage server processes become zombies. The problem is quite difficult to reproduce (even with Ascential support), but maybe there's an answer somewhere?
It can happen on any platform, but mostly on W2K.
If a job that has a large write cache enabled is stopped (via Director) in the middle of flushing its buffers (you can see in the Monitor that the pumping of records has stopped and the rate is decreasing), the job status is set to aborted instead of stopped, and the UVSH probably becomes a zombie! :evil:
Or if a lookup with a large cache enabled is stopped (via Director) in the middle of building its RAM structure :evil:
Or if an ODBC stage that runs a heavy query is stopped before a result set has been received :evil:

Another variation is that the job has a stopped status but its UVSH is still running (sometimes you get log messages after the stopped entry) and eventually terminates, but only after a long delay. One may think the job has really stopped and, for some good reason, recompile it (even though a UVSH is still running). That can lead to very complicated situations, such as synchronization errors and sometimes even jobs that report success without really running. :evil:

Any help on this one :?:
Last edited by ariear on Mon Dec 01, 2003 2:44 pm, edited 1 time in total.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Anytime you stop jobs via Director, or jobs abort, you should always make sure that the failed jobs don't leave threads out there.

Code: Select all

$ ps -ef |grep phantom
  radnet 11779 11776  0 09:02:02 ?        0:14 phantom DSD.StageRun loadupdIRCashIVDayAg. loadupdIRCashIVDayAg.xfm 3 0/0
  radnet  1761  1760  2 08:56:27 ?       23:18 phantom DSD.RUN Batch::MasterControlIROrderDetail. 0 ParameterFile=/var/opt/dat
  radnet  4992 18088  0 10:14:57 pts/14   0:00 grep phantom
  radnet 11776  1761  0 09:02:02 ?        0:00 phantom DSD.RUN loadupdIRCashIVDayAg. 0/0 SourceFileDirectory=/var/opt/datastag
Every job has a DSD.RUN thread that you can consider the "MAIN" thread, and all active stages within the job show up as DSD.StageRun. Any time jobs have an issue, you should check that no DSD.StageRun threads are still active. This is simple on Unix; in fact, you can write a shell script that checks "ps" to make sure every DSD.StageRun process has a corresponding DSD.RUN.
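A minimal sketch of such a check, offered as an illustration and not as anything from Ascential. It assumes `ps -ef` column order (UID, PID, PPID, ...) as in the listing above, and treats a DSD.StageRun process as suspect when its parent PID is not a DSD.RUN process in the same listing. The function name `find_orphan_stageruns` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef`-style lines on stdin and prints the PID
# of every "phantom DSD.StageRun" process whose parent PID does not belong to
# a "phantom DSD.RUN" process in the same listing.
find_orphan_stageruns() {
    awk '
        /phantom DSD\.RUN /      { run[$2] = 1 }     # remember DSD.RUN PIDs
        /phantom DSD\.StageRun / { stage[$2] = $3 }  # StageRun PID -> parent PID
        END {
            for (pid in stage)
                if (!(stage[pid] in run))
                    print pid
        }
    ' | sort -n
}

# Report only; nothing is killed here.
# ps -ef | find_orphan_stageruns
```

It only reports PIDs. Whether a reported process is really a leftover is still a judgment call, so check the listing yourself before killing anything.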

The DSD.StageRun threads are your mysterious zombies. You can safely kill these threads. You should also recompile or clear the status of the jobs after doing so. Sometimes a zombie will interfere with the next run of the job: the job will start up and finish immediately, with no work done, and in a successful state.
Last edited by kcbland on Fri Nov 28, 2003 9:19 pm, edited 1 time in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
PhantomSquawk
Participant
Posts: 6
Joined: Wed Feb 05, 2003 4:34 pm
Location: Nicaragua

phantom

Post by PhantomSquawk »

remember this, it is the phrase to get in to the special meetings. Say it to the doorman: "The phantom squawks at midnight"
--
DataStage Phantom Squawk
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The password is "swordfish".

The password is ALWAYS "swordfish".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Thanks for the confirmation, Ken (and for the enigmatic passwords).
Any good practices other than NOT STOPPING JOBS USING DIRECTOR :?: OR CAREFULLY CHECKING AFTER INEVITABLE STOPS :!:
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You're on Windoze, so if you don't have a Unix command interpreter like MKS Toolkit, you won't have a process status command like "ps". If you have the NT Resource Kit, the closest equivalent will be pview.exe. This will allow you to see the individual processes in a job.

As for the swordfish reference, I had to do a google search. That reference is a little before my time. I always liked the Stooges better. Still don't know the phantom reference, though.

As far as best practices, there's nothing like standard recovery procedures. If your job dies because of one of the following, expect zombies to occasionally occur. For SERVER jobs, this has been my experience for 5+ years:

(1) Job database connection was dropped midstream
(2) Job database instance had peculiar error
(3) User STOPped job from Director in the middle of database query
(4) DBA killed job query in database
(5) DataStage project filesystem filled to capacity
(6) Job attempted a mathematical expression where one of the equation components contained a NULL value
(7) Job (DS 5+) modified an argument value in a passed user-defined FUNCTION call without copying the argument to a local variable
(8) Job (DS 5+) used the BASIC STATUS() function; this one was weird, it worked in some versions and not others

So, compile a list of the types of job crashes that produced zombies. Then, develop a wysiwyg script/whatever to clean the process table of DSD.StageRun threads without DSD.RUN parents. Or just whack them by hand. Whacking by hand is difficult if your ETL application is turned over to a 24x7 operations center, so you're better off with the script approach.
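For the cleanup side, here is a rough sketch of a cautious kill wrapper, assuming you already have a list of suspect PIDs, one per line, however you collected them. The function name `kill_listed_pids` and the `DRY_RUN` variable are invented for this example. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helper: reads one PID per line on stdin.
# DRY_RUN=1 (the default) only reports; DRY_RUN=0 sends SIGTERM to each PID.
kill_listed_pids() {
    while read -r pid; do
        case "$pid" in
            # Refuse anything that is not a plain number.
            ''|*[!0-9]*) echo "skipping non-numeric entry: $pid" >&2; continue ;;
        esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would kill $pid"
        else
            kill "$pid" && echo "killed $pid"
        fi
    done
}

# Example (dry run):
# printf '%s\n' 11779 | kill_listed_pids
```

Keeping the dry run as the default means an operations center can log what would be killed before anyone switches it to live mode. After a real kill, the affected jobs still need their status cleared or a recompile.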
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

kcbland wrote:
As for the swordfish reference, I had to do a google search. That reference is a little before my time. I always liked the Stooges better. Still don't know the phantom reference, though.
The Marx Brothers may have started before the Three Stooges (who were originally a vaudeville act - did you know that?), but they were still going - and therefore contemporaries, and dare I suggest, just a tad more cerebral in their humour? But that's starting off a low base!

Earlier this year a US project manager of a large project in India showed three Stooges films (off DVD) one lunchtime to the great bemusement of all present.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

kcbland wrote:Still don't know the phantom reference, though.
DSD.RUN invokes DSD.StageRun with

Code: Select all

PHANTOM SQUAWK DSD.StageRun { command_line_options }
SQUAWK is a synonym in the VOC for REPORTING. It used to cause gales of laughter as the preferred option for COPY for Prime INFORMATION.

Code: Select all

COPY FROM file1 TO file2 ALL SQUAWK
With the PHANTOM verb, and NOTIFY ON in effect, it causes (forces?) the child process to notify the parent, which is where the [Done] message in the &PH& record for the job comes from. Now you know.
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

O.K.
Is there any sense in writing a daemon that checks the UniVerse process table for orphaned DSD.StageRun processes? Or a check after each job that terminates under Batch/Sequencer control? If I understand this architecture correctly, a batch job will appear only as DSD.RUN.
Is the dslictool clean_lic -a command also applicable to PHANTOM processes?

Thanks Guys !
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Re: phantom

Post by kcbland »

PhantomSquawk wrote:remember this, it is the phrase to get in to the special meetings. Say it to the doorman: "The phantom squawks at midnight"
Thanks for the info Ray, but this is the reference I don't get. Care to clue me in?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nope, that one's got me, too. :oops:
PhantomSquawk, care to enlighten the world?