dsjob within ksh scripts

hobocamp · Post by **hobocamp** » Tue Nov 29, 2011 1:05 pm

At our firm, we run most of our DS jobs with one of a few ksh scripts.

There are a few standard audit-type steps that get executed before and after the main DS processing. These audit steps involve loading information to a few Oracle tables.

This process has been in place with very few problems for a few years. But lately, we've seen run times increase quite dramatically.

However, it's not the running of the DataStage jobs themselves that has slowed down, but the wait time between them.

For example, as the last steps in one of our scripts, two jobs are called, one after the other, using dsjob.

The first is this:
DSCMD_BTCHAUDIT="$DSHOME/bin/dsjob -domain $Hostname -user $DsId -password $DsPwd -server $Hostname -run -wait -jobstatus $pBtchNm $pFileNm $pSysCde $pBtchNbr $pEffDt $pUserId $pBtchTyp $pBtchAuditSt $pStartTs $pEndTs $pEtlLdFileDir $pOdsDsn $pOdsId $pOdsPwd $ProjNm $BtchAuditJob"

Right after it is the second one:
DSCMD_INSETLREJ="$DSHOME/bin/dsjob -domain $Hostname -user $DsId -password $DsPwd -server $Hostname -run -wait -jobstatus $pBtchNm $pFileNm $pSysCde $pBtchNbr $pEffDt $pSrcFileDir $pLdFileDir $pTmpFileDir $pRejFileDir $pHashFileDir $pOdsDsn $pOdsId $pOdsPwd $ProjNm $InsEtlRejJob"

It's not unusual to see 15 - 20 minutes of wait time between the end of the first and the kickoff of the second.
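
One way to pin down where the time goes is to log a timestamp immediately before and after each dsjob call. A minimal sketch, assuming the DSCMD_* variables hold the full command lines as shown above; `timed_run` and its log format are hypothetical, not part of DataStage:

```sh
# Hypothetical helper: run a command and log wall-clock start/end times.
# The name timed_run and the log layout are illustrative only.
timed_run() {
    label=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $label"
    "$@"
    rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') END   $label rc=$rc"
    # Pass the command's exit code back to the caller unchanged.
    return $rc
}

# Usage (assumes the DSCMD_* strings defined in the script; left unquoted
# on purpose so the shell splits them back into command and arguments):
# timed_run BTCHAUDIT $DSCMD_BTCHAUDIT
# timed_run INSETLREJ $DSCMD_INSETLREJ
```

Comparing these timestamps with the job start and finish times shown in Director should indicate whether the gap comes from dsjob returning late after the first job ends or from a slow start of the second call.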

One guess is that it may just be a system constraint on our server, as I'm not able to replicate the issue in our Dev/UAT environment. But any suggestions as to something I could check would be greatly appreciated.
Thanks.
Tom

qt_ky · Post by **qt_ky** » Tue Nov 29, 2011 10:16 pm

I used to see slowdowns like that, and also sluggish response in Director, on projects with a lot of server job activity. Run this project command in Administrator to get a count of the status file and let us know what you find. Try it on multiple projects and compare the counts between Production and Dev/UAT.

COUNT &PH&

In my case it was resolved by clearing the status file regularly, as mentioned in the topic on this link (I did it manually about once a week):

viewtopic.php?t=86926

&PH& can grow and affect performance. There's a technote on the IBM site with more details:

"What is the &PH& directory used for in DataStage and does it need to be cleaned out?"

http://www-01.ibm.com/support/docview.w ... wg21414210
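
The same count can also be taken from the Unix side, since &PH& is a directory under the project directory. A rough sketch, assuming the project directory path is passed in as an argument; `count_ph_files` and `list_old_ph_files` are hypothetical helper names. Note the quoting, because the directory name contains ampersands:

```sh
# Hypothetical helpers; $1 is the project directory (an assumption here).

# Print the number of files currently in the project's &PH& directory.
count_ph_files() {
    find "$1/&PH&" -type f | wc -l | tr -d ' '
}

# List (do not delete) &PH& files last modified more than $2 days ago.
list_old_ph_files() {
    find "$1/&PH&" -type f -mtime +"$2" -print
}

# Usage (the path is a placeholder for your own project directory):
# count_ph_files /path/to/Projects/MyProject
# list_old_ph_files /path/to/Projects/MyProject 7
```

This only lists candidates. Any actual cleanup is best done while no jobs are running, following the linked topic and technote.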

roy · Post by **roy** » Wed Nov 30, 2011 2:13 pm

Besides what was already said, note that kicking off your second job depends on the end of the first one, since you use -wait in your run options. So you might be experiencing an increase in execution time for the first job. Has the number of records going into it increased lately?
IHTH (I hope this helps),

PaulVL · Post by **PaulVL** » Wed Nov 30, 2011 3:08 pm

What flavor of Unix are you using?

AIX? Red Hat? Suse?

If it's Suse (not sure if this is present in Red Hat) you might want to go looking on your server if you have a fragmented directory structure. One way to find this is to "ls" some of the directories. If one of them takes minutes to return results... that could account for your delay.

(I've seen this happen.)
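
The "ls" check can be scripted so that several directories are timed in one pass. A minimal sketch, assuming `date +%s` is available (GNU date has it; older Solaris versions of date may not, in which case the ksh SECONDS variable is an alternative); `time_listing` is a hypothetical helper name:

```sh
# Hypothetical helper: report how many whole seconds "ls" takes on a directory.
time_listing() {
    start=$(date +%s)
    ls "$1" >/dev/null 2>&1
    end=$(date +%s)
    echo "$((end - start))s $1"
}

# Usage (paths are placeholders for your own project directories):
# for d in /path/to/Projects/MyProject /path/to/Projects/MyProject/\&PH\&; do
#     time_listing "$d"
# done
```

A directory that takes many seconds or minutes to list is a candidate for the kind of delay described above.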

Are you running GRID?

Are you running one BIG project or many small ones on your install?

By big I mean quantity of jobs within the project, not quantity of data within a job execution.

hobocamp · Post by **hobocamp** » Thu Dec 01, 2011 8:45 am

Thanks for the suggestions so far. I'm in the process of examining our &PH& files and their cleanup.

Paul - Our OS is Solaris 9. No grid.

And we do have all of our jobs in one main project. We're not a huge shop - total of all jobs is around 1200.

qt_ky · Post by **qt_ky** » Sat Jan 14, 2012 6:27 pm

Did you have any luck resolving the delays?