Project Directory Growing Big

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

poorna_76
Charter Member
Posts: 190
Joined: Thu Jul 08, 2004 10:42 am

Project Directory Growing Big

Post by poorna_76 »

Hi,

Our project directory RootDir\Ascential\DataStage\Projects\DEV is growing too big.

We think the growth is due to temporary/intermediate files created while the project runs.

Can anybody tell us which folders (or files in those folders) we can safely delete, that are of no use and whose removal will not affect the project?

I mean something like clearing the &PH& folder.

Thanks in Advance.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Project Directory Growing Big

Post by ogmios »

A hint: back up your project, delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project's log hashed files are becoming too big, as they only grow and never shrink.
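From memory of the 7.x client tools (a sketch only; check the exact switches for your release, and the host, user, password, project and file names here are made up), the export and reimport can be scripted from a Windows client:

Code: Select all

dscmdexport /H=dshost /U=dsadm /P=secret DEV C:\backup\DEV.dsx
dsimport /H=dshost /U=dsadm /P=secret DEV C:\backup\DEV.dsx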

For the rest: if you really do create temporary files in the project directory you can erase them, but back up your project before going down that path... one slip-up and you "lose" your project.

Ogmios.

P.S. If you do create temporary files in the project directory, rewrite the jobs to put them somewhere else. Files of your own in the project directories are a nightmare for DataStage cold installs (e.g. when migrating to a new machine).

P.P.S. And of course you can clean up the &PH& directory, but you have already automated that. I have not yet seen one DataStage site that does not clean up &PH& :wink:
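For anyone who hasn't automated it yet, a minimal example: with the project selected in the Administrator client's Command window, and no jobs running (active phantoms write here), the whole directory can be emptied with:

Code: Select all

CLEAR.FILE &PH&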
In theory there's no difference between theory and practice. In practice there is.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Log files DO shrink. You do know about the Administrator's option to clean up old logs for each project, right?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Are you sure log files shrink? Clearing a job log actually deletes rows one at a time; it does not use a truncating type of statement. Therefore, if a log file is a DYNAMIC hashed file, the only way to shrink it completely is either to issue a CLEAR.FILE command, recovering it back to the minimum modulus and emptying the overflow, or to delete the file and recreate it.

I did a quick test and confirmed that the data section does shrink, but the overflow stays inflated.
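For reference, the CLEAR.FILE route looks like this from the Administrator's Command window (RT_LOG123 is a made-up name, and note that a blanket CLEAR.FILE also removes the // control records mentioned later in this thread, so test on a backup first):

Code: Select all

CLEAR.FILE RT_LOG123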
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localuv account to store your hashed files? Are you writing intermediate files to a subdirectory of your project? Put all files and hashed files used by the jobs into a separate area and you will find it more manageable.
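One hedged way to do that (mkdbfile lives in the DSEngine bin directory on the installations I've seen, and the numeric arguments are just the commonly quoted defaults for a type 30 dynamic hashed file; the path is made up) is to create the hashed file directly in the data directory, then reference it in the Hashed File stage by directory path rather than by account:

Code: Select all

$DSHOME/bin/mkdbfile /data/dev/hash/CustomerLookup 30 1 4 20 50 80 1628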
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Space within a dynamic hashed file can also be reclaimed using the RESIZE command (but not below the number of groups specified by the MINIMUM.MODULUS parameter).

Code: Select all

RESIZE RT_LOGnnn * * *
The three asterisks mean "do not change any of the tuning parameters".
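To make that concrete for a single job, nnn is the job number. One way to look it up from the Command window (a sketch assuming the standard DS_JOBS dictionary, where JOBNO holds the job number; the job name and the number 123 are made up):

Code: Select all

LIST DS_JOBS 'MyJob' JOBNO
RESIZE RT_LOG123 * * *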
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

How it seems to work "from the outside" is that the log hashed files stay at the biggest size they have ever been. Once you've had an enormous log file it will stay enormous afterwards, no matter how many "Clear Log"s you do, unless you do a RESIZE.

Maybe an enhancement request for Ascential: do an automatic RESIZE of the log hashed files after executing "Clear Log"; it can't be that hard to implement.

The easiest solution I've found for big projects is to export and import again (especially in a development environment). A BASIC job can also be written to go through the log hashed files, do a "clear log" (not deleting the control records) and RESIZE them, but I've never bothered to implement it; a rough sketch of the idea follows.
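A hedged sketch of such a routine (untested, my own names throughout; run it from the project account only, with no jobs running, and try it on a backup copy of the project first):

Code: Select all

      * Gather the names of all RT_LOG hashed files in the project
      EXECUTE 'SELECT VOC WITH F1 = "F" AND @ID LIKE "RT_LOG..."' CAPTURING Junk
      LogList = ""
      LOOP
         READNEXT LogName ELSE EXIT
         LogList<-1> = LogName
      REPEAT
      * Clear each log (keeping the // control records), then shrink it
      NumLogs = DCOUNT(LogList, @FM)
      FOR I = 1 TO NumLogs
         LogName = LogList<I>
         OPEN LogName TO LogFile ELSE CONTINUE ;* skip anything not openable
         SELECT LogFile TO 1
         LOOP
            READNEXT Id FROM 1 ELSE EXIT
            IF Id[1,2] # "//" THEN DELETE LogFile, Id ;* keep control records
         REPEAT
         CLOSE LogFile
         EXECUTE "RESIZE " : LogName : " * * *" CAPTURING Junk
      NEXT I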

Ogmios
In theory there's no difference between theory and practice. In practice there is.
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Re: Project Directory Growing Big

Post by datastage »

ogmios wrote:A hint: back up your project, delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project's log hashed files are becoming too big, as they only grow and never shrink.

Ogmios.
Is there a technical difference between doing a (1) backup, (2) delete jobs, (3) reimport versus just doing a (1) backup, (2) reimport? Personally, I would have just done the export/import and not bothered with deleting jobs as a middle step.
ogmios wrote:P.P.S. And of course you can clean up the &PH& directory, but you have already automated that. I have not yet seen one DataStage site that does not clean up &PH& :wink:
I have, but it's always a site that is new to DS and doesn't have any DS developers with years of experience.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

vmcburney wrote:You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localuv account to store your hashed files? Are you writing intermediate files to a subdirectory of your project? Put all files and hashed files used by the jobs into a separate area and you will find it more manageable.
In the DataStage 3.x and 4.x era I used to prefer having staging and hashed files within the project directory; maybe everyone's data sets were much smaller then, but certainly today's best practice is as Vincent mentions here.

Also, I know there is a good thread out there with some easy ways to create individual file pointers to hashed files stored outside of the localuv account, but does anyone have a method or script to automate doing this for all hashed files? For a single file the approach I've seen is sketched below.
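The building block (path and pointer name here are made up) is the SETFILE verb, run from the project's Command window, which writes a VOC pointer to an externally stored hashed file:

Code: Select all

SETFILE /data/dev/hash/CustomerLookup CustomerLookup OVERWRITING

A BASIC routine that loops over the data directory and EXECUTEs a SETFILE for each entry would presumably automate the rest.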
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
poorna_76
Charter Member
Posts: 190
Joined: Thu Jul 08, 2004 10:42 am

Post by poorna_76 »

Thanks for all the ideas.

I want to mention here the things we are doing and not doing.

We are writing all the intermediate files to a separate directory outside the project directory.

We are not creating the hashed files outside UV.
Is there a way we can delete the hashed files that are created temporarily during the project?
I mean the ones that get created every time we run a job.

During our project development, some of us have missed specifying the temp directory for the Sort stage.
The result is we have a lot of soa folders.


Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Deleting Temporary Hashed Files

Post by ray.wurlod »

Is there a way we can delete the hashed files that are created temporarily during the project?

Yes, it's called designing your job streams so that the hashed files are deleted.

Probably the easiest method is to use the Administrator client (Command window) to execute the commands to delete the hashed files. Then multi-select all those commands and choose Save, to save the list of commands under a single name.

That single name can subsequently be used, for example in the after-job subroutine ExecTCL or in a call to DSExecute, to delete all the hashed files.
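To make that concrete, a hypothetical saved list (both hashed file names are made up) might contain:

Code: Select all

DELETE.FILE TMP_CUST_HASH
DELETE.FILE TMP_ORDER_HASH

If the list were saved as, say, DROP.TEMP.HASHED, an after-job call could be the following (DSExecute's "UV" shell type runs TCL commands):

Code: Select all

Call DSExecute("UV", "DROP.TEMP.HASHED", Output, SysRet)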
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.