Hash file job - individual records going to separate files

edmtenzing
Premium Member
Posts: 22
Joined: Wed Jul 23, 2008 5:59 pm

Post by edmtenzing »

Hi all,

I am facing a strange issue around hash file creation that I've never come across before. I have a process that does two SQL extracts from a database, performs some basic transformation on each, and writes the results to hash files. Here is a screenshot of the job:

[Screenshot of the job design: two database extracts, each feeding a Transformer stage that writes to a hashed file, with link row counts displayed]

The problem is that the other day one of the transformers (the bottom one in the image above) wrote individual records to separate files, and I cannot make sense of it. Instead of writing its records into a single hash file, the transformer created roughly 150,000 individual files in the hash file directory (the 176K in the image above is from the latest run; the count was around 150K on the problematic run in question). The impact was that we ran out of inodes on our UNIX box, and other DS jobs then fell over because there was no space left to write logs, temp files and so on.
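
For anyone who hits the same symptom, this is the sort of quick shell check we used to size the damage (the paths below are illustrative placeholders, not our real ones, and the df flag is Linux-style; other UNIX flavours differ):

    # Inode usage on the filesystem holding the DataStage project
    df -i /opt/datastage/Projects/MyProject

    # Count of operating-system files inside the hashed file's directory.
    # A healthy dynamic hashed file holds a handful of files, not thousands.
    find /opt/datastage/Projects/MyProject/MyHashFile -type f | wc -l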

After a recompile and restart, the job ran as normal, creating a typical hash file with an 85MB DATA.30 and a 26MB OVER.30.
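
For comparison, the healthy hash file directory now contains just the handful of entries I would expect of a dynamic (Type 30) hashed file, something like this (path illustrative):

    $ ls -a /opt/datastage/Projects/MyProject/MyHashFile
    .  ..  .Type30  DATA.30  OVER.30

My (possibly shaky) understanding is that if that hidden .Type30 marker ever goes missing, the engine treats the directory as a plain type 19 directory file and writes every record as its own operating-system file, which would match what we saw.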

If anyone has encountered a similar issue, I'd really like to hear what you found to be the cause and how you ensured it didn't happen again. I'm hesitant to keep this job running (even though it is seemingly running fine now) in case it topples every job running in production.

Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Check that you have exactly the same file name on both input links to the Hashed File stage - not only identically spelled, but also identically cased. Incidentally, your link row count is 1.7 million, not 176K.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
edmtenzing
Premium Member
Posts: 22
Joined: Wed Jul 23, 2008 5:59 pm

Post by edmtenzing »

Hi Ray, my apologies - it aborted around 150K (I should've left out the bit about the record count in the image).

L_CSTID_TF_O writes to a hash file named ac_ar_lookup.

L_CST_MKT_SEGMENT_TF_O writes to a hash file named cst_mkt_segment_id_lookup.

It is the former that is having the issue. I can confirm that the spelling and casing are both OK.

I should add that this job has been in production since 2008, and this is the first time anyone on my team, myself included, has come across this problem.
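
If it does recur, my plan is to catch it early rather than let it take out production again, with a watchdog along these lines (path, threshold and mail address are placeholders):

    # Alert if the hashed file directory ever holds more than a few dozen
    # OS-level files - a sign records are being written as individual files.
    COUNT=$(find /opt/datastage/Projects/MyProject/ac_ar_lookup -type f | wc -l)
    if [ "$COUNT" -gt 50 ]; then
        echo "ac_ar_lookup holds $COUNT files - possible hashed file problem" \
            | mail -s "Hashed file alert" ops@example.com
    fi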

Cheers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We've seen it before here and I saw it back in the day. One such example is here and there were two or three others I could find.
-craig

"You can never have too many knives" -- Logan Nine Fingers
edmtenzing
Premium Member
Posts: 22
Joined: Wed Jul 23, 2008 5:59 pm

Post by edmtenzing »

Thank you Craig, this is helpful. Cheers