How to calculate the size of the text file

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

A hashed file shows up as two entries: D_HashedFileXXX and HashedFileXXX.
Inside the HashedFileXXX directory there are two files: DATA.30 and OVER.30.
Check the sizes of these too.
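
If you would rather script the check than read the sizes off a directory listing, here is a minimal sketch. The function name and the directory path are hypothetical, and it assumes only what is stated above: the hashed file is a directory holding DATA.30 and OVER.30.

```python
import os

def hashed_file_sizes(hashed_dir):
    """Return the sizes in bytes of DATA.30 and OVER.30 in a hashed file directory.

    hashed_dir is a hypothetical path to the HashedFileXXX directory.
    A missing file is reported as 0 bytes rather than raising an error.
    """
    sizes = {}
    for name in ("DATA.30", "OVER.30"):
        path = os.path.join(hashed_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.exists(path) else 0
    sizes["total"] = sizes["DATA.30"] + sizes["OVER.30"]
    return sizes
```

Call it as hashed_file_sizes("/path/to/project/HashedFileXXX"), substituting your own project directory.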
Last edited by narasimha on Thu Feb 15, 2007 7:01 pm, edited 1 time in total.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
paddu
Premium Member
Posts: 232
Joined: Tue Feb 22, 2005 11:14 am
Location: California

Post by paddu »

Sorry about that.

Here are the sizes of the files:

DATA.30 691,870 KB
OVER.30 223,234 KB
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just for interest, do the calculations like I did earlier. Keep one byte per delimiter and use 14 bytes/record as the storage overhead. Post your calculations, so I can be certain you understand.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
paddu
Premium Member
Posts: 232
Joined: Tue Feb 22, 2005 11:14 am
Location: California

Post by paddu »

Ray wrote: "Just for interest, do the calculations like I did earlier. Keep one byte per delimiter and use 14 bytes/record as the storage overhead. Post your calculations, so I can be certain you understand."

I did not follow you exactly, Ray. Why do we need the delimiter when calculating for hashed files?


We downloaded the DataStage client from the IBM site, so the Unsupported utilities were not provided to us.
I am not sure how to do the calculation for hashed files.

I did something like this

10+3+8+14 = 35 bytes per record, which for 15,446,662 records equates to 540,633,170 bytes.

Maybe I need to learn more about hashed files.

Thanks
paddu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

paddu wrote:
Why we need delimiter while calculating for Hashed files?

10+3+8+14=35 bytes per line
For 15446662 records equates to 540633170 bytes

You need the delimiter because it's in there (though it's a "field mark" in hashed files).

By default hashed files are only filled to 80% of capacity, so your calculation is in the ballpark. Allowing for the 80% (the "split load" tunable parameter), your calculation would yield 675,791,463 bytes, whereas you observed slightly more than this.

There are other complexities relating to headers, free space, overflowed groups and oversized records, that I deliberately avoided. Your calculation, as I noted, is in the ballpark.
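
For anyone wanting to repeat the arithmetic with their own numbers, here is a rough sketch of the estimate discussed in this thread. The function and parameter names are made up for illustration; the field widths (10, 3, 8), the 14 bytes/record overhead, and the 80% split load are the figures quoted above. It ignores headers, free space, overflowed groups, and oversized records, so treat the result as a ballpark only.

```python
def estimate_hashed_file_bytes(field_bytes, record_count,
                               overhead_per_record=14, split_load_percent=80):
    """Ballpark size estimate for a hashed file, per the figures in this thread.

    field_bytes: per-field byte widths (add delimiter bytes here if you count them).
    overhead_per_record: storage overhead per record (14 bytes, as quoted above).
    split_load_percent: fill level before groups split (80% by default, as quoted above).
    """
    bytes_per_record = sum(field_bytes) + overhead_per_record
    raw_bytes = bytes_per_record * record_count
    # Integer ceiling division: space needed if groups are only filled to split_load_percent.
    allowed_bytes = -(-raw_bytes * 100 // split_load_percent)
    return bytes_per_record, raw_bytes, allowed_bytes

per_record, raw, with_split_load = estimate_hashed_file_bytes([10, 3, 8], 15446662)
print(per_record, raw, with_split_load)  # 35 540633170 675791463
```

Integer arithmetic is used for the 80% step so the result rounds up to a whole byte without floating-point surprises.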