How to avoid overruns on Hash Files, FileSets, & DataSets

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chandankambli
Participant
Posts: 14
Joined: Sun Jun 11, 2006 2:16 pm

How to avoid overruns on Hash Files, FileSets, & DataSets

Post by chandankambli »

Hello Experts:

How do we avoid overruns on Hash Files, FileSets, & DataSets?

What measures should be taken when an overrun takes place during job execution?

Please help; this is very important for me to understand.
Thanks experts.
datastage_learner
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What do you mean by overruns? If you mean re-running the same job, then it depends on which option you specify. For example, for hashed files, if you specify "clear file" it will do as it says: clear the file and then load it again.
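For reference, the "clear file" behaviour can also be reproduced by hand from the TCL prompt (Administrator command line or dssh); a minimal sketch, assuming a hashed file with the hypothetical name MyHashedFile in the project account:

    CLEAR.FILE MyHashedFile

This removes all records but leaves the empty file in place, which is effectively what the stage does before each load when "clear file" is set.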
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

DSguru2B wrote:What do you mean by overruns?
Exactly. I didn't understand what the OP meant by 'overrun' either.

Chandan, could you explain what exactly you mean by 'overrun'?

Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Are you looking at situations that can fill up the allocated space for a Hashed File, FileSet, or DataSet?
If so, these conditions are specific to your OS.
Of course, there are ways to alter the defaults.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you are referring to overwriting the created file in other flows, you have the option of rewriting it after clearing it.
If you are referring to memory overflow when the file is loaded into physical memory, that depends on your input data size and the available memory.
If you are referring to running out of disk space, that again has to be judged from the input data size.
With such a vague question, posters can only reply from their own interpretation, so please give specific information to get a specific answer.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
chandankambli
Participant
Posts: 14
Joined: Sun Jun 11, 2006 2:16 pm

Post by chandankambli »

Sorry for the late reply; I was away for the weekend. Anyway, I am referring to the maximum size that these stages allow, say 2 GB.

I have server/parallel jobs, and we get source data from mainframes in this flow:
Seq. file -> Hashed File/DataSet/FileSet -> Database.

How do we handle situations where, during execution of a production job, the data goes beyond those limits?

What measures should be taken when an overrun takes place during job execution?

What checks can we put in place to avoid such situations?
Thanks experts.
datastage_learner
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You need to identify your bottlenecks. Is your data volume changing? What are your server utilizations?
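One quick way to answer the utilization question on a Unix host; a sketch, where /data is a hypothetical mount holding your staging files:

    vmstat 5 3     # CPU, memory and I/O activity, sampled three times at 5-second intervals
    df -k /data    # kilobytes free on the staging filesystem

If volumes are growing, trend these figures across runs rather than reading them once.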
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

You need to estimate the size beforehand.
For example, if you need to store more than 2 GB of data in a hashed file, you can resize the file to 64-bit addressing, as sketched below.

You can use the HFC (Hashed File Calculator) to get estimates of the size of your hashed files.
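A minimal sketch of the resize itself, run from the TCL prompt (dssh) in the project account, assuming a hashed file with the hypothetical name MyHashedFile; the asterisks keep the current type, modulo and separation:

    RESIZE MyHashedFile * * * 64BIT

Take a backup first, and make sure no job has the file open while the resize runs.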
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Hashed files can be created or resized with 64-bit internal pointers to avoid the 2GB limit.

Data Sets and File Sets adjust automatically by creating additional segment files.
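You can see that layout for a Data Set with the orchadmin utility; a sketch, assuming orchadmin is on your PATH, APT_CONFIG_FILE points at your configuration file, and mydataset.ds is a hypothetical descriptor:

    orchadmin describe mydataset.ds    # record schema and layout of the data set
    orchadmin ll mydataset.ds          # list the underlying segment (data) files

If you ever need to delete one, use orchadmin rm rather than removing the segment files by hand, so the descriptor and the segments stay in step.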
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandankambli
Participant
Posts: 14
Joined: Sun Jun 11, 2006 2:16 pm

Post by chandankambli »

In production we sometimes get more than 2 GB of data.

Can you please explain more about the "HFC calculator"?

How do we keep track of whether the load gets bigger than 2 GB during production jobs?

Can you please provide ideas with respect to Hashed Files/DataSets/FileSets?
Thanks experts.
datastage_learner
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

chandankambli wrote:In production we sometimes get more than 2 GB of data.
If you are not sure whether your data will stay below 2 GB, you should go for 64-bit to be on the safe side.

chandankambli wrote:Can you please explain more about the "HFC calculator"?
HFC.exe is available on your DataStage installation CD. You can search the forum for more details on it.

chandankambli wrote:How do we keep track of whether the load gets bigger than 2 GB during production jobs?
If the data gets bigger than 2 GB, you will have problems. One way to check beforehand is sketched below.

chandankambli wrote:Can you please provide ideas with respect to Hashed Files/DataSets/FileSets?
Ray has explained above for Data Sets and File Sets.
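One way to keep track in production is a before-job check (for example via an ExecSH before-job subroutine) that warns before a 32-bit hashed file nears 2 GB. A sketch under assumed paths; a dynamic hashed file is an OS-level directory holding DATA.30 and OVER.30, so its total size can be read with du:

    HF=/projects/prod/MyHashedFile         # hypothetical hashed file location
    LIMIT=1900000                          # ~1.9 GB in KB, leaving some headroom
    SIZE=`du -sk $HF | awk '{print $1}'`   # current size of the hashed file in KB
    if [ "$SIZE" -gt "$LIMIT" ]; then
        echo "WARNING: $HF is ${SIZE} KB, nearing the 2 GB limit" >&2
    fi

Have the routine abort the job or raise an alert on the warning, whichever your operations standards require.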
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
chandankambli
Participant
Posts: 14
Joined: Sun Jun 11, 2006 2:16 pm

Post by chandankambli »

Perfect answers from all the experts. I appreciate your responses.
Thanks experts.
datastage_learner
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

So "overruns" actually meant overloads. If you are satisfied with the answers and have a clear course of action, you can mark the thread as Resolved.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.