Questions regarding Hash files and hash file stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Thanks Ray :)
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Ray / Craig,

Once I've optimized the hashed file and moved all the OVER.30 data into DATA.30, do I have to modify the existing DataStage jobs? By modify I mean change the modulus and other values in the Hashed File stage.

Below are the jobs that touch this hashed file:
1) Initial job - clears the hashed file by inserting one record with @NULL values in all fields
2) Lookup file creation job - inserts all the records into the hashed file
3) Normal job - the regular ETL job that performs lookups against the hashed file

Jobs (1) and (2) run each weekend.
Job (3) runs Monday through Friday.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure if you have a legitimate need to separate #1 and #2 or not, but you certainly don't need to insert any kind of null record to clear a hashed file. Just setting that option is all that is needed; the clear happens whether you write a record to it or not.
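
For what it's worth, that stage option amounts to clearing the file at TCL; a minimal sketch, assuming a hashed file named MYHASH created in the project account (the name is made up):

Code: Select all

CLEAR.FILE MYHASH

The records are removed but the file's creation parameters are untouched.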
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Craig,

How about changing the values in the Hashed File stage in the job after the hashed file has been optimized? Isn't that necessary?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't believe so; they should remain intact even when the hashed file is cleared. If that's wrong, I'm sure Ray will be along shortly with a correction. :wink:

Now, if you dropped and recreated it each time, that would be a different story... there you would have to ensure they were set properly in the Options section of the Hashed File stage.
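
If you want to confirm what the file currently looks like on disk, you can check it at TCL; a sketch, assuming the file MYHASH is visible in the account's VOC:

Code: Select all

ANALYZE.FILE MYHASH

That reports the file type and current sizing, which you can compare against what the stage's Options say.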
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

We're not dropping and recreating them every time.

Thanks Craig :)
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Hello,

1) Why does HFC show a separation value for a dynamic hashed file? Please correct me if I'm wrong, but I thought separation values apply only to static hashed files, and that for dynamic hashed files we can only specify a group size.

If I'm wrong, then how do I specify a separation value for a dynamic hashed file? I don't see an option in Designer. Should it be done via TCL?


2) Does optimizing a dynamic hashed file help only when a subset of records is selected from it?

I ask this question based on one of my posts - viewtopic.php?p=387741


Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

GROUP.SIZE 1 = separation 4
GROUP.SIZE 2 = separation 8
Any other value in HFC will cause a warning message to be displayed.
The CREATE.TABLE statement generated by HFC will include a GROUP.SIZE 2 clause if required; GROUP.SIZE 1 is the default, so it does not need to be included.
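
For illustration only, the equivalent TCL command for a dynamic file with the larger group size might look like this (the file name and minimum modulus are made up):

Code: Select all

CREATE.FILE MYHASH DYNAMIC GROUP.SIZE 2 MINIMUM.MODULUS 5000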
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

What's the purpose/advantage of having a hashed file stage between a transformer and a DRS stage for the lookup process? Why not a direct lookup on the DRS stage, without the hashed file stage?

Does this have something to do with performance?

Thank you.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No idea what the purpose might be - it seems to me to be a silly design.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

:D

Thanks Ray
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define "between". If the DRS stage loads the Hashed File and then the job references the Hashed File there's nothing inherently silly about that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Craig,

Existing job design below

Code: Select all


                DRS
                 |
                 |
           Hashed File
                 |
                 |
DRS-------->Transformer--------->DRS

It's all in a single job, NOT in two separate jobs.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I've done that myself many times. And yes, it can definitely have something to do with performance. :wink:
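
My own reading of why, for anyone following along: the hashed file sits on local disk and its groups can be cached in memory, so each lookup is a keyed local read rather than a per-row SQL round trip to the database. A minimal DS BASIC sketch of such a keyed read (the file and key names are hypothetical):

Code: Select all

* Open a local hashed file and read one record by key
OPEN "MYHASH" TO HashFile ELSE ABORT
READ Rec FROM HashFile, "KEY1" THEN
   Result = Rec<1>   ;* first field of the matching record
END ELSE
   Result = ""       ;* key not found
END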
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Craig,

You said: "And yes, it can definitely have something to do with performance."

"Something"??? :twisted:

Waiting to hear from Ray.