Questions regarding Hash files and hash file stage

Moderators: chulett, rschirm, roy

mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Craig,

Thanks for your quick reply. I was under the assumption that only hash partitioning in parallel jobs works by creating and storing the hash value.

More details would be very helpful.

Thanks,
mav
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The hash value in the partitioning algorithm is divided by the number of nodes and the remainder is the node number to be used. Similarly the hash value in the hashing algorithm (for a hashed file) is divided by the number of groups in the hashed file and the remainder (plus 1) is the group number to be used. This is multiplied by the page size to yield the address of the group in the file.
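The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the modulo step, not the actual DataStage or hashed file implementation: the function names are made up for this example, the hash value is taken as an already-computed integer (the real hashing algorithm is not shown in this thread), and the page size is whatever the file was created with. The address calculation follows the wording of the post literally and does not account for any file header.

```python
def partition_node(hash_value: int, node_count: int) -> int:
    """Parallel job hash partitioning: the remainder is the node number."""
    return hash_value % node_count


def group_number(hash_value: int, group_count: int) -> int:
    """Hashed file: the remainder plus 1 is the group number (groups count from 1)."""
    return hash_value % group_count + 1


def group_address(hash_value: int, group_count: int, page_size: int) -> int:
    """Group number multiplied by the page size, as the post describes it.

    Assumption: any file header or other layout detail is ignored here.
    """
    return group_number(hash_value, group_count) * page_size


if __name__ == "__main__":
    h = 10  # stand-in for a hash value produced by the (unshown) hashing algorithm
    print(partition_node(h, 4))
    print(group_number(h, 4))
    print(group_address(h, 4, 2048))
```

The same hash value always lands on the same node or group as long as the divisor does not change, which is why a changing modulus in a dynamic hashed file means records have to move when groups split or merge.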
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Found the article I was thinking of; you can find it in the Learning Center here. There's also a post here with some discussion and a link to another product's PDF on their dynamic file implementation, which is similar enough to be helpful.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Thanks Craig :)
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Looks like I'm still not clear on this.

Could anyone please explain the following?

No. of groups (modulus) .... 112526 current ( minimum 1, 0 empty, 23893 overflowed, 1602 badly )

What does "112526 current" mean? I assume it is the number of groups. Then what do the rest of the terms mean, and how are they related to "current"? Why don't they add up to 112526? How are the above terms related to the files DATA.30 and OVER.30?

And when I resize the hashed file, why do current and minimum become the same?
No. of groups (modulus) .... 133963 current ( minimum 133963, 0 empty, 8588 overflowed, 0 badly )
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In a dynamic hashed file the number of groups (also known as the modulus) can change second by second as data are added and removed. Hence the term "current": it is the value as at the time the utility (possibly ANALYZE.FILE) was run.

The minimum value for number of groups is set by the MINIMUM.MODULUS keyword in CREATE.FILE or RESIZE commands. I don't believe it should automatically be set to current by RESIZE, but don't doubt your results.

A group consists of one page (or "buffer") in the DATA.30 file and zero or more pages in the OVER.30 file, linked by pointers in a special kind of double-linked list that allows for repairs.

A group with zero pages in OVER.30 is said to be well-tuned. These groups account for the arithmetic discrepancy as they're not explicitly reported.

Empty groups suggest that the file is considerably over-sized, and could be made physically smaller.

A group with one page in OVER.30 is said to be overflowed. It will cease being overflowed when that group splits during the regular dynamic file growth.

A group with more than one page in OVER.30 is said to be badly overflowed, and will require two or more cycles of splits in order not to be overflowed.

A perfectly tuned hashed file has no overflowed, badly overflowed, or empty groups. This is almost impossible to achieve in practical processing (it's also affected by the hashing algorithm and GROUP.SIZE settings), so we aim to minimize the number of overflowed groups.
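To make the arithmetic concrete, here is a small Python sketch applied to the two report lines quoted in the question. It assumes the "empty", "overflowed" and "badly" counts are disjoint categories, which the thread does not state explicitly, and the function names are invented for this illustration. The classification rule is the one given above: zero pages in OVER.30 is not overflowed, one page is overflowed, more than one is badly overflowed.

```python
def classify_group(overflow_pages: int) -> str:
    """Classify a group by how many OVER.30 pages are chained to it."""
    if overflow_pages < 0:
        raise ValueError("overflow_pages cannot be negative")
    if overflow_pages == 0:
        return "not overflowed"
    if overflow_pages == 1:
        return "overflowed"
    return "badly overflowed"


def unreported_groups(current: int, empty: int, overflowed: int, badly: int) -> int:
    """Groups that appear in none of the reported categories.

    Assumption: the reported categories do not overlap.
    """
    return current - empty - overflowed - badly


if __name__ == "__main__":
    # Before the resize: 112526 current, 0 empty, 23893 overflowed, 1602 badly
    print(unreported_groups(112526, 0, 23893, 1602))
    # After the resize: 133963 current, 0 empty, 8588 overflowed, 0 badly
    print(unreported_groups(133963, 0, 8588, 0))
```

Under that assumption, the remainder in each case is the number of groups that hold data entirely within their DATA.30 page, which is why the reported figures do not add up to "current".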
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And here I was thinking about chiming in on this... thank goodness I didn't have time last night. LOL
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Maybe mavrick21 will mull over that answer for four more years before coming back with the next question.
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Thank you!
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

4 years! Didn't realize until you mentioned it.