HASH Partition not working for Checksum values

rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

HASH Partition not working for Checksum values

Post by rohit_mca2003 »

Hi,

I need to join (using the Join stage) on columns that hold MD5 hash values, which are generated by the Checksum stage.

I have the same data in source and target, so I expect all records to match, but the join is not happening properly. I am doing HASH partitioning before the join. When I analysed the output of the HASH partitioning, the record count in each partition was different for source and target.

It seems the partitioning does not happen the same way for source and target.
Please help if you know the reason and how I can resolve this, as I have to join on a column holding MD5 hash values.

Thanks.
Rohit
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

If you hash partition on the MD5 value, it should put matching records together properly. Are you hashing on multiple keys that could be different?
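
To illustrate the point, here is a minimal Python sketch of why identical key values must land on the same partition, and how a difference in the key's representation would break that. It is only an illustration: CRC32 stands in for the partitioner's hash (DataStage's internal function is not shown here), and the names `md5_hex`, `partition_of`, `NUM_PARTITIONS` and the sample payload are all made up for the example. Whether padding or case differences are the actual cause in this job is an assumption that would need checking.

```python
import hashlib
import zlib

NUM_PARTITIONS = 4  # hypothetical node count


def md5_hex(payload: str) -> str:
    """Return the MD5 digest of the payload as a lowercase hex string."""
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition number.

    CRC32 is a stand-in hash, NOT the function DataStage uses.
    The point is only that the result depends on the exact bytes of the key.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Same input on both sides gives the same MD5 string and the same partition.
source_key = md5_hex("customer-42")
target_key = md5_hex("customer-42")
same_partition = (
    partition_of(source_key, NUM_PARTITIONS)
    == partition_of(target_key, NUM_PARTITIONS)
)

# Variants that look "the same" to a person but are different key values:
padded_key = target_key + " "   # e.g. a padded fixed-length column
upper_key = target_key.upper()  # e.g. uppercase hex on one side

for label, key in [("source", source_key), ("padded", padded_key), ("upper", upper_key)]:
    print(label, repr(key), "-> partition", partition_of(key, NUM_PARTITIONS))
```

The padded and uppercase variants are different strings, so nothing guarantees they hash to the same partition as the original, and an equality join would not match them in any case. Comparing the key columns' data type, length and case on both links is a cheap first check.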
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right, was thinking much the same thing.

If that turns out to not be the issue, I for one would need more clarification about certain aspects of this. For example, is the join "not happening properly" because the partitioning is wrong (i.e. matching join keys don't go to the same partition), or do you mean something else? And it seems to me the simplest test to see if your core logic is sound is to run it on a single node. Is that something you've tried?
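
One way to answer the "matching keys in different partitions" question with data rather than counts is sketched below in Python. It assumes you can capture (partition number, join key) pairs from each input link, for example from Peek output; the row format, the function names and the sample values are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, str]  # (partition number, join key) captured from a link


def partitions_by_key(rows: Iterable[Row]) -> Dict[str, Set[int]]:
    """Collect, for every key, the set of partitions it was seen on."""
    mapping: Dict[str, Set[int]] = {}
    for partition, key in rows:
        mapping.setdefault(key, set()).add(partition)
    return mapping


def misplaced_keys(
    source_rows: Iterable[Row], target_rows: Iterable[Row]
) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Return keys present on both sides whose partitions differ."""
    src = partitions_by_key(source_rows)
    tgt = partitions_by_key(target_rows)
    return {
        key: (src[key], tgt[key])
        for key in src.keys() & tgt.keys()
        if src[key] != tgt[key]
    }


# Made-up sample: key "b2" sits on partition 1 in source but 3 in target.
source = [(0, "a1"), (1, "b2"), (2, "c3")]
target = [(0, "a1"), (3, "b2"), (2, "c3")]

print(misplaced_keys(source, target))
```

If this comes back empty while the join still drops rows, the keys that fail to match are probably not byte-for-byte equal in the first place, which points back at the key columns rather than the partitioner.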
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003 »

When I say the join is not happening properly, I mean that if I run the join sequentially or with Entire partitioning, it works fine.
But with HASH partitioning, the partitioning does not seem to work correctly, and records with the same key from the two sides seem to end up on different partitions.
Rohit
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... which unfortunately doesn't answer the question asked.
-craig
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... in continuing to ponder this, it seems we can infer an answer. So the join itself is in fact working, assuming that "in sequence" means "sequentially", a.k.a. either on a single node or with the stage constrained. Which means we're back to the question: exactly what are you partitioning on? Please detail that information for us (words, screenshot).
-craig