How to prevent CHECKSUM stage to re-arrange column names

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

How to prevent CHECKSUM stage to re-arrange column names

Post by rohit_mca2003 »

Hi Everyone,

I have issue while computing hash value using 'CHECKSUM' stage. It seems that CHECKSUM stage re-arranges the columns as per their names while computing the hash value.

Example:
----------
Source --> Col1, Col2, Test_Val

Case1: If I generate checksum keeping order as (Col1, Col2, Test_Val)
Case2: If I generate checksum keeping order as (Col1, Test_Val, Col2).

In both cases, I get same checksum value (but order of column I passed was different).

Is there any way I can restrict this stage to re-arrange the columns while computing the hash value.

Thanks.
Rohit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can't answer your question but I suspect the answer is 'no'. But I have to ask - does it really matter? What's important (IMHO) are the columns selected and the fact that they are used / ordered in a consistent manner. As someone who has had to clean up a mess made by developers hashing the same columns in a different order, I would gladly take that aspect of it out of their hands. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003 »

Thanks Craig.
I need to generate CHECKSUM based on the column order as there is requirement to have keys in specific order.
Existing application is running in production where hash was generated based on order of columns and to match with existing hash values we need to follow same order of keys/columns.
Rohit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Best to open a support case then, the documentation doesn't seem to show that as an option. Out of curiosity, what 'existing application' was used to generate the checksum and what did it use to generate them? I've had 'issues' trying to match checksums from different systems, hence the question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
abhiramk
Participant
Posts: 2
Joined: Tue Dec 22, 2015 2:34 pm
Location: India

Post by abhiramk »

Before computing the checksum, the Checksum stage arranges the columns in the alphabetical order of column names (irrespective of the order they appear in your input column tab).

I have tried and tested a couple of possibilities and there seem to be no option to rearrange the column order in which checksum is calculated. It always takes the alphabetical order.

To add more, the checksum stage adds a pipe symbol at the end of each column and calculates HASH using MD5 algorithm.

For e.g, if you are using two columns COL1 and COL2 to compute HASH/Checksum, COL1 holds ALICE and COL2 holds BOB, checksum is computed on ALICE|BOB|

This "ALICE|BOB|" string doesn't change even if COL2 is your first column and COL1 is your second column. The columns are first arranged as COL1,COL2 and a pipe operator is inserted at the end of each column and checksum is computed using MD5 hash algorithm.

So, even if there is an option to rearrange the columns (which obviously is a speculation), the hash computed by your application will not match with the Datastage computed hash unless your application too adds a pipe symbol at the end of each column.
Cheers
Abhiram
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

can you DIY?

if you explicitly concat your data into a single column the checksum will be in the order you specify with any delimiters you specify (or not).

You can also toss the internal checksum and use an external one that is shared across your platforms, instead of trying to replicate datastage's checksum externally.
rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003 »

chulett wrote:Best to open a support case then, the documentation doesn't seem to show that as an option. Out of curiosity, what 'existing application' was used to generate the checksum and what did it use to generate them? I've had 'issues' trying to match checksums from different systems, hence the question.
Earlier data was feeded from database so it is easy to concat the columns in SQL but the source has been changed to file and we can not read columns as concatenated.
Rohit
rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003 »

To answer everyone's query. we have resolved this issue and may be would be good use case for future.

I am aware that DataStage puts '|' after each field passed to 'checksum' operator and it uses HASH MD5.

I tried 2 solutions and both worked:

1. In generic stage, I wrote small code for 'transform' operator and concatenate the fields in order which I wanted. As my job is generic is nature so I used parameters for this. Then passed that concatenated column to 'checksum' operator.

2. We built logic to rename and prefix (using alphabet and numbers) all the source columns in such a way that even after re-arrangement of as per column names will be in same order.

Thanks for all the help. I am marking this as resolved with workaround.
Anyways I will raise a ticket for this.
Rohit
Post Reply