DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
This topic is not resolved, but there is a WORKAROUND.
Author Message
rohit_mca2003
Participant



Joined: 08 Oct 2008
Posts: 36

Points: 437

Post Posted: Mon Jan 15, 2018 4:37 am Reply with quote    Back to top    

DataStage® Release: 9x
Job Type: Parallel
OS: Unix
Additional info: CHECKSUM stage re-arranges the columns while computing hash
Hi Everyone,

I have issue while computing hash value using 'CHECKSUM' stage. It seems that CHECKSUM stage re-arranges the columns as per their names while computing the hash value.

Example:
----------
Source --> Col1, Col2, Test_Val

Case1: If I generate checksum keeping order as (Col1, Col2, Test_Val)
Case2: If I generate checksum keeping order as (Col1, Test_Val, Col2).

In both cases, I get same checksum value (but order of column I passed was different).

Is there any way I can restrict this stage to re-arrange the columns while computing the hash value.

Thanks.

_________________
Rohit
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42479
Location: Denver, CO
Points: 218511

Post Posted: Mon Jan 15, 2018 8:57 am Reply with quote    Back to top    

Can't answer your question but I suspect the answer is 'no'. But I have to ask - does it really matter? What's important (IMHO) are the columns selected and the fact that they are used / ordered in a consistent manner. As someone who has had to clean up a mess made by developers hashing the same columns in a different order, I would gladly take that aspect of it out of their hands. Wink

_________________
-craig

I know I don't say this enough, but I like when you talk to me. It's much better than when nobody talks to me. Or when people that I don't like will not stop talking to me.
Rate this response:  
Not yet rated
rohit_mca2003
Participant



Joined: 08 Oct 2008
Posts: 36

Points: 437

Post Posted: Mon Jan 15, 2018 10:10 pm Reply with quote    Back to top    

Thanks Craig.
I need to generate CHECKSUM based on the column order as there is requirement to have keys in specific order.
Existing application is running in production where hash was generated based on order of columns and to match with existing hash values we need to follow same order of keys/columns.

_________________
Rohit
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42479
Location: Denver, CO
Points: 218511

Post Posted: Tue Jan 16, 2018 8:15 am Reply with quote    Back to top    

Best to open a support case then, the documentation doesn't seem to show that as an option. Out of curiosity, what 'existing application' was used to generate the checksum and what did it use to generate them? I've had 'issues' trying to match checksums from different systems, hence the question.

_________________
-craig

I know I don't say this enough, but I like when you talk to me. It's much better than when nobody talks to me. Or when people that I don't like will not stop talking to me.
Rate this response:  
Not yet rated
abhiramk
Participant



Joined: 22 Dec 2015
Posts: 1
Location: India
Points: 23

Post Posted: Thu Feb 08, 2018 5:11 am Reply with quote    Back to top    

Before computing the checksum, the Checksum stage arranges the columns in the alphabetical order of column names (irrespective of the order they appear in your input column tab).

I have tried and tested a couple of possibilities and there seem to be no option to rearrange the column order in which checksum is calculated. It always takes the alphabetical order.

To add more, the checksum stage adds a pipe symbol at the end of each column and calculates HASH using MD5 algorithm.

For e.g, if you are using two columns COL1 and COL2 to compute HASH/Checksum, COL1 holds ALICE and COL2 holds BOB, checksum is computed on ALICE|BOB|

This "ALICE|BOB|" string doesn't change even if COL2 is your first column and COL1 is your second column. The columns are first arranged as COL1,COL2 and a pipe operator is inserted at the end of each column and checksum is computed using MD5 hash algorithm.

So, even if there is an option to rearrange the columns (which obviously is a speculation), the hash computed by your application will not match with the Datastage computed hash unless your application too adds a pipe symbol at the end of each column.

_________________
Cheers
Abhiram
Rate this response:  
Not yet rated
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 302

Points: 2987

Post Posted: Mon Feb 12, 2018 10:05 am Reply with quote    Back to top    

can you DIY?

if you explicitly concat your data into a single column the checksum will be in the order you specify with any delimiters you specify (or not).

You can also toss the internal checksum and use an external one that is shared across your platforms, instead of trying to replicate datastage's checksum externally.
Rate this response:  
Not yet rated
rohit_mca2003
Participant



Joined: 08 Oct 2008
Posts: 36

Points: 437

Post Posted: Fri Mar 02, 2018 3:07 am Reply with quote    Back to top    

chulett wrote:
Best to open a support case then, the documentation doesn't seem to show that as an option. Out of curiosity, what 'existing application' was used to generate the checksum and what did it use to generate them? I've had 'issues' trying to match checksums from different systems, hence the question.


Earlier data was feeded from database so it is easy to concat the columns in SQL but the source has been changed to file and we can not read columns as concatenated.

_________________
Rohit
Rate this response:  
Not yet rated
rohit_mca2003
Participant



Joined: 08 Oct 2008
Posts: 36

Points: 437

Post Posted: Fri Mar 02, 2018 3:24 am Reply with quote    Back to top    

To answer everyone's query. we have resolved this issue and may be would be good use case for future.

I am aware that DataStage puts '|' after each field passed to 'checksum' operator and it uses HASH MD5.

I tried 2 solutions and both worked:

1. In generic stage, I wrote small code for 'transform' operator and concatenate the fields in order which I wanted. As my job is generic is nature so I used parameters for this. Then passed that concatenated column to 'checksum' operator.

2. We built logic to rename and prefix (using alphabet and numbers) all the source columns in such a way that even after re-arrangement of as per column names will be in same order.

Thanks for all the help. I am marking this as resolved with workaround.
Anyways I will raise a ticket for this.

_________________
Rohit
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours