DataStage MD5 Implementation
Moderators: chulett, rschirm, roy
Does anyone have an MD5 "Stage" for DataStage 11.5, or tips on how to create one?
Also, it would need to match the Perl Digest::MD5 results.
Well, given that MD5 is an industry standard, any correct MD5 implementation should spit out the same result for the same input bytes.
Introducing Perl to the mix just to calculate that might be overkill... it depends how you would call it, of course. If you spin up Perl to calculate the MD5 for each row, that could be costly (up/down, up/down, up/down, etc.).
An external program that reads in a file and appends the MD5 value to each record... possible.
A Routine that computes the MD5, called from your Transformer stage... possible.
Not sure if any databases out there have MD5 functions that can be called via a stored procedure.
I know I'm resurrecting an old thread, but I just encountered this, and it IS now documented.
The Checksum stage does use MD5. Unfortunately, it silently changes the data being hashed, so its result won't match an externally generated hash value unless the external process also adds pipes to the data values in the appropriate places.
DataStage Checksum stage, how is the result computed?
http://www-01.ibm.com/support/docview.w ... wg22009454
You do have to add the pipes to do a proper checksum. Without a separator between fields, your checksum could get false positive matches. The main issue is that the stage also adds a pipe onto the end of the string you are performing the checksum on. Most manually coded MD5 functions only add separators between fields and not an extra one on the end. You can't remove that last pipe.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
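To make the trailing-pipe point concrete, here is a minimal sketch of how an external process might build the string to hash so that it lines up with what is described above: every field followed by a pipe, including the last one. The function name build_checksum_input and its signature are hypothetical, and the sketch assumes plain string fields only; how the Checksum stage treats nulls, numerics, or padding is not covered here and should be checked against the IBM technote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a DataStage API): writes "f1|f2|...|fn|" into out,
 * i.e. a pipe after EVERY field, including the last one, as the post above
 * describes for the Checksum stage.
 * Returns 0 on success, -1 if out is too small. */
int build_checksum_input(const char *const fields[], size_t nfields,
                         char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < nfields; i++) {
        size_t len = strlen(fields[i]);

        /* Need room for the field, one '|' and the terminating NUL. */
        if (used + len + 1 >= outsize)
            return -1;

        memcpy(out + used, fields[i], len);
        used += len;
        out[used++] = '|';
        out[used] = '\0';
    }
    return 0;
}
```

With the fields "A" and "BC" this produces "A|BC|". That string, not "A|BC", is what you would then feed to the external MD5 function.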
Interesting.
My classic example when teaching newbies about this keeps it simple. Imagine a record with two fields, with "A" in the first field and "BC" in the second. And then a change comes through with "AB" in the first and "C" in the second.
Without the pipes (or some separator):
1: ABC
2: ABC
The checksum would be identical.
Once more with feeling (and pipes):
1: A|BC
2: AB|C
And you're golden.
:D
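The same example as a small C sketch, for anyone who wants to see it fall out of code. The helper join_fields is hypothetical and only puts the separator between fields (no trailing one); pass an empty separator to get plain concatenation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins fields into out with sep BETWEEN them.
 * Returns 0 on success, -1 if out is too small. */
int join_fields(const char *const fields[], size_t nfields,
                const char *sep, char *out, size_t outsize)
{
    size_t used = 0;
    size_t seplen = strlen(sep);

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < nfields; i++) {
        size_t len = strlen(fields[i]);
        size_t extra = (i > 0) ? seplen : 0;

        /* Need room for the separator (if any), the field and the NUL. */
        if (used + extra + len >= outsize)
            return -1;

        if (i > 0) {
            memcpy(out + used, sep, seplen);
            used += seplen;
        }
        memcpy(out + used, fields[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Joining {"A", "BC"} and {"AB", "C"} with "" gives "ABC" both times, so any hash of the two is identical; with "|" you get "A|BC" and "AB|C". Note that a separator only helps if it cannot appear in the data itself: {"A|", "B"} and {"A", "|B"} both join to "A||B".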
-craig
"You can never have too many knives" -- Logan Nine Fingers
Sorry for jumping in a bit late on this thread - I've been offline. I've built MD5 Operators a couple of times; they are quite easy. One can either find public-domain C++ code on the net or use a small interlude program which calls the libcrypto (OpenSSL) MD5 algorithm.
Compile this as a library on your OS and then create a Parallel routine definition as an "external function" where the library path points to the library object built above. It has one return value of type "char*" and one input parameter of the same type.
Code:
    #include <stdio.h>        // sprintf
    #include <string.h>       // strlen
    #include <openssl/md5.h>  // MD5_CTX, MD5_Init, MD5_Update, MD5_Final

    // Called from DataStage: returns the MD5 of InString as 32 lowercase hex characters.
    // The result lives in a static buffer, so each call overwrites the previous result.
    char* md5(char* InString) {
        unsigned char digest[MD5_DIGEST_LENGTH];  // raw 16-byte digest
        static char mdString[33];                 // 32 hex characters plus terminating NUL
        MD5_CTX ctx;                              // MD5 context structure

        MD5_Init(&ctx);                                // initialize the context
        MD5_Update(&ctx, InString, strlen(InString));  // feed in the input bytes
        MD5_Final(digest, &ctx);                       // write the digest and clear the context
        for (int i = 0; i < MD5_DIGEST_LENGTH; i++) {  // hex-encode each of the 16 digest bytes
            sprintf(&mdString[i*2], "%02x", (unsigned int)digest[i]);
        }
        return mdString;                               // pointer to the static hex string
    }