Page 1 of 1

Advice on joining on 'unexact' match

Posted: Wed Jan 30, 2013 8:36 am
by fit_dsx
Need some advice. I don't know if I should be using QualityStage (for which I have no experiece) or DataStage or just down and dirty SQL.

Problem:
I have invoice numbers in a column in a table from the AR System.

I have invoice numbers in a a column from the ledger system.

I need to be able to join on these two columns.

However, the invoice numbers in the ledger system are the same as the AR System EXCEPT for extraneous data appended to the invoice in the ledger system so it is not an exact match. Also, it is not consistent. For example, data from different countries might have different data appended, so it is not a straight-forward RULE.

I seemingly need to do a 'like' join somehow. or have many 'CASE' statements that define each RULE to clean up the extraneous data appended to the invoice number and then do the join.

I am not sure how to best attack this problem. Does anyone have any thoughts?

Posted: Wed Jan 30, 2013 8:55 am
by Nagaraj
if i have understood it correctly.

Format the ledger system column data in a standard format so that it joins exactly with AR system,

stripping of the extraneous data in a separate column and append it later when you are done with join for the ledger system record by having an identifier too upstream for both the sources.

Posted: Wed Jan 30, 2013 11:16 am
by fit_dsx
Thaks for the reply...Sorry..I am not sure if are you restating my problem statement or giving me a solution?

My problem is how to 'standardize' the invoice columns so that a match can be made between the two systems

The 'REAL' invoice numbers are 'contained' within the LEDGER SYSTEM column. There are extra characters and no specific rules as to where the extra characters are. There could be 3 characters at the beginning...there could be 4 characters at the end..etc...etc

Example Ledger System Invoice : 0000123456
Example Ledger System Invoice : 00123456XXX
Example Ledger System Invoice : XX123456000

Example AR System Table Invoice: 123456 (I think this can be declared the STANDARD)

Is there a way within QualityStage and or DataStage to be able to systematically cleanse the data from the Ledger System so that it is known that the invoice is really: 123456 so that a join can be made between the two tables from the two systems?

Posted: Wed Jan 30, 2013 1:25 pm
by ray.wurlod
You can perform a Reference Match using QualityStage and UNCERT as the match rule.

Posted: Wed Jan 30, 2013 7:19 pm
by stuartjvnorton
You could use an uncert as Ray says, but if there is no junk at the ends then you could match different numbers.

Do you have a known consistent format in your AR table? Eg it's always 6 digits, so you have 000123 and not 123? In the ledger table?
The key to a solution is working out which behaviours you can depend on and the ones you need to account for.

Posted: Wed Jan 30, 2013 7:32 pm
by ray.wurlod
Good points.

Posted: Thu Jan 31, 2013 7:25 am
by rjdickson
This data may benefit from a custom Standardization rule that removes leading and trailing alpha characters and zeroes..

Posted: Thu Jan 31, 2013 4:29 pm
by ray.wurlod
... but maybe not ALL trailing zeroes. Think it through. You will need to keep six numeric characters.

Posted: Thu Jan 31, 2013 4:42 pm
by rjdickson
Well, yes of course! :D