Advice on joining on 'unexact' match

Infosphere's Quality Product

Moderators: chulett, rschirm

Post Reply
fit_dsx
Premium Member
Premium Member
Posts: 2
Joined: Mon Feb 27, 2012 4:32 pm

Advice on joining on 'unexact' match

Post by fit_dsx »

Need some advice. I don't know if I should be using QualityStage (for which I have no experiece) or DataStage or just down and dirty SQL.

Problem:
I have invoice numbers in a column in a table from the AR System.

I have invoice numbers in a a column from the ledger system.

I need to be able to join on these two columns.

However, the invoice numbers in the ledger system are the same as the AR System EXCEPT for extraneous data appended to the invoice in the ledger system so it is not an exact match. Also, it is not consistent. For example, data from different countries might have different data appended, so it is not a straight-forward RULE.

I seemingly need to do a 'like' join somehow. or have many 'CASE' statements that define each RULE to clean up the extraneous data appended to the invoice number and then do the join.

I am not sure how to best attack this problem. Does anyone have any thoughts?
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

if i have understood it correctly.

Format the ledger system column data in a standard format so that it joins exactly with AR system,

stripping of the extraneous data in a separate column and append it later when you are done with join for the ledger system record by having an identifier too upstream for both the sources.
fit_dsx
Premium Member
Premium Member
Posts: 2
Joined: Mon Feb 27, 2012 4:32 pm

Post by fit_dsx »

Thaks for the reply...Sorry..I am not sure if are you restating my problem statement or giving me a solution?

My problem is how to 'standardize' the invoice columns so that a match can be made between the two systems

The 'REAL' invoice numbers are 'contained' within the LEDGER SYSTEM column. There are extra characters and no specific rules as to where the extra characters are. There could be 3 characters at the beginning...there could be 4 characters at the end..etc...etc

Example Ledger System Invoice : 0000123456
Example Ledger System Invoice : 00123456XXX
Example Ledger System Invoice : XX123456000

Example AR System Table Invoice: 123456 (I think this can be declared the STANDARD)

Is there a way within QualityStage and or DataStage to be able to systematically cleanse the data from the Ledger System so that it is known that the invoice is really: 123456 so that a join can be made between the two tables from the two systems?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can perform a Reference Match using QualityStage and UNCERT as the match rule.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You could use an uncert as Ray says, but if there is no junk at the ends then you could match different numbers.

Do you have a known consistent format in your AR table? Eg it's always 6 digits, so you have 000123 and not 123? In the ledger table?
The key to a solution is working out which behaviours you can depend on and the ones you need to account for.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Good points.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

This data may benefit from a custom Standardization rule that removes leading and trailing alpha characters and zeroes..
Regards,
Robert
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... but maybe not ALL trailing zeroes. Think it through. You will need to keep six numeric characters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

Well, yes of course! :D
Regards,
Robert
Post Reply