Reference Match Performance

Infosphere's Quality Product

Moderators: chulett, rschirm

kevink
Participant
Posts: 7
Joined: Wed Oct 16, 2013 2:14 pm
Location: Sydney

Post by kevink »

We have a reference match on standardized address and area data. The reference data set has 25 million rows and loads into the job at only 1,700 rows per second. The source data set contains only 12,000 rows per run.

Can the experts please share ways to improve the performance of a reference match job? We would like to run this match hourly, but it currently takes 3.5 hours. :(
Kevin K Tashadow
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the source and reference data share a common key, you can load the source data keys into a temporary table and extract the reference data by joining that table to the actual reference table, so that the job processes only the reference records that are actually needed.
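
The temporary-table idea can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not a QualityStage or DataStage job; the table names (reference, src_keys), the column names (match_key, address) and the sample rows are all hypothetical, and it assumes the reference data sits in a database where such a join is possible.

```python
import sqlite3


def extract_needed_reference(conn, source_keys):
    """Return only the reference rows whose key appears in the source batch.

    Table and column names here are hypothetical placeholders.
    """
    cur = conn.cursor()
    # Temporary table holding just the keys of the current source batch.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS src_keys (match_key TEXT PRIMARY KEY)"
    )
    cur.execute("DELETE FROM src_keys")  # clear keys left over from a previous batch
    cur.executemany(
        "INSERT OR IGNORE INTO src_keys (match_key) VALUES (?)",
        ((key,) for key in source_keys),
    )
    # Join the small key table to the large reference table so that only
    # the reference rows that can possibly match are extracted.
    cur.execute(
        """
        SELECT r.match_key, r.address
        FROM reference AS r
        JOIN src_keys AS s ON s.match_key = r.match_key
        ORDER BY r.match_key
        """
    )
    return cur.fetchall()


# Tiny in-memory stand-in for the reference data (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reference (match_key TEXT PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO reference VALUES (?, ?)",
    [
        ("2000", "1 GEORGE ST SYDNEY"),
        ("2010", "5 CROWN ST SURRY HILLS"),
        ("3000", "9 COLLINS ST MELBOURNE"),
    ],
)

# Source keys with no counterpart in the reference table simply drop out.
rows = extract_needed_reference(conn, ["2000", "3000", "9999"])
```

With an index on the reference key, the database only has to look up the source batch's keys rather than hand all of the reference rows to the match job.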
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kevink
Participant
Posts: 7
Joined: Wed Oct 16, 2013 2:14 pm
Location: Sydney

Post by kevink »

Unfortunately, the two data sets have no common key. Is there any other strategy we might be able to use?
Kevin K Tashadow