Manipulate standardization rule

Infosphere's Quality Product

nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Manipulate standardization rule

Post by nilanjan »

Hi,
I want to manipulate the standardization output columns. Can I do that? Suppose I want to generate an RVSNDX column for Primaryword1, Primaryword2 and Primaryword3 together. Can I do that by concatenating all three columns?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes you can, but it's probably a waste of time. RVSNDX (and Soundex) will only look at 4-6 characters.
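To make the point concrete, here is a minimal Python sketch of the classic 4-character American Soundex plus a "reverse" variant that encodes the string back to front. It is not QualityStage's RVSNDX implementation (the details of that are an assumption here); the function names and the sample company name are made up for illustration.

```python
# Classic American Soundex letter groups; vowels and Y have no code.
CODES = {
    ch: digit
    for letters, digit in [
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    ]
    for ch in letters
}


def soundex(text, length=4):
    """Return a Soundex key: first letter plus (length - 1) digits."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(key) == length:
            break  # key is full; the rest of the string is never examined
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return key.ljust(length, "0")


def reverse_soundex(text, length=4):
    """Soundex of the reversed string (sketch of a reverse-Soundex key)."""
    return soundex(text[::-1], length)


words = ["INTERNATIONAL", "BUSINESS", "MACHINES"]
joined = "".join(words)

print(soundex(joined), soundex(words[0]))                    # I536 I536
print(reverse_soundex(joined), reverse_soundex(words[-1]))   # S525 S525
```

In this sketch the forward key of the concatenation is just the key of the first word, and the reverse key is just the key of the last word. The middle word never contributes, which is why concatenating the three columns buys very little.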
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For those particular combinations you are correct. It is not possible to assert that you are generally correct. Other words are similar at the right-hand end, particularly names of corporate entities.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You're obviously not using QualityStage to do this soundex. Soundex in a Transformer stage?

If you want a soundex that works on longer strings, you'll have to write it yourself. Though why you would, I'm not sure: phonetically it's quite loose, and it also falls down where the first letters of the strings differ.
So KrispyKreeme and CrispyKreeme will never match, regardless of how long you make the key.
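A quick Python sketch of that behaviour, assuming the classic American Soundex rules with an adjustable key length (the `length` parameter and function names are illustrative, and this is not the QualityStage code). It also shows why a reverse key is useful alongside the forward one.

```python
# Classic American Soundex letter groups; vowels and Y have no code.
CODES = {
    ch: digit
    for letters, digit in [
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    ]
    for ch in letters
}


def soundex(text, length=4):
    """First letter kept as-is, followed by (length - 1) digit codes."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(key) == length:
            break
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return key.ljust(length, "0")


def reverse_soundex(text, length=4):
    """Soundex of the reversed string, so the last letter leads the key."""
    return soundex(text[::-1], length)


a, b = "KrispyKreeme", "CrispyKreeme"

# The first letter is copied into the key verbatim, so a longer key
# cannot make these two agree.
for n in (4, 8):
    print(n, soundex(a, n), soundex(b, n))

# Read from the right-hand end, the differing letter is reached last,
# and K and C share the same digit code anyway.
for n in (4, 8):
    print(n, reverse_soundex(a, n), reverse_soundex(b, n))
```

The forward keys differ at every length because K and C are kept as literal first letters, while the reverse keys agree in this sketch.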

If you want to use the ruleset properly (and back towards the original question), you'll have to look at the fields you are given and understand what it does. If you need to, you can change the PAT and DCT files to add RVSNDX and NYSIIS fields to MatchPrimaryWord3 (and 4 and 5 if need be) as well.
nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Post by nilanjan »

Stuart,
Yes, you are right. I want to implement a more powerful phonetic algorithm (reverse Soundex, Metaphone, Double Metaphone, etc.), but as I am new to this tool, I really don't understand how to do that. I have never changed PAT or DCT files. I guess it is very tricky to change anything in those files. Can you describe in detail how I can do that?
nilanjan
Participant
Posts: 16
Joined: Fri Jan 18, 2013 4:12 am

Post by nilanjan »

ray.wurlod wrote:For those particular combinations you are correct. It is not possible to assert that you are generally correct. Other words are similar in the right hand end, particularly names of corporate entitie ...
Ray,
As I am not a premium user, I am unable to see your complete reply.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You can't change the phonetic algorithms that are used in the QS rulesets.
They are part of the PAL.

The DCT file is just the output metadata for the ruleset. The QS user guide will explain it to you. As for the PAT file, read the Pattern Action Language Reference to understand what is in there. If I were doing it, I'd look at how it currently populates MatchPrimaryWord2NYSIIS and MatchPrimaryWord2RVSNDX, and apply the same logic to MatchPrimaryWord3.

You would be able to write your own custom function in C/C++ to implement any phonetic algorithm and use it from within a Transformer stage (although some, like Double Metaphone, may produce two output strings, so that will affect how you use it). That is not QualityStage, however.
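As a rough illustration of the two-output issue, the Python sketch below treats each name as a (primary, secondary) key pair and calls two names a candidate match when any non-empty key is shared. The real routine would be C/C++ called from the Transformer; the key values are hard-coded examples of the kind a Double Metaphone implementation returns, not computed here, and the function and column names are hypothetical.

```python
def keys_match(a, b):
    """True when any non-empty key of a equals any non-empty key of b."""
    return bool({k for k in a if k} & {k for k in b if k})


def to_columns(keys, prefix="Name"):
    """Spread a (primary, secondary) pair over two output columns.

    The column names are hypothetical; an empty secondary key stays empty.
    """
    primary, secondary = keys
    return {f"{prefix}DM1": primary, f"{prefix}DM2": secondary}


# Illustrative (primary, secondary) pairs, hard-coded for the sketch.
smith = ("SM0", "XMT")
schmidt = ("XMT", "SMT")
jones = ("JNS", "ANS")

print(keys_match(smith, schmidt))  # True: "XMT" appears in both pairs
print(keys_match(smith, jones))    # False: no key in common
print(to_columns(smith))
```

The design choice to note is that one phonetic column is no longer enough: either both keys are carried as separate columns and compared crosswise, or the record is emitted once per key.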