Convert function in rule set

Infosphere's Quality Product

BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev »

Hi,

I need to drop an invalid input value by looking it up in a reference table in one specific rule set.

So I created a reference table with two columns: the first holds the actual value to be matched against the input record, and the second holds the word INVALID.
Then I made the following modification in my .PAT file:


? |?|? |? |? |? |?

COPY_S [1] Person_NamePrefix
COPY_S [2] TEMP
CONVERT TEMP @INVALIDNAME.TBL TKN
[{TEMP}= "INVALID"]
RETYPE [2] 0

But I dont see desired outcome by testing this rule set. The invalid value is still getting populated.

Any kind of suggestions are much welcomed.

Thanks
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

This is because in your second pattern test, you did not include the original pattern, so [2] is not populated. Try the following instead:

? |?|? |? |? |? |? 
COPY_S [1] Person_NamePrefix 
COPY_S [2] TEMP 
CONVERT TEMP @INVALIDNAME.TBL TKN 

? |?|? |? |? |? |? | [{TEMP}= "INVALID"] 
RETYPE [2] 0 
The "trick" is to repeat the original pattern, and add the test for the temporary variable.
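To make the lookup-then-test logic concrete, here is a minimal Python sketch of what the corrected snippet is meant to do. It is a conceptual model, not QualityStage code: it assumes INVALIDNAME.TBL is a two-column, whitespace-separated file (value, then INVALID), treats the seven name fields as a list, and represents RETYPE [2] 0 as blanking the second field. All function names are hypothetical.

```python
# Conceptual model only (not QualityStage syntax): mimics the corrected
# COPY_S / CONVERT / condition / RETYPE sequence for name field [2].
# Assumption: INVALIDNAME.TBL holds lines like "FOO  INVALID".


def load_convert_table(lines):
    """Parse two-column lines (value, replacement) into an upper-cased dict."""
    table = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            table[parts[0].upper()] = parts[1].strip()
    return table


def convert(value, table):
    """Table lookup as modeled here: replace when found, else keep as-is."""
    return table.get(value.upper(), value)


def process_record(fields, table):
    """fields: one token per operand of the 7-operand pattern."""
    result = list(fields)
    # First pattern-action set: COPY_S [2] TEMP, then CONVERT TEMP.
    temp = convert(result[1], table)
    # Second set: same pattern again, so operand [2] exists and the
    # [{TEMP} = "INVALID"] test can act on it.
    if temp == "INVALID":
        result[1] = ""  # stands in for RETYPE [2] 0 (value dropped)
    return result


table = load_convert_table(["FOO\tINVALID", "BAR\tINVALID"])
print(process_record(["MR", "FOO", "A", "B", "C", "SMITH", "JR"], table))
# ['MR', '', 'A', 'B', 'C', 'SMITH', 'JR']
```

In the real rule set, the second step only works because the pattern is repeated: a bare condition line has no matched operands, so there is no [2] to retype.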

I hope this helps!
Regards,
Robert
BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev »

Thanks for the quick suggestion.
For some reason it is still not dropping the invalid words passed in the input.

I placed this code at the beginning of my .PAT file, right after the delimiter checks.
I also tried changing ? to + to match the single word I am passing in my test, but still no success.

Here is the full code.
+|+|+|+|+ |+ |+|
COPY_S [1] Person_NamePrefix
COPY_S [2] TEMP
CONVERT TEMP @INVALIDNAME.TBL TKN

+|+|+|+|+ |+ |+ | [{TEMP}= "INVALID"]
RETYPE [2] 0

COPY_S [3] Person_GivenName2
COPY_S [4] Person_GivenName3
COPY_S [5] Person_GivenName4
COPY_S [6] Person_FamilyName
COPY_S [7] Person_NameSuffix

The requirement is to drop an invalid word appearing in any of the person name fields, but I started by developing code for the second field.

Thank you in advance!
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Can you provide an example of the invalid words? Are they words, for example, that are valid in other contexts?
Regards,
Robert
BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev »

Actually, I am trying to drop any vulgar words coming from the source file.
The table has only one word per row because I had to put one word per line for classification and tokenization purposes.

You can imagine any common vulgar word; I don't want to get blocked for posting them.

:D
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That could be counter-productive, and probably as fraught as the Indian government's stated intention to block any internet web site that contains "inappropriate" content. This is clearly a movable feast; what's appropriate today, or to one political party, might not be tomorrow or to a different political party.

"Vulgar" also has to be taken in context. A "bastard file" is a perfectly valid name.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Understood :lol:

Another way to deal with your issue is to use overrides. Let's assume 'FOO' is one of the words you want to remove; add a classification override for FOO with a class of 0 (zero). Class 0 means 'drop' the data. So, if your input were
RAY FOO WURLOD
then the 'input pattern' would be F+ (assuming USNAME), because FOO was dropped.
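The effect of a class-0 override on the input pattern can be sketched in Python. This is a toy model, not the QualityStage classifier: the dictionary stands in for the classification table plus the override, the class F for RAY follows the USNAME example above, and any unlisted word is given the unknown-word class '+' (all assumptions for illustration).

```python
# Conceptual model only, not QualityStage code: how a class-0 override
# removes a token before the input pattern is formed.

UNKNOWN = "+"  # class for a single word not found in the table
NULL = "0"     # class 0: token is dropped


def input_pattern(text, classes):
    """Return (kept_tokens, pattern) for whitespace-separated input."""
    kept, pattern = [], []
    for token in text.upper().split():
        cls = classes.get(token, UNKNOWN)
        if cls == NULL:
            continue  # dropped: contributes nothing to the pattern
        kept.append(token)
        pattern.append(cls)
    return kept, "".join(pattern)


# Hypothetical entries: RAY as a first name (F), plus the override FOO -> 0.
classes = {"RAY": "F", "FOO": NULL}
print(input_pattern("RAY FOO WURLOD", classes))  # (['RAY', 'WURLOD'], 'F+')
```

Because the dropped token never reaches the pattern, every later pattern-action rule sees only RAY and WURLOD.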

Does this approach work for you?
Regards,
Robert
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Maybe I'm not thinking this out enough, but if you have a list of inappropriate words, why not just put them in the CLS file?

NAUGHTYWORD BLEEP X


Then in the PAT, up-front where the initial retypes get done:

0*X
retype [1] 0

Gets rid of all inappropriate words, regardless of where they occur.
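A rough Python model of this two-step approach: classify tokens with a .CLS-like lookup (token, standard form, class), then retype every class-X token to 0 and drop class-0 tokens. It models only the effect described in the post (every occurrence, anywhere in the input); it does not model how the pattern 0*X itself is evaluated, and the table contents are hypothetical.

```python
# Conceptual model only: classify, retype class X to 0, drop class 0.


def classify(tokens, cls_table):
    """cls_table maps TOKEN -> (standard_form, class), like a .CLS line."""
    out = []
    for tok in tokens:
        std, cls = cls_table.get(tok, (tok, "+"))  # unlisted word -> '+'
        out.append((tok, std, cls))
    return out


def retype_all(classified, from_class, to_class="0"):
    """Change the class of every token of from_class, wherever it occurs."""
    return [(tok, std, to_class if cls == from_class else cls)
            for tok, std, cls in classified]


def remove_null(classified):
    """Class-0 tokens are treated as dropped."""
    return [entry for entry in classified if entry[2] != "0"]


cls_table = {"NAUGHTYWORD": ("BLEEP", "X"), "RAY": ("RAY", "F")}
tokens = "RAY NAUGHTYWORD WURLOD NAUGHTYWORD".split()
kept = remove_null(retype_all(classify(tokens, cls_table), "X"))
print([tok for tok, _, _ in kept])  # ['RAY', 'WURLOD']
```

Keeping the words in the classification table, rather than in overrides, means the list can be maintained in one file, which is the trade-off the original poster ends up choosing.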
BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev »

Both techniques worked for dealing with invalid words. :)

However, I decided to add all the invalid words to the classification table; because there are so many of them, adding them as overrides was a little tedious.

Thank you all so much for your valuable advice. Having a community like DSXchange really makes a difference.

Have a wonderful day!