Standardization process failed

Posted: Mon Oct 24, 2011 12:50 pm
by akonda
I am trying to standardize international addresses using the Standardize stage. Below is the error I get when using the JPAREA rule set; can somebody please suggest a fix?

Error:

Standardization process failed. The classification table has duplicate entry.

Where can I see the duplicate entries? Is it possible to remove the duplicate entries, if any?

thanks
Arun

Posted: Mon Oct 24, 2011 4:48 pm
by ray.wurlod
Use the Rule Set management tool and look at the CLS file.

Posted: Mon Oct 24, 2011 5:37 pm
by stuartjvnorton
Go to Standardization Rules\Japan\JPAREA in the Repository Explorer.
Double-click the .SET file, and then in the rule dialog click the Test button.
If there is a duplicate term in the classification file, it will tell you which line it is on (and I think also the token itself).
Then close the dialog and open JPAREA.CLS and go to the line in question. The token at the start of the line is the issue: it can't be in the file more than once.

Look for the other occurrences of that token and you'll have 2 options:
1. decide which one[s] you think it is safe to remove (your call as to which, but they were put there for a reason, so expect some effect from doing this).
2. give the token a new token type that is handled by a new proc you'll have to write. The proc uses some context to work out which type the token should be in that specific situation, and re-classifies it. Look at Multiple_Semantics in AUADDR for an example of this.
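If scanning the CLS file by eye is painful, a quick script along these lines can list the duplicated tokens for you. This is only a sketch: I'm assuming the CLS file is whitespace-separated with the token in the first column, and that lines starting with ";" are comments -- check how your file is actually laid out before trusting the result.

```python
# Sketch: list duplicate first-column tokens in a classification (.CLS) file.
# Assumptions: whitespace-separated columns, token first, ";" starts a comment.
from collections import Counter

def find_duplicate_tokens(lines):
    """Return tokens (first column) that appear on more than one line."""
    tokens = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comment lines
            continue
        tokens.append(line.split()[0])
    return [tok for tok, count in Counter(tokens).items() if count > 1]

# Made-up CLS-style lines for illustration only:
sample = [
    "TOKYO TOKYO A 850",
    "OSAKA OSAKA A 850",
    "TOKYO TOKIO A 850",   # same first-column token as line 1
]
print(find_duplicate_tokens(sample))  # -> ['TOKYO']
```

To run it against the real file, read it with the file's actual encoding (Japanese CLS files will not be plain ASCII) and pass the lines in.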

Posted: Tue Oct 25, 2011 8:14 am
by akonda
I went to the repository in DataStage Designer and tested the .SET file.

The same error occurs: "Standardization process failed. The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

But it does not name any specific line. I am also not sure where I have to do the token initialization.

Posted: Tue Oct 25, 2011 10:05 am
by ray.wurlod
How did you "test the SET file" and what happened?

Posted: Tue Oct 25, 2011 10:56 am
by akonda
When I click the Test button in the "Rule Management - JPAREA" window, it shows the error below.

"Standardization process failed. The classification table has duplicate entry. Initialization of tokenization environment failed. c:/IBM/..../JPAREA.CLS"

Let me know if this is not the correct way of testing?

Posted: Tue Oct 25, 2011 5:19 pm
by stuartjvnorton
Hmm, when I just tried it out using an AU ruleset, it didn't show the line number (my bad).
What it did show was the actual offending token on the very next line of the error message.

Surely that gives you enough to find the other occurrences of that token and work from there.


As for how to test: the Test dialog lets you check almost everything without having to create and run a job.

Posted: Wed Oct 26, 2011 9:10 am
by akonda
For me it is not showing any token information. What I don't understand is where I can initialize the token, because it says

"Initialization of tokenization environment failed."

Posted: Wed Oct 26, 2011 4:32 pm
by stuartjvnorton
When it says it can't initialise the tokenisation environment, it means that it can't load the ruleset. As it states, it can't do that because you have a duplicate token in the classification file.

If for some reason it doesn't tell you the offending token, you'll just have to open up the CLS file and manually look for the duplicate.

Posted: Fri Oct 28, 2011 7:15 am
by akonda
The CLS file is not in an understandable language. We raised a PMR with IBM to override the Japan rule set.

Posted: Fri Oct 28, 2011 9:39 am
by ray.wurlod
The CLS file should be understandable (if you understand Japanese). It is a list with four columns:
  • a token (word) that might appear in your data
  • the standard form of that token
  • a letter designating the class of that token within the rule set
  • (optional) an uncertainty threshold number, such as 850

The JPAREA.CLS file in my installation is perfectly understandable. It may be that yours is corrupted.
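To make those four columns concrete, here is a minimal sketch of splitting one line into them. The sample line is made up, and whitespace-separated columns are an assumption -- check your own file.

```python
# Sketch: split one CLS-style line into the four columns described above.
# The sample line is hypothetical; real JPAREA entries will be in Japanese.
def parse_cls_line(line):
    parts = line.split()
    return {
        "token": parts[0],      # word that might appear in the data
        "standard": parts[1],   # standard form of that token
        "class": parts[2],      # one-letter class within the rule set
        # uncertainty threshold is optional, so it may be absent:
        "threshold": parts[3] if len(parts) > 3 else None,
    }

print(parse_cls_line("SHI SHI T 850"))
# -> {'token': 'SHI', 'standard': 'SHI', 'class': 'T', 'threshold': '850'}
```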