DSXchange: DataStage and IBM Websphere Data Integration Forum
hitmanthesilentassasin
Participant
Joined: 13 Mar 2007
Posts: 150
Points: 1382

Posted: Mon Nov 10, 2014 5:41 pm

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
Additional info: An alternative to manual coding for handling multiple tokens

Hi,

I am aware that multiple tokens can be handled via manual programming. However, I am looking for an alternative to manual programming, as I have a very large list of multi-token values, running into the thousands, to be standardized to their corresponding values. I know it works when I have to standardize a single token to multiple words in the classification table, but how do I make it work the other way round?

Thanks!!

rjdickson
Participant
Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Posted: Mon Nov 10, 2014 6:13 pm

Hi,

Can you please provide a few examples of what the input would look like, and what you would like the output to be? If the tokens you are looking for are actually part of a longer string of tokens, then please provide that context in the example, too.

I think I understand what you are trying to do, but I want to make sure rather than assume.

_________________
Regards,
Robert

hitmanthesilentassasin
Participant
Joined: 13 Mar 2007
Posts: 150
Points: 1382

Posted: Mon Nov 10, 2014 7:38 pm

Hi Robert, I am trying to standardize city and suburb names. To have a multi-word name handled as a single token, I have to concatenate the tokens within the code and then retype the result to the classification I am looking for. The concern is that there are so many names with multiple tokens that the patterns would run into the thousands.

For example, to identify New York I have to match "New" followed by "York", then concatenate the two tokens and retype the result to the city classification. If only I could classify "New York" to a specific classification code using the classification table, I wouldn't have to retype every suburb and city with a multi-word name.

Do you know any trick that can be applied here?

rjdickson
Participant
Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Posted: Wed Nov 12, 2014 1:49 am

Take a look at USPREP. It has a table called 'USCITIES.TBL' that is used in a way that sounds similar to your requirement. The Pattern Action Language looks for two or four tokens to look up. Cities like 'NEW YORK', 'ALAMO HEIGHTS' and 'AVON BY THE SEA' are found by the rule set.

Basically, it has Pattern Action Language that handles 1, 2, 3, or 4 word city names based on the USCITIES table.
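To make the technique concrete, here is a minimal Python sketch of a table-driven, longest-match lookup. It is not QualityStage Pattern Action Language and not how USPREP is actually implemented; `CITY_TABLE`, `classify_cities` and the class markers `C` and `?` are hypothetical stand-ins for USCITIES.TBL and the rule set's classes. At each position it tries a four-token window first, then three, two and one, and collapses a hit into a single standardized token, so adding a new multi-word city means adding a table row rather than writing a new pattern.

```python
# Hypothetical stand-in for a city table such as USCITIES.TBL:
# keys are tuples of upper-case tokens, values are the standardized form.
CITY_TABLE = {
    ("NEW", "YORK"): "NEW YORK",
    ("ALAMO", "HEIGHTS"): "ALAMO HEIGHTS",
    ("AVON", "BY", "THE", "SEA"): "AVON BY THE SEA",
    ("CHICAGO",): "CHICAGO",
}

MAX_TOKENS = 4  # city names of 1 to 4 words, as described for USPREP


def classify_cities(tokens, table=CITY_TABLE, max_tokens=MAX_TOKENS):
    """Scan left to right; at each position try the longest window first
    and collapse a table hit into one token. Class codes are hypothetical."""
    result = []
    i = 0
    while i < len(tokens):
        for width in range(min(max_tokens, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + width])
            if window in table:
                result.append((table[window], "C"))  # matched city
                i += width
                break
        else:
            result.append((tokens[i], "?"))  # no table hit, token kept as-is
            i += 1
    return result
```

For example, `classify_cities("ACME PLUMBING NEW YORK".split())` returns `[("ACME", "?"), ("PLUMBING", "?"), ("NEW YORK", "C")]`.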

You should be able to use this as a technique.

I hope this helps!

_________________
Regards,
Robert

hitmanthesilentassasin
Participant
Joined: 13 Mar 2007
Posts: 150
Points: 1382

Posted: Wed Nov 12, 2014 11:55 pm

Thanks Robert.

This is very close to what I was looking for but not the same.

With the help of the table I am able to validate whether the city is present or not, but I can't process it any further, for example to suppress the city name. I think I forgot to mention at the beginning that the standardization is applied to organization names, where I want to suppress the city or suburb names at specific occurrences.

stuartjvnorton
Participant
Joined: 19 Apr 2007
Posts: 523
Location: Melbourne
Points: 3890

Posted: Thu Nov 13, 2014 4:53 pm

Are you saying you have the name of a branch office included in the company name?

If you can find it, you can move it somewhere safe and then push the rest of the name through the rest of the name parse/stan.
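A minimal Python sketch of this suggestion, assuming the suburb list is available as a lookup keyed by token tuples: find the first suburb in the name, move it into its own field, and hand the remaining tokens on to the rest of the name parsing. `SUBURB_TABLE` and `extract_location` are hypothetical names, and the sketch deliberately ignores the question of *when* removal is appropriate.

```python
# Hypothetical suburb lookup keyed by upper-case token tuples.
SUBURB_TABLE = {
    ("ST", "KILDA"): "ST KILDA",
    ("NEW", "YORK"): "NEW YORK",
    ("RICHMOND",): "RICHMOND",
}


def extract_location(tokens, table=SUBURB_TABLE, max_tokens=4):
    """Move the first suburb found (longest match at each position) out of
    the name. Returns (remaining_tokens, location), or (tokens, None)."""
    tokens = list(tokens)
    for i in range(len(tokens)):
        for width in range(min(max_tokens, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + width])
            if window in table:
                # Keep the location somewhere safe; the rest goes on to name parsing.
                return tokens[:i] + tokens[i + width:], table[window]
    return tokens, None
```

`extract_location("ACME PLUMBING ST KILDA".split())` returns `(["ACME", "PLUMBING"], "ST KILDA")`. Note that `extract_location("RICHMOND HARDWARE".split())` also strips RICHMOND, even if it is really part of the business name.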
hitmanthesilentassasin
Participant
Joined: 13 Mar 2007
Posts: 150
Points: 1382

Posted: Thu Nov 13, 2014 5:34 pm

Yes, I am trying to standardize branches and franchises to the same name. I can't strip out all the suburb names, because some suburb names are part of the organization name itself. Hence, I need the names tokenized so that I can identify whether a given suburb is part of the org name or a location indicator.

stuartjvnorton
Participant
Joined: 19 Apr 2007
Posts: 523
Location: Melbourne
Points: 3890

Posted: Thu Nov 13, 2014 11:32 pm

This is the real issue.
Working out *when* you should take it out is harder than working out *how* to take it out.
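One possible starting point for the *when*, sketched in Python under stated assumptions: treat a suburb as a location indicator only when it ends the name, leaves at least one token of organization name behind, and is not preceded by a connector such as OF (so BANK OF NEW YORK keeps its suburb). The position rule, the `CONNECTORS` set and `split_trailing_location` are hypothetical heuristics, not QualityStage behavior; real data would need tuned rules plus an exception list.

```python
# Hypothetical suburb lookup keyed by upper-case token tuples.
SUBURB_TABLE = {
    ("NEW", "YORK"): "NEW YORK",
    ("RICHMOND",): "RICHMOND",
}

# Hypothetical connector words: a suburb right after one of these is
# assumed to be part of the organization name ("BANK OF NEW YORK").
CONNECTORS = {"OF"}

MAX_TOKENS = 4


def split_trailing_location(tokens, table=SUBURB_TABLE, connectors=CONNECTORS,
                            max_tokens=MAX_TOKENS):
    """Return (org_tokens, location). Only a suburb at the end of the name
    is treated as a location indicator, and at least one org token must remain."""
    tokens = list(tokens)
    # Try the longest trailing window first; width never exceeds len - 1,
    # so a name that is only a suburb ("RICHMOND") is left intact.
    for width in range(min(max_tokens, len(tokens) - 1), 0, -1):
        tail = tuple(tokens[-width:])
        if tail in table:
            if tokens[-width - 1] in connectors:
                return tokens, None  # suburb is part of the name itself
            return tokens[:-width], table[tail]
    return tokens, None
```

With this table, ACME PLUMBING NEW YORK splits into ACME PLUMBING plus the location NEW YORK, while BANK OF NEW YORK, NEW YORK LIFE and a bare RICHMOND are left unchanged.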



All times are GMT - 6 Hours