Split multiple State tokens from single field

Infosphere's Quality Product

Moderators: chulett, rschirm

Post Reply
mpine
Charter Member
Charter Member
Posts: 4
Joined: Thu Mar 10, 2005 4:43 pm
Location: Leawood, KS

Split multiple State tokens from single field

Post by mpine »

I am trying to clean up a file. The file is in the following format:
40212 VA, TN
40419 CA,KY - IN
40520 RI,P.A.

The pattern report comes out like this:
00082394 63.618% S [X] | NH
00024016 18.543% SS [X] | OH,MS
00009435 7.285% SSS [X] | CA,FL,OH

It also has other entries but I would be happy with getting these
paterns cleaned up. I need to resulting file to look like:

40312 VA TN
40319 CA KY IN
40320 RI PA

Can you tell me the best way to accomplish this?
DSkkk
Charter Member
Charter Member
Posts: 70
Joined: Fri Nov 05, 2004 1:10 pm

be happy with getting these

Post by DSkkk »

Hi- So basically what u want is u need to eliminate the special characters from ur Data file. I am sure this can be achieved by using a Parse Stage
in Quality Stage where u can specify which character should be replaced by what. But this is slightly Cumbersome because u need to specify all the special characters that might occur and which u dont want. So the bottomline is that u can do this in a much efficient way in Datastage.

Cheers
DSkkk
g.kiran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can't you just make sure that "-", "," and "." are in your STRIPLIST ?

Not sure why you'd want to force the third character of the first column to be "3", so I'm assuming that the problem there was with the keyboard (or, at least, between the keyboard and the chair). :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mpine
Charter Member
Charter Member
Posts: 4
Joined: Thu Mar 10, 2005 4:43 pm
Location: Leawood, KS

follow up

Post by mpine »

ray.wurlod wrote:Can't you just make sure that "-", "," and "." are in your STRIPLIST ?

Not sure why you'd want to force the third character of the first column to be "3", so I'm assuming that the problem there was with the keyboard (or, at least, between the keyboard and the chair). :lol:
It was a problem between the keyboard and the chair :D

Running a word investigate stage determines the states without a problem using the STRIPLIST, but I am not sure how to actually move the S (state) tokens it identifies into separate fields so I can use the pivot stage in DataStage to make the file how I would like. The final product would be as follows:

Out of QualityStage somehow:

40312 VA TN
40319 CA KY IN
40320 RI PA

Out of DataStage using a pivot stage (I think):

40312 VA
40312 TN
40319 CA
40319 KY
40319 IN
40320 RI
40320 PA
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pivot stage is a good call. Just make sure that its input link has enough columns to cope with the largest possible list of states. (50 is probably over the top, but is fail safe!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply