Remove multiple leading and trailing special characters

rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

I need to remove multiple special characters from the leading and trailing ends of a string in DataStage.

Example input: !@##$$Rami^&Reddy*&^%$
Expected output: Rami^&Reddy

Could you please suggest how to handle this scenario?
Rahul Jha
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They don't look all that special to me. :wink:

If you can create a list of the characters to remove, you can use the Convert function (see its entry in the product documentation). Convert them all to "" to simply remove them. If you'd rather build a list of the characters to keep, then do an exact search of the forum for "double convert", a useful technique based on what you want to keep rather than what you want to remove.
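
As a rough illustration of the two ideas above, here is a Python sketch of the logic only (it is not DataStage syntax). The helpers `remove_chars` and `keep_only` are hypothetical names; the sketch assumes that Convert with an empty "to" list deletes every listed character, and that "double convert" means nesting two such calls so the inner one produces the list of unwanted characters.

```python
def remove_chars(chars: str, text: str) -> str:
    """Emulates Convert(chars, "", text): delete every character listed in chars."""
    return "".join(ch for ch in text if ch not in chars)


def keep_only(keep: str, text: str) -> str:
    """Emulates the nested form Convert(Convert(keep, "", text), "", text)."""
    unwanted = remove_chars(keep, text)  # inner call leaves only what we do NOT want
    return remove_chars(unwanted, text)  # outer call deletes those from the original


sample = "!@##$$Rami^&Reddy*&^%$"
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

print(remove_chars("!@#$%*", sample))  # remove-list approach
print(keep_only(letters, sample))      # keep-list approach
```

Both variants act on the whole string, so they also remove matching characters in the middle, not only at the ends.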
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, crap... just dawned on me I missed the "leading and trailing" part of this. :? Seems to me you may just need to substring out the middle... use Index to find the first and last "non-special" character in the string and cut that chunk out of the middle.

I'm also curious if "^" is a single character or if it takes on the meaning of "control" here? Meaning is "^&" one character or two characters? I'm assuming two but thought it prudent to confirm.
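
The "cut the chunk out of the middle" idea can be sketched in Python as follows. This shows the logic only, not a DataStage derivation. The function name `strip_outer_specials` is hypothetical, and the sketch assumes that "non-special" means a letter or digit and that "^&" is two ordinary characters.

```python
def strip_outer_specials(text: str, is_wanted=str.isalnum) -> str:
    """Return the slice from the first to the last 'wanted' character, inclusive."""
    positions = [i for i, ch in enumerate(text) if is_wanted(ch)]
    if not positions:
        return ""  # nothing but special characters
    return text[positions[0]:positions[-1] + 1]


print(strip_outer_specials("!@##$$Rami^&Reddy*&^%$"))
```

Anything between the first and last wanted character is left untouched, which is what keeps the inner "^&" in place.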
-craig
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

You could use multiple nested Trim() functions--one for each special character. Please consult the product documentation on the Trim() function to specify options for leading and trailing characters.
Choose a job you love, and you will never have to work a day in your life. - Confucius
rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

Thanks, Craig, for replying to the post.

Can you elaborate on how to find the first and last "non-special" character in the string using the Index function?
Any example would be helpful.

I tried to use the Index function with the code below:

Index(Trim(columnname),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',1)

Output: 0

Your comments on the above would be highly appreciated.

Thanks,
Rahul
Rahul Jha
rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

Thanks, qt_ky, for replying to the post.

Is there another method for this scenario, instead of multiple nested Trim() functions?
Rahul Jha
sriven786
Participant
Posts: 37
Joined: Wed Nov 08, 2017 1:36 pm

Post by sriven786 »

Is your delimiter ^&?

Please clarify.

Example input: !@##$$Rami^&Reddy*&^%$
Expected output: Rami^&Reddy

I tried Convert as below:
Convert(char(10):char(11):char(35):char(36):char(33):char(64):char(37):char(42),' ','!@##$$Rami^&Reddy*&^%$')

!@##$$Rami^&Reddy*&^%$ was converted to Rami^&Reddy&^ (since & and ^ are not in the list of characters being replaced).
Venkata Srini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If that reply was directed my way, I didn't ask anything about a delimiter... but it looks like you answered my question indirectly. And that's not how one would use Index for this: you asked it to look for that specific string of 52 characters, not any single one of them. Having said that, I don't think my suggestion of using that function was exactly spot on, so I'll have to ponder this a bit today to see what else comes to mind.
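
To make the difference concrete, here is a small Python sketch. The `index` function below is a hypothetical stand-in that assumes Index() returns the 1-based start of the Nth occurrence of a substring, or 0 when the substring is not found; it is not the DataStage implementation.

```python
def index(text: str, substring: str, occurrence: int = 1) -> int:
    """Rough stand-in for Index(): 1-based start of the Nth occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(substring, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


sample = "!@##$$Rami^&Reddy*&^%$"
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Searching for the whole 52-character string finds nothing.
print(index(sample, letters))

# Searching for each letter on its own and taking the smallest hit
# gives the position of the first letter in the data.
hits = [index(sample, ch) for ch in letters]
first_letter = min(p for p in hits if p > 0)
print(first_letter)
```

Doing this per character inside a single derivation would take one Index call per letter, which is part of why the function is an awkward fit here.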
-craig
rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

Thanks, Srini, for your reply.

No, Srini, ^& is not a delimiter. We don't have any delimiter in the input source. As I mentioned in the example, the input data looks like the sample below, and we don't know its contents in advance. We need to remove the special characters from the leading and trailing ends only; we are not removing any special characters between alphabetic characters.

Example input: !@##$$Rami^&Reddy*&^%$
The output should be exactly: Rami^&Reddy
Rahul Jha
sriven786
Participant
Posts: 37
Joined: Wed Nov 08, 2017 1:36 pm

Post by sriven786 »

Thanks for clarifying the requirement.

I was trying to convert all alphabetic characters to some value and then find the first occurrence of that value.

But it looks like the Convert function is not working as expected.

Input string: !@##$$Rami^&Reddy*&^%$
Derivation:
convert(CHAR(82):CHAR(97):CHAR(101):CHAR(100):CHAR(121):CHAR(105):CHAR(109),CHAR(49),'!@##$$Rami^&Reddy*&^%$')

Expected: !@##$$1111^&11111*&^%$
Actual output: !@##$$1^&1*&^%$

It looks like it is converting only the first character, CHAR(82) (R to 1), and ignoring all the others.

I also tried convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','1','!@##$$Rami^&Reddy*&^%$'),
but this results in !@##$$^&*&^%$
In this case, it just drops all alphabetic characters from the string.
Venkata Srini
rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

Thanks, Srini.

Yes, I did the same thing and my output is the same as above. I am wondering why it is not replacing the letters with 1.
Rahul Jha
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to look at the description of the function again. It takes all characters in the "from" list and replaces each with the corresponding (positional) character in the "to" list. In other words, to replace them all with a "1" you would need the same number of "1"s in the second list.
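
A Python sketch of that positional behaviour follows. The `convert` helper is a hypothetical emulation, not the product function; it assumes, consistent with the output posted earlier in this thread, that a "from" character with no partner in a shorter "to" list is deleted.

```python
def convert(from_list: str, to_list: str, text: str) -> str:
    """Map from_list[i] to to_list[i]; delete from-characters that have no partner."""
    table = {
        ord(ch): (to_list[i] if i < len(to_list) else None)
        for i, ch in enumerate(from_list)
    }
    return text.translate(table)


sample = "!@##$$Rami^&Reddy*&^%$"
from_list = "Raedyim"

# One "1" in the "to" list: only R is mapped, the other letters are dropped.
print(convert(from_list, "1", sample))

# Same number of "1"s as "from" characters: every listed letter becomes "1".
masked = convert(from_list, "1" * len(from_list), sample)
print(masked)

# The first and last "1" then mark where to cut the original string.
# Caveat: a literal "1" in the data would be mistaken for a marked letter.
print(sample[masked.find("1"):masked.rfind("1") + 1])
```

The masked string has the same length as the input, so positions found in it can be applied directly to the original value.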

Was just coming here to say that in all honesty I'm thinking that your best option here is to write something in C++ and that I'm a bit surprised that UCDI hasn't already come along and said something along those lines. :wink:
-craig
rahuljha26
Participant
Posts: 7
Joined: Wed Sep 20, 2017 4:15 am

Post by rahuljha26 »

We don't know which special characters will appear in which positions, so we can't use multiple nested Trim() functions either.
Rahul Jha
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You have a logical paradox in your requirements: certain characters are to be removed, except when they are not to be removed. The difference is positional, which is why it's a paradox.

Why must you preserve the characters on the inside? Is this a hard requirement (like an audit trail), or is it arbitrary?

I suggest you move up a step, and identify the purpose of the data. In your example, you have a person's name. If the data is a name, there should be no reason to preserve the characters between the first and last name. You should be able to just identify all invalid characters, replace them with Space(), do a final Trim to remove leading and trailing and reduce the internal spaces to one.
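
As a sketch of that simpler rule in Python (logic only, not DataStage syntax): the function name `clean_name` is hypothetical, and the sketch assumes that only letters count as valid characters in a name.

```python
def clean_name(text: str, is_valid=str.isalpha) -> str:
    """Replace invalid characters with spaces, then trim and collapse the spaces."""
    spaced = "".join(ch if is_valid(ch) else " " for ch in text)
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of spaces as a single separator.
    return " ".join(spaced.split())


print(clean_name("!@##$$Rami^&Reddy*&^%$"))
```

The inner "^&" is not preserved here; it becomes a single space between the two name parts.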

It looks to me like you have bad, messy data (usually because users are lazy), and you are being told to clean it up by the users who don't know how to properly construct a requirement. You may not have any control over the users, but sometimes they just have to be told "no, you can't have it this way."
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I'm with him ^^^

but if it turns out that you need some awful convoluted logic to do this, write a C or basic routine. Trying to drop a 15+ deep nested replace with if-thens in the middle is going to be unpossible to debug and even if you get it working it will be extra challenging to upgrade or modify it later.