Remove multiple special characters from leading and trailing

Posted: Sat Nov 11, 2017 11:32 am
by rahuljha26
How can I remove multiple special characters from the leading and trailing ends of a string in DataStage?

e.g.: Input data like: !@##$$Rami^&Reddy*&^%$
Output data should be: Rami^&Reddy

Could you please suggest how to handle this scenario?

Posted: Sat Nov 11, 2017 3:24 pm
by chulett
They don't look all that special to me. :wink:

If you can create a list of the characters to remove, you can use the Convert function documented here if you scroll down a bit. Convert them all to "" to simply remove them. If you'd rather build a list of the characters to keep then do an exact search here for "double convert", a useful technique based on what to keep rather than what you want to remove.

Posted: Sat Nov 11, 2017 4:56 pm
by chulett
Well, crap... just dawned on me I missed the "leading and trailing" part of this. :? Seems to me you may just need to substring out the middle... use Index to find the first and last "non-special" character in the string and cut that chunk out of the middle.

I'm also curious if "^" is a single character or if it takes on the meaning of "control" here? Meaning is "^&" one character or two characters? I'm assuming two but thought it prudent to confirm.
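The "substring out the middle" idea can be sketched outside DataStage to check the logic. The Python below is an illustration only, not DataStage syntax. It assumes "non-special" means the letters A-Z and a-z and that "^&" is two ordinary characters; the function name is made up for the example.

```python
import string

# Assumption: only ASCII letters count as "non-special".
LETTERS = set(string.ascii_letters)

def strip_outer_specials(value: str) -> str:
    """Keep the span from the first letter to the last letter, inclusive."""
    positions = [i for i, ch in enumerate(value) if ch in LETTERS]
    if not positions:
        return ""  # assumption: a value with no letters becomes empty
    return value[positions[0]:positions[-1] + 1]

print(strip_outer_specials("!@##$$Rami^&Reddy*&^%$"))  # Rami^&Reddy
```

Anything between the first and last letter is left alone, which is why the inner "^&" survives.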

Posted: Mon Nov 13, 2017 8:56 am
by qt_ky
You could use multiple nested Trim() functions--one for each special character. Please consult the product documentation on the Trim() function to specify options for leading and trailing characters.
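One thing to watch with nested trims is that a single pass depends on the order of the nesting. The Python sketch below only models the idea: trim_both is a rough stand-in for a Trim() call that strips one character from both ends (check the product documentation for the exact option), and the helper names are hypothetical.

```python
def trim_both(value: str, char: str) -> str:
    """Stand-in (assumption) for trimming one character from both ends."""
    return value.strip(char)

def nested_trim_once(value: str, specials: str) -> str:
    """One pass: one trim per special character, in the order given."""
    for ch in specials:
        value = trim_both(value, ch)
    return value

def nested_trim_until_stable(value: str, specials: str) -> str:
    """Repeat the pass until nothing more is removed."""
    while True:
        trimmed = nested_trim_once(value, specials)
        if trimmed == value:
            return trimmed
        value = trimmed

specials = "!@#$%^&*"
print(nested_trim_once("$!Rami^&Reddy!$", specials))          # !Rami^&Reddy!
print(nested_trim_until_stable("$!Rami^&Reddy!$", specials))  # Rami^&Reddy
```

In the first call the "!" trim runs before the "$" has been removed, so it finds nothing to strip; repeating the pass clears it.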

Posted: Tue Nov 14, 2017 3:58 am
by rahuljha26
Thanks, Craig, for replying to the post.

Can you elaborate on how to find the first and last "non-special" character in the string using the Index function?
An example would be helpful.

I tried to use the Index function. This is the code I used:

Index(Trim(columnname),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',1)

Output: 0

Your comments on the above would be much appreciated.

Thanks,
Rahul

Posted: Tue Nov 14, 2017 4:03 am
by rahuljha26
Thanks, qt_ky, for replying to the post.

Is there another way to handle this scenario, instead of multiple nested Trim() functions?

Posted: Tue Nov 14, 2017 9:41 am
by sriven786
Is your delimiter ^&?

Please clarify.

e.g.: Input Data like: !@##$$Rami^&Reddy*&^%$
Output Data Should: Rami^&Reddy

I tried Convert as below:
Convert(char(10):char(11):char(35):char(36):char(33):char(64):char(37):char(42),' ','!@##$$Rami^&Reddy*&^%$')

!@##$$Rami^&Reddy*&^%$ is converted to Rami^&Reddy&^ (because ^ and & are not in the list of characters being replaced).

Posted: Tue Nov 14, 2017 10:05 am
by chulett
If that reply was directed my way, I didn't ask anything about a delimiter... but it looks like you answered my question indirectly. And that's not how one would use Index for this: you asked it to look for that specific string of 52 characters, not any single one of them, which is why it returned 0. Having said that, I don't think my suggestion of using that function was exactly spot on, so I'll have to ponder this a bit today to see what else comes to mind.
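To make the Index behaviour concrete, here is a small Python model of it, assuming Index(string, substring, occurrence) returns the 1-based start position of the requested occurrence and 0 when the substring is absent. The function below is an emulation written for this example, not the product's implementation.

```python
import string

def index(value: str, substring: str, occurrence: int = 1) -> int:
    """1-based start of the n-th occurrence of substring, or 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = value.find(substring, pos + 1)
        if pos == -1:
            return 0
    return pos + 1

sample = "!@##$$Rami^&Reddy*&^%$"
alphabet = string.ascii_uppercase + string.ascii_lowercase  # the 52-character string

print(index(sample, alphabet, 1))  # 0: the whole 52-character run never appears
print(index(sample, "R", 1))       # 7: a single character is found as expected
```

Finding the first letter this way would mean one search per candidate letter and then taking the smallest non-zero result, which is part of why the approach gets awkward.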

Posted: Tue Nov 14, 2017 12:30 pm
by rahuljha26
Thanks, Srini, for your reply.

No, Srini, ^& is not a delimiter. We don't have any delimiter in the input source. The example only shows what the input data can look like; we don't know in advance what the input will contain. We need to remove the special characters from the leading and trailing ends only. We are not removing any special characters that sit between alphabetic characters.

e.g.: Input data like: !@##$$Rami^&Reddy*&^%$
Output data should be exactly: Rami^&Reddy

Posted: Tue Nov 14, 2017 2:27 pm
by sriven786
Thanks for clarifying the requirement.

I was trying to convert all letters to a single marker value and then find the first occurrence of that marker.

But it looks like the Convert function is not working as expected.

Input string: !@##$$Rami^&Reddy*&^%$
Derivation:
convert(CHAR(82):CHAR(97):CHAR(101):CHAR(100):CHAR(121):CHAR(105):CHAR(109),CHAR(49),'!@##$$Rami^&Reddy*&^%$')

Expected: !@##$$1111^&11111*&^%$
Actual output: !@##$$1^&1*&^%$

It looks like it converts only the first character, CHAR(82) (R to 1), and drops all the others.

I also tried convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','1','!@##$$Rami^&Reddy*&^%$')
but this results in !@##$$^&*&^%$
In this case, it just drops all the letters from the string.

Posted: Tue Nov 14, 2017 2:56 pm
by rahuljha26
Thanks, Srini.

Yes, I did the same thing and got the same output. I am wondering why it is not replacing the letters with 1.

Posted: Tue Nov 14, 2017 3:29 pm
by chulett
You need to look at the description of the function again. It takes all characters in the "from" list and replaces each with the corresponding (positional) character in the "to" list. In other words, to replace them all with a "1" you would need the same number of "1"s in the second list.
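The positional mapping explains both outputs reported earlier in the thread. Below is a small Python emulation of that behaviour, assuming that a "from" character with no counterpart in the "to" list is simply removed (which is what the posted results show). It is a model for reasoning about the results, not the product's code.

```python
def convert(from_list: str, to_list: str, value: str) -> str:
    """Map each from_list character to the to_list character at the same
    position; from_list characters past the end of to_list are deleted."""
    mapping = {}
    for i, ch in enumerate(from_list):
        mapping[ch] = to_list[i] if i < len(to_list) else ""
    return "".join(mapping.get(ch, ch) for ch in value)

sample = "!@##$$Rami^&Reddy*&^%$"
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

print(convert("Raedyim", "1", sample))               # !@##$$1^&1*&^%$
print(convert(letters, "1", sample))                 # !@##$$^&*&^%$
print(convert(letters, "1" * len(letters), sample))  # !@##$$1111^&11111*&^%$
```

With a one-character "to" list only the first "from" character is replaced and the rest are dropped; padding the "to" list to the same length gives the output that was expected.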

Was just coming here to say that in all honesty I'm thinking that your best option here is to write something in C++ and that I'm a bit surprised that UCDI hasn't already come along and said something along those lines. :wink:

Posted: Tue Nov 14, 2017 3:55 pm
by rahuljha26
We don't know which special characters will appear or in which positions, so we can't rely on multiple nested Trim() functions either.

Posted: Wed Nov 15, 2017 8:56 am
by FranklinE
You have a logical paradox in your requirements: certain characters are to be removed, except when they are not to be removed. The difference is positional, which is why it's a paradox.

Why must you preserve the characters on the inside? Is this a hard requirement (like an audit trail), or is it arbitrary?

I suggest you move up a step, and identify the purpose of the data. In your example, you have a person's name. If the data is a name, there should be no reason to preserve the characters between the first and last name. You should be able to just identify all invalid characters, replace them with Space(), do a final Trim to remove leading and trailing and reduce the internal spaces to one.
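For comparison, the simpler clean-up described above can be sketched in a few lines of Python. This is only an illustration of the steps (replace invalid characters with spaces, trim, collapse internal spaces); it assumes that only ASCII letters and spaces are valid in a name, and the function name is invented for the example. Note that it produces "Rami Reddy" rather than the "Rami^&Reddy" originally requested.

```python
import string

# Assumption: only ASCII letters and spaces are valid in a name.
ALLOWED = set(string.ascii_letters + " ")

def clean_name(value: str) -> str:
    """Replace invalid characters with spaces, then trim and collapse spaces."""
    spaced = "".join(ch if ch in ALLOWED else " " for ch in value)
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of spaces as one separator.
    return " ".join(spaced.split())

print(clean_name("!@##$$Rami^&Reddy*&^%$"))  # Rami Reddy
```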

It looks to me like you have bad, messy data (usually because users are lazy), and you are being told to clean it up by the users who don't know how to properly construct a requirement. You may not have any control over the users, but sometimes they just have to be told "no, you can't have it this way."

Posted: Wed Nov 15, 2017 12:09 pm
by UCDI
I'm with him ^^^

But if it turns out that you need some awful convoluted logic to do this, write a C or BASIC routine. Trying to drop a 15+ deep nested replace with if-thens in the middle is going to be unpossible to debug, and even if you get it working it will be extra challenging to upgrade or modify later.