Remove multiple special characters from leading and trailing

rahuljha26 · Post by **rahuljha26** » Sat Nov 11, 2017 11:32 am

Remove multiple special characters from leading and trailing of a string in datastage.

e.g.: Input Data like: !@##$$Rami^&Reddy*&^%$
Output Data Should: Rami^&Reddy

could you please suggest how to do of the above scenario.

chulett · Post by **chulett** » Sat Nov 11, 2017 3:24 pm

They don't look all that special to me.

If you can create a list of the characters to remove, you can use the Convert function documented here if you scroll down a bit. Convert them all to "" to simply remove them. If you'd rather build a list of the characters to keep then do an exact search here for "double convert", a useful technique based on what to keep rather than what you want to remove.

chulett · Post by **chulett** » Sat Nov 11, 2017 4:56 pm

Well, crap... just dawned on me I missed the "leading and trailing" part of this.

Seems to me you may just need to substring out the middle... use Index to find the first and last "non-special" character in the string and cut that chunk out of the middle.

I'm also curious if "^" is a single character or if it takes on the meaning of "control" here? Meaning is "^&" one character or two characters? I'm assuming two but thought it prudent to confirm.

qt_ky · Post by **qt_ky** » Mon Nov 13, 2017 8:56 am

You could use multiple nested Trim() functions--one for each special character. Please consult the product documentation on the Trim() function to specify options for leading and trailing characters.

rahuljha26 · Post by **rahuljha26** » Tue Nov 14, 2017 3:58 am

Thanks craig for replying the post...

can you elaborate more how to find the first and last "non-special" character in the string using the Index function.
Any example will be helpful for this.

i tried to use the index function. i am mentioning the below code.

Index(Trim(columnname),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',1)

output :0

Your comment will be highly appreciable for the above.

Thanks,
Rahul

rahuljha26 · Post by **rahuljha26** » Tue Nov 14, 2017 4:03 am

Thanks qt_ky for replying the post .

can we have other method to do the above scenario. Instead of multiple nested Trim() function.

sriven786 · Post by **sriven786** » Tue Nov 14, 2017 9:41 am

Is your Delimiter: ^&

Please clarify.

e.g.: Input Data like: !@##$$Rami^&Reddy*&^%$
Output Data Should: Rami^&Reddy

Tried with Convert as below
Convert(char(10):char(11):char(35):char(36):char(33):char(64):char(37):char(42):,' ','!@##$$Rami^&Reddy*&^%$')

!@##$$Rami^&Reddy*&^%$ converted as Rami^&Reddy&^ (As we are not replacing these 2 special Characters)

chulett · Post by **chulett** » Tue Nov 14, 2017 10:05 am

If that reply was directed my way, I didn't ask anything about a delimiter... but it looks like you answered my question indirectly. And that's not how one would use Index for this, you ask it to look for that specific string of 52 characters, not any single one of them. Having said that, I don't think my suggestion of using that function was exactly spot on so I'll have to ponder this a bit today to see what else comes to mind.

rahuljha26 · Post by **rahuljha26** » Tue Nov 14, 2017 12:30 pm

Thanks Srini for ur reply...

No srini this is not delimeter &^.We don't have any delimeter in the input source. As i mentioned in the example that we will have input data like and we don't know the input data.we need to remove the special characters from leading and trailing.we are not removing any special characters in between alphabet characters.

e.g.: Input Data like: !@##$$Rami^&Reddy*&^%$
Output Data Should only like this: Rami^&Reddy

sriven786 · Post by **sriven786** » Tue Nov 14, 2017 2:27 pm

Thanks for Clarifying the requirement.

I was trying to Convert all Alphabets to some Value and and after that find the First Occurrence of that

But looks like the CONVERT Function is not working as Expected

Input String: !@##$$Rami^&Reddy*&^%$'
Derivation:
convert(CHAR(82):CHAR(97):CHAR(101):CHAR(100):CHAR(121):CHAR(105):CHAR(109),CHAR(49),'!@##$$Rami^&Reddy*&^%$')

Expecting: !@##$$1111^&11111*&^%$'
Actual Output: !@##$$1^&1*&^%$

Looks like it's converting only the First CHAR(82) (R to 1) and ignoring all others

Tried as convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','1','!@##$$Rami^&Reddy*&^%$')
but this is also resulting as !@##$$^&*&^%$
in this case, it just drops all Alphabets from the String

rahuljha26 · Post by **rahuljha26** » Tue Nov 14, 2017 2:56 pm

Thanks Srini...

yeah i did also the same thing and my output is also coming like the above output. i am wondering why it is not replacing the alphabet with 1.

chulett · Post by **chulett** » Tue Nov 14, 2017 3:29 pm

You need to look at the description of the function again. It takes all characters in the "from" list and replaces each with the corresponding (positional) character in the "to" list. In other words, to replace them all with a "1" you would need the same number of "1"s in the second list.

Was just coming here to say that in all honesty I'm thinking that your best option here is to write something in C++ and that I'm a bit surprised that UCDI hasn't already come along and said something along those lines.

rahuljha26 · Post by **rahuljha26** » Tue Nov 14, 2017 3:55 pm

We don't know in which position we have which special characters.so multiple nested Trim() function also we can't use.

FranklinE · Post by **FranklinE** » Wed Nov 15, 2017 8:56 am

You have a logical paradox in your requirements: certain characters are to be removed, except when they are not to be removed. The difference is positional, which is why it's a paradox.

Why must you preserve the characters on the inside? Is this a hard requirement (like an audit trail), or is it arbitrary?

I suggest you move up a step, and identify the purpose of the data. In your example, you have a person's name. If the data is a name, there should be no reason to preserve the characters between the first and last name. You should be able to just identify all invalid characters, replace them with Space(), do a final Trim to remove leading and trailing and reduce the internal spaces to one.

It looks to me like you have bad, messy data (usually because users are lazy), and you are being told to clean it up by the users who don't know how to properly construct a requirement. You may not have any control over the users, but sometimes they just have to be told "no, you can't have it this way."

UCDI · Post by **UCDI** » Wed Nov 15, 2017 12:09 pm

I'm with him ^^^

but if it turns out that you need some awful convoluted logic to do this, write a C or basic routine. Trying to drop a 15+ deep nested replace with if-thens in the middle is going to be unpossible to debug and even if you get it working it will be extra challenging to upgrade or modify it later.