Remove multiple special characters from leading and trailing
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 7
- Joined: Wed Sep 20, 2017 4:15 am
I need to remove multiple special characters from the leading and trailing ends of a string in DataStage.
e.g.: Input data: !@##$$Rami^&Reddy*&^%$
Expected output: Rami^&Reddy
Could you please suggest how to handle this scenario?
Rahul Jha
They don't look all that special to me.
If you can create a list of the characters to remove, you can use the Convert function documented here if you scroll down a bit. Convert them all to "" to simply remove them. If you'd rather build a list of the characters to keep then do an exact search here for "double convert", a useful technique based on what to keep rather than what you want to remove.
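As a rough illustration of both techniques, here is a small Python model (not DataStage code). The `convert` helper is a hypothetical stand-in: it assumes Convert maps each character in the first list to the character at the same position in the second list and drops characters that have no partner. The keep-list of ASCII letters is also just an assumption for the example.

```python
def convert(from_list: str, to_list: str, text: str) -> str:
    """Rough Python model of Convert(from_list, to_list, text) (assumed semantics)."""
    out = []
    for ch in text:
        pos = from_list.find(ch)
        if pos == -1:
            out.append(ch)            # not listed: keep unchanged
        elif pos < len(to_list):
            out.append(to_list[pos])  # replace with the positional partner
        # listed, but no partner in to_list: the character is dropped
    return "".join(out)


value = "!@##$$Rami^&Reddy*&^%$"

# Remove a known list of unwanted characters.
removed = convert("!@#$%^&*", "", value)

# "Double convert": derive the unwanted characters from a keep-list.
keep = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
unwanted = convert(keep, "", value)       # everything that is NOT in keep
kept_only = convert(unwanted, "", value)  # strip exactly those characters

print(removed)    # RamiReddy
print(kept_only)  # RamiReddy
```

Both variants remove the listed characters wherever they occur, so the ^& in the middle of the sample goes too.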
-craig
"You can never have too many knives" -- Logan Nine Fingers
Well, crap... just dawned on me I missed the "leading and trailing" part of this. Seems to me you may just need to substring out the middle... use Index to find the first and last "non-special" character in the string and cut that chunk out of the middle.
I'm also curious if "^" is a single character or if it takes on the meaning of "control" here? Meaning is "^&" one character or two characters? I'm assuming two but thought it prudent to confirm.
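To make the "substring out the middle" idea concrete, here is a minimal Python sketch (not DataStage code). It assumes "non-special" means a letter or digit, which is my reading rather than a stated requirement, and `trim_special` is a hypothetical name.

```python
def trim_special(text: str) -> str:
    """Cut out the span between the first and last letter or digit."""
    keep_positions = [i for i, ch in enumerate(text) if ch.isalnum()]
    if not keep_positions:
        return ""  # nothing but special characters
    first, last = keep_positions[0], keep_positions[-1]
    return text[first:last + 1]  # interior characters are left untouched


print(trim_special("!@##$$Rami^&Reddy*&^%$"))  # Rami^&Reddy
```

Anything between the first and last kept character survives, which is why the interior ^& stays in place.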
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks, Craig, for replying to the post.
Can you elaborate on how to find the first and last "non-special" character in the string using the Index function?
An example would be helpful.
I tried the Index function with the code below:
Index(Trim(columnname),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',1)
Output: 0
Your comments on the above would be much appreciated.
Thanks,
Rahul
Rahul Jha
Is your delimiter ^&?
Please clarify.
e.g.: Input data: !@##$$Rami^&Reddy*&^%$
Expected output: Rami^&Reddy
I tried Convert as below:
Convert(char(10):char(11):char(35):char(36):char(33):char(64):char(37):char(42),' ','!@##$$Rami^&Reddy*&^%$')
!@##$$Rami^&Reddy*&^%$ was converted to Rami^&Reddy&^ (the trailing & and ^ remain because those two characters are not in the list being replaced).
Venkata Srini
If that reply was directed my way, I didn't ask anything about a delimiter... but it looks like you answered my question indirectly. And that's not how one would use Index for this: as written, you asked it to look for that specific string of 52 characters, not for any single one of them. Having said that, I don't think my suggestion of using that function was exactly spot on, so I'll have to ponder this a bit today to see what else comes to mind.
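To show the difference, here is a small Python model (not DataStage code). The `index` helper is a hypothetical stand-in that assumes Index returns the 1-based start of the requested occurrence of the whole substring, or 0 when it is absent; `first_letter_position` is the "any single one of them" search that was actually wanted.

```python
import string


def index(text: str, substring: str, occurrence: int) -> int:
    """Rough model of Index: 1-based start of the n-th occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(substring, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def first_letter_position(text: str) -> int:
    """1-based position of the first ASCII letter, 0 if there is none."""
    for i, ch in enumerate(text, start=1):
        if ch in string.ascii_letters:
            return i
    return 0


value = "!@##$$Rami^&Reddy*&^%$"
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

print(index(value, letters, 1))      # 0: the 52-character string never occurs
print(index(value, "R", 1))          # 7
print(first_letter_position(value))  # 7
```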
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks, Srini, for your reply.
No, Srini, ^& is not a delimiter. We don't have any delimiter in the input source. As I mentioned in the example, the input data will look like the sample below, but we don't know its contents in advance. We need to remove the special characters from the leading and trailing ends only; we are not removing any special characters between the alphabetic characters.
e.g.: Input data: !@##$$Rami^&Reddy*&^%$
The output should be exactly: Rami^&Reddy
Rahul Jha
Thanks for clarifying the requirement.
I was trying to convert all alphabetic characters to a single value and then find the first occurrence of that value,
but it looks like the Convert function is not working as expected.
Input string: '!@##$$Rami^&Reddy*&^%$'
Derivation:
convert(CHAR(82):CHAR(97):CHAR(101):CHAR(100):CHAR(121):CHAR(105):CHAR(109),CHAR(49),'!@##$$Rami^&Reddy*&^%$')
Expected: !@##$$1111^&11111*&^%$
Actual output: !@##$$1^&1*&^%$
It looks like it converts only the first character, CHAR(82) (R to 1), and ignores all the others.
I also tried convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','1','!@##$$Rami^&Reddy*&^%$'),
but this results in !@##$$^&*&^%$
In this case, it just drops all the letters from the string.
Venkata Srini
You need to look at the description of the function again. It takes all characters in the "from" list and replaces each with the corresponding (positional) character in the "to" list. In other words, to replace them all with a "1" you would need the same number of "1"s in the second list.
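A short Python model of that positional behavior (not DataStage code) reproduces both results from the post above. The `convert` helper here is a hypothetical stand-in, and it assumes that a "from" character with no partner in the "to" list is deleted, which matches the output that was reported.

```python
def convert(from_list: str, to_list: str, text: str) -> str:
    """Rough model of Convert: positional replacement, no partner means delete."""
    table = {}
    for i, ch in enumerate(from_list):
        # If a character is listed twice, keep its first listing
        # (an assumption of this sketch).
        table.setdefault(ord(ch), to_list[i] if i < len(to_list) else None)
    return text.translate(table)


value = "!@##$$Rami^&Reddy*&^%$"
name_letters = "Raedyim"  # the letters spelled out with CHAR() in the derivation

# One "1" in the to-list: only R has a partner, the other letters are deleted.
print(convert(name_letters, "1", value))                      # !@##$$1^&1*&^%$

# As many "1"s as there are from-characters: every letter is replaced.
print(convert(name_letters, "1" * len(name_letters), value))  # !@##$$1111^&11111*&^%$
```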
Was just coming here to say that in all honesty I'm thinking that your best option here is to write something in C++ and that I'm a bit surprised that UCDI hasn't already come along and said something along those lines.
-craig
"You can never have too many knives" -- Logan Nine Fingers
You have a logical paradox in your requirements: certain characters are to be removed, except when they are not to be removed. The difference is positional, which is why it's a paradox.
Why must you preserve the characters on the inside? Is this a hard requirement (like an audit trail), or is it arbitrary?
I suggest you move up a step, and identify the purpose of the data. In your example, you have a person's name. If the data is a name, there should be no reason to preserve the characters between the first and last name. You should be able to just identify all invalid characters, replace them with Space(), do a final Trim to remove leading and trailing and reduce the internal spaces to one.
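As a minimal sketch of that cleanup, in Python rather than DataStage: it assumes the only valid characters are ASCII letters, which is an assumption for the example and not a stated rule, and `clean_name` is a hypothetical name.

```python
import re


def clean_name(text: str) -> str:
    """Replace invalid characters with spaces, then trim and collapse the spaces."""
    spaced = re.sub(r"[^A-Za-z]", " ", text)  # every invalid character becomes a space
    return " ".join(spaced.split())           # trim both ends, reduce internal runs to one


print(clean_name("!@##$$Rami^&Reddy*&^%$"))  # Rami Reddy
```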
It looks to me like you have bad, messy data (usually because users are lazy), and you are being told to clean it up by the users who don't know how to properly construct a requirement. You may not have any control over the users, but sometimes they just have to be told "no, you can't have it this way."
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
I'm with him ^^^
...but if it turns out that you need some awful convoluted logic to do this, write a C or BASIC routine. Trying to drop in a 15+ deep nested replace with if-thens in the middle is going to be unpossible to debug, and even if you get it working it will be extra challenging to upgrade or modify later.