Removing special characters,spaces,nulls in multiple columns
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
Removing special characters,spaces,nulls in multiple columns
Hi,
I have flat files , transformed and moved to staging.
As output files are moved from datastage server to staging folder. when we are trying to load the data we are seeing special characters.
For this we have used pad char environment variable and trim d but we have atleast 10 columns .
My question is that is there any simple way to remove the special characters in datastage using sed or awk or shell script.
I have flat files , transformed and moved to staging.
As output files are moved from datastage server to staging folder. when we are trying to load the data we are seeing special characters.
For this we have used pad char environment variable and trim d but we have atleast 10 columns .
My question is that is there any simple way to remove the special characters in datastage using sed or awk or shell script.
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
There are really many ways to do this, but I am not certain that convert would be the best method. You could -
- Write a C routine to remove unprintable characters
Write a sed/awk command that will do the same and invoke via a sequencer or during sourcing when using a sequential file
You could import your sequential file into one column by using a delimiter not present in any row and then operate on that one column and then write it out using the correct metadata.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
Right, but is it 1 or 2 special characters that keep showing up or are they random?
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
Maybe something like -
or the tr command will work, awk as well, but you will likely have to call a sub process
Again, there are a lot of ways to do this - this is just one of many.
Code: Select all
sed "s/[^a-z|0-9]//g;" file1 > file2
Again, there are a lot of ways to do this - this is just one of many.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 89
- Joined: Sat Jul 24, 2010 11:33 pm
Hi Ray,
Iam not experienced like to discuss more on that. Iam sorry if the terms used by me caused any concerns or my question is confusing. Iam new to the datawarehousing so i do not know what terms to be used in this forum so that it wont raises more questions while answering it. Please let me know if you have any list so that I should not follow in this forum.
Iam not experienced like to discuss more on that. Iam sorry if the terms used by me caused any concerns or my question is confusing. Iam new to the datawarehousing so i do not know what terms to be used in this forum so that it wont raises more questions while answering it. Please let me know if you have any list so that I should not follow in this forum.
Well, characters like that have been called "special" since I started working in IT back in 1984 so I know what you mean and I doubt there is anyone who frequents this forum that did not understand what you meant.
You can still use the sed command I gave above and that will work with some tweaks. If you do a google search on removing/replacing "special" or unprintable characters there will be pages of examples which remove printable as well as unprintable.
The characters you want to remove are special
You can still use the sed command I gave above and that will work with some tweaks. If you do a google search on removing/replacing "special" or unprintable characters there will be pages of examples which remove printable as well as unprintable.
The characters you want to remove are special
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
1984 huh. I was still carrying around a sippi cup.mhester wrote:Well, characters like that have been called "special" since I started working in IT back in 1984
So many special character, that too in every column is weird. I havent been in IT for as long as Mike but even I can say that you need to investigate the origin of this data.
Nevertheless, you can use the CONVERT() function to remove these characters.
In some fields these characters maybe ok, like an email address or comment fields so use a scalpel approach rather than a machete.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.