DSXchange: DataStage and IBM Websphere Data Integration Forum
This topic has been marked "Resolved."
Vrisha

Group memberships: Premium Members
Joined: 15 Jul 2017
Posts: 55
Location: Texas, USA
Points: 655
Posted: Wed Sep 06, 2017 12:21 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Windows
Hi,

I have a requirement to remove multiple ASCII characters from the input data.
E.g. the input data has the values below:

column name - Description
==================
Paratransit¿Web-Based¿Trip¿Boo
TRE DFW ROW Maintenance Tie Re

Input data length - char (30)
Output data length - Varchar(30)

I am getting the error 'A data conversion error was encountered in bulk load mode for row'.


1) Transformation rule - Convert(char(160):char(168):char(194),'',TrimB(TrimF(NullToEmpty(In_ACACTIVITY.DESCRIPTION)))) - same error as above

2) CONVERT(CHAR(194), '', CONVERT(CHAR(191), '', Convert(char(160), '', TrimB(TrimF(NullToEmpty(In_ACACTIVITY.DESCRIPTION)))))) - same error as above

3) When I changed the data length to 40, everything loaded, but I am not allowed to do that.

4) I changed the bulk load write mode to 'Insert' and the table action to 'Create' in the target Oracle stage, and got the error 'Error code: 12899, Error message: ORA-12899: value too large for column "MSPO_TEST"."DESC_PROJECTS2"."DESCRIPTION" (actual: 34, maximum: 30)'.


How should I remove the multiple ASCII characters from the input data? I am able to remove one character, char(160), using the Ereplace function (which I posted about in the forum).

Please let me know what mistake I am making. Thanks.

_________________
Suja
chulett

Premium Poster since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 42765
Location: Denver, CO
Points: 220367
Posted: Wed Sep 06, 2017 12:29 pm

Perhaps you should look into why they are like that and see what needs to be done to correctly 'translate' them rather than simply removing them... just a thought.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
Vrisha
Posted: Wed Sep 06, 2017 12:50 pm

Hi Craig,

The source application team can't provide quality data. They told us to remove the ASCII characters using DataStage and load the result into the target.

In the source Oracle database we can't see these characters. So using the export wizard I exported the data to a CSV file, and by eyeballing it I found char values of 160, 191 and 194 in the input.

I don't know how to write logic to remove/translate them apart from what I tried above.

_________________
Suja
UCDI

Group memberships: Premium Members
Joined: 21 Mar 2016
Posts: 336
Points: 3396
Posted: Wed Sep 06, 2017 1:50 pm

I found this to be extra painful in DataStage and just did it in C.

Make a lookup table like this:

static const unsigned char pt[256] = {
' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ','!','"','#','$','%','&','\'','(',
')','*','+',',','-','.','/','0','1','2','3','4','5','6','7','8','9',':',';','<',
'=','>','?','@','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P',
'Q','R','S','T','U','V','W','X','Y','Z','[','\\',']','^','_','`','a','b','c','d',
'e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x',
'y','z','{','|','}','~',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',
' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' '
};

and then just loop over the string:

newstring[i] = pt[(unsigned char)oldstring[i]];

(the cast matters: plain char is often signed, so bytes above 127 would otherwise index the table with a negative value).

Then do an internal and external trim on space.
Don't forget to fix the end of the string: newstring[max] = 0;
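The snippet above can be fleshed out into a self-contained C sketch. For brevity this version uses a printable-ASCII range test instead of writing out the full 256-entry table; the effect is the same (every byte outside 0x20-0x7E becomes a space, and the function name is illustrative):

```c
#include <stddef.h>

/* Copy `in` to `out` (capacity max+1), mapping every byte outside
   printable ASCII (0x20 - 0x7E) to a space -- the same effect as the
   256-entry lookup table above. */
static void clean_ascii(const char *in, char *out, size_t max)
{
    size_t i;
    for (i = 0; i < max && in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i]; /* avoid signed-char indexing */
        out[i] = (c >= 0x20 && c <= 0x7E) ? (char)c : ' ';
    }
    out[i] = '\0'; /* fix the end of the string */
}
```

Feeding it the sample row, each two-byte junk sequence comes out as two spaces, which the trim step afterwards cleans up.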
Vrisha
Posted: Thu Sep 07, 2017 6:30 am

Thank you, UCDI.

Your solution is a bit confusing to me. I want to use Ereplace or Convert to remove the multiple ASCII characters.

I don't know what mistake I am making in the Convert syntax for handling multiple ASCII characters.

_________________
Suja
UCDI
Posted: Thu Sep 07, 2017 9:11 am

Ereplace is horrible for this. You have to string together something like 100 of them, it's slow, and the syntax is absolutely insane. It looks like ereplace(ereplace(ereplace(ereplace ...., which loops over the same string over 100 times, replacing 1 bad character per pass.

My solution is to use C or BASIC or whatever to do it all at once in 1 loop.

If you are 100% sure you only have 3 bad characters and will only ever have those 3 characters, Ereplace should work when chained like this.

But you said you only spot-checked and your source is unreliable, so I advise getting ALL the nonstandard characters out at once.

My lookup-table replace approach kills everything that is not a normal character, not just the 3 you have found so far.

It is simple code. You make a lookup table of the entire byte range (0-255) and replace each string byte with the lookup-table value, which is a space for all junk characters and unchanged for the usable values. Then you can clean up extra spaces in DataStage or in the C, whichever you prefer.

I can give you the BASIC code instead if you prefer, which you can use directly in a BASIC Transformer instead of a regular Transformer. It's a little slower, but we're talking seconds per millions of records, hardly worth a mention in the grand scheme.
Vrisha
Posted: Thu Sep 07, 2017 10:14 am

The problem is resolved.

I checked Ray's post (Line Breaks in the flat files) and found the clue. It worked.

Steps
1)Created 3 stage variables
svChar160 - Char(160,@TRUE)
svChar191 - Char(191,@TRUE)
svChar194 - Char(194,@TRUE)

2) Transformation rule in transformer
Convert(svChar160:svChar191:svChar194,' ',TrimB(TrimF(NullToEmpty(In_ACACTIVITY.DESCRIPTION))))

Source data - Paratransit¿Web-Based¿Trip¿Boo

Target data - ParatransitWeb-BasedTripBoo
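For anyone doing the same cleanup outside DataStage, the effect of Convert(list, '', input) -- deleting every occurrence of a set of bytes in one pass -- can be sketched in C (the function name and the byte values in the example are illustrative, taken from the values found earlier in this thread):

```c
#include <stddef.h>
#include <string.h>

/* Remove every occurrence of the bytes listed in `drop` from `in`,
   writing the result to `out` -- one pass, analogous to
   Convert(drop, '', in) in DataStage BASIC. */
static void strip_bytes(const char *in, const char *drop, char *out)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++)
        if (strchr(drop, in[i]) == NULL) /* keep bytes not in the drop set */
            out[j++] = in[i];
    out[j] = '\0';
}
```

With drop = "\xC2\xBF\xA0" (bytes 194, 191, 160), the sample row comes out as ParatransitWeb-BasedTripBoo, matching the target data above.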

Thank you, Ray for your valuable post.

_________________
Suja
Vrisha
Posted: Thu Sep 07, 2017 1:06 pm

Thank you UCDI for your valuable suggestion.

_________________
Suja
BSridhar
Participant

Joined: 25 May 2018
Posts: 4
Points: 19
Posted: Fri May 25, 2018 11:00 am

Char() and Convert() do not support characters whose code value is greater than 256. What should I do in this case?

Thanks
ray.wurlod

Premium Poster, Participant
Group memberships: Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group
Joined: 23 Oct 2002
Posts: 54398
Location: Sydney, Australia
Points: 295054
Posted: Sat May 26, 2018 4:01 am

UniChar() for Unicode data. ASCII stops at 255.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
BSridhar
Posted: Tue May 29, 2018 8:32 am

Hi Ray,

I couldn't find UniChar in version 11.5. Could you please guide me to it? I am sorry.

Thank You
ray.wurlod
Posted: Tue May 29, 2018 5:09 pm

That suggests that you don't have NLS enabled, in which case you can't process anything other than ASCII.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
UCDI
Posted: Wed May 30, 2018 11:17 am

BSridhar wrote:
Char and convert are not supporting for the characters whose ASCII is greater than 256. What to do in this case?

Not sure about DataStage, but most software, if told a 'text' file is in a 1-byte code (ASCII or one-off encodings), will read Unicode letters as a group of multiple ASCII symbols.

For English and generally Western languages, most of the Unicode becomes some sort of space/blank symbol plus the ASCII letter, so hello looks like h e l l o and so on. However, any truly non-English or non-ASCII codes will make a mess, and any code that attempts to clean them up will damage the actual data.

So while it is 'sort of' safe to hack around on American English text that happens to be stored in Unicode as if it were ASCII, you are playing with fire to do this. If you are using Unicode, use Unicode for everything. If you are using ASCII, you can promote to Unicode if needed or stay in ASCII. Avoid mixing them as much as possible. And there are other encodings from older systems that you have to use sometimes too, mostly 1-byte but NOT ASCII.

I'll do a little of the hacking, but not in DataStage -- I have some code that can read dataset headers, which are apparently in Unicode for the internal textual parts... Smile
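As a concrete note on the byte-splitting described above: the three values found earlier in this thread (160, 191, 194) are consistent with UTF-8 text being read one byte at a time. The inverted question mark is U+00BF, which UTF-8 stores as the two bytes 0xC2 0xBF (194, 191), and a non-breaking space U+00A0 becomes 0xC2 0xA0 (194, 160). A small C sketch of the two-byte UTF-8 encoding rule (the function name is illustrative):

```c
/* UTF-8 encodes code points U+0080..U+07FF as two bytes:
   110xxxxx 10xxxxxx (leading byte carries the top 5 bits,
   continuation byte the low 6 bits). */
static void utf8_encode_2byte(unsigned cp, unsigned char out[2])
{
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* leading byte */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation byte */
}
```

For U+00BF this yields 0xC2 0xBF (194, 191); for U+00A0 it yields 0xC2 0xA0 (194, 160) -- matching the byte values spotted in the CSV export.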