DSXchange: DataStage and IBM Websphere Data Integration Forum
This topic has been marked "Resolved."
emaher
Participant

Joined: 14 Apr 2014
Posts: 2
Location: Seattle

Posted: Thu Aug 22, 2019 5:36 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
I have a .csv file exported from our Cisco phone system.
It uses a semicolon as the column delimiter and appears to have a trailing semicolon after the last column, followed by an LF character.

The file appears to escape the following with a backslash: null/empty columns, and any semicolons and line-feeds that occur within the data.

My parallel job is structured as:
[sequential file stage] -> [transformer stage] -> [netezza connector stage]

My import process produces warnings (resulting in skipped rows) whenever it sees either a semicolon or an LF within a string value. It could be that only one or the other is causing the failure; I'm not really sure.

I have attempted various iterations of Convert() and Ereplace() within the Transformer stage, and each time I see the following warnings and the problem records are skipped.

- Delimiter for field "field name" not found: input: {some data from the row}, at offset: x
- Field "field name" has import error and no default value: data: {some data from the row}, at offset x

The rest of the data loads fine - just not the rows that are broken by an embedded delimiter or LF. (All broken rows appear to contain both, as represented below.)

I have also tried using a server job instead of a parallel job, with "Contains terminators" set to 'Yes', but that fails outright (no rows get loaded).

It seems as if I should be able to tell DataStage to ignore characters immediately following the backslash, but I cannot figure out where/how to do this.

Code:

 1 78000274787.0;0;1;1;2;2;\ ;1;89334;\ ;3;-1;+12025550101;2019-08-21 19:40:41.541;...
 2 78000274769.0;1;1;1;1;4;er failures\; nested exception is: \$
 3    5589 ^Icom.cisco.wfapi.WFExecutionException: Too many transfer failures;3;-1;2025550101;
 4 78000274788.0;0;1;1;1;1;\ ;3;-1;2025550101;2;-1;29748;2019-08-21 19:40:45.863;
 5 78000274790.0;0;1;1;3;1;\ ;1;87574;\ ;2;-1;29830;2019-08-21 19:40:57.714;2019-08-21 19:46:12.265;


Above is a sample of a few records:
Line 1: Good data, imports fine
Line 2: there is an escaped semicolon (after the word 'failures') and an escaped LF at the end
Line 3: the continuation of the data that was pushed onto the next line by the escaped LF in the row above
Line 4: Good data, imports fine
Line 5: Good data, imports fine

This shows what the line endings look like, in case that's important. Note the '$', which is how the LF is represented when viewing the file in Vim.
Code:

1 ... ;\ ;N;f;f;f;f;f;-1;\ ;000000000068790706EE000A00000000;0;-1;$
2 78000274769.0;1;1;1;1;4;er failures\; nested exception is: \$
3 ... ;\ ;N;f;f;f;f;f;-1;0000000000060FBC0214926200000000;\ ;0;-1;$
4 ... ;\ ;N;f;f;f;f;f;-1;0000000000060F960214927200000000;\ ;0;-1;$
5 ... ;\ ;N;f;f;f;f;f;-1;000000000068790D06EE004100000000;000000000068790D06EE003F00000000;0;-1;$



As a last resort I am considering running the file through a sed pre-processing step, but I keep thinking that DataStage should be able to handle this simple task.

Thank you in advance.
chulett
Premium Poster

Joined: 12 Nov 2002
Posts: 43011
Location: Denver, CO

Posted: Sat Aug 24, 2019 8:52 am

First off, thanks for mentioning that you already tried a Server job. This issue has been discussed here for years and that is always the "go to" solution that gets trotted out. I've taken that route back in the day and generally it works just fine, so I'm a little puzzled it didn't for you.

And technically, it really isn't all that simple. Any process consuming a sequential file on UNIX "knows" that a linefeed is the end of a record, even Vim as you can see. I also don't believe they can be escaped, but I would have to spend some time researching that. If nothing comes of that research (or of your post here), perhaps open a case with your official support provider?

Your best defense against something like this - if at all possible - is getting the source system to stop allowing record or field terminators in the data before it is exported to a flat file. That is the "simple task", since the problem (and the solution) is obvious at that point. Of course, that may not be possible at all, meaning the much more difficult burden of identifying them falls on you. And the Transformer is too late to handle anything like this: by the time rows reach it, the Sequential File stage has already parsed the records and rejected the broken ones.

I believe what you'll find (and perhaps others who have dealt with this will chime in) is that your last resort is the way to go: something like awk or sed or perl, whatever your weapon of choice is there. If you can put together something that reads the file and sends clean records to standard output, you can use it in the Filter option of the Sequential File stage, so you don't need any kind of separate pre-processing step.
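For example, just to show the shape of it, something along these lines (a rough, untested sketch assuming GNU sed is available on the engine tier) could go straight into the Filter property; the stage pipes the raw file into the command and parses whatever comes back on standard output:

Code:

 # rough sketch (GNU sed): while a line ends with a backslash, pull in the
 # next physical line and replace the backslash + linefeed with a space
 sed -e :a -e '/\\$/N; s/\\\n/ /; ta'


That only puts the split records back onto one line; the escaped semicolons inside the data would still need to be handled before the stage tries to parse the fields.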

Good luck!

_________________
-craig

The muffin man is seated at the table in the laboratory of the Utility Muffin Research Kitchen reaching for an oversized chrome spoon
emaher
Participant

Posted: Fri Aug 30, 2019 10:17 am

Thank you for responding.
Unfortunately, as many others seem to have experienced, I am not able to effect any change in how the file is produced and simply have to work with what I get.
I ended up going down the path of using a sed command and was able to add it to the Filter property of the Sequential File stage. This was pretty handy, as I was not looking forward to the extra overhead of creating a custom script and dealing with additional file operations.
The escape character within the data made this a fairly straightforward command.
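The exact command will depend on your data, but a sketch of the general shape (assuming GNU sed; the replacement characters here are arbitrary choices, not necessarily what I used) looks something like this in the Filter property:

Code:

 # sketch only: rejoin records split by a backslash-escaped linefeed, turn
 # escaped semicolons into commas so they no longer look like delimiters,
 # and drop the backslash-space markers used for empty columns
 sed -e :a -e '/\\$/N; s/\\\n/ /; ta' -e 's/\\;/,/g' -e 's/\\ //g'


The first expression loops until a record no longer ends in a backslash, and the remaining two clean up the escaped characters inside the rejoined record before the stage parses the fields.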
Again, thank you for your input.
Eric