
XML Error Handling on incomplete file

Posted: Fri Sep 29, 2017 7:27 am
by FranklinE
We have a situation that we solved on the file transfer side, but the behavior of the parallel job that processes the XML file leaves me confused. I've checked the manual, and its description of the XML Input stage's General tab is hard to follow.

EDIT: I forgot to add that, because of auto-purge, I no longer have the Director messages from the cycle in which the file was incomplete. :(

The file's transfer was interrupted mid-flight, resulting in an incomplete data string and no closing tag.

The first two stages are External Source and XML Input, both from the Px palette.

The job finishes okay, but no rows are processed.

Before I start experimenting, I'm looking for advice about what to expect. Is there a setting on either stage that will force the job to abort when one or more closing tags are missing? We are already looking into Unix scripting, but I like to have options.
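Since Unix scripting is already on the table, one hedged option is a pre-check that the job sequence runs before the parallel job: parse the entire file and exit non-zero if it is not well-formed, so the sequence can stop (for example, by testing the return code of an Execute Command activity). The script and function names are hypothetical, and Python's standard `xml.etree.ElementTree` stands in for whatever parser your site prefers.

```python
import sys
import xml.etree.ElementTree as ET


def precheck_xml(path: str) -> int:
    """Return 0 if the file at `path` is well-formed XML, else 1.

    A truncated transfer (data cut off, no closing root tag) makes the
    parser raise ET.ParseError, so the sequence can stop before the
    parallel job ever reads the file.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # str(err) includes the parser's message plus line and column
        print(f"{path}: not well-formed: {err}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python precheck_xml.py /path/to/incoming.xml
    sys.exit(precheck_xml(sys.argv[1]))
```

Parsing the whole file costs a full read, but it catches any truncation point, not just a missing final tag.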

Explanation of Additional Info: Ernie Ostic helped me get started with the XML stages, and the job in question was the result, developed under 8.x and migrated to 11.x without remedial coding. :D

Posted: Fri Sep 29, 2017 8:02 am
by PaulVL
My advice: Craft a better handshaking strategy. Your XML parsing job should never have been kicked off if the file transfer was incomplete.

Bad or incomplete data is worse than no data.

Posted: Fri Sep 29, 2017 8:06 am
by FranklinE
Yes, Paul, that's what we did. The file watcher was finding the file handle before the FTP completed. We fixed that.

I'm looking at this from a "generic" point of view. Every other file process I use gives me low-level control over the status of the file. XML processing is still new to me: only two of my 350 jobs use XML formats. I expect that proportion to grow, and I like single-point solutions. Call me spoiled; you'd be accurate. :lol:

EDIT: It was a freak coincidence. The FTP itself was delayed a few minutes, and the file watcher was on a ten-minute cycle. Murphy's Law applies.

Posted: Fri Sep 29, 2017 9:08 am
by UCDI
I have found the "keep it local" approach to be best for that.
So what we do is drop the file from the network into an intermediate folder and then, once that transfer is complete, use a Unix move command to put it in the correct folder, where detection can happen safely. If both folders are on the same disk (file system), the watcher should not be able to catch a partial file, because a move within one file system does not move the data on the disk; it just rearranges the file's entry in the file system.
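A minimal sketch of that staging-then-move handoff, in Python for illustration (the `deliver` helper and folder layout are hypothetical): the slow copy happens in a folder nobody watches, and only the final rename lands in the watched folder. `os.replace` is a single rename, which POSIX makes atomic within one file system; across file systems it fails instead of copying, which is why the sketch compares device IDs up front.

```python
import os
import shutil
from pathlib import Path


def deliver(src: Path, staging_dir: Path, watched_dir: Path) -> Path:
    """Copy `src` into staging, then move it into the watched folder in one step."""
    # A rename is only atomic within a single file system, so insist on that.
    if os.stat(staging_dir).st_dev != os.stat(watched_dir).st_dev:
        raise ValueError("staging and watched folders must share a file system")

    staged = staging_dir / src.name
    shutil.copyfile(src, staged)      # slow part: the watcher never looks here

    final = watched_dir / src.name
    os.replace(staged, final)         # fast part: only the directory entry changes
    return final
```

In shell terms this is the same as copying into the staging folder and then `mv`-ing into the watched one.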

You can do it in many other ways: "are we done yet" secondary files, flipping one of the obsolete or unused permission bits on the file (chmod), actual interprocess communication, and more. The above is just one overly simple yet effective way.
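The "are we done yet" secondary-file idea can be sketched as a watcher-side check. The `.done` suffix and the convention that the sender creates the marker only after the data file is complete are assumptions for illustration, not something DataStage or any FTP client provides.

```python
from pathlib import Path
from typing import Optional


def ready_data_file(data_file: Path, marker_suffix: str = ".done") -> Optional[Path]:
    """Return `data_file` only if its companion marker file also exists.

    Assumed convention: the sender writes feed.xml first and creates
    feed.xml.done only after the transfer has finished, so the watcher
    ignores the data file until the marker appears.
    """
    marker = data_file.with_name(data_file.name + marker_suffix)
    if data_file.exists() and marker.exists():
        return data_file
    return None
```

The marker costs the sender one extra step, but it works even when the data cannot be staged on the same file system.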

Since you are new to XML, an FYI: if you have bandwidth issues sending huge XML files around, you can compress them with bzip2. That algorithm does magic on XML.
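To see why repetitive markup compresses so well, here is a small sketch using Python's standard `bz2` module; the sample feed is made up, and on the Unix side the equivalent is simply running `bzip2` on the file before transfer and `bunzip2` after.

```python
import bz2


def compress_xml(xml_bytes: bytes) -> bytes:
    """Compress an XML payload with bzip2 at its highest level (9)."""
    return bz2.compress(xml_bytes, compresslevel=9)


# Repetitive, record-oriented markup (made-up sample data).
sample = ("<rows>" + "".join(
    f"<row><id>{i}</id><status>ACTIVE</status></row>" for i in range(10_000)
) + "</rows>").encode("utf-8")

packed = compress_xml(sample)
assert bz2.decompress(packed) == sample  # lossless round trip
print(len(sample), "->", len(packed), "bytes")
```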

Posted: Fri Sep 29, 2017 9:24 am
by FranklinE
Thanks for the tips on moving files, but that is not my question. We've corrected the cause of the incomplete file; I'm looking for advice on how to handle incomplete XML files in DataStage.

A suggested corollary to Murphy's Law: find one reason for a failure, and two more appear. I cannot code for every contingency, nor is that anything like best practice. I want my job to abort, and I'm looking for ways to code for it.

Posted: Mon Oct 02, 2017 4:12 am
by eostic
In my experience, the xmlInput stage successfully checks for well-formedness with its various validation and reject options, including missing end tags and such. It may depend on the kind of error, but it certainly checks many kinds of things. (Actually, it isn't doing the checking itself; it calls the open-source Xerces and Xalan XML processors to do the validation and parsing work.) You should be able to detect "most" problems and send the content and the reason down the reject link based on the pull-down values, or let the job abort (the fatal options). There may be some kinds of bad XML that get past these checks.
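The point here is that a conforming parser rejects a truncated document on its own, with no schema. As a rough analogue outside DataStage (Python's expat-based `xml.etree.ElementTree` rather than Xerces, and a hypothetical `route` function standing in for the stage's reject and fatal pull-down choices), the logic looks like this:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formed_error(document: str) -> Optional[str]:
    """Return None if `document` is well-formed, else the parser's reason.

    The reason string includes line and column, i.e. the kind of text a
    reject link would carry alongside the content itself.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return None


def route(document: str, abort_on_error: bool) -> str:
    """Hypothetical policy: send bad XML to 'reject', or raise to abort."""
    reason = well_formed_error(document)
    if reason is None:
        return "output"
    if abort_on_error:
        # Analogous to choosing a fatal option: stop the whole run.
        raise RuntimeError(f"aborting on malformed XML: {reason}")
    return "reject"
```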

Consider also passing the XML into a hierarchical stage, maybe not for parsing but for an even stronger syntax check. The xmlParser step has a deeper ability to review the structure, but it does require an XSD (you can generate one using various open-source tools).
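The difference between "well-formed" and "structurally correct" can be illustrated with a hand-rolled check. This is a swapped-in simplification, not XSD validation: Python's standard library has no schema validator (a third-party library such as lxml provides `etree.XMLSchema` for that), and the `row_tag`/`required` parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def structure_errors(document: str, row_tag: str, required: List[str]) -> List[str]:
    """List structural problems in a well-formed document.

    Stand-in for schema validation: every <row_tag> element must contain
    each child named in `required`. A real XSD can also express types,
    cardinality, and ordering; this only shows that a document can parse
    cleanly and still be wrong.
    """
    root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
    problems = []
    for index, row in enumerate(root.iter(row_tag), start=1):
        for name in required:
            if row.find(name) is None:
                problems.append(f"{row_tag} #{index}: missing <{name}>")
    return problems
```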

Ernie

Posted: Mon Oct 02, 2017 8:31 am
by FranklinE
Thanks, Ernie. I'd hoped to get away with minimal coding changes, but the hierarchical stage, being new to our design, looks like a promising approach.

Posted: Mon Oct 02, 2017 11:26 am
by eostic
I am surprised that xmlInput isn't catching a missing end tag. Maybe it depends on your particular document and the location of the tag?

Ernie

Posted: Mon Oct 02, 2017 11:41 am
by FranklinE
The job was migrated from 8.7 to 11.5, and I paid it no further attention after a successful compile and regression testing. After "be careful what you wish for" comes "be careful what you don't know." :wink:

Posted: Mon Oct 02, 2017 11:46 am
by FranklinE
Current settings for the transformation error mappings:
Fatal: reject
Error: reject
Warning: warning

Looks like it was right in front of me, the usual place for things to successfully hide from me. I'll play with Fatal and Error to come up with what fulfills my requirements.

Posted: Mon Oct 02, 2017 3:56 pm
by eostic
Yeah, it's actually a pretty cool feature. You don't even need to bother with an XSD or formal validation; a document that isn't well-formed will be kicked out most of the time. I like to choose the reject option, put one big, wide column on the user-named reject link, and send that column and the reason to a sequential file to deal with there.
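A minimal sketch of that last step, outside DataStage: the (document, reason) pairs a reject link would carry are written to a tab-delimited file for later handling. The function name, header names, and delimiter are assumptions; the `csv` module quotes any field containing tabs, quotes, or line breaks, so multi-line XML survives in a single column.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def write_rejects(rejects: Iterable[Tuple[str, str]], path: Path) -> int:
    """Write (document, reason) pairs to a tab-delimited file; return the row count."""
    count = 0
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["document", "reason"])
        for document, reason in rejects:
            # The whole rejected document goes in one wide column.
            writer.writerow([document, reason])
            count += 1
    return count
```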

Ernie

Posted: Mon Oct 09, 2017 8:29 am
by FranklinE
Marking this resolved, as it isn't an open issue here at this point. I just need to be prepared for future repeats of the incident.

Thanks to all who responded.