XML Error Handling on incomplete file

FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

We had a situation that we solved on the file transfer side, but the behavior of the parallel job that processes the XML file leaves me confused. I've checked the manual, and its description of the XML Input General tab is rather confusing.

EDIT: I forgot to add that due to auto-purge I don't have the Director messages from the cycle in which the file was incomplete. :(

The file's transfer was interrupted mid-flight, resulting in an incomplete data string and no closing tag.

The first two stages are External Source and XML Input, both from the Px palette.

The job finishes okay, but no rows are processed.

Before I start experimenting, I'm looking for advice about what to expect. Is there a setting on either stage that will force the job to abort when one or more closing tags are missing? We are already looking into using Unix scripting, but I like to have options.
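
A scripted pre-check of the kind mentioned above could look roughly like the following. This is a minimal sketch in Python rather than shell, using only the standard library; the function name `check_well_formed` and the command-line usage are illustrative assumptions, not anything from DataStage. It streams the file through a SAX parser, so a truncated document with a missing closing tag is reported as an error and the script exits non-zero.

```python
import sys
import xml.sax


def check_well_formed(path):
    """Return None if the file is well-formed XML, else a description of the problem."""
    try:
        # ContentHandler() ignores every event; we only care whether parsing succeeds.
        xml.sax.parse(path, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return f"line {exc.getLineNumber()}, column {exc.getColumnNumber()}: {exc.getMessage()}"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    problem = check_well_formed(sys.argv[1])
    if problem:
        print(f"NOT well-formed: {problem}", file=sys.stderr)
        sys.exit(1)  # non-zero exit lets the caller stop before the job runs
    print("well-formed")
```

SAX is used here instead of building a full tree so that very large files are checked without being held in memory. This only tests well-formedness; it does not validate against a schema.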

Explanation of Additional Info: Ernie Ostic helped me get started on using the XML stages, and the job in question was the result, developed under 8x and migrated to 11x without remedial coding. :D
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

My advice: Craft a better handshaking strategy. Your XML parsing job should never have been kicked off if the file transfer was incomplete.

Bad/incomplete data is worse than no data.
FranklinE

Post by FranklinE »

Yes, Paul, that's what we did. The file watcher was finding the file handle before the FTP completed. We fixed that.

I'm looking at this from a "generic" point of view. Every other file process I use gives me low-level control over the status of the file. XML processing is still new to me, in that I only have two jobs (out of 350) that use XML formats. I expect that proportion to grow, and I like to use single-point solutions. Call me spoiled; you'd be accurate. :lol:

EDIT: It was a freak coincidence. The FTP itself was delayed a few minutes, and the file watcher was on a ten-minute cycle. Murphy's Law applies.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I have found the keep-it-local approach to be the best for that.
We drop the file from the network into an intermediate folder and then, once that is complete, use a Unix move command to put it in the correct folder, where detection can happen safely. If it is all on the same disk (strictly, the same file system), you should not be able to catch a partial file: a move within one file system does not move the data, it just rearranges the file's entry in the file system.
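
The staged-move idea above can be sketched as follows. This is only an illustration in Python; the function name `publish` and the folder names are hypothetical. It relies on `os.replace`, which is a rename and is atomic on POSIX systems when source and target are on the same file system.

```python
import os


def publish(staged_path, watched_dir):
    """Move a completely written file from staging into the watched folder."""
    target = os.path.join(watched_dir, os.path.basename(staged_path))
    # A rename within one file system only rewrites the directory entry, so the
    # watcher sees either no file or the whole file. Across file systems the call
    # is expected to fail with OSError rather than fall back to a slow copy.
    os.replace(staged_path, target)
    return target
```

Failing loudly on a cross-file-system move is deliberate in this sketch: a copy would reopen the window in which a watcher can pick up a partial file.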

You can do it many other ways: 'we done yet' secondary files, flipping one of the obsolete or unused bits on the file (chmod), actual process communication, and more. The above is just one overly simple yet effective way.
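
As one example of the secondary-file variant, a watcher could pick up only those data files that have a companion marker. A minimal sketch, assuming the sender writes `feed.xml` first and then an empty `feed.xml.done`; the `.done` suffix and the function name `ready_files` are made up for illustration.

```python
from pathlib import Path


def ready_files(directory, pattern="*.xml", marker_suffix=".done"):
    """Return the data files whose companion marker file exists."""
    ready = []
    for data_file in sorted(Path(directory).glob(pattern)):
        # The marker sits next to the data file: feed.xml -> feed.xml.done
        marker = data_file.with_name(data_file.name + marker_suffix)
        if marker.exists():
            ready.append(data_file)
    return ready
```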

Since you are new to XML, FYI: if you have bandwidth issues sending huge XML files around, you can use bzip2 on them. That algorithm does magic to XML.
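
To get a feel for the effect on your own files, something like this can be used. It is a sketch with Python's standard `bz2` module and a made-up, highly repetitive sample document; real ratios depend entirely on the data, so no figures are claimed here.

```python
import bz2


def compress_xml(data: bytes) -> bytes:
    """Compress XML bytes with bzip2 at the highest compression level."""
    return bz2.compress(data, compresslevel=9)


# Synthetic sample: repeated tag names are what bzip2 squeezes so well.
sample = (
    b"<rows>"
    + b"".join(b"<row><id>%d</id><status>OK</status></row>" % i for i in range(1000))
    + b"</rows>"
)
packed = compress_xml(sample)
print(f"original: {len(sample)} bytes, compressed: {len(packed)} bytes")

# The round trip must give back exactly the original bytes.
assert bz2.decompress(packed) == sample
```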
FranklinE

Post by FranklinE »

Thanks for the tips on moving files, but that is not my question. We've corrected the cause of the incomplete file; I'm looking for advice on how to handle incomplete XML files in DataStage.

Murphy's Law suggested corollary: you find one reason for a failure, and two more appear. I cannot code for every contingency, nor is that anything like best practice. I want my job to abort, and I'm looking for ways to code for it.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

In my experience, the xmlInput Stage checks for well-formedness successfully, including missing end tags and such, with its various options for validation and reject. It may depend on the kind of error, but it certainly checks many kinds of things (actually, it isn't doing the checking itself; it calls the open source Xerces and Xalan XML processors to do the validation and parsing work). You should be able to nicely detect "most" problems and send the content and the reason down the reject link based on the pull-down values, or let the Job abort (the fatal options). There may be some kinds of bad XML that this will not catch.

Consider also passing the XML into a hierarchical stage, maybe not for parsing, but for an even stronger syntax check. The xmlParser Step has a deeper ability to review the structure, but it does require an XSD (you can generate one of those using various open source solutions).

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
FranklinE

Post by FranklinE »

Thanks, Ernie. I'd hoped to get away with minimal coding changes, but the hierarchical stage, being new to our design, looks like a promising approach.
eostic

Post by eostic »

I am surprised that xmlInput isn't able to capture a missing end tag. Might it depend on your particular document and the location of the tag?

Ernie
FranklinE

Post by FranklinE »

The job was migrated from 8.7 to 11.5. I paid it no further attention after the successful compile and regression testing. After "be careful what you wish for" comes "be careful what you don't know." :wink:
FranklinE

Post by FranklinE »

Current settings for the transformation error mappings:
Fatal -- reject
Error -- reject
Warning -- warning

Looks like it was right in front of me, the usual place for things to successfully hide from me. I'll play with Fatal and Error to come up with what fulfills my requirements.
eostic

Post by eostic »

Yeah, it's actually a pretty cool feature. You don't even need to bother with an XSD or formal validation; not-well-formed will kick out most of the time. I like to choose the reject option, put one big giant column on the user-named reject link, send that and the reason to a sequential file, and then deal with it there.
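
For readers who want to see the shape of that pattern outside DataStage, here is a rough Python analogy. It is not how the stage is implemented: the function name `parse_or_reject` and the tab-separated reject layout are assumptions for illustration. A document that fails to parse is written, with the parser's reason, to a reject file instead of stopping the run.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(xml_text, reject_file):
    """Parse one XML document; on failure write reason and content to reject_file."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # One line per reject: the reason, a tab, then the content with all
        # whitespace runs collapsed so it stays in a single "big column".
        flat = " ".join(xml_text.split())
        reject_file.write(f"{exc}\t{flat}\n")
        return None
```

A caller would open the reject file once, pass every incoming document through the function, and review the file afterwards.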

Ernie
FranklinE

Post by FranklinE »

Marking this resolved, as it isn't an open issue here at this point. I just need to be prepared for future repeats of the incident.

Thanks to all who responded.