Having issues reading XML files with line breaks

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Having issues reading XML files with line breaks

Post by cosec »

Hi All,

I am trying to read the contents of an XML file into a text file, but I encounter an error when there is a line break in the XML file.

Job Design
XML Source File -> XML Input Stage -> Transformer -> Text File

The job works fine when there is no line break within the XML fields.

However, when there is a line break in one of the fields I encounter the following error (I have indicated that the column can contain terminators):
XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 226): Invalid character (Unicode: 0x0)

XML Source File structure example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><TBL_A><FIELD1>513</FIELD1><FIELD2>AAAA
01/BBBB</FIELD2></TBL_A>

Any suggestions on how I could avoid the error without removing the line breaks?
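For what it's worth, the XML itself is well-formed: line breaks inside element content are perfectly legal. A quick sketch outside DataStage (Python here, purely illustrative) showing a standard parser accepting the same document:

```python
import xml.etree.ElementTree as ET

# The same document, including the line break inside FIELD2.
doc = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
       '<TBL_A><FIELD1>513</FIELD1><FIELD2>AAAA\n01/BBBB</FIELD2></TBL_A>')

root = ET.fromstring(doc)
# The line break survives as part of the element's text content.
print(repr(root.find('FIELD2').text))
```

So the parse failure points at how the file is being read into the job, not at the XML content itself.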

I would greatly appreciate your advice.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can you confirm for us what the first stage in your job is - sequential file stage? If so, suggest you replace it with a Folder stage (ideally just passing in the filename and letting the XML Input stage do the reading) and see if the problem persists.
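Note the "Unicode: 0x0" in that Xalan message: it suggests the parser is choking on a NUL byte left in the data by the upstream read, not on the line break itself. A hypothetical illustration (Python, outside DataStage) of the difference:

```python
import xml.etree.ElementTree as ET

good = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<TBL_A><FIELD1>513</FIELD1><FIELD2>AAAA\n01/BBBB</FIELD2></TBL_A>')
# Hypothetical: a NUL byte where the line break used to be,
# the kind of corruption a mis-configured fixed read can leave behind.
bad = good.replace('\n', '\x00')

ET.fromstring(good)  # parses fine; line breaks are legal in element content
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print('parser rejects the NUL:', err)
```

NUL is not a legal XML character in any context, which is consistent with the error appearing only when the file stage mishandles the multi-line record.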
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

You can process the entire content, or just the name of the file.....the Folder Stage has a "built-in" Table Definition..... notice how it contains the filename and the "record"...this is the actual content of the whole file......

...then, in the xmlInput Stage, you check whether your column contains content or just a URL.

Either way, the CRLFs are handled by the parser by design, as the XML spec requires, so there's no need to strip them.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

True dat. No clue what the limit is any more but back in the day when we were processing "large" XML files, it seemed better to just pass the URL to the XML Input stage and let it do all of the work. From what I recall. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes...true...and largely a factor of the downstream Stage. If you are using a Server Job, the Folder Stage can probably "lift" a bigger XML document than the xmlInput Stage can handle. The xmlInput Stage is usually good up to 200 megabytes or so..... The Hierarchical Stage can read dramatically larger documents, but comes with a price....it requires an xsd and has a steeper learning curve. For small documents that are largely transactional, and when you are just reading them, it's almost always better to just use xmlInput.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That same number is in the back of my mind... 200 to 250MB was our limit on our HPUX system, I do believe. Thankfully, our sources generally were of the mind to flood us with a metric crap-ton of small files. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers